In a recent fuzz testing webinar, Rikke Kuipers, Defensics product manager, and Dennis Kengo Oka, senior solution architect, discussed how to use agent instrumentation to improve fuzzing of infotainment systems and telematics units. Audience members had questions about fuzz testing, the Agent Instrumentation Framework, and Defensics, our fuzz testing tool. So we’d like to answer them here.
Fuzzing typically doesn’t have a false-positive rate. The confusion comes from the fact that when you execute a test case, there might be results in your system under test (SUT)—for example, some unexpected behavior or a crash—that you can’t immediately reproduce if you re-execute only that one test case.
If you keep your system and input the same—in other words, if you use the same test case—there should always be the same result in your SUT. However, if you’re running 20 test cases, for example, you might overflow a buffer in test case 20 and cause something to crash. That wouldn’t happen if you restarted your target system and just executed test case 20.
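Here is a purely illustrative sketch of that scenario (toy Python code, not Defensics): a target that accumulates state across test cases only fails once earlier cases have filled its buffer, so re-running case 20 against a freshly restarted target doesn’t reproduce the crash.

```python
# Purely illustrative toy target: state accumulates across test cases, so the
# crash only appears after earlier cases have filled the buffer.
BUFFER_LIMIT = 999

class Target:
    def __init__(self):
        self.buffer = b""

    def handle(self, payload: bytes):
        self.buffer += payload  # state carried over from previous test cases
        if len(self.buffer) > BUFFER_LIMIT:
            raise RuntimeError("crash: buffer overflowed")

test_cases = [b"A" * 50 for _ in range(20)]

# Running all 20 cases in sequence surfaces the crash at case 20...
target = Target()
for i, case in enumerate(test_cases, start=1):
    try:
        target.handle(case)
    except RuntimeError as error:
        print(f"test case {i}: {error}")

# ...but restarting the target and running only case 20 does not reproduce it.
fresh_target = Target()
fresh_target.handle(test_cases[19])
print("test case 20 alone: no crash")
```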
Using external instrumentation or the Agent Instrumentation Framework allows you to instrument on a per-test-case basis, so you can immediately see the impact of a single test case—for example, on the memory used by a process—and detect that right away.
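As a rough illustration of what such a per-test-case check could look like (an assumption-laden sketch built on the psutil library, not the Defensics agent API), a script could sample the target process’s resident memory after every case:

```python
# Sketch only: sample a target process's resident memory after each test case.
# Assumes psutil is installed; the threshold below is just a placeholder.
import psutil

MEMORY_LIMIT_BYTES = 512 * 1024 * 1024  # hypothetical limit for this target

def check_memory(pid: int, test_case_id: int) -> bool:
    """Return True if the target process is alive and within its memory budget."""
    try:
        rss = psutil.Process(pid).memory_info().rss
    except psutil.NoSuchProcess:
        print(f"test case {test_case_id}: target process is gone (possible crash)")
        return False
    print(f"test case {test_case_id}: RSS = {rss / (1024 * 1024):.1f} MiB")
    return rss <= MEMORY_LIMIT_BYTES
```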
It comes down to metrics. Fuzzing has an infinite space issue in the sense that it’s negative testing. That means you can have an arbitrary number of test cases. For example, if you take one field in a packet and insert a single character, say ‘A,’ that’s a test case. You can insert two A’s in the field as a test case. And you can continue like that until you have one billion A’s in the field. Everything in between is a test case. It looks very fancy on marketing slides to have one billion test cases for that one field, but it doesn’t make much sense when it comes to executing them against the target system.
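To make the point concrete, here is a trivial sketch (purely illustrative): one field and one anomaly pattern already produce as many test cases as you care to enumerate.

```python
# One field, one anomaly pattern ('A' repeated), an arbitrary number of test cases.
def overflow_anomalies(max_len: int):
    """Yield payloads of 'A' repeated 1..max_len times for a single field."""
    for n in range(1, max_len + 1):
        yield b"A" * n

# 1,000 here, but nothing stops you from making it 10**9; the count alone says
# little about how useful the test cases are.
print(sum(1 for _ in overflow_anomalies(1000)))
```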
There are various metrics that define what a good fuzzer should have. The first one is high-quality anomalies. Your anomaly database should contain anomalies that are likely to trigger issues when inserted into various parts of the code.
Second, you should set up the instrumentation correctly to get information on whether a specific function crashes, so you don’t keep executing test cases that trigger the same bug. For example, if you’re testing a specific field and you trigger an overflow, you don’t want to waste your time executing a hundred more test cases that hit the same overflow.
It’s also about coverage. The fuzzer should understand the protocol it’s testing against and create test cases that cover the entire breadth of the protocol. It should also be able to go very deep into the protocol state machine while executing test cases.
Defensics is a good example of this. We have the full implementation of the protocol, so all the building blocks of the protocol are mirrored in our tool. As a result, we can take the model, multiply it by the anomalies in our anomalization engine, and then dynamically create test cases.
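Conceptually, that generation step looks something like the simplified sketch below (not Defensics internals; the model and anomaly values are made up): each field of the protocol model is replaced in turn with each anomaly while the rest of the message stays valid.

```python
# Simplified model-based test case generation: every (field, anomaly) pair yields
# one test case in which that field is anomalized and the rest stays valid.
from itertools import product

# Hypothetical protocol model: field name -> valid default value
model = {"version": b"\x01", "length": b"\x00\x10", "payload": b"hello"}

# Hypothetical anomaly library
anomalies = [b"", b"A" * 1024, b"%n%n%n%n", b"\xff" * 64]

def generate_test_cases(model, anomalies):
    for field, anomaly in product(model, anomalies):
        case = dict(model)
        case[field] = anomaly
        yield field, b"".join(case.values())

count = sum(1 for _ in generate_test_cases(model, anomalies))
print(count, "test cases from this tiny model")  # 3 fields x 4 anomalies = 12
```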
That’s not really the intention behind how we approach fuzzing. Fuzzing is designed to look at the implementation of protocols running on the target.
Of course, there are various ways of instrumenting your target—via SNMP, via external instrumentation, and so on. So you can very easily get performance-related data from your target. For example, we often find when we execute test cases that we cause a spike in CPU and memory use. Defensics offers multiple ways of visualizing that, such as a graph that correlates all the different inputs. You can use that information to pinpoint a test case causing high CPU spikes. Then you can loop that specific test case and see how it affects your target. But again, Defensics isn’t meant to be a load tester or any such tool. There are other tools with that dedicated functionality, and it’s not our core intent for the Defensics platform.
We haven’t created any specific agents for the AUTOSAR Classic Platform, but you can create your own agents and run them. Our agent framework allows you to be flexible and create the agents you want, so everyone using the Defensics fuzzer can easily extend the instrumentation used in it. The agents themselves are simple Python scripts, and the list of agents you can run is endless. For example, we have skeleton agents, which allow you to quickly drop any functionality you’d like to test into your own environment, whether for the AUTOSAR Classic Platform or any other target system.
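For a sense of what such a script can do (the actual invocation and result reporting are defined by the Agent Instrumentation Framework; this standalone sketch just signals pass or fail through its exit code, and the service name is hypothetical):

```python
#!/usr/bin/env python3
# Sketch of the kind of health check an agent might run after a test case:
# ask systemd whether a target service is still active and exit 0/1 accordingly.
import subprocess
import sys

TARGET_SERVICE = "infotainment-demo"  # hypothetical service on the SUT

def target_is_healthy() -> bool:
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", TARGET_SERVICE],
        check=False,
    )
    return result.returncode == 0

if __name__ == "__main__":
    sys.exit(0 if target_is_healthy() else 1)
```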
No. In Defensics, we take the approach of specification-based fuzzing, so the entire model is mirrored in the tool itself. Based on that, we can execute and generate test cases. Basically, we’re acting like a genuine endpoint to the SUT. Since we fully understand the protocol, we can create test cases that (hopefully) trigger certain exceptions in the target and then use the protocol instrumentation itself to assess the health of the target.
Since we understand the protocol, we can also, for example, execute a sequence of methods and packages that tests specific functionality. For example, in the case of Bluetooth, we might test the streaming of an audio file to the target. This includes everything from connecting to the target to streaming to disconnecting. If we find that this sequence executes in a certain timeframe—depending on the protocol, that could be 500 or 1,000 milliseconds—and that it executes correctly in terms of the messages we get back from the target, we can mark the test case as a pass. Defensics has that level of awareness of the protocol itself.
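In simplified form, that pass/fail decision might look like the sketch below (not the Defensics implementation; the target client and its connect/stream/disconnect methods are hypothetical):

```python
# Sketch: a known-good sequence used as an instrumentation probe. The test case
# preceding it passes only if the sequence completes correctly within its budget.
import time

TIME_BUDGET_MS = 1000  # protocol-dependent, e.g. roughly 500-1,000 ms

def valid_sequence_passes(target) -> bool:
    """`target` is a hypothetical client exposing connect/stream/disconnect."""
    start = time.monotonic()
    ok = target.connect() and target.stream(b"audio-sample") and target.disconnect()
    elapsed_ms = (time.monotonic() - start) * 1000
    return ok and elapsed_ms <= TIME_BUDGET_MS
```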
Yes, you can control the speed of data sent to the SUT, but that very much depends on the protocol. We have over 270 protocols, all of which might have custom configurable options specific to the protocol itself. For example, there’s a frame rate limit setting you can use in CAN Bus that controls the number of frames injected onto the bus per second. There’s a different option you can set in Bluetooth for injection over radio, and other options across different protocols.
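The idea behind such a rate limit can be sketched generically (this is not the Defensics setting itself; send_frame stands in for whatever CAN interface, such as python-can, is actually in use):

```python
# Generic pacing sketch: inject at most `frames_per_second` frames onto the bus.
import time

def inject(frames, send_frame, frames_per_second: int = 100):
    interval = 1.0 / frames_per_second
    for frame in frames:
        started = time.monotonic()
        send_frame(frame)  # stand-in for the real CAN send call
        # Sleep off whatever remains of this frame's time slot.
        remaining = interval - (time.monotonic() - started)
        if remaining > 0:
            time.sleep(remaining)
```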