Yes, AI chatbots can write code very fast, but you still need human oversight and security testing in your AppSec program.
Chatbots are taking the tech world and the rest of the world by storm—for good reason. Artificial intelligence (AI) large language model (LLM) tools can write things in seconds that would take humans hours or days—everything from research papers to poems to press releases, and yes, to computer code in multiple programming languages.
Which means, as a parade of organizations and researchers have already demonstrated, they can be used in software development. But if you’re thinking of joining that parade, you need to make sure it doesn’t take your organization by the wrong kind of storm…because it could.
Indeed, amid the excitement and amusement about what chatbots can do, there is an undercurrent of panic that they could soon do things that we don’t want them to do, and we won’t be able to stop them.
Actually, it’s more than an undercurrent. Late last month, the Future of Life Institute published an open letter signed by nearly 1,400 people so far, including tech luminaries like Twitter owner and Tesla CEO Elon Musk and Apple cofounder Steve Wozniak, calling for a six-month “pause” on research and training on AI systems any more powerful than the latest iteration of OpenAI’s ChatGPT, labeled GPT-4. According to the letter, “an out-of-control race to develop and deploy ever-more-powerful digital minds that no one—not even their creators—can understand, predict, or reliably control [means] we risk loss of control of our civilization.”
Of course that was met with another storm—of metaphors—declaring that it’s absurd to call for a worldwide pause on any technology that already exists. Take your pick: the horse is out of the barn, the cat is out of the bag, the train has left the station, you can’t put the toothpaste back in the tube, the genie is out of the bottle, or Pandora’s Box is already open.
But while all that is being hashed out (or not), the current reality is that ChatGPT and other AI LLMs are here, and they can write pretty good code very fast. Which could help you or hurt you. The trick is knowing how to keep it on the “helps you” side for your organization. Jamie Boote, senior consultant with Black Duck, sums it up in one sentence: “Understanding what AI is and isn’t good at is key.” Indeed, understanding that means you’re less likely to use it for the wrong stuff.
It turns out that, even in their early iterations, these tools are quite good at some things. Given the right prompt, or set of prompts, chatbots can respond with amazing substance in seconds. No need to “think,” no need for background reading or interviews, and no need to spend time tapping a computer keyboard. The text just flows as if it had been copied and pasted—which it sort of has. Boote noted that since its launch at the end of November, ChatGPT has shown that it can do programming grunt work much faster than junior developers, and it works 24/7—no salary, benefits, or lunch breaks needed.
“You used to need a human brain for that,” he said, “but because ChatGPT has been trained—probably months and months or years and years of training of this model—all that upfront uploaded work means it can respond in seconds.” And as long as the massive amount of data it relies on is accurate, what you get is accurate as well.
But it turns out that “pretty good” doesn’t mean perfect. Enrique Dans, writing in Medium, called AI LLMs “an impressively scaled-up version of the text autocomplete function on our smartphone or email, which can seem ‘smart’ at some times (and at others, infuriatingly idiotic).”
A review of ChatGPT in ZDNet concluded that “if you ask ChatGPT to deliver a complete application, it will fail. […] Where ChatGPT succeeds, and does so very well, is helping someone who already knows how to code to build specific routines and get specific tasks done.”
And an article in Wired magazine noted that “these chatbots are powerfully interactive, smart, creative, and sometimes even fun. They’re also charming little liars: The datasets they’re trained on are filled with biases, and some of the answers they spit out, with such seeming authority, are nonsensical, offensive, or just plain wrong.” Or, when it comes to coding, lacking very important information.
A team of Black Duck researchers recently demonstrated that GitHub’s generative AI development tool Copilot (created in partnership with OpenAI and described as a descendant of GPT-3) generated code without catching an open source licensing conflict in it.
Ignoring licensing conflicts can be very costly. One of the most famous examples is Cisco, which failed to comply with the requirements of the GNU General Public License, under which the Linux-based software in its routers and other open source programs were distributed. After the Free Software Foundation brought a lawsuit, Cisco was forced to make that source code public. The amount it cost the company was never disclosed, but most experts say it was substantial.
This shouldn’t be a surprise. As every vendor of AI LLM tools has acknowledged, they are only as good as the dataset they have been trained on. And as has been shown with ChatGPT, they will declare falsehoods with the same level of confidence that they declare truth. In short, they need adult supervision, as any human developer would. “AI tools can assist developers when used in the correct context, such as writing a unit test or troubleshooting a given stack trace or repetitive task automation,” said Jagat Parekh, group director of software engineering with Black Duck, and the leader of the researchers who tested Copilot. But he added that “generative AI tools are only as good as the underlying data they are trained on. It’s possible to produce biased results, to be in breach of license terms, or for a set of code recommended by the tools to have a security vulnerability.”
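To make that last point concrete, here is a hypothetical illustration (not drawn from any specific tool’s output) of the kind of snippet an assistant could plausibly suggest, alongside the reviewed version that a human or a static analysis tool would push for:

```python
import sqlite3

# Hypothetical example: a lookup function of the sort an assistant might
# suggest. Building the query with string formatting leaves it open to
# SQL injection.
def find_user_unsafe(conn: sqlite3.Connection, username: str):
    query = f"SELECT id, email FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()  # a crafted username can rewrite the query

# The reviewed version: a parameterized query lets the database driver
# handle escaping, which is what most static analysis tools recommend.
def find_user_safe(conn: sqlite3.Connection, username: str):
    return conn.execute(
        "SELECT id, email FROM users WHERE username = ?", (username,)
    ).fetchall()
```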
Parekh said another risk that isn’t getting much discussion yet is that an AI tool could recommend a code snippet to implement a certain common function, and for that snippet to become commonly used. And if, after all that, a vulnerability is discovered in that snippet, “now it is a systemic risk across many organizations.” So while vulnerabilities are found in just about every human-written codebase, with AI code that is broadly used “the scale of impact is much, much higher,” he said. That means software written by chatbots needs the same level of testing scrutiny that human-written code does, with a full suite of automated testing tools for static and dynamic analysis, software composition analysis to find open source vulnerabilities and licensing conflicts, and pen testing before production. “Attention to AppSec tools’ results would help enterprise organizations identify and mitigate compliance, security, and operational risks stemming from adoption of AI-assisted tools,” Parekh said.
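In practice, attention to AppSec tools’ results can be as simple as a gate in the build pipeline that refuses to merge AI-assisted changes until the scan findings have been reviewed. The sketch below is one minimal way to wire that up in Python; Bandit is used purely as an example of a static analysis tool, and the target directory is a placeholder, so substitute whatever SAST, SCA, and dynamic analysis tools your organization already runs.

```python
import subprocess
import sys

# Minimal sketch of a pre-merge security gate, assuming Bandit (a Python
# static analysis tool) is installed. The tool and path are illustrative;
# the point is that AI-written code goes through the same gate as any other.
def run_security_gate(target_dir: str) -> int:
    result = subprocess.run(
        ["bandit", "-r", target_dir],  # scan the changed code recursively
        capture_output=True,
        text=True,
    )
    print(result.stdout)  # surface the findings so reviewers can triage them
    if result.returncode != 0:
        print("Security findings detected; hold the merge pending review.",
              file=sys.stderr)
    return result.returncode

if __name__ == "__main__":
    sys.exit(run_security_gate(sys.argv[1] if len(sys.argv) > 1 else "."))
```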
Of course, there is general agreement that AI LLMs are still in an embryonic stage. They will only become more capable, likely for both better and worse. Parekh said it’s too early to know the long-term impact of the technology. “Overall, new versions of ChatGPT will require less supervision over time,” he said. “However, the question of how that translates into trust remains open, and that’s why having the right AppSec tools with greater quality of results is more important than ever.”
Or, put another way, use chatbots only for what they’re good at. And then remember that you still need to supervise them.