The idea that hostile code can slip into a computer program, even before the source code has been compiled into an executable application, is not new. This is one reason why using third-party libraries and external code repositories carries some risk.
Researchers from the University of Cambridge have published a study describing ‘cool new tricks’ for planting targeted vulnerabilities in source code, in ways that may be invisible to the naked eye.
The paper covers a set of techniques dubbed “Trojan Source”, a clear reference to the famous wooden Trojan horse of Greek mythology.
The vulnerabilities were disclosed after a 99-day embargo period, which was used to coordinate with the makers of a number of compilers, code editors and code repositories so that they could implement protections against the risks the vulnerabilities represent.
Readable source code usually contains comments, either in the form of unused code or text explaining what parts of the program do. The researchers did not manage to trick compilers into executing code that really was commented out.
What they did find is a technique for tricking the people who review the source code: text that appears to be commented out can in fact be active code, which then ends up in the program with potentially disastrous consequences.
They do this using the bidirectional ("Bidi") algorithm and its control characters, which are part of the Unicode standard and specify the direction in which text should be displayed.
These mechanisms make it possible to display, for example, Hebrew and Arabic characters and words, which are read from right to left, correctly, including when quoted as part of other Latin text flowing from left to right.
These control characters are not necessarily visible to the naked eye; they are intended to be interpreted by software. The Cambridge researchers found that most compilers and development tools are not equipped to handle bidirectional control characters, or to flag their use within source code comments, The Register reports.
– We have discovered ways of manipulating the encoding of source code files so that human viewers and compilers see different logic. One particularly pernicious method uses Unicode directionality override characters to display code as an anagram of its true logic, the researchers write in their announcement.
The vulnerability lies in the Unicode specification and is tracked as CVE-2021-42574.
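As an illustration of the kind of check a code editor or repository could perform, here is a minimal Python sketch that scans source text for Unicode bidirectional control characters and reports where they occur. The function name `find_bidi_controls` and the sample snippet are hypothetical, and the character list is an assumption about which code points a reviewer would want flagged; it is not the researchers' own tooling.

```python
import unicodedata

# Unicode bidirectional control characters: embeddings, overrides,
# isolates and directional marks. Assumed list for this sketch.
BIDI_CONTROLS = {
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",
    "\u2066", "\u2067", "\u2068", "\u2069",
    "\u200E", "\u200F", "\u061C",
}


def find_bidi_controls(source: str):
    """Return (line, column, character name) for each bidi control found."""
    findings = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, unicodedata.name(ch)))
    return findings


# Hypothetical sample: the string literal on line 2 hides bidi controls,
# written here as escapes so that they stay visible in this listing.
sample = (
    'access = "user"\n'
    'if access != "admin\u202E \u2066# check\u2069 \u2066":\n'
    '    pass\n'
)

for line_no, col, name in find_bidi_controls(sample):
    print(f"line {line_no}, column {col}: {name}")
```

A check like this only reports that such characters are present. Deciding whether they are legitimate, for example in a string containing Hebrew or Arabic text, is still left to the reviewer.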
Uses homoglyphs
The paper also describes a second, similar attack, in which ordinary letters of the Latin alphabet are replaced by lookalike letters from other character sets.
The researchers say that visually similar Unicode characters, so-called homoglyphs, can be misused to subvert source code. One attack described in the report (PDF) is to create a bogus function whose name appears identical to that of a real function, but in which a Latin character has been replaced by a near-identical character from the Cyrillic alphabet.
This second vulnerability is tracked as CVE-2021-42694.
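To make the homoglyph trick concrete, the following Python sketch flags identifiers that mix letters from more than one script, such as a Latin name containing one Cyrillic letter. It is a rough heuristic under stated assumptions: the script is guessed from the first word of each character's Unicode name, not from the official Unicode Script property, and the names `scripts_in` and `mixed_script_identifiers` are hypothetical.

```python
import re
import unicodedata

# Matches identifier-like words: a letter or underscore, then word characters.
IDENTIFIER = re.compile(r"[^\W\d]\w*")


def scripts_in(identifier: str) -> set:
    """Guess the scripts used by the letters of an identifier.

    Assumption: the first word of the Unicode character name
    (e.g. "LATIN", "CYRILLIC") is a good enough stand-in for the script.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def mixed_script_identifiers(source: str) -> list:
    """Return identifiers whose letters come from more than one script."""
    return [
        match.group()
        for match in IDENTIFIER.finditer(source)
        if len(scripts_in(match.group())) > 1
    ]


# Hypothetical sample: the second function name contains U+0435
# (CYRILLIC SMALL LETTER IE) in place of the Latin letter "e".
sample = "def say_hello(): pass\ndef say_h\u0435llo(): pass\n"

for name in mixed_script_identifiers(sample):
    print(f"suspicious identifier: {name!r} uses {sorted(scripts_in(name))}")
```

This would not catch an identifier written entirely in lookalike characters from a single script, so it should be read as a demonstration of the idea and not as a complete defence.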
Matthew Green, a cryptography professor at Johns Hopkins University, says the Cambridge research clearly shows that most compilers can be tricked with Unicode into processing code differently from what a reader would expect, according to the well-known security writer Brian Krebs.
– Before reading this paper, the idea that Unicode could be exploited in some way wouldn’t have surprised me. What does surprise me is how many compilers will happily parse Unicode without any defenses, and how effective the right-to-left encoding technique is at sneaking code into codebases. That’s a really clever trick that I didn’t even know was possible. Yikes, the professor tells the site KrebsOnSecurity.