This gives us options for inserting “invisible” whitespace that can change or confuse code in C (within limits), Ruby, Java, and Swift (using the Zero Width Space), allowing a malicious actor to create code that behaves differently from its appearance.
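To make the idea concrete, here is a minimal Python sketch of what "behaves differently from its appearance" means at the character level. The sample strings and the reveal helper are hypothetical and only for illustration; they are not the code used in this research. How the crafted string renders depends on the font and editor in use.

```python
import unicodedata

# The two characters discussed in this post.
HIDDEN = {"\u180e", "\u200b"}


def reveal(text):
    """Return text with each hidden character replaced by a visible <U+XXXX> tag."""
    return "".join(
        f"<U+{ord(ch):04X}>" if ch in HIDDEN else ch
        for ch in text
    )


# Hypothetical sample lines: in many fonts these two may look identical.
plain = "intadmin = 0;"
crafted = "int\u180eadmin = 0;"  # Mongolian Vowel Separator after "int"

print(plain == crafted)  # False: the strings differ even if they look alike
print(reveal(crafted))   # int<U+180E>admin = 0;

# Print the official Unicode name of each hidden character.
for ch in sorted(HIDDEN):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The comparison is the important part: two lines that a reviewer may read as the same text are different sequences of code points, and it is the sequence of code points that the compiler or interpreter sees.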
Editor Behaviour/Syntax highlighting
So, whilst an attacker can create malicious but innocent-looking code, the attacker has to deal with the problem of syntax-highlighting code editors. We now investigate how editors show (or do not show) these hidden characters. The editors tested were using their default syntax highlighting for the appropriate language; light/dark modes and alternative themes were not tested, so your mileage may vary.
Visual Studio Code
Visual Studio Code does a good job of indicating that something is different with the code. Simply opening the file, we see that the Mongolian Vowel Separator is highlighted:
The character is also identified when the mouse hovers over it:
The same behaviour occurs when dealing with the Zero Width Space.
There is an option to disable highlighting of invisible characters, but this does not change the syntax highlighting, which still indicates a difference between int (blue) and admin (white/light grey).
Visual Studio
Whilst the attempts to compile the code on Visual Studio 22 failed with errors, it’s still worth seeing whether the syntax highlighting would spot the hidden character. If we open the file directly within Visual Studio (not as part of an existing project) we see:
The syntax highlighting seems to differentiate between the two intadmin types, and when the file is included in a project:
It becomes more obvious that something is wrong with the code. These results are replicated when using the Zero Width Space character as well.
Notepad++
By default we see the hidden character quite obviously:
However, within Notepad++ there is an option, View->Show Symbol->Show Non-printing Characters; if this is disabled we see the following:
Vi/Vim
Vi and Vim show the Mongolian Vowel Separator and Zero Width Space as:
Emacs
Emacs, on the other hand, does not show us the Mongolian Vowel Separator or Zero Width Space character, but makes it obvious by way of syntax highlighting:
Eclipse
Eclipse is typically the domain of Java code, so looking at the working Java example within Eclipse we see:
Neither the Zero Width Space nor the Mongolian Vowel Separator is visible, nor is there any difference in syntax highlighting to indicate that something is up with the code.
Clearly Eclipse is the editor of choice for hiding our malicious code in Java.
Code Repositories
So far we have identified languages and characters that allow us to create code that looks one way and acts another, giving a bad actor the ability to hide malicious code or a potential backdoor within a codebase. The next obvious question is: can we put our bad code somewhere where it will not be seen, but will still be used? We must therefore look at code repositories. Here we shall investigate three:
- GitHub (home of 28 million public repositories)
- GitLab
- Bitbucket
If we can hide our code in any of these...
GitHub
GitHub has a desktop application that allows developers to manage their repositories and push changes up to GitHub.com. The tool allows the user to review the history of any file and the changes made to it.
Looking at our malicious example:
and zooming in to the interesting part:
The syntax highlighting here does not indicate in any way that line 10 contains our evil Mongolian Vowel Separator character. The same is true for the code with the Zero Width Space:
The GitHub.com website itself has a number of themes that can affect the syntax highlighting colours used, but there are two “defaults”: Light Default and Dark Default. I tend to work with the dark theme for most things, so viewing our code we can see:
There is a very subtle change on line 10 between the int (light grey) and admin (white), which is likely to go unnoticed.
The in-built editor mode, however, does not have this subtle change!
Testing the default light theme does not show any differences; this is replicated in the editor as well.
So, there is scope for hiding our Mongolian Vowel Separator in code stored and published on GitHub.
GitLab
GitLab uses VS Code as its web IDE, so it highlights the hidden character when editing files stored in GitLab:
However, the code display exhibits a similar problem to GitHub, in that the highlighting in the “Light” syntax highlighting themes may be too subtle for any oddities to be spotted:
The Dark themes make the code differences more obvious. This behaviour is the same when dealing with the Zero Width Space.
When viewing the committed change, the code is syntax highlighted but, as in the viewer, the highlighting is subtle and hard to spot:
Bitbucket
During the initial research, the Bitbucket editor did not highlight the syntax in its default mode:
This made it impossible to spot the hidden character by differences in the syntax highlighting.
The viewer, however, shows a subtle difference (the int keyword is slightly bolder):
But again, this may be missed.
Since reporting this to Atlassian, they have altered the syntax highlighting in the viewer:
However, there is no change in the editor.
There is a more obvious difference when using the Zero Width Space, where the editor clearly shows the hidden character:
However, the viewer exhibits the same behaviour as it does when handling the Mongolian Vowel Separator.
The committed change does not have any syntax highlighting visible, so a reviewer performing a code review would likely miss the hidden character.
Results
| | C | Ruby | Swift | Java |
|---|---|---|---|---|
| GitHub Desktop App | Malicious code hidden | Malicious code hidden | Syntax highlighting is obvious | Malicious code hidden |
| GitHub | Viewer: very subtle syntax highlighting. Editor: malicious code hidden | Viewer: malicious code hidden. Editor: malicious code hidden | Viewer: malicious code hidden. Editor: syntax highlighting is obvious | Viewer: malicious code hidden in light mode, very subtle syntax highlighting in dark mode. Editor: syntax highlighting is obvious |
| GitLab | Viewer: very subtle syntax highlighting in light mode, more obvious in dark mode. Editor: inline VS Code highlights missing character | Viewer: syntax highlighting is obvious. Editor: inline VS Code highlights missing character | Viewer: very subtle syntax highlighting in light mode, more obvious in dark mode. Editor: inline VS Code highlights missing character | Viewer: very subtle syntax highlighting in light mode, more obvious in dark mode. Editor: inline VS Code highlights missing character |
| Bitbucket | Viewer: syntax highlighting is obvious. Editor: MVS hidden, ZWS highlighted | Viewer: malicious code hidden. Editor: MVS hidden, ZWS highlighted | Viewer: very subtle syntax highlighting. Editor: MVS hidden, ZWS highlighted | Viewer: malicious code hidden. Editor: MVS hidden, ZWS highlighted |

MVS = Mongolian Vowel Separator, ZWS = Zero Width Space.
In most cases, the repositories' viewers either highlight the difference very subtly (so the code could pass a visual inspection) or leave the hidden character invisible to the naked eye.
When editing on the websites, only GitLab, by using Visual Studio Code, consistently shows the hidden characters.
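Since the web viewers cannot be relied upon, one option is to check files for these characters before or during review. The following is a minimal Python sketch of such a check; the function names and the choice to flag only the two characters from this post are assumptions made for illustration, and it is not a complete defence against every invisible or confusable character.

```python
from pathlib import Path

# Code points treated as suspicious here: the two characters from this post.
# This set is an assumption for illustration and is not exhaustive.
SUSPICIOUS = {
    "\u180e": "MONGOLIAN VOWEL SEPARATOR",
    "\u200b": "ZERO WIDTH SPACE",
}


def scan_text(text):
    """Return (line_number, column, name) for each suspicious character."""
    hits = []
    # Split on "\n" only, so line numbers match what most editors display.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPICIOUS:
                hits.append((line_no, col, SUSPICIOUS[ch]))
    return hits


def scan_file(path):
    """Scan one file, decoding as UTF-8 and replacing undecodable bytes."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return scan_text(text)


def main(paths):
    """Print one line per finding; return 1 if anything was found, else 0."""
    found = False
    for name in paths:
        for line_no, col, char_name in scan_file(name):
            found = True
            print(f"{name}:{line_no}:{col}: {char_name}")
    return 1 if found else 0
```

To use it as a command-line check, call `sys.exit(main(sys.argv[1:]))` from a small wrapper script and pass it the files touched by a commit or pull request. A non-zero exit code then marks the change for closer inspection.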
Prior Work
This research was inspired by two previous works. The first is Trojan Source (https://trojansource.codes/), where Unicode bi-directional control characters were introduced into source code such that the code read by a human (at, say, a code/pull request review stage) does not match the code that the compiler will ultimately compile. The classic example:
Contains strategically placed bi-directional control characters so that you are, in fact, admin.
Related work considers the use of homoglyph attacks with identifiers, where characters are replaced with similar-looking ones to add visual confusion, e.g. replacing Latin characters with their Cyrillic equivalents (https://www.irongeek.com/homoglyph-attack-generator.php helps with such attempts). Trojan Source also considered “invisible characters”, without specifying which characters were used, and noted in the original paper that such attacks failed. Here, I believe we show, given specific characters, success across several languages.
The second piece of work is a blog post from 2014 (https://codeblog.jonskeet.uk/2014/12/01/when-is-an-identifier-not-an-identifier-attack-of-the-mongolian-vowel-separator/) mentioning the use of the Mongolian Vowel Separator within identifiers in C# code, and how the two different compilers (csc and Roslyn) handle them. The post also identified the unusual history of the Mongolian Vowel Separator as a sometimes whitespace, sometimes control character.
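The classification a given toolchain applies to these characters can be inspected directly. The Python sketch below prints the Unicode general category reported by the interpreter's bundled Unicode data; the describe helper is hypothetical. My understanding is that recent Unicode versions list U+180E as Cf (format) where some earlier ones listed it as Zs (space separator), so treat that as an assumption and check the output on your own interpreter.

```python
import unicodedata


def describe(ch):
    """Return (code point label, Unicode name, general category, isspace())."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        ch.isspace(),
    )


# The result depends on the Unicode data version bundled with the interpreter.
print("Unicode data version:", unicodedata.unidata_version)
for ch in ("\u180e", "\u200b", " "):
    print(describe(ch))
```

An ordinary space is included for comparison. A parser that decides what counts as whitespace from one version of these tables may disagree with an editor or highlighter built against another.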
Conclusions
By using unusual "whitespace" characters like the Mongolian Vowel Separator and the Zero Width Space, it is possible to create code that looks, by visual inspection, like one thing, but whose compiler behaviour and end results are different. The difference is subtle enough that many eyeballs can miss the maliciously inserted characters.
The languages that are affected by this issue are C (within limits), Ruby, Java, and Swift.
Thankfully for developers, most IDEs either show "odd" syntax highlighting or highlight the "invisible" character. Some editors can be configured to hide this, but in most cases the highlighting is on by default. The only editor that "fails" is Eclipse when dealing with Java code.
But if an attacker can get their code uploaded to one of the main code repositories, either by a malicious pull request to an existing repo or by posting an interesting, innocent-looking library, there is every chance that, given the lack of appropriate syntax/hidden character highlighting, the malicious code will not be spotted.
So there is scope for a very subtle, but potentially devastating, supply chain attack that can slip past the "many eyes" of developers looking through the code in their favourite code repository.