Injections, where code meets data

CC0 image by Henryk Krzyżanowski

Injections are still among the most serious flaws a developer can make, if not the most serious. We deal with more and more complicated data structures, yet our mindset around injections hasn’t evolved much over the past years.

You have surely heard about SQL injection before. You have learned how to defend against it and have a repertoire of tools to aid that process. It’s complicated and full of rules and pitfalls.

A small shift in mindset is worth 100x more than a bag of tools and techniques. That’s what you will get: a solid foundation that you can build on. We’ll deconstruct injection and put it back together. I promise you won’t ever think of injections the same way.

Let’s go!

The basics

There are three requirements for an injection to occur. First, you need to be manipulating data that has some structure. Second, you must put input into that structure. And third, someone must process the results. An example is in order.

You are constructing logs. For the sake of simplicity, suppose your log file has a straightforward structure. It separates entries using a newline character. That’s it, no dates, nothing, just the ‘\n’ character. Structure, check!

Next, after creating a new log entry, you simply append it to the file:

log = 'X event'
logFileHandle.writeLine(log)

Safe and secure so far! To make things more useful (and riskier), you also add some user input to the mix:

log = 'X event: ' + userInput
logFileHandle.writeLine(log)

Finally, a component in your app reads this log file and shows you all the events:

logs = logFileHandle.readAll().splitBy('\n')

The structure in action

Take a few seconds to look through the previous code and see where the data’s structure appears within it. Hint: splitBy('\n'), writeLine().

Let’s see how our code interacts with data.

userInput -> '1'
userInput -> '2'
userInput -> '3'
userInput -> '4'

# logfile content #
X event: 1
X event: 2
X event: 3
X event: 4

The log file’s structure is intact. Data interpolation happened safely.

CC0 image by Dominika Roseclay

Now, an attacker gets smart and tries to inject a log that never actually happened. How? Watch!

Remember, any line in the log file will show up as a separate event on the UI.

userInput -> '1'
userInput -> '2\nY event'
userInput -> '3'

# logfile content #
X event: 1
X event: 2
Y event
X event: 3

By choosing a malicious userInput, the attacker was able to escape from the scope of a single line and modify the structure of the logs.
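To see this end to end, here is a minimal Python sketch of the scenario above. It assumes an in-memory io.StringIO in place of the real log file, and write_event and read_events are hypothetical stand-ins for writeLine() and splitBy('\n'):

```python
import io

def write_event(log_file, user_input):
    # Like writeLine(): appends a '\n' separator, but copies user_input
    # verbatim, including any newlines it contains.
    log_file.write('X event: ' + user_input + '\n')

def read_events(log_file):
    # Like splitBy('\n'): every line becomes one event on the UI.
    # (splitlines() also splits on '\r' and a few other line breaks.)
    return log_file.getvalue().splitlines()

log_file = io.StringIO()  # in-memory stand-in for the log file
for user_input in ['1', '2\nY event', '3']:
    write_event(log_file, user_input)

events = read_events(log_file)
print(len(events))  # 4, although only 3 events were written
for event in events:
    print(event)
```

Three writes produce four events, and the reader has no way to tell that Y event never happened.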

What went wrong?

Whenever we incorporate data into a structure, we always make assumptions about the input. Sometimes we validate the input; other times we trust it blindly.

In this case, we assumed that userInput is scoped to a single log event: it is bound to the X event: part that comes before it.

That doesn’t hold anymore, does it? Once parsed, Y event will be indistinguishable from events that truly happened. So which line is responsible for the error?

logFileHandle.writeLine(log)

This is where the injection happens. Notice that the component we are using to write our log files does not know the log file’s structure.

You could argue that it does, because writeLine() is connected to the structure. Good catch, but that’s more of a coincidence than a conscious choice. It just so happens that many languages provide line-oriented helpers for file handles. But writeLine() is still just file manipulation; it is utterly blind to what its argument contains.

Let’s generalize our problem statement.

Injection happens when the code does not know the structure of the data it manipulates.
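One way to make the writer aware of the structure, sketched here in Python as an illustration rather than the only fix, is to encode every event so it can never contain the separator. This sketch assumes JSON Lines (one JSON string per line) as the log format; write_event and read_events are hypothetical helpers:

```python
import io
import json

def write_event(log_file, message):
    # json.dumps escapes '\n' (and other control characters) inside the
    # string, so each event occupies exactly one physical line.
    log_file.write(json.dumps(message) + '\n')

def read_events(log_file):
    # Each physical line is decoded back into exactly one event.
    return [json.loads(line) for line in log_file.getvalue().splitlines()]

log_file = io.StringIO()  # in-memory stand-in for the log file
for user_input in ['1', '2\nY event', '3']:
    write_event(log_file, 'X event: ' + user_input)

events = read_events(log_file)
print(len(events))      # 3
print(repr(events[1]))  # 'X event: 2\nY event'
```

The malicious newline stays inside its own event and is read back intact, because the writer and the reader now share knowledge of the format.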

Let’s look at a few examples.

Injection examples

Try identifying the underlying assumptions in each case!


SQL

query = 'SELECT column FROM table WHERE id = ' + userInput

Supplying 1 OR 1=1 as userInput would yield:

query = 'SELECT column FROM table WHERE id = 1 OR 1=1'

This effectively nullifies the WHERE clause: 1=1 is true for every row, so the condition matches the whole table. Notice that we only intended to interpolate data into that query. Because we use string manipulation to achieve this, an attacker can escape the data scope and modify the structure of the query. Why?

String manipulation does not know an SQL statement’s structure.
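A minimal, self-contained sketch using Python’s built-in sqlite3 module and a hypothetical items table shows both the injection and one structure-aware alternative, a parameterized query:

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
conn.execute('CREATE TABLE items (id INTEGER, name TEXT)')
conn.executemany('INSERT INTO items (id, name) VALUES (?, ?)',
                 [(1, 'alpha'), (2, 'beta'), (3, 'gamma')])

user_input = '1 OR 1=1'

# String manipulation: the input becomes part of the SQL text and
# rewrites the WHERE clause.
query = 'SELECT name FROM items WHERE id = ' + user_input
concatenated = conn.execute(query).fetchall()

# Parameterized query: the driver binds the input as a single value,
# so it is never parsed as SQL.
parameterized = conn.execute('SELECT name FROM items WHERE id = ?',
                             (user_input,)).fetchall()

print(concatenated)   # all three rows
print(parameterized)  # []
```

The ? placeholder hands the value to SQLite separately from the SQL text, so '1 OR 1=1' is compared as one string and matches nothing.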


XML/HTML

doc = '<tag><tag2>text</tag2><tag3>' + userInput + '</tag3></tag>'

Supplying a string with tags in it, such as </tag3><tag2>injected</tag2><tag3>, will yield:

doc = '<tag><tag2>text</tag2><tag3></tag3><tag2>injected</tag2><tag3></tag3></tag>'

Again, pure string manipulation has no idea about XML or HTML. Now consider this: how do we usually render web pages on the server side? With string manipulation or with DOM-aware classes?

Exactly! No wonder XSS is still a thing. Don’t get me wrong; there are good reasons for doing this, and most of them have to do with performance. String manipulation is fast and streamable! What a trade-off :)
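Here is a small Python sketch contrasting the two approaches for the XML example above. It uses the standard library’s xml.etree.ElementTree as the structure-aware, tree-based API; the tag names and the input are taken from the example:

```python
import xml.etree.ElementTree as ET

user_input = '</tag3><tag2>injected</tag2><tag3>'

# String manipulation: the input is spliced into the markup verbatim.
doc = '<tag><tag2>text</tag2><tag3>' + user_input + '</tag3></tag>'
injected_count = len(ET.fromstring(doc).findall('tag2'))
print(injected_count)  # 2: a second <tag2> was injected

# Tree-based construction: the input is set as text content, so the
# serializer escapes '<' and '>' and the element structure stays fixed.
root = ET.Element('tag')
ET.SubElement(root, 'tag2').text = 'text'
ET.SubElement(root, 'tag3').text = user_input
safe_doc = ET.tostring(root, encoding='unicode')
print(safe_doc)

parsed = ET.fromstring(safe_doc)
print(len(parsed.findall('tag2')))  # 1
```

In the second document the tags arrive as &lt;...&gt; text inside <tag3>, which is exactly what the original structure promised.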

Homework: what about rendering on the client side? How is that different?


OS Command

cmd = 'ping ' + userInput

This is one of the deadliest. Inputting ; ls will yield:

ping ; ls

The shell treats ; as a command separator, so ls runs whether or not ping succeeds. This is as bad as it gets: direct command execution from within our app.
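A minimal Python sketch of the same problem, assuming a POSIX system with /bin/sh. Here echo stands in for ping so the sketch runs without a network, and the input value is hypothetical:

```python
import subprocess

# Hypothetical attacker-controlled input.
user_input = 'hello; echo INJECTED'

# String manipulation: the whole string goes to /bin/sh, which treats
# ';' as a command separator and runs a second command.
shell_run = subprocess.run('echo ' + user_input, shell=True,
                           capture_output=True, text=True)
print(shell_run.stdout)  # 'hello' and 'INJECTED' on separate lines

# Argument list: no shell is involved, so the input reaches echo as a
# single argument and ';' has no special meaning.
list_run = subprocess.run(['echo', user_input],
                          capture_output=True, text=True)
print(list_run.stdout)  # hello; echo INJECTED
```

Passing a list of arguments (and leaving shell=False, the default) keeps the input in one argument slot instead of handing it to a parser that knows the shell’s structure while your code does not.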


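LDAP

query = '(&(user=' + userInput + ')(pass=' + userInput2 + '))'

Giving username)(&) as userInput yields:

query = '(&(user=username)(&))(pass=' + userInput2 + '))'

(&) is the absolute TRUE filter, so the first complete filter, (&(user=username)(&)), checks only the username. On LDAP servers that evaluate just that first filter and ignore the trailing characters, this bypasses the password check completely.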
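Python’s standard library has no LDAP client, so this sketch only builds the filter strings. escape_filter_value is a hypothetical helper that escapes the characters RFC 4515 reserves in filter values; libraries such as python-ldap ship an equivalent (ldap.filter.escape_filter_chars). The password 'anything' is a placeholder:

```python
def escape_filter_value(value):
    # Hypothetical helper: replaces the characters RFC 4515 reserves in
    # filter values with their \XX hex escapes.
    reserved = {'\\': r'\5c', '*': r'\2a', '(': r'\28', ')': r'\29', '\x00': r'\00'}
    return ''.join(reserved.get(ch, ch) for ch in value)

def build_query(user, password, escape):
    # Same string concatenation as in the example above.
    if escape:
        user = escape_filter_value(user)
        password = escape_filter_value(password)
    return '(&(user=' + user + ')(pass=' + password + '))'

user_input = 'username)(&)'
unescaped = build_query(user_input, 'anything', escape=False)
escaped = build_query(user_input, 'anything', escape=True)
print(unescaped)  # (&(user=username)(&))(pass=anything))
print(escaped)    # (&(user=username\29\28&\29)(pass=anything))
```

After escaping, the parentheses in the input can no longer close or open filter components, so the filter keeps its intended (&(user=...)(pass=...)) shape.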

There are many more examples: header injection, XPath injection, CRLF injection, and so on. You get the idea.

Anything that has structure and is manipulated by code that is unaware of that structure is injectable.

Let’s talk a bit about the third requirement.

…someone must process the results…

Injection in itself isn’t a problem until someone relies on the structural integrity of the data. This is very important, because the entity that processes the data is often also capable of manipulating it safely. This concept will play a crucial role when we talk about defenses. Hint: SQL prepared statements.

So am I vulnerable?

CC0 image by Alberto Barco Figari

Although there are some exotic cases, most injections are easy to recognize and follow the same pattern. They almost exclusively occur near system boundaries. Take a look!

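Injection type | System 1 | System 2     | Use case
---------------|----------|--------------|----------------------------------------
SQL            | Your app | Database     | Querying
Command        | Your app | OS shell     | Running OS commands
HTML           | Your app | Browser      | Generating a page
Email header   | Your app | Email client | Sending email
XML            | Your app | Your app     | Writing/reading app configuration files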

Why? Simple!

Whenever two systems interact, they need structured data to exchange information. When there is structure, there can be injections.

Keep this in mind when you are at the whiteboard designing.

When coding, make sure you use classes and libraries that are aware of the structure of the data you are manipulating. We’ll go into more depth on this in the next post, about defenses.
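As an example of such a library, here is a sketch for the email header row in the table above, using Python’s standard email package. The subject line and the address are hypothetical:

```python
from email import message_from_string
from email.message import EmailMessage

subject = 'Hello\nBcc: victim@example.com'  # hypothetical malicious input

# String manipulation: the newline in the input starts a new header.
raw = 'Subject: ' + subject + '\n\nBody text\n'
print(message_from_string(raw)['Bcc'])  # victim@example.com

# Structure-aware: EmailMessage (default policy) rejects header values
# containing line breaks instead of silently adding a header.
msg = EmailMessage()
try:
    msg['Subject'] = subject
    rejected = False
except ValueError as err:
    rejected = True
    print('rejected:', err)
```

The structure-aware component fails loudly, which is usually the behavior you want when input does not fit the data’s structure.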

How bad is it?

It depends. Injections can compromise all three aspects of security: confidentiality, integrity, and availability.

Confidentiality

This one is easy. Using SQL injection, an attacker can access data while bypassing authentication and authorization rules completely.

Integrity

Think about our first example, log event injection. If an attacker can do that, the integrity of your log files quickly drops to zero.

Availability

This one is a bit trickier. An attacker needs to inject data that makes the system use lots of resources, for example consuming CPU and entropy through command injection:

ping ; cat /dev/random > /dev/null

The impact of an injection can be grave, so don’t underestimate it.

20-second injection primer

Injection happens when the code does not know the structure of the data it manipulates. Injections mostly occur at system boundaries. Their consequences are wide-ranging and usually serious, potentially affecting all three pillars of the well-known CIA (confidentiality, integrity, availability) security model.

As a rule of thumb, when manipulating data, always use classes and components that are aware of the structure of the data. That’s the theory.

In the next post, we’ll look at what this looks like in practice, and why it is very complicated in some cases.

What do you think? Don’t agree with something? Don’t keep it to yourself; share it in the comments!
