Many people are talking about the attribution of the Sony hack. Was it or was it not North Korea? I do not care. I thought I would talk about a couple of things involved in driving towards attribution and analysis of a breach. This is high level, with a couple of shallow dives into the technical. First…
There are three kinds of attribution:
- Political (usually based on who wins)
- Technical (looks at the weapons or techniques)
- Forensic (looks at evidence which can be scientifically validated for legal action)
There are three things that investigators should look at:
- Motive (who wanted this done, or could be induced to do some activity)
- Means (who had the knowledge, skills, and ability to do some activity)
- Opportunity (who had the capability and accesses necessary to do some activity)
There has been a lot of ink spilled on cyber kill chains, and I am not going to add to it here. I want to talk about a few other ideas more broadly. An attack on an enterprise asset has a lifecycle: think of it as having a start and an end. The attacker wants to use an exploit that is quiet and not going to arouse suspicion if detected. In fact, many authors state that most breaches are detected at the exfiltration of data from the enterprise rather than at the origination of the attack.
As such, technical attribution takes on a specific level of difficulty. Attribution of the attack tools is especially tricky. Since many security tools are dual use, grabbing a library or even a complete tool set from the Internet might hide my associations and provide anonymity. Artifacts from other entities will infest the code. As an example, if I use a library developed by a Chinese speaker, a compiler with its locale set to Farsi, an Israeli-developed packer (malware compression tool), and send it all from an American Internet Protocol address, the technical attribution will be muddied. A focus on those key artifacts is problematic: though they can be indicators, they are not specific enough.
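As a toy illustration of how shallow these artifacts are, here is a minimal sketch (my own hypothetical example, not any vendor's tooling) that scans a byte blob for characters from scripts analysts tend to over-read as national fingerprints. The sample bytes are invented; note that even innocent filler can land in a "suspicious" codepoint range.

```python
# Unicode ranges whose presence is often over-read as a national fingerprint.
# Any attacker can plant these deliberately; they are indicators, not proof.
SCRIPT_RANGES = {
    "CJK":    (0x4E00, 0x9FFF),  # Chinese/Japanese/Korean ideographs
    "Hebrew": (0x0590, 0x05FF),
    "Arabic": (0x0600, 0x06FF),  # also covers most Farsi letters
}

def script_artifacts(data: bytes) -> dict:
    """Count characters from each script in text recoverable from the blob."""
    counts = {name: 0 for name in SCRIPT_RANGES}
    for encoding in ("utf-8", "utf-16-le"):
        for ch in data.decode(encoding, errors="ignore"):
            cp = ord(ch)
            for name, (lo, hi) in SCRIPT_RANGES.items():
                if lo <= cp <= hi:
                    counts[name] += 1
    return counts

# A fake "binary": a Chinese error string, filler, then a Hebrew UTF-16 string.
sample = "错误".encode("utf-8") + b"PADDING..." + "שלום".encode("utf-16-le")
print(script_artifacts(sample))
```

Running this reports both CJK and Hebrew hits from one blob, which is exactly the muddying effect described above: the counts say something about the bytes, not about the author.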
A second mechanism of technical attribution often focused on by the media is the concept of re-use. If a particular library was used by an entity in one attack, and then used by an entity in a separate attack, that does not indicate shared affiliation. This is an issue because re-use of software and code is a principle taught to every computer science student in an accredited program. Re-use is key in nearly every software development occupation. Once a tool has been used and discovered, others will re-use that tool without compunction.
So far we have focused mostly on malware, but there are other indicators in the enterprise that are available for identifying entities. When we say cyberspace we mean the humans, the technologies, and the activities of communication, which includes things like the Internet and software applications. Cyberspace is more than just the Internet and networks; it is made of three layers:
- Physical (The medium and hardware)
- Logical (The software and protocols)
- Cognitive (The intelligence, social, and ideological aspects)
We can look at what wires are connected to what places on the planet to attribute the geography of a communication pathway. Sensors at the logical level can watch traffic transiting the terrestrial and space networks for particular data packets. With enough sensor nodes, and with nation-state or large corporate resources, even sophisticated cloaking and multiple pivot points (hops) can have reduced anonymity. An Internet Protocol address is a form of location awareness, but absent physical constraints on the address (one way in and one way out) it is only an indicator, not a primary fact, in analysis or attribution.
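As a small sketch of why an address is an indicator rather than a fact, the snippet below checks each hop of a hypothetical path against a made-up table of ranges said to host anonymizing infrastructure. All addresses come from the reserved documentation blocks; real work would use (and constantly refresh) threat intelligence feeds.

```python
import ipaddress

# Hypothetical ranges claimed to host anonymizing infrastructure.
SUSPECT_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),   # stand-in for a VPN provider
    ipaddress.ip_network("198.51.100.0/24"),  # stand-in for a bulletproof host
]

def classify_hop(ip: str) -> str:
    """Flag hops inside suspect ranges; a hit is an indicator, not attribution."""
    addr = ipaddress.ip_address(ip)
    return ("suspect-infrastructure"
            if any(addr in net for net in SUSPECT_RANGES)
            else "unclassified")

path = ["192.0.2.10", "203.0.113.77", "198.51.100.5"]
print([classify_hop(ip) for ip in path])
```

Two of the three hops flag here, and that is still all the classifier can honestly say: the traffic transited suspect infrastructure, not that any particular actor sent it.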
The logical layer is fraught with opportunities to manipulate reality and hide information. This is not a treatise on hacking, but suffice it to say that if an attacker has root control of a system, all things that system reports are suspect. As such, log files and other indicators can be falsified. Introspection into protected log files (stored off the server) can surface early-warning indicators that are found after the breach is discovered. Offline analysis of memory images that were accessed and downloaded may also yield sufficient indicators or technical details for particular forms of attribution: if you can access the memory, and if the memory dump is reliable.
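One common way to make off-host logs tamper-evident, sketched below under the assumption that the verification key never touches the monitored server, is an HMAC chain in which every record's tag also covers the previous tag, so rewriting any one record breaks every tag after it.

```python
import hashlib
import hmac

SECRET = b"key-held-only-by-the-off-host-log-server"  # hypothetical key

def chain(entries):
    """Tag each entry with an HMAC covering the entry and the previous tag."""
    tag, out = b"\x00" * 32, []
    for entry in entries:
        tag = hmac.new(SECRET, tag + entry.encode(), hashlib.sha256).digest()
        out.append((entry, tag.hex()))
    return out

def verify(chained):
    """Recompute the chain; a rewritten or deleted record fails verification."""
    tag = b"\x00" * 32
    for entry, stored in chained:
        tag = hmac.new(SECRET, tag + entry.encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(tag.hex(), stored):
            return False
    return True

logs = chain(["03:14 login root", "03:15 read /etc/shadow"])
print(verify(logs))                          # prints True: intact chain
logs[0] = ("03:14 login alice", logs[0][1])  # attacker rewrites history
print(verify(logs))                          # prints False: tampering detected
```

Note this only detects tampering after the fact; an attacker with root can still suppress records before they ever leave the box, which is why the article treats everything a compromised system reports as suspect.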
The intrusion detection systems of a breached company will provide some level of indication of actor intention, but not if the system administrator accounts are breached. If the accesses cross into the system owners' accounts, those systems can be compromised just like any personal computer; they are, after all, only hardware and software. Segmentation of duties can help, but access to and exploitation of these systems are trivial once the administrator team has been breached. Intrusion detection systems work best as early indicators when people are watching them and responding to them in real time.
The cognitive layer is interesting. Many companies build their intelligence products primarily on the ideological and messaging traffic between threat actors. Reading messages on hacker boards, reading people's social media activity, and getting to know threat actors' behaviors is a good way to keep tabs on a culture. However, it is of limited utility when dealing with sophisticated adversaries. Fortunately, very few adversaries are sophisticated or significantly vested in operational security practices. Information gathered at this level usually only reaches a useful confidence level if multiple independent sources can be accessed.
The real benefit in understanding these three layers of cyberspace is that they can be used for corroboration and validity checking across each layer. The number of entities identified as possible threat actors in each layer can be numerous, but there may be only a few candidates that exist in all of the layers. Each piece of evidence gathered has to be analyzed and backed up by other pieces of evidence. Then each piece is analyzed to see if it fits into the puzzle of the particular breach. Unlike television crime dramas, these types of cases are not finished in an hour. Then again, I do not watch television, so what would I know?
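The corroboration idea reduces to a few lines of bookkeeping: tally, for each candidate, how many independent layers produced them. The actor names and per-layer candidate sets below are invented for illustration.

```python
from collections import Counter

def corroborate(layers: dict) -> list:
    """Rank candidate actors by how many independent layers surfaced them."""
    tally = Counter(actor
                    for candidates in layers.values()
                    for actor in set(candidates))
    return tally.most_common()

layers = {
    "physical":  {"ActorA", "ActorB", "ActorC"},  # cable/geography analysis
    "logical":   {"ActorB", "ActorC", "ActorD"},  # malware and log artifacts
    "cognitive": {"ActorC", "ActorE"},            # forum and social chatter
}
print(corroborate(layers))  # only ActorC appears in all three layers
```

The candidate surviving every layer is the one worth the deep, slow evidence work; the ones surfaced by a single layer are exactly the kind of lead that a focus on any one layer over-weights.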
The analysis process can rely on software tools like Maltego, Palantir, and Splunk for timeline or entity relationship analysis. Depending on the volume of log files that can be trusted, this may be a big data problem requiring several rounds of data transformation. Do not get caught up in the “new shiny” technical side of things. This is really an investigative cognitive effort. It is a time-consuming art that requires patience to be done right.
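Underneath any of those tools the core move is mundane: normalize timestamps from different trusted sources and merge them into one ordered narrative. A minimal sketch with invented records (source names, times, and messages are all hypothetical):

```python
from datetime import datetime

# Hypothetical records already extracted from three different log sources.
raw_events = [
    ("ids",   "2014-11-24 03:14:02", "outbound transfer to unknown host"),
    ("auth",  "2014-11-24 03:09:55", "admin login from new workstation"),
    ("proxy", "2014-11-24 03:12:40", "large POST to external address"),
]

def timeline(records):
    """Parse each record's timestamp and return the events in time order."""
    parsed = [(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), source, message)
              for source, ts, message in records]
    return sorted(parsed)

for when, source, message in timeline(raw_events):
    print(when.time(), source, message)
```

Ordered this way, the login precedes the POST which precedes the transfer, and a story starts to emerge; at real volumes the same transform becomes the big data problem mentioned above.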
As each of the layers provides evidence, the filters of motive, means, and opportunity are applied to create a suspect pool. Since this is cyberspace, tossing out a suspect early is a significant risk. When dealing with a breach, the information must always be assumed to have been tampered with. Corroboration of the evidence across multiple unassociated sensors is important. As that evidence coalesces into a story, previously discarded evidence items need to be analyzed again to see if they refute any particular point. Exculpatory evidence, as well as the probative, needs to be considered in an investigation.
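Because tossing a suspect early is risky, one reasonable shape for the filter is scoring rather than elimination: rank each candidate by how many of motive, means, and opportunity the evidence supports, and keep the whole ranked pool. All names and flags below are invented.

```python
CRITERIA = ("motive", "means", "opportunity")

def rank_suspects(suspects):
    """Score rather than discard: keep the full pool, ordered by criteria met."""
    scored = [(sum(bool(s[c]) for c in CRITERIA), s["name"]) for s in suspects]
    return sorted(scored, reverse=True)

suspects = [
    {"name": "ActorA", "motive": True,  "means": True,  "opportunity": False},
    {"name": "ActorB", "motive": True,  "means": True,  "opportunity": True},
    {"name": "ActorC", "motive": False, "means": True,  "opportunity": False},
]
print(rank_suspects(suspects))
```

Nobody drops out of the pool; low scorers simply wait at the bottom, ready to be re-examined when new or previously discarded evidence shifts the picture.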
Once all of this and some more magic is done, a level of attribution can be achieved. That is, unless the threat actor signed their code with their home address, was already working for the investigating agency, or some other non-repudiation technique was in place; in those cases we should not be reading this story at all. This stuff takes time, and focusing on one layer of cyberspace can have deleterious effects on the analysis functions and the validity of attribution. I only used cyber exactly seven times in this article to talk about human and technical artifact interaction. No complaining.