Nearly thirty years have passed since I first started using the internet. Here I reflect on the changes I have seen and ask whether much has really changed.
I first used the internet way back in the mid-1990s, when it was still quite “new” and most people hadn’t begun to use it. In 1995, less than 1 percent of the world’s population accessed the internet. Created by academia, it had yet to catch on in the workplace as a source of information, but librarians like me had already worked out its potential.
Even then I knew just how difficult it could be to search properly: with pages and pages of hits, finding the information you needed was hard work. I taught myself how to use it properly by studying each search engine – Google hadn’t yet been invented – and working out which worked best. AltaVista was considered one of the best search engines at the time. I was working at the Health & Safety Executive, where I provided short training sessions for staff on how to use the internet. This quickly expanded into a full training course in which I taught my colleagues how to search for information and how to judge whether a website was credible.
Moving to HM Customs & Excise in 2002, my course grew into a full-time job, travelling around the country delivering the training. By then, the course had extended into a full review of search engines, how the internet worked and – ground-breaking at the time – the deep and dark web, or “invisible web” as it was then known.
Feedback told me that little was understood at the time about how the internet worked; the deep and dark web had not even been heard of. So, has much changed in twenty years?
The dark and deep web are now known of, but still not fully understood, and how the internet works gets ever harder to fathom. In 2002, Google, having been around for four years, had indexed approximately 16% of the internet by crawling the top few pages of each website. It is now estimated that Google has indexed only 0.004% of the internet – hardly surprising when it has grown to about 5 trillion megabytes.
There are around 4 billion websites indexed on internet search engines and accessible to the average user. These websites sit on the part of the internet most people interact with daily, known as the surface web. The deep web, on the other hand, consists of areas of the internet not easily accessible in a browser. This includes virtual private networks, corporate databases, and websites not directly linked from the surface web. It is estimated the deep web is considerably larger than the surface web – by 400 to 500 times. Only 0.2 percent of the internet is estimated to be on the surface web; the other 99.8 percent of internet data resides on the deep web.
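Those two estimates are, in fact, consistent with each other: if only 0.2 percent of content sits on the surface web, the deep web works out at roughly 499 times larger – squarely within the quoted 400–500 times range. A quick sanity check using the figures above:

```python
# Sanity-check the two estimates quoted above.
surface_share = 0.2   # percent of internet content on the surface web
deep_share = 99.8     # percent of internet content on the deep web

# How many times larger the deep web is than the surface web.
ratio = deep_share / surface_share
print(round(ratio))  # 499 - within the quoted 400-500x range
```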
This explains why Government and law enforcement investigators find using the internet – or Open-Source Intelligence (OSINT) as it is better known – so hard.
Challenges faced in public sector investigations
Let’s start with a definition of Open-Source Intelligence. The Ministry of Defence defines it as “intelligence derived from publicly available information that has limited public distribution or access”, while the National Police Chiefs’ Council (NPCC) defines it simply as “publicly available information”.
Why would investigators want to use it? If they have access to their own data and buy in third party data, why isn’t that enough?
The Police Foundation’s view is that it’s, “a critical component of modern intelligence and investigative tools. The volume of data available online is constantly growing, providing investigators with a rich information source to draw from. The insights that OSINT can offer are unlikely to be found in internal datasets, curated databases, or sanctions lists. Failure to make use of open source data can lead to both embarrassment and intelligence failure”.
I agree. OSINT – given the size of the internet – is far greater than third-party data, and much cheaper, so it should be used in investigations, providing a rapid and economical way of understanding the entity under scrutiny.
Interestingly, in a police intelligence context, ‘publicly available data’ extends to financial data (such as credit reports and bank details), vehicle data (from the DVLA and insurance specialists), as well as data supplied by third-party aggregators to law enforcement. While a lot of this data is compiled or aggregated, not all of it is consented data – credit reference data, while provided to law enforcement agencies, is for a closed user group (i.e. a shared user group of banks and credit lenders). Sharing with the public sector is usually only for specific purposes, such as the prevention and detection of crime, legal proceedings, or national security.
The Police Foundation notes that LexisNexis datasets offer 6 petabytes of data, versus the entire internet at 1,200 petabytes. As a result, “investigators could be missing out on 99% of available data, meaning that they will almost certainly miss valuable insights” – as was certainly the case in the shooting at Plymouth in August 2021.
The investigation into the Plymouth shooting, in which Jake Davison killed five people, found that he had been discussing the incel movement on social media – a group populated by young men who describe themselves as “involuntary celibates”. His firearms licence had initially been revoked following his involvement in an assault in 2020; however, the licence and shotgun were returned in July 2021. The Independent Office for Police Conduct (IOPC) is investigating the circumstances surrounding the issue and later re-issue of the licence in question. This has led to new guidance, issued in November 2021, which requires a force’s firearms licensing authority to review “information obtained from open-source social media”. Had those checks been carried out before his licence was reissued in July 2021, it might have been a different story.
So, if it’s such a rich source of information, why isn’t OSINT being used?
I believe this is due to the size of the internet and a lack of understanding of how to find the right information.
Searching changed with the advent of Google in 1998. Before that, search engines used keywords, and only Ask Jeeves accepted a question – now it is expected that Google keywords should be able to find detailed and specific results, but they don’t. Despite sophisticated algorithms, the user is still presented with pages and pages of hits and can spend hours trawling through them looking for that nugget of information about the entity in question. This “problem” is drastically reduced when Boolean searching is used – AND, OR and NOT cut down the volume of results. Other tips, such as putting exact phrases in speech marks, help to narrow down the results even further. These techniques are usually part of the basic internet training sessions public sector staff attend. However, not everyone attends these courses as they can be expensive, so most staff rely on colleagues and word of mouth to learn them. There is no set standard to which Government or law enforcement investigators should be trained. This art of searching is known as “tradecraft”.
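As an illustration, the queries below (with hypothetical names and terms) use Google’s standard operators: quotation marks for an exact phrase, OR for alternatives, a minus sign to exclude a term, and site: or filetype: to restrict where and what is searched.

```
"John Smith" (Plymouth OR Devon)       exact phrase, plus alternative locations
"firearms licence" -shotgun            exclude results mentioning a term
"firearms licensing" site:gov.uk       restrict results to one domain
"annual report" filetype:pdf           find only a specific file type
```

Each operator removes a slice of irrelevant hits, which is why a well-constructed query can turn pages of noise into a handful of pertinent results.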
Even with tradecraft, how can the search be improved?
The range of data available on the internet is vast – from news, social media and other user-generated content to grey literature and commercial sources. This can lead to the user wandering off down a rabbit hole, following a particular path and not only forgetting where they are up to but missing out on many other potential leads. Keeping track of the search is difficult and very time-consuming.
Having a process for approaching a search is vital. Web scrapers can help, but often the information retrieved is viewed out of context and becomes misleading. Many investigators, even with a planned process, invariably end up writing down the sites visited or copying and pasting them into a document to keep track – not to mention keeping track of recovered files, spreadsheets, Word documents and much more, with a compliant audit trail. Couple this with using the third-party sites mentioned above and the problem becomes yet more compounded. The investigator needs to log in and out of third-party data sources, search engines, internal data sets and much more, often using multiple different passwords and usernames. Third-party data can’t be reached by internet search engines as it sits behind a paywall, and neither can sites that sit behind firewalls and require logging in. Not helpful!
So, if Google has indexed only 0.004% of the internet, how do you find the rest?
The rest is known as the dark or deep web, i.e., below the surface of the web. The deep web is made up of content that is not indexed by search engines and usually requires credentials to access – so, technically, third-party data sits here, alongside subscription-based sites, password-protected sites and sites accessed via an online form. However, a large amount of this content is still readily available to the public via open sources.
The dark web is much more anonymous than the surface web and is often where illegal or nefarious activities take place, such as drug dealing, weapon sales and exploitation/trafficking. Government uses the dark web much less than law enforcement does. The UK Government has taken steps to counter these growing threats: significant investment in its intelligence capabilities has increased understanding of the threat and enabled more effective covert campaigns.
A 2014 study found that the most common type of content requested by those using hidden services via ‘The Onion Router’ (Tor) – developed in 2002 – was child pornography, followed by black marketplaces. Researchers at King’s College London found that 57% of the hidden-service websites within the Tor network facilitate criminal activity, including drugs, illicit finance and pornography.
In contrast to surface web browsers, the Tor browser allows users to connect to web pages anonymously by bouncing connections randomly between Tor nodes to obfuscate the IP address of the end user. The anonymity Tor provides makes it an attractive tool for users who wish to engage in illegal activities.
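Conceptually, the layering works like a set of nested envelopes: the client wraps the message once per relay, and each relay opens only its own envelope, learning the next hop but never the full route or (for all but the exit) the payload. The toy sketch below illustrates that idea only – there is no real encryption here, and the relay names are made up.

```python
def wrap(message, relays):
    """Client side: wrap the message in one layer per relay.

    The innermost layer (for the exit relay) carries the payload;
    every outer layer names only the next hop. (Real Tor encrypts
    each layer with that relay's key - omitted in this toy.)
    """
    packet = ("deliver", message)
    for next_hop in reversed(relays[1:]):
        packet = ("forward", next_hop, packet)
    return packet

def peel(packet):
    """Relay side: open one layer; learn only the next hop, or the payload."""
    if packet[0] == "forward":
        _, next_hop, inner = packet
        return next_hop, inner
    return None, packet[1]  # exit relay: deliver the payload

# Simulate a three-hop circuit with hypothetical relay names.
relays = ["guard", "middle", "exit"]
packet = wrap("request for example.onion", relays)

hop, route = relays[0], []
while hop is not None:
    route.append(hop)
    hop, packet = peel(packet)

print(route)   # ['guard', 'middle', 'exit'] - each relay saw only one hop
print(packet)  # 'request for example.onion' - recovered only at the exit
```

Because each relay can read just one layer, no single node observing the traffic can link the end user’s IP address to the destination – which is exactly what makes Tor attractive to those seeking anonymity, lawful or otherwise.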
So, what have Law Enforcement Agencies and Government organisations done to tackle this?
In 2015 the UK announced a dedicated unit for tackling dark web crime called the ‘Joint Operations Cell’ or JOC. This is a joint, co-located initiative between the National Crime Agency (NCA) and Government Communications Headquarters (GCHQ) to “increase our ability to identify and stop serious criminals, as well as those involved in child sexual exploitation and abuse online. This is a challenging task as we must detect them while they attempt to hide in the mass of data. We are committed to ensuring no part of the internet, including the dark web, can be used with impunity by criminals to conduct their illegal acts.”
The UK launched a five-year National Cyber Security Strategy in 2016 that included £1.9 billion of investment and established the National Cyber Security Centre. The UK government also launched the £13.5 million Cyber Innovation Centre to help enhance the UK’s global reputation in cybersecurity.
Is it working?
Yes. The National Crime Agency was heavily involved in one of the largest-ever international operations targeting a criminal dark web marketplace, in October 2021. The operation, known as Dark HunTOR, saw officers from the Dark Web Intelligence, Collection and Exploitation team (DICE) – a joint unit comprising experts from the NCA and policing – analyse the UK data and identify the criminal dealers assessed to present the highest risk. Intelligence was then passed to Regional Dark Web Operations Teams – based within Regional Organised Crime Units and the Metropolitan Police Service – who arrested the individuals on suspicion of selling criminal goods on the dark web.
It is difficult to locate the dark website a criminal used to commit a crime. As a result, criminal activity on the dark web is often very hard to identify and track, let alone gather enough evidence to obtain a conviction. Police and Government do their best, upskilling officers to tackle this. However, these officers need regular training to stay current, which costs money, and they cover too wide an area to be 100% effective. Having a tool to remove the manual data aggregation would certainly help, and might reduce the need for constant upskilling and deep tradecraft expertise.
What else can I do?
There are a number of areas to address to help the public sector increase their chances of finding these lines of enquiry.
Specialist training in the use of OSINT and the deep and dark web is a must. There are many training providers out there, but care must be taken to find trainers who fully understand what they are teaching and, preferably, who have “walked the walk” and actually been investigators themselves.
Without additional staff, both the Government and the Police are drastically under-resourced to properly research OSINT for evidential links to criminality. It doesn’t help that during the pandemic the Government implemented a recruitment freeze, preventing posts being filled; it is only now, in some parts of the civil service, that the freeze is lifting. Police forces have also been under-staffed. The Uplift programme – the Government’s manifesto pledge to recruit an additional 20,000 police officers by March 2023 – is hampered by slow and manual vetting checks, and the full quota may not be reached in time.
Expertise and experience of applicants
The move out of London within the civil service estate will not help – many skilled and experienced staff will leave, and their replacements will need training and experience, taking many years to reach today’s levels.
It is hard to stay one step ahead of the criminal minds, but if we get the training and staffing levels right, it will certainly help.
To facilitate increased use of OSINT within investigations, systemic, strategic and technological change is needed.
Firstly, organisations need to shift towards more flexible commercial and procurement methods that reflect the reality that many high-quality open-source tools come from early-stage companies. These companies sometimes find that they are accidentally designed out of the complex procurement processes in Government and other large institutions. G-Cloud 13 and, hopefully, the 2023 procurement legislation will see more SMEs winning business in the OSINT world.
Legislation affecting OSINT searching
The Association of Chief Police Officers states that “online research and investigation is a powerful tool against crime”. It can also present new challenges to law enforcement, as the use of OSINT can still interfere with a person’s right to respect for their private and family life, which is enshrined in Article 8 of the Human Rights Act 1998 and the ECHR. They have provided a helpful guide that, whilst aimed at police investigators, would be equally useful for other law enforcement teams.
Aggregated data provision, covering OSINT and the deep and dark web, is a must. But without the right tool, it just makes the problem worse. The public sector needs to be more aware of this ever-changing market and ensure, before any tenders are released, that it has taken full advantage of market engagement – whereby suppliers are invited to put forward ideas, innovations and solutions that may be of interest to the buyer.
The main problem is that if the investigator cannot articulate what they need, the procurement team cannot turn unclear requirements into a specification. Market engagement with suppliers, attending conferences where suppliers are exhibiting, and training investigators to understand which tools and techniques are new and innovative is the best starting point.
How can Synalogik help?
Our founders were in your shoes – investigators looking for that needle in a haystack but swamped by a tsunami of results, sources and false positives. Deciding to tackle the problem, they created Scout® – a single platform that lets the user search OSINT at the same time as third-party and Government/Police data, applying a risk assessment over the results to narrow down the hits, making them focused and pertinent. Our expertise and tradecraft have been built into Scout® to help investigators make efficient use of the platform and save up to 85% of the time spent aggregating data versus manually logging in and out of the available sources. This year, with support from UK public sector clients, we were nominated for and won the Queen’s Award for Innovation – sadly the last Queen’s Award for many years to come.