
Challenges in Open-Source Intelligence: Managing Uncertainty and Information Quality

These days, the demand for OSINT (Open-Source Intelligence) services is higher than ever. By some estimates, the OSINT market will reach over $6 billion by the end of 2027, a sizable jump from its estimated $3.8 billion value in 2020.


The reason for this is simple: many companies are looking to ramp up their data gathering from public sources to extract critical business insights.

In turn, these insights are crucial for business planning, giving companies a competitive edge over their rivals. However, this isn’t the only reason companies turn to OSINT vendors in droves; business intelligence is also a security matter.


Third-Party Risk Management

The global economy increasingly relies on outsourced services, which means companies’ dependence on suppliers is steadily growing, especially where security is concerned. Consequently, the need for efficient TPRM (Third-Party Risk Management) is becoming more critical than ever.

Companies need to be sure of how safe their corporate information is — and whether there are any supply chain risks to be dealt with promptly. This demand has given rise to a market of TPRM service providers who work to help their clients understand the risk levels brought by each supplier relationship. Then, they also assist in managing and reducing those risks.

These days, those services frequently combine proprietary in-house algorithms with OSINT, resulting in ratings that resemble individuals’ credit scores and help companies gain a clear overview of their cybersecurity posture across the supply chain.


Importance of TPRM

All things considered — if your company outsources any services to suppliers or you have any other type of supply chain connection with third parties, you need to understand the threats and vulnerabilities they bring to your cyber security.

So, what are some of these vulnerabilities in practice?

For starters, many of your suppliers can likely access your company’s and, more importantly, your customers’ data. As part of your company’s GRC (governance, risk, and compliance) efforts, you must ensure that your third-party supply chain partners take cybersecurity and data protection just as seriously as your company does.

To achieve this, most companies have traditionally relied on contracts with those third parties — stipulating the same level of data governance and security as the company maintains. However, in practice, this can only get you so far, particularly in the age of cyber-attacks.

By the time you pursue legal action against a third party for exposing you to a data breach, the far more significant damage is already done — both from a financial and a reputational standpoint.

In short, this means that the digital age requires a more efficient and proactive approach to monitoring and assessing third-party supply relationships, especially those with open access to your company data.

Remember: the most robust and comprehensive internal cybersecurity program won’t be worth much if even one of your vendors fails to prevent a breach, exposing you to the same vulnerabilities.

So, TPRM is indispensable to modern business — but where does OSINT enter the picture?


The Role of OSINT

When it comes to TPRM, open-source intelligence has become an indispensable source of valuable data: publicly available information from the Internet that practically any online user can access.

However, the power of OSINT lies in the volume of collected data. When gathered and analyzed adequately on a large scale, OSINT data can yield powerful insights, which is why it’s become a staple in both business intelligence and law enforcement investigations.

Considering this, what does an OSINT investigation look like — and are there any downsides to obtaining readily available data from a wide range of sources on the Internet?


OSINT Investigations 101

Companies and their key decision makers aren’t the only ones who use open-source intelligence while weighing their choices and making complex decisions. Countries and the politicians who run them have done the same for ages — centuries before the invention of the Internet.

For instance, the Russian Tsar Nicholas II once said that his country didn’t need spies when it came to subterfuge against Britain — they just needed the latest issue of the Times. And that was at the end of the 19th century!

Five decades later, Allen Dulles, the long-serving director of the CIA, would say largely the same thing, claiming that more than 80% of the intelligence gathered by his agency came from publicly available sources.

Of course, the reach of OSINT investigations was far more limited before the rise of the Internet and digital media outlets — researching the newspapers, radio, and television media in another country was a far more expensive endeavor.

In fact, OSINT investigations were prohibitively expensive for almost any organization short of an entire country; apart from the very largest companies, most businesses had not yet developed OSINT methods of gathering business intelligence.

All of that would change with the advent of social media — suddenly, everyone using social media became a media outlet of their own, hugely broadening the horizon of OSINT investigations.

Similar to the traditional investigations centered around gathering intelligence, OSINT investigations revolve around locating and collecting evidence that supports your organization’s goals.

In practice, an OSINT analyst will rigorously search open web spaces for relevant data that could be crucial to their operation — like social media platforms and online news publications.

Once the process of collecting the data is finished and analysts have a reasonable amount of raw data to work with, the time comes to move on to the next part of the process — converting all of that data from its basic form into something resembling actionable intelligence.

So, to summarize: OSINT investigations function similarly to any other type of investigation, with one key difference: evidence is collected in public spaces, such as publicly available web pages and directories.


The Benefits of Open-Source Investigations

As you can see, publicly available data can provide an incredible amount of value to any investigation. And with the barriers to entry for online research being so low, it’s not difficult for TPRM and OSINT investigators to find crucial pieces of information.

When this information is collected, investigative teams go through the usual OSINT cycle of turning this information into actionable intelligence, streamlining their findings through OSINT reports, and finally sharing said findings with all the appropriate stakeholders.

Naturally, a modern company with numerous third-party exposures on the Internet can reap plenty of benefits from such an investigation in terms of security. Most of the information you need to conclude whether third-party suppliers and vendors are exposing you to security risks is already out there and publicly available — all it takes is a thorough OSINT procedure to reveal it.

Here are some other ways in which OSINT investigations are beneficial:


● Cost-effectiveness — Modern OSINT investigations are far less costly than traditional intelligence sources, like human agents or advanced technology.

● Ease of access — Seeing as OSINT data is publicly available, it’s logically also much easier to access.

● Fewer legal issues — Many legal issues arise when you need to share information obtained from other intelligence sources with third parties. However, as OSINT info is already available to the general public, you and your company don’t need to worry about infringing copyright laws or NDAs.


Risks Associated with OSINT Data

Despite the enormous benefits that OSINT investigations provide, there are also substantial risks associated with their methods of acquiring intelligence.

For instance, third-party threats have ways of tracking users’ activities on the Internet via tracking links: URLs that carry additional parameters revealing how a user reached a certain website.

So, all investigative activity can leave a digital trail, which, when investigating threats, can ironically expose the investigators themselves to malicious attacks. That’s why OSINT investigators have to take additional steps to hide their Internet traffic from third parties.
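
To illustrate one such precaution, here is a minimal Python sketch of stripping well-known tracking parameters (such as utm_* and fbclid) from a link before visiting it. The parameter list is only a small illustrative sample; real investigative tooling would go much further (dedicated browsers, proxies, and so on).

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# A small sample of common tracking parameters; real tooling maintains longer lists.
TRACKING_PARAMS = {"fbclid", "gclid", "mc_eid", "igshid"}
TRACKING_PREFIXES = ("utm_",)

def strip_tracking_params(url: str) -> str:
    """Return the URL with well-known tracking parameters removed."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    cleaned = [
        (key, value)
        for key, value in parse_qsl(query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((scheme, netloc, path, urlencode(cleaned), fragment))

print(strip_tracking_params(
    "https://example.com/report?id=42&utm_source=newsletter&fbclid=abc123"
))
# -> https://example.com/report?id=42
```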

Also, if malicious actors discover that they are being investigated, they frequently retaliate — often by trying to hack the organization investigating them or even by planting misinformation that could lead OSINT investigators down the wrong trail.

While proper OSINT training may counteract such risks, misinformation remains a huge problem — and one that’s not exclusively created by malicious actors trying to cover their tracks. In practice, most of the misinformation on the Internet appears in the form commonly known as “fake news”.


Fake News — The Problem With Misinformation

Right away, we should point out that fake news is by no means a new phenomenon. As a form of propaganda, it’s centuries-old — practically as old as news outlets. However, the easily accessible nature of news in the digital age has brought forth a time of misinformation and spin on a scale that’s never been seen before.

The open media distribution channels created by social networking platforms have made spreading messages as easy as typing them into a phone or a computer. You can instantly share anything with the world — unfortunately, the viral nature of modern information also means fake news is far easier to disseminate.

Fake news almost always carries severe repercussions, possibly endangering individuals and whole societies. From the cyber underworld to gangs, terror groups, all the way to politicians of all stripes and colors, everyone seems to have accepted fake news as a legitimate means to an end.

While this creates serious issues for law enforcement and national security agencies across the world, misinformation also makes open-source investigations far more complex than they would be otherwise.

The core challenge posed by fake news to OSINT professionals is quite clear — intelligence analysts have to spend a lot of time, money, and human resources just to distinguish reliable sources from fake news. It’s a necessary process because it’s the only way to use open source channels to derive usable intelligence effectively.

However, fake news hinders that investigative process — especially on social media, where it complicates analyzing and identifying qualitative intelligence immensely. And the viral nature of social media only exacerbates the problem further.

While most social media users aren’t actively and consciously working to convince others of falsehoods, many still end up acting as unknowing vehicles for the spread of fake news. That’s where most of the complexity lies for agents and analysts who constantly rely on open sources for the extraction of valuable insights.


Fake News — How Does It Work?

Most of the fake news found online originates on relatively obscure, niche websites. However, that’s not the form in which fake news is most powerful. It reaches its full potential once it’s promoted and distributed through social media, usually with the support of its creators and an initially small group of followers.

Sooner or later, it reaches innocent bystanders who don’t know how to discern it from actual news and legitimate information. Usually, these bystanders are targeted specifically while the fake news is still being created. Certain platforms and fake news topics target people from different socioeconomic and educational backgrounds, genders, religions, or geographical regions.

So, the question is — can OSINT agents and analysts cope with the modern version of fake news distributed online?

In a word: yes. But it’s impossible without the use of the very thing that helped fake news achieve such an astronomic growth — advanced technology.

Today, a lot of manpower is needed to reliably identify falsehoods, even when we’re talking about highly skilled OSINT investigators, and even then, the manual approach sometimes just doesn’t cut it.

Considering this, what kind of technology is necessary to give TPRM and OSINT investigators the upper hand against malicious actors and third parties constantly hiding their “dirty laundry” in plain sight?

Well, these intelligence analysts spend most of their time gathering and processing data in hopes of finding new clues and leads that would confirm or deny their suspicions regarding the subject of the investigation.

Traditionally, they would be trained to recognize and eliminate biased or false sources — but in the world of social media, every individual is a potential source, and there’s simply too much data to achieve the desired accuracy.

This is a problem if we’re approaching OSINT manually — but that’s why the industry standard is slowly shifting to the only logical answer to its data processing conundrum: artificial intelligence.

Even without fake news, the high volume of OSINT data generated every single day would make artificial intelligence the optimal solution for processing it. That’s why machine learning technologies have become the heart of OSINT software, frequently helping analysts discern real news from fake.

Analysts are relying more heavily on data mining, number crunching, and large-scale analysis, all in the context of Big Data. Conventional telltale signs like semantics and language remain crucial to the complex process of weeding out fake news; we’re just using artificial intelligence technologies to examine them at scale.

Natural language processing powered by machine learning gives us the ability to quickly process a large number of data sources and extract insightful findings while separating credible sources from bogus information.

Web intelligence software that runs on artificial intelligence technologies can be trained to recognize specific speech and writing patterns and eliminate the ones that stem from the same untruthful sources.
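
As a deliberately tiny illustration of that idea, the sketch below assumes scikit-learn and a handful of made-up training examples: a TF-IDF model learns writing patterns from items labeled credible or fabricated, then scores a new item. Production OSINT platforms use far larger corpora and many more signals, so treat this strictly as a conceptual sketch.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: 1 = credible, 0 = fabricated.
texts = [
    "Official filing confirms the vendor breach reported last week.",
    "Regulator publishes audit results for the supplier's data centers.",
    "SHOCKING leaked memo PROVES the company secretly sells your data!!!",
    "Anonymous insider reveals the CEO's plan to fake the audit, share now!",
]
labels = [1, 1, 0, 0]

# TF-IDF over word unigrams/bigrams feeds a simple logistic regression classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

new_item = "Leaked document PROVES supplier hid the breach, spread the word!"
# Estimated probability that the new item belongs to the "credible" class.
print(model.predict_proba([new_item])[0][1])
```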

All in all, the amount of fake news found in open sources today forces OSINT investigators to rely not only on automated data collection but also on new technologies that automate much of the data processing. Ideally, technologies that employ artificial intelligence will continue to grow at the same pace as the technologies used by malicious actors.


OSINT and Misinformation In Practice

Considering all of the above, how does misinformation pose problems for OSINT investigations in practice?

There are plenty of examples. For instance, a couple of months before the pandemic, a viral video started circulating on social media, depicting the founder and CEO of Facebook, Mark Zuckerberg.

The video showed Zuckerberg talking, in a full-on Bond villain imitation, about the fact that he controls many people’s online data and how he’s thankful to Spectre, the villainous organization that James Bond routinely fights against in his movies.

So, why would Zuckerberg make such a weird, deadpan joke about how he’s in control of our data? You guessed it; he didn’t. The video was a so-called “deepfake” — just the latest in a long line of technologies making OSINT investigations harder and harder.

These videos are expertly produced (and AI-powered) fakes. Their creators use incredibly complex software to scan countless pictures and videos of a public individual or celebrity — ironically, through OSINT data — and then create a fake video that convincingly mimics their mannerisms and voice.

It’s utterly surreal, and it’s why you’ve come across videos of Barack Obama declaring nuclear war or Donald Trump selling the White House to McDonald's. And sure, it may be funny when the video is clearly fake — but what happens when malicious actors use the technology to make fully realistic but fake scenarios and statements?

Identifying disinformation on the Internet has become a crucial priority for key decision makers, both in the private and public sectors. Tech companies that manage and own social media platforms have declared that they will fervently catalog, identify, and remove content that’s proven to be intentionally misleading.

However, the algorithms that realistically alter visual content — like deepfakes — make the process understandably difficult. And with over 60% of US-based adults getting most of their news from social networks, it’s an urgent issue to solve.

Luckily, companies like Facebook, Microsoft, and Google have started collaborating in an effort to refine and build tools capable of automatically flagging deepfakes. And since the start of the pandemic, TikTok, YouTube, Facebook, and Twitter have all worked on policies to recognize and govern “synthetic” and altered media.

Facebook has long used human fact-checkers to detect misinformation, but false content is starting to appear on a scale that makes purely manual review impractical. The only way forward is to develop counteracting AI technologies, and there’s a long road ahead.


Steps To Ensure Data Quality

So, what does misinformation on the Internet mean for companies that use OSINT for TPRM? In that specific case, companies may face eroded customer satisfaction due to poor data quality and security.

Besides using AI-powered OSINT software that applies some serious mathematical heavy lifting to corroborate every piece of information, there are other ways to ensure you’re looking at accurate information on any given topic. For example, before you even start verifying a piece of content found online, you need to ask a more fundamental question: is the picture, video, or text you’re looking at genuinely connected to a statement or event that actually happened?

A significant share of the misinformation found online revolves around completely fabricated events, which are also the easiest falsehoods to check.

Imagine, for instance, that you need to verify a video touted as proof of rumors about problems with a third-party vendor or supplier and their bad business practices. Before you start checking the identity of the person who posted and captured the video, or its time, date, and location, ask yourself: have there been any rumors that the video claims to prove?

If that initial information is false, so is the video proof — and the former is usually easier to verify than the latter.


Breaking Down Verification

The neat part about verification is that no matter what you’re checking, the basic tenets are always the same. Whether you’re looking at a dummy account, an altered photo, or a fake eyewitness video, here are some of the elementary checks you’ll go through:

  1. Provenance — you need to ascertain whether you’re looking at the original piece of content, article, or account.

  2. Source — you should learn who created the original content, article, or account.

  3. Time and location — the timing and location can tell you a lot about the validity of the content.

  4. Motivation — why someone would post a piece of content, create a website, or establish an account speaks volumes about its validity.

When an OSINT investigator answers these basic questions, they’re one step closer to knowing how much additional verification a specific piece of content requires. One simple way to record the outcome of these checks is sketched below.
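
The snippet below is purely illustrative; the field names and the crude scoring rule are our own invention rather than a standard OSINT methodology, but they show how the four answers can be captured in a structured, comparable form.

```python
from dataclasses import dataclass

@dataclass
class VerificationChecklist:
    """Illustrative record of the four elementary checks for one piece of content."""
    provenance_is_original: bool    # is this the original item, not a re-upload?
    source_identified: bool         # do we know who created it?
    time_location_consistent: bool  # do timestamps and location details line up?
    motivation_plausible: bool      # does the uploader's motive make sense?

    def confidence_score(self) -> float:
        """Crude fraction of checks passed; lower scores call for deeper verification."""
        checks = (
            self.provenance_is_original,
            self.source_identified,
            self.time_location_consistent,
            self.motivation_plausible,
        )
        return sum(checks) / len(checks)

item = VerificationChecklist(True, False, True, False)
print(item.confidence_score())  # 0.5 -> warrants significant further checking
```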


Dealing With Misinformation in Visual Content


Considering all the above, how do OSINT investigators deal with visual misinformation, which is becoming increasingly easier to fabricate?


Provenance

As we’ve mentioned above, the first key question is — are we looking at a photo’s original version? A basic reverse image search can give us clues; if identical images have been indexed before the described event supposedly took place, it’s a clear fake.

On the other hand, that search might turn up several images that each share identical features with the photo, which suggests it could be a composite. Or there may not be any other versions online; if the picture also passes basic reflection and shadow checks, it may very well be an original. Of course, the ultimate proof is speaking to the source, if possible.
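
Reverse image search itself relies on external services, but an analyst who has gathered candidate copies can also compare them locally with perceptual hashing. The sketch below assumes the third-party Pillow and imagehash Python packages; the distance threshold is an arbitrary illustrative value.

```python
# pip install pillow imagehash
from PIL import Image
import imagehash

def likely_same_image(path_a: str, path_b: str, threshold: int = 8) -> bool:
    """Return True if two image files are perceptually near-identical.

    Perceptual hashes stay similar for resized or re-compressed copies of the
    same picture, so a small Hamming distance suggests one is derived from the other.
    """
    hash_a = imagehash.phash(Image.open(path_a))
    hash_b = imagehash.phash(Image.open(path_b))
    return (hash_a - hash_b) <= threshold  # subtraction gives the Hamming distance

# Hypothetical file names for illustration only.
print(likely_same_image("claimed_original.jpg", "earlier_indexed_copy.jpg"))
```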


Source

That leads us to another essential pillar of OSINT verification: the source. In this case, if a picture was sent from an anonymous chat app number or email address, or uploaded by a user whose username doesn’t appear anywhere else online, it’s likely a fake.

Conversely, if you were able to identify the uploader through their domain or profile pictures, which they made no effort to hide, the image has a greater chance of being genuine. Likewise, if you can contact them via social media and they confirm that they’re the author of the photo, that’s another plus.

And finally, if they answer your questions and their answers are confirmed by the image’s EXIF data, the author’s online footprint, and weather reports: it’s most likely a genuine image.


Location

If no visual clues point to the location and there’s no location data attached, the image is more likely to be doctored. On the other hand, clothing, architecture, and signage clues that establish a genuine geographical region increase the odds of the image being original.

Also, OSINT investigators might cross-reference landmarks and the surrounding landscape with mapping tools and try to come up with latitude and longitude coordinates. If the locations in the photo match other images from the area and online maps, it’s probably a valid source of information.
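
Once a set of candidate coordinates has been derived, a quick great-circle (haversine) distance check against a known landmark’s coordinates shows whether the two points are plausibly the same place. The sketch below is a generic implementation with example coordinates.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometers (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # mean Earth radius ~6371 km

# Does the estimated photo location fall within ~1 km of the claimed landmark?
print(haversine_km(51.5007, -0.1246, 51.5033, -0.1196) < 1.0)
```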


Time

If a photo was sent or uploaded anonymously and without EXIF data, that’s certainly a red flag. On the other hand, if its time stamp on social media clearly shows that the photo was uploaded right after the depicted event happened, and shadows and weather conditions line up with other location information and EXIF data — it’s more likely that you’re looking at a real photo.
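
For the EXIF side of that check, a minimal Pillow-based sketch like the one below can pull any date/time tags out of an image file. Keep in mind that EXIF data is easy to strip or forge, so it’s only one signal among many.

```python
from PIL import Image
from PIL.ExifTags import TAGS

def read_exif_timestamps(path: str) -> dict:
    """Collect any DateTime* EXIF tags (e.g. DateTimeOriginal) found in an image."""
    exif = Image.open(path).getexif()
    tags = dict(exif)
    try:
        # DateTimeOriginal usually lives in the Exif sub-IFD (tag 0x8769).
        tags.update(exif.get_ifd(0x8769))
    except Exception:
        pass  # older Pillow versions or files without a sub-IFD
    return {
        TAGS.get(tag_id, str(tag_id)): value
        for tag_id, value in tags.items()
        if "DateTime" in TAGS.get(tag_id, "")
    }

# Hypothetical file name for illustration only.
print(read_exif_timestamps("uploaded_photo.jpg"))
```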


Motivation

When it comes to any kind of content uploaded online, this is the big one — why was a photo captured? What was the uploader trying to achieve?

If you don’t know the identity of the author and uploader, ascertaining their motivations is pretty much impossible. And if the social account behind the photo was created recently or they have few to no other posts, you’re probably looking at a malicious dummy account with suspect motivations.

On the other hand, if the uploader’s identity confirms they’re working with an advocacy or activist organization, you’ll have a clearer view of their motives. The same goes for activities that confirm the uploader is a journalist, a holidaymaker, a local worker, and so on.


Conclusion

Today’s business and national intelligence efforts increasingly rely on OSINT methodologies. However, open-source intelligence isn’t without its challenges, especially when it comes to the misinformation that abounds in today’s social media-fueled world. With that in mind, extensive cross-referencing and verification are necessary, both by human investigators and with the help of AI and machine learning technologies.
