Fingerprint what, exactly?


A recent spirited discussion in the Cranky Comrades Discord concerned this article: https://blog.foxio.io/ja4%2B-network-fingerprinting. It highlighted the persistent mindset that a new detection method, a new signature style or a new standard will "fix" the issues with the old versions and then we'll "solve security". While the blog post in question doesn't claim to be "fixing" security, it does make what I view to be highly optimistic claims about the validity of the telemetry it provides.

I highly recommend you give the post a read and make your own analysis of this technology. I do not mean to demean the hard work of those involved in this project, but I must ask: why? JA4 network fingerprinting proposes a replacement for the JA3 TLS fingerprinting standard, with a number of extensions that allegedly allow client and server fingerprinting, SSH traffic fingerprinting and plenty of other useful-sounding telemetry. However, these metrics (and I strongly believe they are just that, metrics) are based on static features of TCP traffic that are trivial to modify. They rely on the application defaults used for the client or server, namely the TLS cipher suites, extensions, source IP or domain, and a hash of those suites for the client and server. Similarly, the HTTP client fingerprint tracks the HTTP method, version, requested language, cookies and headers. This theme continues for the remainder of the standard. None of these factors are particularly difficult to change, and changing them can be considered a minimum barrier to entry for any vaguely sophisticated scanning or scraping operation.

JA4 HTTP Client Fingerprint Case Study

This component was the simplest to bypass, as it relies on the request method, HTTP version, cookies (if present) and the hash of the headers, cookie fields and values. It took two hours to bypass to a degree sufficient that the telemetry would have little use. The time to prototype a bypass included:

1) Eating a delicious dinner made by my lovely partner
2) Ranting about said project to some friends
3) Figuring out how "Crungled" would be conjugated in German (Thanks to said friends. It's likely a regular verb and would follow standard rules)
4) Writing the article up to this point
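To make concrete why these inputs are attacker-controlled, here is a toy approximation of a JA4H-style fingerprint. This is my own simplified sketch, not the exact JA4 spec (the real format and field encodings differ); it only shows that the method, HTTP version, cookie presence, header count and a truncated hash of the header names all come straight from values the client chooses.

```python
import hashlib

def ja4h_like(method, version, headers, cookies):
    """Toy JA4H-style fingerprint: NOT the real spec, just the shape.

    Every input below is supplied by the client, which is the whole
    problem: change the request, change the fingerprint.
    """
    a = "{}{}{}{:02d}".format(
        method[:2].lower(),                 # request method, e.g. "ge"
        "20" if version == "2" else "11",   # HTTP version marker
        "c" if cookies else "n",            # cookies present?
        len(headers),                       # header count
    )
    # Truncated SHA-256 over the ordered header names
    b = hashlib.sha256(",".join(headers).encode()).hexdigest()[:12]
    if cookies:
        c = hashlib.sha256(",".join(sorted(cookies)).encode()).hexdigest()[:12]
    else:
        c = "0" * 12
    return "_".join((a, b, c))

print(ja4h_like("GET", "1.1", ["User-Agent", "Accept", "Accept-Encoding"], {}))
```

Reorder one header, add a throwaway cookie or flip the method, and every part of that string changes.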

Each request pseudo-randomly generates a new user agent, picks one of any valid Accept-Encoding and Content-Type header fields, then generates a From field with a fake, pseudo-random email. These values are drawn primarily from the fake_useragent Python module and the wordlist under /usr/share/dict/words included in most Linux distributions, plus a list of all valid TLDs sourced from ICANN. While this is not an exhaustive source of entropy, it can be trivially extended and provides sufficient variation to mask a large volume of requests from a single source IP. When distributed across a botnet, consumer VPN endpoints, proxy services or any other means of obtaining enough source IP addresses, this component of the fingerprint can be made irrelevant.
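The core of that randomization can be sketched in a few lines. The short lists below are placeholders standing in for the real entropy sources the PoC uses (fake_useragent, /usr/share/dict/words, the ICANN TLD list), so this is an illustration of the approach rather than the actual script:

```python
import random

# Placeholder pools -- the PoC draws these from fake_useragent,
# /usr/share/dict/words and the ICANN TLD list instead.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
]
ENCODINGS = ["gzip", "deflate", "br", "gzip, deflate", "identity"]
CONTENT_TYPES = ["text/html", "application/json", "text/plain"]
WORDS = ["crungle", "ossify", "lambent", "quokka"]
TLDS = ["com", "net", "org", "io"]

def random_headers():
    """Build a fresh header set per request so the JA4H hash keeps changing."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Encoding": random.choice(ENCODINGS),
        "Content-Type": random.choice(CONTENT_TYPES),
        "From": "{}@{}.{}".format(
            random.choice(WORDS), random.choice(WORDS), random.choice(TLDS)
        ),
    }
```

Call `random_headers()` before each request and any fingerprint derived from header names and values stops pointing at a single client.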

In theory, this fingerprint should allow blue teams to identify regularly occurring traffic within their environment and begin identifying outliers that do not match the baseline. This works best for outgoing or internal traffic, where similar HTTP requests will follow the same patterns and therefore produce the same (or a similar set of) JA4H fingerprints. The assumption breaks down when looking at external traffic. For a public-facing application, even legitimate traffic has a wide variety of inputs that muddy the water substantially. Different client devices, browsers, out-of-date software and all six Haiku users will throw a veritable buffet of user agents, header options and source IPs at the application. Each combination creates a unique JA4H fingerprint, making statistical detection of outliers less assured. This is further complicated by the constant background noise of scanning and scraping across the internet: traffic that may be unwanted, and while it doesn't rise to the level of being malicious, it still pollutes the dataset of identified clients. That gives an attacker ample room to "hide" amongst the noise and produce seemingly disconnected, statistically irrelevant events that are hard to track in a coherent way. So what does this really get us?

In short? More analyst fatigue. As if we didn't have enough. What would it take to put these fingerprints to actionable defensive use? Frankly, I'm not sure. I have yet to get use out of the JA3 fingerprint standard on its own, and it is rarely used as a high-confidence discriminator even when combined with other traffic indicators. If these indicators are deployed with all the possible extensions, and an attacker makes no attempt to bypass any given layer of fingerprinting, they have the potential to add further telemetry to logging data. These fingerprinting methods fall back on the outdated idea of "enumerating badness", in the same tradition as IP blocklists, malware hashes and spam lists. Those can have a short-lived use in blocking point-in-time malicious behavior, but they cannot and should not be relied upon to provide a meaningful benefit compared to preventative actions like hardening the targeted system or patching your software.

The PoC or GTFO part:

At the time of publishing, this includes a quick script to bypass JA4H; more will be added as the urge strikes me.
https://codeberg.org/katherinez/ja4-fingerprinting-bypass

Is it perfect? Hell no. Does it work? Yeah. PRs welcome, the code is licensed under BSD 3-Clause in the spirit of the original spec.