Public package repos expose thousands of API security tokens, and they’re active


As part of the development of JFrog Xray’s new Secrets Detection feature, we wanted to test our detection capabilities on as much real-world data as possible, both to make sure we eliminate false positives and to catch any errant bugs in our code.

As we continued testing, we discovered there were many more identified active access tokens than we expected. We broadened our tests into full-fledged research to understand where these tokens come from, to assess the viability of using them, and to be able to privately disclose them to their owners. In this blog post we’ll present our research findings and share best practices for avoiding the exact issues that led to the exposure of these access tokens.

Access tokens – what are they all about?

Cloud services have become synonymous with modern computing. It’s hard to imagine running any kind of scalable workload without relying on them. The benefits of using these services come with the risk of delegating our data to foreign machines and the responsibility of managing the access tokens that provide access to our data and services. Exposure of these access tokens may lead to dire consequences. A recent example was the largest data breach in history, which exposed one billion records containing PII (personally identifiable information) due to a leaked access token.

Unlike the presence of a code vulnerability, a leaked access token usually means an immediate “game over” for the security team, since using a leaked access token is trivial and, in many cases, negates all investments in security mitigations. It doesn’t matter how sophisticated the lock on the vault is if the combination is written on the door.

Cloud services intentionally add an identifier to their access tokens so that their services can perform a quick validity check of the token. This has the side effect of making the detection of these tokens extremely easy, even when scanning very large amounts of unorganized data.


Platform    Example token

GitHub      gho_16C7e42F292c6912E7710c838347Ae178B4a
GitLab      glpat-234hcand9q289rba89dghqa892agbd89arg2854
npm         npm_1234567890abcdefgh
Slack       xoxp-123234234235-123234234235-123234234235-adedce74748c3844747aed48499bb
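
To illustrate why these identifiers make detection easy, here is a minimal Python sketch of prefix-based matching. The regular expressions are assumptions modeled on the example tokens above; they are not vendor specifications and not the rules Secrets Detection uses.

```python
import re

# Illustrative patterns only, modeled on the example tokens in the table.
# They are assumptions, not vendor specifications or the rules used by Xray.
TOKEN_PATTERNS = {
    "GitHub": re.compile(r"\bgh[opsur]_[A-Za-z0-9]{36}\b"),
    "GitLab": re.compile(r"\bglpat-[A-Za-z0-9_-]{20,}"),
    "npm": re.compile(r"\bnpm_[A-Za-z0-9]{16,}\b"),
    "Slack": re.compile(r"\bxox[abprs]-[A-Za-z0-9-]{10,}"),
}


def find_tokens(text):
    """Return (platform, match) pairs for every token-like string in text."""
    hits = []
    for platform, pattern in TOKEN_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((platform, match.group(0)))
    return hits
```

A match here only means a string “looks like” a token; whether it is usable still has to be checked against the service.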

Which open-source repositories did we scan?

We scanned artifacts in the most common open-source software registries: npm, PyPI, RubyGems, crates.io, and DockerHub (both Dockerfiles and small Docker layers). All in all, we scanned more than 8 million artifacts.

In each artifact, we used Secrets Detection to find tokens that can be easily verified. As part of our research, we made a minimal request for each of the found tokens to:

  1. Check if the token is still active (wasn’t revoked or publicly unavailable for any reason).
  2. Understand the token’s permissions.
  3. Understand the token’s owner (whenever possible) so we could disclose the issue privately to them.

For npm and PyPI, we also scanned multiple versions of the same package, to try to find tokens that were once available but removed in a later version.

[Image: JFrog]

‘Active’ vs. ‘inactive’ tokens

As mentioned above, each token that was statically detected was also run through dynamic verification. This means, for example, trying to access an API that doesn’t do anything (a no-op) on the relevant service that the token belongs to, just to see that the token is “available for use.” A token that passed this test (an “active” token) is available for attackers to use without any further constraints.

We will refer to the dynamically verified tokens as “active” tokens and the tokens that failed dynamic verification as “inactive” tokens. Note that there might be many reasons that a token would show up as “inactive.” For example:

  • The token was revoked.
  • The token is valid, but has additional constraints on using it (e.g., it must be used from a specific source IP range).
  • The token itself is not really a token, but rather an expression that “looks like” a token (false positive).
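
The active/inactive split can be sketched in Python as follows. This is a simplified illustration, not our actual verifier: `probe` is a hypothetical callable that sends the token to a harmless endpoint of the relevant service and returns the HTTP status code, and treating any 2xx reply as “active” is an assumption.

```python
from enum import Enum


class TokenStatus(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"


def classify_token(token, probe):
    """Classify a statically detected token using a no-op API call.

    `probe` is a hypothetical callable: it takes the token, calls a harmless
    endpoint on the matching service, and returns the HTTP status code.
    """
    try:
        status_code = probe(token)
    except OSError:
        # Network failure: we could not show the token works, so it is not "active".
        return TokenStatus.INACTIVE
    # Assumption: a 2xx reply means the service accepted the token.
    if 200 <= status_code < 300:
        return TokenStatus.ACTIVE
    # Revoked, restricted (e.g. by source IP), or a false positive.
    return TokenStatus.INACTIVE
```

Note that “inactive” deliberately lumps together all three reasons listed above, since the status code alone usually cannot tell them apart.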

Which repositories had the most leaked tokens?

The first question that we wanted to answer was, “Is there a specific platform where developers are most likely to leak tokens?”

In terms of the sheer volume of leaked secrets, it seems that developers need to be careful about leaking secrets when building their Docker images (see the “Examples” section below for guidance on this).

[Image: JFrog]

We hypothesize that the vast majority of Docker Hub leaks are caused by the closed nature of the platform. While other platforms allow developers to set a link to the source repository and get security feedback from the community, there is a higher price of entry in Docker Hub. Specifically, the researcher must pull the Docker image and explore it manually, possibly dealing with binaries and not just source code.

An additional problem with Docker Hub is that no contact information is publicly shown for each image, so even if a leaked secret is found by a white hat researcher it might not be trivial to report the issue to the image maintainer. As a result, we can observe images that retain exposed secrets or other types of security issues for years.

The following graph shows that tokens found in Docker Hub layers have a much higher chance of being active, compared to all other repositories.

[Image: JFrog]

Finally, we can also look at the distribution of tokens normalized to the number of artifacts that were scanned for each platform.

[Image: JFrog]

When we normalize by the number of scanned artifacts for each platform and focus on the relative number of leaked tokens, we can see that Docker Hub layers still provided the most tokens, but second place is now claimed by PyPI. (When looking at the absolute data, PyPI had the fourth most tokens leaked.)

Which token types were leaked the most?

After scanning all token types that are supported by Secrets Detection and verifying the tokens dynamically, we tallied the results. The top 10 results are displayed in the chart below.

[Image: JFrog]

We can clearly see that Amazon Web Services, Google Cloud Platform, and Telegram API tokens are the most-leaked tokens (in that order). However, it seems that AWS developers are more vigilant about revoking unused tokens, since only ~47% of AWS tokens were found to be active. By contrast, GCP had an active token rate of ~73%.

Examples of leaked secrets in each repository

It is important to examine some real-world examples from each repository in order to raise awareness of the potential places where tokens are leaked. In this section, we’ll focus on these examples, and in the next section we’ll share tips on how these examples should have been handled.

DockerHub – Docker layers

Inspecting the filenames that were present in a Docker layer and contained leaked credentials shows that the most common source of leakage is Node.js applications that use the dotenv package to store credentials in environment variables. The second most common source was hardcoded AWS tokens.

The table below lists the most common filenames in Docker layers that contained a leaked token.


Filename    # of instances with active leaked tokens

.env 214
./aws/credentials 111
config.json 56
gc_api_file.json 50 47
key.json 40 38
credentials.json 35 35

Docker layers can be inspected by pulling the image and running it. However, there are some cases where a secret might have been removed by an intermediate layer (via a “whiteout” file), and in that case, the secret won’t show up when inspecting the final Docker image. It’s possible to inspect each layer individually, using tools such as dive, and find the secret in the “removed” file. See the screenshot below.

[Image: JFrog]

Docker layer with credentials opened in the dive layer inspector.

Inspecting the contents of the “credentials” file reveals the leaked tokens.

[Image: JFrog]

AWS credentials leaked via ./aws/credentials.
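
As a rough illustration of the whiteout mechanism, the Python sketch below lists the paths that a single layer deletes. It assumes the layer is available as a tar archive and follows the OCI layer convention, in which a deleted file is recorded as an empty entry named `.wh.<name>`; the function name is hypothetical. Any path it reports is worth looking up in the earlier layers, where the “removed” file still exists.

```python
import posixpath
import tarfile

WHITEOUT_PREFIX = ".wh."
OPAQUE_PREFIX = ".wh..wh."  # opaque-directory markers hide a whole directory


def find_whiteouts(layer_fileobj):
    """Return the paths that this layer deletes from the layers below it."""
    removed = []
    with tarfile.open(fileobj=layer_fileobj, mode="r:*") as layer:
        for member in layer.getmembers():
            directory, name = posixpath.split(member.name)
            if name.startswith(WHITEOUT_PREFIX) and not name.startswith(OPAQUE_PREFIX):
                # Strip the prefix to recover the name of the deleted file.
                original = name[len(WHITEOUT_PREFIX):]
                removed.append(posixpath.join(directory, original))
    return removed
```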

DockerHub – Dockerfiles

Docker Hub contained greater than 80% of the leaked credentials in our analysis.

Developers usually use secrets in Dockerfiles to initialize environment variables and pass them to the application running in the container. After the image is published, these secrets become publicly leaked.

[Image: JFrog]

AWS credentials leaked via Dockerfile environment variables.

Another common option is the usage of secrets in Dockerfile commands that download the content required to set up the Docker application. The example below shows how a container uses an authentication secret to clone a repository into the container.

[Image: JFrog]

AWS credentials leaked through the Dockerfile via a git clone command.
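
Both Dockerfile patterns can be flagged with a simple pre-publish check. The Python sketch below is a heuristic under stated assumptions: the variable-name keywords, the URL pattern, and the `audit_dockerfile` helper are illustrative choices, not an exhaustive rule set.

```python
import re

# Assumed heuristics: variable names that hint at a secret, and URLs carrying userinfo.
SECRET_NAME = re.compile(r"SECRET|TOKEN|PASSWORD|ACCESS_KEY", re.IGNORECASE)
ASSIGNMENT = re.compile(r"(\w+)(?:=|\s+)(\S.*)")
URL_WITH_CREDS = re.compile(r"https?://[^/\s@]+@")


def audit_dockerfile(text):
    """Return (line_number, reason) for instructions that likely embed a secret."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        parts = line.strip().split(None, 1)
        if len(parts) < 2:
            continue
        instruction, rest = parts[0].upper(), parts[1]
        if instruction in ("ENV", "ARG"):
            match = ASSIGNMENT.match(rest)
            # Only flag literal values; "$VAR" references carry no secret themselves.
            if match and SECRET_NAME.search(match.group(1)) and not match.group(2).startswith("$"):
                findings.append((number, f"{instruction} sets secret-like variable {match.group(1)}"))
        elif URL_WITH_CREDS.search(rest):
            findings.append((number, f"{instruction} uses a URL with embedded credentials"))
    return findings
```

A check like this only catches the obvious cases; a secret passed under an innocuous variable name would go unnoticed.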

crates.io

With crates.io, the Rust package registry, we happily saw a different outcome than with all other repositories. Although Xray detected nearly 700 packages that contain secrets, only one of these secrets showed up as active. Interestingly, this secret wasn’t even used in the code, but was found inside a comment.

[Image: JFrog]


PyPI

In our PyPI scans, most of the token leaks were found in actual Python code.

For example, one of the functions in an affected project contained an Amazon RDS (Relational Database Service) token. Storing a token like this may be fine if the token only allows access for querying the example RDS database. However, when collecting permissions for the token, we discovered that the token gives access to the entire AWS account. (This token has been revoked following our disclosure to the project maintainers.)

[Image: JFrog]

AWS token leakage in the source code of a PyPI package.

[Image: JFrog]

Unintended full admin permissions (*/*) on an “example” Amazon RDS token.


npm

Apart from hardcoded tokens in Node.js code, npm packages can have custom scripts defined in the scripts block of the package.json file. This allows running scripts defined by the package maintainer in response to certain triggers, such as the package being built, installed, etc.

A recurring mistake we saw was storing tokens in the scripts block during development, but then forgetting to remove the tokens when the package is released. In the example below we see leaked npm and GitHub tokens that are used by the build utility semantic-release.

[Image: JFrog]

npm token leakage in npm “scripts” block (package.json).
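
A check for this mistake can be sketched in a few lines of Python. The example assumes secrets appear as inline `NAME=value` assignments whose name contains TOKEN, SECRET, or PASSWORD; the pattern and the `scan_scripts` helper are illustrative assumptions, not part of npm or of any scanner.

```python
import json
import re

# Assumed pattern: an inline NAME=value assignment whose name hints at a secret.
# Values that start with "$" are references to the environment, not literals.
INLINE_SECRET = re.compile(
    r"\b(\w*(?:TOKEN|SECRET|PASSWORD)\w*)=(?!\$)(\S+)", re.IGNORECASE
)


def scan_scripts(package_json_text):
    """Return {script_name: [variable names]} for scripts with inline secrets."""
    manifest = json.loads(package_json_text)
    findings = {}
    for script_name, command in manifest.get("scripts", {}).items():
        names = [match.group(1) for match in INLINE_SECRET.finditer(command)]
        if names:
            findings[script_name] = names
    return findings
```

Running such a check before publishing would flag a release script that still carries hardcoded values, while leaving scripts that read the token from the environment alone.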

Usually, the dotenv package is supposed to solve this problem. It allows developers to create a local file called .env in the project’s root directory and use it to populate the environment variables in a test environment. Using this package in the correct way solves the secret leak, but unfortunately, we found improper usage of the dotenv package to be one of the most common causes of secrets leakage in npm packages. Although the package documentation explicitly says not to commit .env files to version control, we found many packages where the .env file was published to npm and contained secrets.

The dotenv documentation explicitly warns against publishing .env files:

No. We strongly recommend against committing your .env file to version control. It should only include environment-specific values such as database passwords or API keys. Your production database should have a different password than your development database.


RubyGems

Going over the results for RubyGems packages, we saw no special outliers. The detected secrets were found either in Ruby code or in arbitrary configuration files inside the gem.

For example, here we can see an AWS configuration YAML that leaked sensitive tokens. The file is supposed to be a placeholder for AWS configuration, but the development section was altered with a live access/secret key.

[Image: JFrog]

AWS token leakage in spec/dummy/config/aws.yml.

The most common mistakes when storing tokens

After analyzing all the active credentials we found, we can point to a number of common mistakes that developers should look out for, and we can share a few tips on how to store tokens in a safer way.

Mistake #1. Not using automation to check for secret exposures

There were a lot of cases where we found active secrets in unexpected places: code comments, documentation files, examples, or test cases. These places are very hard to check manually in a consistent way. We suggest embedding a secrets scanner in your DevOps pipeline and alerting on leaks before publishing a new build.

There are many free, open-source tools that provide this kind of functionality. One of our OSS recommendations is TruffleHog, which supports a plethora of secrets and validates findings dynamically, reducing false positives.

For more sophisticated pipelines and broad integration support, we provide JFrog Xray.

[Image: JFrog]

A GitHub token leaked in documentation, intended as read-only, but which in reality provided full edit permissions.
