Updated on November 28, 2023 in #flask, #ruby-on-rails

Help Find and Remove Hard Coded Passwords and Secrets in a Project


It's never fun to accidentally leak secrets in files committed to git. Here are a few ways to help address them in your code base.


Prefer watching videos? Here it is on YouTube.

We’re going to go over a couple of ways to help detect secrets without using anything but your brain and grep – no third-party services will be needed. We’ll also go over a few quick ways to remove them.

# Checking Known Files Manually

Here are a few spots where I’ve seen secrets show up while doing contract work:

- README.md or, more generally, documentation
  - It’s easy to copy / paste something here, such as a curl command with a token
- .env.example
  - This is a prime spot for them to exist out of convenience
- Dockerfile
  - ARG, ENV and RUN instructions may have secrets, but they could be elsewhere too
- docker-compose.yml
  - The build, environment or command properties may have secrets
- Docker entrypoint script
  - The file path depends on your project, but secrets could exist here
- Assorted files in your app’s config/, settings/, etc. directories
  - There could be quite a few here

For all of these, I suggest manually going through every line in each file. Hopefully this catches most of them if your app only (or mostly) reads config values defined in 1 spot. By the way, I’ve written about the importance of defining config options in 1 spot.

If you’re looking for ways to address the above:

- README.md or, more generally, documentation
  - Reference env variables that were defined elsewhere (examples are below)
- .env.example
  - Store sensitive values in a secret store or anywhere outside of version control, or use your web framework’s way of handling secrets if it has one, such as encrypted credentials in Rails
- Dockerfile
  - Use Docker’s build secrets if you need build-time secrets
  - Whenever possible, provide secrets at run-time instead
- docker-compose.yml
  - Use the env_file property to reference an .env file
  - Use Docker Compose’s variable interpolation
- Docker entrypoint script
  - Reference environment variables that were defined elsewhere
- Assorted files in your app’s config/, settings/, etc. directories
  - Extract secrets into environment variables or whatever secret store mechanism you want to use
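For the entrypoint script case, here’s a minimal sketch (assuming bash) of reading secrets from environment variables that were defined elsewhere and failing fast when one is missing. require_env, DATABASE_URL and SECRET_KEY are hypothetical names used for illustration, not part of any particular project or framework.

```shell
#!/usr/bin/env bash
# Hypothetical docker-entrypoint.sh: secrets come from the environment
# (env_file, your orchestrator, a secret store, etc.) instead of being hard coded.
# DATABASE_URL and SECRET_KEY are placeholder names for this sketch.
set -euo pipefail

# Print an error for every named variable that is unset or empty, then
# return non-zero if any of them were missing.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "Missing required environment variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Docker passes the image's CMD to the entrypoint as arguments, so only
# check and hand off when there is a command to run.
if [ "$#" -gt 0 ]; then
  require_env DATABASE_URL SECRET_KEY
  exec "$@"
fi
```

Since the script only references variable names, it’s safe to commit. The actual values live wherever you decided to keep them.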

One thing to think about is whether or not the secret is safe to commit. For example, in some of my .env.example files I do commit a POSTGRES_PASSWORD=password value because it’s only used in development. I’m ok with “leaking” that because it makes the project a lot easier to get going, and the production secret is different and never committed.

# Searching for the Usual Suspects

Now that the low-hanging fruit is handled, we can start searching the whole code base for well-known strings that often appear alongside sensitive information:

# Perform a recursive, case insensitive, extended regex search for known values,
# but ignore specific directories. You can use `git grep -Ei` with the same
# pattern instead if you prefer; it only searches files tracked by git.
grep -REi --exclude-dir=".git" \
  "(auth|authentication|authorization|bearer|secret|token|pass|password|username)" .

You can also include the -o flag to only show the matches instead of the whole line with the match highlighted, but I find seeing the whole line helps show the context around the match.

The above isn’t going to catch everything but I’ve lost count of how many things I’ve picked up with this regular expression over the years. Usually it’s bearer tokens or API keys in general. username is included because on more than 1 occasion I’ve seen cases where the username was the actual password.

I left auth|authentication|authorization and pass|password as separate items even though they could have been simplified down to auth and pass. The advantage of keeping them separate is that you can add the -w flag to the grep command above to limit the search to whole words only.

For example, as is, the command above will match secretToken, but with -w it won’t, because -w only matches secret or token as standalone words. Another example where not using -w is handy is something like app_password, since grep treats the underscore as part of the word. Using a whole word match could be good for a first pass if you have a lot of false positives, but I like searching without it too.
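To see the -w difference for yourself, you can pipe a few lines through both variants of the same pattern. The sample lines here are made up for illustration.

```shell
#!/usr/bin/env bash
# Compare the "usual suspects" search with and without -w on made-up lines.
set -euo pipefail

pattern="(auth|authentication|authorization|bearer|secret|token|pass|password|username)"

sample='secretToken = "abc"
app_password = "abc"
token = "abc"'

echo "Without -w:"
printf '%s\n' "${sample}" | grep -Ei "${pattern}"

echo "With -w:"
printf '%s\n' "${sample}" | grep -Eiw "${pattern}"
```

Without -w all 3 lines are printed. With -w only token = "abc" is, because secretToken and app_password don’t contain any of the keywords as a standalone word.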

The above should clean up a decent chunk, but it isn’t bulletproof. For example, if you have postgresql://admin:myuniquepw@example.com/mydb floating around, the above regex won’t catch it. Hopefully in this case it would be defined in your config/ directory, which you already checked, but I wanted to throw this out as an example of what to think about.
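If you want to go after that case specifically, a sketch like this looks for anything shaped like scheme://user:password@host. The regex and the find_url_credentials name are my own assumptions rather than a standard check, and it won’t catch every format (a URL with an empty username, such as redis://:pw@host, slips through).

```shell
#!/usr/bin/env bash
# Look for scheme://user:password@host style URLs, which the keyword search
# misses. find_url_credentials is a hypothetical helper name and the regex
# is a rough heuristic, not a full URL parser.
find_url_credentials() {
  # A scheme, then ://, a user (no /, whitespace, : or @), a colon,
  # a password (no /, whitespace or @) and finally an @.
  grep -REn --exclude-dir=".git" \
    "[[:alpha:]][[:alnum:]+.-]*://[^/[:space:]:@]+:[^/[:space:]@]+@" "${1:-.}"
}

# Usage: find_url_credentials .
```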

# Looking for Long Strings without Spaces

Hopefully the above 2 methods caught just about everything, but you never know. Maybe you’re working on a 12-year-old project that has had 20 developers work on it at different stages of development.

Maybe you didn’t have a refined code review process back then and there are cases of secrets being leaked elsewhere, such as in comments right above an API call or inside your tests because you didn’t think to read a token from a config value instead.

All of this is going to be app dependent of course, but you can scan your whole code base for strings that might have sensitive information.

## What Should We Look For?

For example, maybe your criterion is any run of 12+ characters between single or double quotes. This is going to cause a lot of false positives, but it’s better than nothing.

For example, if you scanned an HTML template containing "images/hello.jpg", it would match because it happens to be 16 characters, which is more than 12. But if you try to get cute and ignore values that contain /, you might skip legit tokens, since some token formats (base64 encoded values, for example) include /.

Depending on the size of your code base, I would suggest going for a more specific search to begin with, otherwise you’ll have way too many false positives. You can always loosen things up later.

Maybe to begin with you can go with:

- Any 24+ non-space characters wrapped in single or double quotes
- The next character is either a space, ., ; or the end of the line

That 2nd rule will reduce a ton of noise. Variants like these will get picked up (I’m using abcdef123456 here in place of a longer token):

tok = "abcdef123456";
tok = "abcdef123456"\n
tok = "abcdef123456" . "?filter=cool"
tok = "abcdef123456"."?filter=cool"

But at the same time <img src="images/hello.jpg"> won’t get picked up because > isn’t one of the next characters we’re looking for.

Also, depending on which framework or tools your app is built with, there might be a number of directories to skip if it helps reduce false positives. For example you can probably safely skip your css/ directory, although client side JavaScript is worth scanning because you might have tokens hard coded (I’ve seen it plenty of times in client work).

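You can set --exclude-dir and --exclude multiple times to ignore multiple directories and files. Chances are you have a bunch of directories and files related to caching and build output which can be ignored. They can produce a lot of annoying false positives since they tend to be minified files with very long strings. With GNU grep, both flags are matched against a file or directory’s name rather than its path during a recursive search, so use a name like css instead of a path like assets/css.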
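Here’s a sketch of the tighter long-string search with a few extra exclusions, wrapped in a function so you can point it at a directory. The directory names and globs (node_modules, public, *.min.js, *.map) are assumptions about a typical project with front-end build output, and scan_long_strings is a hypothetical name. It also adds $ to the last group so a quoted value at the very end of a line gets flagged, since grep doesn’t include the trailing newline in what it matches.

```shell
#!/usr/bin/env bash
# The tighter long-string search with extra exclusions. The directory names
# and file globs are assumptions; swap in whatever your project generates.
# scan_long_strings is a hypothetical helper name.
scan_long_strings() {
  grep -REi \
    --exclude-dir=".git" \
    --exclude-dir="css" \
    --exclude-dir="node_modules" \
    --exclude-dir="public" \
    --exclude="*.min.js" \
    --exclude="*.map" \
    "('|\")\S{24,}('|\")(\s|\.|;|\$)" "${1:-.}"
}

# Usage: scan_long_strings .
```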

## What Would Our grep Command Look Like?

These commands should work on both the GNU and BSD versions of grep, which means they’re compatible with Linux, WSL 2 and macOS.

Here’s the tighter version (fewer false positives):

grep -REi \
  --exclude-dir=".git" \
  --exclude-dir="css" \
  "('|\")\S{24,}('|\")(\s|\.|;|\$)" .

Here’s the looser version (more false positives but may catch more leaks):

grep -REi \
  --exclude-dir=".git" \
  --exclude-dir="css" \
  "('|\")\S{12,}('|\")" .

Here’s a breakdown of the regex:

- ('|\") matches a single or double quote
- \S{24,} matches any 24+ non-whitespace characters (12+ in the looser version)
- ('|\") matches a single or double quote
- (\s|\.|;|\$) matches a whitespace character, a period, a semicolon or the end of the line. The $ is needed because grep strips the newline before matching, so \s alone never matches at the end of a line; it’s escaped as \$ so the shell leaves it alone
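To check how the two patterns behave, you can run them against the sample lines from earlier, with a made-up 26 character placeholder standing in for a real token. This sketch uses $ as an extra alternative in the tighter pattern’s last group; grep strips the trailing newline before matching, so without it the second sample (a token at the very end of a line) isn’t flagged.

```shell
#!/usr/bin/env bash
# Count how many of the sample lines each pattern flags. The 26 character
# token is a made-up placeholder. $ is escaped so the shell leaves it alone.
tight="('|\")\S{24,}('|\")(\s|\.|;|\$)"
loose="('|\")\S{12,}('|\")"

samples='tok = "abcdefghijklmnopqrstuvwxyz";
tok = "abcdefghijklmnopqrstuvwxyz"
tok = "abcdefghijklmnopqrstuvwxyz" . "?filter=cool"
tok = "abcdefghijklmnopqrstuvwxyz"."?filter=cool"
<img src="images/hello.jpg">'

echo "Tighter: $(printf '%s\n' "${samples}" | grep -Ec "${tight}") of 5 lines"
echo "Looser: $(printf '%s\n' "${samples}" | grep -Ec "${loose}") of 5 lines"
```

The tighter pattern flags 4 of the 5 lines and the looser one flags all 5, since only the looser pattern picks up images/hello.jpg.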

Feel free to adjust either one; https://regex101.com/ is a great site for testing your regex.

Keep in mind the above isn’t perfect. It can pair up mismatched quotes (a single quote followed later by a double quote, for example), which can cause false positives, but you also need to know which battles to fight. In the cases I’ve been involved with, the above worked quite well.

If you have to perform a more thorough search for compliance reasons, or you just have a feeling that something was leaked, nothing will beat a manual scan, which is coming up next.

# When in Doubt, Check Everything Manually

If the stakes are high and you think you might have leaked secrets then your last line of defense is manually checking every line of every file.

If your app isn’t huge this might not be too bad. Maybe you can crank through it in an hour or a few hours. If you think it might take a few hours, I’d suggest breaking it up into stages, such as working 30 or 45 minutes at a time and then taking a break.

This gives your brain a chance to reset, because this type of work is tedious and it’s too easy to fall back on quickly scanning things once your brain has convinced you there are no leaked secrets because the last 5,000 lines you looked at didn’t have one.

Don’t be this guy haha, otherwise all of your efforts will go to waste.

The video below covers running some of these commands against a few projects.

# Demo Video

References

Example Docker starter apps

Timestamps

0:39 – Checking known files manually
8:05 – Removing secrets from certain known files
12:04 – Searching for the usual suspects
17:09 – Searching everything for potential secrets
21:39 – Understanding and running our catch all grep command
26:57 – If all else fails, manually check everything

What are your best tips to help detect leaked secrets? Let us know below!

