Help Find and Remove Hard Coded Passwords and Secrets in a Project
It's never fun to accidentally leak secrets in git committed files. Here's a few ways to help address them in your code base.
Prefer watching videos? Here it is on YouTube.
We’re going to go over a couple of ways to help detect secrets without using anything but your brain and `grep`. No third party services will be needed. We’ll also go over a few quick ways to remove them.
# Checking Known Files Manually
Here’s a couple of spots where I’ve seen secrets exist while doing contract work:
- `README.md` or more generally documentation
  - It’s easy to copy / paste something here such as a curl command with a token
- `.env.example`
  - This is a prime spot for them to exist out of convenience
- `Dockerfile`
  - `ARG`, `ENV` and `RUN` instructions may have secrets but they could be elsewhere
- `docker-compose.yml`
  - The `build`, `environment` or `command` properties may have secrets
- Docker entrypoint script
  - The file path is dependent on your project but they could exist here
- Assorted files in your app’s `config/`, `settings/`, etc. directories
  - There could be quite a few here
For all of these, I suggest manually going through all lines in each file. Hopefully this detects most of them if your app only or mostly reads config values defined in 1 spot. By the way, I’ve written about the importance of defining config options in 1 spot.
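As a starting point for that manual pass, here’s a rough sketch that lists the usual candidate files so you know what to open. The function name `list_known_files` and the exact file name patterns are assumptions for illustration, so adjust them to match your project’s layout.

```shell
# Hypothetical helper: list files that commonly end up holding secrets.
# The name patterns are illustrative guesses, not an exhaustive list.
list_known_files() {
  # Skip the .git directory, then print regular files matching any pattern.
  find "${1:-.}" -path '*/.git' -prune -o -type f \( \
    -name 'README*' -o \
    -name '.env*' -o \
    -name 'Dockerfile*' -o \
    -name 'docker-compose*.yml' -o \
    -name '*entrypoint*' -o \
    -path '*/config/*' -o \
    -path '*/settings/*' \
  \) -print
}
```

Running `list_known_files .` from the root of your project prints one path per line. It only tells you where to look, you still need to read each file.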
If you’re looking for ways to address the above:
- `README.md` or more generally documentation
  - Reference env variables that were defined elsewhere (examples are below)
- `.env.example`
  - Store sensitive values in a secret store or anywhere outside of version control, or leverage your web framework’s way of handling secrets if it exists such as encrypted credentials with Rails
- `Dockerfile`
  - Take advantage of using Docker secrets if you need build-time secrets
  - Whenever possible try to make them defined at run-time
- `docker-compose.yml`
  - Use the `env_file` property to reference an `.env` file
  - Use Docker Compose’s variable interpolation
- Docker entrypoint script
  - Reference environment variables that were defined elsewhere
- Assorted files in your app’s `config/`, `settings/`, etc. directories
  - Extract secrets into environment variables or whatever secret store mechanism you want to use
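For the entrypoint script case, one way to reference environment variables defined elsewhere is to check that they exist at run-time instead of hard coding a value. This is only a sketch: the `require_env` helper and the `DATABASE_URL` / `API_TOKEN` names are made up for illustration, and it assumes `printenv` is available in your image (it usually is).

```shell
#!/bin/sh
# Hypothetical entrypoint helper; the variable names below are illustrative.

# Fail with a message that names the missing variable but never prints a value.
require_env() {
  for name in "$@"; do
    # printenv prints nothing for variables missing from the environment.
    if [ -z "$(printenv "$name")" ]; then
      echo "error: environment variable $name is not set" >&2
      return 1
    fi
  done
}

# Example usage inside an entrypoint (commented out so the sketch has no side effects):
# require_env DATABASE_URL API_TOKEN || exit 1
# exec "$@"
```

The script itself contains no secrets, so it’s safe to commit. The values come from wherever you set them at run-time, such as an `.env` file that stays out of version control.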
One thing to think about is if the secret is safe to commit or not. For example in some of my `.env.example` files I do commit a `POSTGRES_PASSWORD=password` value because this is only used in development. I’m ok with “leaking” that because it makes the project a lot easier to get going and the production secret is different and not committed.
# Searching for the Usual Suspects
Now that the low hanging fruit is handled we can start searching the whole code base for well known strings that likely have sensitive information:
    # Perform a recursive regex case insensitive search for known values, but ignore
    # specific directories. You can use `git grep` instead of grep if you prefer.
    grep -REi --exclude-dir=".git" \
      "(auth|authentication|authorization|bearer|secret|token|pass|password|username)" .
You can also include the `-o` flag to only show the matches instead of the whole line with the highlighted match but I find seeing the whole line helps see the context around the match.
The above isn’t going to catch everything but I’ve lost count of how many things I’ve picked up with this regular expression over the years. Usually it’s bearer tokens or API keys in general. `username` is included because on more than 1 occasion I’ve seen cases where the username was the actual password.
I left `auth|authentication|authorization` and `pass|password` as separate items even though they could have been simplified down to `auth` and `pass` because the advantage of keeping them separate is you can add the `-w` flag to the above grep command to limit the search to whole words only.
For example, as is, the above command will match `secretToken` but adding `-w` would not because it’ll only match `secret` or `token` as standalone words. Another example where not using `-w` is handy is something like `app_password`, since `-w` treats the underscore as part of the word. Using a whole word match could be good for a first pass if you have a lot of false positives but I like searching without it too.
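Here’s a quick way to see the difference `-w` makes. The three sample lines and the temp file are made up for illustration, and `-c` is only used to count matching lines instead of printing them.

```shell
# Write a few made-up lines to a temp file so the comparison is repeatable.
sample=$(mktemp)
printf '%s\n' \
  'secretToken = "abc"' \
  'app_password = "abc"' \
  'token = "abc"' > "$sample"

pattern="(auth|authentication|authorization|bearer|secret|token|pass|password|username)"

# Without -w every line matches because the keywords appear inside longer words.
without_w=$(grep -Eic "$pattern" "$sample")

# With -w only the standalone word "token" on the last line matches.
with_w=$(grep -Eicw "$pattern" "$sample")

echo "without -w: $without_w"  # without -w: 3
echo "with -w: $with_w"        # with -w: 1

rm -f "$sample"
```

Both `secretToken` and `app_password` drop out with `-w`, which is why it’s worth running the search both ways.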
The above should clean up a decent chunk, but this isn’t bullet proof. For example if you have `postgresql://admin:myuniquepw@example.com/mydb` floating around the above regex won’t catch it. Hopefully in this case it would be defined in your `config/` directory which you caught but I wanted to throw this out as an example of what to think about.
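If connection strings like that are a concern, one option is a separate search for URLs that embed a `user:password@` section. To be clear, this pattern is an assumption on my part and not part of the keyword search above. The `url_creds_pattern` and `scan_url_credentials` names are hypothetical too.

```shell
# Assumed pattern (not from the keyword search above): scheme://user:password@
# The character classes stop at "/" so a host:port pair without "@" is skipped.
url_creds_pattern='[a-z][a-z0-9+.-]*://[^/:@[:space:]]+:[^/@[:space:]]+@'

# Recursively list matching lines with file names and line numbers.
scan_url_credentials() {
  grep -REin --exclude-dir=".git" "$url_creds_pattern" "${1:-.}"
}
```

Running `scan_url_credentials .` would flag the `postgresql://admin:myuniquepw@example.com/mydb` example but leave something like `http://localhost:3000/users` alone. Expect it to miss unusual formats, so treat it as one more pass rather than a guarantee.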
# Looking for Long Strings without Spaces
Hopefully the above 2 methods caught just about everything but you never know. Maybe you’re working on a 12 year old project that has had 20 developers work on it at different stages of development.
Maybe you didn’t have a refined code review process back then and there’s cases of secrets being leaked elsewhere such as comments right above an API call or inside of your tests because you didn’t think to read a token from a config value instead.
All of this is going to be app dependent of course, but you can scan your whole code base for strings that might have sensitive information.
## What Should We Look For?
For example, maybe your criteria is any 12+ characters that appear in between single or double quotes. This is going to cause a lot of false positives but it’s better than nothing.
For example if you scanned HTML template files for `"images/hello.jpg"` it would match because it happens to be 16 characters which is > 12 but if you try to get cute and ignore values that have `/` you might skip legit tokens because certain algorithms include `/`.
Depending on the size of your code base, I would suggest going for a more specific search to begin with, otherwise you’ll have way too many false positives. You can always loosen things up later.
Maybe to begin with you can go with:
- Any 24+ non-space characters wrapped in single or double quotes
- The next character is either a space, `.`, `;` or the end of the line
That 2nd rule will help reduce a ton of noise. Variants like this will get picked up (I am using `abcdef123456` here in place of a longer token):

    tok = "abcdef123456";
    tok = "abcdef123456"\n
    tok = "abcdef123456" . "?filter=cool"
    tok = "abcdef123456"."?filter=cool"
But at the same time `<img src="images/hello.jpg">` won’t get picked up because `>` isn’t one of the next characters we’re looking for.
Also, depending on which framework or tools your app is built with, there might be a number of directories to skip if it helps reduce false positives. For example you can probably safely skip your `css/` directory, although client side JavaScript is worth scanning because you might have tokens hard coded (I’ve seen it plenty of times in client work).
You can set `--exclude-dir` and `--exclude` multiple times to ignore multiple directories and files. Chances are you have a bunch of directories and files that are related to caching and build output which can be ignored. They can produce a lot of annoying false positives since they tend to be minified files with very long strings.
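As a rough sketch of stacking those flags, here’s a search that skips a few directories and file types. The `node_modules` and `dist` directories, the `*.min.js` and `*.map` globs, the shortened keyword list and the `scan_with_excludes` name are all assumptions for illustration, so swap in whatever your project actually generates.

```shell
# Hypothetical example: the excluded names depend entirely on your project.
scan_with_excludes() {
  # -l prints only the names of files that contain a match.
  grep -REil \
    --exclude-dir=".git" \
    --exclude-dir="node_modules" \
    --exclude-dir="dist" \
    --exclude="*.min.js" \
    --exclude="*.map" \
    "(secret|token|password)" "${1:-.}"
}
```

Call it with `scan_with_excludes .` from the root of your project. Dropping `-l` shows the matching lines instead of just the file names.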
## What Would Our grep Command Look Like?
These commands should work on both the GNU and BSD versions of grep which means they are compatible with Linux, WSL 2 and macOS.
Here’s the tighter version (fewer false positives):

    grep -REi \
      --exclude-dir=".git" \
      --exclude-dir="assets/css" \
      "('|\")\S{24,}('|\")(\s|\.|;|\$)" .

Here’s the looser version (more false positives but may catch more leaks):

    grep -REi \
      --exclude-dir=".git" \
      --exclude-dir="assets/css" \
      "('|\")\S{12,}('|\")" .

Here’s a breakdown of the regex:

- `('|\")` matches a single or double quote
- `\S{24,}` matches any 24+ non-whitespace characters
- `('|\")` matches a single or double quote
- `(\s|\.|;|\$)` matches any space-like character, period, semi-colon or the end of the line (grep strips the trailing newline before matching, so `\s` on its own won’t catch a token that ends the line)
Feel free to adjust either one, https://regex101.com/ is a great site to use to test your regex.
Keep in mind the above isn’t perfect. It will match mismatched quotes (an opening single quote closed by a double quote, for example) which could produce false positives but you also need to know which battles to fight. In the cases I’ve been involved with the above worked quite well.
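To see how the tighter rules behave, here’s a small sketch that runs them against the variants listed earlier. The 26 letter token, the long image file name and the temp file are made up for illustration. The pattern used here includes `$` so a quoted token at the very end of a line counts too, since grep strips the newline before matching.

```shell
# Tighter pattern: quote, 24+ non-space characters, quote, then a space-like
# character, period, semi-colon or the end of the line.
tight_pattern="('|\")\S{24,}('|\")(\s|\.|;|\$)"

# Made-up sample lines: four token variants plus an HTML image tag.
sample=$(mktemp)
printf '%s\n' \
  'tok = "abcdefghijklmnopqrstuvwxyz";' \
  'tok = "abcdefghijklmnopqrstuvwxyz"' \
  'tok = "abcdefghijklmnopqrstuvwxyz" . "?filter=cool"' \
  'tok = "abcdefghijklmnopqrstuvwxyz"."?filter=cool"' \
  '<img src="images/some-really-long-file-name.jpg">' > "$sample"

# Count the matching lines; the image tag is skipped because ">" follows the quote.
matches=$(grep -Ec "$tight_pattern" "$sample")
echo "matching lines: $matches"  # matching lines: 4

rm -f "$sample"
```

Swapping `-c` for `-n` prints the matching lines with their line numbers, which is handy when you’re tuning the pattern against your own files.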
If you have to perform a more thorough search for compliance reasons or you just have a feeling that something was leaked nothing will beat a manual scan which is coming up next.
# When in Doubt, Check Everything Manually
If the stakes are high and you think you might have leaked secrets then your last line of defense is manually checking every line of every file.
If your app isn’t huge this might not be too bad. Maybe you can crank through this in an hour or a few hours. If you think it might take a few hours I’d suggest breaking it up in stages, such as doing it 30 or 45 minutes at a time and then taking a break.
This gives your brain a chance to reset because this type of work is tedious and it’s too easy to default to quickly scanning things because your brain convinced you that there’s no leaked secrets because the last 5,000 lines you looked at didn’t have one.
Don’t be the person who skims haha, otherwise all of your efforts are going to waste.
The video below covers running some of these commands against a few projects.
# Demo Video
## Timestamps
- 0:39 – Checking known files manually
- 8:05 – Removing secrets from certain known files
- 12:04 – Searching for the usual suspects
- 17:09 – Searching everything for potential secrets
- 21:39 – Understanding and running our catch all grep command
- 26:57 – If all else fails, manually check everything
What are your best tips to help detect leaked secrets? Let us know below!