Recursively Print Files with Their Path Using Find, Cat or Tail
Here are a few ways to output multiple files so you can skim which file has which content.
There are a number of ways to do this depending on what you’re optimizing for. Is it speed in a script? Perhaps a one-off command you plan to run from your terminal? Are you targeting or using Bash 3, Bash 4 or zsh?
# Why?
My use case was for contract work. A company was looking to switch from Apache to nginx in a massive 10+ year old PHP project. The first thing I wanted to do was get the lay of the land and see how many Apache-related files were sprinkled across the project.
I’m more familiar with nginx, but I do know that with Apache it’s common to litter .htaccess files throughout your codebase in specific directories.
I wanted a quick way to see how many of these files existed and what type of configuration they contained. Basically, I wanted a rough idea of how many rules would need to be ported over so I could assess the situation.
This could be the difference between saying it’s a 3 hour job or a 20 hour job, or even suggesting that it’s not worth it due to the risk. Most of their bottlenecks are at the PHP level, not in serving static files, but they still have more knowledge of nginx, they’re too short staffed to do the work themselves, and they asked me what I thought.
That left me wanting to run a one-off command to answer my questions, and here we are.
# The Commands
I’m going to list the solutions in the order I tried them, each one building on the last based on the mini-challenges I ran into along the way.
I’ve set up a couple of nested directories and .htaccess files for the sake of this post. Do not use these .htaccess rules as a best practice. I just grabbed a couple of assorted rules so we have something to look at here.
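If you want to follow along, here’s a minimal sketch that recreates a similar demo tree. The paths and rules are copied from the command output in this post; the `make_htaccess_demo` function name is hypothetical, and the original files may have differed slightly.

```bash
# Hypothetical helper that recreates a demo tree similar to the one in
# this post. Paths and rules are copied from the command output below.
make_htaccess_demo() {
  local root="${1-}"
  if [ -z "$root" ]; then
    echo "usage: make_htaccess_demo DIR" >&2
    return 2
  fi
  mkdir -p "$root/a/b/1" "$root/a/b/c/d" || return 1

  # The quoted 'EOF' delimiter stops the shell from expanding $1 and friends.
  cat > "$root/.htaccess" <<'EOF'
Options +FollowSymLinks
RewriteEngine On
EOF

  cat > "$root/a/.htaccess" <<'EOF'
Redirect to www
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
EOF

  cat > "$root/a/b/1/.htaccess" <<'EOF'
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [R=301,L]
EOF

  cat > "$root/a/b/c/d/.htaccess" <<'EOF'
Redirect 301 / https://example.com/
EOF
}
```

Running `make_htaccess_demo htaccess-demo && cd htaccess-demo` gives you a scratch directory to try each of the following commands against.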
Find all .htaccess files:
```
$ find . -type f -name .htaccess
# . the path we want to start finding files in (the current directory)
# -type f only include files
# -name .htaccess the file we're interested in
./a/b/1/.htaccess
./a/b/c/d/.htaccess
./a/.htaccess
./.htaccess
```
This will recursively find all files named .htaccess and return their paths. That gets us halfway there.
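Since one of my first questions was simply how many of these files exist, piping find into `wc -l` answers it on its own. A minimal sketch, using a hypothetical helper name:

```bash
# Hypothetical helper: count the .htaccess files under DIR (default: the
# current directory). wc -l counts one line per path that find prints;
# tr strips the padding some wc implementations (e.g. BSD/macOS) add.
count_htaccess() {
  find "${1:-.}" -type f -name .htaccess | wc -l | tr -d '[:space:]'
}
```

Running `count_htaccess .` in the demo tree should print 4.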
Find all .htaccess files and cat their contents:
```
$ find . -type f -name .htaccess -exec cat {} \;
# -exec cat {} \; run the cat command on each file that was found
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [R=301,L]
Redirect 301 / https://example.com/
Redirect to www
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
Options +FollowSymLinks
RewriteEngine On
```
This gets us our answer, but there’s no way to visually tell which file has which contents. It was good for a quick 10 second scan to see how many rules I was dealing with, but there’s not enough context.
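That quick scan for a rough rule count can also be automated. This sketch (the helper name is hypothetical) counts lines that are neither blank nor comments, which only approximates the number of rules: a RewriteCond and RewriteRule pair counts as 2 lines.

```bash
# Hypothetical helper: a ballpark "how many rules am I dealing with" number.
# Concatenates every .htaccess under DIR and counts lines that are neither
# blank nor comments (lines whose first non-space character is #).
count_htaccess_rules() {
  find "${1:-.}" -type f -name .htaccess -exec cat {} + |
    grep -Ecv '^[[:space:]]*(#|$)'
}
```

If a file is missing its trailing newline, cat joins its last line onto the next file’s first line, so treat the result as an estimate rather than an exact count.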
Find all .htaccess files and cat their contents with the file path:
```
$ find . -type f -name .htaccess -print -exec cat {} \;
# -print print the file path (before -exec runs cat on it)
./a/b/1/.htaccess
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [R=301,L]
./a/b/c/d/.htaccess
Redirect 301 / https://example.com/
./a/.htaccess
Redirect to www
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
./.htaccess
Options +FollowSymLinks
RewriteEngine On
```
This is pretty close. Now each file’s path is printed before its contents, but in my case I was dealing with dozens of files and hundreds of lines of output, which wasn’t very human friendly to read.
Find all .htaccess files and cat their contents with a custom label:
```
$ find . -type f -name .htaccess -printf '\n%p\n' -exec cat {} \;
# -printf '\n%p\n' %p is the path (%f would be just the file name)
#                  -printf is a GNU find extension; BSD/macOS find lacks it

./a/b/1/.htaccess
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [R=301,L]

./a/b/c/d/.htaccess
Redirect 301 / https://example.com/

./a/.htaccess
Redirect to www
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]

./.htaccess
Options +FollowSymLinks
RewriteEngine On
```
Now we’re getting there. This looks pretty good. You could even add more newlines if you want extra separation between files, since with a lot of output a single blank line isn’t quite enough.
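Here’s a sketch of that idea with a roomier label. It wraps the -printf command in a hypothetical helper; the `##########` decoration is arbitrary, and like the command it’s based on, it requires GNU find.

```bash
# Hypothetical helper: the -printf approach with two blank lines before
# and one after each path. Requires GNU find (-printf is a GNU extension).
print_htaccess_labeled() {
  find "${1:-.}" -type f -name .htaccess \
    -printf '\n\n########## %p ##########\n\n' \
    -exec cat {} \;
}
```

Because -printf comes before -exec, each label is printed before cat runs on that file.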
Find all .htaccess files and tail their contents:
```
$ find . -type f -name .htaccess -exec tail -n+1 {} +
# tail -n+1 start at line 1, so the whole file is printed
# {} +      pass many paths to each tail call (instead of one per call),
#           and tail labels each file when it gets more than one
==> ./a/b/1/.htaccess <==
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [R=301,L]

==> ./a/b/c/d/.htaccess <==
Redirect 301 / https://example.com/

==> ./a/.htaccess <==
Redirect to www
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]

==> ./.htaccess <==
Options +FollowSymLinks
RewriteEngine On
```
Unlike cat, the tail command adds a label above each file when it’s given multiple files. If we’re OK with its ==> $path <== format, this could work. The ASCII arrows are a nice visual aid to break up the text, especially if your files contain blank lines.
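One gotcha: tail only prints those headers when it receives more than one file, so if find turns up a single .htaccess you get its contents with no label. GNU tail’s `-v` flag forces the header. This sketch assumes GNU coreutils, and the helper name is hypothetical.

```bash
# Hypothetical helper: the find + tail command, plus GNU tail's -v flag
# so the ==> path <== header appears even when only one file is found.
# -v is a GNU coreutils option; check your tail's man page elsewhere.
tail_htaccess() {
  find "${1:-.}" -type f -name .htaccess -exec tail -v -n+1 {} +
}
```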
Truthfully, I got my answer from the previous cat command, but I spent a few extra minutes Googling around afterwards to see if there was a better way to do this, for curiosity’s sake.
Globbing with cat:
```
$ cat **/.htaccess
Options +FollowSymLinks
RewriteEngine On
Redirect to www
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [R=301,L]
Redirect 301 / https://example.com/
```
If your shell supports recursive globbing with ** then you can simplify things. Of course, we’re back to not being able to separate anything visually, but spoiler alert: we can use tail instead.
The above will work if you’re using zsh, which supports ** recursive globbing by default.
If you’re using Bash, you’ll need Bash 4 or newer, and you’ll need to enable the globstar option.
In Bash (not zsh) you can run shopt globstar to check whether it’s enabled.
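To check both requirements at once (Bash 4+ and globstar enabled), something like this hypothetical helper works. Plain `shopt globstar` prints the option name followed by on or off, while `shopt -q` only sets the exit status, which is easier to test in a script.

```bash
# Hypothetical helper: report whether ** recursive globbing is usable in
# the current Bash session. globstar only exists in Bash 4+, so check the
# version first to avoid an "invalid shell option name" error on Bash 3.
globstar_status() {
  if (( BASH_VERSINFO[0] < 4 )); then
    echo "unsupported (Bash ${BASH_VERSION})"
  elif shopt -q globstar; then
    echo "on"
  else
    echo "off"
  fi
}
```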
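Here’s a one-liner to enable it, run our command and disable it again. Separating the commands with ; instead of && means globstar still gets disabled if cat fails (for example, when nothing matches):
```bash
shopt -s globstar; cat **/.htaccess; shopt -u globstar
```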
If you’re using Bash and plan to run a number of commands that need globstar, you can enable it once on its own with shopt -s globstar before running them. You don’t need to chain it onto each command with &&.
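A related sketch: running a function’s body in a subshell `( ... )` lets you enable globstar once for several commands without leaving it switched on in your interactive shell. The `htaccess_report` name is hypothetical.

```bash
# Hypothetical helper: the ( ... ) body runs in a subshell, so globstar
# and the cd only last for this call.
htaccess_report() (
  shopt -s globstar
  cd -- "${1:-.}" || exit 1

  # Both commands below use ** without re-enabling globstar.
  ls -1 -- **/.htaccess
  echo
  tail -n+1 -- **/.htaccess
)
```

`htaccess_report .` lists every match and then prints them with tail. The same idea applies to a standalone Bash script: a `shopt -s globstar` line near the top only affects that script’s own shell process.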
Globbing with tail:
```
$ tail -n+1 **/.htaccess
==> .htaccess <==
Options +FollowSymLinks
RewriteEngine On

==> a/.htaccess <==
Redirect to www
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]

==> a/b/1/.htaccess <==
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [R=301,L]

==> a/b/c/d/.htaccess <==
Redirect 301 / https://example.com/
```
Pretty cool! This is what I’ll use in the future for one-off commands.
I’d consider using it in a Bash script too, since you can enable globstar just for the script, but I’d experiment with that if and when the time comes. If you need POSIX compliance, then the find approach is one way to go, as long as you skip -printf, which is a GNU extension rather than part of POSIX find.
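For the POSIX route, this sketch (hypothetical helper name) loops over the found paths in sh and prints a tail-style ==> path <== label with printf, so it avoids both -printf and globstar and should behave the same with GNU and BSD find.

```bash
# Hypothetical helper using only POSIX find, sh, printf and cat.
# find hands batches of paths to sh; "for f do" loops over them.
print_htaccess_portable() {
  find "${1:-.}" -type f -name .htaccess -exec sh -c '
    for f do
      printf "\n==> %s <==\n" "$f"
      cat -- "$f"
    done
  ' sh {} +
}
```

The lone `sh` after the script becomes `$0` inside it, so the found paths start at `$1`, which is what `for f do` iterates over.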
Benchmarks:
I didn’t go crazy benchmarking things, but I will say this. My example setup for this post only has 4 files and a few directories. Here’s the “total” time reported by time when I ran each command about 10 times:
- Find with tail: 0.003 seconds total
- Globbing with tail: 0.002 seconds total
My guess is that globbing is faster because the shell expands the file paths itself and hands them straight to a single tail process, whereas with find we’re starting both find and tail. With only 4 files, though, a 1 millisecond difference is small enough that it could just be noise.
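For a slightly less informal comparison, here’s a sketch that times 100 runs of each approach. It assumes Bash (for `{1..100}` and the `time` keyword); the `bench_htaccess` name is hypothetical, and output is discarded so only the commands themselves are timed.

```bash
# Hypothetical helper: time 100 runs of each approach inside DIR.
# The ( ... ) body runs in a subshell, so globstar and the cd don't leak
# into your shell. time writes its report to stderr.
bench_htaccess() (
  shopt -s globstar
  cd -- "${1:-.}" || exit 1

  echo "find + tail:"
  time for _ in {1..100}; do
    find . -type f -name .htaccess -exec tail -n+1 {} + > /dev/null
  done

  echo "globbing + tail:"
  time for _ in {1..100}; do
    tail -n+1 -- **/.htaccess > /dev/null
  done
)
```

Bash’s time prints real, user and sys lines rather than zsh’s single total figure, so compare the real values. With so few files, most of what you’re measuring is process startup.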
The video below goes into more detail about all of the commands covered here.
BONUS: What about matching multiple file patterns?
This wasn’t covered in the video, but someone in the YouTube comments asked how to match several file patterns at once.
For that you can use brace expansion:
```bash
# Match all Python and Markdown files.
tail -n+1 **/*.{py,md}

# Match all __init__.py files and README.md files.
tail -n+1 **/{__init__.py,README.md}
```
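As long as each pattern matches at least 1 file, it will list them all out. In zsh, if any of the patterns doesn’t match a file, the shell throws a no matches found error and the command doesn’t run. Bash (with globstar enabled) passes an unmatched pattern through literally by default, so tail complains that it can’t open that file but still prints the others.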
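To avoid the error entirely in Bash, you can turn on nullglob so unmatched patterns expand to nothing, collect the matches in an array and only call tail if something matched. Without that guard, tail with no file arguments would sit waiting on stdin. A sketch with a hypothetical helper name:

```bash
# Hypothetical helper: tail every *.py and *.md file under DIR, skipping
# patterns that match nothing. Options are scoped to the ( ... ) subshell.
tail_py_and_md() (
  shopt -s globstar nullglob
  cd -- "${1:-.}" || exit 1

  # Brace expansion runs first, giving **/*.py and **/*.md; with nullglob,
  # a pattern that matches nothing simply disappears from the array.
  files=(**/*.{py,md})

  if (( ${#files[@]} == 0 )); then
    echo "No matching files found." >&2
    exit 1
  fi

  tail -n+1 -- "${files[@]}"
)
```

In zsh, the `(N)` glob qualifier does the same job per pattern, e.g. `tail -n+1 **/*.{py,md}(N)`, though you’d still want to guard against an empty result there too.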
# Demo Video
Timestamps
- 0:14 – Going over a specific use case
- 1:28 – The example for this video
- 1:51 – Find all matching files
- 2:38 – Catting out each matching file
- 4:26 – Including the file path above the output
- 4:59 – Using a custom printf label with find
- 5:46 – Using tail instead of cat
- 7:37 – Replacing find with shell globbing
- 8:33 – Enabling globstar with Bash
- 10:13 – Very informal benchmarking (find vs globbing)
What’s your preferred way to handle this use case? Let me know below.