Using NGINX Regex Capture Groups to Redirect URL Paths
This can be handy if you change your URL structure and want to make sure your old URLs still work and search engine indexes get updated.
Prefer video? Here it is on YouTube.
Over the last 10 years or so I’ve made a number of changes to this blog’s URLs. I try to be a good citizen of the web and avoid breaking URLs because you never know who might be linking back to you.
I use NGINX as my web server and set up a few 301 redirects. These are permanent redirects that tell search engines and other clients that your content has moved to a new location. HTTP clients such as browsers will automatically follow them to the new URL.
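For comparison, a plain 301 redirect for a single URL doesn’t need a regex or a capture group at all. A minimal sketch, where `/old-page` and `/new-page` are hypothetical paths and `location =` matches that exact path only:

```nginx
# Hypothetical paths: permanently redirect one exact URL to another,
# keeping any query string via $is_args$args.
location = /old-page {
    return 301 /new-page$is_args$args;
}
```

This works for one-off moves, but it doesn’t scale when a whole section of URLs changes.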
Capture groups help a lot here because they let you “capture” part of a regular expression match and put it into a variable that you can use outside of the match.
In this case, that means taking part of the incoming URL path and redirecting to another URL path, but capture groups aren’t limited to redirects.
Capture groups are a regular expression feature that’s well supported in many programming languages and tools, and NGINX happens to support them too.
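As an example outside of redirects, NGINX’s `alias` directive can use a capture from a regex `location` to map a URL to a file on disk. A minimal sketch, assuming a hypothetical `/downloads/` URL prefix and a hypothetical `/var/www/files/` directory:

```nginx
# Hypothetical layout: /downloads/report.pdf is served from /var/www/files/report.pdf.
location ~ ^/downloads/(.+)$ {
    # $1 holds everything after /downloads/
    alias /var/www/files/$1;
}
```

When `alias` is used inside a regex `location`, the NGINX docs expect the regex to contain captures and `alias` to refer to them, as `$1` does here.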
Changing a Part of Your URL Path
Early on I tried to prematurely optimize my URLs, thinking I would have courses, books and other digital goods that could be described as “products”, but that wasn’t worth it. I’ve kept it simple and just use courses because that’s what I have.
That means redirecting any URL under `/products/` to the same path under `/courses/`, including `/products/` itself as well as `/products/hello-world`.
```nginx
location ~ ^/products/(.*)$ {
    return 301 /courses/$1$is_args$args;
}
```
- `location ~ ^/products/(.*)$` matches a regular expression (`~` means a case-sensitive regex match)
- The capture group is defined with `(.*)` in the `location` line: the parentheses start and end the capture group, and whatever the regex inside them matches is stored in a variable we’ll use on the next line
- The `$1` variable on the 2nd line contains what was captured
- `$is_args$args` is unrelated but makes sure any query string params are included (`$is_args` is `?` when there is a query string and empty otherwise)
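NGINX’s `rewrite` directive can express the same redirect in one line, since it also supports capture groups. This is a swapped-in alternative, not what this site uses; a minimal sketch that would sit inside the `server` block:

```nginx
# "permanent" makes rewrite respond with a 301, and $1 holds the (.*) capture.
# rewrite keeps the original query string on its own, so $is_args$args isn't needed.
rewrite ^/products/(.*)$ /courses/$1 permanent;
```

The `return` version is arguably easier to read because the status code and destination are spelled out explicitly.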
Changing Paginated URLs
I converted my blog from Jekyll to Hugo. With Jekyll I had my paginated blog pages at `/blog/page2`, `/blog/page3`, etc., but Hugo uses a different URL structure: `/blog/page/2`, `/blog/page/3`, etc.
This is similar to the first example, except it shows that we can match on any regular expression we want, in this case only digits rather than everything.
```nginx
location ~ ^/blog/page(\d+)/?$ {
    return 301 /blog/page/$1$is_args$args;
}
```
- `(\d+)` is a capture group that only matches 1 or more digits
- `/?` makes a trailing slash optional, so `/blog/page3/` is redirected too
- Everything else is the same as the previous example
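To show where these blocks live, here’s a minimal sketch of a `server` block containing both redirects. The domain, port and document root are hypothetical placeholders:

```nginx
server {
    listen 80;
    server_name example.com;  # hypothetical domain

    # Old /products/* URLs -> /courses/*
    location ~ ^/products/(.*)$ {
        return 301 /courses/$1$is_args$args;
    }

    # Old Jekyll pagination (/blog/page3) -> Hugo pagination (/blog/page/3)
    location ~ ^/blog/page(\d+)/?$ {
        return 301 /blog/page/$1$is_args$args;
    }

    # Everything else is served normally.
    location / {
        root /var/www/html;  # hypothetical document root
    }
}
```

NGINX checks regex locations in the order they appear and uses the first match, and a matching regex location takes priority over the `location /` prefix. The new `/blog/page/3` URL doesn’t match `^/blog/page(\d+)/?$` because a `/` follows `page`, so the redirect can’t loop.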
You can test the above against this site with:
```sh
$ curl -v https://nickjanetakis.com/blog/page3 2>&1 | grep "< Location:"
< Location: https://nickjanetakis.com/blog/page/3
```
By default curl won’t follow redirects, but if you include `-v` it shows the response headers, including the `Location` the redirect points to. You can use `-L` or `--location` to have it follow the redirect like a browser.
Multiple / Named Capture Groups
For my personal sites I don’t have any examples of using multiple capture groups at once, but you could have something like `^/issues/(.*)/comments/(.*)$` and then access your captures with `$1` and `$2` in the order they are specified.
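Here’s a minimal sketch of that pattern in a `location` block. The `/tickets/.../replies/...` destination is hypothetical and only there to show `$1` and `$2` being reused:

```nginx
# Hypothetical: /issues/42/comments/7 -> /tickets/42/replies/7
location ~ ^/issues/(.*)/comments/(.*)$ {
    # $1 is the first (.*) capture, $2 is the second.
    return 301 /tickets/$1/replies/$2$is_args$args;
}
```

Since `.*` is greedy, a path containing `/comments/` more than once would put the extra segments into `$1`; a stricter pattern such as `([^/]+)` limits each capture to a single path segment.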
If you find yourself getting confused by `$1` and other numbered references, you can name them instead. Here’s the first example but using a named capture group:
```nginx
location ~ ^/products/(?<product>.*)$ {
    return 301 /courses/$product$is_args$args;
}
```
- `?<product>` lets you define a name of your choosing
- `$product` is the same as `$1` except it has a custom name
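Named captures work with multiple groups too. A sketch of the `^/issues/(.*)/comments/(.*)$` pattern from earlier with names instead of numbers, where `[^/]+` is swapped in for `.*` so each capture stops at a `/` and the destination path is hypothetical:

```nginx
# Hypothetical: /issues/42/comments/7 -> /tickets/42/replies/7
location ~ ^/issues/(?<issue>[^/]+)/comments/(?<comment>[^/]+)$ {
    # $issue and $comment hold the same values $1 and $2 would.
    return 301 /tickets/$issue/replies/$comment$is_args$args;
}
```

NGINX also accepts the `?P<name>` syntax for named captures, which may feel more familiar if you’ve used Python’s `re` module.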
The video below shows how some of this works.
# Demo Video
Timestamps
- 1:01 – Redirecting parts of a URL path
- 3:35 – Adjusting paginated blog post pages
- 4:58 – Named captures and multiple captures
What type of problems have you solved with capture groups? Let me know below.