Converting My 500+ Page Blog from Jekyll to Hugo

This was a fun adventure that took less time than I thought it would have. Here's a bunch of things I learned along the way.

Prefer video? Here it is on YouTube. It covers a bit more detail than this post since I open both sites in a code editor and jump around.

I originally wrote this right after migrating to Hugo but I waited 6+ months to post this to make sure everything was in good working order and stable. It’s been over 6 months and everything is good to go. I’m really happy with how it turned out.

I’ve mostly left this post untouched in its original form but I added little callouts to validate some of my assumptions back then and also add a few new details.

Hugo and Jekyll are both popular static site generators.

One does not simply wake up and decide they want to change tech stacks on their 9 year old site with 500+ posts and 300+ drafts. I think before we get into the nitty gritty technical details, answering the “why?” is important here. If you don’t care, that’s ok, skip around.

Was it because Jekyll was too slow? Not entirely

Full site builds with Jekyll took 40 seconds and --incremental builds took about 4 seconds. That’s on an i5 3.2 GHz / 16 GB of memory / SSD workstation I put together back in 2014. I mean, that’s not blazing fast but it wasn’t terrible.

With that said, Hugo does full builds in ~3 seconds and incremental builds in 80ms.

That gives near instant live reload when writing posts which makes a huge difference from waiting 4 seconds. It’s something I noticed right away in a good way and if I knew how big of an impact that was, I would have considered switching earlier.

The main reason is because I traveled to Portugal recently for a Docker Captains summit and getting Jekyll set up on my Chromebook running Linux so I can publish a few blog posts while traveling was a pain.

It was a pain because I was stuck using an old version of Jekyll-Assets which in turn locked me into an older version of Jekyll which in turn locked me into Ruby 2.7. Jekyll along with live reload was never something I was able to get working inside of Docker with this combination of versions either.

Combine that with using an unsupported Linux distro (GalliumOS) which didn’t make it easy to get specific versions of specific files to compile Ruby gems that require C dependencies.

I felt like I dodged 20 bullets just getting Jekyll to build successfully on that Chromebook and I have experience working with Rails so Ruby is not foreign to me. That’s a pain I don’t want to continue dealing with, especially when I travel more.

So then the decision was, do I try to refactor my way out of Jekyll-Assets so I can update Jekyll and Ruby? I use that plugin because it digests your assets by adding a SHA-256 hash to their file names. This lets me cache them with nginx. I can’t not have that feature.

I didn’t find an alternative solution so it was time to consider approaching this from ground zero. Let me take every pain point I encountered and see if I can solve everything. Yep, the “grand rewrite”. In this case it made sense.

I always knew Hugo was fast and quite popular. I even toyed with the idea of switching to it 2-3 years ago but I didn’t have enough of a compelling reason.

I do know Docker’s documentation is built with it which gave me a pretty big confidence boost that it can be successfully used on a complex project. I didn’t even bother looking at other solutions because I know having a super fast statically compiled binary solution is pretty much as good as it gets.

I will say this. Having the idea of switching lingering in the back of my mind for all that time definitely created a mental tax. Every time it popped up into my thoughts it made me think I’m slacking off or avoiding something when I didn’t do it.

I decided it’s time and went into hardcore learning mode. About 30 hours later my whole site was converted with a combination of Python scripts, shell scripts and manual checks. I broke that up over the course of a few days.

That mental tax vanished and it felt great.

By the way, I had no idea it was going to take ~30 hours. I thought this would have taken way longer which was also why I was reluctant to start. That and there were too many unknowns that I first had to figure out and as I’m sure you know having unknowns is a great way to talk yourself out of doing something.

# Solving the Bigger Problems Before I Start

Step one is getting a lay of the land. What do I even need to change or solve from a technical sense before I invest serious time into this? That seems like a reasonable first step.

It’s not just a matter of converting a few Jekyll template tags to Hugo. I had a few custom Jekyll plugins that I wrote or was using.

For example, one of them scans all links and adds target="_blank" to all external links. An external link is any absolute URL that doesn’t match my configured domain.
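As a rough illustration of that rule (not the actual Ruby plugin), here’s a minimal Python sketch. The function names and the example.com domain are made up for the example, and it treats a www. prefix as a different host, which may or may not be what you want.

```python
from urllib.parse import urlparse


def is_external(url: str, site_domain: str) -> bool:
    """Return True for absolute URLs whose host is not the site's own domain."""
    parsed = urlparse(url)
    # Relative links such as "/blog/foo" or "#heading" have no scheme or host.
    if not parsed.scheme or not parsed.netloc:
        return False
    return parsed.hostname != site_domain.lower()


def link_attrs(url: str, site_domain: str) -> str:
    """Build the attribute string for an <a> tag pointing at `url`."""
    attrs = f'href="{url}"'
    if is_external(url, site_domain):
        attrs += ' target="_blank"'
    return attrs


print(link_attrs("https://github.com/gohugoio/hugo", "example.com"))
print(link_attrs("/blog/hello-world", "example.com"))
```

Only the first link gets target="_blank" since the second one is relative.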

Another one pulls tweets in from Twitter so I can embed tweets.

There were also other features I built using frontmatter and built-in template logic such as generating the “Quick Jump” that you see in most of my posts. Also, I don’t link this anywhere on my site but I do have a photo gallery too, which is built up from YAML data files and template logic.

Another technical challenge was ensuring my URLs don’t change for both my pages and assets. That means I have to digest assets in the same exact format as Jekyll-Assets or this migration cannot happen since I’m not going to 301 redirect ~1,000+ images.

Also, I encountered a lot of resistance with how Hugo deals with trailing slashes with URLs. You can either use “ugly” URLs with .html in them and then redirect them with nginx or another web server or you can use “pretty” URLs but every URL will end with a trailing slash, such as /about/ instead of /about.

That was a deal breaker for me. From a technical perspective I did not want:

  • All requests to get 301’d
  • All of my pages having a trailing slash
  • Alternatively do some post-processing on the static HTML files to remove the slash

I actually broke this rule and did the entire migration without solving the URL problem first because I mentally agreed that no matter what I’m going to figure out a solution.

Sheer determination is a powerful thing. Fortunately I did figure out how to work around this with Hugo once I got into the thick of it. We will cover that in the details later.

As for the other bigger challenges, I did confirm they will be possible to pull off.

There’s a billion other smaller technical hurdles but the above were the big ones. Ok, we’re really doing this. Time to get started with the migration!

Just kidding.

Go for a Fresh Look or Keep the Same Theme?

My theme is something I built with vanilla Bootstrap many years ago. I don’t NOT like it but it would be nice to change eventually.

Another interesting bigger technical challenge was Jekyll-Assets allows me to separate my SCSS and JS into separate files through Sprockets and then it bundles everything.

I’ve already been using esbuild and Tailwind for every other project for years and decided I wanted to use that with Hugo too but I didn’t want to convert all of my SCSS and JS.

I ultimately decided trying to change my theme during this move is too much at once. I can do this incrementally. Get everything going on Hugo with my existing theme and then consider switching after it’s stable.

I ended up taking the non-minified but concatenated single CSS and JS files from Jekyll and plopped them straight into my esbuild set up without Tailwind for now. That was pretty painless, everything “just worked”. Then esbuild minifies them for production builds.

Ok cool, now let’s begin the migration of everything else.

# High Level Feature Scan

I knew I had to convert 500+ blog posts from Jekyll’s Liquid template language to Hugo’s Go template language but in my mind that’s the easy part. I knew I could script out a 90% automated solution with Python in a few hours. I didn’t think about this part until I identified the features I need to convert over.

For this feature dump, I didn’t care about ordering or prioritizing them, I just wanted them out of my head so I dumped them into a plain text list.

This wasn’t meant to be a perfect or complete list. It was to give me a rough idea of the scope of the project. I can always add to it later as I go. I time boxed myself to 30 minutes:

  • Pretty URLs without trailing slashes
  • Show N latest posts
  • Pagination
  • Generate a table of contents (“Quick Jump”)
  • Filter posts by tag
  • Ability to organize draft posts in a separate directory
  • Ability to see drafts and future posts in development
  • Embed YouTube videos
  • Embed tweets
  • Digest CSS, JS and most images
  • Ability to not digest certain images (such as the gallery images)
  • Minify final HTML for production builds
  • Live reload for local development
  • External links open in a new browser window
  • Split out content types (blog, courses, etc.)
  • Content types can have their own layout
  • Ability to toggle features on a per page basis
    • Comments, table of contents, different newsletter form, affiliate link callout, etc.
  • Use template helpers for cross links to ensure I don’t have dead links
  • Ability to segment template logic into partials, etc.
  • Load data from YAML files
  • Sitemap
  • RSS feed
  • robots.txt
  • Twitter and OpenGraph tags
  • Favicon at various sizes

What this exercise let me confirm was that not a whole lot here is uncharted territory. Most of these things are straightforward and I could check the docs or internet for the specific syntax for what I needed to do.

It was one of my first “wins” of the project. This is doable, it’s going to happen. I like wins like this because it puts me into a different type of mindset. It’s like the hard part is done and now it’s just implementation details.

Spoiler alert, some of these implementation details were really tricky to figure out but I learned a lot along the way!

# Hello World with Hugo

The first thing I did was get the plumbing of the project up and running.

That means getting Hugo and esbuild running in Docker with Docker Compose. Fortunately this was a snap. I used my other Dockerized starter apps as inspiration.

One fun problem I had here was esbuild produces files in public/ and I configured Hugo to read assets in from that directory.

When using depends_on with Docker Compose, I made sure esbuild starts before Hugo but if Hugo finishes starting before esbuild outputs its files then Hugo will error out due to certain assets not existing while they’re referenced.

To resolve this, I added this to my Docker entrypoint script:

assets_dir="public/assets"
until [ -f "${assets_dir}/css/app.css" ] && [ -f "${assets_dir}/js/app.js" ]; do
  sleep 0.25
done

The above ensures the assets produced by esbuild are available on disk before Hugo’s server even begins to start.

I know there’s a chance this could be an infinite loop but I mean, it’s my personal blog here. I know those files will exist eventually and if they don’t then Hugo won’t start and I know something is up.

In the 6+ months of using this set up, that code has executed hundreds if not 1,000+ times and it was never a problem. Works for me!
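If you’d rather not risk waiting forever, the same polling idea can be bounded with a timeout. Here’s a minimal Python sketch of that variation. It isn’t what my entrypoint does, and the function name, the 30 second default and the 0.25 second interval are assumptions for illustration.

```python
import time
from pathlib import Path


def wait_for_files(paths, timeout=30.0, interval=0.25):
    """Poll until every path exists; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while not all(Path(p).is_file() for p in paths):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


if __name__ == "__main__":
    ready = wait_for_files(["public/assets/css/app.css", "public/assets/js/app.js"])
    print("assets ready" if ready else "gave up waiting for esbuild")
```

The caller decides what to do when it returns False, such as exiting with an error so the container fails loudly instead of hanging.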

Once I got a basic set up running then I started to learn Hugo. I’ll spare you the details but it’s how I research and learn everything else. Basically Googling for specific problems and tinkering while optimizing for short feedback loops of having a question and trying out a solution.

One nice perk of this migration is I already have the answer book in front of me with Jekyll. I kept my Jekyll server running locally and constantly compared Hugo’s output to it to ensure it at least looks the same. This was wildly useful.

I started with the home page. That uncovered some challenges of working with Hugo.

I don’t think it’s worth walking through literally everything I learned while converting each page, so let’s initially focus more on comparing Jekyll’s terms to Hugo’s. We can cover some of the more interesting features and challenges afterwards.

# Comparing Terms for Hugo vs Jekyll

Each bullet is labeled with what it’s called in Hugo.

  • Layouts
    • These exist in both, even being able to use blocks too
  • Partials
    • Jekyll uses {% include myfile.html %}
    • Hugo uses {{ partial "myfile.html" }}
  • Shortcodes
    • You can think of these as creating custom logic you can use in your content
      • Embedding YouTube, creating links, removing code duplication, etc.
    • With Jekyll you’d typically use include for this too
  • Functions
    • Hugo has a bunch of functions you can call in your templates
      • Trim strings, math, hashing, etc.
    • Jekyll gets these from Liquid, which calls most of them “filters” (“tags” are for things like control flow), and there’s a bunch too
  • Frontmatter
    • This exists in both, Hugo supports YAML, TOML or JSON frontmatter
    • I plan to stick with YAML, it works and looks great for this sort of thing
  • Content
    • You have “content” pages, this is where your Markdown files go
    • A “blog post” is just a type of content, there is no special _posts directory
  • Config
    • Both support config files and environment specific configs (dev, prod, etc.)

There’s a number of other things like assets, static files and data files which are named about the same in both. There’s also the concept of themes in both, but I am not using a third party theme in either case.

By the way, there’s all sorts of other Hugo specific terms related to content management I’m not listing here since they aren’t the focus of this post. You can find them in their docs.

# Note Worthy Comparisons

I don’t think it’s worth covering everything like how to define a variable in Liquid vs Go or using a for in loop vs a range to loop over something. You can easily Google these things!

Let’s focus on a few basic but useful features and a number of more interesting things.

File Based Dates with YYYY-MM-DD Prefixes

With Jekyll this works out of the box because it’s built into Jekyll when you create “posts”. You can also configure Jekyll to hide the date from the URL by setting permalink: "/blog/:slug". Of course that’s a personal preference but that’s the combo I use.

Hugo doesn’t enforce this because it has no special notion of what a “post” is but you can get the same effect pretty easily by setting this in your config:

frontmatter:
  date: [":filename", ":default"]

That will let your files sit on disk like YYYY-MM-DD-hello-world.md and with no further configuration your URLs will be /hello-world/ (we’ll talk about the trailing slash later).
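To make the file name convention concrete, here’s a small Python sketch that splits a file name into its date and slug. It only illustrates the naming scheme, it is not how Hugo implements :filename internally, and the function name is made up.

```python
import re
from datetime import date

# Matches names such as 2022-07-12-hello-world.md
FILENAME_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.md$")


def split_dated_filename(filename: str):
    """Split 'YYYY-MM-DD-slug.md' into (date, slug); return None if it doesn't match."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    year, month, day, slug = match.groups()
    return date(int(year), int(month), int(day)), slug


print(split_dated_filename("2022-07-12-hello-world.md"))
print(split_dated_filename("about.md"))
```

A file without the prefix returns None here, which is the case where Hugo would fall back to the next entry in the list (":default").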

When you have 500+ posts, having them sorted naturally with YYYY-MM-DD is very helpful. Otherwise you have no idea when something was created if you’re glancing at your files.

Also having the date in there lets you fuzzy match on it. There’s been many times where I knew I wrote something in let’s say the summer of 2022 but I forgot the title so I began searching for 2022-07 to see what pops up.

Separate Directory for Your Drafts

With Jekyll this works out of the box because “posts” are a special thing and a “draft” is a special type of post.

Hugo by default lets you set draft: true in your frontmatter but that expects both your live posts and drafts to exist in the same directory on disk.

I didn’t want them to be combined because I like quickly seeing which of my posts are live or drafts. I also very often fuzzy find files with “drafts” in their name.

This is surprisingly easy to do with Hugo but it was tricky to figure out what features of Hugo needed to be used to achieve this.

All you have to do is create a content/drafts/_index.md file and then within that file add:

---
build:
  # Ensure Hugo doesn't produce a /drafts/ URL and associated static HTML file.
  render: "never"

cascade:
  # We want drafts to be classified as content type "blog".
  type: "blog"

  # We want all of these content files to be set as a draft.
  draft: true
---

Pretty cool, build lets you configure various build options and cascade will apply these settings to all pages under this section (everything in content/drafts/).

When you’re writing a “draft” or a “blog” post nothing changes there. You don’t need to mark anything as a draft in each post or worry about it. You can promote a post from draft to live by moving it from content/drafts/ to content/blog/ on disk.

When you’re looping over your blog posts, it will mix in both together in the same way that Jekyll does. When building your production site you would omit including drafts and future posts but include them in development. Done!

They are omitted by default. In your development config you can have:

buildDrafts: true
buildFuture: true

Removing Trailing Slashes from URLs

Jekyll was great in this regard. You can choose if you wanted a trailing slash or not in your URL based on how you organized your files.

If you had hello.html in the root of your project it wouldn’t get a trailing slash but if you had hello/index.html then it would. Straightforward and intuitive.

With Hugo it’s not like that. You can configure uglyURLs to be true or false. If true then you’ll have hello.html else it will be /hello/.

However, it’s possible to work around this:

In all of your templates, anytime you output a URL (Permalink, RelPermalink, URL, etc.) you can pipe it to strings.TrimSuffix "/" such as {{ .RelPermalink | strings.TrimSuffix "/" }}.

Then you can override the ref and relref shortcodes to do the same. These are used for creating verified cross links. For example create layouts/shortcodes/ref.html and then add {{ (ref . .Params) | strings.TrimSuffix "/" -}} to the file.

Now you can use pretty URLs without a trailing slash.

Hugo’s server in development will redirect to the version of the page with a trailing slash but nginx or another web server won’t do that. That works for me, it ensures there’s no 301s happening in production.

In my case I did use to have /blog/ with a trailing slash but I decided to consistently not use trailing slashes in the end. In that case I did configure nginx to 301 redirect /blog/ to /blog so old back links continue to work. All of the links within my site use /blog so the redirect is avoided. I consider this a victory.

Custom Digested File Names

All digesting was handled by Jekyll-Assets and it has a specific format of filename-digest.extension, so hello.jpg will become hello-abc123.jpg where abc123 is a SHA-256 checksum.

Hugo calls this process a “fingerprint” and it’s built into the tool. You can do resources.Get "/assets/css/app.css" | fingerprint to fingerprint a resource.

By default it will use filename.digest.extension, so hello.jpg will become hello.abc123.jpg where abc123 is a SHA-256 checksum. You can also adjust the hashing algorithm to MD5 or others if needed.
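To compare the two naming schemes side by side, here’s a minimal Python sketch. It assumes the digest is simply the SHA-256 of the file’s bytes, and the function name is hypothetical. It is not code from Jekyll-Assets or Hugo.

```python
import hashlib
from pathlib import PurePosixPath


def digested_name(path: str, content: bytes, delimiter: str = "-") -> str:
    """Insert the SHA-256 hex digest of `content` before the file extension."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()
    return str(p.with_name(f"{p.stem}{delimiter}{digest}{p.suffix}"))


image_bytes = b"pretend this is a JPEG"
print(digested_name("blog/hello.jpg", image_bytes))                 # Jekyll-Assets style
print(digested_name("blog/hello.jpg", image_bytes, delimiter="."))  # Hugo's default style
```

The digest is identical in both cases. Only the character in front of it differs, which is enough to change every URL.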

The issue is at the moment there’s no way to customize the delimiter. I want to use - not . so I opened an issue as a proposal and it’s scheduled for a future version of Hugo.

In the meantime I wanted this feature today because not being able to customize this is a deal breaker due to 1,000+ images being digested with a different file name.

With some help of the community, we figured out a solution. I rolled it up into an image.html partial as seen below:

{{ $fingerprintDelimiter := "-" }}
{{ $src := print "/assets/" .src }}
{{ $height := .height }}
{{ $width := .width }}
{{ $class := .class }}

{{ with resources.Get $src }}
  {{ $sha256 := sha256 .Content }}
  {{ $ext := path.Ext .Name }}
  {{ $pathWithoutExt := replaceRE `\.[^.]+$` "" .Name }}
  {{ $pathWithoutExt = strings.TrimLeft "/" $pathWithoutExt }}
  {{ $fullPathWithFingerprint := printf "%s%s%s%s" $pathWithoutExt $fingerprintDelimiter $sha256 $ext }}
  {{ with resources.Copy $fullPathWithFingerprint . }}
    <img src="{{ .RelPermalink }}" {{ with $class }} class="{{ $class }}"{{ end }} height="{{ $height | default .Height }}" width="{{ $width | default .Width }}" alt="{{ index ((split .Name "/") | last 1) 0 }}" >
  {{ end }}
{{ end }}

I also introduced a shortcode for it so I can use it in both layouts and content files.

For example, here’s how to use it: {{< image src="blog/some-image.jpg" >}}

That will produce this output:

<img src="/assets/blog/some-image-de45a50b6ecdd1e11fc0dbd5dd2619145ad3c47efa4b527e3a9813a28ca24f01.jpg" height="980" width="1607" alt="some-image.jpg">

Once a solution is in Hugo I’d still have the partial to make it easy to reference images and nothing will change in any layout or content files. All of the logic around copying, calculating a hash and assembling a file name will go away. Yay for deleting code.

For now this isn’t too bad because this complexity is contained in 1 file and its implementation can change without affecting other files.

If I were starting a new blog today I wouldn’t worry about using the custom delimiter because then I’d accept Hugo’s defaults. I would still make the partial though. It’s worth having that abstraction.

Syntax Highlighting

I have Jekyll configured to use the Rouge highlighter. All I did was find a specific theme’s CSS file and dropped it into my project. It’s compatible with any Pygments theme too.

Hugo uses Chroma. Hugo makes it easy to pick an existing style. For example, I use:

markup:
  highlight:
    style: "monokailight"

The problem is by default the $ and # characters have a dark background with dark red text which clashes hard with the style. I had to customize it.

To do that I did:

  • hugo gen chromastyles --style=monokailight
  • Copy the output to your clipboard
  • Paste the contents of that into my existing app.css file
  • Check the page source to see which class needs to be overwritten
  • In my case it was only the .err class
  • Update the colors to whatever you want for that class
  • Add noClasses: false as a new config option under the style option
    • This instructs Hugo to use the custom CSS file instead of the style

There’s lots of styles to choose from but I haven’t found one I really like yet. For now Monokai Light works well enough.

Assorted Template Functions

I don’t want to include too many basic examples but here’s a few interesting ones.

Latest 12 blog posts

On my home page I list out a dozen recent posts but I insert a clearfix div every 3 entries to make sure things are broken up into 3 columns. Remember, I didn’t want to re-do much with my CSS and I know this problem can be solved [with CSS](/blog/displaying-database-results-across-multiple-columns-with-1-line-of-css).

In Jekyll, I had this:

{% for post in site.posts limit: 12 %}
  <div class="col-md-4">
    <!-- Omitting for brevity -->
  </div>

  {% assign divisible_result = forloop.index | modulo: 3 %}
  {% if divisible_result == 0 %}
    <div class="clearfix"></div>
  {% endif %}
{% endfor %}

In Hugo, I went with:

{{ range $index, $post := sort (where .Site.RegularPages "Type" "blog") "File.Path" "desc" | first 12 }}
  <div class="col-md-4">
    <!-- Omitting for brevity -->
  </div>

  {{ if modBool (add $index 1) 3 }}
    <div class="clearfix"></div>
  {{ end }}
{{ end }}

One of the interesting differences is Liquid’s loop index starts at 1 whereas Go’s starts at 0. Initially I wasn’t getting anywhere near the results I wanted to see.

A general takeaway here is seeing what a value is pretty much enables cheat codes. As soon as I printed the loop’s index I immediately saw what was happening and I had to add 1 to Go’s $index variable.

As a bonus, the above code snippet also shows how to sort files and get the latest N results. I used File.Path because I’m using YYYY-MM-DD prefixed files.
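The off-by-one is easy to see outside of a template too. This small Python sketch lists which 0-based loop indexes get a clearfix after them, with and without adding 1. The function name and the offset parameter are made up for the example.

```python
def clearfix_indexes(total, per_row=3, offset=1):
    """Return the 0-based loop indexes that are followed by a clearfix div.

    offset=1 matches the Hugo snippet's (add $index 1); offset=0 is the raw index.
    """
    return [index for index in range(total) if (index + offset) % per_row == 0]


print(clearfix_indexes(12))            # breaks after the 3rd, 6th, 9th and 12th post
print(clearfix_indexes(12, offset=0))  # off by one: breaks after the 1st, 4th, 7th, 10th
```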

Dealing with apostrophes and plus symbols

While testing the output of both tools I noticed that Jekyll output a regular apostrophe, double quotes and various other symbols in the source code for both the <title> and <meta name="description" content=""> tags.

However, Hugo used the HTML entity (the numeric character code) such as &#39; for '. Keep in mind this was only in the page’s source code. The browser tab’s title had the normal character.
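If you want to confirm both forms decode to the same text, Python’s standard library can do what the browser does when it reads the title. This is only a quick sanity check with a made up title, not part of the site’s build.

```python
import html


def decode_entities(raw: str) -> str:
    """Turn numeric character references such as &#39; back into characters."""
    return html.unescape(raw)


page_source_title = "Nick&#39;s 500&#43; Page Blog"
print(decode_entities(page_source_title))
```

That prints the title with a regular apostrophe and plus sign, which matches what showed up in the browser tab.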

Given how important these 2 tags are for SEO purposes, I didn’t want to risk having them change. I don’t think it would have an effect but I noticed most other big sites didn’t use the entity. They output the real character.

This turned out to be pretty tricky to solve.

The docs mention you can do .Title | safeHTML but that does not work. You will still get the entity in the output.

Instead you have to do:

{{ printf "<title>%s</title>" $title | safeHTML }}

I still don’t understand why fully but it’s something related to Go’s template parsing.

For the description, I ended up with:

{{ $description := trim (replace .Description "\n" " ") " " | chomp | printf "content=%q" | plainify | safeHTMLAttr }}

<meta name="description" {{ $description }}>

Not all of that is related to entities but it normalizes whitespace, removes trailing new lines, removes HTML tags and declares this a safe HTML attribute.

I set all of my descriptions with frontmatter and I wanted a robust way to handle a bunch of different ways to define that. For example:

description: |
  Hello world.

That leaves a trailing new line because you didn’t use |- so chomp gets rid of that.

I also output the description on most pages too and once in a while I have HTML tags such as a link. I don’t want those links included in the meta description.

In any case, both of those problems are now solved.
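For a feel of what that normalization does to a frontmatter value, here’s a rough Python equivalent. The regex based tag stripper is a naive stand-in for plainify, not Hugo’s implementation, and the function name is made up.

```python
import re


def normalize_description(raw: str) -> str:
    """Collapse newlines to spaces, drop HTML tags and trim the ends."""
    text = raw.replace("\n", " ")
    text = re.sub(r"<[^>]+>", "", text)  # naive tag stripper, fine for simple inline tags
    return text.strip()


print(normalize_description("Hello world.\n"))
print(normalize_description('Read <a href="/blog">this post</a>\nfirst.\n'))
```

The first call drops the trailing new line left by the | block style and the second one also removes the link markup.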

Assorted Content Differences

Here’s a few unexpected changes I had to make after switching to Hugo. Thankfully I caught all of these before I pushed anything live.

Leading white space within HTML tags

If you had both of these in a Markdown file, Jekyll treated them the same:

<div>
    <p>Hmm</p>
</div>

<div>

    <p>Hmm</p>
</div>

In both cases your page’s output would contain “Hmm” where it’s wrapped in a <p></p> tag within the source code.

Hugo will render the 2nd example differently. It will literally output <p>Hmm</p> and it ends up being wrapped in a <pre></pre> tag.

Technically that adheres to the CommonMark spec: the blank line ends the HTML block, and the 4 leading spaces then start an indented code block. Still, that was kind of unexpected to me because I don’t have the spec memorized and Jekyll auto-magically handled this difference.

In attempts to find these I ran grep -ER "^\s{4}<" content. This found a number of false positives but it was still good enough to manually go through them in a few minutes to fix a couple of posts that had this issue.
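One way to cut down on the false positives is to only flag indented tags that directly follow a blank line, since that’s the case that turns into a code block. Here’s a Python sketch of that heuristic. The function name is made up and it will still flag intentional code blocks that contain HTML.

```python
import re

INDENTED_TAG = re.compile(r"^ {4,}<")


def indented_html_after_blank(text: str):
    """Return 1-based line numbers where a 4+ space indented tag follows a blank line."""
    hits = []
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        previous_blank = number == 1 or lines[number - 2].strip() == ""
        if previous_blank and INDENTED_TAG.match(line):
            hits.append(number)
    return hits


sample = "<div>\n    <p>Hmm</p>\n</div>\n\n<div>\n\n    <p>Hmm</p>\n</div>\n"
print(indented_html_after_blank(sample))
```

For the two examples above it reports only line 7, the indented paragraph that comes after the blank line.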

Nested numbered lists

With Jekyll I had lists set up like this where the nested list had 2 spaces of indentation:

1. A
  - X
2. B
  - Y

That would output things as I wanted, like this:

  1. A
    • X
  2. B
    • Y

However in Hugo, the output looked like this:

  1. A
  • X
  1. B
  • Y

Yikes. The fix there is to use 4 spaces instead of 2.

It gets more fun. In my 120+ SRE skills blog post I had over 100 items in a numbered list.

On the 100th item if you use 4 spaces for the indented list you end up with this:

  1. A - X

It pulls it up to the same line. The fix there is for the 100th item and higher you need to use 6+ spaces instead of 4. Good times.
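The pattern behind both fixes is that, under CommonMark’s rules, nested content has to be indented at least as far as the text of the parent item, so the marker’s width matters. Here’s a small Python sketch of that rule. It assumes exactly one space after the marker and the function names are made up.

```python
def min_nested_indent(marker: str) -> int:
    """Spaces needed to reach the parent item's text: marker width plus one space."""
    return len(marker) + 1


def is_nested(marker: str, indent: int) -> bool:
    """True if a '- X' line with `indent` leading spaces nests under the marker.

    Four or more extra spaces would read as a continuation of the paragraph instead.
    """
    minimum = min_nested_indent(marker)
    return minimum <= indent <= minimum + 3


for marker, indent in [("1.", 2), ("1.", 4), ("100.", 4), ("100.", 6)]:
    print(marker, indent, is_nested(marker, indent))
```

That lines up with what I saw: 2 spaces is too few under "1.", 4 works there but is too few under "100.", and 6 clears it.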

This one was annoying to find with grep. I ended up just looking for all posts with any form of list with a pattern of ^\s{2}- and adjusted the few posts that needed it.

Blog Tags

There’s a bunch of ways to implement tags in Jekyll and I suppose Hugo too. In both cases I wanted a way to define tags: ["docker", "flask"] as frontmatter in my posts. That’s what I did previously in Jekyll and then I wrote various logic to handle outputting that.

With Hugo I created a “taxonomy” which really means having a content/tags/_index.md file and then a bunch of specific tag related files in that directory such as content/tags/docker/_index.md.

That file only has frontmatter, such as:

---
slug: "docker-tips-tricks-and-tutorials"
title: "Docker"
description: |
  I've been using Docker since 2014. Along the way I've picked up a bunch of
  Docker experience and best practices. Here's what I learned.

params:
  heroTitle: "Docker Tips, Tricks and Tutorials"
---

Finally in Hugo’s config I put:

taxonomies:
  tag: "tags"

And now you can visit /blog/tag/docker-tips-tricks-and-tutorials on my site. I wanted to keep the same URL as I did with Jekyll.

For displaying tags such as what I do on my blog’s index page you can loop over your tags with {{ range $name, $taxonomy := .Site.Taxonomies.tags }}.

At the top of every individual blog post I list out the tags such as #docker and #flask. For that you can loop over a post’s tags with {{ range $index, $tag := .GetTerms "tags" }}.

Looping over Data Files

Both tools let you have data files, such as YAML files and then you can loop over them to do whatever you want in your templates.

For example you might have something like:

- title: "My Course"
  url: "..."
- title: "Another Course"
  url: "..."

Then you can loop over this data structure and generate a page. That’s exactly what I do for my courses page.

For Jekyll, in the course’s page that reads this file you can do {% for course in site.data.courses %}. With Hugo, it’s about the same {{ range $index, $course := .Site.Data.courses }}.

Where it gets tricky with Hugo is when you want to dynamically pull out a specific entry from the data file based on a variable (such as the page’s slug). This didn’t come up for me with courses but it did with my image gallery’s photos.

With Jekyll, you can do {% for gallery in site.data.galleries[page.slug] %}, that’s similar to how you can access a dictionary’s value with a variable key in a lot of languages.

With Hugo, I did {{ range $index, $gallery := (index .Site.Data.galleries .Slug) }}. I don’t even remember where I found this, maybe it was tucked away deep in a GitHub issue. All I know is it wasn’t straightforward (at least not to me).

Pagination

This one probably isn’t worth mentioning only because it’s pretty easy in both set ups. Jekyll has a plugin and it’s built into Hugo. The docs have good examples for both.

Using Hugo’s Render Hooks

Hooks can be used to adjust the output of rendering Markdown to HTML. I used them to help solve 2 problems. Opening external links in a new window and also to add named anchors for headings.

A big caveat here is this only applies to Markdown content. If you have links in your layouts, those will not get this hook applied. With my custom Jekyll plugin, it affected all links from anywhere since it ran before it output the final HTML. It didn’t matter where it came from.

It’s not the end of the world since most of my links are in content files but it’s for sure worth bringing this up. It does bug me though because it means I need to manually remember to modify links in layouts depending on where they go.

External links

Hugo has a link hook. All you have to do is create this file layouts/_default/_markup/render-link.html and then put in the code you want to use for creating links.

{{- $u := urls.Parse .Destination -}}

<a href="{{ .Destination | safeURL }}"
  {{- with .Title }} title="{{ . }}"{{ end -}}
  {{- if and $u.IsAbs (not (in .Destination .Page.Site.BaseURL)) }} target="_blank"{{ end -}}
>
  {{- with .Text | safeHTML }}{{ . }}{{ end -}}
</a>
{{- /* chomp trailing newline */ -}}

I mostly grabbed that from the docs.

I won’t bother posting the Jekyll solution since it’s a custom Jekyll plugin. It parses the HTML using Nokogiri and inserts the attribute when applicable.

Named anchors

I really like it when content oriented sites make it easy to link somewhere to a specific part of a page, such as a heading.

You can find this feature on my site too. I add a # link next to each <h3> heading.

Hugo has a hook for headings too, you can create this file layouts/_default/_markup/render-heading.html and then put in the code you want to use to modify your headings.

{{ if eq .Level 3 }}
  <p><a name="#{{ .Anchor | safeURL }}"></a></p>
  <h{{ .Level }} id="{{ .Anchor | safeURL }}">
    <a href="#{{ .Anchor | safeURL }}" style="position: absolute; left: -12px;">#</a>
    {{ .Text | safeHTML }}
  </h{{ .Level }}>
{{ else }}
  <h{{ .Level }}>{{ .Text | safeHTML }}</h{{ .Level }}>
{{ end }}

In my case I only apply this to <h3> headings. You can modify that if you want.

The end result is nice. When I’m writing a post all I have to do is add ### Hello and the hook will add the named anchor and the # link.

In Jekyll I did this in a much worse way. It sort of combos with the idea of creating the “Quick Jump” table of contents which is coming up next.

Table of Contents

In Jekyll I combined the idea of named anchor links and the “Quick Jump” feature together. It was something I implemented a year after I started my blog and didn’t think it through in the best way possible. However, it has worked all these years so I never changed it.

Each page that has it has frontmatter defined like this:

toc:
  - "My Title"
  - "Another Title"

Near the top of the page, it loops through this to display them as the quick jump.

Then within the page itself I reference a specific title like this:

<a name="{{ page.toc[0] | slugify }}"></a>
### {{ page.toc[0] }}

The [0] is the index of the toc list. Ultimately it was a lot of copy / pasting between headings because I rarely typed it out. This wasn’t ideal because if I wanted to re-order headers I had to shift things around.

Looking back it could have been easier with a custom Jekyll plugin that did something similar to the heading hook from Hugo.

With Hugo there is a built-in .TableOfContents method which outputs all of your headers in a <ul id="TableOfContents">. All I do is render that as the quick jump and use CSS to add the little pipe separator between <li> items.

Then I define my headings like normal with ### My Title and there’s no frontmatter involved except to set toc: true or toc: false which defaults to true so I don’t have to set it in most pages.

This also allows separating the quick jump and the named anchor links. For example my about page has no quick jump but it has headings with named anchor links. In this case I have toc: false set.

It’s worth pointing out by default Hugo configures this TOC to start with <h2> and end with <h3>. My page layout includes an <h1> and <h2> but these don’t get included in the TOC. All of my individual pages have the “biggest” heading as <h3> so it works out in the end.

You can read how to customize the start and end levels in their docs.

Sitemap and RSS Feed

For Jekyll I added the Jekyll-Sitemap plugin and that was it.

Hugo will create a Sitemap by default which was nice but I did have to customize it to remove trailing slashes from the URLs. That involved creating this file layouts/_default/sitemap.xml, grabbing the default template from GitHub and then doing a strings.TrimSuffix "/" on all of the .Permalink references.

One noteworthy difference is Jekyll will include all pagination pages such as /blog/page/3 in the sitemap whereas Hugo does not. It’s debatable whether you should even include them since they can be crawled. I did not find an easy way to include them with Hugo so I left them out.

It’s been 6+ months and I didn’t notice any difference in traffic after making the switch which indicates nothing was negatively impacted from an SEO perspective.

For the RSS feed I created a custom atom.xml file for both Jekyll and Hugo. I grabbed the default RSS template for Hugo on GitHub.

It loops over all blog posts and grabs the latest 25 entries. The formatting and implementation of the file is the same in both tools minus templating logic.

When it came to only including blog posts I modified the default template to use {{- $pages = where $pctx.RegularPages "Type" "blog" }}.
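Stripped of templating, the feed logic is "keep only blog posts, newest first, cap at 25". Here's a small Python sketch of that idea. The page dictionaries and their keys are hypothetical stand-ins for whatever your generator exposes.

```python
from datetime import date


def latest_blog_posts(pages: list, limit: int = 25) -> list:
    """Return the newest blog posts, skipping every other page type."""
    blog_posts = [page for page in pages if page["type"] == "blog"]
    blog_posts.sort(key=lambda page: page["date"], reverse=True)
    return blog_posts[:limit]


# Made-up pages for illustration.
pages = [
    {"title": "About", "type": "page", "date": date(2024, 1, 1)},
    {"title": "Old post", "type": "blog", "date": date(2023, 5, 1)},
    {"title": "New post", "type": "blog", "date": date(2024, 8, 1)},
]

for post in latest_blog_posts(pages):
    print(post["date"], post["title"])
```

Leaving out the type filter is what lets non-blog pages leak into a feed.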

It’s worth pointing out 2 Hugo specific changes.

First, I wanted to make sure Hugo outputs an atom.xml file since that’s what I had with Jekyll. I did that by adding this to my Hugo config file:

outputFormats:
  RSS:
    mediatype: "application/rss"
    baseName: "atom"

Second, by default Hugo will create separate RSS feeds for different areas of your site (sections, taxonomies and your content pages). I didn’t want that behavior. I only wanted 1 file.

In that case you can add this to your Hugo config file:

outputs:
  home: ["html", "rss"]
  section: ["html"]
  taxonomy: ["html"]
  term: ["html"]

That only generates a root RSS feed and ignores generating files for the others.

Twitter and YouTube Embeds

With Jekyll for Twitter I used this plugin. That lets you embed tweets with {% twitter https://x.com/USER/status/TWEET_ID %}.

With Hugo there’s a built-in shortcode: {{< x user="USER" id="TWEET_ID" >}}

By the way to even produce the above output I had to put </* and */> inside the squiggly braces to avoid processing them. There is no {% raw %} tag like Liquid. The author of Hugo describes this as “special comment out shortcode syntax” he made up.

Also, one cool feature with Hugo’s twitter implementation was it showed a warning in the terminal when it couldn’t retrieve a tweet since the user no longer existed. The Jekyll plugin cached the results so I haven’t pulled it from Twitter in likely years.

With Jekyll for YouTube I just took the embed code from YouTube and turned it into an include to reference it. Calling it looks like: {% include embed.html type="youtube" uid="YOUTUBE_ID" %}

With Hugo there’s a built-in shortcode {{< youtube "YOUTUBE_ID" >}}. You can customize it with a number of settings. Not too shabby.

I have hundreds of YouTube embeds and I didn’t want to do those by hand. I ended up writing a tiny bit of Python code to do the replacement:

# I shortened the variable name of "content" to "s" so it fits in 1 line here:
s = re.sub(r"{% include embed.html type=\"youtube\" uid=\"(.*)\" %}", r'{{< youtube "\1" >}}', s)
s = re.sub(r"{% include embed.html type='youtube' uid='(.*)' %}", r'{{< youtube "\1" >}}', s)

I did it in 2 passes to catch both double and single quote variants. I know it could have been optimized with a few “or” conditions in the regex but this was a 1 off script and it finishes in less than 1 second. Regex capture groups are useful from time to time.

Then I grepped both code bases and confirmed I got the correct counts of each tag to double check my work. There was a lot of that sort of thing done whenever I used Python or sed to do replacements.
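For reference, here's roughly what the single pass version could look like. This is a sketch rather than what was actually run: it uses a backreference so both quote styles are handled by one pattern, and a non-greedy capture so 2 embeds on the same line are replaced separately. The function name is made up.

```python
import re

# (["']) captures the opening quote and \1 requires that same quote in every
# other spot. (.*?) is non-greedy so it stops at the first closing quote.
YOUTUBE_INCLUDE = re.compile(
    r"""{% include embed\.html type=(["'])youtube\1 uid=\1(.*?)\1 %}"""
)


def replace_youtube_embeds(content: str) -> str:
    """Swap the Jekyll include for Hugo's youtube shortcode."""
    return YOUTUBE_INCLUDE.sub(r'{{< youtube "\2" >}}', content)


sample = (
    '{% include embed.html type="youtube" uid="abc123" %} and '
    "{% include embed.html type='youtube' uid='xyz' %}"
)
print(replace_youtube_embeds(sample))
```

Counting the matches with len(YOUTUBE_INCLUDE.findall(content)) before replacing gives a number to compare against the grep counts.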

# Double Checking Your Work

I kept a running TODO list on what to check or convert.

It’s not meant to be an exhaustive list since I only started it halfway through the migration. I also removed a bunch of items from this post related to the topics we covered above.

I wanted to do everything in my power to ensure nothing was different after the migration unless I made an explicit decision for it.

  • Remove unnecessary whitespace in content files
  • Remove trailing new lines in content files
  • Replace YouTube embeds
  • Replace captures (a Liquid feature to make more “complicated” variables)
  • Replace image references from Jekyll-Assets to Hugo
  • Replace post_url references to relref
  • Remove .site.url references in favor of relative URLs (minus exceptions)
  • Trim and chomp new lines where it makes sense
  • Remove trailing slashes from /blog/, /courses/ and /galleries/
  • Remove all {% raw %} tags
  • Ensure all static files exist (robots.txt, fonts, favicons, non-blog images, etc.)
  • Confirm newsletter still works
  • Gallery images should not be digested
  • Make sure Disqus still works (I can pull up old comments for older posts)
  • Diff the sitemap to make sure it includes everything I want
  • Make sure social sharing links still work
  • Make sure Twitter cards and OpenGraph tags are correct
  • Confirm the number of posts in Hugo matches what’s in Jekyll
  • Double check published directory to make sure all Hugo tags are processed
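For the sitemap item in the list above, one way to do the diff is to compare the set of <loc> URLs from both builds. Here's a minimal Python sketch, assuming both files use the standard sitemaps.org namespace. The function names and sample URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set:
    """Collect every <loc> value from a sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}


def diff_sitemaps(old_xml: str, new_xml: str) -> tuple:
    """Return (URLs only in the old sitemap, URLs only in the new sitemap)."""
    old_urls = sitemap_urls(old_xml)
    new_urls = sitemap_urls(new_xml)
    return old_urls - new_urls, new_urls - old_urls


# Tiny made-up sitemaps, standing in for the Jekyll and Hugo output.
OLD = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/blog</loc></url>"
    "<url><loc>https://example.com/blog/page/3</loc></url>"
    "</urlset>"
)
NEW = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/blog</loc></url>"
    "<url><loc>https://example.com/about</loc></url>"
    "</urlset>"
)

missing, added = diff_sitemaps(OLD, NEW)
print("missing:", sorted(missing))
print("added:", sorted(added))
```

Comparing sets ignores ordering, so only real additions and removals show up, such as the pagination pages mentioned earlier.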

Quality Control

Lastly I used this as an opportunity to clean up some inconsistencies in my posts around formatting and overall quality control. I did not manually check everything in every page. I spot checked a few pages after running a bunch of scripts and commands.

I didn’t include most of the Python scripts and grep / sed commands here. A lot of it is pretty specific to my site’s content and personal Jekyll set up. A takeaway for you might be to consider doing the same.

If you have specific questions I’d be happy to answer them in the comments.

Programmatically Generate New YAML Frontmatter

When you have 500+ pages, going back and manually updating the frontmatter to match whatever new format you want is way too tedious.

My goal was to convert the old frontmatter of this:

layout: 'post'
tags: ['dev-business', 'dev-environment']
has_affiliate_link: true

card: 'blog/cards/the-tools-i-use.jpg'
title: "The Tools I Use"
description: "Here's a list of software and hardware that I use on a regular basis as a developer and video creator. I will be keeping it updated."

toc:
  - "OS"
  - "Code Editor and Terminal"
  - "Notable Apps"
  - "Computer, Desk and Phone"
  - "Recording and Music"

Into new frontmatter that looks like this:

tags: ["dev-business", "dev-environment"]

title: "The Tools I Use"
description: |
  Here's a list of software and hardware that I use on a regular basis as a
  developer and video creator. I will be keeping it updated.

params:
  card: "the-tools-i-use.jpg"
  affiliateLink: true

Here’s some improvements:

  • Single vs double quotes are consistent
  • The layout and toc fields are removed
  • Top level fields are separated with new lines in an easier to skim way
  • Non-Hugo specific fields are namespaced to params
  • The card in the new frontmatter no longer includes blog/cards/ in the path

There’s other changes too, such as changing a field name from bottom_cta to newsletter. That field wasn’t included in the above example.

Here’s some Python code you can re-purpose for your set up. Its only dependency outside of the standard library is PyYAML (the yaml import). I’ve marked up the code with extra comments to better explain it:

import os
import re
import yaml

# Create a mini-template for how I want the new frontmatter to look.
hugo_frontmatter = """
tags: $tags

title: "$title"
description: |
  $description

params:
  card: "$card"
  newsletter: "$newsletter"
  affiliateLink: $affiliateLink
"""

# Regex to match frontmatter.
regex = re.compile(r"---([\S\s]*)---", re.MULTILINE)

# Loop over all blog posts.
dir = os.fsencode("./content/blog")

for file in os.listdir(dir):
  filename = os.fsdecode(file)

  if filename.endswith(".md"):
    with open(f"./content/blog/{filename}") as f:
      content = f.read()

      frontmatter = re.findall(regex, content)[0]

      # This protects against multiple frontmatter matches in case you use ---
      # later on in a post which is unrelated to its frontmatter, such as <hr>.
      frontmatter = frontmatter.split("---", 1)[0]

      yaml_content = yaml.safe_load(frontmatter)

      replaced_content = content
      # Start each file from a fresh copy of the template.
      new_frontmatter = hugo_frontmatter
      has_bottom_cta = False
      has_affiliate_link = False

      # Start building up the new frontmatter from the old frontmatter.
      if yaml_content.get("tags"):
        tags = []

        for tag in yaml_content["tags"]:
          tag_str = '"' + tag + '"'
          tags.append(tag_str)

        new_frontmatter = new_frontmatter.replace("$tags", f'[{", ".join(tags)}]')

      if yaml_content.get("bottom_cta"):
        has_bottom_cta = True
        new_frontmatter = new_frontmatter.replace("$newsletter", yaml_content["bottom_cta"])

      if yaml_content.get("has_affiliate_link"):
        has_affiliate_link = True
        new_frontmatter = new_frontmatter.replace("$affiliateLink", str(yaml_content["has_affiliate_link"]).lower())

      if yaml_content.get("title"):
        new_frontmatter = new_frontmatter.replace("$title", yaml_content["title"])

      if yaml_content.get("card"):
        new_frontmatter = new_frontmatter.replace("$card", yaml_content["card"])

      if yaml_content.get("description"):
        new_frontmatter = new_frontmatter.replace("$description", f'{yaml_content["description"]}')

      if not has_bottom_cta:
        new_frontmatter = new_frontmatter.replace('  newsletter: "$newsletter"\n', "")

      if not has_affiliate_link:
        new_frontmatter = new_frontmatter.replace("  affiliateLink: $affiliateLink\n", "")

      # Your new frontmatter will be included here.
      replaced_content = replaced_content.replace(frontmatter, new_frontmatter)

Out of ~500 posts this updated every single one without issues. This was one of those times where spending an hour or whatever it was to get this automated and ironed out was faster than doing it manually.

I kept printing the new_frontmatter until I was sure it was fully working. Once it was working I modified the Python code to save the file with the replaced_content.
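The script above writes the description on a single line. If you want the wrapped block style shown in the target frontmatter, textwrap from the standard library can do it. This is a sketch: the 80 column width and the helper name are assumptions, not something taken from the original script.

```python
import textwrap


def wrap_description(description: str, width: int = 80) -> str:
    """Wrap a description for a YAML block scalar indented by 2 spaces."""
    wrapped = textwrap.fill(
        description,
        width=width,
        initial_indent="  ",
        subsequent_indent="  ",
    )

    # The template already supplies the first line's indent ("  $description"),
    # so only the continuation lines keep theirs.
    return wrapped.lstrip()


description = (
    "Here's a list of software and hardware that I use on a regular basis as a "
    "developer and video creator. I will be keeping it updated."
)
print("description: |")
print("  " + wrap_description(description))
```

You would pass the result to the $description replacement instead of the raw string.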

Updating Hugo Versions with Confidence

Given I was stuck using an old version of Jekyll I haven’t changed versions in many years. With Hugo this is going to be a different story. I imagine I’ll update reasonably often.

While Hugo has an extensive test suite, I feel a little uneasy updating to a newer version and not having any insight on if my page’s output changed. It would be like shipping application code to production without any tests.

I’m a big fan of the run script approach for adding project specific aliases or functions.

I have a ./run publish command which builds a production version of the site. It doesn’t push it anywhere, it only builds the site locally.

I added this code at the end of that command:

  local backup_path="/tmp/published/"

  if [ -n "${backup}" ]; then
    rm -rf "${backup_path}"
    cp -a published/ "${backup_path}"
  fi

The basic idea is if I ./run publish --backup it will copy the final output to a temp directory. Now if I update Hugo to a new version and ./run publish again I can diff the published/ (new version) and /tmp/published/ (old version) directories to see if anything changed.

You can think of this as 1 big end-to-end test that’s really fast.
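If you'd rather script the comparison than eyeball a diff, Python's filecmp module can walk both directories. Here's a minimal sketch with a made-up function name. One thing to watch: dircmp skips a few directory names by default, including "tags", so the sketch passes ignore=[] to avoid silently skipping a blog's tag pages.

```python
import filecmp
from pathlib import Path


def changed_files(old_dir: str, new_dir: str) -> list:
    """List files that differ or only exist on one side of the comparison."""
    differences = []

    def walk(comparison: filecmp.dircmp, prefix: Path) -> None:
        for name in comparison.left_only:
            differences.append(f"only in old: {(prefix / name).as_posix()}")
        for name in comparison.right_only:
            differences.append(f"only in new: {(prefix / name).as_posix()}")
        for name in comparison.diff_files:
            differences.append(f"changed: {(prefix / name).as_posix()}")
        for name, sub_comparison in comparison.subdirs.items():
            walk(sub_comparison, prefix / name)

    # ignore=[] turns off the default ignore list (it contains "tags").
    walk(filecmp.dircmp(old_dir, new_dir, ignore=[]), Path(""))
    return sorted(differences)
```

Calling changed_files("/tmp/published", "published") returns an empty list when the two builds match. Files whose size and modification time are both identical are assumed equal without reading them, which is fine for a freshly rebuilt site.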

That gives me peace of mind that something didn’t malfunction and will also help me understand what is changing between Hugo versions. I’ve already updated from v0.130.0 to v0.131.0 using this approach and saw no changes which is the outcome I was expecting.

Over the last 6+ months I’ve done this a few times. It’s all good! I am currently running v0.144.2 but that will be upgraded over time.

This strategy also helped me uncover that Hugo injects a meta tag on your home page with its version such as <meta name="generator" content="Hugo 0.144.2">.

You can disable that by adding disableHugoGeneratorInject: true in your Hugo config file. I disabled it since I don’t like the idea of leaking internal implementation details of how the site was generated, although technically in this case it’s only static files, it’s a matter of principle. I have the version defined as a Docker build arg so it’s easy to see in development.

I even ended up opening a pull request on Docker’s documentation to remove that tag. I don’t think it’s well known that Hugo does this.

Detecting default templates I’ve changed

Another thing to consider is I locally copied a few default Hugo pagination, RSS and sitemap templates and customized them. When updating Hugo versions it’ll be important to detect if I need to update those templates due to something changing.

I created (3) run script functions to check them:

# This is a private helper function.
function _diff_templates {
  local local_path="layouts/${1:-}"
  local remote_path="${2:-}"
  local tag="${3:-master}"
  local remote_url="https://raw.githubusercontent.com/gohugoio/hugo/${tag}/tpl/tplimpl/embedded/templates/${remote_path}"

  diff --color -u <(curl --silent "${remote_url}") "${local_path}"
}

function diff:pagination {
  _diff_templates "partials/pagination.html" "_partials/pagination.html" "${1:-master}"
}

function diff:rss {
  _diff_templates "_default/rss.xml" "rss.xml" "${1:-master}"
}

function diff:sitemap {
  _diff_templates "_default/sitemap.xml" "sitemap.xml" "${1:-master}"
}

Now when I run ./run diff:sitemap it checks my local template vs what’s on Hugo’s master branch. I can also run ./run diff:sitemap vXXX if I want to check that remote tag instead of master when I’m doing a version update.

Here’s an example output:

$ run diff:sitemap
--- /dev/fd/63  2024-08-05 08:52:14.751901664 -0400
+++ layouts/_default/sitemap.xml        2024-08-03 15:47:12.785479481 -0400
@@ -4,19 +4,19 @@
   {{ range where .Pages "Sitemap.Disable" "ne" true }}
     {{- if .Permalink -}}
   <url>
-    <loc>{{ .Permalink }}</loc>{{ if not .Lastmod.IsZero }}
+    <loc>{{ .Permalink | strings.TrimSuffix "/" }}</loc>{{ if not .Lastmod.IsZero }}
     <lastmod>{{ safeHTML ( .Lastmod.Format "2006-01-02T15:04:05-07:00" ) }}</lastmod>{{ end }}{{ with .Sitemap.ChangeFreq }}
     <changefreq>{{ . }}</changefreq>{{ end }}{{ if ge .Sitemap.Priority 0.0 }}
     <priority>{{ .Sitemap.Priority }}</priority>{{ end }}{{ if .IsTranslated }}{{ range .Translations }}
     <xhtml:link
                 rel="alternate"
                 hreflang="{{ .Language.LanguageCode }}"
-                href="{{ .Permalink }}"
+                href="{{ .Permalink | strings.TrimSuffix "/" }}"
                 />{{ end }}
     <xhtml:link
                 rel="alternate"
                 hreflang="{{ .Language.LanguageCode }}"
-                href="{{ .Permalink }}"
+                href="{{ .Permalink | strings.TrimSuffix "/" }}"
                 />{{ end }}
   </url>
     {{- end -}}

When I see that, I know it only shows my expected changes of removing trailing slashes. If I saw anything else in that output I would know I need to go back and update my local template.
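The same comparison can be sketched in Python with difflib, which produces the unified format shown above. Fetching the upstream template is left out so the example runs offline. The function name and the upstream/ label are made up for illustration.

```python
import difflib


def diff_templates(upstream: str, local: str, name: str) -> str:
    """Return a unified diff between an upstream template and a local copy."""
    diff = difflib.unified_diff(
        upstream.splitlines(keepends=True),
        local.splitlines(keepends=True),
        fromfile=f"upstream/{name}",
        tofile=f"layouts/{name}",
    )
    return "".join(diff)


# One line from each version of the sitemap template.
upstream = "<loc>{{ .Permalink }}</loc>\n"
local = '<loc>{{ .Permalink | strings.TrimSuffix "/" }}</loc>\n'
print(diff_templates(upstream, local, "sitemap.xml"))
```

An empty string means the local copy matches upstream exactly.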

Additional Quality Control After 6+ Months

Over the last 6+ months I ran into a few interesting things not covered above.

Running out of disk space on my dev box

My dev box is 10+ years old. My WSL 2 instance is on my primary Windows drive which is a 256 GB SSD. Long story short I ran out of disk space mid-build. I knew not to push it because I saw the error but still adding more layers of protection is worth it.

This can be a problem because the site will incrementally be built. If you run out of space mid-way, you could end up with a really broken site.

In my run script to publish, I added this condition to prevent that. The amount will need to be adjusted for your site but I know for sure mine doesn’t require more than 1 GB to build:

  # /c in this case is my drive's mount point.
  if [ "$(df --output=avail /c | tail -n 1)" -lt 1000000 ]; then
    echo "Aborting, there's not enough disk space to complete this build!"
    exit 1
  fi
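
If your publish step is written in Python instead of Bash, shutil.disk_usage gives you the same guard. This is a sketch with a made-up function name, and the 1 GB default is only there to mirror the check above.

```python
import shutil
import sys


def has_enough_space(path: str, required_bytes: int = 1_000_000_000) -> bool:
    """Check that the drive holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # "." is a placeholder, point it at the drive your build writes to.
    if not has_enough_space("."):
        print("Aborting, there's not enough disk space to complete this build!")
        sys.exit(1)
```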

Parallel build race conditions

Hugo will do parallel processing to build your site across all of your CPU cores by default. I ran into what I think are a number of race conditions with this. I ended up with partially rendered link tags because relref didn’t work correctly, which resulted in broken links and YouTube embeds.

This is a scary one because there’s no warnings or errors thrown by Hugo. You can only notice it by happening to see the page’s output.

I was able to reproduce it many times in development and production builds but I cannot reproduce it consistently so the author of Hugo didn’t accept the bug report.

I ended up setting the HUGO_NUMWORKERMULTIPLIER=1 env variable when I do production builds to disable parallel processing. I’ve done hundreds of builds with this set up and it never happened again as far as I know. Builds are already really fast.

I added both of these checks to my publish script. Both grep commands look for different types of failures that I’ve seen first hand. Each path is a location where links can be present:

  local paths=("published/blog" "published/confirmed" "published/courses" "published/galleries" "published/newsletter" "published/podcast" "published/subscribed" "published/unsubscribed" "published/work-together")

  # Note to reader: replace () with [], I had to replace it for this blog post
  # otherwise my site wouldn't build because the pattern was being found. :D
  if grep -R "\(.*\)/" ${paths[@]}; then
    echo
    echo "Aborting, this build produced broken Markdown links!"
    exit 1
  fi

  # I know this can be improved but it runs in ~20ms on 10+ year old hardware.
  #
  # Note to reader: replace 5 with %, I had to replace it for this blog post
  # otherwise my site wouldn't build because the pattern was being found again!
  if grep -R "5{" ${paths[@]} | grep -v "5{\$fg" | grep -v "5{}" | grep -v ";5{"; then
    echo
    echo "Aborting, this build produced broken Markdown links!"
    exit 1
  fi

I don’t know if it’s 100% fool-proof but I haven’t gotten any reports from anyone saying they found something busted on my site and neither regex has returned a match since I disabled parallel builds. For the first month or so of builds I spot checked at least 20 pages per deploy and found nothing out of the ordinary.
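Here's a Python sketch of the same kind of guard: scan the published output for patterns that should never survive a build. The patterns are supplied by the caller, and the one in the test data (an unprocessed shortcode opener) is only an example, not one of the 2 checks above. Limiting the scan to .html files is also an assumption.

```python
import re
from pathlib import Path


def find_leftovers(root: str, patterns: list) -> list:
    """Return (relative path, line number, line) for every suspicious line."""
    compiled = [re.compile(pattern) for pattern in patterns]
    hits = []

    for path in sorted(Path(root).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")

        for number, line in enumerate(text.splitlines(), start=1):
            if any(regex.search(line) for regex in compiled):
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))

    return hits
```

A publish script could call find_leftovers("published", [...]) and abort with a non-zero exit code when the list isn't empty.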

I’m not happy about this workaround but I feel ok that things are building ok.

# Safely Performing the Switch Live

I sync’d everything to a separate directory on my server so I didn’t overwrite what’s there. I also created a separate nginx vhost file on a separate hugo.nickjanetakis.com sub-domain so I could privately test things without touching my real site.

I even went as far as installing nginx locally to pre-test a few new redirect rules.

By the way, one difference with Hugo and pretty URLs is it generates an index.html file inside of a directory named after your content. For example blog/hello-world/index.html. Jekyll would have generated blog/hello-world.html.

To account for that I adjusted my nginx try_files to try_files $uri/index.html $uri.html $uri $uri/ =404; so it can find them.
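To see why that order finds both layouts, here's a simplified Python model of the lookup. It is not how nginx is implemented, it only mirrors the candidate order and leaves out the trailing $uri/ directory check. The function name and file sets are made up.

```python
from typing import Optional


def resolve(uri: str, existing_files: set) -> Optional[str]:
    """Return the first candidate that exists, or None for a 404."""
    for candidate in (f"{uri}/index.html", f"{uri}.html", uri):
        if candidate in existing_files:
            return candidate
    return None


hugo_files = {"/blog/hello-world/index.html"}
jekyll_files = {"/blog/hello-world.html"}

print(resolve("/blog/hello-world", hugo_files))
print(resolve("/blog/hello-world", jekyll_files))
```

The same request resolves against either generator's output, which is handy while both versions of the site still exist on the server.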

Once I was happy everything worked I backed up my original nginx virtual host file, copied over the new one and renamed the server_name to point to my root domain. Then I deleted the A record for hugo because forgetting to do that in the past led to very bad things.

At this point I was live and if I wanted to revert, it would have taken a few seconds by switching the nginx config back to the original.

I didn’t have to revert. Everything worked on the first try.

The only issue I’m aware of was I forgot to diff my atom.xml file like I did for the sitemap. Thankfully a reader of the blog emailed me and let me know about a day after it went live.

The Hugo version included all pages whereas the Jekyll version only included the latest 25 blog posts. Needless to say he was wondering what happened when a bunch of non-blog post pages appeared in his feed.

I fixed that very quickly and it’s been smooth sailing as far as I can tell. If you find something that looks off, please let me know!

In the last 6+ months I haven’t noticed anything else being off. The amount of preparation and treating this as a detail oriented problem really paid off in the end.

# Final Thoughts

In the end I’m happy I made the switch and would do it again but I can’t say that I’m 100% happy with Hugo. Like any tech choice, it’s nearly impossible to find something that’s perfect. I think Hugo has enough pros that it’s worth using if you have a decent sized site.

I didn’t have any real experience with Go prior to this. I appreciate the explicitness of its templating language but at the same time it’s really verbose at times and has a few confusing (to me) rules and foot guns. I generally prefer Liquid over it for now but I’m willing to give it a chance.

I also like how Jekyll is less restrictive on how and where you can use certain features. For example Hugo doesn’t let you use partials in your content files, shortcodes are only available in your content files and render hooks only apply to content files.

What this really boils down to is you have to duplicate a number of things between layouts and content files and also sometimes create both a shortcode and partial just so you can call the shortcode which in turn calls the partial due to this restriction.

Sometimes you can’t use either. For example my nav bar has links and I have to hard code URLs like /blog and /about as the href="" value because you can’t use relref in a layout.

Lastly Jekyll lets you define frontmatter in your layouts which makes it very easy to make derivative layouts that are slightly different. Hugo doesn’t let you do that so you have to take more convoluted approaches to solve the same problem.

This really reminds me of Ruby’s “sharp knives” philosophy. You can write clean code and if you need to reach for something sharp to solve a specific problem the language has ways to do it and trusts that you know what you’re doing.

Go has a different philosophy and it leaks into the tools being created with it. Hugo feels like the author made certain design decisions and you have to stick to them how they are.

In Hugo’s defense it does have a lot of features and I haven’t deeply explored each one. There’s no way I would call myself an expert. It’s also a very well thought out project and I can appreciate all of the hard work, experience and design that went into creating it.

My only concern with Hugo is if I want to do something custom, I’m not sure it’ll be possible with what’s available.

For example on my Running in Production podcast site, when I display the show note topics in a list I include clickable timestamps which seek straight to that second in the audio player. That’s automated by a Jekyll plugin.

That site is also made with Jekyll. Each podcast post has Markdown that looks like this:

## Topics Include

- 7:13 -- The process to build a custom piece of hardware
- 12:33 -- 3D printing a custom case
- 15:16 -- The assembly process and selling about 150-200ish devices a month

I wrote a tiny Jekyll plugin to convert the above into what you see on the site:

# Developed by Nick Janetakis - https://nickjanetakis.com
#
# Usage in layouts: {{ content | audio_seek }}

require "jekyll"
require "nokogiri"

def hh_mm_ss_to_seconds(human_time)
  # This accepts hh:mm:ss, mm:ss or ss.
  human_time.split(":").map { |a| a.to_i }.inject(0) { |a, b| a * 60 + b}
end

module Jekyll
  module AudioSeek
    def audio_seek(content)
      doc = Nokogiri::HTML.fragment(content)

      # Stop if we couldn't parse with HTML.
      return content unless doc

      # Note to reader: I had to replace / with | in (2) spots in the line below
      # to prevent the site building regex from blocking this. This is the first
      # time a false positive came up in 9 months. I will come up with a more
      # long term solution but I will leave this in to demonstrate how that
      # original regex could have false positives.
      doc.xpath('h2[contains(text(), "Topics Include")]|following-sibling::ul[1]|li').each do |li|
        original_inner_html = li.inner_html
        split_delimiter = " – "
        original_inner_html_split = original_inner_html.split(split_delimiter)

        seek_time_human = original_inner_html_split.first
        seek_time_seconds = hh_mm_ss_to_seconds(seek_time_human)

        rest = original_inner_html_split.last

        seek_link = "<a href='#' data-audio-seek='#{seek_time_seconds}'
                        class='audio-seek'>#{seek_time_human}</a>"

        new_inner_html = "#{seek_link}#{split_delimiter}#{rest}"

        li.inner_html = new_inner_html
      end

      doc.to_s
    end
  end
end

Liquid::Template.register_filter(Jekyll::AudioSeek)

At a high level, all that’s doing is:

  • Look for that “Topics Include” heading
  • Grab the next <ul> so this logic only applies to topics
  • Split on --
  • Convert the 7:13 into seconds
  • Insert a link with a data attribute around the 7:13 text
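
If you don't read Ruby, the core of those steps can be sketched in Python. This is a port for illustration only, not a Hugo solution, and it works on a single list item's text instead of parsing HTML. The add_seek_link name is made up.

```python
def hh_mm_ss_to_seconds(human_time: str) -> int:
    """Convert hh:mm:ss, mm:ss or ss into a number of seconds."""
    seconds = 0
    for part in human_time.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def add_seek_link(item: str, delimiter: str = " – ") -> str:
    """Wrap the leading timestamp of a topic in a seek link."""
    # Split once so a delimiter later in the topic text is left alone.
    human_time, rest = item.split(delimiter, 1)
    seconds = hh_mm_ss_to_seconds(human_time)
    link = (
        f"<a href='#' data-audio-seek='{seconds}' "
        f"class='audio-seek'>{human_time}</a>"
    )
    return f"{link}{delimiter}{rest}"


print(add_seek_link("7:13 – The process to build a custom piece of hardware"))
```

The multiply by 60 loop handles all 3 timestamp formats because each extra segment shifts the running total up by one unit.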

If anyone is an expert with Hugo, I’d love to see how you can do the same with Hugo.

At the time of this post, there’s a render hook for links and headings but not lists. Even if there was one, you’d still only want to scope this transformation to a specific list, not all of them. I know it can be solved with JavaScript but I don’t want to do that logic client side.

The video below mostly goes over this post with a little more color and I do jump into the blog’s code base from time to time.

# Video

Timestamps

  • 0:37 – Why did I switch?
  • 6:21 – Refactor or grand rewrite?
  • 9:49 – Solving the big problems before I start
  • 16:04 – Switching themes too?
  • 18:46 – High level features
  • 27:44 – Hello world, Hugo is fast
  • 32:14 – Learning Hugo quickly
  • 33:53 – Hugo vs Jekyll terms
  • 41:39 – YYYY-MM-DD filenames but not URLs
  • 44:45 – Splitting posts and drafts
  • 49:28 – Removing trailing slashes
  • 53:06 – Custom file digests
  • 56:45 – Syntax highlighting
  • 59:11 – Latest N posts
  • 1:02:06 – Apostrophes and symbols
  • 1:05:30 – Beware of leading spaces
  • 1:10:04 – Customized blog tags
  • 1:13:04 – Looping over data files
  • 1:17:01 – Render hook for external links
  • 1:20:06 – Render hook for named anchor headings
  • 1:24:17 – Table of contents
  • 1:29:47 – Sitemap
  • 1:32:46 – Twitter embeds
  • 1:35:42 – YouTube embeds
  • 1:37:23 – A checklist to double check your work
  • 1:40:56 – Generate new frontmatter with Python
  • 1:44:31 – Updating Hugo with confidence
  • 1:47:08 – Detecting default template changes
  • 1:50:13 – Running out of disk space
  • 1:52:17 – Parallel build race conditions
  • 1:56:36 – Safely switching your live site
  • 2:01:17 – Final thoughts
  • 2:04:58 – Not sure about custom behaviors

Did you migrate to Hugo? How did it go? Let me know below.
