Using sed Range Patterns, grep and tr to Parse a Changelog File
In this video, we'll pipe together a few Unix tools to parse out the changes for a specific release from a Markdown-based changelog file.
I’m a big fan of using the command line to solve problems that require parsing text from files. By the end of this video you’ll know how to break down text parsing problems and how to use a few Unix tools together to solve problems like the one covered in this video.
In this specific video, the goal is to parse a CHANGELOG.md file so that you can input a specific release tag and get back a list of bullet points associated with that release. This information could then be sent to Slack, email or wherever you want as part of a CI / CD pipeline.
Since we’ll cover both the “why” and the “how”, you’ll see how to apply the same strategies to your own text-related problems, not just the one in the video. Who knows, you might even wind up wanting to do the same thing I’m doing here with your own changelog file.
# Building Up the Command Pipeline
The Command
# Original script used in the video, which has a subtle bug when using tr.
sed -n "/^## v1.9.2$/,/^## /p" CHANGELOG.md \
| grep -E "^(-|\s+)" \
| tr "-" ">" \
| sed ":a $!N;s/\n[ \t]\+/ /;ta P;D"
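# A revised version of the script that is stricter about replacing - with >:
# it only does so when the line starts with -. This prevents replacing a hyphen
# that happens to exist in the middle of the line or anywhere else in the
# bulleted item. The last sed script is single quoted here so the shell passes
# $!N through untouched; inside double quotes, $! expands to the PID of the
# most recent background job.
sed -n "/^## v1.9.2$/,/^## /p" CHANGELOG.md \
| grep -E "^(-|\s+)" \
| sed "s/^-/>/" \
| sed ':a $!N;s/\n[ \t]\+/ /;ta P;D'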
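If you want to call this from a script or a CI job, one option is to wrap the revised pipeline in a small Bash function. This is only a minimal sketch and not something shown in the video: the `release_notes` name and its arguments are made up for illustration, and it assumes Bash and a changelog whose release headings look like `## v1.9.2`. It also escapes the dots in the tag and swaps `\s` and `\+` (both GNU extensions) for `[[:space:]]`, which is my own tweak rather than part of the original command.

```shell
#!/usr/bin/env bash

# Hypothetical helper (name and arguments are assumptions for this sketch).
# Prints the bullets of one release as Markdown quotes.
release_notes() {
  local tag="$1"
  local file="${2:-CHANGELOG.md}"

  # Escape dots so "v1.9.2" is matched literally instead of as "any character".
  local pattern="${tag//./\\.}"

  # 1. Print from the release heading through the next "## " heading.
  # 2. Keep bullet lines and their indented continuation lines.
  # 3. Turn only the leading hyphen into a Markdown quote marker.
  # 4. Join hard-wrapped continuation lines onto the bullet above them.
  sed -n "/^## ${pattern}\$/,/^## /p" "$file" \
    | grep -E '^(-|[[:space:]]+)' \
    | sed 's/^-/>/' \
    | sed -e ':a' -e '$!N' -e 's/\n[[:space:]][[:space:]]*/ /' -e 'ta' -e 'P' -e 'D'
}
```

With that in place, `release_notes v1.9.2 CHANGELOG.md` should print one `>` quoted line per bullet for that release, with hard-wrapped bullets pulled up onto a single line.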
Timestamped Table of Contents
- 1:10 – First order of business? Break down the problem and find patterns
- 3:45 – Thinking about edge cases
- 4:59 – Downloading the CHANGELOG file to follow along if you want
- 5:57 – Beginning to solve the problem by using a sed range pattern
- 10:49 – Using grep to only get lines that are bullet points
- 11:37 – Using tr to transform bullet points into Markdown quotes
- 12:36 – Dealing with bullets that are hard wrapped into multiple lines
- 14:28 – Introducing an alien sed command to pull up non-hyphen lines
- 16:41 – The command line is super helpful for solving various text problems
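To see what each of the steps listed above contributes, here is a rough walkthrough that saves every stage's output in its own variable. The sample changelog is invented for this example (it is not the real ansible-docker file) and the `step…` variable names are only illustrative. As in any portable rewrite, `[[:space:]]` stands in for the GNU-only `\s` and `\+`.

```shell
#!/usr/bin/env bash

# Made-up sample changelog, used only for this walkthrough.
changelog="$(mktemp)"
cat > "$changelog" <<'EOF'
# Changelog

## v1.9.2

*Released: (date)*

- Fix a long-standing bug in the install task
  that only showed up on fresh servers
- Update Docker Compose

## v1.9.1

- Older change
EOF

# 1. Range pattern: print from the v1.9.2 heading through the next "## " line.
step1="$(sed -n '/^## v1\.9\.2$/,/^## /p' "$changelog")"

# 2. Keep bullets and their indented continuation lines. Headings, blank
#    lines and the release date line do not match, so they are dropped.
step2="$(printf '%s\n' "$step1" | grep -E '^(-|[[:space:]]+)')"

# 3a. tr swaps every hyphen, including the one inside "long-standing".
step3_tr="$(printf '%s\n' "$step2" | tr '-' '>')"

# 3b. Anchoring the substitution with ^ only touches the leading bullet.
step3_sed="$(printf '%s\n' "$step2" | sed 's/^-/>/')"

# 4. Join hard-wrapped lines onto the bullet above them.
step4="$(printf '%s\n' "$step3_sed" | sed '
  :a
  $!N
  s/\n[[:space:]][[:space:]]*/ /
  ta
  P
  D
')"

printf '%s\n' "$step4"
rm -f "$changelog"
```

The last sed script is the "alien" one. `:a` defines a label, `$!N` appends the next input line unless we're already on the last one, the `s` command swaps a newline plus leading whitespace for a single space, and `ta` loops back to the label if that substitution happened. Once the next line turns out to be a new bullet, `P` prints the finished line and `D` drops it and starts over with what's left. Comparing `step3_tr` with `step3_sed` shows the tr bug: the tr version ends up with `long>standing`.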
Reference Links
- https://github.com/nickjj/ansible-docker/blob/master/CHANGELOG.md
- https://stackoverflow.com/a/17022177
What type of problems have you solved with sed and grep? Let me know below.