Delete Lines That Match a Pattern or the Opposite Pattern Using sed
This can be useful for removing unwanted lines from a file or command output, such as when you process a file and write out a new one to analyze.
The TL;DR is:
- Matching a pattern:
sed "/PATTERN/d" your_file
- Matching the opposite (invert) pattern:
sed "/PATTERN/!d" your_file
(notice the ! before the d)
For example, here’s how to delete all lines that don’t start with a space:
sed "/^ /!d" demo_file
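As a quick way to try both forms, here is a minimal sketch that runs them against a throwaway file. The file contents are made up for illustration. It uses single quotes around the sed expression because, in an interactive Bash session with history expansion enabled, a ! inside double quotes may be treated as a history reference; inside a script either quoting style behaves the same. The last command shows that sed -n combined with the p command selects the same lines as the inverted delete.

```shell
# demo_file is a hypothetical sample created only for this illustration.
demo_file="$(mktemp)"
printf '%s\n' 'Matching: foo' ' indented one' 'plain line' ' indented two' > "${demo_file}"

# Delete the lines that match the pattern (lines starting with a space).
deleted_matching="$(sed '/^ /d' "${demo_file}")"

# Delete the lines that do NOT match the pattern (keep only indented lines).
kept_matching="$(sed '/^ /!d' "${demo_file}")"

# Same selection written as "print only the matching lines".
printed_matching="$(sed -n '/^ /p' "${demo_file}")"

printf 'Without indented lines:\n%s\n\n' "${deleted_matching}"
printf 'Only indented lines:\n%s\n' "${kept_matching}"

rm "${demo_file}"
```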
# A Real World Example
I did this recently when writing a one-off script for a client to help identify all of the spots in their code base where they were referencing CodeIgniter config items. It’s a 10+ year old code base with thousands of config item references that had over a dozen different call styles.
So I wrote a script that used grep to scan their code base using a combination of greedy and more specific regular expression patterns.
Most config items are referenced with this pattern: $this->config->item(.*), but in practice there were over a dozen different call styles.
I started with a very greedy match like config-> to identify a bunch, including false positives. Then I tightened it up. Here are a few example call style patterns:
$CI->config->item(.*)
$this->config->item(.*)
$this->ci->config->item(.*)
$this->CI->config->load(.*)
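To make the greedy versus specific idea concrete, here is a rough sketch of how the two kinds of grep searches might look. The sample PHP file, the directory and the exact regular expression are assumptions made for this illustration, not the patterns from the client’s script.

```shell
# Hypothetical sample code base, created only for this illustration.
src_dir="$(mktemp -d)"
cat > "${src_dir}/sample.php" <<'EOF'
<?php
$a = $this->config->item('hello');
$b = $CI->config->item('world');
$c = $this->CI->config->load('nice');
// TODO: document how config-> lookups work (false positive)
EOF

# Greedy: a fixed-string search (-F) for anything mentioning "config->".
greedy_matches="$(grep -rnF 'config->' "${src_dir}")"

# Specific: an extended regex (-E) covering two of the item() call styles.
specific_matches="$(grep -rnE '\$(this|CI)->config->item\(.*\)' "${src_dir}")"

printf 'Greedy:\n%s\n\nSpecific:\n%s\n' "${greedy_matches}" "${specific_matches}"

rm -r "${src_dir}"
```

The greedy search picks up the comment line and the load() call too, which is the kind of gap the comparison later in this post is meant to surface.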
Long story short, I wrote both the greedy and specific matches to separate files using a format that looks similar to this:
Matching: <INSERT_PATTERN>
  /tmp/some_file.php:100: $this->config->item('hello');
  /tmp/some_file.php:531: $this->config->item('world');
  /tmp/another_file.php:72: $this->CI->config->load('nice');
The real output also included a count for each pattern, but that’s not important here.
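A report in roughly that shape could be produced with something like the sketch below. The write_report function, the sample file and the two-space indent are all assumptions for illustration; the only firm requirement is that result lines start with a space so that sed can later select them with /^ /!d.

```shell
# write_report is a hypothetical helper, not the author's actual script.
# $1 is an extended regex and $2 is the directory to scan.
write_report() {
  printf 'Matching: %s\n' "$1"
  # grep exits with 1 when nothing matches; "|| true" keeps an empty result
  # from aborting a script that runs with errexit and pipefail.
  { grep -rnE -- "$1" "$2" || true; } | sed 's/^/  /'
}

# Hypothetical sample input, created only for this illustration.
src_dir="$(mktemp -d)"
printf '%s\n' '<?php' '$x = $this->config->item("hello");' > "${src_dir}/sample.php"

report="$(write_report 'config->item\(.*\)' "${src_dir}")"
printf '%s\n' "${report}"

rm -r "${src_dir}"
```

Keeping the header unindented and the results indented is what lets a single sed expression strip the headers afterwards.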
Then I used sed, sort and diff to compare both files and show the result.
This let me quickly see the difference between the greedy and specific matches, which made it easy to separate the false positives from legit references that the specific patterns missed.
#!/usr/bin/env bash
# Usage example: ./diff-config-calls greedy_file specific_file
set -o errexit
set -o pipefail
set -o nounset
file_a="${1}"
file_b="${2}"
sed "/^ /!d" "${file_a}" | sort > "${file_a}.processed"
sed "/^ /!d" "${file_b}" | sort > "${file_b}.processed"
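# diff exits with status 1 when the files differ, which would trip errexit
# and skip the cleanup below, so treat that status as success.
diff --color --unified "${file_a}.processed" "${file_b}.processed" || [ "${?}" -eq 1 ]
rm "${file_a}.processed" "${file_b}.processed"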
That produced an easy to read diff. Without writing these types of scripts it would have been ridiculously tedious to go through over a million lines of code with thousands of results.
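To show what that comparison yields on a tiny input, here is a minimal sketch using two made-up report files. The file names and contents are hypothetical, and --color is left out so the captured output stays plain text.

```shell
work_dir="$(mktemp -d)"

# Hypothetical report files with a header line and indented result lines.
printf '%s\n' 'Matching: config->' \
  '  /tmp/b.php:7: // config-> in a comment' \
  '  /tmp/a.php:1: $this->config->item("x");' > "${work_dir}/greedy"
printf '%s\n' 'Matching: $this->config->item(.*)' \
  '  /tmp/a.php:1: $this->config->item("x");' > "${work_dir}/specific"

# Drop the header lines, then sort so both files line up for diff.
sed '/^ /!d' "${work_dir}/greedy" | sort > "${work_dir}/greedy.processed"
sed '/^ /!d' "${work_dir}/specific" | sort > "${work_dir}/specific.processed"

# diff exits with 1 when the inputs differ, so do not treat that as a failure.
diff_output="$(diff -u "${work_dir}/greedy.processed" "${work_dir}/specific.processed" || true)"
printf '%s\n' "${diff_output}"

rm -r "${work_dir}"
```

Lines prefixed with - in the output exist only in the greedy results, so each one is either a false positive or a call style the specific patterns have not covered yet.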
The next step was programmatically converting those CodeIgniter config references into Laravel, since this client was switching to Laravel one component at a time. That goes beyond the scope of this post, but it’s why I wanted to remove all false positives and ensure I caught all of the legit config references.
The video below goes into running the TL;DR examples.
# Demo Video
Timestamps
- 0:09 – Going over the TL;DR example
- 1:12 – A real world use case
What use cases have you applied this to in the past? Let me know below.