Juha-Matti Santala
Community Builder. Dreamer. Adventurer.

🍿 Deduplicate feeds from NetNewsWire

Snacks (🍿) are my collection of recipes for solving problems. They're often recorded (and cleaned up) from actual discussions where I've been helping others with technical problems. Not all of the solutions are mine, but I started collecting them in one place because you never know when you'll need one.


I use [NetNewsWire] as my "lobby" RSS reader: it's a place where I add a bunch of feeds, usually from blogging challenges via OPML files, and read them there before they make it to my main Feedly feed.

However, as the years go by, the same blogs appear multiple times and that's annoying. When I imported the Blaugust 2024 participant OPML into my feeds, I ended up with quite a few duplicates.

I don't have the patience to go through them manually so I started exploring programmatic options.

NetNewsWire (on a Mac) stores the feed information in the file ~/Library/Containers/com.ranchero.NetNewsWire-Evergreen/Data/Library/Application Support/NetNewsWire/Accounts/OnMyMac/Subscriptions.opml. The OPML format is XML, so I was able to take advantage of the handy xmllint tool that comes pre-installed on Macs.
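For context, the relevant structure of such a file looks roughly like this (a hand-written sketch with a made-up feed, not my actual subscriptions):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.1">
  <head>
    <title>Subscriptions</title>
  </head>
  <body>
    <!-- each feed is an <outline> element; the feed address lives
         in its xmlUrl attribute -->
    <outline text="Example Blog" title="Example Blog" type="rss"
             xmlUrl="https://example.com/feed.xml"
             htmlUrl="https://example.com/"/>
  </body>
</opml>
```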

Step 0, take backups

It's a good idea to take a backup of the Subscriptions file so that manually messing with it doesn't ruin your RSS reader completely.

cp Subscriptions.opml Subscriptions.opml.bak

Step 1, find all the feed URLs

xmllint --xpath "//outline/@xmlUrl" Subscriptions.opml

With --xpath, we can give it an XPath expression and it will return everything it finds. //outline matches all <outline> elements anywhere in the document and /@xmlUrl selects the xmlUrl attribute of each.
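To see it in action, here's the same query run against a tiny throwaway OPML file (the feeds are made up for the demo):

```shell
# Create a minimal throwaway OPML file to query
cat > /tmp/demo-subscriptions.opml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.1">
  <body>
    <outline text="A" type="rss" xmlUrl="https://example.com/a.xml"/>
    <outline text="B" type="rss" xmlUrl="https://example.com/b.xml"/>
  </body>
</opml>
EOF

# Each matched attribute is printed in the form: xmlUrl="..."
xmllint --xpath "//outline/@xmlUrl" /tmp/demo-subscriptions.opml
```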

Step 2, find duplicates

xmllint --xpath "//outline/@xmlUrl" Subscriptions.opml | sort | uniq -c | sort --numeric-sort

I used a classic shell trick of sorting the list alphabetically, then collapsing exact matches and counting them (with the -c flag) and sorting again, this time based on the number of times we saw each URL.
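The trick is easiest to see with a hypothetical input, standing in for the xmlUrl lines from step 1:

```shell
# One URL appears twice; after sorting, uniq -c collapses the
# identical adjacent lines and prefixes each with its count, and
# the final numeric sort puts the most-duplicated feeds last.
printf '%s\n' \
  'xmlUrl="https://example.com/a.xml"' \
  'xmlUrl="https://example.com/b.xml"' \
  'xmlUrl="https://example.com/a.xml"' |
  sort | uniq -c | sort --numeric-sort
# the a.xml line ends up last, with a count of 2
```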

Step 3, only show duplicates

To only show ones that have values higher than 1, I add a bit of awk at the end:

xmllint --xpath "//outline/@xmlUrl" Subscriptions.opml | sort | uniq -c | sort --numeric-sort | awk -F' ' '{if($1>1)print$2}'

The awk command defines the separator to be a space (-F' '), then checks if the first field is larger than 1 (if($1>1)) and, if it is, prints the second field (print$2).
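With a hypothetical counted list from step 2, the awk filter picks out just the duplicated URL:

```shell
# Only the line with a count above 1 survives, and only its
# second field (the URL part) is printed.
printf '%s\n' \
  '   1 xmlUrl="https://example.com/b.xml"' \
  '   2 xmlUrl="https://example.com/a.xml"' |
  awk -F' ' '{if($1>1)print$2}'
# prints: xmlUrl="https://example.com/a.xml"
```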

Step 4, remove duplicates from file

I did step 4 manually, by opening the Subscriptions.opml file in a code editor and searching for the short list of duplicates, removing the older entries.

Step 5, some are duplicates but with slightly different URLs

Sometimes the exact feed URL has changed over time, so I also wanted to clear duplicates based on the title of the feed.

To do that, I changed the first part of the command to:

xmllint --xpath "//outline/@title" Subscriptions.opml

and the final part (awk) to

grep -ve "^\s*1 "

since titles can contain spaces, I couldn't rely on space as the separator for awk. What this does is find all the lines where

  • the line does not (-v) start
  • with the regular expression (-e) of
  • ^\s*1 followed by a space
    • ^ = start of line
    • \s* = any number of whitespace characters
    • 1 = the number 1
    • the trailing space makes sure that larger counts starting with 1 (like 10 or 12) aren't filtered out by accident

(One could use this instead of the awk in step 3 as well, I left both of them here for the sake of learning new tools.)
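Here's the grep variant run on a hypothetical counted list of titles. Note the trailing space after the 1 in the pattern; without it, two-digit counts that happen to start with 1 (like 10) would be dropped by accident too:

```shell
# -v inverts the match, so lines whose count is exactly 1 are
# removed and only the duplicates remain.
printf '%s\n' \
  '   1 title="Unique Blog"' \
  '   2 title="Duplicated Blog"' |
  grep -ve "^\s*1 "
# prints:    2 title="Duplicated Blog"
```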

I then did step 4 again, but this time with the duplicate titles.

Step 6, restart NetNewsWire

To make NetNewsWire reload the feeds, restart it.

No more duplicates, yay!