Treble clef, released under public domain

Digital audio library

Notes published the
10 - 12 minutes to read, 2392 words
Categories: android backup linux music version control systems windows
Keywords: abcde android backup cmus drm linux music picard version control systems windows

After recovering, I put some thought into how to organize the audio library.

I am not an audiophile. In fact, I normally do not listen to music at all, and I definitely do not sing, except in extremely rare circumstances.

Nevertheless, I like to have an overview of what I have, be it inside my PC or physical objects.

So I could either toss the CD and audio library I have or try to organize it. As I’ve spent so much time recovering the digital archive, I obviously decided not to toss it.

Digitalize everything

The first step was deciding to digitalize all the CDs I own, as it makes it easier to have a consistent overview. The digitalized CDs are currently in a box, unused.

MusicBrainz

This is not the first thing I did, but in retrospect, it should have been the first step: create an account on MusicBrainz.

As it only requires an email address, an account can be totally anonymous.

I suppose there are similar websites. The main reason I created the account is that multiple programs use the MusicBrainz database as a backend for querying metadata about music files, like artist name, album, genre, and so on.

There would have been no reason to create an account if I had only queried the database.

In fact, I had used it that way on multiple occasions for years before reorganizing my music collection.

However, I knew that some of my CDs were not on MusicBrainz, or were incomplete, or were missing the cover art or some other information.

Multiple reasons helped me decide to create an account and invest some time in adding the data I needed:

  • if I ever need to retag my music collection again, I do not need to add the metadata by hand a second time

  • if there are some errors, those might be caught and corrected by others

  • if no metadata is missing, I have one unified way to handle all of it

Within the first couple of days, someone else had already reviewed my first contributions and corrected the typos I had made. Thus it already paid off during the first week.

Most things to do are trivial (adding cover art, missing artist, genre, …​) while adding a new release is less intuitive, but well documented.

Ripping program

The next step was to find a program for copying all discs to the drive of the PC.

Many audio players can do it, but I would have preferred a separate program, for two main reasons.

The first is that I have not decided yet how to manage my music collection, for example, which format to use and how to organize all the files. Also, I might change the music player, and I definitely want to always use the same process for ripping my CDs to avoid, for example, too many differences in audio quality.

My main requirement was that the ripping process should automatically query MusicBrainz and embed at least some pieces of information, and that I would not have to fiddle with too many settings (audio quality, …​).

I decided to use abcde (A Better CD Encoder). I did not make any comparison, but I liked the name. With Debian’s default settings (create .ogg files, search metadata on MusicBrainz) I was happy enough not to change anything.

I was unsure if .ogg is a well-supported format. As far as I could see, it is supported on Android, Rockbox, and PCs. Good enough for me.

Note 📝
Obviously, I have since inherited a device that I like, that does not support .ogg files, and that is not supported by Rockbox. Oh well. ¯\_(ツ)_/¯

I explicitly wanted to avoid using something like FLAC because it simply takes up too much space. Yes, external hard drives might be (relatively) cheap, but the storage of portable MP3 players and phones is, in comparison, small, and loading gigantic files also hurts battery life. Of course, I could simply convert those files to mp3 or something else on the fly, but it would make synchronizing devices much more difficult. For example: do I have all the songs with the corrected metadata on my phone? If the formats are different, I need to somehow compare the metadata by opening all the files. If the format is the same, a binary comparison is sufficient (and also permits finding files that are invalid, or files that were changed by accident).
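As a minimal sketch of such a binary comparison, assuming the library and the device copy use the same folder layout, a recursive diff is enough. The function name compare_trees and the paths in the usage comment are placeholders, not something from my setup.

```shell
#!/bin/sh
# Report files that are missing on one side or that differ byte for byte.
# compare_trees is a hypothetical helper name.
compare_trees() {
    library=$1
    device=$2
    # -r: recurse into sub-folders
    # -q: only report that files differ, do not try to show how
    diff -rq "$library" "$device"
}

# Usage (placeholder paths):
#   compare_trees "$HOME/Music" /media/player/Music
```

The exit status is 0 when both trees are identical and non-zero otherwise, so the function can also be used in a script that refuses to continue when the copies have diverged.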

Currently, it is a simple drag-and-drop from any file manager, or with a diff tool, and I like it being so simple (and fast!).

Picard for organizing music files

While copying all files to my PC, I used Picard to find the exact information about any CD by comparing the barcodes (and I added those that were missing).

I’ve also used Picard’s default folder structure convention: artist name/album name, as it seems a sensible decision, but I’ve preferred to handle such structure manually, as there are a couple of exceptions.

The biggest one is that such a folder structure does not prevent collisions. For example, 50 Cent released "Get Rich or Die Tryin'" (identical album name, and of course same artist) in different years, like 2003 and 2005.

For those albums, I’ve also added the year in the folder name.

Of course, there are other possible collisions; it is not uncommon to make a release in a different part of the world with the same name, but not necessarily the same content. This issue could be avoided by adding the MusicBrainz ID (MBID) of every album in the folder name. As those collisions are rare in my library, it is probably not worth it.

Note 📝
In case of a collision in the naming scheme, if you decide that Picard should move the files to the corresponding folders, rest assured that it will not delete any file. For example, instead of overwriting file.mp3, it would rename the second file to file (2).mp3.

Some music files, like Classical Music, Audio Tracks, Ringtones, Audiobooks, Kid’s Music, or Podcasts, are in a separate sub-folder, because the way I consume those is very different.

Audio formats

I see little value in having dozens of different formats. When copying a CD I use ogg, otherwise I tend to keep the original format which is often mp3.

I do not think that someone still sells music files with some DRM system (thank god).

I do not want artificial limitations on which programs or platforms I can use to listen to music, so if I ever encountered DRM-protected files, I would try to avoid them as long as possible.

Playlists

I am mainly using "handcrafted" .m3u files, located in the ~/Music/playlist folder.

The paths are relative, so it is easy to copy the playlists between devices as long as the relative structure is maintained. For example:

#EXTM3U

#PLAYLIST: Christmas

../Compilation/Its Christmas/01 John Yoko _ Plastic Ono Band - Happy Xmas (War Is Over).ogg
../Compilation/Its Christmas/02 Band Aid - Do They Know It's Christmas_.ogg
../Compilation/Its Christmas/03 Roy WoodwithWizzard - I Wish It Could Be Christmas Everyday.ogg
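Because the entries are relative, a playlist breaks silently when a folder is renamed. The following is a small sketch for checking a playlist, assuming every entry is a path relative to the folder that contains the .m3u file. The function name check_playlist is made up for this example.

```shell
#!/bin/sh
# Print every playlist entry that does not resolve to an existing file.
# Entries are resolved relative to the folder containing the playlist.
check_playlist() {
    playlist=$1
    dir=$(dirname "$playlist")
    status=0
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r entry || [ -n "$entry" ]; do
        case $entry in
            ''|'#'*) continue ;;   # skip blank lines and #EXT... directives
        esac
        if [ ! -e "$dir/$entry" ]; then
            printf 'missing: %s\n' "$entry"
            status=1
        fi
    done < "$playlist"
    return "$status"
}

# Usage (placeholder path):
#   check_playlist "$HOME/Music/playlist/christmas.m3u"
```

The test uses -e, which follows symlinks, so it also works when the audio files are symlinks to the real content.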

The files are not completely handcrafted. For example, I used

# assumes the current directory is the playlist folder, so that the paths stay relative
printf '#EXTM3U\n\n#PLAYLIST: Christmas\n' > christmas.m3u
find "../Compilation/Its Christmas" -xtype f \( -name '*.ogg' -o -name '*.flac' -o -name '*.mp3' \) | sort >> christmas.m3u
# and the same for other folders containing mainly Christmas music

for creating the Christmas playlists.

Note 📝
I am using -xtype, and not -type. This is not a typo: the files are not regular files, but symlinks to files (see my current backup system).
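The difference between the two tests can be reproduced in a throwaway folder. This sketch assumes GNU find, as -xtype is not part of POSIX. The file names are arbitrary.

```shell
#!/bin/sh
# Build a temporary folder with one regular file and one symlink to it.
demo=$(mktemp -d)
printf 'audio' > "$demo/real.ogg"
ln -s real.ogg "$demo/link.ogg"

# -type f tests the directory entry itself, so the symlink is skipped.
by_type=$(find "$demo" -type f | wc -l)
# -xtype f tests what a symlink points to, so both entries match.
by_xtype=$(find "$demo" -xtype f | wc -l)

printf 'matches with -type f:  %s\n' "$by_type"    # 1
printf 'matches with -xtype f: %s\n' "$by_xtype"   # 2
```

In a library where every audio file is a symlink, -type f would therefore produce an empty playlist.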

Audio player

I used Amarok for some time; it works reasonably well and supports both GNU/Linux and Windows systems. Nowadays I mostly use cmus (also available on Windows thanks to Cygwin), but any player, as long as it

  • does not touch or organize my Music files (as some players like to do)

  • does not spin up the disk unnecessarily

  • shows a view by "Album Artist" and "Album"

  • shows some metadata

should be good enough.

A big plus is if I can open the audio player with a given directory as a parameter.

For example, if I want to listen to some Classical Music, located in a specific folder, there is no need to have a view over the whole library.

Another plus is support for .nomedia and .nomusic files, or something similar.

Backup system

These notes would be a complete waste of time if they did not mention how I am ensuring that there is a backup of the digital library.

I am currently using two methods, simply because I am a slow learner.

The first is a copy of the ~/Music folder on an external drive.

It is a manual process, so I do not do it regularly, but it is a foolproof process and easy to understand.

The second, the "automated" method, involves git annex.

Short explanation

The short explanation is that my ~/Music folder is a git repository and that all files are read-only. When I want to change something (for example correct some metadata), the files need to be unlocked/made writable, edited, and then committed (which makes them read-only again).

After committing, changes are pushed to a different machine, thus ensuring a second copy of all files.

More detailed workflow

The first question would be, why not use git directly instead of git annex.

There are multiple reasons. The first one is that git is slow when handling large binary files.

The second reason is that such a repository would get very big after a few commits, and I am generally not interested in the whole history of an audio file. If the history cost a negligible amount of space, I would have nothing against it, but if it makes ~/Music bigger by a factor of ten or a hundred, then it is not an issue that can simply be ignored.

git annex solves both problems by replacing the files with a symlink to a read-only file inside the .git folder.

Using symlinks adds many new possibilities, but unfortunately also a lot of complexity.

git annex strives to permit every workflow, and keep the interface as simple to use as possible, but as I am not using it regularly (my music collection does not change that often), I am still unsure how some operations work, as there are some differences when using it compared to git. The walkthrough provides a good overview, but there are some limitations that I do not like (mainly Windows support).

For the sake of completeness, this is an overview of the most used commands.

When adding files, it is possible to use git add, or git annex add. In the second case, only a symlink will be committed. This has multiple implications (like the possibility of not pulling all files), but the main advantage is that git remains fast in all its operations.

When using git annex add, the file is moved inside .git/annex and replaced with a symlink. The file inside .git/annex is also read-only, making it (generally) not possible to change.
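To make the mechanism more tangible, here is an illustration in plain shell that mimics "move the content away, make it read-only, leave a symlink behind". It does not call git annex at all. The store folder and the file names are invented for the example, and the real layout under .git/annex is different.

```shell
#!/bin/sh
# Illustration only: this is NOT what git annex executes internally.
# "store" stands in for .git/annex; all names here are hypothetical.
work=$(mktemp -d)
mkdir "$work/store"
printf 'audio data' > "$work/song.ogg"

mv "$work/song.ogg" "$work/store/song.ogg"   # the content moves into the store
chmod a-w "$work/store/song.ogg"             # and becomes read-only
ln -s store/song.ogg "$work/song.ogg"        # a relative symlink takes its place

ls -l "$work/song.ogg"   # shows the link and its target
cat "$work/song.ogg"     # reading still works through the link
```

Only the small symlink would need to be tracked by the version control system, which is why the usual git operations stay fast.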

If one wants to modify an already committed file, it is possible to unlock it with git annex unlock <files and/or directories>, edit it, and add it again with git annex add.

When pushing files with git push, only the symlink is pushed, not the file itself(!).

For pushing the content to a separate PC (or pulling it on my second machine), I use mostly git annex sync --content.
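Put together, one edit cycle could be sketched as follows. The helper names run and retag_cycle, the DRY_RUN variable, and the editor command passed as the last arguments are assumptions made for this example. With DRY_RUN=1 the commands are only printed, which is handy for checking the order before touching the repository.

```shell
#!/bin/sh
# Sketch of one edit cycle: unlock, edit, add, commit, sync.
# run, retag_cycle and DRY_RUN are hypothetical names.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"    # only show what would be executed
    else
        "$@"
    fi
}

retag_cycle() {
    file=$1
    message=$2
    shift 2                         # remaining arguments: the editing command
    run git annex unlock "$file"    # make the file writable
    run "$@" "$file"                # edit the metadata with the given program
    run git annex add "$file"       # store the new content, read-only again
    run git commit -m "$message"
    run git annex sync --content    # exchange commits and file content with the remotes
}

# Usage (placeholder file and editor):
#   DRY_RUN=1 retag_cycle "Artist/Album/01 Title.ogg" "Fix title" my-tag-editor
```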

Note 📝
Why not use git lfs? Because it needs a special remote; the data would live outside the git repository.

Windows support

git annex supports Windows, but I do not find the current status usable.

Windows and NTFS both support symlinks and read-only files, but git annex does not take advantage of them. It is actually worse: it does not recognize symlinks as such. (Granted, symlinks on Windows are a mess.)

Instead, files are copied, which means that the ~/Music folder would be twice as big. Also, every time I execute the test suite, some tests fail. I am sure those are testing some corner cases, maybe even using strange characters, but it does not inspire much confidence.

For the Windows machines I am interested in, I use an external drive. Data are not changed that often and synchronizing by hand is not an issue (the data never changes on the Windows machines, if it does it would be by accident).

Not mainstream

GitLab was, as far as I know, the only "public" instance providing support for git-annex, but dropped it in favor of git-lfs.

Thus git annex is not as widely used as other approaches, and after GitLab dropped support for it, most hopes that the Windows support would improve vanished.

My main fear is that, even if it uses a simple format, it might vanish because there is too little interest in supporting it.

Note 📝
Yes, Cygwin and WSL can create symlinks, and there are other types too, but support is scarce and inconsistent because they all have different limitations.

Portable MP3 players/Phone

git annex is not only useful as a backup system but also for synchronizing data. It is the main use case for which it has been designed.

I am currently not taking advantage of these functionalities.

The main reason is that most, if not all, portable audio players use FAT32 as a filesystem, which does not support symlinks.

Even Rockbox, an alternate firmware for some portable players, does not support other filesystems (and has no interest in supporting them).

Android phones do support filesystems like ext3 or ext4, but cannot cope with SD cards that are not formatted as FAT32 or exFAT (which does not support symlinks either).

git annex actually works with FAT partitions too, but has a different workflow, as there are no symlinks.

At that point, it is just easier (for me) to synchronize those devices by hand, as most audio files do not change, and if they do, it is always from the same machine.
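When doing such a manual copy from a terminal instead of a file manager, the symlinks have to be replaced by the files they point to, as the target filesystem cannot store them. A minimal sketch, assuming the file names are already valid on FAT32; the function name copy_dereferenced and the paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Copy a folder to a device, replacing symlinks with the files they point to.
# copy_dereferenced is a hypothetical helper name.
copy_dereferenced() {
    source_dir=$1
    target_dir=$2
    mkdir -p "$target_dir"
    # -R: copy recursively
    # -L: follow symlinks and copy their targets instead of the links
    cp -RL "$source_dir/." "$target_dir/"
}

# Usage (placeholder paths):
#   copy_dereferenced "$HOME/Music/Compilation" /media/player/Music/Compilation
```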

