Recover music files (part 2)

Repairing corrupt mp3 files

As mentioned previously, synchronizing music files between my PC and my external music player is a copy-paste process. While doing a diff, I noticed that some files on Rockbox differed from the ones on my PC.

A closer inspection revealed that the structure of some of the mp3 files was partially broken. As mentioned, some files were recovered from a faulty drive, and it did not occur to me to validate all audio files or search for possible issues, apart from removing empty files.

mp3val seems to be a handy tool:

find . -name "*.mp3" -exec mp3val -f {} +

Some of the messages it printed to the console:

Analyzing file "Usher Ft. Alicia Keys - My Boo, Part II (Prod. By Jer.mp3"...
WARNING: "Usher Ft. Alicia Keys/Tapemasters Inc.-Mixtape Wonde/Usher Ft. Alicia Keys - My Boo, Part II (Prod. By Jer.mp3" (offset 0x53ba86): Garbage at the end of the file
INFO: "Usher Ft. Alicia Keys/Tapemasters Inc.-Mixtape Wonde/Usher Ft. Alicia Keys - My Boo, Part II (Prod. By Jer.mp3": 8752 MPEG frames (MPEG 1 Layer III), +ID3v1+ID3v2, CBR
Rebuilding file "Usher Ft. Alicia Keys - My Boo, Part II (Prod. By Jer.mp3"...
FIXED: "Usher Ft. Alicia Keys/Tapemasters Inc.-Mixtape Wonde/Usher Ft. Alicia Keys - My Boo, Part II (Prod. By Jer.mp3": File was rebuilt
Done!


Analyzing file "Heaven's Light_Hellfire.mp3"...
WARNING: "The Hunchback Of Notre Dame/The Hunchback of Notre Dame/Heaven's Light_Hellfire.mp3" (offset 0x4f28f7): It seems that file is truncated or there is garbage at the end of the file
INFO: "The Hunchback Of Notre Dame/The Hunchback of Notre Dame/Heaven's Light_Hellfire.mp3": 12412 MPEG frames (MPEG 1 Layer III), +ID3v2, CBR
Rebuilding file "Heaven's Light_Hellfire.mp3"...
FIXED: "The Hunchback Of Notre Dame/The Hunchback of Notre Dame/Heaven's Light_Hellfire.mp3": File was rebuilt
Done!


Analyzing file "01 Top Gun.mp3"...
INFO: "Bury Your Dead/Cover Your Tracks/01 Top Gun.mp3": 5633 MPEG frames (MPEG 1 Layer III), +ID3v1+ID3v2, CBR
Done!

Analyzing file "Nirvana - Heart Shaped Box.mp3"...
WARNING: "Nirvana/Unknown Album/Nirvana - Heart Shaped Box.mp3" (offset 0x35a3d8): MPEG stream error, resynchronized successfully
WARNING: "Nirvana/Unknown Album/Nirvana - Heart Shaped Box.mp3" (offset 0x445924): It seems that file is truncated or there is garbage at the end of the file
WARNING: "Nirvana/Unknown Album/Nirvana - Heart Shaped Box.mp3": VBR detected, but no VBR header is present. Seeking may not work properly.
WARNING: "Nirvana/Unknown Album/Nirvana - Heart Shaped Box.mp3": Non-layer-III frame encountered. See related INFO message for details.
WARNING: "Nirvana/Unknown Album/Nirvana - Heart Shaped Box.mp3": Different MPEG versions or layers in one file. See related INFO message for details.
INFO: "Nirvana/Unknown Album/Nirvana - Heart Shaped Box.mp3": 10717 MPEG frames (1 V1L1, 0 V1L2, 10716 V1L3, 0 V2L1, 0 V2L2, 0 V2L3, 0 V2.5L1, 0 V2.5L2, 0 V2.5L3), +ID3v1, no VBR header
Rebuilding file "Nirvana - Heart Shaped Box.mp3"...
FIXED: "Nirvana/Unknown Album/Nirvana - Heart Shaped Box.mp3": File was rebuilt
Done!

Analyzing file "01 Girls Rock your Boys.mp3"...
WARNING: "motley crue/Monsters of Rock/01 Girls Rock your Boys.mp3" (offset 0x17924): MPEG stream error, resynchronized successfully
INFO: "motley crue/Monsters of Rock/01 Girls Rock your Boys.mp3": 11168 MPEG frames (MPEG 1 Layer III), +ID3v1, CBR
Rebuilding file "01 Girls Rock your Boys.mp3"...
FIXED: "motley crue/Monsters of Rock/01 Girls Rock your Boys.mp3": File was rebuilt
Done!

Analyzing file "08 To Be Your Loss.mp3"...
WARNING: "The Morning After Girls/Alone/08 To Be Your Loss.mp3" (offset 0x0): Garbage at the beginning of the file
WARNING: "The Morning After Girls/Alone/08 To Be Your Loss.mp3" (offset 0x496356): Garbage at the end of the file
INFO: "The Morning After Girls/Alone/08 To Be Your Loss.mp3": 7682 MPEG frames (MPEG 1 Layer III), +ID3v1, CBR, CRC
Rebuilding file "08 To Be Your Loss.mp3"...
FIXED: "The Morning After Girls/Alone/08 To Be Your Loss.mp3": File was rebuilt
Done!

Analyzing file "Unknown Artist - liquid - sweet harmony [lake remix].mp3"...
WARNING: "Unknown Artist/Unknown Album/Unknown Artist - liquid - sweet harmony [lake remix].mp3" (offset 0x4107a8): MPEG stream error, resynchronized successfully
WARNING: "Unknown Artist/Unknown Album/Unknown Artist - liquid - sweet harmony [lake remix].mp3": Wrong number of MPEG frames specified in Xing header (18899 instead of 18893)
WARNING: "Unknown Artist/Unknown Album/Unknown Artist - liquid - sweet harmony [lake remix].mp3": Wrong number of MPEG data bytes specified in Xing header (11536602 instead of 11532532)
INFO: "Unknown Artist/Unknown Album/Unknown Artist - liquid - sweet harmony [lake remix].mp3": 18893 MPEG frames (MPEG 1 Layer III), +ID3v2, Xing header
Rebuilding file "Unknown Artist - liquid - sweet harmony [lake remix].mp3"...
FIXED: "Unknown Artist/Unknown Album/Unknown Artist - liquid - sweet harmony [lake remix].mp3": File was rebuilt
Done!


Analyzing file "Brown Eyed Girl.mp3"...
WARNING: "Weezer/Unknown Album/Brown Eyed Girl.mp3": This is a RIFF file, not MPEG stream
WARNING: "Weezer/Unknown Album/Brown Eyed Girl.mp3" (offset 0x4009f1): It seems that file is truncated or there is garbage at the end of the file
INFO: "Weezer/Unknown Album/Brown Eyed Girl.mp3": 8029 MPEG frames (MPEG 1 Layer III), +ID3v1+ID3v2, CBR
Rebuilding file "Brown Eyed Girl.mp3"...
FIXED: "Weezer/Unknown Album/Brown Eyed Girl.mp3": File was rebuilt
Done!

Analyzing file "28 Pina Colada Boy.mp3"...
WARNING: "Baby Alice/Mashup/28 Pina Colada Boy.mp3": Wrong number of MPEG data bytes specified in Xing header (4678181 instead of 4669861)
INFO: "Baby Alice/Mashup/28 Pina Colada Boy.mp3": 7076 MPEG frames (MPEG 1 Layer III), +ID3v1+ID3v2, Xing header
Rebuilding file "28 Pina Colada Boy.mp3"...
FIXED: "Baby Alice/Mashup/28 Pina Colada Boy.mp3": File was rebuilt
Done!

Analyzing file "02 Believe.mp3"...
WARNING: "/Lenny Kravitz/Are You Gonna Go My Way/02 Believe.mp3" (offset 0x48ba7c): MPEG stream error, resynchronized successfully
WARNING: "/Lenny Kravitz/Are You Gonna Go My Way/02 Believe.mp3": VBR detected, but no VBR header is present. Seeking may not work properly.
WARNING: "/Lenny Kravitz/Are You Gonna Go My Way/02 Believe.mp3": Non-layer-III frame encountered. See related INFO message for details.
WARNING: "/Lenny Kravitz/Are You Gonna Go My Way/02 Believe.mp3": Different MPEG versions or layers in one file. See related INFO message for details.
INFO: "/Lenny Kravitz/Are You Gonna Go My Way/02 Believe.mp3": 11279 MPEG frames (1 V1L1, 0 V1L2, 11278 V1L3, 0 V2L1, 0 V2L2, 0 V2L3, 0 V2.5L1, 0 V2.5L2, 0 V2.5L3), +ID3v1+ID3v2, no VBR header
Rebuilding file "02 Believe.mp3"...
FIXED: "/Lenny Kravitz/Are You Gonna Go My Way/02 Believe.mp3": File was rebuilt
Done!

Analyzing file "04 Drop It Like Its Hot (LP).mp3"...
WARNING: "/Snoop Dogg/Drop It Like Its Hot/04 Drop It Like Its Hot (LP).mp3" (offset 0x80a): MPEG stream error, resynchronized successfully
WARNING: "/Snoop Dogg/Drop It Like Its Hot/04 Drop It Like Its Hot (LP).mp3" (offset 0x59f411): It seems that file is truncated or there is garbage at the end of the file
WARNING: "/Snoop Dogg/Drop It Like Its Hot/04 Drop It Like Its Hot (LP).mp3": VBR detected, but no VBR header is present. Seeking may not work properly.
INFO: "/Snoop Dogg/Drop It Like Its Hot/04 Drop It Like Its Hot (LP).mp3": 10328 MPEG frames (MPEG 1 Layer III), +ID3v1+ID3v2, no VBR header
Rebuilding file "04 Drop It Like Its Hot (LP).mp3"...
FIXED: "/Snoop Dogg/Drop It Like Its Hot/04 Drop It Like Its Hot (LP).mp3": File was rebuilt
Done!

And many more.

mp3val processed 9536 mp3 files, and found and repaired issues in 2989 of them, nearly one third!
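Counts like these can be derived from a saved log instead of being tallied by hand. The following is only a sketch, assuming the console output was redirected to a file and that every processed file produces one "Analyzing file" line and every repaired file one "FIXED:" line, as in the excerpt above. The function name summarize_mp3val_log and the log file name mp3val.log are made up for illustration.

```bash
#!/usr/bin/env bash

# Summarize an mp3val log read from standard input.
# Assumption: one 'Analyzing file' line per processed file and one
# 'FIXED:' line per repaired file, as in the output shown above.
summarize_mp3val_log() {
  awk '
    /^Analyzing file / { total++ }
    /^FIXED: /         { fixed++ }
    END {
      pct = (total > 0) ? 100 * fixed / total : 0
      printf "processed=%d fixed=%d (%.1f%%)\n", total, fixed, pct
    }
  '
}

# Hypothetical usage (mp3val.log is an assumed file name):
#   find . -name "*.mp3" -exec mp3val -f {} + > mp3val.log 2>&1
#   summarize_mp3val_log < mp3val.log
```

Keeping the log around also makes it possible to look up later which files were rebuilt, in case one of them turns out to sound worse than before.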

To be sure, I also ran Picard again, in case fixing some mp3 files had altered some metadata.

Are all songs now better than before? It could be that the repair removed some important information.

The real solutions are systematic backups and data validation. Sooner or later I’ll notice if some file is corrupt or complete garbage, and remove it.

It won’t matter much if some files are corrupt with no possibility of repair; after all, I waited years before sorting the library out.

Splitting files

While searching for "interesting" files, that is, outliers, I looked at those with the biggest file size.
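One way to list the biggest files is sketched below. It assumes GNU find (for the -printf action) and file names without newlines. The function name largest_files and the default of 20 entries are arbitrary choices for this illustration.

```bash
#!/usr/bin/env bash

# Print the N largest regular files below a directory, biggest first,
# one per line as "<size in bytes><TAB><path>".
# Assumptions: GNU find (-printf) and no newlines in file names.
largest_files() {
  local dir="$1" count="${2:-20}"
  find "$dir" -type f -printf '%s\t%p\n' | sort -rn | head -n "$count"
}

# Hypothetical usage:
#   largest_files ~/Music 10
```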

I noticed that some m4a files were actually audiobooks. Too bad each of them was a single gigantic file with no chapters.

Fortunately, the chapter information was stored separately in a text file or an image.

As I did not find an easy way to add the chapter information to the file itself, I decided to split the audio file. Splitting it into multiple files has another advantage: portable players consume noticeably more energy while processing very big files, and are also less responsive.

I copied the chapter information into a text file, where every line encodes, as "h:m:s", how long a chapter is

0:33:39
0:23:35
0:27:07
0:27:48
0:49:37
....

and wrote a bash script to do the hard work

#!/usr/bin/env bash

set -o nounset
set -o errexit
set -o pipefail

filename="$1"; shift;
chapters="$1"; shift;

extension="${filename##*.}";
filename_without_extension="${filename%.*}"

chindex=1
begin=0

readarray -t arr < "$chapters"

ffmpeg_default_opt=("-nostdin" "-loglevel" "warning" "-i" "$filename")

for line in "${arr[@]}"; do :;
  printf '%s\n' "$line"  # bash's echo would print the "--" literally
  sec="${line#*:}";
  sec="${sec#*:}";
  hour_min="${line%:*}";
  min="${hour_min#*:}"
  hour="${hour_min%%:*}";
  duration=$((10#$sec + 10#$min * 60 + 10#$hour * 60*60));
  outfile="$filename_without_extension.ch$chindex.m4a"
  if [ "$extension" = "mp4" ]; then :;
    ffmpeg "${ffmpeg_default_opt[@]}" -acodec copy -ss "$begin" -t "$duration" "$outfile";
  elif [ "$extension" = "m4a" ]; then :;
    # generates warning "stream 0, timescale not set" if, for example, there is an embedded image
    ffmpeg "${ffmpeg_default_opt[@]}" -c copy      -ss "$begin" -t "$duration" "$outfile";
  else :;
    printf 'Unknown file extension %s\n' "$extension" >&2;
    exit 1;
  fi
  begin=$((10#$begin + 10#$duration));
  chindex=$((10#$chindex+1));
done

I chose bash over sh as it supports arrays, which simplifies a couple of operations, and over other languages like Python because, in the end, the script simply calculates where each chapter begins and calls an external program: ffmpeg.
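Before splitting a multi-hour file, it can help to check the computed offsets without calling ffmpeg at all. The following dry run is a sketch, not part of the script above: the function name print_chapter_offsets is made up, and it assumes the same input format, one "h:m:s" duration per line. Lines without two colons, such as the "...." placeholder, are skipped.

```bash
#!/usr/bin/env bash

# Dry run: print the start offset and duration (both in seconds) of
# every chapter, without calling ffmpeg.
# Expects a file with one "h:m:s" duration per line.
print_chapter_offsets() {
  local chapters="$1" begin=0 chindex=1 hour min sec duration
  while IFS=: read -r hour min sec; do
    # skip lines that do not have three colon-separated fields
    [ -n "$sec" ] || continue
    # 10# forces base 10, so values like "08" are not parsed as octal
    duration=$((10#$sec + 10#$min * 60 + 10#$hour * 3600))
    printf 'ch%d begin=%d duration=%d\n' "$chindex" "$begin" "$duration"
    begin=$((begin + duration))
    chindex=$((chindex + 1))
  done < "$chapters"
}

# Hypothetical usage:
#   print_chapter_offsets chapters.txt
```

If the last offset plus the last duration is far from the total length of the audiobook, a chapter is probably missing or mistyped.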

The initial script had a bug: numbers, like $sec, were not prefixed with 10#. In bash arithmetic, a number starting with 0 is interpreted as octal. Thus 010 is the decimal 8, and 08 is not a valid octal number. To get bash to treat such a value as decimal, one can remove the leading zero or force base 10 with the prefix 10#.
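The difference is easy to reproduce in isolation. In this small demonstration, the helper name to_decimal is made up for illustration:

```bash
#!/usr/bin/env bash

# Read a string of digits as a decimal number, regardless of leading zeros.
to_decimal() {
  printf '%d\n' "$((10#$1))"
}

echo "$((010))"   # leading zero: parsed as octal, prints 8
to_decimal 010    # base forced to 10, prints 10
to_decimal 08     # prints 8; a bare $((08)) would be an arithmetic error
```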

Manually add covers

Some audio files were simply recordings, with the cover art saved in the same folder as a separate image.

With Picard, it’s possible to embed it with ease. I did not try (yet) to manipulate tags from the command line.