Random bits of free software

ZenRecover

A friend of mine once ended up with a corrupt hard disk in his Creative Zen™ mp3 player, to the point of not being able to boot the device anymore. This particular family of mp3 players does not work as a standard USB mass storage device and uses a proprietary filesystem, so automated file recovery with standard tools was not an option.

With the help of quetzalcoatl, who had already analyzed the filesystem layout at great length, I wrote a Python script that walks the filesystem structure and recovers the user's files and directories. To use it, one needs direct access to the filesystem, which usually means opening the broken mp3 player and connecting its internal hard disk to a notebook or another IDE interface. This filesystem is known to be used in the Creative Nomad™ JukeBox 3 and the Creative Zen™, and possibly in other mp3 players of the same brand.

Usage

python zenrecover.py
Usage: zenrecover.py [-o OFFSET] DISK_OR_IMAGE SECTION OUTPUT_DIR
DISK_OR_IMAGE is the disk containing the filesystem, or an image thereof
OFFSET is the offset at which the filesystem starts (in bytes, default 20M)
SECTION is the section of the filesystem to recover: "archives" or "songs"
OUTPUT_DIR is the directory in which to place the recovered files

That's it. If the filesystem is not severely damaged, the tool should be able to recover all of your files.

Otherwise you will need to understand the filesystem layout explained below, examine the disk with a hex editor, and possibly do some tracing and debugging of ZenRecover itself, to figure out what's going wrong with the automated process. What I'm saying is that the tool will only work on a coherent or slightly damaged filesystem; otherwise your best bet is to take it as a starting point.

Download

zenrecover.py (size: 7.9K; license: GPL; last updated on 8 Oct 2008)
LRU.py (size: 2.3K; license: GPL; last updated on 8 Oct 2008) Library written by Josiah Carlson, used by ZenRecover.

Filesystem layout

The hard disk of the aforementioned mp3 players is partitioned into two distinct filesystems: one for the operating system, formatted in MiniFS, and one for the user data, formatted in CFS. The latter usually begins at offset 20MiB from the start of the disk, so this is the default offset used by ZenRecover. If the tool fails to find the root inode on your disk, the filesystem might start at a different offset, which you can pass with -o. ZenRecover only understands CFS, where the user data resides.

CFS works like a traditional UNIX inode-based filesystem, such as Linux ext2. It appears to be based on Dominic Giampaolo's BFS.

CFS, as found on said mp3 players, assumes 512-byte disk sectors. The fundamental unit in CFS is the cluster, made of 16 contiguous sectors, or 0x2000 bytes. As already said, CFS usually starts 20MiB into the disk, i.e. at sector 0xA000.

The first cluster is numbered -1 (take note of this!) and is filled with 0xFF. Next comes cluster 0, filled with 0x00. Here is the relationship between cluster number and offset from the start of CFS (not counting the first 20MiB of the disk), most useful when examining the filesystem in a hex editor:

offset  = (cluster + 1) × 0x2000
cluster = offset / 0x2000 − 1
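The two formulas translate directly into code. This is a minimal Python sketch, not taken from zenrecover.py; the helper names and the DEFAULT_FS_OFFSET constant (the usual 20MiB start of CFS) are hypothetical, and integer division is assumed so that any offset inside a cluster maps to that cluster.

```python
# Hypothetical helpers; names are illustrative, not taken from zenrecover.py.
CLUSTER_SIZE = 0x2000                   # 16 sectors of 512 bytes
DEFAULT_FS_OFFSET = 20 * 1024 * 1024    # CFS usually starts 20MiB into the disk


def cluster_to_offset(cluster: int) -> int:
    """Byte offset of a cluster, relative to the start of CFS."""
    return (cluster + 1) * CLUSTER_SIZE


def offset_to_cluster(offset: int) -> int:
    """Cluster containing a byte offset (relative to the start of CFS)."""
    return offset // CLUSTER_SIZE - 1


def cluster_to_disk_offset(cluster: int, fs_offset: int = DEFAULT_FS_OFFSET) -> int:
    """Absolute byte offset on the disk or image, handy in a hex editor."""
    return fs_offset + cluster_to_offset(cluster)


print(hex(cluster_to_offset(-1)))       # 0x0
print(hex(cluster_to_offset(1)))        # 0x4000
print(hex(cluster_to_disk_offset(1)))   # 0x1404000
```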

Cluster 1 (the third cluster from the start of CFS) should contain some volume information, probably including the location of the root inode. A cluster usage bitmap for the entire disk should begin at cluster 2; its size presumably depends on the size of the hard disk. I say "should" because in my case both were completely trashed, therefore ZenRecover does not expect to find any useful information there.

At the cluster immediately following the cluster usage bitmap, we find the root inode. A CFS inode has the following (incomplete) structure:

offset type description
0 int32 magic number BE 3B D9 0A
4 int32 self-reference, i.e. "You should have found me in cluster x"
0x20 int32[12] cluster numbers of the first 12 data clusters (each = -1 if unused)
0x58 int32 cluster number of the second class data cluster chain (see below)
0x64 int32 cluster number of the third class data cluster chain (see below)
0x78 int32 serial number, set to -1 in the root inode
0x7C int32 number of metadata records
0x80 start of metadata
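To make the table concrete, here is a hypothetical Python sketch that reads these fields from the raw bytes of a candidate inode cluster. It decodes int32 fields in the PDP-endian order CFS uses (high 16-bit word first, each word little-endian). It also assumes the magic number is listed as raw on-disk bytes; the `looks_like_inode` check (magic plus self-reference) is a guess at how one might validate a candidate cluster, not ZenRecover's actual logic.

```python
import struct
from dataclasses import dataclass
from typing import List


def read_int32(buf: bytes, off: int) -> int:
    """Signed PDP-endian int32: high 16-bit word first, each word little-endian."""
    hi, lo = struct.unpack_from("<HH", buf, off)
    value = (hi << 16) | lo
    return value - (1 << 32) if value & 0x80000000 else value


@dataclass
class InodeHeader:
    raw_magic: bytes
    self_cluster: int      # "you should have found me in cluster x"
    direct: List[int]      # first 12 data clusters, -1 if unused
    second_class: int      # cluster of the second class chain
    third_class: int       # cluster of the third class chain
    serial: int            # -1 in the root inode
    metadata_count: int    # number of metadata records starting at 0x80


# Assumption: the magic "BE 3B D9 0A" is the raw byte sequence on disk.
INODE_MAGIC = bytes([0xBE, 0x3B, 0xD9, 0x0A])


def parse_inode(cluster_data: bytes) -> InodeHeader:
    return InodeHeader(
        raw_magic=bytes(cluster_data[0:4]),
        self_cluster=read_int32(cluster_data, 0x04),
        direct=[read_int32(cluster_data, 0x20 + 4 * i) for i in range(12)],
        second_class=read_int32(cluster_data, 0x58),
        third_class=read_int32(cluster_data, 0x64),
        serial=read_int32(cluster_data, 0x78),
        metadata_count=read_int32(cluster_data, 0x7C),
    )


def looks_like_inode(hdr: InodeHeader, cluster: int) -> bool:
    """Hypothetical sanity check: magic matches and the inode points at itself."""
    return hdr.raw_magic == INODE_MAGIC and hdr.self_cluster == cluster
```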

All integers are stored PDP-endian. That is, int16 values are stored little-endian, while int32 values are stored as two little-endian 16-bit words, most significant word first: 0x11223344 becomes 22 11 44 33 on disk. Strings appear to be NUL-terminated UCS-2LE. Bitmaps are arrays of int32, so they follow the same byte-swapping:

uint32 #0, bit #0  = bitmap bit #0
uint32 #0, bit #31 = bitmap bit #31
uint32 #1, bit #0  = bitmap bit #32

…and so on, so that within each int32 of a bitmap, reading its four bytes in on-disk order, the bits are laid out as:

23 <- 16 | 31 <- 24 | 7 <- 0 | 15 <- 8
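The byte order and bit numbering above can be expressed as a few decoding helpers. This is a minimal Python sketch with hypothetical function names; `bitmap_bit` follows the rule that bitmap bit n lives in bit n % 32 of int32 number n // 32.

```python
import struct


def read_uint16(buf: bytes, off: int) -> int:
    """int16 values are plain little-endian (read as unsigned here)."""
    return struct.unpack_from("<H", buf, off)[0]


def read_uint32(buf: bytes, off: int) -> int:
    """PDP-endian uint32: 0x11223344 is stored as 22 11 44 33."""
    hi, lo = struct.unpack_from("<HH", buf, off)
    return (hi << 16) | lo


def read_int32(buf: bytes, off: int) -> int:
    """Same as read_uint32, reinterpreted as signed (so FF FF FF FF is -1)."""
    v = read_uint32(buf, off)
    return v - (1 << 32) if v & 0x80000000 else v


def bitmap_bit(bitmap: bytes, n: int) -> bool:
    """Bit n of a CFS bitmap: bit n % 32 of PDP-endian int32 number n // 32."""
    return bool((read_uint32(bitmap, 4 * (n // 32)) >> (n % 32)) & 1)


print(hex(read_uint32(bytes([0x22, 0x11, 0x44, 0x33]), 0)))  # 0x11223344
```

For example, a first on-disk byte of 0x01 in a bitmap word sets bitmap bit 16, matching the "23 <- 16" group shown first.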

Files seem to have metadata; directories don't. Data clusters, which are where the actual file data (or directory entries) lie, are referenced in a three-tier structure:

inode
 \_ up to 12 data clusters (some of which might be set to -1)
 \_ second class chain cluster (seems to always be allocated)
 |   \_ up to 2048 clusters of data (some might be set to -1)
 \_ third class chain cluster (seems to always be allocated)
     \_ up to 2048 clusters of pointers (some might be set to -1)
         \_ up to 2048² clusters of data (some might be set to -1)
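Resolving a file's data clusters then amounts to walking the three tiers in order, much like the direct, single-indirect and double-indirect blocks of ext2. The sketch below is hypothetical: `read_cluster` stands for any function returning the 0x2000 bytes of a cluster, and -1 slots are simply skipped, since the layout above does not establish whether a -1 in the middle of a chain marks a hole or the end of the file.

```python
import struct
from typing import Callable, Iterator, List

POINTERS_PER_CLUSTER = 0x2000 // 4   # 2048 int32 cluster numbers per chain cluster


def read_int32(buf: bytes, off: int) -> int:
    """Signed PDP-endian int32."""
    hi, lo = struct.unpack_from("<HH", buf, off)
    v = (hi << 16) | lo
    return v - (1 << 32) if v & 0x80000000 else v


def pointers(cluster_data: bytes) -> List[int]:
    """All 2048 cluster numbers stored in a chain cluster."""
    return [read_int32(cluster_data, 4 * i) for i in range(POINTERS_PER_CLUSTER)]


def data_clusters(direct: List[int], second: int, third: int,
                  read_cluster: Callable[[int], bytes]) -> Iterator[int]:
    """Yield data cluster numbers in file order, skipping -1 slots."""
    # Tier 1: up to 12 data clusters listed in the inode itself.
    for c in direct:
        if c != -1:
            yield c
    # Tier 2: one chain cluster listing up to 2048 data clusters.
    if second != -1:
        for c in pointers(read_cluster(second)):
            if c != -1:
                yield c
    # Tier 3: a chain cluster of up to 2048 pointer clusters,
    # each listing up to 2048 data clusters.
    if third != -1:
        for p in pointers(read_cluster(third)):
            if p == -1:
                continue
            for c in pointers(read_cluster(p)):
                if c != -1:
                    yield c
```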

Metadata are a sequence of variable length, tagged records:

offset type description
0 int16 magic = 3
2 int16 length of this record
4 string[2] tag (NUL-terminated UCS-2LE, 2 chars = 6 bytes)
10 byte[length] data
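A parser for these records could look like the following sketch. It assumes the records start at 0x80 in the inode cluster and that their count is the int32 at 0x7C (passed in here as `count`). The table leaves open whether `length` counts only the data or the whole record, so the hypothetical `length_includes_header` flag lets you try both; the default follows the table literally (the data field is `length` bytes).

```python
import struct
from typing import List, Tuple

METADATA_START = 0x80
RECORD_MAGIC = 3
HEADER_SIZE = 10   # int16 magic + int16 length + 6-byte tag


def parse_metadata(inode: bytes, count: int,
                   length_includes_header: bool = False) -> List[Tuple[str, bytes]]:
    """Return (tag, raw data) pairs for the metadata records of an inode cluster."""
    records = []
    off = METADATA_START
    for _ in range(count):
        magic, length = struct.unpack_from("<HH", inode, off)   # little-endian int16s
        if magic != RECORD_MAGIC:
            raise ValueError(f"bad metadata record magic {magic} at 0x{off:x}")
        # Tag: 3 UCS-2LE code units (2 characters + NUL terminator).
        tag = inode[off + 4:off + HEADER_SIZE].decode("utf-16-le").split("\x00", 1)[0]
        # Assumption: see the lead-in about what `length` covers.
        data_len = length - HEADER_SIZE if length_includes_header else length
        start = off + HEADER_SIZE
        records.append((tag, bytes(inode[start:start + data_len])))
        off = start + data_len
    return records
```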

Here are some useful metadata records (other metadata seem to come from the ID3 tags of the songs):

tag type description
"07" string filename with extension
"0=" string backslash-separated original path of the file, before it was copied over to the mp3 player
"0>" int32 file size

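Once the records are parsed, extracting the useful fields is a matter of decoding the right tags. This is a hypothetical sketch, assuming records arrive as (tag, raw bytes) pairs and that the "0>" file size is a PDP-endian int32 like every other int32 (read as unsigned here). UCS-2LE is decoded with Python's UTF-16-LE codec, which gives the same result for characters outside the surrogate range.

```python
import struct
from typing import Dict, List, Tuple


def decode_string(data: bytes) -> str:
    """NUL-terminated UCS-2LE string."""
    return data.decode("utf-16-le", errors="replace").split("\x00", 1)[0]


def decode_uint32(data: bytes) -> int:
    """PDP-endian int32, read as unsigned."""
    hi, lo = struct.unpack_from("<HH", data, 0)
    return (hi << 16) | lo


def file_info(records: List[Tuple[str, bytes]]) -> Dict[str, object]:
    tags = dict(records)
    name = decode_string(tags["07"]) if "07" in tags else None
    # "0=" is backslash-separated; split it into path components.
    path = decode_string(tags["0="]).split("\\") if "0=" in tags else None
    size = decode_uint32(tags["0>"]) if "0>" in tags else None
    return {"name": name, "original_path": path, "size": size}
```

The size is presumably what lets a recovery tool trim the unused tail of the last data cluster.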
Directories are just like ordinary files, except that they have no metadata and that their actual data is an array of directory entries. Each directory entry points to the inode of a specific file or subdirectory. Directories in CFS seem to be allocated 8 contiguous data clusters at a time. Every block of 8 data clusters has this layout:

offset type description
8 int32 number of allocated entries*
16 byte[204] usage bitmap, providing for 1632 bits
220 entry[1632] exactly 1632 directory entries, 40 bytes each
65500 byte[36] null padding up to the end of the 65536-byte block

*: the number of allocated entries is only set in the first data cluster, i.e. in the first block, and contains the number of children of the directory. This number allows for a simple consistency check against the block bitmaps.
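The block layout and the consistency check in the footnote can be sketched as follows. This is hypothetical Python, not ZenRecover's code: each block is assumed to be the 65536 bytes of one 8-cluster block, the bitmap is read with the same PDP-endian int32 bit numbering as other CFS bitmaps, and the child count is read from offset 8 of the first block only.

```python
import struct
from typing import List

BLOCK_SIZE = 8 * 0x2000          # 8 data clusters = 65536 bytes
BITMAP_OFFSET = 16
BITMAP_BYTES = 204               # 1632 bits
ENTRIES_OFFSET = 220
ENTRY_SIZE = 40
ENTRY_COUNT = BITMAP_BYTES * 8   # 1632


def read_uint32(buf: bytes, off: int) -> int:
    """PDP-endian int32, read as unsigned."""
    hi, lo = struct.unpack_from("<HH", buf, off)
    return (hi << 16) | lo


def used_slots(block: bytes) -> List[int]:
    """Indices of entries whose bit is set in the block's usage bitmap."""
    bitmap = block[BITMAP_OFFSET:BITMAP_OFFSET + BITMAP_BYTES]
    words = [read_uint32(bitmap, 4 * w) for w in range(BITMAP_BYTES // 4)]
    return [n for n in range(ENTRY_COUNT) if (words[n // 32] >> (n % 32)) & 1]


def entry_bytes(block: bytes, slot: int) -> bytes:
    """Raw 40 bytes of directory entry number `slot`."""
    off = ENTRIES_OFFSET + slot * ENTRY_SIZE
    return bytes(block[off:off + ENTRY_SIZE])


def check_children(blocks: List[bytes]) -> bool:
    """Child count in the first block vs. bits set across all blocks."""
    expected = read_uint32(blocks[0], 8)
    return expected == sum(len(used_slots(b)) for b in blocks)
```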

Every directory entry is 40 bytes long and has the following layout:

offset type description
0 int32 cluster number of the inode
4 int16 length of the full filename
8 string filename, truncated to 15 characters if longer
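Decoding a single 40-byte entry might look like this hypothetical sketch. Bytes 6 and 7 are not described above and are ignored; bytes 8 to 39 are assumed to hold up to 15 UCS-2LE characters plus a NUL terminator, which is exactly 32 bytes. Comparing the stored full length with the short name shows when a name was truncated.

```python
import struct
from typing import NamedTuple


class DirEntry(NamedTuple):
    inode_cluster: int
    full_name_length: int
    short_name: str   # at most 15 characters


def parse_dir_entry(entry: bytes) -> DirEntry:
    hi, lo = struct.unpack_from("<HH", entry, 0)       # PDP-endian int32
    cluster = (hi << 16) | lo
    if cluster & 0x80000000:
        cluster -= 1 << 32
    (name_len,) = struct.unpack_from("<H", entry, 4)   # little-endian int16
    # Bytes 8..39: up to 15 UCS-2LE characters plus a NUL terminator.
    name = entry[8:40].decode("utf-16-le", errors="replace").split("\x00", 1)[0]
    return DirEntry(cluster, name_len, name)


def is_truncated(e: DirEntry) -> bool:
    return e.full_name_length > len(e.short_name)
```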

So the full filename is only stored in the metadata, which only files possess. This would imply that directory names are limited to 15 characters. This is not a problem in practice, because the mp3 players we examined had this directory structure:

/songs all the songs, without any subdirectory
/archives all the files uploaded as data, without any subdirectory
/playlists
/recordings
/system

That's all! Hope you find it useful.

Disclaimer

The information provided in this web site was inferred from limited factual evidence and may be inaccurate or out-of-date. The author cannot be held responsible for (but not limited to) any loss or corruption of data, physical damage whether consequential or inconsequential, business loss, personal injury, or any other type of damage or injury arising as a direct or indirect result of consulting this web site.

The product names used in this web site are for identification purposes only. All trademarks and registered trademarks are the property of their respective owners.

This site is not affiliated with Creative Technology Ltd.

This personal web site is not affiliated with, nor does it represent the views, position or attitude of my employer or of their clients.