Mobileread
KRDS - A parser for Kindle reader data store files
#1  jhowell 08-12-2019, 05:59 PM
A recent discussion prompted me to look into how annotations are stored on Kindle devices running recent firmware versions.

Information related to each book being read is saved in a pair of sidecar files in the book's .sdr folder. These files contain serialized data objects used by the e-book reader application. The first file contains objects that change with every page turn, such as the last page read and reading timing data. The second file contains data that changes less often, such as personal annotations, font and dictionary choices, and the synced reading position.

The file extensions used depend on the book format: .mbs and .mbp1 for MOBI, .azw3f and .azw3r for KF8, and .yjf and .yjr for KFX.
The data format appears to be proprietary to Amazon and is similar to the Amazon Ion Binary Encoding used by KFX. It encodes the name of each object being serialized along with a list of property values. Values each have an associated data type, such as integer or string. Decoding objects requires knowledge of the data structure associated with each class.
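To show what "knowledge of the data structure" means in practice, here is a minimal sketch. It assumes an object has already been read as a class name plus an ordered list of values. The on-disk encoding is not modelled at all. The SCHEMAS table and name_properties helper are hypothetical. The field names and values are copied from the font.prefs entry in the .yjr sample output further down.

```python
# Hypothetical per-class schema: field names in the order the values are
# assumed to appear. Names are copied from the decoded "font.prefs" sample.
SCHEMAS = {
    "font.prefs": [
        "typeface", "lineSp", "size", "align",
        "insetTop", "insetLeft", "insetBottom", "insetRight",
        "unknown1", "bold", "userSideloadableFont", "customFontIndex",
        "mobi7SystemFont", "mobi7RestoreFont", "readingPresetSelected",
    ],
}


def name_properties(class_name, values):
    """Pair an object's positional values with the field names of its class."""
    fields = SCHEMAS.get(class_name)
    if fields is None:
        raise KeyError("no schema known for class %r" % class_name)
    if len(values) != len(fields):
        raise ValueError("%s expects %d values, got %d"
                         % (class_name, len(fields), len(values)))
    return dict(zip(fields, values))


# Values in the order they appear in the .yjr sample output.
values = ["_INVALID_,und:bookerly", 1, 5, 1, -1, -1, -1, -1, -1, 1,
          "", -1, "", False, ""]
prefs = name_properties("font.prefs", values)
print(prefs["size"])  # 5
```

Without the schema, the decoder would only have a bare list of numbers and strings. That is why each new class has to be reverse engineered before it can be decoded.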


KRDS (Kindle Reader Data Store)

I have written a Python script to parse these files. The main function accepts an input file name, parses the file into a Python data structure, and writes the result out as a human-readable JSON file.

I reverse engineered the data structures for several classes commonly used by the Kindle reader, but it is likely that I missed some things. Reports of any file that is not handled properly are welcome.


Usage

Download and unzip the attachment to this post to obtain "krds.py". It should run under recent versions of Python 2 or 3.

Code
usage: python krds.py [-h] pathname

Convert Kindle reader data store files to JSON

positional arguments:
  pathname    Pathname to be processed (.azw3f, .azw3r, .mbp1, .mbs, .yjf, .yjr)

optional arguments:
  -h, --help  show this help message and exit
Enclose the name of the file to be converted in double quotes if it contains spaces.
The output file will have the same name with ".json" appended.
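Here is a minimal Python 3 sketch of the output step described above. It assumes the decoded data is an ordinary dict. The helper names (output_pathname, to_json_text, write_json) are hypothetical and are not taken from krds.py.

Python's json.dumps escapes non-ASCII characters by default (ensure_ascii=True). That matches the \ufffc and \u00e9 escapes seen in the sample outputs, but whether krds.py works exactly this way is an assumption.

```python
import json
import os

# Extensions listed in the krds.py help text above.
SUPPORTED_EXTENSIONS = (".azw3f", ".azw3r", ".mbp1", ".mbs", ".yjf", ".yjr")


def output_pathname(pathname):
    """Return the output name: the input name with ".json" appended."""
    ext = os.path.splitext(pathname)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError("unsupported file type: %s" % pathname)
    return pathname + ".json"


def to_json_text(data):
    """Serialize decoded data as indented JSON (non-ASCII becomes escapes)."""
    return json.dumps(data, indent=2)


def write_json(data, pathname):
    """Write the JSON next to the input file and return the output name."""
    out = output_pathname(pathname)
    with open(out, "w") as f:
        f.write(to_json_text(data))
    return out
```

For example, write_json(decoded, "My Book.yjr") would create "My Book.yjr.json", where decoded stands for whatever the parser produced.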



Sample Output

Decoded .yjr file:
Code
{
  "font.prefs": {
    "typeface": "_INVALID_,und:bookerly",
    "lineSp": 1,
    "size": 5,
    "align": 1,
    "insetTop": -1,
    "insetLeft": -1,
    "insetBottom": -1,
    "insetRight": -1,
    "unknown1": -1,
    "bold": 1,
    "userSideloadableFont": "",
    "customFontIndex": -1,
    "mobi7SystemFont": "",
    "mobi7RestoreFont": false,
    "readingPresetSelected": ""
  },
  "sync_lpr": true,
  "annotation.cache.object": {
    "annotation.personal.highlight": [
      {
        "startPosition": "ATwDAAAAAAAA:3803",
        "endPosition": "ATwDAAADAQAA:4062",
        "creationTime": "2019-08-11T15:24:03.083000",
        "lastModificationTime": "2019-08-11T15:24:03.083000",
        "template": "0\ufffc0"
      },
      {
        "startPosition": "AS0DAAAAAAAA:1696",
        "endPosition": "AS0DAADoAAAA:1928",
        "creationTime": "2019-08-11T15:24:03.088000",
        "lastModificationTime": "2019-08-11T15:24:03.088000",
        "template": "0\ufffc0"
      },
      {
        "startPosition": "AWsDAAAAAAAA:12846",
        "endPosition": "AW0DAAB7AQAA:13491",
        "creationTime": "2019-08-11T15:24:03.088000",
        "lastModificationTime": "2019-08-11T15:24:03.088000",
        "template": "0\ufffc0"
      },
      {
        "startPosition": "ATUDAAAAAAAA:1975",
        "endPosition": "ATsDAAAtAgAA:3802",
        "creationTime": "2019-08-11T15:24:03.088000",
        "lastModificationTime": "2019-08-11T15:24:03.088000",
        "template": "0\ufffc0"
      },
      {
        "startPosition": "AUQDAAAAAAAA:5510",
        "endPosition": "AUgDAAADAQAA:6194",
        "creationTime": "2019-08-11T15:24:03.083000",
        "lastModificationTime": "2019-08-11T15:24:03.083000",
        "template": "0\ufffc0"
      },
      {
        "startPosition": "ASsDAAAAAAAA:1477",
        "endPosition": "ASsDAABOAAAA:1555",
        "creationTime": "2019-08-11T15:24:03.088000",
        "lastModificationTime": "2019-08-11T15:24:03.088000",
        "template": "0\ufffc0"
      },
      {
        "startPosition": "AW8DAAAAAAAA:13552",
        "endPosition": "ASIEAABwAAAA:42227",
        "creationTime": "2019-08-11T15:24:03.030000",
        "lastModificationTime": "2019-08-11T15:24:03.030000",
        "template": "0\ufffc0"
      },
      {
        "startPosition": "AWkDAAAAAAAA:12350",
        "endPosition": "AWkDAADvAAAA:12589",
        "creationTime": "2019-08-11T15:24:03.088000",
        "lastModificationTime": "2019-08-11T15:24:03.088000",
        "template": "0\ufffc0"
      },
      {
        "startPosition": "AT8DAAAAAAAA:4154",
        "endPosition": "AUADAAAxAQAA:4745",
        "creationTime": "2019-08-11T15:24:03.088000",
        "lastModificationTime": "2019-08-11T15:24:03.088000",
        "template": "0\ufffc0"
      }
    ],
    "annotation.personal.note": [
      {
        "startPosition": "AUADAAAxAQAA:4745",
        "endPosition": "AUADAAAxAQAA:4745",
        "creationTime": "2019-08-11T15:24:03.083000",
        "lastModificationTime": "2019-08-11T15:24:03.083000",
        "template": "0\ufffc0",
        "note": "Here is another note for the book"
      },
      {
        "startPosition": "ATwDAAADAQAA:4062",
        "endPosition": "ATwDAAADAQAA:4062",
        "creationTime": "2019-08-11T15:24:03.088000",
        "lastModificationTime": "2019-08-11T15:24:03.088000",
        "template": "0\ufffc0",
        "note": "This is my first note in this book"
      },
      {
        "startPosition": "AWwDAACcAAAA:13111",
        "endPosition": "AWwDAACcAAAA:13111",
        "creationTime": "2019-08-11T15:24:03.079000",
        "lastModificationTime": "2019-08-11T15:24:03.079000",
        "template": "0\ufffc0",
        "note": "More notes"
      },
      {
        "startPosition": "ASIEAABwAAAA:42227",
        "endPosition": "ASIEAABwAAAA:42227",
        "creationTime": "2019-08-11T15:24:03.088000",
        "lastModificationTime": "2019-08-11T15:24:03.088000",
        "template": "0\ufffc0",
        "note": "A really long highlight"
      }
    ],
    "annotation.personal.bookmark": [
      {
        "startPosition": "AVoDAAAAAAAA:9430",
        "endPosition": "AVoDAAAAAAAA:9430",
        "creationTime": "2019-08-11T15:24:03.088000",
        "lastModificationTime": "2019-08-11T15:24:03.088000",
        "template": "0\ufffc0"
      },
      {
        "startPosition": "AUsDAAAAAAAA:6642",
        "endPosition": "AUsDAAAAAAAA:6642",
        "creationTime": "2019-08-11T15:24:03.088000",
        "lastModificationTime": "2019-08-11T15:24:03.088000",
        "template": "0\ufffc0"
      }
    ]
  },
  "ReaderMetrics": {
    "booklaunchedbefore": "true"
  },
  "erl": "AcgiAAA0AAAA:1206501"
}
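Once a file has been converted, the JSON can be read back with the standard library. The sketch below assumes the layout shown in the sample above. iter_annotations is a hypothetical helper, not part of krds.py.

```python
def iter_annotations(data):
    """Yield (kind, startPosition, note) for each annotation in decoded data.

    kind is the last part of the list name, e.g. "highlight", "note" or
    "bookmark"; note is None for entries that carry no note text.
    """
    cache = data.get("annotation.cache.object", {})
    for list_name in sorted(cache):
        kind = list_name.rsplit(".", 1)[-1]
        for item in cache[list_name]:
            yield kind, item["startPosition"], item.get("note")
```

Typical use would be json.load() on a converted file such as "My Book.yjr.json" (a placeholder name). You would then loop over iter_annotations(...) to print each kind, position and note.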
[zip] krds-v1.zip (3.8 KB)

#2  j.p.s 08-12-2019, 06:31 PM
Wow! Thanks!

I'll try kicking the tires when I get some time.

#3  jhowell 08-12-2019, 07:54 PM
Quote j.p.s
I'll try kicking the tires when I get some time.
I hope that what I found out can be useful in your project.

#4  PoP 08-13-2019, 03:28 PM
Quote jhowell
...
the file extensions are .mbs and .mbp1 (for MOBI), .azw3f and .azw3r (for KF8), and .yjf and .yjr (for KFX)
...
Thanks for sharing. Also found that .pds and .pdt (for PDF) have that signature and decode similarly.


.pds
Code
{
  "font.prefs": {
    "typeface": "_INVALID_,und:bookerly",
    "lineSp": -1,
    "size": -1,
    "align": -1,
    "insetTop": 28,
    "insetLeft": 28,
    "insetBottom": 0,
    "insetRight": 28,
    "unknown1": -1,
    "bold": -1,
    "userSideloadableFont": "",
    "customFontIndex": -1,
    "mobi7SystemFont": "_INVALID_,und:bookerly",
    "mobi7RestoreFont": false,
    "readingPresetSelected": ""
  },
  "sync_lpr": false,
  "annotation.cache.object": {
    "annotation.personal.highlight": [
      {
        "startPosition": "1 6 51 1 255 177 53 17",
        "endPosition": "1 48 280 1 305 345 79 21",
        "creationTime": "2019-08-13T16:05:05.820000",
        "lastModificationTime": "2019-08-13T16:05:05.820000",
        "template": "0\ufffc0"
      },
      {
        "startPosition": "1 67 361 1 123 445 63 17",
        "endPosition": "1 108 9 1 450 1049 13 21",
        "creationTime": "2019-08-13T16:06:03.643000",
        "lastModificationTime": "2019-08-13T16:06:03.643000",
        "template": "0\ufffc0"
      }
    ],
    "annotation.personal.note": [
      {
        "startPosition": "1 48 280 1 305 345 79 21",
        "endPosition": "1 48 280 1 305 345 79 21",
        "creationTime": "2019-08-13T16:05:46.309000",
        "lastModificationTime": "2019-08-13T16:05:46.309000",
        "template": "0\ufffc0",
        "note": "Ingr\u00e9dients"
      },
      {
        "startPosition": "1 108 9 1 450 1049 13 21",
        "endPosition": "1 108 9 1 450 1049 13 21",
        "creationTime": "2019-08-13T16:06:23.148000",
        "lastModificationTime": "2019-08-13T16:06:23.148000",
        "template": "0\ufffc0",
        "note": "Recette"
      }
    ],
    "annotation.personal.bookmark": [
      {
        "startPosition": "1 0 0 0",
        "endPosition": "1 0 0 0",
        "creationTime": "2019-08-13T16:01:53.130000",
        "lastModificationTime": "2019-08-13T16:01:53.130000",
        "template": "0\ufffc0"
      }
    ]
  },
  "language.store": {
    "language": "fr",
    "unknown1": 0
  },
  "ReaderMetrics": {
    "booklaunchedbefore": "true"
  }
}
.pdt
Code
{
  "fpr": {
    "position": "80 0 0 0",
    "time": null,
    "timeZoneOffset": null,
    "country": "",
    "device": ""
  },
  "page.history.store": [],
  "lpr": {
    "position": "1 0 0 0",
    "time": "2019-08-13T16:06:55.068000"
  }
}

#5  jhowell 08-13-2019, 07:04 PM
Quote PoP
Thanks for sharing. Also found that .pds and .pdt (for PDF) have that signature and decode similarly.
Thanks for the info.

I tested a Topaz (.azw1) file and it uses .tal and .tas files with the same type of content.

I will update the first post to add this information.
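Here is a sketch that collects the sidecar extensions reported so far in this thread. It walks a folder tree and picks out matching files inside .sdr folders. Note that the v1 script's help text lists only the MOBI, KF8 and KFX extensions, so the PDF and Topaz entries may not be accepted by that version. SIDECAR_FORMATS and find_sidecars are hypothetical names.

```python
import os

# Sidecar extensions reported in this thread, mapped to the book format.
SIDECAR_FORMATS = {
    ".mbs": "MOBI", ".mbp1": "MOBI",
    ".azw3f": "KF8", ".azw3r": "KF8",
    ".yjf": "KFX", ".yjr": "KFX",
    ".pds": "PDF", ".pdt": "PDF",
    ".tal": "Topaz", ".tas": "Topaz",
}


def find_sidecars(root):
    """Return (path, format) for known sidecar files inside .sdr folders."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        folder = os.path.basename(os.path.normpath(dirpath))
        if not folder.lower().endswith(".sdr"):
            continue  # sidecars live only in a book's .sdr folder
        for name in sorted(filenames):
            fmt = SIDECAR_FORMATS.get(os.path.splitext(name)[1].lower())
            if fmt is not None:
                found.append((os.path.join(dirpath, name), fmt))
    return found
```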

#6  shamanNS 08-14-2019, 07:36 AM
So, this script does not extract the actual text that was highlighted?

#7  jhowell 08-14-2019, 08:09 AM
Quote shamanNS
So, this script does not extract the actual text that was highlighted?
That is correct. The script decodes whatever is in the files indicated in the first post of this thread. The reader application has no need to store the actual text separately from the book format file.

The link between the files that this program decodes and the book's content is provided by fields with "position" in their names. These are strings that identify where to find content within a book, and they are interpreted differently for each book format.
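Here is a Python 3 sketch that applies this rule literally. It walks the decoded JSON and collects every string value whose key contains "position" (case-insensitively). That covers startPosition/endPosition in the annotation lists, as well as the position fields of lpr and fpr in the .pdt sample. It would not pick up "erl" in the .yjr sample: that value looks like a position, but its name does not say so. find_positions is a hypothetical helper.

```python
def find_positions(obj, path=""):
    """Yield (path, value) for every string field whose key contains "position"."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = path + "/" + key
            if "position" in key.lower() and isinstance(value, str):
                yield child, value
            else:
                # Descend into nested objects and lists.
                for found in find_positions(value, child):
                    yield found
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            for found in find_positions(item, "%s[%d]" % (path, i)):
                yield found
```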

KF8 (azw3) format appears to be the simplest case. The position is a decimal number giving an offset within the raw HTML content of the book, as can be obtained using the kindleunpack software. See the work done by j.p.s for an example of how to make use of this information.
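Here is a sketch of that lookup for KF8. It assumes:

- the raw HTML has already been extracted (for example with KindleUnpack) and read as bytes;
- positions are byte offsets;
- the end position is exclusive.

The last two points are assumptions and have not been verified here. Tags are stripped with a crude regular expression, which is fine for a quick look but is not an HTML parser. A span may also start or end in the middle of a tag. kf8_excerpt is a hypothetical name.

```python
import re

# Crude tag matcher; good enough for a quick look, not an HTML parser.
TAG_RE = re.compile(br"<[^>]*>")


def kf8_excerpt(raw_html, start, end, encoding="utf-8"):
    """Return the visible text between two KF8 positions (assumed byte offsets)."""
    start, end = int(start), int(end)  # positions may arrive as strings
    if not 0 <= start <= end <= len(raw_html):
        raise ValueError("position out of range")
    fragment = TAG_RE.sub(b"", raw_html[start:end])
    return fragment.decode(encoding, errors="replace")
```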

MOBI (azw) format is similar, but there appears to be additional information that I have not attempted to decode.

KFX uses two values separated by a colon. The first is a base64 encoding of the eid and offset, which are fields used internally by KFX to determine the location of content. The second is the position number itself, which for KFX counts visible Unicode characters rather than raw HTML bytes.
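The Python 3 sketch below splits a KFX position at the colon and base64-decodes the first part. The exact byte layout of the decoded part is not described above. The layout used here is a guess: a leading byte of unknown meaning, then the eid and offset as little-endian 32-bit integers.

The guess fits the sample, because within a single eid the offset difference equals the position difference:

- eid 828: offsets 0 and 259 for positions 3803 and 4062
- eid 813: offsets 0 and 232 for positions 1696 and 1928

Treat it as a hypothesis to check against more files. parse_kfx_position is a hypothetical name.

```python
import base64
import struct


def parse_kfx_position(value):
    """Split a KFX position such as "ATwDAAADAQAA:4062" into its parts.

    The part after the colon is the position number. The byte layout of the
    base64 part (one leading byte, then eid and offset as little-endian
    32-bit integers) is an assumption inferred from sample values only.
    """
    encoded, _, number = value.partition(":")
    raw = base64.b64decode(encoded)
    lead, eid, offset = struct.unpack_from("<BII", raw)  # assumed layout
    return {"lead": lead, "eid": eid, "offset": offset, "position": int(number)}


# Start and end of the first highlight in the .yjr sample output.
start = parse_kfx_position("ATwDAAAAAAAA:3803")
end = parse_kfx_position("ATwDAAADAQAA:4062")
print(start["eid"], end["eid"])  # 828 828
print(end["offset"] - start["offset"], end["position"] - start["position"])  # 259 259
```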

I have not looked into how position numbers are handled in the other formats that Kindle supports.