Mobileread
azw3r highlight and note extraction info
#11  j.p.s 08-10-2019, 01:54 PM
Quote ilovejedd
Awesome work! Question though, how do you use this (syntax)? I'm assuming Linux only? Will this work on a Linux LiveUSB?

Thanks!
The C code needs to be compiled. It is not linux specific, but the POSIX mmap() call might not be supported everywhere. It should work fine on a Linux LiveUSB with a C compiler or with a compiled binary copied from a compatible linux system.

Microsoft has been misleadingly claiming POSIX compliance for decades, but it is my understanding that Microsoft Windows Subsystem for linux (or whatever it is called) is the real deal.

If you have a C compiler and it doesn't work, I can make a small change that just reads the entire azw3r file into a buffer since it is very unlikely that one would ever be too large to do that.
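
For anyone following along in Python rather than C, the read-it-all-into-a-buffer fallback described above can be sketched as follows. This is only an illustration, not the change to azw3r.c itself; the function name `read_whole_file` and the 16 MB cap are assumptions made for the example.

```python
from pathlib import Path

# Hypothetical sanity cap; annotation sidecar files are normally tiny.
MAX_BYTES = 16 * 1024 * 1024


def read_whole_file(path, limit=MAX_BYTES):
    """Read an entire sidecar file into one bytes buffer (no mmap needed)."""
    path = Path(path)
    size = path.stat().st_size
    if size > limit:
        raise ValueError(f"{path} is {size} bytes, larger than the {limit} byte cap")
    return path.read_bytes()
```

Because the whole file ends up in an ordinary buffer, the rest of a parser can index into it exactly as it would index into a mapped region.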

Maybe a pythonista will come along and crank out a python equivalent or improvement.

To compile:
Code
cc -o azw3r azw3r.c
To run (the 1st command lists notes only, the 2nd highlights only, the 3rd both highlights and notes, and the 4th sorts the notes by their position in the book):
Code
azw3r -i name.azw3r > name.notes
azw3r -h -i name.azw3r > name.highlights
azw3r -h -n -i name.azw3r > name.notes
azw3r -i name.azw3r | sort -n > name.notes
Example output:
Code
97434 97443 Note: 'Not correct definition for this book.'
114792 114796 Note: 'Should be in x-ray terms category.'
135617 135632 Note: 'Same as Tut'
533488 533494 Note: 'Not a person.'
553723 553726 Note: 'Not a podcast.'
712228 712235 Note: 'Not a video game.'
The output of the C program can be the end product, or it can be processed into some other format or used as input for some other program, such as the perl script attached to the first post that inserts the notes and/or highlights into the kindleunpack rawml output of the book.
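
As one example of such post-processing, here is a minimal Python sketch that parses listing lines shaped like the example output (start, end, a label ending in a colon, then text in single quotes) and orders them by start position, much like piping through sort -n. The `Entry` type and `parse_listing` name are made up for illustration, and lines in any other shape (for instance highlight rows, whose exact layout is not shown in this thread) are simply skipped.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    start: int
    end: int
    kind: str
    text: str


# Matches e.g.: 97434 97443 Note: 'Not correct definition for this book.'
LINE_RE = re.compile(r"^(\d+)\s+(\d+)\s+(\w+):\s+'(.*)'\s*$")


def parse_listing(lines):
    """Parse listing lines into Entry tuples ordered by start position."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not in the assumed shape; ignore it
        start, end, kind, text = m.groups()
        entries.append(Entry(int(start), int(end), kind, text))
    # Numeric sort on the first column, like `sort -n`.
    return sorted(entries, key=lambda e: e.start)
```

Typical use would be `parse_listing(open("name.notes", encoding="utf-8"))`, after which the entries can be written out as CSV, JSON, or whatever format is wanted.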

#12  j.p.s 08-11-2019, 06:11 PM
I've attached a perl script, azw3r.pl as the gzip'd azw3r.pl.gz to the first post. It provides the same functionality as the azw3r.c program. It should run on any platform that has perl installed. Same syntax, e.g.
Code
azw3r.pl -i name.azw3r > name.notes
or
perl azw3r.pl -i name.azw3r > name.notes
I think I can get something that works for yjr and maybe mbp1, but won't be able to start on it until next weekend.

#13  j.p.s 08-11-2019, 08:18 PM
Update:

The azw3r C program and perl script, as is, work for KFX yjr files for highlights or notes separately, that is, using only the -n or -h option and not both at the same time. It is OK for the yjr file to have both highlights and notes in it. (The C program works with both at the same time on azw3r files.)

The perl script, unlike the C program, does work fine for listing both highlights and notes at the same time for both yjr and azw3r files.

The perl script notes_insert.pl is not able to process the listings for KFX yjr files.

The perl script azw3r.pl probably does not work for notes longer than 255 characters on any file type. This should be easy to fix.

#14  Luca2903 08-13-2019, 03:28 PM
HI JPS, very interesting work.

Could you please be so kind to try and help me a little bit?

I have this problem here, and I'd like to understand more if your solution is able to help me.

https://www.mobileread.com/forums/sh...44#post3878444

Thanks!

#15  j.p.s 08-13-2019, 06:10 PM
Quote Luca2903
HI JPS, very interesting work.

Could you please be so kind to try and help me a little bit?

I have this problem here, and I'd like to understand more if your solution is able to help me.

https://www.mobileread.com/forums/sh...44#post3878444

Thanks!
My method does not interact with amazon servers, but extracts notes from the files in the .sdr directories for your books on your kindle to plain text output. Depending on how pretty you want the format of the notes, my method might work for you.
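
To locate those files, a small Python sketch along these lines could walk a mounted kindle and list the annotation sidecars inside each .sdr directory. The assumptions here are that the sidecars sit directly inside the .sdr directory and carry the .azw3r or .yjr extensions discussed in this thread; the function name `find_sidecars` and the mount path in the usage note are hypothetical.

```python
from pathlib import Path


def find_sidecars(root, suffixes=(".azw3r", ".yjr")):
    """Return annotation sidecar files found inside *.sdr directories under root."""
    found = []
    for sdr in Path(root).rglob("*.sdr"):
        if not sdr.is_dir():
            continue
        for f in sdr.iterdir():
            # Assumption: sidecars are direct children of the .sdr directory.
            if f.is_file() and f.suffix.lower() in suffixes:
                found.append(f)
    return sorted(found)
```

For example, `find_sidecars("/media/Kindle/documents")` (the mount point will differ per system) gives a list of paths that can each be passed to azw3r with -i.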

You might also look at jhowell's kindle reader data store KRDS https://www.mobileread.com/forums/sh...d.php?t=322172

#16  ilovejedd 08-14-2019, 12:24 PM
Quote j.p.s
My method does not interact with amazon servers, but extracts notes from the files in the .sdr directories for your books on your kindle to plain text output. Depending on how pretty you want the format of the notes, my method might work for you.

You might also look at jhowell's kindle reader data store KRDS https://www.mobileread.com/forums/sh...d.php?t=322172
Question, is this capable of extracting the actual text of highlights? I don't think those are stored in the .sdr files (just location).

#17  j.p.s 08-14-2019, 01:17 PM
Quote ilovejedd
Question, is this capable of extracting the actual text of highlights? I don't think those are stored in the .sdr files (just location).
The notes_insert.pl script does as part of modifying the rawml file to reflect highlighting.

It would be pretty easy to add an option to both the azw3r C program and perl script to extract the text of the highlights. Part of why I haven't done it yet is because I am unsure how useful those are out of context and because they would contain any HTML markup within the highlight text. (The latter surprised me when it showed up in some short highlights.)
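
If the embedded markup is unwanted, one possible way to reduce an extracted highlight to plain text is Python's standard html.parser module. This is a sketch of an optional post-processing step, not something azw3r does; `strip_markup` is a name invented for the example.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text content and drops the tags around it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(fragment):
    """Return fragment with HTML tags removed and character references decoded."""
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)
```

Since a highlight can start or stop in the middle of an element, unmatched tags are to be expected; the parser simply skips them.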

#18  j.p.s 08-14-2019, 11:20 PM
I'm attaching a PDF of a book with inserted highlights and notes to this post along with associated files to make it. It turns out that the utility html2ps does not choke on the XML in a rawml file like my web browser does, so there was no need to comment out the XML. What is also surprising to me is that the TOC in the PDF works. This is not meant to be a book with highlights and notes, but rather the highlights and notes shown in context.

The source book is the EPUB of The Humbugs of the World by P T Barnum from the Mobileread Library. I used kindlegen to make a dual mobi and used kindleunpack to extract the rawml and azw3, which I copied to a kindle and quickly made 9 highlights with bogus notes.

Then I copied the azw3r and dumped the notes, which also gives the start and end of each highlight. Next I used the notes_insert.pl from the first post to modify the rawml, then html2ps and ps2pdf. You can search the PDF for '[HL]' or '[Note:' to find the highlights and notes.
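
The insertion step can be illustrated with a rough Python sketch. This is not notes_insert.pl, only the general technique: treat start and end as byte offsets into the rawml (an assumption, with end taken as exclusive), and insert markers working from the last highlight backwards so that earlier offsets are not shifted. The '[HL]' and '[Note:' markers come from the post above; the closing '[/HL]' marker and the name `insert_markers` are invented here, and highlights are assumed not to overlap.

```python
def insert_markers(rawml, entries):
    """Insert highlight and note markers into rawml bytes.

    entries is an iterable of (start, end, note) tuples, where note is a
    string or None. Offsets are assumed to be byte offsets, end exclusive.
    """
    out = bytearray(rawml)
    # Process the highest offsets first so earlier ones stay valid.
    for start, end, note in sorted(entries, key=lambda e: e[0], reverse=True):
        tail = b"[/HL]"  # hypothetical closing marker
        if note:
            tail += b"[Note: " + note.encode("utf-8") + b"]"
        out[end:end] = tail      # insert after the highlighted span
        out[start:start] = b"[HL]"  # then insert before it
    return bytes(out)
```

The result can then go through html2ps and ps2pdf in the same way as the output of the real script.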
[zip] HighlightsNotes_in_pdf.zip (1.01 MB, 18 views)

#19  j.p.s 08-17-2019, 01:53 PM
Quote ilovejedd
Question, is this capable of extracting the actual text of highlights? I don't think those are stored in the .sdr files (just location).
The latest C and perl versions now have an option to extract the highlighted text from the rawml file. When the -h argument is supplied along with -r filename.rawml a fourth column will be printed consisting of the highlighted text in single quotes.
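
Conceptually, this kind of lookup amounts to slicing the rawml between the two positions. A minimal Python sketch, assuming the positions are byte offsets into the rawml with an exclusive end and UTF-8 text (none of which is confirmed here), might look like the following. The function names and the 'Highlight' label in the third column are placeholders, not the actual output format of azw3r.

```python
def highlight_text(rawml, start, end):
    """Return the text between two byte offsets of the rawml."""
    # errors="replace" keeps going if an offset splits a multi-byte character.
    return rawml[start:end].decode("utf-8", errors="replace")


def format_row(rawml, start, end, label="Highlight"):
    """Build a listing row whose last column is the highlighted text in single quotes."""
    return f"{start} {end} {label}: '{highlight_text(rawml, start, end)}'"
```

Any markup that falls inside the span comes along with the text, as noted earlier in the thread.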

These changes, along with an expanded README are in the latest release, v0.1.4, at https://github.com/jps-e/azw3r and the attachments azw3r.c.gz and azw3r.pl.gz have been updated in post #1 of this thread.

#20  j.p.s 08-17-2019, 05:57 PM
Quote Luca2903
HI JPS, very interesting work.

Could you please be so kind to try and help me a little bit?

I have this problem here, and I'd like to understand more if your solution is able to help me.

https://www.mobileread.com/forums/sh...44#post3878444

Thanks!
Luca2903,

you have not said what kindle format your books are in. If they are KF8 (azw3), then I think my scripts are in good enough shape for just about anyone to extract both notes and highlights as text as separate files and/or insert them into the text of the book for context.
