Mobileread
Obelisk -- legal distribution of format-shifted copyrighted works
#1  llasram 03-04-2008, 09:30 PM
I love curly quotation marks. They're so round and inviting. I also love free e-books, and so have been delighted by Tor's current free–e-book–each–week program. Perhaps by Tor my loves may be joined? But alas not – the HTML versions Tor provides have ASCII quotation marks, and when I asked if this could be rectified I was told “I'm afraid the quotation-mark conversion has to stay.”

So for Robert Charles Wilson’s Spin I rolled up my crazy-sleeves, pulled out my regexps, and fixed them myself. Every last one. And modified the CSS and some of the markup to much more closely resemble the formatting in the PDF version. Then wrapped it up as a valid .epub book. Then converted/tweaked to produce a great-looking Sony Reader BBeB book.

And they’re all for only me! Nope, can’t give them to you. The power of copyright compels me! I can add those curly quotes myself because I have the source HTML to start with. If I start handing people my curly-quoted version I have no means to stop it from falling into new hands which didn’t already have the straight-from-Tor edition.

Or do I?

I could provide you with a grid of just the byte offsets of the various curly quotes. Some extreme variant of diff/patch in which nothing of the original copyrighted text persists. It would contain just my curly quotes, owned by me under copyright law and free to give you as I wish. You provide the straight-from-Tor e-book, mix in my curly quotes and poof! – you have a be-curled edition of Spin. But this doesn’t work for format-shifting over compression, encoding changes, etc., where “put a curly quote here” ceases to make sense.

Unless we distill the idea down to the lowest level – what is XOR but the difference between two bits?
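
To make the XOR idea concrete, here is a minimal sketch in Python. It is not the code in obelisk.py; the function name `xor_bytes` and the sample strings are made up for illustration, and the key is simply repeated when the derived file is longer than the original. The real script also takes a salt, so it presumably does something less naive than this.

```python
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data against key, repeating key if it is shorter than data."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

# Hypothetical stand-ins for the original and the derived file.
original = b'He said "hello" and left.'
derived = 'He said “hello” and left.'.encode("utf-8")

delta = xor_bytes(derived, original)   # the "difference" one would hand out
rebuilt = xor_bytes(delta, original)   # original XOR difference gives derived
assert rebuilt == derived
```

Because XOR is its own inverse, the same function produces the difference and applies it, and the difference is exactly as long as the derived file.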

Let’s try an experiment, which I’m calling Obelisk[1]. Download the following files:
obelisk.py
Mohm5pei#WilsonSpin_HTML.zip#Spin.epub.obelisk
AhZe5shu#WilsonSpin_HTML.zip#Spin.lrf.obelisk
Then get your copy of WilsonSpin_HTML.zip handy, pop open your favorite shell, and run:

Code
python obelisk.py Mohm5pei WilsonSpin_HTML.zip Mohm5pei#WilsonSpin_HTML.zip#Spin.epub.obelisk Spin.epub
python obelisk.py AhZe5shu WilsonSpin_HTML.zip AhZe5shu#WilsonSpin_HTML.zip#Spin.lrf.obelisk Spin.lrf
The results should be curly-quoted .epub and BBeB versions of Spin, seamlessly merging Tor’s bits with mine into unified wholes.

Let me know what you think.

[1] Obelisk is similar to and inspired by a “project” called Monolith, although with rather different goals.

#2  kovidgoyal 03-04-2008, 09:46 PM
Assuming the source file has an even number of quotes, shouldn't replacing them with curly quotes be as simple as

Code
intag = False
inquote = False
data = list(data)  # work on a mutable list of characters
for i, ch in enumerate(data):
    if ch == '<':
        intag = True
    elif ch == '>':
        intag = False
    elif not intag and ch == '"':
        if inquote:
            data[i] = '”'  # right curly quote
            inquote = False
        else:
            data[i] = '“'  # left curly quote
            inquote = True
Or is there something about curly quotes I'm missing?

#3  llasram 03-04-2008, 10:07 PM
Quote kovidgoyal
Assuming the source file has an even number of quotes, shouldn't replacing them with curly quotes be as simple as
It’s mostly mechanizable, but not quite that simply. For example:
“This quotation-marked bit goes on for more than one paragraph. It doesn’t end with a double quote.

“And here I have some ‘examples’ of single quotes. I’ve got several of ’em. The examples’ quotation marks point in all kinds of directions.

“And here ends the quote.”
So pretty much the rules are:

Code
<ws>" == “
"<ws> == ”
\w'\w == ’
'<ws> == ’
<ws>' == ‘
Where <ws> is whitespace plus ( ) [ ] - – —.
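
As a rough sketch, those five rules could be written as regular expressions in Python. The names `WS`, `RULES` and `smarten` are invented for this example, and it assumes plain text with the markup already stripped, since it would happily mangle quotes inside HTML attributes.

```python
import re

# Whitespace plus ( ) [ ] - – — as described above.
WS = r"[\s()\[\]\-–—]"

RULES = [
    (re.compile(r"(?<=\w)'(?=\w)"), "’"),          # \w'\w  (apostrophe)
    (re.compile(rf"(?:^|(?<={WS}))'"), "‘"),       # <ws>'
    (re.compile(rf"'(?={WS}|$)"), "’"),            # '<ws>
    (re.compile(rf'(?:^|(?<={WS}))"'), "“"),       # <ws>"
    (re.compile(rf'"(?={WS}|$)'), "”"),            # "<ws>
]

def smarten(text: str) -> str:
    """Apply the rules in order; anything still straight needs a human."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(smarten('"It\'s the examples\' turn," she said.'))
print(smarten("I've got several of 'em."))
```

The second call shows why the rules alone are not enough: the leading apostrophe in 'em comes out as an opening quote (‘em), which is wrong.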

But then you have to manually check all the instances of “<ws>‘” and probably start by looking for any quotation marks with white space on both sides (usually found when doing "something like 'this' ").

So anyway. Mostly mechanizable, but still some manual labor to get it perfect. And can’t automate improving the CSS. :-)

#4  kovidgoyal 03-04-2008, 10:16 PM
Ah I see, well let's see if Tor starts beating on your door in the middle of the night.

#5  JSWolf 03-04-2008, 11:38 PM
Um... it won't work because it never came zipped. And how do we know the filename to use in the ZIP file or even if we have the exact same contents?

#6  llasram 03-05-2008, 12:21 AM
Quote JSWolf
Um... it won't work because it never came zipped. And how do we know the filename to use in the ZIP file or even if we have the exact same contents?
The e-mails actually contain links to two separate HTML versions. One is the HTML content served directly, the other is a ZIP archive which contains the images used in the book, a (broken) OPF file, etc.

#7  JSWolf 03-05-2008, 12:34 AM
Quote llasram
The e-mails actually contain links to two separate HTML versions. One is the HTML content served directly, the other is a ZIP archive which contains the images used in the book, a (broken) OPF file, etc.
Yes, you are correct. My apologies. I'll give your script another go and see how it works out.

#8  JSWolf 03-05-2008, 12:39 AM
How do I use your script to generate a diff file for other content? I'd love to do one for Mistborn based on the PDF to make the LRF from it.

#9  JSWolf 03-05-2008, 12:52 AM
I've taken the EPUB edition and built an LRF to my specification. Looks nice. Now all I need to do is build a proper ToC and I'll be all set.

#10  llasram 03-05-2008, 09:10 AM
Quote JSWolf
How do I use your script to generate a diff file for other content? I'd love to do one for Mistborn based on the PDF to make the LRF from it.
It's symmetric, so:

Code
python obelisk.py SALT KEYFILE INFILE OUTFILE
For both decryption and encryption. The SALT parameter is some string of your choosing but should not be reused for a particular KEYFILE. For example:

Code
python obelisk.py sai3sahS 9780765350381.zip Mistborn.lrf sai3sahS#9780765350381.zip#Mistborn.lrf.obelisk
HOWEVER – I am not a lawyer. This certainly seems reasonable given that one needs the original file to reconstitute the derived file, but I don’t really know if Tor and/or your nation’s legal system will see it that way. This is an experiment – use at your own risk.
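
For the curious, here is a hedged sketch of why a salted XOR scheme is symmetric and why reusing a salt with the same key file is a bad idea. This is not obelisk.py's actual algorithm; the SHA-256 counter construction and the names `keystream` and `obelisk_like` are assumptions made up for illustration, and the byte strings stand in for real files.

```python
import hashlib
from itertools import count

def keystream(salt: bytes, key: bytes):
    """Yield an endless stream of bytes derived from the salt and key file."""
    seed = hashlib.sha256(salt + b"\0" + key).digest()
    for block in count():
        yield from hashlib.sha256(seed + block.to_bytes(8, "big")).digest()

def obelisk_like(salt: str, key: bytes, data: bytes) -> bytes:
    """XOR data with the salted keystream; the same call undoes itself."""
    stream = keystream(salt.encode("utf-8"), key)
    return bytes(d ^ s for d, s in zip(data, stream))

key = b"contents of the original zip"   # hypothetical key file contents
book_a = b"first derived book"
book_b = b"other derived book"

enc_a = obelisk_like("sai3sahS", key, book_a)
assert obelisk_like("sai3sahS", key, enc_a) == book_a   # symmetric

# Same salt and key file reused: the keystream cancels out, so anyone
# holding both outputs learns book_a XOR book_b without the key file.
enc_b = obelisk_like("sai3sahS", key, book_b)
leak = bytes(x ^ y for x, y in zip(enc_a, enc_b))
assert leak == bytes(x ^ y for x, y in zip(book_a, book_b))
```

With a fresh salt for each output the two keystreams differ, and XORing the outputs together no longer cancels anything.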
