Mobileread
Converting .IMP to anything? WE ARE NOW THERE!
#1  nrapallo 04-16-2008, 12:37 PM
EDIT: 16 May 2008

Welcome to deimp.exe, the text decompressor (extractor) for .imp files created by Nick Rapallo (me).
See the thread Reverse-engineering the .IMP format for deimp's C source code as deimp_v0.1_source.zip.

Version 0.1 is very basic, but it works well within the limits noted below.
Usage:

1. Unzip deimp_v0.1.zip into the directory where your .imp files are stored. All sub-directories below it are processed recursively, leaving only the extracted '.txt' files.

2. Double-click the windows batch file 'extract text from imp files.bat' and wait.

3. That's it. Just edit the resulting '.txt' files. Please note that you will have to replace uncommon characters such as certain quotes, em dashes, etc.

The extracted text (with no images/hyperlinks) can then be easily converted by BookDesigner or equivalent.

Thanks (and a bit of karma) go to delphidb96 for the link to the LZSS source code (it is the basis of deimp)! And thanks go to Michael Dipperstein (mdipper@alumni.engr.ucsb.edu) for his LZSS source code (lzss-0.6.zip). More information on LZSS encoding may be found at: http://michael.dipperstein.com/lzss

Enjoy!
-Nick

p.s. you do not need the 'unimp.zip' and 'reimp.zip' files; just get the 'deimp_v0.1.zip'!

p.p.s. added 'The Pilgrims Progress in Words of One Syllable.RES.txt', sample conversion of .imp here

p.p.p.s. should you need to extract only one .imp file, you can use the 'extract.bat.txt' attachment (just save as 'extract.bat') and in the MS-DOS command prompt window, type:
1. unimp "Impfilename.imp" and
2. extract "Resdirname" (note no '.RES' and 'Resdirname' may differ from 'Impfilename').


Previously....
This was a response posted in another forum about converting .imp to .prc (or anything useful!)
Quote nrapallo
No, Mobi2IMP doesn't convert from imp to mobi.

Unfortunately, the .IMP format is an 'end' one; meaning there is no known way to extract the original source (.html and images).

However, there are several ways that the ebook 'text' can be extracted to varying degrees of success, as follows:

1. Using the 'ebook viewer.exe' installed on the PC (after installing the free eBook Publisher software here), you would 'Print' using a printer driver that saves to .pdf format, then OCR the resulting .pdf.

2. With some .IMP files, it is possible to open them with a text editor (like WordPad) and 'go to the middle' of the document. There you may find the text used in the .IMP ebook. This depends on which software and compression setting created that .IMP (it does seem to work with recent eBook Publisher creations that are not internally compressed with LZSS).
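
As a rough illustration of option 2, the "go to the middle and look for text" step can be automated by scanning the file for long runs of printable characters. This is only a sketch: the function name `printable_runs` and the 40-character threshold are my own assumptions, and it will find nothing useful if the .IMP is internally compressed.

```python
import re
from typing import List


def printable_runs(blob: bytes, min_len: int = 40) -> List[str]:
    """Return runs of printable ASCII at least min_len bytes long.

    Hypothetical helper: it only mimics opening the file in a text
    editor and looking for readable passages.
    """
    # 0x20-0x7E is the printable ASCII range; the threshold is arbitrary.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]


# Example use (the file name is a placeholder):
#   with open("Impfilename.imp", "rb") as fh:
#       for run in printable_runs(fh.read()):
#           print(run)
```

Short runs are skipped on purpose, since binary data often contains a few printable bytes in a row by chance.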

Either way, you would lose all formatting, links and HTML codes in the process, but you would have the text to 'begin' the conversion process.

Most users creating .IMP's ALSO retain the original source for this very reason.

Hope this helps.

Edit: this has all been discussed before here
From the IMP@Wiki » Technical Specs:
Quote
DATA.FRK File
Element text is extracted and placed in this file. Element tags are replaced with control characters. This file can be compressed and encrypted, with compression occurring before encryption. This file is compressed when the element <meta name="x-SBP-compress" content="on"/> is included in the <x-metadata> element of the package file. The compression algorithm used is LZSS. This file is encrypted when the element <meta name="x-SBP-encrypt" content="on"/> is included in the <x-metadata> element of the package file. The encryption algorithm used is DES. The 8 byte encryption key is in the SoftBook Edition Encryption Key File (.key) at offset 0x0C.

Characters less than 0x20 are removed except for line break, which is replaced with 0x20. Multiple 0x20 characters are replaced with a single 0x20.

Control characters
0x0A end of document, forced page break
0x0B start of element except <span>
0x0D line break element <br />
0x0E start of table element <table>
0x0F image element <img />
0x13 end of table cell </td> tag
0x14 horizontal rule element <hr />
0x15 before and after page header content
0x16 before and after page footer content
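
To make the control-character table above concrete, here is a minimal sketch of turning an uncompressed, unencrypted DATA.FRK byte stream into plain text. The replacement strings (newline, tab, a dashed rule) are my own choices, not part of the spec, and so is the assumption that every byte from 0x20 upward is a single Latin-1 character. The name `frk_to_text` is hypothetical, and this is not how deimp itself does it.

```python
# Plain-text stand-ins for the documented control characters.
# The replacement strings are assumptions chosen for readability.
CONTROL_TEXT = {
    0x0A: "\n\n",      # end of document, forced page break
    0x0B: "\n",        # start of element except <span>
    0x0D: "\n",        # line break element <br />
    0x0E: "\n",        # start of table element <table>
    0x0F: "",          # image element <img /> (images are dropped)
    0x13: "\t",        # end of table cell </td>
    0x14: "\n----\n",  # horizontal rule element <hr />
    0x15: "",          # page header marker (content is kept as-is)
    0x16: "",          # page footer marker (content is kept as-is)
}


def frk_to_text(data: bytes) -> str:
    """Convert raw DATA.FRK bytes to text using the table above."""
    parts = []
    for byte in data:
        if byte in CONTROL_TEXT:
            parts.append(CONTROL_TEXT[byte])
        elif byte >= 0x20:
            # Assumption: one byte per character, Latin-1 style.
            parts.append(chr(byte))
        # Any other byte below 0x20 is undocumented here and is skipped.
    return "".join(parts)
```

For example, `frk_to_text(b"\x0bHello\x0dWorld")` gives a newline, "Hello", another newline, then "World". Characters such as curly quotes may still come out wrong if the real encoding differs from the one assumed here.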
By looking at the .IMP specs and exploding the .imp to .res format with unimp.exe, I have noticed that the "text" portion (Data.frk) of the 1150 and 1200 .imp versions is identical (even for very complex files)! The formatting changes are stored in the various files within the .res for each reader and those differ greatly!

Alas, if the "text" is compressed, then it is not visible within the .imp. It must first be 'uncompressed' using an LZSS algorithm, so we are not there yet!
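
For readers unfamiliar with LZSS, here is a minimal sketch of how a decoder of that family works. The token layout below (a flag byte covering eight tokens, least significant bit first, 1 = literal byte, 0 = a two-byte back-reference with a 12-bit distance and a 4-bit length) is an assumption picked for illustration. It is NOT claimed to be the layout used inside .imp files or by lzss-0.6, so do not expect it to decode a real Data.frk as-is.

```python
def lzss_decompress(data: bytes) -> bytes:
    """Decode a simple LZSS stream (illustrative layout, see text).

    Assumed layout, not taken from the .IMP spec:
      * one flag byte, then up to 8 tokens, flag bits read LSB first
      * flag bit 1: next byte is a literal
      * flag bit 0: next two bytes are a back-reference, where
          distance = (b0 | high nibble of b1 << 8) + 1
          length   = (low nibble of b1) + 3
    """
    out = bytearray()
    pos = 0
    while pos < len(data):
        flags = data[pos]
        pos += 1
        for bit in range(8):
            if pos >= len(data):
                break  # input exhausted part-way through a flag group
            if flags & (1 << bit):
                out.append(data[pos])  # literal byte
                pos += 1
            else:
                if pos + 1 >= len(data):
                    raise ValueError("truncated back-reference")
                b0, b1 = data[pos], data[pos + 1]
                pos += 2
                distance = (b0 | ((b1 & 0xF0) << 4)) + 1
                length = (b1 & 0x0F) + 3
                if distance > len(out):
                    raise ValueError("back-reference before start of output")
                # Copy one byte at a time so overlapping copies work.
                for _ in range(length):
                    out.append(out[-distance])
    return bytes(out)
```

The key idea is that repeated text is stored as "go back N bytes and copy M bytes", which is why compressed text is unreadable in a text editor until it has been run through a decoder like this.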

Still trying... WE ARE NOW THERE!

p.s. I've added the unimp.exe program as well as a reimp (sbtest.exe) program. Just drag and drop your file onto the .exe name in Windows Explorer (or a shortcut icon you've created on your desktop).
[zip] unimp.zip (2.5 KB, 3897 views)
[zip] reimp.zip (194.8 KB, 2920 views)
[zip] deimp_v0.1.zip (726.1 KB, 5155 views)
[txt] The Pilgrims Progress in Words of One Syllable.RES.txt (128.7 KB, 1494 views)
[txt] extract.bat.txt (391 Bytes, 1877 views)
[txt] deimp-readme.txt (1.7 KB, 2009 views)

#2  LeserattePD 04-17-2008, 08:37 AM
Thanks a lot for taking this problem on!

It is much appreciated, as I am planning to move from my ebookwise to an eink device sometime this year and would hate to go through downloading all my ebooks again. Thankfully my books mostly are from BAEN so I can download them in another format, but I also have a few secure format books from ebookwise and would hate to lose those (especially as they are so expensive to begin with!).

#3  nrapallo 04-17-2008, 09:43 AM
Quote LeserattePD
Thanks a lot for taking this problem on!

It is much appreciated, as I am planning to move from my ebookwise to an eink device sometime this year and would hate to go through downloading all my ebooks again. Thankfully my books mostly are from BAEN so I can download them in another format, but I also have a few secure format books from ebookwise and would hate to lose those (especially as they are so expensive to begin with!).
Sorry, we are not there yet, not even close!

And I am not looking to 'crack' secured .imp ebooks; only non-DRMed .imp files would be supported, if I can find an LZSS decompressor routine.

#4  delphidb96 04-17-2008, 12:15 PM
Quote nrapallo
Sorry, we are not there yet, not even close!

And I am not looking to 'crack' secured .imp ebooks; only non-DRMed .imp files would be supported, if I can find an LZSS decompressor routine.
Well, you have no need to worry about 'cracking' any BAEN books because they've always been DRM-free. (That's what I love about BAEN books.) As for an LZSS decompressor routine... I've attached just one of the many source files that I googled to this post. I found it here:

http://michael.dipperstein.com/lzss/#download

Enjoy!

Derek
[zip] lzss-0.6.zip (52.6 KB, 1784 views)

#5  nrapallo 04-17-2008, 12:22 PM
Quote delphidb96
Well, you have no need to worry about 'cracking' any BAEN books because they've always been DRM-free. (That's what I love about BAEN books.) As for an LZSS decompressor routine... I've attached just one of the many source files that I googled to this post. I found it here:

http://michael.dipperstein.com/lzss/#download

Enjoy!

Derek
Nice find! All I kept on getting with google was the simple v1.0 lzss.zip (though I didn't try it yet)!

I will tinker with this to see if I can get this algorithm to work.

#6  delphidb96 04-17-2008, 12:51 PM
Quote nrapallo
Nice find! All I kept on getting with google was the simple v1.0 lzss.zip (though I didn't try it yet)!

I will tinker with this to see if I can get this algorithm to work.
Please do as I've got a ton of .imps I want to convert for my Cybook!

Derek

#7  Roberts324 04-18-2008, 09:08 AM
Quote nrapallo
Nice find! All I kept on getting with google was the simple v1.0 lzss.zip (though I didn't try it yet)!

I will tinker with this to see if I can get this algorithm to work.
That would be nice, indeed!

And a new dent to your knife's handle...

#8  nrapallo 05-16-2008, 10:13 AM
More to follow very soon...

See post #1 above for ALL the juicy details!

#9  GeoffC 05-16-2008, 11:13 AM
Quote nrapallo
More to follow very soon...
Forgive the intrusion, but what's an .imp file?

#10  HarryT 05-16-2008, 11:19 AM
The book format used by the EB1150 bookreader (the subject of this forum section).