Mobileread
Merging books with same format
#1  Doug-W 02-17-2011, 11:24 AM
I had one library with some books and a second library with more books, some of which may have been improved versions of books in the first library. I imported the new books into the first library, which left me with a series of duplicate books. In this case, however, I knew that for any duplicate the original book had the most up-to-date metadata, so whenever two books had the same title I wanted to merge them and have the newer book's formats overwrite the older book's.

So, after an hour or so with database2.py, I have this snippet for calibre-debug:

Code
import cStringIO
from calibre.library.database2 import LibraryDatabase2

db = LibraryDatabase2('/path/to/library/folder')
# Titles that appear on more than one book record
dupes = db.conn.get('SELECT title FROM books GROUP BY title HAVING count(*) > 1')
for dupe in dupes:
    title = dupe[0]
    # Newest first, so the last entry (lowest id) is the original record to keep
    rows = db.conn.get('SELECT id FROM books WHERE title=? ORDER BY id DESC', (title,))
    ids = [row[0] for row in rows]
    base_id = ids.pop()
    for book_id in ids:
        formats = db.conn.get('SELECT format FROM data WHERE book=?', (book_id,))
        for row in formats:
            fmt = row[0]
            data = db.format(book_id, fmt, index_is_id=True, as_file=False)
            if not data:
                continue
            stream = cStringIO.StringIO(data)
            # Copy the format onto the original record, then drop it from the duplicate
            db.add_format(base_id, fmt, stream, index_is_id=True, notify=False)
            db.remove_format(book_id, fmt, index_is_id=True, commit=False)
        db.delete_book(book_id, commit=False)
db.conn.commit()
db.clean()
And with that, all duplicate books are merged, with formats overwriting one another. Hope that helps, or that someone can come up with a better way of handling it.
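Before running a merge like this on a real library, it can help to preview which records would be touched. The following is a minimal sketch using Python's standard sqlite3 module against an in-memory stand-in. It assumes only the `books(id, title)` columns used by the queries above; the sample rows and the `find_duplicate_groups` helper are hypothetical and not part of calibre.

```python
import sqlite3

def find_duplicate_groups(conn):
    """Return {title: (base_id, [newer_ids])} for titles on more than one book.

    Hypothetical helper: base_id is the lowest id (the oldest record, whose
    metadata is kept); newer_ids are the records whose formats would be merged.
    """
    groups = {}
    titles = conn.execute(
        'SELECT title FROM books GROUP BY title HAVING count(*) > 1')
    for (title,) in titles.fetchall():
        rows = conn.execute(
            'SELECT id FROM books WHERE title=? ORDER BY id DESC', (title,))
        ids = [row[0] for row in rows]
        base_id = ids.pop()  # last row in descending order = lowest id
        groups[title] = (base_id, ids)
    return groups

# Stand-in data; a real calibre metadata.db has many more columns and tables.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)')
conn.executemany('INSERT INTO books (id, title) VALUES (?, ?)',
                 [(1, 'Dune'), (2, 'Emma'), (3, 'Dune'), (4, 'Dune')])
groups = find_duplicate_groups(conn)
for title, (base_id, newer) in groups.items():
    print(title, 'keep', base_id, 'merge from', newer)
```

Trying the same queries on a copy of the library first, rather than the live one, keeps the preview risk-free.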

#2  Starson17 02-17-2011, 03:23 PM
Quote Doug-W
if there was a duplicate book, the original book had the most up to date meta data, and it was only the case where there were two books with the same title that I'd want to merge them and have the format of the newer book overwrite the format of the older book ....
As of 0.7.45 you could have turned on the overwrite option of automerge, saved the books out of the first library and imported them into the library with the good metadata. New formats would overwrite old formats.

Once the books are in a library, you can get duplicates to merge by using the Copy To Library function and sending all the books to the new library. The problem there is that there's currently no way to separate formats from metadata. You can control which metadata/format record survives by sorting, but you can't separate a format from its associated metadata.

I didn't implement the new overwrite option (one of the three options) for automerge in Copy To Library (CTL). Isn't that always the way! I didn't see it as being very useful, but now I can see scenarios (like this one) where it may be of some value.

When I get a moment, I suppose I'll implement the new options for CTL. Then you could send the books, sorted oldest first, into an empty library with ignore dupe formats turned on, to build records with good metadata (and original formats). You would then send them again, sorted the same way but with overwrite formats turned on, to replace older formats with newer ones. The newer formats arrive last during CTL and are therefore the surviving formats in a record with the original metadata.
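To make the two-pass idea concrete, here is a small simulation on plain Python dicts rather than real calibre libraries. It assumes records are matched by title only, and the `copy_to_library` function and its `mode` values are hypothetical names for illustration, not calibre's actual API or option names.

```python
def copy_to_library(source, target, mode):
    """Simulate sending books, in order, into a target library.

    source: list of (title, metadata, formats) tuples, sorted oldest first.
    target: dict mapping title -> {'metadata': ..., 'formats': {...}}.
    mode:   'ignore' keeps a format that already exists in the target;
            'overwrite' replaces it with the incoming one.
    """
    for title, metadata, formats in source:
        record = target.get(title)
        if record is None:
            # First arrival creates the record, so its metadata survives.
            target[title] = {'metadata': metadata, 'formats': dict(formats)}
            continue
        for fmt, data in formats.items():
            if mode == 'overwrite' or fmt not in record['formats']:
                record['formats'][fmt] = data
    return target

# Hypothetical sample: the older record has the good metadata.
books = [
    ('Dune', 'good metadata', {'EPUB': 'old epub'}),
    ('Dune', 'poor metadata', {'EPUB': 'new epub', 'MOBI': 'new mobi'}),
]
library = {}
copy_to_library(books, library, mode='ignore')     # pass 1
copy_to_library(books, library, mode='overwrite')  # pass 2
print(library['Dune'])
```

In this toy model the surviving record keeps the metadata of the first book to arrive, while each format comes from the last book that supplied it.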

#3  Doug-W 02-17-2011, 07:37 PM
Interesting, could you reply back to this thread and let me know when that's done? I'd love to give that a try.

Since I have so much library data, I've just made 3-4 copies of the different libraries at different times so that I can try different things and undo them if it turns out that it doesn't work.

#4  kiwidude 02-17-2011, 07:58 PM
Quote Doug-W
Interesting, could you reply back to this thread and let me know when that's done? I'd love to give that a try.

Since I have so much library data, I've just made 3-4 copies of the different libraries at different times so that I can try different things and undo them if it turns out that it doesn't work.
Did you see the first part of Starson17's answer above? If you don't mind using the "Save to disk" feature, you could re-add your books that way, in which case you will have the full suite of overwrite options as of calibre 0.7.45.

The gap only exists if you want to avoid saving to disk and use a direct "Copy to library" instead.

That's just in case you didn't want to wait. When you save to disk you can write out the .opf file and use one directory per book, so you won't lose the metadata for books that don't already exist in the library you then import into.

#5  Doug-W 02-17-2011, 10:22 PM
Yep, I'm going to experiment with it now and see how it goes. I think the problem is that I'm still running 0.7.44, but the source code I downloaded to write the snippet above was 0.7.45 *oops*
