Mobileread
Coping with countless images
#1  Vroni 11-25-2019, 03:07 AM
Hi,

I have an epub with tons of pictures. Roughly 50% of them are just a fleuron, and all of those are the same.

The filenames are all numbered in sequence, so there is no chance to identify a picture by its name.

So I renamed one of the fleuron images to fleuron.jpg to be the master, and I want to change all other occurrences to that file. I can walk through all img elements with a regex, but unfortunately the fleuron image is surrounded by a lot of other, different images, and the preview does not make clear which picture the regex has currently caught. So this approach ends up in a lot of manual work: either I replace by trial and error and, if the wrong image was marked, roll back and try the next one, or I use "open picture in tab" to check whether the currently caught picture is the fleuron before pointing the reference at the master fleuron.

I tried another approach from the report, but you can't rename or jump from the picture list to anything else. The only action is deleting, which is not helpful.

If the preview would mark what has been marked in the code view, that would be an easy thing, but that's not available.

Does anyone have another idea to speed this process up? Did I miss something?

\/roni

#2  Doitsu 11-25-2019, 04:58 AM
Quote Vroni
Does anyone have another idea to speed this process up? Did I miss something?
Since you know some Python, you could use BeautifulSoup to get all image tags and PIL to get the image size. The following Edit plugin code should work for you:

Code
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os
from io import BytesIO
from PIL import Image
from sigil_bs4 import BeautifulSoup

def run(bk):
    # preferences
    max_width = 50
    fleuron_name = 'fleuron.jpg'

    # process file list
    for (html_id, file_href) in bk.text_iter():
        file_name = os.path.basename(file_href)
        print('Processing {}...\n'.format(file_name))
        html = bk.readfile(html_id)

        # load html code into BeautifulSoup
        soup = BeautifulSoup(html, 'html.parser')
        orig_soup = str(soup)

        # look for images
        img_tags = soup.find_all('img')
        for img in img_tags:
            if 'src' in img.attrs:
                href = img['src']
                base_name = os.path.basename(href)
                img_id = bk.basename_to_id(base_name)
                if img_id:
                    # get image dimensions
                    imgdata = bk.readfile(img_id)
                    img_data = Image.open(BytesIO(imgdata)).convert('L')
                    width, height = img_data.size
                    if width <= max_width:
                        img['src'] = href.replace(base_name, fleuron_name)
                        print('{} renamed to {}'.format(base_name, fleuron_name))
            else:
                # no src attribute to report, so print the tag itself
                print(str(img) + ' skipped! (empty img tag)\n')

        # write the file back only if something changed
        if str(soup) != orig_soup:
            bk.writefile(html_id, str(soup.prettyprint_xhtml()))
            print('\n{} updated\n'.format(file_name))

    print('\nPlease click OK to close the Plugin Runner window.')
    return 0

def main():
    print('I reached main when I should not have\n')
    return -1

if __name__ == "__main__":
    sys.exit(main())


It looks for images with a width of up to 50 pixels and changes the file name in the img src attribute to fleuron.jpg.
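If the fleuron copies happen to be byte-for-byte identical (an assumption; re-saved or re-compressed copies would not match), a content hash is a stricter test than width. The sketch below is not part of the plugin above. It uses only hashlib, and the function name, file names and bytes are made up for illustration:

```python
import hashlib
from collections import defaultdict

def group_identical(images):
    """Group file names by the SHA-256 digest of their bytes."""
    groups = defaultdict(list)
    for name, data in images.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by more than one file count as duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Hypothetical stand-ins for the bytes that bk.readfile() would return.
images = {
    '0001.jpg': b'fleuron-bytes',
    '0002.jpg': b'photo-bytes',
    '0003.jpg': b'fleuron-bytes',
}
duplicates = group_identical(images)
print(duplicates)
```

Inside a plugin, the dictionary could be filled with bk.readfile() for each image, and every name in a duplicate group could then be pointed at the master file.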

If that code catches too many false positives, you might find KevinH's Access-Aide plugin helpful. Simply change the alt attribute of all fleurons to fleuron and then use a regex to change the file name of all images with a fleuron alt attribute.
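A minimal Python sketch of that second step, assuming the fleurons already carry alt="fleuron", that alt comes before src in each tag, and that src contains a folder path. The function name and the sample markup are hypothetical:

```python
import re

# Matches an img tag whose alt is exactly "fleuron" and captures the parts
# around the file name. Assumes alt precedes src and src has a "/" in it.
FLEURON_IMG = re.compile(
    r'(<img\b[^>]*\balt="fleuron"[^>]*\bsrc="[^"]*/)[^"/]+(")'
)

def point_to_master(html, master="fleuron.jpg"):
    """Rewrite the file name of every fleuron-tagged image to the master."""
    return FLEURON_IMG.sub(lambda m: m.group(1) + master + m.group(2), html)

sample = (
    '<p><img alt="fleuron" src="../Images/0012.jpg"/></p>\n'
    '<p><img alt="" src="../Images/0013.jpg"/></p>'
)
result = point_to_master(sample)
print(result)
```

Only the tag with the fleuron alt text is touched; the other image keeps its numbered file name.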

#3  Vroni 11-25-2019, 06:22 AM
Quote Doitsu
Since you know some Python, ...
That's what I was afraid of.

#4  DiapDealer 11-25-2019, 08:18 AM
Quote Vroni
That's what I was afraid of.
???
You're afraid of having options that you wouldn't have if there were no plugin interface and a working knowledge of Python?

#5  Vroni 11-25-2019, 08:31 AM
Afraid can mean anxiety or fear.

So I was afraid that Python is the only option I have.

And if you could look over my shoulder and see how slow I still am in Python...

My first (and only) plugin took weeks.

#6  DiapDealer 11-25-2019, 08:44 AM
Understood. I just figured HAVING an option as opposed to NOT having any might actually be comforting in some small way. Call me crazy, though.

#7  Turtle91 11-25-2019, 09:37 AM
Quote Vroni
Hi,

I have an epub with tons of pictures. Roughly 50% of them are just a fleuron, and all of those are the same.

The filenames are all numbered in sequence, so there is no chance to identify a picture by its name....
Is there some kind of surrounding tag that can be used to identify which image is used as a fleuron?

eg:

<div class="fleuron"><img alt="" src="../Images/01.jpg"/></div>
<div class="fleuron"><img alt="" src="../Images/02.jpg"/></div>
<div class="fleuron"><img alt="" src="../Images/15.jpg"/></div>

search: <div class="fleuron"><img alt="" src="../Images/(.*?)\.jpg"/></div>
replace: <div class="fleuron"><img alt="" src="../Images/fleuron.jpg"/></div>

Then run a report and delete all images that are used 0 times.
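For anyone who would rather script it, the same search and replace could be sketched in Python with re.subn. This assumes the fleurons really are wrapped in a div with class "fleuron", as in the hypothetical markup above; the sample string is made up:

```python
import re

# Same idea as the search/replace pair above: any .jpg inside a
# <div class="fleuron"> wrapper gets pointed at the master image.
pattern = re.compile(
    r'(<div class="fleuron"><img alt="" src="\.\./Images/)[^"]+?\.jpg("/></div>)'
)

sample = (
    '<div class="fleuron"><img alt="" src="../Images/01.jpg"/></div>\n'
    '<p><img alt="" src="../Images/02.jpg"/></p>\n'
    '<div class="fleuron"><img alt="" src="../Images/15.jpg"/></div>'
)

# subn returns the new text plus the number of replacements made.
updated, count = pattern.subn(r'\g<1>fleuron.jpg\g<2>', sample)
print(count)
print(updated)
```

The replacement count is a quick sanity check against the number of fleurons expected in the file.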

#8  Brett Merkey 11-25-2019, 09:38 AM
Quote
The filenames are all numbered in sequence, so there is no chance to identify a picture by its name.
I don't use Sigil so I don't know its behavior but I had the problem a few times and Calibre helped me out. There was no way to differentiate the images with regex so I used a regex that would find every image and just stepped thru every one. At each found image, Calibre selected the code and showed me the image in the preview pane. For most images, pass. For the fleuron, "Replace and Find."

Not elegant, but sounds like much less effort than what you described.

#9  Turtle91 11-25-2019, 09:41 AM
An alternative, if a little more manually intensive, is to open the Inspector. Hover the mouse over the different images in the Inspector list and it will highlight each image in the preview pane. Then you can note which image names are fleurons.

#10  Tex2002ans 11-25-2019, 10:03 AM
Let's say you're starting with images:

Code
<img alt="" src="../Images/image001.png" />
[...]
<img alt="" src="../Images/image098.png" />
<img alt="" src="../Images/image099.png" />
This is potentially what I would do:

1. Go into Tools > Reports > Image Files.

2. Look through the images. For any that are duplicate fleurons: Right-Click > Delete From Book.

3. Once you have gotten rid of all the fleurons, return to the main Sigil window.

Open the Images folder, Shift-Click to highlight all images, then Right-Click > Rename.

4. Rename to something completely different. Like "TempImages001".

Now, all your surviving images will be named "TempImages001", "TempImages002":

Code
<img alt="" src="../Images/TempImages001.png" />
[...]
<img alt="" src="../Images/image098.png" />
<img alt="" src="../Images/TempImages002.png" />
while all the non-existent fleurons will be under the old naming convention.

5. Now you can use Regex to easily change all the old image code into "fleuron.png":

Search: <img alt="" src="[^"]+image\d+\.png" />
Replace: <img alt="" src="../Images/fleuron.png" />

6 (Optional). Now go through and give your surviving images all human-readable names.
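Step 5 can also be sketched in Python, in case a plugin or script is preferred over Find & Replace. This is only an illustration on a made-up string; the old "image###" and new "TempImages###" names follow the example above:

```python
import re

# Anything still using the old "imageNNN.png" naming is a deleted fleuron,
# because the surviving images were renamed to "TempImagesNNN.png" in step 4.
old_name = re.compile(r'<img alt="" src="[^"]+image\d+\.png" />')

html = (
    '<img alt="" src="../Images/TempImages001.png" />\n'
    '<img alt="" src="../Images/image098.png" />\n'
    '<img alt="" src="../Images/TempImages002.png" />'
)

fixed = old_name.sub('<img alt="" src="../Images/fleuron.png" />', html)
print(fixed)
```

The renamed files are left alone because "TempImages001" never contains "image" followed directly by digits, which is why the temporary name should differ clearly from the old pattern.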
