Mobileread
What Features or Tools does Sigil Still Need Yet?
#161  isaacbh 02-22-2021, 05:11 PM
Run SavedSearches on import

Not sure if it should be done with a plugin or in Sigil itself. I'm sure many here have developed over time a set of regex searches that they apply for every file added to the book (and then some specific to the book). It would be nice to have something like a checkmark next to saved-searches entries/groups that will run those entries automatically upon importing with "Add Existing Files". It's only a small convenience, I know (replacing having to Ctrl-Alt-F -> select group -> Replace all), but any savings adds up. I will also cast my vote for preserving search flags.

Thanks!

#162  Coleccionista 03-05-2021, 05:16 PM
On the topic of Saved Searches I'd love to see in Sigil or in a Sigil plugin something that would not rely on a static list of entries. I sometimes find myself working with an ebook that let's say has 100s of <i class="xx">...</i> but I cannot expect to know beforehand the contents.

I want to replace them with the appropriate <i>, <em> or <cite> tags, and even add lang/xml:lang where needed, so I cannot really use Search & Replace: every couple of matches I would need to switch the replace pattern.

What I would like is something like the spellchecker window, where you would enter a tag pattern to search for (e.g. <i class="xx">) and it would return a list, sortable alphabetically and/or by frequency, of all the text found inside this tag:
Code
Text               Number of times   Replace tag   Language code
Origin of Species  24                cite          en
alea               14                i             -
Der Weiss Kunig    10                cite          de
¡basta!            2                 em            -
The idea is that in one fell swoop you would replace all the bland <i>/<em> or whatever styles with ones that are correct in language and semantics.
Once you have selected the changes, Sigil would run the whole list of search-and-replace operations.
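
A rough sketch of how such a list could be gathered, assuming Python and a plain regex rather than anything Sigil provides today. The function name `collect_tag_texts`, the sample markup, and the `class="xx"` pattern are made up for illustration, and the regex assumes the tag is never nested inside itself:

```python
import re
from collections import Counter


def collect_tag_texts(markup, tag="i", class_name="xx"):
    """Count the inner text of every <tag class="class_name">...</tag>."""
    pattern = re.compile(
        r'<{0} class="{1}">(.*?)</{0}>'.format(
            re.escape(tag), re.escape(class_name)
        ),
        re.DOTALL,
    )
    return Counter(m.group(1).strip() for m in pattern.finditer(markup))


# Hypothetical sample markup, not taken from a real book.
sample = (
    '<p><i class="xx">alea</i> and <i class="xx">Origin of Species</i>, '
    'then <i class="xx">alea</i> again.</p>'
)

# Most frequent entries first, like a spellcheck-style list.
for text, count in collect_tag_texts(sample).most_common():
    print(f"{text}\t{count}")
```

Each row of the resulting list could then carry its own replacement tag and language code, which is the part that a static saved search cannot express.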

#163  Tex2002ans 03-05-2021, 10:15 PM
Quote Coleccionista
On the topic of Saved Searches I'd love to see in Sigil or in a Sigil plugin something that would not rely on a static list of entries. I sometimes find myself working with an ebook that let's say has 100s of <i class="xx">...</i> but I cannot expect to know beforehand the contents.
I've been thinking of similar for a while.

Portions of these things already exist, but nothing combines them all into one super power user tool! :P

1. Replacing <i class="xyz"> -> <em>

It doesn't let you see the inner HTML, and you still have to do one-by-one cleanup (but it has regex capabilities for class names).

But the Sigil/Calibre plugins exist:

I wrote a tutorial here:

(These 2 plugins are incredibly high up in my workflow.)

It would be nice to be able to apply this in a nice list, then batch convert... but for that, see #2 below.

2. Style Mapping

This is a nice menu where you could see all current Styles, then you could assign them an equivalent HTML + class in the output.

InDesign and some of the Word->InDesign import tools have this.

For example, being able to say:

This is a video showing off InDesign's Style Mapping. And here are two Adobe pages explaining it in more detail:

Also see lots of my links/posts in these two threads:

This would be an absolutely fantastic functionality to have in Calibre while converting... although I currently don't feel it fits within the scope of Sigil. (But I could be wrong!)

Partial Functionality: If the full-blown Style Mapper is too much, I'm imagining something similar to Tools > Delete Unused Stylesheet...

Maybe a "Consolidate Stylesheet", where you could map nearly redundant classes into each other (like those Word/InDesign CSS where dozens of classes are almost exact duplicates, with only a minuscule difference).

You could check a box (or map) "calibre1", "calibre2", "calibre10", then have it consolidate all those into a single "Clean1". :P
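
To make the "calibre1/calibre2/calibre10 -> Clean1" idea concrete, here is a minimal Python sketch. It only rewrites class attributes in the markup (the CSS rules themselves would still need merging), the name `consolidate_classes` is hypothetical, and it assumes double-quoted class attributes:

```python
import re

# Matches a double-quoted class attribute and captures its value.
CLASS_ATTR = re.compile(r'class="([^"]*)"')


def consolidate_classes(markup, mapping):
    """Rename classes according to mapping, dropping duplicates that result."""

    def rewrite(match):
        merged = []
        for name in match.group(1).split():
            new_name = mapping.get(name, name)
            if new_name not in merged:
                merged.append(new_name)
        return 'class="{}"'.format(" ".join(merged))

    return CLASS_ATTR.sub(rewrite, markup)


mapping = {"calibre1": "Clean1", "calibre2": "Clean1", "calibre10": "Clean1"}
html = (
    '<p class="calibre1">A</p>'
    '<p class="calibre2 calibre10">B</p>'
    '<p class="other">C</p>'
)
print(consolidate_classes(html, mapping))
```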

And similar to InDesign, it would be nice to have a little window below that showed you:

when you click on each Style.

3. "Spellcheck List" for Search

I also wrote about something similar last year:

Past few years, I've "secretly" been using this concept of "Italic Lists" to catch typos/errors.

Quote Tex2002ans
For example, ripping every single <i> out and sorting into an alphabetical list:

Code
<i>Enciclopedia Italiana</i>
<i>New York Times</i>
<i>Volksgemeinschaft</i>
<i>Wall Street Journal</i>
<i>Washington Post</i>
<i>individual</i>
<i>laissez-faire</i>
<i>negative</i>
From a glance, you can usually tell which ones are meant to be <i> (newspapers, book titles, foreign words/terms) and which ones are <em> (individual words).

[...]
Splitting ALL italics, then sorted alphabetically + uniques... opens up a whole new class of previously missed errors.

Code
<i>Wall Street Journal</i>
<i>Wa11 Street Journal</i>
right next to each other stands out like a sore thumb.
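
A minimal sketch of building such an "Italic List" in Python, assuming plain <i> tags with no attributes and no nesting; the function name `italic_list` and the sample text are invented for illustration:

```python
import re


def italic_list(markup):
    """Return every distinct <i>...</i> span, sorted alphabetically."""
    return sorted(set(re.findall(r"<i>.*?</i>", markup, re.DOTALL)))


sample = (
    "<p>The <i>Wall Street Journal</i> and the <i>New York Times</i> agreed, "
    "but the <i>Wa11 Street Journal</i> did not. The <i>Wall Street Journal</i> "
    "called it <i>laissez-faire</i>.</p>"
)

# Duplicates collapse to one entry; near-duplicates end up side by side.
for entry in italic_list(sample):
    print(entry)
```

Plain `sorted` puts capitalized entries before lowercase ones, which is handy here: titles and proper nouns group together ahead of ordinary emphasized words.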

Having everything displayed beautifully in a "Sigil/Calibre Spellcheck List"-form would be super icing on top.

If there's some sort of editor out there that lets you mass search text/HTML + display similar to Sigil's Spellcheck List... I'd be EXTREMELY interested.

Note: Notepad++'s "Find All" displays in a chronological list form, although it displays the entire line. When working with long paragraphs, many times the hit is going to display off screen:

And there is an (unreleased) Sigil Plugin that lets you search using Regex. The hits appear chronologically in the Validation Results, then you can double-click to jump to the exact location:

Helpful, but nowhere near as nice as Spellcheck Lists!

4. Marking Lang

I wrote a few non-standard ways you could hackishly use the Spellcheck Lists to accomplish this:

Sure, nothing as easy/fancy... but it "works".

But yeah... more extremely powerful "Spellcheck List"-like interfaces... ten thumbs up from me.

- - -

I think the Style Mapper is the core to most of this.

Once that functionality gets introduced, I think the potential for the power tools like the "Lang Mapper" or "HTML+Class Mapper" or "Mass Replace Mapper" would follow.

#164  Coleccionista 03-06-2021, 03:44 AM
Wow @Tex2002ans! I'm impressed by all the information on your post. Certainly I'm going to add TagMechanic to my plugins in Sigil and let's see if future versions can advance in this area.

One thing I would also love: when Sigil can't save the book because of HTML errors (a missing < or >, a missing tag, etc.), it would give you more information. The popup dialog doesn't identify the offending file, and right now I have to turn on Live Preview and check all recently modified files to get the warning in LP with the line number (when lucky).
Code View should mark the line with a red dot, like you see in text/code editors, and switch to the offending page/line as soon as you click on "Manually Correct".

#165  KevinH 03-06-2021, 09:06 AM
Use the well-formed check button.

#166  KevinH 03-06-2021, 09:13 AM
Hi All,

Does Sigil, sometime in the future, need something like a big green "Publish" button that would:

- regenerate the Nav, and any html TOC
- automatically run Mend and Prettify on all code
- update all manifest properties in the OPF
- add in the NCX/Guide for backwards compatibility with older EPUB2 only readers
- add in missing xml:lang, lang, and titles
- add in missing ARIA roles
- add in accessibility metadata
- verify no errors from epubcheck
- verify no errors from ACE
- do the equivalent of a save-as to a "Completed Works" folder.
- and finally remove all its previously created checkpoints in the repo

Perhaps then we could add a Publish Preferences setting dialog that would allow you to indicate which of these steps you want to use, the path to the "Completed Works" folder and etc.
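
Purely as an illustration of the "choose which steps to run" idea, here is a small Python sketch of a configurable step pipeline. Nothing here is Sigil code: `PublishStep`, `run_publish`, and the placeholder actions are all hypothetical, and the "book" is just a dict standing in for real state:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PublishStep:
    name: str
    action: Callable[[Dict], None]
    enabled: bool = True  # would come from a preferences dialog


def run_publish(book: Dict, steps: List[PublishStep]) -> List[str]:
    """Run the enabled steps in order and return a log of what happened."""
    log = []
    for step in steps:
        if not step.enabled:
            log.append(f"skipped: {step.name}")
            continue
        step.action(book)
        log.append(f"done: {step.name}")
    return log


# Placeholder actions; real ones would do the actual work.
def regenerate_nav(book):
    book["nav_regenerated"] = True


def add_ncx(book):
    book["ncx_added"] = True


steps = [
    PublishStep("regenerate Nav", regenerate_nav),
    PublishStep("add NCX/Guide", add_ncx, enabled=False),
]
book = {}
for line in run_publish(book, steps):
    print(line)
```

Returning a log of completed and skipped steps would also give users somewhere to look if a step did something unexpected.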

Would something along those lines be useful?

KevinH

#167  BeckyEbook 03-06-2021, 11:16 AM
Hello,

In theory, I would be in favor of implementing such a button, but working with real EPUB files makes these elements less useful to me personally.
However, I believe that for many people it would be a gift that would make the result files in the wild better.

-----
It would be much more interesting to implement, for example, saving search parameters (already discussed in the thread with proposals), and by the way, I would be very pleased with a solution to Issue #220.
-----

Still, a button like this could be handy.

Becky

#168  KevinH 03-06-2021, 12:29 PM
Saving search parameters is already on my future to-do list (see an earlier post in this thread) and this is not a zero sum game since we are talking about over the next year or 2 or even 3.

Issue #220 can already be done with specially crafted regular expressions with look ahead and look behind to rule out unclosed html tags.

Alternatively we could strip out all tags but keep track of the starting position of each character (or word maybe) and then do search on that.
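
A minimal sketch of that "strip the tags but remember positions" idea, written in Python for illustration only; it is not how Sigil works internally. The helper names are hypothetical, and the tag regex assumes well-formed markup with no ">" inside attribute values:

```python
import re

TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_with_offsets(markup):
    """Return (text, offsets) where offsets[i] is the index in markup of text[i]."""
    chars, offsets = [], []
    pos = 0
    for tag in TAG_RE.finditer(markup):
        for i in range(pos, tag.start()):
            chars.append(markup[i])
            offsets.append(i)
        pos = tag.end()
    for i in range(pos, len(markup)):
        chars.append(markup[i])
        offsets.append(i)
    return "".join(chars), offsets


def find_in_text(markup, needle):
    """Find needle in the tag-free text; return (start, end) spans in markup."""
    if not needle:
        return []
    text, offsets = strip_tags_with_offsets(markup)
    hits = []
    start = text.find(needle)
    while start != -1:
        last = start + len(needle) - 1
        hits.append((offsets[start], offsets[last] + 1))
        start = text.find(needle, start + 1)
    return hits


markup = '<p><span class="dropcap">O</span>nce upon a time</p>'
for begin, end in find_in_text(markup, "Once"):
    print(repr(markup[begin:end]))
```

The search side is straightforward; note that the span found for "Once" covers `O</span>nce` in the original markup, so it crosses an element boundary.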

The problem is how to do replace when the found text to be replaced spans multiple nodes in the tree. This is actually a hard problem to solve. That is why find and replace in most browsers is limited to contiguous strings which of course breaks down when markup tags, spans, drop caps, etc are involved.

Until I can think of some way of solving that issue, we are limited to using complex regular expressions to rule out the contents of the tags themselves.

So all of these suggested new features should be viewed on their own merit.

Let us (the Sigil developers) worry about their priority. We will only accept new feature suggestions that we feel we can actually handle and that are doable without major rewrites of Sigil. (I am done with major rewrites of Sigil; there are very, very few files I have not had to edit over the last 5 years, so I really do not want to repeat that anytime soon.)

Thanks for your input!

#169  KevinH 03-06-2021, 01:16 PM
FYI,

Here is a simple example that finds the word "title" when it is *not* inside a tag itself. It is the simplest regex search I could think of off the top of my head. It assumes there is no bare text in the body tag and that the xhtml is well formed.

I tried it and it appears to work. There are probably better, more exhaustive regexes that can handle even broken xhtml.

Code
title(?=[^>]*<)
This basically says search for "title" but lookahead to make sure there are no closing tag chars ">" before you find the next opening tag char "<".

There are probably look-behind versions that could work with reverse logic. And there are ways to use regex to find two strings while ignoring any intervening tags.

Give it a try. You could easily add a saved search to do that. But again, it will not handle find and replace of text that crosses over elements (over nodes in the tree). That is the hard part, unless you have a one-to-one correspondence between matched substrings and replacement substrings, which in general need not be the case.

And of course if you use &lt; and &gt; inside strings to show a "tag" or code snippet, these would be found by mistake so reviewing each find before the replace would be needed.
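
For anyone who wants to try the lookahead outside Sigil, here is a small Python check using the standard `re` module. The sample strings are made up, and Python's regex flavor may differ in small ways from the one Sigil uses:

```python
import re

# "title" only when no ">" appears before the next "<",
# i.e. when we are in text rather than inside a tag.
TEXT_ONLY_TITLE = re.compile(r"title(?=[^>]*<)")

sample = '<p title="tooltip">The title of the book</p>'
print(TEXT_ONLY_TITLE.sub("heading", sample))

# Caveat from the post: escaped angle brackets are ordinary text,
# so a "tag" shown as a code snippet is matched as well.
snippet = "<p>Use the &lt;title&gt; element</p>"
print(len(TEXT_ONLY_TITLE.findall(snippet)))
```

The first call leaves the `title` attribute alone and changes only the word in the paragraph text; the second shows why each hit is worth reviewing before replacing.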

#170  Tex2002ans 03-06-2021, 02:05 PM
Quote Tex2002ans
Note: Notepad++'s "Find All" displays in a chronological list form, although it displays the entire line. When working with long paragraphs, many times the hit is going to display off screen:
In Notepad++, after pressing Find All:

I Right-Clicked the "Search Results" box at the bottom, and there's a setting called "Word wrap long lines".

That fixes one little issue I had. :P

Quote Coleccionista
Wow @Tex2002ans! I'm impressed by all the information on your post.


Quote Coleccionista
Certainly I'm going to add TagMechanic to my plugins in Sigil and let's see if future versions can advance in this area.
It's an essential plugin.

The most important reason I use it is because it handles nested tags. So let's say you have a <span> inside a <span>:

Code
<span class="italics">This is <span class="emphasis">emphasis</span> and this should still be italics.</span>
Trying to replace the outer <span>:

Search: <span class="italics">(.+?)</span>
Replace: <i>\1</i>

would lead to this:

Code
<i>This is <span class="emphasis">emphasis</i> and this should still be italics.</span>
TagMechanic actually parses the HTML tree, so it knows what opening/closing tags belong together.

Code
<i>This is <span class="emphasis">emphasis</span> and this should still be italics.</i>
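
The difference can be reproduced in a few lines of Python. This is only a sketch of the general "regex vs. parsed tree" contrast using the standard library's ElementTree; it is not TagMechanic's actual code, the `retag` helper is hypothetical, and real XHTML files carry a namespace that would have to be handled as well:

```python
import re
import xml.etree.ElementTree as ET

SRC = (
    '<span class="italics">This is <span class="emphasis">emphasis</span>'
    " and this should still be italics.</span>"
)

# Regex approach: the non-greedy group stops at the FIRST </span>,
# which belongs to the inner span.
broken = re.sub(r'<span class="italics">(.+?)</span>', r"<i>\1</i>", SRC)


def retag(xml_text, old_tag, class_name, new_tag):
    """Rename matching elements in the parsed tree, so tag pairs stay matched."""
    root = ET.fromstring(xml_text)
    for element in root.iter(old_tag):
        if element.get("class") == class_name:
            element.tag = new_tag
            del element.attrib["class"]
    return ET.tostring(root, encoding="unicode")


fixed = retag(SRC, "span", "italics", "i")

print(broken)
print(fixed)
```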
Quote Coleccionista
One thing I would also love: when Sigil can't save the book because of HTML errors (a missing < or >, a missing tag, etc.), it would give you more information. The popup dialog doesn't identify the offending file, and right now I have to turn on Live Preview and check all recently modified files to get the warning in LP with the line number (when lucky).


I know I wrote about that years ago... it's buried somewhere on MobileRead... lol.

Some of the popups tell you exact filenames, but many just say there's a "not well-formed XHTML file" but don't tell you exactly where.

For now, a workaround I do is run Doitsu's "EPUBCheck" plugin.

This points out exact filename + usually gives you a more accurate picture:

You can also double-click to jump to the location in the file.

Quote KevinH
Does Sigil, sometime in the future, need something like a big green "Publish" button that would:
Hmmmm... this does sound like a user-friendly enhancement.

Like an easy "press this as a final step".

Quote KevinH
Perhaps then we could add a Publish Preferences setting dialog that would allow you to indicate which of these steps you want to use, the path to the "Completed Works" folder and etc.
Agree with this.

And the optional checkboxes are key.

Maybe even in the popup window, it lists all the steps + adds:

so IF Sigil is doing something weird/unexpected during the publishing step, they'll know someplace to look.

Quote KevinH
- regenerate the Nav, and any html TOC
Many times, you'd do manual adjustments.

(Either the look of the HTML or the content.opf itself.)

Like some publishers still stupidly insist on adding front/backmatter to the TOC.

Something like this might bring more frustration. (But again, a checkbox solves that! Now... whether to have it on or off by default...)

Quote KevinH
- automatically run Mend and Prettify on all code
Which reminds me of another extremely minor niggle (although I haven't tested on the latest Sigil versions).

Let's say you have this XHTML. It's already nice and prettified:

Code
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
</head>
<body>
  <p>This is an example.</p>
</body>
</html>
Right-Click + Link Stylesheets:

Code
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title><link href="../Styles/stylesheet.css" type="text/css" rel="stylesheet"/></head>
<body>
  <p>This is an example.</p>
</body>
</html>
Then you have to prettify all over again. :P

I think a similar thing happens with Tools > Table Of Contents > Create Table of Contents + Tools > Add Cover.

Maybe these buttons should insert nice, prettified code by default.
