Mobileread
"Delete unused stylesheet classes" problems with overloaded styles
#11  DiapDealer 12-31-2020, 08:59 AM
Also keep in mind that any automated process's ability to accurately determine whether a css class is "in use" will never be 100%.

#12  wrCisco 12-31-2020, 12:23 PM
Hi, since I didn't want to reappear just to point out that the plugin cssRemoveUnusedSelectors can be used to overcome some limitations of the builtin Delete Unused Stylesheet Classes, I took a peek at the Sigil source code to look for the culprit of this behavior.

If I understood the code correctly, Sigil in BookReports::AllClassesUsedInHTMLFileMapped retrieves all the tagname/classname pairs used in the xhtml files, and then looks for a match with every css selector that contains at least one dot-preceded classname (CSSInfo::getAllCSSSelectorsForElementClass). If the selector doesn't contain tagnames, it will match if it contains the appropriate classname; otherwise it has to contain the appropriate sequence tagname.classname for the match to happen.

This is problematic for compound selectors, since they can contain tagnames and classnames which don't refer to the same element: so, if a user has a selector like ".chapter p" it will probably never match anything, and I'm afraid it won't be easy to amend the code (at least, it wouldn't be easy if I were the one who had to work on it...).

A selector like "div.chapter p" will instead match if there is a div element in xhtml with the class chapter, even if it hasn't any p descendant - which can be good, I guess.
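To make the behavior described above concrete, here is a rough Python sketch of that matching rule. It is only an approximation written from the description in this post, not Sigil's actual C++ code; the names naive_match, is_used and TAG_RE are made up for illustration.

```python
import re

# Hypothetical approximation of the matching described above; NOT Sigil's code.
# A "tag name" is any identifier that is not preceded by '.', '#', ':' or
# another identifier character.
TAG_RE = re.compile(r"(?<![\w.#:-])[A-Za-z][\w-]*")


def naive_match(selector, tag, cls):
    """Return True if `selector` counts as used by an element <tag class="cls">."""
    cls_re = r"\." + re.escape(cls) + r"(?![\w-])"
    if not re.search(cls_re, selector):
        return False  # the class name does not appear in the selector
    if not TAG_RE.search(selector):
        return True   # no tag names at all: the class name alone is enough
    # Tag names are present: require the literal sequence tag.class
    pattern = r"(?<![\w.#:-])" + re.escape(tag) + cls_re
    return re.search(pattern, selector) is not None


# (tag, class) pairs found in the XHTML, e.g. from <div class="chapter">
used = [("div", "chapter")]


def is_used(selector):
    return any(naive_match(selector, t, c) for t, c in used)


print(is_used(".chapter"))       # True
print(is_used(".chapter p"))     # False: contains a tag name but no "div.chapter"
print(is_used("div.chapter p"))  # True, whether or not the div contains a p
```

Under this reading, ".chapter p" is always reported as unused, because the only tag name in the selector belongs to a different element than the class.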

Pseudo-classes and pseudo-elements, like :first-of-type and ::first-line, don't seem to be handled in any way in CSSInfo, so I suppose they become part of the classname or tagname that precedes them. In fact, if you have
Code
<p class="matchthis:first-of-type">
in xhtml and
Code
p.matchthis:first-of-type {}
in css, it won't be deleted.
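One way a checker could avoid this is to drop pseudo-classes and pseudo-elements before comparing class names. The sketch below is only an illustration of that idea, not what Sigil or the plugin does; strip_pseudo is a made-up name, and the regex deliberately ignores nested parentheses.

```python
import re

# Matches ":name", "::name" and ":name(...)" (no nested parentheses).
PSEUDO_RE = re.compile(r"::?[\w-]+(?:\([^)]*\))?")


def strip_pseudo(selector):
    """Remove pseudo-classes and pseudo-elements from a selector string."""
    return PSEUDO_RE.sub("", selector)


print(strip_pseudo("p.matchthis:first-of-type"))  # p.matchthis
print(strip_pseudo("p.intro::first-line"))        # p.intro
```

Note that this also throws away anything inside :not(...), so it is only good for answering "which class names does this selector mention outside its pseudo-classes?", not for real matching.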

(There is also the possibility that I misunderstood Sigil's source code entirely and the answers are totally different. New year will tell... Cheers! )

#13  KevinH 12-31-2020, 02:00 PM
Long on my todo list is to add a real C or C++ css parser library, but I have not found one that I like. If I do, I will add it. Until then, we have to live with the Sigil css parser, which was designed in the css2/epub2 days and unfortunately is not my code at all, so I am not 100% sure I am following it.

So the best we can offer in the case of compound tags is to split them internally and then assume any more complex structure is to be left alone, i.e., not offered up as an unused selector of any sort.
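As a rough illustration of that conservative approach, here is a Python sketch under one possible reading of "split them": split a selector list on commas, and only ever offer the plain ".class" or "tag.class" parts for deletion. The names SIMPLE_RE and deletion_candidates are hypothetical, and nothing here is taken from Sigil's code.

```python
import re

# "Simple" here means: an optional tag name followed by exactly one class,
# e.g. ".note" or "p.note". Everything else is treated as complex.
SIMPLE_RE = re.compile(r"^(?:[A-Za-z][\w-]*)?\.[A-Za-z_-][\w-]*$")


def deletion_candidates(selector_text, used_pairs):
    """Return the parts of a selector list that are simple AND unused.

    used_pairs is a collection of (tag, class) pairs found in the XHTML.
    """
    candidates = []
    for part in (s.strip() for s in selector_text.split(",")):
        if not SIMPLE_RE.match(part):
            continue  # combinators, pseudo-classes, ids, attributes: leave alone
        tag, _, cls = part.partition(".")
        in_use = any(c == cls and (not tag or t == tag) for t, c in used_pairs)
        if not in_use:
            candidates.append(part)
    return candidates


used = {("p", "note")}
print(deletion_candidates("p.note, .unused, .chapter p, p.gone:first-of-type", used))
# ['.unused']
```

The trade-off is that some genuinely unused complex selectors will survive, but nothing that might still be in use gets offered for deletion.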

I will look into it.

#14  hobnail 12-31-2020, 02:57 PM
Quote Frenzie
.pc-rw will match any element with a pc-rw class. E.g.:
Code
<p class="pc-rw">
<img alt="I'm matched by .pc-rw img! :-D">
</p>
div.pc-rw will only match divs with a pc-rw class.
Code
<p class="pc-rw">
<img alt="I'm not matched by div.pc-rw img! D-:">
</p>
Agreed. My "fix" was using the assumption that Sigil doesn't handle bare classes in combinators. Possibly the css was overgeneralized and the pc-rw was only used on divs, but otherwise it's not a good solution.
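The difference between the two selectors can be checked with Python's standard library alone. This sketch hand-translates ".pc-rw img" and "div.pc-rw img" into the limited XPath that xml.etree.ElementTree supports; it assumes each class attribute holds exactly one class, and the extra div in the sample markup is added here purely for contrast.

```python
import xml.etree.ElementTree as ET

xhtml = """<body>
  <p class="pc-rw"><img alt="inside p"/></p>
  <div class="pc-rw"><img alt="inside div"/></div>
</body>"""
root = ET.fromstring(xhtml)

# Hand-translated equivalents (valid only for single-class attributes):
#   .pc-rw img     ->  .//*[@class='pc-rw']//img
#   div.pc-rw img  ->  .//div[@class='pc-rw']//img
any_pc_rw = [img.get("alt") for img in root.findall(".//*[@class='pc-rw']//img")]
div_pc_rw = [img.get("alt") for img in root.findall(".//div[@class='pc-rw']//img")]

print(any_pc_rw)  # ['inside p', 'inside div']
print(div_pc_rw)  # ['inside div']
```

So rewriting ".pc-rw img" as "div.pc-rw img" silently stops styling the image inside the p, which is why it only works if the class is never used on anything but divs.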

#15  DNSB 12-31-2020, 03:23 PM
Quote AlanHK
I was looking at an ePub with these styles:
(passes ePubCheck, no warnings).
Were you using Sigil's internal Delete Unused Stylesheet Classes tool or the CSSRemoveUnusedSelectors plugin?

I switched to using the plugin since it seemed to work better with some overly complex stylesheets.

#16  Frenzie 12-31-2020, 04:14 PM
I skimmed the code a bit and I share @wrCisco's superficial impression of what it seems to be doing.

Quote hobnail
Agreed. My "fix" was using the assumption that Sigil doesn't handle bare classes in combinators. Possibly the css was overgeneralized and the pc-rw was only used on divs, but otherwise it's not a good solution.
Cool!

#17  KevinH 12-31-2020, 09:47 PM
Okay, it seems the CSSInfo parser of Sigil does not handle combinators at all, nor pseudo-classes, nor @media rules.

To properly test a css selector that uses adjacent, child, or descendant combinators requires a css-selector-based query or an xpath-like interface for Sigil's html5 repair parser, gumbo. As far as I know, these simply do not exist in C++ or C. I will continue to search for one. The closest I can find is a jQuery-like interface for gumbo here:

https://github.com/lazytiger/gumbo-query

but it appears to be 5 years old with no real updates.


If I cannot find anything useful, we must then turn to python and its css-parser, cssselect, and lxml to do this properly. But that means we would pretty much just be duplicating wrCisco's plugin, only internal to Sigil and using pyqt5 in place of tk.

That seems to be wasteful duplication. Perhaps we should delete the unused class removal feature from Sigil and instead point people to wrCisco's plugin for that functionality completely.

Ideas? Thoughts?

#18  Frenzie 01-01-2021, 03:10 AM
I'm guessing you already evaluated Qt's CSS parser and determined it was unsuitable to the purpose?

Quote
Perhaps we should delete the unused class removal feature from Sigil and instead point people to wrCisco's plugin for that functionality completely.
Fwiw, sounds good to me.

#19  DiapDealer 01-01-2021, 06:41 AM
Couldn't we incorporate (with his permission, of course) wrCisco's python code into Sigil's python3lib and use the C++ embedded python interface to access it? Thus skipping the need to use PyQt at all for the gui? I'm not certain what else the existing plugin might provide, but even if we don't bring it entirely "in house" (eliminating the need for the third-party plugin altogether), surely we can come up with an interface to the portions we DO need to access via the embedded python interpreter while still exposing those same absorbed parts to plugins via the plugin framework? Thus avoiding duplication.

#20  Turtle91 01-01-2021, 09:59 AM
Quote DiapDealer
Couldn't we incorporate (with his permission, of course) wrCisco's python code into Sigil's python3lib and use the C++ embedded python interface to access it? Thus skipping the need to use PyQt at all for the gui? I'm not certain what else the existing plugin might provide, but even if we don't bring it entirely "in house" (eliminating the need for the third-party plugin altogether), surely we can come up with an interface to the portions we DO need to access via the embedded python interpreter while still exposing those same absorbed parts to plugins via the plugin framework? Thus avoiding duplication.
There are 2 of wrCisco's plugins that I use regularly - each is a side of the same coin: cssRemoveUnusedSelectors and cssUndefinedClasses. The first, as you know, removes CSS selectors that aren't used in the HTML, the second removes class references in the HTML that don't have a corresponding style in the CSS.

If wrCisco doesn't object, it seems like incorporating BOTH of those plugins into the same Sigil function (with all the appropriate user selections) would make sense.

As a very minor nit - the Remove Unused Selectors does not combine leftover CSS.

E.g.:
Code
sup, sub {font-size:0.675em}
sup {vertical-align: 35%}
sub {vertical-align: -20%}
<p>Today is the 1<sup>st</sup> day of 2021!!!!</p>
becomes:
Code
sup {font-size:0.675em}
sup {vertical-align: 35%}
<p>Today is the 1<sup>st</sup> day of 2021!!!!</p>
when, ideally, it should be:
Code
sup {font-size:0.675em; vertical-align: 35%}
<p>Today is the 1<sup>st</sup> day of 2021!!!!</p>
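The merge step could look roughly like the Python sketch below. It is a minimal illustration with a made-up function name (merge_rules), not part of the plugin or of Sigil. It assumes flat rules only (no @media blocks, no comments, no braces inside strings) and ignores the fact that moving declarations together can change the cascade when other rules sit between the merged ones.

```python
import re

# One flat rule: "selector { declarations }"
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def merge_rules(css):
    """Combine rules that share exactly the same selector text, keeping order."""
    merged = {}  # selector -> list of declarations, in first-seen order
    for selector, body in RULE_RE.findall(css):
        decls = [d.strip() for d in body.split(";") if d.strip()]
        merged.setdefault(selector.strip(), []).extend(decls)
    return "\n".join(
        sel + " {" + "; ".join(decls) + "}" for sel, decls in merged.items()
    )


css = "sup {font-size:0.675em}\nsup {vertical-align: 35%}"
print(merge_rules(css))
# sup {font-size:0.675em; vertical-align: 35%}
```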
