Mobileread
Two problems with splitting files
#1  BeckyEbook 01-26-2021, 07:44 AM
First case
  1. Open any file, place the cursor in front of any open or close tag, and try to split the file (Split At Cursor).
Message: File cannot be split at this position.

Why? After all, it is enough to move the cursor one field to the left and then the split will work.


Second case
  1. Open any file, but change the extension from xhtml to xml (I see files like that in the wild quite often).
  2. Place the cursor anywhere (apart from the one I described in the first case of course) and try to split the file (Split At Cursor).
Message: Cannot split since it may not be an HTML file.

Despite the message, the split will be performed ... but the split part of the file will be lost (we lose a piece of code BEFORE split position, and it remains AFTER split)
sigil-cannot-split-before-tag.png 
Reply 

#2  KevinH 01-26-2021, 11:24 AM
The cursor is actually on the first <, not "before it.
so if CV shows something like |<tag> , then the cursor is in the tag. If you had a block cursor it would highlight the < but CV uses an insertion point cursor.

The long held Sigil rule is you can not split when in a tag. If you think this is too tight, the I would be happy to consider a patch or PR that handles just that one case.


Note, a file with an xml file extension can be of any type, it could be an opf, an ncx, an xhtml, a generic xml file, etc. There is no way to know the tag meanings: a body tag may not be what you think it means, there may be no html root tag, there is no way to tell which are void tags and which are not, which are block tags and which are not, where it might even be safe to split, and how splitting will impact a generic xml file.

And according to the epub3 spec, file extension must be xhtml if the file is an xhtml file.

The first step when seeing lots of xml files that are really xhtml files (or html files that are really xhtml files) is to use Sigil's ability to bulk rename a set of files by just changing the extension as they do not meet the current epub spec left as .xml files.

Again, I would consider a patch or PR to handle this differently but Sigil has long operated this way, and forcing people to change file extensions to more correct ones that meet the latest spec makes good sense to me. Please note, that old epub2 spec that allowed pure xml "islands" has long been deprecated mainly since no reader bothered to support it and just treated all .xml files as .xhtml files. This changed in epub3.

Hope this explains why you are seeing what you are seeing.


Quote BeckyEbook
First case
  1. Open any file, place the cursor in front of any open or close tag, and try to split the file (Split At Cursor).
Message: File cannot be split at this position.

Why? After all, it is enough to move the cursor one field to the left and then the split will work.


Second case
  1. Open any file, but change the extension from xhtml to xml (I see files like that in the wild quite often).
  2. Place the cursor anywhere (apart from the one I described in the first case of course) and try to split the file (Split At Cursor).
Message: Cannot split since it may not be an HTML file.

Despite the message, the split will be performed ... but the split part of the file will be lost (we lose a piece of code BEFORE split position, and it remains AFTER split)
Reply 

#3  exaltedwombat 01-26-2021, 11:29 AM
I can reproduce the second issue, but not the first. Sigil 1.4.3 Windows 10.
Reply 

#4  KevinH 01-26-2021, 11:38 AM
The point I was trying to make is both are expected behaviour. So a patch and rationale to change this would be needed for either.
Reply 

#5  DiapDealer 01-26-2021, 11:50 AM
If a user gets the message "Cannot split since it may not be an HTML file" then why is the split still being performed (with the subsequent loss of data)? Shouldn't the split attempt be aborted under those conditions if possible?
Reply 

#6  KevinH 01-26-2021, 12:26 PM
Quote DiapDealer
If a user gets the message "Cannot split since it may not be an HTML file" then why is the split still being performed (with the subsequent loss of data)? Shouldn't the split attempt be aborted under those conditions if possible?
Probably. That code is either in CodeViewEditor.cpp itself or in MainWindow.

That I can look at.
Reply 

#7  KevinH 01-26-2021, 12:41 PM
Well the first "issue" can be found in CodeViewEditor.cpp here if anyone wants to play with that, they could simply check for a |<tag> here and allow that even if "considered in tag".
Code
QString CodeViewEditor::SplitSection()
{ QString text = toPlainText(); int split_position = textCursor().position(); // Abort splitting the section if user is within a tag - MainWindow will display a status message if (IsPositionInTag(split_position, text)) { return QString(); }
I do not see that error message "Cannot split since it may not be an HTML file" anywhere in CodeViewEditor though. But in MainWindow.cpp that error message comes here:

Code
void MainWindow::CreateSectionBreakOldTab(QString content, HTMLResource *originating_resource)
{ if (content.isEmpty()) { QMessageBox::warning(this, tr("Sigil"), tr("File cannot be split at this position.")); return; } // XXX: This should be using the mime type not the extension. if (!TEXT_EXTENSIONS.contains(QFileInfo(originating_resource->Filename()).suffix().toLower())) { QMessageBox::warning(this, tr("Sigil"), tr("Cannot split since it may not be an HTML file."));	return; }
But by this point the "damage" has been done and the original file is now doomed.

It seems SplitAtCursor is handled inside CodeView where it takes the bottom half of the file and uses the current file name for it and then the piece above under a new name. All of this was designed before SplitAtSplitMarkers was done.

Hmm....
Reply 

#8  KevinH 01-26-2021, 12:52 PM
Okay, this needs to be handled in FlowTab.cpp SplitSection() before the OldTabRequest is fired as that is obviously too late.

Code
void FlowTab::SplitSection()
{ if (!IsDataWellFormed()) { return; } QWidget *mainWindow_w = Utility::GetMainWindow(); MainWindow *mainWindow = qobject_cast<MainWindow *>(mainWindow_w); if (!mainWindow) { Utility::DisplayStdErrorDialog("Could not determine main window."); return; } HTMLResource * nav_resource = mainWindow->GetCurrentBook()->GetConstOPF()->GetNavResource(); if (nav_resource && (nav_resource == m_HTMLResource)) { Utility::DisplayStdErrorDialog("The Nav file can not be split"); return; } // Handle warning the user about undefined url fragments. if (!mainWindow->ProceedWithUndefinedUrlFragments()) { return; } if (m_wCodeView) { emit OldTabRequest(m_wCodeView->SplitSection(), m_HTMLResource); }
}
So from this, I think we can simply remove the "Cannot be split" from that part of MainWindow.cpp as it is too late by that point to not hurt something.

We can then "*assume*" that a .xml file that actually gets successfully loaded into a FlowTab itself must be xhtml (otherwise it would be loaded in a XMLTab) and maybe even remove that test in MainWidow completely since it is already too late.

I will try doing that.
Reply 

#9  KevinH 01-26-2021, 01:03 PM
@BeckyEbook,
I just pushed a change to master for your second "issue". When you get a free moment, please give it a try and let us know.

At least half of the file should not be missing anymore but there is no guarantee that a generic xml file will be split properly either. Only xhtml is this designed to work for.
Reply 

#10  BeckyEbook 01-26-2021, 02:09 PM
I can confirm that it is better now – the second case does not delete part of the file
Thank you.
Reply 

  Next »  Last »  (1/2)
Today's Posts | Search this Thread | Login | Register