Single post / Thread: Batch convert MS Word to other formats
#1  Faster 06-07-2011, 02:00 PM
I wrote this in partial response to a request, but because of its complexity it's getting its own thread.

Here's a macro to batch convert .doc files. If there are any errors blame it on Cabernet Merlot and report problems here.

Please note that I'm including full instructions to help anyone who is not familiar with VBA macros. Please do not be offended if you already know this stuff. It's intended also for beginners who happen upon this thread.

Word docs can be batch converted to TXT, RTF or Filtered HTML, and in Word 2007 you can also 'export' to PDF.
I created the macro to be used with Word 2003 and 2007.
If you wish to take advantage of Word 2007's ability to export as PDF you must remove the apostrophe at the start of this line:-

'ActiveDocument.ExportAsFixedFormat OutputFileName:=strDocName, ExportFormat:=wdExportFormatPDF

- as unfortunately I haven't the time to find a way to work around the compile error that occurs with this line in Word 2003. As you'd expect Word 2003 simply doesn't know 'wdExportFormatPDF' which became available in Word 2007.
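
One possible workaround, offered as a sketch rather than a tested fix: call the export through a late-bound Object variable and pass the numeric value of wdExportFormatPDF (17) instead of the named constant, so that Word 2003 has nothing unknown to compile. The Sub name ExportActiveDocAsPdf and the constant name PDF_FORMAT are made up for this illustration.

```vba
Option Explicit

' Hypothetical helper: exports the active document as PDF without
' mentioning any identifier that Word 2003 does not know at compile time.
Sub ExportActiveDocAsPdf(ByVal strDocName As String)
    Const PDF_FORMAT As Long = 17   ' numeric value of wdExportFormatPDF
    Dim doc As Object               ' late bound, so the call is resolved at run time

    Set doc = ActiveDocument

    ' Application.Version is a string such as "11.0" (2003) or "12.0" (2007)
    If Val(Application.Version) >= 12 Then
        doc.ExportAsFixedFormat OutputFileName:=strDocName, ExportFormat:=PDF_FORMAT
    Else
        MsgBox "PDF export needs Word 2007 or later."
    End If
End Sub
```

With something like this in the same module, the commented-out line could be swapped for: ExportActiveDocAsPdf strDocName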

All your doc files go in one folder. You open Word which has this macro in it. You run the macro. All the doc files are loaded, converted and saved in a new folder. Your original docs are unchanged in the first folder.

Here's the code.
Option Explicit

Sub ChangeDocsToTxtOrRTFOrHTML()
'with export to PDF in Word 2007
    Dim fs As Object
    Dim oFolder As Object
    Dim tFolder As Object
    Dim oFile As Object
    Dim strDocName As String
    Dim intPos As Integer
    Dim locFolder As String
    Dim fileType As String

    On Error Resume Next

    locFolder = InputBox("Enter the folder path to DOCs", "File Conversion", "C:\myDocs")

    Select Case Application.Version
        Case Is < 12
            Do
                fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML", "File Conversion", "TXT"))
            Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML")
        Case Is >= 12
            Do
                fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML or PDF(2007+ only)", "File Conversion", "TXT"))
            Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML" Or fileType = "PDF")
    End Select

    Application.ScreenUpdating = False
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set oFolder = fs.GetFolder(locFolder)
    Set tFolder = fs.CreateFolder(locFolder & "Converted")
    Set tFolder = fs.GetFolder(locFolder & "Converted")

    For Each oFile In oFolder.Files
        Dim d As Document
        Set d = Application.Documents.Open(oFile.Path)
        strDocName = ActiveDocument.Name
        intPos = InStrRev(strDocName, ".")
        strDocName = Left(strDocName, intPos - 1)
        ChangeFileOpenDirectory tFolder
        Select Case fileType
            Case Is = "TXT"
                strDocName = strDocName & ".txt"
                ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatText
            Case Is = "RTF"
                strDocName = strDocName & ".rtf"
                ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatRTF
            Case Is = "HTML"
                strDocName = strDocName & ".html"
                ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatFilteredHTML
            Case Is = "PDF"
                strDocName = strDocName & ".pdf"
                ' *** Word 2007 users - remove the apostrophe at the start of the next line ***
                'ActiveDocument.ExportAsFixedFormat OutputFileName:=strDocName, ExportFormat:=wdExportFormatPDF
        End Select
        d.Close
        ChangeFileOpenDirectory oFolder
    Next oFile

    Application.ScreenUpdating = True
End Sub
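
Note that the loop opens every file it finds in the folder, not just .doc files. If the folder might hold other things (including the hidden '~$' lock files Word creates while a document is open), a filter along these lines could be added. This is only a sketch; the function name IsConvertibleDoc is made up for the illustration and is not part of the macro above.

```vba
Option Explicit

' Hypothetical helper: True only for names ending in .doc that are not
' Word's temporary '~$' owner files.
Function IsConvertibleDoc(ByVal fileName As String) As Boolean
    Dim fs As Object
    Set fs = CreateObject("Scripting.FileSystemObject")

    ' GetExtensionName returns the extension without the dot, e.g. "doc"
    IsConvertibleDoc = (LCase$(fs.GetExtensionName(fileName)) = "doc") _
                       And (Left$(fileName, 2) <> "~$")
End Function
```

Inside the loop it would be used by wrapping the existing body in: If IsConvertibleDoc(oFile.Name) Then ... End If
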
Putting the macro into Word:
Copy the code into Notepad
In Notepad > Format > uncheck 'Word Wrap'. IMPORTANT as broken lines won't compile.

Open Word
WORD 2003
From the View Menu select Toolbars > Visual Basic
On the Visual Basic toolbar click the Visual Basic icon (hover cursor to find it) or Press ALT + F11
WORD 2007
Click the big multicoloured cloverleaf icon in top left. Click Word Options button (at very bottom). Check 'Show Developer tab in the Ribbon' and click 'OK'. Now on the same line as 'Home', at the far right, you'll see 'Developer'. Click this. At the left end of this toolbar click 'Visual Basic'.

In the Visual Basic Editor > View > Project Explorer (but it may be showing already).

EITHER use the existing 'NewMacros' module:
Click the plus sign next to 'Normal'.
Click the plus sign next to 'Modules'.
Double click 'NewMacros' to open its code panel.
Scroll to the end of any macros, if present.

OR create a new module:
Right click Normal > Insert > Module
This will probably be named 'Module 1' and is in the 'Modules' folder above 'NewMacros'.
Double click 'Module 1' to open its code panel.

Copy ALL the code (Ctrl A, Ctrl C) from Notepad and paste into the code panel in the Visual Basic Editor.

Organise your Doc files:
Leave Word for the moment.
You must now place your Word docs into a single folder. (Start with a couple of doc files to try it out)
I suggest that you put this folder somewhere that keeps its 'long path name' short, e.g. in your root directory. You'll need the 'long path name' of this folder.*(see next)

*How to get the file path:
Open the folder with your docs in and copy the path from the address bar.
If it's not showing: Tools > Folder Options > View > CHECK 'Display the full path in the address bar' > OK.

You can make this the default path in your macro by changing the line:
locFolder = InputBox("Enter the folder path to DOCs", "File Conversion", "C:\myDocs")
to:
locFolder = InputBox("Enter the folder path to DOCs", "File Conversion", "Put YOUR Path Here")

'Default' means if you click 'OK' instead of inserting a new path this is the path used.
Your converted files (Text, RTF, HTML files) will go into a folder adjacent to the folder holding the doc files.
This folder will be created if it doesn't already exist and be named 'myDocsConverted' or 'whatever you've called the folder' + 'Converted'.
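
To show how that folder name comes about: the macro simply joins the text you typed and the word 'Converted'. So if the path is entered with a trailing backslash (C:\myDocs\), the result is a subfolder C:\myDocs\Converted rather than an adjacent one. A minimal sketch of one way to normalise the path first and create the folder only when it's missing; the names ConvertedFolderPath and EnsureFolder are made up for this illustration.

```vba
Option Explicit

' Hypothetical helper: "C:\myDocs" and "C:\myDocs\" both give "C:\myDocsConverted".
Function ConvertedFolderPath(ByVal locFolder As String) As String
    ' Drop trailing backslashes, but leave a bare drive root like "C:\" alone
    Do While Len(locFolder) > 3 And Right$(locFolder, 1) = "\"
        locFolder = Left$(locFolder, Len(locFolder) - 1)
    Loop
    ConvertedFolderPath = locFolder & "Converted"
End Function

' Hypothetical helper: creates the folder only if it is not there yet.
Sub EnsureFolder(ByVal folderPath As String)
    Dim fs As Object
    Set fs = CreateObject("Scripting.FileSystemObject")
    If Not fs.FolderExists(folderPath) Then fs.CreateFolder folderPath
End Sub
```

For example: EnsureFolder ConvertedFolderPath("C:\myDocs\")
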

To run the code:
From the Visual Basic Editor - either place the cursor somewhere inside the code and click the 'Run' icon (a triangle), or press F5.
Or go into Word: in Word 2003 - click the 'Run' icon and select this macro 'ChangeDocsToTxtOrRTFOrHTML'.
In Word 2007 - Developer tab - click Macros - select this macro 'ChangeDocsToTxtOrRTFOrHTML'.

You will be asked for the location of the folder (entered as a long path name). If you've amended the default path in the macro you can simply click OK.
You will be asked if you want to save copies as Text, RTF or HTML (filtered) with TXT the default. (Also PDF in Word 2007)
There will be some screen flicker as each file is loaded, saved and closed.
That's it. Done!

When you close Word, the macro will be saved in 'Normal', either in 'NewMacros' or 'Module 1'. You can re-use it whenever you open Word.
To remove it, select all the code and delete.

Possible problems:
~ "Word cannot give a document the same name as an open document" ~
If you have a txt, rtf or html file already open in Word and the macro tries to SaveAs with the same filename and same extension, the macro will stop with an error. If this happens, click 'End' in the dialog that appears, then close the offending file.
If you click 'Debug' by mistake - go into the Visual Basic Editor (ALT + F11) and click the 'Stop' icon (a square near the triangle). Close the offending file.
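
If you'd rather have the macro avoid this clash than hit it, one approach is to check Word's list of open documents before saving. This is a sketch only; the function name IsOpenInWord is made up for the illustration and nothing like it is in the macro above.

```vba
Option Explicit

' Hypothetical helper: True if a document with this file name
' (e.g. "report.txt") is currently open in this copy of Word.
Function IsOpenInWord(ByVal docName As String) As Boolean
    Dim doc As Document
    For Each doc In Application.Documents
        ' Compare names without regard to upper/lower case
        If StrComp(doc.Name, docName, vbTextCompare) = 0 Then
            IsOpenInWord = True
            Exit Function
        End If
    Next doc
End Function
```

It could be called just before each SaveAs, skipping the file when IsOpenInWord(strDocName) is True.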

Be aware that there is considerable variation in file size as you change format:
From smallest to largest, with a text-only file it's often:- txt < html < doc < rtf < pdf
but with an image included:- html - doc < pdf < rtf. (Watch out for a file with images becoming too big as a single file for your ebook reader.)

If you get notices regarding Macro Security then you'll need to alter your security settings within MS Office.
