Doc To MediaWiki Converter
From Physics 111-Lab Wiki
Contents |
Our Converter File
Our .doc to MediaWiki converter is based on scripts found all over the web. We've combined several of them. Primarily, we started with Word2Wiki.bas. This script handles formatting, tables, lists, and also images. The PasteInHere.doc file contains the ConvertToMediaWiki macro - just paste in your materials into the .doc file, save the file as something descriptive, and then run the macro. The macro will produce two things:
- a .txt file. The content of this file can be pasted directly into MediaWiki edit box.
- a folder containing the images. Each image has a prefix derived from the name of the .doc file - this helps greatly if you are uploading numerous pages.
Download: PasteInHere.doc
For safety, the macro does not touch your .doc file.
Steps in Converting a Word File to Wiki
- Consider how the Wiki version of the document will be organized. Consider breaking up long documents (greater than 10 pages) into multiple Wiki pages in some logical structure. Sections that might have been called Appendix A, B, . . . could be given more meaningful names. You may want to convert the Word file into a single Wiki page initially, then cut and paste sections into new Wiki pages as needed.
- Set up and apply consistent heading styles in the Word document (Heading 1, Heading 2, . . .)
- Remove headers and footers
- Any pictures within a frame need to be moved out of the frame; also, compare all images in the .doc file to their .pdf version: if the .pdf version is higher quality, replace the .doc version with it
- Copy-Paste the .doc text into the conversion document(PasteInHere.doc), and save the conversion document as something corresponding to the lab you are working on. i.e. BRA for Beta Ray Spectroscopy
- Run the conversion macro
- Remove Backslashes added around brackets by conversion program
- Check to make sure headings have been converted correctly (the heading level in wiki text is related to the number of equal signs i.e. =This Section= would have a level of Heading 1). Sometimes it puts the right amount, but spaces it so that the wiki doesn’t detect it.
- One thing it does wrong is add a carriage return, putting the right hand set of equal signs on the line below; wiki doesn’t realize this! You need to fix it.
- The other thing it will do is start the section text on the same line as the section heading. This will cause wiki to think the section heading is part of the text, causing it to not be detected.
- Fix numbered lists and any text related to them. All lettered lists are turned into numbers by wiki.
- There can’t be any empty lines between the separate numbered sections or it will reset
- Equations are converted to images that are difficult to edit. Replace equations, greek letters, and other symbols with TeX code.
- Put carriage returns around equations to space them correctly and add few colons in front to indent them.
- Add Images
- that image spacing is much more easily done in the wiki via the markup code.
- Set the image size in the wiki image markup tag to approximately the same size as the length or width (set it to the larger of the two)
- Check all super and sub scripts
Source Code for Word Document to Wiki converter
Here is the source code in case you are interested:
Private myCopy As Document
Private fileName As String
Private filePath As String
'version 1.1
Sub ConvertToMediaWiki()
On Error Resume Next
Application.ScreenUpdating = False
'manipulate only the copy for safety
MakeACopy
'clean it up a bit before we begin
LeftAlign
ReplaceQuotes
MediaWikiEscapeChars
CleanFormattingParagraphEndingsBold
CleanFormattingParagraphEndingsItalic
'begin converting
ConvertHyperlinks
MediaWikiConvertH1
MediaWikiConvertH2
MediaWikiConvertH3
MediaWikiConvertH4
MediaWikiConvertH5
MediaWikiConvertBoldItalic
MediaWikiConvertItalic
MediaWikiConvertBold
MediaWikiConvertSuperscript
MediaWikiConvertSubscript
MediaWikiConvertLists
MediaWikiConvertTables
'fix and save
FixParagraphEndings
EmitImagesAndSaveasHTMLandTXT
myCopy.Close
Application.ScreenUpdating = True
End Sub
Private Sub MakeACopy()
filePath = ActiveDocument.Path + "\"
sBuffer = Trim$(ActiveDocument.Name)
If Len(sBuffer) > 0 Then
nPosn = InStrRev(sBuffer, ".", , vbTextCompare)
If nPosn > 0 Then
sBuffer = Left$(sBuffer, nPosn - 1) & sSuffix
End If
End If
fileName = sBuffer
Set myCopy = Documents.Add(ActiveDocument.FullName)
End Sub
Private Sub LeftAlign()
myCopy.Select
Selection.WholeStory
With Selection.ParagraphFormat
.LeftIndent = InchesToPoints(0)
.SpaceBeforeAuto = False
.SpaceAfterAuto = False
End With
With Selection.ParagraphFormat
.FirstLineIndent = InchesToPoints(0)
.SpaceBeforeAuto = False
.SpaceAfterAuto = False
End With
Selection.ParagraphFormat.Alignment = wdAlignParagraphLeft
End Sub
Private Sub ConvertHyperlinks()
Dim link As Hyperlink
For Each link In myCopy.Hyperlinks
If Len(link.TextToDisplay) < 1 Then
link.Range.InsertBefore link.Address
Else
link.Range.InsertBefore "[" & link.Address & " " & link.TextToDisplay & "]"
End If
Next link
While myCopy.Hyperlinks.Count > 0
For Each link In myCopy.Hyperlinks
link.Range.Delete
Next link
Wend
End Sub
Private Sub MediaWikiConvertH1()
ReplaceHeading wdStyleHeading1, "="
End Sub
Private Sub MediaWikiConvertH2()
ReplaceHeading wdStyleHeading2, "=="
End Sub
Private Sub MediaWikiConvertH3()
ReplaceHeading wdStyleHeading3, "==="
End Sub
Private Sub MediaWikiConvertH4()
ReplaceHeading wdStyleHeading4, "===="
End Sub
Private Sub MediaWikiConvertH5()
ReplaceHeading wdStyleHeading5, "====="
End Sub
Private Sub MediaWikiConvertBoldItalic()
myCopy.Select
With Selection.Find
.ClearFormatting
.Font.Bold = True
.Font.Italic = True
.Text = ""
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Forward = True
.Wrap = wdFindContinue
Do While .Execute
With Selection
If Len(.Text) > 1 And InStr(1, .Text, vbCr) Then
' Just process the chunk before any newline characters
' We'll pick-up the rest with the next search
.Collapse
.MoveEndUntil vbCr
End If
' Don't bother to markup newline characters (prevents a loop, as well)
If Not .Text = vbCr Then
.InsertBefore "'''''"
.InsertAfter "'''''"
End If
.Style = myCopy.Styles("Default Paragraph Font")
.Font.Bold = False
.Font.Italic = False
End With
Loop
End With
End Sub
Private Sub MediaWikiConvertBold()
myCopy.Select
With Selection.Find
.ClearFormatting
.Font.Bold = True
.Text = ""
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Forward = True
.Wrap = wdFindContinue
Do While .Execute
With Selection
If Len(.Text) > 1 And InStr(1, .Text, vbCr) Then
' Just process the chunk before any newline characters
' We'll pick-up the rest with the next search
.Collapse
.MoveEndUntil vbCr
End If
' Don't bother to markup newline characters (prevents a loop, as well)
If Not .Text = vbCr Then
.InsertBefore "'''"
.InsertAfter "'''"
End If
.Style = myCopy.Styles("Default Paragraph Font")
.Font.Bold = False
End With
Loop
End With
End Sub
Private Sub MediaWikiConvertItalic()
myCopy.Select
With Selection.Find
.ClearFormatting
.Font.Italic = True
.Text = ""
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Forward = True
.Wrap = wdFindContinue
Do While .Execute
With Selection
If Len(.Text) > 1 And InStr(1, .Text, vbCr) Then
' Just process the chunk before any newline characters
' We'll pick-up the rest with the next search
.Collapse
.MoveEndUntil vbCr
End If
' Don't bother to markup newline characters (prevents a loop, as well)
If Not .Text = vbCr Then
.InsertBefore "''"
.InsertAfter "''"
End If
.Style = myCopy.Styles("Default Paragraph Font")
.Font.Italic = False
End With
Loop
End With
End Sub
' Replace all smart quotes with their dumb equivalents
Private Sub ReplaceQuotes()
Dim quotes As Boolean
quotes = Options.AutoFormatAsYouTypeReplaceQuotes
Options.AutoFormatAsYouTypeReplaceQuotes = False
ReplaceString ChrW(8220), """"
ReplaceString ChrW(8221), """"
ReplaceString "‘", "'"
ReplaceString "’", "'"
Options.AutoFormatAsYouTypeReplaceQuotes = quotes
End Sub
Private Function ReplaceHeading(styleHeading As String, headerPrefix As String)
Dim normalStyle As Style
Set normalStyle = myCopy.Styles(wdStyleNormal)
myCopy.Select
With Selection.Find
.ClearFormatting
.Style = myCopy.Styles(styleHeading)
.Text = ""
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Forward = True
.Wrap = wdFindContinue
Do While .Execute
With Selection
If InStr(1, .Text, vbCr) Then
' Just process the chunk before any newline characters
' We'll pick-up the rest with the next search
.Collapse
.MoveEndUntil vbCr
End If
' Don't bother to markup newline characters (prevents a loop, as well)
If Not .Text = vbCr Then
.InsertBefore headerPrefix
.InsertBefore vbCr
.InsertAfter headerPrefix
End If
.Style = normalStyle
End With
Loop
End With
End Function
Private Function EscapeCharacter(char As String)
ReplaceString char, "\" & char
End Function
Private Sub MediaWikiEscapeChars()
EscapeCharacter "*"
EscapeCharacter "#"
'EscapeCharacter "_"
'EscapeCharacter "-"
'EscapeCharacter "+"
EscapeCharacter "{"
EscapeCharacter "}"
EscapeCharacter "["
EscapeCharacter "]"
'EscapeCharacter "~"
'EscapeCharacter "^^"
EscapeCharacter "|"
'EscapeCharacter "'"
End Sub
Private Function ReplaceString(findStr As String, replacementStr As String)
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = findStr
.Replacement.Text = replacementStr
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Function
Private Sub MediaWikiConvertSuperscript()
myCopy.Select
With Selection.Find
.ClearFormatting
.Font.Superscript = True
.Text = ""
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Forward = True
.Wrap = wdFindContinue
Do While .Execute
With Selection
.Text = Trim(.Text)
If Len(.Text) > 1 And InStr(1, .Text, vbCr) Then
' Just process the chunk before any newline characters
' We'll pick-up the rest with the next search
.Collapse
.MoveEndUntil vbCr
End If
' Don't bother to markup newline characters (prevents a loop, as well)
If Not .Text = vbCr Then
.InsertBefore "<sup>"
.InsertAfter "</sup>"
End If
.Style = myCopy.Styles("Default Paragraph Font")
.Font.Superscript = False
End With
Loop
End With
End Sub
Private Sub MediaWikiConvertSubscript()
myCopy.Select
With Selection.Find
.ClearFormatting
.Font.Subscript = True
.Text = ""
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Forward = True
.Wrap = wdFindContinue
Do While .Execute
With Selection
.Text = Trim(.Text)
If Len(.Text) > 1 And InStr(1, .Text, vbCr) Then
' Just process the chunk before any newline characters
' We'll pick-up the rest with the next search
.Collapse
.MoveEndUntil vbCr
End If
' Don't bother to markup newline characters (prevents a loop, as well)
If Not .Text = vbCr Then
.InsertBefore "<sub>"
.InsertAfter "</sub>"
End If
.Style = myCopy.Styles("Default Paragraph Font")
.Font.Subscript = False
End With
Loop
End With
End Sub
Private Sub MediaWikiConvertLists()
Dim para As Paragraph
For Each para In myCopy.ListParagraphs
With para.Range
.InsertBefore " "
For i = 1 To .ListFormat.ListLevelNumber
If .ListFormat.ListType = wdListBullet Then
.InsertBefore "*"
Else
.InsertBefore "#"
End If
Next i
.ListFormat.RemoveNumbers
End With
Next para
End Sub
Private Sub MediaWikiConvertTables()
Dim thisTable As Table
For Each thisTable In myCopy.Tables
With thisTable
For Each arow In thisTable.Rows
With arow
For Each acell In arow.Cells
With acell
acell.Range.InsertBefore "|"
End With
Next acell
.Range.InsertAfter vbCrLf + "|-"
End With
Next arow
.Range.InsertBefore "{|" + vbCrLf
.Range.InsertAfter vbCrLf + "|}"
.ConvertToText "|"
End With
Next thisTable
End Sub
Private Sub CleanFormattingParagraphEndingsBold()
Selection.Find.ClearFormatting
With Selection.Find
.Text = "^p"
.Font.Bold = True
.Replacement.ClearFormatting
.Replacement.Font.Bold = False
.Replacement.Text = "^p"
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = True
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Private Sub CleanFormattingParagraphEndingsItalic()
Selection.Find.ClearFormatting
With Selection.Find
.Text = "^p"
.Font.Italic = True
.Replacement.ClearFormatting
.Replacement.Font.Italic = False
.Replacement.Text = "^p"
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = True
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Private Sub FixParagraphEndings()
ReplaceString "^p", "^p^p"
'lists are strange - NO ^p before tolerated
ReplaceString "^p^p#", "^p#"
ReplaceString "^p^p*", "^p*"
End Sub
Private Sub EmitImagesAndSaveasHTMLandTXT()
Dim ImageFolder As String
Dim s As Shape
For Each s In myCopy.Shapes
If s.Type = msoPicture Then
s.ConvertToInlineShape
End If
Next
FolderName = filePath + fileName + "_files"
myCopy.SaveAs fileName:=filePath + fileName + ".htm", _
FileFormat:=wdFormatFilteredHTML, LockComments:=False, Password:="", _
AddToRecentFiles:=True, WritePassword:="", ReadOnlyRecommended:=False, _
EmbedTrueTypeFonts:=False, SaveNativePictureFormat:=False, SaveFormsData _
:=False, SaveAsAOCELetter:=False
Set fs = CreateObject("Scripting.FileSystemObject")
'make a new folder for the images
Set fs1 = CreateObject("Scripting.FileSystemObject")
ImageFolder = filePath + fileName + " Images"
If fs1.FolderExists(ImageFolder) Then
'do nothing
Else
Set f = fs1.CreateFolder(ImageFolder)
End If
'ok - image folder is done
Set objFSO = CreateObject("Scripting.FileSystemObject")
Dim imgname As String
If fs.FolderExists(FolderName) Then
Dim fo: Set fo = fs.GetFolder(FolderName)
Dim iShape As InlineShape
Dim Folder: Set Folder = fo.Files
i = 1
For Each File In Folder
If i <= myCopy.InlineShapes.Count Then
Set iShape = myCopy.InlineShapes.Item(i)
iShape.Range.InsertBefore "[[Image:" & fileName & File.Name & "|thumb|250px|Title and Description]]"
i = i + 1
End If
Set objFile = objFSO.GetFile(FolderName + "\" + File.Name)
' Copy the file to its destination
objFile.Copy (filePath + fileName + " Images\" + fileName + " " + File.Name)
Next
myCopy.SaveAs fileName:=filePath + fileName + ".txt", FileFormat:=wdFormatText
Set filesys = CreateObject("Scripting.FileSystemObject")
Set htmlfile = filesys.GetFile(filePath + fileName + ".htm")
htmlfile.Delete
Set htmlfolder = filesys.GetFolder(FolderName)
htmlfolder.Delete
End If
End Sub
Other converters
There are many other converters but they are typically more difficult to use and modify. Word2MediaWikiPlus is a very powerful one, albeit complicated. The Word2MediaWikiPlus instructions are here and the package can be downloaded here.
