Pandoc Latex To Word



In my last post, I explained how I used Pandoc to convert from LaTeX to Word (doc, docx) and Word-compatible (RTF) formats, but I had some issues getting figure references and numbers to show up. Today I’ll explain how to use a nice program called LaTeX2RTF to achieve similar results and get everything to show up more or less as intended. This is Part 2 in the continuing saga. Part 3 (sometime in the future) will come back to Pandoc.

Today’s post includes:

  • Basic instructions on how to use LaTeX2RTF
  • Some pros and cons of using LaTeX2RTF
  • Sources of LaTeX bibliography (bst) files

Pandoc is a command-line tool for converting files from one markup format to another. Markup languages use tags to annotate sections of a document. Commonly used formats include Markdown, reStructuredText, HTML, LaTeX, EPUB, and Microsoft Word DOCX. With Pandoc you can convert Markdown documents to PDF, HTML, Word DOCX, or many other formats. After installing Pandoc, you can simply run it from the command line. Note: By default, Pandoc uses LaTeX to generate PDF documents, so if you want to generate PDF documents, you need to install a LaTeX processor first.

Pandoc Template. A pandoc LaTeX template to convert markdown files to PDF or LaTeX using the Trivadis CI. It is designed for lecture notes and exercises with a focus on computer science. The template is compatible with pandoc 2. Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. Pandoc can convert between numerous markup and word processing formats, including, but not limited to, various flavors of Markdown, HTML, LaTeX and Word docx.

BTW, this post from 2011 is worth checking out too, as it mentions several other options for converting from LaTeX to another format.

How to use LaTeX2RTF to convert from LaTeX to Word formats

  1. Download LaTeX2RTF here (I’m using the current Win32 version [latex2rtf-2.3.8_win.exe] but by the time you’ve read this there’s probably a newer version out, so check the website first)
  2. Follow the installation & usage instructions here
  3. Make sure you have a valid latex (tex) file, biblio file, bst file (for the citation style), and any other files you need for the file to compile properly
  4. Check out the jabbrv package if you need to use journal abbreviations in your citations
  5. Compile your latex file using your favorite latex editor/compiler so that you generate the .aux and .bbl files (normally you need to run latex -> bibtex -> latex -> latex)
  6. Run LaTeX2RTF via either the command line or the GUI – I haven’t tried the command line option yet, only the GUI, and its usage is pretty self-explanatory
  7. Voila! You should have a nicely formatted document
  8. For issues, refer to the user’s manual or the support page
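
As a rough sketch, steps 5 and 6 might look like this from a Unix-style shell. The base name paper, the _rev2 output name, and the helper functions run and build_rtf are all made up for illustration. The -o flag is LaTeX2RTF’s option for choosing the output file, but check the user’s manual for your version before relying on it.

```shell
#!/bin/sh
# Sketch: compile a LaTeX manuscript, then convert it with LaTeX2RTF.
# "paper" is a hypothetical base name; substitute your own .tex file.

# Print each command; actually run it only when DRY_RUN is unset.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

build_rtf() {
    base=$1
    run latex "$base" &&    # first pass, writes $base.aux
    run bibtex "$base" &&   # builds $base.bbl from the .bib and .bst files
    run latex "$base" &&    # pulls the bibliography in
    run latex "$base" &&    # resolves the remaining cross-references
    run latex2rtf -o "${base}_rev2.rtf" "$base.tex"
}
```

Calling build_rtf paper runs the whole chain and stops at the first failing step. Setting DRY_RUN=1 first only prints the commands, which is a cheap way to check the file names before anything is compiled.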

Pros and Cons of using LaTeX2RTF

+ Pros

  • No effort required to get figure and table references to show up correctly.
  • You don’t need to worry about going through your 1000 references to find non-ASCII characters (not a trivial task!): if it compiles under LaTeX, it’ll work under LaTeX2RTF.
  • If you use the command line option, you can also predefine the output filename (like in Pandoc), as well as other options. If you use the GUI, the tex filename is used as the output filename by default. You may want to generate a different output filename to help your co-authors keep track of the current revision number/version.

– Cons

  • If you want to use a citation style that isn’t already available for LaTeX, it can be a real pain to create or edit a bst file (and this is one of the reasons I wanted to use Pandoc!). Normally, getting the right bib style file isn’t a problem for IEEE or other technical journals, but then again, you don’t need to convert to Word to submit your manuscript there! For medical and social sciences journals, you may very well be unable to find a suitable bst file. An alternative is to find a similar citation style and correct the references by hand. This isn’t my preferred solution because you risk making mistakes if you change the references during one of your revisions (that may not seem like a big deal now, but just wait until you’re ready to click “Submit” and one of your co-authors asks for one last revision that changes all your references around!). Eventually I did find the bst file I needed. See below.
  • I haven’t found any other negative points for LaTeX2RTF, to be honest. 🙂 It’s possible that it can’t handle specific LaTeX packages — if you’ve had problems, let us know in the comments below!

Some sources of latex bibliography (bst) files

  • Check with the specific journal first
  • Some standard bib files on CTAN
  • Quite a few Biology and Medicine style and bib formats here
  • For Journal of Vision and Medical Physics
  • Let us know of other good sources in the comments below!

This document is for people who are unfamiliar with command line tools. Command-line experts can go straight to the User’s Guide or the pandoc man page.

First, install pandoc, following the instructions for your platform.

Pandoc is a command-line tool. There is no graphic user interface. So, to use it, you’ll need to open a terminal window:

  • On OS X, the Terminal application can be found in /Applications/Utilities. Open a Finder window and go to Applications, then Utilities. Then double click on Terminal. (Or, click the spotlight icon in the upper right hand corner of your screen and type Terminal – you should see Terminal under Applications.)

  • On Windows, you can use either the classic command prompt or the more modern PowerShell terminal. If you use Windows in desktop mode, run the cmd or powershell command from the Start menu. If you use the Windows 8 start screen instead, simply type cmd or powershell, and then run either the “Command Prompt” or “Windows Powershell” application. If you are using cmd, type chcp 65001 before using pandoc, to set the encoding to UTF-8.

  • On Linux, there are many possible configurations, depending on what desktop environment you’re using:

    • In Unity, use the search function on the Dash, and search for Terminal. Or, use the keyboard shortcut Ctrl-Alt-T.
    • In Gnome, go to Applications, then Accessories, and select Terminal, or use Ctrl-Alt-T.
    • In XFCE, go to Applications, then System, then Terminal, or use Super-T.
    • In KDE, go to KMenu, then System, then Terminal Program (Konsole).

You should now see a rectangle with a “prompt” (possibly just a symbol like %, but probably including more information, such as your username and directory), and a blinking cursor.

Let’s verify that pandoc is installed. Type

    pandoc --version

and hit enter. You should see a message telling you which version of pandoc is installed, and giving you some additional information.

First, let’s see where we are. Type

    pwd

on Linux or OS X, or

    echo %cd%

on Windows, and hit enter. Your terminal should print your current working directory. (Guess what pwd stands for?) This should be your home directory.

Let’s navigate now to our Documents directory: type

    cd Documents

and hit enter. Now type

    pwd

(or echo %cd% on Windows) again. You should be in the Documents subdirectory of your home directory. To go back to your home directory, you could type

    cd ..

The .. means “one level up.”

Go back to your Documents directory if you’re not there already. Let’s try creating a subdirectory called pandoc-test:

    mkdir pandoc-test

Now change to the pandoc-test directory:

    cd pandoc-test

If the prompt doesn’t tell you what directory you’re in, you can confirm that you’re there by doing

    pwd

(or echo %cd%) again.

OK, that’s all you need to know for now about using the terminal. But here’s a secret that will save you a lot of typing. You can always type the up-arrow key to go back through your history of commands. So if you want to use a command you typed earlier, you don’t need to type it again: just use up-arrow until it comes up. Try this. (You can use down-arrow as well, to go the other direction.) Once you have the command, you can also use the left and right arrows and the backspace/delete key to edit it.

Most terminals also support tab completion of directories and filenames. To try this, let’s first go back up to our Documents directory:

    cd ..

Now, type

    cd pandoc-

and hit the tab key instead of enter. Your terminal should fill in the rest (test), and then you can hit enter.

To review:

  • pwd (or echo %cd% on Windows) to see what the current working directory is.
  • cd foo to change to the foo subdirectory of your working directory.
  • cd .. to move up to the parent of the working directory.
  • mkdir foo to create a subdirectory called foo in the working directory.
  • up-arrow to go back through your command history.
  • tab to complete directories and file names.
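
Here is the same review as a short shell session for Linux or OS X. It works in a scratch directory created with mktemp -d (an assumption made for this sketch, so that your real Documents folder is left alone), and the variable names scratch and here are just illustrative.

```shell
#!/bin/sh
# Practice the navigation commands in a throwaway directory.
scratch=$(mktemp -d)   # scratch location, stands in for Documents
cd "$scratch"
mkdir pandoc-test      # create a subdirectory
cd pandoc-test         # move into it
here=$(pwd)            # record the current working directory
echo "$here"
cd ..                  # move back up to the parent
```

The path that gets printed depends on your system, but it should end in pandoc-test.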

Type

    pandoc

and hit enter. You should see the cursor just sitting there, waiting for you to type something. Type some markdown, for example:

    Hello *pandoc*!

    - one
    - two

When you’re finished (the cursor should be at the beginning of the line), type Ctrl-D on OS X or Linux, or Ctrl-Z followed by Enter on Windows. You should now see your text converted to HTML!

What just happened? When pandoc is invoked without specifying any input files, it operates as a “filter,” taking input from the terminal and sending its output back to the terminal. You can use this feature to play around with pandoc.
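
Because pandoc acts as a filter here, you can also pipe text into it rather than typing at the prompt. The following is a minimal sketch for Linux or OS X. The sample text and the variable name html are only illustrative, and the check for pandoc is there so the script does nothing harmful on a machine without it.

```shell
#!/bin/sh
# Feed markdown to pandoc on standard input; HTML comes back on standard output.
if command -v pandoc >/dev/null 2>&1; then
    html=$(printf 'Hello *pandoc*!\n\n- one\n- two\n' | pandoc)
else
    html=''   # pandoc is not installed, so there is nothing to convert
fi
printf '%s\n' "$html"
```

With pandoc installed, the emphasized word should come back wrapped in an em tag and the two dashes should become list items.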


By default, input is interpreted as pandoc markdown, and output is HTML. But we can change that. Let’s try converting from HTML to markdown:

    pandoc -f html -t markdown

Now type some HTML, for example:

    <p>Hello <em>pandoc</em>!</p>

and hit Ctrl-D (or Ctrl-Z followed by Enter on Windows). For that input, you should see:

    Hello *pandoc*!

Now try converting something from markdown to LaTeX. What command do you think you should use?

You’ll probably want to use pandoc to convert a file, not to read text from the terminal. That’s easy, but first we need to create a text file in our pandoc-test subdirectory.

Important: To create a text file, you’ll need to use a text editor, not a word processor like Microsoft Word. On Windows, you can use Notepad (in Accessories). On OS X, you can use TextEdit (in Applications). On Linux, different platforms come with different text editors: Gnome has GEdit, and KDE has Kate.

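Start up your text editor. Type a short markdown document, for example:

    ---
    title: Test
    ---

    # Test!

    This is a test of *pandoc*.

    - list one
    - list two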

Now save your file as test1.md in the directory Documents/pandoc-test.

Note: If you use plain text a lot, you’ll want a better editor than Notepad or TextEdit. You might want to look at Sublime Text or (if you’re willing to put in some time learning an unfamiliar interface) Vim or Emacs.

Go back to your terminal. We should still be in the Documents/pandoc-test directory. Verify that with pwd.

Now type

    ls

(or dir if you’re on Windows). This will list the files in the current directory. You should see the file you created, test1.md.

To convert it to HTML, use this command:

    pandoc test1.md -f markdown -t html -s -o test1.html

The filename test1.md tells pandoc which file to convert. The -s option says to create a “standalone” file, with a header and footer, not just a fragment. And the -o test1.html says to put the output in the file test1.html. Note that we could have omitted -f markdown and -t html, since the default is to convert from markdown to HTML, but it doesn’t hurt to include them.

Check that the file was created by typing ls again. You should see test1.html. Now open this in a browser. On OS X, you can type

    open test1.html

On Windows, type

    .\test1.html

You should see a browser window with your document.

To create a LaTeX document, you just need to change the command slightly:

    pandoc test1.md -f markdown -t latex -s -o test1.tex

Try opening test1.tex in your text editor.

Pandoc can often figure out the input and output formats from the filename extensions. So, you could have just used:

    pandoc test1.md -s -o test1.tex

Pandoc knows you’re trying to create a LaTeX document, because of the .tex extension.

Now try creating a Word document (with extension docx).
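
One possible answer, as a sketch for Linux or OS X: it writes a small test1.md into a scratch directory (the mktemp -d location, the sample contents, and the variable names workdir and status are assumptions made for this example) and lets pandoc infer both formats from the .md and .docx extensions.

```shell
#!/bin/sh
# Create a sample markdown file, then convert it to a Word document.
workdir=$(mktemp -d)
cd "$workdir"
printf '# Test!\n\nThis is a test of *pandoc*.\n\n- list one\n- list two\n' > test1.md

if command -v pandoc >/dev/null 2>&1; then
    # Formats are inferred from the file extensions.
    pandoc test1.md -s -o test1.docx
    status=$?
else
    echo "pandoc not found; skipping conversion" >&2
    status=0
fi
```

The same pattern should carry over to a LaTeX source, for instance pandoc paper.tex -s -o paper.docx, where paper.tex is a hypothetical file name. As the post above notes, figure references and numbers may not come through cleanly that way.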

If you want to create a PDF, you’ll need to have LaTeX installed. (See MacTeX on OS X, MiKTeX on Windows, or install the texlive package on Linux.) Then do

    pandoc test1.md -s -o test1.pdf

You now know the basics. Pandoc has a lot of options. At this point you can start to learn more about them by reading the User’s Guide.

Here’s an example. The --mathml option causes pandoc to convert TeX math into MathML. Type

    pandoc --mathml

then enter some TeX math, for example

    $x = y^2$

followed by Ctrl-D (Ctrl-Z followed by Enter on Windows).

Now try the same thing without --mathml. See the difference in output?

If you forget an option, or forget which formats are supported, you can always do

    pandoc --help

to get a list of all the supported options.

On OS X or Linux systems, you can also do

    man pandoc

to get the pandoc manual page. All of this information is also in the User’s Guide.

If you get stuck, you can always ask questions on the pandoc-discuss mailing list. But be sure to check the FAQs first, and search through the mailing list to see if your question has been answered before.