Monday, April 23, 2012

How to Easily Use OpenType Fonts in LaTeX

I became interested in LaTeX because I wanted to produce high-quality PDFs for self-published books. Someday I hope to produce books of comparable quality to these humanities books typeset in TeX. This idea became even more feasible when I discovered that the text could be written in Markdown and converted to LaTeX with pandoc (more information in this article).

Typographically, the example books I linked to above are more the exception than the rule: the vast majority of LaTeX documents use the same boring default font, Computer Modern, that was originally packaged with the software in the 1980s. Using Computer Modern in a self-published book would be almost as bad as using Times New Roman or Arial.

If you try to figure out whether and how you might be able to use your computer’s normal fonts with LaTeX, you will soon come across a lot of extremely complicated and incomplete documentation about how to convert TrueType or OpenType fonts into a format LaTeX can use.

The happy truth is that these instructions are now obsolete: you now have easy access to OpenType fonts on Windows and Mac platforms, thanks to XeTeX, a newer TeX engine that processes LaTeX documents through the xelatex command. Used with XeTeX, the fontspec package gives full access to all system fonts, as well as advanced features for OpenType fonts, such as ligatures and small caps. XeTeX is well known on the Mac, but what most people don’t say is that this font-accessing goodness can also be used on Windows, since XeTeX is included with Windows distributions such as TeX Live and MiKTeX.

That being understood, here’s how to use your system fonts in your TeX documents (source):

  1. Use the xelatex command in place of pdflatex
  2. Add \usepackage{xltxtra} at the beginning of your preamble (it bundles some XeTeX goodies and, in particular, loads fontspec, which is needed for font selection).
  3. Add \setmainfont{Name of OTF font} in the preamble.
  4. No step 4.
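
Put together, the steps above might look like the following minimal document. This is only a sketch: the file name example.tex and the font name "Linux Libertine O" are placeholders I have assumed for illustration, so substitute the name of any OpenType font actually installed on your system.

```latex
% Step 1: compile with  xelatex example.tex  (not pdflatex).
\documentclass{article}

% Step 2: xltxtra pulls in fontspec, which provides \setmainfont.
\usepackage{xltxtra}

% Step 3: select a system font by name.
% "Linux Libertine O" is an assumed placeholder; use a font you have installed.
\setmainfont{Linux Libertine O}

\begin{document}
Office, affliction, shuffle: ligatures appear if the font provides them.

\textsc{Small caps} work the same way, again only if the font includes them.
\end{document}
```

If xelatex cannot find the font, the name in \setmainfont most likely does not match the name under which the font is installed.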

Note: If you are using the aforementioned pandoc to generate your TeX documents, you can skip step 2, because pandoc already includes the fontspec package in its default template. You can also set the main font by adding the option --variable=mainfont:"font name" when calling the pandoc command.

Thursday, April 19, 2012

Publish multiple Markdown files to HTML in Windows

I wrote this script as a means of setting up a dead-simple “knowledge base” in HTML format.

The idea is to write documentation as a collection of plain-text files in Markdown format and have a no-fuss way to publish them as HTML, re-publishing changes as necessary.

In order for this script to work, you need to be on Windows, and you need to install a program called pandoc.

How to use it:

  1. Save a copy of this script file in any folder containing a bunch of Markdown-formatted text files. Include a stylesheet.css file in this folder as well if you want the HTML files to have CSS styling.
  2. Run the script (double-click it) — it will silently create updated HTML files for every text file in the folder. Only text files whose HTML counterparts are out of date or nonexistent will be processed.

You can either copy and paste the code below into Notepad and save it as a .vbs file, or you can download the latest version in a zip file. The code in the download will be more extensively commented, and may also contain enhancements developed since this post was written.

Here’s the basic code (provided under the terms of the Artistic License 2.0):

Set objShell = CreateObject("WScript.Shell")
Set objFSO = CreateObject("Scripting.FileSystemObject")

strThisFolder = objFSO.GetParentFolderName(Wscript.ScriptFullName)
Set objStartFolder = objFSO.GetFolder(strThisFolder)
strConverterCommand = "pandoc -f markdown -t html -c stylesheet.css -o "

Set objFilesToUpdate = CreateObject("Scripting.Dictionary")

Set colFiles = objStartFolder.Files
For Each objFile in colFiles
    If LCase(objFSO.GetExtensionName(objFile.Name)) = "txt" Then

        ' Check if HTML version of this text file exists in this folder
        strHTMLName = strThisFolder & "\" & objFSO.GetBaseName(objFile.Name) & ".html"
        If objFSO.FileExists(strHTMLName) Then

            ' If it exists, compare the timestamps
            Set objHTMLFile = objFSO.GetFile(strHTMLName)
            If objFile.DateLastModified > objHTMLFile.DateLastModified Then
                'If the text file is newer, add this text file to the list
                objFilesToUpdate.Add objFile.Name, strHTMLName
            End if
        Else
            ' If the file does not exist yet, add this text file to the list
            objFilesToUpdate.Add objFile.Name, strHTMLName
        End if
    End if
Next

' Update all the text files in the list.
colFilesToUpdate = objFilesToUpdate.Keys
For Each strSourceFile in colFilesToUpdate

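    ' Quote both paths so file names containing spaces survive the command line.
    ' Window style 3 = maximized; True = wait for pandoc to finish before continuing.
    objShell.Run strConverterCommand & Chr(34) & objFilesToUpdate.Item(strSourceFile) & Chr(34) & " " & Chr(34) & strSourceFile & Chr(34), 3, True
Next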

Possible Future Improvements:

  • The script isn’t very helpful about telling you how long the process is going to take. I looked at several options for providing a progress bar or some kind of status output, but ultimately VBScript is just really sucky at this.
  • Pandoc is a very powerful converter. One could easily tweak the script to add options for producing LaTeX or even PDF files.