- Authors: Dano Morrison & Lina Tran
- Research field: Neuroscience
- Lesson topic: How to use Pandoc and Markdown for writing scientific manuscripts
- Lesson content URL: https://github.com/UofTCoders/studyGroup/tree/gh-pages/lessons/misc/pandoc-intro
Do you enjoy the spartan, academic aesthetic of LaTeX documents but don’t have the time to learn its unforgiving syntax? Does writing HTML documents by hand make you feel uncool? Do you enjoy automating difficult things? If so, Pandoc is what you’re looking for.
Pandoc is a command line tool that you can use to automatically convert files from one markup format to another. With Pandoc, you can write in something easy like Markdown, Microsoft Word, or LibreOffice, and convert it into something hard like:
- HTML
- Ebook formats
- LaTeX
- and many others
Markdown is the best way to write things for Pandoc, and probably the best way to write things for publishing to the web. If you haven’t heard of it, it’s a lightweight markup language, a kind of shorthand for HTML that uses simple symbols like * ~ - # to format documents. It’s mostly writing plain text, but with a little practice you can easily implement the most common types of text formatting, like headings, lists, links, images, etc. The veritable bible of Markdown is the syntax guide on Daring Fireball.
With Markdown and Pandoc, you can write something that looks like this:
That can be converted into HTML like this:
The globular theory of matter
A history of globbing
If we accept the philosophical (and now scientifically incontrovertible) position that the movement of all rotund entities is mediated by underlying interactions between globular units, then it follows that whenever a globby is stored in the world such that it can be reglobbed at a later time (i.e. a glob-memory), there must be some concurrent change that occurs in the underlying structure of globby particles.
Many great scientists have dedicated their lives to the study of globs and greatly advanced the globular theory of matter. In the next decade, who knows what globular secrets will be unearthed?
and even converted into LaTeX so it looks like this:
The real power of Pandoc, however, comes when, with style templates and a little extra formatting work, you’re able to produce fully typeset and personally styled documents with a markdown file and a command line.
Boom! Page numbers and everything!
- Install Pandoc
- Install LaTeX: TeX Live (Linux), MacTeX (Mac), MiKTeX (Windows)
- Install XeTeX?
Whether you’re doing something like sudo apt-get install pandoc texlive or brew install pandoc && brew install --cask mactex, if you can run pandoc -v and pdflatex -v in a terminal, you’re good to go.
*italics* **bold**
- one
- two
- three
1. one
2. two
3. three
refer to `code` inline with backticks
Once Pandoc is installed, the easiest way to convert a file is to open up your terminal (in the folder where your file is) and call:
pandoc <filetobeconverted> -o <newfilename.xx>
where .xx is the file extension of the new file you want.
By default, this will produce a ‘fragment’ of the file type you want. If you’d like to create a standalone document, you need Pandoc to generate some code for you rather than just convert the Markdown markup to another language. If you want to create an HTML page with head and body sections or a LaTeX document with all the necessary boilerplate, simply add -s after the file you want to convert in your terminal command.
For example,
pandoc sample.md -s -o sample.tex
will take this
and produce all of this
That’s sure better than writing all that stuff yourself!
You can also go straight to PDF (if you have the right dependencies installed) with
pandoc sample.md -s -o sample.pdf
You may have figured out that -o stands for ‘output’ and -s stands for ‘standalone.’ There are also a lot of other pandoc command modifiers that you can take a look at any time by running pandoc -h.
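If you would rather drive these conversions from a script, the same command can be assembled in Python. This is only a minimal sketch: the function names build_command and convert are made up for illustration, and it assumes pandoc is on your PATH (it quietly does nothing otherwise).

```python
import shutil
import subprocess


def build_command(source, output, standalone=True):
    """Return the pandoc argument list for converting source into output."""
    command = ["pandoc", source]
    if standalone:
        command.append("-s")  # ask for a complete document, not a fragment
    command += ["-o", output]  # pandoc picks the output format from the extension
    return command


def convert(source, output, standalone=True):
    """Run pandoc if it is installed; return True only when a conversion ran."""
    if shutil.which("pandoc") is None:
        return False  # assumption: silently skip when pandoc is missing
    subprocess.run(build_command(source, output, standalone), check=True)
    return True


# Show the command that would be run for the example above.
print(" ".join(build_command("sample.md", "sample.tex")))
```

Passing standalone=False drops the -s flag, which gives you the ‘fragment’ behaviour described earlier.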
You can also combine multiple input files into one output file with the shell wildcard character * (it’s not a special pandoc feature, but pandoc accepts multiple input files on the command line).
-H will let you include the contents of another file in the header of whatever you’re producing.
-V will let you pass variables (e.g. fontsize or documentclass for LaTeX documents).
This manual will be helpful for understanding these advanced features.
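To see how several inputs, -H, and -V fit together in one call, here is a small Python sketch that only builds the argument list. The file names (intro.md, methods.md, preamble.tex) and the function name are hypothetical; the flags are the ones described above.

```python
def build_multi_input_command(inputs, output, header=None, variables=None):
    """Assemble a pandoc call with several inputs, an optional header file, and variables."""
    command = ["pandoc", *inputs]  # inputs are concatenated in the order given
    if header is not None:
        command += ["-H", header]  # include this file in the document header
    for key, value in (variables or {}).items():
        command += ["-V", f"{key}={value}"]  # one -V flag per variable
    command += ["-o", output]
    return command


command = build_multi_input_command(
    ["intro.md", "methods.md"],  # hypothetical input files
    "paper.pdf",
    header="preamble.tex",  # hypothetical header file
    variables={"fontsize": "12pt", "documentclass": "report"},
)
print(" ".join(command))
```

Building the list explicitly (rather than relying on * in the shell) makes the input order obvious, which matters because pandoc joins the files in the order it receives them.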
Chances are, if you actually are trying to publish something in LaTeX, you’re going to want to put your own style on things. It’s not that hard to do with Pandoc. All it takes is:
- save a Pandoc LaTeX template somewhere
- edit that template the way you want
- tell pandoc to use that template in producing an output
1. Pandoc LaTeX Templates
These aren’t just regular LaTeX templates but Pandoc-specific templates that instruct Pandoc how to convert files into LaTeX. Here’s an example of one
2. Editing the templates
If you want to expand on what the template provides, you can go into it and change or add things, such as specific fonts or packages you would like to use.
3. Use that template to generate an output
This is the easiest step: simply add a --template=yourtemplate.tex modifier to your console command. Make sure that your template is either in your working directory or in ~/.pandoc/templates
NOTE: You have to make this directory yourself.
You can use variables in the template such as fontfamily to style your own file. Example:
pandoc sample.md -o sample.pdf --template=mytemplate.tex -V fontfamily=sans
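Since a misplaced template is an easy mistake to make, a script can check the two locations mentioned above before calling pandoc. The following is a rough sketch under that assumption; locate_template and template_command are illustrative names, not part of Pandoc.

```python
from pathlib import Path


def locate_template(name, working_dir, home_dir):
    """Return the first existing template path, checking the working directory first."""
    candidates = [
        Path(working_dir) / name,
        Path(home_dir) / ".pandoc" / "templates" / name,
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # template is in neither location


def template_command(source, output, template, variables):
    """Build the pandoc call that applies a template and passes variables to it."""
    command = ["pandoc", source, "-o", output, f"--template={template}"]
    for key, value in variables.items():
        command += ["-V", f"{key}={value}"]
    return command


print(" ".join(template_command("sample.md", "sample.pdf", "mytemplate.tex", {"fontfamily": "sans"})))
```

You could call locate_template("mytemplate.tex", Path.cwd(), Path.home()) first and stop with a clear message when it returns None.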
Markdown has become the de facto standard for writing software documentation. This post documents my experience using Pandoc to convert Word documents (docx) to markdown.
To follow along, install Pandoc, if you haven’t done so already. Word documents need to be in the docx format. Legacy binary doc files are not supported.
Pandoc supports several flavors of markdown such as the popular GitHub flavored Markdown (GFM). To produce a standalone GFM document from docx, run
The --extract-media option tells Pandoc to extract media to a ./media folder.
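A conversion along these lines can be scripted. The sketch below is an assumption about how the pieces fit together, not a record of the exact command used here: it builds a call with -t gfm, -s, and --extract-media, and rejects legacy .doc files up front. The file names and the function name are hypothetical.

```python
from pathlib import Path


def docx_to_gfm_command(docx_path, markdown_path, media_dir="."):
    """Build a pandoc call that converts a docx file to GitHub flavored Markdown."""
    if Path(docx_path).suffix.lower() != ".docx":
        # Legacy binary .doc files are not supported, so fail early.
        raise ValueError(f"expected a .docx file, got {docx_path!r}")
    return [
        "pandoc",
        docx_path,
        "-t", "gfm",  # write GitHub flavored Markdown
        "-s",  # produce a standalone document
        f"--extract-media={media_dir}",  # assumption: images end up under media_dir/media
        "-o", markdown_path,
    ]


print(" ".join(docx_to_gfm_command("report.docx", "report.md")))
```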
Creating a PDF
To create a PDF, run
Pandoc requires LaTeX to produce the PDF. Remove the --toc option if you don’t want Pandoc to create a table of contents (TOC). Remove the -N option if you don’t want it to number sections automatically.
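The two options can be toggled from a script as well. This is a minimal sketch with hypothetical names; it assumes pdflatex is the LaTeX engine you have installed, and it only reports whether that program can be found rather than running anything.

```python
import shutil


def pdf_command(markdown_path, pdf_path, toc=True, number_sections=True):
    """Build a pandoc call for PDF output with an optional TOC and numbered sections."""
    command = ["pandoc", markdown_path, "-o", pdf_path]
    if toc:
        command.append("--toc")  # generate a table of contents
    if number_sections:
        command.append("-N")  # number section headings
    return command


def latex_available(engine="pdflatex"):
    """Return True when the given LaTeX engine is on PATH (assumed engine: pdflatex)."""
    return shutil.which(engine) is not None


print(" ".join(pdf_command("report.md", "report.pdf")))
print("LaTeX found:", latex_available())
```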
Markdown Editor
You’ll need a text editor to edit a markdown file. I use vscode. It has built-in support for editing and previewing markdown files. I use a few additional plugins to make editing markdown files more productive.
HTML in Markdown
GFM allows HTML blocks in markdown. These get rendered when previewed in vscode, GitHub, or GitLab. Pandoc suppresses raw HTML when writing PDF output, so HTML blocks get rendered as plain text. For example, <sup>1</sup> gets rendered as a plain 1 instead of a superscript 1. You can use ^text^ in Pandoc’s markdown syntax to render superscript.
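If a converted document is full of <sup> tags, the swap to ^text^ can be automated. Here is a small sketch using Python’s re module; the function name is made up, and it assumes the tags are simple and not nested. Pandoc requires spaces inside a superscript to be escaped with a backslash, so the sketch does that too.

```python
import re

# Matches a simple, non-nested <sup>...</sup> pair on one line.
SUP_TAG = re.compile(r"<sup>(.*?)</sup>", re.IGNORECASE)


def html_sup_to_pandoc(text):
    """Rewrite <sup>x</sup> as ^x^, escaping spaces inside the superscript."""
    def replace(match):
        inner = match.group(1).replace(" ", "\\ ")
        return f"^{inner}^"
    return SUP_TAG.sub(replace, text)


print(html_sup_to_pandoc("E = mc<sup>2</sup>"))
```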
You can use HTML character entities to write out characters and symbols not available on the keyboard.
Tables
Pandoc converts docx tables whose cells each contain a single line of text to the pipe table syntax. Column text alignment is not rendered, but you can add that back using colons. Relative column widths can be specified using dashes. Pipe table cells with long text or images may stretch beyond the page.
Tables in docx that have complex data in cells, such as lists and multiple lines, are converted to HTML table syntax. That is highly unfortunate because Pandoc renders HTML tables to PDF as plain text.
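To make the colon convention concrete, here is a short Python sketch that writes a pipe table with per-column alignment. The function name and the sample data are invented for illustration.

```python
# Separator cells for each alignment; the colons mark the aligned side.
ALIGNMENT_MARKERS = {
    "left": ":---",
    "right": "---:",
    "center": ":---:",
    "default": "---",
}


def pipe_table(headers, rows, alignments):
    """Render headers and rows as a pipe table with one alignment per column."""
    lines = ["| " + " | ".join(headers) + " |"]
    lines.append("| " + " | ".join(ALIGNMENT_MARKERS[a] for a in alignments) + " |")
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


print(pipe_table(["Region", "Count"], [["Cortex", 12], ["Thalamus", 3]], ["left", "right"]))
```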
It is not unusual for docx tables with complex layouts, such as merged cells, to be missing columns or rows after conversion. I suggest simplifying such tables in the original docx before conversion.
Review all tables very carefully!
I’ve obtained nice results with Pandoc’s grid table syntax, but these tables cannot be previewed in vscode, GitHub, or GitLab.
Table of Contents
Pandoc converts the TOC in docx to a sequence of lines, where each line corresponds to a topic or section. Section headings are generated without numbering. I suggest deleting the TOC, and using the command line options discussed earlier to number sections and to render the TOC.
If you have cross-references in docx that use section numbers, you can generate a hyperlinked TOC using the Markdown TOC plugin of vscode. The plugin can also add, update, or remove section numbers.
I suggest avoiding section numbers for cross-referencing and using hyperlinked section references instead.
Images
Images are exported to their native format and size. They are rendered in GFM using the ![[caption]](path) syntax. Image sizes cannot be customized in GFM syntax, but Pandoc’s markdown syntax allows setting image attributes such as width using the ![[caption]](path){key1=value1 key2=value2} syntax.
Figures
Pandoc does not convert vector diagrams created using Word’s figures and shapes. You’ll need to screen grab, or copy and paste, the image rendered by Word.
You can use mermaid.js syntax to recreate diagrams such as flowcharts and message sequence charts. mermaid.js syntax can be embedded in markdown, and converted using mermaid-filter.
GitHub doesn’t yet allow you to preview mermaid.js diagrams, but GitLab does. vscode is able to preview them using the Markdown Preview Mermaid Support plugin.
Captions
Pandoc converts captions in the docx to plain text positioned after an image or table. I suggest using Pandoc’s native markdown syntax for captions.
Cross-references
GFM does not natively support linking to figures and tables, and HTML anchors are not a viable option with Pandoc. Link to the section containing a figure or table when referencing it from other parts of the document.
Figure and table numbers in docx may sometimes go missing from cross-references.
I suggest reviewing captions and cross-references very carefully!
Large Documents
Pandoc can handle large documents that have hundreds of pages. You may want to maintain large documents in separate markdown files. This makes concurrent editing productive and allows for reuse. It also allows for faster previews on GitHub or GitLab. In fact, previewing may entirely fail to work for complex documents. You may want to pre-render such documents to HTML using Pandoc.
Pandoc is capable of converting multiple markdown files into a single output document.
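When a document is split across files, the order in which they reach Pandoc matters. The sketch below assumes a naming scheme with numeric prefixes (for example 01-intro.md, 02-methods.md), which is my own convention for illustration, and sorts the files before building the command.

```python
from pathlib import Path


def combine_command(directory, output, pattern="*.md"):
    """Build a pandoc call that merges every matching file, in sorted order."""
    inputs = sorted(str(path) for path in Path(directory).glob(pattern))
    if not inputs:
        raise FileNotFoundError(f"no files matching {pattern} in {directory}")
    return ["pandoc", *inputs, "-s", "-o", output]


# Example call (uncomment in a folder that contains markdown files):
# print(" ".join(combine_command(".", "book.html")))
```

Sorting explicitly avoids depending on whatever order the file system happens to return.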
Regular Expressions
Using regular expressions significantly speeds up searching and replacing text. Some examples follow.
Empty heading
^#+\s*$
Line with trailing spaces
\s+$
Repeated whitespace between words
\b\s\s+\b
Whitespace before , ; or .
\s+[,;.]
Paragraph starts with a lowercase letter
\n\n[a-z]
Word figure not followed by a number
figure\s+(?!([\d]){1,})
Word section not followed by a number
section\s+(?!(\d+.*\d*?){1,})
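These patterns can also be run over a file outside the editor. Here is a minimal Python sketch that reports which checks match a piece of text. The check names and the function name are my own, and it assumes Python’s re module treats the patterns the same way as your editor’s search, which may not hold in every case (for example, \s also matches line breaks in Python, so blank lines can trigger the trailing-spaces check).

```python
import re

# The patterns listed above, keyed by a short description.
CHECKS = {
    "empty heading": r"^#+\s*$",
    "trailing spaces": r"\s+$",
    "repeated whitespace": r"\b\s\s+\b",
    "space before punctuation": r"\s+[,;.]",
    "lowercase paragraph start": r"\n\n[a-z]",
    "figure without number": r"figure\s+(?!([\d]){1,})",
    "section without number": r"section\s+(?!(\d+.*\d*?){1,})",
}


def failed_checks(text):
    """Return the names of all checks whose pattern matches somewhere in text."""
    return [
        name
        for name, pattern in CHECKS.items()
        if re.search(pattern, text, re.MULTILINE)  # ^ and $ work per line
    ]


print(failed_checks("see figure below"))
print(failed_checks("see figure 3"))
```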