By default, the MarkdownIt parser is initialised to comply with the CommonMark spec, which allows for parsing arbitrary HTML tags.
This can be useful for many use cases, for example when writing articles for one’s own blog or composing technical documentation for a software package.
However, extra precautions are needed when parsing content from untrusted sources.
Generally, the output should be run through sanitizers to ensure safety and prevent vulnerabilities like cross-site scripting (XSS).
With markdown-it-py, there are two strategies for doing this:
1. Enable HTML (as is needed for full CommonMark compliance), and then use external sanitizer package(s).
2. Disable HTML, and then use plugins to selectively enable markup features. This removes the need for further sanitizing.
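As a rough illustration of the second strategy, the sketch below builds one parser with the default settings and one with raw HTML switched off, then opts back in to a single feature (tables). It assumes the html option and the table rule behave as in current markdown-it-py releases; the sample input and variable names are made up for the example.

```python
from markdown_it import MarkdownIt

# Hypothetical untrusted input: a table followed by a raw HTML block.
untrusted = "| a | b |\n|---|---|\n| 1 | 2 |\n\n<script>alert(1)</script>\n"

# Default (CommonMark) settings: raw HTML is passed through to the output.
permissive = MarkdownIt()

# Strategy 2: turn raw HTML off, then opt in to the features you want.
restricted = MarkdownIt("commonmark", {"html": False}).enable("table")

permissive_html = permissive.render(untrusted)
restricted_html = restricted.render(untrusted)

# The restricted parser renders the table but escapes the <script> tag.
print(restricted_html)
```

With HTML disabled, the script tag should appear in the output only as escaped text, so no separate sanitizing pass is needed for it.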
Unlike the original markdown-it JavaScript project, which follows the second, safe-by-default strategy, markdown-it-py enables the more convenient, but less secure, CommonMark-compliant settings by default.
This is not safe when using markdown-it-py in web applications that parse user-submitted content.
In such cases, using the js-default preset is strongly recommended.
For example:
from markdown_it import MarkdownIt
MarkdownIt("js-default").render("*user-submitted* text")
Note that even with the default configuration, markdown-it-py prohibits some kinds of links which could be used for XSS:
- javascript:, vbscript:
- file:
- data: (except some images: gif/png/jpeg/webp)
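The link filtering can be checked directly. The following is a minimal sketch, assuming the validateLink method exposed on the parser instance is the check applied to link destinations; the candidate URLs are arbitrary examples.

```python
from markdown_it import MarkdownIt

md = MarkdownIt()

# Ask the parser whether it would accept each link destination.
candidates = [
    "https://example.com/",
    "javascript:alert(1)",
    "data:text/html;base64,AAAA",
    "data:image/png;base64,AAAA",
]
allowed = {url: md.validateLink(url) for url in candidates}
for url, ok in allowed.items():
    print(f"{url!r}: {'allowed' if ok else 'rejected'}")

# A rejected destination is left as plain text instead of becoming a link.
rendered = md.render("[click me](javascript:alert(1))")
print(rendered)
```

This check covers only link destinations. It does not make raw HTML in the input safe, so it complements the strategies above rather than replacing them.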
If you find a security problem, please report it to email@example.com.
Usually, plugins operate with tokenized content, and that’s enough to provide safe output.
But there is one non-obvious case you should know about: do not allow plugins to generate arbitrary element id and name attributes.
If those depend on user input, always add prefixes to avoid DOM clobbering.
See discussion for details.
So be careful if you decide to use plugins that add extended class syntax or autogenerate header anchors.
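One way to apply the prefixing advice is sketched below. It is not part of markdown-it-py or any plugin: the helper name prefixed_id and the prefix value "user-content-" are hypothetical, chosen only to show the idea of namespacing user-derived identifiers.

```python
import re

# Hypothetical prefix; any fixed string that cannot collide with
# names your page scripts rely on would serve the same purpose.
ID_PREFIX = "user-content-"


def prefixed_id(text: str, prefix: str = ID_PREFIX) -> str:
    """Build an element id from user-controlled text, always prefixed."""
    slug = text.strip().lower()
    # Keep word characters, hyphens and spaces; drop everything else.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Collapse runs of whitespace into single hyphens.
    slug = re.sub(r"\s+", "-", slug)
    return prefix + slug


# A heading titled "location" no longer yields id="location",
# which could otherwise shadow a global of the same name.
print(prefixed_id("location"))
print(prefixed_id("  Get Started! "))
```

If a plugin accepts a custom slug or id function, a helper along these lines could be passed to it. Check the plugin's own documentation for the exact hook.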