TIDY(1) | 5.6.0 | TIDY(1) |
tidy - check, correct, and pretty-print HTML(5) files
tidy [options] [file ...] [options] [file ...] ...
Tidy reads HTML, XHTML, and XML files and writes cleaned-up markup. For HTML variants, it detects, reports, and corrects many common coding errors and strives to produce visually equivalent markup that is both conformant to the HTML specifications and that works in most browsers.
A common use of Tidy is to convert plain HTML to XHTML. For generic XML files, Tidy is limited to correcting basic well-formedness errors and pretty printing.
If no input file is specified, Tidy reads the standard input. If no output file is specified, Tidy writes the tidied markup to the standard output. If no error file is specified, Tidy writes messages to the standard error.
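The defaults above can be sketched as follows. The file names page.html, tidied.html, and messages.txt are placeholders chosen for this example; -o and -f are the command-line options that name an output file and an error file.

```shell
# Create a small input file (placeholder content).
printf '<title>Demo</title><p>Hello' > page.html

# No input file given: Tidy reads standard input, writes markup to
# standard output and messages to standard error.
# Tidy exits with 1 after warnings and 2 after errors, so keep the status.
tidy < page.html > tidied.html 2> messages.txt || status=$?

# The same run, naming the output and error files explicitly.
tidy -o tidied.html -f messages.txt page.html || status=$?

echo "last non-zero exit status: ${status:-none}"
```

A non-zero status here usually means Tidy had something to report, not that the run failed outright; messages.txt holds the details.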
Tidy supports two different kinds of options. Purely command-line options, starting with a single dash '-', can only be used on the command-line, not in configuration files. They are listed in the first part of this section. Configuration options, on the other hand, can either be passed on the command line, starting with two dashes --, or specified in a configuration file, using the option name, followed by a colon :, plus the value, without the starting dashes. They are listed in the second part of this section, with a sample config file.
For command-line options that expect a numerical argument, a default is assumed if no meaningful value can be found. On the other hand, configuration options cannot be used without a value; a configuration option without a value is simply discarded and reported as an error.
Using a command-line option is sometimes equivalent to setting the value of a configuration option. The equivalent option and value are shown in parentheses in the list below, as they would appear in a configuration file. For example, -quiet, -q (quiet: yes) means that using the command-line option -quiet or -q is equivalent to setting the configuration option quiet to yes.
Single-letter command-line options without an associated value can be combined; for example '-i', '-m' and '-u' may be combined as '-imu'.
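A small sketch of the combined form, assuming two throwaway copies of the same file (separate.html and combined.html are placeholder names). Because -m rewrites the file in place, each invocation works on its own copy.

```shell
printf '<title>Demo</title><p>Hello' > combined.html
cp combined.html separate.html

# -i indents, -m modifies the file in place, -u writes tags in upper case.
tidy -q -i -m -u separate.html || true   # non-zero status only signals warnings or errors
tidy -q -imu combined.html || true       # the same three options, combined

# Both copies are expected to end up identical.
cmp separate.html combined.html && echo "same result"
```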
Configuration options can be specified by preceding each option with -- at the command line, followed by its desired value, OR by placing the options and values in a configuration file, and telling tidy to read that file with the -config option:
tidy --option1 value1 --option2 value2 ...
tidy -config config-file ...
Configuration options can be conveniently grouped in a single config file. A Tidy configuration file is simply a text file, where each option is listed on a separate line in the form
option1: value1
option2: value2
etc.
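The two forms can be compared side by side. This is a sketch only: demo.html and demo-tidy.conf are placeholder names, and indent and wrap are used simply because they appear in the sample configuration later in this section.

```shell
printf '<title>Demo</title><p>Hello' > demo.html

# Form 1: configuration options on the command line, each preceded by "--".
tidy -q --indent auto --wrap 76 demo.html > from-cli.html || true

# Form 2: the same options in a configuration file (name: value, no dashes).
cat > demo-tidy.conf <<'EOF'
indent: auto
wrap: 76
EOF
tidy -q -config demo-tidy.conf demo.html > from-config.html || true

# The two runs are expected to produce the same markup.
cmp from-cli.html from-config.html && echo "both forms gave the same output"
```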
The permissible values for a given option depend on the option's Type. There are five Types: Boolean, AutoBool, DocType, Enum, and String. Boolean Types allow any of yes/no, y/n, true/false, t/f, 1/0. AutoBools allow auto in addition to the values allowed by Booleans. Integer Types take non-negative integers. String Types generally have no defaults, and you should provide them in non-quoted form (unless you wish the output to contain the literal quotes).
Enum, Encoding, and DocType Types have a fixed repertoire of items, which are listed in the Supported values sections below.
You only need to provide options and values for those whose defaults you wish to override, although you may wish to include some already-defaulted options and values for the sake of documentation and explicitness.
Here is a sample config file, with at least one example of each of the five Types:
// sample Tidy configuration options
output-xhtml: yes
add-xml-decl: no
doctype: strict
char-encoding: ascii
indent: auto
wrap: 76
repeated-attributes: keep-last
error-file: errs.txt
Below is a summary and brief description of each of the options. They are listed alphabetically within each category.
This option takes a list of one or more keys indicating the message type to mute. You can discover these message keys by using the mute-id configuration option and examining Tidy's output.
See also: --mute-id
See also: --mute
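One possible workflow for these two options is sketched below. The file names are placeholders, and KEYS is deliberately left empty: real key names should be copied from the output of the first step, not guessed.

```shell
printf '<p>Hello' > noisy.html

# Step 1: report messages together with their keys (mute-id).
# -e shows only errors and warnings and suppresses the markup output.
tidy -e --mute-id yes noisy.html 2> keyed-messages.txt || true
cat keyed-messages.txt

# Step 2: pass the keys to be silenced to --mute.
# KEYS is a placeholder; fill it in from keyed-messages.txt.
KEYS=""
if [ -n "$KEYS" ]; then
  tidy -e --mute "$KEYS" noisy.html || true
fi
```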
This option specifies if Tidy should print only the contents of the body tag as an HTML fragment.
If set to auto, this is performed only if the body tag has been inferred.
Useful for incorporating existing whole pages as a portion of another page.
This option has no effect if XML output is requested.
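A minimal sketch of extracting a fragment, assuming this paragraph belongs to the show-body-only configuration option. The file names whole.html and fragment.html are placeholders.

```shell
# A whole page whose body is to be reused inside another page.
printf '<!DOCTYPE html><title>Demo</title><body><p>Hello</p></body>' > whole.html

# yes: print only the contents of the body element.
tidy -q --show-body-only yes whole.html > fragment.html || true
cat fragment.html
```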
Note that if the input already includes an <?xml ... ?> declaration then this option will be ignored.
If the encoding for the output is different from ascii, one of the utf* encodings, or raw, then the declaration is always added as required by the XML standard.
See also: --char-encoding, --output-encoding
This is needed if the whitespace in such elements is to be parsed appropriately without having access to the DTD.
If set to omit, the output won't contain a DOCTYPE declaration. Note that this also implies numeric-entities is set to yes.
If set to html5 the DOCTYPE is set to <!DOCTYPE html>.
If set to auto (the default) Tidy will use an educated guess based upon the contents of the document. Note that selecting this option will not change the current document's DOCTYPE on output.
If set to strict, Tidy will set the DOCTYPE to the HTML4 or XHTML1 strict DTD.
If set to loose, the DOCTYPE is set to the HTML4 or XHTML1 loose (transitional) DTD.
Alternatively, you can supply a string for the formal public identifier (FPI).
For example:
doctype: "-//ACME//DTD HTML 3.14159//EN"
If you specify the FPI for an XHTML document, Tidy will set the system identifier to an empty string. For an HTML document, Tidy adds a system identifier only if one was already present in order to preserve the processing mode of some browsers. Tidy leaves the DOCTYPE for generic XML documents unchanged.
This option does not offer a validation of document conformance.
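The keyword values can be tried from the command line as in the sketch below. The file names are placeholders. The formal public identifier form is left out here because the surrounding quotes shown in the example above are easiest to get right in a configuration file.

```shell
printf '<title>Demo</title><p>Hello' > plain.html

# html5: the DOCTYPE is set to <!DOCTYPE html>.
tidy -q --doctype html5 plain.html > as-html5.html || true

# strict: the DOCTYPE is set to the HTML4 or XHTML1 strict DTD.
tidy -q --doctype strict plain.html > as-strict.html || true

# Show the first line of each result for comparison.
head -n 1 as-html5.html as-strict.html
```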
This option causes Tidy to set the DOCTYPE and default namespace as appropriate to XHTML, and will use the corrected value in output regardless of other sources.
For XHTML, entities can be written as named or numeric entities according to the setting of numeric-entities.
The original case of tags and attributes will be preserved, regardless of other options.
Any entities not defined in XML 1.0 will be written as numeric entities to allow them to be parsed by an XML parser.
The original case of tags and attributes will be preserved, regardless of other options.
See also: --output-file
Setting the option to yes allows you to tidy files without changing the file modification date, which may be useful with certain tools that use the modification date for things such as automatic server deployment.
Note this feature is not supported on some platforms.
See also: --error-file
You are advised to keep copies of important files before tidying them, as on rare occasions the result may not be what you expect.
This option specifies what level of accessibility checking, if any, Tidy should perform.
Level 0 (Tidy Classic) is equivalent to Tidy Classic's accessibility checking.
For more information on Tidy's accessibility checking, visit Tidy's Accessibility Page at http://www.html-tidy.org/accessibility/.
Use this option with care; if Tidy reports an error, this means Tidy was not able to (or is not sure how to) fix the error, so the resulting output may not reflect your intention.
This option specifies the character encoding Tidy uses for input, and when set, automatically chooses an appropriate character encoding to be used for output. The output encoding Tidy chooses may be different from the input encoding.
For ascii, latin0, ibm858, mac, and win1252 input encodings, the output-encoding option will automatically be set to ascii.
For other input encodings, the output-encoding option will automatically be set to the same value.
Regardless of the preset value, you can set output-encoding manually to override this.
Tidy is not an encoding converter. Although the Latin and UTF encodings can be mixed freely, it is not possible to convert Asian encodings to Latin encodings with Tidy.
See also: --input-encoding, --output-encoding
This option specifies the character encoding Tidy uses for input. Tidy makes certain assumptions about some of the input encodings.
For ascii, Tidy will accept Latin-1 (ISO-8859-1) character values and convert them to entities as necessary.
For raw, Tidy will make no assumptions about the character values and will pass them unchanged to output.
For mac and win1252, vendor-specific character values will be accepted and converted to entities as necessary.
Asian encodings such as iso2022 will be handled appropriately assuming the corresponding output-encoding is also specified.
Tidy is not an encoding converter. Although the Latin and UTF encodings can be mixed freely, it is not possible to convert Asian encodings to Latin encodings with Tidy.
See also: --char-encoding
The default is appropriate to the current platform.
Generally CRLF on PC-DOS, Windows and OS/2; CR on Classic Mac OS; and LF everywhere else (Linux, macOS, and Unix).
This option specifies if Tidy should write a Unicode Byte Order Mark character (BOM; also known as Zero Width No-Break Space; has value of U+FEFF) to the beginning of the output, and only applies to UTF-8 and UTF-16 output encodings.
If set to auto this option causes Tidy to write a BOM to the output only if a BOM was present at the beginning of the input.
A BOM is always written for XML/XHTML output using UTF-16 output encodings.
This option specifies the character encoding Tidy uses for output. Some of the output encodings affect whether or not some characters are translated to entities, although in all cases, some entities will be written according to other Tidy configuration options.
For ascii, mac, and win1252 output encodings, entities will be used for all characters with values over 127.
For raw output, Tidy will write values above 127 without translating them to entities.
Output using latin1 will cause Tidy to write character values higher than 255 as entities.
The UTF family such as utf8 will write output in the respective UTF encoding.
Asian output encodings such as iso2022 will write output in the specified encoding, assuming a corresponding input-encoding was specified.
Tidy is not an encoding converter. Although the Latin and UTF encodings can be mixed freely, it is not possible to convert Asian encodings to Latin encodings with Tidy.
See also: --char-encoding
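The interaction of the input and output encodings might be exercised as follows. This is a sketch with placeholder file names; the test document contains a single Latin-1 byte so that the difference between the two output encodings is visible.

```shell
# "cafe" with an e-acute encoded as Latin-1 (octal 351 is that byte in ISO-8859-1).
printf '<title>Demo</title><p>caf\351</p>' > latin1.html

# Read Latin-1, write UTF-8. Latin and UTF encodings can be mixed freely.
tidy -q --input-encoding latin1 --output-encoding utf8 latin1.html > utf8.html || true

# With ascii output, characters with values over 127 are written as entities.
tidy -q --input-encoding latin1 --output-encoding ascii latin1.html > ascii.html || true
```

Comparing utf8.html and ascii.html shows how the same character is written under each output encoding.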
This option can be set independently of the clean option.
This option can be used to modify the behavior of clean when set to yes.
This option specifies if Tidy should merge nested <div> such as <div><div>...</div></div>.
If set to auto the attributes of the inner <div> are moved to the outer one. Nested <div> with id attributes are not merged.
If set to yes the attributes of the inner <div> are discarded with the exception of class and style.
See also: --clean, --merge-spans
This option can be used to modify the behavior of clean when set to yes.
This option specifies if Tidy should merge nested <span> such as <span><span>...</span></span>.
The algorithm is identical to the one used by merge-divs.
See also: --clean, --merge-divs
You should consider saving using Word's Save As..., and choosing Web Page, Filtered.
If set to yes when using clean, &mdash;, &rdquo;, and other named character entities are downgraded to their closest ASCII equivalents.
See also: --clean
Only entities compatible with the DOCTYPE declaration generated are used.
Entities that can be represented in the output encoding are translated correspondingly.
See also: --doctype, --preserve-entities
The apostrophe character ' is written out as &#39; since many web browsers don't yet support &apos;.
Use with care, as it is your responsibility to make your documents accessible to people who cannot see the images.
If set to yes, a name attribute, if not already existing, is added alongside an existing id attribute if the DTD allows it.
If set to no any existing name attribute is removed if an id attribute exists or has been added.
This option is automatically set if the input is in XML.
<span>foo <b>bar<b> baz</span>
Tidy will output
<span>foo <b>bar</b> baz</span>
By default, c will be used.
This option enables the use of tags for autonomous custom elements, e.g. <flag-icon> with Tidy. Custom tags are disabled if this value is no. Other settings - blocklevel, empty, inline, and pre will treat all detected custom tags accordingly.
The use of new-blocklevel-tags, new-empty-tags, new-inline-tags, or new-pre-tags will override the treatment of custom tags by this configuration option. This may be useful if you have different types of custom tags.
When enabled these tags are determined during the processing of your document using opening tags; matching closing tags will be recognized accordingly, and unknown closing tags will be discarded.
See also: --new-blocklevel-tags, --new-empty-tags, --new-inline-tags, --new-pre-tags
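The two approaches described above can be sketched like this, assuming the paragraphs belong to the custom-tags configuration option. The file names and the country attribute are made up for the example.

```shell
printf '<title>Demo</title><p>Flag: <flag-icon country="nl"></flag-icon></p>' > custom.html

# Treat every detected custom tag as an inline element.
tidy -q --custom-tags inline custom.html > custom-out.html || true

# Alternatively, declare the tag explicitly; this takes precedence over
# the treatment chosen by custom-tags.
tidy -q --new-inline-tags flag-icon custom.html > declared-out.html || true
```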
This is useful when you want to take existing HTML and use it with a style sheet.
This option specifies if Tidy should replace unexpected hyphens with = characters when it comes across adjacent hyphens.
The default is auto, which will act as no for HTML5 document types, and yes for all other document types.
HTML has abandoned SGML comment syntax, and allows adjacent hyphens for all versions of HTML, although XML and XHTML do not. If you plan to support older browsers that require SGML comment syntax, then consider setting this value to yes.
If the value is no Tidy normalizes attribute values by replacing any newline or tab with a single space, and further by replacing any contiguous whitespace with a single space.
To force Tidy to preserve the original, literal values of all attributes and ensure that whitespace within attribute values is passed through unchanged, set this option to yes.
This is required for XHTML documents.
This option specifies if Tidy should keep the first or last attribute, if an attribute is repeated, e.g. has two align attributes.
See also: --join-classes, --join-styles
Additionally, if drop-proprietary-attributes is enabled, attributes that are not applicable will be dropped, too.
When set to no, these checks are not performed.
This option specifies if Tidy should output attribute names in upper case.
When set to no, attribute names will be written in lower case. Specifying yes will output attribute names in upper case, and preserve can be used to leave attribute names untouched.
When using XML input, the original case is always preserved.
The default is no which results in lower case tag names, except for XML input where the original case is preserved.
<b class="rtop-2">foo <b class="r2-2">bar</b> baz</b>,
Tidy will output <b class="rtop-2">foo bar baz</b>.
This option specifies new block-level tags. This option takes a space or comma separated list of tag names.
Unless you declare new tags, Tidy will refuse to generate a tidied file if the input includes previously unknown tags.
Note you can't change the content model for elements such as <table>, <ul>, <ol> and <dl>.
This option is ignored in XML mode.
See also: --new-empty-tags, --new-inline-tags, --new-pre-tags, --custom-tags
This option specifies new empty inline tags. This option takes a space or comma separated list of tag names.
Unless you declare new tags, Tidy will refuse to generate a tidied file if the input includes previously unknown tags.
Remember to also declare empty tags as either inline or blocklevel.
This option is ignored in XML mode.
See also: --new-blocklevel-tags, --new-inline-tags, --new-pre-tags, --custom-tags
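The reminder about declaring empty tags twice might look like this in practice. It is a sketch only: my-break is a hypothetical element invented for the example, and the file names are placeholders.

```shell
printf '<title>Demo</title><p>one<my-break>two</p>' > empty-tag.html

# my-break is declared as empty and, as the reminder above asks, also as inline.
tidy -q --new-empty-tags my-break --new-inline-tags my-break \
     empty-tag.html > empty-tag-out.html || true
cat empty-tag-out.html
```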
This option specifies new non-empty inline tags. This option takes a space or comma separated list of tag names.
Unless you declare new tags, Tidy will refuse to generate a tidied file if the input includes previously unknown tags.
This option is ignored in XML mode.
See also: --new-blocklevel-tags, --new-empty-tags, --new-pre-tags, --custom-tags
This option specifies new tags that are to be processed in exactly the same way as HTML's <pre> element. This option takes a space or comma separated list of tag names.
Unless you declare new tags, Tidy will refuse to generate a tidied file if the input includes previously unknown tags.
Note you cannot as yet add new CDATA elements.
This option is ignored in XML mode.
See also: --new-blocklevel-tags, --new-empty-tags, --new-inline-tags, --custom-tags
This option specifies if Tidy should indent block-level tags.
If set to auto Tidy will decide whether or not to indent the content of tags such as <title>, <h1>-<h6>, <li>, <td>, or <p> based on the content including a block-level element.
Setting indent to yes can expose layout bugs in some browsers.
Use the option indent-spaces to control the number of spaces or tabs output per level of indent, and indent-with-tabs to specify whether spaces or tabs are used.
See also: --indent-spaces
Note that the default value for this option is dependent upon the value of indent-with-tabs (see also).
See also: --indent
Set it to yes to indent using tabs instead of the default spaces.
Use the option indent-spaces to control the number of tabs output per level of indent. Note that when indent-with-tabs is enabled the default value of indent-spaces is reset to 1.
Note tab-size controls converting input tabs to spaces. Set it to zero to retain input tabs.
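A short comparison of the two indentation styles, using placeholder file names. The value 2 for indent-spaces is an arbitrary choice for the example.

```shell
printf '<title>Demo</title><ul><li>one<li>two</ul>' > list.html

# Spaces: two per indent level.
tidy -q --indent yes --indent-spaces 2 list.html > spaces.html || true

# Tabs: indent-spaces now counts tabs, and its default becomes 1.
tidy -q --indent yes --indent-with-tabs yes list.html > tabs.html || true
```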
If set to yes, this option specifies that Tidy should keep certain tabs found in the source, but only in preformatted blocks like <pre>, and other CDATA elements like <script>, <style>, and other pseudo elements like <?php ... ?>. As always, all other tabs, or sequences of tabs, in the source will continue to be replaced with a space.
Setting this option causes all tags for the <html>, <head>, and <body> elements to be omitted from output, as well as such end tags as </p>, </li>, </dt>, </dd>, </option>, </tr>, </td>, and </th>.
This option is ignored for XML output.
This option allows prioritizing the writing of attributes in tidied documents, allowing them to be written before the other attributes of an element. For example, you might specify that id and name are written before every other attribute.
This option takes a space or comma separated list of attribute names.
This option specifies that Tidy should sort attributes within an element using the specified sort algorithm. If set to alpha, the algorithm is an ascending alphabetic sort.
When used while sorting with priority-attributes, any attribute sorting will take place after the priority attributes have been output.
See also: --priority-attributes
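Combining the two options could look like the following sketch. The file names and attribute values are placeholders, and show-body-only is added only to keep the printed result short.

```shell
# An anchor whose attributes are deliberately out of order.
printf '<title>Demo</title><p><a title="t" href="#top" name="a1" class="x" id="a1">Top</a></p>' > attrs.html

# id and name are written first; the remaining attributes are then sorted
# in ascending alphabetic order.
tidy -q --priority-attributes "id,name" --sort-attributes alpha \
     --show-body-only yes attrs.html > sorted.html || true
cat sorted.html
```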
Tidy won't add a meta element if one is already present.
This option specifies if Tidy should add some extra empty lines for readability.
The default is no.
If set to auto Tidy will eliminate nearly all newline characters.
Tidy tries to wrap lines so that they do not exceed this length.
Set wrap to 0 (zero) if you want to disable line wrapping.
Note that this option can be set independently of wrap-script-literals. By default Tidy replaces any newline or tab with a single space and replaces any sequences of whitespace with a single space.
To force Tidy to preserve the original, literal values of all attributes, and ensure that whitespace characters within attribute values are passed through unchanged, set literal-attributes to yes.
See also: --wrap-script-literals, --literal-attributes
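The effect of the wrap margin can be observed with a sketch such as this one. The file names are placeholders, and 40 is an arbitrary margin picked to force wrapping of the sample sentence.

```shell
printf '<title>Demo</title><p>%s</p>' \
  "A long paragraph of text that is meant to run well past forty columns when written out." > long.html

# Wrap at 40 columns.
tidy -q --wrap 40 long.html > wrapped.html || true

# 0 disables line wrapping.
tidy -q --wrap 0 long.html > unwrapped.html || true

# Compare the number of lines in each result.
wc -l wrapped.html unwrapped.html
```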
Tidy wraps long script string literals by inserting a backslash character before the line break.
See also: --wrap-attributes
For more information about HTML Tidy:
http://www.html-tidy.org/
For more information on HTML:
HTML: Edition for Web Authors (the latest HTML specification)
http://dev.w3.org/html5/spec-author-view
HTML: The Markup Language (an HTML language reference)
http://dev.w3.org/html5/markup/
For bug reports and comments:
https://github.com/htacg/tidy-html5/issues/
Or send questions and comments to public-htacg@w3.org.
Validate your HTML documents using the W3C Nu Markup Validator:
http://validator.w3.org/nu/
Tidy was written by Dave Raggett <dsr@w3.org>, was subsequently maintained by a team at http://tidy.sourceforge.net/, and is now maintained by HTACG (http://www.htacg.org).
The sources for HTML Tidy are available at https://github.com/htacg/tidy-html5/ under the MIT Licence.
5.6.0 | HTML Tidy |