The function converts an HTML document to plain text.
The generated plain text is similar to the text captured
by pressing Ctrl+A and Ctrl+C on a page loaded in a web browser.
The function doesn't aim to preserve formatting or to be perfectly correct with respect to the HTML specification.
However, it attempts to generate newlines and whitespace in and around HTML elements
to avoid merging distinct parts of the text, which makes it possible to extract data from the text (e.g. phone numbers).
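To see why whitespace around elements matters for data extraction, consider the minimal sketch below. The helpers `stripTags` and `stripTagsWithBreaks` and the `phonePattern` regular expression are hypothetical names made up for this illustration; they are not how the function is implemented, only a simplified picture of the problem it tries to avoid.

```javascript
// Hypothetical helpers for illustration only; not the library's implementation.
const html = '<div>Call 555-0100</div><div>2024 report</div>';

// Naive approach: delete tags outright, which glues adjacent blocks together.
const stripTags = (s) => s.replace(/<[^>]+>/g, '');

// Whitespace-aware approach: turn each tag into a newline, then tidy up.
const stripTagsWithBreaks = (s) => s
    .replace(/<[^>]+>/g, '\n')
    .replace(/\n+/g, '\n')
    .trim();

// A deliberately simple phone-like pattern (assumed for this example).
const phonePattern = /\b\d{3}-\d{4}\b/;

console.log(stripTags(html));                                 // Call 555-01002024 report
console.log(phonePattern.test(stripTags(html)));              // false
console.log(phonePattern.test(stripTagsWithBreaks(html)));    // true
```

With the tags simply deleted, the phone number runs into the following digits and the pattern no longer matches; keeping a line break between the two blocks preserves it.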
Example usage
const text = htmlToText('<html><body>Some text</body></html>');
console.log(text);
Note that the function uses cheerio to parse the HTML.
Optionally, to avoid duplicate parsing of HTML and thus improve performance, you can pass
an existing Cheerio object to the function instead of the HTML text. The HTML should be parsed
with the decodeEntities option set to true. For example:
import cheerio from 'cheerio';
const html = '<html><body>Some text</body></html>';
const text = htmlToText(cheerio.load(html, { decodeEntities: true }));
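The performance benefit of passing an already parsed document can be sketched without any dependencies. In the snippet below, `parse`, `toText` and `countTags` are hypothetical stand-ins invented for this illustration (`parse` plays the role of cheerio.load and `toText` the role of htmlToText); the real function's internals may differ.

```javascript
// Hypothetical stand-ins for illustration; neither is the real implementation.
let parseCount = 0;
const parse = (html) => {
    parseCount += 1; // Count how often the (expensive) parsing step runs.
    return { source: html };
};

// Accept either raw HTML or an already parsed document, parsing only if needed.
const toText = (htmlOrParsed) => {
    const doc = typeof htmlOrParsed === 'string' ? parse(htmlOrParsed) : htmlOrParsed;
    return doc.source.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
};

// A second consumer of the same parsed document: counts opening tags.
const countTags = (doc) => (doc.source.match(/<[a-z][^>]*>/gi) || []).length;

const html = '<html><body>Some text</body></html>';
const doc = parse(html);     // Parse once...
const text = toText(doc);    // ...reuse the parsed document here...
const tags = countTags(doc); // ...and here, without parsing again.

console.log(text, tags, parseCount);
```

If the raw HTML string were passed to `toText` instead, the document would be parsed a second time, which is the duplicate work the optional Cheerio argument is meant to avoid.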
Parameters
htmlOrCheerioElement: string | CheerioAPI
HTML text, or parsed HTML represented by a Cheerio function (such as the one returned by cheerio.load).