Version: 3.0

htmlToText

Callable

  • htmlToText(htmlOrCheerioElement: string | CheerioAPI): string

  • The function converts an HTML document to plain text.

    The generated plain text is similar to the text you get by pressing Ctrl+A and Ctrl+C on a page loaded in a web browser. The function does not try to preserve formatting or to be perfectly correct with respect to the HTML specifications. It does, however, try to generate newlines and whitespace in and around HTML elements so that distinct parts of the text are not merged, which makes it possible to extract data (e.g. phone numbers) from the text.
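    To see why keeping whitespace between elements matters for data extraction, here is a minimal, self-contained sketch. The helpers stripTagsNaive and stripTagsWithBreaks and the phone pattern are hypothetical stand-ins written for this illustration. They are not part of the library and do not reflect how htmlToText is implemented.

```typescript
// Simplified stand-ins for illustration only; htmlToText itself is not implemented this way.
const html = '<div>Call us</div><div>+1 555 0100</div><div>2024 Acme</div>';

// Naive approach: delete every tag and put nothing in its place.
function stripTagsNaive(input: string): string {
    return input.replace(/<[^>]*>/g, '');
}

// Whitespace-aware approach: turn every tag into a newline, then collapse repeated newlines.
function stripTagsWithBreaks(input: string): string {
    return input.replace(/<[^>]*>/g, '\n').replace(/\n+/g, '\n').trim();
}

// Hypothetical phone pattern: optional "+", then digits and spaces (never newlines).
const phonePattern = /\+?\d[\d ]{6,}\d/;

const naiveText = stripTagsNaive(html);
const separatedText = stripTagsWithBreaks(html);

// Without separators the phone number runs into the year from the next element.
const naiveMatch = naiveText.match(phonePattern)?.[0];
// With newlines between elements the match stops at the element boundary.
const safeMatch = separatedText.match(phonePattern)?.[0];

console.log(naiveMatch); // +1 555 01002024
console.log(safeMatch);  // +1 555 0100
```

    In the naive output the digits of two neighboring elements fuse into one bogus number, while the newline-separated output keeps them apart. Avoiding that kind of merging is the purpose of the whitespace described above.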

    Example usage

    import { htmlToText } from '@crawlee/utils'; // assumed package; this page does not name it

    const text = htmlToText('<html><body>Some text</body></html>');
    console.log(text);

    Note that the function uses cheerio to parse the HTML. To avoid parsing the same HTML twice, you can pass an existing Cheerio object to the function instead of the HTML string. The HTML should be parsed with the decodeEntities option set to true. For example:

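    // Namespace import; newer cheerio releases no longer provide a default export.
    import * as cheerio from 'cheerio';
    const html = '<html><body>Some text</body></html>';
    // Parse once and pass the resulting Cheerio object instead of the raw HTML.
    const text = htmlToText(cheerio.load(html, { decodeEntities: true }));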

    Parameters

    • htmlOrCheerioElement: string | CheerioAPI

      An HTML string, or already parsed HTML represented as a Cheerio object (the function returned by cheerio.load).

    Returns string

    Plain text