social
Index
Interfaces
Variables
- DISCORD_REGEX
- DISCORD_REGEX_GLOBAL
- EMAIL_REGEX
- EMAIL_REGEX_GLOBAL
- FACEBOOK_REGEX
- FACEBOOK_REGEX_GLOBAL
- INSTAGRAM_REGEX
- INSTAGRAM_REGEX_GLOBAL
- LINKEDIN_REGEX
- LINKEDIN_REGEX_GLOBAL
- PINTEREST_REGEX
- PINTEREST_REGEX_GLOBAL
- TIKTOK_REGEX
- TIKTOK_REGEX_GLOBAL
- TWITTER_REGEX
- TWITTER_REGEX_GLOBAL
- YOUTUBE_REGEX
- YOUTUBE_REGEX_GLOBAL
Functions
Interfaces
SocialHandles
discords
emails
facebooks
instagrams
linkedIns
phones
phonesUncertain
pinterests
tiktoks
twitters
youtubes
Variables
const DISCORD_REGEX
Regular expression to exactly match a Discord invite or channel.
It has the following form: /^...$/i and matches URLs such as:
https://discord.gg/discord-developers
https://discord.com/invite/jyEM2PRvMU
https://discordapp.com/channels/1234
https://discord.com/channels/1234/1234
discord.gg/discord-developers
Example usage:
import { social } from 'crawlee';
if (social.DISCORD_REGEX.test('https://discord.gg/discord-developers')) {
    console.log('Match!');
}
const DISCORD_REGEX_GLOBAL
Regular expression to find multiple Discord channels or invites in text or HTML.
It has the following form: /.../ig and matches URLs such as:
https://discord.gg/discord-developers
https://discord.com/invite/jyEM2PRvMU
https://discordapp.com/channels/1234
https://discord.com/channels/1234/1234
discord.gg/discord-developers
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.DISCORD_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Discord channels found!`);
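One reason to keep the anchored and global variants apart: in JavaScript, a regular expression with the g flag remembers where its last match ended (lastIndex), so repeated test() calls on the same string can alternate between true and false. The sketch below shows this with a hypothetical, simplified pattern; SAMPLE_DISCORD_GLOBAL is an assumption made for illustration and is not Crawlee's actual expression.

```javascript
// Hypothetical, simplified stand-in for a /.../ig expression (not Crawlee's real pattern).
const SAMPLE_DISCORD_GLOBAL = /discord\.gg\/[\w-]+/gi;
const url = 'discord.gg/discord-developers';

// test() on a global regex is stateful: each call resumes from lastIndex.
const firstTest = SAMPLE_DISCORD_GLOBAL.test(url);  // finds the match, lastIndex moves to its end
const secondTest = SAMPLE_DISCORD_GLOBAL.test(url); // starts past the match, fails, resets lastIndex

// String.prototype.match() resets lastIndex itself, so repeated calls agree.
const firstMatch = url.match(SAMPLE_DISCORD_GLOBAL);
const secondMatch = url.match(SAMPLE_DISCORD_GLOBAL);

console.log(firstTest, secondTest); // true false
console.log(firstMatch, secondMatch);
```

For a yes/no check on a single URL, the anchored variant with test() avoids this state; for collecting matches from a longer text, match() with the global variant is the safer pairing.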
const EMAIL_REGEX
Regular expression to exactly match a single email address.
It has the following form: /^...$/i.
const EMAIL_REGEX_GLOBAL
Regular expression to find multiple email addresses in text.
It has the following form: /.../ig.
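Neither email expression comes with a usage example. With Crawlee they would be referenced as social.EMAIL_REGEX and social.EMAIL_REGEX_GLOBAL, in the same way as the other expressions on this page. The self-contained sketch below only illustrates the difference between the /^...$/i and /.../ig forms; EMAIL_PATTERN and both SIMPLE_* names are hypothetical, and the pattern is deliberately much simpler than whatever Crawlee actually uses.

```javascript
// Hypothetical, simplified stand-in pattern (assumption, not Crawlee's real one).
const EMAIL_PATTERN = '[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}';
const SIMPLE_EMAIL_REGEX = new RegExp(`^${EMAIL_PATTERN}$`, 'i');   // form /^...$/i
const SIMPLE_EMAIL_REGEX_GLOBAL = new RegExp(EMAIL_PATTERN, 'ig');  // form /.../ig

// Anchored form: the whole string must be a single email address.
const isExact = SIMPLE_EMAIL_REGEX.test('Info@Example.com');
const isEmbedded = SIMPLE_EMAIL_REGEX.test('Write to info@example.com');

// Global form: finds every occurrence inside a longer text.
const text = 'Write to info@example.com or sales@example.org.';
const matches = text.match(SIMPLE_EMAIL_REGEX_GLOBAL) ?? [];

console.log(isExact, isEmbedded); // true false
console.log(matches);             // [ 'info@example.com', 'sales@example.org' ]
```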
const FACEBOOK_REGEX
Regular expression to exactly match a single Facebook profile URL.
It has the following form: /^...$/i and matches URLs such as:
https://www.facebook.com/apifytech
facebook.com/apifytech
fb.com/apifytech
https://www.facebook.com/profile.php?id=123456789
The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:
https://www.facebook.com/apifytech/photos
Example usage:
import { social } from 'crawlee';
if (social.FACEBOOK_REGEX.test('https://www.facebook.com/apifytech')) {
    console.log('Match!');
}
const FACEBOOK_REGEX_GLOBAL
Regular expression to find multiple Facebook profile URLs in text or HTML.
It has the following form: /.../ig and matches URLs such as:
https://www.facebook.com/apifytech
facebook.com/apifytech
fb.com/apifytech
If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:
https://www.facebook.com/apifytech/photos
the expression extracts only the following base URL:
https://www.facebook.com/apifytech
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.FACEBOOK_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Facebook profiles found!`);
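The base-URL behavior described above follows from the expression being unanchored: a match simply ends where the profile name ends, so trailing path segments are left out. A minimal sketch, assuming a hypothetical and much simplified pattern (SAMPLE_FACEBOOK_GLOBAL is not Crawlee's real expression):

```javascript
// Hypothetical, simplified stand-in for a /.../ig profile expression.
const SAMPLE_FACEBOOK_GLOBAL = /(?:https?:\/\/)?(?:www\.)?(?:facebook|fb)\.com\/[a-z0-9_-]+/gi;

const sampleText = 'Photos: https://www.facebook.com/apifytech/photos, short link: fb.com/apifytech';

// Each match stops at the first character that cannot belong to the profile
// name, so "/photos" is not part of the first result.
const profiles = sampleText.match(SAMPLE_FACEBOOK_GLOBAL) ?? [];

console.log(profiles); // [ 'https://www.facebook.com/apifytech', 'fb.com/apifytech' ]
```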
const INSTAGRAM_REGEX
Regular expression to exactly match a single Instagram profile URL.
It has the following form: /^...$/i and matches URLs such as:
https://www.instagram.com/old_prague
www.instagram.com/old_prague/
instagr.am/old_prague
The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:
https://www.instagram.com/cristiano/followers
It also does NOT match the following URLs:
https://www.instagram.com/explore/
https://www.instagram.com/_n/
https://www.instagram.com/_u/
Example usage:
import { social } from 'crawlee';
if (social.INSTAGRAM_REGEX.test('https://www.instagram.com/old_prague')) {
    console.log('Match!');
}
const INSTAGRAM_REGEX_GLOBAL
Regular expression to find multiple Instagram profile URLs in text or HTML.
It has the following form: /.../ig and matches URLs such as:
https://www.instagram.com/old_prague
www.instagram.com/old_prague/
instagr.am/old_prague
If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:
https://www.instagram.com/cristiano/followers
the expression extracts just the following base URL:
https://www.instagram.com/cristiano
The regular expression does NOT match the following URLs:
https://www.instagram.com/explore/
https://www.instagram.com/_n/
https://www.instagram.com/_u/
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.INSTAGRAM_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Instagram profiles found!`);
const LINKEDIN_REGEX
Regular expression to exactly match a single LinkedIn profile URL.
It has the following form: /^...$/i and matches URLs such as:
https://www.linkedin.com/in/alan-turing
en.linkedin.com/in/alan-turing
linkedin.com/in/alan-turing
https://www.linkedin.com/company/linkedin/
The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:
https://www.linkedin.com/in/linus-torvalds/latest-activity
Example usage:
import { social } from 'crawlee';
if (social.LINKEDIN_REGEX.test('https://www.linkedin.com/in/alan-turing')) {
    console.log('Match!');
}
const LINKEDIN_REGEX_GLOBAL
Regular expression to find multiple LinkedIn profile URLs in text or HTML.
It has the following form: /.../ig and matches URLs such as:
https://www.linkedin.com/in/alan-turing
en.linkedin.com/in/alan-turing
linkedin.com/in/alan-turing
https://www.linkedin.com/company/linkedin/
If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:
https://www.linkedin.com/in/linus-torvalds/latest-activity
the expression extracts just the following base URL:
https://www.linkedin.com/in/linus-torvalds
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.LINKEDIN_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} LinkedIn profiles found!`);
const PINTEREST_REGEX
Regular expression to exactly match a Pinterest pin, user or user's board.
It has the following form: /^...$/i and matches URLs such as:
https://pinterest.com/pin/123456789
https://www.pinterest.cz/pin/123456789
https://www.pinterest.com/user
https://uk.pinterest.com/user
https://www.pinterest.co.uk/user
pinterest.com/user_name.gold
https://cz.pinterest.com/user/board
Example usage:
import { social } from 'crawlee';
if (social.PINTEREST_REGEX.test('https://pinterest.com/pin/123456789')) {
    console.log('Match!');
}
const PINTEREST_REGEX_GLOBAL
Regular expression to find multiple Pinterest pins, users or boards in text or HTML.
It has the following form: /.../ig and matches URLs such as:
https://pinterest.com/pin/123456789
https://www.pinterest.cz/pin/123456789
https://www.pinterest.com/user
https://uk.pinterest.com/user
https://www.pinterest.co.uk/user
pinterest.com/user_name.gold
https://cz.pinterest.com/user/board
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.PINTEREST_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Pinterest pins found!`);
const TIKTOK_REGEX
Regular expression to exactly match a TikTok video or user account.
It has the following form: /^...$/i and matches URLs such as:
https://www.tiktok.com/trending?shareId=123456789
https://www.tiktok.com/embed/123456789
https://m.tiktok.com/v/123456789
https://www.tiktok.com/@user
https://www.tiktok.com/@user-account.pro
https://www.tiktok.com/@user/video/123456789
Example usage:
import { social } from 'crawlee';
if (social.TIKTOK_REGEX.test('https://www.tiktok.com/trending?shareId=123456789')) {
    console.log('Match!');
}
const TIKTOK_REGEX_GLOBAL
Regular expression to find multiple TikTok videos or user accounts in text or HTML.
It has the following form: /.../ig and matches URLs such as:
https://www.tiktok.com/trending?shareId=123456789
https://www.tiktok.com/embed/123456789
https://m.tiktok.com/v/123456789
https://www.tiktok.com/@user
https://www.tiktok.com/@user-account.pro
https://www.tiktok.com/@user/video/123456789
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.TIKTOK_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Tiktok profiles/videos found!`);
const TWITTER_REGEX
Regular expression to exactly match a single Twitter profile URL.
It has the following form: /^...$/i and matches URLs such as:
https://www.twitter.com/apify
twitter.com/apify
The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:
https://www.twitter.com/realdonaldtrump/following
Example usage:
import { social } from 'crawlee';
if (social.TWITTER_REGEX.test('https://www.twitter.com/apify')) {
    console.log('Match!');
}
const TWITTER_REGEX_GLOBAL
Regular expression to find multiple Twitter profile URLs in text or HTML.
It has the following form: /.../ig and matches URLs such as:
https://www.twitter.com/apify
twitter.com/apify
If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:
https://www.twitter.com/realdonaldtrump/following
the expression extracts only the following base URL:
https://www.twitter.com/realdonaldtrump
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.TWITTER_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Twitter profiles found!`);
const YOUTUBE_REGEX
Regular expression to exactly match a single YouTube channel, user or video URL.
It has the following form: /^...$/i and matches URLs such as:
https://www.youtube.com/watch?v=kM7YfhfkiEE
https://youtu.be/kM7YfhfkiEE
https://www.youtube.com/c/TrapNation
https://www.youtube.com/channel/UCklie6BM0fhFvzWYqQVoCTA
https://www.youtube.com/user/pewdiepie
Please note that this won't match URLs like https://www.youtube.com/pewdiepie that redirect to /user or /channel.
Example usage:
import { social } from 'crawlee';
if (social.YOUTUBE_REGEX.test('https://www.youtube.com/watch?v=kM7YfhfkiEE')) {
    console.log('Match!');
}
const YOUTUBE_REGEX_GLOBAL
Regular expression to find multiple YouTube channel, user or video URLs in text or HTML.
It has the following form: /.../ig and matches URLs such as:
https://www.youtube.com/watch?v=kM7YfhfkiEE
https://youtu.be/kM7YfhfkiEE
https://www.youtube.com/c/TrapNation
https://www.youtube.com/channel/UCklie6BM0fhFvzWYqQVoCTA
https://www.youtube.com/user/pewdiepie
Please note that this won't match URLs like https://www.youtube.com/pewdiepie that redirect to /user or /channel.
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.YOUTUBE_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Youtube videos found!`);
Functions
emailsFromText
Extracts email addresses from plain text. Note that the function preserves the order of the emails and keeps duplicates.
Parameters
- text: string - Text to search in.
Returns string[] - Array of email addresses found. If no emails are found, the function returns an empty array.
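The contract described for emailsFromText (order preserved, duplicates kept, empty array when nothing is found) can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: emailsFromTextSketch and SAMPLE_EMAIL_GLOBAL are hypothetical names, and the pattern is a simplified stand-in.

```javascript
// Hypothetical, simplified email pattern (assumption, not Crawlee's real one).
const SAMPLE_EMAIL_GLOBAL = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/gi;

// Hypothetical stand-in that mirrors the documented contract of emailsFromText.
function emailsFromTextSketch(text) {
    if (typeof text !== 'string') return [];
    // match() returns null when nothing is found, so fall back to an empty array.
    return text.match(SAMPLE_EMAIL_GLOBAL) ?? [];
}

const found = emailsFromTextSketch('b@example.com, a@example.com and b@example.com again');
console.log(found); // order of appearance, duplicate kept

// Callers who need unique values can deduplicate afterwards.
const unique = [...new Set(found)];
console.log(unique);
```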
emailsFromUrls
Extracts email addresses from a list of URLs. It looks for all mailto: URLs and returns the valid email addresses from them. Note that the function preserves the order of the emails and keeps duplicates.
Parameters
- urls: string[] - Array of URLs.
Returns string[] - Array of email addresses found. If no emails are found, the function returns an empty array.
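To make the mailto: handling concrete, here is a minimal sketch of the idea: keep only mailto: URLs, strip the scheme and any query string, and keep what looks like an address. The function name emailsFromUrlsSketch, the SAMPLE_EMAIL pattern and the query-string handling are assumptions for illustration; the library's own parsing and validation may differ.

```javascript
// Hypothetical, simplified validity check (assumption, not Crawlee's real pattern).
const SAMPLE_EMAIL = /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i;

// Hypothetical stand-in illustrating the documented behavior of emailsFromUrls.
function emailsFromUrlsSketch(urls) {
    const emails = [];
    for (const url of urls) {
        if (typeof url !== 'string' || !/^mailto:/i.test(url)) continue;
        // Drop the scheme and any "?subject=..." part, then decode percent-escapes.
        const raw = url.slice('mailto:'.length).split('?')[0];
        let address;
        try {
            address = decodeURIComponent(raw).trim();
        } catch {
            continue; // malformed percent-encoding, skip this URL
        }
        if (SAMPLE_EMAIL.test(address)) emails.push(address);
    }
    return emails;
}

const emails = emailsFromUrlsSketch([
    'mailto:info@example.com',
    'https://example.com/contact',
    'MAILTO:sales@example.org?subject=Hi',
    'mailto:not-an-email',
]);
console.log(emails); // [ 'info@example.com', 'sales@example.org' ]
```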
parseHandlesFromHtml
Attempts to extract emails, phone numbers and social profile URLs from an HTML document, specifically LinkedIn, Twitter, Instagram and Facebook profile URLs. The function removes duplicates from the resulting arrays and sorts the items alphabetically.
Note that the phones field contains phone numbers extracted from special phone links such as [call us](tel:+1234556789) (see phonesFromUrls) and potentially other sources with high certainty, while phonesUncertain contains phone numbers extracted from the plain text, which might be very inaccurate.
Example usage:
import { launchPuppeteer, social } from 'crawlee';
const browser = await launchPuppeteer();
const page = await browser.newPage();
await page.goto('http://www.example.com');
const html = await page.content();
const result = social.parseHandlesFromHtml(html);
console.log('Social handles:');
console.dir(result);
Parameters
- html: string - HTML text.
- optional data: null | Record<string, unknown> = null - Optional object that will receive the text and $ properties, which contain the text content of the HTML and the cheerio object, respectively. This is an optimization so that the caller doesn't need to parse the HTML document again, if needed.
Returns SocialHandles - An object with the social handles.
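The "removes duplicates and sorts alphabetically" step applied to each result array can be pictured with the sketch below. The helper uniqueSorted is hypothetical, and the default string comparison of Array.prototype.sort is assumed to stand in for "alphabetically"; Crawlee's internal code may differ.

```javascript
// Hypothetical helper; not part of Crawlee's API.
function uniqueSorted(items) {
    // A Set drops exact repeats; sort() then orders the strings by code unit.
    return [...new Set(items)].sort();
}

const rawTwitters = [
    'https://twitter.com/apify',
    'https://www.twitter.com/apify',
    'https://twitter.com/apify',
];
const twitters = uniqueSorted(rawTwitters);

console.log(twitters); // [ 'https://twitter.com/apify', 'https://www.twitter.com/apify' ]
```

In this sketch only exact string repeats are removed, so the same profile written with and without www. still counts as two items.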
phonesFromText
Attempts to extract phone numbers from text. Please note that the results might not be accurate, since phone numbers appear in a large variety of formats and conventions. If you encounter any problems, please file an issue.
Parameters
- text: string - Text to search the phone numbers in.
Returns string[] - Array of phone numbers found. If no phone numbers are found, the function returns an empty array.
phonesFromUrls
Finds phone number links in an array of URLs and extracts the phone numbers from them. Note that the phone number links look like tel://123456789, tel:/123456789 or tel:123456789.
Parameters
- urls: string[] - Array of URLs.
Returns string[] - Array of phone numbers found. If no phone numbers are found, the function returns an empty array.
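The three link shapes listed for phonesFromUrls differ only in how many slashes follow the tel: scheme. A minimal sketch of stripping that prefix, assuming a hypothetical helper phonesFromUrlsSketch that does no further validation of the number (the library itself may check more):

```javascript
// Hypothetical stand-in; it only handles the tel: prefix, not number validation.
function phonesFromUrlsSketch(urls) {
    const phones = [];
    for (const url of urls) {
        if (typeof url !== 'string') continue;
        // Accept tel:, tel:/ and tel:// prefixes, case-insensitively.
        const match = url.match(/^tel:\/{0,2}(.+)$/i);
        if (match) phones.push(match[1]);
    }
    return phones;
}

const phones = phonesFromUrlsSketch([
    'tel://123456789',
    'tel:/123456789',
    'tel:+1234556789',
    'https://example.com/contact',
]);
console.log(phones); // [ '123456789', '123456789', '+1234556789' ]
```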
The SocialHandles interface is the representation of social handles parsed from an HTML page.