social
Index
Interfaces
Variables
- DISCORD_REGEX
- DISCORD_REGEX_GLOBAL
- EMAIL_REGEX
- EMAIL_REGEX_GLOBAL
- FACEBOOK_REGEX
- FACEBOOK_REGEX_GLOBAL
- INSTAGRAM_REGEX
- INSTAGRAM_REGEX_GLOBAL
- LINKEDIN_REGEX
- LINKEDIN_REGEX_GLOBAL
- PINTEREST_REGEX
- PINTEREST_REGEX_GLOBAL
- TIKTOK_REGEX
- TIKTOK_REGEX_GLOBAL
- TWITTER_REGEX
- TWITTER_REGEX_GLOBAL
- YOUTUBE_REGEX
- YOUTUBE_REGEX_GLOBAL
Functions
Interfaces
SocialHandles
discords
emails
facebooks
instagrams
linkedIns
phones
phonesUncertain
pinterests
tiktoks
twitters
youtubes
Variables
const DISCORD_REGEX
Regular expression to exactly match a Discord invite or channel.
It has the following form: /^...$/i
and matches URLs such as:
https://discord.gg/discord-developers
https://discord.com/invite/jyEM2PRvMU
https://discordapp.com/channels/1234
https://discord.com/channels/1234/1234
discord.gg/discord-developers
Example usage:
import { social } from 'crawlee';
if (social.DISCORD_REGEX.test('https://discord.gg/discord-developers')) {
console.log('Match!');
}
const DISCORD_REGEX_GLOBAL
Regular expression to find multiple Discord channels or invites in a text or HTML.
It has the following form: /.../ig
and matches URLs such as:
https://discord.gg/discord-developers
https://discord.com/invite/jyEM2PRvMU
https://discordapp.com/channels/1234
https://discord.com/channels/1234/1234
discord.gg/discord-developers
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.DISCORD_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Discord channels found!`);
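The anchored and global forms differ only in anchoring and flags. The sketch below uses a deliberately simplified, hypothetical pattern (`SIMPLE_DISCORD_SOURCE`, which is not the expression shipped in Crawlee) to show why the anchored form suits validating a single string while the global form suits scanning a longer text.

```javascript
// Hypothetical, simplified pattern for illustration only; it covers just
// discord.gg links and is not the pattern behind DISCORD_REGEX.
const SIMPLE_DISCORD_SOURCE = '(?:https?:\\/\\/)?discord\\.gg\\/[a-z0-9-]+';

// Anchored form (/^...$/i): the whole string must be exactly one URL.
const simpleExact = new RegExp(`^${SIMPLE_DISCORD_SOURCE}$`, 'i');
// Global form (/.../ig): finds every occurrence inside a longer text.
const simpleGlobal = new RegExp(SIMPLE_DISCORD_SOURCE, 'ig');

const text = 'Join discord.gg/alpha or https://discord.gg/beta today';

// The input is nothing but a URL, so the anchored form matches.
const exactOnUrl = simpleExact.test('https://discord.gg/alpha');
// The surrounding words prevent the anchored form from matching.
const exactOnText = simpleExact.test(text);
// The global form collects both URLs from the sentence.
const found = text.match(simpleGlobal) || [];

console.log(exactOnUrl, exactOnText, found);
```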
const EMAIL_REGEX
Regular expression to exactly match a single email address.
It has the following form: /^...$/i.
const EMAIL_REGEX_GLOBAL
Regular expression to find multiple email addresses in a text.
It has the following form: /.../ig.
const FACEBOOK_REGEX
Regular expression to exactly match a single Facebook profile URL.
It has the following form: /^...$/i
and matches URLs such as:
https://www.facebook.com/apifytech
facebook.com/apifytech
fb.com/apifytech
https://www.facebook.com/profile.php?id=123456789
The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:
https://www.facebook.com/apifytech/photos
Example usage:
import { social } from 'crawlee';
if (social.FACEBOOK_REGEX.test('https://www.facebook.com/apifytech')) {
console.log('Match!');
}
const FACEBOOK_REGEX_GLOBAL
Regular expression to find multiple Facebook profile URLs in a text or HTML.
It has the following form: /.../ig
and matches URLs such as:
https://www.facebook.com/apifytech
facebook.com/apifytech
fb.com/apifytech
If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:
https://www.facebook.com/apifytech/photos
the expression extracts only the following base URL:
https://www.facebook.com/apifytech
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.FACEBOOK_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Facebook profiles found!`);
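The base-URL behaviour described above follows naturally when a global pattern stops before the next path separator. A minimal sketch, assuming a simplified stand-in pattern (`simpleFacebookGlobal` is hypothetical and much narrower than the real expression):

```javascript
// Hypothetical stand-in; NOT the pattern used by Crawlee.
// It matches the host plus a single path segment and stops before
// any further "/", "?" or quote character.
const simpleFacebookGlobal = /(?:https?:\/\/)?(?:www\.)?(?:facebook|fb)\.com\/[a-z0-9.]+/ig;

const html = '<a href="https://www.facebook.com/apifytech/photos">Photos</a> or fb.com/apifytech';

// Only the base part of the first URL is captured; "/photos" is left out.
const profiles = html.match(simpleFacebookGlobal) || [];

console.log(profiles);
```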
const INSTAGRAM_REGEX
Regular expression to exactly match a single Instagram profile URL.
It has the following form: /^...$/i
and matches URLs such as:
https://www.instagram.com/old_prague
www.instagram.com/old_prague/
instagr.am/old_prague
The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:
https://www.instagram.com/cristiano/followers
It also does NOT match the following URLs:
https://www.instagram.com/explore/
https://www.instagram.com/_n/
https://www.instagram.com/_u/
Example usage:
import { social } from 'crawlee';
if (social.INSTAGRAM_REGEX.test('https://www.instagram.com/old_prague')) {
    console.log('Match!');
}
const INSTAGRAM_REGEX_GLOBAL
Regular expression to find multiple Instagram profile URLs in a text or HTML.
It has the following form: /.../ig
and matches URLs such as:
https://www.instagram.com/old_prague
www.instagram.com/old_prague/
instagr.am/old_prague
If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:
https://www.instagram.com/cristiano/followers
the expression extracts just the following base URL:
https://www.instagram.com/cristiano
The regular expression does NOT match the following URLs:
https://www.instagram.com/explore/
https://www.instagram.com/_n/
https://www.instagram.com/_u/
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.INSTAGRAM_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Instagram profiles found!`);
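One common way to express exclusions like these is a negative lookahead placed before the username part. The sketch below assumes that technique and a simplified, hypothetical pattern; it is not claimed to be how INSTAGRAM_REGEX_GLOBAL is actually written.

```javascript
// Hypothetical pattern for illustration only.
// The lookahead rejects the reserved paths "explore", "_n" and "_u" when they
// form the whole path segment, while still allowing usernames that merely
// start with those characters.
const simpleInstagramGlobal =
    /(?:https?:\/\/)?(?:www\.)?(?:instagram\.com|instagr\.am)\/(?!(?:explore|_n|_u)(?:[\/?#\s]|$))[a-z0-9_.]+/ig;

const text = [
    'https://www.instagram.com/old_prague',
    'https://www.instagram.com/explore/',
    'https://www.instagram.com/_n/',
    'instagr.am/explorer_joe',
].join('\n');

// The two reserved paths are skipped; the two profile URLs are kept.
const handles = text.match(simpleInstagramGlobal) || [];

console.log(handles);
```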
const LINKEDIN_REGEX
Regular expression to exactly match a single LinkedIn profile URL.
It has the following form: /^...$/i
and matches URLs such as:
https://www.linkedin.com/in/alan-turing
en.linkedin.com/in/alan-turing
linkedin.com/in/alan-turing
https://www.linkedin.com/company/linkedin/
The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:
https://www.linkedin.com/in/linus-torvalds/latest-activity
Example usage:
import { social } from 'crawlee';
if (social.LINKEDIN_REGEX.test('https://www.linkedin.com/in/alan-turing')) {
console.log('Match!');
}
const LINKEDIN_REGEX_GLOBAL
Regular expression to find multiple LinkedIn profile URLs in a text or HTML.
It has the following form: /.../ig
and matches URLs such as:
https://www.linkedin.com/in/alan-turing
en.linkedin.com/in/alan-turing
linkedin.com/in/alan-turing
https://www.linkedin.com/company/linkedin/
If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:
https://www.linkedin.com/in/linus-torvalds/latest-activity
the expression extracts just the following base URL:
https://www.linkedin.com/in/linus-torvalds
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.LINKEDIN_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} LinkedIn profiles found!`);
const PINTEREST_REGEX
Regular expression to exactly match a Pinterest pin, user or user's board.
It has the following form: /^...$/i
and matches URLs such as:
https://pinterest.com/pin/123456789
https://www.pinterest.cz/pin/123456789
https://www.pinterest.com/user
https://uk.pinterest.com/user
https://www.pinterest.co.uk/user
pinterest.com/user_name.gold
https://cz.pinterest.com/user/board
Example usage:
import { social } from 'crawlee';
if (social.PINTEREST_REGEX.test('https://pinterest.com/pin/123456789')) {
console.log('Match!');
}
const PINTEREST_REGEX_GLOBAL
Regular expression to find multiple Pinterest pins, users or boards in a text or HTML.
It has the following form: /.../ig
and matches URLs such as:
https://pinterest.com/pin/123456789
https://www.pinterest.cz/pin/123456789
https://www.pinterest.com/user
https://uk.pinterest.com/user
https://www.pinterest.co.uk/user
pinterest.com/user_name.gold
https://cz.pinterest.com/user/board
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.PINTEREST_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Pinterest pins found!`);
const TIKTOK_REGEX
Regular expression to exactly match a TikTok video or user account.
It has the following form: /^...$/i
and matches URLs such as:
https://www.tiktok.com/trending?shareId=123456789
https://www.tiktok.com/embed/123456789
https://m.tiktok.com/v/123456789
https://www.tiktok.com/@user
https://www.tiktok.com/@user-account.pro
https://www.tiktok.com/@user/video/123456789
Example usage:
import { social } from 'crawlee';
if (social.TIKTOK_REGEX.test('https://www.tiktok.com/trending?shareId=123456789')) {
console.log('Match!');
}
const TIKTOK_REGEX_GLOBAL
Regular expression to find multiple TikTok videos or user accounts in a text or HTML.
It has the following form: /.../ig
and matches URLs such as:
https://www.tiktok.com/trending?shareId=123456789
https://www.tiktok.com/embed/123456789
https://m.tiktok.com/v/123456789
https://www.tiktok.com/@user
https://www.tiktok.com/@user-account.pro
https://www.tiktok.com/@user/video/123456789
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.TIKTOK_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Tiktok profiles/videos found!`);
const TWITTER_REGEX
Regular expression to exactly match a single Twitter profile URL.
It has the following form: /^...$/i
and matches URLs such as:
https://www.twitter.com/apify
twitter.com/apify
The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:
https://www.twitter.com/realdonaldtrump/following
Example usage:
import { social } from 'crawlee';
if (social.TWITTER_REGEX.test('https://www.twitter.com/apify')) {
console.log('Match!');
}
const TWITTER_REGEX_GLOBAL
Regular expression to find multiple Twitter profile URLs in a text or HTML.
It has the following form: /.../ig
and matches URLs such as:
https://www.twitter.com/apify
twitter.com/apify
If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:
https://www.twitter.com/realdonaldtrump/following
the expression extracts only the following base URL:
https://www.twitter.com/realdonaldtrump
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.TWITTER_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Twitter profiles found!`);
const YOUTUBE_REGEX
Regular expression to exactly match a single YouTube channel, user or video URL.
It has the following form: /^...$/i
and matches URLs such as:
https://www.youtube.com/watch?v=kM7YfhfkiEE
https://youtu.be/kM7YfhfkiEE
https://www.youtube.com/c/TrapNation
https://www.youtube.com/channel/UCklie6BM0fhFvzWYqQVoCTA
https://www.youtube.com/user/pewdiepie
Please note that this won't match URLs like https://www.youtube.com/pewdiepie that redirect to /user or /channel.
Example usage:
import { social } from 'crawlee';
if (social.YOUTUBE_REGEX.test('https://www.youtube.com/watch?v=kM7YfhfkiEE')) {
console.log('Match!');
}
const YOUTUBE_REGEX_GLOBAL
Regular expression to find multiple YouTube channel, user or video URLs in a text or HTML.
It has the following form: /.../ig
and matches URLs such as:
https://www.youtube.com/watch?v=kM7YfhfkiEE
https://youtu.be/kM7YfhfkiEE
https://www.youtube.com/c/TrapNation
https://www.youtube.com/channel/UCklie6BM0fhFvzWYqQVoCTA
https://www.youtube.com/user/pewdiepie
Please note that this won't match URLs like https://www.youtube.com/pewdiepie that redirect to /user or /channel.
Example usage:
import { social } from 'crawlee';
const matches = text.match(social.YOUTUBE_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Youtube videos found!`);
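A general JavaScript caveat applies to every expression with the g flag, including the _GLOBAL variants listed here: such an expression keeps a lastIndex between calls to test() or exec(), so repeated calls on the same string can alternate between true and false. The sketch uses a hypothetical stand-in pattern rather than the Crawlee constants; String.prototype.match always scans from the start and does not have this problem.

```javascript
// Hypothetical stand-in for any regular expression with the `g` flag.
const globalPattern = /youtu\.be\/[a-z0-9_-]+/ig;
const url = 'https://youtu.be/kM7YfhfkiEE';

// test() advances lastIndex past the match after a successful call...
const first = globalPattern.test(url);
// ...so the next call starts searching from there, fails, and resets lastIndex.
const second = globalPattern.test(url);

// match() with a global pattern always scans from the start of the string.
const viaMatch = url.match(globalPattern) || [];

console.log(first, second, viaMatch.length);
```

For yes/no checks on a single URL, the anchored non-global variants are the safer choice.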
Functions
emailsFromText
The function extracts email addresses from plain text. Note that the function preserves the order of emails and keeps duplicates.
Parameters
text: string
Text to search in.
Returns string[]
Array of email addresses found. If no emails are found, the function returns an empty array.
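Because duplicates are kept, callers who want unique addresses have to deduplicate the result themselves. A minimal sketch: the `found` array stands in for a hypothetical return value of emailsFromText, and `uniqueInOrder` is an illustrative helper, not part of the library.

```javascript
// Hypothetical result, in the order the addresses appeared in the text.
const found = ['info@example.com', 'sales@example.com', 'info@example.com'];

// A Set remembers insertion order, so spreading it back into an array
// keeps the first occurrence of each address and drops later repeats.
function uniqueInOrder(items) {
    return [...new Set(items)];
}

const unique = uniqueInOrder(found);
console.log(unique);
```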
emailsFromUrls
The function extracts email addresses from a list of URLs. It looks for all mailto: URLs and returns the valid email addresses from them. Note that the function preserves the order of emails and keeps duplicates.
Parameters
urls: string[]
Array of URLs.
Returns string[]
Array of email addresses found. If no emails are found, the function returns an empty array.
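The general idea can be sketched in plain JavaScript. Everything below is an assumption for illustration: `emailsFromMailtoUrls` is a hypothetical helper with a deliberately loose validity check, and the library's own parsing and validation rules may differ.

```javascript
// Very loose sanity check, for illustration only.
const LOOSE_EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function emailsFromMailtoUrls(urls) {
    const emails = [];
    for (const url of urls) {
        // Only mailto: links are of interest; the scheme is case-insensitive.
        if (!/^mailto:/i.test(url)) continue;
        // Drop the scheme and any "?subject=..." query string.
        const address = url.slice('mailto:'.length).split('?')[0];
        if (LOOSE_EMAIL.test(address)) emails.push(address);
    }
    return emails; // order preserved, duplicates kept
}

const emails = emailsFromMailtoUrls([
    'mailto:info@example.com',
    'https://example.com/contact',
    'MAILTO:sales@example.com?subject=Hello',
    'mailto:info@example.com',
    'mailto:not-an-email',
]);
console.log(emails);
```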
parseHandlesFromHtml
The function attempts to extract emails, phone numbers and social profile URLs from an HTML document, specifically LinkedIn, Twitter, Instagram and Facebook profile URLs. The function removes duplicates from the resulting arrays and sorts the items alphabetically.
Note that the phones field contains phone numbers extracted from the special phone links such as [call us](tel:+1234556789) (see phonesFromUrls) and potentially other sources with high certainty, while phonesUncertain contains phone numbers extracted from the plain text, which might be very inaccurate.
Example usage:
import { launchPuppeteer, social } from 'crawlee';
const browser = await launchPuppeteer();
const page = await browser.newPage();
await page.goto('http://www.example.com');
const html = await page.content();
const result = social.parseHandlesFromHtml(html);
console.log('Social handles:');
console.dir(result);
Parameters
html: string
HTML text
optional data: null | Record<string, unknown> = null
Optional object which will receive the text and $ properties that contain the text content of the HTML and the cheerio object, respectively. This is an optimization so that the caller doesn't need to parse the HTML document again, if needed.
Returns SocialHandles
An object with the social handles.
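Since phones and phonesUncertain differ in reliability, a caller may want to prefer the former and fall back to the latter only when nothing better is available. A rough sketch of that choice: the `result` object is a hand-written, hypothetical value shaped like SocialHandles (only two fields shown), and `bestPhones` is an illustrative helper, not a library function.

```javascript
// Hypothetical result; a real one would come from parseHandlesFromHtml.
const result = {
    phones: [],
    phonesUncertain: ['+1 234 556 789'],
};

// Prefer numbers taken from tel: links; use plain-text guesses only as a fallback.
function bestPhones(handles) {
    return handles.phones.length > 0 ? handles.phones : handles.phonesUncertain;
}

const fallback = bestPhones(result);
const preferred = bestPhones({ phones: ['+1234556789'], phonesUncertain: ['2019'] });

console.log(fallback, preferred);
```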
phonesFromText
The function attempts to extract phone numbers from a text. Please note that the results might not be accurate, since phone numbers appear in a large variety of formats and conventions. If you encounter some problems, please file an issue.
Parameters
text: string
Text to search the phone numbers in.
Returns string[]
Array of phone numbers found. If no phone numbers are found, the function returns an empty array.
phonesFromUrls
Finds phone number links in an array of URLs and extracts the phone numbers from them. Note that the phone number links look like tel://123456789, tel:/123456789 or tel:123456789.
Parameters
urls: string[]
Array of URLs.
Returns string[]
Array of phone numbers found. If no phone numbers are found, the function returns an empty array.
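The three link shapes differ only in the number of slashes after the scheme, which a single bounded quantifier can absorb. A rough sketch under that assumption; `phoneFromTelUrl` is a hypothetical helper and, unlike what a real implementation would likely need, it does not validate the number itself.

```javascript
// Matches tel:, tel:/ and tel:// followed by the rest of the link.
const TEL_LINK = /^tel:\/{0,2}(.+)$/i;

function phoneFromTelUrl(url) {
    const match = TEL_LINK.exec(url.trim());
    // The capture group holds whatever follows the scheme and slashes.
    return match ? match[1] : null;
}

const phones = ['tel://123456789', 'tel:/123456789', 'tel:123456789', 'https://example.com']
    .map(phoneFromTelUrl)
    .filter((phone) => phone !== null);

console.log(phones);
```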
SocialHandles is a representation of the social handles parsed from an HTML page.