Reverse engineering GraphQL persistedQuery extension
GraphQL is a query language for getting deeply nested structured data from a website's backend, similar to MongoDB queries.
The request is usually a POST to a general /graphql endpoint, with a JSON body that carries the query text and its variables.
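As a rough illustration, a regular (non-persisted) request body could be built as in the sketch below. The query, its field names, and the `build_graphql_body` helper are hypothetical and only serve as an example; the `query`, `variables`, and `operationName` keys follow the usual GraphQL-over-HTTP convention.

```python
import json
from typing import Any, Dict, Optional


def build_graphql_body(
    query: str,
    variables: Dict[str, Any],
    operation_name: Optional[str] = None,
) -> str:
    """Serialize a plain GraphQL request body (hypothetical helper)."""
    payload: Dict[str, Any] = {"query": query, "variables": variables}
    if operation_name is not None:
        payload["operationName"] = operation_name
    return json.dumps(payload)


# Hypothetical query: the schema (post, title, author) is made up for illustration.
QUERY = """
query GetPost($id: ID!) {
  post(id: $id) {
    title
    author { name }
  }
}
"""

body = build_graphql_body(QUERY, {"id": "42"}, "GetPost")
print(body)
```

The resulting string is what would be sent as the POST body, with a `Content-Type: application/json` header.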
When scraping websites that use GraphQL, it’s common to inspect the network requests in developer tools to find the exact queries being used. However, on some websites the query text isn’t visible in the request at all; instead, you only see a cryptic hash value. This can be confusing, and it makes it harder to understand how data is being requested from the server.
This is because some websites use a feature called "persisted queries". It's a performance optimization that reduces the amount of data sent with each request by replacing the full query text with a precomputed hash. While this improves website speed and efficiency, it introduces challenges for scraping because the query text isn’t readily available.
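To make the difference concrete, here is a minimal sketch of what a persisted-query request body tends to look like. It assumes the widely used convention in which the hash is the SHA-256 hex digest of the query text, sent under `extensions.persistedQuery`; a particular website may use a different shape. The operation name, the query, and both helper functions are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def sha256_of_query(query: str) -> str:
    """Return the SHA-256 hex digest of the exact query text."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


def build_persisted_payload(
    operation_name: str,
    variables: Dict[str, Any],
    query_hash: str,
) -> Dict[str, Any]:
    """Build a request body that carries only the hash, not the query text.

    Assumed shape: extensions.persistedQuery with a version and sha256Hash.
    """
    return {
        "operationName": operation_name,
        "variables": variables,
        "extensions": {
            "persistedQuery": {"version": 1, "sha256Hash": query_hash},
        },
    }


# Hypothetical query text. When scraping, you usually see only the hash in
# the network tab, so this text is exactly the part that is missing.
QUERY = "query GetPost($id: ID!) { post(id: $id) { title } }"

payload = build_persisted_payload("GetPost", {"id": "42"}, sha256_of_query(QUERY))
print(json.dumps(payload, indent=2))
```

Note that the hash depends on the exact characters of the query, including whitespace, so even a one-character difference in the query text produces a completely different hash.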