Web Page Scraping and Testing using GraphQL and Playwright

alpha2phi
2 min read · Apr 4, 2021

Build a web page scraping and testing service using GraphQL and Playwright.

Photo by Frank Albrecht on Unsplash

Overview

In my previous article, I walked through developing serverless APIs that test web pages at different resolutions using Puppeteer. In this article, let’s use Playwright, a similar library for web browser automation.

Playwright is a library available in Node.js, Python, and Java to automate Chromium, Firefox, and WebKit with a single API. It is built to enable cross-browser web automation that is ever-green, capable, reliable, and fast.

Setup

Install Playwright

Let’s install Playwright and browser binaries for Chromium, Firefox, and WebKit. Playwright requires Python 3.7+.

$ pip install playwright
$ playwright install
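
To check that the installation works, a short smoke test along these lines can load one page in all three browsers through the same API. This is only a sketch: the helper name page_titles, the BROWSER_NAMES tuple, and the example.com URL are my own placeholders, not part of Playwright.

```python
BROWSER_NAMES = ("chromium", "firefox", "webkit")


def page_titles(playwright, url):
    """Load `url` in each browser engine and return {engine name: page title}."""
    titles = {}
    for name in BROWSER_NAMES:
        browser_type = getattr(playwright, name)  # e.g. playwright.chromium
        browser = browser_type.launch()  # headless by default
        try:
            page = browser.new_page()
            page.goto(url)
            titles[name] = page.title()
        finally:
            browser.close()
    return titles


def main():
    # Imported here so page_titles works with any object that mimics
    # the Playwright API (handy for tests without real browsers).
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        print(page_titles(playwright, "https://example.com"))
```

Calling main() should print one title per browser engine if all three binaries were installed correctly.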

Install Python Libraries

Let’s install the required Python libraries. The requirements.txt is shown below. I am going to use FastAPI and graphene to develop the GraphQL APIs.

Pillow
fastapi
playwright
graphene>=2.0
uvicorn

Run pip install -r requirements.txt to install the libraries.

Application

GraphQL API to Capture a Web Page with a Specific Viewport

Below is the FastAPI source code which

  • exposes a GraphQL query endpoint that accepts URL, width, and height parameters.
  • uses Playwright to capture the web page with the preferred width and height.
  • returns a base64-encoded PNG image string.

