Overview
In my previous article, I walked you through how to use Python + requests + lxml
to scrape stock data. In this article, let's explore using Robotic Process Automation (RPA) in a Jupyter Notebook environment to perform web scraping. Personally, I find Jupyter Notebook + RPA
a great combination, as the interactive nature of Jupyter Notebook allows for quick iterations and trial and error when developing robots. Another nice thing is that all of these tools are open source.
I am going to use xeus-robot,
which is a Jupyter kernel for Robot Framework based on xeus, the native implementation of the Jupyter kernel protocol.
Setup
xeus-robot
I assume you already have JupyterLab 3.0 or above installed. To install xeus-robot and its dependencies, just follow the instructions and run the following command:
$ conda install -c conda-forge xeus-robot
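Once the installation finishes, you can verify that the new kernel has been registered; a Robot Framework kernel should show up alongside your existing Python kernel:
$ jupyter kernelspec list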
xeus-robot depends on Robot Framework, which is a generic open-source automation framework for acceptance testing, acceptance test-driven development (ATDD), and robotic process automation (RPA).
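To give a feel for the syntax, here is a minimal, hypothetical task you could run in an xeus-robot notebook cell, using only Robot Framework's built-in Log keyword (the task name and message are arbitrary):
*** Tasks ***
Say Hello
    # Log is a BuiltIn keyword; console=True also echoes the message to the output
    Log    Hello from Robot Framework    console=True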
SeleniumLibrary
Since I am going to perform web scraping, I need to install SeleniumLibrary, a web testing and automation library for Robot Framework.
$ pip install --upgrade robotframework-seleniumlibrary
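As a rough sketch of how SeleniumLibrary is used (a browser driver is still required, which is covered in the next section), a scraping task in a notebook cell might look like the following. The URL and locator are placeholders of my own, not part of any particular site:
*** Settings ***
Library    SeleniumLibrary

*** Tasks ***
Scrape Page Heading
    # Launch a headless Firefox session (requires geckodriver on PATH)
    Open Browser    https://example.com    headlessfirefox
    # Grab the text of the first <h1> element and log it
    ${heading}=    Get Text    tag:h1
    Log    ${heading}    console=True
    [Teardown]    Close Browser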
Browser Drivers
I also need to install a web driver for the browser I want to automate. I can use webdrivermanager
to install the browser drivers. In this case, I installed the drivers for both Firefox and Chrome.
$ pip install webdrivermanager
$ webdrivermanager firefox chrome --linkpath /usr/local/bin
Note that I installed the drivers to /usr/local/bin. You can certainly install them to another location, but make sure that location is in your PATH environment variable.
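As a quick sanity check (assuming a Unix-like shell), you can confirm that both drivers are resolvable from your PATH:
$ which geckodriver chromedriver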