In my previous article, I walked you through using Python + requests + lxml to scrape stock data. In this article, let's explore using Robotic Process Automation (RPA) in a Jupyter Notebook environment to perform web scraping. Personally, I find
Jupyter Notebook + RPA a great combination: the interactive nature of Jupyter Notebook allows for quick iteration and trial and error when developing robots. Another plus is that all of these tools are open source.
I assume you already have JupyterLab 3.0 or above installed. To install xeus-robot and its dependencies, run the following command:
$ conda install -c conda-forge xeus-robot
xeus-robot depends on Robot Framework which is a generic open-source automation framework for acceptance testing, acceptance test-driven development (ATDD), and robotic process automation (RPA).
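To give a feel for Robot Framework's keyword-driven syntax, here is a minimal sketch (the task name and message are my own, purely for illustration); in xeus-robot, each notebook cell accepts this same syntax, so you can run tasks interactively:

*** Tasks ***
Say Hello
    Log To Console    Hello from Robot Framework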
Since I am going to perform web scraping, I need to install Robot Framework's SeleniumLibrary.
$ pip install --upgrade robotframework-seleniumlibrary
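Once installed, SeleniumLibrary keywords read like the sketch below. Note the URL and the headless-Firefox browser alias are my own placeholder choices, not from any particular scraping target:

*** Settings ***
Library    SeleniumLibrary

*** Tasks ***
Scrape Page Title
    Open Browser    https://example.com    headlessfirefox
    ${title}=    Get Title
    Log To Console    ${title}
    [Teardown]    Close Browser

Importing the library in the *** Settings *** section makes its keywords (Open Browser, Get Title, Close Browser) available to every task in the cell.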
I also need to install a web driver for each browser I want to automate. I can use
webdrivermanager to install the browser drivers. In this case, I installed drivers for both Firefox and Chrome.
$ pip install webdrivermanager
$ webdrivermanager firefox chrome --linkpath /usr/local/bin
Note that I installed the drivers to
/usr/local/bin. You can certainly install them to another location, but make sure that location is in your PATH environment variable.
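A quick sanity check you can run in a terminal: confirm the install directory is on PATH and that the driver binaries resolve (the binary names geckodriver and chromedriver assume webdrivermanager's defaults for Firefox and Chrome):

```shell
# Is /usr/local/bin (where the drivers were installed) on PATH?
case ":$PATH:" in
  *:/usr/local/bin:*) echo "/usr/local/bin is on PATH" ;;
  *)                  echo "/usr/local/bin is NOT on PATH" ;;
esac

# Do the driver binaries resolve from PATH?
for drv in geckodriver chromedriver; do
  command -v "$drv" || echo "$drv not found on PATH"
done
```

If a driver reports "not found", SeleniumLibrary's Open Browser keyword will fail with a similar complaint, so this is worth checking before running a robot.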