

Selenium

Selenium is a browser automation toolkit, used here for web scraping.

Debian

Install Selenium
apt update
apt full-upgrade
apt install python3-selenium

Windows 10

Install Python
Install Selenium
python -m pip install --upgrade pip
python -m pip install selenium 
Install tools

Here are a few tools that are useful for tidying HTML content for evaluation and for storing scraped data in a Wikibase.

python -m pip install lxml
python -m pip install beautifulsoup4
python -m pip install "WikibaseIntegrator>=0.12"
python -m pip install python-dotenv
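
As a minimal sketch of the tidying step, assuming Beautiful Soup is installed: the snippet below parses a messy fragment, drops script and style elements, and prints the remaining text and a re-indented copy of the markup. The helper names pick_parser and tidy_html and the sample markup are illustrative only. It falls back to Python's built-in html.parser if lxml is not available.

```python
from bs4 import BeautifulSoup


def pick_parser():
    """Prefer lxml when it is installed, otherwise use the built-in parser."""
    try:
        import lxml  # noqa: F401  (imported only to check availability)
        return "lxml"
    except ImportError:
        return "html.parser"


def tidy_html(raw_html):
    """Parse raw_html and remove elements that carry no readable content."""
    soup = BeautifulSoup(raw_html, pick_parser())
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup


# Hypothetical sample: note the unclosed <b> element
raw = "<div><script>var x = 1;</script><p>Hello <b>world</p></div>"
soup = tidy_html(raw)

# Plain text with single spaces between text nodes
print(soup.get_text(" ", strip=True))

# Re-indented markup with the missing closing tag repaired
print(soup.prettify())
```

The same function can be given driver.page_source from a Selenium session in place of the sample string.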
WSL1 alias

Chromium can be installed within WSL1 but cannot run there. Creating an alias to the Windows Python executable allows scripts to be started from WSL1 while driving the Windows browser.

TCSH

alias py "/mnt/c/Users/username/AppData/Local/Programs/Python/Python313/python.exe"

bash

alias py="/mnt/c/Users/username/AppData/Local/Programs/Python/Python313/python.exe"
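
One way to confirm that the alias starts the Windows interpreter rather than the WSL one is to print what Python reports about itself. This is a small sketch using only the standard library; the helper name describe_interpreter and the file name whichpy.py are hypothetical.

```python
import platform
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter running this script."""
    return {
        "executable": sys.executable,
        "platform": sys.platform,
        "system": platform.system(),
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")

# sys.platform is "win32" for Windows Python and "linux" for Python inside WSL
if info["platform"] == "win32":
    print("Running under Windows Python.")
else:
    print("Not running under Windows Python.")
```

Saved as whichpy.py, it can be run with py whichpy.py, and the executable line should show the path used in the alias.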

Test

This test opens Chrome (chrome.exe on Windows) or Chromium, visits a page and then quits. The commented lines switch the test to Firefox.

#! /usr/bin/env python3

from selenium import webdriver
# CHROME
from selenium.webdriver.chrome.options import Options
# FIREFOX
#from selenium.webdriver.firefox.options import Options

options = Options()

# CHROME
options.add_argument("--incognito")
driver = webdriver.Chrome(options=options)
# FIREFOX
#options.add_argument("-private")
#driver = webdriver.Firefox(options=options)

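# Wait up to 60 seconds when locating elements before giving up
driver.implicitly_wait(60)
driver.get("https://www.kewl.org/")

# Close the browser and end the WebDriver session
driver.quit()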

Incognito mode was found to be necessary on a Linux host. Without it, Chromium waited about 30 seconds before opening the URL.
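
To check whether the delay noted above occurs on a given host, the page load can be timed with and without the flag. The following is only a sketch: the helper names timed_get and compare_incognito are hypothetical, and it assumes Selenium and a Chrome or Chromium browser are installed as in the test script. The browser is only launched when compare_incognito() is called.

```python
import time


def timed_get(driver, url):
    """Load url with the given driver and return the elapsed seconds."""
    start = time.perf_counter()
    driver.get(url)
    return time.perf_counter() - start


def compare_incognito(url="https://www.kewl.org/"):
    """Time one page load with --incognito and one without it."""
    # Imported here so that timed_get can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    for incognito in (True, False):
        options = Options()
        if incognito:
            options.add_argument("--incognito")
        driver = webdriver.Chrome(options=options)
        try:
            elapsed = timed_get(driver, url)
            print(f"incognito={incognito}: {elapsed:.1f} s")
        finally:
            # Always close the browser, even if the page load fails
            driver.quit()


# Uncomment to launch the browser twice and print both timings
# compare_incognito()
```

A single run on each setting is a rough indication only, since network conditions and browser start-up time vary between runs.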

Resources
