Selenium
Web scraping toolkit.
Debian
Install Selenium
apt update
apt full-upgrade
apt install python3-selenium
Windows 10
Install Python
Install Selenium
python -m pip install --upgrade pip
python -m pip install selenium
Install tools
Here are a few tools that are useful for tidying HTML content for evaluation and for storing scraped data in a Wikibase.
python -m pip install lxml
python -m pip install beautifulsoup4
python -m pip install "WikibaseIntegrator>=0.12"
python -m pip install dotenv
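As a rough illustration of the tidying step, the sketch below uses BeautifulSoup to clean up messy markup before it is evaluated. The function names tidy_html and visible_text are hypothetical and not part of any library. Discarding script and style elements is an assumption about what counts as noise, and the fallback to the standard-library parser is only there in case lxml is not importable.

```python
from bs4 import BeautifulSoup

# Prefer the lxml parser installed above; fall back to the standard-library
# parser if lxml cannot be imported (an assumption made for portability).
try:
    import lxml  # noqa: F401  (imported only to check availability)
    PARSER = "lxml"
except ImportError:
    PARSER = "html.parser"


def _clean_soup(raw_html: str) -> BeautifulSoup:
    """Parse HTML and remove script and style elements."""
    soup = BeautifulSoup(raw_html, PARSER)
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup


def tidy_html(raw_html: str) -> str:
    """Return an indented rendering of possibly malformed HTML."""
    return _clean_soup(raw_html).prettify()


def visible_text(raw_html: str) -> str:
    """Return only the text content, with whitespace normalised."""
    return _clean_soup(raw_html).get_text(" ", strip=True)
```

For example, visible_text(driver.page_source) could be used to inspect what a scraped page actually says, while tidy_html(driver.page_source) gives an indented view of its structure.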
WSL1 alias
Chromium can be installed within WSL1 but cannot run there. Creating an alias to the Windows Python executable allows scripts to be run from WSL1 using the Windows browser.
TCSH
alias py "/mnt/c/Users/username/AppData/Local/Programs/Python/Python313/python.exe"
bash
alias py="/mnt/c/Users/username/AppData/Local/Programs/Python/Python313/python.exe"
Test
This test will open chrome.exe or chromium and visit a page.
#! /usr/bin/env python3

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
#from selenium.webdriver.firefox.options import Options

options = Options()

options.add_argument("--incognito")
driver = webdriver.Chrome(options=options)

#options.add_argument("-private")
#driver = webdriver.Firefox(options=options)

driver.implicitly_wait(60)
driver.get("https://www.kewl.org/")
driver.quit()
Incognito mode was found to be required on a Linux host. Without it, Chromium waited about 30 seconds before opening the URL.
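The test script switches between Chrome and Firefox by commenting lines in and out. One possible way to avoid that is sketched below, with the private-browsing flag chosen by browser name. The names PRIVATE_FLAGS, private_flag and fetch_title are hypothetical helpers, not part of Selenium. The flags are the same two used in the test script, and Selenium is imported inside fetch_title only so that the flag lookup can be used on its own.

```python
# Private-browsing flag for each browser, matching the test script.
PRIVATE_FLAGS = {"chrome": "--incognito", "firefox": "-private"}


def private_flag(browser: str) -> str:
    """Return the private-browsing argument for a supported browser name."""
    try:
        return PRIVATE_FLAGS[browser.lower()]
    except KeyError:
        raise ValueError(f"unsupported browser: {browser!r}") from None


def fetch_title(url: str, browser: str = "chrome", wait_seconds: int = 60) -> str:
    """Open url in a private window and return the page title."""
    from selenium import webdriver  # imported lazily, see the note above

    flag = private_flag(browser)
    if browser.lower() == "chrome":
        options = webdriver.ChromeOptions()
        options.add_argument(flag)
        driver = webdriver.Chrome(options=options)
    else:
        options = webdriver.FirefoxOptions()
        options.add_argument(flag)
        driver = webdriver.Firefox(options=options)

    try:
        driver.implicitly_wait(wait_seconds)
        driver.get(url)
        return driver.title
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

Calling fetch_title("https://www.kewl.org/") would repeat the test with Chrome, and passing browser="firefox" would run it with Firefox instead. The try/finally block ensures the browser window is closed if the page load raises an exception.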