Web scraping toolkit.
apt update
apt full-upgrade
apt install python3-selenium
python -m pip install --upgrade pip
python -m pip install selenium
Here are a few tools which can be useful for tidying HTML content for evaluation and for storing scraped data in a Wikibase.
python -m pip install lxml
python -m pip install beautifulsoup4
python -m pip install "WikibaseIntegrator>=0.12"
python -m pip install python-dotenv
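The tidying libraries above can be combined to reduce a page to readable text before deciding what to scrape. A minimal sketch, assuming BeautifulSoup with the built-in html.parser backend; the sample HTML is invented for illustration:

```python
from bs4 import BeautifulSoup

# Invented sample page; in practice this would be fetched HTML.
html = "<html><head><script>x=1</script></head><body><p>Hello <b>world</b></p></body></html>"

soup = BeautifulSoup(html, "html.parser")

# Remove non-content tags before extracting text.
for tag in soup(["script", "style"]):
    tag.decompose()

# Collapse the remaining markup to plain text.
text = soup.get_text(separator=" ", strip=True)
print(text)  # Hello world
```

Swapping "html.parser" for "lxml" uses the faster lxml backend installed above without changing the rest of the code.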
Chromium may be installed within WSL1 but cannot run there, so creating an alias to the Windows Python executable allows scripts launched from WSL1 to drive the Windows browser.
TCSH
alias py "/mnt/c/Users/username/AppData/Local/Programs/Python/Python313/python.exe"
bash
alias py="/mnt/c/Users/username/AppData/Local/Programs/Python/Python313/python.exe"
This test will open chrome.exe or chromium and visit a page.
#!/usr/bin/env python3
from selenium import webdriver

# CHROME
from selenium.webdriver.chrome.options import Options
# FIREFOX
#from selenium.webdriver.firefox.options import Options

options = Options()

# CHROME
options.add_argument("--incognito")
driver = webdriver.Chrome(options=options)
# FIREFOX
#options.add_argument("-private")
#driver = webdriver.Firefox(options=options)

driver.implicitly_wait(60)
driver.get("https://www.google.com/")
driver.quit()
Chrome incognito mode was found to be a requirement on a Linux host; otherwise Chromium would wait about 30 seconds before opening the URL.
sget is a simple tool to fetch a URL, strip various tags, and save the content to a file.
This can be used to inspect a web page prior to precise scraping.
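sget itself is not listed here; the idea can be sketched with only the standard library (urllib to fetch, html.parser to strip tags). The TagStripper class and strip_tags helper below are illustrative names for this sketch, not part of sget:

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect text content while skipping script/style blocks."""

    def __init__(self):
        super().__init__()
        self.skip = 0       # depth inside script/style tags
        self.parts = []     # accumulated text fragments

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.parts.append(data.strip())


def strip_tags(html):
    """Return the visible text of an HTML document, one fragment per line."""
    parser = TagStripper()
    parser.feed(html)
    return "\n".join(parser.parts)


if __name__ == "__main__":
    # Fetching and saving would look like this (network access assumed):
    # from urllib.request import urlopen
    # with urlopen("https://example.com/") as resp:
    #     html = resp.read().decode("utf-8", errors="replace")
    # with open("page.txt", "w") as out:
    #     out.write(strip_tags(html))
    print(strip_tags("<body><script>var x;</script><p>Hi</p></body>"))  # Hi
```

The saved text file can then be inspected to decide which elements to target with precise selectors.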