Selenium get
sget is a simple tool that fetches a page using Selenium and exposes Selenium's wait parameters on the command line.
Install
sudo apt install python3-full mercurial
hg clone https://hg.kewl.org/pub/sget
cd sget
Demo
The web page used in this demo first loads temporary placeholder content and, after a delay, reloads with the real content.
The reloaded page initially contains only a subset of the full content; the rest is revealed by scrolling.
Fetch the temporary page
py sget.py "https://www.deckenmalerei.eu/e8a3bf28-6365-42fe-a4d3-41608ed870e8" d0.html
Fetch the reloaded page
py sget.py "https://www.deckenmalerei.eu/e8a3bf28-6365-42fe-a4d3-41608ed870e8" -d10 d10.html
Compare temporary and reloaded
diff d0.html d10.html | head -30
9c9
< CbDD - lade e8a3bf28-6365-42fe-a4d3-41608ed870e8
---
> CbDD - Abtsgmünd, Gartenpavillon Schloss Hohenstadt [Text]
50,54c50,1386
< <div class="fullScreenCentered loadingIndicator">
< <div class="spinningCircleBasic spinningCircleTop spinningCircleRight spinningCircleBottom">
< </div>
< <div class="fullScreenText">
< lade Daten ...
---
> <div class="dataPage">
> <div class="document">
> <div style="margin-top: 0.5rem;">
> <div class="text-right">
> <div class="qrView">
> <div>
> <a href="/e8a3bf28-6365-42fe-a4d3-41608ed870e8">
> <span>
> QR Code
> </span>
> <span class="icon-align-big">
> <i class="material-icons md-36">
> arrow_drop_down
> </i>
> </span>
> </a>
> </div>
> </div>
> <div class="mapView">
Fetch the page again, but this time, instead of a fixed delay, wait for the reloaded content by matching a tag attribute that only appears after the reload
py sget.py "https://www.deckenmalerei.eu/e8a3bf28-6365-42fe-a4d3-41608ed870e8" -b XPATH -v "//div[@class='dataPage']" dataPage.html
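The idea behind waiting on an element can be sketched in a few lines of Python. This is not taken from sget.py; it is a minimal illustration that assumes a Selenium 4 style driver object providing get(), find_elements(by, value) and page_source. The function name fetch_when_present and its timeout and poll parameters are hypothetical.

```python
import time


def fetch_when_present(driver, url, by, value, timeout=30.0, poll=0.5):
    """Load url, then poll until an element matching (by, value) exists.

    `driver` is assumed to look like a Selenium 4 WebDriver: it needs
    get(url), find_elements(by, value) and a page_source attribute.
    Returns the page source once the element is found and raises
    TimeoutError if it never appears within `timeout` seconds.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list when nothing matches yet.
        if driver.find_elements(by, value):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no element matched {by}={value!r} within {timeout}s"
            )
        time.sleep(poll)
```

With a real browser this would be called along the lines of fetch_when_present(webdriver.Firefox(), url, By.XPATH, "//div[@class='dataPage']"). Selenium also ships its own WebDriverWait together with expected_conditions.presence_of_element_located, which covers the same pattern.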
Scrolling the page down to reveal more images is not a function of sget.
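If scrolling is needed, it has to be done outside sget. One common approach, sketched here under the assumption of a Selenium style driver with execute_script(), is to scroll to the bottom repeatedly until the document height stops growing. The helper name scroll_to_bottom and its pause and max_rounds parameters are hypothetical.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until the document height stops growing.

    `driver` is assumed to provide execute_script(js) like a Selenium
    WebDriver. Returns the number of scroll steps performed.
    """
    height_js = "return document.body.scrollHeight"
    last_height = driver.execute_script(height_js)
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script(height_js)
        if new_height == last_height:
            return rounds  # nothing new was loaded by this scroll
        last_height = new_height
    return max_rounds
```

The pause value is a guess; pages that load content slowly may need a longer one.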