Extracting and monitoring web content with PowerShell
This kind of request comes up all the time on StackOverflow and /r/PowerShell.
"How can I extract content from a webpage using PowerShell?"
And it's an interesting problem to solve. However, nothing motivates like greed, and I recently revisited this topic in order to help me track down the newest must-have item, the Switch.
In fact, this post COULD have been called "Finding a Nintendo Switch with PowerShell"!
I have been REALLY wanting a Nintendo Switch, and since I'll be flying up to NYC next month for Tome's NYC TechStravaganza (come see me if you'll be in Manhattan that day!), it's the perfect justification for She-Who-Holds-The-Wallet for me to get one!
But EVERYWHERE is sold out. Still! :(
However, the stores have been receiving inventory every now and then, and I know that when GameStop has it in stock, I want to buy it from them! With that in mind, I knew I just needed a way to monitor the page and alert me when some text on it changes.
Web scraping, here we go!
Is this even legal?
Caveat: Scraping a site isn't necessarily illegal, but it might violate the terms of service of some sites out there. Furthermore, if you scrape too often, you might be blocked from the site temporarily or forever. Don't get greedy in scraping, or try to use it commercially.
If a site provides an API, go that route instead, as APIs are sanctioned and provided by the company to use, and typically need only a small fraction of the resources of loading a full page.
Finally, some Content Management Systems will never update an existing page, but create a new one with a new URL and update all links accordingly. If you're not careful, you could end up querying a page that will never change.
GameStop Nintendo Switch with Neon Joycons
First things first, let's load this page in PowerShell and store it in a variable. We'll be using Invoke-WebRequest to handle this task.
$url = 'http://www.gamestop.com/nintendo-switch/consoles/nintendo-switch-console-with-neon-blue-and-neon-red-joy-con/141887'
$response = Invoke-WebRequest -Uri $url
Next, I want to find a particular element on the page, which I'll parse to see if it looks like they have some in stock. For that, I need to locate the ID or ClassName of the particular element, which we'll do using Chrome Developer Tools.
On the page, right-click an element of your choosing and pick "Inspect Element". In my case, I will right-click on the "Unavailable" text area.
This will launch Chrome Developer Tools with the element already selected for you, so you can just copy the class name. You can see me moving the mouse around; I do this to see which element is the most likely one to contain the value.
You want the class name, in this case ats-prodBuy-inventory. We can use PowerShell's wonderful HTML parsing to do some heavy lifting here, by leveraging the HtmlWebResponseObject's useful ParsedHtml.getElementsByClassName method.
So, to select only the element in the body with the class name of ats-prodBuy-inventory, I'll run:
$response.ParsedHtml.body.getElementsByClassName('ats-prodBuy-inventory')
This will list ALL the properties of this element, including lots of HTML info and properties that we don't need.
To truncate things a bit, I'll select only properties which have text or content somewhere in the property name.
$response.ParsedHtml.body.getElementsByClassName('ats-prodBuy-inventory') | select *text*,*content*
The output:
innerText         : Currently unavailable online
outerText         : Currently unavailable online
parentTextEdit    : System.__ComObject
isTextEdit        : False
oncontextmenu     :
contentEditable   : inherit
isContentEditable : False
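If you'd like to see how that wildcard filtering behaves without hitting the site, here is a minimal sketch. The object below is a made-up stand-in for the real COM element (the property names are borrowed from the output above, the values and the extra properties are assumptions), so it only illustrates how Select-Object matches property names against wildcard patterns.

```powershell
# Stand-in for the element returned by getElementsByClassName.
# This is NOT the real COM object; it only mimics a handful of its properties.
$element = [pscustomobject]@{
    tagName           = 'DIV'
    className         = 'ats-prodBuy-inventory'
    innerText         = 'Currently unavailable online'
    outerText         = 'Currently unavailable online'
    isContentEditable = $false
    offsetWidth       = 120
}

# Wildcard patterns are matched case-insensitively against property names,
# so '*text*' keeps innerText/outerText and '*content*' keeps isContentEditable.
$filtered = $element | Select-Object -Property '*text*', '*content*'

# Show which property names survived the filter
$filtered.PSObject.Properties.Name
```

Everything that doesn't match one of the patterns (tagName, className, offsetWidth here) is dropped, which is why the real output shrinks down to a handful of lines.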
Much easier to read. So, now I know that the innerText or outerText properties will let me know if the product is in stock or not. To validate, I took a look at another product which was in stock, and saw that it used the same properties.
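One thing to keep in mind: ParsedHtml leans on the Internet Explorer engine, so it is only there in Windows PowerShell. In PowerShell 6 and later, Invoke-WebRequest doesn't expose it. As a rough fallback, here is a sketch that pulls the text out of the raw HTML with a regular expression. The function name Get-TextByClassName and the sample markup are my own inventions for illustration, and regex is a blunt tool for HTML (it will trip over an element nested inside another of the same tag), so treat it as a starting point rather than a parser.

```powershell
function Get-TextByClassName {
    param(
        [Parameter(Mandatory)][string]$Html,
        [Parameter(Mandatory)][string]$ClassName
    )
    # Find an opening tag whose class attribute contains $ClassName as a whole
    # token, then lazily capture everything up to the matching closing tag name.
    $pattern = '(?is)<(?<tag>[a-z][a-z0-9]*)\b[^>]*\bclass\s*=\s*["''][^"''>]*(?<![\w-])' +
               [regex]::Escape($ClassName) +
               '(?![\w-])[^"''>]*["''][^>]*>(?<inner>.*?)</\k<tag>\s*>'

    foreach ($match in [regex]::Matches($Html, $pattern)) {
        # Drop any nested tags, decode entities such as &amp;, and trim whitespace
        $text = $match.Groups['inner'].Value -replace '<[^>]+>', ''
        [System.Net.WebUtility]::HtmlDecode($text).Trim()
    }
}

# Made-up markup that resembles the inventory element on the product page
$sampleHtml = '<div class="buy-box"><span class="ats-prodBuy-inventory"> Currently unavailable online </span></div>'
$stockText  = Get-TextByClassName -Html $sampleHtml -ClassName 'ats-prodBuy-inventory'
$stockText
```

Against a live page you would pass $response.Content as the -Html value instead of the sample string.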
All that remained was to take this few-liner and convert it into a script which will loop once every 30 mins, exiting when the message text on the site changes. When it does, I'm using a tool I wrote a few years ago, Send-PushMessage, to send a PushBullet message to my phone to give me a heads-up!
$url = 'http://www.gamestop.com/nintendo-switch/consoles/nintendo-switch-console-with-neon-blue-and-neon-red-joy-con/141887'
$classname = 'ats-prodBuy-inventory'
$notInStock = 'Currently unavailable online'
# Start out equal to the "not in stock" text so the loop runs at least once
$InStock = $notInStock
While ($InStock -eq $notInStock) {
    $response = Invoke-WebRequest -Uri $url
    # Grab the text of the inventory element
    $InStock = $response.ParsedHtml.body.getElementsByClassName($classname) | select -expand innertext
    "$(get-date) is device in stock? $($InStock -ne $notInStock)`n-----$InStock"
    # Only wait if the text hasn't changed yet
    if ($InStock -eq $notInStock) { Start-Sleep -Seconds (60*30) }
}
Send-PushMessage -Type Message -title "NintendoSwitch" -msg "In stock, order now!!!!"
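If you want to reuse this pattern for other pages, the loop can be pulled into a small function. What follows is only a sketch under a few assumptions of mine: the name Wait-ForTextChange, its parameters, and the idea of passing the page lookup in as a script block are not from any module, they're just one way to arrange it. It also treats a failed or empty lookup as "no change yet", so a network hiccup won't trigger a false alarm. The demo at the bottom uses a fake queue of responses instead of the real site so it can run anywhere.

```powershell
function Wait-ForTextChange {
    param(
        # Script block that returns the current text on the page (hypothetical hook)
        [Parameter(Mandatory)][scriptblock]$GetText,
        # The text that means "nothing has changed yet"
        [Parameter(Mandatory)][string]$Baseline,
        # Time to wait between checks; 1800 seconds is 30 minutes
        [int]$IntervalSeconds = 1800,
        # 0 means keep checking until the text changes
        [int]$MaxAttempts = 0
    )
    $attempt = 0
    while ($true) {
        $attempt++
        $current = $null
        try {
            $current = ([string](& $GetText)).Trim()
        }
        catch {
            # A failed lookup is logged and treated as "no change yet"
            Write-Warning "Check $attempt failed: $_"
        }
        # Only a non-empty value that differs from the baseline counts as a change
        if ($current -and ($current -ne $Baseline)) { return $current }
        if (($MaxAttempts -gt 0) -and ($attempt -ge $MaxAttempts)) { return }
        Start-Sleep -Seconds $IntervalSeconds
    }
}

# Demo with canned responses standing in for the web page.
# A real lookup might be:
#   { (Invoke-WebRequest -Uri $url).ParsedHtml.body.getElementsByClassName($classname) |
#       Select-Object -ExpandProperty innerText }
$feed = [System.Collections.Generic.Queue[string]]::new()
$feed.Enqueue('Currently unavailable online')
$feed.Enqueue('Currently unavailable online')
$feed.Enqueue('Available online')

$result = Wait-ForTextChange -GetText { $feed.Dequeue() } -Baseline 'Currently unavailable online' -IntervalSeconds 0
"Text changed to: $result"
```

Once the function returns a value, that's the moment to fire off Send-PushMessage. The -MaxAttempts switch is handy when you'd rather have the script give up after a while than run forever.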
This is what I've been seeing... but eventually I'll get a Push Message when the site text changes, and then, I'll have my Switch!
Willing to help!
Are you struggling to extract certain text from a site? Don't worry, I'm here to help! Leave me a comment below and I'll do my best to help you. But before you ask, check out this post on Reddit to see how I helped someone else with a similar problem.