YouTube Video Metadata Scraping with PowerShell

Published January 21, 2020 by FoxDeploy

Trigger Warning: I briefly discuss eating disorders and my opinions on pro-eating disorder media in this post. If this content is difficult for you, I recommend scrolling past the Background section and resuming at The Project instead.

Background

I ā¤ YouTube. I have learned so much about development from folks like I am Tim Curry, or from the amazing Microsoft Virtual Academy courses from Jeffrey Snover and Jason Helmick (original link ). Most days I catch the repeats from Stephen Colbert, and then jam out to synthwave or chillhop. In fact, I listened to one particular mix so many times while learning c# that I still get flashbacks when I hear the songs on it againā€¦sleepness nights trying to uncover everything I donā€™t know. I even have my own Intro to PowerShell Video that I think my mom watched 70,000 times.

My kids grew up singing songs from Dave and Eva, Little Baby Bum, Super Simple Songs and now Rachel and the TreeSchoolers. YouTube was one of the first services I signed up for and still pay for today (aside from Netflix, and that one stint where I got CDs through the mail, yeah…).

But a few months ago I heard that YouTube will recommend videos which are pro eating-restriction and bulimia within four videos of the sorts of content targeted at young children. I have a history with people who experience these disorders and want to be sure we face it head on in my family, but that doesn't mean I will allow impressionable minds to be exposed to content which presents this issue in a positive light.

If YouTube is not going to be safe for the type of stuff my children want to watch, I needed to know. Unfortunately the person who told me of this cannot remember their source, nor could I find any decent articles on the topic, but I thought that this smelled like a project in the making.

The Project

I wanted to see which sorts of videos YouTube will recommend as a user continues to watch videos on their site. I started with two sets of videos, one for girls' fashion and the other for weight loss information.

Fashion 1, Fashion 2, Fashion 3

Weight 1, Weight 2, Weight 3

For each video, we would get the video details, its tags, its thumbnail and then also the next five related videos. We'd continue until we hit 250 videos.
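The traversal described above can be sketched as a breadth-first crawl. This is only an illustration, not the code from the repository: `Invoke-VideoCrawl` and its `-GetRelated` script block (a stand-in for whatever call returns related video IDs) are hypothetical names, and the limits of five related videos and 250 total come from the text.

```powershell
# Breadth-first crawl sketch. $GetRelated is a hypothetical script block that
# takes a video ID and returns related video IDs (for example, a search wrapper).
function Invoke-VideoCrawl {
    param(
        [Parameter(Mandatory)] [string[]]    $SeedId,
        [Parameter(Mandatory)] [scriptblock] $GetRelated,
        [int] $RelatedPerVideo = 5,
        [int] $MaxVideos       = 250
    )

    $queue = [System.Collections.Generic.Queue[string]]::new()
    $seen  = [System.Collections.Generic.HashSet[string]]::new()
    foreach ($id in $SeedId) {
        # HashSet.Add returns $false for duplicates, so each ID is queued once
        if ($seen.Add($id)) { $queue.Enqueue($id) }
    }

    $visited = [System.Collections.Generic.List[string]]::new()
    while ($queue.Count -gt 0 -and $visited.Count -lt $MaxVideos) {
        $current = $queue.Dequeue()
        $visited.Add($current)   # details, tags and thumbnail would be fetched here

        $related = @(& $GetRelated $current) | Select-Object -First $RelatedPerVideo
        foreach ($next in $related) {
            if ($seen.Add([string]$next)) { $queue.Enqueue([string]$next) }
        }
    }

    # Emit the IDs in the order they were visited
    $visited
}
```

Passing the lookup in as a script block keeps the crawl logic testable without spending any API quota; a fake lookup such as `{ param($id) 1..5 | ForEach-Object { "$id-$_" } }` is enough to exercise it.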

Getting set up

Setting up a YouTube API account is very simple. You can sign up here. Notice how there is no credit card link? Interestingly, from what I could tell, there is no cost to working with the YouTube API. But that is not to say that it's unlimited. YouTube uses a quota-based program where you have 10,000 units of quota to spend a day on the site. Sounds like a lot, but it is really not when doing research.

Operation                                Cost   Description
v3/videos?part=snippet,contentDetails    5      retrieves info on the video, the creator, and also the tags and the description
v3/Search                                100    retrieves 99 related videos
SaveThumbnail                            0      retrieves the thumbnail of a video given the videoID
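For reference, the first row of the table corresponds to a request along these lines. This is a minimal sketch, assuming the public YouTube Data API v3 endpoint at `www.googleapis.com`; the helper name `Get-VideoDetailUri` is made up for illustration and is not one of the functions from this post.

```powershell
# Hypothetical helper: builds the request URI for the videos operation.
function Get-VideoDetailUri {
    param(
        [Parameter(Mandatory)] [string] $VideoId,
        [Parameter(Mandatory)] [string] $ApiKey
    )
    $base = 'https://www.googleapis.com/youtube/v3/videos'
    $id   = [uri]::EscapeDataString($VideoId)
    $key  = [uri]::EscapeDataString($ApiKey)
    "${base}?part=snippet,contentDetails&id=${id}&key=${key}"
}

# Usage (spends quota, so it is commented out here):
# $response = Invoke-RestMethod -Uri (Get-VideoDetailUri -VideoId $id -ApiKey $key)
# $response.items[0].snippet | Select-Object title, channelTitle, tags
```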

I hit my quota cap within moments and so had to run my data gathering over the course of a few days.
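A rough back-of-the-envelope calculation shows why. It assumes every crawled video costs one videos call plus one search call, using the costs from the table above; the real scripts may have spent quota differently.

```powershell
$dailyQuota   = 10000   # units per day, from the text above
$videoCost    = 5       # v3/videos?part=snippet,contentDetails
$searchCost   = 100     # v3/Search
$targetVideos = 250     # stopping point for the crawl

$costPerVideo = $videoCost + $searchCost                      # 105 units
$videosPerDay = [math]::Floor($dailyQuota / $costPerVideo)    # 95 videos
$totalUnits   = $targetVideos * $costPerVideo                 # 26250 units
$daysNeeded   = [math]::Ceiling($totalUnits / $dailyQuota)    # 3 days

"{0} videos/day, {1} units total, {2} days" -f $videosPerDay, $totalUnits, $daysNeeded
```

Under those assumptions a single 250-video crawl takes three days of quota, and running both seed sets separately would double that.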

As for the thumbnail, I couldn't find a supported method of downloading this using the API, but I did find this post on StackOverflow which got me started.
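The linked post is not reproduced here, so the following is only a guess at the general idea: thumbnails are commonly fetched from the `img.youtube.com/vi/<videoID>/<quality>.jpg` URL pattern, which is not part of the documented API and could change. `Get-ThumbnailUri` is a hypothetical helper name.

```powershell
# Hypothetical helper: builds a thumbnail URL from a video ID.
# The img.youtube.com pattern is an assumption, not a supported API.
function Get-ThumbnailUri {
    param(
        [Parameter(Mandatory)] [string] $VideoId,
        [ValidateSet('default', 'mqdefault', 'hqdefault', 'sddefault', 'maxresdefault')]
        [string] $Quality = 'hqdefault'
    )
    "https://img.youtube.com/vi/$VideoId/$Quality.jpg"
}

# Usage (downloads a file, so it is commented out here):
# Invoke-WebRequest -Uri (Get-ThumbnailUri -VideoId $id) -OutFile "$id.jpg"
```

Since this is a plain image download rather than an API call, it is consistent with the zero quota cost listed for SaveThumbnail.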

The Functions

Once I wrote these functions, I was ready to go:

Connect-PSYouTubeAccount is just another credential storage system using SecureString. Be warned that other administrators on the device where you use this cmdlet could retrieve credentials stored as a SecureString. If you're curious for more info, read up on the DPAPI here, or here, or ask JeffTheScripter, as he is very knowledgeable on the topic. FWIW this approach stores the key in memory as a SecureString, then converts to string data only when needed to make the web call.
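The general pattern of holding a key as a SecureString and converting it only at call time might look like this. It is a sketch of the technique, not the internals of Connect-PSYouTubeAccount; `ConvertTo-PlainText` is a hypothetical helper and the key is a placeholder.

```powershell
# Hold the key as a SecureString while it sits in memory.
# 'not-a-real-key' is a placeholder; interactively, Read-Host -AsSecureString
# avoids ever typing the key into a script.
$secureKey = ConvertTo-SecureString -String 'not-a-real-key' -AsPlainText -Force

# Hypothetical helper: convert back to plain text only when the web call needs it.
function ConvertTo-PlainText {
    param([Parameter(Mandatory)] [securestring] $Secure)
    [System.Net.NetworkCredential]::new('', $Secure).Password
}

$plain = ConvertTo-PlainText -Secure $secureKey
```

To persist the key between sessions on Windows, `ConvertFrom-SecureString` without a `-Key` produces a DPAPI-protected string that only the same user on the same machine can decrypt, which is why the warning about other administrators on the device still applies.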

The Summary

You can access the data I've already created here in this new repository, PSYouTubeScrapes. But just be aware that it is kind of terrible UX looking through 8,000 tags and comments, so I took a dependency on the awesome PSWordCloud PowerShell module which I used to make a wordcloud out of the most common video tags.
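A quick way to see which tags dominate before drawing anything is to count them with built-in cmdlets. The sample tags below are made up for illustration; the real input would be the tags collected from each scraped video.

```powershell
# Sample tags standing in for the scraped data.
$tags = 'beauty', 'fashion', 'Beauty', 'weight loss', 'fashion', 'beauty'

# Group-Object is case-insensitive by default, so 'beauty' and 'Beauty' merge.
$tagCounts = $tags |
    Group-Object |
    Sort-Object -Property Count -Descending |
    Select-Object -Property Name, Count

$tagCounts | Format-Table -AutoSize
```

The same list of tags can then be handed to PSWordCloud; check the module's help for the exact parameters it expects.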

A note on YouTube Comments: they contain the worst of humanity and should never ever be entered by any person. I intentionally decided not to research them or publish the work I did on them, because, wow.

So, here is a word cloud of the two datasets, generated using this script.

https://gist.github.com/1RedOne/8ceb22e662d6b75af2a956bec7407ad6

A word cloud of the most common tags for weight loss videos traversed with this tool, including 'theStyleDiet', 'Commedy', 'Beauty', and 'Anna Saccone', who seems to be a YouTuber popular in this area.

Anna Saccone has a LOT of fashion and weight videos, but seemed pretty positive from what I saw.

The Conclusion

All in all, I felt that the content was pretty agreeable! While the search for children's videos DID surface some stranger children's videos like this one, I have to say that I didn't think any of the videos were overly negative or exploitative, nor did I see any 'Elsagate' style content. That's not to say that YouTube is perfect, but I think it seems safe enough, even if I will probably review my kids' YouTube history and let them use YouTube Kids instead of the full app.

Have a set of recommended videos you'd like me to search like this? Post them in a thread on /r/FoxDeploy or leave a comment with your videos and I'll see what we come up with.

If you conduct your own trial with this code and example and want to share, feel free to submit a pull request to the repo as well (note that we .gitignore all jpeg and png files to keep the repo size down).


