YouTube Video Metadata Scraping with PowerShell
Trigger Warning: I briefly discuss eating disorders and my opinions on pro-eating-disorder media in this post. If this content is difficult for you, I recommend scrolling past The Background and resuming at The Project instead.
Background
I ❤ YouTube. I have learned so much about development from folks like I am Tim Curry, or from the amazing Microsoft Virtual Academy courses from Jeffrey Snover and Jason Helmick (original link). Most days I catch the repeats from Stephen Colbert, and then jam out to synthwave or chillhop. In fact, I listened to one particular mix so many times while learning C# that I still get flashbacks when I hear the songs on it again… sleepless nights trying to uncover everything I don't know. I even have my own Intro to PowerShell Video that I think my mom watched 70,000 times.
My kids grew up singing songs from Dave and Eva, Little Baby Bum, Super Simple Songs and now Rachel and the TreeSchoolers, and YouTube was one of the first services I signed up for and still pay for today (aside from Netflix, and that one stint where I got CDs through the mail, yeah…)
But a few months ago I heard that YouTube will recommend videos which are pro eating-restriction and bulimia within four videos of the sorts of content targeted at young children. I have a history with people who experience these disorders and want to be sure we face it head on in my family, but that doesn't mean I will allow impressionable minds to be exposed to content which presents this issue in a positive light.
If YouTube is not going to be safe for the type of stuff my children want to watch, I needed to know. Unfortunately the person who told me of this cannot remember their source, nor could I find any decent articles on the topic, but I thought that this smelled like a project in the making.
The Project
I wanted to see which sorts of videos YouTube will recommend as a user continues to watch videos on their site. I started with two sets of videos, one for girls' fashion and the other for weight loss information.
Fashion 1, Fashion 2, Fashion 3
For each video, we would get the video details, its tags, its thumbnail and then also the next five related videos. We'd continue until we hit 250 videos.
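The loop described above can be sketched as a breadth-first walk. This is only one plausible reading of the process, not the code from the project: `Get-RelatedVideoCrawl` and its parameters are hypothetical names, and the related-video lookup is passed in as a script block so the sketch runs without touching the API.

```powershell
# Hypothetical helper (not from the original project): breadth-first walk
# over related videos. $GetRelated is a script block that takes a video ID
# and returns related video IDs; in real use it would wrap an API call.
function Get-RelatedVideoCrawl {
    param(
        [Parameter(Mandatory)] [string[]] $SeedVideoId,
        [Parameter(Mandatory)] [scriptblock] $GetRelated,
        [int] $RelatedPerVideo = 5,
        [int] $MaxVideos = 250
    )

    $seen    = [System.Collections.Generic.HashSet[string]]::new()
    $queue   = [System.Collections.Generic.Queue[string]]::new()
    $visited = [System.Collections.Generic.List[string]]::new()

    # Queue the starting videos, skipping duplicates.
    foreach ($id in $SeedVideoId) {
        if ($seen.Add($id)) { $queue.Enqueue($id) }
    }

    while ($queue.Count -gt 0 -and $visited.Count -lt $MaxVideos) {
        $current = $queue.Dequeue()
        $visited.Add($current)

        # Details, tags, and the thumbnail for $current would be gathered here.

        # Keep only the first few related videos and queue the unseen ones.
        $related = @(& $GetRelated $current) | Select-Object -First $RelatedPerVideo
        foreach ($id in $related) {
            if ($seen.Add($id)) { $queue.Enqueue($id) }
        }
    }

    $visited
}

# Usage with a fake lookup that invents ten related IDs per video.
$fakeLookup = { param($id) 1..10 | ForEach-Object { "$id-$_" } }
$crawled = Get-RelatedVideoCrawl -SeedVideoId 'a', 'b' -GetRelated $fakeLookup -MaxVideos 20
```

Tracking the seen IDs in a set keeps the walk from revisiting a video that several others recommend, which matters when every lookup costs quota.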
Getting set up
Setting up a YouTube API account is very simple. You can sign up here. Notice how there is no credit card link? Interestingly, from what I could tell, there is no cost to working with the YouTube API. But that is not to say that it's unlimited. YouTube uses a quota-based program where you have 10,000 units of quota to spend a day on the site. Sounds like a lot, but it is really not when doing research.
Operation | Cost | Description |
---|---|---|
v3/videos?part=snippet,contentDetails | 5 | retrieves info on the video, the creator, and also the tags and the description |
v3/Search | 100 | retrieves 99 related videos |
SaveThumbnail | 0 | retrieves the thumbnail of a video given the videoID |
I hit my quota cap within moments and so had to run my data gathering over the course of a few days.
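To put rough numbers on that, here is a back-of-the-envelope calculation using the costs from the table. It assumes one details call and one search call per video visited, which may not match exactly how the real run spent its quota.

```powershell
# Costs taken from the table above; the one-search-per-video pattern is an assumption.
$dailyQuota   = 10000
$costPerVideo = 5 + 100 + 0   # video details + related search + thumbnail

# Whole videos that fit in one day's quota.
$videosPerDay = [math]::Floor($dailyQuota / $costPerVideo)

# Days needed to reach the 250-video target for one data set.
$daysFor250   = [math]::Ceiling(250 / $videosPerDay)

"Videos per day: $videosPerDay"
"Days for 250 videos: $daysFor250"
```

Under those assumptions the search call dominates the cost, and a single 250-video set cannot fit into one day of quota.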
As for the thumbnail, I couldn't find a supported method of downloading this using the API, but I did find this post on StackOverflow which got me started.
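A minimal sketch of that kind of approach, assuming the commonly cited `img.youtube.com/vi/<videoId>/<size>.jpg` URL pattern. The function names here are made up for illustration and the project's own thumbnail function may work differently; the pattern is not part of the documented Data API, so it could change without notice.

```powershell
# Hypothetical helper: build a thumbnail URL from a video ID.
# The URL pattern is an unofficial convention, not part of the Data API.
function Get-ThumbnailUrl {
    param(
        [Parameter(Mandatory)] [string] $VideoId,
        [ValidateSet('default', 'mqdefault', 'hqdefault')]
        [string] $Size = 'hqdefault'
    )
    "https://img.youtube.com/vi/$VideoId/$Size.jpg"
}

# Hypothetical helper: download the thumbnail to disk.
# This is a plain web request, so it spends no API quota.
function Save-ThumbnailSketch {
    param(
        [Parameter(Mandatory)] [string] $VideoId,
        [Parameter(Mandatory)] [string] $OutFile
    )
    Invoke-WebRequest -Uri (Get-ThumbnailUrl -VideoId $VideoId) -OutFile $OutFile
}

# Usage (the ID is a placeholder, not a real video).
$thumbUrl = Get-ThumbnailUrl -VideoId 'VIDEO_ID_HERE'
```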
The Functions
Once I wrote these functions, I was ready to go:
- Connect-PSYouTubeAccount
- Get-PSYouTubeRelatedVideo
- Get-PSYouTubeVideoInfo
- Get-PSYouTubeVideoThumbail
Connect-PSYouTubeAccount is just another credential storage system using SecureString. Be warned that other administrators on the device where you use this cmdlet could retrieve credentials stored as a SecureString. If you're curious for more info, read up on the DPAPI here, or here, or ask JeffTheScripter, as he is very knowledgeable on the topic. FWIW this approach stores the key in memory as a SecureString, then converts to string data only when needed to make the web call.
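The in-memory pattern looks roughly like the sketch below. It is an illustration rather than the code inside Connect-PSYouTubeAccount: `ConvertTo-PlainText` is a hypothetical helper name and the key is a dummy value.

```powershell
# Dummy value for illustration only; a real key would come from user input.
$secureKey = ConvertTo-SecureString -String 'not-a-real-key' -AsPlainText -Force

# Hypothetical helper: recover the plain text only at the moment it is needed.
function ConvertTo-PlainText {
    param([Parameter(Mandatory)] [securestring] $Secure)
    [System.Net.NetworkCredential]::new('', $Secure).Password
}

# Convert just before building the web request, and avoid keeping the
# plain-text copy around in a long-lived variable.
$plainKey = ConvertTo-PlainText -Secure $secureKey
```

Keeping the key as a SecureString for most of the session limits how long the plain text exists in memory, but it does not protect against someone with administrative access to the machine.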
The Summary
You can access the data I've already created here in this new repository, PSYouTubeScrapes. But just be aware that it is kind of terrible UX looking through 8,000 tags and comments, so I took a dependency on the awesome PSWordCloud PowerShell module which I used to make a wordcloud out of the most common video tags.
A note on YouTube Comments: they contain the worst of humanity and should never ever be entered by any person. I intentionally decided not to research them or publish the work I did on them, because, wow.
So, here is a word cloud of the two datasets, generated using this script.
https://gist.github.com/1RedOne/8ceb22e662d6b75af2a956bec7407ad6
Anna Saccone has a LOT of fashion and weight videos, but seemed pretty positive from what I saw
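If you would rather see the numbers behind a word cloud, counting tag frequency takes only a few lines. This sketch is separate from the linked script: `Get-TagFrequency` is a hypothetical helper and the sample tags are made up.

```powershell
# Hypothetical helper: normalize tags and count how often each one appears.
function Get-TagFrequency {
    param([Parameter(Mandatory)] [string[]] $Tag)

    $Tag |
        ForEach-Object { $_.Trim().ToLowerInvariant() } |   # fold case and whitespace
        Where-Object { $_ } |                               # drop empty tags
        Group-Object -NoElement |
        Sort-Object -Property @{ Expression = 'Count'; Descending = $true },
                              @{ Expression = 'Name';  Descending = $false } |
        Select-Object -Property Name, Count
}

# Usage with made-up sample tags.
$sampleTags = 'Fashion', 'fashion', 'haul', 'ootd', 'haul', 'fashion '
$frequency  = @(Get-TagFrequency -Tag $sampleTags)
```

Folding case and trimming whitespace first keeps near-duplicate tags from splitting the counts.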
The Conclusion
All in all, I felt that the content was pretty agreeable! Although the search for children's videos did surface some stranger children's videos like this one, I didn't think any of the videos were overly negative or exploitative, nor did I see any "Elsagate" style content. That's not to say that YouTube is perfect, but I think it seems safe enough, even if I will probably review my kids' YouTube history and let them use YouTube Kids instead of the full app.
Have a set of recommended videos you'd like me to search like this? Post them in a thread on /r/FoxDeploy or leave a comment with your videos and I'll see what we come up with.
If you conduct your own trial with this code and example and want to share, feel free to submit a pull request to the repo as well (note that we .gitignore all jpeg and png files to keep the repo size down).