PersonalFinance is one of my go-to subreddits, and I was curious to analyze its posts. The PRAW package can be used to scrape Reddit posts; specifically, I used this Python script to download information about every personalfinance submission starting from Jan 01, 2016. Note that the script was written in Python 2.7, so running it under the same version will save a lot of headaches. The Reddit API allows one archive request per second, and each API call returns at most 100 comments; needless to say, it's slow. So far I've scraped data up to Feb 05, 2016, and I decided to parse those 9,825 JSON files using MATLAB. The charts below were generated with MATLAB.
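The charts here were made in MATLAB, but for anyone who would rather stay in Python, the loading step might look roughly like the sketch below. It assumes each downloaded JSON file holds exactly one submission object. The function name `load_submissions` and the one-file-per-submission layout are hypothetical and not taken from the script I used.

```python
import json
from pathlib import Path


def load_submissions(directory):
    """Load every *.json file in `directory` into a list of dicts.

    Hypothetical layout: one submission object per file. Files are read in
    sorted filename order; non-JSON files in the directory are ignored.
    """
    submissions = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            submissions.append(json.load(fh))
    return submissions
```

Calling `load_submissions("path/to/json_dir")` returns a plain list of dicts, which the later steps can consume directly.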
Submissions by flair type
Each post is assigned a single flair when it is submitted. Below is the percentage of posts carrying each flair, plotted in no particular order.
- Unsurprisingly, Taxes takes the largest share, since the data covers tax season.
- Debt, Credit and Retirement are each around 10%.
- Investing, Housing, Planning and Employment fall between 5.6% and 7.9%.
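A minimal sketch of how these percentages could be computed in Python follows. It assumes the flair is stored under `link_flair_text`, the field Reddit's submission JSON uses for flair text; the function name `flair_percentages` and the "(no flair)" bucket for posts without a flair are hypothetical choices for illustration.

```python
from collections import Counter


def flair_percentages(submissions, field="link_flair_text"):
    """Return {flair: percent of submissions}, largest share first.

    Posts whose flair field is missing or empty are counted under
    the hypothetical label "(no flair)".
    """
    counts = Counter(s.get(field) or "(no flair)" for s in submissions)
    total = sum(counts.values())
    # most_common() orders flairs by count, which is handy for a bar chart
    return {flair: 100.0 * n / total for flair, n in counts.most_common()}
```

Sorting by share is only a convenience; the chart above plots flairs in no particular order.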
Behavior of flairs across time
The number of tax-related posts increases as the date approaches February, and it wouldn't be surprising if it keeps rising toward the tax deadline. I will update this plot once I have all the JSON files up to today. The other flairs stay fairly flat.
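A rough Python sketch of the per-day trend follows. It buckets posts by UTC calendar day using `created_utc`, the Unix timestamp (in seconds) that Reddit includes in submission JSON, assuming the downloaded files keep that field. The helper names `daily_flair_counts` and `flair_series` are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def daily_flair_counts(submissions, flair_field="link_flair_text",
                       time_field="created_utc"):
    """Count submissions per (UTC day, flair).

    Returns {date: Counter({flair: count})}, ordered by date.
    Posts without a flair are counted under "(no flair)".
    """
    by_day = defaultdict(Counter)
    for s in submissions:
        day = datetime.fromtimestamp(s[time_field], tz=timezone.utc).date()
        by_day[day][s.get(flair_field) or "(no flair)"] += 1
    return dict(sorted(by_day.items()))


def flair_series(daily_counts, flair):
    """Return [(date, count)] for one flair; days without it count as 0."""
    return [(day, counts[flair]) for day, counts in daily_counts.items()]
```

For example, `flair_series(daily_flair_counts(posts), "Taxes")` gives the daily series behind the Taxes line. Days with no posts at all do not appear, so a plot would need to fill those gaps if any exist.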
I will update this post once I have all the data. I will also analyze the submission rate and upvote ratio of the posts.
Inspiration for my analysis came from this post.