Feed Me Seymour! A Content-Based Recipe Recommendation System
Why I did it:
I enjoy cooking in my spare time, but I end up making the same things on repeat. Sure, this is partly a side effect of grad school, but it would be nice to at least know there are other recipes I can try!
Given a recipe, the algorithm compares its ingredients with those of every other recipe and outputs the 10 most similar recipes. The assumption is that someone who likes (or is able to make) one recipe would also like (or be able to make) a new recipe with similar ingredients.
What I did:
- Scraped data for 10,000 recipes from allrecipes.com
- Cleaned the data:
  - Removed recipes that had been entered incorrectly (n = 2,500)
  - Removed irrelevant words from the ingredient lists (so that descriptors like "shredded" would not raise or lower the similarity between recipes)
  - Recombined each recipe's ingredients into a single character string
- Built the algorithm:
  - Converted each recipe's ingredient string into a tf-idf vector
  - Computed the pairwise similarity between all recipes from these vectors
  - Extracted the 10 most similar recipes for each recipe
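The cleaning steps might look something like the minimal sketch below. The STOP_WORDS set is hypothetical, since the actual list of removed words isn't given, and the rule used to spot incorrectly entered recipes isn't described, so that step is left out. Each ingredient line is lowercased, numbers and punctuation are dropped, descriptor and unit words are removed, and what remains is joined into one string per recipe.

```python
import re

# Hypothetical descriptor/unit words to drop; the project's real list is not given.
STOP_WORDS = {
    "shredded", "chopped", "diced", "minced", "sliced", "fresh",
    "cup", "cups", "tablespoon", "tablespoons", "teaspoon", "teaspoons",
    "ounce", "ounces", "pound", "pounds", "large", "small",
    "to", "taste", "and", "or", "of",
}


def clean_ingredient(line: str) -> list[str]:
    """Lowercase one ingredient line, keep only letter runs, drop stop words."""
    # [a-z]+ discards digits, fractions like "1/2", and punctuation.
    tokens = re.findall(r"[a-z]+", line.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def recompile(ingredients: list[str]) -> str:
    """Join the cleaned words from all of a recipe's ingredient lines into one string."""
    words: list[str] = []
    for line in ingredients:
        words.extend(clean_ingredient(line))
    return " ".join(words)


if __name__ == "__main__":
    print(recompile(["2 cups shredded cheddar cheese", "1 tablespoon chopped fresh basil"]))
    # → cheddar cheese basil
```

Keeping the result as a plain space-separated string makes it easy to hand to any tf-idf vectorizer afterwards.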
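The vectorize, compare, and rank steps can be sketched in plain Python as follows. This assumes cosine similarity as the similarity measure and uses the same tf-idf weighting as scikit-learn's TfidfVectorizer defaults (raw term counts, smoothed idf of ln((1 + n) / (1 + df)) + 1, L2 normalization); whether the project used exactly this setup is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build a sparse tf-idf vector (term -> weight) for each whitespace-tokenized doc."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(docs)
    # Document frequency: how many recipes contain each ingredient word.
    df: Counter[str] = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Smoothed idf: rarer ingredients (e.g. "fish") get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for tokens in tokenized:
        vec = {t: count * idf[t] for t, count in Counter(tokens).items()}
        # L2-normalize so a dot product between two vectors equals their cosine similarity.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two L2-normalized sparse vectors (a sparse dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def top_k_similar(vectors: list[dict[str, float]], index: int, k: int = 10) -> list[tuple[int, float]]:
    """Return (recipe index, score) pairs for the k recipes most similar to `index`."""
    scores = [(j, cosine(vectors[index], v)) for j, v in enumerate(vectors) if j != index]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    recipes = [
        "chicken garlic soy sauce ginger",
        "chicken garlic lemon thyme",
        "fish sauce lime chili garlic",
        "flour sugar butter egg",
    ]
    vectors = tfidf_vectors(recipes)
    print([j for j, _ in top_k_similar(vectors, 0, k=2)])  # → [1, 2]
```

Because "garlic" appears in most of the toy recipes, it contributes less to each score than rarer shared words like "chicken" or "sauce", which is the point of the idf term. For a few thousand recipes, scikit-learn's TfidfVectorizer and sklearn.metrics.pairwise.cosine_similarity do the same computation on sparse matrices much faster.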
What I have yet to do:
- I wanted to see if I could infer cuisine type or meal type (or both) from the ingredients alone. For example, fish sauce is typical of Vietnamese cuisine, and possibly other Asian cuisines, but rarely appears elsewhere. My attempt at clustering this data didn't get me very far; I'd like to learn more about hierarchical clustering for this (the code is in recs_2.py).
- I want to incorporate other variables into the recommendation system. In short, I'd like the algorithm to recommend recipes by cuisine type, meal type, food/ingredient aversions, and diet (vegan, vegetarian, pescatarian).
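Since hierarchical clustering is on the to-do list, here is a minimal sketch of average-linkage agglomerative clustering in plain Python. It is an illustration, not the code in recs_2.py: bag_of_words is a hypothetical stand-in for the tf-idf vectors, and all function names are made up. Every recipe starts in its own cluster, and the two clusters with the smallest mean cosine distance are merged repeatedly until n_clusters remain.

```python
import math
from itertools import combinations


def bag_of_words(doc: str) -> dict[str, float]:
    """Binary, L2-normalized word vector (a hypothetical stand-in for tf-idf vectors)."""
    words = set(doc.split())
    weight = 1 / math.sqrt(len(words)) if words else 0.0
    return {w: weight for w in words}


def cosine_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """1 - cosine similarity, assuming both vectors are L2-normalized."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def agglomerative(vectors: list[dict[str, float]], n_clusters: int) -> list[list[int]]:
    """Average-linkage agglomerative clustering; returns clusters as lists of indices."""
    clusters = [[i] for i in range(len(vectors))]
    # Precompute distances between every pair of individual recipes.
    dist = {
        (i, j): cosine_distance(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }

    def d(i: int, j: int) -> float:
        return dist[(i, j)] if i < j else dist[(j, i)]

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Average linkage: mean distance over all cross-cluster recipe pairs.
        return sum(d(i, j) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (a < b because of combinations) and merge them.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    docs = [
        "fish sauce lime chili",
        "fish sauce lime cilantro",
        "flour sugar butter egg",
        "flour sugar butter vanilla",
    ]
    print(agglomerative([bag_of_words(doc) for doc in docs], n_clusters=2))
```

This brute-force version scales roughly with the cube of the number of recipes, so it is only for building intuition. On the full dataset, scipy.cluster.hierarchy.linkage with method="average" followed by scipy.cluster.hierarchy.fcluster with criterion="maxclust" is the more practical route.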
What I gained:
- A database of recipes for when I have time to try something new!
- Concrete Python experience that I can share
- Experience scraping data and cleaning text data (useful for NLP projects)