One weekend, I was browsing hiking apps to see whether any of them could recommend a hike based on what I felt like doing, or on an image of the kind of scenery I wanted to see. As an avid hiker, data enthusiast, and constant innovator, I started to wonder: what if we could leverage machine learning to recommend a hike? What if I could turn the typical questions I receive into an ML-powered hiking application:
“Hey Arjun, I feel like seeing lots of trees and birds. What hike should I go on?”
Or “I have family visiting this weekend. What trail should I take them to? I recently did this hike.” (shows an image of a hike on their phone)
I quickly realized that building an AI application like this would be a major undertaking: gathering data from actual hikes, tagging them with reviews, building complex Natural Language Processing (NLP) and computer vision models, operationalizing the model, and finally making recommendations accessible through an end-user interface. Still, I really wanted to try it out.
The How – Data, Modeling, And Predictions
Here’s how I went from reviewing some sample images on my phone, to building and testing various cutting-edge machine learning techniques, to developing an AI app that recommends hiking trails, all in less than a week!
The author on the Overlook Trail in Big Sur, California.
I had more than 200 scenic images from many of my favorite hikes and trails over my last four years in the Bay Area! Some of my favorites are Gray Whale Cove Trail in Montara, Golden Gate Park in San Francisco, or just a walk along the Embarcadero. Additionally, I tagged the trips with reviews such as “I liked this trail a lot, lots of trees and birds to see.” Each review added a piece of information that would help users make a wise choice (without needing my help in the future).
Visualizing all the scenic hikes and relevant statistics through a drag and drop upload in DataRobot.
Some sample reviews look like this!
As a data scientist, I had extensive experience fitting and training the more common classification and regression algorithms on tabular data, but I had never developed a reliable, accurate, and integrated model combining computer vision and natural language processing, simply because of the time required to iterate on such approaches, the amount of data required to get started, and the skills required to accomplish such a task! So how did I test and iterate on hundreds of the most advanced techniques to develop a reliable model in a few hours? DataRobot AutoML!
Quick note if you are new to ML: In this case, machine learning helps us learn patterns from historical hiking trips so it can accurately and reliably recommend which hike would be a good fit!
Me clicking the start button to train and test hundreds of different automated data processing, feature engineering, and algorithmic tasks on the scenic hiking data above!
Reviewing some of the cool techniques the platform tried out — advanced text tokenization, image augmentation, Stochastic Gradient Descent Classifier, Word2Vec, fastText, neural networks, and more!
With hundreds of experiments automatically tested, I am now reviewing the model insights to make sure the recommended approach is reliable. I don’t want the model to learn that a hike is “Gray Whale Cove Trail” because of a random rock in the image; it should be learning that there is an ocean, mountains, and majestic views on this hike.
Deployment can often be the hardest stage in a project. As a data scientist, I usually expect that once I build some reliable experiments and share insights, my work is done, and my IT or MLOps friends can deploy the code (or operationalize the model). However, since this was a weekend project, I had to be my own MLOps team and get my model live and ready to serve real-time predictions for my users (friends).
Copying and pasting the API scripting code. I wish it took longer so I could go on a hike in the meantime! Anyway, on to the last step.
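The scoring code I actually used came straight from the platform, so I won't reproduce it here. To give a rough idea of what a real-time prediction call involves, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption for illustration: the URL, the bearer-token header, and the "image" and "review" field names are hypothetical placeholders, not the real DataRobot API.

```python
import base64
import json
import urllib.request

# Hypothetical placeholders: swap in the URL and key from your own deployment.
PREDICTION_URL = "https://example.com/hypothetical-deployment/predictions"
API_KEY = "YOUR_API_KEY"


def build_payload(image_bytes: bytes, review: str) -> bytes:
    """Pack one image and one review into a JSON body (field names are assumed)."""
    row = {
        # Images are commonly sent as base64 text so they fit inside JSON.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "review": review,
    }
    return json.dumps([row]).encode("utf-8")


def request_prediction(image_bytes: bytes, review: str):
    """POST the payload to the (hypothetical) endpoint and return the parsed JSON."""
    req = urllib.request.Request(
        PREDICTION_URL,
        data=build_payload(image_bytes, review),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Keeping the payload builder separate from the network call makes it easy to check the request body locally before pointing it at a live deployment.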
The What — Making Recommendations and Building Apps
Why does the end-to-end process involve so many stages? As a data scientist, I already have to prepare the data, build and validate models, and educate different audiences on what I am trying to solve, and now I am supposed to build a front-end AI app too? Fortunately, with some open-source Streamlit app examples and snippets, I could easily integrate the API code above and test my idea and solution vision!
For fellow coders – feel free to check out the app and modeling code here: GitHub Repository
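For a feel of how little front-end code this takes, here is a minimal Streamlit sketch. It is not the code from my app: `fake_predict` is a hypothetical stand-in that returns made-up placeholder scores where a call to the deployed model would go, and `top_trails` is a small helper I introduce here just to rank the results.

```python
def top_trails(scores: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k highest-scoring trails, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def fake_predict(image_bytes, review: str) -> dict[str, float]:
    """Hypothetical stand-in for the deployed model; the scores are made up."""
    return {"Gray Whale Cove Trail": 0.7, "Golden Gate Park": 0.2, "Embarcadero": 0.1}


def main() -> None:
    try:
        import streamlit as st  # third-party: pip install streamlit
    except ImportError:
        print("Streamlit is not installed; run `pip install streamlit` first.")
        return

    st.title("Hiking trail recommender (sketch)")
    review = st.text_input("What do you feel like seeing?")
    photo = st.file_uploader(
        "Or upload a photo of scenery you like", type=["jpg", "jpeg", "png"]
    )

    if st.button("Recommend a hike"):
        # The uploader returns None until the user picks a file.
        image_bytes = photo.getvalue() if photo is not None else None
        for trail, score in top_trails(fake_predict(image_bytes, review)):
            st.write(f"{trail}: {score:.0%}")


if __name__ == "__main__":
    main()
```

Saved as, say, `app.py`, this would be launched with `streamlit run app.py`; replacing `fake_predict` with a real prediction call is the only step needed to connect it to a model.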
Bay Area Hiking Trail Prediction App.
Feel free to try out the app this weekend in the Bay Area (or now if you can’t wait like me!): Bay Area Hiking Trail Prediction App.
For now, I’ve built my model for nine hiking trails. But the project is really motivating me to revisit old hikes to take more pictures (and generate more data). I recently got my hands on the book Best Hikes in the Bay Area, and there are a lot more hiking trails that I can’t wait to explore and take pictures of! And the best part of this project is that the model can be retrained, updated, and made fully functional within a couple of hours on a weekend. I’ll be continuing to share my hiking passion with my friends and family through my stories and AI solutions!
If you have feedback or questions about the data, process, or my favorite hiking trails, feel free to reach out! I’m always looking to make incremental improvements! Comment if you would like to see a system set up for adding data, or if you would like to contribute to the app!
Thankful for the support from my fellow hiking friend and colleague, Austin Chou, on helping with data collection and modeling activities!