
Indiahacks ML Finale | LIVE BLOG

08 Sep 2017 . category: report . Comments
#data science #machine learning #hackathon

Welcome to the Indiahacks Live Blog Version 2.0! This is the grand finale, fifteen of us fighting it out at Taj Vivanta, Bangalore :)

September 9th, 11:02 – So, i don’t rank within the Top 10 on the final leaderboard. Slightly disheartened, but i should remember that i’m just a newbie in this game. A lot of my competitors were experienced data scientists and top kagglers. If nothing, i should be taking inspiration from the champs, there’s a lot to learn, a lot to do! Adios :)

03:38 – Yeah, calling it a night. Will update when the private leaderboard is released in the morning.

02:48 – I’m trying incremental ideas, not helping a lot. Guess my position is stuck at 9th now. With my mind zoning off, settling the competition seems like the best option.

01:28 – There seems to be some bug in the submissions judge, apparently all entries are being rejudged. Meanwhile, i implemented a greedy approach, in decreasing order of language frequency. To be tested.
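The entry above does not spell out the greedy approach, so what follows is only one plausible reading: walk through a user's languages from most-watched to least-watched and fill the recommendation slots from each language's popularity-ranked list. The function name `greedy_fill`, the input shapes, and the alphabetical tie-break are all assumptions made for illustration.

```python
def greedy_fill(user_language_counts, ranked_by_language, n=20):
    """Fill up to n slots, visiting languages in decreasing order of how
    often the user watched them (hypothetical helper, not the original code)."""
    if n <= 0:
        return []
    picks, seen = [], set()
    # Most-watched language first; ties broken alphabetically (an assumption).
    ordered = sorted(user_language_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for language, _ in ordered:
        for movie in ranked_by_language.get(language, []):
            if movie in seen:
                continue
            picks.append(movie)
            seen.add(movie)
            if len(picks) == n:
                return picks
    return picks


# Toy data, invented for the example.
language_counts = {"hindi": 5, "tamil": 2, "english": 2}
ranked = {"hindi": ["h1", "h2"], "english": ["e1", "e2"], "tamil": ["t1"]}
top4 = greedy_fill(language_counts, ranked, n=4)
everything = greedy_fill(language_counts, ranked, n=10)
```

If the user's languages run out before the slots do, the list simply comes back shorter; padding with globally popular titles would be a natural extension.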

September 9th, 00:22 – The competition has been extended till 8AM, came as a shocker to me! I had my dinner, took an Uber back home, and slipped to 8th in the meanwhile! Will fight for a couple more hours now.

21:53 – Slipped to 6th. Have tried multiple approaches using view ratio, and considering multiple movies based on the share of their view ratios. Putting in my last ditch efforts now.

21:05 – Introduced a language and genre combo, which worsened my score. Kinda weird, but makes sense. Also, submitted my similarities-based approach, gave me 0.12x on the lb. Have a couple more ideas on the “popular” approach, a couple more hours to go. Hanging on to the second position :)

20:16 – Introduced language into the previous approach, my lb score jumped up to 0.2288. Currently second on the leaderboard, this is seriously hilarious!
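How exactly language was folded in is not recorded here. A minimal sketch of one way to do it, assuming a `movie_language` lookup exists (a hypothetical name) and that each user is served the most popular titles in their most-watched language, padded with globally popular ones:

```python
from collections import Counter, defaultdict


def recommend_by_language(views, movie_language, test_users, n=20):
    """views: list of (user_id, movie_id); movie_language: movie_id -> language.
    Both shapes are assumptions made for this sketch."""
    popularity = Counter(movie for _, movie in views)
    # Rank every movie by view count, ties broken by id for determinism.
    global_ranked = sorted(popularity, key=lambda m: (-popularity[m], m))

    by_language = defaultdict(list)
    for movie in global_ranked:
        by_language[movie_language[movie]].append(movie)

    user_languages = defaultdict(Counter)
    for user, movie in views:
        user_languages[user][movie_language[movie]] += 1

    recs = {}
    for user in test_users:
        picks = []
        if user in user_languages:
            counts = user_languages[user]
            favourite = min(counts, key=lambda lang: (-counts[lang], lang))
            picks = by_language[favourite][:n]
        # Pad with globally popular titles (also covers unseen users).
        for movie in global_ranked:
            if len(picks) >= n:
                break
            if movie not in picks:
                picks.append(movie)
        recs[user] = picks
    return recs


# Toy data, invented for the example.
views = [("u1", "h1"), ("u1", "h2"), ("u1", "e1"),
         ("u2", "e1"), ("u2", "e2"), ("u3", "e1")]
movie_language = {"h1": "hindi", "h2": "hindi", "e1": "english", "e2": "english"}
recs = recommend_by_language(views, movie_language, ["u1", "u9"], n=3)
```

In the toy data, `u1` mostly watches Hindi titles and gets those first, while the unseen user `u9` falls back to the global ranking.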

19:38 – I’m literally laughing at what has transpired in the last 5 minutes. While my similarities were being computed, i decided to go ahead with a naive submission – Most popular 20 movies for all users. This stupidity has put me up on 5th position, with 3 more guys tied. While everyone is busy building their pipeline, this naivety seems to have been lost on most. Should not be long before people catch up. Gotta improvise :p
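The naive baseline is simple enough to sketch in a few lines. This assumes the viewing history is available as `(user_id, movie_id)` pairs; the function names are made up for the example, and "popular" is taken to mean raw view count.

```python
from collections import Counter


def top_n_popular(views, n=20):
    """Return the n most-viewed movie ids from (user_id, movie_id) pairs."""
    counts = Counter(movie for _, movie in views)
    # Sort by view count (descending), then by id so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie for movie, _ in ranked[:n]]


def recommend_popular(views, test_users, n=20):
    """Give every test user the same list of globally popular movies."""
    top = top_n_popular(views, n)
    return {user: list(top) for user in test_users}


# Toy data, invented for the example.
views = [("u1", "m1"), ("u2", "m1"), ("u2", "m2"),
         ("u3", "m3"), ("u3", "m1"), ("u1", "m2")]
recs = recommend_popular(views, ["u4", "u5"], n=2)
```

Every user gets an identical list, which is why it costs almost nothing to compute and still makes a useful first submission.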

18:34 – The pipeline is ready, waiting for the similarities to run. Banking on the fact that hashing the similarities of popular items will help me stay within the time limit. Seems like a long wait :/

18:00 – Onto my second cappuccino within an hour. The 2 mil rows are giving me the hots :p Decided to go ahead with filtering the not-so-frequent users, for the sake of time and complexity, compromising on training data. Still building the pipeline for test data iteration. Looks like a couple submits by end of the hackathon would be a good result! Shouldn’t be demotivated. C’mon, fight!
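Filtering out infrequent users could look roughly like this. The cutoff `min_views` is a hypothetical parameter; the actual threshold used is not stated here.

```python
from collections import Counter


def filter_frequent_users(views, min_views=5):
    """Keep only rows from users with at least min_views rows.
    views must be a list (it is traversed twice); min_views is an assumed cutoff."""
    per_user = Counter(user for user, _ in views)
    return [(user, movie) for user, movie in views if per_user[user] >= min_views]


# Toy data, invented for the example.
views = [("u1", "m1"), ("u1", "m2"), ("u1", "m3"), ("u2", "m1"), ("u3", "m2"), ("u3", "m3")]
kept = filter_frequent_users(views, min_views=2)
```

The trade-off is the one noted above: fewer rows to process, at the price of throwing away training signal from occasional viewers.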

16:38 – Frustrated. Switched from a user-user collaborative system to item-item. Should’ve figured this out earlier, given i have 2 mil users and 3K movies. Feel like i need a break, should hopefully have a submit in an hour or so.
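The reason item-item is the cheaper choice: with roughly 3K movies there are only a few million movie pairs to compare, whereas user-user similarity over 2 mil users is out of reach. Below is a rough item-item sketch, assuming binary "watched" signals and cosine similarity; the similarity measure and data layout used on the day are not recorded, so treat all names here as illustrative.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(views):
    """Cosine similarity between movies over binary viewer sets."""
    viewers = defaultdict(set)
    for user, movie in views:
        viewers[movie].add(user)
    sims = defaultdict(dict)
    for a, b in combinations(sorted(viewers), 2):
        overlap = len(viewers[a] & viewers[b])
        if overlap:
            score = overlap / sqrt(len(viewers[a]) * len(viewers[b]))
            sims[a][b] = score
            sims[b][a] = score
    return sims


def recommend_item_item(views, user, sims, n=20):
    """Score unseen movies by summed similarity to the user's watched movies."""
    watched = {movie for u, movie in views if u == user}
    scores = defaultdict(float)
    for movie in watched:
        for other, score in sims.get(movie, {}).items():
            if other not in watched:
                scores[other] += score
    return sorted(scores, key=lambda m: (-scores[m], m))[:n]


# Toy data, invented for the example.
views = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u2", "c"),
         ("u3", "a"), ("u3", "c"), ("u4", "b")]
sims = item_similarities(views)
recs_u4 = recommend_item_item(views, "u4", sims, n=2)
```

At real scale this pure-Python pairwise loop would be slow; a sparse matrix product would be the usual replacement, but the idea is the same.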

15:16 – Getting in the flow, slowly. The data is up and ready, i’m still figuring out how i want to structure my dataframes. I’m reminded of a similar problem i’d taken up around a year back, around building hybrid recommender systems. Current plan of action is to build a workable pipeline, and then take all my feature engg. ideas from there.

14:37 – I’ve had a look at the problem statement – Recommending movies to users based on their previous viewing history on Hotstar. 2 mil rows in training, and a limited number of submissions – add unclean data to the mix. This challenge is gonna be fun. Starting with cleaning the dataset.

September 8th, 13:45 – 15 minutes to start. I’m more hungry than excited, gotta grab some lunch!


Shubhankar is an awesome person. He's Co-Founder & CTO of Houseware, building the command center for modern revenue teams. In his spare time, he likes to go out on runs!