Unhackathon #11 roundup

You may wonder what Hong Kong’s most wanted has to do with data science. At our December Unhackathon, we showed how the HK most wanted list can be turned from a website into a usable dataset.

December’s Unhackathon tackled the basics of scraping, introduced a new dataset of Asia’s infrastructure projects and explored the human genome further, looking for parts of the genome that have no known relevance to physical conditions.

To kick off our morning of industry talks, DSHK co-organiser Guy Freeman built a working web scraper in 20 minutes while talking us through the process.
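For readers new to scraping, here is a minimal sketch of the general idea: read the rows of an HTML table and turn them into records that can be saved as CSV. It is not Guy's code. The table markup, the column names and the helper names (`TableScraper`, `scrape_table`, `to_csv`) are all assumptions made up for illustration, and the sample rows are placeholders rather than real entries.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup: the real page structure is not shown in this post,
# and the rows below are placeholders, not real entries.
SAMPLE_HTML = """
<table id="wanted">
  <tr><th>Name</th><th>Offence</th></tr>
  <tr><td>Example Person A</td><td>Example offence 1</td></tr>
  <tr><td>Example Person B</td><td>Example offence 2</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def to_csv(records):
    """Serialise the records as CSV text."""
    if not records:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


records = scrape_table(SAMPLE_HTML)
print(to_csv(records))
```

In a real scraper the HTML would come from an HTTP request rather than a string, and the parsing rules would have to match the actual page.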

You can see the video of his presentation below.

His talk sparked a lot of interest among our guests, many of whom then committed the rest of the afternoon to investigating scraping practices.

DSHK regular Daniil and DSHK co-organiser Xavier aimed to develop new techniques to evade measures that block scraping attempts. Daniil had been attempting to scrape the AngelList website but was having difficulties managing his script’s efficiency. You can see his presentation below.

See the slides here: Scraping Angel.co

DSHK co-organiser Robert Porsch set the task of looking through a public dataset of human genome association statistics. Watch his presentation below on his project, which looked for any localised genetic correlations between depression and schizophrenia.

You can follow along with Robert’s slides at this link: DS HK genomics (hackathon #11)

We have now switched to holding our events on the first Sunday of the month, with the exception of January, when we will not hold an event. Please join us again on February 3 at Times Square, Tower 1, 20/F. Drinks will be provided.


Unhackathon #10 roundup

Our first unhackathon since Typhoon Mangkhut blew out our September meetup kicked off with a talk about a practical application of machine learning, followed by an introduction to the benefits of the L0 norm.

Two teams attempted to satisfy their curiosity with analysis of the Hong Kong housing market and a new cryptocurrency blockchain.

Our events happen monthly on Sundays and include talks from industry leaders and practitioners of data science, followed by a pitch session and an afternoon of group work on projects. You can find out more and stay up to date with our next events on our Slack channel, our Meetup page and on Facebook.

The talks

South China Morning Post data engineer Jonathan Barone introduced us to project Dali, a tool under development at the 115-year-old newspaper that is intended to catalogue elements in images, potentially identifying faces and places in the media company’s image archive.

You can see his talk in the video and slides below.

Dali slides can be read here.

Robert Porsch showed us an alternative way to regularise parameters by using the L0 (L-zero) norm.
He demonstrated that this new penalisation function is able to outperform more traditional approaches, such as the L1 norm, given a large enough sample size.
He has applied this method to predict the genetic risk for health outcomes and behavioural traits.
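To make the difference between the two penalties concrete, here is a minimal sketch. It only shows how the penalties are defined, not the optimisation method from Robert's talk, and the function names and example weights are assumptions chosen for illustration. The L0 "norm" (not a true norm in the mathematical sense) counts how many parameters are non-zero, while the L1 norm sums their absolute values.

```python
def l0_norm(weights, tol=0.0):
    """Count the weights whose magnitude exceeds tol (the L0 "norm")."""
    return sum(1 for w in weights if abs(w) > tol)


def l1_norm(weights):
    """Sum of absolute values (the L1 norm, as used by the lasso)."""
    return sum(abs(w) for w in weights)


def penalised_loss(data_loss, weights, lam, norm):
    """Data-fit loss plus lam times the chosen penalty."""
    return data_loss + lam * norm(weights)


# Illustrative weight vectors, not taken from the talk.
sparse = [0.0, 3.0, 0.0, 0.0]       # one large effect
dense = [0.75, 0.75, 0.75, 0.75]    # many small effects

for name, w in (("sparse", sparse), ("dense", dense)):
    print(name, "L0 =", l0_norm(w), "L1 =", l1_norm(w))
```

Both vectors have the same L1 norm, so an L1 penalty cannot tell them apart, while the L0 penalty charges the dense vector four times as much. The price is that the L0 count is not continuous, which makes it harder to optimise directly.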

Read his slides here.

(Apologies for not including a video; we had technical difficulties recording it.)

The hackathon session

Two groups formed: one to analyse real estate and one to create a new blockchain recommendation system.

Take a look at the videos of their presentations below.

Blockchain recommendation engine

This team set themselves the ambitious target of creating a system that can recommend a spread of cryptocurrencies using a number of existing systems including BigQuery, Docker and others. Check out the outcome in their presentation below.

Real estate analysis

This team started with the transaction history of real estate in Hong Kong and another dataset of stock exchange information. Their results showed which areas had a higher risk of low return on investment, and also showed a correlation between stock market turmoil and housing transactions.
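As a rough illustration of how a correlation between two such series can be measured, here is a minimal sketch of the Pearson correlation coefficient. It is not the team's analysis: the post does not say which measure they used, and the two series below are invented placeholder numbers, not their data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of co-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented placeholder numbers, NOT the team's data.
volatility = [1.0, 2.0, 3.0, 4.0]      # hypothetical market turmoil score
transactions = [400, 300, 200, 100]    # hypothetical monthly sales count

r = pearson(volatility, transactions)
print(round(r, 3))
```

A value near -1 means the two series move in opposite directions, a value near +1 means they move together, and a value near 0 means there is little linear relationship. The toy series above fall on a straight downward line, so their coefficient sits at the -1 end of the range.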


Next event

We are looking at dates in early December. Stay tuned to our Slack #events channel and Facebook for announcements.


Overwatch strategies revealed with data science

Ram de Guzman presented this analysis of Overwatch team strategies using data scraped from Winston’s Lab (which gathers it directly from game videos). His analysis revealed how the best teams in South Korea set up their teams and fought.

In the video he describes the process of gathering his data, then shows in impressive visualisations how that data relates to actual game strategy.

Watch his talk at our 6th unhackathon in March here:


And you can follow his project here.