You may wonder what Hong Kong’s most wanted have to do with data science. At our December Unhackathon, we showed how the HK most wanted list can be turned from a website into a usable dataset.

December’s Unhackathon tackled the basics of scraping, introduced a new dataset of Asia’s infrastructure projects, and continued our exploration of the human genome, this time looking for parts of the genome that have no known relevance to physical conditions.

To kick off our morning of industry talks, DSHK co-organiser Guy Freeman built a working webscraper in 20 minutes while talking us through the process.

You can see the video of his presentation below.

His talk sparked a lot of interest among our guests, many of whom then committed the rest of the afternoon to investigating scraping practices.
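For readers who missed the talk, the general idea of turning a web page into a dataset can be sketched roughly as follows. This is not Guy’s code. The sample markup, the column names and the `TableScraper` and `scrape_table` names are all assumptions made for illustration, and the real most wanted page may be structured quite differently. The sketch uses only Python’s standard library and parses an inline HTML string rather than fetching a live page.

```python
from html.parser import HTMLParser

# Hypothetical sample markup with placeholder data; the real page's
# structure and fields are not known here and may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Offence</th></tr>
  <tr><td>Person A</td><td>Example offence 1</td></tr>
  <tr><td>Person B</td><td>Example offence 2</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row is not None:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
```

In a real scraper the HTML string would come from an HTTP request, and the resulting list of dictionaries could be written out as CSV or JSON to give the "usable dataset" described above.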

DSHK regular Daniil and DSHK co-organiser Xavier set out to develop new techniques for evading measures that block scraping attempts. Daniil had been trying to scrape the AngelList website but was struggling with his script’s efficiency. You can see his presentation below.

See the slides here: Scraping

DSHK co-organiser Robert Porsch set the task of looking through a public dataset of human genome association statistics. Watch his presentation below on his project, which looked for localised genetic correlations between depression and schizophrenia.

You can follow along with Robert’s slides at this link: DS HK genomics (hackathon #11)

We have now switched to holding our events on the first Sunday of the month, with the exception of January when we will not hold an event. Please join us again on February 3 at Times Square, Tower 1, 20/F. Drinks will be provided.