Unhackathon #10 roundup

Our first unhackathon after Typhoon Mangkhut blew out our September meetup kicked off with a talk about a practical application of machine learning, followed by an introduction to the benefits of the L0 norm.

Two teams attempted to satisfy their curiosity with analysis of the Hong Kong housing market and a new cryptocurrency blockchain.

Our events happen monthly on Sundays, including talks from industry leaders and practitioners of data science, followed by a pitch session and group work on projects for the afternoon. You can find out more and stay up to date with our next events on our Slack channel, our Meetup page and on Facebook.

The talks

South China Morning Post data engineer Jonathan Barone introduced us to project Dali, a tool under development at the 115-year-old newspaper that is intended to catalogue elements in images, potentially identifying faces and places in the media company’s image archive.

You can see his talk in the video and slides below.

Dali slides can be read here.

Robert Porsch showed us an alternative way to regularise parameters by using the L0 (L-zero) norm.
He demonstrated that this new penalisation function is able to outperform more traditional approaches, such as the L1 norm, given a large enough sample size.
He has applied this method to predict the genetic risk for health outcomes and behavioural traits.
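To make the difference between the two penalties concrete, here is a minimal sketch in plain Python. It is not taken from Robert's slides: the function names, the tolerance `tol` and the example weight vectors are all assumptions made for illustration. It only shows what each penalty measures. Because the L0 count is not differentiable, practical methods usually optimise some relaxation of it, which this sketch does not attempt.

```python
def l0_penalty(weights, tol=1e-12):
    """L0 'norm': the number of weights that are not (numerically) zero."""
    return sum(1 for w in weights if abs(w) > tol)


def l1_penalty(weights):
    """L1 norm: the sum of the absolute values of the weights."""
    return sum(abs(w) for w in weights)


def penalised_loss(data_loss, weights, lam, penalty):
    """Generic regularised objective: data loss plus lam times a penalty."""
    return data_loss + lam * penalty(weights)


# Two hypothetical weight vectors with the same L1 norm.
sparse = [0.0, 3.0, 0.0, -2.0]      # two active parameters
dense = [1.25, 1.25, 1.25, -1.25]   # four active parameters

# The L1 penalty cannot tell them apart; the L0 penalty prefers the sparse one.
print(l1_penalty(sparse), l1_penalty(dense))
print(l0_penalty(sparse), l0_penalty(dense))
```

The L1 penalty charges for the size of each weight, while the L0 penalty charges only for the number of weights in use, regardless of how large they are.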

Read his slides here.

(Apologies for not including video; we had technical difficulties trying to record it.)

The hackathon session

Two groups formed to tackle real estate and create a new blockchain recommendation system.

Take a look at the videos of their presentations below.

Blockchain recommendation engine

This team set themselves the ambitious target of creating a system that can recommend a spread of cryptocurrencies using a number of existing systems including BigQuery, Docker and others. Check out the outcome in their presentation below.

Real estate analysis

This team started with the transaction history of real estate in Hong Kong and another dataset of stock exchange information. Their results showed which areas carried a higher risk of low return on investment, and pointed to a correlation between stock market turmoil and housing transactions.

 

Next event

We are looking at dates in early December. Stay tuned on our Slack #events and Facebook for announcements.

 

Unhackathon #9 roundup

With the World Cup in Russia wrapping up on the same evening as our ninth unhackathon, football was on our minds. With tongue in cheek, Data Science Hong Kong co-organiser Xavier challenged our cohort of predictive modellers to pick the winner, hours before the result was known.

We were asked to build a predictive model for the World Cup result, but our vote worked well enough.

Getting down to more serious stuff, Houston Ho presented his company’s work on using machine learning to predict whether an employee is due to leave their position. His tool aims to give human resources teams a score for each employee based on a variety of characteristics. His model is achieving 80% accuracy and he says he can achieve more.

See his presentation below.

DSHK co-organiser Guy Freeman also presented his new project, a central repository for scraped data built on an open-source philosophy. He showed the system’s potential using a dataset of property transactions in Hong Kong spanning 20 years.

See his presentation below.

Group projects

Michael’s restaurant prediction project attracted the most interest from the group. Using data from two restaurant booking systems, he aimed to predict how busy a restaurant would get using a machine learning model.

See the results from the group’s work in the slides below.

Forecasting Visitors of Restaurants

Visiting from Tokyo, Suzana Ilic brought exciting skills to the unhackathon and set her sights on unpicking hype in the crypto space. She was a bit camera-shy, so there is no video, but you can see her slides below.

Quantifying Hype


Morris Wong aimed to build an auto-tagging system for publishing, along the lines of taggernews. This tech could have wide-scale application when it’s up and running. See his presentation below.

Pocket exploration

That’s it for this month’s event. We will be announcing our next event shortly.

Unhackathon #7 roundup: making sense through data

What time do people rent share bikes in San Jose? Houston and a group of data scientists have looked at bike share data in California and made some curious observations at our April unhackathon.

We also heard from Nick Lam-wai, who is building a database of Hong Kong’s budget, the blueprint of government spending and priorities. Chris Choy, who was working with Nick, also worked out how to take historical PDFs of the budget and read the tables into Nick’s database. Expect big things from this group.

Our second meetup at Accellerate in Sheung Wan started with a discussion of the CatBoost library by Daniil Chepenko, who explained its benefits over other methods such as random forest.

CatBoost is a gradient boosting library built on decision trees, developed by the Russian search engine Yandex and drawing on many years of development in this field.
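For readers new to the idea, here is a toy sketch of gradient boosting on decision trees, written in plain Python. It is not CatBoost's API and not what Daniil presented: the function names, the single-feature stumps, the squared-error loss and the example data are all assumptions chosen to keep the illustration short. Each round fits a one-split tree to the current residuals and adds a small fraction of its prediction to the model.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a stump) to the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # All inputs identical: fall back to a constant prediction.
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Made-up data with a jump between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = fit_boosted(xs, ys)

baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [predict_boosted(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

A production library such as CatBoost handles many features, categorical variables and other loss functions; the sketch only shows the core loop of fitting trees to residuals.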

See his presentation video below, and follow the slides here.

Projects

Willis sought to find out what makes a Kickstarter project work. He came to the hackathon with data from 2009-2017, and a trained model with 60% accuracy, up from 30% at the beginning of his work. Knowing whether a Kickstarter will succeed is a huge investment advantage, so watch the short videos to see how well he did.

Pitch:

Conclusion:

Elizabeth Briel and Ben Davis have been seeking new ways to tell the story of global warming’s effects on Arctic sea ice, and came to the hackathon with data they wanted to turn into a song. See the results below.

Slides are here.

Pitch:

Conclusion:

Nick Lam-wai created a thorough database of the Hong Kong budget, turning it from a human-readable collection of documents back into one ready for machine analysis.

Slides here.

Pitch:

Conclusion:

Data science news roundup

Our tight-knit community of data scientists has shared a wealth of news and inspiring projects from around the web over the past couple of months. Here is a brief roundup of the more interesting articles, and remember, you can join in on our Slack group.


Millions of Chinese farmers reap benefits of huge crop experiment

This article demonstrates the world-changing potential of evidence-based approaches to the world’s problems. For me, it’s also a reminder that it’s often not the latest buzzwords or the most glamorous topics that have the most impact.

Winning with Data Science

Next is an article examining the business and organisational side of data science. This is a topic that probably doesn’t get enough attention compared to the latest and coolest algorithm. It’s important for data scientists to take an interest in how organisations should adapt; if they don’t, it will probably be decided by someone not qualified to make the decision!


What Comes After Deep Learning?

This article examines whether deep learning is actually a blind alley and considers what new approaches might be next for data science. It also briefly examines the question of the US versus China in the AI “arms race”.

‘Who’s Leading AI’ Isn’t the Intelligent Question

Our final article explores the much-talked-about question of whether the US or China is winning, and why it’s not the right question to ask.

If you found any of these articles interesting then do come and join the discussion on our Slack group, where you will also find details of meetups. https://datasciencehk.slack.com/