Un-hackathon #10

Our 10th Hackathon for Data Science: a full day of fun and working together on YOUR data science projects!

At this event, attendees will have the chance to pitch their own projects or join other people’s. At the beginning of the day, we will also host some fantastic industry specialists, who will share their experiences of working in the data science field.

Sign up at: Eventbrite, Meetup, Facebook

The event will be held at the South China Morning Post offices at Times Square

 

Schedule of events:

9.30am – Arrive, registration
10am – Welcome
10.15am – Talks begin
11.30am – Pitch session, recruitment
12pm – Work on projects
5.30pm – Present results of work session

Location:

SCMP: 20/f, Tower 1, Times Square, 1 Matheson St, Causeway Bay

Requirements:

Laptop and charger for those joining the coding
Prepared data and project pitches for those submitting projects
If presenting, send us your presentation slides ahead of time so we can prepare them.
50 HKD in cash for the space rental

Recommendations for project submissions:

Send us your presentation slides! Send a link to one of the organisers via Slack or any other channel. To minimise time spent switching laptops, we will run your slides from our PC.
Prepare data in advance as much as you can; spending the day cleaning or retrieving data won’t draw crowds of data scientists! Contact the organisers if you need a data repository to share data with your team members.
If the project is already underway, prepare an introduction to it so that people can join. Make sure the task you propose is feasible within the time of the event, and describe the skills you expect your team to have: R or Python? AWS, Spark? etc.

For final presentations:

Start writing your final presentation at the beginning of the day and add to it little by little as you go. Explain why you wanted to do the project and what your solution is. Make it understandable to everyone.
If you wish, we will publish your work on this website along with your name, bio, etc.

Other details:

50 participants max
Food/drink: Only water, coffee and tea are provided. Attendees can have their own food delivered to the venue, take a break to find a restaurant nearby or bring their own lunch.
Price: 50 HKD. We charge a fee to cover venue and food costs. We are a not-for-profit organisation and aim to keep the cost of our events as low as possible so that they are accessible to all.

Unhackathon #9 roundup

With the World Cup in Russia wrapping up on the same evening as our ninth un-hackathon, football was on our minds. With tongue firmly in cheek, Data Science Hong Kong co-organiser Xavier challenged our cohort of predictive modellers to pick the winner, hours before the result was known.

We were asked to build a predictive model for the World Cup result, but our vote worked well enough.

Getting down to more serious matters, Houston Ho presented his company’s work on using machine learning to predict whether an employee is likely to leave their position. His tool aims to give human resources teams a score for each employee based on a variety of characteristics. His model currently achieves 80% accuracy, and he believes it can go higher.

See his presentation below.

DSHK co-organiser Guy Freeman also presented his new project: a central repository for scraped data, built on an open-source philosophy. He demonstrated the system’s potential using a dataset of Hong Kong property transactions spanning 20 years.

See his presentation below.

Group projects

Michael’s restaurant prediction project attracted the most interest from the group. Using data from two restaurant booking systems, he aimed to build a machine learning model to predict how busy a restaurant would get.

See the results from the group’s work in the slides below.

Forecasting Visitors of Restaurants

Visiting from Tokyo, Suzana Ilic brought exciting skills to the unhackathon and set her sights on unpicking the hype in the crypto space. She was a little camera-shy, so there is no video, but you can see her slides below.

Quantifying Hype


Morris Wong aimed to build an auto-tagging system for publishing, along the lines of taggernews. This technology could have wide-scale applications once it is up and running. See his presentation below.

Pocket exploration

That’s it for this month’s event. We will be announcing our next event shortly.

Unhackathon #8

Our 8th Hackathon for Data Science: a full day of fun and working together on YOUR data science projects!

At this event attendees will have the chance to pitch their projects, or join other people’s. And in the beginning of the day we will host some fantastic industry specialists to share their experiences operating in the data science field.

Signup at: Eventbrite, Meetup, Facebook

naked-hub

Schedule of events:

9.30am – Arrive, registration
10am – Welcome
10.15am – Talks begin
11.30am – Pitch session, recruitment
12pm – Work on projects
5.30pm – Present results of work session

Location:
16F, 40-44 Bonham Strand, Sheung Wan, Hong Kong

Requirements:

Laptop and charger for those joining the coding
Prepared data and project pitches for those submitting projects
If presenting, send us your presentation slides ahead of time so we can prepare them.
50 HKD in cash for the space rental

Recommendations for project submissions:

Send us your presentation slides! Send a link to one of the organisers via Slack or any other channel. To minimise time spent switching laptops, we will run your slides from our PC.
Prepare data in advance as much as you can; spending the day cleaning or retrieving data won’t draw crowds of data scientists! Contact the organisers if you need a data repository to share data with your team members.
If the project is already underway, prepare an introduction to it so that people can join. Make sure the task you propose is feasible within the time of the event, and describe the skills you expect your team to have: R or Python? AWS, Spark? etc.

For final presentations:

Start writing your final presentation at the beginning of the day and add to it little by little as you go. Explain why you wanted to do the project and what your solution is. Make it understandable to everyone.
If you wish, we will publish your work on this website along with your name, bio, etc.

Other details:

50 participants max
Food/drink: Only water, coffee and tea are provided. Attendees can have their own food delivered to the venue, take a break to find a restaurant nearby or bring their own lunch.
Price: 50 HKD. We charge a fee to cover venue and food costs. We are a not-for-profit organisation and aim to keep the cost of our events as low as possible so that they are accessible to all.

Unhackathon #7 round-up: making sense through data

What time do people rent share bikes in San Jose? Houston and a group of data scientists looked at bike share data from California and made some curious observations at our April unhackathon.

We also heard from Nick Lam-wai, who is building a database of Hong Kong’s budget, the blueprint of government spending and priorities. Chris Choy, who was working with Nick, also discovered how to take historical PDFs of the budget and read their tables into Nick’s database. Expect big things from this group.

Our second meetup at Accellerate in Sheung Wan started with a discussion of the CatBoost library by Daniil Chepenko, who explained its benefits over other methods such as random forests.

CatBoost is a library for gradient boosting on decision trees, developed by the Russian search company Yandex and building on many years of development in this field.
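For readers new to the idea, here is what "gradient boosting on decision trees" means in its simplest form: each new depth-one tree (a stump) is fitted to the residuals left by the trees before it, and its prediction is added in with a small learning rate. This is a minimal plain-Python illustration assuming a single numeric feature and squared-error loss. It is not CatBoost's algorithm, which adds refinements such as ordered boosting and native handling of categorical features, and the function names are our own.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a depth-one tree: the single threshold split on x that best fits the residuals."""
    best = None
    # Candidate thresholds: every distinct x except the largest, so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = sum((r - left_value) ** 2 for r in left) + sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    if best is None:  # all x values identical: no split possible, predict the mean residual
        constant = mean(residuals)
        return lambda x: constant
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss, using stumps as weak learners."""
    base = mean(ys)  # start from a constant prediction
    stumps = []
    predictions = [base] * len(ys)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    return predict
```

For example, `model = gradient_boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])` gives `model(1)` close to 1 and `model(6)` close to 5; the small learning rate makes each tree correct only part of the remaining error.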

See his presentation video below, and follow the slides here.

Projects

Willis set out to find what makes a Kickstarter project succeed. He came to the hackathon with data from 2009 to 2017 and a trained model with 60% accuracy, up from 30% when he started. Knowing whether a Kickstarter campaign will succeed is a huge investment advantage, so watch the short videos to see how he got on.

Pitch:

Conclusion:

Elizabeth Briel and Ben Davis have been seeking new ways to tell the story of global warming’s effects on Arctic sea ice, and came to the hackathon with data they wanted to turn into a song. See the results below.

Slides are here.

Pitch:

Conclusion:

Nick Lam-wai created a thorough database of the Hong Kong budget, turning a collection of human-readable documents into data ready for machine analysis.

Slides here.

Pitch:

Conclusion:

Overwatch strategies revealed with data science

Ram de Guzman presented this analysis of Overwatch team strategies, using data scraped from Winston’s Lab (which gathers it directly from game videos). His analysis revealed how the top teams in South Korea composed their line-ups and fought.

In the video he describes the process of gathering his data, then shows in impressive visualisations how that data relates to actual game strategy.

Watch his talk at our 6th unhackathon in March here:

 

And you can follow his project here.

Hong Kong property data can work for you: here’s how

For most people, buying a property will be the biggest purchase they ever make. So what’s the smart way for a data scientist to find the best flat deal in Hong Kong? Scrape and hack the market, of course.

Typical home buyers may wear out a lot of shoe leather visiting dozens of real estate agents, spend hours on websites looking at individual listings, and maybe even start a spreadsheet.

But using a bit of data, or in this case around 1.65 million transaction records, we start to see through the sales pitches and get a feel for the total market.

Alternatively, there are many websites offering dozens of pages of listings as research fodder.

Only 90 pages, no biggie

What’s missing is a tool that lets you view actual sale prices across Hong Kong in a single user interface. That’s what we developed during Data Science Hong Kong’s fifth unhackathon, using data kindly donated by Hong Kong’s premier open data platform, dataguru.hk.

A market exploration tool

Using a dataset of 1.65 million transactions scraped from the site of one of Hong Kong’s main real estate agencies, this tool (pictured below) shows the average price per square foot by location and time.
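As a rough illustration of the aggregation behind this view, here is a minimal Python sketch that averages price per square foot for each building and year. The record fields (`address`, `date`, `price_hkd`, `area_sqft`) are hypothetical names for illustration, not the actual schema of the scraped dataset, and the sketch averages per-transaction prices per square foot, which is one of several reasonable definitions.

```python
from collections import defaultdict


def price_per_sqft_by_building_year(transactions):
    """Average price per square foot and sale count for each (address, year) pair.

    Field names are hypothetical: address, date (ISO string), price_hkd, area_sqft.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (address, year) -> [sum of HK$/sq ft, count]
    for t in transactions:
        if not t.get("area_sqft"):  # skip records with a missing or zero floor area
            continue
        year = int(t["date"][:4])
        key = (t["address"], year)
        totals[key][0] += t["price_hkd"] / t["area_sqft"]
        totals[key][1] += 1
    return {
        key: {"avg_price_per_sqft": total / count, "sales": count}
        for key, (total, count) in totals.items()
    }
```

Each resulting entry is what one circle on the map needs: a building, a year, an average price and a number of sales.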

A visual representation of property price per square foot in 2017

Reading the data

Each circle represents a building, colour-coded and shaded darker to indicate a higher price per square foot. The more sales registered in the user’s chosen time period, the larger the circle.

With this tool, understanding the market becomes much easier and more intuitive. In a single interface, all the actual transactions can be summarised and filtered by attributes such as price, size, address or sale date: for example, flats between 300 and 400 square feet.
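A filter like the 300 to 400 square feet example can be sketched as a simple predicate over the transaction records. This minimal Python version assumes hypothetical record fields `date` (an ISO date string) and `area_sqft`; the real tool does its filtering in the browser, so this only illustrates the logic.

```python
def filter_transactions(transactions, min_area=None, max_area=None, year=None):
    """Keep transactions whose floor area and sale year fall within the given bounds.

    Any bound left as None is not applied. Field names are hypothetical.
    """
    selected = []
    for t in transactions:
        area = t.get("area_sqft")
        if (min_area is not None or max_area is not None) and area is None:
            continue  # an area filter cannot be checked without a floor area
        if min_area is not None and area < min_area:
            continue
        if max_area is not None and area > max_area:
            continue
        if year is not None and int(t["date"][:4]) != year:
            continue
        selected.append(t)
    return selected
```

For instance, `filter_transactions(records, min_area=300, max_area=400, year=2017)` keeps only 2017 sales of flats between 300 and 400 square feet, inclusive.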

At first glance, the colours of the markers suggest that Central and the western part of Hong Kong Island are the most expensive areas, followed by Tsim Sha Tsui and the southern Kowloon peninsula.

Looking closer at the northern edge of Kowloon Park, we can set the transaction year to 2017. Hovering over a property displays a summary: here we can see that three flat sales took place at 10-24 Parkes Street (known as Wing Fu Mansion), at an average price of under HK$14,400 per square foot.

Sales in the north of Kowloon Park in 2017 

The following animation shows how easy it is to navigate through the districts and market history, and get a clear and visual idea of prices in the area.

Only the beginning for our property search tool

You can try it yourself by following this link.

How to get the data

To build this tool, we first need to collect the data.

How the original data was presented online

This dataset was scraped using the Python web-scraping framework Scrapy. For most transactions, it includes the flat, floor area, floor and building address and, most importantly, the sale prices and dates.

One transaction in JSON format
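If the spider's output is exported as JSON Lines (one JSON object per line, a format Scrapy's feed exports support, e.g. `scrapy crawl <spider> -o transactions.jl`), loading it back for cleaning takes only a few lines of Python. This sketch is illustrative: the fields it checks (`price_hkd`, `date`) are hypothetical, since the team's actual export format is not described here.

```python
import json


def load_transactions(lines):
    """Parse JSON Lines text (one transaction object per line) into a list of dicts.

    `lines` can be an open file or any iterable of strings.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline at the end of the file
        record = json.loads(line)
        # Keep only records carrying the fields the map needs (hypothetical field names).
        if record.get("price_hkd") is not None and record.get("date"):
            records.append(record)
    return records
```

Typical use would be `with open("transactions.jl", encoding="utf-8") as f: records = load_transactions(f)`.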

After cleaning the data, our team began looking at prices and transaction volumes, including outliers. You can see a presentation of the data here.
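The write-up does not say how outliers were identified. One common, simple choice is Tukey's fences: flag values more than 1.5 interquartile ranges below the first quartile or above the third. The sketch below applies that rule to a list of prices per square foot; treat it as one possible approach rather than the team's method.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    Needs at least two values; uses the default 'exclusive' quartile method.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

A suspiciously high price per square foot (for example a data-entry error with an extra digit) stands out immediately, while ordinary variation between buildings stays inside the fences.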

Collecting geolocation data

To be able to correctly place a flat on a map, we need its geolocation, consisting of a longitude and latitude pair. The only location data we have is the address, so we can use the Google Maps API to convert the addresses to lat/lon pairs.
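For reference, a request to Google's Geocoding API is an HTTPS GET to the `geocode/json` endpoint with `address` and `key` parameters, and each match in the JSON response carries its coordinates under `geometry.location`. The sketch below only builds the URL and parses an already-decoded response, so it runs offline; fetching (for example with `urllib.request.urlopen`) and error handling are left out, and `api_key` stands for your own key.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, api_key):
    """Build a Google Geocoding API request URL for one address."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def parse_geocode_response(payload):
    """Extract (lat, lng) from a decoded Geocoding API response, or None if nothing matched."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]  # first (best) match
    return location["lat"], location["lng"]
```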

However, Google caps free API calls at 2,500 addresses and charges 50 cents for each additional 1,000 requests. That is small change, but fortunately it isn’t necessary in Hong Kong, as other organisations also offer this service.
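Since many transactions share a building, geocoding each distinct address once, rather than each of the 1.65 million records, keeps the request count (and any charge) down. The sketch below deduplicates addresses and estimates the cost using the figures quoted above (2,500 free requests, then 50 cents per 1,000), pro-rated; actual quotas and billing rules may differ from this simple model, and the `address` field name is hypothetical.

```python
FREE_REQUESTS = 2500        # free allowance quoted in the article
PRICE_PER_1000_USD = 0.50   # charge per additional 1,000 requests quoted in the article


def unique_addresses(transactions):
    """Geocode each building once: collapse transactions to their distinct addresses."""
    return sorted({t["address"] for t in transactions})


def geocoding_cost_usd(n_requests):
    """Estimated charge for n requests beyond the free allowance (pro-rated per request)."""
    extra = max(0, n_requests - FREE_REQUESTS)
    return extra / 1000 * PRICE_PER_1000_USD
```

Under this model, 12,500 unique addresses would cost about US$5, which matches the article's point that the charge is small change.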

The OFCA government website, used to check whether a building has digital television or fibre-optic broadband, also returns the geolocation of the searched address. We automatically queried it for every block address. The screenshots below verify that an address was resolved correctly.

Geolocation results for 103 Robinson Road
Google Map results for this geolocation
Street View of the result: building #103

After joining transactions and geolocations, we can start building the tool.
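Joining the two datasets can be as simple as a dictionary lookup keyed on the address string. In this minimal sketch, `geolocations` is a hypothetical mapping from address to a `(lat, lon)` pair; transactions whose address was not resolved are returned separately so they can be checked rather than silently dropped. In practice, addresses usually need normalising first so that the keys match exactly.

```python
def join_geolocations(transactions, geolocations):
    """Attach lat/lon to each transaction by address; return (located, unmatched)."""
    located, unmatched = [], []
    for t in transactions:
        coords = geolocations.get(t["address"])
        if coords is None:
            unmatched.append(t)  # address not resolved: keep it aside for inspection
        else:
            lat, lon = coords
            located.append({**t, "lat": lat, "lon": lon})  # copy, leaving the input untouched
    return located, unmatched
```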

Building the visualization tool

The main steps to build this tool are:

  • Creating a basic HTML template with text blocks and containers for the title, caption, transaction-year filter and popup text
  • Loading the map from the Esri JavaScript API, with parameters set to focus on Hong Kong
  • Loading the transaction data and looping over it. For each year and address, we compute the circle parameters:
    • The radius is a linear function of the square root of the number of transactions. With a zero intercept, the circle's area is then proportional to the transaction count.
    • The colours (red and green channels) are simple linear functions of the price.
  • Managing user events: when the user hovers the mouse over a point, a JavaScript function is called to display the data related to that point. When the mouse leaves, the data is cleared.
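The circle encoding described in this list can be sketched in a few lines. The real tool runs in the browser on the Esri JavaScript API; this Python version only mirrors the arithmetic, assuming a zero intercept for the radius (so area is proportional to the number of sales) and a hypothetical green-to-red colour ramp between a chosen minimum and maximum price per square foot. The `scale` value is an arbitrary placeholder.

```python
import math


def circle_radius(n_sales, scale=3.0):
    """Radius proportional to sqrt(sales), so circle area is proportional to the sale count."""
    return scale * math.sqrt(n_sales)


def circle_colour(price_per_sqft, min_price, max_price):
    """Map price linearly onto the red and green channels: cheap -> green, expensive -> red."""
    if max_price == min_price:
        t = 0.5  # degenerate range: use the midpoint colour
    else:
        t = (price_per_sqft - min_price) / (max_price - min_price)
        t = min(1.0, max(0.0, t))  # clamp prices outside the chosen range
    red = round(255 * t)
    green = round(255 * (1 - t))
    return (red, green, 0)
```

Using the square root matters: if the radius grew linearly with the sale count, a building with four times as many sales would get a circle with sixteen times the area and visually overstate busy buildings.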

Taking the next steps

Various improvements to this first version can be implemented:

  • Some data quality issues have been raised (missing flat areas, block addresses, etc.) and need to be corrected
  • The decision of buying a flat includes other factors that can be incorporated into this visualization tool. For instance, car traffic, public transport options, school networks (some of which are already included in the data), average pollution levels, altitude/topography
  • Adding current flat sale offers in the vicinity to help find the best deals

If you have any questions or want to explore the data further, don’t hesitate to send a message in the Slack group: datasciencehk.slack.com

Credits:

dataguru.hk for the data and support
Thanks go to the Data Science Hong Kong organizers for this event.

Data science news round up

Our tight-knit community of data scientists has shared a wealth of news and inspiring projects from around the web over the past couple of months. Here is a brief roundup of the more interesting articles; remember, you can join in on our Slack group.


Millions of Chinese farmers reap benefits of huge crop experiment

An article that demonstrates the world-changing potential of evidence-based approaches to the world’s problems. For me, it is also a reminder that it is often not the latest buzzword or the most glamorous topic that has the most impact.

Winning with Data Science

Next is an article examining the business and organisational side of data science, a topic that probably gets less attention than the latest and coolest algorithm. It’s important for data scientists to take an interest in how organisations should adapt; if they don’t, the decision will probably be made by someone not qualified to make it!


What Comes After Deep Learning?

This article examines whether deep learning is actually a blind alley and considers which new approaches might come next for data science. It also briefly examines the question of the US versus China in the AI “arms race”.

‘Who’s Leading AI’ Isn’t the Intelligent Question

Our final article explores the much talked about question of whether the US or China is winning and why it’s not the right question to ask.

If you found any of these articles interesting then do come and join the discussion on our Slack group, where you will also find details of meetups. https://datasciencehk.slack.com/

April 15 Unhackathon #7


We are organizing another Un-Hackathon on April 15th! You can sign up here! We have organized a number of talks and a day of collaborative, hands-on problem solving.

Details:

  • 9.30am – Arrive, registration
  • 10am – Welcome
  • 10.15am – Talks begin
  • 11.30am – Pitch session, recruitment
  • 12pm – Work on projects
  • 5.30pm – Present results of work session

Location:

11F, 40-44 Bonham Strand, Sheung Wan, Hong Kong

Requirements:

Laptop and charger for those joining the coding.
Prepared data and project pitches for those submitting projects.
If presenting, send us your presentation slides ahead of time so we can prepare them.
50HKD in cash for admin and organisation.

Recommendations for project submissions:

Prepare data in advance as much as you can; spending the day cleaning or retrieving data won’t draw crowds of data scientists! Contact the organisers if you need a data repository to share data with your team members.
If the project is already underway, prepare an introduction to it so that people can join (if you’re presenting slides, send them to us before you arrive). Make sure the task you propose is feasible within the time of the event, and describe the skills you expect your team to have: R or Python? AWS, Spark? etc.

For final presentations:

Start writing your final presentation at the beginning of the day and add to it little by little as you go. Recap the context of the project and structure your presentation so that it is understandable to a non-specialist audience.
If you wish, we will publish your work on the website datasciencehongkong.com along with your name, bio, etc.

Other details:

50 participants max
Food/drink: Only water, coffee and snacks are provided. Attendees can have their own food delivered to the venue, take a break to find a restaurant in Kennedy Town or bring their own lunch.
Price: 50 HKD. We charge a fee to cover costs. We are a not-for-profit organisation and aim to keep the cost of our events as low as possible so that they are accessible to all.

March 18 Unhackathon #6

Data scientists: we are organising another Un-hackathon in our monthly series. There will be talks and a day of collaborative, hands-on problem solving.

Sign up on our Eventbrite page and stay up to date with upcoming events on our meetup page.

 

Details:

  • 9.30am – Arrive, registration
  • 10am – Welcome
  • 10.15am – Talks begin
  • 11.30am – Pitch session, recruitment
  • 12pm – Work on projects
  • 5.30pm – Present results of work session

Location:

16F, 40-44 Bonham Strand, Sheung Wan, Hong Kong

Requirements:

Laptop and charger for those joining the coding.
Prepared data and project pitches for those submitting projects.
If presenting, send us your presentation slides ahead of time so we can prepare them.
50HKD in cash for admin and organisation.

Recommendations for project submissions:

Prepare data in advance as much as you can; spending the day cleaning or retrieving data won’t draw crowds of data scientists! Contact the organisers if you need a data repository to share data with your team members.
If the project is already underway, prepare an introduction to it so that people can join (if you’re presenting slides, send them to us before you arrive). Make sure the task you propose is feasible within the time of the event, and describe the skills you expect your team to have: R or Python? AWS, Spark? etc.

For final presentations:

Start writing your final presentation at the beginning of the day and add to it little by little as you go. Recap the context of the project and structure your presentation so that it is understandable to a non-specialist audience.
If you wish, we will publish your work on the website datasciencehongkong.com along with your name, bio, etc.

Other details:

50 participants max
Food/drink: Only water, coffee and snacks are provided. Attendees can have their own food delivered to the venue, take a break to find a restaurant in Kennedy Town or bring their own lunch.
Price: 50 HKD. We charge a fee to cover costs. We are a not-for-profit organisation and aim to keep the cost of our events as low as possible so that they are accessible to all.