Monthly un-hackathon + talks

Join our Data Science un-hackathon on the first Sunday of every month. We host a full day of fun and working together on your data science projects!

At this event attendees will have the chance to pitch their projects, or join with others. We start the day with some fantastic industry specialists who will share their experiences operating in the data science field.

The event will be held at the South China Morning Post offices at Times Square on the first Sunday of the month.

Signup at: Eventbrite, Meetup, Facebook

Schedule of events:

9.30am – Arrive, registration
10am – Welcome
10.15am – Talks begin
11.30am – Pitch session, recruitment
12pm – Work on projects
5.30 pm – Present results of work session

Location:

SCMP: 20/f, Tower 1, Times Square, 1 Matheson St, Causeway Bay

Requirements:

Laptop / charger for those joining the coding
Prepared data and project pitches for those submitting projects
If presenting, send us your presentation slides ahead of time so we can prepare them.
HK$50 in cash for the space rental

Recommendations for project submissions:

Send us your presentation slides! Send a link to one of the organisers on Slack or by another channel. We want to minimise time spent switching laptops, so we will run your slides from our PC.
Prepare data in advance as much as you can; spending the day cleaning or retrieving data won’t attract crowds of data scientists! Contact the organisers if you need a data repository to share data with all your team members.
If the project is already underway, prepare an introduction to it so that people can join. If you’re presenting slides, send them to us before you arrive. Make sure the task you propose is feasible within the time of the event, and describe the skills you expect your team to have: R or Python? AWS? Spark?

For final presentations:

Start writing the final presentation right from the start and add elements little-by-little all day long. Articulate the reason you want to do the project, and the solution. Make it understandable to everyone.
If you wish, your work will be published on this website with your bio, name, etc.

Other details:

50 participants max
Food/drink: Only water, coffee and tea are provided. Attendees can order their own food to the venue, take a break to find a restaurant nearby or bring their own lunch.
Price: HK$50. We charge a fee to cover organisation costs. We are a not-for-profit organisation and will aim to keep the costs of our events as low as possible to make it accessible to all.

Unhackathon #10 roundup

Our first unhackathon after Typhoon Mangkhut blew out our September meetup kicked off with a talk about a practical application of machine learning, followed by an introduction to the benefits of the L0 norm.

Two teams attempted to satisfy their curiosity with analysis of the Hong Kong housing market and a new cryptocurrency blockchain.

Our events happen monthly on Sundays, including talks from industry leaders and practitioners of data science, followed by a pitch session and group work on projects for the afternoon. You can find out more and stay up to date with our next events on our Slack channel, our Meetup page and on Facebook.

The talks

South China Morning Post data engineer Jonathan Barone introduced us to project Dali, a tool under development at the 115-year-old newspaper that is intended to catalogue elements in images, potentially identifying faces and places from the media company’s archive of images.

You can see his talk in the video and slides below.

Dali slides can be read here.

Robert Porsch showed us an alternative way to regularise parameters by using the L0 (L-zero) norm.
He demonstrated that this new penalisation function is able to outperform more traditional approaches, such as the L1 norm, given a large enough sample size.
He has applied this method to predict the genetic risk for health outcomes and behavioural traits.
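
To make the difference between the two penalties concrete, here is a minimal Python sketch. It is an illustration only, not code from Robert's talk: the function names, the example weight vectors and the penalty strength lam are all assumptions chosen for this example. The L0 penalty counts how many parameters are non-zero, while the L1 penalty sums their absolute values.

```python
def l0_norm(weights):
    """Number of non-zero weights (the L0 'norm')."""
    return sum(1 for w in weights if w != 0)


def l1_norm(weights):
    """Sum of absolute weights (the L1 norm)."""
    return sum(abs(w) for w in weights)


def penalised_loss(data_loss, weights, lam, norm):
    """Data loss plus lam times the chosen penalty on the weights."""
    return data_loss + lam * norm(weights)


# Two hypothetical weight vectors with the same total magnitude.
sparse = [0.0, 0.0, 3.0, 0.0]      # one large weight
dense = [0.75, 0.75, 0.75, 0.75]   # four small weights

print(l1_norm(sparse), l1_norm(dense))  # 3.0 3.0
print(l0_norm(sparse), l0_norm(dense))  # 1 4
```

Both vectors have the same L1 norm, so an L1 penalty cannot tell them apart, while the L0 penalty favours the vector with fewer non-zero parameters. Because counting non-zeros is not differentiable, practical methods generally optimise a smooth approximation of it.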

Read his slides here.

(Apologies for not including video; we had technical difficulties trying to record it.)

The hackathon session

Two groups formed to tackle real estate and create a new blockchain recommendation system.

Take a look at the videos of their presentations below.

Blockchain recommendation engine

This team set themselves the ambitious target of creating a system that can recommend a spread of cryptocurrencies using a number of existing systems including BigQuery, Docker and others. Check out the outcome in their presentation below.

Real estate analysis

This team started with the transaction history of real estate in Hong Kong and another dataset of stock exchange information. Their results showed which areas had a higher risk of low return on investment, and also showed a correlation between stock market turmoil and housing transactions.

 

Next event

We are looking at dates in early December. Stay tuned on our Slack #events and Facebook for announcements.

 

Un-hackathon #10

Our 10th Hackathon for Data Science: a full day of fun and working together on YOUR data science projects!

At this event attendees will have the chance to pitch their projects, or join other people’s. At the beginning of the day we will host some fantastic industry specialists to share their experiences operating in the data science field.

Signup at: Eventbrite, Meetup, Facebook

The event will be held at the South China Morning Post offices at Times Square

 

Schedule of events:

9.30am – Arrive, registration
10am – Welcome
10.15am – Talks begin
11.30am – Pitch session, recruitment
12pm – Work on projects
5.30 pm – Present results of work session

Location:

SCMP: 20/f, Tower 1, Times Square, 1 Matheson St, Causeway Bay

Requirements:

Laptop / charger for those joining the coding
Prepared data and project pitches for those submitting projects
If presenting, send us your presentation slides ahead of time so we can prepare them.
50HKD in cash for the space rental

Recommendations for project submissions:

Send us your presentation slides! Send a link to one of the organisers on Slack or by another channel. We want to minimise time spent switching laptops, so we will run your slides from our PC.
Prepare data in advance as much as you can; spending the day cleaning or retrieving data won’t attract crowds of data scientists! Contact the organisers if you need a data repository to share data with all your team members.
If the project is already underway, prepare an introduction to it so that people can join. If you’re presenting slides, send them to us before you arrive. Make sure the task you propose is feasible within the time of the event, and describe the skills you expect your team to have: R or Python? AWS? Spark?

For final presentations:

Start writing the final presentation right from the start and add elements little-by-little all day long. Articulate the reason you want to do the project, and the solution. Make it understandable to everyone.
If you wish, your work will be published on this website with your bio, name, etc.

Other details:

50 participants max
Food/drink: Only water, coffee and tea are provided. Attendees can order their own food to the venue, take a break to find a restaurant nearby or bring their own lunch.
Price: 50 HKD. We charge a fee to cover venue and food costs. We are a not-for-profit organisation and will aim to keep the costs of our events as low as possible to make it accessible to all.

Unhackathon #9 roundup

With the World Cup in Russia wrapping up on the same evening as our ninth un-hackathon, football was on our minds, and, with tongue in cheek, Data Science Hong Kong co-organiser Xavier put the question to our cohort of predictive modellers to find the winner, hours before the result was known.

We were asked to build a predictive model for the World Cup result, but our vote worked well enough.

Getting down to more serious stuff, Houston Ho presented his company’s work on using machine learning to predict whether an employee is due to leave their position. His tool aims to give human resources teams a score for each employee based on a variety of characteristics. His model is achieving 80% accuracy and he says he can achieve more.

See his presentation below.

DSHK co-organiser Guy Freeman also presented his new project, a central repository for scraped data built on an open source philosophy. He showed the system’s potential using a dataset of property transactions in Hong Kong spanning 20 years.

See his presentation below.

Group projects

Michael attracted the most interest among the group in his restaurant prediction model project. Using data from two restaurant booking systems, he aimed to predict how busy a restaurant would get using a machine learning model.

See the results from the group’s work in the slides below.

Forecasting Visitors of Restaurants

Visiting from Tokyo, Suzana Ilic brought exciting skills to the unhackathon, and decided to set her sights on unpicking hype in the crypto space. She was a bit camera shy, so there is no video, but you can see her slides below.

Quantifying Hype


Morris Wong aimed to build an auto-tagging system for publishing, along the lines of taggernews. This tech could have wide-scale application when it’s up and running. See his presentation below.

Pocket exploration

That’s it for this month’s event. We will be announcing our next event shortly.

Unhackathon #8

Our 8th Hackathon for Data Science: a full day of fun and working together on YOUR data science projects!

At this event attendees will have the chance to pitch their projects, or join other people’s. At the beginning of the day we will host some fantastic industry specialists to share their experiences operating in the data science field.

Signup at: Eventbrite, Meetup, Facebook


Schedule of events:

9.30am – Arrive, registration
10am – Welcome
10.15am – Talks begin
11.30am – Pitch session, recruitment
12pm – Work on projects
5.30 pm – Present results of work session

Location:
16F, 40-44 Bonham Strand, Sheung Wan, Hong Kong

Requirements:

Laptop / charger for those joining the coding
Prepared data and project pitches for those submitting projects
If presenting, send us your presentation slides ahead of time so we can prepare them
50HKD in cash for the space rental

Recommendations for project submissions:

Send us your presentation slides! Send a link to one of the organisers on Slack or by another channel. We want to minimise time spent switching laptops, so we will run your slides from our PC.
Prepare data in advance as much as you can; spending the day cleaning or retrieving data won’t attract crowds of data scientists! Contact the organisers if you need a data repository to share data with all your team members.
If the project is already underway, prepare an introduction to it so that people can join. If you’re presenting slides, send them to us before you arrive. Make sure the task you propose is feasible within the time of the event, and describe the skills you expect your team to have: R or Python? AWS? Spark?

For final presentations:

Start writing the final presentation right from the start and add elements little-by-little all day long. Articulate the reason you want to do the project, and the solution. Make it understandable to everyone.
If you wish, your work will be published on this website with your bio, name, etc.

Other details:

50 participants max
Food/drink: Only water, coffee and tea are provided. Attendees can order their own food to the venue, take a break to find a restaurant nearby or bring their own lunch.
Price: 50 HKD. We charge a fee to cover venue and food costs. We are a not-for-profit organisation and will aim to keep the costs of our events as low as possible to make it accessible to all.

Unhackathon #7 round-up: making sense through data

What time do people rent share bikes in San Jose? Houston and a group of data scientists looked at bike share data in California and made some curious observations at our April unhackathon.

We also heard from Nick Lam-wai, who is building a database of Hong Kong’s budget, the blueprint of government spending and priorities. Chris Choy, who was working with Nick, discovered how to take historical PDFs of the budget and read the tables into Nick’s database. Expect big things from this group.

Our second meetup at Accellerate in Sheung Wan started with a discussion of the CatBoost library by Daniil Chepenko, who explained its benefits over other methods such as random forest.

CatBoost is a gradient boosting library based on decision trees, developed by the Russian search company Yandex and building on many years of development in this field.
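
For readers new to the idea, gradient boosting builds a model by adding many small decision trees, each fitted to the errors left by the trees before it. The sketch below shows that loop in plain Python with one-split trees ("stumps") on a single feature and squared error. It is a teaching illustration under our own assumptions, not CatBoost's actual algorithm or API; the function names, toy data and learning rate are all made up for the example.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return t, left_mean, right_mean


def fit_boosting(xs, ys, rounds=50, lr=0.1):
    """Fit a boosted ensemble of stumps; needs at least two distinct x values."""
    base = sum(ys) / len(ys)          # start from the mean prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        # Take a small step (lr) in the direction the new stump suggests.
        preds = [p + lr * (left_mean if x <= t else right_mean)
                 for x, p in zip(xs, preds)]
    return base, lr, stumps


def predict(model, x):
    """Sum the base prediction and the scaled output of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lm if x <= t else rm) for t, lm, rm in stumps)


# Toy data: a step from 1 to 5 between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = fit_boosting(xs, ys)
print(predict(model, 2), predict(model, 5))
```

Each round only nudges the predictions a little, which is why many rounds are needed; real libraries add deeper trees, multiple features, regularisation and much more on top of this loop.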

See his presentation video below, and follow the slides here.

Projects

Willis sought to find out what makes a Kickstarter project work. He came to the hackathon with data from 2009-2017, and a trained model with 60% accuracy, up from 30% at the beginning of his work. Knowing whether a Kickstarter will succeed is a huge investment advantage, so watch the short videos to see how well he did.

Pitch:

Conclusion:

Elizabeth Briel and Ben Davis have been seeking new ways to tell the story of global warming’s effects on Arctic sea ice, and came to the hackathon with data they wanted to turn into a song. See the results below.

Slides are here.

Pitch:

Conclusion:

Nick Lam-wai created a thorough database of the Hong Kong budget, turning it from a human readable collection of documents back into one ready for machine analysis.

Slides here.

Pitch:

Conclusion:

Overwatch strategies revealed with data science

Ram de Guzman presented this analysis of Overwatch team strategies using scraped data from Winston’s Lab (which gathers it directly from game videos). His insight revealed how the best teams in South Korea arranged their teams and fought.

In the video he describes the process of gathering his data, then shows in impressive visualisations how that data relates to actual game strategy.

Watch his talk at our 6th unhackathon in March here:

 

And you can follow his project here.

March 18 Unhackathon #6

Data scientists: we are organising another Un-hackathon in our monthly series. There will be talks and a day of collaborative, hands-on problem solving.

Sign up on our Eventbrite page and stay up to date with upcoming events on our meetup page.

 

Details:

  • 9.30am – Arrive, registration
  • 10am – Welcome
  • 10.15am – Talks begin
  • 11.30am – Pitch session, recruitment
  • 12pm – Work on projects
  • 5.30 pm – Present results of work session

Location:

16F, 40-44 Bonham Strand, Sheung Wan, Hong Kong

Requirements:

Laptop and charger for those joining the coding.
Prepared data and project pitches for those submitting projects.
If presenting, send us your presentation slides ahead of time so we can prepare them.
50HKD in cash for admin and organisation.

Recommendations for project submissions:

Prepare data in advance as much as you can; spending the day cleaning or retrieving data won’t attract crowds of data scientists! Contact the organisers if you need a data repository to share data with all your team members.
If the project is already underway, prepare an introduction to it so that people can join (if you’re presenting slides, send them to us before you arrive). Make sure the task you propose is feasible within the time of the event, and describe the skills you expect your team to have: R or Python? AWS? Spark?

For final presentations:

Start writing the final presentation right from the start and add elements little by little all day long. Recall the context of the project and structure the presentation so that it is understandable to the non-specialists around you.
If you wish, your work will be published on the website datasciencehongkong.com with your bio, name, etc.

Other details:

50 participants max
Food / drink: Only water, coffee and snacks are provided. Attendees can order their own food to the venue, take a break to find a restaurant nearby or bring their own lunch.
Price: 50 HKD. We charge a fee to cover costs. We are a not-for-profit organisation and will aim to keep the costs of our events as low as possible to make them accessible to all.

Unhackathon #5: Discovering trends in property data and scams in ICOs

Our fifth un-hackathon kicked off the year with 30 eager data scientists attending. The event, at Makerhive in Kennedy Town, combined industry talks with project-based collaboration.

As the mercury dipped outside, the fires of creativity burned bright among our attendees. Five project leaders suggested projects to focus the skills of our data scientists on uncovering new insights into the property market in Hong Kong, with a 1.6 million row record of transactions over the past 20 years. Another project aimed to discover whether public data can spot a scam initial coin offering, or ICO.

Presentations


Pranav Agrawal, a Hong Kong University of Science and Technology student, presented a code tutorial on multi-layer perceptrons in PyTorch. The in-depth, code-centric tutorial took us step by step through the process. Links to the materials are below:

Github

Presentation


Hang Xu presented his method of looking at DNA using the word-to-vector (word2vec) model. He said that adapting word2vec to analyse DNA outperformed the best results from the current approach, which represents DNA using one-hot vectors.
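
As a rough illustration of the two representations, the Python sketch below one-hot encodes a short DNA string and also splits it into overlapping k-mers, which is one common way to turn a sequence into "words" that a word2vec-style model can be trained on. This is not Hang's code: the k-mer tokenisation, the value k=3 and the function names are assumptions made for the example.

```python
BASES = "ACGT"


def one_hot(sequence):
    """Encode each base as a 4-element vector with a single 1."""
    return [[1 if base == b else 0 for b in BASES] for base in sequence]


def kmers(sequence, k=3):
    """Split a sequence into overlapping 'words' of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


seq = "ACGTAC"  # hypothetical sequence for illustration

print(one_hot(seq[:3]))  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
print(kmers(seq))        # ['ACG', 'CGT', 'GTA', 'TAC']
```

A one-hot vector says only which base is present, whereas an embedding model trained on the k-mer "sentences" can learn dense vectors in which k-mers that appear in similar contexts end up close together.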

Presentation

Projects

  • Guy’s property data analysis
  • Jenson’s ICO scam detector
  • Kirill’s Ansible machine learning speed booster

Property data analysis

Using a 1.2 GB table of 1.6 million property transactions in Hong Kong, from 1997 to today, this group looked for trends and insights in the property market. One of the central questions was how fast property prices were growing relative to wage growth in the city.

They found some bargains, even in the current market. See their presentation with their findings.

Ansible speed boosting for NumPy and R

A lot of machine learning tools depend on matrix manipulation libraries such as NumPy. In a basic configuration, NumPy uses the CPU for linear algebra computations such as matrix multiplication, SVD or eigenvalue decomposition. OpenBLAS speeds up these computations 4-10x via Fortran bindings.

Github

See their presentation here.

Is this ICO a scam?

The group pulled a list of over 1,600 ICOs from the past two years and evaluated their value, asking whether it was possible to establish if a given ICO is a scam. The second step was to gather the return on investment for each ICO and the country it was reported to have come from.

See their presentation and findings here.

Job explorer

Morris Wong worked on scraping a dataset to build a structured system to help jobseekers vet a company before joining. Using stealjobs.com data, he aims to build an explorer in the shape of GitXplore using four metrics: income, working hours, promotion prospects and happiness. The data is user generated.

See you all at our next event in March.

 

Un-hackathon #5 – February 4

We have a new event on February 4 at the Hive.
At this event attendees will have the chance to pitch their projects, or join other people’s. At the beginning of the day we will host some fantastic industry specialists to share their experiences operating in the data science field.
Please sign up to our Eventbrite ticket site to secure your place.

Speakers:

  • Pranav Agrawal, HKUST student researching morphological natural language processing.
    This talk will focus on the basics of PyTorch, briefly covering how to set up PyTorch and implement a basic neural network and a convolutional neural network. Basic knowledge of deep learning is assumed.
  • Hang Xu, PhD Candidate, will speak about his project: Application of word2vec to represent biological sequences

Schedule of events:

9.30am – Arrive, registration
10am – Welcome
10.15am – Talks begin
11.30am – Pitch session, recruitment
12pm – Work on projects
5.30 pm – Present results of work session

Location:

10/F Cheung Hing Industrial Building
12P Smithfield Road, Kennedy Town

See you on the 4th!