Un-hackathon #5 – February 4

We have a new event on February 4 at the Hive.
At this event, attendees will have the chance to pitch their projects or join other people’s. At the beginning of the day, we will host some fantastic industry specialists who will share their experiences of working in the data science field.
Please sign up to our Eventbrite ticket site to secure your place.

Speakers:

  • Pranav Agrawal, HKUST student researching morphological natural language processing.
    This talk will cover the basics of PyTorch: setting it up, then implementing a basic neural network and a convolutional neural network. Basic knowledge of deep learning is assumed.
  • Hang Xu, PhD Candidate, will speak about his project: Application of word2vec to represent biological sequences

Schedule of events:

9.30am – Arrive, registration
10am – Welcome
10.15am – Talks begin
11.30am – Pitch session, recruitment
12pm – Work on projects
5.30pm – Present results of work session

Location:

10/F Cheung Hing Industrial Building
12P Smithfield Road, Kennedy Town
See you on the 4th!

Coindex by Xavier M

Xavier M gave a quick overview of a project called Coindex at the December Un-hackathon. The project studies the array of cryptocurrencies from a market point of view, with the aim of devising systematic quantitative strategies that trading bots would execute.

After acknowledging that yes, Bitcoin matched all the criteria that characterise a bubble, Xavier refocused the challenge on building something profitable out of it, whether it’s a bubble or not.

In a talk session that required no programming, quantitative analyst Xavier proposed several applications that research on cryptocurrencies could yield: trading cross-exchange arbitrages, identifying and following trends, and investing in low-frequency trading strategies that provide a return similar to that of an individual trade while mitigating the risk.

Short-term horizon trading

The first category is trading with a short-term horizon, also known as day trading. Xavier showed a simple cross-market arbitrage monitor that tracks, in real time and across six exchanges, opportunities to profit by buying cheap on one exchange and simultaneously selling high on another.

image1

The app is elementary and purely instructional, but it could easily be extended to more complex real-time arbitrages, and to trading functions that act on the arbitrages it identifies.
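
To make the idea concrete, here is a minimal sketch of the check such a monitor could run on each refresh. It is not Xavier's app: the function name, the exchange names, the quotes and the `fee` parameter are all assumptions made for illustration. An arbitrage exists when the best bid on one exchange, net of fees, exceeds the best ask on another.

```python
from itertools import permutations


def find_arbitrages(quotes, fee=0.0):
    """Return (buy_exchange, sell_exchange, profit_per_unit), best first.

    quotes maps an exchange name to a (best_bid, best_ask) tuple.
    fee is a hypothetical proportional cost charged on each leg.
    """
    opportunities = []
    for buy, sell in permutations(quotes, 2):
        ask = quotes[buy][1]   # price paid when buying on `buy`
        bid = quotes[sell][0]  # price received when selling on `sell`
        profit = bid * (1 - fee) - ask * (1 + fee)
        if profit > 0:
            opportunities.append((buy, sell, profit))
    return sorted(opportunities, key=lambda o: o[2], reverse=True)


# Made-up quotes, for illustration only.
quotes = {
    "exchange_a": (100.0, 101.0),
    "exchange_b": (103.0, 104.0),
    "exchange_c": (99.0, 100.5),
}

for buy, sell, profit in find_arbitrages(quotes):
    print(f"buy on {buy}, sell on {sell}: {profit:.2f} per unit")
```

The sketch ignores order sizes, transfer times and slippage, all of which a real trading function would have to account for.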

Building more complex arbitrages, or simply understanding the detailed workings of the market (its microstructure), means a bit of data science has to come into play.

To do this, we can exploit the order books that each exchange publicly releases in real time. This kind of data allows study of the market microstructure and enables the design of high-frequency strategies. A full field of research can be explored (see for example Marco Avellaneda and Sasha Stoikov’s High-frequency trading in a limit order book, or Rama Cont, Stoikov and Rishi Talreja’s A stochastic model for order book dynamics).

An example strategy would be to examine whether one market lags the others. If it does, the other markets can be used as predictors for it.
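
One simple way to test for such a lag, sketched here under the assumption that we hold aligned return series for two markets, is to correlate one market's returns with the other's returns shifted by a few steps and see which shift fits best. The function names and the return series are hypothetical, and correlation is only one of several possible lead-lag measures.

```python
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag], for lag >= 0."""
    if lag == 0:
        return correlation(leader, follower)
    return correlation(leader[:-lag], follower[lag:])


def best_lag(leader, follower, max_lag):
    """Lag (in steps) at which the follower tracks the leader most closely."""
    return max(
        range(max_lag + 1),
        key=lambda k: lagged_correlation(leader, follower, k),
    )


# Made-up returns: the follower repeats the leader two steps later.
leader = [0.01, -0.02, 0.015, 0.03, -0.01, 0.005, -0.025, 0.02]
follower = [0.0, 0.0] + leader[:-2]

print("estimated lag:", best_lag(leader, follower, max_lag=3))
```

With order books sampled every 30 seconds, a lag of two steps would correspond to one minute, though real markets are unlikely to show a relationship this clean.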

Another example would be to study big orders, and see how to make a profit out of these.

image3

If, for example, crypto markets are completely manipulated, as is often heard, then it could be interesting to identify manipulation and, based on the properties of such events, use it for profit.

Xavier provided a database of order books from six exchanges, retrieved every 30 seconds, and made it available to any data scientist wanting to design price prediction models or other strategies based on order book data.

Low-frequency investment strategies

The second category was about designing low-frequency investment strategies, in which trading seldom occurs but which carry a lower risk than simply holding Bitcoin. Such risk reduction can classically be achieved through diversification, but as a study by R. Porsch showed, currencies have recently tended to correlate with each other, reducing the benefit of diversification.
Nevertheless, other tactics are possible. One is systematic rebalancing with fixed weights for each currency: every month, week or day, the portfolio is rebalanced so that it holds the same USD-equivalent value of every currency. Under this approach, while Bitcoin did an impressive 15x, the worst-performing portfolio did 30x, at an equivalent level of volatility.
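
The mechanics of fixed-weight rebalancing can be sketched in a few lines. This is an illustration under simplifying assumptions (no trading fees, rebalancing at every price observation), with made-up prices and hypothetical function names; it does not reproduce the figures quoted above.

```python
def rebalanced_value(prices, weights, start_value=1.0):
    """Final value of a portfolio reset to `weights` at every time step.

    prices maps a currency to its list of prices, one per rebalancing period.
    """
    n_periods = len(next(iter(prices.values())))
    value = start_value
    for t in range(1, n_periods):
        # With the weights restored at t - 1, the period's growth is the
        # weighted average of each currency's own growth factor.
        growth = sum(
            w * prices[c][t] / prices[c][t - 1] for c, w in weights.items()
        )
        value *= growth
    return value


def buy_and_hold_value(prices, weights, start_value=1.0):
    """Final value when the initial allocation is never touched."""
    return start_value * sum(
        w * prices[c][-1] / prices[c][0] for c, w in weights.items()
    )


# Made-up prices: both coins end where they started, moving in opposite ways.
prices = {"coin_a": [100, 200, 100], "coin_b": [100, 50, 100]}
weights = {"coin_a": 0.5, "coin_b": 0.5}

print("rebalanced:  ", rebalanced_value(prices, weights))
print("buy and hold:", buy_and_hold_value(prices, weights))
```

In this toy case the rebalanced portfolio gains because it sells part of whichever coin has just risen and buys the one that has just fallen. When prices trend in one direction without reverting, the same rule can underperform buy and hold.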

image2

These are extremely simple investment ideas, and many more can be designed to reduce risk (volatility) without reducing the return.

About the author

Xavier Mathieu developed his career as a quantitative team manager with BNP Paribas. He is now the CEO of Modwize Limited. He co-organises the Data Science Hong Kong group.

December 10: Un-hackathon #4

Our fourth Un-hackathon took on a new structure. In addition to the programme of hands-on, project-style skill sharing and development over a day of focused hacking, organisers arranged talks from industry leaders and accomplished practitioners.

About 40 people came to hear the talks and join the traditional hackathon groups, which we cover in detail in separate posts linked below. For more on each project’s challenges and achievements, scroll past the talks section.

The talks

Lavine Hemlani and Bilal Khan from Accelerated HK began the talks with an inspiring speech on the future of artificial intelligence. Khan posed this call to action: “Do we let AI be in the hands of very few people?” We don’t, and the pair told us their strategy to teach AI and grow a community of practitioners so that this emerging power is not just a tool for big business.

Robert Porsch, PhD, spoke on genomics, the mapping and study of the genome, and the problems he was facing in dealing with 80-90 gigabyte genome data sets. His work on human genomes involves seeking out the unique DNA patterns of complex illnesses, sometimes hidden in chains of thousands of mutations, in order to identify them and predict the genetic causes of diseases such as Huntington’s or the risk of developing cancer.

Michal Szczecinski of Gogovan, Hong Kong’s first unicorn company, took us through his role in predicting demand and spotting fraud in the business. By constructing visualisations and dashboards, he gives the business greater oversight. As he says, his role is “to facilitate a smarter decision”. He shared his six steps to general optimisation (learn, brainstorm, prioritise, develop, execute, analyse) and showed us a methodically produced manual on practising data science.

Ho Wa Wong, an open data activist, has been building structured datasets out of government data that the Hong Kong government hasn’t yet made publicly usable in a convenient way. The data.gov.hk site has public government data, but it is limited in both history and range, and the data is often in various formats. Wong aims to add to the pool of available data by coding systems to scrape and clean the data and make the sets available here. He also parsed the Legco transcript and made it available here.

On a similarly public-focused level, Data Science Hong Kong organiser Wang Xiaozhou has been using a hidden Markov model in an attempt to improve geocoding in Hong Kong. The model is being trained to spot the components of addresses. As Wang showed, by learning to identify parts of an address such as the street type or the district name, the model learns fast and picks up speed after a few steps, even when the format of the address changes.
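
For readers new to hidden Markov models, the sketch below shows how one can label the parts of an address with the Viterbi algorithm. It is not Wang's model: the states, the token classes and every probability are hand-set assumptions chosen for illustration, whereas a real model would learn them from labelled addresses.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-6):
    """Most likely state sequence for the observations (log-space Viterbi)."""

    def emit(state, obs):
        # Unseen observations get a small floor probability instead of zero.
        return math.log(emit_p[state].get(obs, floor))

    # best[t][s] = (log-probability of the best path ending in s, previous state)
    best = [{s: (math.log(start_p[s]) + emit(s, observations[0]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + emit(s, obs), prev)
        best.append(column)

    # Trace back from the most likely final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: all numbers below are made up.
states = ["NUMBER", "NAME", "TYPE", "DISTRICT"]
start_p = {"NUMBER": 0.7, "NAME": 0.2, "TYPE": 0.05, "DISTRICT": 0.05}
trans_p = {
    "NUMBER": {"NUMBER": 0.05, "NAME": 0.85, "TYPE": 0.05, "DISTRICT": 0.05},
    "NAME": {"NUMBER": 0.05, "NAME": 0.3, "TYPE": 0.6, "DISTRICT": 0.05},
    "TYPE": {"NUMBER": 0.05, "NAME": 0.05, "TYPE": 0.05, "DISTRICT": 0.85},
    "DISTRICT": {"NUMBER": 0.05, "NAME": 0.05, "TYPE": 0.05, "DISTRICT": 0.85},
}
emit_p = {
    "NUMBER": {"digits": 0.9, "word": 0.05, "type_word": 0.05},
    "NAME": {"digits": 0.05, "word": 0.9, "type_word": 0.05},
    "TYPE": {"digits": 0.02, "word": 0.08, "type_word": 0.9},
    "DISTRICT": {"digits": 0.05, "word": 0.9, "type_word": 0.05},
}

# "12 Smithfield Road Kennedy Town", reduced to coarse token classes.
tokens = ["digits", "word", "type_word", "word", "word"]
print(viterbi(tokens, states, start_p, trans_p, emit_p))
```

Because the street name and the district both look like plain words, it is the transition probabilities that tell them apart. The same toy model also labels an address with no street number, which hints at why this kind of model can cope with changing formats.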

From the academic sector, Leif Sigerson has been mining the Twitter API to find dialogues from small communities talking about psychological problems. Using R and rtweet, he scraped sets of up to 3200 tweets from identified users to build an “ego map”, which connects a user to their followers and then maps how connected those followers are to each other. The psychology department at his university was excited by the prospect of a large sample size, but he said it was still sceptical about the methodologies employed in his approach.
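
The structure of an ego map can be sketched as a small graph computation. Sigerson worked in R with rtweet; the Python below is only an illustration of the idea, with made-up accounts and hypothetical function names, and it assumes the "who follows whom" data has already been collected.

```python
def ego_network(ego, follows):
    """Build the ego map for `ego`.

    follows maps each account to the set of accounts it follows.
    Returns the ego's followers and the directed (follower, followed) edges
    among the ego and those followers.
    """
    followers = {user for user, followed in follows.items() if ego in followed}
    edges = {(f, ego) for f in followers}
    for a in followers:
        # Keep only links that stay inside the ego's follower group.
        for b in follows.get(a, set()) & followers:
            edges.add((a, b))
    return followers, edges


def follower_density(ego, followers, edges):
    """Share of possible follower-to-follower links that actually exist."""
    n = len(followers)
    if n < 2:
        return 0.0
    between_followers = [e for e in edges if ego not in e]
    return len(between_followers) / (n * (n - 1))


# Made-up accounts, for illustration only.
follows = {
    "ann": {"ego_user", "bob"},
    "bob": {"ego_user", "ann"},
    "cat": {"ego_user"},
    "dan": {"ann"},  # does not follow the ego, so is left out of the map
}

followers, edges = ego_network("ego_user", follows)
print(sorted(followers))
print(round(follower_density("ego_user", followers, edges), 3))
```

A density close to 1 would suggest a tight-knit community around the user, while a value near 0 would suggest followers who do not know each other.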

The projects

Jason Chan Jin-an had built a predictor of MMA match outcomes that compares fighters on factors such as their winning record and physical attributes. He said it was more than 70 per cent accurate and was indeed earning him some money. Jason came to the Un-hackathon to enhance his predictor by automatically setting each fighter’s status to active or retired, thereby avoiding comparisons between active and non-active fighters.

See more about his project here.

Xavier Mathieu, a Data Science Hong Kong organiser and former quantitative team manager at BNP Paribas, gave talks on how to study cryptocurrencies, develop low- and high-frequency trading strategies, and identify manipulation and the opportunities it creates.

See more about his project here.

The UFC MMA Predictor Web App by Jason Chan Jin-an

The UFC MMA Predictor is a web app built by Jason Chan Jin-an to predict the winners of upcoming UFC fights. The web app is built entirely in Python and combines dynamic web scraping, data cleansing, machine learning and web development.

image001

The challenge

Chan showcased the project at the December 10 Un-hackathon, hoping to overcome the challenge of displaying lists of active and inactive fighters. Before the Un-hackathon, the fighter list displayed every fighter who had ever fought in the UFC, some of whom had retired. As a result, the app could return irrelevant items and mislead users.

The achievement

During the Un-hackathon, thanks to feedback from participants, Chan found a frequently updated Wikipedia page that lists the current fighters in the UFC.

Chan then built a Scrapy spider to crawl the page and retrieve the active fighter list, which is used to subset his fighter database to active fighters only. Chan then redeployed the web app.
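
The subsetting step can be sketched as follows. This is not Chan's code: the record layout, the `name` field, the function names and the fighter names are all assumptions, and the active list is taken as already scraped. Names are normalised before matching because two sources rarely format them identically.

```python
def normalise(name):
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def subset_active(fighters, active_names):
    """Keep only the fighter records whose name appears in the active list."""
    active = {normalise(n) for n in active_names}
    return [f for f in fighters if normalise(f["name"]) in active]


# Made-up records and names, for illustration only.
fighters = [
    {"name": "Fighter Alpha", "wins": 12},
    {"name": "fighter  beta", "wins": 20},
    {"name": "Fighter Gamma", "wins": 7},
]
active_names = ["Fighter Alpha", "Fighter Beta"]

for fighter in subset_active(fighters, active_names):
    print(fighter["name"])
```

Exact matching after normalisation is the simplest choice. Names that differ by spelling or accents between the two sources would need a more forgiving comparison.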

image003

Web app process

The spiders are scheduled to run every week. The data is then automatically pushed to Amazon S3, from which the website reads it, so the fighter data stays current.

For more information

For contact details, documentation and more information about the web app, please visit the following links:

GitHub: https://github.com/jasonchanhku/UFC-MMA-Predictor

Jupyter Documentation: https://github.com/jasonchanhku/UFC-MMA-Predictor/blob/master/UFC%20MMA%20Predictor%20Workflow.ipynb

LinkedIn: https://www.linkedin.com/in/jason-chan-jin-an-45a76a76/