Project
This is a list of software I have written or been heavily involved in since 2012. I am an open-source enthusiast; I like GitHub and do not like GitLab.
Workload Zoo
- Zoo of real Database workloads
- To be disclosed later
ResTune
- ResTune: Resource Oriented Tuning Boosted by Meta-Learning for Cloud Databases, SIGMOD 2021 (to appear).
- Minimize the resource cost while guaranteeing the SLA requirement by tuning the database knobs
- Leverage the tuning experience across workloads and hardware from the cloud provider’s perspective
Leaper
- Leaper: A Learned Prefetcher for Cache Invalidation in LSM-tree based Storage Engines (paper), VLDB 2020.
- Address unstable performance in LSM-tree based storage engines
- Design a learned prefetcher and plug ML into OLTP systems
Symphony
- Simplified and unified AI pipeline system
- An end-to-end, composable AI software platform (model building, data transformation, training, serving, and more)
- Participated in the architecture designs and was highly involved in early development
DyNet
- The Dynamic Neural Network Toolkit
- Multi-device support, self-describing I/O format for native save/load, ParameterCollection interface, etc.
- 100+ commits
- Top #5 contributor
Poseidon-Tensorflow
- Distributed TensorFlow implementation built on the Poseidon communication library
- Overlap synchronization (communication) with computation during mini-batch SGD
- Make your native TensorFlow model training script distributed with zero code modification!
- Linear speedup up to 64 GPUs
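
To illustrate the general idea of overlapping communication with computation, here is a minimal Python sketch. It is not Poseidon's implementation: `backward_with_overlap`, `compute_grad`, and `sync_grad` are hypothetical names, and the "network" is simulated by a background thread. Each layer's gradient is handed to the communication thread as soon as it is ready, while the main thread moves on to the next layer.

```python
import queue
import threading


def backward_with_overlap(compute_grad, sync_grad, num_layers):
    """Run a backward pass, syncing each layer's gradient in the background.

    compute_grad(layer) -> gradient for that layer (hypothetical callback)
    sync_grad(layer, grad) -> synchronized gradient (hypothetical callback)
    """
    pending = queue.Queue()
    synced = {}

    def comm_worker():
        # Drain gradients in the order they were produced.
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more gradients
                break
            layer, grad = item
            synced[layer] = sync_grad(layer, grad)

    worker = threading.Thread(target=comm_worker)
    worker.start()

    # The backward pass runs from the last layer to the first.
    for layer in reversed(range(num_layers)):
        grad = compute_grad(layer)
        pending.put((layer, grad))  # hand off; computation continues immediately

    pending.put(None)
    worker.join()  # wait for all outstanding syncs before the weight update
    return synced
```

In a real system the sync step would be a network exchange between workers; here it is just a function call, which is enough to show the hand-off pattern.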
Apache HAWQ
- Native SQL Engine on Hadoop, Collaborative Project at Pivotal
- Voted in as an ASF committer: 90+ commits, 50+ JIRAs, 50+ mail threads on the dev/user mailing lists in 7 months
- Top #3 committer in the open source community
- 2016
Rec
- Recommendation Cloud Service based on SQL: combine infrastructures inside Pivotal including HAWQ, MADlib, Cloud Foundry
- Winner Project of the Hackday Competition✌️
- With @lma, May 2016
Paracel Toolkits
- Distributed Algorithm Library built on Paracel framework
- Algorithms include regression (ridge, lasso), classification (lr), clustering (kmeans, spectral clustering), graph processing (pagerank), recommendation systems (svd, mf, similarity, decision tree, als), and topic modeling (lda)
- @Douban.Inc, 2014
Plato
- Realtime Recommendation System based on Factor Model
- Three-layer backends: online/nearline/offline (balltree/regression/matrix factorization)
- Platoon, an application of Plato, is used for Douban FM and improved the completion rate by 5%
- @Douban.Inc, 2015.
Paracel
- A distributed optimization framework with parameter server
- Open-sourced on GitHub
- Used internally at Douban
- @Douban.Inc, 2013
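
For readers unfamiliar with the parameter-server pattern, the following is a minimal single-process sketch in Python. It does not use or mirror Paracel's API: `ParameterServer`, `worker_gradient`, and `train` are hypothetical names, and the workers are simulated by a loop over data shards. Each worker pulls the current weights, computes a gradient on its own shard, and pushes it back to the server.

```python
class ParameterServer:
    """Holds the shared model weights (hypothetical, in-process stand-in)."""

    def __init__(self, dim, lr):
        self.weights = [0.0] * dim
        self.lr = lr

    def pull(self):
        # Workers receive a copy of the current weights.
        return list(self.weights)

    def push(self, grad):
        # Apply one gradient-descent step with the worker's gradient.
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grad)]


def worker_gradient(weights, shard):
    """Mean squared-error gradient of a linear model on one data shard."""
    grad = [0.0] * len(weights)
    for x, y in shard:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(shard)
    return grad


def train(shards, dim, lr=0.1, rounds=200):
    server = ParameterServer(dim, lr)
    for _ in range(rounds):
        for shard in shards:  # each shard plays the role of one worker
            server.push(worker_gradient(server.pull(), shard))
    return server.weights
```

For example, with two shards drawn from `y = 2x`, `train([[([1.0], 2.0)], [([2.0], 4.0)]], dim=1)` converges to a weight close to 2.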
afc
- Atomic Forces Calculation
- Attempts to solve the relaxation problem
- The project is based on the research of Chongyu Wang, Shanying Wang and Tao Cui.
- @Department of Physics, Tsinghua University, 2012
Threp
- A remapping system for the Earth System Model
- @Department of Computer Science and Technology, Tsinghua University