Awesome H2O
Below is a curated list of all the awesome projects, applications, research, tutorials, courses and books that use H2O, an open source, distributed machine learning platform. H2O offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models, Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Cox Proportional Hazards, K-means, PCA, Word2Vec, as well as a fully automatic machine learning algorithm (AutoML).
H2O.ai produces many tutorials, blog posts, presentations and videos about H2O, but the list below consists of awesome content produced by the greater H2O user community.
We are just getting started with this list, so pull requests are very much appreciated!
If you think H2O is awesome too, please ⭐ the H2O GitHub repository.
Blog Posts & Tutorials
- Anomaly Detection With Isolation Forests Using H2O Dec 03, 2018
- Predicting residential property prices in Bratislava using recipes - H2O Machine learning Nov 25, 2018
- Inspecting Decision Trees in H2O Nov 07, 2018
- Machine Learning With H2O — Hands-On Guide for Data Scientists Jun 27, 2018
- Using machine learning with LIME to understand employee churn Jun 25, 2018
- Analytics at Scale: h2o, Apache Spark and R on AWS EMR Jun 21, 2018
- Automated and unmysterious machine learning in cancer detection Nov 7, 2017
- Time series machine learning with h2o+timetk Oct 28, 2017
- Sales Analytics: How to use machine learning to predict and optimize product backorders Oct 16, 2017
- HR Analytics: Using machine learning to predict employee turnover Sep 18, 2017
- Autoencoders and anomaly detection with machine learning in fraud analytics May 1, 2017
- Building deep neural nets with h2o and rsparkling that predict arrhythmia of the heart Feb 27, 2017
- Predicting food preferences with sparklyr (machine learning) Feb 19, 2017
- Moving largish data from R to H2O - spam detection with Enron emails Feb 18, 2017
- Deep learning & parameter tuning with mxnet, h2o package in R Jan 30, 2017
- Are categorical variables getting lost in your random forests? Oct 28, 2016
Books
- Machine Learning Using R Karthik Ramasubramanian, Abhishek Singh. (2016)
- Practical Machine Learning with H2O: Powerful, Scalable Techniques for Deep Learning and AI Darren Cook. (2016)
- Disruptive Analytics Thomas Dinsmore. (2016)
- Computer Age Statistical Inference: Algorithms, Evidence, and Data Science Bradley Efron, Trevor Hastie. (2016)
- R Deep Learning Essentials Joshua F. Wiley. (2016)
- Spark in Action Petar Zečević, Marko Bonaći. (2016)
- Handbook of Big Data Peter Bühlmann, Petros Drineas, Michael Kane, Mark J. van der Laan. (2015)
Research Papers
- Machine Learning Methods to Perform Pricing Optimization. A Comparison with Standard GLMs. (2018)
- Algorithmic trading using deep neural networks on high frequency data Andrés Arévalo, Jaime Niño, German Hernandez, Javier Sandoval, Diego León, Arbey Aragón. (2017)
- Generic online animal activity recognition on collar tags Jacob W. Kamminga, Helena C. Bisby, Duc V. Le, Nirvana Meratnia, Paul J. M. Havinga. (2017)
- Robust and flexible estimation of data-dependent stochastic mediation effects: a proposed method and example in a randomized trial setting Kara E. Rudolph, Oleg Sofrygin, Wenjing Zheng, Mark J. van der Laan. (2017)
- Automated versus do-it-yourself methods for causal inference: Lessons learned from a data analysis competition Vincent Dorie, Jennifer Hill, Uri Shalit, Marc Scott, Dan Cervone. (2017)
- Using deep learning to predict the mortality of leukemia patients Reena Shaw Muthalaly. (2017)
- Use of a machine learning framework to predict substance use disorder treatment success Laura Acion, Diana Kelmansky, Mark van der Laan, Ethan Sahker, DeShauna Jones, Stephan Arndt. (2017)
- Ultra-wideband antenna-induced error prediction using deep learning on channel response data Janis Tiemann, Johannes Pillmann, Christian Wietfeld. (2017)
- Inferring passenger types from commuter eigentravel matrices Erika Fille T. Legara, Christopher P. Monterola. (2017)
- Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500 Christopher Krauss, Xuan Anh Do, Nicolas Huck. (2016)
- Identifying IT purchases anomalies in the Brazilian government procurement system using deep learning Silvio L. Domingos, Rommel N. Carvalho, Ricardo S. Carvalho, Guilherme N. Ramos. (2016)
- Predicting recovery of credit operations on a Brazilian bank Rogério G. Lopes, Rommel N. Carvalho, Marcelo Ladeira, Ricardo S. Carvalho. (2016)
- Deep learning anomaly detection as support fraud investigation in Brazilian exports and anti-money laundering Ebberth L. Paula, Marcelo Ladeira, Rommel N. Carvalho, Thiago Marzagão. (2016)
- Deep learning and association rule mining for predicting drug response in cancer Konstantinos N. Vougas, Thomas Jackson, Alexander Polyzos, Michael Liontos, Elizabeth O. Johnson, Vassilis Georgoulias, Paul Townsend, Jiri Bartek, Vassilis G. Gorgoulis. (2016)
- The value of points of interest information in predicting cost-effective charging infrastructure locations Stéphanie Florence Visser. (2016)
- Adaptive modelling of spatial diversification of soil classification units. Journal of Water and Land Development Krzysztof Urbański, Stanisław Gruszczyński. (2016)
- Scalable ensemble learning and computationally efficient variance estimation Erin LeDell. (2015)
- Superchords: decoding EEG signals in the millisecond range Rogerio Normand, Hugo Alexandre Ferreira. (2015)
- Understanding random forests: from theory to practice Gilles Louppe. (2014)
Benchmarks
- Are categorical variables getting lost in your random forests? - Benchmark of categorical encoding schemes and the effect on tree-based models (Scikit-learn vs H2O). Oct 28, 2016
- Deep learning in R - Benchmark of open source deep learning packages in R. Mar 7, 2016
- Szilard's machine learning benchmark - Benchmarks of Random Forest, GBM, Deep Learning and GLM implementations in common open source ML frameworks. Jul 3, 2015
Presentations
- Pipelines for model deployment Apr 25, 2017
- Machine learning with H2O.ai Jan 23, 2017
Courses
- UCLA: Tools in Data Science (STATS 418) - Master of Applied Statistics Program.
- GWU: Data Mining (Decision Sciences 6279) - Master of Science in Business Analytics.
- University of Cape Town: Analytics Module - Postgraduate Honors Program in Statistical Sciences.
- Coursera: How to Win a Data Science Competition: Learn from Top Kagglers - Advanced Machine Learning Specialization.
Utilities
License
To the extent possible under law, H2O.ai has waived all copyright and related or neighboring rights to this work.