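Background
The "One-Week Algorithm Practice" activity organized by Datawhale is a short, hands-on run through a fairly complete data mining project, meant to give a quick feel for how such a project works in practice.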
GitHub Link
GitHub - Datawhale Datamining Practice - awyd234
Task Description
[Task 1.2 - Model Building] Build four models: random forest, GBDT, XGBoost, and LightGBM. Any scoring method may be used.
1. Random Forest
from sklearn.ensemble import RandomForestClassifier
Output
RandomForest fit finished, score: 0.7694463910301331
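As a minimal sketch of the fit-and-score step behind an output like the one above: the dataset here is a synthetic stand-in generated with `make_classification`, and the 70/30 split, the `random_state` value, and the hyperparameters are all assumptions rather than the post's settings. The metric is accuracy, which is what `score` returns for scikit-learn classifiers; the post does not say which metric it used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project's data (assumption: binary labels).
X, y = make_classification(n_samples=1000, n_features=20, random_state=2018)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2018
)

# Illustrative hyperparameters, not the author's settings.
rf = RandomForestClassifier(n_estimators=100, random_state=2018)
rf.fit(X_train, y_train)

# For classifiers, score() returns mean accuracy on the given data.
rf_score = rf.score(X_test, y_test)
print("RandomForest fit finished, score:", rf_score)
```

Fixing `random_state` on both the split and the model keeps the score reproducible between runs.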
2. GBDT
from sklearn.ensemble import GradientBoostingClassifier
Output
GBDT fit finished, score: 0.7806587245970568
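Since the task allows any scoring method, here is a hedged sketch that scores a GBDT model with ROC AUC instead of accuracy. The synthetic data, the stratified 70/30 split, and the default hyperparameters are assumptions for illustration only; the score reported above may have been computed differently.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project's data (assumption: binary labels).
X, y = make_classification(n_samples=1000, n_features=20, random_state=2018)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2018, stratify=y
)

gbdt = GradientBoostingClassifier(random_state=2018)
gbdt.fit(X_train, y_train)

# Column 1 of predict_proba holds the probability of the positive class.
proba = gbdt.predict_proba(X_test)[:, 1]
gbdt_auc = roc_auc_score(y_test, proba)
print("GBDT fit finished, AUC:", gbdt_auc)
```

AUC is computed from predicted probabilities rather than hard labels, so it does not depend on a classification threshold and is often preferred when the classes are imbalanced.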
3. XGBoost
3.1 Official Introduction
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides a parallel tree boosting (also known as GBDT, GBM) that solve many data science problems in a fast and accurate way. The same code runs on major distributed environment (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.
3.2 Environment Setup
Install the required third-party package
pip install -i https://pypi.doubanio.com/simple/ xgboost
3.3 Running the Code
from xgboost import XGBClassifier
Output
XGBT fit finished, score: 0.7855641205325858
4. LightGBM
4.1 Official Introduction
LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed and efficient with the following advantages:
- Faster training speed and higher efficiency.
- Lower memory usage.
- Better accuracy.
- Support of parallel and GPU learning.
- Capable of handling large-scale data.
4.2 Environment Setup
Install the third-party package
pip install -i https://pypi.doubanio.com/simple/ lightgbm
4.3 Errors Encountered
However, the import failed:
from lightgbm import LGBMClassifier
with the following error:
Traceback (most recent call last):
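I found a possible fix in a GitHub issue. An earlier search on CSDN for similar problems had turned up the explanation that building LightGBM depends on OpenMP, which Apple Clang does not support. So I tried the following:
pip uninstall lightgbm
My machine has no g++-8 or gcc-8 on the command line, so I changed the export to:
export CXX=g++ CC=gcc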
Running cmake .. then failed again:
-- The C compiler identification is AppleClang 9.1.0.9020039
This was a cmake version problem, so I upgraded cmake:
brew upgrade cmake
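After deleting the build directory and rerunning cmake .., it failed once more:
-- The C compiler identification is AppleClang 9.1.0.9020039
So I tried running:
brew install libomp
Running cmake after that still failed, but the import no longer raised an error, so I left it there for now. The fix probably came from installing libomp and not from the git clone, since I was running inside a virtual environment created with virtualenv.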
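To confirm from inside the active virtualenv whether the import now works, a small check along these lines may help. This is only a sketch: `try_import` is a hypothetical helper name, and catching `Exception` broadly is a deliberate choice, on the assumption that a failed native-library load can surface as something other than `ImportError` (for example `OSError`).

```python
import importlib


def try_import(module_name):
    """Return (ok, detail) for an attempt to import module_name."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # e.g. ImportError, or OSError from a missing shared library
        return False, f"{type(exc).__name__}: {exc}"
    # Many packages expose __version__; fall back gracefully if not.
    return True, getattr(module, "__version__", "unknown version")


ok, detail = try_import("lightgbm")
print("lightgbm import ok:", ok, "-", detail)
```

Running it with the virtualenv's own interpreter checks the package installed in that environment, not a system-wide copy.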
4.4 Implementation
from lightgbm import LGBMClassifier
Output
LightGBM fit finished, score: 0.7701471618780659
4.5 Parameter Analysis
Stepping into LGBMClassifier shows that it inherits from LGBMModel, so the analysis below follows the LGBMModel initializer:
Parameters
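Besides reading the docstring, the constructor defaults can be listed programmatically. The sketch below uses the standard `inspect` module; `init_defaults` and `show_lightgbm_defaults` are hypothetical helper names introduced here, and the LightGBM import is kept inside a function so the generic helper runs even where LightGBM is not installed.

```python
import inspect


def init_defaults(cls):
    """Map each parameter of cls.__init__ that has a default to that default."""
    params = inspect.signature(cls.__init__).parameters
    return {
        name: p.default
        for name, p in params.items()
        if p.default is not inspect.Parameter.empty
    }


def show_lightgbm_defaults():
    # Imported lazily so init_defaults works without lightgbm installed.
    from lightgbm import LGBMClassifier

    for name, default in init_defaults(LGBMClassifier).items():
        print(f"{name} = {default!r}")
```

Calling `init_defaults` on `sklearn.ensemble.GradientBoostingClassifier` as well gives a second dictionary to compare against, which is one way to see where the two libraries' default settings differ.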
- boosting_type
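- Sometimes also written as boosting or boost; the default is gbdt
- In this example, explicitly setting it to gbdt does not reproduce the result from scikit-learn's GBDT model, possibly because the other default parameters differ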