Fengquan Machinery (How Is Fengquan Environmental Protection Power Co., Ltd.?)

barry0012个月前产品信息410

  Four pitfalls in the machine learning process:

Data leakage; overfitting; data sampling and splitting; data quality.

  In a recent presentation, Ben Hamner described the common pitfalls in machine learning projects that he and his colleagues have observed during competitions on Kaggle.

  The talk was titled “Machine Learning Gremlins” and was presented in February 2014 at Strata.

  In this post we take a look at the pitfalls from Ben’s talk, what they look like and how to avoid them.

  Machine Learning Process

  Early in the talk, Ben presented a snapshot of the process for working a machine learning problem end-to-end.

  

  Figure: Machine Learning Process, taken from “Machine Learning Gremlins” by Ben Hamner

  This snapshot included 9 steps, as follows:

Start with a business problem

Source data

Split data

Select an evaluation metric

Perform feature extraction

Model Training

Feature Selection

Model Selection

Production System

  He commented that the process is iterative rather than linear.

  He also commented that each step in this process can go wrong, derailing the whole project.

  Discriminating Dogs and Cats

  Ben presented a case study problem for building an automatic cat door that can let the cat in and keep the dog out. This was an instructive example as it touched on a number of key problems in working a data problem.

  

  Figure: Discriminating Dogs and Cats, taken from “Machine Learning Gremlins” by Ben Hamner

  Sample Size

  The first great takeaway from this example was that he studied accuracy of the model against data sample size and showed that more samples correlated with greater accuracy.

  He then added more data until accuracy leveled off. This was a great example of how easy it can be to gauge the sensitivity of your system to sample size and adjust accordingly.
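  The idea of tracking accuracy against sample size can be sketched in a few lines. This is a minimal illustration on synthetic data, not the cat and dog experiment from the talk: the two Gaussian classes, the nearest-centroid rule, and all function names (`make_samples`, `fit_threshold`, `learning_curve`) are assumptions made for the example.

```python
import random
import statistics


def make_samples(n_per_class, rng):
    """Draw labelled 1-D points: class 0 ~ N(0, 1), class 1 ~ N(2, 1)."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(2.0, 1.0), 1) for _ in range(n_per_class)]
    return data


def fit_threshold(train):
    """Nearest-centroid rule in 1-D: split at the midpoint of the class means."""
    mean0 = statistics.fmean(x for x, y in train if y == 0)
    mean1 = statistics.fmean(x for x, y in train if y == 1)
    # Also remember on which side of the threshold class 1 lies.
    return (mean0 + mean1) / 2.0, mean1 >= mean0


def accuracy(model, test):
    """Fraction of held-out points that the threshold rule labels correctly."""
    threshold, one_is_above = model
    correct = 0
    for x, y in test:
        predicted = int((x > threshold) == one_is_above)
        correct += predicted == y
    return correct / len(test)


def learning_curve(sizes, n_test_per_class=1000, repeats=20, seed=0):
    """Mean held-out accuracy for each training size (samples per class)."""
    rng = random.Random(seed)
    test = make_samples(n_test_per_class, rng)  # fixed, never used for fitting
    curve = []
    for n in sizes:
        scores = [
            accuracy(fit_threshold(make_samples(n, rng)), test)
            for _ in range(repeats)
        ]
        curve.append((n, statistics.fmean(scores)))
    return curve


if __name__ == "__main__":
    for n, acc in learning_curve([2, 8, 32, 128, 512]):
        print(f"{n:4d} samples per class: mean accuracy {acc:.3f}")
```

  On data like this the curve typically rises quickly and then flattens. Once it has flattened, collecting more samples is unlikely to help much, and effort is better spent elsewhere.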

  Wrong Problem

  The second great takeaway from this example was that the system failed: it let in every cat in the neighborhood.

  It was a clever example highlighting the importance of understanding the constraints of the problem that needs to be solved, rather than the problem that you want to solve.

  Pitfalls In Machine Learning Projects

  Ben went on to discuss four common pitfalls when working on machine learning problems.

  Although these problems are common, he points out that they can be identified and addressed relatively easily.


  

  Figure: Overfitting, taken from “Machine Learning Gremlins” by Ben Hamner

Data Leakage: Using data in the model that a production system would not have access to. This is particularly common in time series problems. It can also happen with fields like system IDs that may indicate a class label. Run a model and take a careful look at the attributes that contribute most to its success, then sanity check whether they make sense. (See the referenced paper “Leakage in Data Mining”.)

Overfitting: Modeling the training data so closely that the model also fits the noise in it. The result is a poor ability to generalize. This becomes more of a problem in higher dimensions with more complex class boundaries.

Data Sampling and Splitting: Related to data leakage, you need to be very careful that the train/test/validation sets are indeed independent samples. Time series problems require much thought and work to ensure that you can replay data to the system chronologically and validate model accuracy.

Data Quality: Check the consistency of your data. Ben gave an example of flight data where some aircraft were landing before taking off. Inconsistent, duplicate, and corrupt data need to be identified and explicitly handled, because they can directly hurt both the modeling and the ability of a model to generalize.
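The last two pitfalls lend themselves to a short sketch: flag inconsistent or duplicate records first, then split the remaining records by time so the model is only ever tested on data later than anything it was trained on. The record layout (`flight_id`, `takeoff`, `landing`), the helper names, and the sample flights are hypothetical, chosen to echo the flight example rather than taken from the talk.

```python
from datetime import datetime


def split_consistent(flights):
    """Separate records that pass basic sanity checks from those that fail."""
    clean, rejected, seen = [], [], set()
    for flight in flights:
        key = (flight["flight_id"], flight["takeoff"], flight["landing"])
        # A landing at or before takeoff is treated as physically impossible.
        lands_too_early = flight["landing"] <= flight["takeoff"]
        if lands_too_early or key in seen:
            rejected.append(flight)  # keep for inspection, do not drop silently
        else:
            seen.add(key)
            clean.append(flight)
    return clean, rejected


def chronological_split(records, cutoff, time_key="takeoff"):
    """Train on records before the cutoff, test on records at or after it."""
    ordered = sorted(records, key=lambda record: record[time_key])
    train = [r for r in ordered if r[time_key] < cutoff]
    test = [r for r in ordered if r[time_key] >= cutoff]
    return train, test


def make_flight(flight_id, takeoff, landing):
    return {"flight_id": flight_id, "takeoff": takeoff, "landing": landing}


flights = [
    make_flight("A1", datetime(2014, 1, 1, 8), datetime(2014, 1, 1, 10)),
    make_flight("A2", datetime(2014, 1, 2, 9), datetime(2014, 1, 2, 8)),   # lands before takeoff
    make_flight("A1", datetime(2014, 1, 1, 8), datetime(2014, 1, 1, 10)),  # exact duplicate
    make_flight("A3", datetime(2014, 1, 3, 7), datetime(2014, 1, 3, 9)),
    make_flight("A4", datetime(2014, 1, 5, 6), datetime(2014, 1, 5, 8)),
]

clean, rejected = split_consistent(flights)
train, test = chronological_split(clean, cutoff=datetime(2014, 1, 3))

print([f["flight_id"] for f in train])     # ['A1']
print([f["flight_id"] for f in test])      # ['A3', 'A4']
print([f["flight_id"] for f in rejected])  # ['A2', 'A1']
```

A random shuffle would mix later flights into the training set, which is exactly the kind of leakage a production system could never benefit from. Splitting on a cutoff time avoids that, at the cost of having to choose the cutoff deliberately.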


Summary

  Ben’s talk “Machine Learning Gremlins” is a quick and practical talk.

  You will get a useful crash course in the common pitfalls we are all susceptible to when working on a data problem.

  Source: machinelearningmastery.
