Fengquan Machinery (What is Fengquan Environmental Power Co., Ltd. like?)


  Four pitfalls in the machine learning process:

Data leakage; overfitting; data sampling and splitting; data quality.

  In a recent presentation, Ben Hamner described the common pitfalls in machine learning projects that he and his colleagues have observed during competitions on Kaggle.

  The talk was titled “Machine Learning Gremlins” and was presented in February 2014 at Strata.

  In this post we take a look at the pitfalls from Ben’s talk, what they look like and how to avoid them.

  Machine Learning Process

  Early in the talk, Ben presented a snapshot of the process for working through a machine learning problem end to end.

  Figure: Machine Learning Process (taken from “Machine Learning Gremlins” by Ben Hamner)

  This snapshot included 9 steps, as follows:

Start with a business problem

Source data

Split data

Select an evaluation metric

Perform feature extraction

Model training

Feature selection

Model selection

Production system

  He commented that the process is iterative rather than linear.

  He also noted that any step in this process can go wrong and derail the whole project.

  Discriminating Dogs and Cats

  Ben presented a case study: building an automatic cat door that lets the cat in and keeps the dog out. This was an instructive example because it touched on a number of key problems in working on a data problem.

  Figure: Discriminating Dogs and Cats (taken from “Machine Learning Gremlins” by Ben Hamner)

  Sample Size

  The first great takeaway from this example was that he studied the model's accuracy as a function of data sample size and showed that more samples correlated with greater accuracy.

  He then added more data until accuracy leveled off. This was a great example of how easy it can be to gauge the sensitivity of your system to sample size and adjust accordingly.
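  One way to run this kind of sample-size study is a learning curve: train on progressively larger subsets, score every model on the same held-out set, and stop adding data once accuracy stops improving. The Python sketch below is an illustration under stated assumptions, not the setup from the talk: the one-feature cat/dog data, the nearest-centroid model, the 0.005 tolerance and the helper names (`make_class`, `learning_curve`, `plateau_size`) are all hypothetical.

```python
import random
from statistics import mean

# Hypothetical toy data: one numeric feature per animal; label 0 = cat, 1 = dog.
# Cats are drawn around 0.0 and dogs around 3.0, both with unit standard deviation.
def make_class(label, centre, n, rng):
    return [(rng.gauss(centre, 1.0), label) for _ in range(n)]

def fit_centroids(train):
    """Nearest-centroid model: the mean feature value of each class."""
    return {c: mean(x for x, y in train if y == c) for c in (0, 1)}

def predict(model, x):
    """Assign x to the class whose centroid is closest."""
    return min(model, key=lambda c: abs(x - model[c]))

def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)

def learning_curve(cats, dogs, test, sizes, rng, repeats=20):
    """Mean held-out accuracy for each training size, averaged over random balanced subsets."""
    curve = []
    for n in sizes:
        scores = []
        for _ in range(repeats):
            train = rng.sample(cats, n // 2) + rng.sample(dogs, n // 2)
            scores.append(accuracy(fit_centroids(train), test))
        curve.append((n, mean(scores)))
    return curve

def plateau_size(curve, tol=0.005):
    """First size whose successor improves accuracy by less than tol (None if still rising)."""
    for (n, acc), (_, next_acc) in zip(curve, curve[1:]):
        if next_acc - acc < tol:
            return n
    return None

rng = random.Random(0)
cats, dogs = make_class(0, 0.0, 512, rng), make_class(1, 3.0, 512, rng)
test = make_class(0, 0.0, 1000, rng) + make_class(1, 3.0, 1000, rng)
curve = learning_curve(cats, dogs, test, [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024], rng)
for n, acc in curve:
    print(f"{n:5d} samples -> mean accuracy {acc:.3f}")
print("accuracy levels off at about", plateau_size(curve), "samples")
```

  Each size is averaged over 20 random balanced subsets because a single noisy draw can make the curve dip and trip the tolerance check too early.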

  Wrong Problem

  The second great takeaway from this example was that the system failed: it let in every cat in the neighborhood.

  It was a clever example highlighting the importance of understanding the constraints of the problem that needs to be solved, rather than the problem that you want to solve.

  Pitfalls In Machine Learning Projects

  Ben went on to discuss four common pitfalls encountered when working on machine learning problems.

  He pointed out that although these problems are common, they can be identified and addressed relatively easily.


  Figure: Overfitting (taken from “Machine Learning Gremlins” by Ben Hamner)

Data Leakage: Using data in the model that a production system would not have access to. This is particularly common in time series problems, where information from the future can slip into the training data. It can also happen with attributes such as system IDs that may indicate a class label. Run a model and take a careful look at the attributes that contribute to its success. Sanity check them and consider whether they make sense. (See the referenced paper “Leakage in Data Mining”.)

Overfitting: Modeling the training data so closely that the model also fits the noise in it. The result is a poor ability to generalize to new data. This becomes more of a problem in higher dimensions and with more complex class boundaries.

Data Sampling and Splitting: Related to data leakage, you need to be very careful that the train/test/validation sets are genuinely independent samples. Time series problems require particular thought and work to ensure that you can replay data to the system chronologically and validate model accuracy.

Data Quality: Check the consistency of your data. Ben gave an example of flight data in which some aircraft were landing before taking off. Inconsistent, duplicate, and corrupt data need to be identified and explicitly handled, because they can directly hurt the modeling process and the ability of a model to generalize.
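  For the data leakage pitfall above, Ben's advice is to inspect the attributes behind a model's success. A crude first pass, which is an assumption of this sketch rather than something from the talk, is to score each attribute on its own: if one attribute alone predicts the held-out labels almost perfectly, it deserves a sanity check. In this hypothetical example the `camera` field records which device captured each image, and in the toy data every cat was photographed by one camera and every dog by another, so the field leaks the label. The function names and the 0.95 threshold are illustrative.

```python
from collections import Counter, defaultdict

def single_feature_accuracy(train, test, feature, label="label"):
    """Held-out accuracy when predicting the label from one categorical feature alone:
    the majority training label for each value, or the overall majority for unseen values."""
    by_value = defaultdict(Counter)
    for row in train:
        by_value[row[feature]][row[label]] += 1
    fallback = Counter(row[label] for row in train).most_common(1)[0][0]
    correct = 0
    for row in test:
        counts = by_value.get(row[feature])  # .get does not insert missing keys
        guess = counts.most_common(1)[0][0] if counts else fallback
        correct += guess == row[label]
    return correct / len(test)

def suspicious_features(train, test, features, threshold=0.95):
    """Return features whose single-feature accuracy reaches `threshold`, plus all scores."""
    scores = {f: single_feature_accuracy(train, test, f) for f in features}
    return sorted(f for f, s in scores.items() if s >= threshold), scores

# Hypothetical records: "size" is a genuine (here uninformative) attribute, while
# "camera" records which device took the photo; every cat came from cam_A and
# every dog from cam_B, so "camera" leaks the label.
rows = []
for i in range(200):
    label = "cat" if i % 2 == 0 else "dog"
    rows.append({"size": ["small", "medium", "large"][i % 3],
                 "camera": "cam_A" if label == "cat" else "cam_B",
                 "label": label})
train, test = rows[:150], rows[150:]
flagged, scores = suspicious_features(train, test, ["size", "camera"])
print(flagged)  # ['camera']
```

  A high score does not prove leakage, since some attributes are genuinely predictive, and a low score does not rule it out, since leaks can hide in combinations of attributes; the check only narrows down where to look.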
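  The overfitting pitfall above can be demonstrated with k-nearest neighbors, a technique chosen here for illustration rather than one taken from the talk. With k = 1 the model memorizes the training set, including labels that were deliberately flipped to act as noise, so training accuracy is perfect while held-out accuracy is much lower; voting over k = 15 neighbors smooths out the noise and narrows the gap. The data generator, its 20% noise rate and the function names are hypothetical.

```python
import random

def make_noisy_data(n, rng, noise=0.2):
    """Hypothetical 2-D points; the true label is 1 when x + y > 0, but a fraction
    `noise` of labels is flipped to simulate noise in the data."""
    data = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        label = int(x + y > 0)
        if rng.random() < noise:
            label = 1 - label
        data.append(((x, y), label))
    return data

def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    def dist2(item):
        (x, y), _ = item
        return (x - point[0]) ** 2 + (y - point[1]) ** 2
    nearest = sorted(train, key=dist2)[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)  # k is odd, so there are no ties

def knn_accuracy(train, data, k):
    return sum(knn_predict(train, p, k) == label for p, label in data) / len(data)

rng = random.Random(42)
train = make_noisy_data(300, rng)
test = make_noisy_data(500, rng)  # held-out data with the same noise rate
results = {}
for k in (1, 15):
    results[k] = (knn_accuracy(train, train, k), knn_accuracy(train, test, k))
    print(f"k={k:2d}  train accuracy={results[k][0]:.3f}  "
          f"held-out accuracy={results[k][1]:.3f}")
```

  Comparing training and held-out accuracy side by side is a simple way to spot overfitting: a large gap means the model has learned patterns that do not carry over to new data.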
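  For the data sampling and splitting pitfall above, one common way to replay time series data chronologically is walk-forward validation: each fold trains on everything up to a cutoff and validates on the block that follows, so the model never sees the future. The sketch below assumes records are `(timestamp, payload)` tuples; `walk_forward_splits`, `n_folds` and `min_train` are hypothetical names. Records that share a timestamp could straddle a cutoff, so real data may need cutoffs aligned to timestamp boundaries.

```python
from datetime import date, timedelta

def walk_forward_splits(records, n_folds, min_train):
    """Yield (train, validation) lists in chronological order.

    `records` is a list of (timestamp, payload) tuples. Fold i trains on every record
    before its cutoff and validates on the next block, so no training record is newer
    than any validation record. The last fold absorbs any remainder.
    """
    ordered = sorted(records, key=lambda r: r[0])
    remaining = len(ordered) - min_train
    if min_train < 1 or n_folds < 1 or remaining < n_folds:
        raise ValueError("not enough records for the requested folds")
    fold_size = remaining // n_folds
    for i in range(n_folds):
        cutoff = min_train + i * fold_size
        end = len(ordered) if i == n_folds - 1 else cutoff + fold_size
        yield ordered[:cutoff], ordered[cutoff:end]

# Hypothetical daily records, supplied newest first to show that order is restored.
start = date(2014, 1, 1)
records = [(start + timedelta(days=d), f"event-{d}") for d in reversed(range(100))]
for train, valid in walk_forward_splits(records, n_folds=4, min_train=40):
    print(f"train {len(train):3d} rows up to {train[-1][0]}, "
          f"validate {len(valid)} rows from {valid[0][0]}")
```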
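  The flight example for the data quality pitfall above suggests a simple audit: flag records with missing timestamps, exact duplicates, and flights whose arrival is not after their departure, then decide explicitly how to handle each group instead of silently training on them. The sketch below uses hypothetical field names and flight numbers, and assumes all timestamps are in one time zone (e.g. UTC); comparing local times across time zones could make a legitimate flight appear to land before it takes off.

```python
from datetime import datetime

def audit_flights(flights):
    """Separate clean flight records from ones that need explicit handling.
    Each record is a dict with hypothetical keys 'flight_id', 'departure', 'arrival';
    timestamps are assumed to share one time zone (e.g. UTC)."""
    seen = set()
    clean, problems = [], []
    for row in flights:
        dep, arr = row.get("departure"), row.get("arrival")
        if dep is None or arr is None:
            problems.append(("missing timestamp", row))
            continue
        key = (row["flight_id"], dep, arr)
        if key in seen:
            problems.append(("duplicate", row))
        elif arr <= dep:
            problems.append(("arrival not after departure", row))
        else:
            seen.add(key)
            clean.append(row)
    return clean, problems

flights = [
    {"flight_id": "XY100", "departure": datetime(2014, 2, 11, 8, 0),
     "arrival": datetime(2014, 2, 11, 10, 30)},
    {"flight_id": "XY100", "departure": datetime(2014, 2, 11, 8, 0),
     "arrival": datetime(2014, 2, 11, 10, 30)},   # exact repeat of the row above
    {"flight_id": "XY200", "departure": datetime(2014, 2, 11, 9, 0),
     "arrival": datetime(2014, 2, 11, 7, 45)},    # lands before it takes off
    {"flight_id": "XY300", "departure": None,
     "arrival": datetime(2014, 2, 11, 12, 0)},    # corrupt: departure missing
]
clean, problems = audit_flights(flights)
for reason, row in problems:
    print(f"{row['flight_id']}: {reason}")
# XY100: duplicate
# XY200: arrival not after departure
# XY300: missing timestamp
```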


Summary

  Ben’s talk “Machine Learning Gremlins” is a quick and practical talk.

  You will get a useful crash course in the common pitfalls we are all susceptible to when working on a data problem.

  Source: machinelearningmastery.
