Chapter 2 Introduction to Machine Learning
第2章 机器学习介绍
Artificial intelligence and machine learning are creating new cognitive tools that enhance our ability to think at scale and that capacity will produce rewards for every person on the planet.
人工智能和机器学习正在创造新的认知工具,增强我们大规模思考的能力,而这种能力将使地球上的每个人获益。
Vint Cerf, Chief Internet Evangelist, Google, one of the Fathers of the Internet
Vint Cerf,谷歌首席互联网布道师,互联网之父之一
THREE REASONS DATA SCIENTISTS SHOULD READ THIS CHAPTER
数据科学家应该阅读本章的三个理由
If you are a data scientist, there are three solid reasons to read this chapter.
如果您是数据科学家,有三个理由你应该阅读本章。
First is to let me know if there is something else you would like marketing professionals to know.
首先,请让我了解你是否想让营销专业人员了解的更多,本文是不是有什么遗漏。
That will be very helpful in later editions and render me eternally grateful to you.
您的建议将在以后的版本中非常有用,我会永远感谢你。
Next, you should read this chapter to understand just how much marketing people will know once they have read it so you don’t have to plow through the basics again.
其次,您应该阅读本章,以便您知晓,读过这本书的营销从业人员对于人工智能了解多少,下次你和他们打交道时,就不用再介绍一些基本概念了。
If you come across a marketing person you haven’t worked with before and it seems he or she really doesn’t quite get it, share a copy of this book.
如果你遇到一个新的营销人员,从来没有合作过,他对于人工智能不太了解,那么你就可以把这本书分享给他。
Third, you will understand just how much (or how little) a marketing person cares about the fine details of how the analytical sausage is made.
第三,你会了解营销人员对分析这根“香肠”究竟是怎么做出来的细节到底有多关心(或者多不关心)。
You’ll be able to have a meaningful conversation with marketing professionals, knowing the proper level of language to use around them without their eyes glazing over or their phone suddenly becoming the most important thing in the universe.
这样,你就能与营销专业人士进行一场有意义的对话,知道在他们面前该使用什么程度的语言,而不至于让他们目光呆滞,或者让手机突然变成全宇宙最重要的东西。
EVERY REASON MARKETING PROFESSIONALS SHOULD READ THIS CHAPTER
营销专业人士应该阅读本章的所有理由
If you’re in marketing, this chapter provides the glossary, the explanation, and the translation you need to be on the other side of that meaningful conversation.
如果你从事营销工作,本章将提供你所需的术语表、解释和翻译,让你能够成为这场有意义的对话的另一方。
This may well be the chapter that you skim and then go back to over time, rather than learn and internalize.
这一章你很可能是先略读一遍,以后再不时回头翻阅,而不是一次学完并完全内化。
For you, this is a brief peek behind the curtain.
对你来说,这只是对幕后的匆匆一瞥。
To create a killer brochure, you don’t need to understand ink dye sublimation onto paper, but you really should know the difference between aqueous coating and UV coating as well as varnish, lamination, spot coating, foil stamping, embossing, and letterpress, and why you might want to spend more for one over the other.
要制作一本出色的宣传册,你不需要了解油墨染料是如何升华到纸张上的,但你确实应该了解水性上光和UV上光的区别,以及上光油、覆膜、局部上光、烫金、压凹凸和凸版印刷,还有为什么值得为其中一种多花钱。
How those things are accomplished is not your problem.
这些工序是怎么实现的不是你需要关心的问题。
What do they cost and what are the results?
它们的成本是多少,能带来什么效果?
That would be your job.
这些才是你应该关心的。
WE THINK WE’RE SO SMART
我们认为我们很聪明
Pattern recognition is child’s play—even for a child.
模式识别是小孩子的游戏,即使对孩子来说也是如此。
Why is it so impressive in a machine?
为什么在机器里面却令人印象深刻?
Let’s look at speed, reliability, and unemotional decision making.
让我们来看看速度、可靠性和不带情绪的决策。
Humans evolved to notice movement (will it eat me?) and differentiate between subtle shades of color (can I eat it?).
经过进化,人类会注意到物体运动(它会吃掉我吗?),区别有细微差别的颜色(我可以吃它吗?)
We can see when something is not quite right and take precautions against harm.
我们可以察觉到什么时候情况不大对并采取预防措施来防止伤害发生。
We can see if something’s missing.
我们可以察觉是否缺少某些东西。
We’re proud of our abilities to infer and deduce, but computers can do all of the above tirelessly and endlessly, and get better along the way.
我们为自己的推断和演绎能力感到自豪,但计算机可以不知疲倦地,无休止地完成上述所有工作,并在这个过程中会变得越来越熟练。
Professional marketers make decisions based on years of experience.
专业营销人员根据多年的经验做出决策。
They also make decisions based on hunger, anger, envy, and politics.
他们也会基于饥饿、愤怒、嫉妒和政治因素做出决定。
Sometimes, they’re just tired or overwrought.
有时,他们会感到疲惫或过度紧张。
A 2014 paper from the MIT Media Lab1 compared humans to machines in the task of choosing people most likely to convert into mobile Internet users.
麻省理工学院媒体实验室2014年的一篇论文比较了人和机器在同一项任务上的表现:挑选最有可能转化为移动互联网用户的人。
Not only did the machine choose better, it chose better customers.
机器不仅做的更好,而且还选出了更优质的客户。
The machine-selected customers had a 13-times-better conversion rate compared to those selected using best practices marketing techniques.
与采用最佳实践营销方法选出的客户相比,机器选出的客户的转化率是前者的13倍。
Further, 98 percent of the machine-selected converted customers renewed their mobile Internet packages after the campaign, compared to 37 percent in the human-selected group.
不仅如此,在机器选出并成功转化的客户中,98%的人在活动结束后续订了移动互联网套餐,而人工选出的那一组只有37%。
We should not turn all of our decision making over to the machines, but we should understand the power—and the shortcomings—of these new tools as we continue in the ongoing race between the amount of data we have and the number of conjectures we can derive from that data.
我们不应该把所有的决策权都交给机器,但在数据量与我们能从数据中得出的推测数量之间这场持续的赛跑中,我们应该了解这些新工具的威力和缺陷。
If all we know is age and gender (males 18–34), then our options are limited.
如果我们所了解的信息只是年龄和性别(男性18-34),那么我们的选择(洞察)是有限的。
But as we add new attributes (location, brand affinities, social media connections, website interactions, Facebook posts, store visits, education level, vehicle type, political party, latest tweets), we face millions of possible hypotheses.
但随着我们不断加入新的属性(位置、品牌偏好、社交媒体关系、网站互动、Facebook帖子、到店记录、教育水平、车型、政党、最新推文),我们面临数百万种可能的假设。
The more data we have, the more hypotheses we can test.
我们拥有的数据越多,我们需要检验的假设就越多。
The more tests we make, the higher the odds that our next marketing action will yield better results.
我们进行的检验越多,我们下一次营销行动产生更好结果的机会就越高。
Since that’s beyond our abilities, it’s time to call in the machines.
这种情况远远超出了我们的能力,所以是时候让机器出场了。
DEFINE YOUR TERMS
定义术语
You’ll recall Tom Mitchell saying, “The field of Machine Learning seeks to answer the questions, ‘How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?’”
你可能还记得Tom Mitchell说过:“机器学习领域试图回答这样的问题:‘我们如何构建能够根据经验自动改进的计算机系统,支配所有学习过程的基本法则又是什么?’”
In his “The Discipline of Machine Learning,”2 Mitchell offers a logical approach to differentiating machine learning from computer science and statistics.
在他的《机器学习学科》(The Discipline of Machine Learning)一文中,Mitchell提供了一种合乎逻辑的方法,将机器学习与计算机科学和统计学区分开来。
The defining questions of computer science are:
定义了计算机科学的问题是:
How can we manually program computers to perform specific functions and solve problems, and which problems are inherently intractable?
我们要怎样去编写程序执行某个功能,解决某些问题,哪些问题又是计算机无法解决的?
The questions that largely define statistics are:
很大程度上定义了统计学的问题是:
What can we infer from historical information to predict the future?
我们可以从历史中获取什么信息去预测未来?
What conclusions can be inferred from data?
从数据中可以得出什么结论?
The defining questions for machine learning are:
定义了机器学习的问题是:
How can we build systems that automatically improve with experience?
我们如何构建能够根据经验自动改进的系统?
Can we get computers to decide for themselves what computational architectures and algorithms are most effective for manipulating data to reach a specific outcome?
我们能否让计算机自己决定,为了处理数据以达到某个特定结果,哪些计算架构和算法最为有效?
Let’s start at 35,000 feet up with a simple definition by Aatash Shah,3 who described statistical modeling as “a formalization of relationships between variables in the data in the form of mathematical equations,” while machine learning is “an algorithm that can learn from data without relying on rules-based programming.”
让我们先从高空俯瞰,看看Aatash Shah给出的一个简单定义:他将统计建模描述为“以数学方程的形式对数据中变量之间关系的形式化表达”,而机器学习是“一种可以从数据中学习而不依赖基于规则的编程的算法”。
Statistics is about plotting out the points and connecting the dots.
统计学是画出点并找到点之间的关系。
Machine learning is about figuring out if there are any dots, where they are, and how they are alike or different.
机器学习是要弄清楚到底有没有点、它们在哪里,以及它们彼此有何相似或不同。
Matt Gershoff, co-founder of the online optimization platform company Conductrics, compares basic cruise control to self-driving cars as examples of where we might deploy either standard programming or machine learning.
在线优化平台公司Conductrics的联合创始人Matt Gershoff以基本的定速巡航和自动驾驶汽车为例,说明标准编程和机器学习各自可能用在什么地方。
Cruise control only needs to speed up and slow down based on what the target speed is set to and how fast the car is currently going.
定速巡航控制只需要根据设定的速度和汽车当前的速度进行加速或者减速。
That’s all.
这就行了。
If the car is going to hit something, then the driver has to use the brakes and the steering wheel to avoid a crash.
如果汽车快要撞上什么东西了,那么驾驶员必须使用制动器和方向盘以避免碰撞。
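The cruise-control case above is the standard-programming side of the comparison: a person writes the rules and nothing is learned from data. A minimal sketch, assuming a hypothetical `cruise_control` function and a made-up tolerance of one unit of speed:

```python
def cruise_control(target_speed, current_speed, tolerance=1.0):
    """Hand-written rule: compare the current speed with the target and react.

    The function name and the tolerance value are illustrative assumptions.
    """
    if current_speed < target_speed - tolerance:
        return "speed up"
    if current_speed > target_speed + tolerance:
        return "slow down"
    return "hold"


for speed in (58.0, 65.5, 71.0):
    print(speed, "->", cruise_control(target_speed=65.0, current_speed=speed))
```

Every behavior here was decided in advance by the programmer, and obstacles are simply not part of the rule set, which is why the driver still has to brake and steer.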
Unlike basic cruise control, which only needs sensor data about the car’s speed, self-driving additionally requires the conversion of visual information from cameras into meaningful inputs that the autonomous driving program can use to manage speed, direction, and avoidance of potential obstacles (other cars, people, etc.).
定速巡航只需要汽车速度的传感器数据,而自动驾驶还需要将摄像头采集的视觉信息转换为有意义的输入,供自动驾驶程序用来控制速度和方向,并避开潜在的障碍物(其他汽车、行人等)。
Furthermore, the car can become “smarter” if given more information.
此外,如果能获得更多信息,汽车的自动驾驶可以变得“更智能”。
If you introduce temperature and weather into the system, it can learn that cold and precipitation could result in ice, snow, and slush, requiring the vehicle to drive slower.
如果您将温度和天气导入系统,它可以知道温度低和降水可能导致冰,雪和雪泥,车辆需要开得更慢。
Whatever analogy you prefer, it all begins with models and all models are wrong.
无论你喜欢什么比喻,一切都从模型开始,而所有模型都是错误的。
ALL MODELS ARE WRONG
所有的模型都是错的
George Edward Pelham Box was a prolific author about all things statistical and is well known for his observation, “All models are wrong, but some are useful.”
乔治·爱德华·佩勒姆·博克斯(George Edward Pelham Box)是一位在统计学领域著述颇丰的作者,他有一句广为人知的话:“所有模型都是错的,但有些是有用的。”
When I was four years old, my brother and I got a Lionel train set for Christmas.
当我四岁的时候,我和我的兄弟在圣诞节收到了一套莱昂内尔(Lionel)火车模型。
It was pure awesomeness.
这真是太棒了。
My father wanted to play with it with us and we loved that, too.
我的父亲想和我们一起玩,我们也很喜欢。
We played with it for years and always brought it out just before Christmas.
我们玩了多年,总是在圣诞节前把它拿出来。
It fueled our imaginations.
它激发了我们的想象力。
It was electric.
它是电动的。
It was detailed.
它有很多细节的东西。
It was a scale model.
它是一个比例模型。
While you could really launch a helicopter from a flatbed car, you couldn’t ride it.
虽然你真的可以从平板车厢上放飞一架直升机,但你没法乘坐它。
It was just a model.
它只是一个模型。
(See Figure 2.1.)
When I was nine years old, we went to the New York World’s Fair and were dumbfounded by General Motors’ Futurama exhibit with its Lunar Colony, Undersea City, and City of the Future.
当我九岁的时候,我们去了纽约世界博览会,被通用汽车的Futurama展览以及其中的月球殖民地、海底城市和未来之城惊呆了。
By the time I would be old enough to go places by myself, this stuff would have come true!
到我年龄足够大,可以自己旅行的时候,这些东西就会成为现实!
In 1964, it was only an imagined place.
而在1964年,它只是一个想象中的地方。
It was just a model.
它只是一个模型。
When I went to college, I gave serious thought to architecture.
当我上大学以后,我开始认真考虑建筑。
I liked the idea of designing something from scratch, something permanent, something I could point at and say, “I made that.”
我喜欢从零开始设计东西的想法,设计一些永久性的东西,一些我可以指着说“这是我做的”的东西。
Figure 2.1 Lionel Helicopter Car
I had to admit that what I really wanted was to build those astounding, perfect, intricate, 3-D cardboard mockups of what the building would look like.
我不得不承认,我真正想做的是搭建那些令人惊叹、完美而精巧的3-D纸板模型,用来展示建筑物建成后的样子。
They would have tiny trees and little cars and people the size of pencil erasers.
它们会有小小的树、小小的汽车,还有铅笔橡皮头大小的人。
They would be elaborate and complex, and they would completely communicate a vision.
这些模型将是复杂的,它们将完全传达一个愿景。
They would spell out everything anybody needed to know about how I saw the future.
它们会展示出任何人需要知道的、关于我如何看待未来的一切。
They would be visually engrossing, but they would lack the information necessary to actually build the thing.
在视觉上,它们会非常引人入胜,但它们缺乏实际建造这些东西所需的信息。
You couldn’t really open the doors and windows.
你无法真正打开门窗。
It was just a model.
它只是一个模型。
Accountants build accounting systems that show how money flows into, out of, and between businesses.
会计师建立会计系统,显示资金如何流入、流出企业,以及如何在企业之间流动。
Website builders rely on information architecture designers who create a map of how people might expect to find what they are looking for and accomplish a specific task.
网站建设者依赖信息架构设计师,由他们绘制一张图,描述人们可能期望以怎样的方式找到想找的东西并完成特定的任务。
Because (as David Weinberger likes to say) the world is messy, analog, and open to interpretation, these models only work well enough to get by.
因为(正如David Weinberger常说的)世界是混乱的、模拟的(而非数字的),并且可以有多种解读,这些模型只能做到勉强够用。
Using any software or website, we quickly come across circumstances that were not preconceived well enough to accommodate a certain situation.
使用任何软件或网站时,我们很快就会遇到设计时考虑得不够周全、无法应对某种情形的情况。
We realize it’s not really the way the world works;
我们意识到世界并不是以我们想象的方式运行。
it’s just a model.
模型只是模型而已。
Where marketing meets big data meets artificial intelligence, there is an irrational expectation that we can collect enough information that it will auto-magically correlate the important bits and causality will drop out the bottom.
在营销遇见大数据、又遇见人工智能的地方,存在一种非理性的期望:只要我们收集到足够多的信息,它就会自动而神奇地把重要的部分关联起来,因果关系也会随之水落石出。
We are never going to be able to build a detailed model that describes and then predicts what an individual will do. Even with all of the behavioral, transactional, and attitudinal data in the world, humans are just too messy, analog, and open to too many interpretations.
我们永远无法构建一个能够描述并预测个人行为的详细模型。即使拥有世界上所有的行为、交易和态度数据,人类仍然过于混乱、过于模拟,并且可以有太多种解读。
Even as we make progress predicting the movement of swarming insects, fish, and cancer cells, we can only hope to create an attribution algorithm that is suggestive and directional.
即使我们在预测蜂群,鱼类和癌细胞的运动方面取得了进展,我们也只能希望创建一种新的归因算法,它很具有启发性和方向性。
We can only hope to create a model that helps us test theories.
我们只能希望创建一个帮助我们测试理论的模型。
We can only hope to create a model that is useful.
我们只能希望创建一个有用的模型。
For all the data we collect, for all of the data storage we fill, for all of the big data systems we cluster in the cloud and sprinkle with artificial intelligence, our constructs are not going to provide facts, unearth hard-and-fast rules, or give us ironclad assurance that our next step will result in specific returns on our investment.
对于我们收集的所有数据、填满的所有存储,以及在云上搭建集群并点缀上人工智能的所有大数据系统而言,我们构建出来的东西不会提供事实,不会发掘出铁律,也不会给我们铁一般的保证,确保下一步一定能带来特定的投资回报。
They are only models.
它们只是模型。
But some are useful.
但有些是有用的。
USEFUL MODELS
有用的模型
You are able to navigate on Earth because you have a model of it in your head.
你能在地球上辨别方向、四处行走,是因为你的脑子里有一个关于它的模型。
That model includes locations like a map and a whole bunch of rules about how the world works.
该模型包括像地图那样的位置信息,以及一大堆关于世界如何运作的规则。
You work out these rules over time, and as your capability grows, you try to make sure your mental map keeps up.
随着时间的推移,你会逐渐摸索出这些规则,并且随着能力的增长,你会努力让自己的心理地图跟上变化。
A child spills his milk at dinner.
一个孩子在吃饭时洒了牛奶。
His mental model includes the distance from himself to the glass, the degree to which he must open his hand, and the amount of attention he needs to pay to complete the action.
他的内心模型包括从他自己到玻璃杯的距离,他张开手的程度,以及完成这些动作所需的注意力。
That last one is a killer because the other two elements change over time.
最后一项最要命,因为另外两项会随着时间而变化。
If the child is not careful, the milk spills once more and the crying commences.
如果孩子不小心,牛奶会再次撒出来,他又会开始哭泣。
You have a mental map in your head about your customers.
您的头脑中有关于客户的心理地图。
Experience, reading, seminars, and stories from colleagues have given you an image of the marketplace you’re addressing, the buying cycles you can count on, and the levels of engagement required to attract attention and convince a prospect to become a buyer.
经验、阅读、研讨会以及同事讲述的故事,让你对所面对的市场、可以依赖的购买周期,以及吸引注意力并说服潜在客户成为买家所需的参与程度形成了一个印象。
You have a feel for your competitors.
你对竞争对手有所了解。
You have a grasp of macroeconomics as well as the impact the weather has on sales.
您了解了宏观经济以及天气对销售的影响。
You might disagree with your boss, your colleagues, or your agency about any and all of the above, but through conversation, persuasion, and perhaps, coercion (with a fair amount of political maneuvering and one-upmanship), a common vision is crafted from which campaign plans emerge.
对于以上提到的因素,你可能与你的老板,你的同事或你的代理机构有不同的意见,但通过交谈,也许能说服他们,或许强迫他们接受(利用一些操控或者计谋),制定了一个共同的愿景,然后开始了营销活动计划。
The communal mental model of the above is updated by the results of those campaigns.
这个共同的心理模型会根据这些营销活动的结果不断更新。
You had a collective hypothesis, and you executed on it and reaped the rewards for better or worse.
你有一个共同的假设,你执行它并获得结果,无论好坏。
Then you try again.
然后你再试一次。
If you try a lot and fail every now and then, you learn and become better.
如果你经常进行尝试并且偶尔也会有失败,你将会通过学习变得更好。
If you fail a lot, eventually you are no longer invited to try, try again.
如果你失败得太多,最终就不会再有人请你去一试再试了。
A mathematical model is one you can test repeatedly as dry runs.
数学模型是可以当作演练来反复测试的模型。
The spreadsheet is a testament to the game of What If.
电子表格正是“假设分析”(What If)这个游戏的明证。
If we lower the price by this much, and that increases the sales by this much, will we make up for it in volume?
如果我们将价格降低一些,因此销售额增加了一些,那么这么做是否可以弥补它的数量上的变化呢?
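The What If question above is plain arithmetic, so it can be sketched in a few lines. This is a minimal illustration, not a pricing method from the book: the function names, the price, the unit cost, and the sales volume are all made-up assumptions.

```python
def profit(price, unit_cost, units):
    """Total profit for a given price, unit cost, and number of units sold."""
    return (price - unit_cost) * units


def breakeven_units(price, unit_cost, units, new_price):
    """Units that must be sold at new_price to match the current profit."""
    return profit(price, unit_cost, units) / (new_price - unit_cost)


# Hypothetical baseline: 1,000 units at a price of 10 with a unit cost of 6.
base_profit = profit(price=10.0, unit_cost=6.0, units=1000)

# What if we lower the price to 9? How many units do we need to sell?
needed = breakeven_units(10.0, 6.0, 1000, new_price=9.0)

print(f"baseline profit: {base_profit:.0f}")
print(f"units needed at the lower price: {needed:.0f}")
```

Under these assumed numbers a 10 percent price cut needs roughly a one-third increase in volume just to stand still, which is exactly the kind of dry run a spreadsheet makes cheap.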
An Excel model can also help us with simple decision making when we assign weights to action and outcomes.
当我们为行动和结果赋予权重时,Excel模型也可以帮助我们做出简单的决策。
You’re often accosted by colleagues to do different types of work.
你经常被同事找上门,要你去做各种不同的工作。
How do you prioritize?
你怎样排定优先级?
Make a model.
构建一个模型。
Score each task on a scale of 1 to 5 based on whether it is easy, cheap, fast, and has a significant impact.
基于任务是否容易,经济,快速并且具有显着影响,按1到5的等级对每项任务进行评分。
As you can see in Figure 2.2, Task A is easy and cheap and fast, but has a low impact.
正如您在图2.2中看到的那样,任务A简单,经济且快速,但影响较小。
Still, it is so easy and cheap and fast that it scores the highest.
尽管如此,它因为如此的简单,便宜和快速,所以得分最高。
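The scoring scheme just described can be sketched outside a spreadsheet as well. The scores below are hypothetical and are not the values shown in Figure 2.2; they are chosen only so that Task A is easy, cheap, and fast but low in impact, as in the text. The `total_score` helper and the optional weights are assumptions for illustration.

```python
CRITERIA = ("easy", "cheap", "fast", "impact")

# Hypothetical scores on a 1-to-5 scale (5 is best); not the figure's data.
tasks = {
    "Task A": {"easy": 5, "cheap": 5, "fast": 5, "impact": 1},
    "Task B": {"easy": 3, "cheap": 2, "fast": 3, "impact": 4},
    "Task C": {"easy": 2, "cheap": 3, "fast": 2, "impact": 3},
    "Task D": {"easy": 4, "cheap": 3, "fast": 3, "impact": 2},
    "Task E": {"easy": 1, "cheap": 2, "fast": 1, "impact": 5},
}


def total_score(scores, weights=None):
    """Weighted sum of the criterion scores; every weight defaults to 1."""
    weights = weights or {criterion: 1 for criterion in CRITERIA}
    return sum(scores[c] * weights[c] for c in CRITERIA)


# Rank tasks from highest to lowest total score.
ranked = sorted(tasks, key=lambda name: total_score(tasks[name]), reverse=True)
for name in ranked:
    print(name, total_score(tasks[name]))
```

With equal weights Task A comes out on top despite its low impact. Passing a heavier weight for `impact` changes the ordering, which is one way to see how much the result depends on the assumptions built into the model.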
This prioritization model would be perfect if it weren’t for real life.
如果不涉及到现实生活,这种优先级模型将是完美的。
Real life dictates that Task E must be completed first because it came from your boss’s boss.
现实生活要求任务E必须首先完成,因为它来自你老板的老板。
You’ll have to do Task D before you do Task A because D came from somebody to whom you owe a favor.
在执行任务A之前,你必须先执行任务D,因为D来自一个你欠了人情的人。
However, if you do Task D before you do Task B, you’re likely to be late for your daughter’s birthday party.
但是,如果你在执行任务B之前执行任务D,那么你女儿的生日聚会你可能会迟到。
The model is not the object.
模型不是对象。
The map is not the territory.
地图不是领土。
It is only a representation.
它们只是一种表现形式。
If it seems to fit, it’s useful—for a while.
如果它看起来合适,那它就是有用的,至少在一段时间内如此。
Complex models require a number of assumptions.
复杂模型需要许多假设。
Rain will lower sales.
下雨将降低销量。
Launching a new product will cannibalize sales.
推出新产品将蚕食现有产品的销售。
Hiring new salespeople will increase sales.
雇用新的销售人员将增加销售额。
New products and new salespeople will increase costs.
新产品和新销售人员将增加成本。
Plug in your numbers, turn the crank, and see what the results might be.
输入您的数据,开始计算,看看结果可能是什么。
But the weather changes, competitors shift strategies, the President of the United States tweets about your company, and your underlying assumptions get farther from reality by the minute.
但是天气变了,竞争对手调整了战略,美国总统发推文提到了你的公司,你的基本假设每一分钟都在离现实更远。
Figure 2.2 A prioritization model in Excel
图2.2 Excel中的优先级模型
How do you know a model is going to be useful?
你怎么知道模型会有用?
Test it.
测试它。
Start with 12 months of data and build a model that accurately describes the first nine months.
一开始你有12个月的数据,先构建一个可以准确描述前九个月数据的模型。
It’ll take some tweaking, but eventually you have a calculation that’s close enough.
过程中可能会有一些调整,不过最终你会得到一个足够拟合的计算模型。
Next, run that model and see how close it comes to predicting the next three months.
接着,运行该模型,看看它对接下来三个月的预测与实际情况有多接近。
Is it way off?
差的很远?
Then it’s not useful.
那么这个模型没有用。
Try again.
再试试。
Is the new version better than a coin toss?
新版本比抛硬币更准吗?
It’s useful—for a while.
那么它暂时是有用的。
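The test described above (fit on nine months, check against the next three) can be sketched as follows. The monthly sales figures are invented, the model is a plain straight-line fit, and the comparison baseline is "repeat the last observed month" standing in for the coin toss; all three are assumptions made for illustration.

```python
def fit_line(ys):
    """Least-squares line y = a + b*x for x = 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_abs_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical sales for 12 months.
sales = [100, 104, 109, 113, 118, 121, 127, 130, 136, 139, 145, 148]
train, holdout = sales[:9], sales[9:]

# Build the model on the first nine months only.
intercept, slope = fit_line(train)
predicted = [intercept + slope * month for month in range(9, 12)]

# Naive baseline: assume each of the next three months equals month nine.
baseline = [train[-1]] * len(holdout)

model_error = mean_abs_error(holdout, predicted)
baseline_error = mean_abs_error(holdout, baseline)
print(f"model error: {model_error:.2f}, baseline error: {baseline_error:.2f}")
```

If the model's error on the held-out months is not clearly smaller than the baseline's, the model is not yet useful and it is time to try again.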
TOO MUCH TO THINK ABOUT
要考虑的太多了
They say 3 percent of the people use 5 to 6 percent of their brain
他们说3%的人使用了5%到6%的大脑
97 percent use 3 percent and the rest goes down the drain
97%的人使用3%,其余的都白白浪费了
I’ll never know which one I am but I’ll bet you my last dime
我永远不知道自己属于哪一种,但我敢拿最后一毛钱跟你打赌
99 percent think with 3 percent 100 percent of the time
99%的人在100%的时间里都只用3%的大脑思考
64 percent of all the world’s statistics are made up right there on the spot
全世界64%的统计数据都是当场编造出来的
82.4 percent of people believe ’em whether they’re accurate statistics or not
82.4%的人相信这些,不管它们是否是准确的统计数据
I don’t know what you believe but I do know there’s no doubt
我不知道你相信什么,但我知道毫无疑问
I need another double shot of something 90 proof
我需要再来一杯双份的烈酒
I got too much to think about
我要想的事情太多了
“Statistician’s Blues,” Todd Snider
《统计学家的蓝调》(Statistician’s Blues),Todd Snider
You can get a lot of value out of a handful of variables, but the promise of big data and machine learning is that the more data and the more data types you have, the more likely the machine can find something useful.
您可以从少数变量中获得有价值的东西,但大数据和机器学习的好处是,数据越多、数据类型越多,机器发现有用的东西的可能性就越大。
Using data set A, you plan to build a model that predicts Y as a function of X. Using a data set of current customers, you plan to build a model that predicts propensity to buy as a function of demographics or:
使用数据集A,你打算构建一个将Y预测为X的函数的模型。使用当前客户的数据集,你打算构建一个根据人口统计数据预测购买倾向的模型,也就是:
A: People who have already bought your product
A:已经购买过你产品的人
Y: People who are most likely to be interested in your product
Y:最有可能对你的产品感兴趣的人
X: Age, gender, and ZIP code
X:年龄、性别和邮政编码
That’s a model you’ve used for years, but it could be much better.
这是你这么多年来一直使用的模型,但它可能会变得更好。
A: People who have already bought your product
A:已经购买过你产品的人
Y: People who are most likely to be interested in your product
Y:最有可能对你的产品感兴趣的人
X: Search terms typed into Google
X:在谷歌中输入的搜索词
That’s a half-trillion-dollar idea (Google’s recent market cap).
这个想法值0.5万亿美元(谷歌最近的市值)。
There’s something there worth pursuing.
这里面确实有值得追求的东西。
But what if you cranked it up a bit?
但是,如果你再加大一点力度呢?
A: People who have already bought your product
A:已经购买过你产品的人
Y: People who are most likely to be interested in your product
Y:最有可能对你的产品感兴趣的人
X: The demographic and search data above, plus
X:上述人口统计数据和搜索数据,再加上:
Direct interaction with your call center
与您的呼叫服务中心有过沟通
Direct interaction with your company website
与贵公司网站有直接互动
Direct interaction with your mobile app
直接与您的APP有互动
Direct interaction on the advertising ecosystem
广告活动中有直接互动
Direct interaction on websites that sell their data
在出售数据的网站上的直接互动
Data from aggregators: credit scores, shopping, etc.
来自数据聚合商的数据:信用评分、购物记录等。
The weather
天气
The economic climate
经济大势
The price of tea in China
中国茶叶的价格
Wait!
等等!
That’s just too many variables to hold in your head at once.
现在的变量太多了,你的脑子已经无法处理了。
It’s too much to think about.
要考虑的东西太多了。
But you can ask the machine to review all the above and determine which attributes are the most predictive, and please use those to build a model for you.
但您可以要求机器检查以上所有因素并确定哪些因素最具预测性,然后就可以使用这些因素为您构建模型。
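One very simple way to picture "determine which attributes are the most predictive" is to rank each attribute by how strongly it moves with the outcome. The sketch below uses the absolute Pearson correlation as a stand-in for whatever a real machine learning system would do; the customer data, the attribute names, and the choice of correlation are all assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical attributes for eight customers.
customers = {
    "site_visits": [1, 8, 2, 9, 7, 1, 6, 2],
    "age": [25, 41, 38, 29, 52, 47, 33, 36],
    "rainy_day": [0, 1, 1, 0, 0, 1, 0, 1],
}
# Hypothetical outcome: 1 if the customer bought, 0 otherwise.
bought = [0, 1, 0, 1, 1, 0, 1, 0]

# Rank attributes from most to least strongly related to the outcome.
ranked = sorted(
    ((name, pearson(values, bought)) for name, values in customers.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: {r:+.2f}")
```

With three attributes this is easy to do by hand; the point of calling in the machine is that the same ranking can be repeated across hundreds or thousands of attributes without anyone having to hold them all in mind.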
Better yet, you can ask the machine to review the models it has created and make a value judgment on which ones might be best.
更好的是,你可以要求机器审视它创建的模型,并对哪些模型可能最好做出价值判断。
In the Harvard Business Review,4 Tom Davenport described a large, well-known technology and services vendor trying to build a model showing which executives among their client base had the highest propensity to purchase.
在“哈佛商业评论”中,汤姆达文波特介绍了一家知名的大型技术和服务供应商,他们试图建立一个模型,说明客户群中哪些高管具有最高的购买倾向。
Using traditional human-crafted modeling, the company once employed 35 offshore statisticians to generate 150 propensity models a year.
在传统的人工建模情形下,该公司曾雇用35名外包统计人员,每年设计了150个购买倾向模型。
Then it hired a company called Modern Analytics that specializes in autonomous analytics, or what it calls the “Model Factory.”
后来,这家公司聘请了一家名为Modern Analytics的公司,该公司专门从事自主分析,也就是它所说的“模型工厂”。
Machine learning approaches quickly bumped the number of models up to 350 in the first year, 1,500 in the second, and now to about 5,000 models.
机器学习方法在第一年就迅速将模型数量增加到350个,第二年增加到1,500个,现在已达到约5,000个。
The models use 5 trillion pieces of information to generate over 11 billion scores a month predicting a particular customer executive’s propensity to buy particular products or respond to particular marketing approaches.
这些模型利用了5万亿条信息,每月产生超过110亿个评分,用来预测某位客户高管购买特定产品或响应特定营销方式的倾向。
80,000 different tactics are recommended to help persuade customers to buy.
系统推荐了80,000种不同的策略来帮助说服客户购买。
Using traditional approaches to propensity modeling to yield this level of granularity would require thousands of human analysts if it were possible at all.
如果使用传统方法进行倾向建模来达到这种粒度,即使有可能做到,也需要成千上万名人工分析师。
Davenport named Cisco Systems as an example of a firm that “went from doing tens of artisanal propensity models to tens of thousands of autonomously generated ones.”
达文波特以思科系统公司为例,该公司“从制作数十个手工打造的倾向模型,发展到拥有数万个自主生成的模型”。
Tens of thousands of autonomously generated models sounds very productive for a small group, but are those models any good?
相对于一个小组来说,成千上万的生成的模型听起来非常高效,但这些模型有用吗?
All models are wrong; some are useful.
所有的模型都是错的,有些有用。
How do we get the machine to create better models?
我们如何让机器创造更好的模型?
We teach it some things and it learns some things on its own.
我们教它一些东西,它自己也会学到一些东西。
MACHINES ARE BIG BABIES
机器就是大号婴儿
Machine learning happens in three ways:
机器学习有三种方式:
supervised, unsupervised, and by reinforcement, just like kids.
有监督、无监督和强化,就像孩子一样。
If you introduce your toddler to a kitten and say, “Kitty” a few (dozen) times, eventually, the child will repeat after you.
如果你把你的孩子带到猫面前并介绍说“猫咪”几十次,最终,小孩会跟着你重复。
Show the child another kitten and she will spontaneously say, “Kitty!” delighted that that thing has a name and that she knew it.
给孩子看另一只小猫,她就会自发地说,“猫咪!”很高兴这个东西有一个名字而且她知道它。
Every time she comes across a cat, she will say, “Kitty!”
每次遇到一只猫,她都会说,“猫咪!”
Then, one day, she’ll come across a puppy and will, of course, say, “Kitty!”
然后,有一天,她会遇到一只小狗,当然她也会说,“猫咪!”
Now, she is in for a rude awakening.
这下,她要迎来一次当头棒喝了。
That’s no cat; it’s a dog.
那不是猫,那是狗。
You correct her, and she spends a fair amount of time unsure whether that next one is a kitty or a doggie.
你纠正了她,此后相当长一段时间里,她都拿不准下一只到底是猫咪还是小狗。
Pointing to the cat and telling her it’s a kitty is supervised learning.
指着猫,并告诉她这是一只猫咪,这是有监督的学习。
You’ve given her an example and a label.
你给了她一个例子和标签。
Correcting her when she makes a mistake and praising her when she gets it right is reinforcement learning.
在她犯错时纠正她,在她说对时称赞她,这就是强化学习。
Eventually, her confidence grows as she is corrected less and less and continues to be praised when right.
最终,随着纠正的次数越来越少,因正确而受到称赞越来越多,她的自信心越来越强。
Unsupervised learning happens when the child is alone.
当孩子独处时,无监督学习就会发生。
She throws a toy and it bounces or it doesn’t, and she learns something about the nature of balls as compared to stuffed animals.
她扔出一个玩具,它要么弹起来,要么不弹,于是她了解到球和毛绒玩具在性质上的不同。
She eats a dandelion and discovers that it’s bitter.
她吃了蒲公英,发现它很苦。
She casually says something mean to another child and discovers words can hurt.
她随口对另一个孩子说了句刻薄的话,发现言语是会伤人的。
Bob Page took time off after running analytics at Yahoo!, then eBay, and then being product vice president at Hortonworks to revel in the joys of ensuring his genes would live on.
鲍勃·佩奇(Bob Page)先后在雅虎和eBay负责分析工作,之后担任Hortonworks的产品副总裁,随后他休了一段假,尽情享受确保自己基因得以延续的乐趣,也就是养育孩子。
His experience mirrors the above.
他的经历反映了上述情况。
I’d been watching my (then 20 month old) daughter pick up new concepts, and watching her generalize.
我一直在观察我(当时20个月大)的女儿怎样接受新的概念,以及她的概括能力。
It confused me at first when a woman walked by at the park and she would point and say “mama!” but I finally realized she meant “she’s in the class of humans called mama” as opposed to “she’s my mama.”
起初让我困惑的是,在公园里有位女士走过时,她会指着对方说“妈妈!”,但我最终意识到,她的意思是“她属于被称为妈妈的那一类人”,而不是“她是我的妈妈”。
Now she applies the terms to a large range of things—this piece of broccoli is dada, this one is mama and this is baby.
现在她把这些称呼用在各种各样的东西上:这块西兰花是爸爸,这块是妈妈,这块是宝宝。
She saw a bee in our lavender and said “bee!” followed by “mama!” and I had to say yes it’s a bee, but it might not be a mama.
她在我们的薰衣草丛中看到一只蜜蜂,先说“蜜蜂!”,接着说“妈妈!”,我只好说,没错,它是一只蜜蜂,但它不一定是妈妈。
She thought about it for a second, and declared that the bee was dada.
她想了一下,然后宣布这只蜜蜂是爸爸。
The whole process has been fascinating to observe.
整个过程一直非常令人着迷。
Trying to explain the concept of color to her without describing properties of light is really eye opening (haha), especially since there isn’t just one “blue.”
试图在不描述光的性质的情况下向她解释颜色的概念,真是让我大开眼界(哈哈),尤其是因为“蓝色”并不只有一种。
It also had me thinking about how little our current AI systems really “know” anything, or can generalize from simple rules and relationships, vs. memorizing mountains of training (or historical) data to capture patterns that can be matched against “live” data.5
这也让我想到,我们目前的AI系统真正“知道”的东西是多么少,它们从简单的规则和关系中进行概括的能力又是多么有限,它们更多是在记忆堆积如山的训练(或历史)数据,以捕捉可以与“实时”数据相匹配的模式。
Like children, machines learn from data.
像孩子一样,机器从数据中学习。
Given lots of data and time, they get to be really good at some things.
只要有大量的数据和时间,它们就能在某些事情上变得非常出色。
WHERE MACHINES SHINE
机器大放异彩的地方
Machine learning is very useful for high-cardinality and high-dimensionality problems.
机器学习对于解决高基数和高维度问题非常有用。
At this point, the data scientists are rolling their eyes at the simplicity of this description and the marketing professionals are rolling their eyes because I fell into a quagmire of jargon only useful for Buzzword Bingo.
说到这里,数据科学家会因为这个描述过于简单而翻白眼,营销专业人士也会翻白眼,因为我陷入了只对“流行语宾果”游戏有用的术语泥潭。
Dear Data Scientist, please bear with me.
亲爱的数据科学家,请给我些耐心。
Dear Marketing Professional, there are some terms you will need to know because data scientists will use them casually without realizing that words can obfuscate.
亲爱的营销专业人员,您首先需要了解一些术语,因为数据科学家可能会随意使用它们,却不会意识到它们给你带来的困惑。
High Cardinality
高基数
Cardinality refers to the uniqueness of elements in a database column.
基数是指数据库中列元素的独特性。
Each row contains information about one person and the columns are attributes about that person (name, rank, serial number).
每行包含一个人的信息,各列则是关于这个人的属性(姓名、军衔、编号)。
High cardinality is where, like e-mail and phone numbers, each entry is absolutely unique.
高基数就像电子邮件和电话号码一样,每个条目都是绝对独一无二的。
Low cardinality is where one value can belong to many entries.
低基数是指一个值可以属于许多条目。
Everybody in the database has a unique e-mail address and a unique phone number.
数据库中的每个人都有一个唯一的电子邮件地址和一个唯一的电话号码。
The column that shows what city they live in has moderate cardinality as there might be a lot of people in New York.
如果某一列是居住城市,这一列有适度的基数,因为城市纽约可能关联很多人。
There’s very low cardinality in the column that notes whether they are dead or alive as there are only two choices unless one of the rows contains Schrödinger’s cat.
列中的基数有时会比较低,比如人是死还是活,因为这个列里面只有两个选项。当然,除非其中一行包含了薛定谔的猫。
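Cardinality as described above is simply the number of distinct values in a column, so it is easy to check. A minimal sketch, assuming a tiny made-up table with hypothetical `email`, `city`, and `alive` columns:

```python
# Hypothetical rows: one per person, one key per column.
rows = [
    {"email": "ana@example.com", "city": "New York", "alive": True},
    {"email": "bo@example.com", "city": "New York", "alive": True},
    {"email": "cy@example.com", "city": "Chicago", "alive": False},
    {"email": "di@example.com", "city": "New York", "alive": True},
    {"email": "ed@example.com", "city": "Boston", "alive": True},
]


def cardinality(rows, column):
    """Number of distinct values that appear in the given column."""
    return len({row[column] for row in rows})


counts = {column: cardinality(rows, column) for column in rows[0]}
print(counts)
```

In this toy table the e-mail column has as many distinct values as there are rows (high cardinality), the city column has fewer, and the alive column has only two.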
High Dimensionality
高维度
Lots and lots of attributes about an individual would create a highly dimensional database.
关于一个人的大量属性会构成一个高维度的数据库。
When we get beyond name, rank, and serial number, we start seeing data sets with hundreds and thousands of traits, characteristics, or actions relating to a prospective customer.
当我们不再局限于姓名、军衔和编号时,我们开始看到包含成百上千个与潜在客户相关的特质、特征或行为的数据集。
Machine learning comes into the picture because it can keep a multidimensional map of data in mind even when there are a thousand different things to remember and some of them can vary in type by the thousands.
机器学习这个时候进入了我们的视野,因为它可以记住多维图,即使维度有上千种,每个维度又有上千种变化。
What is a multidimensional map?
什么是多维图表?
A database of drivers is two-dimensional.
司机信息数据库是二维的。
For each row (driver’s license number) you have columns including: name, address, date of birth, eye color, hair color, and so on.
对于每一行(驾照号码),都有以下各列:姓名、地址、出生日期、眼睛颜色、头发颜色等等。
If you want to keep track of people as they change, you need a third dimension: time.
如果您想在数据改变时跟踪他们,您需要第三个维度:时间。
Then you know when they moved to a new house, changed their hair color, or put on a few pounds.
然后你知道他们什么时候搬到新房子,改变他们的头发颜色,或者增加几磅体重。
As more attributes are added to the data set, more dimensions are needed.
随着更多属性添加到数据集中,需要更高的维度。
Gideon Lewis-Kraus described how machine learning tackles the problem of language interpretation with multiple dimensions.6
Gideon Lewis-Kraus描述了机器学习如何解决多维度的语言理解问题的
When you summarize language ... you essentially produce multidimensional maps of the distances, based on common usage, between one word and every single other word in the language.
当你总结语言时……你实质上是根据常见用法,生成该语言中一个单词与其他每一个单词之间距离的多维图。
The machine is not “analyzing” the data the way that we might, with linguistic rules that identify some of them as nouns and others as verbs.
机器不是按照我们的方式“分析”语言数据:利用语言规则将其中的一些标识为名词,将其他标识为动词。
Instead, it is shifting and twisting and warping the words around in the map.
相反,它正在改变,扭曲图表中的文字。
71
In two dimensions, you cannot make this map useful.
在二维中,这个图表没有用。
You want, for example, “cat” to be in the rough vicinity of “dog,” but you also want “cat” to be near “tail” and near “supercilious” and near “meme,” because you want to try to capture all of the different relationships, both strong and weak, that the word “cat” has to other words.
例如,你希望“cat”大致在“dog”这个词附近,但是你也希望“cat”靠近“tail”、“supercilious”和“meme”,因为你想要捕捉“cat”这个词与其他词之间所有不同的关系,无论关系是强还是弱。
It can be related to all these other words simultaneously only if it is related to each of them in a different dimension.
只有当它们在不同的维度上与它们中的每一个词相关时,它才能同时与所有这些词语相关联。
You can’t easily make a 160,000-dimensional map, but it turns out you can represent a language pretty well in a mere thousand or so dimensions—in other words, a universe in which each word is designated by a list of a thousand numbers.
你无法轻易制作一张160,000维的图表,但事实证明你可以在一千个左右的维度中很好地表示一种语言 – 换句话说,在这样一个空间中,每个单词由一千个数字的列表定义。
[Google research scientist Quoc V.] Le gave me a good-natured hard time for my continual requests for a mental picture of these maps.
[谷歌研究科学家Quoc V.] Le在我不断要求对这些图表进行描绘的时候,善意地刁难了我一番。
“Gideon,” he would say, with the blunt regular demurral of Bartleby, “I do not generally like trying to visualize thousand-dimensional vectors in three-dimensional space.”
“吉迪恩,”巴特比会很直率的说,“我一般不喜欢试图在三维空间中想象出千维向量。”
Machines do this map twisting and warping through multiple layers of artificial neurons that, like the brain, reinforce connections when they prove valuable.
机器通过多个人工神经元层进行图表分析,这些神经元像大脑一样,在它们被证明有价值时会加强某些数据关联。
That makes it good for finding patterns.
这使得它有助于发现模式。
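To make the "list of numbers" idea concrete, here is a minimal Python sketch. It assumes made-up three-number vectors (the quoted passage talks about a thousand or so dimensions, and these values are invented purely for illustration) and measures closeness with cosine similarity, which is one common choice rather than anything the quoted article specifies.

```python
import math

# Hypothetical toy vectors; the numbers are invented for illustration only.
word_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "invoice": [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Words used in similar ways should score closer to 1.0.
cat_dog = cosine_similarity(word_vectors["cat"], word_vectors["dog"])
cat_invoice = cosine_similarity(word_vectors["cat"], word_vectors["invoice"])
```

With these invented numbers, "cat" scores much closer to "dog" than to "invoice". Adding more numbers per word gives the map more directions in which two words can be near or far.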
STRONG VERSUS WEAK AI
强大VS弱小的AI
The topmost classification of artificial intelligence separates the weak from the strong.
最重要的人工智能分类将弱与强分开。
According to the University of California, Berkeley, website, A Holistic Approach to AI,7 “Strong AI’s goal is to develop Artificial Intelligence to the point where the machine’s intellectual capability is functionally equal to a human’s.”
根据加州大学伯克利分校的网站,人工智能的整体方法,“强大的AI的目标是将人工智能发展到机器的能力在功能上与人类相同的程度。”
Creating an artificial mind is a fascinating goal and the stuff of science fiction, but not our topic of discussion.
创造一个智能思维是一个吸引人的目标,但这不是我们讨论的话题。
We’re more interested in machines that can perform a specific task:
我们对能够执行特定任务的人工智能更感兴趣:
Choose the right e-mail headline, segment a vast audience into groups for targeting, choose the next-best-action for convincing a prospect to buy, and so on.
选择合适的电子邮件标题,将广大受众群体划分为多个群组进行定向,选择下一个最佳动作以便说服潜在用户购买,等等。
That’s known as narrow or weak AI.
这种被称为狭义AI或弱AI。
Matt Gershoff prefers thinking about machine learning as a specific tool to solve specific tasks one at a time.
Matt Gershoff更喜欢将机器学习视为一次一个的解决特定任务的工具。
“It’s really all about learning an agent function,” he says.
“这完全是为了学习代理功能,”他说。
“Initially, you give the function inputs and it passes back outputs—decisions it’s made to influence the outside world.”
“最初,你提供功能输入,它会传递输出 – 这个输出会影响外部世界。”
The machine starts off crappy, random, but then over time, based on the feedback from the environment, starts to select actions that, if all goes to plan, begin to perform well on the given task.
开始的时候,机器的表现很蹩脚,随机,但随着时间的推移,根据不断的反馈,如果一切顺利,机器在给定任务上表现会有良好的表现。
People sometimes get a little confused between Predictive Analytics and Machine Learning.
人们有时会在Predictive Analytics和Machine Learning之间感到一丝困惑。
While they are related, Predictive Analytics is used to, not surprisingly, make predictions about the world.
虽然它们是相关的,但毫无疑问,Predictive Analytics用于预测世界。
It is like when you stop and only think about what might happen if you were to take some action.
就像你停下来,只考虑如果你采取某些行动的话,未来会发生什么。
Machine Learning is more like when you both think about what will happen, and then take an action based on what you think will happen.
机器学习更像是当你知道会发生什么的同时,基于你认为将发生的事情采取一些行动。
Once the machine passes back a decision (this e-mail headline or that banner ad) and that decision is put into action, the machine can read the result.
一旦机器返回了决定(选择电子邮件的标题或投放某个横幅广告)并且该将决定付诸实施,机器就可以读取结果。
That result is the next round of input for the machine to think about, adjust its “opinion,” and output the next, better, decision.
这个结果是机器下一轮思考的输入,接着继续调整其“决定”,并输出下一个更好的决策。
Narrow AI is designed to do something specific.
弱AI只关注做一些特定的事情。
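The decide, act, read-the-result loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the headlines and click rates are invented, the "environment" is simulated with random numbers, and the epsilon-greedy rule used here is one simple strategy among many, not a method the chapter names.

```python
import random

def run_headline_test(true_click_rates, rounds=2000, explore=0.1, seed=0):
    """Epsilon-greedy loop: pick a headline, observe a click, update the tally."""
    rng = random.Random(seed)
    shows = {h: 0 for h in true_click_rates}
    clicks = {h: 0 for h in true_click_rates}
    for _ in range(rounds):
        if rng.random() < explore or not any(shows.values()):
            # Explore: try a headline at random (the "crappy, random" start).
            choice = rng.choice(list(true_click_rates))
        else:
            # Exploit: use the headline with the best observed click rate so far.
            choice = max(shows, key=lambda h: clicks[h] / shows[h] if shows[h] else 0.0)
        shows[choice] += 1
        # Simulated feedback from the environment (stands in for real recipients).
        if rng.random() < true_click_rates[choice]:
            clicks[choice] += 1
    return shows, clicks

# Hypothetical headlines with invented underlying click rates.
shows, clicks = run_headline_test({"Save 20% today": 0.05, "Your cart misses you": 0.15})
```

Each pass through the loop is one round of "output a decision, read the result, adjust". The `explore` setting controls how often the machine keeps testing alternatives instead of repeating its current favorite.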
THE RIGHT TOOL FOR THE RIGHT JOB
正确工具做正确的事
We become what we behold.
我们成为了我们所希望看到的。
We shape our tools, and thereafter our tools shape us.
我们塑造我们的工具,然后我们的工具塑造我们。
Marshall McLuhan
Marshall McLuhan
A marketer meets with a webmaster and says,
营销人员见到网站管理员,并说,
“We want a fully responsive, dynamic, bleeding edge website.”
“我们想要一个响应式的,动态的,前卫的网站。”
“By that, you mean you want it to automatically scale depending on the device, and you want it to personalize the content for the visitor, and you want it to look cool and hip and modern?”
“按你的说法,你的意思是你希望这个网站根据设备自动缩放,你希望它为访问者提供个性化内容,你希望它看起来很酷,时髦和现代?”
“Yeah, that’s right!”
“是啊,没错!”
“Well, first of all, do you want it to be Responsive or Adaptive?”
“嗯,首先,你想要它是响应式还是自适应?”
(Blank stare from marketer.)
(营销人员蒙了。)
Remember those days?
还记得那些日子吗?
Back when the “webmaster” could baffle you with jargon?
当“网站管理员”可以用行话来回复你的需求的时候?
You didn’t need to learn all the underpinnings (ink-dye sublimation), but you did need to know enough to have a cogent conversation.
你不需要学习所有的基础知识(墨水染料怎样升华),但你确实需要知道足够的知识来进行一场有说服力的对话。
Let’s try again.
让我们再试一次。
“Well, first of all, do you want it to be Responsive or Adaptive?”
“嗯,首先,你想要它是响应式还是自适应?”
“We want it to continually and fluidly change depending on the device and we want some of it built to a few preset factors so a combination of the two is ideal.”
“我们希望网页能够根据设备不断变化,并且我们希望它能够根据一些预设因素进行改造,因此两者的结合是比较理想的。”
“Well, we could use CSS3 or Susy to handle layout, and if you want sticky footers, we’re going to recommend Flexbox.”
“好吧,我们可以使用CSS3或Susy来处理布局问题,如果你想要粘性页脚,我们推荐使用Flexbox。”
“Those details are your call.”
“这些细节由你决定”
It’s important to understand the first level of the conversation with some general ideas about overhead costs, both in terms of a scientist’s time and machine time.
从科学家的时间和机器时间的角度,了解有关间接成本的一般概念,理解对话的第一个层次非常重要。
Getting the right ad in front of the right person at the right time is a different problem than finding a market segment that will be less price sensitive.
在正确的时间在合适的人面前推送正确的广告与找到对价格敏感度较低的细分市场是两个不同的问题。
Now that we’ve shifted to narrow or weak AI, we can subdivide between the three main categories of machine learning to choose from: supervised, unsupervised, and reinforcement.
现在我们转向关注狭窄或弱AI,我们可以在机器学习的三个主要类别之间进一步细分,三个主要类别是:监督,无人监督和强化。
How do you choose?
你怎么选择?
Some methods are better for some types of problems.
某些方法对于某些类型的问题更适合。
The first distinction is whether you need a machete or a scalpel.
第一个区别是你需要砍刀还是手术刀。
In “The Future of Machine Intelligence,”8 Benjamin Recht explains the difference between robustness and performance in machine learning systems:
在“机器智能的未来”中,Benjamin Recht解释了机器学习体系的稳健和性能之间的区别:
In engineering design problems, robustness and performance are competing objectives.
在工程设计问题中,稳健和性能是相互竞争的目标。
Robustness means having repeatable behavior no matter what the environment is doing.
无论环境怎么变化,稳健意味着行为可重复。
On the other hand, you want this behavior to be as good as possible.
另一方面,您希望此性能指标尽可能的好。
There are always some performance goals you want the system to achieve.
您希望系统是要能实现一些性能目标的。
Performance is a little bit easier to understand—faster, more scalable, higher accuracy, etc. Performance and robustness trade off with each other: the most robust system is the one that does nothing, but the highest performing systems typically require sacrificing some degree of safety.
性能比较容易理解 – 更快,更具可扩展性,更高的准确性等。性能和稳健性相互制约,他们之间存在取舍:最稳健系统是什么都不变的系统,性能最高的系统通常需要牺牲一定程度的安全。
Matt Gershoff provides a clear analogy.
Matt Gershoff提供了一个明确的示例。
An F1 car is one of the fastest road going vehicles but it can really only reach its top speed on an F1 track, but it’s also very complicated and fragile.
F1赛车是最快的公路赛车之一,但它实际上只能在F1赛道上达到最高速度,同时但它也非常复杂和脆弱。
Alternatively, a rally car can’t go nearly as fast as an F1 car on an F1 track, but on a rally course, is the fastest because it’s more robust.
拉力赛车的速度几乎不能像F1赛道上的F1赛车一样快,但在拉力赛道上,它速度最快,因为它更加稳健。
It’s also more versatile and can work in lots of environments, even if it can’t beat an F1 on an F1 track.
拉力赛车也更通用,可以在很多环境中工作,即使它无法在F1赛道上击败F1。
This trade-off between methods that do very well in certain, narrow environments, vs. methods that tend to do fairly well across many environments, is one that you will often have to make.9
在某些限制条件严格环境中表现良好的方法与在许多环境中表现相当好的方法之间的权衡取舍,是您经常需要考虑的。
What’s the difference between different types of different tech?
不同类型的不同技术之间有什么区别?
How much do you need to know to participate in the conversation without getting left behind on the one hand or having to subscribe to “What’s Happening This Minute with AI”?
你需要了解多少,才能不被AI时代抛弃,进行一场有意义的AI对话呢?你需要订阅“人工智能在这一分钟内容发生了什么”吗?
First, you should be comfortable with and understand the difference between classification and regression.
首先,您应该熟悉并理解分类和回归之间的区别。
And for that, we delve into classical statistics used in data mining.
为此,我们深入研究了数据挖掘中使用的经典统计知识。
Classification versus Regression
分类与回归
Classification is just what it sounds like.
分类的意思就是字面意思。
It sorts elements (customers, campaigns, product lines) into classes: male versus female, branding versus direct response, high margin versus low margin, and so on.
它将元素(客户,活动,产品线)分类为:男性与女性,品牌推广与效果推广,高利润与低利润等等。
Specific values get sorted into distinct categories.
特定值被划分为不同的类别。
If you’re over 30, you’re no longer “young.”
如果你年龄超过30岁,你就不再“年轻”了。
If your food item margin is better than 5.8 percent, it goes in the High bucket.
如果您的食品项目的利润率优于5.8%,那么它将划入优质类别。
Classification is great if you’re just sorting by gender to choose which e-mail message to recommend in an e-mail blast.
如果您只是在考虑基于性别去草拟电子邮件的内容,则性别分类很有用。
If only some of your products are usually bought by men and others are usually bought by women, then sending the wrong version to “Pat” might not bother you.
如果您的某些产品通常由男性购买而其他产品通常由女性购买,那么将错误的邮件发送到随便“某个人”可能不会有问题。
If, on the other hand, your products are exclusively purchased by one or the other, you might want to use regression instead.
另一方面,如果您的产品是由一个或某个人独家购买的,您也许希望使用回归方法。
Regression will tell you how likely it is that Pat is male or female.
回归将告诉你“某个人”是男性还是女性的可能性。
As the word implies, regression is a matter of looking backward.
正如这个词暗示的那样,回归是一个向后看的方法。
In solving a problem of this sort, the grand thing is to be able to reason backwards.
在解决这类问题时,最重要的是能够向后推理。
That is a very useful accomplishment, and a very easy one, but people do not practice it much.
这是一个非常有用的方法,也是一个非常容易的方法,但人们却练习得不够多。
In the every-day affairs of life it is more useful to reason forwards, and so the other comes to be neglected.
在日常生活中,向前推理更有用,因此另向后推理被忽略了。
There are fifty who can reason synthetically for one who can reason analytically.
对于能够进行分析推理的人来说,有很多人可以被推理分析。
Sherlock Holmes, A Study in Scarlet
夏洛克福尔摩斯,血色研究
Regression analysis is good at dealing with a spectrum of results expressed as a number.
回归分析擅长处理以可以用数字表示的一系列结果。
Rather than saying John always rides his bike to work, it says there is a higher probability that John will drive when it rains, and the harder it rains, the higher the likelihood.
它并告诉我们约翰总是骑着自行车上班,而是说下雨天的时候,约翰开车的概率更高,而且雨越大,开车的可能性越大。
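The contrast between the two can be sketched in Python. The 5.8 percent threshold comes from the text; everything else is an assumption for illustration, including the `intercept` and `slope` values and the choice of a logistic curve to turn rainfall into a probability (a real model would fit such values to data).

```python
import math

def margin_bucket(margin_percent, threshold=5.8):
    """Classification: sort a value into one of two distinct categories."""
    return "High" if margin_percent > threshold else "Low"

def probability_john_drives(rain_mm_per_hour, intercept=-2.0, slope=0.8):
    """Regression-style output: a number on a spectrum, here a probability.

    The intercept and slope are invented; a real model would learn them.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * rain_mm_per_hour)))

bucket = margin_bucket(6.0)                # a label
dry_day = probability_john_drives(0.0)     # a low probability
downpour = probability_john_drives(5.0)    # a higher probability
```

The first function can only answer with a label. The second answers with a number that rises smoothly as the rain gets heavier, which is the "spectrum of results" the text describes.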
With those broad categories out of the way, we should spend a little more time on the three big categories of machine learning:
有了这些广泛的类别,我们应该花更多的时间在机器学习的三大类上:
supervised, unsupervised, and reinforcement.
监督,无人监督和强化。
Supervised Machine Learning
有监督的机器学习
Supervised learning is used when you know what the answer is for the examples you have.
当你知道你的问题的答案是什么的时候,你就会使用监督学习。
You know a cat when you see one, but you can’t look at a million pictures.
当你看到一只猫时,你会认识它是一只猫,但你没办法去浏览一百万张照片。
So, you teach the machine to recognize cats by showing it as many cat pictures as you can and label them “cat.”
所以,通过给机器展示尽可能多的猫的照片,教机器识别猫,并将它们标记为“猫”。
You identify the customers you believe to be the best by whatever definition you please, and then ask the machine to go find others that fit the profile without having to create the profile yourself.
基于您自己喜欢的标准,您识别出来了您认为的最优质的客户,然后你让机器自己去找到其他的潜在客户,而无需自己创建匹配文件。
Your list of current customers is the training data.
您确定的当前客户的列表是培训数据。
The machine looks at your list, figures out what they have in common, and decides which elements are the most predictive of “goodness.”
机器会查看您的列表,找到它们的共同点,并确定哪些特征的预测效果“优质”。
Using these criteria, it looks at the supplied database of prospective customers and shows you the ones you should be targeting.
使用这些特征标准,它会查看所提供的潜在客户数据,并向您显示您应该关注的人。
You then get the chance to say, “Yes, these are the cats we’re looking for,” or correct the errant machine by providing an alternative label for those that don’t fit.
然后,您说:“是的,这些是我们正在寻找的猫”,或者通过为那些不合适的人提供别的标签来纠正错误的答案。
The machine might decide that all your best customers are named Daniel, so it brings you all the Daniels it can find.
机器可能会决定将所有最好的客户都被命名为Daniel,因此它会所有叫Daniels的跳出来给你。
While true, it is not useful.
Carry on a conversation with the machine until it starts coming up with good-enough answers, then better answers, and then results that are far superior and in much less time than humans can.
与机器进行互动,直到它开始提供足够好的答案,然后继续找到更好的答案,最终提供比人类所花时间更少,更优质的结果。
Bayes’ Theorem
贝叶斯定理
A little over 250 years ago, the Reverend Thomas Bayes thought a lot about probabilities and laid the groundwork for modern classification solutions.
在250多年前,牧师托马斯·贝叶斯(Thomas Bayes)对概率进行了很多思考,并为现代概念分类奠定了基础。
He built on the idea of conditional probability: “that the likelihood of something happening depends on what happened before.”
他建立在条件概率的思想之上:“一件事情发生的可能性取决于之前发生的事情。”
Problem:
问题:
Given the number of times in which an unknown event has happened and failed:
给出未知事件发生和失败的次数:
Required the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named.
求其在一次事件中发生的概率在任意指定的概率密度之间的机遇
“An Essay towards Solving a Problem in the Doctrine of Chances,”
“论有关机遇问题的求解”
Reverend Thomas Bayes
托马斯贝叶斯牧师
In strict statistical terms, flipping a coin and getting heads 49 times in a row has no impact on the likelihood of the next flip being tails.
在严格的统计情形下,连续投币中49次都是头像朝上的情况,对下一次的结果没有影响。
There is no conditional connection between the first 49 and the next.
49次头像和下一个结果之间没有关联。
Personally, I would bet on it.
就个人而言,我敢打赌下次还是头像。
Not because I have some hidden statistical secret; quite the opposite.
不是因为我懂一些不为人知的统计秘密 – 恰恰相反,我不懂。
I have an emotional response to it even though the outcome is completely independent from what went before.
尽管结果完全独立于以前的表现,但我的情绪是主观的。
This is why statistics are vital.
这就是为什么统计至关重要的原因。
But in a case where there is a connection, the math works wonders.
但是在存在关联的情况下,数学会表现的很好。
The likelihood of people buying an iPhone case goes up dramatically if they have previously purchased an iPhone.
如果人们之前购买过iPhone,那么人们购买iPhone手机壳的可能性会大幅上升。
The chances of people buying motorcycle insurance take a giant jump if they buy a motorcycle, and then go up bit by bit by bit when you start layering in factors like ZIP code, credit rating, and age.
如果他们购买了摩托车,那么他们购买摩托车保险的机会就会大幅增加,然后当你开始考虑邮政编码,信用评级和年龄等因素时,可能性会一点一点地改善。
If 22 percent of people in Santa Maria, California, subscribe to Planet Earth magazine and 6 percent of the population drive an electric/hybrid car, you can only surmise that the same will hold true for San Luis Obispo.
如果加利福尼亚州圣玛丽亚有22%的人订阅Planet Earth杂志,而且有6%的人口驾驶电动/混合动力汽车,那么你会猜测圣路易斯奥比斯波差不多也会如此。
Then you throw in a few other factors like weather and topology and the calculation starts getting complex.
然后你会抛出纳入一些其他因素,比如天气和拓扑结构,计算开始变得复杂。
How much does the weather impact the probability?
天气对这种预测有多大影响?
Not nearly as much as the primary industries of the local areas, one being agribiz and the other education and healthcare.
没有这几个因素的影响大:当地的主要产业农业,教育水平和医疗保健体系。
The Good Reverend worked out a method of calculating how much to revise the probabilities in the face of new data and, even more important, how to account for the probability that the new information might be incorrect.
Good Reverend制定了一种方法,用于计算在引入新数据时概率改变的程度,更重要的是,如何评估新信息可能不正确的概率。
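Revising a probability in the face of new data is usually written as Bayes' rule, P(A | B) = P(B | A) · P(A) / P(B). A minimal Python sketch follows; the percentages are invented for illustration and the function name `bayes_update` is hypothetical.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(hypothesis | evidence) from a prior and two likelihoods."""
    # Total probability of seeing the evidence at all.
    evidence = likelihood_if_true * prior + likelihood_if_false * (1.0 - prior)
    return likelihood_if_true * prior / evidence

# Invented numbers: 10% of prospects buy motorcycle insurance; 60% of buyers
# recently bought a motorcycle, versus 5% of non-buyers.
posterior = bayes_update(prior=0.10, likelihood_if_true=0.60, likelihood_if_false=0.05)
```

With these assumed inputs, learning that a prospect just bought a motorcycle lifts the estimated chance of an insurance purchase from 10 percent to roughly 57 percent. Further factors such as ZIP code or age can be layered in by feeding each posterior back in as the next prior.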
Decision Trees
决策树
Remember Guess the Animal?
还记得猜动物的例子吗?
The machine takes the information it has about the subject, finds the most significant differentiator, and splits the data set into two, and then two again, and so forth.
机器获取物体的相关信息,找到最重要的识别因素,并将数据集拆分为两个,然后再拆分两个,依此类推。
Of all the people who might have the highest customer lifetime value for buying your motorcycle insurance, the most telling attributes about them might be gender, age, credit score, and whether they park their motorcycle in a garage.
在所有人中,那些顾客终身价值最高的购买摩托车保险人群中,最有说服力的特性可能是性别,年龄,信用评分以及他们是否在车库停放摩托车。
Assuming only two outcomes of the question of gender, the first split is a rather simple decision node.
假设这里只有关于性别属性的两个答案,第一个划分就是是一个相当简单的决策节点。
The same is true of whether the bike is parked in the garage.
第二个决策节点是否把摩托车停在车库里也是如此。
(See Figure 2.3.)
(见图2.3。)
This sort of sorting can happily be done by hand, but when more and more attributes are poured into the mix, it helps to have an algorithmic assist.
这种分类可以很轻松的手动完成,但是当越来越多的属性被加到模型中的时候,它需要借助算法的辅助。
The third split can have multiple outcomes depending on the buckets of age, the fourth, on the buckets of credit score, and so on.
第三次属性拆分可以有多种结果,具体取决于这个属性年龄段,第四次拆分一样,信用评分等级等等。
At these lower stages, the splits are no longer black and white, but shades of propensities.
在这个阶段,划分不再是简单黑色和白色属性,而是多个属性的混合了。
(See Figure 2.4.)
(见图2.4。)
The result may finally tease out that married males between 38 and 62 with a credit score above 700 who live in temperate climes and regularly work out are most likely to respond to a promotion for insurance.
结果可能最终是:38至62岁、信用评分高于700、生活在温和气候地区并定期锻炼的已婚男性,最有可能对保险促销感兴趣。
Figure 2.3 Guys who park motorcycle indoors are more likely to buy insurance. (Figure labels: Gender; Male 78%, Female 22%; Inside 69%.)
图2.3 在室内停放摩托车的人更有可能购买保险。
Figure 2.4 The more attributes, the more high-level math is required. (Figure labels: Gender; Male 78%, Female 22%; Inside 69%; “better target”; “need ML” for Age, Credit Score, ZIP Code, etc.)
图2.4 属性越多,所需的数学知识就越高。
Decision trees are great tools because they produce easy-to-understand results and can be parsed to understand how they reached their conclusions.
决策树是很好的工具,因为它们可以给出容易理解的结果,并且可以进行分解以便了解它们如何得出结论的。
The visual aspect makes it all the easier to comprehend.
可视化的图形使人们更容易理解。
The glory of having machine learning do the heavy lifting is that the marketer does not have to decide what belongs at the top of the tree (counterintuitively, the “root”) to benefit from the outcomes (the “leaves”).
让机器学习去做这些体力活的吸引人之处在于,营销人员会从结果中收益,而不需要去决定哪些决策树中的节点在顶上(或者根部)。
Just let the machine figure out the significant variables.
让机器去找到重要的变量。
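How a machine might pick the question that belongs at the root can be sketched as follows. This is a simplified illustration, assuming yes/no attributes only and using Gini impurity as the measure of how mixed a group is (a common choice, though the chapter does not name one). The prospect records are invented.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, attributes, target):
    """Return the yes/no attribute whose split leaves the purest groups."""
    def weighted_impurity(attr):
        yes = [r[target] for r in rows if r[attr]]
        no = [r[target] for r in rows if not r[attr]]
        return (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
    return min(attributes, key=weighted_impurity)

# Invented prospects: did they buy insurance (1) or not (0)?
prospects = [
    {"male": True, "garage": True, "bought": 1},
    {"male": True, "garage": True, "bought": 1},
    {"male": False, "garage": True, "bought": 1},
    {"male": True, "garage": False, "bought": 0},
    {"male": False, "garage": False, "bought": 0},
    {"male": False, "garage": False, "bought": 0},
]
root = best_split(prospects, ["male", "garage"], "bought")
```

In this toy data the garage question separates buyers from non-buyers cleanly, so it is chosen for the root. A full tree would repeat the same search inside each resulting group.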
Random Forest
随机森林算法(Random Forest)
It’s the same as the above, but more so.
它与上面的相同,但更甚。
A random forest method generates lots of decision trees by only looking at a randomly selected number of elements in a data set.
随机森林方法通过在数据集中随机选择的样本来生成大量决策树。
It then randomly chooses some of the trees and uses them as input for generating another decision tree.
然后它随机选择一些树并将它们用作生成另一些决策树的输入。
This process can be run again and again with multiple forests spawning the next generation of trees.
这个过程可以一次又一次地运行,由多个森林生成下一代树。
Why go to the trouble?
为什么要这么麻烦?
This approach is very useful for crunching through very large amounts of data.
这种方法对于处理大量级的数据非常有用。
Instead of trying to analyze all of it, the random forest method just grabs a chunk.
随机森林方法不是试图分析所有数据,而是抓住一块进行分析。
It’s also good with higher dimensional data sets (lots and lots of attributes).
它对于更高维度的数据集(有很多很多属性)应用也很好。
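A common formulation of the random forest idea, which may differ in detail from the description above, trains each tree on a random resample of the rows and a random subset of the attributes, then lets the trees vote. The sketch below assumes one-question trees ("stumps") to stay short; the data and function names are invented for illustration.

```python
import random
from collections import Counter

def train_stump(rows, attributes, target):
    """One-question tree: choose the attribute whose yes/no split predicts best."""
    best = None
    for attr in attributes:
        rule = {}
        for value in (True, False):
            labels = [r[target] for r in rows if r[attr] == value]
            # Predict the majority label on each side (default 0 if a side is empty).
            rule[value] = Counter(labels).most_common(1)[0][0] if labels else 0
        correct = sum(rule[r[attr]] == r[target] for r in rows)
        if best is None or correct > best[0]:
            best = (correct, attr, rule)
    return best[1], best[2]

def train_forest(rows, attributes, target, n_trees=25, seed=0):
    """Train each stump on a random resample of rows and a random pair of attributes."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample rows with replacement
        subset = rng.sample(attributes, 2)         # look at only some attributes
        forest.append(train_stump(sample, subset, target))
    return forest

def predict(forest, row):
    """Majority vote over every stump in the forest."""
    votes = Counter(rule[row[attr]] for attr, rule in forest)
    return votes.most_common(1)[0][0]

# Invented prospects: 1 = bought insurance, 0 = did not.
prospects = [
    {"male": True, "garage": True, "over_40": True, "bought": 1},
    {"male": False, "garage": True, "over_40": False, "bought": 1},
    {"male": True, "garage": True, "over_40": False, "bought": 1},
    {"male": True, "garage": False, "over_40": True, "bought": 0},
    {"male": False, "garage": False, "over_40": True, "bought": 0},
    {"male": False, "garage": False, "over_40": False, "bought": 0},
]
forest = train_forest(prospects, ["male", "garage", "over_40"], "bought")
guess = predict(forest, {"male": True, "garage": True, "over_40": False})
```

No single stump sees all the data or all the attributes, which is what keeps each one cheap to build. The vote across many of them is what gives the forest its stability.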
Support Vector Machines
支持向量机
You want to classify people into two groups: those who are most likely to buy and those who are not.
您希望将人员分为两类:最有可能购买的人和不购买的人。
All the evaluation and calculation you can think of on all the data you have says the people you’re looking at are all over the map.
基于所有数据,你能想到的所有评估和计算方法都告诉你,你正在找的人都不在这儿。
They are all equally likely to be on one side of the fence as the other.
这些都可能同样位于标准的一侧或者另一侧。(没办法区分开来)
A support vector machine looks at these people on a 3-D chessboard instead of a flat graph.
支持向量机在三维上分析这些人而不是二维平面中。
Adding the third dimension allows the machine to see that the fence can be built between the two distinct groups if it floats that fence in the air.
添加第三维后,这些人立体的漂浮在空中,这样机器就可以把这些人区分开来。
(See Figure 2.5.)10
(见图2.5。)
A support vector machine looks at your data in four, eight, or a thousand dimensions, thereby finding a way to classify them into groups that just wasn’t possible before.
支持向量机从四维,八维或一千维上查看您的数据,因此可以找到一种方法将它们分类,这在之前是不可能的。
This is very good for high-dimensionality problems.
这对于解决高维度问题非常有用。
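The "float the fence in the air" idea can be shown without a full support vector machine. The sketch below only illustrates the lifting step, under invented data: a single score cannot separate buyers from non-buyers with one cut point, but adding a second coordinate (the square of the score) can. A real support vector machine would also search for the widest possible margin, which this sketch does not attempt.

```python
# Invented data: an engagement score per person and whether they bought.
scores = [-3.0, -2.5, -0.5, 0.0, 0.4, 2.6, 3.1]
bought = [False, False, True, True, True, False, False]

def separable_by_threshold(values, labels):
    """True if a single cut point puts each group entirely on its own side."""
    ordered = [label for _, label in sorted(zip(values, labels))]
    # Separable means the sorted labels change value at most once.
    switches = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return switches <= 1

# One dimension: buyers sit in the middle, so no single fence works.
flat = separable_by_threshold(scores, bought)

# Added dimension: plot each person at (score, score squared). A horizontal
# fence on the new axis now has all buyers below it and all others above.
lifted = separable_by_threshold([s * s for s in scores], bought)
```

Here `flat` comes out false and `lifted` comes out true: the same people, viewed with one extra dimension, fall cleanly on either side of a fence.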
Supervised learning has you teach the machine what you want it to know.
有监督的学习让你教会机器你想要它知道什么。
Unsupervised learning asks the machine to teach you what it discovers.
无监督学习要求机器告诉你它发现了什么。
Figure 2.5 Looking at data in 3-D can make classification easier.
图2.5查看3-D中的数据可以使分类变得更容易。
Unsupervised Learning
无监督学习
The joy of unsupervised learning is the element of surprise.
无监督学习的乐趣是它带来的惊喜。
You don’t ask the machine to solve a specific problem.
您不要求机器解决特定问题。
You merely ask the machine to tell you something you didn’t know.
你只是要求机器告诉你一些你不知道的事情。
What is interesting in this data?
这些数据有趣吗?
In unsupervised learning, you tell the machine to study a gazillion pictures of cats and tell you what it discovers.
在无人监督的学习中,你告诉机器去研究大量的猫图片并让它告诉你它发现了什么。
It might say that cats are usually found on sofas and chairs, and that most cats seem uninterested in the photographer.
它可能会说猫通常会待在沙发和椅子上,大多数猫似乎对拍摄的人不感兴趣。
When worried about customer churn and recovery, a review of your customer data might reveal that you can immediately identify customers who are most likely to defect and never buy from you again.
当你担心客户流失问题时,通过审查您的客户数据可能会发现答案,您可以立即识别最有可能的问题消费者,他们不会从你这儿购买了。
When asked for the attribute that best predicts defection, the machine spits out one word, Obituary.
当被问及哪个属性的流失预测效果最好时,机器吐出一个词,讣告(Obituary)。
This is true, but not useful.
这是真的,但没有用处。
In data science terms, we’re dealing with clustering (what do these individuals have in common?), association (what is generally true about these people?), and anomalies (what stands out?).
用数据科学术语描述,我们正在处理聚类(这些人有什么共同点?),关联(一般怎样描述这些人?)和异常值问题(什么是突出的?)。
Cluster Analysis
聚类分析
Machine learning is great at seeing patterns.
机器学习很擅长找到模式。
Humans evolved to see patterns as well: patterns of leaves (I remember that this plant is edible), patterns in movement (That’s not a dog, it’s a coyote!), and patterns in the weather (Time to find a warm place to hunker down during this blizzard).
人类进化到也能找到模式:叶子的模式(我记得这种植物是可以吃的),运动中的模式(那不是狗,它是一只土狼!),以及天气中的模式(暴风雪来了应该找个温暖的地方躲起来)。
Of course, this takes some training.
当然,这需要一些培训。
Ask a child to put away his clothes and he might put all the blue clothes in one drawer and all the red clothes in another instead of sorting out the shirts from the pants.
你让孩子收好他的衣服,他可能把所有的蓝色衣服放在一个抽屉里,把所有的红色衣服放在另一个抽屉里,而不是将裤子和衬衫分别整理好。
That’s perfectly logical.
这是完全合乎逻辑的。
But humans are also able to find patterns in truly random information.
但人类也能够在随机信息中找到模式。
If you stare at a photo of video static, you will see it move.
如果你盯着静态的视频照片,你会想像出它的移动。
You will start to make out designs.
你将开始自己想象。
You will start to see conspiracy theories.
你开始看到阴谋。
(Oh, look—there’s Jesus on my grilled cheese sandwich!)
(哦,快看我的烤奶酪三明治上有耶稣!)
While humans’ minds can fool them, a machine learning algorithm will only see patterns that actually exist.
虽然人类的想法可以被欺骗,但机器学习算法只会看到实际存在的模式。
Machines do not succumb to apophenia (the human tendency to perceive meaningful patterns within random data).
机器不会屈服于臆想(人类倾向于在随机数据中创造有含义的模式)。
Machines don’t believe in winning streaks or a basketball player’s “hot hand.”
机器不相信连胜或篮球运动员的“好运气”。
To a machine, says Matt Gershoff, “Stars are to data points as galaxies are to clusters.
马特·格什霍夫(Matt Gershoff)说,对于一台机器来说,星星是数据点的话,星系就是聚类。
A cluster analysis might be to ‘find’ the galaxies from the stars.”
聚类分析可能是从星星中“发现”星系。“
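Finding the "galaxies" can be sketched with a tiny version of k-means, one widely used clustering method (the chapter does not say which method Gershoff has in mind). The sketch assumes one-dimensional data, exactly two clusters, and invented search counts.

```python
def two_means(values, iterations=10):
    """Tiny k-means with k=2 on one-dimensional data."""
    centers = [min(values), max(values)]  # deterministic starting guesses
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each point to whichever center is nearer.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Invented data: number of searches per customer in the week before purchase.
searches = [1, 2, 2, 3, 11, 12, 13, 14]
centers, groups = two_means(searches)
```

Nobody told the function that there are "light searchers" and "heavy searchers". It arrives at the two groups, centered near 2 and 12.5, from the data alone.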
A machine learning algorithm can find people who search for “Sony—DSC-W830 20.1-Megapixel Digital Camera—Silver” after having searched for “digital camera,” “digital camera reviews,” and “digital cameras with wifi,” and let you know that they are 50 percent more likely to purchase than people who searched for “digital camera,” “digital camera reviews,” and “digital cameras on sale.”
机器学习算法可以找到搜索一开始搜索“数码相机”,“数码相机评论”和“无线数码相机”后搜索“Sony-DSC-W830 20.1百万像素数码相机 – 银色”的人,并让告诉你,与搜索“数码相机”,“数码相机评论”和“数码相机发售”的人相比,他们购买的可能性要高50%。
A machine can see patterns that a person simply wouldn’t imagine to be worthwhile.
机器可以看到模式-人根本无法想象的有价值的模式。
It may discover that people who had recently visited a pet website are more inclined to buy the camera’s extended warranty.
它可能会发现最近访问宠物网站的人更倾向于给相机的购买延长保修。
There’s no rhyme or reason to that, but it is still actionable from a marketing perspective.
虽然这没有任何规律或理由,但从营销角度来看,这仍然是可行的。
A machine might also discover a pattern of no practical business value whatsoever.
机器也可能发现一种没有实际商业价值的模式。
That pattern might be too infrequent to be actionable, or apply to too few customers.
这种模式可能太罕见而无法采取对应的行动,或适用的客户太少。
It should be possible, however, for the machine to learn to make that distinction over time.
但是,随着时间的推移,机器是可以学习区分出来这些不同的模式的。
Association
关联
In the store, you might want to put items that are routinely purchased together right next to each other.
在商店中,您可能希望将时常一起购买的东西放在一起。
Alternatively, if the association is so overwhelmingly high that you won’t harm sales, you might place them at either end of the store, forcing buyers to travel past other temptations in order to boost opportunistic sales (impulse buying).
或者,如果关联非常强且不会妨碍销售得话,你可以将它们放在商店的两端,迫使买家去买另一种商品时,必须经过整个货架,这可以促进货架中间其他商品的销售(冲动购买)。
Amazon has leveraged this concept to a high degree with their people-who-bought-this-also-bought-that prod.
亚马逊已经在很大程度上利用这个概念“购买这个商品的人也买了其他商品”。
It works very well.
它的效果很好。
Like any other algorithm, it needs to be monitored by humans.
像任何其他算法一样,它也需要受到人类的监控。
It may well be that a certain age-range of people who buy a certain brand of toothpaste pick up hemorrhoid cream as well, but that wouldn’t make for a great promotional message.
有可能的情况是:某个年龄段的人购买某种品牌的牙膏一般也会购买痔疮膏,但这个信息并不适合用在促销上。
Outside of Amazon, you are most likely to see this approach resulting in extremely long cash register receipts.11 (See Figure 2.6.)
在亚马逊之外,您很可能看到这种方法导致的长长的收银机打出的的收据(见图2.6)
Figure 2.6 CVS uses what you buy to discount things you are most likely to buy.
图2.6 CVS基于您购买的商品来给您最有可能购买的商品提供折扣券。
Associations can apply to much more than shopping, such as
关联不仅适用于购物场合,
People who read this article or page also read that one.
例如阅读本文或页面的人也阅读过那篇文章。
People who saw this page, and then that page, purchased more.
看过这个页面,然后看了那个页面的人购买了更多商品。
People who used this mobile app, downloaded that one.
使用这个APP的用户,也下载了那个APP。
People who turned left when entering the store bought more of these.
进入商店时左转的人买了更多商品。
The two key elements to understand about association analytics when working with a data scientist are support and confidence.
在与数据科学家合作时,了解关联分析的两个关键要素是支持度和可信度。
Support refers to the number of times the items have shown up in the shopping basket while confidence is the ratio of times the two associated items have shown up together.
支持度是指项目在购物篮中出现的次数,而可信度是两个相关项目一起出现的次数的比率。
If people bought toothpaste 400 times and dental floss 300 times today, the support number is how many times they showed up together.
如果人们今天买了400次牙膏和300次牙线,支持度是他们一起被购买的次数。
If they show up together 300 times, then the confidence is 3/4 or 75 percent for the association between paste and floss, but 100 percent for the association between floss and paste.
如果它们一起出现300次,则牙膏和牙线之间的关联置信度为3/4或75%,而对于牙线和牙膏的置信度则为为100%。
One must be the antecedent, and the other the consequent:
一个必须在前,另一个必然是结果:
If toothpaste, then floss with an association rule confidence probability of 75 percent; if floss, then toothpaste for sure.
如果基于牙膏,那么关联规则的概率可能会达到75%;如果基于牙线,那么牙膏的概率达到100%。
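The toothpaste and floss arithmetic can be checked with a short Python sketch. It follows the chapter's definitions (support as the count of times seen together); note that many references instead express support as a fraction of all baskets. The function and key names are hypothetical.

```python
def association_stats(count_a, count_b, count_together):
    """Support and confidence as the chapter defines them."""
    return {
        "support": count_together,                      # times seen together
        "confidence_a_to_b": count_together / count_a,  # if A, then B
        "confidence_b_to_a": count_together / count_b,  # if B, then A
    }

# A = toothpaste (400 purchases), B = dental floss (300), together 300 times.
stats = association_stats(count_a=400, count_b=300, count_together=300)
```

The two confidence values differ because the rule has a direction: 300 of the 400 toothpaste purchases included floss (75 percent), while all 300 floss purchases included toothpaste (100 percent).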
An association with very low support might just have happened by chance.
支持度很低的关联可能只是偶然发生的组合。
It’s not statistically significant.
这在统计上并不显著。
The first three times I walked under a ladder something bad happened, so I stopped walking under ladders and told all my friends to avoid them.
我第一次走在梯子下面时不好的事情发生了,所以我不再在梯子下走路,并告诉我所有的朋友要避开它们。
My friend walks under ladders all the time and nothing bad has ever happened to him.
我的朋友一直走在梯子下面,没有发生过任何糟糕的事。
In my case, I have low support and low confidence.
就我而言,我的支持度和置信度都很低。
My buddy has high support (all the time) and high confidence (nothing ever happened).
我的伙伴有很高的支持度(始终)和置信度(没有发生过)。
Clearly, we’d rather bank on his results than mine.
显然,我们宁愿相信他的结果而不是我的。
Anomaly Detection
异常检测
One of these things is not like the others, One of these things just doesn’t belong,
其中一件事与其他事情不同,其中一件事不属于这类,
Can you tell which thing is not like the others By the time I finish my song?
在我唱完这首歌之前你能说出哪些东西不像其他东西吗?
Sesame Street
芝麻街
The same talent for finding patterns is useful for finding outliers.
发现相同模式和发现异常值都很有用。
These are two ends of the same spectrum.
这是一个硬币的两面。
Some things are alike, some things are just plain weird, and the rest fall in the middle somewhere.
有些东西是相似的,有些东西有点奇怪,其余的东西都落在两者中间的某个地方。
What’s the value of an anomaly?
异常的价值是什么?
It’s essential for fraud detection.
这对于欺诈检测至关重要。
When you get that phone call from your credit card company asking if you bought a tank of gas in Omaha and a flat-screen television in Dallas, and you live in Atlanta, that’s your friend, the anomaly detection system, at work.
比如你接到信用卡公司的电话,询问你是否在奥马哈购买了一罐汽油,并在达拉斯购买了一台电视机,而你住在亚特兰大,这就是你的朋友-异常检测系统正在工作。
Fraud is important in commerce, but we in marketing like to spend most of our time on raising revenue.
欺诈检测在商业中很重要,因为我们花了很多的时间在营销以便增加收入。
A sudden spike in Twitter mentions of your brand, a flood of traffic to your website from a given referring page, or a surge in e-mail subscriptions related to a specific search term are all happy actions that could spell opportunity.
Twitter中的品牌提及突然飙升,来自特定推荐页面的流量激增,或者与特定搜索字词相关的电子邮件订阅数量激增,这些都我们乐意看到的现象,他们代表着商业机会。
The flipside is that you get an anomalous drop in attention, which represents a need for swift intervention to determine if your server is down, your new app crashes, or the FDA just announced that your new product causes cancer.
另一方面,你的品牌关注度出现异常下降,这意味着需要迅速采取措施,确定是不是你的服务器宕机了,你的新APP崩溃了,或者FDA刚刚宣布你的新产品会导致癌症。
Those are obvious and the stuff of standard digital analytics.
这些是显而易见的,也是标准数字分析的内容。
But what if the anomaly detected was much more subtle?
但如果要检测到的异常很不显眼呢?
What if it were based on four or five unrelated events, but still offered an opportunity to build your brand or stave off public embarrassment?
如果异常是四个或五个不相关的事件呢,但它仍然提供了建立您的品牌或避免公众尴尬的机会?
Keeping your eyes open for outliers has always been a competitive edge and now we have a technical edge to help in that regard.
花大力气找到的异常值一直是竞争优势的来源,现在我们在这方面有技术帮手了。
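For the obvious kind of anomaly, such as a spike in mentions, a simple statistical check is often enough. The sketch below flags values that sit far from the average, measured in standard deviations (a z-score); the threshold of 2.0 and the daily counts are assumptions for illustration. The subtle, multi-event anomalies described above would need richer methods than this.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # every value is identical, so nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / spread > threshold]

# Invented daily brand mentions; the seventh day (index 6) is a spike.
daily_mentions = [102, 98, 105, 99, 101, 97, 240, 100]
anomalies = find_anomalies(daily_mentions)
```

The same check catches sudden drops as well as spikes, since it looks at distance from the average in either direction.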
Neural Networks
神经网络
Neural networks are probably the most mentioned type of artificial intelligence because they are based loosely on how the brain works.
神经网络可能是最常提到的人工智能了,因为它基于我们大脑的工作方式。
The association between “brain” and “intelligence” is just too strong to ignore.
“大脑”和“智力”之间的关联很密切,不容忽视。
The human brain makes connections between neurons.
人脑神经元之间联系密切。
The more often the connection is made and/or the stronger the emotion associated with the connection, the stronger that connection and the more likely it is to be triggered again.
连接的频率越高和/或与连接相关联的情绪越强,连接就越强,再次触发的可能性业越大。
A computerized neural network does the same based on math—but without the emotion.
计算机的神经网络做同样的事情,但没有掺杂情感因素。
Each artificial neuron has its own limits on when it passes a signal along.
每个人工神经元在传递信号时都有限制。
If it has high support and high confidence, it sends the message along to the next.
如果它具有高支持度和高置信度,它会将消息发送到下一个消息。
The more often it has high support and high confidence, the more likely it is to pass the signal along.
它拥有高支持度和高置信度的次数越多,传递信号的可能性就越大。
The simplest neuron can take some number of inputs and output a decision.
最简单的神经元可以接收一些输入并输出决策。
Each input is weighted to impact that decision differently.
每个输入加权后,不同程度地影响该决定。
When trying to decide whether to go out to the movies, you’re going to start by considering the cost, the weather, and the effort.
在决定是否去看电影时,您将首先考虑成本,天气和精力这些因素。
These are your inputs.
这些就是你的输入。
These are very important issues as the output from any of them will impact a go/no-go decision.
这些输入都非常重要,因为基于他们的输出会影响我们去或者不去的决定。
If it’s too nice a day to spend indoors or too inclement to go out, you’re not going to the movies.
如果天气好得不该闷在室内,或者天气恶劣得不宜出门,你就不会去看电影了。
But if the weather is “normal,” then that input has no impact on your decision whatsoever.
但如果天气“正常”,则该输入对您的决定没有任何影响。
If it’s the end of the month and you’re feeling broke, it’s a no-go.
如果这是月底,而你钱花光了,那就不去看电影了。
If you’re tired, or sick, or just overwhelmed by inertia, it’s a no-go.
如果你感到疲倦,或生病了,或者只是懒得动,那就不去了。
These inputs are not binary; they operate on a grayscale.
这些输入不是二进制的;它们在灰度空间运行。
Each would be a single neuron at work.
每个因素都由一个神经元单独处理。
Each weighs one consideration and delivers the go/no-go decision based solely on these factors.
每个神经元权衡一项考虑因素,并仅基于这些因素给出去或不去的决定。
Then, the combination of neural decisions must be considered together so the outputs of the first three neurons are passed to the next.
然后,必须综合考虑神经元决策的组合,以便将前三个神经元的输出传递给下一个。
The processing of each neuron depends on the weight of the inputs and the bias of each neuron, both of which are unique for each neuron.
每个神经元的处理取决于输入的权重和每个神经元的偏差,每个神经元的权重和偏差都是唯一的。
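The weighted-input neuron described above can be sketched in a few lines of Python. This is a minimal illustration rather than the book's own model: the input scores, the weights, and the bias value are all made-up assumptions chosen to mirror the movie example.

```python
def neuron(inputs, weights, bias):
    """Return True ("go") when the weighted sum of inputs plus the bias is positive."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return total > 0

# Hypothetical scores between 0 and 1: how affordable the outing is,
# how suitable the weather is, and how much energy you have.
cost, weather, effort = 0.8, 0.5, 0.9

# Assumed weights: this moviegoer cares most about cost, least about weather.
weights = [0.6, 0.1, 0.3]
bias = -0.5  # the neuron's own threshold; a more negative bias makes "go" harder

go = neuron([cost, weather, effort], weights, bias)
stay_home = neuron([0.1, 0.5, 0.1], weights, bias)  # broke and tired
```

Changing the weights or the bias changes the decision for the same inputs, which is why each neuron in a network can behave differently.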
The network kicks out an answer and the machine is either trained by a human or by the result of an action in the real world.
神经网络给出答案,然后由人或者由现实世界中某个动作的结果来训练机器。
That training changes the weights and biases of each neuron until the outputs start improving.
这些训练改变每个神经元的权重和偏差,直到输出结果得到改善。
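One simple way such training can work is the classic perceptron rule: whenever the prediction is wrong, nudge each weight and the bias in the direction that would have reduced the error. The sketch below assumes a tiny, invented history of movie nights; the data, the learning rate, and the function names are illustrative only, and real networks use more elaborate procedures.

```python
def predict(inputs, weights, bias):
    """Single neuron: 1 means go, 0 means no-go."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) + bias > 0 else 0

def train(examples, epochs=20, rate=1):
    """Perceptron rule: adjust weights and bias only when a prediction is wrong."""
    weights, bias = [0, 0, 0], 0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(inputs, weights, bias)  # -1, 0, or +1
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Made-up history: (can afford it, weather is fine, has energy) -> went (1) or not (0).
examples = [
    ((1, 1, 1), 1),
    ((1, 0, 1), 1),
    ((0, 1, 1), 0),
    ((1, 1, 0), 0),
    ((0, 0, 0), 0),
]
weights, bias = train(examples)
```

After training, the neuron reproduces every decision in the history, even though nobody told it which inputs mattered.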
A data scientist would map that out as shown in Figure 2.7.
数据科学家会将其映射出来,如图2.7所示。
The combination of weighted inputs tips the scale one way or the other.
加权输入的组合会使天平倾向一边或另一边。
The weights depend on whether feeling flush is more important to you than crummy weather.
权重取决于对你来说手头宽裕是否比糟糕的天气更重要。
The expectation is that your feelings about these issues might change from moment to moment.
可以预期,你对这些因素的感受可能随时变化。
You might feel broke, but are willing to throw caution to the wind.
你可能觉得手头拮据,但愿意不管不顾地豁出去。
You might hate the cold weather, but be willing to go this time.
你平时可能讨厌寒冷的天气,但这一次愿意去看电影。
Therefore, the process must monitor your feelings and recalculate until you take action and the results of that action can be fed back to the neurons to alter the weighting.
因此,整个计算过程必须监控您的感受并重新计算,直到您采取行动,并且该动作的结果可以反馈给神经元,进而神经元改变权重。
A neural network is considered a learning system when it responds to the response it gets from the environment (your mood in this case).
当神经网络根据从环境中得到的反馈(在本例中是你的情绪)做出调整时,它就被视为一个学习系统。
(See Figure 2.8.)
(见图2.8。)
Things can get more sophisticated quickly when you realize that the output can be more than binary.
当输出结果不是简单的YES或NO时,事情会很快变得更加复杂。
Rather than Go/No-Go, the output could be 65 percent Go.
不是去或不去,输出可能是65%可能性是去。
Add to that the ability to wire these networks together in multiple layers, which is necessary when deciding to go to the movies.
除此之外,还可以把这些网络连接成多层结构,这在决定是否去看电影时是必要的。
Figure 2.7 Going out to the movies (the Cost, Weather, and Effort inputs feed one neuron, which outputs Go / No-Go)
Figure 2.8 A simple neural net (Input 1, Input 2, and Input 3 feed four hidden neurons, which feed one output)
If the weather is good, your bankroll is large, and you’re energized, you have to decide what movie you want to see: layer two.
如果天气好,你的预算充足,而且你精力充沛,你必须决定你想看什么电影:到了第二层。
Maybe you’ve seen too many socially important but somewhat depressing movies lately.
也许你最近看过太多的反映社会现实但有点令人沮丧的电影。
Perhaps you can’t stomach yet another comic book superhero sequel.
也许你不能忍受再看一部超级英雄漫画的电影了。
You’d like nothing better than to go see an uplifting, fun musical.
你最喜欢看一个令人激动的,有趣的音乐片了。
But then you have to ask your date: layer three.
但是你必须问你的约会对象:到了第三层。
Multiple layers of decisions coded into a neural network bring us to deep learning.
神经网络加上多层决策把我们推向了深度学习领域。
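To make the idea of layers and a graded output concrete, here is a small forward pass through a two-layer network. It is a sketch under stated assumptions: every weight and bias is invented, and the sigmoid function is one common choice for squashing a neuron's output into the range 0 to 1 so it can be read as something like "65 percent Go".

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    """One layer: each row of weights plus one bias defines one neuron."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weight_rows, biases)
    ]

# Hypothetical inputs: cost, weather, effort (all scored between 0 and 1).
inputs = [0.8, 0.5, 0.9]

# Layer one: two hidden neurons with made-up weights.
hidden = layer(inputs, [[1.0, 0.2, 0.5], [-0.5, 1.0, 0.3]], [-0.5, 0.0])

# Layer two: one output neuron that combines the hidden neurons.
output = layer(hidden, [[1.5, 0.8]], [-1.0])[0]

print(f"{output:.0%} Go")
```

Stacking more calls to `layer` is all it takes to make the network deeper; the hard part in practice is learning good weights, not wiring the layers.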
Deep Learning
深度学习
Let’s give our machine an e-mail marketing task.
让我们交给机器一个电子邮件营销任务。
If it sends an e-mail with the word offer in the subject line to some recipients and deal in the other, it measures the response and learns that one of them works better and suggests that particular subject line be used for the rest.
如果它在主题行中向某些收件人发送带有“offer”字样的电子邮件并在另一些人发送带有“deal”的邮件,则会衡量回应并得知其中一个主题更好用,并建议其余邮件使用这个特定主题。
This is simple A/B testing.
这是简单的A / B测试。
It’s just playing with the average result.
这样只会得到普通的结果。
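A subject-line test like this one can be scored with a two-proportion z-test, one standard way to check whether the gap between two response rates is larger than chance alone would explain. The send and open counts below are invented for illustration, and the function name `ab_test` is an assumption, not part of any tool mentioned in the text.

```python
import math

def ab_test(sent_a, opened_a, sent_b, opened_b):
    """Compare two response rates with a two-proportion z-test."""
    rate_a, rate_b = opened_a / sent_a, opened_b / sent_b
    pooled = (opened_a + opened_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value under the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value

# Hypothetical campaign: "offer" versus "deal" in the subject line.
rate_offer, rate_deal, z, p_value = ab_test(5000, 600, 5000, 520)
winner = "offer" if rate_offer > rate_deal else "deal"
```

A small p-value suggests the winning subject line is unlikely to be a fluke; a large one means the test has not really separated the two.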
If we give the machine access to additional information about the recipients (age, gender, previous response), the machine can sort through that data and suggest that males between the ages of 18 and 34 are more likely to respond to subject line A. That can be coded in a single-layer, neural decision system.
如果我们让机器了解收件人的其他信息(年龄,性别,之前的回应),则机器可以梳理这些数据,并得出18到34岁的男性更乐意回应主题A的邮件。这可以编码到单层神经决策系统中。
Figure 2.9 Three-layered deep learning network (input layer, hidden layers 1 to 3, output layer)
图2.9 三层深度学习网络
If we add additional information (level of education, propensity to buy dental floss with toothpaste), the machine can take the output of the first layer and feed it into the next.
如果我们加入额外的信息(教育水平,一起购买牙膏和牙线的倾向),机器可以获取第一层的输出并将其加入下一层。
When we give the machine more data (time of day, recency of previous response, postal code) and give it more control over the environment (opening sentence, included photos, length of message), the machine can use multiple neural layers to calculate which combinations of e-mail components might be best for which types of recipients.
当我们给机器更多的数据(一天中的时段,上一次回应距今的时间,邮政编码)并让它对环境有更多的控制权(开头句子,所含照片,邮件长度)时,机器可以使用多个神经层来计算哪些电子邮件组件的组合最适合哪种类型的收件人。
The term deep learning refers to the deeper and deeper layers of neurons you have working on your problem (Figure 2.9).12
深度学习这个术语指的是越来越深层次的去分析你的问题。(图2.9)
This gives rise to dynamic neural networks where the information can flow in a much less controlled manner, allowing the machine to build context and come to conclusions more quickly.
这就涉足到了动态神经网络,信息可以更自由的进行流动,受更少的控制,允许机器构建上下文并且更快地得出结论。
This is where we move into an area of machines that program themselves on the fly.
这就是我们进入一个新的机器学习的领域,迅速的自我编程。
That is, they can dynamically change their “opinion” about relative inputs.
也就是说,他们可以动态地改变他们对相对输入的“意见”。
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton spelled it out in their May 2015 article in Nature, “Deep Learning.”13
Yann LeCun,Yoshua Bengio和Geoffrey Hinton在他们2015年5月的发表在自然杂志的文章“深度学习”中详细阐述了这一点。
Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level.
深度学习是一种多层表示学习方法,由简单的非线性模块组合而成,每个模块把某一层的表示(从原始输入开始)转换为更高层,稍微更抽象的表示。
With the composition of enough such transformations, very complex functions can be learned.
通过多次的这种变换的组合,机器可以学习非常复杂的功能。
For classification tasks, higher layers of representation amplify aspects of the input that are important for discrimination and suppress irrelevant variations.
对于分类任务,较高的表示层会放大输入中对于分类很重要的信息,抑制输入中对于分类无关的信息。
An image, for example, comes in the form of an array of pixel values, and the learned features in the first layer of representation typically represent the presence or absence of edges at particular orientations and locations in the image.
例如,图像以像素值数组的形式输入,第一层表示学到的特征通常表示图像中特定朝向和特定位置上是否存在边缘。
The second layer typically detects motifs by spotting particular arrangements of edges, regardless of small variations in the edge positions (identifying an eyebrow).
第二层通常通过发现边缘的特定排列来检测基本图案(例如识别出眉毛),而不受边缘位置微小变化的影响。
The third layer may assemble motifs into larger combinations that correspond to parts of familiar objects (a whole eye), and subsequent layers would detect objects as combinations of these parts (a face).
第三层可以把基本图案组合成更大的组合,对应于熟悉物体的部件(一只完整的眼睛),后续层再把物体检测为这些部件的组合(一张脸)。
The key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure.
深度学习的关键方面是这些特征层不是由人类工程师设计的:它们是使用通用学习程序从数据中学习的。
Reinforcement Learning
强化学习
Reinforcement learning allows for feedback to the machine so that it can improve its output the next time.
强化学习允许机器接受反馈,以便机器下次可以改善其输出。
Yes, that’s a cat./No, that’s a dog.
是的,那是一只猫。不,那是一只狗。
Yes, you can walk straight./No, you’ve hit a wall.
是的,你可以直着走。不,你已经撞墙了。
Yes, this e-mail subject line worked./No, people did not respond.
是的,这个电子邮件的主题有用。/不,人们对这个主题没有回应。
Reinforcement happens when the machine gets feedback from the outside world—the environment—or from some of its own neurons.
当机器从外部世界 – 环境 – 或从其自身的一些神经元获得反馈时,就会发生强化。
The machine comes up with an opinion about which ad to show or how much budget to spend on specific search keywords and takes action.
机器会对要展示哪个广告或在特定搜索关键字上花费多少提出意见并采取行动。
The response to that action is reinforcement.
对某些行动的回应就是强化。
Data scientists refer to an AI system as an agent that is getting rewarded or penalized.
数据科学家认为这种AI系统是获得奖励或惩罚的代理。
This is different from supervised learning in that the feedback comes from the environment rather than a human supervisor.
它与监督学习的不同之处在于,反馈来自环境而非人类。
The machine is out there on its own, exploring the territory.
机器在那里独自探索。
It knows where you want it to end up, but you are not there to correct its every move.
它知道你想要它最终到达的位置,但你不能每一步都去教它(纠正它)。
This is the machine doing its best to create a mental model of the world, whether that’s dynamic content delivery, customer service, or just selecting the most impactful banner ad, and taking action repeatedly, continuously improving its ability to get to the desired goal.
机器尽最大努力建立关于世界的心智模型,无论是动态内容投放,客户服务,还是只是选择最具影响力的横幅广告,然后反复采取行动,不断提高其达到预期目标的能力。
In “The Future of Machine Intelligence,”14 Risto Miikkulainen puts it this way:
在“机器智能的未来”中,Risto Miikkulainen这样讲述:
Suppose you are driving a car or playing a game:
假设您正在开车或玩游戏:
It’s harder to define the optimal actions, and you don’t receive much feedback.
很难去定义什么是最佳的操作,而且您也收不到太多的反馈。
In other words, you can play the whole game of chess, and by the end, you’ve either won or lost.
换句话说,你可以玩一盘国际象棋游戏,到最后,你要么赢了,要么输了。
You know that if you lost, you probably made some poor choices.
你知道,如果你输了,你可能做出了一些糟糕的选择。
But which?
哪一步呢?
Or, if you won, which were the well-chosen actions?
或者,如果你赢了,哪几步是好棋呢?
This is, in a nutshell, a reinforcement learning problem.
简而言之,这就是强化学习的问题。
Put another way, in this paradigm, you receive feedback periodically.
换句话说,在这个范例中,您会定期收到反馈。
This feedback, furthermore, will only inform you about how well you did without in turn listing the optimal set of steps or actions you took.
此外,此反馈只会告诉您有多好,而不会依次列出您采取的最佳步骤或操作。
Instead, you have to discover those actions through exploration—testing diverse approaches and measuring their performance.
相反,您必须通过探索去发现这些操作-测试各种操作去测量其结果。
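The explore-and-measure loop described in this passage can be illustrated with an epsilon-greedy bandit, a deliberately simple reinforcement learning technique applied here to the banner ad case. Everything specific is an assumption for the sake of the sketch: the three click-through rates, the 10 percent exploration rate, and the fixed random seed.

```python
import random

def epsilon_greedy(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Show ads one at a time, learning click rates only from the feedback."""
    rng = random.Random(seed)
    shows = [0] * len(true_rates)        # how often each ad was shown
    estimates = [0.0] * len(true_rates)  # running estimate of each click rate
    for _ in range(rounds):
        if rng.random() < epsilon:
            ad = rng.randrange(len(true_rates))  # explore: try a random ad
        else:
            ad = max(range(len(true_rates)), key=lambda i: estimates[i])  # exploit
        reward = 1 if rng.random() < true_rates[ad] else 0  # the environment answers
        shows[ad] += 1
        estimates[ad] += (reward - estimates[ad]) / shows[ad]  # incremental mean
    return shows, estimates

# Hypothetical click-through rates the agent never sees directly.
shows, estimates = epsilon_greedy([0.02, 0.05, 0.30])
best_ad = max(range(len(estimates)), key=lambda i: estimates[i])
```

The agent is never told which ad is best. It is only rewarded or not after each showing, and over many rounds it ends up showing the strongest ad most of the time.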
MAKE UP YOUR MIND
做决定
In considering all of the above (Do you need an F1 or a rally car?), Microsoft offers another set of deliberation factors in its paper, “How to Choose Algorithms for Microsoft Azure Machine Learning.”15 Your considerations are accuracy, training time, linearity, and parameters.
在考虑上述所有问题时(您需要F1还是拉力赛车?),Microsoft在其论文“如何为Microsoft Azure机器学习选择算法”中提供了另一组决定因素.您的考虑因素是准确性,训练时间,线性和参数。
With these four criteria, the article ranks 25 different algorithms for evaluation.
根据这四个标准,本文对25种不同的评估算法进行了排序。
The answer to the question “What Machine Learning algorithm should I use?” is always “It depends.”
“我应该使用什么机器学习算法?”的答案始终是“它取决于”。
It depends on the size, quality, and nature of the data.
这取决于数据的大小,质量和性质。
It depends on what you want to do with the answer.
这取决于你想对答案做什么。
It depends on how the math of the algorithm was translated into instructions for the computer you are using.
这取决于算法的数学如何转换为您正在使用的计算机的指令。
And it depends on how much time you have.
这取决于你有多少时间。
Even the most experienced data scientists can’t tell which algorithm will perform best before trying them.
在没有试用之前,即使是最有经验的数据科学家也无法确定哪种算法的表现最佳。
How important is accuracy?
准确性有多重要?
In a self-driving car, it might not be important to tell a cat from a dog or to determine if the car in front of you is 28.4346 feet away and is going 5.827 miles per hour faster than you.
在自动驾驶汽车中,区分一条狗和一只猫可能并不重要,确定你前面的汽车是否在28.4346英尺之外并且比你每小时快了5.827英里可能也并不重要。
Maybe 28.5 feet and 6 miles per hour is good enough.
也许28.5英尺和每小时6英里足够了。
If that’s the case, then you should be willing to trade a little accuracy for processing time.
如果是这种情况的话,那么您应该愿意牺牲一点准确性来换取更短的处理时间。
The amount of data you’re working with also impacts learning time.
您正在使用的数据量也会影响学习时间。
With lots of data you may have to sacrifice accuracy if you need the machine to learn fast.
如果数据很多,你为了让机器快速学习,可能需要损失一些准确度。
Linearity is the ability for the results to line up in a straight line on a graph.
线性是指结果在图表上以直线排列的特性。
In the same neighborhood, the more rooms a house has, the more it will cost.
在同一个社区,房子的房间越多,成本就越高。
The more you smoke, the more likely you are to get cancer.
吸烟越多,患癌症的可能性就越大。
When it comes to marketing, you can count yourself lucky when you find a situation that is linear:
在营销方面,当你发现一个线性的情况时,你可以说是很幸运的了:
The more you advertise, the more you sell.
你广告投的越多,你卖得越多。
This approach tends to be “algorithmically simple and fast to train.”
这种方法往往“算法简单,训练速度快”。
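When a relationship really is linear, fitting it takes very little machinery. The sketch below uses ordinary least squares on invented advertising figures that lie exactly on a straight line; the numbers and variable names are assumptions for illustration, and real data would scatter around the line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

ad_spend = [1, 2, 3, 4, 5]          # hypothetical spend, in thousands of dollars
units_sold = [12, 14, 16, 18, 20]   # made-up sales for each spend level

slope, intercept = fit_line(ad_spend, units_sold)
forecast = slope * 6 + intercept    # predicted sales at a spend of 6
```

Here the slope says how many extra units each additional thousand dollars buys, which is the kind of plain answer a linear model can give.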
How many parameters do you have?
你有几个参数?
These “are the knobs a data scientist gets to turn when setting up an algorithm.”
这些“是数据科学家在设置算法时可以使用的旋钮。”
You may have a million customers or 200 million prospects, but that’s just the instances.
您可能拥有一百万客户或2亿潜在客户,但这些只是实例(记录)的数量。
The parameters are all the attributes about each one in your database and the level of cardinality (how many different options there are per attribute).
参数是数据库中每条记录的所有属性以及基数水平(每个属性有多少个不同的取值)。
You might have a range of 120 possible ages, 43,000 ZIP codes in the United States, and an attribute that shows how much each person likes vanilla ice cream on a scale of one to five.
您可能拥有120种可能的年龄值,43,000个美国邮政编码,以及一个属性来显示每个人喜欢香草冰淇淋的程度,程度的范围值从一到五。
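The difference between instances, attributes, and cardinality is easy to see on a toy table. The four customer records below are invented, and the field names are assumptions chosen to match the examples in the text.

```python
# Each dictionary is one instance (one customer record).
customers = [
    {"age": 34, "zip": "93101", "vanilla": 4},
    {"age": 34, "zip": "10001", "vanilla": 5},
    {"age": 51, "zip": "93101", "vanilla": 4},
    {"age": 27, "zip": "60614", "vanilla": 1},
]

instances = len(customers)
attributes = list(customers[0])

# Cardinality: how many distinct values each attribute actually takes.
cardinality = {
    attr: len({row[attr] for row in customers}) for attr in attributes
}
```

Adding customers grows the number of instances, while adding fields or allowing more distinct values per field grows the space the algorithm has to search.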
“The training time and accuracy of the algorithm can sometimes be quite sensitive to getting just the right settings.
“算法的训练时间和准确性有时对设置是否恰到好处非常敏感。
Typically, algorithms with large numbers of parameters require the most trial and error to find a good combination.”
通常,具有大量参数的算法需要最多的反复试验才能找到好的组合。”
Dataiku (rhymes with haiku) put together an infographic to help determine which algorithm is right for which use (Figure 2.10).16
Dataiku汇总了一个信息图,以帮助确定哪种算法适合哪种用途(图2.10)。
ONE ALGORITHM TO RULE THEM ALL?
一个算法来适用所有的情形?
How about using them all?
使用所有的算法?
If the data scientist on your team suggests using an ensemble algorithm, nod knowingly.
如果团队中的数据科学家建议使用集成(ensemble)算法,请会意地点点头。
All she means is taking the output from one method and using it as input of another, and then another, until everybody is satisfied that the predictions coming out are valuable.
她的意思只是把一种方法的输出用作另一种方法的输入,然后再传给下一种方法,直到每个人都确信得到的预测是有价值的。
The ensemble approach sets up a working group or a coalition of AI methods to argue with each other as adversarial networks and form a consensus.
集成方法建立了一个由多种AI方法组成的工作组或联盟,让它们像对抗网络一样相互争论并形成共识。
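One common and very simple form of ensemble is majority voting, where several independent models each give an answer and the most popular answer wins. This is a sketch of that one variant, not of the chained approach described above; the three rule-based "models", their thresholds, and the customer fields are all hypothetical stand-ins for trained models.

```python
from collections import Counter

def by_recency(customer):
    return "send" if customer["days_since_open"] <= 30 else "skip"

def by_spend(customer):
    return "send" if customer["lifetime_spend"] >= 100 else "skip"

def by_frequency(customer):
    return "send" if customer["orders"] >= 3 else "skip"

def ensemble(customer, models):
    """Ask every model for its opinion and return the majority answer."""
    votes = Counter(model(customer) for model in models)
    return votes.most_common(1)[0][0]

customer = {"days_since_open": 12, "lifetime_spend": 40, "orders": 5}
decision = ensemble(customer, [by_recency, by_spend, by_frequency])
```

Using an odd number of voters with two possible answers avoids ties, and a single model having a bad day gets outvoted by the others.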
But shouldn’t there be a tried-and-true, tested, trusted, peer-reviewed combination of methods that is the agreed-upon solution to all of our calculation problems?
但是,难道不应该有一种久经考验,值得信赖,经过同行评审的方法组合,作为我们所有计算问题的公认解决方案吗?
If only that were the case.
如果真的有这样的方法就好了。
Figure 2.10 Dataiku offers advice on how to fit the algorithm to the task.
In his book, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World,17 Pedro Domingos spells out “The Five Tribes of Machine Learning”: symbolists, connectionists, evolutionaries, Bayesians, and analogizers.
在他的著作《主算法:对终极学习机器的追求将如何重塑我们的世界》中,Pedro Domingos阐述了“机器学习的五个流派”:符号主义者,联结主义者,进化主义者,贝叶斯主义者和类比主义者。
Symbolists lean on inverse deduction, starting with a set of premises and conclusions to work backward to fill in the gaps by manipulating symbols.
符号主义者依靠逆推导,从一组前提和结论开始,通过操纵符号逆向工作以填补现有知识的空白。
“Their master algorithm is inverse deduction, which figures out what knowledge is missing in order to make a deduction go through, and then makes it as general as possible.”
“他们的主算法是逆推导,它找出缺少哪些知识以便进行推论,然后使其尽可能通用。”
There are the connectionists who work on mimicking the brain, “by adjusting the strengths of connections between neurons.”
联结主义者通过调整神经元之间连接的强度来模仿大脑。
This is “back propagation, which compares a system’s output with the desired one and then successively changes the connections in layer after layer of neurons so as to bring the output closer to what it should be.”
这就是“反向传播,它对系统输出的结果与理想结果进行比较,然后逐层改变一层又一层神经元的连接,以使输出更接近理想的结果。”
That is deep learning.
这也被称为深度学习。
Evolutionaries rely on the idea of genomes and DNA.
进化主义依赖于基因组和DNA的概念。
“The key problem that evolutionaries solve is learning structure: not just adjusting parameters, like back propagation does, but creating the brain that those adjustments can then fine-tune.”
“进化主义者解决的关键问题是学习结构:不仅仅是调整参数,就像反向传播做的一样,而是创造大脑去进行这些微调。”
This approach tries to get the machine to “mate” and “evolve” programs like living things.
这种方法试图使机器变成类似于“生物”那样进行进化的程序。
That makes them adaptable as they can adjust to the unknown.
这使得它们具有适应性,因为它们可以适应未知的情况。
Bayesians focus on uncertainty.
贝叶斯主义专注于不确定性。
“The problem then becomes how to deal with noisy, incomplete, and even contradictory information without falling apart.”
“问题就变成了如何处理嘈杂,不完整甚至相互矛盾的信息,而不会分崩离析。”
Using “probabilistic inference,” found in Bayes’ Theorem, “tells us how to incorporate new evidence into our beliefs.”
使用贝叶斯定理中的“概率推理”,“告诉我们如何将新证据纳入我们已有的看法中。”
This group calculates probabilities, taking flawed results into consideration, and then allows for actual results to feed back into the calculation.
这个流派在计算概率时会把有缺陷的结果考虑在内,然后允许实际结果反馈到计算中。
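Bayes' Theorem itself is short enough to write as one function. The sketch below updates a belief that a prospect will buy after observing that they clicked an e-mail; the prior and the two click probabilities are invented numbers used only to show the mechanics.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) using Bayes' Theorem."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))

# Assumed starting belief: 10% of prospects buy.
prior = 0.10

# Assumed evidence model: buyers click 60% of the time, non-buyers 20%.
after_one_click = bayes_update(prior, 0.60, 0.20)
after_two_clicks = bayes_update(after_one_click, 0.60, 0.20)
```

Each new piece of evidence turns the previous answer into the next starting belief, which is how actual results feed back into the calculation.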
Analogizers look at the similarities between situations to infer other parallels.
类比主义者会查看情境之间的相似之处,以推断其他相似之处。
“If two patients have similar symptoms, perhaps they have the same disease.”
“如果两名患者有类似的症状,也许他们患有同样的疾病。”
This group uses support vector machines, “which figure out which experience to remember and how to combine them to make new predictions.”
这个派别使用支持向量机算法,“它确定了要记住哪些经验以及如何将它们结合起来进行新的预测。”
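The reasoning by similarity described here can be shown with a nearest-neighbor lookup, which is a simpler relative of the support vector machines Domingos mentions and is swapped in purely because it fits in a few lines. The patient records, the three symptom measurements, and the labels are all made up.

```python
import math

def nearest_neighbor(query, labeled_cases):
    """Return the label of the known case most similar to the query."""
    closest = min(labeled_cases, key=lambda case: math.dist(query, case[0]))
    return closest[1]

# Hypothetical past patients: (temperature in Celsius, cough, sneezing) -> diagnosis.
cases = [
    ((38.9, 1, 0), "flu"),
    ((36.8, 0, 1), "allergy"),
    ((39.2, 1, 1), "flu"),
]

diagnosis = nearest_neighbor((39.0, 1, 0), cases)
```

The method remembers past cases and judges a new one by its closest match, which is the analogizer idea in its most basic form.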
How do all of these fit together in a master algorithm?
所有这些流派应该如何在主算法中组合在一起?
Each tribe’s solution to its central problem is a brilliant, hard-won advance.
每个流派对其核心问题的解决方案都是一项出色的,来之不易的进步。
But the true master algorithm must solve all five problems, not just one.
但真正的主算法必须解决所有五个问题,而不仅仅是一个问题。
For example, to cure cancer we need to understand the metabolic networks in the cell: which genes regulate which others, which chemical reactions the resulting proteins control, and how adding a new molecule to the mix would affect the network.
例如,为了治愈癌症,我们需要了解细胞中的代谢网络:哪些基因调节哪些蛋白质,这些蛋白质控制哪些化学反应,以及在混合物中添加新分子会如何影响网络结构。
It would be silly to try and learn all of this from scratch, ignoring all the knowledge that biologists have painstakingly accumulated over the decades.
试图从头开始学习这一切,而忽略生物学家几十年来辛苦积累的所有知识,是愚蠢的。
Symbolists know how to combine the knowledge with data from DNA sequence, gene expression microarrays, and so on, to produce results that you couldn’t get with either alone.
符号主义者知道如何把这些知识与来自DNA序列,基因表达芯片等的数据结合起来,得到单靠其中任何一方都无法获得的结果。
But the knowledge we obtain by inverse deduction is purely qualitative; we need to learn not just who interacts with whom, but how much, and back propagation can do that.
但我们通过逆推导得到的知识纯粹是定性的;我们不仅需要了解谁与谁相互作用,还需要了解作用的程度,而反向传播可以做到这一点。
Nevertheless, both inverse deduction and back propagation would be lost in space without some basic structure in which to hang the interactions and parameters they find, and genetic programming can discover it.
然而,如果没有一个基本结构来承载它们发现的相互作用和参数,逆推导和反向传播都会迷失方向,而遗传编程可以发现这种结构。
At this point if we had complete knowledge of the metabolism and all the data relevant to a given patient, we could figure out a treatment for her.
如果我们完全了解新陈代谢以及与给定患者相关的所有数据,我们可以为她确定治疗方法。
But in reality the information we have is always very incomplete, and even incorrect in places; we need to make headway despite that, and that’s what probabilistic inference is for.
但实际上,我们所拥有的信息总是非常不完整,甚至在某些地方都是不正确的;尽管如此,我们仍需要继续前行,而这就是概率推论可以做的事情。
In the hardest cases, the patient’s cancer looks very different from previous ones, and all our learned knowledge fails.
在最困难的情况下,患者的癌症与以前的癌症看起来非常不同,并且我们所学的所有知识都用不上。
Similarity-based algorithms can save the day by seeing analogies between superficially very different situations, zeroing in on their essential similarities and ignoring the rest.
基于相似度的算法可以通过在差异很大的情况之间进行类比,关注核心的相似性并忽略其余情况来拯救生命。
Evolutionaries evolve structures; connectionists learn parameters; symbolists compose new elements on the fly; Bayesians weigh the evidence; and analogizers map the outcome to new situations.
进化主义者演化结构;联结主义者学习参数;符号主义者即时组合出新的元素;贝叶斯主义者权衡证据;类比主义者把结果映射到新的情形上。
Domingos offers his theory, but so far, that’s as far as it goes.
多明戈斯提供了他的理论,这就是我们目前所拥有的理论知识。
If you’re a marketer, dedicated to understanding the whole enchilada, working your way through The Master Algorithm is a worthy challenge.
如果您是一名营销人员,并且致力于了解事情的全貌,那么啃完The Master Algorithm是一项值得接受的挑战。
If you’re a data scientist, it’s a wonderfully light read.
如果你是一名数据科学家,这是一个非常轻松的阅读。
Given all of the above, you will fare far better with this technology if you’re comfortable with a little ambiguity.
考虑到上述所有情况,如果您能坦然接受一点模糊性,那么在使用这项技术时您会做得好得多。
ACCEPTING RANDOMNESS
接受随机性
Keeping in mind that “All models are wrong; some models are useful,” also know that randomness is your friend.
请记住“所有模型都是错误的;有些模型很有用”,同时也要知道随机性是你的朋友。
If all models are wrong, then you want to make sure they don’t do something catastrophic.
如果所有模型都错了,那么你要确保他们不会带来一些灾难性的后果。
Colin Fraser nails it.18
Colin Fraser说得一针见血。
[A]ny time you are using a predictive model to make business decisions, you need to understand that the predictions will sometimes be wrong, and you need to understand the different ways that the model could be wrong.
任何时候,您使用预测模型做出业务决策时,您需要知道预测有时会出错,您还需要了解模型可能出错的不同方式。
Maybe the model tends to over or underestimate for certain types of observations.
对于某些类型的观察,模型可能会高估或低估。
Maybe the model is very good at making predictions about one class of observation, but fails miserably for making predictions about some other class.
也许该模型非常擅长对某一类观察进行预测,但对于对其他一些类进行预测却不擅长。
And different types of models will have errors with different characteristics.
并且不同类型的模型有不同特征的错误。
You may have an option between two different models, one that is wrong often but only by a little bit, and one that is usually right but spectacularly wrong when it is wrong.
你可能需要在两个不同的模型之间做一个选择,一个经常是会出错,但是错的程度很低;另一个一般是正确的,但是如果出错,错误会非常大。
Some models provide the opportunity to tune parameters in order to favor one type of wrong to another.
有些模型提供了调整参数的机会,以便偏向于出这种类型的错误,不太容易出另一种错误。
Again, many of these types of hypotheses about model error can be tested prior to actually deploying the model by making sure to use a test set or some other method of validation.
同样,在实际部署模型之前,可以通过使用测试集或其他验证方法来检验模型错误类型的假设。
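Favoring one type of wrong over another often comes down to moving a decision threshold and counting the two kinds of error on a held-out test set. The scores and outcomes below are invented, and `count_errors` is a hypothetical helper written for this sketch.

```python
def count_errors(scores, actual, threshold):
    """Count false positives and false negatives at a given score threshold."""
    false_pos = sum(1 for s, a in zip(scores, actual) if s >= threshold and not a)
    false_neg = sum(1 for s, a in zip(scores, actual) if s < threshold and a)
    return false_pos, false_neg

# Hypothetical test set: the model's score that a customer will respond,
# and whether the customer actually did.
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
actual = [True, True, False, True, False, True, False, False]

strict = count_errors(scores, actual, threshold=0.7)    # contact only sure bets
lenient = count_errors(scores, actual, threshold=0.25)  # contact almost everyone
```

The strict setting never bothers an uninterested customer but misses some who would have responded; the lenient setting does the opposite. Which is better depends on what each kind of mistake costs.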
What happens when the model is wrong?
当模型出错时会发生什么?
Do we piss someone off?
我们惹某些人生气了吗?
Do we miss an opportunity?
我们错过机会吗?
Does someone die?
有人死了吗?
The costs of being wrong vary wildly from project to project, and as a manager of a project involving data science, it is your job to understand what those costs really are.
错误的成本因项目而异,并且作为数据科学的项目相关的经理,您应该了解这些错误的成本究竟有多大。
With an understanding of those costs, you are equipped to work collaboratively with a data scientist to tune the model to be wrong in all the right ways.
了解这些成本后,您就能够与数据科学家协作调整模型,让它以恰当的方式出错。
The assumption is that once you get a machine to Do the Right Thing once, you can set it and forget it, right?
假设一旦你让一台机器做了一次正确的事,你就会保持设置并忘记了它,对吗?
Once it’s trained, it’ll just get smarter.
一旦机器受过训练,它就会变得更聪明。
That’s not so.
事实并非如此。
There are a lot of variables at work and it’s best to understand them well enough to have a healthy respect for varied outcomes.
工作中有很多变量,最好能够很好地理解模型,以便对各种结果有一个慎重的尊重。
If a machine learning system is fired up and trained using different data from the same data set, it’s going to come to different conclusions.
如果机器学习系统是使用来自同一数据集的不同数据进行整理和训练的,那么它将得到不同的结论。
How different is that?
那到底有多大的不同?
Different enough that they’ve named this effect model variance.
很不同,以至于他们已经将这种不同命名为模型方差。
Embrace this expectation and make sure the model is intentionally using randomized data.
接受这种不同并确保模型有意的使用随机数据。
The sequence that data elements are given to a neural network will have a big impact on the outcome.
将数据元素提供给神经网络的顺序将对结果产生重大的影响。
Best practice says that you should embrace randomness here and indiscriminately reorder those records to keep the machine on its toes.
最佳的实践表明,你应该在这里纳入随机性并不加区分地重新排序这些记录数据,让机器保持警觉。
The question is one of variance toleration.
这里的问题是方差容忍度。
If multiple iterations of your model disagree by more than you can tolerate, send it back to the lab.
如果模型多次运行的结果之间的分歧超出了你的容忍范围,请把它送回实验室。
There, the data scientists can run a larger number of iterations and statistically cross-validate it to produce a high-confidence result.
数据科学家可以大量的迭代计算并在统计上交叉验证,进而产生高置信水平的结果。
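Model variance can be measured directly by training the same model many times on differently shuffled samples and looking at how much the answers spread. In the sketch below the "model" is only a stand-in that estimates an average, and the data set, sample size, number of runs, and tolerance are all assumptions chosen for illustration.

```python
import random
import statistics

def fit_on_sample(data, sample_size, seed):
    """Stand-in for training: shuffle the records, then 'learn' from a subset."""
    rng = random.Random(seed)
    rows = data[:]
    rng.shuffle(rows)  # randomize record order before sampling
    return statistics.mean(rows[:sample_size])

# Hypothetical data set: 100 order values.
data = [float(v) for v in range(1, 101)]

# Train 30 times, each on a different random sample from the same data.
estimates = [fit_on_sample(data, sample_size=20, seed=s) for s in range(30)]

spread = statistics.stdev(estimates)  # how much the runs disagree
tolerance = 10.0                      # assumed business tolerance
acceptable = spread <= tolerance
```

If the spread is larger than the business can live with, that is the signal to send the model back for more data, more iterations, or cross-validation.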
WHICH TECH IS BEST?
哪种模型最好?
All of them are.
他们都是。
In “The Future of Machine Intelligence,”19 editor David Beyer asks Gurjeet Singh if there is a single view that’s analytically superior toward the goal of understanding any particular problem.
在“机器智能的未来”中,编辑David Beyer问Gurjeet Singh:要理解某个特定问题,是否存在一种在分析上更优越的单一视角?
Singh:
Singh:
You don’t necessarily want a single view.
您没必要盯着某一个模型。
There is no single right answer, because different combinations of these algorithms will produce different types of insights in your data.
这儿没有一个正确的答案,因为这些算法的不同组合将基于您的数据产生不同类型的想法。
They are all equally valid if you can prove their statistical validity.
只要你能证明它们在统计上有效,它们就都同样有效。
You don’t want to somehow confine yourself to a single right answer.
你不要想着某种程度上将自己局限在某一个正确的答案。
You want to pull out statistically significant ideas from all of these.
你要从所有这些视角中提取统计上显著的见解。
Beyer:
Beyer:
Do the insights or results across different views of the same data ever contradict each other?
相同数据在不同模型中的见解或结果是否会相互矛盾?
Singh:
Singh:
In fact, one of the things that’s beneficial in our approach is that these algorithms are correlated with each other.
事实上,我们的方法中的一个益处是:这些算法是彼此相关的。
In many cases, you find the evidence for the same phenomena over and over again across multiple maps.
在许多情况下,您可以在多个地图上一次又一次地找到解释相同现象的证据。
Wherever that happens, you can be more confident about what you found.
无论是在哪儿发生,您都会更加确信您所发现的内容。
One of AI’s greatest achievements since IBM’s Watson beating Ken Jennings at Jeopardy was Google’s AlphaGo beating Lee Sedol, a 9-dan Go professional, without handicaps in March 2016.
自从IBM的Watson在Jeopardy中击败Ken Jennings以来,人工智能取得的最大成就之一就是2016年3月Google的AlphaGo在不让子的情况下击败了职业九段围棋棋手Lee Sedol。
How did they do it?
Alphago是如何做到的呢?
The paper published in Nature magazine (“Mastering the Game of Go with Deep Neural Networks and Tree Search”20) said that the research used “Monte-Carlo tree search with deep neural networks that have been trained by supervised learning, from human expert games, and by reinforcement learning from games of self-play.”
发表在“自然”杂志上的论文(“利用深度神经网络和树搜索掌握围棋”)表示,该研究使用了“蒙特卡罗树搜索与深度神经网络相结合的方法,这些网络通过对人类高手棋局的监督学习以及自我对弈的强化学习训练而成。”
Now you know.
现在你知道答案了。
FOR THE MORE STATISTICALLY MINDED
想了解更多的统计知识
Not comfortable with your knowledge of statistics jargon?
你对自己的统计知识不太满意?
No worries; Himanshu Sharma is here to help with this post you can return to over and over again for a refresher.
别担心;Himanshu Sharma的这篇文章可以帮到你,你可以一遍又一遍地回来复习。
It’s called “Bare Minimum Statistics for Web Analytics.”21
它被称为“网络分析的最低统计知识”。
Sharma does an admirable job spelling out:
Sharma做了令人钦佩的工作:
What Is Statistical Inference?
什么是统计推断?
It is the process of drawing (a) conclusion from the data which is subject to random variation.
它是从随机变化的数据中得出结论的过程。
Observational error is an example of statistical inference.
观测误差是统计推断的一个例子。
In order to minimize observational error, we need to segment the ecommerce conversion rate into visits and transactions.
为了最大限度地减少观测误差,我们需要将电子商务转化率这个指标细分为访问次数和交易次数。
What Is a Sample?
什么是样本?
A sample is that subset of population which represents the entire population.
样本是可以代表整体人口的子集。
So analysing the sample should produce similar results as analysing all of the population.
因此,分析样本应该产生与分析所有人口得到相似的结果。
Sampling is carried out to analyse large data sets in a reasonable amount of time and in a cost efficient manner.
进行抽样(抽取样本)为在合理的时间以合理的成本来分析大量数据。
What Is Statistical Significance?
什么是统计显著性?
Statistical significance means statistically meaningful.
统计显著性意味着在统计上是有意义的。
Statistically significant result: a result which is unlikely to have occurred by chance.
统计上显著的结果:不太可能是偶然发生的结果。
Statistically insignificant result: a result which is likely to have occurred by chance.
统计上不显著的结果:很可能是偶然发生的结果。
What Is Noise?
什么是噪声?
Noise is the amount of unexplained variation/randomness in a sample.
噪声是样本中无法解释的变化/随机性。
Confidence (or Statistical Confidence) is the confidence that the result has not occurred by a chance.
置信度(或统计置信度)是结果不是偶然发生的置信水平。
What Is a Null Hypothesis?
什么是零假设?
According to null hypothesis, any kind of difference you see in a data set is due to chance and not due to a particular relationship.
根据零假设,您在数据集中看到的任何差异都是偶然的,而不是由于某种特定的关系。
Null hypothesis can never be proven.
零假设是无法被证明的。
A statistical test can only reject a null hypothesis or fail to reject a null hypothesis.
统计检验只能拒绝零假设或不能拒绝零假设。
It cannot prove a null hypothesis.
它不能去证明零假设。
What Is an Alternative Hypothesis?
什么是备择假设?
An alternative hypothesis is the opposite of the null hypothesis.
备择假设与零假设是对立的。
According to alternative hypothesis, any kind of difference you see in a data set is due to a particular relationship and not due to chance.
根据备择假设,您在数据集中看到的任何差异都是由于特定关系而产生的而非偶然产生的。
In statistics the only way to prove your hypothesis is to reject the null hypothesis.
在统计学中,证明你的假设的唯一方法是拒绝零假设。
You don’t prove the alternative hypothesis to support your hypothesis.
您不能去证明备择假设去证明你的假设。
Remember your hypothesis needs to (be) based on qualitative data and not on personal opinion.
请记住,您的假设需要(基于)定性数据,而不是个人意见。
What Is a False Positive?
什么是误报?
False positive is a positive test result which is more likely to be false than true.
误报是一个更可能是负的结果却被检测为正。
For example, an A/B test which shows that one variation is better than the other when it is not really the case.
例如,一个A/B测试显示某个版本优于另一个版本,但实际情况并非如此。
What Is a Type I Error?
什么是I类错误?
Type I error is the incorrect rejection of a true null hypothesis.
I类错误是对零假设的错误拒绝。
It represents a false positive error.
它代表了一个误报类型的错误。
What Is a Type II Error?
什么是II类错误?
Type II error is the failure to reject a false null hypothesis.
II类错误是未能拒绝假的零假设。
It represents a false negative error.
它代表漏报(假阴性)错误。
All statistical tests have a probability of making type I and type II errors.
所有统计检验都有可能发生I类和II类错误。
What Is a Correlation?
什么是相关?
Correlation is a statistical measurement of relationship between two variables.
相关性是两个变量之间关系的统计上的衡量。
Let us suppose “A” and “B” are two variables.
让我们假设“A”和“B”是两个变量。
If as A goes up, B goes up, then A and B are positively correlated.
如果A上升,B也上升,则A和B是正相关的。
However, if as A goes up, B goes down, then A and B are negatively correlated.
但是,如果A上升,B下降,则A和B是负相关的。
What Is Causation?
什么是因果关系?
Causation is the theory that something happened as a result.
因果关系是一种结果理论,即某件事引起了一个结果。
For example, fall in temperature increased the sale of hot drinks.
例如,气温下降增加了热饮的销售。
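Correlation, as defined in the glossary above, can be computed with the Pearson coefficient, which runs from -1 (perfectly negative) through 0 (no linear relationship) to +1 (perfectly positive). The temperature and hot-drink figures below are invented to echo Sharma's example and happen to lie on an exact straight line.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)

temperature = [30, 25, 20, 15, 10]   # hypothetical daily highs, Celsius
hot_drinks = [20, 35, 50, 65, 80]    # made-up cups sold on those days

r = pearson(temperature, hot_drinks)
```

A strongly negative value here says the two move in opposite directions. It does not by itself establish that the cold caused the sales, which is the distinction between correlation and causation.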
Sharma does a superb job relating these terms to web analytics, along with practical examples and detailed descriptions.
Sharma列出了网络分析相关的术语,实操案例以及一些细节描述,这个工作很出色。
This is worth bookmarking.
这篇文章值得收藏。
■■■
Vincent Granville does an equally wonderful job going a bit more technical and a lot more mathematical in his post at Data Science Central where he spells out “24 Uses of Statistical Modeling.”22 Here are a handful of those two-dozen uses.
Vincent Granville的工作也同样出色,他在Data Science Central上发表的文章技术性更强一些,数学内容也多得多,其中列出了“统计建模的24种用途”。以下是这二十多种用途中的几种。
Spatial Models
空间模型
Spatial dependency is the co-variation of properties within geographic space: characteristics at proximal locations appear to be correlated, either positively or negatively.
空间依赖性指的是一个地理空间内的属性的共同变化:位置相近的特征似乎也是相关的,要么正相关,要么负相关。
Spatial dependency leads to the spatial auto-correlation problem in statistics since, like temporal auto-correlation, this violates standard statistical techniques that assume independence among observations.
空间依赖性带来了统计中的空间自相关问题,因为与时间自相关一样,它违反了假设观测值之间相互独立的标准统计方法。
Time Series
时间序列
Methods for time series analyses may be divided into two classes: frequency-domain methods and time-domain methods.
时间序列分析的方法可以分为两类:频域方法和时域方法。
The former include spectral analysis and recently wavelet analysis; the latter include auto-correlation and cross-correlation analysis.
前者包括频谱分析和近年来的小波分析;后者包括自相关和互相关分析。
In the time domain, correlation analyses can be made in a filter-like manner using scaled correlation, thereby mitigating the need to operate in the frequency domain.
在时域中,可以使用缩放的相关以类似滤波器的方式进行相关性分析,从而减轻在频域中操作的需要。
Additionally, time series analysis techniques may be divided into parametric and non-parametric methods.
另外,时间序列分析技术可以分为参数和非参数方法。
The parametric approaches assume that the underlying stationary stochastic process has a certain structure which can be described using a small number of parameters (for example, using an autoregressive or
参数方法假设平稳随机过程具有某种结构,该结构可以使用少量参数来描述的(例如,使用自回归 或者
moving average model).
移动平均模型)
In these approaches, the task is to estimate the parameters of the model that describes the stochastic process.
在这些方法中,任务是估计出描述平稳随机过程的模型参数。
By contrast,
对比之下,
non-parametric approaches explicitly estimate the covariance or the spectrum of the process without assuming that the process has any particular structure.
非参数方法明确地估计过程的协方差或频谱,而不必假设过程具有任何特定结构。
Methods of time series analysis may also be divided into linear and non-linear, and univariate and multivariate.
时间序列分析的方法也可以分为线性和非线性分析,以及单变量和多变量分析。
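As an illustration of a time-domain method, the sketch below computes the sample auto-correlation of a series at a given lag. The daily figures are hypothetical, and the estimator shown is one standard textbook form, not necessarily the one the quoted post has in mind.

```python
def autocorrelation(series, lag):
    """Sample auto-correlation of a series with itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    # Total squared deviation from the mean (the lag-0 term).
    denom = sum((x - mean) ** 2 for x in series)
    # Co-deviation between each point and the point `lag` steps later.
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom

# Hypothetical daily visits with a repeating weekly pattern, four weeks long.
week = [10, 12, 14, 20, 25, 30, 8]
visits = week * 4

weekly = autocorrelation(visits, 7)   # strongly positive: the pattern repeats every 7 days
same = autocorrelation(visits, 0)     # always 1.0 by definition
```

A high value at lag 7 is the kind of signal that tells an analyst the data has a weekly cycle worth modeling.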
Market Segmentation
市场细分
Market segmentation, also called customer profiling, is a marketing strategy which involves dividing a broad target market into subsets of consumers, businesses, or countries that have, or are perceived to have, common needs, interests, and priorities, and then designing and implementing strategies to target them.
市场细分,也称为客户分类,是一种营销策略,是将大的目标市场划分成小的子集,划分的标准是基于共同的或者认为有共同需求,利益和优先级的消费者,企业或者国家,然后再设计和实施对应的策略。
Market segmentation strategies are generally used to identify and further define the target customers, and provide supporting data for marketing plan elements such as positioning to achieve certain marketing plan objectives.
市场细分策略通常用于识别和进一步找到目标客户,并为营销计划细节提供支持数据,例如进行品牌定位以实现某些营销计划的目标。
Businesses may develop product differentiation strategies, or an undifferentiated approach, involving specific products or product lines depending on the specific demand and attributes of the target segment.
企业可以根据目标细分市场的特定需求和属性,为特定产品或产品线的制定差异化策略或提供统一的策略。
Recommendation Systems
推荐系统
Recommender systems or recommendation systems (sometimes replacing “system” with a synonym such as platform or engine) are a subclass of information filtering systems that seek to predict the “rating” or “preference” that a user would give to an item.
推荐系统或建议系统(有时用诸如“平台”或“引擎”之类的同义词替换“系统”)是信息过滤系统的子类,这个系统会去预测用户将给予项目的“评级”或“偏好”。
Association Rule Learning
关联规则学习
Association rule learning is a method for discovering interesting relations between variables in large databases.
关联规则学习是一种发现大型数据库中变量之间有趣关系的方法。
For example, the rule { onions, potatoes }
例如,在超市的销售数据中找到的规则
==> { burger } found in the sales data of a
{洋葱,土豆} ==> {汉堡},
supermarket would indicate that if a customer
这意味着一个消费者如果
buys onions and potatoes together, they are likely to also buy hamburger meat.
一起买了洋葱和土豆,他们也有可能会买汉堡。
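The strength of a rule such as { onions, potatoes } ==> { burger } is usually summarized by its support and confidence. The sketch below computes both for a handful of made-up shopping baskets. The baskets and function names are assumptions for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical supermarket baskets.
baskets = [
    {"onions", "potatoes", "burger"},
    {"onions", "potatoes", "burger", "milk"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "bread"},
]

rule_support = support(baskets, {"onions", "potatoes", "burger"})       # 2 of 5 baskets
rule_confidence = confidence(baskets, {"onions", "potatoes"}, {"burger"})  # 2 of the 3 onion-and-potato baskets
```

In practice a rule is only considered interesting when both numbers clear thresholds chosen by the analyst.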
In fraud detection, association rules are used to detect patterns associated with fraud.
在欺诈检测中,关联规则用于检测与欺诈相关的模式。
Linkage analysis is performed to identify additional fraud cases: if a credit card transaction from user A was used to make a fraudulent purchase at store B, by analyzing all transactions from store B, we might find another user C with fraudulent activity.
通过连锁分析来识别其他的欺诈案例:如果用户A的信用卡在商店B进行了欺诈性购买,那么通过分析商店B的所有交易,我们可能找到有欺诈活动的另一个用户C.
Attribution Modeling
归因模型
An attribution model is the rule, or set of rules, that determines how credit for sales and conversions is assigned to touchpoints in conversion paths.
归因模型是一个规则或一组规则,用于确定如何将对销售和转化的贡献分配给转化路径中的各个接触点。
For example, the Last Interaction model in Google Analytics assigns 100% credit to the final touchpoints (i.e., clicks) that immediately precede sales or conversions.
例如,Google Analytics中的“最后交互”模型给销售或转化之前的最后一次接触点(即购买按钮的点击)分配了100%的贡献成绩。
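The following sketch shows what "assigning credit to touchpoints" can look like in code. It implements the last-interaction rule described above and, purely for contrast, an even-split (linear) rule that the source does not discuss. The conversion paths and revenue figures are hypothetical, and this is not Google Analytics code.

```python
from collections import defaultdict

def last_interaction(paths):
    """Give 100% of each conversion's value to the final touchpoint in its path."""
    credit = defaultdict(float)
    for touchpoints, value in paths:
        credit[touchpoints[-1]] += value
    return dict(credit)

def linear(paths):
    """Split each conversion's value evenly across every touchpoint in its path."""
    credit = defaultdict(float)
    for touchpoints, value in paths:
        share = value / len(touchpoints)
        for touchpoint in touchpoints:
            credit[touchpoint] += share
    return dict(credit)

# Hypothetical conversion paths: (ordered touchpoints, sale value).
paths = [
    (["email", "search", "display"], 90.0),
    (["search"], 50.0),
    (["display", "search"], 60.0),
]

by_last = last_interaction(paths)   # email receives nothing under this rule
by_linear = linear(paths)           # email receives a third of the first sale
```

The same three sales produce very different pictures of which channel "worked", which is why the choice of attribution model matters.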
Macro-economic models use long-term, aggregated historical data to assign, for each sale or conversion, an attribution weight to a number of channels.
宏观经济模型使用长期的,累计的历史数据把每次销售或转化的贡献权重分配给多个渠道。
These models are also used for advertising mix optimization.
这些模型还用于广告渠道的优化。
Clustering
聚类
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
聚类分析或聚类是以某种方式对一组对象进行分组,同一组(或称为簇群)中的对象之间,在某种意义上更加相似,与其他组(簇群)中的对象差异相对较大 。
It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.
它是探索性数据挖掘的主要任务,也是统计分析的常用方法,应用领域广泛,包括机器学习,模式识别,图像分析,信息检索和生物信息学。
Unlike supervised classification, clustering does not use training sets.
与监督分类不同,聚类不使用训练集。
There are, however, some hybrid implementations, called semi-supervised learning.
不过业界有一些混合的模型,称为半监督学习。
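To show grouping without a training set, here is a minimal one-dimensional k-means sketch. K-means is only one of many clustering algorithms and is not named in the source. The customer spend figures and starting centers are assumptions chosen for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small k-means for single numbers: assign each value to its nearest center, then re-center."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Index of the center closest to this value.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it in place if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend per customer; no labels are supplied.
spend = [10, 12, 11, 95, 100, 105]

centers, clusters = kmeans_1d(spend, centers=[0, 50])
```

The algorithm is never told which customers are "low" or "high" spenders. It finds the two groups from the distances between the values alone.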
Churn Analysis
流失分析
Customer churn analysis helps you identify and focus on higher value customers, determine what actions typically precede a lost customer or sale, and better
客户流失分析可帮助您识别并关注高价值客户,确定哪些是客户流失或销售流失的前置指标,
understand what factors influence customer retention.
更好的了解影响客户留存的因素。
Statistical techniques involved include survival analysis as well as Markov chains with four states:
涉及的统计技术包括生存分析以及具有四种状态的马尔可夫链分析:
brand new customer, returning customer, inactive (lost) customer, and re-acquired customer, along with path analysis (including root cause analysis) to understand how customers move from one state to another, to maximize profit.
新客户,回头客,非活动(丢失)客户和重新获得的客户,还有路径分析(包括根本原因分析),以了解客户如何从一个状态转变到另一个状态,以最大限度地提高利润。
Related topics: customer lifetime value, cost of user acquisition, user retention.
相关的主题有:客户终身价值,获客成本,用户留存。
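The four-state Markov chain mentioned above can be sketched as a table of transition probabilities that is applied repeatedly to a customer population. Every probability below is invented for illustration. In real use they would be estimated from customer history.

```python
STATES = ["new", "returning", "inactive", "reacquired"]

# Hypothetical monthly transition probabilities; each row sums to 1.
P = {
    "new":        {"new": 0.0, "returning": 0.6, "inactive": 0.4, "reacquired": 0.0},
    "returning":  {"new": 0.0, "returning": 0.8, "inactive": 0.2, "reacquired": 0.0},
    "inactive":   {"new": 0.0, "returning": 0.0, "inactive": 0.9, "reacquired": 0.1},
    "reacquired": {"new": 0.0, "returning": 0.7, "inactive": 0.3, "reacquired": 0.0},
}

def step(dist):
    """Advance the share of customers in each state by one period."""
    return {s: sum(dist[r] * P[r][s] for r in STATES) for s in STATES}

# Start with a cohort made up entirely of brand-new customers.
cohort = {"new": 1.0, "returning": 0.0, "inactive": 0.0, "reacquired": 0.0}
after_one = step(cohort)
after_two = step(after_one)
```

Running `step` for more periods shows how the cohort drifts toward the inactive state, and changing a single probability (say, a better win-back rate) shows how much a retention program would need to move the numbers.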
Optimum Bidding
最优竞价
This is an example of an automated, black-box, machine-to-machine communication system, sometimes working in real time, via various APIs.
这是自动化的机器和机器之间通信的黑盒子的一个例子,有时通过各种API实时进行。
It is backed by statistical models.
它是有统计模型支持的。
Applications include detecting and purchasing the right keywords at the right price on Google AdWords, based on expected conversion rates for millions of keywords, most of them having no historical data; keywords are categorized using an indexation algorithm and aggregated into buckets (categories) to get some historical data with statistical significance, at the bucket level.
应用的领域包括根据数百万关键字的预期转化率,在Google AdWords上找到并以合适的价格去购买正确的关键词,其中大多数关键词没有历史数据可以参考;通过使用索引算法对关键词进行分类,并关键词汇总到桶(类)中,以便在类的级别去获得具有统计显著性的历史数据。
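The bucketing idea can be sketched as follows: keywords with too few clicks to judge on their own are pooled into categories, and the conversion rate is estimated per category. The keywords, bucket labels, and counts are all hypothetical, and the bucket assignment is written by hand here in place of the indexation algorithm the passage mentions.

```python
from collections import defaultdict

# Hypothetical rows: (keyword, bucket, clicks, conversions).
keyword_stats = [
    ("red running shoes", "shoes", 3, 0),
    ("trail running shoes", "shoes", 5, 1),
    ("buy sneakers online", "shoes", 192, 9),
    ("leather wallet", "accessories", 2, 0),
    ("slim card holder", "accessories", 98, 2),
]

def bucket_rates(rows):
    """Pool clicks and conversions per bucket, then return each bucket's conversion rate."""
    totals = defaultdict(lambda: [0, 0])
    for _keyword, bucket, clicks, conversions in rows:
        totals[bucket][0] += clicks
        totals[bucket][1] += conversions
    return {bucket: conv / clicks for bucket, (clicks, conv) in totals.items()}

rates = bucket_rates(keyword_stats)
```

A keyword with three clicks says almost nothing by itself, but the bucket it belongs to has enough volume to suggest a reasonable starting bid.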
This is a real problem for companies such as Amazon or eBay.
对于像亚马逊或eBay这样的公司来说,这个算法很重要。
Or it could be used as the core algorithm for automated high frequency stock trading.
它也可以作为自动化高频股票交易的核心算法。
Multivariate Testing
多变量分析
Multivariate testing is a technique for testing a hypothesis in which multiple variables are modified.
多变量分析是一种用于测试多个变量变化的效果。
The goal is to determine which combination of variations performs the best out of all of the possible combinations.
目标是确定哪些变量的组合在所有可能的组合中表现最佳。
Websites and mobile apps are made of combinations of changeable elements that are optimized using multivariate testing.
网站和移动APP由多个可变元素的组合组成,这些元素可以使用多变量分析进行优化。
This involves careful design-of-experiment, and the tiny, temporary difference (in yield or web traffic) between two versions of a webpage might not have statistical significance.
这需要仔细的实验设计,因为两个版本的网页之间微小的,临时的差异(转化或网络流量)可能没有统计上的显著性。
While ANOVA and tests of hypotheses are used by industrial or healthcare statisticians for multivariate testing, we have developed systems that are model-free, data-driven, based on data binning and model-free confidence intervals.
虽然方差分析和假设检验被工业或医疗统计学家用于多变量分析,但我们基于数据分箱技术和无模型置信区间开发了无模型,纯数据驱动的方法。
Stopping a multivariate testing experiment (they usually last 14 days for web page optimization) as soon as the winning combination is identified helps save a lot of money.
一旦最优的组合被识别,就停止多变量分析实验(对于网页优化来讲,它们通常持续14天的时间),这有助于节省大量预算。
Note that external events—for instance a holiday or some server outage—can impact the results of multivariate testing, and need to be addressed.
请注意,外部事件(例如假日或服务器中断)可能会影响多变量分析的结果,需要解决这类问题。
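To show how quickly the combinations multiply, the sketch below enumerates every version of a page built from three changeable elements and picks the one with the highest observed conversion rate. The elements and conversion counts are hypothetical, and the sketch deliberately omits the significance testing that the passage warns is needed before declaring a winner.

```python
from itertools import product

# Three changeable page elements, two variations each (all hypothetical).
headlines = ["A", "B"]
buttons = ["green", "red"]
images = ["photo", "drawing"]

# Every possible page version: 2 x 2 x 2 = 8 combinations.
combinations = list(product(headlines, buttons, images))

# Hypothetical conversions out of 1,000 visitors per combination,
# listed in the same order as `combinations`.
conversions = [30, 28, 35, 31, 33, 29, 41, 36]
visitors_per_combination = 1000

rates = {
    combo: conv / visitors_per_combination
    for combo, conv in zip(combinations, conversions)
}
best = max(rates, key=rates.get)
```

Adding a fourth element with three variations would raise the count from 8 to 24 page versions, each needing enough traffic to be judged, which is why careful experiment design matters.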
WHAT DID WE LEARN?
我们学到了什么?
All models are wrong; some models are useful.
所有模型都错了;有些有用。
Machine learning is great with a lot of diverse data, but has a lot to learn.
有很多不同类型的数据时,机器学习很棒,但它还有很多需要学习的地方。
Classification is putting things in buckets while regression smooths them out along a spectrum.
分类是将东西放入类别中,而回归使它们的分布显得平滑。
Supervised machine learning solves a problem with your help while unsupervised machine learning looks for interesting things to show you.
有监督的机器学习在你的帮助下解决问题,而无监督的机器学习则去寻找有趣的事情向您汇报。
Both are happy to be tutored by a human or by a direct response from the environment.
两者都乐于接受人类的指导,或来自环境的直接反馈。
Machine learning is an effort to build a system that automatically improves with experience.
机器学习是寻求建立一个随经验自动改善的系统。
Machine learning is really good at spotting one thing that is not like the others.
机器学习非常善于发现一件事与其他事的不同。
Neural networks are good at weighing lots of factors.
神经网络擅长在许多因素进行权衡。
AI deals in probability rather than accounting.
AI处理的是概率而不是会计。
It’s best to get comfortable with a little fuzziness rather than hope for definitive answers.
所以你最好满足于朦胧的美,而不是去追寻明确的答案。
Next, how do we apply all of this to marketing?
接下来,我们如何将所有这些知识应用于营销呢?
CHAPTER 3