Who Is Nassim Taleb?

Highlights

  • Modern polymath
  • Millionaire investor
  • Bestselling author
  • PhD in risk management
  • NYU professor

Overview

Nassim Taleb is an important person to read and follow right now because he has spent his entire career predicting, preparing for, and helping others prepare for events like the ones we are experiencing.

He started spending 40 hours per week doing deliberate learning across disciplines as a teenager in Lebanon.

I started, around the age of thirteen, to keep a log of my reading hours, shooting for between thirty and sixty a week, a practice I’ve kept up for a long time.—Nassim Taleb

He has kept up this discipline for over forty years…

When I decided to come to the United States, I repeated, around the age of eighteen, the marathon exercise by buying a few hundred books in English (by authors ranging from Trollope to Burke, Macaulay, and Gibbon, with Anaïs Nin and other then fashionable authors de scandale), not showing up to class, and keeping the thirty- to sixty-hour discipline.—Nassim Taleb

In short, Nassim Taleb practiced two of the core habits shared by many of today’s top thinkers and innovators, such as Bill Gates, Warren Buffett, Elon Musk, and Ray Dalio. He followed the 5-Hour Rule and became a Modern Polymath.

While at the Wharton School Of Business, Taleb became obsessed with probability and rare events.

So I went to the bookstore and ordered (there was no Web at the time) almost every book with “probability” or “stochastic” in its title. I read nothing else for a couple of years, no course material, no newspaper, no literature, nothing. I read them in bed, jumping from one book to the next when stuck with something I did not get immediately or felt ever so slightly bored. And I kept ordering those books. I was hungry to go deeper into the problem of small probabilities. It was effortless. That was my best investment—risk turned out to be the topic I know the best.—Nassim Taleb

Over time, he started having unique insights and noticing holes in the models of academics.

[I] smelled some flaws with statistical stuff that the professor could not explain, brushing them away—it was what the professor was brushing away that had to be the meat.—Nassim Taleb

Here's how he describes the big idea that has animated his career...

The best description of my lifelong business in the market is “skewed bets,” that is, I try to benefit from rare events, events that do not tend to repeat themselves frequently, but accordingly, present a large payoff when they occur. I try to make money infrequently, as infrequently as possible, simply because I believe that rare events are not fairly valued, and that the rarer the event, the more undervalued it will be in price.—Nassim Taleb

However, whenever Taleb shared his philosophies with academics, he was shot down…

But I could not articulate my realization clearly, and was getting humiliated by people who started smoking me with complicated math.

Just five years after graduating from Wharton, Taleb was vindicated and set for life. He had gone into finance as an options trader. During the rare event of the 1987 crash, his options paid off, and he made his F*** You Money. About 10 years later, in his late 30s, Taleb decided to make a career change (although he has remained active in the market with his own money)…

After closing about 200,000 option transactions (that is separate option tickets) over 12 years and studying about 70,000 risk management reports, I felt that I needed to sit down and reflect on the thousands of mishedges I had committed. ... I quietly deposited my necktie in the trash can at the corner of Forty-fifth Street and Park Avenue in New York. I decided to take a few years off and locked myself in the attic, trying to express what was coming out of my guts, trying to frame what I called "hidden nonlinearities" and their effects. ... I clambered up to my attic where, during 6 entire months, I spent 14 hours a day, 7 days a week, immersed in probability theory, numerical analysis, and mathematical statistics (at a Ph.D. level ). ... I fondly remember the two harsh New York winters in the near-complete silence of the attic, with the luminous effect of the sun shining on the snow warming up both the room and the project. I thought of nothing else for years. ...—Nassim Taleb

In short, Taleb is a modern polymath, autodidact, PhD, bestselling author, professor, and millionaire. He has combined in one career many feats that would each normally take someone a lifetime. He has developed a unique, consequential, and counterintuitive philosophy, tested it with his own money, and then shared it widely enough to become respected in academic communities, among world leaders, and among the public.

Works

Armed with a successful track record and a deep mathematical understanding of that track record, Taleb started writing a series of bestselling books with millions of raving readers (including Nobel laureates and government leaders). He also became a professor. Together, his books (except Dynamic Hedging) make up the Incerto series.

Understanding The Multi-Disciplinarity Of The Incerto Series

Taleb created the following map to show how all of the topics he writes on are connected:

Source: Nassim Taleb

Key

  • Arrows.
    • “Point out, many disciplines, schools of thought, and, sadly, subdisciplines do not talk to each other. I mean, really, do not have any interest in one another.” –Nassim Taleb
    • “Points out to possible absence (total or partial) of overlap between neighboring units. For instance, some circles only partially intersect with a discipline: take the Skeptical Empirical Tradition to the NorthWest. Montaigne’s works show partial overlap as, being rather independent (financially), he was only tangentially in that tradition; he was also a stoic –and, not seeing some inconsistencies, he was mostly a human. Note here that such independence is largely attributable to the fact that Montaigne was not a scholar, as scholars (that is, scholars on a salary) tend to stay firmly within available disciplines, much like bank tellers repair daily to the same office at a predictable and rarely variable hour. Professional scholars lack the sense of adventure of us humans –to wit the low level of erudition to be found in the Herr Professor Doktors around the world. The reader can see some interdisciplinary activity with contract theory which straddles ethics, law, insurance, and the psychology of uncertainty –but this is rare.” —Nassim Taleb
  • Colors. Describe the disciplines
    • Red = Legal theory
    • Purple = Social science
    • Blue = Philosophy
    • Green = Mathematics
    • Black = Real World (Fat Tonyism)
  • Circles (or squares). Show positioning of the subdiscipline.

Takeaways

  • Siloed Thinking. One interesting observation from Taleb (which you can see in the arrows above) is that many of the fields that address the same underlying ideas do not talk to each other. In many ways, it’s like the parable of the blind men touching different parts of the elephant and proclaiming they are touching different things.
  • “Knowing if a domain is fattailed and the class of risks it entails is vastly more important than producing probabilistic estimates.” —Nassim Taleb

Disciplines

Contract Theory

Now, let us discuss the interaction between disciplines. We start with the least obvious. Who were the most sophisticated with nuances of probability? As the reader can guess from our previous paragraph, not the mathematicians. And most certainly not the social “scientists”. Not quite the philosophers –in the standard sense. Legal scholars, it turned out. A large segment of legal theory is meant to mitigate uncertainty and the effects of contingency to specific and general agents. They had very sophisticated methods and a healthy way of thinking about the problem. For instance, Pierre de Jean Olivi, a scholastic thinker, had an impressively detailed understanding of contingency and risk sharing, and one that is hard to find even in modern times. Risk sharing? Yes, it leads us to contract theory. Why contract theory? Because it entails the understanding of ways to find protection from Black Swans –rather than naive computation of odds that we will get wrong anyway.

Option Theory

My option trading career lasted for more than twenty years (options are often called “derivatives”). My profession, option theory, which consists in designing structures that have a certain payoff under uncertainty, is closer to contract theory than to anything else. How? You don’t understand “tail events?” Don’t fool yourself. Cut the exposure by making sure you have a contract for that. And sophisticated firms knew that it was better to employ (or use) three lawyers for every mathematician –with lawyers you get protection; with mathematicians you tend to blow up.

Explaining The Lack Of Overlap In Fields

The reader will hopefully notice that understanding randomness doesn’t allow one to understand fat tails. We see no overlap between skeptical empiricism and the mathematics of large deviations. Note a technical point that is not developed within the Incerto proper, but in parallel research: even within what is called large deviation theory within mathematics, there is no overlap between that theory and fat tails owing to the so-called Cramér condition. Nor do we see overlap between the “heuristics and biases tradition” in psychology and decision-science pioneered by Kahneman and Tversky and fat tails, which is a tragedy: many pieces of research in fact pathologize people for worrying about fat tails –but, remarkably, the original Kahneman-Tversky works and the successor type of research explain biases making people underestimate tails via underestimation of randomness and overconfidence, which blows up in the tails. Just as economists who, knowing about fat tails without understanding them, make huge inferential errors. Some psychologists, alas, [and nonpsychologists such as the verbose mouth-breathing legal commentator Cass Sunstein] find some of our risk-mitigation methods irrational. They find it irrational that we worry about Ebola more than falls from ladders that have killed many more people than Ebola. But Ebola is multiplicative: it has a very small probability of very large uncontrollable spread in the age of physical connectivity. Falls from ladders are from Mediocristan. They can’t decimate the European population. So it is the psychologists who, using the wrong models, are irrational, not humans. Indeed logic deals with mistaking absence of evidence for evidence of absence –at the heart of Popperian asymmetry. No work has been done to show that, mathematically, the difference between absence of evidence and evidence of absence is greater in Extremistan.
And needless to say that because of their lack of focus on fat tails, psychologists of uncertainty typically confuse the two – and make horrendous analytical mistakes. Also it takes more data under fat tails to see what is going on, which links us to the problem of skepticism. The law of large numbers is a mathematical counterpart to the philosophical problem of induction, but the two are not linked in the tradition. Finally, consider the point belabored in Fooled by Randomness about the link between statistics and skeptical philosophy. Unlike what proponents of “big data” want you to think, statistics is there to provide a rational mechanism to eliminate certainties –and avoid being fooled by randomness by believing in chance associations and spurious links. It is first and last the application of skeptical empiricism to mundane affairs –and consequential ones. I remember one day, close to fifteen years ago, having a discussion with a professor of the clueless variety cum a lot of publications on risk methods and many decorations. He had written a lot of papers on Monte Carlo studies, that is, random simulations. It turned out that he had never considered that “variance” for a probability distribution (or the degree of variability) maps to degree of ignorance about outcomes and had an immediate epistemic mapping to real life. And of course to philosophy. He even found the connection strange. I was altogether shocked, depressed, and excited at discovering that someone could spend a lifetime doing something without connecting the dots. This is naturally the result of writing about a subject without skin in the game, which enforces contact with reality –making decisions allows us to see what connects to what. The following fifteen years confirmed to me that such a mental hurdle was the norm in academia because of the way the system was constructed.
So, in the end, the reader can see that the history of ideas of probability, risk and decisions has been broken up into unconnected small petty pursuits that it is high time to merge in one manner or another. The beauty of mathematical sciences (which include logic and most philosophy) is that no matter where you start you end up with the exact same result for the same problems and the same assumptions. The same cannot be said about social “science” and similar disciplines that depend on the use of words and can be fooled by them –for instance “risk” in social science mixes ruin probability and potential profits coming from variability –“not the same ting,” as Fat Tony would say.

Nassim Taleb’s Key Terms

Law Of Large Numbers

The law of large numbers describes how the sample mean stabilizes as observations accumulate; this stabilization is much slower in Extremistan.
Increasing the sample size reduces the variance of the estimate.
The law of large numbers, when it works, works too slowly in the real world. This is more shocking than you think as it cancels most statistical estimates.
Let us now discuss the law of large numbers, which is the basis of much of statistics. The law of large numbers tells us that as we add observations the mean becomes more stable, the rate being around the square root of n. It takes many more observations under a fat-tailed distribution (on the right hand side) for the mean to stabilize. One of the best-known statistical phenomena is Pareto’s 80/20. Table 3.1 shows that while it takes 30 observations in the Gaussian to stabilize the mean up to a given level, it takes 10^11 observations in the Pareto to bring the sample error down by the same amount (assuming the mean exists).
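
The slow convergence described above can be checked with a small simulation. The sketch below is only an illustration, not Taleb's own computation: the tail index of 1.16 (a common stand-in for the 80/20 rule), the sample sizes, the number of trials, and the use of a median error are all assumptions chosen for the example.

```python
import random
import statistics

ALPHA = 1.16  # assumed tail index, roughly matching an "80/20" Pareto
PARETO_MEAN = ALPHA / (ALPHA - 1)  # true mean of a Pareto with minimum value 1


def median_abs_error(draw, true_mean, n, trials, rng):
    """Median, over many trials, of |sample mean of n draws - true mean|."""
    errors = []
    for _ in range(trials):
        sample_mean = sum(draw(rng) for _ in range(n)) / n
        errors.append(abs(sample_mean - true_mean))
    return statistics.median(errors)


def gaussian(rng):
    return rng.gauss(0.0, 1.0)


def pareto(rng):
    return rng.paretovariate(ALPHA)


rng = random.Random(42)
results = {}
for name, draw, mean in [("gaussian", gaussian, 0.0), ("pareto", pareto, PARETO_MEAN)]:
    small = median_abs_error(draw, mean, 10, 300, rng)
    large = median_abs_error(draw, mean, 1000, 300, rng)
    # How much the typical error shrinks when the sample grows 100-fold.
    results[name] = small / large
    print(f"{name}: error shrinks by a factor of {results[name]:.1f} from n=10 to n=1000")
```

For the Gaussian, a 100-fold larger sample should cut the typical error by roughly the square root of 100, that is, about 10. For the Pareto the improvement is far smaller, which is the sense in which the law of large numbers "works too slowly".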

Central Limit Theorem

Assumed to work for “large” sums, thus making just about everything conveniently normal.
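
For reference, here is a minimal sketch of the thin-tailed case in which the theorem does work well: sums of uniform random numbers quickly look Gaussian. The choice of 30 terms per sum and 20,000 sums is an assumption for illustration; the point of the section is that this convenient behavior should not be taken for granted under fat tails.

```python
import random


def standardized_sums(n_terms, n_sums, rng):
    """Sum n_terms uniform(0, 1) draws, then rescale each sum to mean 0 and std 1."""
    mean = n_terms * 0.5              # mean of the sum
    std = (n_terms / 12.0) ** 0.5     # std of the sum (uniform variance is 1/12)
    return [
        (sum(rng.random() for _ in range(n_terms)) - mean) / std
        for _ in range(n_sums)
    ]


rng = random.Random(7)
sums = standardized_sums(n_terms=30, n_sums=20000, rng=rng)
within_one_std = sum(1 for s in sums if abs(s) <= 1.0) / len(sums)
print(f"share of sums within one standard deviation: {within_one_std:.3f}")
print("a Gaussian would give about 0.683")
```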

Option Types

Vanilla

“A vanilla option is a financial instrument that gives the holder the right, but not the obligation, to buy or sell an underlying asset at a predetermined price within a given timeframe. A vanilla option is a call option or put option that has no special or unusual features.” —Investopedia
“Vanilla” predictions, also known as natural exposures, correspond to situations in which the payoff is continuous and can take several values. The “vanilla” designation comes from option exposures that are open-ended, as opposed to the binary ones that are called “exotic”. The designation “vanilla” is fitting outside option trading because the exposures it designates are naturally occurring continuous variables, as opposed to binary ones, which tend to involve abrupt institution-mandated discontinuities. Vanilla exposures add a layer of complication: profits for companies or deaths due to terrorism or war can take many, many potential values. You can predict that a company will be “profitable”, but the profit could be $1 or $10 billion.

Binary

Predictions, bets, and exposures with a yes/no payoff.
“Binary predictions are about well defined discrete events, such as whether a person will win the election, a single individual will die, a team will win a contest. We call them binary because the outcome is either 0 (the event does not take place) or 1 (the event took place).” —Nassim Taleb & Philip Tetlock
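
The difference between the two payoff types can be sketched in a few lines. The function names and the strike of 100 are hypothetical, chosen only to illustrate the contrast: a binary bet pays the same whether the threshold is barely cleared or cleared by a huge margin, while a vanilla call keeps growing with the size of the move.

```python
def binary_call_payoff(price, strike):
    """Yes/no payoff: 1 if the event (price above strike) happens, else 0."""
    return 1.0 if price > strike else 0.0


def vanilla_call_payoff(price, strike):
    """Open-ended payoff: grows with how far the price ends above the strike."""
    return max(price - strike, 0.0)


STRIKE = 100.0  # hypothetical threshold
for price in (90.0, 101.0, 150.0, 1000.0):
    print(price, binary_call_payoff(price, STRIKE), vanilla_call_payoff(price, STRIKE))
```

A forecaster who is right about the binary question ("will it end above 100?") can still badly misjudge the vanilla exposure, because the latter depends on the whole distribution of outcomes above the threshold.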

Distributions

At the bottom left we have the degenerate distribution where there is only one possible outcome, i.e. no randomness and no variation. Then, above it, there is the Bernoulli distribution, which has two possible outcomes, not more. Then above it there are the two Gaussians. There is the natural Gaussian (with support on minus and plus infinity), and Gaussians that are reached by adding random walks (with compact support, sort of, unless we have infinite summands). These are completely different animals since one can deliver infinity and the other cannot (except asymptotically). Then above the Gaussians sit the distributions in the subexponential class that are not members of the power law class. These members have all moments. The subexponential class includes the lognormal, which is one of the strangest things in statistics because sometimes it cheats and fools us. At low variance, it is thin-tailed; at high variance, it behaves like the very thick tailed. Some people take it as good news that the data is not Paretian but lognormal; it is not necessarily so. Membership in the subexponential class does not satisfy the so-called Cramér condition, which allows insurability, as we illustrated in Figure 3.1; recall our thought experiment in the beginning of the chapter. More technically, the Cramér condition means that the expectation of the exponential of the random variable exists. Once we leave the yellow zone, where the law of large numbers (LLN) largely works and the central limit theorem (CLT) eventually ends up working, we encounter convergence problems. So here we have what are called power laws. We rank them by their tail index α, on which later; take for now that the lower the tail index, the fatter the tails. When the tail index is α ≤ 3 we call it supercubic (α = 3 is cubic). That’s an informal borderline: the distribution has no moment other than the first and second, meaning both the law of large numbers and the central limit theorem apply in theory.
Then there is a class with α ≤ 2 we call the Levy-Stable to simplify (though it includes similar power law distributions with α less than 2 not explicitly in that class; but in theory, as we add up variables, the sum ends up in that class rather than in the Gaussian one thanks to something called the generalized central limit theorem, GCLT). From here up we are increasingly in trouble because there is no variance. For 1 ≤ α ≤ 2 there is no variance, but mean absolute deviation (that is, the average variations taken in absolute value) exists. Further up, in the top segment, there is no mean. We call it the Fuhgetaboudit. If you see something in that category, you go home and you don’t talk about it.
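
The link between the tail index and which moments exist can be made concrete with the textbook Pareto distribution, used here as an assumed stand-in for the power law class. For a Pareto with tail index α and minimum value x_min, the p-th moment is α·x_min^p / (α − p) when p < α and infinite otherwise. The zone labels in the sketch paraphrase the passage above; the exact handling of the boundary values is an assumption of this example.

```python
import math


def pareto_moment(alpha, p, x_min=1.0):
    """E[X**p] for a Pareto(alpha, x_min); infinite whenever p >= alpha."""
    if p >= alpha:
        return math.inf
    return alpha * x_min ** p / (alpha - p)


def tail_zone(alpha):
    """Rough zone labels, paraphrasing the classification in the text."""
    if alpha <= 1:
        return "no mean (Fuhgetaboudit)"
    if alpha <= 2:
        return "mean exists, no variance (Levy-Stable zone)"
    if alpha <= 3:
        return "finite variance, no higher moments (supercubic)"
    return "power law with alpha above 3"


for alpha in (0.9, 1.5, 2.5, 4.0):
    print(alpha, tail_zone(alpha), pareto_moment(alpha, 1), pareto_moment(alpha, 2))
```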

Thick Tailed

“This is any distribution with fatter tails than the Gaussian, i.e. with more observations within ±1 standard deviation than erf(1/√2) ≈ 68.3% and with kurtosis (a function of the fourth central moment) higher than 3.”
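
Both parts of this definition can be checked on a simple fat-tailed toy model. The sketch below assumes a 50/50 mix of two zero-mean Gaussians with variances 1 − a and 1 + a, so the overall variance stays at 1 while the tails get fatter as a grows. This mixture is an illustrative assumption, not a distribution taken from the text.

```python
import math


def mixture_stats(a):
    """Zero-mean 50/50 mix of N(0, 1 - a) and N(0, 1 + a), for 0 <= a < 1.

    Returns (probability of landing within one overall std, kurtosis).
    """
    low_var, high_var = 1.0 - a, 1.0 + a
    # The overall variance is 1, so "within one standard deviation" means |x| <= 1.
    within = 0.5 * math.erf(1.0 / math.sqrt(2.0 * low_var)) \
        + 0.5 * math.erf(1.0 / math.sqrt(2.0 * high_var))
    # Kurtosis of a Gaussian variance mixture: 3 * E[var**2] / (E[var])**2.
    kurtosis = 3.0 * (0.5 * low_var ** 2 + 0.5 * high_var ** 2)
    return within, kurtosis


gauss_within, gauss_kurtosis = mixture_stats(0.0)  # a = 0 is the plain Gaussian
fat_within, fat_kurtosis = mixture_stats(0.8)
print(f"Gaussian:  within 1 std = {gauss_within:.4f}, kurtosis = {gauss_kurtosis:.2f}")
print(f"Fat tails: within 1 std = {fat_within:.4f}, kurtosis = {fat_kurtosis:.2f}")
```

With a = 0 the function reproduces the Gaussian benchmark erf(1/√2); with a = 0.8 more mass sits within one standard deviation and the kurtosis rises above 3, the higher peak and heavier tails that the definition describes.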

Subexponential

Unless they enter the class of power laws, distributions are not really thick tailed because they do not have monstrous impacts from rare events. In other words, they can have all the moments.

Lognormal

The subexponential class includes the lognormal, which is one of the strangest things in statistics because sometimes it cheats and fools us. At low variance, it is thin-tailed; at high variance, it behaves like the very thick tailed.
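
One way to see this two-faced behavior is through the lognormal's kurtosis, which has the standard closed form exp(4σ²) + 2·exp(3σ²) + 3·exp(2σ²) − 3, where σ is the standard deviation of the underlying normal. Using kurtosis as the yardstick and the particular values of σ are assumptions of this sketch.

```python
import math


def lognormal_kurtosis(sigma):
    """Kurtosis of a lognormal whose underlying normal has standard deviation sigma."""
    s2 = sigma ** 2
    return math.exp(4 * s2) + 2 * math.exp(3 * s2) + 3 * math.exp(2 * s2) - 3.0


for sigma in (0.0, 0.1, 0.5, 1.0, 2.0):
    print(f"sigma = {sigma}: kurtosis = {lognormal_kurtosis(sigma):.4g}")
```

At small σ the kurtosis sits near the Gaussian value of 3; at large σ it explodes, even though every moment remains finite.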

Power Law (Paretian)

These correspond to real thick tails but the fattailedness depends on the parametrization of their tail index. Without getting into a tail index for now, consider that there will be some moment that is infinite, and moments higher than that one will also be infinite.

Right Tailed


Left Tailed

  • Tail - Rare event
  • Fat tail
  • Vanilla option
  • Ergodicity - the property that a point of a moving system, either a dynamical system or a stochastic process, will eventually visit all parts of the space that the system moves in, in a uniform and random sense.
  • Forecast - binary outcome
  • Exposure - has more nuanced results and depends on full distribution
  • Random walk
  • Flatten tails
  • Preasymptotics
  • Parameter
  • Cramer Condition
  • Fattening tails - The probability of an event staying within one standard deviation rises... As we fatten the tails, we get higher peaks, smaller shoulders, and a high incidence of a very large deviation... Because probabilities need to add up to 1, increasing mass in one area leads to decreasing it in another.
  • Gaussian
  • Karl Popper Asymmetry

Deterministic vs Stochastic

  • Deterministic - A deterministic system is a system in which no randomness is involved in the development of future states of the system.
  • Stochastic - having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely.
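
The two definitions can be contrasted with a minimal sketch. The growth rate, step count, and seeds below are arbitrary assumptions: the deterministic path is fully fixed by its inputs, while the random walk can only be described statistically unless the source of randomness (here, the seed) is pinned down.

```python
import random


def deterministic_path(start, rate, steps):
    """Compound growth: each state is fully determined by the previous one."""
    path = [start]
    for _ in range(steps):
        path.append(path[-1] * (1.0 + rate))
    return path


def random_walk(steps, seed):
    """Each step is +1 or -1 with equal probability."""
    rng = random.Random(seed)
    position, path = 0, [0]
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path


print(deterministic_path(100.0, 0.05, 3))
print(random_walk(10, seed=1))
print(random_walk(10, seed=2))
```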

Mediocristan

In Mediocristan, when a sample under consideration gets large, no single observation can really modify the statistical properties.

Catastrophe Principle

Insurance can only work in Mediocristan. For insurability, losses need to be more likely to come from many events than a single one, thus allowing for diversification.

Extremistan

In Extremistan, the tails (rare events) play disproportionately large role in determining the properties.
Ruin is more likely to come from a single extreme event than from a series of bad episodes.
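
A rough way to tell the two regimes apart is to ask how much of a total comes from the single largest observation. The sketch below uses assumed stand-ins: Gaussian "heights" for Mediocristan and a Pareto with tail index 1.1 as "wealth" for Extremistan. The parameters and sample size are illustrative only.

```python
import random


def max_share(values):
    """Fraction of the total contributed by the single largest observation."""
    return max(values) / sum(values)


rng = random.Random(2024)
n = 10000
heights = [rng.gauss(170.0, 10.0) for _ in range(n)]   # Mediocristan stand-in
wealth = [rng.paretovariate(1.1) for _ in range(n)]    # Extremistan stand-in

gauss_share = max_share(heights)
pareto_share = max_share(wealth)
print(f"largest height as a share of the total: {gauss_share:.6f}")
print(f"largest fortune as a share of the total: {pareto_share:.6f}")
```

In the Gaussian sample no single observation matters; in the Pareto sample one draw can carry a visible share of the whole, which is why a single event can dominate losses and undermine diversification.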

Nassim Taleb’s Key Ideas

  • Lindy Effect
  • Precautionary Principle

Tail Wags The Dog Effect

Centrally, the thicker the tails of the distribution, the more the tail wags the dog, that is, the information resides in the tails and less so in the “body” (the central part) of the distribution. Effectively, for very fat-tailed phenomena, all deviations become informationally sterile except for the large ones.

Implications

The center becomes just noise. Although the “evidence-based” science might not quite get it yet, under such conditions, there is no evidence in the body.
The property explains why, for instance, a million observations of white swans do not confirm the non-existence of black swans, or why a million confirmatory observations count less than a single disconfirmatory one.

Gray Swan Events

Rare, consequential events that are predictable (i.e., very long cycles repeating).

Black Swan Events

Rare, unpredictable events that are EXTREMELY consequential.
Black swans result from the incompleteness of knowledge with effects that can be very consequential in fat tailed domains.

Skin In The Game

A filtering mechanism that forces cooks to eat their own cooking and be exposed to harm in the event of failure, thus throwing dangerous people out of the system. Skin-in-the-game fields are those where operators are evaluated by tangible results or subjected to ruin and bankruptcy.

Fields that have skin in the game

  • Plumbing
  • Dentistry
  • Surgery
  • Engineering

Fields with no skin in the game

  • Circular academic fields — where people rely on peer assessment rather than survival pressures from reality.

How To Deal With Uncertainty: Forecasting Is Overrated

For there are two manners to deal with uncertainty: (1) try to better understand the world in a way that allows you to formulate precise forecasts, (2) try to avoid being harmed by what you do not understand. We have accomplished very little through the first approach, perhaps underwent serious degradation, but made excellent leaps in the second. How? That’s where legal and contract theory come in. Now, note that Antifragile is ensconced in the second approach: how to deal with exposure to something rather than focus on that something. If one cannot forecast, better benefit from random events and use randomness as fuel for improvement. Likewise the mapping of fragility allows the building of contracts that remove such fragility.

Ludic Fallacy

Definition

The ludic fallacy, proposed by Nassim Nicholas Taleb in his book The Black Swan (2007), is "the misuse of games to model real-life situations". Taleb explains the fallacy as "basing studies of chance on the narrow world of games and dice". The adjective ludic originates from the Latin noun ludus, meaning "play, game, sport, pastime". The fallacy is a central argument in the book and a rebuttal of the predictive mathematical models used to predict the future – as well as an attack on the idea of applying naïve and simplified statistical models in complex domains. According to Taleb, statistics is applicable only in some domains, for instance casinos in which the odds are visible and defined. Taleb's argument centers on the idea that predictive models are based on platonified forms, gravitating towards mathematical purity and failing to take various aspects into account:

  • It is impossible to be in possession of the entirety of available information.
  • Small unknown variations in the data could have a huge impact. Taleb differentiates his idea from that of mathematical notions in chaos theory (e.g., the butterfly effect).
  • Theories or models based on empirical data are claimed to be flawed as they may not be able to predict events which are previously unobserved, but have tremendous impact (e.g., the 9/11 terrorist attacks or the invention of the automobile), also known as black swan theory.

—Wikipedia

Examples

Suspicious Coin

One example given in the book is the following thought experiment. Two people are involved:

  • Dr. John who is regarded as a man of science and logical thinking
  • Fat Tony who is regarded as a man who lives by his wits

A third party asks them to "assume that a coin is fair, i.e., has an equal probability of coming up heads or tails when flipped. I flip it ninety-nine times and get heads each time. What are the odds of my getting tails on my next throw?"

  • Dr. John says that the odds are not affected by the previous outcomes so the odds must still be 50:50.
  • Fat Tony says that the odds of the coin coming up heads 99 times in a row are so low that the initial assumption that the coin had a 50:50 chance of coming up heads is most likely incorrect. "The coin gotta be loaded. It can't be a fair game."

The ludic fallacy here is to assume that in real life the rules from the purely hypothetical model (where Dr. John is correct) apply. A reasonable person, for example, would not bet on black on a roulette table that has come up red 99 times in a row (especially as the reward for a correct guess is so low when compared with the probable odds that the game is fixed). In classical terms, statistically significant events, i.e. unlikely events, should make one question one's model assumptions. In Bayesian statistics, this can be modelled by using a prior distribution for one's assumptions on the fairness of the coin, then Bayesian inference to update this distribution. This idea is modelled in the Beta distribution.
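
Fat Tony's reasoning can be sketched as a Bayesian update in two ways. The uniform Beta(1, 1) prior, the one-in-a-million prior on a loaded coin, and the "always heads" alternative are all assumptions chosen for illustration, not values from the book.

```python
from fractions import Fraction


def prob_next_heads(heads, tails, prior_a=1, prior_b=1):
    """Posterior predictive P(heads) under a Beta(prior_a, prior_b) prior on the bias."""
    return Fraction(prior_a + heads, prior_a + prior_b + heads + tails)


def posterior_fair(heads_in_a_row, prior_fair):
    """Posterior P(coin is fair) when the only alternative is a coin that always lands heads."""
    likelihood_fair = Fraction(1, 2) ** heads_in_a_row
    likelihood_loaded = Fraction(1)
    weighted_fair = prior_fair * likelihood_fair
    return weighted_fair / (weighted_fair + (1 - prior_fair) * likelihood_loaded)


# Beta-prior view: after 99 heads and no tails, heads is the overwhelming favorite.
print(prob_next_heads(99, 0))  # 100/101

# Two-hypothesis view: even a 999,999-in-a-million prior that the coin is fair collapses.
print(float(posterior_fair(99, Fraction(999_999, 1_000_000))))
```

Dr. John's 50:50 answer corresponds to refusing to update the assumption of fairness at all. Any prior that leaves even a sliver of room for a loaded coin ends up on Fat Tony's side after 99 heads.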

Fighting

Nassim Taleb shares an example that comes from his friend and trading partner, Mark Spitznagel... "A martial version of the ludic fallacy: organized competitive fighting trains the athlete to focus on the game and, in order not to dissipate his concentration, to ignore the possibility of what is not specifically allowed by the rules, such as kicks to the groin, a surprise knife, et cetera. So those who win the gold medal might be precisely those who will be most vulnerable in real life."
