## What is *probability*?

### The Traditional Definition:

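Consider a set $\Omega$ (called the **sample space**), and a function $X : \Omega \to \mathbb{R}$ (called a **random variable**).

If $\Omega$ is countable (or finite), a function $p : \Omega \to \mathbb{R}$ is called a **probability distribution** if it satisfies the following 2 conditions:

- For each $x \in \Omega$, $0 \le p(x) \le 1$.
- If $A \subseteq \Omega$, then the probability of $A$ is $\sum_{x \in A} p(x)$, and in particular $\sum_{x \in \Omega} p(x) = 1$.

And if $\Omega$ is uncountable (typically $\Omega = \mathbb{R}$), a function $F : \Omega \to \mathbb{R}$ is called a **probability distribution** or a **cumulative distribution function** if it satisfies the following 3 conditions:

- For each $x, y \in \Omega$ with $x \le y$, $F(x) \le F(y)$.
- $F$ is right-continuous.
- $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.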
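For the discrete case, the conditions can be checked mechanically. Here is a minimal sketch, assuming the usual requirements for a probability mass function (every value lies between 0 and 1, and the values sum to 1). The names `is_pmf`, `prob`, `fair_coin`, and `not_a_pmf` are made up for illustration.

```python
from fractions import Fraction


def is_pmf(p):
    """Check whether the dict p (outcome -> mass) is a valid probability
    mass function on a finite sample space."""
    in_range = all(0 <= mass <= 1 for mass in p.values())
    sums_to_one = sum(p.values()) == 1
    return in_range and sums_to_one


def prob(p, event):
    """Probability of an event (a set of outcomes): add up the point masses."""
    return sum(p[x] for x in event)


# Hypothetical examples: a fair coin, and a table whose masses do not sum to 1.
fair_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
not_a_pmf = {"H": Fraction(1, 2), "T": Fraction(1, 3)}

print(is_pmf(fair_coin))
print(is_pmf(not_a_pmf))
print(prob(fair_coin, {"H"}))
```

Exact fractions are used instead of floats so that the "sums to 1" check is not thrown off by rounding.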

### The Intuition:

What idea are we even trying to capture with these seemingly disparate definitions for the same thing? Taken separately, each case is fairly intuitive, but they don't seem to marry very well. The discrete case gives us a pointwise estimate of something akin to the proportion of observations that should correspond to a value (in a perfect world). The continuous case is the same thing, but instead of corresponding to that particular value (which doesn't really even make sense in this case), the proportion corresponds to the point in question and everything less than it. The shaded region in the top picture below and the curve in the picture directly below it both depict the cumulative distribution function of a standard normal distribution (don't worry too much about what that means for this post, but if you're doing anything with statistics, you should probably know a bit about that).

Another way to define a continuous probability distribution is through something called a probability density function, which is closer to the discrete case definition of a probability distribution (or **probability mass function**). A **probability density function** is a function $f$ such that $F(x) = \int_{-\infty}^{x} f(t)\,dt$, where $F$ is the cumulative distribution function. In other words, $f(x) = F'(x)$ wherever $F$ is differentiable. This new function has some properties of our discrete case probability function, but lacks some others. On the one hand, they’re both defined pointwise, but on the other, this one can be greater than one in some places, meaning the value of the probability density function isn’t really the probability of an event, but rather (as the name “suggests”) the density therein.
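To see a density exceed one while the probabilities it produces stay below one, here is a small numerical sketch. It assumes a normal distribution with mean 0 and standard deviation 0.1 (my choice for illustration, not the standard normal from the pictures), and it approximates the integral with a plain midpoint rule. The helper names are hypothetical.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


sd = 0.1  # assumed narrow spread, chosen so the peak density exceeds 1
peak = normal_pdf(0.0, sd=sd)

# Integrate the density from far in the left tail (10 standard deviations
# below the mean, standing in for minus infinity) up to x = 0.05.
area = integrate(lambda t: normal_pdf(t, sd=sd), -1.0, 0.05)

print(peak)                          # a density value, larger than 1
print(area, normal_cdf(0.05, sd=sd)) # the two numbers should agree closely
```

The peak is a density, not a probability. Only areas under the curve are probabilities, and those never exceed one.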

### Does it measure up?

Now let’s check out the measure-theoretic approach…

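Let $\Omega$ be our sample space, $\mathcal{F}$ be the $\sigma$-algebra on $\Omega$ (so $\mathcal{F}$ is the collection of measurable subsets of $\Omega$), and $\mu$ a measure on that measure space. Let $X : \Omega \to E$ be a random variable ($E$ is generally taken to be $\mathbb{R}$ or $\mathbb{R}^n$). We define the function $P : \mathcal{P}(E) \to \mathbb{R}$ (where $\mathcal{P}(E)$ is the powerset of $E$, the set of all subsets) such that if $A \subseteq E$, we have $P(A) = \mu(X^{-1}(A))$. We call $P$ a **probability distribution** if the following conditions hold:

- $\mu(\Omega) = 1$.
- for each $A \in \mathcal{P}(E)$ we have $X^{-1}(A) \in \mathcal{F}$.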
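On a finite sample space this construction is easy to play with in code. Below is a minimal sketch, assuming the distribution is obtained by measuring preimages, $P(A) = \mu(X^{-1}(A))$, and that the measure is given by its point masses (which is enough on a finite space). The function `pushforward` and the coin-flip example are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def pushforward(mu, X, A):
    """Compute P(A) = mu(X^{-1}(A)) on a finite sample space.

    mu: dict mapping each outcome to its point mass.
    X:  the random variable, a plain function on outcomes.
    A:  a set of values that X might take.
    """
    preimage = [omega for omega in mu if X(omega) in A]
    return sum((mu[omega] for omega in preimage), Fraction(0))


# Hypothetical example: two fair coin flips, each outcome with mass 1/4.
two_flips = {outcome: Fraction(1, 4) for outcome in ("HH", "HT", "TH", "TT")}


def heads(outcome):
    """Random variable: the number of heads in the outcome."""
    return outcome.count("H")


print(pushforward(two_flips, heads, {1}))        # 1/2
print(pushforward(two_flips, heads, {0, 1, 2}))  # 1
```

Notice that `two_flips` knows nothing about counting heads and `heads` knows nothing about fairness. The two only meet inside `pushforward`.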

### Why do this?

Well, right off the bat we have a serious benefit: we no longer have two disparate definitions of our probability distributions. Furthermore, there is the added benefit of a natural separation of concerns: the measure determines what we might intuitively consider to be the probability distribution, while the random variable encodes the aspects of the events that we care about.

To further illustrate this, let’s work through some examples.

### The Examples

#### A fair die

##### All even

Let’s consider a fair die. Our sample space will be $\Omega = \{1, 2, 3, 4, 5, 6\}$. Since our die is fair, we’ll define our measure fairly: for any $n$ in our sample space, $\mu(\{n\}) = \frac{1}{6}$. If we want to know, for instance, what the probability of getting each number is, we could use a very intuitive random variable $X(n) = n$ (so $X(1) = 1$, etc.). Then we see that $P(\{1\}) = \mu(X^{-1}(\{1\})) = \mu(\{1\}) = \frac{1}{6}$, and the rest are found similarly.

##### Odds and Evens?

What if we want to consider the fair die of yester-paragraph, but we only care if the face of the die shows an odd or an even number? Well, since the actual distribution of the die hasn’t changed, we won’t have to change our measure. Instead we’ll change our random variable to capture just those aspects we care about. In particular, $X(n) = 0$ if $n$ is even, and $X(n) = 1$ if $n$ is odd. We then see $P(\{0\}) = \mu(X^{-1}(\{0\})) = \mu(\{2, 4, 6\}) = \frac{1}{2}$ and $P(\{1\}) = \mu(X^{-1}(\{1\})) = \mu(\{1, 3, 5\}) = \frac{1}{2}$.
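Here is a short sketch of the fair-die examples: one measure, two random variables. It assumes the parity variable codes even faces as 0 and odd faces as 1 (any two labels would do), and the helper `distribution` is a hypothetical name.

```python
from fractions import Fraction

# The fair measure, given by its point masses: each face has mass 1/6.
fair = {n: Fraction(1, 6) for n in range(1, 7)}


def distribution(mu, X):
    """Tabulate P({v}) = mu(X^{-1}({v})) for every value v that X takes."""
    dist = {}
    for omega, mass in mu.items():
        value = X(omega)
        dist[value] = dist.get(value, Fraction(0)) + mass
    return dist


def face(n):
    """Random variable that keeps the face as-is."""
    return n


def parity(n):
    """Random variable that only records parity (assumed coding: 0 even, 1 odd)."""
    return n % 2


print(distribution(fair, face))    # every face gets 1/6
print(distribution(fair, parity))  # 0 and 1 each get 1/2
```

The measure `fair` is never touched between the two calls. Only the random variable changes.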

#### Getting loaded

##### All even

Now let’s consider the same scenario of wanting to know the probability of getting each number, but now our die is loaded. Since we’re changing the distribution itself and not just the aspects we’re choosing to care about, we’re going to want to change the measure this time. For simplicity, let’s consider a kind of degenerate case scenario. Let our measure be: $\mu(A) = 1$ if $1 \in A$ and $\mu(A) = 0$ if $1 \notin A$. Basically, we’re defining our probability to be such that the only possible outcome is a roll of 1. So since we are concerned with the same things we were concerned with last time, we can take that same random variable, $X(n) = n$. We note $P(\{1\}) = \mu(\{1\}) = 1$ and $P(\{n\}) = \mu(\{n\}) = 0$ for any $n \neq 1$.
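This time the random variable stays put and the measure changes. In this sketch the measures are written as functions on events (sets of outcomes), which is closer to how a measure is defined than a table of point masses. All names here are hypothetical, and the degenerate measure is assumed to put all of its mass on the face 1.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))


def fair(event):
    """Fair measure: each face in the event contributes 1/6."""
    return Fraction(len(event), 6)


def loaded(event):
    """Degenerate measure: 1 if the event contains face 1, otherwise 0."""
    return Fraction(1) if 1 in event else Fraction(0)


def face(n):
    """Random variable that keeps the face as-is."""
    return n


def P(mu, X, values):
    """P(values) = mu(X^{-1}(values)), with the preimage taken in SAMPLE_SPACE."""
    preimage = {omega for omega in SAMPLE_SPACE if X(omega) in values}
    return mu(preimage)


# Same random variable, two different measures, side by side.
for n in sorted(SAMPLE_SPACE):
    print(n, P(fair, face, {n}), P(loaded, face, {n}))
```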

##### Odds or evens

Try to do this one yourself. I’m going to go get some sleep now. Please feel free to contact me with any questions. I love doing this stuff, so don’t be shy!