# A dirty little ditty on Finite Automata

### Some Formal Definitions

#### A Deterministic Finite Automata (DFA) is

• A set $\mathcal{Q}$ called “states”
• A set $\Sigma$ called “symbols”
• A function $\delta_F:\mathcal{Q}\times\Sigma \to \mathcal{Q}$
• A designated state $q_0\in\mathcal{Q}$ called the start point
• A subset $F\subseteq\mathcal{Q}$ called the “accepting states”.

The DFA is then often referred to as the ordered quintuple $A=(\mathcal{Q},\Sigma,\delta_F,q_0,F)$.

#### Defining how strings act on DFAs.

Given a DFA, $A=(\mathcal{Q}, \Sigma, \delta, q_0, F)$, a state $q_i\in\mathcal{Q}$, and a string $w\in\Sigma^*$, we can define $\delta(q_i,w)$ like so:

• If $w$ only has one symbol, we can consider $w$ to be the symbol and define $\delta(q_i,w)$ to be the same as if we considered $w$ as the symbol.
• If $w=xv$, where $x\in\Sigma$ and $v\in\Sigma^*$, then $\delta(q_i, w)=\delta(\delta(q_i,x),v)$.

And in this way, we have defined how DFAs can interpret strings of symbols rather than just single symbols.

#### The language of a DFA

Given a DFA, $A=(\mathcal{Q}, \Sigma, \delta, q_0, F)$, we can define “the language of $A$“, denoted $L(A)$, as $\{w\in\Sigma^*|\delta(q_0,w)\in F\}$.

### Some Examples, Maybe?

#### Example 1:

Let’s construct a DFA that accepts only strings beginning with a 1 that, when interpreted as binary numbers, are multiples of 5. So some examples of strings that would be in $L(A)$ are 101, 1010, 1111

### Some More Formal Definitions

#### A Nondeterministic Finite Automata (NFA) is

• A set $\mathcal{Q}$ called “states”
• A set $\Sigma$ called “symbols”
• A function $\delta_N:\mathcal{Q}\times\Sigma \to \mathcal{P}\left(\mathcal{Q}\right)$
• A designated state $q_0\in\mathcal{Q}$ called the start point
• A subset $F\subseteq\mathcal{Q}$ called the “accepting states”.

The NFA is then often referred to as the ordered quintuple $A=(\mathcal{Q},\Sigma,\delta_N,q_0,F)$.

#### Defining how strings act on NFAs.

Given an NFA, $N=(\mathcal{Q}, \Sigma, \delta, q_0, F)$, a collection of states $\mathcal{S}\subseteq\mathcal{Q}$, and a string $w\in\Sigma^*$, we can define $\delta(\mathcal{S},w)$ like so:

• If $w$ only has one symbol, then we can consider $w$ to be the symbol and define $\delta(\mathcal{S},w):=\bigcup\limits_{s\in\mathcal{S}}\delta(s,w)$.
• If $w=xv$, where $x\in\Sigma$ and $v\in\Sigma^*$, then $\delta(\mathcal{S}, w)=\bigcup\limits_{s\in\mathcal{S}}\delta(\delta(s,x),v)$.

And in this way, we have defined how NFAs can interpret strings of symbols rather than just single symbols.

# Probability — A Measure Theoretic Approach

## What is probability?

Consider a set $\Omega$ (called the sample space), and a function $X:\Omega\rightarrow\mathbb{R}$ (called a random variable.

If $\Omega$ is countable (or finite), a function $\mathbb{P}:\Omega\rightarrow\mathbb{R}$ is called a probability distribution if it satisfies the following 2 conditions:

1. For each $x \in \Omega$, $\mathbb{P}(x) \geq 0$
2. If $A_i\cap A_j = \emptyset$, then $\mathbb{P}(\bigcup\limits_0^\infty A_i) = \sum\limits_0^\infty\mathbb{P}(A_i)$

And if $\Omega$ is uncountable, a function $F:\mathbb{R}\rightarrow\mathbb{R}$ is called a probability distribution or a cumulative distribution function if it satisfies the following 3 conditions:

1. For each $a,b\in\mathbb{R}$, $a < b \rightarrow F(a)\leq F(b)$
2. $\lim\limits_{x\to -\infty}F(x) = 0$
3. $\lim\limits_{x\to\infty}F(x) = 1$

### The Intuition:

What idea are we even trying to capture with these seemingly disparate definitions for the same thing? Well, with the two cases taken separately it's somewhat obvious, but they don't seem to marry very well. The discrete case is giving us a pointwise estimation of something akin to the proportion of observations that should correspond to a value (in a perfect world). The continuous case is the same thing, but instead of corresponding to that particular value (which doesn't really even make sense in this case), the proportion corresponds to the point in question and everything less than it. The shaded region in the top picture below and the curve in the picture directly below it denote the cumulative density function of a standard normal distribution (don't worry too much about what that means for this post, but if you're doing anything with statistics, you should probably know a bit about that).

Another way to define a continuous probability distribution is through something called a probability density function, which is closer to the discrete case definition of a probability distribution (or probability mass function). A probability density function is a function $f:\mathbb{R}\rightarrow\mathbb{R}_+$ such that $\int_{-\infty}^xf(t)dt = F(x)$. In other words, $\frac{dF}{dX} = f$. This new function has some properties of our discrete case probability function, but lacks some others. On the one hand, they’re both defined pointwise, but on the other, this one can be greater than one in some places — meaning the value of the probability density function isn’t really the probability of an event, but rather (as the name “suggests”) the density therein.

### Does it measure up?

Now let’s check out the measure theoretic approach…

Let $\Omega$ be our sample space, $S$ be the $\sigma$-algebra on $\Omega$ (so $S$ is the collection of measurable subsets of $\Omega$), and $\mu:S\to\mathbb{R}$ a measure on that measure space. Let $X:\Omega\rightarrow\mathbb{E}$ be a random variable ($\mathbb{E}$ is generally taken to be $\mathbb{R}$ or $\mathbb{R}^n$). We define the function $\mathbb{P}:\mathcal{P}(\mathbb{E})\rightarrow\mathbb{R}$ (where $\mathcal{P}(\mathbb{E})$ is the powerset of $\mathbb{E}$ — the set of all subsets) such that if $A\subseteq\mathbb{E}$, we have $\mathbb{P}(A)=\mu(X^{-1}(A))$. We call $\mathbb{P}$ a probability distribution if the following conditions hold:

1. $\mu(\Omega) = 1$
2. for each $A\subseteq\mathbb{E}$ we have $X^{-1}(A)\in S$.

### Why do this?

Well, right off the bat we have a serious benefit: we no longer have two disparate definitions of our probability distributions. Furthermore, there is the added benefit of having a natural separation of concerns: the measure $\mu$ determines the what we might intuitively consider to be the probability distribution while the random variable is used to encode the aspects of the events that we care about.

To further illustrate this

### The Examples

#### A fair die

##### All even

Let’s consider a fair die. Our sample space will be $\{1,2,3,4,5,6\}$. Since our die is fair, we’ll define our measure fairly: for any $x$ in our sample space, $\mu(\{x\}) = \frac{1}{6}$. If we want to know, for instance, what the probability of getting each number is, we could use a very intuitive random variable $X(a) = a$ (so $X(1)=1$, etc.). Then we see that $\mathbb{P}(\{1\}) = \mu(\{1\}) = \frac{1}{6}$, and the rest are found similarly.

##### Odds and Evens?

What if we want to consider the fair die of yester-paragraph, but we only care if the face of the die shows an odd or an even number? Well, since the actual distribution of the die hasn’t changed, we won’t have to change our measure. Instead we’ll change our random variable to capture just those aspects we care about. In particular, $X(a) = 0$ if $a$ is even, and $X(a) = 1$ if $a$ is odd. We then see $\mathbb{P}(1) = \mu(\{1,3,5\}) = \frac{1}{2}$ and $\mathbb{P}(0) = \mu(\{2,4,6\}) = \frac{1}{2}$

##### All even

Now let’s consider the same scenario of wanting to know the probability of getting each number, but now our die is loaded. Being as how we’re changing the distribution itself and not just the aspects we’re choosing to care about, we’re going to want to change the measure this time. For simplicity, let’s consider a kind of degenerate case scenario. Let our measure be: $\mu(A) = 1$ if $1\in A$ and $\mu(A)=0$ if $1\notin A$. Basically, we’re defining our probability to be such that the only possible outcome is a roll of 1. So since we are concerned with the same things we were concerned with last time, we can take that same random variable. We note $\mathbb{P}(1) = 1$ and $\mathbb{P}(a) = 0$ for any $a \neq 1$.

##### Odds or evens

Try to do this one yourself. I’m going to go get some sleep now. Please feel free to contact me with any questions. I love doing this stuff, so don’t be shy!

# What’s in a language?

## What is a Formal Language?

### The Rigorous definition:

Let $\Sigma$ be an alphabet and let $\Sigma^k$ be the set of all strings of length k over that alphabet. Then, we define $\Sigma^*$ to be $\bigcup\limits_{k\in\mathbb{N}}\Sigma^k$ (the union of ∑k over all natural numbers k). If $L\subseteq\Sigma^*$, we call $L$ a language.

### The Intuition Behind the Definition

Consider an alphabet (some finite set of characters), for example we can consider the letters of the English language, the ASCII symbols, the symbols $\{0, 1\}$ (otherwise known as binary), or the symbols $\{1, 2, 3, 4, 5, 6, 7, 8, 9, 0, +, \times , =\}$. We can then construct the infinite list of all the different ways we can arrange those characters (e.g. $1001011$ or $0011011011$, etc. if we’re using binary). We call these arrangements “strings”. Once we have all that machinery built up, a language is just some subset of that infinite collection of strings. The language may itself be infinite.

### Some Examples

• The alphabet: $\Sigma=\{0, 1\}$
The language: $\{x\in\Sigma|x \text{ is prime}\}$ (all prime numbers in binary)
Some strings from the language: $10, 11, 101...$
• The alphabet: ASCII characters
The language: All syntactically correct Clojure programs (the source code)
• The alphabet: All Clojure functions, operators, etc, and list $\{x, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0\}$
The language: All syntactically correct Clojure programs (the source code)

You see that we need to have an alphabet before we can have a Formal Language. Also, different alphabets may result in equivalent languages — by equivalent, we mean that both languages contain the same strings.

### What are we getting at?

Well, there are two ways to look at this. On the first hand, linguists would like to study language in its abstract essence. For this, Formal Languages may come in handy (if endowed with a Grammar and possibly more). That is not the reason I will be studying Formal Languages. I’m learning about formal languages to find their applications to computing.

Apparently, with the help of a little Mathematical thinking, we can assign semantics to the strings in a language and somehow correlate them with real world problems — such as computability, the P=nP problem, cryptography, and more!