The class comprises 10 homework sets, each of which typically has two parts: (i) a few problems for you to get some training on the material covered in class; and (ii) a small project that studies a particular stochastic system. For (ii) you are typically required to answer some questions about properties of the stochastic system, write a Matlab simulation and use this simulation to understand some properties of the system. Matlab can be obtained following these instructions, and a starter's guide can be found here. The list of homework sets and solutions follows. You can refer to the end of the page for matters about logistics and the policy on grading, collaboration and use of posted solutions.

Introduction

Disclaimer: In this first assignment you analyze a simple stochastic system. It is a little unconventional since we are asking you to perform the type of analysis this class is supposed to teach you. Therefore, some parts of it might not be very clear to you but will become clearer as we progress in the class. The idea is to gain an appreciation for what the challenges are in analyzing a stochastic system. I also want you to appreciate how it is possible to state facts about the system's behavior even if randomness implies that anything is possible.

Homework 1 - Lower-bounded biased random walk (Due Wednesday September 1, submit via Gradescope). This example considers a certain game in a certain casino where your chances of winning are slightly better than your chances of loosing. More specifically, you place 1-dollar bets, with probability p you are paid 1 dollar and with probability 1-p you loose you bet. The good news is that p>1/2. The catch is that you have to keep playing forever or until you loose all your money. If you have w dollars is it a good idea to play this game or not? You have to balance the fact that while you are more likely to win than loose, there is always a chance of getting unlucky a few times and losing all your money. And since you have to keep playing forever the latter possibility cannot be ignored heedlessly. We will also study the case p<1/2. If you are having trouble with this homework, this piece of code simulates one experiment. This other code repeats the experiment a hundred times and the numerical analysis of outcomes is undertaken here. You are well advised to first try things on your own. Solutions are available here.

Probabilty review

Homework 2 - Relationship between Bernoulli, binomial, Poisson and normal random variables. (Due Friday September 17, submit via Gradescope) On a fundamental level all random phenomena depend on something happening or not happening. Thus, all random phenomena are coin tosses. If that is true, then why do we study probability? The answer is that random phenomena tend to generate regular structure. If you throw a coin a thousand times, you can expect quite certainly to get at least 400 heads. We started with an uncertain system but ended up with a certain statement. This regularity appears in the limit of sequences of random variables and is exploited by anyone doing serious applications of probability to real world problems. In the end, as Kolmogorov said the fact that randomness generates structure is the core epistemological value of the theory of probability. To help you appreciate the beauty of that ;) this small project starts with sequences of coin tosses (Bernoulli random variables) and considers the number of heads in the sequence. As the number of tosses increases, you'll see how Poisson and normal random variables appear naturally. These two limit behaviors are examples of random phenomena with a particular structure (sequences of Bernoulli, described by binomial distributions) that when considered in a large scale give rise to a different type of structure (normal or Poisson). The complete solutions are available here.

Homework 3 - Decision making. (Due Monday September 27, submit via Gradescope) Say you are selling your car for which you are going to receive J offers. Offers will be of different value, and if all J of them were presented upfront, the car would be sold to the highest bidder. Alas, potential buyers make their offers in a random order and if not accepted they withdraw their bid -- presumably, they can ﬁnd a different seller willing to accep their offer. The issue here is the design of a decision mechanism that allows you to sell your car at a reasonable price. In principle, it makes sense to discard the first few bids since you do not know where they fit in the pool of potential offers. Then, you can choose the first offer that fares comparatively well against those that you rejected. The questions here are how many bids do we need to discard and what does it mean for an offer to fare comparatively well with respect to others. The complete solutions are available here.

Markov chains

Homework 4 - Propagation of mitochondrial DNA types. (Due Wednesday October 6, submit via Gradescope). Mitochondrial DNA is passed from mother to children without genetic contribution from the father. All the variability in mitochondrial DNA is due to random mutations accumulated over time. Using estimates of the mutation rate and differences on mitochondrial DNA between humans it becomes possible to estimate the time at which groups became distinct populations. We are not computing these times here but a few facts about a Markov chain that models the propagation of mitochondrial DNA. In particular, you are asked to follow your mitochondrial DNA for the next few centuries. The Markov chain that we use here belongs to the class of branching processes that are also used, for example, to model population dynamics, chain reactions, the growth of links to web pages and the emergence of hub airports. The complete solutions are available here.

Homework 5 - Aloha protocol for random access communication. (Due Friday October 15, submit via Gradescope). A pervasive situation in communication systems is to have a common infrastructure access point (AP) serving a group of physically distributed devices. This is, for example, the situation of your cellphone sending information to the cellular base station, your laptop transmitting to the 802.11 wireless router, or a group of satellites communicating with a common ground station. In these three cases there is a common AP, the base station, the router, or the ground station, serving a group of distributed devices, cellphones, laptops, or satellites. A problem in the uplink communication from physically distributed terminals to the AP is how to separate the information transmitted by different devices. One possibility is to assign different times or frequencies to different terminals. This is called time or frequency division multiplexing. Another possibility is to let terminals transmit at random and hope for luck to avoid simultaneous transmissions. This is called random access. Random access is advantageous because it requires little coordination between terminals. However, it is not clear that random access is a viable communication strategy. You will see in this problem that random access actually performs surprisingly well. A minor variation of this protocol is used in WiFi standards. Cellphones use this protocol to reserve resources for a voice call (the actual voice is transmitted using a different protocol). Random access goes under the picturesque name of Aloha because it was first applied to provide satellite communication services to the islands of Hawaii. The complete solutions are available here.

Homework 6 - Google's PageRank algorithm. (Due Friday October 22, submit via Gradescope). The most popular algorithms to rank pages in a web search are stochastic. Consider a web surfer that visits a page and clicks on any of the page's links at random. She repeats this process forever. What fraction of her time will be spent on a given page? The answer to this question is the rank assigned to the page. The same idea can be used to understand the structure of networks in different settings. For example, we can use this algorithm to extract connectivity information from a social graph. Say we choose a student and ask her to direct us to any of her friends selected randomly. We then go to this friend repeat the question and are directed to this new student. This is no different from the random web surfer model. Repeating this process forever we can therefore infer the degree of connectedness of students in the class from the average number of visits to each of them. This is not a pointless exercise. To market products, for example, it is worthwhile to concentrate the effort in the individuals that are most connected to other persons. The important insight here is that the network possesses knowledge that individuals do not. You will need the following homework collaboration matrix connecting students that cooperate in homework assignments. The complete solutions are available here.

Continuous-time Markov chains

Homework 7 - Arrival of passengers at a subway station. (Due Monday November 8, submit via Gradescope). In order to dimension public transportation systems we need to determine the number of customers that arrive at different stations. Since this a random quantity what we actually need to determine is the probability distribution of the number of customers that arrive in a certain time interval. In this problem you will see that as long as we can think of customers as behaving independently, the number of customers that arrive at a train station can be modeled as a Poisson process. Notice how a very unassuming hypothesis - customers behave independently, leads to a rich structure - Poisson arrivals. Solutions of Problem 7 are available here, the rest after the due date.

Homework 8 - Cash flow of an insurance company. (Due Wdnesday November 17, submit via Gradescope). The purpose of this problem is study a simplified version of the cash flow of an insurance company, and, more specifically, to determine the probability that the company pays dividends in a particular quarter. At any point in time three things might happen: (i) a customer pays a premium, (ii) a claim is paid to a client, or (iii) the company pays dividends. The probabilities of these events are different as are the amounts of money involved. A further constraint on the payment of dividends is that they are paid only if the cash on hand exceeds a certain amount. You will build a continuous-time Markov chain model of the company's cash flow and use it to determine the probability that the company pays dividends in a particular quarter. This homework set is challenging and very important. If you do it correctly, you will become very proficient on continuous-time Markov chains. The complete solutions are available here.

Gaussian, Markov, and stationary random processes

Homework 9 - White Gaussian noise. (Due Wednesday December 1, submit via Gradescope). White Gaussian noise (WGN) is probably the most common stochastic model used in engineering applications. The idea is to model a random process X(t) for which individual values are normally distributed and values X(t1) and X(t2) for different times t1, t2 are independent. It is not difficult to believe that this is a very simple model. It simply represents the drawing of independent normal random variables at different times. In the first part of this problem you will develop a model of WGN. In the second part you will use WGN to model somewhat more complex systems, for instance the integral of WGN leading to a Brownian motion process. The goal is to observe that while WGN is very simple, it can be used to model complex stochastic systems. The complete solutions are available here.

Logistics

Homework sets are typically due by Mondays or Wednesdays as per the calendar above. I ask that you please make your best effort to meet the deadline. If you can't, for whatever reason, I have no problem granting homework extensions. If you can't make the deadline, don't give me an excuse. Just tell me you need a few more days and that's it. That said, do not abuse that prerogative, if you are late a couple of times in the term, that's OK. If you are late more than that you need to organize your time better.