

  • axioms for events
  • indicator random variables
  • independent and identically distributed rv
  • interlude 1
  • interlude 2
  • arrival processes
  • renewal process
  • Erlang density
  • Poisson PMF
  • Bernoulli processes
  • interlude 3
  • conditional arrival densities

  • axioms for events

    given a sample space Ω, the class of subsets of Ω that constitute the set of events satisfies the following axioms:

      • Ω is an event
      • if A is an event, then its complement Aᶜ is also an event
      • if A₁,A₂,... are events, then the countable union ∪ₙ Aₙ is also an event

    note that the axioms do not say that all subsets of Ω are events. in fact, there are many rather silly ways to define classes of events that obey the axioms. for example, the axioms are satisfied by choosing only the universal set Ω and the empty set to be events. we shall avoid such trivialities by assuming that for each sample point ω, the singleton subset {ω} is an event. for finite sample spaces, this assumption, plus the axioms above, imply that all subsets are events

    see sigma-algebra
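
    a quick sanity check (not from the original notes - the finite Ω and the helper name are made up): for a finite sample space you can verify the closure axioms directly by enumeration. the sketch below checks the trivial class {∅, Ω} and the full power set

      # sketch: check that a class of subsets of a finite Ω satisfies the event axioms
      from itertools import chain, combinations

      omega = frozenset({1, 2, 3})

      def is_event_class(events, omega):
          events = {frozenset(e) for e in events}
          if omega not in events:                                        # Ω must be an event
              return False
          if any(omega - e not in events for e in events):               # closed under complement
              return False
          if any(a | b not in events for a in events for b in events):   # closed under (finite) union
              return False
          return True

      trivial = [set(), omega]                                           # the 'silly' class: only ∅ and Ω
      power_set = [set(s) for s in chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))]

      print(is_event_class(trivial, omega), is_event_class(power_set, omega))   # True True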

    indicator random variables

    for every event you can think of - an event is something which occurs when some set of the sample points occurs and does not occur otherwise - you can define an indicator random variable. the indicator of an event A is the function on the sample space that is equal to 1 if ω is in the event A, and 0 otherwise

    so if you draw its cdf, the cdf of the indicator is 0 below 0. at 0 it jumps up to 1−p(A), and at 1 it jumps to 1

    so it's simply a binary random variable
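
    a small numpy sketch (the value of p(A) and the seed are arbitrary choices, not from the notes): simulate the indicator of an event with p(A)=0.3 and read off its empirical cdf, which is 0 below 0, about 1−p(A)=0.7 on [0,1), and 1 from 1 on

      # sketch: empirical cdf of an indicator rv for an event A with p(A) = 0.3
      import numpy as np

      rng = np.random.default_rng(0)
      p = 0.3                                      # p(A), chosen arbitrarily
      I = (rng.random(100_000) < p).astype(int)    # I_A = 1 when A occurs, 0 otherwise

      for x in [-0.5, 0.0, 0.5, 1.0, 1.5]:
          print(f"F_I({x:4.1f}) = {np.mean(I <= x):.3f}")
      # prints 0 below 0, about 0.7 at 0 and 0.5, and 1.0 at 1 and above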


    IID - independent and identically distributed rv

    let X₁,X₂,..,Xₙ be IID rv’s with mean X̄ and variance σ²
    let Sₙ=X₁+...+Xₙ
    then σSₙ² = n*σ²

    take the simplest random variable you can think of, which is a binary random variable - it's either 1 (with probability 1/4) or 0 (with probability 3/4)

    here are the distributions of Sₙ for three different values of n:

    the center of the distribution varies with n and the spread (σSₙ) varies with √n

    the sample average is Sₙ/n, which is a rv of mean X̄ and variance σ²/n

    the center of the distribution is X̄ and the spread decreases with 1/√n

    note that Sₙ−n*X̄ is a zero mean rv with variance n*σ². thus (Sₙ−n*X̄)/(σ*√n) is zero mean, unit variance

    the variance of Sₙ is n*σ²: when you take the expected value of (Sₙ−n*X̄) squared, all the cross terms average to 0 because the rv's are independent, and what's left grows linearly with n. the mean also grows with n, but the standard deviation, which gives you a picture of how wide the distribution is, only goes up as √n. this is really the essence of the weak law of large numbers. then if you go on beyond this and talk about the sample average, namely the sum of these n random variables divided by n - assume them IID again, and in fact assume for this picture that they're the same binary random variables - you look at the sample average: its mean is the mean of a single random variable, and because of the division by n and the squaring, its variance is the variance of the sum divided by n². so what happens as n gets large? this variance goes to 0

    approximation of the cdf of the sample average Sₙ/n by a step function at the mean

    what happens when you have a sequence of random variables, all of which have the same mean and whose standard deviation is going to 0? essentially what's going on here is the nice feature that when you add all these things up, the cdf gets scrunched down into a unit step. in other words, since the standard deviation is going to 0, the sequence of random variables - since they all have the same mean - they all have smaller and smaller standard deviations. the only way you can do that is to scrunch them down into a limiting random variable, which is deterministic
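
    a quick numpy sketch of this scrunching (the trial counts and seed are arbitrary): for the binary rv above (1 with probability 1/4), the mean of Sₙ/n stays at 1/4 while its standard deviation falls like σ/√n

      # sketch: the sample average S_n/n of IID binary rv's (1 w.p. 1/4) concentrates at 1/4
      import numpy as np

      rng = np.random.default_rng(1)
      p = 0.25
      sigma = np.sqrt(p * (1 - p))                          # std of a single binary rv
      for n in [4, 16, 64, 256, 1024]:
          avg = (rng.random((20_000, n)) < p).mean(axis=1)  # 20000 samples of S_n / n
          print(f"n={n:5d}  mean={avg.mean():.4f}  std={avg.std():.4f}  sigma/sqrt(n)={sigma / np.sqrt(n):.4f}")
      # the std goes to 0, so the cdf of S_n/n approaches a unit step at 0.25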


    interlude 1

    when you think of a random experiment, when you think of playing dice with somebody or playing cards with someone, you are - from the very beginning when you started to talk about "odds" or anything of that sort, - you have always had the idea that this is a game which you can play repeatedly. and each time you play it, it's the same game but the outcome is different but all the probabilities are exactly the same. what this says is you can always make a probability model that way. you can always make a probability model which corresponds to what you've always believed deep in your hearts all your lives. and fortunately, that's true. otherwise, you wouldn't use probability

    the relationship is between the real world and these models that we're dealing with. you as engineers or business people or financial analysts or whatever the heck you're going to become will start believing in your probability models. and you will cause untold damage by losing track of the fact that these are supposedly models of something. and you better think of what they're supposed to be models of

    you want to model things so that each time, each trial of your experiment, you do the same thing but get a potentially different answer. sometimes you rig things without trying to do so in such a way that these experiments are not independent of each other and in fact are very, very heavily biased. you find people taking "risk models" in the financial world where they take all sorts of these things. and they say, "Oh, all right, these things are all independent of each other!". they look independent. and then suddenly a scare comes along and everybody sells simultaneously. and you find out that all these random variables were not independent at all - they were very closely related to each other but in a way you never saw before

    comparing models for similar situations and analyzing limited and effective models helps a lot in clarifying fuzziness. but ultimately, as in all of science, some experimentation is needed. you don't want to do too much of it but the important thing is that the outcome of an experiment is a "sample point". it's not a "probability". you do an experiment. you get an outcome. and all you find is one "sample point", if you do the experiment once. and there's nothing that lets you draw a "probability" from that. the only way you can get things that you would call "probabilities" is to use an extended model, hope the extended model corresponds to the physical situation, and deal with these law of large numbers kind of things. you don't necessarily need IID random variables. but you need something that you know about between a large number of random variables to get from an outcome to something you could reasonably call a "probability"

    models can have two types of difficulties

    example: loaded coin

    you model a coin as coming up Heads with probability 1/2. but somebody has put in a loaded coin, and the probability that it comes up Heads is actually 0.45. and you might guess that this person always bets on Tails and tries to get you to bet on Heads. so in that case, the basic model that you're using is not OK

    example: roll a white die and a red die

    if you roll two dice with different colors then there are 36 sample outcomes, (i,j), 1≤i,j≤6, taken as equiprobable by symmetry

    and now you start to roll two white dice. the outcomes (i,j) and (j,i) for i≠j are now indistinguishable. there are now 21 ‘finest grain’ outcomes

    but no sane person would use these as sample points

    the appropriate sample space is still the ‘white/red’ sample space with an ‘off-line’ recognition of what is distinguishable. neither the axioms nor experimentation motivate this model, i.e., modeling requires judgement and common sense
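
    a short enumeration sketch (the choice of the event "the sum is 7" is just for illustration): the 36-point white/red model gives the right answer by symmetry, while treating the 21 indistinguishable outcomes as equiprobable does not

      # sketch: 36 ordered outcomes vs 21 'finest grain' unordered outcomes for two dice
      from fractions import Fraction
      from itertools import product

      ordered = list(product(range(1, 7), repeat=2))        # 36 equiprobable (white, red) outcomes
      unordered = {tuple(sorted(o)) for o in ordered}        # 21 outcomes when the dice are indistinguishable

      p7_ordered = Fraction(sum(i + j == 7 for i, j in ordered), len(ordered))
      p7_unordered = Fraction(sum(i + j == 7 for i, j in unordered), len(unordered))

      print(len(ordered), len(unordered))    # 36 21
      print(p7_ordered)                      # 1/6 - the correct probability of rolling a 7
      print(p7_unordered)                    # 1/7 - what the wrongly equiprobable 21-point model would give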


    interlude 2

    much of our intuitive understanding of probability comes from the notion of repeating the same experiment many times (i.e., performing multiple trials of the same experiment). however, the axioms of probability contain no explicit recognition of such repetitions

    the appropriate way to handle n repetitions of an experiment is through an extended experiment whose sample points are n-tuples of sample points from the original experiment. such an extended experiment is viewed as n trials of the original experiment

    the notion of multiple trials of a given experiment is so common that one sometimes fails to distinguish between the original experiment and an extended experiment with multiple trials of the original experiment

    the simplest and most natural way of creating a probability model for this extended sample space and class of events is through the assumption that the n-trials are statistically independent. we refer to this as the probability model for n independent identically distributed (IID) trials of a given experiment

    the outcome of a probabilistic experiment often specifies a collection of numerical values such as temperatures, voltages, numbers of arrivals or departures in various time intervals, etc. each such numerical value varies, depending on the particular outcome of the experiment, and thus can be viewed as a mapping from the set Ω of sample points to the set ℝ of real numbers (note that ℝ does not include ±∞). these mappings from sample points to real numbers are called "random variables"

    as with any function, there is often confusion between the function itself, which is called X, and the value X(ω) the function takes on for a sample point ω. this is particularly prevalent with random variables since we intuitively associate a rv with its sample value when an experiment is performed. we try to control that confusion here by using X, X(ω), and x, respectively, to refer to the rv, the sample value taken for a given sample point ω, and a generic sample value

    the concept of a rv is often extended to complex random variables and vector random variables. a complex random variable is a mapping from the sample space Ω to the set of finite complex numbers, and a vector random variable is a mapping from the sample space Ω to the finite vectors in some finite dimensional vector space

    if X has only a finite or countable number of possible sample values, the probability Pr{X=xi} of each sample value xi is called the "probability mass function" (PMF) at xi; such a random variable is called "discrete". the cdf of a discrete rv is a "staircase function" staying constant between the possible sample values and having a jump of magnitude p(xi) at each sample value xi

    if the cdf of a rv has a (finite) derivative at x, the derivative is called the "probability density" of X at x and denoted by f(x); for sufficiently small δ, f(x) then approximates the probability that X is mapped to a value between x and x+δ. if the density exists for all x, the rv is said to be "continuous". more generally, if there is a function f(x) such that, for each x∈ℝ, the cdf satisfies

                y
         F(y) = ∫ f(x) dx
               -∞
    then the rv is said to be "continuous" and f is the "probability density". this generalization allows the density to be discontinuous. in other words, a continuous rv requires a little more than a continuous cdf and a little less than a continuous density
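
    a numerical sketch of this definition (the density, rate and step size are arbitrary choices): integrate the exponential density used later in these notes and compare with the closed-form cdf

      # sketch: recover the cdf from the density, F(y) = integral of f(x) dx from -inf to y
      import numpy as np

      lam, dx = 1.5, 1e-5                              # arbitrary rate and integration step
      for y in [0.2, 0.5, 1.0, 2.0]:
          xs = np.arange(0.5 * dx, y, dx)              # midpoint rule on (0, y); f is 0 below 0
          F_num = np.sum(lam * np.exp(-lam * xs)) * dx
          print(f"y={y:.1f}  integral={F_num:.5f}  1-exp(-lam*y)={1 - np.exp(-lam * y):.5f}")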

    so for any model, an extended model for a sequence or an n-tuple of IID repetitions is well-defined. relative frequencies and sample averages (in the extended model) "become deterministic" and can be compared with real-world relative frequencies and sample averages in the repeated experiment


    arrival processes

    the differences Xᵢ = Sᵢ − Sᵢ₋₁ are called “interarrival times” and the Sᵢ are called “arrival epochs”

    for each t>0, N(t) is the number of arrivals in (0,t]. we call {N(t):t>0} an “arrival counting process”

    we need to relate the “counting process” {N(t);t>0} to the “arrival process” :

        {Sn ≤ t} = {N(t) ≥ n} for all n ≥ 1 , t > 0

    if Sn = τ for some τ≤t, then N(τ) = n and N(t) ≥ n
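
    a small simulation sketch of this identity (rate, t, n and the path count are arbitrary): on every simulated path, the nth arrival epoch is at or before t exactly when the count N(t) is at least n

      # sketch: check {S_n <= t} = {N(t) >= n} on simulated paths with exponential interarrivals
      import numpy as np

      rng = np.random.default_rng(2)
      lam, t, n = 2.0, 3.0, 5
      for _ in range(1000):
          X = rng.exponential(1.0 / lam, size=50)      # interarrival times (50 is far more than needed for t=3)
          S = np.cumsum(X)                             # arrival epochs S_1, S_2, ...
          N_t = np.sum(S <= t)                         # N(t), the number of arrivals in (0, t]
          assert (S[n - 1] <= t) == (N_t >= n)
      print("the identity held on every path")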

    although we refer to these processes as arrival processes, they could equally well model departures from a system, or any other sequence of incidents. although it is quite common, especially in the simulation field, to refer to incidents or arrivals as events, we shall avoid that here. the nth arrival epoch Sn is a rv and {Sn<t}, for example, is an event. this would make it confusing to refer to the nth arrival itself as an event

    Def : a “renewal process” is an arrival process for which the interarrival intervals X1,X2,.. are IID

    Def : a “Poisson process” is a “renewal process” for which each Xi has an exponential distribution

          F(x) = 1 - exp (−λ*x) , x ≥ 0
    where λ is a fixed parameter called the “rate”

    Def : a rv X is “memoryless” if X is positive and, for all real positive t and x

          p(X > t + x) = p(X > t) * p(X > x)

    the interarrival interval for a Poisson process is exponential, i.e.,

        p(X > x) = exp(−λ*x) , x > 0

    Th : a rv X is “memoryless” if and only if it is exponential

    Th : for a Poisson process of rate λ, and any given t>0, the interval Z from t to the next arrival after t has distribution F(z)=exp(−λ*z) for all z>0. the rv Z is independent of N(t), and, given N(t)=n, Z is independent of S1,..,Sn and of {N(τ):0<τ<t}
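
    a quick numpy check of memorylessness (λ, t, x and the sample size are arbitrary): the conditional tail p(X > t+x | X > t) matches the unconditional tail p(X > x) = exp(−λ*x)

      # sketch: memorylessness of the exponential, P(X > t+x | X > t) = P(X > x)
      import numpy as np

      rng = np.random.default_rng(3)
      lam, t, x = 1.5, 0.7, 1.2
      X = rng.exponential(1.0 / lam, size=1_000_000)

      cond = np.mean(X[X > t] > t + x)       # empirical P(X > t+x | X > t)
      uncond = np.mean(X > x)                # empirical P(X > x)
      print(cond, uncond, np.exp(-lam * x))  # all three should agree to a few decimal places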

    example

    suppose X is the waiting time, starting at time 0, for a bus to arrive, and suppose X is memoryless. after waiting from 0 to t, the distribution of the remaining waiting time from t is the same as the original distribution starting from 0. the still waiting customer is, in a sense, no better at time t than at time 0. on the other hand, if the bus is known to arrive regularly every 16 minutes, then after a 15 minute wait it will certainly arrive within a minute, and X is not memoryless. the opposite situation is also possible. if the bus frequently breaks down, then a 15 minute wait can indicate that the remaining wait is probably very long, so again X is not memoryless

    Def : a counting process {N(t):t>0} has the "stationary increment" property if N(t')−N(t) has the same distribution as N(t'−t) for all t'>t>0

    let N(t,t') = N(t')−N(t), i.e., N(t,t') is the number of arrivals in the increment (t,t']. thus "stationary increments" means that N(t,t') has the same distribution as N(t'−t)

    Def : a counting process {N(t):t>0} has the "independent increment" property if, for every n and every 0<t₁<t₂<...<tₙ, the rv’s N(t₁),N(t₁,t₂),...,N(tₙ₋₁,tₙ) are independent

    Th : Poisson processes have stationary and independent increments


    renewal process

    consider the counting process {N(t')−N(t)} defined for t' greater than t. for fixed t, this is something we can view, as a function of t', as a counting process in its own right. it's a Poisson process shifted to start at time t, ie, for each t', N(t')−N(t) has the same distribution as N(t'−t). same for joint distributions

    so it's defined in exactly the same way as the original random process is. so it's statistically the same process. which says two things about it: everything is the same. and everything is independent

    we will call that "stationary". everything is the same. everything is independent. and then we'll try to sort out how things can be the same but also be independent

    oh, we already know that. we have two IID random variables, X1 and X2. they're IID. they're independent and identically distributed. "identically distributed" means that in one sense, they are the same. but they're also independent of each other

    so the random variables are defined in the same way. and in that sense, they're stationary. but they're independent of each other by the definition of independence. so our new process is independent of the old process in the interval 0 up to t


    Erlang density

    for a Poisson process of rate λ, the PDF of arrival epoch S2 can be found by convolving the density of X1 and X2. thus

                   t
          fS2(t) = ∫ [λ*exp(−λ*x)] * [λ*exp(−λ*(t−x))] dx =  λ²*t*exp(−λ*t) 
                   0 
    and using iteration, convolving fSₙ₋₁(t) with λ*exp(−λ*t) for each n,
            fSₙ(t)  =  λⁿ * tⁿ⁻¹ * exp(−λ*t) / (n−1)! 
    this is called the Erlang density

    this kind of form, with an exp(−λ*t) and with a t, or t², and so forth, is a particularly easy form to integrate. so we just do this again and again. and when we do it again and again, we find that for the density of the sum of n of these random variables, you keep picking up an extra λ every time you convolve in another exponential random variable, and you pick up an extra factor of t as well. and the (n−1)! appears down here because you keep integrating powers of t. so when you integrate this, this is what you get
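
    a simulation sketch of this result (rate, n, the bin grid and sample size are arbitrary): a histogram of Sₙ = X₁+...+Xₙ for IID exponentials sits on top of the Erlang density λⁿ*tⁿ⁻¹*exp(−λ*t)/(n−1)!

      # sketch: the density of S_n, a sum of n IID exponentials, is the Erlang density
      import numpy as np
      from math import factorial

      rng = np.random.default_rng(4)
      lam, n = 2.0, 4
      S = rng.exponential(1.0 / lam, size=(500_000, n)).sum(axis=1)   # samples of S_n

      def erlang(t):
          return lam**n * t**(n - 1) * np.exp(-lam * t) / factorial(n - 1)

      edges = np.linspace(0.0, 6.0, 25)
      hist, _ = np.histogram(S, bins=edges, density=True)
      centers = 0.5 * (edges[:-1] + edges[1:])
      for c, h in list(zip(centers, hist))[:6]:
          print(f"t={c:.2f}  simulated={h:.3f}  erlang={erlang(c):.3f}")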

    the joint density of X₁,..,Xₙ is

          fX₁..Xₙ(x₁,..,xₙ) = λⁿ * exp(−λ*x₁−λ*x₂−...−λ*xₙ) = λⁿ * exp(−λ*sₙ) 
    where sₙ = Σ xᵢ. since the map from the interarrival times to the arrival epochs has unit Jacobian, the joint density of S₁,..,Sₙ is

        fS₁..Sₙ(s₁,..,sₙ) = λⁿ * exp(−λ*sₙ)  ,  0 < s₁ < s₂ < ... < sₙ 

    this says that, for given sₙ, the joint density is constant over s₁,..,sₙ₋₁ in the region 0<s₁<s₂<...<sₙ. given that the nth arrival is at sₙ, the other n−1 arrivals are uniformly distributed in (0,sₙ), subject to the ordering. integrating (or looking ahead), we get the Erlang marginal density


    Poisson PMF

    the number of arrivals N(t) in (0,t] for a Poisson process of rate λ has the Poisson PMF

          pN(t)(n) = (λ*t)ⁿ * exp(−λ*t) / n! ,  n ≥ 0

    note that the Poisson PMF is a function of λ*t and not of λ or t separately. this is the probability of n arrivals in an interval of size t with rate λ. and if we measure length in a different system of units, λ will change accordingly, so the Poisson PMF has to be a function of λ*t only

    Th : if an arrival process has the stationary and independent increment properties and if N(t) has the Poisson PMF for given λ and all t>0, then the process is Poisson
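
    a simulation sketch tying the renewal definition to the PMF above (rate, t and the trial count are arbitrary): count arrivals of a process with exponential interarrivals up to time t and compare the empirical PMF of N(t) with (λ*t)ⁿ*exp(−λ*t)/n!

      # sketch: empirical PMF of N(t) for exponential interarrivals vs the Poisson PMF
      import numpy as np
      from math import exp, factorial

      rng = np.random.default_rng(5)
      lam, t, trials = 1.5, 2.0, 200_000
      X = rng.exponential(1.0 / lam, size=(trials, 40))   # 40 interarrivals is far more than needed for t=2
      S = np.cumsum(X, axis=1)                            # arrival epochs on each trial
      N_t = np.sum(S <= t, axis=1)                        # N(t) on each trial

      for n in range(6):
          emp = np.mean(N_t == n)
          pmf = (lam * t)**n * exp(-lam * t) / factorial(n)
          print(f"n={n}  empirical={emp:.4f}  poisson={pmf:.4f}")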

    Bernoulli processes

    Bernoulli processes are often viewed as the discrete time version of Poisson processes. there is a confusing feature here that must be cleared up first

    the binary rv’s Y1,Y2,.. of the Bernoulli process have no direct analogs in the Poisson process. when we view a Bernoulli process as an arrival process, an arrival occurs at discrete time n if and only if Yn=1. thus Sn=Y1+···+Yn is the number of arrivals up to and including time n. thus {Sn;n≥1} is analogous to the Poisson counting process {N(t);t>0}

    the interarrival intervals, X1,X2,... for a Bernoulli arrival process are the intervals between successive 1’s in the binary stream. thus X1=k if Yi=0 for 1≤i≤k−1 and Yk=1, so pX₁(k) = p*(1−p)ᵏ⁻¹ for all k≥1. subsequent interarrival intervals are IID with X1. thus, the interarrival intervals for the Bernoulli counting process are geometrically distributed
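
    a small sketch of the geometric interarrival claim (p, the horizon and trial count are arbitrary): simulate the binary stream Y₁,Y₂,... and compare the empirical PMF of X₁ with p*(1−p)ᵏ⁻¹

      # sketch: the first interarrival time of a Bernoulli process is geometric
      import numpy as np

      rng = np.random.default_rng(6)
      p, trials, horizon = 0.3, 200_000, 200
      Y = rng.random((trials, horizon)) < p    # Y_n = 1 with probability p, independently
      X1 = Y.argmax(axis=1) + 1                # time of the first 1 (a miss in 200 slots is essentially impossible)

      for k in range(1, 6):
          print(f"k={k}  empirical={np.mean(X1 == k):.4f}  p(1-p)^(k-1)={p * (1 - p)**(k - 1):.4f}")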


    interlude 3

    we started to look at any set of time, say t1, t2, up to tk, and then we looked at the increments. how many arrivals occurred between zero and t1? how many arrivals occurred between t1 and t2? how many between t2 and t3? so we looked at these Poisson process increments, a whole bunch of random variables, and we said, these are stationary and independent Poisson counting processes, over their given intervals

    in other words, you stop looking at the first process at time t1. then you look at the increment from time t1 to t2, then from t2 to t3, and so on, up to the increment from tk-1 to tk. you might expect one more, from tk on, but no - you're only looking at k of them. so these are the things that you're looking at. the statement is that these are independent random variables

    the other statement is that they're stationary, which means, if you look at the number of arrivals in an interval, here from t1 to t2, it's a function of t2 − t1, not of t1 and t2 separately. it's only a function of the length of the interval. the number of arrivals in any interval of length tk − tk-1 is a function only of the length of that interval and not of where the interval is. that's a reasonable way to look at stationarity


    conditional arrival densities

    a diverse range of problems involving Poisson processes are best tackled by conditioning on a given number n of arrivals in the interval (0,t], i.e., on the event N(t)=n

    because of the incremental view of the Poisson process as independent and stationary arrivals in each incremental interval of the time axis, we would guess that the arrivals should have some sort of uniform distribution given N(t)=n. more precisely, the joint density of S(n)=(S1,S2,..,Sn) given N(t)=n is uniform over the region 0<s1<s2<...<sn<t

    the main formulas are

        probability that the first arrival comes after τ : p(S₁ > τ | N(t)=n) = [(t−τ) / t]ⁿ

        joint density of the arrival epochs : f(s₁,..,sₙ | N(t)=n) = n! / tⁿ  ,  0<s₁<...<sₙ<t

        density of the last arrival epoch : f(sₙ | N(t)=n) = n * sₙⁿ⁻¹ / tⁿ  ,  0<sₙ<t

    consider N(t) = 1. if you look at the small increment definition of a Poisson process, it says that arrivals are equally likely at any point along here

    if you condition on the fact that there's been one arrival in this whole interval, and they're equally likely to occur anywhere along here, the only sensible thing that can happen is that the probability density for where that arrival happens is uniform over this whole interval, so it is 1/t

    the important point is that this does not depend on s1, i.e., it is uniform over (0,t]

    next consider N(t) = 2

    given that there are two arrivals in (0,t], it doesn't make any difference where they are: the joint density of S1,S2 is 2/t² over 0<s1<s2<t. this is, in some sense, uniform over s1 and s2. and you have to be more careful about what "uniform" means here

    you can do the same thing for N(t)=n for arbitrary n

    note that the exponents in the exponentials always sum to λ*t, so the exp(−λ*t) in the joint density cancels with the exp(−λ*t) in the Poisson PMF you divide by. and what you come up with is that this joint density is equal to n!/tⁿ. so again, it's the same story. it's "uniform"

    it doesn't depend on where the s's are, except for the fact that s1<s2<s3<... we have to sort that out. it has this peculiar n! here that doesn't depend on λ. it does depend on n in this way

    that's a uniform n-dimensional probability density over the volume tⁿ/n! corresponding to the constraint region 0<s1<...<sn<t. that region is a little peculiar. this n! is a little peculiar

    the tⁿ is pretty obvious. because when you have a joint density over n things, each bounded between zero and t, you expect to see a tⁿ there

    you have a uniform probability density over some strange volume. and it's a strange volume which satisfies this constraint

    to see why the n! appears in this uniform density, let U1,..,Un be n IID rv’s, each uniform over (0,t]. let S1,..,Sn be the order statistics of the Ui. the region of volume tⁿ where the joint density of the Ui is nonzero partitions into n! regions, one for each ordering, and one of which is the region where s1<s2<...<sn. by symmetry, each of these regions has the same volume, and thus each is tⁿ/n!

    return to N(t) = 2 and look at f(s1,s2 | N(t)=2)

    the area in the s1,s2 space where 0<s1<s2<t is t^2/2, and that explains why the density is 2/t^2
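
    a simulation sketch of this conditioning (rate, t, n and the trial count are arbitrary): keep only the Poisson paths with exactly n arrivals in (0,t] and compare the conditional arrival epochs with the order statistics of n IID uniforms on (0,t]

      # sketch: given N(t) = n, the epochs S_1 < ... < S_n behave like sorted uniforms on (0, t]
      import numpy as np

      rng = np.random.default_rng(7)
      lam, t, n, trials = 1.0, 5.0, 3, 400_000
      X = rng.exponential(1.0 / lam, size=(trials, 30))
      S = np.cumsum(X, axis=1)
      keep = np.sum(S <= t, axis=1) == n                    # condition on the event N(t) = n
      S_cond = S[keep][:, :n]                               # (S_1, .., S_n) on those paths

      U = np.sort(rng.uniform(0.0, t, size=(S_cond.shape[0], n)), axis=1)   # order statistics of uniforms

      print("conditional means of S_1..S_n :", S_cond.mean(axis=0))   # both close to t/4, 2t/4, 3t/4
      print("order statistic means         :", U.mean(axis=0))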

    since X₁ = S₁, X₁ has density and expected value :

          f(x | N(t)=2) = 2*(t−x) / t²  ,   E[X₁ | N(t)=2] = t/3

    this extends to N(t) = n for arbitrary n :

          f(x | N(t)=n) = n*(t−x)ⁿ⁻¹ / tⁿ  ,   E[X₁ | N(t)=n] = t/(n+1)

    paradox

    the mean interarrival time for a Poisson process is 1/λ. but the mean time from any given t to the next arrival is also 1/λ and the mean time back to the previous arrival is (1/λ)*(1−exp(−λ*t)). thus the mean length of the interval containing t is (1/λ)*(2−exp(−λ*t))
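
    a simulation sketch of the paradox (rate, t and the trial count are arbitrary): average the length of the interarrival interval that contains a fixed t and compare with (1/λ)*(2−exp(−λ*t)), which is larger than the mean interarrival time 1/λ

      # sketch: the interval containing a fixed t is longer than 1/lam on average
      import numpy as np

      rng = np.random.default_rng(8)
      lam, t, trials = 1.0, 4.0, 300_000
      X = rng.exponential(1.0 / lam, size=(trials, 40))     # 40 interarrivals is far more than needed for t=4
      S = np.cumsum(X, axis=1)

      N_t = np.sum(S <= t, axis=1)                          # arrivals in (0, t]
      idx = np.arange(trials)
      nxt = S[idx, N_t]                                     # S_{N(t)+1}, the first arrival after t
      prev = np.where(N_t > 0, S[idx, np.maximum(N_t - 1, 0)], 0.0)   # S_{N(t)}, or 0 if no arrival yet

      print("simulated mean length    :", (nxt - prev).mean())
      print("(1/lam)*(2-exp(-lam*t))  :", (2.0 - np.exp(-lam * t)) / lam)
      print("mean interarrival time   :", 1.0 / lam)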

