

  • axioms for events
  • indicator random variables
  • independent and identically distributed rv
  • interlude 1
  • interlude 2
  • arrival processes
  • renewal process
  • Erlang density
  • Poisson PMF
  • Bernoulli processes
  • interlude 3
  • conditional arrival densities

  • axioms for events

    given a sample space Ω, the class of subsets of Ω that constitute the set of events satisfies the following axioms:

      • Ω is an event
      • if A is an event, then its complement Aᶜ is also an event
      • if A₁,A₂,... are events, then the countable union ∪ₙ Aₙ is also an event

    note that the axioms do not say that all subsets of Ω are events. in fact, there are many rather silly ways to define classes of events that obey the axioms. for example, the axioms are satisfied by choosing only the universal set Ω and the empty set to be events. we shall avoid such trivialities by assuming that for each sample point ω, the singleton subset {ω} is an event. for finite sample spaces, this assumption, plus the axioms above, imply that all subsets are events

    see sigma-algebra
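
    a quick sanity check (not from the original notes - the finite Ω and the helper name are made up): for a finite sample space you can verify the closure axioms directly by enumeration. the sketch below checks the trivial class {∅, Ω} and the full power set

      # sketch: check that a class of subsets of a finite Ω satisfies the event axioms
      from itertools import chain, combinations

      omega = frozenset({1, 2, 3})

      def is_event_class(events, omega):
          events = {frozenset(e) for e in events}
          if omega not in events:                                        # Ω must be an event
              return False
          if any(omega - e not in events for e in events):               # closed under complement
              return False
          if any(a | b not in events for a in events for b in events):   # closed under (finite) union
              return False
          return True

      trivial = [set(), omega]                                           # the 'silly' class: only ∅ and Ω
      power_set = [set(s) for s in chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))]

      print(is_event_class(trivial, omega), is_event_class(power_set, omega))   # True True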

    indicator random variables

    for every event you can think of - an event is something which occurs when some set of the sample points occurs and does not occur otherwise - you can define an indicator random variable. the indicator of an event A is the function on the sample space that is equal to 1 if ω is in the event A, and 0 otherwise

    so if you draw its cdf, the cdf of the indicator is 0 below 0. at 0 it jumps up to 1−p(A), and at 1 it jumps to 1

    so it's simply a binary random variable
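
    a small numpy sketch (the value of p(A) and the seed are arbitrary choices, not from the notes): simulate the indicator of an event with p(A)=0.3 and read off its empirical cdf, which is 0 below 0, about 1−p(A)=0.7 on [0,1), and 1 from 1 on

      # sketch: empirical cdf of an indicator rv for an event A with p(A) = 0.3
      import numpy as np

      rng = np.random.default_rng(0)
      p = 0.3                                      # p(A), chosen arbitrarily
      I = (rng.random(100_000) < p).astype(int)    # I_A = 1 when A occurs, 0 otherwise

      for x in [-0.5, 0.0, 0.5, 1.0, 1.5]:
          print(f"F_I({x:4.1f}) = {np.mean(I <= x):.3f}")
      # prints 0 below 0, about 0.7 at 0 and 0.5, and 1.0 at 1 and above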


    IID - independent and identically distributed rv

    let X₁,X₂,..,Xₙ be IID rv’s with mean X̄ and variance σ²
    let Sₙ=X₁+...+Xₙ
    then σSₙ² = n*σ²

    take the simplest random variable you can think of, which is a binary random variable - it's either 1 (with probability 1/4) or 0 (with probability 3/4)

    here are the distributions of Sₙ for three different values of n:

    the center of the distribution varies with n and the spread (σSₙ) varies with √n

    the sample average is Sₙ/n, which is a rv of mean X̄ and variance σ²/n

    the center of the distribution is X̄ and the spread decreases with 1/√n

    note that Sₙ−n*X̄ is a zero mean rv with variance n*σ². thus (Sₙ−n*X̄)/(σ*√n) is zero mean, unit variance

    the variance of Sₙ is n*σ²: when you take the expected value of (Sₙ−n*X̄) squared, all the cross terms average to 0 because the rv's are independent, and what's left grows linearly with n. the mean also grows with n, but the standard deviation, which gives you a picture of how wide the distribution is, only goes up as √n. this is really the essence of the weak law of large numbers. then if you go on beyond this and talk about the sample average, namely the sum of these n random variables divided by n - assume them IID again, and in fact assume for this picture that they're the same binary random variables - you look at the sample average: its mean is the mean of a single random variable, and because of the division by n and the squaring, its variance is the variance of the sum divided by n². so what happens as n gets large? this variance goes to 0

    approximation of the cdf of the sample average Sₙ/n by a step function at the mean

    what happens when you have a sequence of random variables, all of which have the same mean and whose standard deviation is going to 0? essentially what's going on here is the nice feature that when you add all these things up, the cdf gets scrunched down into a unit step. in other words, since the standard deviation is going to 0, the sequence of random variables - since they all have the same mean - they all have smaller and smaller standard deviations. the only way you can do that is to scrunch them down into a limiting random variable, which is deterministic
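
    a quick numpy sketch of this scrunching (the trial counts and seed are arbitrary): for the binary rv above (1 with probability 1/4), the mean of Sₙ/n stays at 1/4 while its standard deviation falls like σ/√n

      # sketch: the sample average S_n/n of IID binary rv's (1 w.p. 1/4) concentrates at 1/4
      import numpy as np

      rng = np.random.default_rng(1)
      p = 0.25
      sigma = np.sqrt(p * (1 - p))                          # std of a single binary rv
      for n in [4, 16, 64, 256, 1024]:
          avg = (rng.random((20_000, n)) < p).mean(axis=1)  # 20000 samples of S_n / n
          print(f"n={n:5d}  mean={avg.mean():.4f}  std={avg.std():.4f}  sigma/sqrt(n)={sigma / np.sqrt(n):.4f}")
      # the std goes to 0, so the cdf of S_n/n approaches a unit step at 0.25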


    interlude 1

    when you think of a random experiment, when you think of playing dice with somebody or playing cards with someone, you are - from the very beginning when you started to talk about "odds" or anything of that sort, - you have always had the idea that this is a game which you can play repeatedly. and each time you play it, it's the same game but the outcome is different but all the probabilities are exactly the same. what this says is you can always make a probability model that way. you can always make a probability model which corresponds to what you've always believed deep in your hearts all your lives. and fortunately, that's true. otherwise, you wouldn't use probability

    the relationship is between the real world and these models that we're dealing with. you as engineers or business people or financial analysts or whatever the heck you're going to become will start believing in your probability models. and you will cause untold damage by losing track of the fact that these are supposedly models of something. and you better think of what they're supposed to be models of

    you want to model things so that each time, each trial of your experiment, you do the same thing but get a potentially different answer. sometimes you rig things without trying to do so in such a way that these experiments are not independent of each other and in fact are very, very heavily biased. you find people taking "risk models" in the financial world where they take all sorts of these things. and they say, "Oh, all right, these things are all independent of each other!". they look independent. and then suddenly a scare comes along and everybody sells simultaneously. and you find out that all these random variables were not independent at all - they were very closely related to each other but in a way you never saw before

    comparing models for similar situations and analyzing limited and effective models helps a lot in clarifying fuzziness. but ultimately, as in all of science, some experimentation is needed. you don't want to do too much of it but the important thing is that the outcome of an experiment is a "sample point". it's not a "probability". you do an experiment. you get an outcome. and all you find is one "sample point", if you do the experiment once. and there's nothing that lets you draw a "probability" from that. the only way you can get things that you would call "probabilities" is to use an extended model, hope the extended model corresponds to the physical situation, and deal with these law of large numbers kind of things. you don't necessarily need IID random variables. but you need something that you know about between a large number of random variables to get from an outcome to something you could reasonably call a "probability"

    models can have two types of difficulties

    example: loaded coin

    you model a coin as coming up Heads with probability 1/2. but somebody has put in a loaded coin, and the probability that it comes up Heads is actually 0.45. and you might guess that this person always bets on Tails and tries to get you to bet on Heads. so in that case, the basic model that you're using is not OK

    example: roll a white die and a red die

    if you roll two dice with different colors then there are 36 sample outcomes, (i,j), 1≤i,j≤6, taken as equiprobable by symmetry

    and now you start to roll two white dice. the outcomes (i,j) and (j,i) for i≠j are now indistinguishable. there are now 21 ‘finest grain’ outcomes

    but no sane person would use these as sample points

    the appropriate sample space is still the ‘white/red’ sample space with an ‘off-line’ recognition of what is distinguishable. neither the axioms nor experimentation motivate this model, i.e., modeling requires judgement and common sense
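
    a short enumeration sketch (the choice of the event "the sum is 7" is just for illustration): the 36-point white/red model gives the right answer by symmetry, while treating the 21 indistinguishable outcomes as equiprobable does not

      # sketch: 36 ordered outcomes vs 21 'finest grain' unordered outcomes for two dice
      from fractions import Fraction
      from itertools import product

      ordered = list(product(range(1, 7), repeat=2))        # 36 equiprobable (white, red) outcomes
      unordered = {tuple(sorted(o)) for o in ordered}        # 21 outcomes when the dice are indistinguishable

      p7_ordered = Fraction(sum(i + j == 7 for i, j in ordered), len(ordered))
      p7_unordered = Fraction(sum(i + j == 7 for i, j in unordered), len(unordered))

      print(len(ordered), len(unordered))    # 36 21
      print(p7_ordered)                      # 1/6 - the correct probability of rolling a 7
      print(p7_unordered)                    # 1/7 - what the wrongly equiprobable 21-point model would give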


    interlude 2

    much of our intuitive understanding of probability comes from the notion of repeating the same experiment many times (i.e., performing multiple trials of the same experiment). however, the axioms of probability contain no explicit recognition of such repetitions

    the appropriate way to handle n repetitions of an experiment is through an extended experiment whose sample points are n-tuples of sample points from the original experiment. such an extended experiment is viewed as n trials of the original experiment

    the notion of multiple trials of a given experiment is so common that one sometimes fails to distinguish between the original experiment and an extended experiment with multiple trials of the original experiment

    the simplest and most natural way of creating a probability model for this extended sample space and class of events is through the assumption that the n-trials are statistically independent. we refer to this as the probability model for n independent identically distributed (IID) trials of a given experiment

    the outcome of a probabilistic experiment often specifies a collection of numerical values such as temperatures, voltages, numbers of arrivals or departures in various time intervals, etc. each such numerical value varies, depending on the particular outcome of the experiment, and thus can be viewed as a mapping from the set Ω of sample points to the set ℝ of real numbers (note that ℝ does not include ±∞). these mappings from sample points to real numbers are called "random variables"

    as with any function, there is often confusion between the function itself, which is called X, and the value X(ω) the function takes on for a sample point ω. this is particularly prevalent with random variables since we intuitively associate a rv with its sample value when an experiment is performed. we try to control that confusion here by using X, X(ω), and x, respectively, to refer to the rv, the sample value taken for a given sample point ω, and a generic sample value

    the concept of a rv is often extended to complex random variables and vector random variables. a complex random variable is a mapping from the sample space Ω to the set of finite complex numbers, and a vector random variable is a mapping from the sample space Ω to the finite vectors in some finite dimensional vector space

    if X has only a finite or countable number of possible sample values, the probability Pr{X=xi} of each sample value xi is called the "probability mass function" (PMF) at xi; such a random variable is called "discrete". the cdf of a discrete rv is a "staircase function" staying constant between the possible sample values and having a jump of magnitude p(xi) at each sample value xi

    if the cdf of a rv has a (finite) derivative at x, the derivative is called the "probability density" of X at x and denoted by f(x); for sufficiently small δ, f(x) then approximates the probability that X is mapped to a value between x and x+δ. if the density exists for all x, the rv is said to be "continuous". more generally, if there is a function f(x) such that, for each x∈ℝ, the cdf satisfies

                y
         F(y) = ∫ f(x) dx
               -∞
    then the rv is said to be "continuous" and f is the "probability density". this generalization allows the density to be discontinuous. in other words, a continuous rv requires a little more than a continuous cdf and a little less than a continuous density
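
    a numerical sketch of this definition (the density, rate and step size are arbitrary choices): integrate the exponential density used later in these notes and compare with the closed-form cdf

      # sketch: recover the cdf from the density, F(y) = integral of f(x) dx from -inf to y
      import numpy as np

      lam, dx = 1.5, 1e-5                              # arbitrary rate and integration step
      for y in [0.2, 0.5, 1.0, 2.0]:
          xs = np.arange(0.5 * dx, y, dx)              # midpoint rule on (0, y); f is 0 below 0
          F_num = np.sum(lam * np.exp(-lam * xs)) * dx
          print(f"y={y:.1f}  integral={F_num:.5f}  1-exp(-lam*y)={1 - np.exp(-lam * y):.5f}")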

    so for any model, an extended model for a sequence or an n-tuple of IID repetitions is well-defined. relative frequencies and sample averages (in the extended model) "become deterministic" and can be compared with real-world relative frequencies and sample averages in the repeated experiment


    arrival processes

    the differences Xᵢ = Sᵢ − Sᵢ₋₁ are called “interarrival times” and the Sᵢ are called “arrival epochs”

    for each t>0, N(t) is the number of arrivals in (0,t]. we call {N(t):t>0} an “arrival counting process”

    we need to relate the “counting process” {N(t);t>0} to the “arrival process” :

        {Sn ≤ t} = {N(t) ≥ n} for all n ≥ 1 , t > 0

    if Sn = τ for some τ≤t, then N(τ) = n and N(t) ≥ n
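
    a small simulation sketch of this identity (rate, t, n and the path count are arbitrary): on every simulated path, the nth arrival epoch is at or before t exactly when the count N(t) is at least n

      # sketch: check {S_n <= t} = {N(t) >= n} on simulated paths with exponential interarrivals
      import numpy as np

      rng = np.random.default_rng(2)
      lam, t, n = 2.0, 3.0, 5
      for _ in range(1000):
          X = rng.exponential(1.0 / lam, size=50)      # interarrival times (50 is far more than needed for t=3)
          S = np.cumsum(X)                             # arrival epochs S_1, S_2, ...
          N_t = np.sum(S <= t)                         # N(t), the number of arrivals in (0, t]
          assert (S[n - 1] <= t) == (N_t >= n)
      print("the identity held on every path")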

    although we refer to these processes as arrival processes, they could equally well model departures from a system, or any other sequence of incidents. although it is quite common, especially in the simulation field, to refer to incidents or arrivals as events, we shall avoid that here. the nth arrival epoch Sn is a rv and {Sn<t}, for example, is an event. this would make it confusing to refer to the nth arrival itself as an event

    Def : a “renewal process” is an arrival process for which the interarrival intervals X1,X2,.. are IID

    Def : a “Poisson process” is a “renewal process” for which each Xi has an exponential distribution

          F(x) = 1 - exp (−λ*x) , x ≥ 0
    where λ is a fixed parameter called the “rate”

    Def : a rv X is “memoryless” if X is positive and, for all real positive t and x

          p(X > t + x) = p(X > t) * p(X > x)

    the interarrival interval for a Poisson process is exponential, i.e.,

        p(X > x) = exp(−λ*x) , x > 0

    Th : a rv X is “memoryless” if and only if it is exponential

    Th : for a Poisson process of rate λ, and any given t>0, the interval Z from t to the next arrival after t has distribution F(z)=exp(−λ*z) for all z>0. the rv Z is independent of N(t), and, given N(t)=n, Z is independent of S1,..,Sn and of {N(τ):0<τ<t}
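
    a quick numpy check of memorylessness (λ, t, x and the sample size are arbitrary): the conditional tail p(X > t+x | X > t) matches the unconditional tail p(X > x) = exp(−λ*x)

      # sketch: memorylessness of the exponential, P(X > t+x | X > t) = P(X > x)
      import numpy as np

      rng = np.random.default_rng(3)
      lam, t, x = 1.5, 0.7, 1.2
      X = rng.exponential(1.0 / lam, size=1_000_000)

      cond = np.mean(X[X > t] > t + x)       # empirical P(X > t+x | X > t)
      uncond = np.mean(X > x)                # empirical P(X > x)
      print(cond, uncond, np.exp(-lam * x))  # all three should agree to a few decimal places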

    example

    suppose X is the waiting time, starting at time 0, for a bus to arrive, and suppose X is memoryless. after waiting from 0 to t, the distribution of the remaining waiting time from t is the same as the original distribution starting from 0. the still waiting customer is, in a sense, no better at time t than at time 0. on the other hand, if the bus is known to arrive regularly every 16 minutes, then after a 15 minute wait it will certainly arrive within a minute, and X is not memoryless. the opposite situation is also possible. if the bus frequently breaks down, then a 15 minute wait can indicate that the remaining wait is probably very long, so again X is not memoryless

    Def : a counting process {N(t):t>0} has the "stationary increment" property if N(t')−N(t) has the same distribution as N(t'−t) for all t'>t>0

    let N(t,t') = N(t')−N(t), i.e., N(t,t') is the number of arrivals in the increment (t,t']. thus "stationary increments" means that N(t,t') has the same distribution as N(t'−t)

    Def : a counting process {N(t):t>0} has the "independent increment" property if, for every n and every 0<t₁<t₂<...<tₙ, the rv’s N(t₁),N(t₁,t₂),...,N(tₙ₋₁,tₙ) are independent

    Th : Poisson processes have stationary and independent increments


    renewal process

    consider the counting process {N(t')−N(t)} defined for t' greater than t. for fixed t, this is something we can view, as a function of t', as a counting process in its own right. it's a Poisson process shifted to start at time t, ie, for each t', N(t')−N(t) has the same distribution as N(t'−t). same for joint distributions

    so it's defined in exactly the same way as the original random process is. so it's statistically the same process. which says two things about it: everything is the same. and everything is independent

    we will call that "stationary". everything is the same. everything is independent. and then we'll try to sort out how things can be the same but also be independent

    oh, we already know that. we have two IID random variables, X1 and X2. they're IID. they're independent and identically distributed. "identically distributed" means that in one sense, they are the same. but they're also independent of each other

    so the random variables are defined in the same way. and in that sense, they're stationary. but they're independent of each other by the definition of independence. so our new process is independent of the old process in the interval 0 up to t


    Erlang density

    for a Poisson process of rate λ, the PDF of arrival epoch S2 can be found by convolving the density of X1 and X2. thus

                   t
          fS2(t) = ∫ [λ*exp(−λ*x)] * [λ*exp(−λ*(t−x))] dx =  λ²*t*exp(−λ*t) 
                   0 
    and using iteration, convolving fSₙ₋₁(t) with λ*exp(−λ*t) for each n,
            fSₙ(t)  =  λⁿ * tⁿ⁻¹ * exp(−λ*t) / (n−1)! 
    this is called the Erlang density

    this kind of form, with an exp(−λ*t) and with a t, or t², and so forth, is a particularly easy form to integrate. so we just do this again and again. and when we do it again and again, we find that for the density of the sum of n of these random variables, you keep picking up an extra λ every time you convolve in another exponential random variable, and you pick up an extra factor of t as well. and the (n−1)! appears down here because you keep integrating powers of t. so when you integrate this, this is what you get
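
    a simulation sketch of this result (rate, n, the bin grid and sample size are arbitrary): a histogram of Sₙ = X₁+...+Xₙ for IID exponentials sits on top of the Erlang density λⁿ*tⁿ⁻¹*exp(−λ*t)/(n−1)!

      # sketch: the density of S_n, a sum of n IID exponentials, is the Erlang density
      import numpy as np
      from math import factorial

      rng = np.random.default_rng(4)
      lam, n = 2.0, 4
      S = rng.exponential(1.0 / lam, size=(500_000, n)).sum(axis=1)   # samples of S_n

      def erlang(t):
          return lam**n * t**(n - 1) * np.exp(-lam * t) / factorial(n - 1)

      edges = np.linspace(0.0, 6.0, 25)
      hist, _ = np.histogram(S, bins=edges, density=True)
      centers = 0.5 * (edges[:-1] + edges[1:])
      for c, h in list(zip(centers, hist))[:6]:
          print(f"t={c:.2f}  simulated={h:.3f}  erlang={erlang(c):.3f}")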

    the joint density of X₁,..,Xₙ is

          fX₁..Xₙ(x₁,..,xₙ) = λⁿ * exp(−λ*x₁−λ*x₂−...−λ*xₙ) = λⁿ * exp(−λ*sₙ) 
    where sₙ = Σ xᵢ. since the map from the interarrival times to the arrival epochs has unit Jacobian, the joint density of S₁,..,Sₙ is

        fS₁..Sₙ(s₁,..,sₙ) = λⁿ * exp(−λ*sₙ)  ,  0 < s₁ < s₂ < ... < sₙ 

    this says that, for given sₙ, the joint density is constant over s₁,..,sₙ₋₁ in the region 0<s₁<s₂<...<sₙ. given that the nth arrival is at sₙ, the other n−1 arrivals are uniformly distributed in (0,sₙ), subject to the ordering. integrating (or looking ahead), we get the Erlang marginal density


    Poisson PMF

    the number of arrivals N(t) in (0,t] for a Poisson process of rate λ has the Poisson PMF

          pN(t)(n) = (λ*t)ⁿ * exp(−λ*t) / n! ,  n ≥ 0

    note that the Poisson PMF is a function of λ*t and not of λ or t separately. this is the probability of n arrivals in an interval of size t with rate λ. and if we measure length in a different system of units, λ will change accordingly, so the Poisson PMF has to be a function of λ*t only

    Th : if an arrival process has the stationary and independent increment properties and if N(t) has the Poisson PMF for given λ and all t>0, then the process is Poisson
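
    a simulation sketch tying the renewal definition to the PMF above (rate, t and the trial count are arbitrary): count arrivals of a process with exponential interarrivals up to time t and compare the empirical PMF of N(t) with (λ*t)ⁿ*exp(−λ*t)/n!

      # sketch: empirical PMF of N(t) for exponential interarrivals vs the Poisson PMF
      import numpy as np
      from math import exp, factorial

      rng = np.random.default_rng(5)
      lam, t, trials = 1.5, 2.0, 200_000
      X = rng.exponential(1.0 / lam, size=(trials, 40))   # 40 interarrivals is far more than needed for t=2
      S = np.cumsum(X, axis=1)                            # arrival epochs on each trial
      N_t = np.sum(S <= t, axis=1)                        # N(t) on each trial

      for n in range(6):
          emp = np.mean(N_t == n)
          pmf = (lam * t)**n * exp(-lam * t) / factorial(n)
          print(f"n={n}  empirical={emp:.4f}  poisson={pmf:.4f}")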

    Bernoulli processes

    Bernoulli processes are often viewed as the discrete time version of Poisson processes. there is a confusing feature here that must be cleared up first

    the binary rv’s Y1,Y2,.. of the Bernoulli process have no direct analogs in the Poisson process. when we view a Bernoulli process as an arrival process, an arrival occurs at discrete time n if and only if Yn=1. thus Sn=Y1+···+Yn is the number of arrivals up to and including time n. thus {Sn;n≥1} is analogous to the Poisson counting process {N(t);t>0}

    the interarrival intervals, X1,X2,... for a Bernoulli arrival process are the intervals between successive 1’s in the binary stream. thus X1=k if Yi=0 for 1≤i≤k−1 and Yk=1, so pX₁(k) = p*(1−p)ᵏ⁻¹ for all k≥1. subsequent interarrival intervals are IID with X1. thus, the interarrival intervals for the Bernoulli counting process are geometrically distributed
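
    a small sketch of the geometric interarrival claim (p, the horizon and trial count are arbitrary): simulate the binary stream Y₁,Y₂,... and compare the empirical PMF of X₁ with p*(1−p)ᵏ⁻¹

      # sketch: the first interarrival time of a Bernoulli process is geometric
      import numpy as np

      rng = np.random.default_rng(6)
      p, trials, horizon = 0.3, 200_000, 200
      Y = rng.random((trials, horizon)) < p    # Y_n = 1 with probability p, independently
      X1 = Y.argmax(axis=1) + 1                # time of the first 1 (a miss in 200 slots is essentially impossible)

      for k in range(1, 6):
          print(f"k={k}  empirical={np.mean(X1 == k):.4f}  p(1-p)^(k-1)={p * (1 - p)**(k - 1):.4f}")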


    interlude 3

    we started to look at any set of time, say t1, t2, up to tk, and then we looked at the increments. how many arrivals occurred between zero and t1? how many arrivals occurred between t1 and t2? how many between t2 and t3? so we looked at these Poisson process increments, a whole bunch of random variables, and we said, these are stationary and independent Poisson counting processes, over their given intervals

    in other words, you stop looking at the first process at time t1. then you look at the increment from time t1 to t2, then from t2 to t3, and so on, up to the increment from tk-1 to tk. you might expect one more, from tk on, but no - you're only looking at k of them. so these are the things that you're looking at. the statement is that these are independent random variables

    the other statement is that they're stationary, which means, if you look at the number of arrivals in an interval, here from t1 to t2, it's a function of t2 − t1, not of t1 and t2 separately. it's only a function of the length of the interval. the number of arrivals in any interval of length tk − tk-1 is a function only of the length of that interval and not of where the interval is. that's a reasonable way to look at stationarity


    conditional arrival densities

    a diverse range of problems involving Poisson processes are best tackled by conditioning on a given number n of arrivals in the interval (0,t], i.e., on the event N(t)=n

    because of the incremental view of the Poisson process as independent and stationary arrivals in each incremental interval of the time axis, we would guess that the arrivals should have some sort of uniform distribution given N(t)=n. more precisely, the joint density of S(n)=(S1,S2,..,Sn) given N(t)=n is uniform over the region 0<s1<s2<...<sn<t

    the main formulas are

        probability that the first arrival comes after τ : p(S₁ > τ | N(t)=n) = [(t−τ) / t]ⁿ

        joint density of the arrival epochs : f(s₁,..,sₙ | N(t)=n) = n! / tⁿ  ,  0<s₁<...<sₙ<t

        density of the last arrival epoch : f(sₙ | N(t)=n) = n * sₙⁿ⁻¹ / tⁿ  ,  0<sₙ<t

    consider N(t) = 1. if you look at the small increment definition of a Poisson process, it says that arrivals are equally likely at any point along here

    if you condition on the fact that there's been one arrival in this whole interval, and they're equally likely to occur anywhere along here, the only sensible thing that can happen is that the probability density for where that arrival happens is uniform over this whole interval, so it is 1/t

    the important point is that this does not depend on s1, i.e., it is uniform over (0,t]

    next consider N(t) = 2

    given that there are two arrivals in (0,t], it doesn't make any difference where they are: the joint density of S1,S2 is 2/t² over 0<s1<s2<t. this is, in some sense, uniform over s1 and s2. and you have to be more careful about what "uniform" means here

    you can do the same thing for N(t)=n for arbitrary n

    note that the exponents in the exponentials always sum to λ*t, so the exp(−λ*t) in the joint density cancels with the exp(−λ*t) in the Poisson PMF you divide by. and what you come up with is that this joint density is equal to n!/tⁿ. so again, it's the same story. it's "uniform"

    it doesn't depend on where the s's are, except for the fact that s1<s2<s3<... we have to sort that out. it has this peculiar n! here that doesn't depend on λ. it does depend on n in this way

    that's a uniform n-dimensional probability density over the volume tⁿ/n! corresponding to the constraint region 0<s1<...<sn<t. that region is a little peculiar. this n! is a little peculiar

    the tⁿ is pretty obvious. because when you have a joint density over n things, each bounded between zero and t, you expect to see a tⁿ there

    you have a uniform probability density over some strange volume. and it's a strange volume which satisfies this constraint

    to see why the n! appears in this uniform density, let U1,..,Un be n IID rv’s, each uniform over (0,t]. let S1,..,Sn be the order statistics of the Ui. the region of volume tⁿ where the joint density of the Ui is nonzero partitions into n! regions, one for each ordering, and one of which is the region where s1<s2<...<sn. by symmetry, each of these regions has the same volume, and thus each is tⁿ/n!

    return to N(t) = 2 and look at f(s1,s2 | N(t)=2)

    the area in the s1,s2 space where 0<s1<s2<t is t^2/2, and that explains why the density is 2/t^2
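
    a simulation sketch of this conditioning (rate, t, n and the trial count are arbitrary): keep only the Poisson paths with exactly n arrivals in (0,t] and compare the conditional arrival epochs with the order statistics of n IID uniforms on (0,t]

      # sketch: given N(t) = n, the epochs S_1 < ... < S_n behave like sorted uniforms on (0, t]
      import numpy as np

      rng = np.random.default_rng(7)
      lam, t, n, trials = 1.0, 5.0, 3, 400_000
      X = rng.exponential(1.0 / lam, size=(trials, 30))
      S = np.cumsum(X, axis=1)
      keep = np.sum(S <= t, axis=1) == n                    # condition on the event N(t) = n
      S_cond = S[keep][:, :n]                               # (S_1, .., S_n) on those paths

      U = np.sort(rng.uniform(0.0, t, size=(S_cond.shape[0], n)), axis=1)   # order statistics of uniforms

      print("conditional means of S_1..S_n :", S_cond.mean(axis=0))   # both close to t/4, 2t/4, 3t/4
      print("order statistic means         :", U.mean(axis=0))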

    since X₁ = S₁, X₁ has density and expected value :

          f(x | N(t)=2) = 2*(t−x) / t²  ,   E[X₁ | N(t)=2] = t/3

    this extends to N(t) = n for arbitrary n :

          f(x | N(t)=n) = n*(t−x)ⁿ⁻¹ / tⁿ  ,   E[X₁ | N(t)=n] = t/(n+1)

    paradox

    the mean interarrival time for a Poisson process is 1/λ. but the mean time from any given t to the next arrival is also 1/λ and the mean time back to the previous arrival is (1/λ)*(1−exp(−λ*t)). thus the mean length of the interval containing t is (1/λ)*(2−exp(−λ*t))
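
    a simulation sketch of the paradox (rate, t and the trial count are arbitrary): average the length of the interarrival interval that contains a fixed t and compare with (1/λ)*(2−exp(−λ*t)), which is larger than the mean interarrival time 1/λ

      # sketch: the interval containing a fixed t is longer than 1/lam on average
      import numpy as np

      rng = np.random.default_rng(8)
      lam, t, trials = 1.0, 4.0, 300_000
      X = rng.exponential(1.0 / lam, size=(trials, 40))     # 40 interarrivals is far more than needed for t=4
      S = np.cumsum(X, axis=1)

      N_t = np.sum(S <= t, axis=1)                          # arrivals in (0, t]
      idx = np.arange(trials)
      nxt = S[idx, N_t]                                     # S_{N(t)+1}, the first arrival after t
      prev = np.where(N_t > 0, S[idx, np.maximum(N_t - 1, 0)], 0.0)   # S_{N(t)}, or 0 if no arrival yet

      print("simulated mean length    :", (nxt - prev).mean())
      print("(1/lam)*(2-exp(-lam*t))  :", (2.0 - np.exp(-lam * t)) / lam)
      print("mean interarrival time   :", 1.0 / lam)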

