

We have sometimes used a table to display the distribution of a random variable $X$. At other times we have written $P(X = k)$ as a formula for each possible value $k$ of $X$. Another useful function that encapsulates all the information about the distribution of $X$ is called the cumulative distribution function of $X$. That's a real mouthful, so it is usually abbreviated to the cdf of $X$. Suppose $X$ has the distribution given below. It happens to be the binomial $(3, 1/2)$ distribution, but that is not important for this discussion.
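As a worked illustration, the binomial $(3, 1/2)$ distribution mentioned above can be written out as a table, together with the cdf it implies. The symbol $F$ for the cdf is an assumption of this sketch, and the `cases` environment assumes the `amsmath` package.

```latex
\[
\begin{array}{c|cccc}
k        & 0           & 1           & 2           & 3 \\ \hline
P(X = k) & \frac{1}{8} & \frac{3}{8} & \frac{3}{8} & \frac{1}{8}
\end{array}
\]
\[
F(x) = P(X \le x) =
\begin{cases}
0,           & x < 0 \\
\frac{1}{8}, & 0 \le x < 1 \\
\frac{1}{2}, & 1 \le x < 2 \\
\frac{7}{8}, & 2 \le x < 3 \\
1,           & x \ge 3
\end{cases}
\]
```

Each step of $F$ is the running total of the table: for example, $\frac{1}{8} + \frac{3}{8} = \frac{1}{2}$ at $x = 1$.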
What about the probability function, as used for discrete random variables? As we have just seen, for a continuous random variable we have \(p_X(x) = \Pr(X = x) = 0\) for all \(x\), so there is no point in using it. In the next section, we look at the appropriate analogue to the probability function for continuous random variables. The cumulative distribution function, by contrast, remains useful: it is the probability that the variable takes a value less than or equal to \(x\), that is, \(F_X(x) = \Pr(X \le x)\).
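A minimal sketch of how a cdf still gives usable probabilities for a continuous variable, even though each single point has probability zero. It assumes a standard normal as the example distribution, which this passage does not specify, and uses R's `pnorm`, the normal cdf. The variable names are illustrative.

```r
# Standard normal cdf: F(x) = P(X <= x).
# Assumption: a standard normal (mean 0, sd 1) is used purely as an example.
below_mean <- pnorm(0)               # P(X <= 0): half the mass lies below the mean
within_one <- pnorm(1) - pnorm(-1)   # P(-1 < X <= 1) = F(1) - F(-1)

print(below_mean)
print(within_one)
```

Interval probabilities come from differences of the cdf, which is why the cdf is informative even when \(\Pr(X = x) = 0\) for every \(x\).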

In fact what we just demonstrated is a binomial distribution with 3 trials and probability equal to 0.5. This is sometimes abbreviated as b(3,0.5). We can quickly generate the probabilities in R using the dbinom function:

barplot(dbinom(x = 0:3, size = 3, prob = 0.5), names.arg = 0:3)

The function used to generate these probabilities is often referred to as the “density” function, hence the “d” in front of binom. Distributions that generate probabilities for discrete values, such as the binomial in this example, are sometimes called “probability mass functions”, or PMFs. Distributions that generate probabilities for continuous values, such as the Normal, are sometimes called “probability density functions”, or PDFs. However, in R, regardless of PMF or PDF, the function that generates the probabilities is known as the “density” function. Now let’s talk about “cumulative” probabilities.
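Looking at the distribution plot above, a cumulative probability would be, for example, \(P(X \le 1) = P(X = 0) + P(X = 1) = \frac{1}{8} + \frac{3}{8} = \frac{1}{2}\). For a continuous random variable the picture is different: any single exact value has a \(\frac{1}{\infty}\) chance of being selected. Why is \(\infty\) in the denominator? Because there is an infinite number of possibilities.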
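The cumulative probabilities for the binomial with 3 trials and probability 0.5 can be checked numerically. The sketch below accumulates the dbinom values with `cumsum` and compares them with `pbinom`, R's binomial cdf. The variable names are illustrative.

```r
# Binomial with 3 trials and success probability 0.5.
k <- 0:3
pmf <- dbinom(k, size = 3, prob = 0.5)     # P(X = k): 0.125 0.375 0.375 0.125

# Cumulative probabilities two ways: a running sum of the pmf, and pbinom directly.
cdf_sum <- cumsum(pmf)
cdf_p   <- pbinom(k, size = 3, prob = 0.5) # P(X <= k): 0.125 0.500 0.875 1.000

print(data.frame(k = k, pmf = pmf, cdf = cdf_p))
```

The two columns of cumulative values agree, which is all that "cumulative" means here: each entry is the sum of the probabilities at or to the left of `k`.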

The cumulative distributions we explored above were based on theory. We used the binomial and normal cumulative distributions, respectively, to calculate probabilities and visualize the distribution. In real life, however, the data we collect or observe does not come from a theoretical distribution. We have to use the data itself to create a cumulative distribution. We can do this in R with the ecdf function. ECDF stands for “Empirical Cumulative Distribution Function”. “Empirical” means we’re concerned with observations rather than theory.
The ecdf function works on numeric vectors, which are often columns of numbers in a data frame. Let’s try this out with the rock data set that comes with R. The rock data set contains measurements on 48 rock samples from a petroleum reservoir. It contains 4 variables: area, peri, shape, and perm. We’ll work with the area variable, which is the total area of pores in each sample. The ecdf function returns a function. Just as pbinom and pnorm were the cumulative distribution functions for our theoretical data, ecdf creates a cumulative distribution function for our observed data.
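A short sketch of what "returns a function" means in practice, using the area column of the rock data set. The name `Fn` and the cutoff of 8000 are illustrative choices, not values taken from the text.

```r
# rock ships with R (datasets package), so no import is needed.
area <- rock$area

# ecdf() returns a function; calling it gives the proportion of observations <= x.
Fn <- ecdf(area)

cutoff <- 8000                   # hypothetical cutoff, chosen only for illustration
p_hat  <- Fn(cutoff)             # estimated P(area <= cutoff)
manual <- mean(area <= cutoff)   # same quantity computed by hand

print(p_hat)
print(manual)
```

The hand computation shows there is nothing mysterious in the empirical cdf: at any point it is simply the fraction of the 48 observed areas at or below that point.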
