Thursday, June 12, 2014

Introduction to Data


Data, Random Variable and Parameter
In most of our experiments, we represent data as a random variable. A random variable is a mapping from the outcome space (the sample space) to the set of real numbers. But is the data really random?

Let's take the rolling of a die. If the die is fair, each roll produces one of the values 1 to 6 with equal probability. Under the frequentist approach, given a sufficiently large number of trials, we expect each of the numbers 1 to 6 to appear with approximately equal relative frequency, close to 1/6. You can check this in Matlab with the following code:
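The listing that belongs here did not survive in this copy of the post, so what follows is only a minimal sketch of such a check, not the original code. The number of trials, the fixed seed and the variable names are all assumptions chosen for illustration.

```matlab
% Simulate rolls of a fair six-sided die and look at the relative
% frequency of each face. All names and values here are illustrative.
rng(0);                                  % fix the seed so the run is repeatable
nTrials = 60000;                         % number of rolls (arbitrary choice)
rolls   = randi(6, nTrials, 1);          % each roll is an integer from 1 to 6

counts  = accumarray(rolls, 1, [6 1]);   % how many times each face appeared
relFreq = counts / nTrials;              % relative frequency of each face

disp([(1:6)' counts relFreq]);           % columns: face, count, relative frequency

bar(1:6, relFreq);                       % bars should all sit near 1/6
xlabel('Face');
ylabel('Relative frequency');
```

Increasing nTrials should bring the relative frequencies closer to 1/6, while a small nTrials will show visible differences between the faces.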



A distribution is the way in which something is shared among a group. Here, the total probability of 1 is shared equally among all the outcomes in the sample space, so each of the six outcomes gets probability 1/6. Since the sharing is equal, this is called a uniform distribution.


What about data? Does it have a parameter? What is a parameter?
Can we represent it as a random variable?

What is a distribution? How can you characterize a distribution?
Can you generalize this?

Example:
(1) Rolling a die many times
(2) Rolling many dice many times and taking the count
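The two examples above can be tried side by side with a short simulation. This is a sketch under stated assumptions: "taking the count" is read here as summing the faces of all dice in one throw, and the number of dice, the number of trials and the variable names are made up for illustration.

```matlab
% Compare (1) one die rolled many times with (2) several dice rolled
% many times, where each throw is reduced to the sum of its faces.
% Reading "the count" as the sum of the faces is an assumption.
rng(0);                                   % repeatable run
nTrials = 10000;                          % number of throws (arbitrary choice)
nDice   = 10;                             % dice per throw in example (2)

oneDie = randi(6, nTrials, 1);            % example (1): values 1..6
sums   = sum(randi(6, nTrials, nDice), 2);% example (2): values nDice..6*nDice

% Tally how often each possible value occurred.
countsOne = accumarray(oneDie, 1, [6 1]);
countsSum = accumarray(sums - nDice + 1, 1, [5*nDice + 1, 1]);

subplot(1, 2, 1);
bar(1:6, countsOne / nTrials);
title('One die');
xlabel('Face'); ylabel('Relative frequency');

subplot(1, 2, 2);
bar(nDice:6*nDice, countsSum / nTrials);
title('Sum of several dice');
xlabel('Sum'); ylabel('Relative frequency');
```

Plotting both tallies next to each other gives a concrete starting point for the questions above about what characterizes a distribution.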