Course Notes: Quantitative Data
- A data distribution (commonly presented using a histogram) can be analyzed in terms of the following components:
  - SHAPE: Bell-shaped, Left-skewed, Right-skewed, etc.
  - CENTER: Mean and Median
  - SPREAD: Minimum and Maximum values, Quartiles, Interquartile Range
  - OUTLIERS: Check whether the data contains outliers, i.e., values lying unusually far from the rest of the data.
- The relationship between the mean and the median for different distribution shapes can typically be summarized as follows:
  - Bell-shaped: Mean ≈ Median (exactly equal for a perfectly symmetric distribution)
  - Left-skewed: Mean < Median
  - Right-skewed: Mean > Median
- The Five Numbers (five-number summary) are five statistical measures that profile/summarize a dataset so that its center and spread can be identified. The Five Numbers are:
  - Min/Minimum
  - Q1/First Quartile
  - Median, also known as Q2/Second Quartile
  - Q3/Third Quartile
  - Max/Maximum
- Occasionally, the Five Numbers are complemented by the Standard Deviation and the Interquartile Range (IQR = Q3 - Q1) to give a more complete picture of the dataset.
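- Standard Deviation, another measure used in quantitative data analysis, describes how far the data values typically lie from their mean. It is the square root of the average squared distance between each data value and the mean (for a sample, the sum of squared distances is divided by n - 1 rather than n).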
- Empirical Rule (for bell-shaped/normal distributions), also known as the "68-95-99.7" Rule:
  - Around 68% of the data falls within one standard deviation of the mean. For example, if the mean = 7 and the standard deviation = 1.7, around 68% of the data falls in the range 7 ± 1.7, i.e., from 5.3 to 8.7.
  - Around 95% of the data falls within two standard deviations of the mean, i.e., one more standard deviation beyond each end of the 68% range. In the same example, 7 ± 3.4 gives the range 3.6 to 10.4.
  - Around 99.7% of the data falls within three standard deviations of the mean, i.e., one more standard deviation beyond each end of the 95% range. In the same example, 7 ± 5.1 gives the range 1.9 to 12.1.
- Standard Score
  - The standard score (z-score) expresses how far a data value deviates from the mean, measured in units of standard deviation.
  - The standard score can be computed as:
    - (Observed value - Mean) / Standard Deviation
  - For example, in a dataset with mean = 7 and standard deviation = 1.7, the standard score of a data value of 10 is:
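    - (10 - 7) / 1.7 ≈ 1.76, i.e., the value lies about 1.76 standard deviations above the mean. Whether this counts as 'unusual' depends on the chosen cutoff; a common rule of thumb treats only |z| > 2 as unusual.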
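One common way to check for outliers is Tukey's 1.5 × IQR rule, which may or may not be the method the course itself uses. The Python snippet below is a minimal sketch under that assumption: it flags values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The function name `iqr_outliers` and the sample data are hypothetical, and the quartiles come from `statistics.quantiles(..., method="inclusive")`; other quartile conventions can shift the fences slightly.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # "inclusive" uses linear interpolation between order statistics
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical sample with one suspiciously large value
data = [4, 5, 5, 6, 7, 7, 8, 9, 30]
print(iqr_outliers(data))  # → [30]
```

Here Q1 = 5, Q3 = 8 and IQR = 3, so the fences are 0.5 and 12.5, and only 30 falls outside them.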
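To illustrate the mean/median relationship listed above, the sketch below compares the two measures on three small made-up samples, one per shape. The function name `mean_vs_median` and the data are hypothetical. The ordering is a rule of thumb: it usually holds for clearly skewed data, but exceptions exist.

```python
import statistics


def mean_vs_median(data):
    """Report whether the mean is above, below, or equal to the median."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "mean > median"
    if mean < median:
        return "mean < median"
    return "mean = median"  # exact equality is rare in real data


samples = {
    "symmetric": [1, 2, 3, 4, 5],
    "right-skewed": [1, 2, 2, 3, 3, 3, 4, 15],     # long tail to the right
    "left-skewed": [2, 13, 14, 14, 14, 15, 15, 16],  # long tail to the left
}
for label, values in samples.items():
    print(label, "->", mean_vs_median(values))
```

The single large value 15 pulls the right-skewed mean up to 4.125 while the median stays at 3; the single small value 2 pulls the left-skewed mean down to 12.875 while the median stays at 14.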
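The Five Numbers, plus the IQR and standard deviation mentioned as complements, can be computed with Python's standard `statistics` module. This is a sketch assuming the "inclusive" quartile method; textbooks and calculators sometimes use other quartile conventions, so Q1 and Q3 may differ slightly from hand calculations. The function name `five_number_summary` and the data are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


data = [2, 4, 4, 5, 6, 7, 8, 9, 12]
summary = five_number_summary(data)
iqr = summary[3] - summary[1]  # Q3 - Q1
sd = statistics.stdev(data)    # sample standard deviation (n - 1 denominator)
print("Five Numbers:", summary)
print("IQR:", iqr, "SD:", round(sd, 2))
```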
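The standard deviation definition above can be written out step by step. The minimal sketch below follows it literally (square root of the average squared distance from the mean) and lets the caller choose between the population version (divide by n) and the sample version (divide by n - 1). The function name `standard_deviation` is hypothetical; in practice, `statistics.pstdev` and `statistics.stdev` compute the same two quantities.

```python
import math


def standard_deviation(data, sample=True):
    """Square root of the average squared distance from the mean.

    sample=True divides by n - 1 (sample SD); sample=False divides by n
    (population SD).
    """
    n = len(data)
    mean = sum(data) / n
    squared_distances = [(x - mean) ** 2 for x in data]
    divisor = n - 1 if sample else n
    return math.sqrt(sum(squared_distances) / divisor)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data, sample=False))  # → 2.0
print(standard_deviation(data))                # sample SD, slightly larger
```

With mean 5, the squared distances sum to 32, so the population SD is sqrt(32 / 8) = 2 and the sample SD is sqrt(32 / 7) ≈ 2.14.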
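The Empirical Rule ranges are simply mean ± 1, 2 and 3 standard deviations. The sketch below reproduces the example with mean = 7 and standard deviation = 1.7. The function name `empirical_rule_ranges` is hypothetical, and the bounds are rounded to two decimals only to hide floating-point noise.

```python
def empirical_rule_ranges(mean, sd):
    """Return the 1-, 2- and 3-standard-deviation ranges around the mean.

    For roughly bell-shaped (normal) data these ranges hold about
    68%, 95% and 99.7% of the values, respectively.
    """
    return {
        pct: (round(mean - k * sd, 2), round(mean + k * sd, 2))
        for k, pct in ((1, "68%"), (2, "95%"), (3, "99.7%"))
    }


for pct, (low, high) in empirical_rule_ranges(7, 1.7).items():
    print(f"~{pct} of the data between {low} and {high}")
```

This prints the ranges 5.3 to 8.7, 3.6 to 10.4 and 1.9 to 12.1, matching the worked example.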
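The standard score formula and the worked example translate directly into code. The notes do not state a cutoff for 'unusual', so the sketch below makes it a parameter: the default of 2.0 reflects the common |z| > 2 rule of thumb, and the 1.5 used in the last line is a purely hypothetical looser cutoff for comparison. The function names `standard_score` and `is_unusual` are also hypothetical.

```python
def standard_score(value, mean, sd):
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd


def is_unusual(z, cutoff=2.0):
    """Flag |z| above a cutoff; the default 2.0 is an assumed rule of thumb."""
    return abs(z) > cutoff


z = standard_score(10, mean=7, sd=1.7)
print(round(z, 2))                # → 1.76
print(is_unusual(z))              # → False under the |z| > 2 rule of thumb
print(is_unusual(z, cutoff=1.5))  # → True under a looser, hypothetical cutoff
```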
PS:
Corrections to this note are welcome. Feel free to leave a comment. Cheers.