Course Notes: Quantitative Data


  • A data distribution (commonly presented as a histogram) can be analyzed in terms of the following components:
    • SHAPE: Bell-shaped, Left-skewed, Right-skewed, etc.
    • CENTER: Mean and Median
    • SPREAD: Minimum and maximum values, quartiles, interquartile range (IQR)
    • OUTLIERS: Check whether any outliers exist. Visit this to find out how to determine outliers.
  • The relationship between the mean and the median for different distribution shapes can typically be summarized as follows:
    • Bell-shaped (symmetric): Mean ≈ Median (exactly equal for a perfectly symmetric distribution)
    • Left-skewed: Mean < Median
    • Right-skewed: Mean > Median
  • The Five Numbers (five-number summary) are five statistical measures that profile/summarize a dataset so that its center and spread can be identified. The Five Numbers are:
    • Min/Minimum
    • Q1/First Quartile
    • Median, also known as Q2/Second Quartile
    • Q3/Third Quartile
    • Max/Maximum
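  • The Five Numbers are sometimes complemented by the standard deviation and the interquartile range (IQR = Q3 - Q1) to give a more comprehensive overview of the dataset.
  • The standard deviation, another measure of spread in quantitative data analysis, describes how far the data values typically lie from the mean. It is the square root of the average squared distance of the data values from the mean (that average is the variance); for a sample, the sum of squared distances is usually divided by n - 1 instead of n.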
  • Empirical Rule / "68-95-99.7" Rule (applies to bell-shaped/normal distributions):
    1. Around 68% of the data falls within one standard deviation of the mean. For example, if the mean = 7 and the standard deviation = 1.7, then around 68% of the data falls in the range 7 ± 1.7, i.e. from 5.3 to 8.7.
    2. Around 95% of the data falls within two standard deviations of the mean. In the same example, around 95% of the data falls in the range 7 ± 3.4, i.e. from 3.6 to 10.4.
    3. Around 99.7% of the data falls within three standard deviations of the mean. In the same example, around 99.7% of the data falls in the range 7 ± 5.1, i.e. from 1.9 to 12.1.
  • Standard Score
    • The standard score (z-score) expresses how far a data value lies from the mean, measured in units of the standard deviation.
    • The standard score can be computed as: 
          • (Observed value - Mean) / Standard Deviation 
    • For example, in a dataset with mean = 7 and standard deviation = 1.7, the standard score of a data value of 10 is:
          • (10 - 7) / 1.7 ≈ 1.76, i.e. the value lies about 1.76 standard deviations above the mean. The larger the absolute standard score, the more 'unusual' the value is relative to the rest of the data.
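The center, spread, and Five Numbers described in these notes can be computed with Python's standard statistics module. This is a minimal sketch, assuming a small hypothetical right-skewed dataset; the helper names five_number_summary and iqr_outliers are my own. Quartiles are computed with method="inclusive", which is only one of several conventions, so other tools may report slightly different Q1/Q3 values. The outlier check uses the common 1.5 * IQR rule (Tukey's fences), which is an assumption here and not necessarily the method the notes refer to.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), using inclusive quartiles."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

def iqr_outliers(data, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed 1.5 * IQR rule)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical right-skewed sample with one extreme value.
sample = [2, 3, 3, 4, 5, 5, 6, 7, 8, 30]

print("Five numbers:", five_number_summary(sample))  # (2, 3.25, 5.0, 6.75, 30)
print("Mean:", statistics.mean(sample))              # 7.3
print("Median:", statistics.median(sample))          # 5.0
print("Outliers:", iqr_outliers(sample))             # [30]
```

Because the sample has a long right tail, the mean (7.3) is larger than the median (5.0), matching the "Right-skewed: Mean > Median" pattern.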
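The standard deviation is often described informally as an "average distance" from the mean, but it is built from squared distances. This minimal sketch, using a hypothetical dataset and helper names of my own choosing, contrasts the mean absolute deviation (the literal average distance) with the population and sample standard deviations, and checks them against the statistics module.

```python
import math
import statistics

def mean_abs_deviation(data):
    """Average absolute distance from the mean (NOT the standard deviation)."""
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

def population_sd(data):
    """Square root of the average squared distance from the mean."""
    m = sum(data) / len(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / len(data))

def sample_sd(data):
    """Like population_sd, but divides by n - 1 (Bessel's correction)."""
    m = sum(data) / len(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / (len(data) - 1))

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

print(mean_abs_deviation(values))  # 1.5
print(population_sd(values))       # 2.0
print(sample_sd(values))           # about 2.14
# Standard-library equivalents of the two standard deviations:
print(statistics.pstdev(values), statistics.stdev(values))
```

The two "spread" numbers differ (1.5 vs 2.0) because squaring gives extra weight to values far from the mean.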
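The Empirical Rule ranges amount to mean ± k standard deviations for k = 1, 2, 3. This sketch reproduces the mean = 7, standard deviation = 1.7 example, then checks the 68/95/99.7 percentages on simulated normal data. The helper names and the simulation (20,000 draws from random.gauss with a fixed seed) are assumptions for illustration only, so the simulated shares are approximate.

```python
import random
import statistics

def empirical_rule_ranges(mean, sd):
    """Intervals mean +/- k*sd for k = 1, 2, 3, rounded to 2 decimals."""
    return {k: (round(mean - k * sd, 2), round(mean + k * sd, 2)) for k in (1, 2, 3)}

def share_within(data, k):
    """Fraction of values lying within k standard deviations of the mean."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)

print(empirical_rule_ranges(7, 1.7))
# {1: (5.3, 8.7), 2: (3.6, 10.4), 3: (1.9, 12.1)}

# Hypothetical check on simulated normal data with mean 7 and SD 1.7.
random.seed(42)
simulated = [random.gauss(7, 1.7) for _ in range(20_000)]
shares = {k: share_within(simulated, k) for k in (1, 2, 3)}
print(shares)  # roughly 0.68, 0.95 and 0.997
```

The rule only holds approximately, and only for roughly bell-shaped data; on a skewed dataset the shares can differ noticeably.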
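The standard score formula translates directly into code. This is a minimal sketch; standard_score and standardize are hypothetical helper names, and standardize assumes the population standard deviation (statistics.pstdev) is the one wanted.

```python
import statistics

def standard_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def standardize(data):
    """Convert every value in data to its standard score (population SD)."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return [standard_score(x, m, s) for x in data]

# The example from the notes: value 10, mean 7, standard deviation 1.7.
print(round(standard_score(10, 7, 1.7), 2))  # 1.76

# Hypothetical data with mean 5 and population SD 2.
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After standardizing, values can be compared across datasets with different means and spreads, since each is expressed in "standard deviations from its own mean".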

PS:
Corrections on this note are welcome. Feel free to leave your comment. Cheers.