Statistics in Psychology – UGC NET – Notes

TOPIC INFO (UGC NET)

TOPIC INFO – UGC NET (Psychology)

SUB-TOPIC INFO – Research Methodology and Statistics (UNIT 2)

CONTENT TYPE – Detailed Notes

What’s Inside the Chapter?

1. Measures of Central Tendency

1.1. Mean (Arithmetic Mean)

1.2. Median

1.3. Mode

1.4. Comparison of Mean, Median, and Mode

1.5. Limitations of Central Tendency Measures

2. Measures of Dispersion

2.1. Absolute Measure of Dispersion

2.2. Relative Measure of Dispersion

2.3. Range of Data Set

2.4. Mean Deviation

3. Mean Deviation for Ungrouped Data

3.1. Measures of Dispersion Formula

3.2. Coefficient of Dispersion

3.3. Measures of Dispersion vs Central Tendency

4. Normal Probability Curve

5. Parametric and Non-Parametric Tests

5.1. Definition of Parametric and Non-Parametric Statistics

5.2. Assumptions of Parametric and Non-Parametric Statistics

5.3. The T-Test

5.4. Non-Parametric Statistics

5.5. Advantages of Non-Parametric Statistics

5.6. Disadvantages of Non-Parametric Statistics

5.7. Wilcoxon Signed Rank Test

5.8. Mann-Whitney Test

5.9. Kruskal Wallis Anova Test in Psychology

6. Power Analysis and Effect Size



Measures of Central Tendency

In social research, measures of central tendency are statistical tools used to summarize large datasets by identifying a single representative value. These measures help researchers analyze social patterns, economic conditions, demographic trends, and behavioral data. The three main measures are mean, median, and mode, each with distinct properties, formulas, and applications.

Mean (Arithmetic Mean)

Definition and Formula

The mean is the most commonly used measure of central tendency in social research. It is calculated by adding all values in a dataset and dividing by the total number of observations. The mean provides a balanced average that represents the dataset as a whole. This measure is particularly useful in studies where data values are continuous, such as income levels, test scores, population growth rates, and crime rates.

One of the major advantages of the mean is its sensitivity to every value in the dataset, ensuring that it accounts for all variations. However, it is also highly affected by extreme values (outliers), which can distort the representation of central tendency. For instance, if a study on household income includes a few extremely wealthy individuals, the mean income may appear much higher than what is typical for most households. This limitation makes the mean less useful when dealing with skewed distributions or datasets with significant variations.

For a dataset with values $x_1, x_2, x_3, \ldots, x_n$, the mean is given by the formula:

$\text{Mean } (\bar{x}) = \frac{\sum x_{i}}{n}$

where:

$\sum x_{i}$ is the sum of all values in the dataset

$n$ is the total number of observations

Example of Mean Calculation

A researcher wants to determine the average household income (in $1000s) of five families in a city. The incomes are: 45, 50, 55, 60, and 90.

$\bar{x} = \frac{45 + 50 + 55 + 60 + 90}{5} = \frac{300}{5} = 60$

Thus, the mean household income is $60,000.
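
The calculation above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and not part of the original notes.

```python
from statistics import mean

# Household incomes in $1000s, taken from the example above.
incomes = [45, 50, 55, 60, 90]

# Manual calculation: sum of all values divided by the number of observations.
manual_mean = sum(incomes) / len(incomes)

# The standard library function should agree with the manual result.
library_mean = mean(incomes)

print(manual_mean, library_mean)
```

Both approaches apply the same formula, so they should agree for any list of numbers.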

Application in Social Research

The mean is used to analyze average wages, literacy rates, per capita income, and life expectancy. However, it is highly affected by outliers. In this example, the household with $90,000 income skews the mean upwards, making it appear higher than what most families earn.

Properties of Mean:

Mean is sensitive to the exact value of each and every score in a distribution, so if another score is added, the mean of that distribution will change. For example, the mean of the scores 5, 4, 6, 3, 2 is 4 (5+4+6+3+2 = 20; 20 ÷ 5 = 4). But if a score of 8 is added, so that the scores become 5, 4, 6, 3, 2, 8, the mean will be 4.67 (5+4+6+3+2+8 = 28; 28 ÷ 6 ≈ 4.67).
Mean denotes the balance point of a distribution: the total of the positive deviations from the mean is equal in magnitude to the total of the negative deviations (King and Minium, 2008).
Mean is especially effective when we want the measure of central tendency to reflect the sum of the scores.
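
The balance-point property and the sensitivity to an added score can be illustrated with the scores used above. The following is a small sketch; the variable names are assumptions made for illustration.

```python
# Scores from the example in the first property.
scores = [5, 4, 6, 3, 2]
mean_score = sum(scores) / len(scores)

# Deviation of each score from the mean.
deviations = [x - mean_score for x in scores]

# Totals of the deviations above and below the mean.
positive_total = sum(d for d in deviations if d > 0)
negative_total = sum(d for d in deviations if d < 0)

# The two totals cancel, so the deviations sum to zero.
print(positive_total, negative_total, sum(deviations))

# Adding one more score (8) changes the mean.
extended = scores + [8]
extended_mean = sum(extended) / len(extended)
print(round(extended_mean, 2))
```

The positive and negative totals have the same magnitude, which is what makes the mean the balance point of the distribution.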

Advantages of Mean:

The mean is rigidly defined, that is, a precise formula fixes its value, which is a quality of a good measure of central tendency.
It is not only easy to understand but also easy to calculate.
All the scores in the distribution are considered when mean is computed.
Further mathematical calculations can be carried out on the basis of mean.
Of the three measures, the mean is the least affected by sampling fluctuations.

Limitations of Mean:

Outliers or extreme values can have an impact on mean.
When there are open-ended classes, such as 10 and above or below 5, mean cannot be computed because the midpoints of those classes cannot be determined. In such cases median and mode can still be computed.
If a score in the data is missing or unclear, then mean cannot be computed accurately unless the missing value is excluded from the data.
It is not possible to determine mean through inspection; it requires proper calculation and cannot be obtained directly from a graph.
It is not suitable for data that is skewed or highly asymmetrical, as mean may not represent the data accurately.

Median

Definition and Formula

The median is the middle value in an ordered dataset. When data values are arranged in ascending or descending order, the median is the value that divides the dataset into two equal halves. This measure is highly effective in social research involving income distribution, educational attainment, and age demographics, where extreme values may distort the mean.

One of the key benefits of the median is that it is not affected by outliers, making it a better measure of central tendency for skewed distributions. For example, in a study on household income in a country, the median provides a more accurate representation of what most people earn, as it is not influenced by the extremely high earnings of a small percentage of the population. The median is also commonly used in studies on wealth distribution, wage gaps, and income inequality, as it better reflects the typical experience of individuals in a population.

For an odd number of observations:

$\text{Median} = \text{Middle Value}$

For an even number of observations:

$\text{Median} = \frac{\text{Middle Value}_{1} + \text{Middle Value}_{2}}{2}$

Example of Median Calculation

Using the same household income data (45, 50, 55, 60, 90), the ordered values are:

$45, 50, 55, 60, 90$

Since there are five values (odd number), the median is the middle value, which is 55.

Now, if another household with an income of $70,000 is added, the dataset becomes:

$45, 50, 55, 60, 70, 90$

Since there are now six values (even number), the median is the average of the two middle values, 55 and 60:

$\frac{55 + 60}{2} = 57.5$

Thus, the median household income is $57,500.
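
The odd and even cases can be sketched in Python as follows. The helper `median_of` is a hypothetical name introduced here for illustration; the standard library function `statistics.median` follows the same rule.

```python
from statistics import median

def median_of(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)          # the data must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

# Household incomes in $1000s from the example above.
five_households = [45, 50, 55, 60, 90]
six_households = five_households + [70]

print(median_of(five_households), median(five_households))
print(median_of(six_households), median(six_households))
```

Sorting inside the helper means the values can be supplied in any order.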

Application in Social Research

The median is commonly used in income distribution, wealth inequality, and economic disparity studies. It provides a better representation of typical income levels than the mean when extreme values are present.

Properties of Median:

When compared to mean, median is less sensitive to extreme scores or outliers.
When a distribution is skewed or asymmetrical, median can be adequately used.
When a distribution is open-ended, that is, actual score at one end of the distribution is not known, then median can be computed.

Advantages of Median:

The median is rigidly defined, that is, a precise rule fixes its value, which is a quality of a good measure of central tendency.
It is easy to understand and calculate.
It is not affected by outliers or extreme scores in data.
Unless the median falls in an open-ended class, it can be computed for grouped data with open-ended classes.
In certain cases, it is possible to identify median through inspection as well as graphically.

Limitations of Median:

Some statistical procedures using median are quite complex. Computation of median can be time consuming when the dataset is large, because the data needs to be arranged in order before the median is computed.
When the number of ungrouped observations is even, there is no single middle score. In such cases, the median is estimated as the mean of the two middle scores.
It is not based on each and every score in the distribution.
It can be affected by sampling fluctuations and is therefore less stable than mean.

Mode

Definition and Formula

The mode is the most frequently occurring value in a dataset. Unlike the mean and median, which require numerical calculations, the mode identifies the most common category or score in a dataset. This measure is particularly useful in research involving categorical or qualitative data, such as voting preferences, consumer behavior, employment sectors, and language use.

In social surveys, the mode helps researchers understand dominant trends. For example, if a survey examines preferred modes of transportation in a city, and the majority of respondents choose public buses, the mode of the dataset would be public buses. The mode is also useful in identifying common preferences, behaviors, and patterns in large populations.

A dataset may have one mode (unimodal), two modes (bimodal), or multiple modes (multimodal). In cases where data distributions have more than one peak, using the mode can help identify subgroups or distinct patterns within a population. However, the mode is limited when dealing with continuous numerical data, as it may not always provide a single clear central value.

A dataset can be:

Unimodal (one mode)
Bimodal (two modes)
Multimodal (more than two modes)
No mode (if all values occur with the same frequency)

Example of Mode Calculation

A researcher surveys 10 people about their preferred mode of transport:

Car, Bus, Train, Bus, Bike, Bus, Car, Bike, Bus, Train

Since Bus appears 4 times, while each of the other options appears only twice, the mode is “Bus”.

In a numerical dataset: 2, 3, 3, 4, 5, 5, 5, 6, 7, 7, 7

Modes = 5 and 7 (Bimodal)
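
Both examples can be reproduced with the Python standard library. This is a minimal sketch; `statistics.multimode` (available from Python 3.8) returns a list of all most frequent values, and the variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

# Categorical example: preferred mode of transport for 10 respondents.
transport = ["Car", "Bus", "Train", "Bus", "Bike",
             "Bus", "Car", "Bike", "Bus", "Train"]
transport_counts = Counter(transport)      # frequency of each category
transport_modes = multimode(transport)     # list of most frequent categories

# Numerical example: two values share the highest frequency.
numbers = [2, 3, 3, 4, 5, 5, 5, 6, 7, 7, 7]
number_modes = multimode(numbers)

print(transport_counts["Bus"], transport_modes)
print(number_modes)
```

Note that `multimode` returns every value when all frequencies are equal, whereas these notes treat that case as having no mode.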

Application in Social Research

The mode is widely used in survey analysis, market research, and behavioral studies to determine dominant choices, preferences, and trends. For example, in elections, the mode identifies the political party that received the most votes.

Properties of Mode:

Mode can be used with variables that can be measured on nominal scale.
Mode is easier to compute than mean and median, but it is not used often because of lack of stability from one sample to another and because a dataset may have more than one mode. In such cases, it may not adequately represent central location.
Mode is not affected by outliers or extreme scores.

Advantages of Mode:

It is not only easy to comprehend and calculate but it can also be determined by mere inspection.
It can be used with quantitative as well as qualitative data.
It is not affected by outliers or extreme scores.
Even if a distribution has one or more than one open-ended class(es), mode can easily be computed.

Limitations of Mode:

When every score occurs with the same frequency, for example when all the scores differ from one another, the data have no mode.
The mode is not rigidly defined.
In case of bimodal or multimodal distribution, interpretation and comparison become difficult.
Mode is not based on the whole distribution.
It may not be possible to compute further mathematical procedures based on mode.
Sampling fluctuations can have an impact on mode.

Comparison of Mean, Median, and Mode

Measure	Definition	Best Used For	Sensitive to Outliers?
Mean	Sum of values divided by count	Normally distributed numerical data (e.g., GDP, test scores)	Yes
Median	Middle value in an ordered dataset	Skewed data (e.g., income, wealth distribution)	No
Mode	Most frequently occurring value	Categorical and ordinal data (e.g., voting preferences, product choices)	No

Limitations of Central Tendency Measures

While these measures provide valuable insights, each has limitations.

The mean is highly affected by outliers, making it misleading in skewed distributions. The median does not consider all values, which can result in loss of information. The mode may not exist or may not be unique, making it unreliable for numerical data with evenly distributed values.

To overcome these limitations, researchers often report multiple measures together. For example, in income inequality studies, the mean and median are compared: a mean well above the median indicates that wealth is concentrated among the highest earners.
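
The value of reporting the mean and median together can be seen in a short sketch. The first list is the household income data used earlier in these notes; the second list, with one income of 900, is a hypothetical variation added here purely for illustration.

```python
from statistics import mean, median

# Household incomes in $1000s from the earlier example.
incomes = [45, 50, 55, 60, 90]

# Hypothetical variation: the highest income is replaced by an extreme value.
incomes_with_outlier = [45, 50, 55, 60, 900]

# Gap between mean and median for each dataset.
gap = mean(incomes) - median(incomes)
gap_with_outlier = mean(incomes_with_outlier) - median(incomes_with_outlier)

print(mean(incomes), median(incomes), gap)
print(mean(incomes_with_outlier), median(incomes_with_outlier), gap_with_outlier)
```

The median is the same in both datasets, while the mean rises sharply with the extreme value, so the gap between the two gives a rough indication of how strongly the highest values pull the average upward.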
