Measuring Item Reliability Part 1 – Item Discrimination Index - Maxinity (2024)

Item Discrimination Index Summary

The discrimination index (DI) measures how discriminating the items in an exam are, i.e. how well an item differentiates between stronger candidates and less able ones. For each item, it compares how the stronger and weaker candidates, as judged by the exam as a whole, performed on that item. The DI for an item ranges from -1 to +1; values above 0.2 generally indicate that the item is positively discriminating.

Calculating the Item Discrimination Index

To categorise your stronger and weaker candidates formally, we create an upper group and a lower group, made up of the top 27% and the bottom 27% of candidates by overall exam performance. This gives you the same number of candidates in both groups.

You then subtract the number of lower-group candidates who got the item correct from the number of upper-group candidates who got it correct, and divide the result by the number of candidates in one group. This gives a number between -1 and 1, which is the discrimination index for the item.
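The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular exam software computes it: the function name, the rounding of the 27% group size, and the handling of tied scores are all assumptions made for the example.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper/lower-group discrimination index for one right/wrong item.

    total_scores -- each candidate's overall exam score
    item_correct -- True/False per candidate for this item, in the same order
    fraction     -- share of candidates in each group (27% in the text)
    """
    n = len(total_scores)
    if n != len(item_correct):
        raise ValueError("total_scores and item_correct must be the same length")
    # Assumption: the group size is 27% of candidates, rounded to a whole number.
    group_size = max(1, round(n * fraction))
    if 2 * group_size > n:
        raise ValueError("too few candidates to form two separate groups")
    # Rank candidates from weakest to strongest by overall score.
    # Assumption: ties are broken by input order (sorted() is stable).
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = order[:group_size]
    upper = order[-group_size:]
    upper_correct = sum(1 for i in upper if item_correct[i])
    lower_correct = sum(1 for i in lower if item_correct[i])
    return (upper_correct - lower_correct) / group_size


# Hypothetical data: 10 candidates, so each group holds 3 candidates.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
correct = [False, False, False, True, False, True, True, True, True, True]
print(discrimination_index(scores, correct))  # 1.0
```

In this made-up data set all three upper-group candidates answer correctly and none of the lower group do, so the item comes out as fully discriminating.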

Of course, if you have sophisticated exam software such as Maxexam then it will calculate the discrimination index for each item for you.

How discriminating is an item?

If an item was fully discriminating (which never happens in reality!), then everyone in your upper group would get it right and everyone in your lower group would get it wrong – leading to a discrimination index of 1.

However, there is no such thing as the ‘perfect’ item, and the general guidelines are as follows:

Below 0 – negative discrimination; usually a bad sign. It could indicate a broken item, i.e. one that was very misleading or even mis-keyed (so the wrong option has been marked as the correct one in the system)

0 – 0.2 – not discriminating

0.2 – 0.4 – starting to become discriminating

0.4 and above – strongly discriminating; in practice, it can be difficult to obtain a DI greater than 0.4.
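As a rough sketch, the guideline bands above can be turned into a small lookup. The function name is hypothetical, and the treatment of the boundary values is an assumption: the bands as listed share the endpoints 0.2 and 0.4, so here each boundary is assigned to the higher band.

```python
def describe_discrimination(di):
    """Map a discrimination index to the guideline bands listed above."""
    if not -1.0 <= di <= 1.0:
        raise ValueError("a discrimination index must lie between -1 and 1")
    if di < 0:
        return "negative discrimination"
    if di < 0.2:
        return "not discriminating"
    if di < 0.4:
        # Assumption: exactly 0.2 counts as the start of this band.
        return "starting to become discriminating"
    # Assumption: exactly 0.4 counts as strongly discriminating.
    return "strongly discriminating"


for value in (-0.15, 0.1, 0.3, 0.45):
    print(value, describe_discrimination(value))
```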

Should all items be discriminating?

An item that is to be used for ‘ranking’ within an exam (i.e. to help differentiate your good candidates from your bad candidates – see our blog) should ideally have a high discrimination index.

However, an item that is considered ‘essential knowledge’, i.e. something that every candidate should know, can be expected to have a discrimination index close to 0, because almost everyone in both groups answers it correctly. For example, it’s important that every student knows to wash their hands with hot, soapy water. An essential knowledge item may even have a very slightly negative discrimination index (e.g. -0.1) without that signalling an issue.

Does the Discrimination Index on its own tell you if an item is working as expected?

The discrimination index shouldn’t be interpreted as a standalone value. As mentioned above, the desirable discrimination index depends on whether an item is easy/essential knowledge or hard/ranking. This means that you need to understand the purpose of the item and look at other statistics, such as the mean of the item.

The mean is the average score on the item i.e. the percentage of people who got it correct, so a mean of 100% means that everyone got it correct. The mean is measured between 0-100%.

If an item has a discrimination index of 0, it is not discriminating at all, i.e. the same number of people in the upper group got it correct as in the lower group. Whether that is an issue depends both on the type of item and on what proportion of candidates overall got it correct (the mean).

The item could be testing essential knowledge – if the mean is high (approaching 100%) then a very low discrimination index is not a problem.

However, if it is a tricky question that only a small percentage of candidates got correct, then a low discrimination index is a problem: it suggests the weaker candidates answered the item correctly just as often as the stronger ones, so it is not discriminating between them.
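The reasoning above, reading the discrimination index together with the mean, could be sketched as a simple review rule. The thresholds here (a mean of 90% for an ‘easy’ item, -0.1 as the most negative DI tolerated on such an item) are assumptions chosen for illustration, not values prescribed by the article.

```python
def review_item(di, mean_pct, easy_mean_pct=90.0, low_di=0.2, slight_negative=-0.1):
    """Suggest whether an item needs review, using its DI and mean together.

    di       -- discrimination index, between -1 and 1
    mean_pct -- percentage of candidates who got the item correct (0-100)
    All threshold parameters are hypothetical defaults.
    """
    if di >= low_di:
        return "discriminating"
    # Low DI is expected when nearly everyone answers correctly.
    if mean_pct >= easy_mean_pct and di >= slight_negative:
        return "acceptable for an easy item"
    # Low DI on a harder item, or a clearly negative DI, deserves a closer look.
    return "review"


print(review_item(0.0, 97.0))   # acceptable for an easy item
print(review_item(0.0, 30.0))   # review
```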

A worked example

[Figure: worked example of the discrimination index calculation]
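As an illustration with hypothetical numbers (they are not taken from the figure), suppose 100 candidates sit the exam, so the upper and lower groups each contain 27 candidates. If 20 candidates in the upper group and 8 in the lower group answer the item correctly, the calculation is:

```latex
\[
\mathrm{DI} = \frac{U - L}{n} = \frac{20 - 8}{27} \approx 0.44
\]
```

Here $U$ and $L$ are the numbers of correct answers in the upper and lower groups and $n$ is the size of one group; the symbols are introduced only for this example. A value of about 0.44 would count as strongly discriminating under the guidelines given earlier.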

Does the Discrimination Index only work for the correct answer?

Whilst the discrimination index is usually calculated for the correct answer, it can also be calculated for the ‘distractors’, i.e. those options that aren’t the correct response to the item. Ideally, these should all have a negative discrimination index; if one doesn’t, then at least as many of your strong candidates are picking it as your weak candidates. This is a great way to identify ‘broken’ distractors in an item, e.g. where another answer could be considered correct too.
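A per-option version of the calculation might look like the sketch below. It applies the same upper/lower-group formula to each option a candidate could choose; the function name, the rounding of the group size, and the sample data are assumptions for illustration.

```python
def option_discrimination(total_scores, chosen_options, fraction=0.27):
    """Discrimination index of every option chosen on one item.

    total_scores   -- each candidate's overall exam score
    chosen_options -- the option each candidate picked, in the same order
    """
    n = len(total_scores)
    if n != len(chosen_options):
        raise ValueError("inputs must be the same length")
    # Assumption: group size is 27% of candidates, rounded to a whole number.
    group_size = max(1, round(n * fraction))
    if 2 * group_size > n:
        raise ValueError("too few candidates to form two separate groups")
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:group_size], order[-group_size:]
    result = {}
    for option in sorted(set(chosen_options)):
        upper_count = sum(1 for i in upper if chosen_options[i] == option)
        lower_count = sum(1 for i in lower if chosen_options[i] == option)
        result[option] = (upper_count - lower_count) / group_size
    return result


# Hypothetical item whose keyed answer is "A".
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
chosen = ["B", "B", "C", "A", "B", "A", "C", "A", "A", "C"]
per_option = option_discrimination(scores, chosen)
# Distractors whose DI is not negative are worth a second look.
suspect = [opt for opt, di in per_option.items() if opt != "A" and di >= 0]
print(per_option)
print(suspect)
```

In this made-up data, option C is picked by one candidate in each group, so its DI is 0 and it would be flagged for review.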

Does the discrimination index only work for dichotomous items?

If the item offers half marks on some options (i.e. a candidate can get it neither fully right nor fully wrong), then an alternative formula is used. The DI is then the mean mark of the lower group subtracted from the mean mark of the upper group, divided by the maximum mark.

Using this formula, it is also possible to calculate the DI at Question, Scenario and Exam levels, meaning the DI is very flexible. Using Maxexam, this can help you to determine, for example, if an entire exam is meeting your objectives for distinguishing between higher and lower performing students.
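The mean-mark formula described above can be sketched as follows. As with the earlier sketches, the function name, the rounding of the 27% group size, and the tie handling are assumptions for illustration only.

```python
def discrimination_index_marks(total_scores, item_marks, max_mark, fraction=0.27):
    """Discrimination index for an item that can award partial marks.

    total_scores -- each candidate's overall exam score
    item_marks   -- the mark each candidate received on this item
    max_mark     -- the maximum mark available on this item
    """
    n = len(total_scores)
    if n != len(item_marks):
        raise ValueError("inputs must be the same length")
    if max_mark <= 0:
        raise ValueError("max_mark must be positive")
    # Assumption: group size is 27% of candidates, rounded to a whole number.
    group_size = max(1, round(n * fraction))
    if 2 * group_size > n:
        raise ValueError("too few candidates to form two separate groups")
    # Assumption: ties in overall score are broken by input order.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:group_size], order[-group_size:]
    upper_mean = sum(item_marks[i] for i in upper) / group_size
    lower_mean = sum(item_marks[i] for i in lower) / group_size
    return (upper_mean - lower_mean) / max_mark


# Hypothetical item worth 1 mark, with half marks available.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
marks = [0, 0.5, 0, 1, 0.5, 1, 0.5, 1, 1, 0.5]
print(round(discrimination_index_marks(scores, marks, max_mark=1), 2))  # 0.67
```

When every mark is either 0 or the maximum, this reduces to the right/wrong formula, because each group's mean mark divided by the maximum is simply its proportion of correct answers.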

Is the discrimination index always the best way of measuring the performance of your items?

It can be argued that because the discrimination index only looks at the top and bottom 27% of candidates, alternative methods should also be considered as this statistic only uses 54% of the whole data. The discrimination index is a very powerful and meaningful way of measuring the performance of your items, but if your exams only have small numbers of candidates, removing the other 46% of data can have quite an impact. Other measures like Pearson’s product-moment correlation coefficient use 100% of the data and a broader picture may be built up by looking at these calculations in parallel.
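For comparison, a correlation-based measure that uses every candidate can be sketched as below. This is a plain implementation of Pearson's product-moment correlation between item scores and total scores; the function name is hypothetical, and the sketch assumes the total score includes the item itself (no correction for overlap).

```python
from math import sqrt


def pearson_item_total(item_scores, total_scores):
    """Pearson correlation between one item's scores and the total scores."""
    n = len(item_scores)
    if n != len(total_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 candidates")
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    # Sums of products and squares of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    if var_x == 0 or var_y == 0:
        # E.g. every candidate got the item right: the correlation is undefined.
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: four candidates, item scored 0 or 1.
print(round(pearson_item_total([0, 0, 1, 1], [1, 2, 3, 4]), 3))  # 0.894
```

Unlike the upper/lower-group calculation, every candidate contributes to this value, which is why it can be more stable for small cohorts.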

In summary

The discrimination index is one widely used way of helping you to understand how the items in your exam are performing. Other methods, which can be used alongside the discrimination index, include Pearson’s product-moment correlation coefficient (PPC), Horst, and Cronbach’s Alpha, and we will take a deeper look at these over the coming few months. Some also suggest the use of Point Biserial, and we will address the pros and cons of this methodology in our next blog.

FAQs

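How to calculate item reliability index?

The item reliability index is the product of the item-score standard deviation and the item's discrimination, i.e. $s_i \cdot r_{iX}$, where $s_i$ is the standard deviation of scores on item $i$ and $r_{iX}$ is the correlation between item $i$ and the total score $X$.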

How do you measure item reliability?

Cronbach's alpha is the most popular measure of internal-consistency reliability; it depends on the number of items in a measurement scale and the average correlation between them. If the items have variances that differ significantly, standardized alpha is preferred. The more consistently the items measure the same thing, the closer coefficient alpha gets to 1.

How do you evaluate discrimination index?

The discrimination index is based on the top 27% and bottom 27% of the class on the exam. It is computed by subtracting the number of successes on the item in the low group from the number of successes in the high group, and dividing this difference by the number of candidates in one group.

What does a discrimination index of 0.34 mean in test item analysis?

The value of the discrimination index (DIS) ranges from -1.00 to +1.00. Items with negative values are non-discriminating, while items with positive values are discriminating. Discriminating items are categorized as poor (≤ 0.20), acceptable (0.21 to 0.24), good (0.25 to 0.34), and excellent (≥ 0.35) [14, 28, 33, 34]. On this scale, a value of 0.34 sits at the top of the good band.

What is the most common measure of index reliability?

By far the most commonly used internal consistency index is Cronbach's coefficient alpha, the formula for which can be found in psychometric texts. Cronbach's alpha is formulated as the mean of all possible split-half coefficients.

What is the best way to measure reliability?

Four major ways of assessing reliability are test-retest, parallel test, internal consistency, and inter-rater reliability. In theory, reliability is the ratio of true-score variance to observed-score variance. In practice, reliability is largely an empirical issue, concerned with how an empirical measure actually performs.

What tools can be used to measure reliability?

This measure of reliability is described most often using Cronbach's alpha (sometimes called coefficient alpha). It measures how consistently participants respond to one set of items. You can think of it as a sort of average of the correlations between items.

What does an item discrimination index value 0.4 mean?

Items with a negative discrimination index (D) were considered for discarding. For the remaining items: D of 0.0 – 0.19 indicates a poor item, to be revised; 0.2 – 0.29 is acceptable; 0.3 – 0.39 is good; and > 0.4 is excellent.

What is the positive discrimination index?

A positive discrimination index indicates that those students who got the test item correct also had a high overall exam score. When the discrimination index is negative it means that the examinees in the low performing group got the answer correct at a higher rate than the higher performing group.

How do you calculate reliability change index?

A Reliable Change Index (RCI) is computed by dividing the difference between the pretreatment and posttreatment scores by the standard error of the difference between the two scores.

Author: Nicola Considine CPA
