Content Validity in Research: Definition & Examples

  • Researchers must establish content validity to ensure that their study measures what it intends to measure.
  • There are several ways to establish content validity, including expert opinion, focus groups, and surveys.

What Is Content Validity?

Content validity is the degree to which the elements of an assessment instrument are relevant to and representative of the targeted construct for a particular assessment purpose.

This encompasses aspects such as the appropriateness of the items, tasks, or questions to the specific domain being measured and whether the assessment instrument covers a broad enough range of content to enable conclusions to be drawn about the targeted construct (Rossiter, 2008).

One example of an assessment with high content validity is the Iowa Test of Basic Skills (ITBS). The ITBS is a standardized test that has been used since 1935 to assess the academic achievement of students from kindergarten through grade 8.

The test covers a wide range of academic skills, including reading, math, language arts, and social studies. The items on the test are carefully developed and reviewed by a panel of experts to ensure that they are fair and representative of the skills being tested.

As a result, the ITBS has high content validity and is widely used by schools and districts to measure student achievement.

By contrast, most driving tests have low content validity. The questions on the test are often not representative of the skills needed to drive safely. For example, many driving permit tests do not include questions about how to parallel park or how to change lanes.

Driving license tests also often fail to assess drivers in non-ideal conditions, such as rain or snow. As a result, these tests do not provide an accurate measure of a person’s ability to drive safely.

The higher the content validity of an assessment, the more accurately it can measure what it is intended to measure — the target construct (Rossiter, 2008).

Why is content validity important in research?

Content validity is important in research as it provides confidence that an instrument is measuring what it is supposed to be measuring.

This is particularly relevant when developing new measures or adapting existing ones for use with different populations. It also has implications for the interpretation of results, as findings can only be accurately applied to groups for which the content validity of the measure has been established.

Step-by-step guide: How do you measure content validity?

Haynes et al. (1995) emphasized the importance of content validity and gave an overview of ways to assess it.

One of the first ways of measuring content validity was the Delphi method, which was developed at the RAND Corporation in the 1950s as a way of systematically eliciting and refining expert forecasts. The method involves a group of experts who make predictions about the future and then work toward a consensus on those predictions. Today, the Delphi method is most commonly used in medicine.

In a content validity study using the Delphi method, a panel of experts is asked to rate the items on an assessment instrument on a scale. The expert panel also has the opportunity to add comments about the items.

After all ratings have been collected, the average item rating is calculated. In the second round, the experts receive summarized results of the first round and are able to make further comments and revise their first-round answers.

This back-and-forth continues until some homogeneity criterion is achieved, that is, until the experts’ ratings agree to a predefined degree (Koller et al., 2017).
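
As a rough sketch of this procedure, the code below summarizes two simulated rating rounds for a single item. The 4-point scale, the use of the standard deviation as the homogeneity criterion, and the 1.0 threshold are all illustrative assumptions, not requirements of the method.

```python
import statistics

def summarize_round(ratings):
    """Summarize one Delphi round for a single item.

    ratings: one score per expert on a 1-4 relevance scale (illustrative).
    """
    return {
        "mean": statistics.mean(ratings),
        "spread": statistics.stdev(ratings),  # illustrative homogeneity criterion
    }

# Round 1: each expert rates the item independently.
round_1 = [4, 2, 3, 4, 1]
print(summarize_round(round_1))  # wide spread, so the summary goes back to the panel

# Round 2: experts see the round-1 summary and may revise their ratings.
round_2 = [4, 3, 3, 4, 3]
summary = summarize_round(round_2)
if summary["spread"] < 1.0:  # threshold chosen purely for illustration
    print("Homogeneity criterion met:", summary)
```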

Lawshe (1975) and Lynn (1986) created numerical methods to assess content validity: Lawshe’s content validity ratio (CVR) and Lynn’s content validity index (CVI). A content validity index is a statistical measure of the degree to which an assessment instrument covers the content domain of interest.

There are two steps in calculating a content validity index:

1) Determining the number of items that should be included in the assessment instrument;

2) Determining the percentage of items that actually are included in the assessment instrument.

The first step, determining the number of items that should be included in an assessment instrument, can be done using one of two approaches: item sampling or expert consensus.

Item sampling involves selecting a sample of items from a larger set of items that cover the content domain. The number of items in the sample is then used to estimate the total number of items needed to cover the content domain.

This approach has the advantage of being quick and easy, but it can be biased if the sample of items is not representative of the larger set (Koller et al., 2017).

The second approach, expert consensus, involves asking a group of experts how many items should be included in an assessment instrument to adequately cover the content domain. This approach has the advantage of being more objective, but it can be time-consuming and expensive.

Experts can then assign these items to the dimensions of the construct being measured and give each item a relevance rating that indicates how strongly it measures the construct.
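
As a concrete illustration, here is a minimal sketch of both statistics, using invented ratings from a hypothetical five-expert panel. Lynn’s item-level CVI is the proportion of experts who rate an item as relevant (3 or 4 on a 4-point scale); Lawshe’s CVR is (n_e - N/2) / (N/2), where n_e is the number of experts rating the item essential and N is the panel size.

```python
def item_cvi(ratings, relevant_min=3):
    """Lynn's item-level CVI: the proportion of experts who rate the
    item as relevant (3 or 4 on a 4-point scale)."""
    return sum(r >= relevant_min for r in ratings) / len(ratings)

def lawshe_cvr(n_essential, n_experts):
    """Lawshe's content validity ratio, ranging from -1 (no expert
    rates the item essential) to +1 (every expert does)."""
    half = n_experts / 2
    return (n_essential - half) / half

ratings = [4, 3, 4, 2, 4]  # hypothetical relevance ratings from five experts
print(item_cvi(ratings))                       # 0.8
print(lawshe_cvr(n_essential=4, n_experts=5))  # 0.6
```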

Although various attempts to quantify the process of measuring content validity exist, there is no systematic procedure that can serve as a general guideline for the evaluation of content validity (Newman et al., 2013).

When is content validity used?

Education Assessment

In the context of educational assessment, validity is the extent to which an assessment instrument accurately measures what it is intended to measure. Validity concerns anyone who is making inferences and decisions about a learner based on data.

This can have deep implications for students’ education and future. For instance, a test that poorly measures students’ abilities can lead to placement in a future course that is unsuitable for the student and, ultimately, to the student’s failure (Obilor, 2022).

There are a number of factors that specifically affect the validity of assessments given to students, such as (Obilor, 2018):

Unclear Direction: If directions do not clearly indicate to the respondent how to respond to the tool’s items, the validity of the tool is reduced.

Vocabulary: If respondents’ vocabulary is poor and they do not understand the items, the validity of the instrument is affected.

Poorly Constructed Test Items: If items are constructed in such a way that they have different meanings for different respondents, validity is affected.

Difficulty Level of Items: In an achievement test, too easy or too difficult test items would not discriminate among students, thereby lowering the validity of the test.

Influence of Extraneous Factors: Extraneous factors like the style of expression, legibility, mechanics of grammar (spelling, punctuation), handwriting, and length of the tool, amongst others, influence the validity of a tool.

Inappropriate Time Limit: In a speed test, too generous a time limit invalidates the results as a measure of speed. In a power test, an inappropriate time limit will lower the validity of the test.

Interviews

There are a few reasons why interviews may lack content validity. First, interviewers may ask different questions or place different emphases on certain topics across different candidates. This can make it difficult to compare candidates on a level playing field.

Second, interviewers may have their own personal biases that come into play when making judgments about candidates.

Finally, the interview format itself may be flawed. For example, many companies ask potential programmers to complete brain teasers, such as estimating the number of plumbers in Chicago, or coding tasks that rely heavily on theoretical knowledge of data structures, even if this knowledge would rarely or never be used on the job.

Questionnaires

Questionnaires rely on the respondents’ ability to accurately recall information and report it honestly. Additionally, the way in which questions are worded can influence responses.

To increase content validity when designing a questionnaire, careful consideration must be given to the types of questions that will be asked.

Open-ended questions are typically less biased than closed-ended questions, but they can be more difficult to analyze. It is also important to avoid leading or loaded questions that might influence respondents’ answers in a particular direction. The wording of questions should be clear and concise to avoid confusion (Koller et al., 2017).

FAQs

Is content validity internal or external?

Most experts agree that content validity is primarily an internal issue. This means that the concepts and items included in a test should be based on a thorough analysis of the specific content area being measured.

The items should also be representative of the range of difficulty levels within that content area. External factors, such as the opinions of experts or the general public, can influence content validity, but they are not necessarily the primary determinant.

In some cases, such as when developing a test for licensure or certification, external stakeholders may have a strong say in what is included in the test (Koller et al., 2017).

How can content validity be improved?

There are a few ways to increase content validity. One is to create items that are more representative of the targeted construct. Another is to increase the number of items on the assessment so that it covers a greater range of content.

Finally, experts can review the items on the assessment to ensure that they are fair and representative of the skills being tested (Koller et al., 2017).

How do you test the content validity of a questionnaire?

There are a few ways to test the content validity of a questionnaire. One way is to ask experts in the field to review the questions and provide feedback on whether or not they believe the questions are relevant and cover all important topics.

Another way is to administer the questionnaire to a small group of people and then analyze the results to see if there are any patterns or themes emerging from the responses.

Finally, it is also possible to use statistical methods to test for content validity, although this approach is more complex and usually requires access to specialized software (Koller et al., 2017).

How can you tell if an instrument is content-valid?

There are a few ways to tell if an instrument is content-valid. The first involves looking at two closely related forms of validity: face validity and construct validity.

Face validity is a measure of whether or not the items on the test appear to measure what they claim to measure. This is highly subjective but convenient to assess.

Another way is to look at the construct validity, which is whether or not the items on the test measure what they are supposed to measure. Finally, you can also look at the criterion-related validity, which is whether or not the items on the test predict future performance.

What is the difference between content and criterion validity?

Content validity is a measure of how well a test covers the content it is supposed to cover.

Criterion validity, meanwhile, is an index of how well a test correlates with an established standard of comparison or a criterion.

For example, if a measure of criminal behavior is criterion valid, then it should be possible to use it to predict whether an individual will be arrested in the future for a criminal violation, is currently breaking the law, and has a previous criminal record (American Psychological Association).
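
In practice, criterion validity is typically reported as a correlation between scores on the instrument and scores on the criterion. Here is a minimal sketch with invented data; the variable names and values are purely illustrative.

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between instrument scores and a criterion measure."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / ((len(x) - 1) * statistics.stdev(x) * statistics.stdev(y))

test_scores = [10, 12, 9, 15, 11]  # hypothetical scores on the measure
criterion = [2, 3, 1, 5, 3]        # hypothetical criterion (e.g., recorded violations)
print(pearson_r(test_scores, criterion))  # a value near 1.0 suggests high criterion validity
```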

Are content validity and construct validity the same?

Content validity is not the same as construct validity.

Content validity is a method of assessing the degree to which a measure covers the range of content that it purports to measure.

In contrast, construct validity is a method of assessing the degree to which a measure reflects the underlying construct that it purports to measure.

It is important to note that content validity and construct validity are not mutually exclusive; a measure can be valid with respect to one but not the other.

However, content validity is a necessary but not sufficient condition for construct validity. That is, a measure cannot be construct valid if it does not first have content validity (Koller et al., 2017).

For example, an academic achievement test in math may have content validity if it contains questions from all areas of math a student is expected to have learned before the test, but it may not have construct validity if it does not somehow relate to tests of similar and different constructs.

How many experts are needed for content validity?

There is no definitive answer to this question as it depends on a number of factors, including the nature of the instrument being validated and the purpose of the validation exercise.

However, in general, a minimum of three experts should be used in order to ensure that the content validity of an instrument is adequately established (Koller et al., 2017).

References

American Psychological Association. (n.d.). Content validity. APA Dictionary of Psychology.

Haynes, S. N., Richard, D., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7(3), 238-247.

Koller, I., Levenson, M. R., & Glück, J. (2017). What do you think you are measuring? A mixed-methods procedure for assessing the content validity of test items and theory-based scaling. Frontiers in Psychology, 8, 126.

Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563-575.

Lynn, M. R. (1986). Determination and quantification of content validity. Nursing Research, 35(6), 382-385.


Obilor, E. I. (2018). Fundamentals of research methods and statistics in education and social sciences. Port Harcourt: SABCOS Printers & Publishers.

Obilor, E. I., & Miwari, G. U. (2022). Content validity in educational assessment.

Newman, I., Lim, J., & Pineda, F. (2013). Content validity using a mixed methods approach: Its application and development through the use of a table of specifications methodology. Journal of Mixed Methods Research, 7(3), 243-260.

Rossiter, J. R. (2008). Content validity of measures of abstract constructs in management and organizational research. British Journal of Management, 19(4), 380-388.

Saul McLeod, PhD

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Educator, Researcher

Saul McLeod, PhD, is a qualified psychology teacher with over 18 years’ experience of working in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

BSc (Hons) Psychology, MSc Psychology of Education

Associate Editor for Simply Psychology

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

Charlotte Nickerson

Research Assistant at Harvard University

Undergraduate at Harvard University

Charlotte Nickerson is a student at Harvard University obsessed with the intersection of mental health, productivity, and design.