Evaluating the Validity and Reliability of Authentic Learning Instruments using RASCH Model



Introduction
This study forms the needs analysis phase in developing an authentic learning model for Malaysian Polytechnics. In this phase, a Likert-scale questionnaire was used to obtain feedback from Malaysian Polytechnic lecturers. The questionnaire is adapted from The Evaluation Tool of Authentic Learning (Aziz et al., 2013). However, Likert-scale scores have no clear meaning on their own in social science research (Arbuckle, 2006; Johns, 2010). This is because Likert-scale data are ordinal: they only categorize the views of each respondent without providing an accurate interpretation of each view. Therefore, Rasch model analysis was used to evaluate the validity and reliability of each questionnaire item. The ability of Rasch model analysis to measure latent traits by converting nominal and ordinal data to interval and ratio data is invaluable in social science studies. The logit scale created in the analysis allows nominal and ordinal Likert-scale data to be expressed as interval and ratio data that are scientifically meaningful and give a clear meaning to each item tested.
International Journal of Global Optimization and Its Application, Vol. 1, No. 3, September 2022, pp. 182-189.

Authentic Learning Instrument Development
The Evaluation Tool of Authentic Learning, based on Herrington and Oliver (Aziz et al., 2013), is used as the instrument in this study. The questionnaire consists of 39 items in five parts (Part A to Part E). In summary, Part A collects respondent demographic data and contains 6 items. Part B tests the respondent's knowledge of authentic learning and contains 7 items. Part C to Part E cover the authentic learning constructs adapted from The Evaluation Tool of Authentic Learning. These parts contain 26 items that identify how Malaysian Polytechnic lecturers implement authentic learning in terms of teaching content, assignment and assessment. Each question is rated on a Likert scale from 1 to 5 (1 - Hardly ever; 2 - Seldom; 3 - Sometimes; 4 - Often; 5 - Always). The scale is intended to capture how frequently lecturers implement authentic learning in their teaching content, teaching activity and teaching assessment.

Rasch Model Analysis -Logit Scale
In the measurement scale, the data hierarchy runs from nominal data through ordinal and interval data to ratio data (Velleman & Wilkinson, 1993). Ratio data sits at the top of the hierarchy because it includes all the features of interval, ordinal and nominal data (Creswell, 2002). In statistical analysis, data high in the hierarchy can be lowered to a lower level according to the suitability of the test (Blomberg et al., 2003). On the other hand, data low in the hierarchy cannot ordinarily be elevated to a higher level (Herrington et al., 2014). However, Rasch model analysis can raise data from a lower to a higher level through the formation of logit scales.

Table 1 shows an example in which one question is put to 100 respondents. The level of agreement is recorded for each respondent who answers. When the response frequency is converted into a probability ratio, the ordinal data has, indirectly, been converted to interval data. This interval data is then converted to ratio data. To obtain equal intervals (-2, 0, 2) that allow measurement, the ratio data is converted to its logarithmic value. In Rasch model analysis, the interval (-2, 0, 2) is known as the logit scale. The logit scale provides a basis for representing human responses in a scientifically defensible way in social science research.

Table 2 lists the lecturers selected from each polytechnic using stratified sampling. The sample comprised 36 lecturers, each with at least five years of work experience. A minimum of five years of working experience is considered sufficient to ensure that lecturers have expertise in their respective fields (Jamil et al., 2014). Six Malaysian Polytechnics that offer engineering programs (civil, mechanical and electrical) were selected based on the similarity of the programs they offer. This means that each selected polytechnic offers courses with the same syllabus, which avoids inconsistencies in the delivery of instruction among the lecturers.
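The frequency-to-logit conversion described for Table 1 can be sketched as follows. This is a minimal illustration, not the study's own computation: the counts (1, 50 and 99 agreements out of 100) are hypothetical, and the base-10 logarithm is an assumption chosen because it reproduces the (-2, 0, 2) spacing mentioned in the text. Rasch logits are conventionally defined with the natural logarithm.

```python
import math

def frequency_to_log_odds(agree, total):
    """Convert an agreement count into probability, odds and log10-odds.

    Assumption: base-10 logarithm, which gives roughly -2, 0 and +2 for
    1/100, 50/100 and 99/100; standard Rasch logits use the natural log.
    """
    if not 0 < agree < total:
        raise ValueError("agree must be strictly between 0 and total")
    probability = agree / total
    odds = agree / (total - agree)
    return probability, odds, math.log10(odds)

# Hypothetical counts for one question put to 100 respondents (not study data).
for agree in (1, 50, 99):
    p, odds, log_odds = frequency_to_log_odds(agree, 100)
    print(f"{agree:>3}/100  p={p:.2f}  odds={odds:.4f}  log-odds={log_odds:+.2f}")
```

The sketch shows why the logarithm is taken: the odds are bunched near zero on one side and stretch far out on the other, while their logarithms are spaced symmetrically around zero.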

Instruments
A questionnaire was used to assess the validity and reliability of the instrument for authentic learning implementation in Malaysian Polytechnics. The questionnaire was adapted from The Evaluation Tool of Authentic Learning (Aziz et al., 2013). It consists of 39 items in five parts (Part A to Part E). In summary, Part A collects respondent demographic data and contains 6 items. Part B tests the respondent's knowledge of authentic learning and contains 7 items. Part C to Part E cover the authentic learning constructs adapted from The Evaluation Tool of Authentic Learning. They contain 26 items that identify how Malaysian Polytechnic lecturers implement authentic learning in terms of teaching content, assignment and assessment.

Data Analysis
The construct validity of each questionnaire item is assessed through the unidimensionality analysis performed in Rasch model analysis, where unidimensionality is examined through principal component analysis. According to Azrilah et al. (Arbuckle, 2006), unidimensionality is critical for ensuring that an instrument measures in one direction and on one dimension only when the data are on an ordinal scale. Othman (Herrington et al., 2014) states that an expert panel alone is not sufficient to validate a questionnaire through content validity; it should be supported by statistical analysis such as construct validity. Instruments that contain unclear items will provide misleading results. For an instrument to show good unidimensionality, Rasch model analysis requires that the raw variance explained by measures be at least 40% and that the unexplained variance in the first contrast not exceed the 15% control limit (Arbuckle, 2006).
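The two unidimensionality criteria above amount to a simple threshold check, sketched below. The function name and the example percentages are hypothetical; only the 40% and 15% limits come from the text, and the two input values are assumed to be read from the dimensionality output of the Rasch software.

```python
def check_unidimensionality(explained_pct, first_contrast_pct,
                            min_explained=40.0, max_first_contrast=15.0):
    """Apply the two thresholds quoted in the text.

    explained_pct: raw variance explained by measures, in percent.
    first_contrast_pct: unexplained variance in the first contrast, in percent.
    """
    explained_ok = explained_pct >= min_explained
    contrast_ok = first_contrast_pct <= max_first_contrast
    return {
        "explained_ok": explained_ok,
        "first_contrast_ok": contrast_ok,
        "unidimensional": explained_ok and contrast_ok,
    }

# Hypothetical percentages for illustration only (not results of this study).
print(check_unidimensionality(52.3, 9.1))
print(check_unidimensionality(35.0, 16.2))
```

Keeping the limits as parameters makes it straightforward to apply a stricter convention if a different reference is followed.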

Validity
A reliability test needs to be conducted on the instrument to examine the stability and consistency of the items with each other. According to Rozmi (Creswell, 2002), an instrument must have a high degree of reliability and use an exact measurement tool. An instrument has no reliability if the researcher obtains a different score when the test is repeated at a different time under the same conditions or requirements. In this study, the researchers use Rasch model analysis to test the reliability of the instrument.

Figure 1 shows that the raw variance explained by measures reached 85.1%, above the Rasch model expectation of 84.7%. Raw variance explained by measures is the sum-of-squares of the Rasch-predicted observations, based on the item difficulties, person abilities and rating scale structures, around their central values. This result substantially exceeds the minimum requirement of 40%. In addition, the unexplained variance in the first contrast indicates how much of the measurement is disturbed by interference, or noise. The measured noise level is 2.8%, which is acceptable because it is well below the maximum control limit of 15%. Therefore, the construct validity analysis indicates that the instrument is highly unidimensional.

Construct Validity
Figure 2 shows the summary statistics for the reliability tests performed on the authentic learning instrument.

Figure 2. The Summary Statistics
The consistency of instrument scores is shown through the value of Cronbach's Alpha (α) (Talib, 2015). Cronbach's Alpha (α) is also used as an index of the internal reliability of an instrument (Ismail, 2016; Saibani et al., 2011). Arbuckle (2006) states that the value of Cronbach's Alpha (α) lies between 0 (no internal reliability) and 1 (perfect internal reliability), with a minimum acceptable score of .70. The value of .86 obtained in this study shows that the reliability of the instrument is high. The item reliability value indicates how adequately the items measure what they are intended to measure. The value of 1.00 obtained here indicates that the instrument contains more than enough items to measure the intended construct. Higher-quality items produce a larger Person Separation, which distinguishes more clearly between the groups of persons being studied. The Person Reliability of .84 indicates how likely a person's responses are to be reproduced when the same test is repeated. Hence, this authentic learning instrument has a high level of reliability.
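For readers who wish to see how an internal consistency index of this kind is obtained, the classical raw-score formula for Cronbach's Alpha can be sketched as below. This is an illustration under stated assumptions, not the procedure used by the Rasch software in this study: the function name and the small response matrix are hypothetical, and sample variances are used throughout.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Classical Cronbach's Alpha from a respondents-by-items score matrix.

    scores: list of rows, one per respondent, each a list of item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    n_items = len(scores[0])
    if len(scores) < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical 1-5 Likert responses: 5 respondents x 4 items (not study data).
responses = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 4, 3, 4],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

A value at or above the .70 minimum cited in the text would be read as acceptable internal reliability.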
In brief, the variable map shows that the probability of success depends on the difference between the ability of a respondent and the difficulty of an item (Arbuckle, 2006). The Rasch model expresses, in the form of a mathematical equation, the probability that respondent 'n' succeeds on item 'i' given the respondent's ability and the item's difficulty. On the logit scale, this is illustrated as follows:

Logit (log-odds) of Success = Respondent Ability - Item Difficulty
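The relation between ability, difficulty and the chance of success can be made concrete with a short sketch. It assumes the dichotomous form of the Rasch model, in which the difference between ability and difficulty is a log-odds that the logistic function turns into a probability. The function name and the logit values are hypothetical, and the five-point rating scale used in this study would additionally require category threshold parameters that are not modelled here.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of success under the dichotomous Rasch model.

    ability and difficulty are in logits; their difference is the
    log-odds of success, mapped to a probability by the logistic function.
    """
    logit = ability - difficulty
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical abilities and difficulties in logits (not study data).
for ability, difficulty in [(-1.0, 1.0), (0.0, 0.0), (2.0, 0.5)]:
    p = rasch_probability(ability, difficulty)
    print(f"ability={ability:+.1f} difficulty={difficulty:+.1f} -> P={p:.3f}")
```

When ability equals difficulty the probability is exactly 0.5, which is why persons and items placed at the same height on a variable map are regarded as evenly matched.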
Therefore, the probability of success can be summarized as follows:

P(success) = exp(Bn - Di) / (1 + exp(Bn - Di))

where Bn is the ability of respondent n and Di is the difficulty of item i.

Figure 3. Variable map diagram.

Figure 3 displays the variable map, a visual representation of the relative positions of respondents and items along the measured dimension. According to Sumintono & Widhiarso (2014), the variable map can show how weak and strong respondents respond to easy and difficult items. The variable map shows that part 1 contains authentic learning elements that are always carried out by respondents, together with high respondent knowledge of the authentic learning approach. Part 2 contains authentic learning elements that are sometimes performed by respondents. Part 3 contains authentic learning elements that are not performed by respondents, together with low respondent knowledge of the authentic learning approach. Visually, the variable map shows where each respondent's responses fall relative to the items in the questionnaire. Weak and strong respondents are evenly distributed across easy and difficult items. Therefore, the authentic learning instrument used in this study demonstrates high reliability when viewed visually.

The positions of the items in the variable map are described in more detail in the item fit order. Figure 4 shows the fit of each item in the instrument. It lists the logit measure of each item, which helps in understanding the position of the item in the variable map. It also reports the Mean Square (MNSQ), z-Std and Point Measure Correlation (PMC) values used to detect outlier or misfit items. The assessment of item fit starts with the Infit Mean Square (MNSQ). In theory, MNSQ is the ratio of what is observed to what is expected, so when an observation meets expectations the ideal value is MNSQ = 1. The acceptable infit MNSQ range is 0.50 to 1.48. The findings show that the MNSQ values range from 0.52 to 1.38. Therefore, all tested items are appropriate, as they do not fall outside the prescribed infit range. In Rasch model analysis, a misfit suggested by the MNSQ value is reinforced when the item's z-Std value is large, that is, outside the limit of t = ±2. The findings show that the z-Std values range from -1.6 to +1.4. Therefore, all tested items are appropriate, as they do not exceed the designated limits. In addition, the Point Measure Correlation (PMC) can be used to decide whether an item is retained or dropped. The acceptable range depends on the purpose of the instrument. However, items with a negative PMC do not measure what should be measured and should be dropped. In this study, all tested items had positive PMC values (0.09-0.77) and were accordingly retained.
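The three screening rules described above (infit MNSQ range, z-Std limit and sign of the PMC) can be combined into a single check, sketched below. The class, the function and the item values are hypothetical; only the cut-offs of 0.50 to 1.48, ±2 and a non-negative PMC are taken from the text.

```python
from dataclasses import dataclass

@dataclass
class ItemFit:
    name: str
    infit_mnsq: float
    zstd: float
    pmc: float

def flag_misfit(item, mnsq_range=(0.50, 1.48), zstd_limit=2.0):
    """Return the reasons an item fails the fit criteria quoted in the text."""
    reasons = []
    low, high = mnsq_range
    if not low <= item.infit_mnsq <= high:
        reasons.append(f"infit MNSQ {item.infit_mnsq} outside {low}-{high}")
    if abs(item.zstd) > zstd_limit:
        reasons.append(f"z-Std {item.zstd} beyond +/-{zstd_limit}")
    if item.pmc < 0:
        reasons.append(f"negative PMC {item.pmc}")
    return reasons

# Hypothetical item statistics for illustration only (not Figure 4 values).
items = [
    ItemFit("C1", 0.97, -0.2, 0.61),
    ItemFit("D4", 1.62, 2.4, -0.05),
]
for item in items:
    reasons = flag_misfit(item)
    print(item.name, "fits" if not reasons else "; ".join(reasons))
```

Returning the reasons rather than a single pass or fail flag keeps it visible which criterion an item violates, which matters because the text treats z-Std as corroboration of MNSQ rather than as an independent rule.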

Conclusion
Measurement in social science research is not impossible. Latent traits can be measured scientifically by converting nominal and ordinal data to interval and ratio data with Rasch model analysis. In this study, the authentic learning instrument for Malaysian Polytechnics was found to meet all the validity and reliability requirements set out in Rasch model analysis. For validity, the raw variance explained by measures reached 85.1%, and the noise level of 2.8% is well below the maximum control limit of 15%. For reliability, the summary statistics, variable map and item fit order all meet the requirements. Therefore, this authentic learning instrument can be used in the subsequent phases of developing an authentic learning model in Malaysian Polytechnics.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.