
Journal of Contemporary Management

On-line version ISSN 1815-7440

JCMAN vol.14 n.1 Meyerton  2017

 

RESEARCH ARTICLES

 

Testing for measurement invariance for employee engagement across different demographic groups in South Africa

 

 

M Du PlessisI; N MartinsII

IDepartment of Human Resource Management, University of South Africa. vannim@unisa.ac.za
IIDepartment of Industrial and Organisational Psychology, University of South Africa. martin@unisa.ac.za

 

 


ABSTRACT

The purpose of this study was to assess the measurement invariance of the employee engagement instrument (EEI) as a factor of race and gender. Previous research has advised that existing engagement instruments cannot be used with confidence to make comparisons between different demographic groups in South Africa. Given the demographic differences between individuals, the focus of this study was to determine whether the instrument could be used with confidence among the mentioned demographic groups.
A quantitative research approach was followed in this study. The EEI was administered electronically to a sample of business people who were representative of the South African working population. A total of 4 099 completed questionnaires were received. Differential item functioning (DIF) was used to test for measurement invariance within an item response theory (IRT) framework.
The results revealed that the assumption of metric invariance does not hold across all six sub-scales. It was further found that DIF for race is less problematic than for gender. Eight items demonstrated DIF across gender. The Organisational Commitment, Immediate Manager and Strategy and Implementation sub-scales demonstrated the highest degree of invariance for race.

Key phrases: differential item functioning (DIF); employee engagement; gender; measurement invariance testing; race


 

 

1. INTRODUCTION

Employee engagement is a popular topic that continues to attract attention in modern management practices. Despite its popularity in the workplace, the concept is still highly fragmented and has little academic underpinning (Christian, Garza & Slaughter 2011:90; Macey & Schneider 2008:3-4; Nienaber & Martins 2014:485). Researchers (Goliath-Yarde & Roodt 2011:2; Van Rooy, Whitman, Hart & Caleo 2011:147; Viljevac, Cooper-Thomas & Saks 2012:3692) are further of the opinion that existing engagement instruments are poorly understood and fall short of identifying actionable insights and solutions.

Consequently, Nienaber and Martins (2014:494) identified the need to further clarify existing employee engagement theories and to develop and refine an instrument that measures employee engagement at both the individual and organisational level within a multicultural context. The results of their research culminated in a validated and reliable instrument (Nienaber & Martins 2015b:418) that demonstrates measurement invariance across various sectors (Martins 2015:771) within the South African context.

Testing for measurement invariance is an important prerequisite for making meaningful comparisons between groups, especially within the South African context (Meiring, Van de Vijver, Rothmann & Barrick 2015:1; Moerdyk 2009:75). Researchers (Heyns & Rothmann 2016:84; Marais, Mostert & Rothmann 2009:7; Meiring, Van de Vijver, Rothmann & Barrick 2005:3) have reported that demographic differences, such as race and gender, affect the psychometric properties of instruments and should thus be considered when standardising an instrument (Visser & Viviers 2010:7). Consequently, the purpose of this study was to assess the measurement invariance of the EEI as a factor of gender and race.

 

2. CURRENT THEORETICAL PERSPECTIVES

2.1 Employee engagement and measurement

Employee engagement has become a popular topic in recent years and continues to attract attention in modern management practices (Albrecht, Bakker, Gruman, Macey & Saks 2015:7; Griffith 2009:7; Imandin 2015:23). Employee engagement is a critical driver of organisations' success (Macey & Schneider 2008:3; Markos & Sridevi 2010:90) because of the role it plays in sustaining a competitive advantage, which could lead to improved business results and successful organisational performance (Nienaber & Martins 2014:485).

Studies have further linked employee engagement with a number of favourable organisational outcomes such as increased productivity, organisational commitment and loyalty, organisational citizenship behaviour, job satisfaction, customer satisfaction, reduced employee turnover and occupational accidents and improved health and wellness outcomes (Nienaber & Martins 2015a:18-19; Viljevac et al. 2012:3692).

Griffith (2009:7) consequently labels employee engagement as the "human resource craze", "the silver bullet" and a "magical formula" to enhance employee performance. As a result, employers want engaged, happier, healthier and more fulfilled employees who deliver improved business results and are associated with successful, high performing organisations.

Despite its popularity in the workplace, a precise definition of employee engagement remains elusive because of continued research and redefinition surrounding the topic (Albrecht et al. 2015:8; Imandin 2015:23). This confusion is further complicated by the misuse of the terms "work engagement" and "employee engagement" (Nienaber & Martins 2016:3). According to Schaufeli and Salanova (2011:42) employee engagement is a broader concept than work engagement.

Employee engagement is conceptualised by Kahn (1990:694) as a positive psychological state that consists of cognitive, emotional and behavioural dimensions; it comprises two distinct, yet related types, namely, job engagement and organisational engagement. Job engagement is conceptualised as the extent to which an employee is psychologically present in the performance of his/her work roles, whilst organisational engagement, in contrast, is defined as the extent to which the employee is psychologically present in his/her role as an active member of the organisation (Saks 2008:41).

In contrast, work engagement is the preferred concept used by academics because of its focus on the relationships employees have with their work activities. It is defined as a positive, fulfilling, work-related state of mind that is characterised by vigour, dedication and absorption (Bakker & Demerouti 2008:209; Saks 2008:42). It captures how employees experience their work as stimulating and energetic and something to which they really want to devote time and effort; as a significant and meaningful pursuit and as engrossing and something on which they are fully concentrated (Nienaber & Martins 2015a:4). Engaged employees have high levels of energy and are enthusiastic and often fully immersed in their work.

Regarding the measurement of work engagement, Schaufeli, Salanova, Gonzalez Roma and Bakker (2002) developed the self-report Utrecht Work Engagement Scale (UWES). Vigour, dedication and absorption are assessed by six, five and six items respectively. This 17-item scale, also known as the UWES-17, has been validated and utilised in a number of countries, including South Africa. Findings have shown that the UWES and more specifically, the three engagement scales, have sufficient internal consistencies, ranging between 0.80 and 0.90 (Schaufeli et al. 2002). However, when work engagement measures are applied to different cultural groups, issues of measurement bias and equivalence become important (Storm & Rothmann 2003:64).
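Internal-consistency figures of this kind are typically Cronbach's alpha coefficients. As a quick illustration of the quoted 0.80-0.90 range, the sketch below computes alpha on simulated data; the single-factor structure and noise level are illustrative assumptions, not UWES parameters.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    k = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Simulated data: six items driven by one common factor (illustrative only).
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
scores = common + rng.normal(scale=0.8, size=(500, 6))
print(f"alpha = {cronbach_alpha(scores):.2f}")  # ~0.9 for this setup
```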

2.2 Measurement invariance

Measurement invariance is a psychometric method that tests whether the properties of an instrument vary across groups (Moore, Neale, Silberg & Verhulst 2016:2). Kimber, Rehm and Ferro (2015:3) further explain that measurement invariance is a prerequisite for making meaningful comparisons between groups because, according to Milfont and Fischer (2010:112), researchers often assume that an instrument measures the construct in the same way in all groups, and this assumption cannot be justified without invariance testing.

Testing for measurement invariance is a particularly important prerequisite within the South African context because of its diverse composition (Meiring et al. 2015:1; Moerdyk 2009). Studies conducted within a South African context have reported that issues of race, education and language are the main factors that impact on the construct and item comparability of psychometric tests (Meiring et al. 2005:3).

Marais et al. (2009:7), and Heyns and Rothmann (2016:84) further explain that the lack of measurement invariance within a South African context can be attributed to (1) semantic differences, (2) cultural differences, (3) personality differences, (4) socio-economic differences and (5) stereotyping. Consequently, Moerdyk (2009:11) and Visser and Viviers (2010:7) stress the importance of considering individual differences in psychological assessment. This statement is confirmed in South African validation studies that have determined the measurement invariance of the UWES.

Although the three-factor structure of the UWES was confirmed, with suitable internal consistency (Coetzee & Rothmann 2005:30; Storm & Rothmann 2003:67-68), significant differences were found in the engagement levels of employees in different demographic groups (e.g. language, culture and education) (Barkhuizen & Rothmann 2006:44; Coetzee & Rothmann 2005:30; Goliath-Yarde & Roodt 2011:3). Goliath-Yarde and Roodt (2011:10) further suggest that the UWES-17 should not be used to compare different culture groups in South Africa, and are of the opinion that the differences relate mostly to language and education differences.

 

3. PROBLEM STATEMENT

From the discussion above, it is evident that the concept of employee engagement is still highly fragmented and has little academic underpinning (Christian et al. 2011:90; Macey & Schneider 2008:3-4; Nienaber & Martins 2014:485). Consequently, Simpson (2009), Van Rooy et al. (2011) and Viljevac et al. (2012:3692) found that existing engagement measures (e.g. UWES) are poorly understood and fall short of identifying actionable insights and solutions. It was further suggested that the UWES not be administered within a South African context because of the country's diverse composition (Goliath-Yarde & Roodt 2011:2).

In their research, Nienaber and Martins (2014:494) moreover found that most engagement instruments are designed to measure engagement at either the individual or the organisational level, and that there is no consensus regarding the dimensions comprising engagement. Based on these findings, Nienaber and Martins (2014:494) identified the need to further clarify existing theories and to develop and refine a measurement instrument for employee engagement. The results of their research culminated in the development of a newly validated and reliable instrument measuring employee engagement concurrently at both the individual and organisational level within a diverse, multicultural context (Nienaber & Martins 2015b:419).

Consequently, Martins (2015:759) conceptualised employee engagement as "engaged employees at both the individual and organisational level, who are fully absorbed by and enthusiastic about their work, and so take positive action to further the organisation's reputation and interests". In essence, employee engagement was conceptualised at both the individual and organisational level, because it reflects the individual's role in his/her work and as a member of the organisation.

The instrument consists of two sections, one collecting demographic information (i.e. gender, qualification, experience and tenure), and one soliciting responses, using a five-point Likert scale, on statements about engagement at the individual, team/departmental and organisational level. The instrument has been shown to have good psychometric properties (Nienaber & Martins 2015b:420), but results indicated that multi-group invariance could not be assumed for the different sectors (Martins 2015:772). The next step in refining the instrument is to test further for group invariance.

Consequently, the purpose of this research study is to assess the employee engagement instrument with specific focus on measurement invariance for gender and race within a South African context.

 

4. RESEARCH OBJECTIVE

The objective of this research study was to determine whether measurement invariance (measurement equivalence) holds across the different race and gender groups within a South African context.

 

5. RESEARCH METHODOLOGY

5.1 Research participants

A research company's database of 285 000 working individuals, representative of the profile of the South African working population, was used in this study. Ethical clearance was obtained to conduct the research study and to administer the questionnaire. An electronic survey, hosted on iFeedback.co.za, an online data collection portal, was distributed by means of a mass e-mail invitation over a period of three weeks.

Each potential participant in the database of 285 000 working individuals received an e-mail stating the purpose of the study, information regarding the completion time and a link to the electronic survey. The e-mail further explained that participation was voluntary and that confidentiality and anonymity would be ensured. Lastly, the participants were assured that the responses would be used for research purposes only. A total of 4 099 completed questionnaires were received in the third national engagement survey conducted; the researchers had aimed to obtain at least 4 000 responses.

The demographic profile of the participants in terms of race and gender is reflected in table 1.

 

 

The sample comprised 58.2% male and 41.8% female participants; white participants made up 62.0% of the sample.

The demographic items had very few missing responses, and these were treated as system-missing.

5.2 The measuring instrument

The employee engagement instrument (EEI) was administered to the population described above (section 5.1). The validity and reliability of the instrument were reported in the second phase of the scale development process (Nienaber & Martins 2015b:420).

The instrument consists of two sections, namely, one collecting biographical/demographic information and one soliciting responses using a five-point Likert scale (1 = strongly disagree; 5 = strongly agree), on statements about engagement at the individual (50 items), team (12 items) and organisational (10 items) level.

5.3 Research approach and procedure

A quantitative research approach was followed in this study. This approach to research involves collecting data in numerical form for quantitative analysis (Garwood 2006; Lyons & Doueck 2009). Recently published analyses aimed to demonstrate the instrument's degree of factorial invariance across South African business sectors, with results indicating that invariance can be assumed for all sectors except the community and manufacturing sectors (Martins 2015:772).

Thus far, the development and validation of the EEI has adopted a classical test theory (CTT) or confirmatory factor analysis (CFA) approach (Martins 2016:185).

In order to further unpack the instrument's factorial invariance, the researchers decided to assess the EEI from an item response theory (IRT) perspective, with a specific focus on measurement invariance across gender and race. By using IRT, the difficulty level and discriminatory power of an item can be accurately determined (Foxcroft & Roodt 2009:73). The item parameters are independent of the test taker's ability level, making IRT a useful method for determining item bias (De Beer 2005:723). The principle of invariance is thus a major advantage of IRT, because it allows researchers to compare traits of individuals from different populations (Bortolotti, Tezza, De Andrade, Bornia & De Sousa Junior 2013:2344-2345).
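The article does not name the specific IRT model applied to the EEI's five-point Likert items; a common choice for such items is Samejima's graded response model, under which the probability of responding in category k or higher is a two-parameter logistic function of the latent trait:

$$P(X_i \ge k \mid \theta) \;=\; \frac{1}{1 + \exp\{-a_i(\theta - b_{ik})\}}$$

where $a_i$ is the discrimination (slope) of item $i$ and $b_{ik}$ the threshold (difficulty) of category $k$. The invariance questions that follow ask whether these item parameters are equal across demographic groups.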

The method that was used to establish measurement invariance within the IRT framework is known as differential item functioning (DIF) (Foxcroft & Roodt 2009:73). DIF is conceptualised as the difference in item scores between two or more groups that have been matched on the construct of interest (Zumbo 1999:12). DIF is thus used to identify items that may be biased or unfair to test-takers from certain groups. DIF is focused on the functioning of items across groups. This is not to suggest that test/scale score equivalence is unimportant, as one can determine the extent to which DIF items cumulatively impact observed mean differences across groups.
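Zumbo's (1999) framework operationalises DIF detection with hierarchical logistic regression: the item response is regressed on a matching score, then on group membership (uniform DIF), then on the group-by-score interaction (non-uniform DIF). A minimal sketch on simulated dichotomous data follows; the effect sizes and the single-item setup are illustrative assumptions, not taken from the EEI.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 2000
group = rng.integers(0, 2, size=n)      # 0 = reference, 1 = focal group
total = rng.normal(size=n)              # matching criterion (e.g. rest score)

# Simulate one dichotomous item with uniform DIF against the focal group.
y = rng.binomial(1, 1 / (1 + np.exp(-(1.2 * total - 0.6 * group))))

def fit(*cols):
    """Fit a logistic regression of the item on the given predictors."""
    X = sm.add_constant(np.column_stack(cols))
    return sm.Logit(y, X).fit(disp=False)

m1 = fit(total)                          # step 1: matching score only
m2 = fit(total, group)                   # step 2: + group (uniform DIF)
m3 = fit(total, group, total * group)    # step 3: + interaction (non-uniform DIF)

lr_uni = 2 * (m2.llf - m1.llf)           # likelihood-ratio tests, 1 df each
lr_non = 2 * (m3.llf - m2.llf)
print(f"uniform DIF: LR = {lr_uni:.2f}, p = {chi2.sf(lr_uni, 1):.4g}")
print(f"non-uniform DIF: LR = {lr_non:.2f}, p = {chi2.sf(lr_non, 1):.4g}")
```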

A bottom-up and progressively restricted approach was thus followed in this research to test for measurement invariance (refer to figure 1). That is, the invariance requirements at each step had to be met before moving on to the higher-order form of equivalence (Dimitrov 2010:124). The steps that were followed to determine measurement invariance in this study are graphically represented in figure 2 and explained in the remainder of this section.

 

 

 

 

5.3.1 Configural invariance

The first step of measurement invariance testing is to determine whether the same factor configuration holds across groups (Hortensius 2012:8). If a model exhibits configural non-invariance, it is not possible to make comparisons across groups. To demonstrate configural invariance, the researcher has to show that a factor is unidimensional across groups. Finding unidimensionality across groups implies that there is configural measurement invariance (Tay, Meade & Cao 2015:8). Fundamentally, one would have to prove local independence by establishing that observed item responses are uncorrelated after controlling for the underlying construct/trait.
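Local independence can be screened with residual correlations, for example Yen's Q3 statistic: after the modelled trait is partialled out, item responses should be roughly uncorrelated. A minimal sketch under a dichotomous 2PL model with simulated data follows; the EEI items are polytomous and the article does not name the exact diagnostic used, so this is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 1000, 6

theta = rng.normal(size=n_persons)            # latent trait
a = rng.uniform(0.8, 2.0, size=n_items)       # discriminations (slopes)
b = rng.normal(size=n_items)                  # difficulties

# 2PL response probabilities and simulated dichotomous responses.
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
x = rng.binomial(1, p)

# Yen's Q3: correlations of residuals after removing the modelled trait
# effect; a large |Q3| for an item pair suggests local dependence.
resid = x - p
q3 = np.corrcoef(resid, rowvar=False)
np.fill_diagonal(q3, 0.0)
i, j = np.unravel_index(np.abs(q3).argmax(), q3.shape)
print(f"largest |Q3| = {q3[i, j]:.3f} (items {i} and {j})")
```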

Configural invariance was assessed by conducting the following analyses (Martins 2016:186):

A restricted multidimensional item response theory (MIRT) and/or unidimensional model (the CFA equivalent) was conducted on each group within each DIF variable (e.g. race and gender). In other words, the overall sample was divided into two, three or four sub-samples, depending on the number of groups. In the case of gender, separate models were conducted for males and females, and factor loadings were examined to assess whether there were any significant differences.

Thereafter, a restricted multi-group MIRT model was conducted. All groups within the same model were analysed to determine whether differences in terms of factor loadings and means existed. It is important to note that only item loadings on designated factors were freely estimated, similar to a multi-factor CFA analysis. This output assists in determining whether form equivalence exists for the total model of engagement.

In order to identify specific areas of non-invariance, each sub-scale was analysed separately to examine differences that were flagged in the previous step. Unidimensional multi-group analysis also provided item-level statistics to assess the requirement of local independence.

The results of the above analysis were collated to assess whether the data met the requirements of configural invariance. Thereafter, metric invariance was assessed via a DIF analysis.

5.3.2 Metric invariance

Metric invariance is the next step of equivalence and requires that the size of the factor loadings be equivalent across groups under consideration. This model therefore tests if different groups respond to items in the same way (Milfont & Fischer 2010:115). If metric invariance is satisfied, obtained ratings can be compared across groups and observed item differences will indicate group differences in the underlying construct (Milfont & Fischer 2010:115).

Within IRT, this is assessed via DIF in order to determine whether items function differently across groups. To assess this requirement, DIF analyses were conducted at a unidimensional level to determine goodness-of-fit and item functioning. The item slopes for each sub-scale/model were set equal to test the assumption of equivalent loadings/slopes.

5.3.3 Scalar invariance

Scalar invariance is required to compare means. Scalar invariance indicates that observed scores relate to the latent scores in the same way across groups. That is, individuals who have the same score on the latent construct would obtain the same score on the observed variable regardless of their group membership (Milfont & Fischer 2010:115). Scalar invariance assumes that the size of the intercepts is equivalent across the groups under consideration. This is assessed by conducting a DIF analysis and constraining both the slopes and the item intercepts in order to test the assumption(s).
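Using the slope/threshold notation introduced after section 5.3 above (an assumption about the model form, since the article gives no equations), the three steps can be summarised as a nested sequence of constraints on the item parameters, with group index g:

$$\begin{aligned}
\text{configural: } & a_{ig},\; b_{ikg} \text{ free in every group } g \\
\text{metric: } & a_{ig} = a_i \text{ for all } g; \quad b_{ikg} \text{ free} \\
\text{scalar: } & a_{ig} = a_i \text{ and } b_{ikg} = b_{ik} \text{ for all } g
\end{aligned}$$

For identification, the latent trait is fixed to N(0, 1) in the reference group while the focal groups' means and variances are freely estimated, as described for the analyses in section 6.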

 

6. RESULTS

The previous analysis of the data provided strong evidence of the factor analytical structure of the measure (Martins 2015:763). In order to address the objective of this study, the results of the measurement invariance testing of the EEI in relation to gender and race are reported in this article.

6.1 Gender

6.1.1 Unidimensional analyses

Males and females were analysed in separate models on a unidimensional level. This was done in order to obtain a more detailed picture of the item functioning within each sub-scale. Table 2 provides a summary of the goodness-of-fit indices for each model. Two analyses were conducted per sub-scale: one for males and one for females.

 

 

Results of the unidimensional analysis show a difference in male and female models in terms of -2loglikelihood, AIC and BIC. Overall, Customer Service, Immediate Manager and Organisational Commitment show better model fit in terms of the aforementioned indices. However, these may be affected by the difference in sample size, and therefore, M2 and RMSEA should be used to provide a better indication of model fit.

A significant M2 value (as seen in table 2) indicates poor model fit.

6.1.2 Restricted MIRT

A restricted MIRT was analysed for males and females separately, in order to investigate whether there were any significant differences in the pattern and magnitude of factor loadings.

In each case, the results were reported both on the original sample and on a randomised sample of equal size. The latter procedure was followed to mitigate any large changes in fit indices that may be due to unbalanced samples. Since the information indices (AIC and BIC) are largely used to compare nested models, it is not prudent to compare non-nested models directly. As such, the differences in these indices between the four models in table 3 may not be very informative. However, the -2loglikelihood can be compared. Although the original male and female samples differ quite significantly based on the -2loglikelihood index, the balanced and random samples show small differences in overall model fit.
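For reference, both information indices are simple functions of the -2loglikelihood and the number of estimated parameters. A sketch with hypothetical values (not those of table 3) follows.

```python
import numpy as np

def aic_bic(neg2ll: float, n_params: int, n_obs: int) -> tuple[float, float]:
    """AIC = -2LL + 2k;  BIC = -2LL + k * ln(n)."""
    return neg2ll + 2 * n_params, neg2ll + n_params * np.log(n_obs)

# Hypothetical fit for one group's model (illustrative placeholders only):
aic, bic = aic_bic(neg2ll=51230.4, n_params=60, n_obs=2387)
print(f"AIC = {aic:.1f}, BIC = {bic:.1f}")
```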

 

 

The results further demonstrate that factor loadings between males (n = 2 387) and females (n = 1 712) are similar across models, with loading differences not greater than 0.05.

After analysing the male and female models separately, both gender groups were combined into a multi-group restricted MIRT. A multi-group model analyses both groups simultaneously and provides an indication of their factor loadings relative to one another.

In the multi-group analysis, males were treated as the reference group with the latent mean and variance constrained to 0 and 1 respectively, for each sub-scale. Females were treated as the focal group, with latent means and variances freely estimated.

The results of the restricted MIRT between male and female respondents in a multi-group model indicate factor loading differences between the gender groups. Factor loadings for females were generally 0.15 lower than those for males.

To better understand the multi-group effects seen in the MIRT model, separate multi-group analyses were conducted at a unidimensional level, with item parameters again being estimated via the Bock-Aitkin (BAEM) method (Bock & Aitkin 1981). Table 4 provides a breakdown of S-χ2 item fit statistics, where highly significant values (p<0.01) indicate that the expected values of the model deviate from the observed values of the data. Table 5 provides a breakdown of goodness-of-fit results.

 

 

 

 

Based on the foregoing tables, it is clear that some of the individual items do not fit the data very well. That is, they are not very good items. What is pleasing to see, though, is that the items that appear to be problematic are consistent across males and females. Nonetheless, these items should be considered for adaptation or deletion to improve the overall discrimination of the individual sub-scales and the overall assessment.

6.1.3 Configural invariance

A visual inspection of the factor loadings across the unidimensional and multidimensional analyses shows similarly strong factor loadings and low standard errors (± 0.02 to ± 0.04) between males and females. At a surface level, the questionnaire demonstrates configural invariance.

However, a detailed analysis of the goodness-of-fit indices (table 5) from the multi-group unidimensional models shows adequate RMSEA values (close to 0), but highly significant M2 metrics. This indicates a significant source of error in the model that has not been addressed.

An examination of other relevant metrics from the output shows significant deviation between expected and observed frequencies for both males and females (table 4). An examination of the underlying frequency table is necessary to determine the reason for this difference. An analysis of the local independence statistic (not presented here) also shows evidence of local dependence, indicating likely multidimensionality and the presence of one or more additional latent variables that have not been modelled.

These issues need to be addressed before configural invariance can be proven. However, given that the factor loadings are similar across groups, it was decided to proceed to the next stage of invariance testing by running a unidimensional DIF analysis on each sub-scale.

6.1.4 Metric invariance

A differential item functioning (DIF) analysis is similar to a multi-group model, but specifically measures whether the expected item responses (in terms of slope and intercept) differ between the groups under consideration.

DIF analyses were conducted at a unidimensional level on each sub-scale. The item slopes were set equal in order to test whether goodness-of-fit remained similar to the baseline model (a freely estimated multi-group analysis). Table 6 provides a goodness-of-fit comparison between the multi-group and DIF unidimensional models. The relative measures (-2loglikelihood, AIC and BIC) show slight increases between the baseline and DIF models. Similarly, there were larger M2 values for the more constrained models, with Immediate Manager, Organisational Satisfaction, and Strategy and Implementation showing an M2 increase of more than 10.

 

 

The chi-squared difference test is also reported in table 6. This measure, annotated as χ2 Diff, assesses the statistical significance of the difference between the DIF and baseline models for each sub-scale. It is calculated by obtaining the difference between the -2loglikelihood for the DIF model and the -2loglikelihood for the baseline model. This value is then referenced to the chi-squared table, along with the associated degrees of freedom, to determine statistical significance. If a significant difference is found based on the -2loglikelihood, one can assume that the assumption of metric invariance does not hold for the more constrained model. Since the -2loglikelihood difference is interpreted against a chi-square distribution, the same problems associated with using chi-square for significance testing (notably its sensitivity to large samples) also affect this test. For this reason, it is important to also consider the information indices, like the AIC and BIC, to detect differences between the two nested models.
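A minimal sketch of the χ2 Diff computation described above; the -2loglikelihood values and degrees of freedom are hypothetical placeholders, not values from table 6.

```python
from scipy.stats import chi2

def chi2_diff_test(neg2ll_baseline: float, neg2ll_dif: float, df_diff: int):
    """Chi-squared difference test between a freely estimated baseline model
    and a nested, more constrained DIF model.

    The constrained model can never fit better, so the difference in
    -2loglikelihood is non-negative and, under the invariance hypothesis,
    chi-square distributed with df_diff degrees of freedom."""
    diff = neg2ll_dif - neg2ll_baseline
    return diff, chi2.sf(diff, df_diff)

# Hypothetical values for one sub-scale (placeholders, not table 6):
diff, p = chi2_diff_test(neg2ll_baseline=51230.4, neg2ll_dif=51287.9, df_diff=12)
print(f"chi2 Diff = {diff:.1f}, df = 12, p = {p:.3g}")
```

As the text notes, with a sample of 4 099 even trivial misfit reaches significance, which is why the ΔAIC and ΔBIC are inspected alongside this test.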

In the case of gender, all six DIF models are shown to be significantly different from the baseline models, therefore rendering metric (weak) invariance untenable.

Table 7 presents the DIF statistics for each sub-scale under consideration. Significant values of χ2 indicate the presence of differential functioning across groups; only the problematic items are highlighted.

 

 

Overall, the DIF analyses echo the findings of the multi-group unidimensional analysis, with signs of misfitting models and DIF across all six sub-scales. However, given the large number of items and the large sample size, only eight items demonstrated significant DIF across gender. It thus appears that these eight items should either be removed or adapted. The subsequent section looks at the measurement invariance of race.

6.2 Race

This section focuses on the examination of group differences based on race. The breakdown of response frequencies was provided in table 1, with the majority of test-takers being white.

For the purpose of the IRT analysis, the "Other" and "Prefer not to say" categories were excluded due to their small sample sizes. Due to the unbalanced size of the White sample, a randomised selection of 500 cases was utilised instead.

6.2.1 Unidimensional analyses

Unidimensional analyses were conducted on each group within the race variable to assess factor loadings and goodness-of-fit between said groups.

Table 8 provides a breakdown of the goodness-of-fit of the unidimensional models run on each racial group within each sub-scale. A total of 24 different models were analysed to account for the four ethnic categories and the six sub-scales.

 

 

Similar to the results seen with gender (table 2), the various models all demonstrate an acceptable RMSEA model fit indicator. However, the M2 statistic demonstrated that the theoretical configuration of the sub-scale may not replicate the sample data with an adequate degree of accuracy (p<0.001). As a whole, Team showed the highest degree of misfit according to the -2loglikelihood metric. In general, the African group demonstrated higher levels of misfit compared to the other three groups.

6.2.2 Restricted MIRT

A race multi-group MIRT was conducted according to the same methodology described above. The African group was treated as the reference group with the latent mean and variance constrained to 0 and 1, respectively. The other three categories were treated as focal groups with means and variances freely estimated.

The results of the factor loadings for each group showed that the African group demonstrates the strongest factor loadings across each sub-scale. As a whole, factor loadings ranged from 0.54 to 0.95 with no aberrantly low loadings, and standard errors ranged between 0.01 and 0.05. This instils confidence in the cross-cultural robustness of the measure.

In table 9, the latent mean of the reference group (African) was constrained to zero for comparative purposes. Since all the latent mean values for the Coloured, Indian and White groups are positive, this would suggest that these groups have higher levels of endorsement on the six dimensions of engagement. It is important to remember that it is not prudent at this point to compare the latent mean scores of groups, since measurement invariance has not yet been demonstrated. This will only be possible after the DIF analysis has been concluded.

 

 

Table 10 demonstrates the covariance between latent variables across the four race groups. As would be expected, most of the latent factors are positively correlated with one another. Unfortunately, no latent covariance can be examined for the African group since its latent means are constrained to zero.

 

 

As discussed above, multi-group analyses were conducted on each of the six sub-scales to assess the factor loadings and goodness-of-fit of models that included all race groups. The results indicate that the factor loadings follow a similar pattern to that revealed in the restricted multi-group MIRT analyses discussed above. However, at times the loadings were higher than those seen in the separate group analyses, with some differences of up to 0.25. The Team sub-scale reported two items (Q39 and Q42) with factor loadings equal to 1. This may be due to potential Heywood cases or items with very restricted variances. These items were flagged as potentially problematic in the subsequent analyses.

The group parameter estimates and model fit indices for each of the multi-group analyses, varied in a narrow range, across the four race groups and the six dimensions.

6.2.3 Configural invariance

The results seen when comparing race groups are similar to those seen for gender, outlined in section 6.1.3. It seems that the issue of local dependence persists when examining item scores by race. While factor loadings are high across sub-scales and groups, together with low standard errors, goodness-of-fit indices consistently report some degree of misfit. This may indicate that some items poorly reflect the underlying construct or violate the assumption of local independence. This may result in the detection of DIF that is not due to real differences across ethnic groups.

The DIF analysis was undertaken to determine whether the EEI items show evidence of metric invariance. In doing so, the DIF results are compared to the baseline models (multi-group unidimensional analyses) to determine the overall fit of the more constrained models. If the overall model fit deteriorates significantly, it is assumed that DIF may be prevalent in the given sub-scale. DIF statistics are also reported on item level.

6.2.4 Metric invariance

A significant Wald χ2 statistic implies significant differences between groups. In the four-group analyses, contrast 1 compares group 1 with groups 2, 3 and 4 (1 vs. 2, 3, and 4). This can be regarded as an overall omnibus significance test, similar to an ANOVA. Contrast 2 compares group 2 to groups 3 and 4 (2 vs. 3, 4), while contrast 3 compares group 3 to group 4 (3 vs. 4).
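For a single item parameter, a pairwise Wald contrast reduces to a squared z statistic; the sketch below shows this simplest case for a slope difference between a reference and a focal group. The parameter values are hypothetical, and the article's omnibus contrasts test several parameters jointly, so this is only the one-parameter special case.

```python
import numpy as np
from scipy.stats import chi2

def wald_contrast(est_ref: float, se_ref: float, est_foc: float, se_foc: float):
    """Wald chi-square (1 df) for the difference in one item parameter
    between two groups, assuming independent estimates."""
    z = (est_ref - est_foc) / np.sqrt(se_ref**2 + se_foc**2)
    return z**2, chi2.sf(z**2, 1)

# Hypothetical slope estimates for one item (illustrative only):
w, p = wald_contrast(est_ref=1.42, se_ref=0.05, est_foc=1.10, se_foc=0.06)
print(f"Wald chi2 = {w:.2f}, p = {p:.4f}")
```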

In order to obtain a more detailed picture of the DIF between specific race groups, post-hoc DIF analyses were performed. These compared the DIF statistic of items between groups 1 and 2, 1 and 3, and 1 and 4, in three separate models per sub-scale, the results of which are provided below.

Five items on the Customer Service sub-scale demonstrated DIF according to the overall test (contrast 1).

Four items on the Immediate Manager sub-scale demonstrated DIF according to the overall test (contrast 1). Only one item on the Organisational Commitment sub-scale demonstrated DIF according to the overall test (contrast 1).

Eight items on the Organisational Satisfaction sub-scale demonstrated DIF according to the overall test (contrast 1) and two items according to contrast 2. Clearly this sub-scale is plagued by DIF. Paired DIF post-hoc analyses on the Organisational Satisfaction sub-scale found

Post-hoc analyses on the Strategy and Implementation sub-scale found signs of DIF on four items (contrast 1) and on one item on contrast 2. Similar to Organisational Commitment above, there were no item differences between the African and Coloured groups.

Eight items in the Team sub-scale demonstrated DIF according to the overall test (contrast 1). Unlike previous analyses, a post-hoc analysis of race pairs with Team found evidence of DIF between the African and Coloured groups, while there were no significant item differences between the African and Indian groups. The biggest differences in item functioning were found between the African and White groups.

It is clear from the post-hoc tests that the biggest item differences for the Immediate Manager sub-scale are between the African and White groups. Comparisons between the Indian and African groups also demonstrated some significant differences. Organisational Commitment showed relatively little DIF. Clearly this sub-scale is functioning quite well, and latent mean comparisons between groups can be made due to the strong evidence for metric invariance. A possible reason for detecting DIF on so many items in the Strategy and Implementation sub-scale may be the violation of the local independence assumption. In the unidimensional analyses per group, it was evident that some of the items remained highly correlated after the influence of the latent variable was taken into consideration.

A comparison of the baseline and DIF models is presented in table 11. Goodness-of-fit on the DIF models is generally poorer, although the difference between these metrics is not overly large.

 

 

As discussed in section 6.1.4, the -2loglikelihood difference test in table 11 demonstrates statistically significant differences between the DIF and baseline models at p<0.001. The differences are greatest for the Strategy and Implementation, Organisational Satisfaction and Immediate Manager sub-scales. The differences between the baseline and DIF analyses for Team are not overly large relative to the other sub-scales, but the high values of the -2loglikelihood demonstrate that the sub-scale fits poorly in both analyses. Due to the significant differences in the -2loglikelihood ratio, one can presume that the strict assumption of metric invariance does not hold across all six sub-scales. However, the differences in fit were significantly lower than those in the analysis with gender as the grouping variable. This may suggest that DIF is less problematic for race than for gender. Similar to the DIF analysis using gender, the Organisational Commitment and Strategy and Implementation sub-scales seem to demonstrate the highest degree of invariance.

 

7. CONCLUSION

The purpose of this study was to assess the measurement invariance of the EEI as a factor of gender and race. Previous results confirmed the validity and reliability of the instrument and successfully demonstrated evidence of configural invariance using a CFA (Martins 2015). The data were further used to test for measurement invariance across race and gender.

The three steps of measurement invariance according to the IRT framework (refer to figure 2) were used to test for measurement invariance across race and gender. The IRT analysis found support for the assumption of configural invariance for the total measure, as well as for the sub-scales; however, this was tempered by poor model fit (i.e. some of the individual items do not fit the data very well) and violation of the local independence requirement of IRT. This violation was most pronounced for the Team sub-scale. It is posited that certain problematic items introduce a source of error into the analysis that has not been accounted for, such as multidimensionality. It was, however, pleasing to see that the items that appear to be problematic are consistent across gender and race groups.

Unidimensional DIF analysis was undertaken to determine whether the EEI items show evidence of metric invariance.

The -2loglikelihood difference tests displayed in tables 6 and 11 demonstrate a statistically significant difference in model fit between the baseline and restricted DIF models. Due to the significant differences in the -2loglikelihood ratio, it is assumed that the assumption of metric invariance does not hold across all six sub-scales. However, the difference in fit for race was significantly lower than in the analysis with gender as the grouping variable, suggesting that DIF is less problematic for race than for gender. Nevertheless, given the large number of items and the sample size, only eight items demonstrated significant DIF across gender (refer to table 7).

The EEI can thus be used with confidence across genders if the eight items are removed or adapted. The Organisational Commitment, Immediate Manager and Strategy and Implementation sub-scales demonstrated the highest degree of invariance for race. The results accordingly demonstrated that, for the most part, it is not permissible to compare latent means directly across race. In order to remedy the problem, items demonstrating DIF should be removed or adapted and re-tested. These results thus confirm researchers' concerns that instruments are often used without invariance testing, and that testing for measurement invariance is a particularly important prerequisite within the South African context.

 

REFERENCES

ALBRECHT SL, BAKKER AB, GRUMAN JA, MACEY WH & SAKS AM. 2015. Employee engagement, human resource management practices and competitive advantage. Journal of Organisational Effectiveness: People and Performance 2(1):7-35.

BAKKER AB & DEMEROUTI E. 2008. Towards a model of work engagement. Career Development International 13(3):209-223.

BARKHUIZEN N & ROTHMANN S. 2006. Work engagement of academic staff in South African higher education institutions. Management Dynamics 15(1):38-46.

BOCK RD & AITKIN M. 1981. Marginal maximum likelihood estimation of item parameters: an application of the EM algorithm. Psychometrika 46:443-459.

BORTOLOTTI SLV, TEZZA R, DE ANDRADE DF, BORNIA AC & DE SOUSA JUNIOR AF. 2013. Relevance and advantages of using the item response theory. Quality and Quantity 47:2341-2360.

CHRISTIAN MS, GARZA AS & SLAUGHTER JE. 2011. Work engagement: a quantitative review and test of its relations with task and contextual performance. Personnel Psychology 64:89-136.

COETZEE SE & ROTHMANN S. 2005. Work engagement of employees in a higher education institution in South Africa. South African Business Review 9(3):23-34.

DE BEER M. 2005. Development of the learning potential computerised adaptive test (LPCAT). South African Journal of Psychology 35(4):717-747.

DIMITROV DM. 2010. Testing for factorial invariance in the context of construct validation. Measurement and Evaluation in Counselling and Development 43(2):121-149.

FOXCROFT C & ROODT G (eds). 2009. Introduction to psychological assessment in the South African context. 3rd ed. Cape Town: Oxford University Press.

GARWOOD J. 2006. Quantitative research. In Jupp V (ed). The SAGE dictionary of social research methods. London, UK: Sage. pp. 251-252.

GOLIATH-YARDE L & ROODT G. 2011. Differential item functioning of the UWES-17 in South Africa. SA Journal of Industrial Psychology/SA Tydskrif vir Bedryfsielkunde 37(1), Art. #897, 11 pages. http://dx.doi.org/10.4102/sajip.v37i1.897

GRIFFITH G. 2009. Conceptualising and exploring the meaning of employee engagement: an exploratory study. Johannesburg: University of the Witwatersrand. (Master's dissertation.)

HEYNS M & ROTHMANN S. 2016. Comparing trust levels of male and female managers: measurement invariance of the behavioural trust inventory. South African Journal of Psychology 46(1):74-87.

HORTENSIUS L. 2012. Project for introduction to multivariate statistics: measurement invariance. [Internet: www.tc.umn.edu/~horte005/docs/MeasurementInvariance.pdf; downloaded on 2016-05-30.]

IMANDIN L. 2015. Developing a conceptual framework to analyse engagement and disengagement in the workplace. Potchefstroom: North-West University. (DPhil thesis.)

KAHN WA. 1990. Psychological conditions of personal engagement and disengagement at work. Academy of Management Journal 33(4):692-724.

KIMBER M, REHM J & FERRO MA. 2015. Measurement invariance of the WHODAS 2.0 in a population-based sample of youth. PLOS ONE 10(11):1-13.

LYONS P & DOUECK HJ. 2009. The dissertation: from beginning to end. New York, NY: Oxford University Press.

MACEY WH & SCHNEIDER B. 2008. The meaning of employee engagement. Industrial and Organisational Psychology 1:3-30.

MARAIS C, MOSTERT K & ROTHMANN S. 2009. The psychometrical properties of translated versions of the Maslach Burnout Inventory - general survey. South African Journal of Industrial Psychology 35(1):1-8.

MARKOS S & SRIDEVI MS. 2010. Employee engagement: the key to improving performance. International Journal of Business and Management 5(12):89-96.

MARTINS N. 2015. Testing for measurement invariance for employee engagement across sectors in South Africa. Journal of Contemporary Management 12:757-774.

MARTINS N. 2016. Confirming the measurement properties of an engagement measure. Proceedings of the 15th European conference on research methodology for business and management studies. London, UK: Kingston Business School, Kingston University: 184-192.

MEIRING D, VAN DE VIJVER AJR, ROTHMANN S & BARRICK MR. 2005. Construct, item and method bias of cognitive and personality tests in South Africa. South African Journal of Industrial Psychology 31(1):1-8.

MILFONT TL & FISCHER R. 2010. Testing measurement invariance across groups: applications in cross-cultural research. International Journal of Psychological Research 3(1):111-121.

MOERDYK A. 2009. The principles and practices of psychological assessment. Pretoria: Van Schaik.

MOORE AA, NEALE MC, SILBERG JL & VERHULST B. 2016. Substance use and depression symptomatology: measurement invariance of the Beck Depression Inventory (BDI-II) among non-users and frequent-users of alcohol, nicotine and cannabis. PLOS ONE 11(4):1-14.

NIENABER H & MARTINS N. 2014. An employee engagement instrument and framework building on existing research. Mediterranean Journal of Social Sciences 5(20):485-496.

NIENABER H & MARTINS N. 2015a. Employee engagement in a South African context. Pretoria: KR Publishing.

NIENABER H & MARTINS N. 2015b. Validating a scale measuring engagement in a South African context. Journal of Contemporary Management 12:401-425.

SAKS AM. 2008. The meaning and bleeding of employee engagement: how muddy is the water? Industrial and Organisational Psychology 1:40-43.

SCHAUFELI W & SALANOVA M. 2011. Work engagement: on how to better catch a slippery concept. European Journal of Work and Organisational Psychology 20(1):39-46.

SCHAUFELI WB, SALANOVA M, GONZALEZ ROMA V & BAKKER AB. 2002. The measurement of engagement and burnout: a confirmative analytic approach. Journal of Happiness Studies 3:71-92.

SIMPSON M. 2009. Engagement at work: a review of the literature. International Journal of Nursing Studies 46:1012-1024.

STORM K & ROTHMANN S. 2003. A psychometric analysis of the Utrecht Work Engagement Scale in the South African Police Service. South African Journal of Industrial Psychology 29(4):62-70.

TAY L, MEADE AW & CAO M. 2015. An overview and practical guide to IRT measurement equivalence analysis. Organizational Research Methods 18(1):3-46.

VAN ROOY DL, WHITMAN DS, HART D & CALEO S. 2011. Measuring employee engagement during a financial downturn: business imperative or nuisance? Journal of Business Psychology 26:147-152.

VILJEVAC A, COOPER-THOMAS HD & SAKS AM. 2012. An investigation into the validity of two measures of work engagement. The International Journal of Human Resource Management 23(17):3692-3709.

VISSER D & VIVIERS R. 2010. Construct equivalence of the OPQ32n for black and white people in South Africa. South African Journal of Industrial Psychology 36(1).

ZUMBO BD. 1999. A handbook on the theory and methods of differential item functioning (DIF): logistic regression modelling as a unitary framework for binary and Likert-type item scores. Ottawa, CAN: Department of National Defence, Directorate of Human Resources Research and Evaluation.

 

 

* corresponding author
