Adjustment of heterogeneous variances and a calving year effect in test-day models for national genetic evaluation of dairy cattle in South Africa

Mostert, B.E.; Theron, H.E.; Kanfer, F.H.J.; van Marle-Köster, E.

South African Journal of Animal Science

On-line version ISSN 2221-4062
Print version ISSN 0375-1589

S. Afr. j. anim. sci. vol.36 no.3 Pretoria 2006

B.E. Mostert^I,#; H.E. Theron^I; F.H.J. Kanfer^II; E. van Marle-Köster^III

^I ARC-LBD, Private Bag X2, Irene 0062, South Africa
^II Department of Statistics, University of Pretoria, Pretoria 0002, South Africa
^III Department of Animal and Wildlife Sciences, University of Pretoria, Pretoria 0002, South Africa

ABSTRACT

South Africa implemented test-day models for genetic evaluations of production traits, using a Fixed Regression Test-Day Model (FRTDM), which assumes equal variances of the response variable at different days in milk, the explanatory variable. Data at the beginning and at the end of the lactation period have higher variances than tests in the middle of the lactation. Furthermore, first lactations have lower means and variances than second and third lactations. This is a deviation from the basic assumptions required for the application of repeatability models. A modification was therefore implemented to reduce the effect of deviating from this assumption. Test-day milk, butterfat and protein yield records of Jersey cows participating in the South African Milk Recording Scheme were pre-adjusted such that the variances are on the same scale. Variance components estimated using the adjusted records were higher than those estimated using unadjusted records. Convergence of breeding value estimation was reached much faster when using adjusted data (± 4000 iterations) than when using unadjusted records (± 15 000 iterations). Although cow and bull rankings were not influenced much, significant changes were found in breeding values for individual animals and in genetic trends, especially of young animals.

Keywords: Fixed regression, Jersey, ranking, repeatability model

Introduction

Considerable interest has been shown in the last few years in modelling individual test-day records for genetic evaluation of dairy cattle, instead of using the traditional accumulated 305-day yields in the evaluation of production traits (Swalve, 2000; Jensen, 2001; Mrode et al., 2002). South Africa implemented test-day models for genetic evaluations of production traits, using a Fixed Regression Test-Day Model (FRTDM), where the lactation curve is modelled as a fixed regression and the random components are specified as a traditional repeatability model, i.e. constant additive genetic and permanent environmental variances throughout the lactation. Test-day records of the first three lactations were included as repeated measures in the South African FRTDM (Mostert et al., 2006a). A FRTDM assumes equal variances at different days in milk, but data at the beginning and at the end of lactations have higher variance than tests in the middle of lactation (Jensen, 2001). This is probably because the onset and end of lactation are influenced by more factors than the maintenance of production in the middle of lactation (López-Romero & Carabano, 2003). For example, the interval between calving and the first test of the lactation influences first test-day yield more than other test-day yields of the lactation, as can be expected in view of the rapid changes in milk yield during early lactation (Pander et al., 1992). Furthermore, first lactations have lower means and variances than second and third lactations. Swalve (2000) and Jensen (2001) recommended that heterogeneity of variance should be accounted for in the application of test-day models.

In the test-day model implemented by Mostert et al. (2006a), it was found that older sires, especially those born during the 1980s, ranked too high. These sires performed well in their time, but the test-day records of their dams and other female relatives were not stored on the INTERGIS (Integrated Registration and Genetic Information System) and they no longer had active progeny in the TDM (Mostert et al., 2006b). The aim of this study was therefore to investigate adjusting for the heterogeneous variances that exist across days in milk and parities, to determine the effect of deviating from the equal-variance assumption required by repeatability models, and to investigate the inclusion of a fixed calving year effect in the South African FRTDM for genetic evaluation of production traits of Jersey cattle.

Materials and Methods

A total of 3 192 159 milk, butterfat and protein test-day records of the first three parities of Jersey cows participating in the South African Milk Recording Scheme were downloaded from the INTERGIS. The following edits were applied to the data: deletion of records with unknown herds, unknown birth dates or calving dates, test days recorded before five days in milk or after 305 days in milk, and records of crossbred cows, as well as age restrictions within lactations to ensure reasonable calving ages in a specific lactation (17 - 40 months for lactation 1, 29 - 53 months for lactation 2 and 41 - 67 months for lactation 3). Protein yield was treated as missing for records where protein percentage was greater than 6% or less than 2%. The same was done with butterfat yield where butterfat percentage was higher than 9% or lower than 2%. Test-day milk yield was limited to a range of 1 kg to 70 kg. Lactations had to meet the following requirements to be included (specifications from IRIS - the national dairy management system):

1. First test of a lactation should be within the first 63 days in milk.

2. No intervals longer than 100 days between tests of a lactation are allowed.

3. Only one interval between 60 and 100 days allowed per parity.

Lactations ending before 60 days in milk were also discarded. These specifications resulted in 13% of the data being discarded and a data set of 2 768 524 test-day records was obtained. This dataset is referred to as the unselected dataset.
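The lactation-level requirements above can be illustrated with a small check. This is only a sketch: the function and constant names are hypothetical, and it assumes that "between 60 and 100 days" means an interval longer than 60 and at most 100 days, and that "within the first 63 days" includes day 63.

```python
from typing import Sequence

FIRST_TEST_MAX_DIM = 63   # rule 1: first test within the first 63 days in milk
MAX_INTERVAL = 100        # rule 2: no interval longer than 100 days
LONG_INTERVAL_MIN = 60    # rule 3: intervals above 60 days count as "long" (assumed bound)
MIN_LAST_DIM = 60         # lactations ending before 60 days in milk are discarded
DIM_RANGE = (5, 305)      # test days outside 5..305 days in milk are deleted first


def lactation_is_valid(dims: Sequence[int]) -> bool:
    """Return True if the test days (in days in milk) of one lactation pass all rules."""
    # Record-level edit: keep only tests recorded from day 5 to day 305.
    kept = sorted(d for d in dims if DIM_RANGE[0] <= d <= DIM_RANGE[1])
    if not kept:
        return False
    if kept[0] > FIRST_TEST_MAX_DIM:   # rule 1
        return False
    if kept[-1] < MIN_LAST_DIM:        # lactation ends too early
        return False
    intervals = [b - a for a, b in zip(kept, kept[1:])]
    if any(gap > MAX_INTERVAL for gap in intervals):   # rule 2
        return False
    long_gaps = sum(1 for gap in intervals if gap > LONG_INTERVAL_MIN)
    return long_gaps <= 1              # rule 3
```

For instance, a lactation tested on days 10, 80, 110 and 140 has a single 70-day interval and would pass, whereas one tested on days 10, 80, 150 and 180 has two such intervals and would be discarded.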

The following fixed effect model was then applied to the data, separately for each trait, in order to obtain BLUEs (Best Linear Unbiased Estimates) of all fixed effects in the model:

$$y_{ijklmnpq} = \mu + HTDLM_i + S_{km} + AC_{lm} + \mathrm{wilmink}(S_{km}) + CI_{nm} + CY_{pm} + e_{ijklmnpq}$$

Where:

y_ijklmnpq = q^th test-day milk, butterfat or protein yield of cow j in lactation m in herd x test-date x parity x number of milkings group i, of season k, age class l, calving interval class n and calving year p

μ = mean yield

HTDLM_i = fixed effect of herd x test-date x parity x number of milkings group

S_km = fixed effect of calving season in lactation m

AC_lm = fixed effect of age class in lactation m

wilmink(S_km) = Wilmink curve (Wilmink, 1987) modelled on days in milk within season k and in lactation m (regression)

CI_nm = fixed effect of calving interval class in lactation m

CY_pm = fixed effect of calving year in lactation m

e_ijklmnpq = random residual error

Two calving seasons were defined: April - September and October - March, while the same age classes were allocated as in the derivation of standard lactation curves by Mostert et al. (2001). Calving interval classes were allocated using standard deviation units. This model assumed consecutive test day samples of a cow, within and across lactations, to be repeated observations of the same trait (Mostert et al., 2006a).
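To make the regression term concrete, the sketch below builds the covariates of a Wilmink-type curve, a + b·t + c·exp(-k·t), for a test at t days in milk. The decay constant k = 0.05 is an assumption (a value commonly used with this function; the text does not state it), and the function names and the coefficients in the usage note are hypothetical. In the model above, a separate set of regression coefficients would apply within each season and lactation.

```python
import math
from typing import Tuple

WILMINK_DECAY = 0.05  # assumed decay constant k; not given in the text


def wilmink_covariates(dim: int, decay: float = WILMINK_DECAY) -> Tuple[float, float, float]:
    """Covariates (1, t, exp(-k * t)) for a test at `dim` days in milk."""
    return (1.0, float(dim), math.exp(-decay * dim))


def wilmink_yield(dim: int, a: float, b: float, c: float,
                  decay: float = WILMINK_DECAY) -> float:
    """Expected yield from coefficients (a, b, c) of one season x lactation class."""
    x0, x1, x2 = wilmink_covariates(dim, decay)
    return a * x0 + b * x1 + c * x2
```

With illustrative coefficients a = 20, b = -0.03 and c = -6, the curve rises over the first weeks and then declines slowly, which is the typical lactation shape the regression is meant to absorb.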

Data were then pre-adjusted for heterogeneous variances as follows. Variances of residuals from the fixed effect model at each day in milk (DIM) were calculated separately for each parity, for all traits, e.g. the variance of DIM i of lactation m is var_im. A weighted average of all var_im values was then calculated using SAS (1996) to obtain var_m, the average variance within lactation m. Lactation 1 was used as the reference parity, as most test-day records originate from lactation 1 (1 246 080 lactation 1 records versus 1 016 606 for lactation 2 and 809 323 for lactation 3). The following scaling factor (s_im) was then implemented to pre-adjust all test-day records such that the residual variance of all lactations and all days in milk was similar to the weighted average of lactation 1 (reference parity) (Dr Zenting Liu, 2005: Personal Communications, VIT Geneticist, zenting.liu@vit.de). The idea was firstly to scale the variance to an average within lactation and secondly to scale the variance of the different parities to the weighted average of the reference parity. This could, however, be done with a one-step procedure:

where var_1 (var_m) is the weighted average of days in milk variances for first (m^th) lactation and var_im is the variance of day i in milk in lactation m. After the estimation of scaling factors, the test-day records were adjusted as follows:

where y^* denotes the test-day yields adjusted for heterogeneous variances, BLUEs are the best linear unbiased estimates of all fixed effects in the model and r is the residual variance. The adjusted yields (y^*) were then included in (co)variance component estimation and in the national genetic evaluation for estimation of breeding values, using the same model as above, but adding the animal additive genetic and permanent environmental effects as random effects.
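The scaling equations are not reproduced in this copy, so the following is only a sketch of one way such a pre-adjustment could work, not necessarily the exact formula used. It assumes that s_im is the square root of var_1 / var_im (the two-step product of sqrt(var_m / var_im) and sqrt(var_1 / var_m) collapses to this), that the adjusted yield is the sum of the BLUEs plus the record's own residual multiplied by s_im, and that the weighted average uses the number of records per day in milk as weights. All function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import pvariance

REFERENCE_PARITY = 1  # lactation 1 is the reference parity in the text


def scaling_factors(records):
    """records: iterable of (parity, dim, residual). Returns {(parity, dim): s_im}."""
    by_cell = defaultdict(list)
    for parity, dim, resid in records:
        by_cell[(parity, dim)].append(resid)

    # var_im: residual variance at each day in milk within each parity.
    var_im = {cell: pvariance(vals) for cell, vals in by_cell.items()}
    counts = {cell: len(vals) for cell, vals in by_cell.items()}

    # var_1: average of the reference-parity variances, weighted by record
    # count (assumed weighting). Requires at least one reference-parity record.
    ref_cells = [cell for cell in var_im if cell[0] == REFERENCE_PARITY]
    total = sum(counts[cell] for cell in ref_cells)
    var_ref = sum(var_im[cell] * counts[cell] for cell in ref_cells) / total

    # Assumed one-step factor: sqrt(var_1 / var_im); zero-variance cells are skipped.
    return {cell: sqrt(var_ref / v) for cell, v in var_im.items() if v > 0}


def adjust(blue_sum, residual, s_im):
    """Assumed adjustment: fixed-effect estimates plus the rescaled residual."""
    return blue_sum + s_im * residual
```

Under these assumptions every parity x day-in-milk cell ends up with the same residual variance, namely the weighted average of the reference parity, which is the stated goal of the pre-adjustment.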

For (co)variance estimation, a selected dataset was carefully constructed to ensure adequate genetic linkage among contemporary groups, as follows:

• Only records from cows where both parents were known.

• Only records where milk, butterfat and protein yields were measured.

• Cows must have a first parity.

• Contemporary groups with daughters of at least two sires.

• Contemporary groups with at least five records.

• Sires must be represented in at least three contemporary groups.

• Lactations must have at least nine test-days.

(Co)variance components were estimated with a multitrait analysis using VCE4 (Groeneveld & Garcia-Cortes, 1998). For this selected dataset, pedigrees were traced back for three generations.
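The contemporary-group and sire criteria in the list above interact, since dropping a group can leave a sire with too few groups and vice versa. The sketch below shows one possible way to apply them; the source does not say whether the filters were applied once or repeatedly, so repeating until nothing changes is an assumption, as are the record layout (dicts with hypothetical keys `cg` and `sire`) and the function name.

```python
from collections import defaultdict


def select_records(records, min_sires_per_cg=2, min_records_per_cg=5, min_cgs_per_sire=3):
    """Keep records whose contemporary group (cg) and sire meet the linkage criteria."""
    current = list(records)
    while True:
        sires_in_cg = defaultdict(set)
        n_in_cg = defaultdict(int)
        cgs_of_sire = defaultdict(set)
        for rec in current:
            sires_in_cg[rec["cg"]].add(rec["sire"])
            n_in_cg[rec["cg"]] += 1
            cgs_of_sire[rec["sire"]].add(rec["cg"])
        kept = [
            rec for rec in current
            if len(sires_in_cg[rec["cg"]]) >= min_sires_per_cg
            and n_in_cg[rec["cg"]] >= min_records_per_cg
            and len(cgs_of_sire[rec["sire"]]) >= min_cgs_per_sire
        ]
        # Stop once a full pass removes nothing (assumed stopping rule).
        if len(kept) == len(current):
            return kept
        current = kept
```

The remaining criteria (known parents, all three traits measured, a first parity present, at least nine test-days per lactation) are simple per-cow or per-lactation checks that could be applied before this step.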

PEST (Groeneveld & Kovac, 1990) was used to estimate breeding values, using the unselected dataset and a multitrait evaluation. The pedigrees were, however, traced back as far as possible and genetic groups were incorporated to ensure that base animals enter the evaluation on the appropriate genetic level.

Pearson correlations were estimated using SAS (1996) between adjusted test-day records and test-day records from the March 2005 national genetic evaluation (unadjusted) (Mostert et al., 2006a), as well as between breeding values based on the adjusted records and breeding values from the March 2005 national genetic evaluation. Differences between estimated breeding values (EBVs) from these two evaluations (adjusted breeding values - unadjusted breeding values), averaged per year of birth, were also plotted for proven sires (having at least 20 daughters in 10 herds), young sires (having at least one daughter and having been born since 1999) and measured cows. Cows were separated into cows having only first lactations, cows having up to the second lactation and cows having up to three lactations. Genetic trends, averaging EBVs per year of birth, for proven and young sires, as well as measured cows, were also calculated.
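The two comparisons described above, a Pearson correlation between evaluations and the mean EBV difference per year of birth, can be sketched in a few lines. The original analysis used SAS; this is only an illustration with hypothetical function names and input layout.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_difference_by_birth_year(animals):
    """animals: iterable of (birth_year, ebv_adjusted, ebv_unadjusted).

    Returns {birth_year: mean of (adjusted EBV - unadjusted EBV)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, adj, unadj in animals:
        sums[year] += adj - unadj
        counts[year] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}
```

A negative mean for a birth year indicates that the unadjusted evaluation gave higher EBVs, on average, than the adjusted one for animals born in that year. Averaging the EBVs themselves per year of birth, rather than their differences, gives the genetic trend.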

Results and Discussion

Numerous studies have reported heterogeneous genetic, residual and permanent environmental variances for production traits (Hill et al., 1983; Logfren et al., 1985; Mirande & Van Vleck, 1985; Brotherstone & Hill, 1986; Boldman & Freeman, 1990; Dong & Mao, 1990; Short et al., 1990; Meuwissen & Van der Werf, 1993; Ibanez et al., 1996; Meuwissen et al., 1996; Robert-Granié et al., 1999; Jaffrezic et al., 2000). Several sources of heterogeneous variances have been identified, of which the increase in phenotypic variance with increasing production level is probably the most important (Robert-Granié et al., 1999). The change of residual variance over time is therefore only one of the sources of heterogeneity of variance that should be taken into account.

Pre-adjustment of test-day records for heterogeneous variances is often done in genetic evaluations. Reents et al. (1998) described an adjustment procedure to account for within herd heterogeneous variances, considering number of contemporary records, production levels, parity and stage of lactation, for official implementation in the previously used German Fixed Regression Test-Day Model. In the Canadian Random Regression Test-Day Model, data are pre-adjusted for heterogeneous herd-test-date-parity variances on a trait to trait basis (Schaeffer et al., 2000).

Figure 1 indicates residual variances across the lactation. It is clear that residual variances are higher at the beginning of all lactations than in the middle of the lactation, and that they behave erratically at the end of the lactations because only a few tests are available there. Residual variances of first lactations are dramatically lower during all stages of the lactation than those of second and third lactations, while those of second and third lactations are more comparable. Third lactation test-day records show the highest variance throughout the lactation. Adjusting for heterogeneous variance due to days in milk and parity will therefore render an improvement on the March 2005 national genetic evaluation.

In Table 1 the data structure and statistics of the datasets for (co)variance component estimation and for prediction of breeding values, are presented. From this table it can be seen that 3.75% of the data was selected for (co)variance component estimation. Since a higher proportion of first lactation records (77%) relative to second and third lactation records was included in the selected dataset compared to the unselected dataset (42% first lactation records), averages in the selected dataset were slightly lower for all traits compared to the unselected dataset.

Pearson correlations between unadjusted (March 2005 national genetic evaluation) and adjusted test-day records were above 99% for all traits, with butterfat yield the most affected.

In Table 2 variance component ratios estimated based on adjusted records from the selected dataset, as well as variance components used in the March 2005 national genetic evaluation (unadjusted records), are listed.

Variance component ratios estimated using the adjusted records were higher for the direct and permanent environmental effects, at the expense of residual ratios, in comparison with estimates obtained from unadjusted data. This makes sense, as part of the residual variance has already been taken care of when adjusting for heterogeneous variances. However, these (co)variance component estimations were not done on the same dataset. Different criteria were used in the selection of the two datasets used in these estimations. For example, in the adjusted dataset only lactations with at least nine test-day records were included, while in the unadjusted dataset lactations consisting of six or more test-day records were included (Mostert et al., 2006a). Variance component estimation on test-day records is influenced by the stage of the lactation that is represented by the test-day records included in the evaluation. According to Meyer et al. (1989), heritabilities were generally highest for test-day yields in the second trimester of lactation. Pander et al. (1992) reported that heritability estimates for milk yield, butterfat and protein concentrations were highest in mid-lactation, with a similar pattern for butterfat and protein yields, except that estimates in late lactation for these yields did not fall. Jakobsen (2000) showed that for milk and protein yields there were tendencies towards higher heritability estimates in mid-lactation, while heritabilities for butterfat yield were more constant throughout the lactation. Druet et al. (2003) also found genetic variance to be highest in mid-lactation and lower at the beginning and end of lactation. This can be attributed to intervals between test-days in the extremes of lactation being too wide to define the traits in those parts of the lactation, and to information being scarce in those periods (López-Romero & Carabano, 2003).

Pre-adjusting test-day records for heterogeneous residual variances removed the decline in residual variance that occurs throughout the lactation, as well as the slight increase at the end of lactation, as described by Meyer et al. (1989), Pander et al. (1992), Swalve (1995), Rekaya et al. (1999) and Pool et al. (2000). This rendered higher (co)variance component ratios in comparison with those based on the unadjusted records, which included a higher percentage of lactations covering a shorter stage of the total lactation.
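The link between a smaller residual share and larger genetic and permanent environmental ratios follows directly from how the ratios are defined in a repeatability model. The symbols below are introduced here for illustration and are not taken from the original tables.

```latex
% Phenotypic variance under a repeatability model (symbols are illustrative)
\sigma^2_p = \sigma^2_a + \sigma^2_{pe} + \sigma^2_e

% Ratios of additive genetic, permanent environmental and residual variance
h^2 = \frac{\sigma^2_a}{\sigma^2_p}, \qquad
c^2 = \frac{\sigma^2_{pe}}{\sigma^2_p}, \qquad
e^2 = \frac{\sigma^2_e}{\sigma^2_p}

% The three ratios sum to one, so a lower residual ratio implies
% a higher combined genetic and permanent environmental ratio
h^2 + c^2 + e^2 = 1 \quad\Longrightarrow\quad h^2 + c^2 = 1 - e^2
```

Because the ratios sum to one, any adjustment that lowers the residual ratio must raise the sum of the other two, whether or not the underlying genetic variance itself has changed.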

Convergence (defined as standardized maximum change of the solutions from one round to the next with a stopping criterion of 0.001) of breeding value estimation was reached much faster when using adjusted data (± 4000 iterations) compared to unadjusted records (± 15 000 iterations for the March 2005 evaluation). Correlations between EBVs on adjusted and unadjusted records were higher than 96% for measured cows and young sires, and higher than 98% for proven sires.
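The stopping rule can be illustrated with a generic iterative solver. The sketch below is a plain Gauss-Seidel iteration on a small linear system, not the PEST implementation, and it assumes that "standardized maximum change" means the largest absolute change in any solution divided by the largest absolute solution value; the exact definition used by the software may differ.

```python
def gauss_seidel(a, b, tol=0.001, max_rounds=10_000):
    """Solve a x = b iteratively; return (solutions, rounds needed).

    Convergence is declared when the standardized maximum change of the
    solutions from one round to the next falls below `tol` (assumed definition).
    """
    n = len(b)
    x = [0.0] * n
    for rounds in range(1, max_rounds + 1):
        max_change = 0.0
        for i in range(n):
            # Update solution i using the most recent values of the others.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            new = (b[i] - s) / a[i][i]
            max_change = max(max_change, abs(new - x[i]))
            x[i] = new
        scale = max(abs(v) for v in x) or 1.0
        if max_change / scale < tol:
            return x, rounds
    raise RuntimeError("no convergence within max_rounds")
```

The number of rounds returned plays the role of the iteration counts quoted above; how strongly it depends on the scaling of the data is a property of the specific equation system and is not something this toy example demonstrates.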

In Figures 2 and 3 the differences in milk yield EBVs, averaged per year of birth, between the adjusted and unadjusted data, are indicated for sires (Figure 2) and cows (Figure 3).

Figure 2 shows the differences in milk yield EBVs (adjusted evaluation EBVs - unadjusted evaluation EBVs), averaged per year of birth, for young and proven sires. For young sires, EBVs were overestimated by the March 2005 evaluation, with the youngest sires (having a larger proportion of their daughters early in their lactation) being most affected. EBVs of proven sires born from 1983 to 1993 were also overestimated by the March 2005 evaluation, with sires born in 1985 especially affected. The lower EBVs estimated by the adjusted evaluation are probably due to the inclusion of the calving year effect in the model, which proved to be significant for all traits (P < 0.05). Averages of sires born before 1979 were based on only a few sires per year. In Figure 3 these differences are indicated for the cows. EBVs were underestimated by the March 2005 evaluation for cows born between 1983 and 1990 (first lactation cows) and 1992 (second and third lactation cows), with second and third lactation cows most affected. Thereafter EBVs were overestimated, with first lactation cows being most affected, followed again by a slight underestimation for the young cows.

Although changes in bull and cow rankings, as well as differences between the two analyses averaged per year of birth, were not dramatic, significant changes for individual animals were found between the two models. Table 3 indicates that the average EBVs of proven and young sires and first lactation cows were lower after adjusting for heterogeneous variances and fitting a calving year effect in the model; these EBVs were therefore previously, on average, overestimated. The averages of second and third lactation cows were higher after these changes to the model. These EBVs were therefore previously, on average, underestimated. The individuals that differed most were first lactation cows, with proven sires being the least affected.

For example, individual sires were found with milk EBVs being 180 kg lower (previously overestimated) and 228 kg higher (previously underestimated) compared to EBVs from the March 2005 evaluation. EBVs of young sires showed by far the most variation after the changes in the model. Reents et al. (1998) also reported that although overall cow and bull rankings were not influenced much by method of standardization for heterogeneous herd x test-date variances, significant effects for individual animals could be found.

Figures 4 and 5 indicate the effect of adjustment for heterogeneous variances and the inclusion of a fixed calving year effect in the model on the genetic trend of milk yield for sires and cows, respectively. The genetic trend for proven sires is generally lower for the adjusted evaluation, especially for sires born from 1983 to 1993 and again in 1996. For the young sires the adjusted evaluation also yielded a lower genetic trend, with the youngest sires showing the largest decline in trend. For the cows, adjustment and adding a fixed calving year effect to the model only influenced the younger cows, with the adjusted evaluation yielding a significantly lower trend for cows born after 2001. Similar tendencies were observed for butterfat and protein yields, resulting therefore in a lower percentage of young animals being in the top bull and cow rankings compared to the unadjusted evaluation. Reents et al. (1998) also observed that highest genetic trends were found for models without any adjustment and smallest with models with a strict adjustment for heterogeneous variances.

Conclusions

Although cow and bull rankings were not influenced much, significant effects for individual animals and genetic trends of especially young animals were found. Including a fixed calving year effect and adjusting for heterogeneous variances due to days in milk and parity result therefore in more accurate estimation of breeding values, especially for young animals. It is recommended that these changes should be implemented in the national genetic evaluations of the other dairy breeds, thereby preventing the over- and underestimation of individual EBVs.

References

Boldman, K.G. & Freeman, A.E., 1990. Adjustment for heterogeneous variances by herd production level in dairy cow and sire evaluation. J. Dairy Sci. 74, 503-512.

Brotherstone, S. & Hill, W.G., 1986. Heterogeneity of variance amongst herds for milk production. Anim. Prod. 42, 297-303.

Dong, M.C. & Mao, I.L., 1990. Heterogeneity of (co)variance and heritability in different levels of intraherd milk production variance and of herd average. J. Dairy Sci. 73, 843-851.

Druet, T., Jaffrézic, F., Boichard, D. & Ducrocq, V., 2003. Modeling lactation curves and estimation of genetic parameters for first lactation test-day records of French Holstein cows. J. Dairy Sci. 86, 2480-2490.

Groeneveld, E. & Garcia-Cortes, A., 1998. VCe4.0, a (co)variance component package for frequentists and Bayesians. Proc. 6^th WCGALP. 27, 455-456.

Groeneveld, E. & Kovac, M., 1990. A generalized computing procedure for setting up and solving mixed linear models. J. Dairy Sci. 73, 513-531.

Hill, W.G., Edwards, M.R., Ahmed, M.K.A. & Thompson, R., 1983. Heritability of milk yield and composition at different levels and variability of production. Anim. Prod. 36, 56-68.

Ibanez, M.A., Carabano, M.J., Foulley, J.L. & Alenda, R., 1996. Heterogeneity of herd period phenotypic variances in the Spanish Holstein Friesian cattle: Sources of heterogeneity and genetic evaluation. Livest. Prod. Sci. 45, 137-147.

Jaffrezic, F., White, I.M.S., Thompson, R. & Hill, W.G., 2000. A Link function approach to model heterogeneity of residual variance over time in lactation curve analyses. J. Dairy Sci. 83, 1089-1093.

Jakobsen, J.H., 2000. Genetic correlations between the shape of the lactation curve and disease resistance in dairy cattle. Ph.D. thesis. Dept. Animal Breeding & Genetics, Danish Institute of Agricultural Sciences, Research Centre, Foulum, Denmark.

Jensen, J., 2001. Genetic evaluation of dairy cattle using Test-Day Models. J. Dairy Sci. 84, 2803-2812.

Logfren, D.L., Vinson, W.E., Pearson, R.E. & Powell, R.L., 1985. Heritability of milk yield at different herd means and variances for production. J. Dairy Sci. 68, 2737-2739.

López-Romero, P. & Carabano, M.J., 2003. Comparing alternative random regression models to analyse first lactation daily milk yield data in Holstein-Friesian cattle. Livest. Prod. Sci. 82, 81-96.

Meuwissen, T.H.E., De Jong, G. & Engel, B., 1996. Joint estimation of breeding values and heterogeneous variances of large data files. J. Dairy Sci. 79, 310-316.

Meuwissen, T.H.E. & Van der Werf, J.H.J., 1993. Impact of heterogeneous within herd variances on dairy cattle breeding schemes. Livest. Prod. Sci. 33, 31-41.

Meyer, K., Graser, H-U. & Hammond, K., 1989. Estimates of genetic parameters for first lactation test day production of Australian Black and White cows. Livest. Prod. Sci. 21, 177-199.

Mirande, S.L. & Van Vleck, L.D., 1985. Trends in genetic and phenotypic variances for milk production. J. Dairy Sci. 68, 2278-2286.

Mostert, B.E., Theron, H.E. & Kanfer, F.H.J., 2001. The effect of calving season and age at calving on production traits of South African dairy cattle. S. Afr. J. Anim. Sci. 31, 205-214.

Mostert, B.E., Theron, H.E., Kanfer, F.H.J. & Van Marle-Köster, E., 2006a. Fixed regression test-day models for South African dairy cattle for participation in international evaluations. S. Afr. J. Anim. Sci. 36, 58-70.

Mostert, B.E., Theron, H.E., Kanfer, F.H.J. & Van Marle-Köster, E., 2006b. Comparison of breeding values and genetic trends for production traits estimated by a Lactation Model and a Fixed Regression Test-Day Model. S. Afr. J. Anim. Sci. 36, 71-78.

Mrode, R.A., Swanson, G.J.T. & Lindberg, C.M., 2002. Efficiency of part lactation test day records for genetic evaluations using fixed and random regression models. Anim. Sci. 74, 189-197.

Pander, B.L., Hill, W.G. & Thompson, R., 1992. Genetic parameters of test day records of British Holstein-Friesian heifers. Anim. Prod. 55, 11-21.

Pool, M.H., Janss, L.L.G. & Meuwissen, T.H.E., 2000. Genetic parameters of Legendre polynomials for first parity lactation curves. J. Dairy Sci. 83, 2640-2649.

Reents, R., Dopp, L., Schmutz, M. & Reinhardt, F., 1998. Impact of application of a test day model to dairy production traits on genetic evaluations of cows. Proc. 1998 INTERBULL Meeting, Rotorua, New Zealand, January 18-19, 49-54.

Rekaya, R., Carabano, M.J. & Toro, M.A., 1999. Use of test day yields for the genetic evaluation of production traits in Holstein-Friesian cattle. Livest. Prod. Sci. 57, 203-217.

Robert-Granié, C., Bonaïti, B. & Barbat, A., 1999. Accounting for variance heterogeneity in French dairy cattle genetic evaluation. Livest. Prod. Sci. 60, 343-357.

SAS, 1996. Statistical Analysis Systems user's guide: Statistics, Release 6.12. SAS Institute Inc., Cary, North Carolina, USA.

Schaeffer, L.R., Jamrozik, J., Kistemaker, G.J. & Van Doormaal, B.J., 2000. Experience with a Test-Day Model. J. Dairy Sci. 83, 1135-1144.

Short, T.H., Blake, R.W., Quaas, R.L. & Van Vleck, L.D., 1990. Heterogeneous within-herd variance. I. Genetic parameters for first and second lactation milk yield of grade Holstein cows. J. Dairy Sci. 73, 2223-2230.

Swalve, H.H., 1995. The effect of test day models on the estimation of genetic parameters and breeding values for dairy yield traits. J. Dairy Sci. 78, 929-938.

Swalve, H.H., 2000. Theoretical basis and computational methods for different test-day genetic evaluation methods. J. Dairy Sci. 83, 1115-1124.

Wilmink, J.B.M., 1987. Adjustment of test-day milk, fat and protein yield for age season and stage of lactation. Livest. Prod. Sci. 16, 335-348.

# Corresponding author. E-mail: Bernice@arc.agric.za