VOCATIONAL QUALIFICATIONS IN BRITAIN AND EUROPE: THEORY AND PRACTICE

by S J Prais

This Note considers three questions bearing on the reform of vocational qualifications in Britain, against the background of changes being introduced by the National Council for Vocational Qualifications. First, in what important respects did Britain need a reformed and centrally-standardised system of vocational qualifications? Secondly, what are the proper criteria for choosing between alternative methods of awarding qualifications? Much that is at issue hinges on the relative importance of externally-marked written tests as compared with practical tasks assessed by an instructor; the discussion and conclusions reached here in relation to vocational testing apply in large measure also to current debates in other contexts, such as the proper role of teacher-assessed coursework in school examinations at 16+ (GCSE) and the official teacher-assessment of pupils at age 7 (SATs) currently being administered in British schools for the first time. Our third question is: in what significant ways do Continental systems of awarding qualifications differ from those now proposed for Britain?

Need for a standardised system

It is now accepted on all sides that Britain needs more of its workforce to be vocationally trained to intermediate levels; that is to say, to craft or technician standards as represented, for example, by City and Guilds examinations (at part 2) or BTEC National Certificates and Diplomas. In engineering, building and related trades there has long been a system for the award of qualifications that has worked more or less satisfactorily; indeed, the City and Guilds system established at the end of the last century was in many ways an internationally admired pioneer, and its syllabuses and examinations were followed, and are still followed, in many parts of the world. In other occupations, such as office work or
retailing, a variety of qualifying bodies grew up in Britain (such as the Royal Society of Arts, the London Chamber of Commerce and Industry, Pitman's and the Institute of Drapers), which developed what has been called a 'jungle' of qualifications at a variety of unco-ordinated levels. In many other occupations in Britain there was no system of qualifications at all. In Germany, on the other hand, but also, for example, in France, Austria, Switzerland and the Netherlands, vocational qualifications and associated part-time or full-time courses were developed which covered virtually the whole range of occupations in the economy. The qualifications usually awarded at ages 18-20 at the end of these vocational courses (the Berufsschulabschluss in Germany and the Certificat d'aptitude professionnelle in France) are as widely understood as, say, O-level passes were until recently in Britain. (The narrower and clearer range of attainments encompassed by an O-level pass makes it a more appropriate standard of comparison than the new GCSE, with the very wide range of attainments spanned by its awards.)

What was essentially wrong in Britain with engineering and building qualifications was that too few people took them; but I believe there was nothing fundamentally wrong with the qualification-procedure itself. For the rest of the economy there was a serious need (a) to make the system coherent, so that equivalent levels could more easily be recognised; and (b) to expand the occupational coverage. These two objectives, greater recognisability and expansion of coverage, are of course to some extent linked. Greater recognisability should lead to greater marketability, to reduced transaction costs in the labour market, and to greater demand for qualifications and skills by both employers and trainees. The benefits to be expected are similar to those ensuing from 'hallmarking'. There are also economies of scale in organising training
programmes, and in specifying standards and certification-procedures for a limited number of defined training-occupations at defined levels. Something is of course lost in standardising and restricting the number of training-occupations and levels, just as something is lost in not having a suit made to measure; but, it hardly needs saying, manufacturing to standardised sizes enables many more people to buy a decent suit.

Criteria for vocational qualifications

There has always been debate on the relative roles of theory and practice in general education; that debate has been at least as vigorous in relation to vocational education and the award of vocational qualifications. The unsatisfactory extremes of relying solely on 'time-serving' or solely on 'pencil-and-paper' tests as the basis for the award of vocational qualifications have often been contrasted. In general, it is clear that all procedures for the award of qualifications can provide no more than imperfect indicators of future capability.

Before describing how qualifications are awarded in practice, let us for a moment consider the issues in an entirely theoretical way, with the aid of some basic statistical mathematics. Suppose we wish to estimate the capability of a person, not simply in relation to what he has done so far, but in relation to what he is likely to be able to do in the future under circumstances similar, but not identical, to those encountered in the past; for example, the quality of materials may alter, designs may alter, and the type of person under whom (or with whom) he will be working may alter. Let $\xi$ denote his true, but as yet unobserved, future capability; and let $x$ denote his performance as measured by some test-procedure based on his past performance in specimen tasks. Without affecting the argument, both can be considered multi-dimensional, relating, for example, to speed of work, accuracy of work, cleanliness, etc.
In choosing amongst alternative test-procedures we have to accept, as said, that none will be wholly accurate; we also have to accept that testing is an expensive process, and only limited resources can be devoted to it. The expected total discrepancy between test and actual performance can be divided into two components. In statisticians' terms they correspond to bias and variance; in educationists' terms they correspond, respectively, to Validity and Reliability. In detail, when choosing between alternative estimators, or between alternative test-procedures, we wish to minimise:

(a) the bias: that is, in a sufficiently large number of repeated applications we would like the expected value to correspond to the true value; that is, we wish to minimise the magnitude of

$$Ex - \xi;$$

and

(b) the variance: as between alternative test-procedures which are equally satisfactory from the point of view of their bias, we choose the one that has the minimum variability in repeated applications (whether by different examiners, or on different samples of questions or tasks); that is, we choose the alternative which yields the minimum value of

$$E(x - Ex)^2.$$

These two components contribute to the total discrepancy between test and actual performance as follows (the cross-term $2(Ex - \xi)\,E(x - Ex)$ vanishes, because $E(x - Ex) = 0$):

$$E(x - \xi)^2 = E(x - Ex)^2 + (Ex - \xi)^2,$$

i.e.

$$\text{Total mean-square error} = \text{Variance} + (\text{Bias})^2 = \text{Reliability} + (\text{Validity})^2.$$

In contrasting written and practical testing of vocational capability, it is widely agreed that written tests have greater Reliability, in the sense that different external examiners would give much the same marks if they independently examined a group of candidates. On the other hand, it is argued by those of 'modern' views, written tests have lower Validity, since they are applied under 'artificial' examination conditions and do not test what the candidate actually does in the course of his work; on that view, the greatest Validity attaches to the assessment of practical tasks carried out by the candidate in a workplace environment, preferably in the course of his normal work and assessed by his normal workplace supervisor. Any lower Reliability of such procedures, whether resulting from the supervisor knowing his own trainee or from any other reason, is of no consequence, it has sometimes incautiously been suggested.

The view in favour of giving great weight to written testing can perhaps be summarised as follows. First, any argument based on the notion that Validity (i.e. lack of bias) is all that matters is essentially wrong. We need to be concerned with the total expected error associated with a qualification-procedure (i.e. Validity plus Reliability); we are likely to be misled if we focus on only one component. Secondly, if in reality there is a relation such that procedures of high Reliability have low Validity, and vice versa, then that relation needs careful empirical research. The relation is likely to vary from one occupation to another; for example, it is likely to depend on the relative importance in each occupation of applied craft tasks and of planning tasks. Thirdly, we have to take into account the costs of testing. One simple rule seems to hold very widely, namely, that pencil-and-paper tests are quicker and cheaper
to administer than assessments of practical tasks; consequently, the written-test method can examine a much wider range of activities per unit of resources devoted to certification.

A simple example may not be out of place. A carpenter or mechanical fitter needs to know which type of metal screw to choose for each job; screws come in a myriad of different lengths, diameters, threads and heads (flat, round, Phillips, etc.), and in different metals (brass, steel, chrome). If a final assessment of capabilities had to wait until the candidate had used each type in the course of his normal work in front of his supervisor, and had done so properly on a sufficient proportion of repeated occasions, it would take a very, very long time for him to be judged as qualified. On the other hand, a few specimen written questions, such as:

Which kind of screw would you use for fitting a mirror to a bathroom wall, and why?

would take only a few minutes. By testing that the candidate knows why he is doing what he is doing, and not merely observing that he is doing it correctly, we attain greater confidence that he can operate under the variety of different circumstances that arise in practice. Notice also that he needs to be tested not only on which is the right type of screw; if that type is not readily available, he needs to know which of the available alternatives are acceptable, even if less than ideal, and also which are not acceptable, even if the customer would not immediately twig. The case for testing knowledge, and not merely observing practice, is thus a strong one even for the simplest of tasks.

But let us return to the costs of testing. Inevitably only a sample of relevant knowledge and skills can be tested. Otherwise not only would the direct costs of examination become excessive, but so would the indirect costs; as HMI recently noted, the new NVQ assessment procedures have already "encroached on the time available for teaching and learning". This was said in relation to engineering qualifications; but HMI were probably also influenced by the example they quoted of the hairdressing NVQ, which involves a "1000 task checklist".

Clearly, the greater the number of test-items, the greater can be our confidence in the final verdict, but also the greater is the cost. By the familiar statistical rule, a doubling in the required precision requires a quadrupling in the number of test-items, since the standard error of an average falls only with the square root of the number of items averaged. This rule applies if the observations are independent; if they are correlated and, so to speak, to some extent test the same capability in another way, then more than a quadrupling will be necessary (it may not even be possible to double the precision beyond a certain point). In other words, the way the first ten questions or ten tasks are dealt with by the candidate tells us a great deal about him; if we aimed to double our confidence in our judgement we would be likely to require many more than forty questions or tasks. Further, because practical tests are so very much more expensive than written tests, both in administration and in marking, it is efficient when working within a limited budget to allocate more questions to written tests than to practical tests. A complex balancing exercise is thus involved in the economic design of test-procedures; it is not surprising that in reality they are developed only slowly over the years, often through step-by-step experimentation, and require an intimate knowledge of the details of each occupation.
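The statistical points made in this section can be checked numerically with a short Python sketch. It is purely illustrative: the score lists, item variance, correlations and costs are hypothetical figures chosen for the example, not data from any qualification, and the function names are invented here. It assumes (i) that repeated applications of a test-procedure are summarised by population moments, so that mean-square error splits exactly into variance plus squared bias; (ii) that a final mark is the simple average of $n$ item scores, each with variance $\sigma^2$ and a common pairwise correlation $\rho$, so that the variance of the average is $\sigma^2[1 + (n-1)\rho]/n$, which can never fall below $\rho\sigma^2$; and (iii) for the budget question, that the final mark is a fixed-weight combination of a written and a practical component, for which the standard cost-optimal allocation from stratified sampling, $n_i \propto w_i \sigma_i / \sqrt{c_i}$, is borrowed.

```python
import math
import statistics


def mse_decomposition(scores, true_value):
    """Return (mse, variance, bias) of repeated test scores about a true value.

    Population moments are used, so mse == variance + bias**2 holds exactly
    (up to floating-point rounding).
    """
    mean = statistics.fmean(scores)
    variance = statistics.pvariance(scores, mu=mean)
    bias = mean - true_value
    mse = statistics.fmean((s - true_value) ** 2 for s in scores)
    return mse, variance, bias


def variance_of_mean(n_items, sigma2, rho):
    """Variance of the average of n_items item scores, each with variance
    sigma2 and the same pairwise correlation rho (assumed 0 <= rho < 1)."""
    return sigma2 * (1 + (n_items - 1) * rho) / n_items


def items_to_halve_error(n_items, sigma2, rho):
    """Smallest item count whose standard error is half that obtained with
    n_items, or None if the floor rho * sigma2 makes that unattainable."""
    target = variance_of_mean(n_items, sigma2, rho) / 4  # half the SE = a quarter of the variance
    if rho * sigma2 >= target:
        return None  # adding items can never push the variance below rho * sigma2
    m = n_items
    # Small relative tolerance so float rounding cannot overshoot the exact answer.
    while variance_of_mean(m, sigma2, rho) > target * (1 + 1e-12):
        m += 1
    return m


def cost_optimal_allocation(budget, components):
    """Split a testing budget between components to minimise the variance of
    a weighted final mark, sum(w**2 * s**2 / n), subject to sum(c * n) == budget.

    components maps a name to (weight w, item standard deviation s,
    cost per item c). The Lagrangian solution is n proportional to
    w * s / sqrt(c); counts are returned unrounded.
    """
    scale = {k: w * s / math.sqrt(c) for k, (w, s, c) in components.items()}
    spend_per_unit = sum(scale[k] * components[k][2] for k in components)
    return {k: budget * scale[k] / spend_per_unit for k in components}


if __name__ == "__main__":
    # Hypothetical repeated marks for one candidate whose true capability is 50.
    written = mse_decomposition([52, 53, 52, 53, 52, 53], 50)    # small variance, some bias
    practical = mse_decomposition([40, 60, 45, 55, 50, 50], 50)  # no bias, large variance
    print("written   mse, variance, bias:", written)
    print("practical mse, variance, bias:", practical)

    for rho in (0.0, 0.02, 0.05):
        print(f"rho={rho}: items needed to halve the error obtained with 10 items:",
              items_to_halve_error(10, 1.0, rho))

    # Hypothetical costs: a practical item costs 16 times as much as a written one.
    print(cost_optimal_allocation(100, {"written": (0.5, 1.0, 1.0),
                                        "practical": (0.5, 1.0, 16.0)}))
```

With these invented figures the unbiased but noisy 'practical' procedure has a mean-square error of about 41.7, against 6.5 for the slightly biased 'written' one. Halving the standard error obtained from ten independent items needs forty items; with $\rho = 0.02$ it needs 104, and with $\rho = 0.05$ it cannot be done at all. If a practical item costs sixteen times as much as a written one and both components carry equal weight and spread, a budget of 100 buys 20 written items and 5 practical items.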