THE GRAMMATICAL INTERPRETATION OF RUSSIAN INFLECTED FORMS USING A STEM DICTIONARY
by J. McDANIEL and S. WHELAN, National Physical Laboratory, Teddington, England
INTRODUCTION
The NPL Russian-English automatic dictionary is organised on a stem-paradigm basis. Most nouns and adjectives have a single entry covering all their inflected forms, and most verbs have only one or two entries. This contrasts with the full-form type of dictionary organisation, in which each inflected form of every word has a separate entry. We chose this organisation for two reasons. First, it allows the dictionary to fit on the magnetic tape store available to us on our laboratory's ACE digital electronic computer. Second, it keeps the look-up time per word low without unduly complicating the look-up procedure or requiring too much programming effort to compile. The dictionary will initially contain 15,000 words from the Harvard University Automatic Dictionary. Our dictionary will have an average of about 1.5 entries per word, whereas a full-form dictionary would have about ten times that average.
A stem-paradigm dictionary needs two processing steps that a full-form dictionary does not. First, words referred to the dictionary are reduced to their stems so that they can be matched against the corresponding dictionary stem entries. Second, once a stem has been matched, the part of the referred word that was split off to leave the stem must be interpreted to determine its grammatical significance for that stem. The first process, known as affix-splitting, consists of matching the end of a referred word against a list of recognised affixes that have grammatical significance. It is fully described in a companion paper, and we shall refer to its results where necessary. The second process, affix interpretation, is the subject of this paper. Affix identification reveals the following grammatical properties of the referred word, in addition to those identifiable in its stem:
NOUNS: number and case.
ADJECTIVES: number, case, gender, short or long form.
VERBS: person, number, tense, gender, mood, voice and, for participles only, case and short or long form.
Not all combinations of these properties can, of course, occur.
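Affix-splitting is the subject of the companion paper, and its procedure is not reproduced here. The Python sketch below is only a rough illustration of the idea. It matches the end of a transliterated word against a small affix list and keeps the splits whose remaining stem is in the dictionary. The affix subset, the toy stem set and the name split_word are hypothetical. As in the text, "#" stands for the null affix.

```python
# Hypothetical subset of the recognised-affix list ("#" is the null affix).
AFFIXES = {"#", "A", "U", "E", "OM", "Y", "OV", "AM", "AKH", "AMI", "I"}

# Hypothetical toy dictionary of transliterated stems (trailing hyphen dropped).
STEMS = {"STOL", "BRAT", "NEDEL", "AVTOMOBIL"}


def split_word(word: str) -> list[tuple[str, str]]:
    """Return every (stem, affix) split of `word` whose stem is in STEMS.

    Cuts are tried from the shortest stem (longest affix) upwards; the whole
    word with the null affix "#" is the last candidate.
    """
    splits = []
    for cut in range(1, len(word) + 1):
        stem, ending = word[:cut], word[cut:] or "#"
        if ending in AFFIXES and stem in STEMS:
            splits.append((stem, ending))
    return splits


if __name__ == "__main__":
    for text_word in ("STOLAMI", "STOL", "NEDELI", "STOLY"):
        print(text_word, split_word(text_word))
```

Every candidate split is kept, but STOLAMI still yields only STOL + AMI. The other cut with a recognised affix, STOLAM + I, is discarded because STOLAM is not a dictionary stem.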
Most pronoun forms are treated like adjectives. The remaining pronoun forms and all indeclinable words are referred to full-form dictionary entries. These forms undergo the splitting process but take no part in affix identification.
Affix interpretation is necessary for all stem-type entries, because its results form the basis of the systems of syntactic analysis designed to improve on a word-for-stem "translation" of Russian into English. Rules of English inflection, the insertion of prepositions and auxiliaries, the suppression of Russian equivalents and variations of word order will all require the results of affix interpretation.
PRINCIPLE OF INTERPRETATION
The splitting process matches the endings of text words against a list of affixes and splits off any matched affixes. The interpretation problem is therefore that of giving a grammatical significance to each recognised affix when it is found. Some affixes, however, have varying significance depending on the stem from which they have been split. For instance, one of the affixes in the list is -A, and it can have five different interpretations:

(i) Genitive singular, when split from some masculine noun stems.
(ii) Genitive singular and nominative plural, when split from other masculine noun stems and from neuter noun stems.
(iii) Nominative singular, when split from feminine noun stems.
(iv) Feminine short form, when split from adjective and participle stems.
(v) Present gerund, when split from verb stems.

For these ambiguous affixes, which are mostly noun affixes, the type of stem from which the affix has been split must therefore be checked before its grammatical significance is given.
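The five readings of -A listed above amount to a lookup keyed by the type of the matched stem. Below is a minimal Python sketch. The stem-type labels are hypothetical. The text distinguishes two unnamed groups of masculine noun stems, and these are labelled "class 1" and "class 2" here.

```python
# Readings of the affix -A by stem type, transcribed from the list in the text.
# The stem-type labels are hypothetical; "class 1" and "class 2" stand for the
# two unnamed groups of masculine noun stems that the text distinguishes.
A_READINGS = {
    "masculine noun, class 1": ["genitive singular"],
    "masculine noun, class 2": ["genitive singular", "nominative plural"],
    "neuter noun": ["genitive singular", "nominative plural"],
    "feminine noun": ["nominative singular"],
    "adjective": ["feminine short form"],
    "participle": ["feminine short form"],
    "verb": ["present gerund"],
}


def interpret_a(stem_type: str) -> list[str]:
    """Return the readings of -A for a stem type; an empty list means no valid reading."""
    return A_READINGS.get(stem_type, [])


if __name__ == "__main__":
    print(interpret_a("neuter noun"))
```

A table of this kind would be needed for every ambiguous affix. The paradigm and role indicator words introduced in the text fold this check and the validity check into a single operation.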
A further check, on the validity of a given split, can conveniently be made during interpretation. This check confirms that the matched dictionary stem includes the split-off affix in the declension or conjugation that was intended to go with that stem when the dictionary was compiled. We call this check the reconciliation of stem and affix. It is needed for two reasons. First, stem homographs occur. Second, a text word whose true stem is not in the dictionary may be falsely split, and the resulting stem may match some dictionary stem.
We combine interpretation and reconciliation in one operation, using a paradigm indicator associated with each stem and one or more role indicators associated with each affix. By the paradigm of a stem we mean the set of our recognised affixes each of which combines with that stem to form a valid inflected form of one Russian word. Each stem entry in the dictionary therefore contains a computer word, the paradigm indicator word (PIW), which indicates the paradigm of that stem by a binary pattern.
The PIW has three different formats, for noun, adjective and verb stems respectively. The verb format is used for two types of verb stem, but for each type it represents a different set of endings. This split was needed only because one computer word is not long enough to represent all the verbal affixes (the ACE word is 48 binary digits, or bits, long). We shall take the noun format of the PIW as a specific example.
The word is divided into fields, one for each case and number combination of nouns. The accusative plural is excluded, because its endings follow those of the nominative plural or the genitive plural depending on the animation of the noun. Within each field, a bit position is associated with each affix that carries the significance of that field when combined with a noun stem. The noun format is shown in Figure 1 (# is our symbol for the null affix). Only the feminine affixes are shown in the accusative singular field. The masculine and neuter accusative singular endings are implicit from the nominative singular and genitive singular fields together with the animation marker in bit position 43. We could have repeated the masculine and neuter nominative and genitive singular endings in the accusative singular field, but this would have needed more bit positions than an ACE word provides. Indicating the animation of a noun stem is therefore enough to keep the paradigm format within one ACE word.
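The following Python sketch shows how the noun PIW can leave out the accusative plural, and the masculine and neuter accusative singular. It assumes the standard Russian rules. The accusative plural of an animate noun equals its genitive plural, and otherwise its nominative plural. A masculine accusative singular follows the same animation rule. A neuter accusative singular always equals the nominative. Several details are assumptions of the sketch: the parameter gender (really the declension type, supplied by the stem entry), the field names, and storing bit position n as 1 << (n - 1).

```python
ANIMATION_BIT = 43  # bit position of the animation marker, as stated in the text


def has_bit(word: int, position: int) -> bool:
    """True if bit `position` (1..48) is set, assuming position n is stored as 1 << (n - 1)."""
    return (word >> (position - 1)) & 1 == 1


def add_accusative(fields: set[str], piw: int, gender: str) -> set[str]:
    """Add the accusative readings that the noun PIW leaves implicit.

    fields: case/number fields already found for the affix, e.g. {"genitive plural"}.
    gender: "masculine", "neuter" or "feminine" declension type (hypothetical
    parameter; in the dictionary this would come from the stem entry).
    """
    animate = has_bit(piw, ANIMATION_BIT)
    result = set(fields)
    # Accusative plural: like the genitive plural if animate, else the nominative plural.
    if ("genitive plural" if animate else "nominative plural") in fields:
        result.add("accusative plural")
    # Masculine accusative singular follows the animation rule; neuter copies the
    # nominative; feminine accusative singular endings are explicit in the PIW.
    if gender == "masculine":
        sg_source = "genitive singular" if animate else "nominative singular"
    elif gender == "neuter":
        sg_source = "nominative singular"
    else:
        sg_source = None
    if sg_source is not None and sg_source in fields:
        result.add("accusative singular")
    return result


if __name__ == "__main__":
    brat = sum(1 << (p - 1) for p in (1, 11, 15, 19, 21, 43))  # BRAT- from the text
    print(sorted(add_accusative({"genitive singular"}, brat, "masculine")))
```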
In general, the PIW for a particular noun stem is formed by inserting a binary 1 in the bit position corresponding to the appropriate affix in each field. Consider, for example, the stem entry and PIW derived from the Russian word whose nominative singular is STOL (table). The stem entry will be STOL-, and the affixes giving all the inflected forms of STOL are #, A, U, E, OM, Y, OV, AM, AKH and AMI. The PIW will thus have "ones" in positions 1, 11, 15, 19, 21, 26, 32, 37, 39 and 41.
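The STOL- example can be checked mechanically. This Python sketch ORs one bit per affix into a 48-bit word. Two things are assumptions, since the actual ACE bit ordering is not described here: that the affixes pair with the positions in the order given above, and that position n is stored as 1 << (n - 1).

```python
WORD_BITS = 48      # length of an ACE word, as stated in the text
ANIMATION_BIT = 43  # animation marker position, as stated in the text


def bit(position: int) -> int:
    """Mask for bit `position` (1..48); storing position n as 1 << (n - 1) is an assumption."""
    if not 1 <= position <= WORD_BITS:
        raise ValueError(f"bit position {position} is outside a {WORD_BITS}-bit word")
    return 1 << (position - 1)


def make_piw(positions) -> int:
    """OR together one bit for each affix position in the stem's paradigm."""
    piw = 0
    for position in positions:
        piw |= bit(position)
    return piw


def is_animate(piw: int) -> bool:
    """True if the animation marker is set."""
    return (piw & bit(ANIMATION_BIT)) != 0


# Affixes of STOL- paired with the bit positions given in the text.
STOL_PARADIGM = [
    ("#", 1), ("A", 11), ("U", 15), ("E", 19), ("OM", 21),
    ("Y", 26), ("OV", 32), ("AM", 37), ("AKH", 39), ("AMI", 41),
]

if __name__ == "__main__":
    stol_piw = make_piw(pos for _, pos in STOL_PARADIGM)
    print(f"{stol_piw:012x}", "animate" if is_animate(stol_piw) else "inanimate")
```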

In the PIW for STOL-, the absence of a "one" in bit position 43 indicates that the stem is inanimate. It therefore implicitly indicates the accusative singular and accusative plural endings. A stem that takes alternative affixes in a given field will have "ones" in the bit positions of both affixes. For example, the stem VOLOS- (hair) has the alternative affixes -Y and -A in the nominative plural. Where a stem is not common to all the inflected forms of a word, only the fields to which that stem applies will have a "one" in them. For example, the stem BRAT- (brother) applies to the singular inflected forms only (1, 11, 15, 19, 21, 43), while the stem BRAT'- applies to the plural forms (29, 33, 36, 38, 40, 43).
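The split between BRAT- and BRAT'- becomes visible if each PIW is decoded into the fields it covers. In this Python sketch, the field assigned to each bit position is inferred from the affixes listed for STOL- and BRAT'-. For instance, JAKH at 38 and AKH at 39 are both taken as prepositional plural. Figure 1 itself is not reproduced, so the mapping is partial and hypothetical.

```python
# Field of each bit position, inferred from the STOL- and BRAT'- affixes
# (hypothetical and partial; the real layout is given in Figure 1).
FIELD_OF_BIT = {
    1: "nominative singular", 11: "genitive singular", 15: "dative singular",
    19: "prepositional singular", 21: "instrumental singular",
    26: "nominative plural", 29: "nominative plural",
    32: "genitive plural", 33: "genitive plural",
    36: "dative plural", 37: "dative plural",
    38: "prepositional plural", 39: "prepositional plural",
    40: "instrumental plural", 41: "instrumental plural",
}
ANIMATION_BIT = 43


def make_piw(positions) -> int:
    """Set one bit per position, assuming position n is stored as 1 << (n - 1)."""
    piw = 0
    for position in positions:
        piw |= 1 << (position - 1)
    return piw


def fields_covered(piw: int) -> set[str]:
    """Names of the case/number fields in which the PIW has at least one "one"."""
    return {field for pos, field in FIELD_OF_BIT.items() if (piw >> (pos - 1)) & 1}


BRAT = make_piw([1, 11, 15, 19, 21, 43])        # singular stem BRAT-
BRAT_SOFT = make_piw([29, 33, 36, 38, 40, 43])  # plural stem BRAT'-

if __name__ == "__main__":
    print("BRAT- :", sorted(fields_covered(BRAT)))
    print("BRAT'-:", sorted(fields_covered(BRAT_SOFT)))
```

The two PIW share only the animation bit 43. Each stem needs that bit so that its own accusative forms are implied correctly.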
The formats for adjectives and verbs are shown in Figure 2. They are similar in principle to the noun format. All of them have more fields than the noun format, but much less variety of affixes within each field. The two verb formats have identical fields but mostly different affixes in those fields. They include fields for participle affixes, but these fields contain only the affixes that build participle stems. The adjectival endings of participles follow a perfectly regular pattern, so they need not be stated explicitly in the PIW.
Nearly all nouns and adjectives need only one stem and PIW to represent all their inflected forms. Approximately two-thirds of Russian verbs need only one stem. Most of the rest need two, and only the irregular verbs need more than two.
The computer compiles the PIW from data sheets (dictionary entry forms). One sheet is completed by hand for each word to be entered into the dictionary. To limit the linguistic decisions needed to complete the sheets, there is a different data sheet for each of several broad classes of noun declension. All noun data sheets, however, refer to the one standard format for the noun PIW. There are similar data sheets for adjectives and for the two types of verb. In these cases there is only one type of data sheet per format, because their inflection varies less.
Each stem entry in the dictionary has a PIW. Interpreting an affix that has occurred on a given stem in a text word therefore reduces to two steps: spotlighting the occurrences (if any) of that affix in the PIW for that stem, and noting the fields (grammatical properties) in which they occur. This is most easily done with a masking pattern for that affix, containing a "one" bit for each occurrence of the affix in the PIW format. A "logical and" operation between this mask and the PIW of the given stem then yields a word with a "one" bit in each field where the affix has significance for that stem. A zero result means that the affix and stem are incompatible, i.e. the stem does not combine with the affix in any meaningful inflection. This can happen with stem homographs. It can also happen with words whose true stems are not yet in the dictionary and which have been falsely split; in that case the PIW will not contain the falsely split affix.
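The masking step can be sketched in a few lines of Python. The masks below are assumptions for illustration. OM is given a single bit in the instrumental singular field (position 21, as in the STOL- example). The mask for -I contains only the two positions, 14 and 28, that the text names for it. An empty result is the reconciliation failure described above. A text form STOLI, which is not a form of STOL-, would be rejected in this way.

```python
def bit(position: int) -> int:
    """Mask for bit `position` (1..48), assuming position n is stored as 1 << (n - 1)."""
    return 1 << (position - 1)


def make_word(positions) -> int:
    """OR together one bit per position."""
    word = 0
    for position in positions:
        word |= bit(position)
    return word


def interpret(piw: int, mask: int) -> list[int]:
    """AND the affix's mask with the stem's PIW and list the bit positions that survive.

    An empty list means the affix and stem are incompatible (reconciliation fails).
    """
    hits = piw & mask
    return [p for p in range(1, 49) if hits & bit(p)]


STOL = make_word([1, 11, 15, 19, 21, 26, 32, 37, 39, 41])  # PIW from the text
OM_MASK = make_word([21])     # hypothetical: OM only in the instrumental singular
I_MASK = make_word([14, 28])  # partial: only the -I positions named in the text

if __name__ == "__main__":
    print("STOLOM:", interpret(STOL, OM_MASK))
    print("STOLI :", interpret(STOL, I_MASK) or "incompatible")
```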
We call this masking pattern the role indicator word (RIW) for the given affix. Some affixes have significance in more than one of the PIW formats, and these need more than one RIW. For example, the affix -I has significance for, and appears in, each of the four PIW formats, so it has four RIW. To match the appropriate RIW to a given PIW during interpretation, each format type is given a type number (digits 47 and 48), and each RIW carries the type number of the format it relates to. The I and E verb RIW are identical for each of ten verbal affixes (U, JU, I, J, ', JTE, 'TE, A, JA, ENN). For these ten affixes we store only one verb RIW each and mark it as serving both verb types, which saves some storage space.
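Selecting an RIW by type number can be sketched as below. As the text states, the 2-bit type number occupies bits 47 and 48. Everything else is hypothetical: the numeric codes for the four types, the shared-verb flag both_verb_types, the layout of the remaining 46 bits, and the verb bit positions in the usage example. The type bits are masked out after the AND so that they cannot be mistaken for a grammatical field.

```python
from __future__ import annotations

from dataclasses import dataclass

TYPE_SHIFT = 46                  # bits 47 and 48 hold the type number (assumed bit order)
PAYLOAD = (1 << TYPE_SHIFT) - 1  # bits 1-46 carry the paradigm or role pattern

# Hypothetical codes for the four format types.
NOUN, ADJECTIVE, VERB_I, VERB_E = 0, 1, 2, 3
VERB_TYPES = {VERB_I, VERB_E}


def tagged(pattern: int, type_no: int) -> int:
    """Put a 2-bit type number into bits 47-48 above a 46-bit pattern."""
    return (pattern & PAYLOAD) | ((type_no & 0b11) << TYPE_SHIFT)


def type_of(word: int) -> int:
    """Read the type number back out of bits 47-48."""
    return (word >> TYPE_SHIFT) & 0b11


@dataclass(frozen=True)
class RoleWord:
    word: int
    both_verb_types: bool = False  # hypothetical flag: one stored RIW serves I and E verbs


def select_riw(riws: list[RoleWord], piw: int) -> RoleWord | None:
    """Return the affix's RIW whose type matches the PIW's type, if any."""
    piw_type = type_of(piw)
    for riw in riws:
        riw_type = type_of(riw.word)
        if riw_type == piw_type:
            return riw
        if riw.both_verb_types and {riw_type, piw_type} <= VERB_TYPES:
            return riw
    return None


def interpret(piw: int, riws: list[RoleWord]) -> int:
    """AND the matching RIW with the PIW, discarding the type bits; 0 means incompatible."""
    riw = select_riw(riws, piw)
    return 0 if riw is None else piw & riw.word & PAYLOAD


if __name__ == "__main__":
    # A verb affix stored once as an I-type RIW but marked as serving both verb types.
    u_riws = [RoleWord(tagged(1 << 2, VERB_I), both_verb_types=True)]
    e_verb_piw = tagged((1 << 2) | (1 << 6), VERB_E)
    print(bin(interpret(e_verb_piw, u_riws)))
```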
Let us consider two examples of the interpretation of noun forms, AVTOMOBILI and NEDELI. These would be matched against the dictionary stems AVTOMOBIL- and NEDEL- respectively, with -I as the affix to be interpreted in both cases. The PIW for the noun stem AVTOMOBIL- and the noun-type RIW for -I are shown in Figure 3. The "logical and" of these two computer words gives a "one" bit in position 28 only, i.e. in the nominative plural field. The PIW for NEDEL- is also shown in Figure 3. "Anding" this word with the RIW for -I gives a "one" bit in positions 14 and 28, i.e. in the genitive singular and nominative plural fields.
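The two worked examples can be reproduced with the sketch below. Figure 3 is not reproduced here, so both PIW and the RIW are partial. The RIW for -I holds only positions 14 and 28. As the text implies, AVTOMOBIL- has position 28 and NEDEL- has positions 14 and 28. Both PIW also contain the plural positions 36, 38 and 40 for -JAM, -JAKH and -JAMI, which are inferred from the BRAT'- example. The AND simply masks those extra bits out.

```python
def make_word(positions: list[int]) -> int:
    """Set one bit per position, assuming position n is stored as 1 << (n - 1)."""
    word = 0
    for position in positions:
        word |= 1 << (position - 1)
    return word


# Field names for the two positions the text identifies; other hits print as raw bits.
FIELD_NAMES = {14: "genitive singular", 28: "nominative plural"}

# Partial PIW: 36, 38 and 40 are the inferred -JAM, -JAKH and -JAMI positions.
AVTOMOBIL = make_word([28, 36, 38, 40])
NEDEL = make_word([14, 28, 36, 38, 40])

# Partial noun-type RIW for -I: only the positions named in the text.
RIW_I_NOUN = make_word([14, 28])


def interpret(piw: int, riw: int) -> list[str]:
    """AND the RIW with the PIW and name the fields whose bits survive."""
    hits = piw & riw
    return [FIELD_NAMES.get(p, f"bit {p}") for p in range(1, 49) if (hits >> (p - 1)) & 1]


if __name__ == "__main__":
    print("AVTOMOBILI:", interpret(AVTOMOBIL, RIW_I_NOUN))
    print("NEDELI:", interpret(NEDEL, RIW_I_NOUN))
```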