yasuoka's journal: Running Alpino's Dutch dependency parsing on Google Colaboratory

Journal by yasuoka

While browsing the web, I came across Alpino, a dependency parser for Dutch. Since Linux binaries are also distributed, I decided to try running it on Google Colaboratory for a start.

!test -f Alpino.tar.gz || curl -L https://www.let.rug.nl/vannoord/alp/Alpino/versions/binary/latest.tar.gz -o Alpino.tar.gz
!test -d Alpino || tar xzf Alpino.tar.gz
!rm -f [1-9]*.xml [1-9]*.tab*
!echo Toch houd ik ze vast, ondanks alles, omdat ik nog steeds aan de innerlijke goedheid van den mens geloof. | Alpino/bin/Alpino -flag treebank . end_hook=xml -parse
!cat `ls -v [1-9]*.xml` /dev/null
!python2 Alpino/alpino2conll/tools/alpino2tab.py -c -f -p -r -t'\n' -w *.xml
!export ALPINO_HOME=Alpino ALPINO2CONLL_HOME=Alpino/alpino2conll ; for F in `ls -v *.tab` ; do python2 Alpino/alpino2conll/tools/tag.py -w2 -f $F ; cat $F'2' ; done

I ran the dependency parse on "Toch houd ik ze vast, ondanks alles, omdat ik nog steeds aan de innerlijke goedheid van den mens geloof." and, on my end (Koichi Yasuoka), got the following result.

<?xml version="1.0" encoding="UTF-8"?>
<alpino_ds version="1.6">
  <parser build="Alpino-x86_64-linux-glibc2.5-git233-sicstus" date="2020-11-12T06:41" cats="3" skips="1" />
  <node begin="0" cat="top" end="19" id="0" rel="top">
    <node begin="7" conjtype="onder" end="8" frame="complementizer" his="robust_skip" id="1" lcat="--" lemma="omdat" pos="comp" postag="VG(onder)" pt="vg" rel="--" root="omdat" sense="omdat" word="omdat"/>
    <node begin="0" cat="du" end="19" id="2" rel="--">
      <node begin="0" cat="smain" end="7" id="3" rel="dp">
        <node begin="0" end="1" frame="adverb" his="normal" his_1="decap" his_1_1="normal" id="4" lcat="advp" lemma="toch" pos="adv" postag="BW()" pt="bw" rel="mod" root="toch" sense="toch" word="Toch"/>
        <node begin="1" end="2" frame="verb(hebben,sg1,nonp_pred_np)" his="normal" his_1="normal" id="5" infl="sg1" lcat="smain" lemma="houden" pos="verb" postag="WW(pv,tgw,ev)" pt="ww" pvagr="ev" pvtijd="tgw" rel="hd" root="houd" sc="nonp_pred_np" sense="houd" stype="declarative" tense="present" word="houd" wvorm="pv"/>
        <node begin="2" case="nom" def="def" end="3" frame="pronoun(nwh,fir,sg,de,nom,def)" gen="de" getal="ev" his="normal" his_1="normal" id="6" lcat="np" lemma="ik" naamval="nomin" num="sg" pdtype="pron" per="fir" persoon="1" pos="pron" postag="VNW(pers,pron,nomin,vol,1,ev)" pt="vnw" rel="su" rnum="sg" root="ik" sense="ik" status="vol" vwtype="pers" wh="nwh" word="ik"/>
        <node begin="3" case="both" def="def" end="4" frame="pronoun(nwh,thi,both,de,both,def,wkpro)" gen="de" getal="mv" his="normal" his_1="normal" id="7" lcat="np" lemma="ze" naamval="stan" num="both" pdtype="pron" per="thi" persoon="3" pos="pron" postag="VNW(pers,pron,stan,red,3,mv)" pt="vnw" rel="obj1" rnum="pl" root="ze" sense="ze" special="wkpro" status="red" vwtype="pers" wh="nwh" word="ze"/>
        <node aform="base" begin="4" buiging="zonder" end="5" frame="adjective(no_e(adv))" graad="basis" his="mistok" id="8" infl="no_e" lcat="ap" lemma="vast" pos="adj" positie="vrij" postag="ADJ(vrij,basis,zonder)" pt="adj" rel="predc" root="vast" sense="vast" vform="adj" word="vast,"/>
        <node begin="5" cat="pp" end="7" id="9" rel="mod">
          <node begin="5" end="6" frame="preposition(ondanks,[])" his="normal" his_1="normal" id="10" lcat="pp" lemma="ondanks" pos="prep" postag="VZ(init)" pt="vz" rel="hd" root="ondanks" sense="ondanks" vztype="init" word="ondanks"/>
          <node begin="6" end="7" frame="noun(het,mass,sg)" gen="het" getal="ev" his="mistok" id="11" lcat="np" lemma="alles" naamval="stan" num="sg" pdtype="pron" persoon="3o" pos="noun" postag="VNW(onbep,pron,stan,vol,3o,ev)" pt="vnw" rel="obj1" rnum="sg" root="alles" sense="alles" status="vol" vwtype="onbep" word="alles,"/>
        </node>
      </node>
      <node begin="8" case="nom" def="def" end="9" frame="pronoun(nwh,fir,sg,de,nom,def)" gen="de" getal="ev" his="normal" his_1="normal" id="12" lcat="np" lemma="ik" naamval="nomin" num="sg" pdtype="pron" per="fir" persoon="1" pos="pron" postag="VNW(pers,pron,nomin,vol,1,ev)" pt="vnw" rel="dp" rnum="sg" root="ik" sense="ik" status="vol" vwtype="pers" wh="nwh" word="ik"/>
      <node begin="9" cat="du" end="19" id="13" rel="dp">
        <node begin="9" cat="advp" end="11" id="14" rel="dp">
          <node begin="9" end="10" frame="modal_adverb" his="normal" his_1="normal" id="15" lcat="advp" lemma="nog" pos="adv" postag="BW()" pt="bw" rel="mod" root="nog" sc="modal" sense="nog" word="nog"/>
          <node begin="10" end="11" frame="adverb" his="normal" his_1="normal" id="16" lcat="advp" lemma="steeds" pos="adv" postag="BW()" pt="bw" rel="hd" root="steeds" sense="steeds" word="steeds"/>
        </node>
        <node begin="11" cat="pp" end="18" id="17" rel="dp">
          <node begin="11" end="12" frame="preposition(aan,[vooraf])" his="normal" his_1="normal" id="18" lcat="pp" lemma="aan" pos="prep" postag="VZ(init)" pt="vz" rel="hd" root="aan" sense="aan" vztype="init" word="aan"/>
          <node begin="12" cat="np" end="18" id="19" rel="obj1">
            <node begin="12" end="13" frame="determiner(de)" his="normal" his_1="normal" id="20" infl="de" lcat="detp" lemma="de" lwtype="bep" naamval="stan" npagr="rest" pos="det" postag="LID(bep,stan,rest)" pt="lid" rel="det" root="de" sense="de" word="de"/>
            <node aform="base" begin="13" buiging="met-e" end="14" frame="adjective(e)" graad="basis" his="normal" his_1="normal" id="21" infl="e" lcat="ap" lemma="innerlijk" naamval="stan" pos="adj" positie="prenom" postag="ADJ(prenom,basis,met-e,stan)" pt="adj" rel="mod" root="innerlijk" sense="innerlijk" vform="adj" word="innerlijke"/>
            <node begin="14" end="15" frame="noun(de,count,sg)" gen="de" genus="zijd" getal="ev" graad="basis" his="normal" his_1="normal" id="22" lcat="np" lemma="goedheid" naamval="stan" ntype="soort" num="sg" pos="noun" postag="N(soort,ev,basis,zijd,stan)" pt="n" rel="hd" rnum="sg" root="goedheid" sense="goedheid" word="goedheid"/>
            <node begin="15" cat="pp" end="18" id="23" rel="mod">
              <node begin="15" end="16" frame="preposition(van,[af,uit,vandaan,[af,aan]])" his="normal" his_1="normal" id="24" lcat="pp" lemma="van" pos="prep" postag="VZ(init)" pt="vz" rel="hd" root="van" sense="van" vztype="init" word="van"/>
              <node begin="16" cat="np" end="18" id="25" rel="obj1">
                <node begin="16" end="17" frame="determiner(den)" his="normal" his_1="variant" id="26" infl="den" lcat="detp" lemma="de" lwtype="bep" naamval="dat" npagr="evmo" pos="det" postag="LID(bep,dat,evmo)" pt="lid" rel="det" root="de" sense="de" word="den"/>
                <node begin="17" end="18" frame="noun(both,count,sg)" gen="both" genus="zijd" getal="ev" graad="basis" his="normal" his_1="normal" id="27" lcat="np" lemma="mens" naamval="stan" ntype="soort" num="sg" pos="noun" postag="N(soort,ev,basis,zijd,stan)" pt="n" rel="hd" rnum="sg" root="mens" sense="mens" word="mens"/>
              </node>
            </node>
          </node>
        </node>
        <node begin="18" end="19" frame="noun(het,count,sg)" gen="het" genus="onz" getal="ev" graad="basis" his="mistok" id="28" lcat="np" lemma="geloof" naamval="stan" ntype="soort" num="sg" pos="noun" postag="N(soort,ev,basis,onz,stan)" pt="n" rel="dp" rnum="sg" root="geloof" sense="geloof" word="geloof."/>
      </node>
    </node>
  </node>
  <sentence sentid="1">Toch houd ik ze vast, ondanks alles, omdat ik nog steeds aan de innerlijke goedheid van den mens geloof.</sentence>
  <comments>
    <comment>Q#1|Toch houd ik ze vast, ondanks alles, omdat ik nog steeds aan de innerlijke goedheid van den mens geloof.|1|4|-3.143097559099998</comment>
  </comments>
</alpino_ds>
converting 1.xml to 1.tab
retagging 1.tab to 1.tab2
['1', 'Toch', 'toch', 'adv', '2', 'mod', '_', '_\n']
['2', 'houd', 'houd', 'verb', '0', 'ROOT', '_', '_\n']
['3', 'ik', 'ik', 'pron', '2', 'su', '_', '_\n']
['4', 'ze', 'ze', 'pron', '2', 'obj1', '_', '_\n']
['5', 'vast,', 'vast', 'adj', '2', 'predc', '_', '_\n']
['6', 'ondanks', 'ondanks', 'prep', '2', 'mod', '_', '_\n']
['7', 'alles,', 'alles', 'noun', '6', 'obj1', '_', '_\n']
['8', 'omdat', 'omdat', 'comp', '0', 'ROOT', '_', '_\n']
['9', 'ik', 'ik', 'pron', '0', 'ROOT', '_', '_\n']
['10', 'nog', 'nog', 'adv', '11', 'mod', '_', '_\n']
['11', 'steeds', 'steeds', 'adv', '0', 'ROOT', '_', '_\n']
['12', 'aan', 'aan', 'prep', '0', 'ROOT', '_', '_\n']
['13', 'de', 'de', 'det', '15', 'det', '_', '_\n']
['14', 'innerlijke', 'innerlijk', 'adj', '15', 'mod', '_', '_\n']
['15', 'goedheid', 'goedheid', 'noun', '12', 'obj1', '_', '_\n']
['16', 'van', 'van', 'prep', '15', 'mod', '_', '_\n']
['17', 'den', 'de', 'det', '18', 'det', '_', '_\n']
['18', 'mens', 'mens', 'noun', '16', 'obj1', '_', '_\n']
['19', 'geloof.', 'geloof', 'noun', '0', 'ROOT', '_', '_\n']
['\n']
Parse string with Alpino
['[ @mwu Toch ]', '[ @mwu houd ]', '[ @mwu ik ]', '[ @mwu ze ]', '[ @mwu vast, ]', '[ @mwu ondanks ]', '[ @mwu alles, ]', '[ @mwu omdat ]', '[ @mwu ik ]', '[ @mwu nog ]', '[ @mwu steeds ]', '[ @mwu aan ]', '[ @mwu de ]', '[ @mwu innerlijke ]', '[ @mwu goedheid ]', '[ @mwu van ]', '[ @mwu den ]', '[ @mwu mens ]', '[ @mwu geloof. ]']
hdrug: process 374 on host 3524fa2f1272 (datime(2020,11,12,6,41,48))
[Toch houd ik ze vast, ondanks alles,][omdat][ik][nog steeds aan de innerlijke goedheid van den mens geloof.]
Q#1|Toch houd ik ze vast, ondanks alles, omdat ik nog steeds aan de innerlijke goedheid van den mens geloof.|1|4|-3.143097559099998
OUT:
top/top|top|top/hd|houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|1
steeds/[10,11]|[]:adverb|dp/dp|aan/[11,12]|[]:preposition(aan,[vooraf])|1
steeds/[10,11]|[]:adverb|dp/dp|geloof/[18,19]|[rnum=sg]:noun(het,count,sg)|1
steeds/[10,11]|[]:adverb|hd/mod|nog/[9,10]|[]:modal_adverb|1
aan/[11,12]|[]:preposition(aan,[vooraf])|hd/obj1|goedheid/[14,15]|[rnum=sg]:noun(de,count,sg)|1
ondanks/[5,6]|[]:preposition(ondanks,[])|hd/obj1|alles/[6,7]|[rnum=sg]:noun(het,mass,sg)|1
van/[15,16]|[]:preposition(van,[af,uit,vandaan,[af,aan]])|hd/obj1|mens/[17,18]|[rnum=sg]:noun(both,count,sg)|1
mens/[17,18]|[rnum=sg]:noun(both,count,sg)|hd/det|de/[16,17]|[]:determiner(den)|1
goedheid/[14,15]|[rnum=sg]:noun(de,count,sg)|hd/det|de/[12,13]|[]:determiner(de)|1
goedheid/[14,15]|[rnum=sg]:noun(de,count,sg)|hd/mod|innerlijk/[13,14]|[]:adjective(e)|1
goedheid/[14,15]|[rnum=sg]:noun(de,count,sg)|hd/mod|van/[15,16]|[]:preposition(van,[af,uit,vandaan,[af,aan]])|1
houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|-- / --|omdat/[7,8]|complementizer|1
houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|dp/dp|steeds/[10,11]|[]:adverb|1
houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|dp/dp|ik/[8,9]|[rnum=sg]:pronoun(nwh,fir,sg,de,nom,def)|1
houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|hd/mod|toch/[0,1]|[]:adverb|1
houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|hd/mod|ondanks/[5,6]|[]:preposition(ondanks,[])|1
houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|hd/obj1|ze/[3,4]|[rnum=pl]:pronoun(nwh,thi,both,de,both,def,wkpro)|1
houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|hd/predc|vast/[4,5]|[]:adjective(no_e(adv))|1
houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|hd/su|ik/[2,3]|[rnum=sg]:pronoun(nwh,fir,sg,de,nom,def)|1

** top/top|top|top/hd|houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|1
** steeds/[10,11]|[]:adverb|dp/dp|aan/[11,12]|[]:preposition(aan,[vooraf])|1
** steeds/[10,11]|[]:adverb|dp/dp|geloof/[18,19]|[rnum=sg]:noun(het,count,sg)|1
** steeds/[10,11]|[]:adverb|hd/mod|nog/[9,10]|[]:modal_adverb|1
** aan/[11,12]|[]:preposition(aan,[vooraf])|hd/obj1|goedheid/[14,15]|[rnum=sg]:noun(de,count,sg)|1
** ondanks/[5,6]|[]:preposition(ondanks,[])|hd/obj1|alles/[6,7]|[rnum=sg]:noun(het,mass,sg)|1
** van/[15,16]|[]:preposition(van,[af,uit,vandaan,[af,aan]])|hd/obj1|mens/[17,18]|[rnum=sg]:noun(both,count,sg)|1
** mens/[17,18]|[rnum=sg]:noun(both,count,sg)|hd/det|de/[16,17]|[]:determiner(den)|1
** goedheid/[14,15]|[rnum=sg]:noun(de,count,sg)|hd/det|de/[12,13]|[]:determiner(de)|1
** goedheid/[14,15]|[rnum=sg]:noun(de,count,sg)|hd/mod|innerlijk/[13,14]|[]:adjective(e)|1
** goedheid/[14,15]|[rnum=sg]:noun(de,count,sg)|hd/mod|van/[15,16]|[]:preposition(van,[af,uit,vandaan,[af,aan]])|1
** houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|-- / --|omdat/[7,8]|complementizer|1
** houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|dp/dp|steeds/[10,11]|[]:adverb|1
** houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|dp/dp|ik/[8,9]|[rnum=sg]:pronoun(nwh,fir,sg,de,nom,def)|1
** houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|hd/mod|toch/[0,1]|[]:adverb|1
** houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|hd/mod|ondanks/[5,6]|[]:preposition(ondanks,[])|1
** houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|hd/obj1|ze/[3,4]|[rnum=pl]:pronoun(nwh,thi,both,de,both,def,wkpro)|1
** houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|hd/predc|vast/[4,5]|[]:adjective(no_e(adv))|1
** houd/[1,2]|[stype=declarative]:verb(hebben,sg1,nonp_pred_np)|hd/su|ik/[2,3]|[rnum=sg]:pronoun(nwh,fir,sg,de,nom,def)|1
Traceback (most recent call last):
  File "Alpino/alpino2conll/tools/tag.py", line 661, in <module>
    retag(intabfn, outtabfn)
  File "Alpino/alpino2conll/tools/tag.py", line 68, in retag
    tmptags = parse.alpino(tokens, outtabfn+".parsed")
  File "/content/Alpino/alpino2conll/tools/alpino.py", line 52, in alpino
    output = triples2tab(output)
  File "/content/Alpino/alpino2conll/tools/alpino.py", line 82, in triples2tab
    row.append(split[1].split(",")[0]) # word index
IndexError: list index out of range
cat: 1.tab2: No such file or directory

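Hmm. The first half, which outputs the XML, and the alpino2tab.py step after it both work, but the final tag.py dies with an IndexError. Then again, this tag.py is written for python2, so perhaps it is only natural that it does not run in the first place. That leaves me stuck.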
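As a rough illustration of how the XML tree relates to the head/label columns that alpino2tab.py printed, here is a minimal Python 3 sketch. It is not the algorithm of alpino2tab.py or tag.py. It assumes a simplified rule (the child with rel="hd" heads its phrase, and a phrase without such a child makes each of its children a ROOT), and it assumes token indices are simply begin+1. The function names `head_leaf` and `dependencies` are my own, and `SAMPLE` is the first clause of the XML above with most attributes trimmed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the "Toch houd ik ze vast, ondanks alles," part of the XML above.
SAMPLE = """<alpino_ds version="1.6">
  <node begin="0" cat="top" end="7" id="0" rel="top">
    <node begin="0" cat="du" end="7" id="2" rel="--">
      <node begin="0" cat="smain" end="7" id="3" rel="dp">
        <node begin="0" end="1" id="4" pos="adv" rel="mod" root="toch" word="Toch"/>
        <node begin="1" end="2" id="5" pos="verb" rel="hd" root="houd" word="houd"/>
        <node begin="2" end="3" id="6" pos="pron" rel="su" root="ik" word="ik"/>
        <node begin="3" end="4" id="7" pos="pron" rel="obj1" root="ze" word="ze"/>
        <node begin="4" end="5" id="8" pos="adj" rel="predc" root="vast" word="vast,"/>
        <node begin="5" cat="pp" end="7" id="9" rel="mod">
          <node begin="5" end="6" id="10" pos="prep" rel="hd" root="ondanks" word="ondanks"/>
          <node begin="6" end="7" id="11" pos="noun" rel="obj1" root="alles" word="alles,"/>
        </node>
      </node>
    </node>
  </node>
</alpino_ds>"""


def head_leaf(node):
    """Return the leaf heading `node` (following rel="hd"), or None if there is none."""
    if node.get("word") is not None:
        return node
    for child in node.findall("node"):
        if child.get("rel") == "hd":
            return head_leaf(child)
    return None


def dependencies(top):
    """Return (index, word, root, pos, head, label) rows under the simplified hd-only rule."""
    rows = []

    def visit(node, governor, label):
        if node.get("word") is not None:  # leaf: emit one row
            index = int(node.get("begin")) + 1
            if governor is None:
                head, label = 0, "ROOT"
            else:
                head = int(governor.get("begin")) + 1
            rows.append((index, node.get("word"), node.get("root"),
                         node.get("pos"), head, label))
            return
        head = head_leaf(node)
        for child in node.findall("node"):
            if head is not None and child.get("rel") == "hd":
                # The head child inherits the governor and label of its phrase.
                visit(child, governor, label)
            else:
                # Other children attach to the phrase head (or become ROOT).
                visit(child, head, child.get("rel"))

    visit(top, None, top.get("rel"))
    return sorted(rows)


if __name__ == "__main__":
    for row in dependencies(ET.fromstring(SAMPLE).find("node")):
        print("\t".join(str(field) for field in row))
```

On this fragment the sketch gives the same head and label columns as rows 1 to 7 of the alpino2tab.py output above. Coordination, multi-word units and skipped tokens are not handled.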
