I am using SudachiPy via GiNZA and am trying to annotate the following sentences:
プロ野球の中日で選手、監督を務め、1月4日に70歳で死去した星野仙一氏をしのび、3日、名古屋市東区のナゴヤドームで行われた中日―楽天のオープン戦は追悼試合として開催された。
明治大の後輩、島内宏明外野手は「改めてすごい人だったんだなと思った」と話した。
My user dictionary contains the following lines, which match 明治 and 楽天 in the sentences above.
No other lines in the dictionary match any substring of these sentences.
楽天,1288,1288,100,楽天_4755-2018,名詞,固有名詞,組織,上場会社,*,*,RAKUTEN,楽天,*,*,*,*,*
明治,1288,1288,100,明治_2261-2009,名詞,固有名詞,組織,上場会社,*,*,MEIJI,明治,*,*,*,*,*
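As a sanity check on the entries themselves, here is a minimal sketch that parses the two lines with the standard `csv` module and compares their shape. The expected count of 18 columns is my assumption about the Sudachi user dictionary format, and `summarize` is just a helper name I made up for this check; it does not call SudachiPy at all.

```python
import csv
import io

# The two user-dictionary entries from above, copied verbatim.
ENTRIES = """\
楽天,1288,1288,100,楽天_4755-2018,名詞,固有名詞,組織,上場会社,*,*,RAKUTEN,楽天,*,*,*,*,*
明治,1288,1288,100,明治_2261-2009,名詞,固有名詞,組織,上場会社,*,*,MEIJI,明治,*,*,*,*,*
"""

# Assumption: a Sudachi user dictionary row has 18 comma-separated fields.
EXPECTED_COLUMNS = 18


def summarize(text):
    """Return surface form, column count, and the six POS fields per row."""
    rows = csv.reader(io.StringIO(text))
    return [
        {"surface": row[0], "columns": len(row), "pos": tuple(row[5:11])}
        for row in rows
        if row  # skip blank lines
    ]


summary = summarize(ENTRIES)
for entry in summary:
    status = "ok" if entry["columns"] == EXPECTED_COLUMNS else "unexpected"
    print(entry["surface"], entry["columns"], status, entry["pos"])
```

Both rows have the same number of fields and the same part-of-speech fields, so as far as I can tell the two entries differ only in surface form, display form, and reading.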
When I try to run the annotation with this configuration, I get the following error:
...
File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/language.py", line 441, in __call__
doc = self.make_doc(text)
File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 281, in make_doc
return self.tokenizer(text)
File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 144, in __call__
dtokens = self._get_dtokens(sudachipy_tokens)
File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 182, in _get_dtokens
) for idx, token in enumerate(sudachipy_tokens) if len(token.surface()) > 0
File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 182, in <listcomp>
) for idx, token in enumerate(sudachipy_tokens) if len(token.surface()) > 0
File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/morpheme.py", line 36, in part_of_speech
return self.list.grammar.get_part_of_speech_string(wi.pos_id)
File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/sudachipy/dictionarylib/grammar.py", line 55, in get_part_of_speech_string
return self.pos_list[pos_id]
IndexError: list index out of range
Could someone advise me as to what is causing this error, please?
I am quite certain that the sentence containing 明治 is causing the issue, because if I remove the second sentence, the annotation works fine. It therefore seems that SudachiPy handles the dictionary entry for 楽天 correctly, but not the one for 明治.
Why is this?