SudachiPy tokenizer in Replit

Question:

Hey, Replit newbie here. I'm trying to make a serverless backend to hide my OpenAI keys. Part of what I want requires SudachiPy.

I’ve installed SudachiPy@0.6.7 and SudachiDict-full@20230711 via the package manager.

Repl link:

code snippet
from sudachipy import tokenizer
from sudachipy import dictionary

tokenizer_obj = dictionary.Dictionary(dict_type="full").create() 

But I get this error:

code snippet
<stdin>:1: DeprecationWarning: Parameter dict_type of Dictionary() is deprecated, use dict instead
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
Exception: Error loading config: IO Error: No such file or directory (os error 2)

Any ideas?


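The output you're seeing contains two separate messages. The DeprecationWarning is only a warning: in newer versions of SudachiPy the dict_type parameter is deprecated and, as the message itself says, you should use dict instead. The exception is the actual failure: "Error loading config: IO Error: No such file or directory" means that a file SudachiPy tried to read while loading its configuration was not found. One thing to try is passing the dictionary location explicitly. As far as I know, dict accepts either a dictionary name ("small", "core", "full") or an absolute path to a compiled dictionary file.

from sudachipy import tokenizer
from sudachipy import dictionary

# Placeholder: replace with the absolute path to the installed system.dic file.
dictionary_path = "/path/to/your/sudachidict_full/resources/system.dic"

try:
    # dict replaces the deprecated dict_type parameter.
    tokenizer_obj = dictionary.Dictionary(dict=dictionary_path).create()
except Exception as e:
    print("An error occurred:", str(e))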

ah, nice! any idea how i find the install directory for the dictionary?
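
One way to find the install directory without hard-coding the venv path is to ask Python where the package lives. This is a minimal sketch, assuming the dictionary is installed as an importable package named sudachidict_full and that its compiled dictionary sits at resources/system.dic inside it (the layout shown elsewhere in this thread). find_package_dir and find_system_dic are hypothetical helper names, not part of SudachiPy.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_dir(package_name: str) -> Optional[Path]:
    """Return the directory of an installed package, or None if not found."""
    spec = importlib.util.find_spec(package_name)
    # submodule_search_locations is only set for packages (directories).
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def find_system_dic(package_name: str = "sudachidict_full") -> Optional[Path]:
    """Return the path to resources/system.dic inside the package, if present.

    Assumption: the dictionary file lives at <package>/resources/system.dic.
    """
    package_dir = find_package_dir(package_name)
    if package_dir is None:
        return None
    dic_path = package_dir / "resources" / "system.dic"
    return dic_path if dic_path.is_file() else None


if __name__ == "__main__":
    # Prints the dictionary path, or None if the package or file is missing.
    print(find_system_dic())
```

If this prints a path, it could be passed as an absolute path, for example Dictionary(dict=str(path)), assuming your SudachiPy version accepts a file path for dict. I haven't verified that on Replit.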

OK, so I've managed to get it to work by editing

venv/lib/python3.10/site-packages/sudachipy/resources/sudachi.json

to make

"systemDict" : "venv/lib/python3.10/site-packages/sudachidict_full/resources/system.dic",

and then using

tokenizer_obj = dictionary.Dictionary(config_path="venv/lib/python3.10/site-packages/sudachipy/resources/sudachi.json").create()

in the code, but it does feel very involved.
