Is it possible to use pytesseract in replit?
It requires the tesseract binary to be installed.
But sudo isn’t allowed in Replit, and apt-get doesn’t run without sudo either.
import pytesseract
from pytesseract import Output
import pandas as pd
from PIL import Image
import numpy as np
img = np.array(Image.open("img20230815_15015501.jpg"))
custom_config = (
r"-l eng --oem 1 --psm 6 -c preserve_interword_spaces=1"
)
d = pytesseract.image_to_data(img, config=custom_config, output_type=Output.DICT)
df = pd.DataFrame(d)
Hi @AaronBuchanan3 , welcome to the forums!
Can you try entering pip install pytesseract
into the Shell?
Hope this helps!
Hi @AaronBuchanan3 ! Could you also try poetry add pytesseract
if my above post’s answer doesn’t work?
Thanks!
I can pip install pytesseract, but it requires tesseract.exe and a language file to exist in the path. It is the exe that I can’t figure out how to install. Without these, the pytesseract wrapper is useless. Still trying to learn this environment.
Thanks
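Since pytesseract only wraps the tesseract command-line program, one way to see what the environment actually provides is to look for the binary on the PATH and ask it which languages it can load. This is a minimal sketch using only the standard library; the helper names find_tesseract and list_languages are made up for illustration, and it assumes the installed tesseract supports the --list-langs option.

```python
import shutil
import subprocess


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the tesseract binary, or None if it is not on PATH."""
    return shutil.which(binary_name)


def list_languages(binary_path):
    """Ask tesseract which language files it can see and return its raw text output."""
    result = subprocess.run(
        [binary_path, "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Depending on the version, the list may go to stdout or stderr, so keep both.
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract is not on PATH, so pytesseract has nothing to call")
    else:
        print("found tesseract at", path)
        print(list_languages(path))
```

If the first check prints a path but the language list does not include eng, the binary is installed and only the language data is missing.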
Hi @AaronBuchanan3 ! Replit does not support .exe
files. Sorry.
Thanks,
Not being able to use tesseract seems to be a pretty big gap in the tech stack. Hopefully they will add it to the solution. Image OCR seems to be a pretty common use case for Python. Back to my local environment for a while.
Thanks! This does get me closer. I did add the
deps = [
pkgs.tesseract
pkgs.python310Packages.tesserocr..
to my nix file, but it still doesn’t have the language file.
Traceback (most recent call last):
File "main.py", line 17, in <module>
d = pytesseract.image_to_data(img, config=custom_config, output_type=Output.DICT)
File "/home/runner/ocrpreservingspacesfromimage/venv/lib/python3.10/site-packages/pytesseract/pytesseract.py", line 527, in image_to_data
return {
File "/home/runner/ocrpreservingspacesfromimage/venv/lib/python3.10/site-packages/pytesseract/pytesseract.py", line 533, in <lambda>
Output.DICT: lambda: file_to_dict(run_and_get_output(*args), '\t', -1),
File "/home/runner/ocrpreservingspacesfromimage/venv/lib/python3.10/site-packages/pytesseract/pytesseract.py", line 288, in run_and_get_output
run_tesseract(**kwargs)
File "/home/runner/ocrpreservingspacesfromimage/venv/lib/python3.10/site-packages/pytesseract/pytesseract.py", line 264, in run_tesseract
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (-11, 'Cube ERROR (CubeRecoContext::Load): unable to read cube language model params from /nix/store/arjxcwp7fw87anxybq54mll8yfq3pdq6-tesseract-3.05.00/share/tessdata/eng.cube.lm Cube ERROR (CubeRecoContext::Create): unable to init CubeRecoContext object init_cube_objects(false, &tessdata_manager):Error:Assert failed:in file tessedit.cpp, line 210')
tesseract is a two-part install: first tesseract itself, and then the language-specific file. I found tesseract per your input, but I still can’t find the eng language file to put in the path.
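To narrow down which language data is actually present, it may help to list the contents of the tessdata directory named in the traceback. The sketch below uses only the standard library; installed_languages and cube_files are hypothetical helper names, and the directory to inspect is an assumption (here taken from the TESSDATA_PREFIX environment variable, whose exact meaning differs between tesseract versions). One thing worth checking: the traceback shows tesseract 3.05.00, and in the 3.x series --oem 1 selects the Cube engine, which needs extra eng.cube.* files on top of eng.traineddata, so the error may come from that option rather than from a missing eng file.

```python
import os
from pathlib import Path

SUFFIX = ".traineddata"


def installed_languages(tessdata_dir):
    """Return sorted language codes that have a .traineddata file in tessdata_dir."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name[: -len(SUFFIX)] for p in directory.glob("*" + SUFFIX))


def cube_files(tessdata_dir, lang="eng"):
    """Return sorted names of the Cube engine files for lang found in tessdata_dir."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob(lang + ".cube.*"))


if __name__ == "__main__":
    # Assumption: TESSDATA_PREFIX points at (or near) the tessdata directory.
    tessdata = os.environ.get("TESSDATA_PREFIX", ".")
    print("languages:", installed_languages(tessdata))
    print("cube files for eng:", cube_files(tessdata))
```

If eng shows up in the first list but the second list is empty, trying --oem 0 in custom_config would be a reasonable next experiment.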