Can't install the required tesseract.exe for pytesseract to function

Is it possible to use pytesseract in replit?

It requires tesseract:

But sudo isn’t allowed in replit and apt-get doesn’t run without sudo either.

import pytesseract
from pytesseract import Output
import pandas as pd

from PIL import Image
import numpy as np

img = np.array(Image.open("img20230815_15015501.jpg"))

custom_config = (
    r"-l eng --oem 1 --psm 6 -c preserve_interword_spaces=1"
)
d = pytesseract.image_to_data(img, config=custom_config, output_type=Output.DICT)
df = pd.DataFrame(d)

Hi @AaronBuchanan3, welcome to the forums!
Can you try entering pip install pytesseract into the Shell?
Hope this helps!

Hi @AaronBuchanan3! Could you also try poetry add pytesseract if my answer above doesn’t work?
Thanks!

I can pip install pytesseract, but it requires tesseract.exe and a language file to exist in the path. It is the exe that I can’t figure out how to install. Without these, the pytesseract wrapper is useless. I'm still trying to learn this environment.

Thanks
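
To make the dependency described above concrete: pytesseract only wraps an external tesseract binary, so a quick check of whether that binary is on the PATH can save some guessing. The following is a minimal sketch using only the standard library; the helper names find_tesseract and tesseract_version are made up for illustration, and on Linux hosts such as Replit the binary is normally called tesseract rather than tesseract.exe.

```python
import shutil
import subprocess


def find_tesseract(name="tesseract"):
    """Return the full path of the binary on PATH, or None if it is missing."""
    return shutil.which(name)


def tesseract_version(path):
    """Return the first line printed by `<path> --version`, or an empty string."""
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some releases print the version banner to stderr instead of stdout,
    # so fall back to stderr when stdout is empty.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH")
else:
    print(path, "->", tesseract_version(path))
```

If the binary is found somewhere that is not on the PATH, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to that path.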

Hi @AaronBuchanan3! Replit does not support .exe files. Sorry.


Thanks,

Not being able to leverage tesseract seems to be a pretty big void in the tech stack. Hopefully they will add it to the solution. Image OCR seems to be a pretty common use case for Python. Back to my local environment for a while.

Is this post of any use?


Thanks! This does get me closer. I did add the

deps = [
    pkgs.tesseract
    pkgs.python310Packages.tesserocr..

to my nix file, but it still doesn't have the language file.

Traceback (most recent call last):
  File "main.py", line 17, in <module>
    d = pytesseract.image_to_data(img, config=custom_config, output_type=Output.DICT)
  File "/home/runner/ocrpreservingspacesfromimage/venv/lib/python3.10/site-packages/pytesseract/pytesseract.py", line 527, in image_to_data
    return {
  File "/home/runner/ocrpreservingspacesfromimage/venv/lib/python3.10/site-packages/pytesseract/pytesseract.py", line 533, in <lambda>
    Output.DICT: lambda: file_to_dict(run_and_get_output(*args), '\t', -1),
  File "/home/runner/ocrpreservingspacesfromimage/venv/lib/python3.10/site-packages/pytesseract/pytesseract.py", line 288, in run_and_get_output
    run_tesseract(**kwargs)
  File "/home/runner/ocrpreservingspacesfromimage/venv/lib/python3.10/site-packages/pytesseract/pytesseract.py", line 264, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (-11, 'Cube ERROR (CubeRecoContext::Load): unable to read cube language model params from /nix/store/arjxcwp7fw87anxybq54mll8yfq3pdq6-tesseract-3.05.00/share/tessdata/eng.cube.lm Cube ERROR (CubeRecoContext::Create): unable to init CubeRecoContext object init_cube_objects(false, &tessdata_manager):Error:Assert failed:in file tessedit.cpp, line 210')
 

Tesseract is a two-part install: first tesseract itself, and then the language-specific file. I found tesseract per your input, but I still can't find the eng language file to put in the path.
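
One way to see which language files are actually present is to list the *.traineddata files in the tessdata directory named in the traceback. The sketch below is only an illustration under a few assumptions: the helper names list_languages and has_language are hypothetical, and it reads the TESSDATA_PREFIX environment variable, which, depending on the Tesseract version, may point either at the tessdata directory itself or at its parent, so both are tried. It is also worth noting, as far as I know, that in Tesseract 3.x the --oem 1 flag selects the Cube engine, which needs extra eng.cube.* files beyond eng.traineddata; that would match the Cube errors in the traceback, though I have not verified it on Replit.

```python
import os
from pathlib import Path


def list_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def has_language(tessdata_dir, lang="eng"):
    """Return True if <lang>.traineddata exists in the given directory."""
    return lang in list_languages(tessdata_dir)


prefix = os.environ.get("TESSDATA_PREFIX")
if not prefix:
    print("TESSDATA_PREFIX is not set; pass the tessdata directory explicitly")
else:
    # Try the prefix itself and a tessdata subdirectory beneath it.
    for candidate in (Path(prefix), Path(prefix) / "tessdata"):
        if candidate.is_dir():
            print(candidate, "->", list_languages(candidate))
```

If eng is listed but the Cube error persists, dropping --oem 1 from custom_config so that the default engine is used may be worth a try.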