Using Python to modify Word documents into large print for visually-impaired people

r8yswjm9x8 · February 14, 2024, 11:31am

I am trying to write a Python program to transform a Word document in small font size to one in large font size with some specific formatting requirements.

So far I am failing at the first hurdle: the Word files my program creates are not recognised as Word files in Replit. When I download them, they are recognised as Word files but will not open because they are said to contain corrupted content (even when they contain just 2 words). I am working on macOS within the Replit website or app. I have installed the python-docx library.

Within my repl, I expect the files I import or create to look like Word files (with a blue W icon). Hopefully they will then open correctly, both directly from the repl and after downloading.

Repl link: https://replit.com/@r8yswjm9x8/Adapting-Word-documents-into-Modified-Large-Print-36

from docx import Document

document = Document()
document.save('test2.docx')

WindLother · February 14, 2024, 12:24pm

Tbh the appearance of the file icons in an IDE (like Replit) depends more on the IDE's ability to recognize the file type than on the Python code itself.
Likewise, Replit not recognizing the file doesn’t mean a problem with the file itself (you can try downloading the file and opening it with a local Word processor to see if it works correctly).
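
If you want to check the file without relying on icons at all: a .docx is a ZIP package, so a rough structural check is possible with the standard library alone. This is only a sketch. `check_docx_package` is a made-up helper name, and the two required part names are an assumption based on what python-docx normally writes. It cannot prove that Word will accept the file, only that the basic package is intact.

```python
import zipfile

# Parts a typical .docx package is expected to contain (assumption:
# these are the names python-docx normally writes).
REQUIRED_PARTS = ("[Content_Types].xml", "word/document.xml")


def check_docx_package(path):
    """Return a list of problems found; an empty list means the basics look fine."""
    if not zipfile.is_zipfile(path):
        return ["not a ZIP archive"]

    problems = []
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())
        for part in REQUIRED_PARTS:
            if part not in names:
                problems.append(f"missing part: {part}")

        # testzip() returns the name of the first damaged member, or None.
        bad_member = package.testzip()
        if bad_member is not None:
            problems.append(f"damaged member: {bad_member}")
    return problems
```

Running `check_docx_package('test2.docx')` on both the copy inside the repl and the downloaded copy should show whether the file was already damaged when it was saved or only after the download.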

What I do recommend changing is where you set the font name and size: set them directly on the run, not on the paragraph style.

r8yswjm9x8 · February 14, 2024, 2:46pm

Thank you for your reply.

I have tried downloading the .docx files created by my Python program but get error messages saying they are corrupted and won’t open: “Word found unreadable content in “Hello World output.docx”. Do you want to recover the contents of this document? If you trust the source of this document, click Yes.”. I’m unable to recover the content.

Thanks for your advice regarding setting the font name and size directly on the run. I’m not sure what command to use for this. I’ll experiment but do let me know if you know the syntax I should be using. Thanks again!

WindLother · February 14, 2024, 2:56pm

You don’t have to change much from your original program, just the paragraph part:

from docx import Document
from docx.shared import Pt
from docx.enum.section import WD_ORIENTATION
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT

def transform_document(input_file, output_file):
    # Open the input Word document
    doc = Document(input_file)

    # Create a new output Word document with specified settings
    new_doc = Document()
    section = new_doc.sections[0]
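    section.orientation = WD_ORIENTATION.LANDSCAPE
    # Setting the orientation alone does not resize the page, so swap width and height too
    section.page_width, section.page_height = section.page_height, section.page_width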
    section.left_margin = Pt(20)
    section.right_margin = Pt(20)
    section.top_margin = Pt(20)
    section.bottom_margin = Pt(20)

    # Loop through each paragraph in the input document
    for para in doc.paragraphs:
        # Add the paragraph to the new document with specified formatting
        new_para = new_doc.add_paragraph()
        run = new_para.add_run(para.text)
        run.bold = True
        run.font.name = 'Arial'
        run.font.size = Pt(36)
        new_para.paragraph_format.line_spacing = 1.5

    # Save the new document
    new_doc.save(output_file)

# Provide the input and output file paths
input_file = 'Hello World input.docx'
output_file = 'Hello World output.docx'

# Call the function to transform the document
transform_document(input_file, output_file)
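
To confirm that the saved file really carries the large size on each run, you can read the sizes back out of the package with the standard library. Again, just a sketch: `run_font_sizes_pt` is a hypothetical helper, it assumes the body is stored in `word/document.xml`, and it only sees sizes set explicitly on runs (anything inherited from a style will not show up). Word stores the size in the `w:sz` element in half-points, hence the division by 2.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by elements such as w:r and w:sz.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def run_font_sizes_pt(path):
    """Return the explicit run-level font sizes (in points) found in a .docx body."""
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    sizes = []
    # A size set on a run lives at w:r/w:rPr/w:sz, with w:val in half-points.
    for size in root.iterfind(f".//{{{W_NS}}}r/{{{W_NS}}}rPr/{{{W_NS}}}sz"):
        half_points = int(size.get(f"{{{W_NS}}}val"))
        sizes.append(half_points / 2)
    return sizes
```

For a file produced by the script above, every value in `run_font_sizes_pt('Hello World output.docx')` would be expected to be 36.0. An empty list would suggest the size was never written at run level.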

r8yswjm9x8 · February 14, 2024, 3:53pm

Ah brilliant, I see. Thanks SO MUCH!

Fairies0feast · February 14, 2024, 6:48pm

Please mark the reply that helped you most as the solution.
