Base10 --> Unicode

Question:
I need code to translate a base-10 number into a 2-digit Unicode “number” (base 149,000).
Why is the output all hashtags? (output_unicode_chars.txt)

Repl link:
linky link

# ChatGPT sucks >=[
# decimal_to_unicode.py


def decimal_to_unicode(decimal_number):
    try:
        # Check if the decimal number is within the valid Unicode code point range
        if 0 <= decimal_number <= 0x10FFFF:
            # Use encode to handle Unicode encoding
            unicode_char = (
                chr(decimal_number).encode("utf-8", "surrogatepass").decode("utf-8")
            )
            return unicode_char
        else:
            # If the decimal number is outside the range, use a placeholder character or handle it as needed
            return "##"
    except (ValueError, TypeError, OverflowError) as e:
        # Raise the exception with additional information
        raise ValueError(f"Error converting decimal {decimal_number} to Unicode") from e


def process_decimal_file(input_file, output_file):
    try:
        # Read the input text file
        with open(input_file, "r") as file:
            decimal_numbers = file.read().strip().split()
        # Check if the file is not empty
        if not decimal_numbers:
            raise ValueError(
                "Text file is empty or does not contain valid decimal numbers"
            )
        # Convert each decimal number to Unicode
        unicode_chars = [decimal_to_unicode(int(num)) for num in decimal_numbers]

        # Write the Unicode characters to the output file
        # Use a separate name for the file handle so output_file still holds the path
        with open(output_file, "w", encoding="utf-8") as out:
            out.write("".join(unicode_chars))
        print("Conversion successful. Output saved to", output_file)
    except Exception as e:
        # Raise the exception with additional information
        raise Exception("Error during processing") from e


# Example usage
process_decimal_file("decimal_pixel_data.txt", "output_unicode_chars.txt")
# Output: Conversion successful. Output saved to output_unicode_chars.txt

Add some debug prints to see what’s going on, for example:

def decimal_to_unicode(decimal_number):
    print(f"Processing number: {decimal_number}")  # Add print here
    try:
        # Check if the decimal number is within the valid Unicode code point range
        if 0 <= decimal_number <= 0x10FFFF:
            unicode_char = chr(decimal_number).encode('utf-8', 'surrogatepass').decode('utf-8')
            print(f"Unicode character: {unicode_char}")  # And here
            return unicode_char
        else:
            # If the decimal number is outside the range, use a placeholder character
            print(f"Number out of range: {decimal_number}")  # And here too
            return "##"
    except (ValueError, TypeError, OverflowError) as e:
        # Print the error for debugging
        print(f"Error converting number {decimal_number}: {e}")  # Debug print
        # Raise the exception with additional information
        raise ValueError(f"Error converting decimal {decimal_number} to Unicode") from e

Do the same for process_decimal_file.


I tested this code, and it worked. This is only the decimal_to_unicode function, and it is the only part I tested; your other functions should remain the same:

def decimal_to_unicode(decimal_number):
  return "##" if not 0 <= decimal_number <= 149000 ** 2 - 1 else \
      chr((decimal_number // 149000) + 44032) + chr((decimal_number % 149000) + 44032)

I’m really bad at inserting code in the right places. Could you paste the whole code with the fix?

def decimal_to_unicode(decimal_number):
    return "##" if not 0 <= decimal_number <= 149000 ** 2 - 1 else \
        chr((decimal_number // 149000) + 44032) + chr((decimal_number % 149000) + 44032)
  
def process_decimal_file(input_file, output_file):
    try:
        # Read the input text file
        with open(input_file, "r") as file:
            decimal_numbers = file.read().strip().split()

        # Check if the file is not empty
        if not decimal_numbers:
            raise ValueError("Text file is empty or does not contain valid decimal numbers")

        # Convert each decimal number to Unicode
        unicode_chars = [decimal_to_unicode(int(num)) for num in decimal_numbers]

        # Write the Unicode characters to the output file
        with open(output_file, "w", encoding='utf-8') as output_file:
            output_file.write("".join(unicode_chars))

        print("Conversion successful. Output saved to", output_file)

    except Exception as e:
        # Raise the exception with additional information
        raise Exception("Error during processing") from e


process_decimal_file("decimal_pixel_data.txt", "output_unicode_chars.txt")
File "/home/runner/TCC-Project-Pt1-Comp/change2.py", line 20, in process_decimal_file
    output_file.write("".join(unicode_chars))
UnicodeEncodeError: 'utf-8' codec can't encode character '\udd01' in position 0: surrogates not allowed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/TCC-Project-Pt1-Comp/change2.py", line 29, in <module>
    process_decimal_file("decimal_pixel_data.txt", "output_unicode_chars.txt")
  File "/home/runner/TCC-Project-Pt1-Comp/change2.py", line 26, in process_decimal_file
    raise Exception("Error during processing") from e
Exception: Error during processing

Can you possibly give me your decimal_pixel_data or link your Repl?

EDIT: I found your Repl

Is this a school assignment?

If you don’t know what hashtags indicate, try learning Python. That’ll help.
Hashtags are comments.
Sorry, I didn’t see what you meant.


Yes, but not in this context. The hashtags were a result of failed conversion, so hashtags were given as placeholders.
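To make the placeholder behaviour concrete, here is a minimal sketch of the range check from the question's code. The sample values are made up for illustration and are not taken from decimal_pixel_data.txt. The point is that any input above 0x10FFFF (1114111), the highest Unicode code point, takes the else branch and comes back as "##".

```python
MAX_CODE_POINT = 0x10FFFF  # 1114111, the highest valid Unicode code point


def original_range_check(decimal_number):
    """Mirror the question's check: one character if in range, else "##"."""
    if 0 <= decimal_number <= MAX_CODE_POINT:
        return chr(decimal_number)
    return "##"


# Hypothetical inputs, chosen only to show both branches.
samples = [65, 1114111, 1114112, 16777215]
results = [original_range_check(n) for n in samples]
print(results)
```

If most of the numbers in the input file are larger than 1114111, the output file would consist almost entirely of "##".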


How would you do that? Give a few examples. Do you mean the Unicode lookalike? A 2-digit base-149000 number can hold 149000^2 distinct values, while a 1-digit base-10 number can hold only 10.


I tried multiple example decimals from your data and tested them with just my fixed code. Strangely, it worked fine. However, when I tried it with your code (using your process_decimal_file function), it didn’t work. I assume the error might be related to how your decimal data is formatted or how the data is being processed.


No, it’s a personal project.
The hashtags are in a txt document. They don’t mean anything :sweat_smile:
And I do know that you use a hashtag for a comment, but it’s not:

# cdjhwlfhbafghlalklagblaksjfhdlkasjfhlkjsfghliqbh
# cdjhwlfhbafghlalklagblaksjfhdlkasjfhlkjsfghliqbh
# cdjhwlfhbafghlalklagblaksjfhdlkasjfhlkjsfghliqbh
# cdjhwlfhbafghlalklagblaksjfhdlkasjfhlkjsfghliqbh
# cdjhwlfhbafghlalklagblaksjfhdlkasjfhlkjsfghliqbh
# cdjhwlfhbafghlalklagblaksjfhdlkasjfhlkjsfghliqbh
# cdjhwlfhbafghlalklagblaksjfhdlkasjfhlkjsfghliqbh
# cdjhwlfhbafghlalklagblaksjfhdlkasjfhlkjsfghliqbh

but instead it’s

#####################################################################################################################################################################

Edit: there are more like a million hashtags in the txt file, but I don’t want to spam.


Is there a way to fix that?


Correct.
In the context “base ###” or “base###”, the number means how many possible values a single digit can take.

Template:
a^b = c || There are ‘a’ possibilities for a single digit (base ‘a’) and ‘b’ digits that can vary, resulting in ‘c’ different possibilities

Examples:
2^5 = 32 || There are 2 possibilities for a single digit (base 2) and 5 digits that can vary, resulting in 32 different possibilities
16^6 = 16777216 || There are 16 possibilities for a single digit (base 16) and 6 digits that can vary, resulting in 16777216 different possibilities
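
The template can be checked with a few lines of Python. This is only a sketch; count_values and to_digits are names made up for this example.

```python
def count_values(base, digits):
    """Number of distinct values that `digits` digits in `base` can represent."""
    return base ** digits


def to_digits(number, base, digits):
    """Split a non-negative integer into base-`base` digits, most significant first."""
    if not 0 <= number < count_values(base, digits):
        raise ValueError(f"{number} does not fit in {digits} base-{base} digits")
    result = []
    for _ in range(digits):
        number, remainder = divmod(number, base)
        result.append(remainder)
    return result[::-1]


print(count_values(2, 5))            # 32
print(count_values(16, 6))           # 16777216
print(count_values(149000, 2))       # 22201000000
print(to_digits(300000, 149000, 2))  # [2, 2000]
```

The last line shows the split used in this thread: the first digit is the quotient by the base and the second is the remainder.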


Here, I tested this, and it worked properly. I also tested it with your exact data. Please let me know if this is the expected result:

def decimal_to_unicode(decimal_number):
    base = 149000
    if not 0 <= decimal_number < base ** 2:
        return "##"
    return chr((decimal_number // base) + 44032) + chr((decimal_number % base) + 44032)


def process_file(input_file, output_file):
    with open(input_file, 'r') as file:
        decimal_data = file.read().replace('\n', ' ')

    unicode_characters = ''.join(decimal_to_unicode(int(number)) for number in decimal_data.split())
    utf8_bytes = unicode_characters.encode('utf-8', 'backslashreplace')

    with open(output_file, 'wb') as file:
        file.write(utf8_bytes)


process_file("decimal_pixel_data.txt", "unicode_pixel_data.txt")

Yes! It’s working fine, but is it okay to ask for some changes?
I’d like it in a more readable format:
## ## ## ## ## ## ## ## ## ## ## ## ## ## ## etc…

so that another program could read the file and turn it back into base 10, then base 16, and then back into a photo.


Where it says ''.join(), add a space so it would become ' '.join(). This change should give you the format you want.
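
Since the goal is for another program to read the file back, here is a rough sketch of the round trip, done in memory rather than through files. It assumes the same base (149000) and offset (44032) as the function above and that every token is exactly two characters. encode_numbers and decode_text are hypothetical names, and the "##" placeholder is not handled.

```python
BASE = 149000
OFFSET = 44032  # same offset that decimal_to_unicode adds to each digit


def encode_numbers(numbers):
    """Encode each number as two characters and separate the pairs with spaces."""
    pairs = []
    for n in numbers:
        if not 0 <= n < BASE ** 2:
            raise ValueError(f"{n} does not fit in two base-{BASE} digits")
        high, low = divmod(n, BASE)
        pairs.append(chr(high + OFFSET) + chr(low + OFFSET))
    return " ".join(pairs)


def decode_text(text):
    """Reverse encode_numbers: split on spaces and rebuild each base-10 number."""
    numbers = []
    for pair in text.split(" "):
        high, low = (ord(ch) - OFFSET for ch in pair)
        numbers.append(high * BASE + low)
    return numbers


original = [0, 255, 16777215]
text = encode_numbers(original)
print(decode_text(text) == original)  # True
```

Splitting on a single space is safe here because no character at or above code point 44032 is a space.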


HMMMMMM


Hmm… try again, maybe the ' ' is causing the output file to be bigger than your Repl can handle. If this happens again, I will look for a workaround.

It only happened once.

But one thing is strange: instead of a Unicode character, sometimes it just says “\uda5c” or something similar.

Any ideas why?
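
One likely explanation, offered as a guess from the "surrogates not allowed" error earlier in the thread rather than something confirmed here: code points U+D800 to U+DFFF are surrogates, which UTF-8 cannot encode on their own. Adding 44032 to a digit between 11264 and 13311 lands in that block, so 'backslashreplace' writes the escape text (such as \uda5c) instead of a character. The sketch below shows this, along with one possible workaround that skips over the surrogate block. digit_to_char and char_to_digit are made-up names, not part of the code posted above.

```python
OFFSET = 44032
SURROGATE_START = 0xD800  # first surrogate code point
SURROGATE_END = 0xDFFF    # last surrogate code point
SURROGATE_COUNT = SURROGATE_END - SURROGATE_START + 1  # 2048


def is_surrogate(code_point):
    return SURROGATE_START <= code_point <= SURROGATE_END


def digit_to_char(digit):
    """Map a digit to a character, jumping over the surrogate block."""
    code_point = digit + OFFSET
    if code_point >= SURROGATE_START:
        code_point += SURROGATE_COUNT
    return chr(code_point)


def char_to_digit(char):
    """Inverse of digit_to_char."""
    code_point = ord(char)
    if code_point > SURROGATE_END:
        code_point -= SURROGATE_COUNT
    return code_point - OFFSET


# 11868 + 44032 == 0xDA5C, the escape quoted in the post above.
problem_digit = 0xDA5C - OFFSET
print(is_surrogate(problem_digit + OFFSET))                             # True
print(chr(problem_digit + OFFSET).encode("utf-8", "backslashreplace"))  # b'\\uda5c'
print(digit_to_char(problem_digit).encode("utf-8"))                     # encodes without an error
print(char_to_digit(digit_to_char(problem_digit)) == problem_digit)     # True
```

The remapped characters may still show up as boxes if the font has no glyph for them, but they can be written to a UTF-8 file and read back without escapes. A reader program would need to use the matching char_to_digit.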
