Hash Generator Using Python

I’ve recently developed a Python script for encoding messages or data using a rather intricate algorithm, I must say. To be transparent, I drew inspiration from certain elements I came across in the Cloudflare Turnstile Captcha files, which are commonly associated with website security measures. My endeavor primarily involved meticulously recreating and adapting these elements into a Python script.

Code:

class HashGenerator:
    @staticmethod
    def _hash_function(data):
        def b(f, g):
            i = (65535 & f) + (65535 & g)
            return ((f >> 16) + (g >> 16) + (i >> 16)) << 16 | (i & 65535)

        def c(d, f):
            return (d << (32 - f)) | (d >> f)

        padding = b'\x80' + b'\x00' * (55 - (len(data) + 8) % 64)
        data += padding + (len(data) * 8).to_bytes(8, byteorder='big')

        w = [0] * 64 

        for i in range(0, len(data), 64):
            for j in range(16):
                w[j] = int.from_bytes(data[i + j * 4:i + j * 4 + 4], byteorder='big')

            for j in range(16, 64):
                s0 = (c(w[j - 15], 7) ^ c(w[j - 15], 18) ^ (w[j - 15] >> 3))
                s1 = (c(w[j - 2], 17) ^ c(w[j - 2], 19) ^ (w[j - 2] >> 10))
                w[j] = (w[j - 16] + s0 + w[j - 7] + s1) & 0xFFFFFFFF

        o = [
            1116352408, 1899447441, 3049323471, 3921009573, 961987163, 1508970993, 2453635748, 2870763221,
            3624381080, 310598401, 607225278, 1426881987, 1925078388, 2162078206, 2614888103, 3248222580,
            3835390401, 4022224774, 264347078, 604807628, 770255983, 1249150122, 1555081692, 1996064986,
            2554220882, 2821834349, 2952996808, 3210313671, 3336571891, 3584528711, 113926993, 338241895,
            666307205, 773529912, 1294757372, 1396182291, 1695183700, 1986661051, 2177026350, 2456956037,
            2730485921, 2820302411, 3259730800, 3345764771, 3516065817, 3600352804, 4094571909, 275423344,
            430227734, 506948616, 659060556, 883997877, 958139571, 1322822218, 1537002063, 1747873779,
            1955562222, 2024104815, 2227730452, 2361852424, 2428436474, 2756734187, 3204031479, 3329325298
        ]

        s = [
            1779033703, 3144134277, 1013904242, 2773480762, 1359893119, 2600822924, 528734635, 1541459225
        ]

        A, B, C, D, E, F, G, H = s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7]

        for i in range(64):
            S1 = (c(E, 6) ^ c(E, 11) ^ c(E, 25))
            ch = ((E & F) ^ ((~E) & G))
            temp1 = (H + S1 + ch + o[i] + w[i]) & 0xFFFFFFFF
            S0 = (c(A, 2) ^ c(A, 13) ^ c(A, 22))
            maj = ((A & B) ^ (A & C) ^ (B & C))
            temp2 = (S0 + maj) & 0xFFFFFFFF

            H = G
            G = F
            F = E
            E = (D + temp1) & 0xFFFFFFFF
            D = C
            C = B
            B = A
            A = (temp1 + temp2) & 0xFFFFFFFF

        s[0] = (s[0] + A) & 0xFFFFFFFF
        s[1] = (s[1] + B) & 0xFFFFFFFF
        s[2] = (s[2] + C) & 0xFFFFFFFF
        s[3] = (s[3] + D) & 0xFFFFFFFF
        s[4] = (s[4] + E) & 0xFFFFFFFF
        s[5] = (s[5] + F) & 0xFFFFFFFF
        s[6] = (s[6] + G) & 0xFFFFFFFF
        s[7] = (s[7] + H) & 0xFFFFFFFF

        result = ''
        for i in s:
            hex_str = hex(i)[2:]
            result += '0' * (8 - len(hex_str)) + hex_str

        return result

    def generate_hash(self, data):
        return self._hash_function(data)

if __name__ == "__main__":
    hash = HashGenerator()
    data = "Hello World!"
    result = hash.generate_hash(data.encode())
    print(result)
3 Likes

how’s the performance?

3 Likes

For what? for generating hashes?

1 Like

How does this work? Do note that hashes, which differ from encrypted messages, can not be reversed, as hashing is designed to be a one-way method for verifying data.

Also, is the code obfusticated?

4 Likes

This is a robust SHA-256 hash generator that excels in processing data for cryptographic purposes. It begins by accepting data, which is initially encoded into a bytes-like object. To facilitate the subsequent intricate operations, the algorithm employs two essential helper functions, aptly named b and c.

To adhere to the SHA-256 standard, the input data undergoes a comprehensive transformation. Initially, padding is introduced to ensure the data’s length aligns with the required multiple of 64 bytes (equivalent to 512 bits). This padding includes the addition of a single 1 bit at the beginning, followed by an appropriate number of 0 bytes to achieve the prescribed length. Furthermore, the length of the original data in bits, meticulously represented as an 8-byte big-endian integer, is appended to the data’s tail.

The data is then segmented into manageable 64-byte chunks, each constituting 512 bits. Within these chunks, the data is further divided into 16 words, each spanning 32 bits and represented as w. However, beyond the initial 16 words, the algorithm calculates additional words from words 16 to 63 through a sequence of bitwise operations, namely the s0 and s1 calculations. Initial hash values, stored in a list referred to as o, consist of 64 constant values, integral to the core hashing process and critical in the main loop that computes the hash.

Similarly, another list, denoted as s, holds the initial hash values comprising 8 entities, each encompassing 256 bits. These values, pivotal to the hash generation, are predetermined constants.

The main hashing computation unfolds in a loop spanning 64 iterations, mirroring the SHA-256 algorithm’s intricate design. In each iteration, an array of bitwise operations, including shifts, XORs, ANDs, and additions, are meticulously executed on the current hash state, symbolized as A, B, C, D, E, F, G, and H. These variables represent the evolving state of the hash as it progresses through the algorithm. The operations employed are notably complex and integral to faithfully transforming the hash state in accordance with the SHA-256 algorithm’s specifications.

Ultimately, after the completion of all 64 iterations, the culmination of these operations yields the final hash value. This hash value is derived by concatenating the updated state variables (A through H) as hexadecimal strings, resulting in a fixed-size representation that encapsulates a 256-bit digest of the original input data.

At present, the code is not intentionally obscured or made cryptic in any way. If you desire, you have the freedom to make any modifications or alterations as you see fit. It’s worth noting that you may not have an immediate need for this code, and in the event that you decide to employ it in the future, you might opt to tweak the algorithm to ensure that others cannot easily produce a matching hash.

In essence, you have the flexibility to customize and adapt the code to your specific requirements or security needs should the situation demand it. This adaptability allows you to enhance the security and uniqueness of the generated hashes, preventing others from easily replicating them.

EDIT: It took me like 10 minutes to write this and make it “suitable”. :sob:

2 Likes

wait i just realized does that mean that the code is an implementation of the hash function? if so, then i apologize for any misinterpretation and would also like to congratulate your acheivement.

2 Likes

Yeah, it can be described as a Python implementation inspired by a rather random discovery of a Cloudflare turnstile captcha JavaScript hash mechanism.

1 Like

Like, how long does it take to hash an arbitrary string compared to python’s default string hash?

3 Likes

The time it takes to hash an arbitrary string in Python compared to Python’s default string hash can vary depending on several factors, including the hashing algorithm used and the size of the string.

1 Like

If you can, can you include data of time comparisons for various algorithms, such as the ones in hashlib, for hashing randomly generated strings for a range of sizes?

3 Likes

I might be able to, may I inquire as to why you’re interested in this possibility?

1 Like

Pure python code is generally slow. The built-in hashes in python are often implemented in faster languages so they are much more performant and therefore more practical.

2 Likes

I can make some example code, that’s if you’d like.

import hashlib
import timeit
import random
import string

def generate_random_string(size):
    return ''.join(random.choice(string.ascii_letters) for _ in range(size))

hash_algorithms = ['md5', 'sha1', 'sha256', 'sha512']
string_sizes = [10, 100, 1000, 10000, 100000, 1000000]

for algorithm in hash_algorithms:
    for size in string_sizes:
        setup_code = f'''
import hashlib
import random
import string

def generate_random_string(size):
    return ''.join(random.choice(string.ascii_letters) for _ in range({size}))

text = generate_random_string({size})
hash_obj = hashlib.{algorithm}()
'''
        execution_time = timeit.timeit('hash_obj.update(text.encode())\nresult = hash_obj.hexdigest()', setup=setup_code, number=1000)
        print(f'Algorithm: {algorithm}, String Size: {size}, Average Time: {execution_time:.6f} seconds')

I’ve just made this code, which dynamically cycles through the MD5, SHA-1, SHA-256, and SHA-512 hashing algorithms. It then proceeds to generate random strings with varying sizes, specifically 10 characters, 1,000 characters, 10,000 characters, 100,000 characters, and 1,000,000 characters. Subsequently, it computes and displays the average execution time for each combination of algorithm and string size (feel free to change the length/size & algorithms).

1 Like

I noticed a potential performance issue in your code.

result = ''
for i in s:
    hex_str = hex(i)[2:]
    result += '0' * (8 - len(hex_str)) + hex_str

return result

It is bad practice in python to repeatedly concatenate strings because it may lead to quadratic time complexity.

A much better option is to use str.join

results = []
for i in s:
    hex_str = hex(i)[2:]
    results += ['0' * (8 - len(hex_str)), hex_str]

return ''.join(results)
1 Like

The join() method serves the purpose of combining elements from lists of strings, which is distinct from our current scenario. In this context, there is no need for such a concatenation operation because we are working with a singular string variable, namely result, which already encompasses the concatenated hash values.

EDIT: I could be wrong in some areas, don’t get mad at me.

1 Like

I realize that s is actually just a list of 8 (I was confused because of the bad variable name) so it isn’t much of a performance issue. However, it is best practice to use str.join here.
In your original code, you are concatenating a multiplied ‘0’ string and hex_str to result, 8 times. Since this has to create a new string object for each concatenation, it is slower than str.join which only has to create a single string object. Ultimately, my provided code gives the same result as your code but faster if I did not make any mistakes.

In addition, there are other small optimizations or code quality fixes that you can implement such as constants (global or class variable instead of local) for certain values and the renaming of some variables.

1 Like

Small fix: python unpacking for A through H variables:

A, B, C, D, E, F, G, H = s
2 Likes

Indeed, I created this without giving attention to optimization, although I acknowledge that, if desired, I could refine and optimize it as per your suggestions. My primary goal was to replicate something I had observed to the best of my ability within a limited time-frame.

1 Like

That is good. In my opinion, replicating an algorithm from another language should still implement best practices for the language used, which is still entirely possible with minimal time loss.

2 Likes