How Could I Speed Up My Program?

Question:
I’m working on face recognition software in Python, but scanning a 3-second video currently takes 2 minutes, and I’d like to speed that up considerably. If you ran this code as is, it would not work, because there is no data for the program to compare faces against. Any ideas on how I could speed this up?
Anyway, here is the code:

import os

import cv2
import face_recognition
import numpy as np

def ai_face_recog_from_video(video_path):
    people = []
    # Capture video
    video_capture = cv2.VideoCapture(video_path)
    if not video_capture.isOpened():
        raise FileNotFoundError(f'Video not found: {video_path}')

    # Initialize variables
    recog_model = {}
    frame_count = 0

    # Load recognition model only once
    for subfolder in os.listdir('recog_model'):
        subfolder_path = os.path.join('recog_model', subfolder)
        if os.path.isdir(subfolder_path):
            recog_model[subfolder] = [
                face_recognition.face_encodings(face_recognition.load_image_file(os.path.join(subfolder_path, item)))[0]
                for item in os.listdir(subfolder_path)
                if os.path.isfile(os.path.join(subfolder_path, item))
                and face_recognition.face_encodings(face_recognition.load_image_file(os.path.join(subfolder_path, item)))
            ]

    while True:
        # Grab a single frame of video
        ret, frame = video_capture.read()
        if not ret:
            break  # End of video

        # Detect faces and generate encodings
        face_locations = face_recognition.face_locations(frame)
        face_encodings = face_recognition.face_encodings(frame, face_locations)

        # Process detected faces and save them
        for face_encoding in face_encodings:
            matches = {
                label: np.mean(face_recognition.face_distance(face_images_encodings, face_encoding))
                for label, face_images_encodings in recog_model.items()
                if face_images_encodings
            }
            if matches:
                best_match = min(matches, key=matches.get)
                if matches[best_match] < 0.5:  # Adjustable threshold for matching similarity
                    if best_match not in people:
                        people.append(best_match)

        frame_count += 1

    video_capture.release()
    print(f'Processed {frame_count} frames and {len(people)} face(s).')
    return people

faces = ai_face_recog_from_video('test1.mp4')
print(f"People in video: {', '.join(faces)}")

Can you use time.perf_counter() to show how long each bit of the program takes? Then it’d be easier to optimise.
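
For anyone wanting to try this, here is a minimal sketch of timing sections with time.perf_counter(). The names timed and timings are made up for illustration, and the sum(...) calls are only stand-ins for the real model loading and frame scanning.

```python
import time
from contextlib import contextmanager

timings = {}  # hypothetical name: label -> accumulated seconds


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the with-block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)


# Stand-in workloads; in the real program these blocks would wrap the
# model loading and the per-frame face detection.
with timed("load model"):
    sum(range(100_000))

for _ in range(3):
    with timed("scan frame"):
        sum(range(10_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} seconds")
```

Because the times accumulate per label, wrapping a block inside the frame loop gives the total for that step across all frames.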

Also, you don’t need to check every frame, because faces tend to appear for more than one frame. I think concurrency will also speed this up.

@Classfied3D Got it.
Here are the two main chunks of my function and how long they took:

Getting the data from the model: 14.184130800000275 seconds
Scanning each frame for faces: 182.9708638000002 seconds


I’ve thought about that, but what should I do? Get every other frame? I’ve also thought about using threading to get multiple frames at a time, but I’m not sure if (and how) that would work.
It would also be great if I could find a way to speed up getting the model data (14 seconds is still a lot!), but I’m really not sure about that one.
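
One possible angle on the model-loading time, as a sketch rather than a tested fix: the list comprehension in the question calls face_recognition.face_encodings twice for every image (once in the filter, once for the value), and it recomputes everything on every run. The helper names below (first_encodings, load_or_build) and the cache file name are assumptions for illustration. The encode argument stands for whatever function turns an image path into a list of encodings.

```python
import os
import pickle


def first_encodings(paths, encode):
    """Call encode(path) exactly once per image and keep the first encoding, if any."""
    result = []
    for path in paths:
        found = encode(path)  # a list of encodings, possibly empty
        if found:
            result.append(found[0])
    return result


def load_or_build(cache_path, build):
    """Return the pickled object at cache_path, or call build() and save its result."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = build()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data


# Usage idea (assumes face_recognition is installed; "recog_model.pkl" is a made-up name):
#   encode = lambda p: face_recognition.face_encodings(face_recognition.load_image_file(p))
#   recog_model = load_or_build("recog_model.pkl", lambda: build_model_with(encode))
```

The cache has to be deleted whenever the images in recog_model change, otherwise the old encodings keep being used.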

Would video_capture.read(frame) work?

@Classfied3D Yes, I know that, but I’m wondering if it would really help that much. Theoretically, yes, it should cut the time in half, but will that really happen?


Honestly, not sure. I’ll do some research. But, when would I use that, and for what?

Is it reading a frame or scanning the frame that takes longer?

I’ll check. Give me a second.

Whoops, accidentally marked it as solution while trying to reply :man_facepalming:


@Classfied3D Here’s the data:
Generating face encodings: 1.6843520999996144 seconds
Processing detected face: 0.00035419999994701357 seconds
It can vary, but by such a minuscule amount that it doesn’t even matter lol

I found this code in an answer to get a specific frame, so I guess you would increment frame_number by 5 and then assign threads to them.

video_capture.set(cv2.CAP_PROP_POS_FRAMES, frame_number-1)
res, frame = video_capture.read()

So, are you suggesting adding video_capture.set(cv2.CAP_PROP_POS_FRAMES, frame_number-1) right before res, frame = video_capture.read()?

Yes, with a frame number variable that increments by something like 5, but if you need more performance, you should have it as a function, and assign threads to it.
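
A rough sketch of the skipping idea, under some assumptions: sample_frames is a made-up helper, and instead of seeking with CAP_PROP_POS_FRAMES it uses grab() to step past the unwanted frames, which is an alternative to the snippet above and not the same technique. It works with any object whose read() and grab() behave like cv2.VideoCapture's.

```python
def sample_frames(capture, step):
    """Yield (frame_index, frame) for every step-th frame of capture.

    capture needs read() -> (ok, frame) and grab() -> ok,
    as cv2.VideoCapture provides.
    """
    index = 0
    while True:
        ok, frame = capture.read()  # decode the frame we want to scan
        if not ok:
            return
        yield index, frame
        for _ in range(step - 1):  # advance past the frames we skip
            if not capture.grab():
                return
        index += step


# Usage idea (assumes cv2 is installed and video_path exists):
#   capture = cv2.VideoCapture(video_path)
#   scanned = 0
#   for index, frame in sample_frames(capture, 5):
#       scanned += 1  # count scanned frames separately from the frame index
```

Keeping the scanned count separate from the frame index means the final report shows how many frames were actually scanned.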

@Classfied3D Understood. Logically, I’d set up a threading process that runs a function to get a frame and scan it for faces, and run that function on several different frames at once. Any feedback on this?


For some reason, after changing my function to this:

def ai_face_recog_from_video(video_path):
    people = []
    # Capture video
    video_capture = cv2.VideoCapture(video_path)
    if not video_capture.isOpened():
        raise FileNotFoundError(f'Video not found: {video_path}')

    # Initialize variables
    recog_model = {}
    frame_count = 0

    # Load recognition model only once
    for subfolder in os.listdir('recog_model'):
        subfolder_path = os.path.join('recog_model', subfolder)
        if os.path.isdir(subfolder_path):
            recog_model[subfolder] = [
                face_recognition.face_encodings(face_recognition.load_image_file(os.path.join(subfolder_path, item)))[0]
                for item in os.listdir(subfolder_path)
                if os.path.isfile(os.path.join(subfolder_path, item))
                and face_recognition.face_encodings(face_recognition.load_image_file(os.path.join(subfolder_path, item)))
            ]
    while True:
        # Grab a single frame of video
        video_capture.set(cv2.CAP_PROP_POS_FRAMES, frame_count-1)
        ret, frame = video_capture.read()
        if not ret:
            break  # End of video

        # Detect faces and generate encodings
        face_locations = face_recognition.face_locations(frame)
        face_encodings = face_recognition.face_encodings(frame, face_locations)

        # Process detected faces and save them
        for face_encoding in face_encodings:
            matches = {
                label: np.mean(face_recognition.face_distance(face_images_encodings, face_encoding))
                for label, face_images_encodings in recog_model.items()
                if face_images_encodings
            }
            if matches:
                best_match = min(matches, key=matches.get)
                if matches[best_match] < 0.5:  # Adjustable threshold for matching similarity
                    if best_match not in people:
                        people.append(best_match)
        frame_count += 5

    video_capture.release()
    print(f'Processed {frame_count} frames and {len(people)} face(s).')
    return people

It processes 100 frames, while it used to be processing 96 frames. Why do you think it is getting longer instead of shorter? I’m honestly stumped.

Make a list of threads, then run some code like this to start them:

for t in threads:
    t.start()

for t in threads:
    t.join()

Make all the threads put their frames in a list (include the frame number if you want them to be ordered)
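
Put together, the start/join pattern with a shared result list might look like the sketch below. run_in_threads and scan are hypothetical names, and scan stands for whatever function finds the faces in one frame. Whether this is actually faster depends on how much of the scanning can run in parallel across threads, so it is worth timing.

```python
import threading


def run_in_threads(frames, scan, num_threads=4):
    """Scan a list of (frame_number, frame) pairs on several threads.

    Returns (frame_number, result) pairs sorted by frame number.
    """
    results = []
    lock = threading.Lock()

    def worker(chunk):
        for number, frame in chunk:
            found = scan(frame)
            with lock:  # only one thread appends at a time
                results.append((number, found))

    # Thread i takes frames i, i + num_threads, i + 2 * num_threads, ...
    chunks = [frames[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]

    for t in threads:
        t.start()
    for t in threads:
        t.join()

    return sorted(results, key=lambda pair: pair[0])
```

Storing the frame number with each result is what lets the output be put back in order after the threads finish in whatever order they happen to.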

Use a different variable from frame_count to index the frames.


It says it’s processing 100 frames, but it’s actually processing 20. That’s because that variable is also being used for the output.

So, (for example) I start 5 different threads, each with its own list of the frames it should scan. They add the faces they find to a shared list, and after all of the threads are done, the function returns all of the people. Was that what you were thinking (or something of the sort), or do you have a more efficient proposal?


After skipping frames, it now takes 38.67741719999958 seconds to complete, which is 4.73068982 times faster. Yay! Now using threads should make it even faster. :smile:

Yep, that was what I had in mind. You can also increase how many frames are skipped to improve performance, but that comes at the cost of accuracy (it’s a good trade-off though).

@Classfied3D Great! I gtg, but once I am back, I’ll try to set up a threading process to increase performance even more!


Yeah, I know.

Threading generally does not make a CPU-bound Python program faster, because CPython’s global interpreter lock lets only one thread run Python code at a time, so it will still effectively use one processor. (Have you read the docs for the module?)
For people who only have the default free resources (half a core), there is no way to use multiprocessing or threading to speed up your program.

If you have multiple cores, use something like the multiprocessing module.
See Concurrent Execution — Python 3.12.1 documentation
But according to Google searches, the face_recognition library should have built-in multiprocessing that you could use.

Perhaps look at these?

@RedCoder is on core, and more processing power is given (multiple vCPUs) when actively in the repl on core. (What a well named plan :laughing:)

But yes, I should have referred to processes rather than threads (because that’s what they’re called in python).
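
With processes instead of threads, a sketch could look like this. scan_chunk is only a placeholder that invents names from frame numbers. In the real program each worker would presumably open the video itself, seek to its frame numbers, and return the names it recognised. scan_with_processes and the executor_cls parameter are assumptions for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def scan_chunk(frame_numbers):
    """Placeholder worker: returns made-up names instead of scanning real frames."""
    return {f"person_{n % 3}" for n in frame_numbers}


def scan_with_processes(frame_numbers, workers=4, executor_cls=ProcessPoolExecutor):
    """Split the frame numbers across workers and merge the names each one finds."""
    # Worker i gets frame numbers i, i + workers, i + 2 * workers, ...
    chunks = [frame_numbers[i::workers] for i in range(workers)]
    people = set()
    with executor_cls(max_workers=workers) as pool:
        for names in pool.map(scan_chunk, chunks):
            people |= names
    return sorted(people)


# Usage idea: call it under an `if __name__ == "__main__":` guard, e.g.
#   if __name__ == "__main__":
#       print(scan_with_processes(list(range(0, 100, 5))))
```

The guard matters on platforms where worker processes are started by re-importing the main script, such as Windows. The worker also has to be a top-level function so it can be sent to the other processes.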

Interesting! I’ll look into it! By the way, I’m running this on my computer (on VS Code).