Check if a list has duplicate elements

Rocko · January 28, 2023, 11:16am

Suppose I have a list:
num = [1,2,2,2,4,5,66,66,8]
How do I make Python print True if there are duplicates in the list, and False if not?
I know that set removes the duplicates and leaves just the distinct numbers, like:
non_doublicated_num=list(set(num))
Here I’m creating a list of the non-duplicated numbers, but that’s not what I want. I want to know whether my original list, num, has duplicates, and have Python print True if it does.
What I was thinking is to compare the original list (num) with the de-duplicated list (non_doublicated_num).
But how do I compare them? Both contain the same numbers (1, 2, etc.), so a check like that would always come out True. So my question stands: how do I compare the two lists, or should I do something else entirely to get True when num has duplicates? Thanks.

MattDESTROYER · January 28, 2023, 11:20am

To see if the list has duplicates (building off what you’ve already found), you could compare the length of the original list, and the list created once you remove duplicates. If the list without any duplicates is shorter than the original, the original obviously had one or more duplicates.

Something like this:

# you also don't need to convert back to a list
# if you're just checking for duplicates
print(len(num) != len(set(num))) # True if duplicates, False if none
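
Wrapped in a function, the same idea might look like the sketch below. The name has_duplicates is made up for illustration, and it assumes every element is hashable (numbers, strings, tuples), since that is what set needs.

```python
def has_duplicates(items):
    """Return True if any value appears more than once in items."""
    # A set keeps one copy of each value, so it ends up shorter than
    # the list exactly when something was repeated.
    return len(set(items)) != len(items)


num = [1, 2, 2, 2, 4, 5, 66, 66, 8]
print(has_duplicates(num))        # True
print(has_duplicates([1, 2, 3]))  # False
```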

Rocko · January 28, 2023, 11:28am

Nice, so we’re checking the size instead of comparing the two lists. I’m still curious though: can we compare the two lists directly and tell whether there are duplicates, based on the de-duplicated list and the original list, without using their sizes? Is that possible?

UMARismyname · January 28, 2023, 11:29am

If you already have two lists, you could do

print(any(x in lst1 for x in lst2))

MattDESTROYER · January 28, 2023, 11:32am

If I understand correctly, without creating a new list, this is probably hard.

If you can create a new list, another method would be to append items one by one to it and check each element before appending it to see if it is already in the new list.

def duplicates(lst):
	checked = []
	for item in lst:
		if item in checked:
			return True
		checked.append(item)
	return False

Rocko · January 28, 2023, 11:33am

Nope, no creating a new list and no check-then-append, just the de-duplicated list and the original one.

UMARismyname · January 28, 2023, 11:34am

only thing you could change here for some efficiency is to make checked a set() and change checked.append to checked.add - sets can be looked up more efficiently because you’re checking against a hash table instead of iterating through a list

seen = set()
for i in nums:
  if i in seen:
    print("Duplicate")
    break
  seen.add(i)
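
If you want that as a reusable function that returns True or False instead of printing, a minimal sketch could look like this. The function name is hypothetical, and like any set-based check it assumes the elements are hashable.

```python
def has_duplicates_early_exit(nums):
    """Return True as soon as a repeated value is found, else False."""
    seen = set()
    for value in nums:
        if value in seen:   # hash lookup, no scan through earlier items
            return True     # stop at the first repeat
        seen.add(value)
    return False


num = [1, 2, 2, 2, 4, 5, 66, 66, 8]
print(has_duplicates_early_exit(num))        # True
print(has_duplicates_early_exit([1, 2, 3]))  # False
```

Because it only loops once and never calls len, this version should also work on generators and other iterables, not just lists.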

MattDESTROYER · January 28, 2023, 11:34am

If you can’t create a new list, you could use slices of the original list. (I’m not sure if I remember the Python slice indexing thing properly. This is also essentially the same as using the checked list, except that it’s taking a slice each time which is almost definitely not optimal.)

def duplicates(lst):
	for i in range(len(lst)):
		# lst[:i] is everything before index i (the end index is exclusive)
		if lst[i] in lst[:i]:
			return True
	return False
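
Another way to avoid a second list in your own code is to sort the list in place and then compare each element with its neighbour. This is only a sketch under two assumptions that nobody in the thread stated: that it is OK to reorder the original list, and that the elements can be compared with each other. The function name is made up.

```python
def has_duplicates_in_place(lst):
    """Return True if lst has a repeated value. Reorders lst as a side effect."""
    lst.sort()  # sorts the caller's list itself, no copy is made here
    for i in range(1, len(lst)):
        # After sorting, equal values always sit next to each other.
        if lst[i] == lst[i - 1]:
            return True
    return False


num = [1, 2, 2, 2, 4, 5, 66, 66, 8]
print(has_duplicates_in_place(num))  # True
```

The trade-off is that the original order is lost, so this only makes sense when the order does not matter.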

UMARismyname · January 28, 2023, 11:37am

why? It’s the most efficient way to do it and can be implemented easily in other programming languages

Rocko · January 28, 2023, 11:39am

cuz I’m curious & want to know what happens if I don’t get to use a second list as simple as that lol

MattDESTROYER · January 28, 2023, 11:41am

Fair enough lol, but in practice, probably the best method is the first answer I gave you, at least for Python. Technically it might be slower than my second (because the first goes through each element in the list every time, whereas the second can stop before the end of the list, meaning it’s possible to have fewer iterations), but because set is a native function/class/type, I would presume there is some kind of speed benefit so I don’t really know for sure their performance difference (I am a bit intrigued now…). The last answer… yeah, don’t do that, it’s probably slower, it’s almost definitely harder to read, there are plenty of better ways.
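
One way to settle the performance question is to measure it with the standard timeit module. The sketch below compares the length check with the list-based loop on two made-up inputs; the function names, list sizes, and repeat counts are arbitrary choices for illustration, and the numbers will vary from machine to machine.

```python
import timeit


def by_length(lst):
    # First answer: build a set of everything, then compare sizes.
    return len(lst) != len(set(lst))


def by_list_scan(lst):
    # Second answer: append to a list and check membership each time.
    checked = []
    for item in lst:
        if item in checked:
            return True
        checked.append(item)
    return False


# Made-up inputs: one with no repeats (the loop must run to the end)
# and one where the repeat is right at the start (the loop exits early).
no_dupes = list(range(1000))
early_dupe = [0, 0] + list(range(1, 1000))

for label, data in [("no duplicates", no_dupes), ("early duplicate", early_dupe)]:
    for func in (by_length, by_list_scan):
        seconds = timeit.timeit(lambda: func(data), number=10)
        print(f"{label:16} {func.__name__:13} {seconds:.5f}s")
```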

bigminiboss · January 28, 2023, 3:22pm

it’s not because of native, it’s because of how it’s stored. Set is auto sorted so all operations are log(n), vs n.

this is the second fastest, there is, however, a faster method using a dictionary, probably, since dictionaries have an O(1) lookup

def duplicate(l1, l2):
    return len(set(l1).intersection(set(l2))) > 0

UMARismyname · January 28, 2023, 5:41pm

Well, it’s more efficient to use a set instead of list in the first place. Converting list to a set iterates through the list anyhow, so it’s not really practical. And then you’re iterating through both the sets even after the first duplicate is found.
And sets use hash tables just like dictionaries, so they have an O(1) lookup.
