Need To Remove Duplicates From A Nested List Preserving The Order
Solution 1:
The unique_everseen function in the itertools recipes in the docs does exactly what you need:
>>> lst = [[1,2],[1,3],[1,4],[2,1],[2,5],[3,1],[3,2]]
>>> list(unique_everseen(lst, key=frozenset))
[[1, 2], [1, 3], [1, 4], [2, 5], [3, 2]]
The basic idea is that it keeps a set of all values seen so far, and skips any value that's already in the set.
The key function works the same way as in sort, max, etc., as explained in the Sorting HOWTO. You want two lists that have the same values in a different order to match, so we need to compare the set of each list's values instead of the lists themselves. (The reason we need frozenset instead of set is that set is mutable, and therefore can't be stored in a set.)
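To see that last point concretely, here's a minimal sketch (my own, not from the original answer) showing that a plain set can't be stored inside another set, while a frozenset can:

```python
# A plain set is mutable, so it is unhashable and can't go in a set.
seen = set()
try:
    seen.add({1, 2})
except TypeError:
    print("set is unhashable")

# A frozenset is immutable, hence hashable, so it works as a key.
seen.add(frozenset([1, 2]))
print(frozenset([2, 1]) in seen)  # True -- element order doesn't matter
```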
If you had more than 2 elements in your sublists, the question would be ambiguous. If you have, say, [1, 1, 2] and [1, 2, 2], do you want them to be considered duplicates, or not?
- If yes: Then you're treating them as a set, so use key=frozenset.
- If no: Then you're treating them as a multiset. The nicest Python implementation of multisets is collections.Counter, but there's no FrozenCounter (and building one just for this purpose is probably overkill). You can simulate one in a few ways:
  - key=lambda sublist: frozenset(Counter(sublist).items())
  - key=lambda sublist: tuple(sorted(Counter(sublist).items()))
  - key=lambda sublist: tuple(sorted(sublist))

(Note that the middle option needs the tuple() call: a bare sorted() returns a list, which isn't hashable and so can't be stored in the seen set.)

Since your initial thought was to sort the sublists (which was only unacceptable because you need to end up with the original value, not the sorted value), I think the last of these options is most likely to be the one you'd want, but that's still really just a guess.
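As a quick illustration of the difference (this snippet is mine, not from the original question), the set key and the multiset keys disagree on exactly the ambiguous case above:

```python
from collections import Counter

a, b = [1, 1, 2], [1, 2, 2]

# Set key: both collapse to frozenset({1, 2}), so they count as duplicates.
print(frozenset(a) == frozenset(b))  # True

# Multiset keys: the counts differ, so the two lists stay distinct.
print(tuple(sorted(a)) == tuple(sorted(b)))  # False
print(frozenset(Counter(a).items()) == frozenset(Counter(b).items()))  # False
```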
You can just copy and paste the function from the docs into your code:
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
… or install the third-party library more_itertools and use its unique_everseen from there. Or the different third-party library toolz has an equivalent function named unique.
Solution 2:
This works similarly to removing duplicates from a regular list while preserving order, but we need to do something about the sublists not being hashable and about the order within each sublist being irrelevant.
We can use frozen sets to solve both of these issues at once.
>>> lst = [[1,2],[1,3],[1,4],[2,1],[2,5],[3,1],[3,2]]
>>> seen = set()
>>> result = []
>>>
>>> for x in lst:
...     s = frozenset(x)
...     if s not in seen:
...         result.append(x)
...         seen.add(s)
...
>>> result
[[1, 2], [1, 3], [1, 4], [2, 5], [3, 2]]
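The same loop can be collapsed into a few lines using a dict keyed by frozenset (dicts preserve insertion order in Python 3.7+); setdefault keeps the first sublist seen for each key. This variant is my own addition, not part of the original answer:

```python
lst = [[1, 2], [1, 3], [1, 4], [2, 1], [2, 5], [3, 1], [3, 2]]

# Map each frozenset key to the first sublist that produced it.
first = {}
for x in lst:
    first.setdefault(frozenset(x), x)

result = list(first.values())
print(result)  # [[1, 2], [1, 3], [1, 4], [2, 5], [3, 2]]
```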