Remove duplicate values from a sequence while preserving the order of the remaining elements.

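Seeing this title, you might think of using set(). But a set does not keep the original order of the elements, so it cannot meet the goal of preserving order.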
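A small sketch of the problem, using sample data made up for illustration: the set removes the duplicates, but the order you get back when converting it to a list is not something you can rely on.

```python
# Sample data chosen for illustration only.
words = ["pear", "apple", "pear", "fig", "apple"]

unique = set(words)  # duplicates are gone, but a set has no defined order

# The set holds the right values...
assert unique == {"pear", "apple", "fig"}

# ...but converting it back to a list may give any ordering, and for
# strings the ordering can differ from one interpreter run to the next.
as_list = list(unique)
print(as_list)
```

The printed order is intentionally not shown here, because it is not guaranteed to match the order in words.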

If all the values in the sequence are hashable, the problem is easy to solve with a set together with a generator.

The Python glossary defines hashable as follows:

An object is hashable if it has a hash value which never changes
during its lifetime (it needs a __hash__() method), and can be
compared to other objects (it needs an __eq__() or __cmp__() method).
Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set
member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no
mutable containers (such as lists or dictionaries) are. Objects which
are instances of user-defined classes are hashable by default; they
all compare unequal, and their hash value is their id().
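To make the definition concrete, here is a minimal sketch that probes a few built-in values. The helper name is_hashable is hypothetical and only exists for this illustration; it simply checks whether the built-in hash() raises TypeError.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper for this demo)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2)))      # True: tuple of ints
print(is_hashable("text"))      # True: str
print(is_hashable([1, 2]))      # False: list is a mutable container
print(is_hashable({"x": 1}))    # False: dict is a mutable container
print(is_hashable((1, [2])))    # False: tuple that contains a list
```

The last case is worth noticing: a tuple is only hashable when everything inside it is hashable too.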

The implementation looks like this:

def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item  # yield turns this function into a generator
            seen.add(item)  # remember the item so later duplicates are skipped

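A function that contains yield becomes a generator function. In the right scenarios a generator can save a great deal of memory. The version without yield below simply returns the set it built, so it removes duplicates but does not guarantee the original order: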

def example(items):
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
    return seen

Test calls and their output:

>>> a = [1, 3, 4, 5, 4, 8, 9, 1]
>>> print(dedupe(a), '->', list(dedupe(a)))
>>> print(example(a))

OUTPUT
<generator object dedupe at 0x10d734db0> -> [1, 3, 4, 5, 8, 9]
{1, 3, 4, 5, 8, 9}

For more detailed usage of yield, see this link.
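As a rough illustration of why the generator form matters, the sketch below feeds dedupe an endless input. The stream built from itertools.count is an assumption made up for this demo; a version that builds the whole result before returning would never finish on it.

```python
from itertools import count, islice

def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

# count() never ends, so this stream is infinite: 0, 1, 2, 3, 4, 0, 1, ...
stream = (n % 5 for n in count())

# islice pulls only three values; dedupe does no work beyond that point.
first_three = list(islice(dedupe(stream), 3))
print(first_three)  # [0, 1, 2]
```

Note that the seen set still grows with the number of distinct values, so the saving is in not materializing the output, not in the bookkeeping.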

If the elements you want to deduplicate are not hashable, change the code as follows to support them.

def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)  # key is a function, so call it
        if val not in seen:
            yield item
            seen.add(val)

Test calls and their output:

>>> a = [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
>>> list(dedupe(a, key=lambda d: (d['x'], d['y'])))
>>> list(dedupe(a, key=lambda d: d['x']))

OUTPUT
[{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
[{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
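The same idea works for other unhashable items as long as the key function maps each item to something hashable. The sketch below is only an illustration with made-up data: it uses the built-in tuple as the key for a list of lists, and frozenset(d.items()) as one possible key for whole dictionaries (which assumes the dictionary values are themselves hashable).

```python
def dedupe(items, key=None):
    seen = set()
    for item in items:
        # key converts an unhashable item into a hashable stand-in
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

# Lists are unhashable, but a tuple of their contents is hashable.
rows = [[1, 2], [3, 4], [1, 2], [5, 6]]
unique_rows = list(dedupe(rows, key=tuple))
print(unique_rows)  # [[1, 2], [3, 4], [5, 6]]

# Compare whole dicts regardless of the order their keys were written in.
records = [{'x': 1, 'y': 2}, {'y': 2, 'x': 1}, {'x': 2, 'y': 4}]
unique_records = list(dedupe(records, key=lambda d: frozenset(d.items())))
print(unique_records)  # [{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
```

In both cases the original items are yielded unchanged; only the value stored in seen is converted.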

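Using a generator function makes our function more general, so it is not limited to processing lists. For example, if you want to read a file and eliminate duplicate lines, you can easily do it like this: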

with open(somefile, 'r') as f:
    for line in dedupe(f):
        ...
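A runnable sketch of the file case follows. The temporary file and its contents are assumptions made up for the demo so that the example is self-contained; in practice somefile would be the path to your own file.

```python
import os
import tempfile

def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

# Create a throwaway file with some repeated lines (demo data only).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("alpha\nbeta\nalpha\ngamma\nbeta\n")
    somefile = tmp.name

try:
    with open(somefile, 'r') as f:
        # A file object yields one line at a time, so dedupe works on it directly.
        unique_lines = [line.rstrip('\n') for line in dedupe(f)]
finally:
    os.remove(somefile)

print(unique_lines)  # ['alpha', 'beta', 'gamma']
```

Lines are compared including their trailing newline, so a final line without a newline would be treated as different from an otherwise identical earlier line.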

Article URL: https://blog.csdn.net/WSH_ONLY/article/details/110250631