Remove duplicate values from a sequence while preserving the order of its elements.
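Seeing this title, your first thought may be set(), but set() does not preserve the original order of the elements, so it does not meet the goal. If the values in the sequence are all hashable, the problem is easy to solve with a set combined with a generator.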
The Python glossary defines "hashable" as follows:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python's immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal, and their hash value is their id().
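To make the definition concrete, here is a small check of which objects are hashable. The helper is_hashable and the class Point are hypothetical names introduced only for this illustration; is_hashable simply calls the built-in hash() and reports whether it raised a TypeError.

```python
def is_hashable(obj):
    """Hypothetical helper: True if hash(obj) succeeds, False on TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class Point:
    """Hypothetical user-defined class with no __eq__/__hash__ overrides."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


print(is_hashable(42))        # immutable built-in -> True
print(is_hashable((1, 2)))    # tuple of hashable values -> True
print(is_hashable([1, 2]))    # list is a mutable container -> False
print(is_hashable({'x': 1}))  # dict is a mutable container -> False

p, q = Point(1, 2), Point(1, 2)
print(is_hashable(p))         # user-defined instances are hashable by default -> True
print(p == q)                 # distinct instances compare unequal -> False
```

This is why lists and dicts cannot be put into the seen set used below, while numbers, strings and tuples can.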
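It can be written as follows:

```python
def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item      # yield turns this function into a generator
            seen.add(item)  # remember the value so later duplicates are skipped
```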
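A function that contains yield becomes a generator function: calling it returns a generator that produces values one at a time instead of building the whole result in memory. In the right situations this can save a great deal of memory. Without yield, the same logic looks like this: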
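```python
def example(items):
    seen = set()
    result = []  # a set alone does not keep the order, so collect results in a list
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```

Calling the functions and their output:

```python
>>> a = [1, 3, 4, 5, 4, 8, 9, 1]
>>> print(dedupe(a), '->', list(dedupe(a)))
>>> print(example(a))
```

OUTPUT

```
<generator object dedupe at 0x10d734db0> -> [1, 3, 4, 5, 8, 9]
[1, 3, 4, 5, 8, 9]
```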
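The memory advantage comes from laziness: the generator hands back each new value as soon as it finds it, so it can even consume an endless input. Below is a minimal sketch, assuming a made-up infinite stream built with itertools.count (the stream is hypothetical and chosen only to show the point). A list-building version such as example() would never return on this input.

```python
from itertools import count, islice


def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)


# Hypothetical infinite stream: 0, 0, 1, 1, 2, 2, 3, 3, ...
stream = (n // 2 for n in count())

# islice asks for only the first 5 unique values; the rest of the stream is never read.
first_five = list(islice(dedupe(stream), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```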
For a more detailed explanation of yield, see this link.
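If the elements you want to deduplicate are not hashable (dicts, for example), change the code as follows. An optional key function converts each element into a hashable value, and that value is what the duplicate check uses: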
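```python
def dedupe(items, key=None):
    seen = set()
    for item in items:
        # key is a function that maps an element to a hashable value
        val = item if key is None else key(item)
        if val not in seen:
            yield item     # yield the original element, not its key
            seen.add(val)  # remember the key so later duplicates are skipped
```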
Calling it and its output:

```python
>>> a = [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
>>> list(dedupe(a, key=lambda d: (d['x'], d['y'])))
>>> list(dedupe(a, key=lambda d: d['x']))
```

OUTPUT

```
[{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
[{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
```
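The key parameter also covers other unhashable or "almost equal" elements. Two more sketches, both using only built-ins as the key: tuple turns each (unhashable) list into a hashable tuple, and str.lower makes the comparison case-insensitive. The sample data is made up for illustration.

```python
def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)


# Lists are unhashable; tuple() maps each one to a hashable tuple for the check.
pairs = [[1, 2], [3, 4], [1, 2], [5, 6]]
print(list(dedupe(pairs, key=tuple)))  # [[1, 2], [3, 4], [5, 6]]

# Case-insensitive deduplication: the first spelling seen is the one kept.
words = ['Apple', 'banana', 'apple', 'BANANA', 'cherry']
print(list(dedupe(words, key=str.lower)))  # ['Apple', 'banana', 'cherry']
```

Because the function yields the original item rather than its key, the output keeps the elements exactly as they appeared in the input.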
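Writing dedupe as a generator function makes it more general: it is not limited to lists and works with any iterable. For example, to read a file while skipping duplicate lines, you can simply write: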
```python
with open(somefile, 'r') as f:
    for line in dedupe(f):
        ...
```
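To try the file example without creating a real file, io.StringIO can stand in for the file object, since iterating over it yields lines just like a file opened in text mode. The in-memory text below is made up for illustration. Each line keeps its trailing newline, so a last line without "\n" counts as different from the same text earlier in the file; passing str.rstrip as the key is one way to ignore that.

```python
import io


def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)


# In-memory stand-in for a text file (hypothetical sample content).
text = "alpha\nbeta\nalpha\ngamma\nbeta"

with io.StringIO(text) as f:
    unique_lines = list(dedupe(f))
print(unique_lines)  # ['alpha\n', 'beta\n', 'gamma\n', 'beta']

# The final 'beta' has no trailing newline, so it was treated as a new line.
# Using str.rstrip as the key ignores trailing whitespace when comparing.
with io.StringIO(text) as f:
    unique_stripped = list(dedupe(f, key=str.rstrip))
print(unique_stripped)  # ['alpha\n', 'beta\n', 'gamma\n']
```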
Original article: https://blog.csdn.net/WSH_ONLY/article/details/110250631