
Python iterators and generators

Iterable && Iterator

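  • Iterable: an object that defines __iter__ (it does not need __next__), such as strings, lists, tuples, dicts and files. iter(iterable) returns an iterator for it; list(iterable) and for xxx in iterable call __iter__ indirectly.

  • Iterator: an object that follows the Iteration Protocol by defining both __iter__ and __next__. __iter__ returns the iterator itself (so an iterator can also be used in a for loop), and __next__ returns the next element, raising StopIteration when no elements are left. In Python 2 this method is called next; in Python 3 it is __next__.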

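    l = [1, 2, 3]
    iterator = iter(l)        # built-in iter() calls l.__iter__()
    iterator2 = l.__iter__()  # the same call made explicitly

    list(l)                   # list() calls __iter__ internally
    for x in l:               # so does a for loop
        print(x)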
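Roughly speaking, a for loop drives the protocol above by hand: it calls iter() once, then next() until StopIteration. The sketch below is a simplified model of that behaviour (manual_for is a hypothetical helper name; the real loop is implemented inside the interpreter):

```python
def manual_for(iterable, body):
    """Simplified model of `for item in iterable: body(item)`."""
    iterator = iter(iterable)       # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)   # calls iterator.__next__()
        except StopIteration:       # the protocol's "no more items" signal
            break
        body(item)


collected = []
manual_for([1, 2, 3], collected.append)
print(collected)  # [1, 2, 3]
```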
    

yrange

Example 1: the iterable and the iterator are the same object.

y = yrange(3)
list(y)        # first pass
list(y)        # second pass
for i in y:    # any later pass
    print(i)

Only the first pass produces all the values; every later pass is empty.

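class yrange:
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        print("__iter__1")
        return self

    def __next__(self):  # Python 3 name of the protocol method
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()

    next = __next__  # Python 2 name, so the class works in both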

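When you call list(iterable), list calls the iterable's __iter__ method to get an iterator. Here __iter__ returns self, so the iterable and the iterator are the same object.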

output

y = yrange(3)
next(y)
0
next(y)
1
next(y)
2
next(y)
StopIteration

list(yrange(3))
__iter__1
[0, 1, 2]

sum(yrange(3))
__iter__1
3


y = yrange(3)

list(y)
__iter__1
[0, 1, 2]

list(y)
__iter__1
[]

y = yrange(3)

list(y.__iter__())
__iter__1
__iter__1
[0, 1, 2]

list(y.__iter__())
__iter__1
__iter__1
[]
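Why is the second list(y) empty? iter(y) hands back y itself, and the position counter self.i lives on y and is never reset. A Python 3 sketch of the same class (the __iter__ print is dropped to keep the output short) makes this visible:

```python
class yrange:
    """Same design as above: the object is its own iterator."""
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self                # no fresh iterator, just itself

    def __next__(self):
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        raise StopIteration()


y = yrange(3)
print(iter(y) is y)   # True: iterable and iterator are one object
print(list(y))        # [0, 1, 2]
print(y.i)            # 3: the position is stored on y itself
print(list(y))        # []: nothing resets y.i, so y stays exhausted
```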

zrange

Example 2: the iterable and the iterator are different objects.

z = zrange(3)
list(z)        # first pass
list(z)        # second pass
for i in z:    # any later pass
    print(i)

However many times you iterate, every pass produces all the values.

class zrange:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        print("__iter__1")
        return zrange_iter(self.n)  # a new iterator on every call


class zrange_iter:
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        print("__iter__2")
        # Iterators are iterables too.
        # Adding this function makes them so.
        return self

    def __next__(self):  # Python 3 name
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()

    next = __next__  # Python 2 name

output

z = zrange(3)
list(z)
__iter__1
[0, 1, 2]

list(z)
__iter__1
[0, 1, 2]

z = zrange(3)
list(z.__iter__())
__iter__1
__iter__2
[0, 1, 2]

list(z.__iter__())
__iter__1
__iter__2
[0, 1, 2]
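A practical consequence of the difference shows up with nested loops over the same object. The sketch below uses compact Python 3 versions of both designs (prints removed; zrange_iter simply reuses yrange's counting logic, since the two classes have identical bodies above, and pairs is a hypothetical helper). With yrange the inner loop drains the shared iterator, so the outer loop stops after one item; zrange gives every loop its own iterator.

```python
class yrange:
    """Single-pass: the object is its own iterator."""
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.i < self.n:
            self.i += 1
            return self.i - 1
        raise StopIteration()


class zrange_iter(yrange):
    """Same counting logic; only ever handed out by zrange.__iter__."""


class zrange:
    """Re-iterable: every __iter__ call builds a fresh zrange_iter."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return zrange_iter(self.n)


def pairs(iterable):
    """Collect every (a, b) produced by two nested loops over one object."""
    return [(a, b) for a in iterable for b in iterable]


print(pairs(yrange(2)))  # [(0, 1)]
print(pairs(zrange(2)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```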

Generator

Generator functions are ordinary functions that use yield instead of return. Calling a generator function returns a generator object, which is a kind of iterator: it implements next() in Python 2 and __next__() in Python 3. Each call to the built-in next() on it returns the next value yielded by the generator function.

Below, “generator” means the generated object and “generator function” means the function that creates it.

A generator is also an iterator. Generator functions simplify creating iterators: a yield statement replaces a hand-written __iter__ and next/__next__ pair.
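One common way to combine this with the zrange idea, sketched here as an alternative rather than something from the examples above: make __iter__ itself a generator function. Every iter() call then returns a fresh generator, so the class is re-iterable like zrange without a separate iterator class (the name grange is made up for this sketch).

```python
class grange:
    """Re-iterable like zrange, but __iter__ is a generator function."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Calling a generator function returns a new generator object,
        # so every loop gets its own independent counter `i`.
        i = 0
        while i < self.n:
            yield i
            i += 1


g = grange(3)
print(list(g))        # [0, 1, 2]
print(list(g))        # [0, 1, 2] again: each list() got a new generator
print(iter(g) is g)   # False: the iterator is a separate generator object
```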

def yrange(n):
    i = 0
    while i < n:
        yield i
        i += 1

output

>>> y = yrange(3)
>>> y
<generator object yrange at 0x401f30>
>>> next(y)
0
>>> next(y)
1
>>> next(y)
2
>>> next(y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
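Because a generator is an iterator whose __iter__ returns itself, it behaves like the single-pass yrange class from example 1, not like the re-iterable zrange. A quick Python 3 check:

```python
def yrange(n):
    i = 0
    while i < n:
        yield i
        i += 1


y = yrange(3)
print(iter(y) is y)             # True: a generator is its own iterator
print(hasattr(y, "__next__"))   # True: it implements the iterator protocol
print(list(y))                  # [0, 1, 2]
print(list(y))                  # []: already exhausted, just like example 1
```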

How it works

When a generator function is called, it returns a generator object without executing any of the function body.
When next() is called for the first time, the function runs until it reaches a yield statement, and the yielded value becomes the return value of that next() call. Each later next() resumes right after the previous yield; once the body finishes without reaching another yield, StopIteration is raised.

>>> def foo():
...     print("begin")
...     for i in range(3):
...         print("before yield", i)
...         yield i
...         print("after yield", i)
...     print("end")
...
>>> f = foo()  # runs nothing yet; just returns a generator object
>>> next(f)    # runs until the first yield and returns its value
begin
before yield 0
0
>>> next(f)    # resumes after the previous yield and runs until the next yield
after yield 0
before yield 1
1
>>> next(f)    # resumes after the previous yield and runs until the next yield
after yield 1
before yield 2
2
>>> next(f)    # resumes after the last yield; no yield is reached again, so StopIteration is raised
after yield 2
end
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
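The lifecycle in the transcript (created but not started, paused at a yield, finished) can also be observed with the standard library's inspect.getgeneratorstate (available since Python 3.2). A small sketch with a shortened foo:

```python
import inspect


def foo():
    for i in range(2):
        yield i


f = foo()
print(inspect.getgeneratorstate(f))   # GEN_CREATED: body has not started
next(f)
print(inspect.getgeneratorstate(f))   # GEN_SUSPENDED: paused at a yield
list(f)                               # drain the rest; list() absorbs StopIteration
print(inspect.getgeneratorstate(f))   # GEN_CLOSED: the function body finished
```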

Here is another example:

def integers():
    """Infinite sequence of integers."""
    i = 1
    while True:
        yield i
        i = i + 1

def squares():
    for i in integers():
        yield i * i

def take(n, seq):
    """Returns first n values from the given sequence."""
    seq = iter(seq)
    result = []
    try:
        for i in range(n):
            result.append(next(seq))
    except StopIteration:
        pass
    return result

print(take(5, squares()))  # prints [1, 4, 9, 16, 25]
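The standard library already ships a lazy "take": itertools.islice. A sketch of take() rewritten on top of it, swapping the manual try/except loop for islice (integers and squares are repeated so the block runs on its own):

```python
import itertools


def integers():
    """Infinite sequence of integers, as above."""
    i = 1
    while True:
        yield i
        i = i + 1


def squares():
    for i in integers():
        yield i * i


def take(n, seq):
    """Alternative take(): islice stops after n items (or earlier)."""
    return list(itertools.islice(seq, n))


print(take(5, squares()))   # [1, 4, 9, 16, 25]
print(take(5, [1, 2]))      # [1, 2]: a shorter input just gives fewer items
```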

Generator Expressions

Generator expressions are the generator version of list comprehensions.
They look like list comprehensions but return a generator instead of a list.

a = [x for x in range(5)]
a
[0, 1, 2, 3, 4]

a = (x*x for x in range(5))
a
<generator object <genexpr> at 0x0000000005232630>

sum(a)
30

sum((x*x for x in range(10)))
285
# If the generator expression is the only argument, its own parentheses can be omitted
sum(x*x for x in range(10))
285

pyt = ((x, y, z) for z in integers()
       for y in range(1, z)
       for x in range(1, y)
       if x*x + y*y == z*z)

pyt
<generator object <genexpr> at 0x0000000005232828>

take(5, pyt)
[(3, 4, 5), (6, 8, 10), (5, 12, 13), (9, 12, 15), (8, 15, 17)]
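Two properties follow from a generator expression being a generator: it can be consumed only once, and it does not build the whole sequence in memory. A short Python 3 check (exact sizes reported by sys.getsizeof vary by version and platform, so only the comparison is shown):

```python
import sys

a = (x * x for x in range(5))
print(sum(a))   # 30
print(sum(a))   # 0: the generator was consumed by the first sum()

# A list comprehension materialises every element up front; a generator
# expression only keeps its current state, so its own size stays small.
big_list = [x for x in range(10**6)]
big_gen = (x for x in range(10**6))
print(sys.getsizeof(big_list) > sys.getsizeof(big_gen))  # True
```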

Example: Reading multiple files

The file object Python provides is itself an iterator.

f = open('./1.txt')
next(f)    # first line
next(f)    # second line
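Because the file object is its own iterator (like yrange in example 1), each next() continues from the current position and the file can be read only once per open(). A Python 3 sketch using a temporary file as a stand-in for ./1.txt (the file contents are made up):

```python
import os
import tempfile

# Create a small throwaway file (stand-in for ./1.txt).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

with open(path) as f:
    same = iter(f) is f    # True: a file object is its own iterator
    first = next(f)        # 'first\n'
    rest = list(f)         # ['second\n', 'third\n']: continues from here
    again = list(f)        # []: exhausted, like the single-pass yrange

print(same, repr(first), rest, again)
os.remove(path)
```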

Simplifying the code with generators
Old version:

def cat(filenames):
    for f in filenames:
        with open(f) as fh:
            for line in fh:
                print(line, end="")

def grep(pattern, filenames):
    for f in filenames:
        with open(f) as fh:
            for line in fh:
                if pattern in line:
                    print(line, end="")

New version, with generators:

def readfiles(filenames):
    for f in filenames:
        with open(f) as fh:
            for line in fh:
                yield line

def grep(pattern, lines):
    return (line for line in lines if pattern in line)

def printlines(lines):
    for line in lines:
        print(line, end="")

def main(pattern, filenames):
    lines = readfiles(filenames)
    lines = grep(pattern, lines)
    printlines(lines)
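One way to exercise the pipeline without real input files: the sketch below writes two temporary files (contents made up for the demo) and collects the matches into a list instead of printing them, so the stages can be checked. Note that nothing is opened or read until list() starts pulling lines through grep and readfiles.

```python
import os
import tempfile


def readfiles(filenames):
    for name in filenames:
        with open(name) as fh:   # each file is closed once it is exhausted
            for line in fh:
                yield line


def grep(pattern, lines):
    return (line for line in lines if pattern in line)


# Two throwaway input files with hypothetical content.
paths = []
for text in ["apple\nbanana\n", "cherry\napple pie\n"]:
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(text)
        paths.append(tmp.name)

matches = grep("apple", readfiles(paths))  # lazy: no file has been opened yet
result = list(matches)                     # now the whole pipeline runs
print(result)                              # ['apple\n', 'apple pie\n']

for p in paths:
    os.remove(p)
```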

Itertools

import itertools

it1 = iter([1, 2, 3])
it2 = iter([4, 5, 6])
for v in itertools.chain(it1, it2):
    print(v)    # 1 through 6, one per line

# Python 3: the built-in zip is lazy (itertools.izip in Python 2)
for x, y in zip(["a", "b", "c"], [1, 2, 3]):
    print(x, y)

#a 1
#b 2
#c 3
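Both functions above are lazy and accept any iterables. Two related behaviours worth knowing, sketched in Python 3: zip stops at the shortest input, and itertools provides zip_longest for padding and chain.from_iterable for chaining an iterable of iterables.

```python
import itertools

letters = ["a", "b", "c"]
numbers = [1, 2]

pairs = list(zip(letters, numbers))
print(pairs)    # [('a', 1), ('b', 2)]: zip stops at the shortest input

padded = list(itertools.zip_longest(letters, numbers, fillvalue=0))
print(padded)   # [('a', 1), ('b', 2), ('c', 0)]: pads instead (izip_longest in Python 2)

flat = list(itertools.chain.from_iterable([[1, 2], [3], []]))
print(flat)     # [1, 2, 3]: like chain(*lists), but takes one iterable of iterables
```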


History

  • 20181029: created.