What does the "yield" keyword do?
To understand what yield does, you must understand what generators are. And before generators come iterables.
Iterables
When you create a list, you can read its items one by one. Reading its items one by one is called iteration:
>>> mylist = [1, 2, 3]
>>> for i in mylist:
...     print(i)
1
2
3
mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...     print(i)
0
1
4
Everything you can use "for... in..." on is an iterable: lists, strings, files... These iterables are handy because you can read them as much as you wish, but you store all the values in memory, and this is not always what you want when you have a lot of values.
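For instance (a minimal sketch using only built-in types, no files involved): a string is an iterable too, and because a list keeps all of its values in memory you can read it as many times as you like:

>>> for letter in "abc":
...     print(letter)
a
b
c
>>> mylist = [1, 2, 3]
>>> print(sum(mylist))   # first read
6
>>> print(sum(mylist))   # the values are still in memory, so a second read works fine
6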
Generators
Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory, they generate the values on the fly:
>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...     print(i)
0
1
4

It is just the same except you used () instead of []. BUT, you cannot perform for i in mygenerator a second time, since generators can only be used once: they calculate 0, then forget about it and calculate 1, and end calculating 4, one by one.
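A quick sketch of that one-shot behaviour (reusing the same generator expression as above): a second pass over the generator produces nothing, because the values were generated on the fly and are now gone:

>>> mygenerator = (x*x for x in range(3))
>>> print(list(mygenerator))   # the first pass consumes the generator
[0, 1, 4]
>>> print(list(mygenerator))   # the generator is already exhausted
[]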
Yield
yield is a keyword that is used like return, except the function will return a generator.
>>> def createGenerator():
...     mylist = range(3)
...     for i in mylist:
...         yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4
Here it's a useless example, but it's handy when you know your function will return a huge set of values that you will only need to read once.
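To make that concrete (a small sketch, with one million chosen arbitrarily as the "huge" size): summing squares through a generator never builds the full list of values in memory:

>>> def squares(n):
...     for i in range(n):
...         yield i * i
...
>>> sum(squares(1000000))   # each square is produced, added to the total, then thrown away
333332833333500000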
To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object; this is a bit tricky :-)

Then, your code will be run each time the for uses the generator.

Now the hard part:
The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it'll return the first value of the loop. Then, each subsequent call will run the loop you have written in the function one more time and return the next value, until there is no value to return.

The generator is considered empty once the function runs but does not hit yield anymore. That can be because the loop has come to an end, or because you no longer satisfy an "if/else".
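You can watch this happen by driving the generator by hand with next() (a small sketch reusing the createGenerator function above; next() raising StopIteration on an empty generator is standard Python behaviour):

>>> mygenerator = createGenerator()   # nothing in the function body has run yet
>>> next(mygenerator)                 # runs the body until the first yield
0
>>> next(mygenerator)                 # resumes after the yield, runs until the next one
1
>>> next(mygenerator)
4
>>> next(mygenerator)                 # the loop is over, no yield is hit: the generator is empty
Traceback (most recent call last):
  ...
StopIteration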
Your code explained
Generator:
# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that will be called each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty
    # there is no more than two values: the left and the right children

Caller:
# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If distance is ok, then you can fill the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate in the candidates list
    # so the loop will keep running until it will have looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result
This code contains several smart parts:

- The loop iterates on a list, but the list expands while the loop is being iterated :-) It's a concise way to go through all these nested data even if it's a bit dangerous, since you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but while keeps creating new generator objects which will produce different values from the previous ones since it's not applied on the same node (see the sketch just after this list).
- The extend() method is a list object method that expects an iterable and adds its values to the list.
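Here is a stripped-down sketch of that pattern with made-up numbers (children() is just a stand-in for a generator method like _get_child_candidates; there is no real tree here): the while loop keeps draining a list that extend() keeps refilling from fresh generator objects:

>>> def children(n):                  # each call returns a brand new generator object
...     if n * 2 <= 7:
...         yield n * 2
...     if n * 2 + 1 <= 7:
...         yield n * 2 + 1
...
>>> candidates, visited = [1], []
>>> while candidates:                 # the list is consumed and expanded at the same time
...     n = candidates.pop()
...     visited.append(n)
...     candidates.extend(children(n))
...
>>> visited
[1, 3, 7, 6, 2, 5, 4]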
Usually we pass a list to it:
>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]
But in your code it gets a generator, which is good because:

- You don't need to read the values twice.
- You may have a lot of children and you don't want them all stored in memory.
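A tiny sketch of that difference (nothing beyond list.extend and a generator expression): extend() consumes the generator's values as it reads them, so no intermediate list is ever built:

>>> a = [1, 2]
>>> a.extend(x*x for x in range(3))   # the generator is consumed as extend() reads it
>>> print(a)
[1, 2, 0, 1, 4]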
And it works because Python does not care if the argument of a method is a list or not. Python expects iterables, so it will work with strings, lists, tuples and generators! This is called duck typing and is one of the reasons why Python is so cool. But this is another story, for another question...
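For example (a throwaway sketch; first_items is a made-up name): the same function happily consumes a string, a tuple or a generator, because all it asks of its argument is that it be iterable:

>>> def first_items(iterable, n=2):
...     result = []
...     for item in iterable:         # anything you can "for... in..." on will do: duck typing
...         if len(result) == n:
...             break
...         result.append(item)
...     return result
...
>>> first_items("hello")
['h', 'e']
>>> first_items((10, 20, 30))
[10, 20]
>>> first_items(x*x for x in range(100))
[0, 1]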
You can stop here, or read a little bit to see an advanced use of a generator:
Controlling a generator exhaustion
class Bank(): # Let's create a bank, building ATMs
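    # (A sketch of how this example can continue, in Python 2 style (see the note
    # below for Python 3): a class attribute acts as a switch that controls whether
    # the generators still produce values.)
    crisis = False
    def create_atm(self):
        while not self.crisis:
            yield "$100"

>>> hsbc = Bank()                        # when everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> hsbc.crisis = True                   # crisis is coming, no more money!
>>> print(corner_street_atm.next())
Traceback (most recent call last):
  ...
StopIteration
>>> wall_street_atm = hsbc.create_atm()  # this is true even for ATMs created after the crisis
>>> print(wall_street_atm.next())
Traceback (most recent call last):
  ...
StopIteration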
Note: for Python 3, use print(corner_street_atm.__next__()) or print(next(corner_street_atm)).

It can be useful for various things like controlling access to a resource.
itertools, your best friend
The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator? Chain two generators? Group values in a nested list with a one-liner? Map / zip without creating another list?

Then just import itertools.

An example? Let's see the possible orders of arrival for a 4-horse race:
>>> horses = [1, 2, 3, 4]
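>>> # (a sketch of how the example can continue: itertools.permutations enumerates
>>> #  every possible arrival order of the four horses)
>>> import itertools
>>> races = itertools.permutations(horses)
>>> print(list(races))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 ...
 (4, 3, 2, 1)]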
Understanding the inner mechanisms of iteration
Iteration is a process implying iterables (implementing the __iter__() method) and iterators (implementing the __next__() method). Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate on iterables.
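As a quick sketch of those two pieces (built-ins only, nothing project-specific): iter() asks an iterable for an iterator, and next() drives that iterator until it is exhausted:

>>> mylist = [1, 2, 3]        # a list is an iterable: it implements __iter__()
>>> iterator = iter(mylist)   # iter() calls mylist.__iter__() and returns an iterator
>>> next(iterator)            # next() calls iterator.__next__()
1
>>> next(iterator)
2
>>> next(iterator)
3
>>> next(iterator)            # no values left: the iterator is exhausted
Traceback (most recent call last):
  ...
StopIteration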
More about it in this article about how the for loop works.