Python101: 19. Iterators

Python
by Admin on 17 Jul 2021


1 Concept introduction


In previous tutorials, we’ve touched on some typical for statements, such as:

list_example = [0, 1, 2, 3, 4]
for i in list_example:
    print(i)

# 0
# 1
# 2
# 3
# 4

With just the for and in keywords, we get a traversal that takes noticeably more ceremony in C. To achieve the same thing in C, one would write (assuming an integer array list_example and a variable list_length holding its length already exist):

int i;
for(i = 0; i < list_length; i++)
    printf("%d\n", list_example[i]);

It is clear that Python is more intuitive, elegant, and concise when it comes to iterating over elements; this is because Python builds the for statement on the concept of “iterators”.

Iterators are found everywhere in Python and have a uniform standard. By using iterators, Python can access each element of the list list_example one by one.

Let’s further discuss the relevant mechanisms below.

2 Definition and Principle


2.1 Definition of an iterator

An iterator is an interface for traversing a container that hides the container’s internal logic from the user.

This is the broadest definition of “iterator” that we could find.

Specifically in Python, iterators are one of the built-in standard types, on the same level as the “sequences” we’ve studied before.

For an object to be an iterator, it needs to have both __iter__() and [__next__()](https://docs.python.org/3/library/stdtypes.html#iterator.next), which are collectively called the “iterator protocol”. That is, if both methods are present, Python considers the object to be an iterator; conversely, if only one or neither method is present, Python does not consider the object to be an iterator.

The above assertion can be verified with the following code (which uses the built-in function isinstance() to determine whether an object is an instance of a class; this usage is inspired by [Liao Xuefeng’s official website]).

from collections.abc import Iterable, Iterator  # the ABCs live in collections.abc
class bothIterAndNext:
    def __iter__(self):
        pass
    def __next__(self):
        pass

isinstance(bothIterAndNext(), Iterable) # an object with both methods is iterable
# True

isinstance(bothIterAndNext(), Iterator) # an object with both methods is an iterator
# True

class onlyNext:
    def __next__(self):
        pass

isinstance(onlyNext(), Iterable) # an object with only __next__() is not iterable
# False

isinstance(onlyNext(), Iterator) # an object with only __next__() is not an iterator
# False

class onlyIter:
    def __iter__(self):
        pass

isinstance(onlyIter(), Iterable) # an object with only __iter__() is iterable
# True

isinstance(onlyIter(), Iterator) # an object with only __iter__() is not an iterator
# False

As the two checks on bothIterAndNext show, for Python the only criterion for determining whether an object is an iterator is “whether it has both __iter__() and __next__() methods”.

The checks on onlyNext confirm this inference: an object that has only the __next__() method is neither iterable nor an iterator.

Something interesting happens in the checks on onlyIter: the output shows that an object with only the __iter__() method is not an iterator, yet it is still iterable! (explained later)

2.2 The essence of iterators

An iterator object essentially represents a stream of data. Repeatedly calling its __next__() method, or passing it to the built-in next() function, returns the items in the stream one by one in order. Once there are no more items, a StopIteration exception is raised and the iteration ends.

There are two built-in functions in Python: iter() and next(), which are used to “convert argument objects to iterator objects” and “take the next item from the iterator” respectively.
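As a small illustration of the behavior described above, the sketch below drains an iterator by hand until the stream runs out. The names letters and stream are made up for this example, and the optional default argument of next() is shown only as a convenience.

```python
letters = ['a', 'b']        # hypothetical sample data
stream = iter(letters)      # obtain an iterator over the list

first = next(stream)        # 'a'
second = stream.__next__()  # 'b', same effect as next(stream)

try:
    next(stream)            # nothing is left in the stream
    exhausted = False
except StopIteration:
    exhausted = True

# next() also accepts a default that is returned instead of raising
fallback = next(stream, None)

print(first, second, exhausted, fallback)
# a b True None
```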

In fact, every object with an __iter__() method is treated as “iterable”, because __iter__() is expected to return an iterator corresponding to that object. In other words, the real meaning of “iterable” is “can be converted to an iterator”. The built-in function iter() converts an object to an iterator by calling that object’s own __iter__() method.
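One caveat is worth a quick sketch: the isinstance() check only looks at whether the method exists, while iter() also requires that __iter__() actually returns an iterator. The classes HollowIterable and Box below are hypothetical and exist only for this illustration.

```python
from collections.abc import Iterable

class HollowIterable:
    """Hypothetical class: has __iter__(), but it returns None."""
    def __iter__(self):
        pass

class Box:
    """Hypothetical class: __iter__() hands back a real iterator."""
    def __init__(self, items):
        self.items = items

    def __iter__(self):
        return iter(self.items)

# The isinstance() check passes because the method is present
looks_iterable = isinstance(HollowIterable(), Iterable)

try:
    iter(HollowIterable())
    failed = False
except TypeError:
    failed = True   # iter() rejects a return value that is not an iterator

collected = list(Box([1, 2, 3]))

print(looks_iterable, failed, collected)
# True True [1, 2, 3]
```

So “iterable” in the isinstance() sense is a promise about the interface; the object still has to keep that promise when iter() is called.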

Accordingly, the built-in function next() actually calls the object’s own method __next__(), which performs the operation of taking the next item from the object’s corresponding data stream.

So calling the object’s __iter__() and __next__() methods directly is equivalent to passing the object as an argument to the built-in functions iter() and next().
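The equivalence can be checked with a short sketch; numbers, it_builtin and it_method are names chosen for this example only.

```python
numbers = [10, 20, 30]           # hypothetical sample data

it_builtin = iter(numbers)       # via the built-in function
it_method = numbers.__iter__()   # via the method directly

# Both are list iterators of the same type, but independent objects
same_type = type(it_builtin) is type(it_method)
distinct = it_builtin is not it_method

a = next(it_builtin)             # first item via the built-in function
b = it_method.__next__()         # first item via the method

print(same_type, distinct, a, b)
# True True 10 10
```

In everyday code the built-in functions are the usual spelling. Note also that iter() is slightly more general than a bare __iter__() call: when an object has no __iter__() method, iter() falls back to the sequence protocol (__getitem__()).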

One thing to note is that calling __iter__() on an iterator returns the iterator itself, with all of its state preserved, including its current position in the iteration. See the following code.

li = [1, 2, 3]
li_iterator = iter(li)
isinstance(li, Iterator)
# False
isinstance(li_iterator, Iterator)
# True

Obviously, the list li itself is not an iterator, and passing it into the built-in function iter() yields the corresponding iterator li_iterator for the list li. We call the next() function to iterate over it.

next(li_iterator)
# 1
next(li_iterator)
# 2

Everything is as expected. Now let’s pass the iterator itself as an argument to the built-in function iter().

li_iterator = iter(li_iterator)
next(li_iterator)
# 3

Here the result is a little different from what we might want. Such a statement is usually written to get a new iterator that starts from the beginning, not the same object as the original iterator; yet the call returned 3, picking up exactly where the original left off.

Further, we can see that the object obtained by calling the iter() function on the iterator not only has the same state as the original iterator, but they actually point to the same object.

id(li_iterator)
# 2195581916440
li_iterator = iter(li_iterator)
id(li_iterator)
# 2195581916440
li_iterator2 = iter(li_iterator)
id(li_iterator2)
# 2195581916440

That is, when the object passed to iter() is already an iterator, Python does not create a new iterator for it; it simply returns the iterator itself.
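If a fresh iterator is what you want, one way to get it is to call iter() on the original container again rather than on the iterator. A minimal sketch, with variable names chosen for this example:

```python
li = [1, 2, 3]
first_pass = iter(li)
next(first_pass)                  # consume 1
next(first_pass)                  # consume 2

same = iter(first_pass)           # iterator in, same iterator out
fresh = iter(li)                  # container in, new iterator out

same_object = same is first_pass
resumed = next(same)              # continues where first_pass stopped
restarted = next(fresh)           # starts from the beginning

print(same_object, resumed, restarted)
# True 3 1
```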

3 Implement an iterator class


The code to build the classes in this section is from [Python3 Documentation - Classes - 9.8 Iterators]

With the above discussion in mind, we can implement a simple iterator ourselves. Just make sure that this simple iterator has a behavior that matches the definition of the iterator.

In plain terms: define a data type that has an __iter__() method, where that method returns an object with a __next__() method, or returns the instance itself when the class already defines __next__(). The sample code is as follows.

class Reverse:
    """Iterator that iterates over sequence objects in reverse."""
    def __init__(self, data):
        self.data = data
        self.index = len(data)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

Let’s verify it.

rev = Reverse('justdopython.com')
next(rev)
# 'm'
next(rev)
# 'o'
next(rev)
# 'c'
next(rev)
# '.'

(o゜▽゜)o☆

Mission accomplished!
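The Reverse class takes the “return itself” route. For comparison, here is a sketch of the other route, where __iter__() returns a separate object that has __next__(). The class names Reversible and ReverseIterator are hypothetical and are not from the Python documentation.

```python
class ReverseIterator:
    """Hypothetical helper: keeps the position for a single pass."""
    def __init__(self, data):
        self.data = data
        self.index = len(data)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index -= 1
        return self.data[self.index]


class Reversible:
    """Hypothetical iterable: creates a new ReverseIterator per pass."""
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return ReverseIterator(self.data)


word = Reversible('abc')
first_pass = ''.join(word)
second_pass = ''.join(word)   # works again: each pass gets a new iterator

print(first_pass, second_pass)
# cba cba
```

The design difference is that a Reversible object can be traversed any number of times, whereas an object that is its own iterator, like Reverse, is exhausted after one pass.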

4 for statements and iterators


Going back to the for loop example from the beginning of the article: when executing the for statement, Python silently calls the built-in function iter(), passing in the container object from the for statement as the argument, and iter() returns an iterator object.

Thus, after converting the container object to an iterator, the for statement repeatedly calls the iterator’s __next__() method, accessing each object in the original container one by one. Once all elements have been traversed, a StopIteration exception is raised and the for loop terminates.
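The steps above can be sketched as a hand-written while loop. This is only a rough equivalent for illustration; the interpreter performs these steps internally rather than literally running such a loop, and the visited list is added here just to record the result.

```python
list_example = [0, 1, 2, 3, 4]

visited = []
iterator = iter(list_example)     # step 1: for calls iter() on the container
while True:
    try:
        i = next(iterator)        # step 2: fetch the next element
    except StopIteration:
        break                     # step 3: stream exhausted, leave the loop
    visited.append(i)             # loop body (print(i) in the opening example)

print(visited)
# [0, 1, 2, 3, 4]
```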

5 Summary


  • An iterator is always iterable, but an iterable is not necessarily an iterator
  • An iterable object is one that can be converted to an iterator
  • An iterator needs to have both the __iter__() and __next__() methods
  • Calling the iter() function on an iterator gives you the iterator itself
  • for loops actually use iterators and generally use the exception StopIteration as a loop termination condition

This article explored iterators in Python: we looked closely at their properties and behavior, learned two important methods, __iter__() and __next__(), and worked out the mechanism behind Python’s for loops.


References

[1] Python3 documentation - built-in types

[2] Liao Xuefeng’s official website

[3] Python3 Documentation - Classes - 9.8 Iterators

[4] Python-100-days-day019
