9.4 Generators and the yield Statement in Python

Generators in Python are a special type of iterator that produce sequences of values lazily (on demand) rather than all at once. A generator function is defined with the yield statement. Because it produces one value at a time instead of building the whole sequence up front, it typically uses far less memory than a list when working with large datasets, and it can also represent infinite sequences. Unlike a regular function, which returns once and then exits, a generator function can yield multiple values over time, pausing after each yield and resuming later with its state intact.

In this section, we will explore what generators are, how the yield statement works, how to create and use generator functions, and the benefits of using generators for efficient iteration.


9.4.1 What is a Generator?

A generator is a special type of iterator that generates values lazily, one at a time, as they are needed. Unlike a regular function, which uses return to hand back a value and terminate, a generator function uses yield to produce a series of values over time. Each time the generator’s __next__() method is called (usually indirectly, through next() or a for loop), it resumes execution from where it left off and runs until it encounters the next yield statement. If it reaches the end of the function instead, StopIteration is raised to signal that there are no more values.

Key Features of Generators:

  • Lazy Evaluation: Generators produce items one at a time, which makes them memory-efficient for large or infinite sequences.
  • State Preservation: Generators preserve their local state between yield statements, allowing them to pick up where they left off.
  • Iteration: Generators can be iterated over using a for loop or the next() function, just like any other iterator.
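
The features listed above can be checked directly in a few lines. The following is a minimal sketch; the generator function letters is a hypothetical name chosen only for illustration.

```python
def letters():
    # Hypothetical example generator, used only for illustration
    yield "a"
    yield "b"

gen = letters()

# A generator object is its own iterator: iter() returns the same object
is_own_iterator = iter(gen) is gen

first = next(gen)   # runs the body up to the first yield
rest = list(gen)    # list() (like a for loop) consumes whatever is left

print(is_own_iterator, first, rest)  # True a ['b']
```

Note that list(gen) only sees the values that had not yet been consumed, which is a direct consequence of state preservation.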

9.4.2 The yield Statement

The yield statement is used in generator functions to produce a value and pause the function's execution. When a generator function is called, it returns a generator object that can be iterated over. Each call to next() resumes the function from the point where it last yielded a value, allowing the function to continue its execution.

Syntax of yield:

def generator_function():
    yield value  # Produce a value and pause execution

  • When the function reaches the yield statement, it returns the value and pauses its execution until the next call to next().
  • After the yield, the function's state is saved, and when resumed, it picks up from where it left off.
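
The pause-and-resume behaviour described above can be made visible by recording which parts of the body have run. This is a small sketch; the names traced and events are hypothetical and exist only for this illustration.

```python
events = []  # records which parts of the generator body have run

def traced():
    # Hypothetical generator used only to illustrate pausing and resuming
    events.append("started")
    yield 1
    events.append("resumed")
    yield 2

gen = traced()
after_call = list(events)    # []: calling the function runs none of its body

first = next(gen)
after_first = list(events)   # ['started']: ran up to the first yield

second = next(gen)
after_second = list(events)  # ['started', 'resumed']: continued after the first yield

print(after_call, after_first, after_second)
```

The empty list after the initial call shows that creating the generator object does not execute any of the function body; execution starts only with the first next().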

Example: Simple Generator with yield:

def simple_generator():
    yield 1
    yield 2
    yield 3

# Using the generator
gen = simple_generator()
print(next(gen))  # Output: 1
print(next(gen))  # Output: 2
print(next(gen))  # Output: 3
# Calling next() again will raise StopIteration

In this example:

  • simple_generator() yields the values 1, 2, and 3 sequentially.
  • Each call to next() retrieves the next value produced by the generator until the generator is exhausted, at which point StopIteration is raised.
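
When calling next() by hand, the exhausted case has to be handled explicitly. The sketch below shows two common options: catching StopIteration, or passing a default value as the second argument to next(). The variable names are illustrative only.

```python
def simple_generator():
    yield 1
    yield 2
    yield 3

gen = simple_generator()
values = [next(gen), next(gen), next(gen)]

# Option 1: catch the exception raised by an exhausted generator
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

# Option 2: supply a default, which is returned instead of raising
fallback = next(gen, None)

print(values, exhausted, fallback)  # [1, 2, 3] True None
```

A for loop handles StopIteration automatically, which is why it is usually the more convenient way to consume a generator.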

9.4.3 Differences Between return and yield

The return statement in a function immediately terminates the function and returns a single value. In contrast, the yield statement in a generator function pauses the function’s execution and allows it to resume later, producing multiple values over time.

Key Differences:

  • return: Ends the function and returns a value.
  • yield: Pauses the function and produces a value, but allows the function to resume from where it paused on the next iteration.

Example: Using return vs. yield:

# Using return
def regular_function():
    return 1
    return 2  # This line will never be reached

print(regular_function())  # Output: 1

# Using yield
def generator_function():
    yield 1
    yield 2

gen = generator_function()
print(next(gen))  # Output: 1
print(next(gen))  # Output: 2

In this example:

  • The return statement in the regular function ends the function after returning 1, so the second return statement is never reached.
  • The yield statement in the generator function allows it to yield multiple values (1 and 2) without terminating the function after the first value.
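
The two statements can also appear together: inside a generator function, a bare return does not produce a value for the caller but simply ends the iteration early. A minimal sketch, assuming a hypothetical helper named first_two:

```python
def first_two(items):
    # Hypothetical helper: yield at most two items, then stop
    count = 0
    for item in items:
        if count == 2:
            return  # in a generator, return ends the iteration
        yield item
        count += 1

result = list(first_two("abcd"))
print(result)  # ['a', 'b']
```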

9.4.4 Creating Generator Functions

A generator function is defined like a regular function but contains one or more yield statements. Calling it does not run the body; it returns a generator object. Each time that object is advanced (using next() or a for loop), the function resumes execution where it left off and runs until it yields the next value in the sequence.

Example: Basic Generator Function:

def countdown(n):
    print("Starting countdown")
    while n > 0:
        yield n
        n -= 1
    print("Countdown finished")

# Using the countdown generator
for number in countdown(5):
    print(number)

Output:

Starting countdown
5
4
3
2
1
Countdown finished

In this example:

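  • The countdown generator function yields the numbers from n down to 1.
  • Each iteration of the for loop calls next() to get the next value, and the generator pauses after yielding each value.
  • "Starting countdown" is printed only when the loop requests the first value, and "Countdown finished" is printed when the loop asks for a value after 1 and the function runs to its end.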
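
One practical consequence of this design is that a generator object can be consumed only once. The sketch below uses a variant of countdown with the print calls omitted (a simplification made here for brevity) to show that a second pass over the same object yields nothing.

```python
def countdown(n):
    # Same logic as the countdown example, without the print calls
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
first_pass = list(gen)       # consumes every value
second_pass = list(gen)      # the generator is already exhausted
fresh = list(countdown(3))   # calling the function again gives a new generator

print(first_pass, second_pass, fresh)  # [3, 2, 1] [] [3, 2, 1]
```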

9.4.5 Advantages of Generators

Generators offer several advantages over other data structures like lists:

  1. Lazy Evaluation: Generators only compute values when they are requested, which can save time when working with large or infinite sequences.
  2. Pipelines and Streams: Generators are perfect for building pipelines or processing streams of data. You can chain multiple generators together to process data in stages.
  3. Memory Efficiency: Generators don’t store all the values in memory at once. Instead, they produce values one at a time as needed, which is ideal for large datasets or infinite sequences.

Example: Memory Efficiency

# List-based approach
large_list = [x * 2 for x in range(1000000)]

# Generator-based approach
def large_generator():
    for x in range(1000000):
        yield x * 2

# The generator approach does not load all values into memory
gen = large_generator()

In this example, the list-based approach builds and holds all 1 million values in memory at once, whereas the generator computes each value only when it is requested. The generator's memory use therefore stays small and roughly constant, regardless of how long the sequence is.
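
The difference can be observed with sys.getsizeof. This is a rough sketch that assumes CPython; exact byte counts vary between versions and platforms, so only the comparison is shown. Note that getsizeof reports the size of the list object itself (its array of references), not the integers it refers to, so it understates the list's true footprint.

```python
import sys

# List-based approach: all values exist at once
large_list = [x * 2 for x in range(1_000_000)]

# Generator-based approach: values are produced on demand
def large_generator():
    for x in range(1_000_000):
        yield x * 2

gen = large_generator()

list_size = sys.getsizeof(large_list)  # grows with the number of elements
gen_size = sys.getsizeof(gen)          # independent of the sequence length

print(gen_size < list_size)  # True
```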


9.4.6 Infinite Generators

Generators can be used to create infinite sequences that continue producing values indefinitely. Since generators don’t store all values in memory, they are ideal for such tasks.

Example: Infinite Fibonacci Sequence:

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Using the Fibonacci generator
fib = fibonacci()
for _ in range(10):
    print(next(fib))

Output:

0
1
1
2
3
5
8
13
21
34

In this example:

  • The fibonacci() generator produces an infinite sequence of Fibonacci numbers.
  • The generator continues indefinitely, producing the next Fibonacci number on each call to next().
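
Because an infinite generator never raises StopIteration, the caller is responsible for bounding the iteration. Besides calling next() a fixed number of times, one option is itertools.islice from the standard library, sketched here:

```python
from itertools import islice

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# islice stops after 10 items, so list() terminates
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Calling list(fibonacci()) without a bound would never finish, since the generator has no end.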

9.4.7 Generator Expressions

Python also supports generator expressions, which are similar to list comprehensions but produce values lazily. Generator expressions use parentheses instead of square brackets.

Example: Generator Expression:

# List comprehension
squares = [x ** 2 for x in range(5)]
print(squares)  # Output: [0, 1, 4, 9, 16]

# Generator expression
squares_gen = (x ** 2 for x in range(5))
print(next(squares_gen))  # Output: 0
print(next(squares_gen))  # Output: 1

In this example:

  • [x ** 2 for x in range(5)] creates a list of squares of numbers, storing all values in memory.
  • (x ** 2 for x in range(5)) creates a generator expression, which yields one square at a time, saving memory.
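
Generator expressions are often passed straight to a function that consumes an iterable, so that no intermediate list is built. A short sketch with illustrative variable names; when the generator expression is the only argument, the extra pair of parentheses can be omitted:

```python
# Sum of squares without building a list: 0 + 1 + 4 + 9 + 16
total = sum(x ** 2 for x in range(5))

# any() stops consuming as soon as one element is true
has_large_square = any(x ** 2 > 10 for x in range(5))

print(total, has_large_square)  # 30 True
```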

9.4.8 Combining Generators

Generators can be combined or chained to create more complex pipelines, where the output of one generator becomes the input to another.

Example: Chaining Generators:

def double_numbers(numbers):
    for number in numbers:
        yield number * 2

def filter_even(numbers):
    for number in numbers:
        if number % 2 == 0:
            yield number

# Using chained generators
numbers = range(10)
doubled = double_numbers(numbers)
even_doubled = filter_even(doubled)

for num in even_doubled:
    print(num)

Output:

0
2
4
6
8
10
12
14
16
18

In this example:

  • The double_numbers generator doubles each number in the sequence.
  • The filter_even generator receives the doubled numbers and yields only the even ones. Because every doubled number is even, all ten values pass through.
  • The two generators are chained together to produce the final output.
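
The order of the stages in a pipeline matters. The sketch below reuses the two generator functions and compares both orderings. Doubling first produces only even numbers, so the filter keeps all of them; filtering first keeps 0, 2, 4, 6, 8 and then doubles those.

```python
def double_numbers(numbers):
    for number in numbers:
        yield number * 2

def filter_even(numbers):
    for number in numbers:
        if number % 2 == 0:
            yield number

# Stage order 1: double, then filter
double_then_filter = list(filter_even(double_numbers(range(10))))

# Stage order 2: filter, then double
filter_then_double = list(double_numbers(filter_even(range(10))))

print(double_then_filter)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(filter_then_double)  # [0, 4, 8, 12, 16]
```

In both cases no intermediate list is created; each value flows through the whole chain before the next one is requested.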


9.4.9 Summary

  • Generators are a special type of iterator that use the yield statement to produce values lazily, one at a time.
  • yield pauses the function’s execution, saving its state, and allows it to resume later from where it left off.
  • Generators are memory-efficient and allow you to work with large datasets or infinite sequences without loading all values into memory at once.
  • Generator expressions provide a concise way to create generators using a syntax similar to list comprehensions.
  • Generators can be combined to create pipelines, enabling efficient data processing in stages.

Generators are a flexible, memory-efficient tool for iterating over data in Python, and they are especially useful when the data is large or unbounded.