Abstract Syntax Trees (AST) code manipulation

Let’s push further into one of Python's most sophisticated features: Abstract Syntax Trees (AST) and code manipulation at runtime. This concept allows Python programs to manipulate Python code itself, making it incredibly powerful for metaprogramming, program analysis, or even creating custom interpreters.

What is an Abstract Syntax Tree (AST)?

Python internally represents code as an AST. The AST is a tree representation of the abstract syntactic structure of source code, where each node in the tree denotes a construct occurring in the code (like expressions, literals, and functions). You can programmatically modify the AST to transform or analyze code before it executes, opening doors for things like dynamic code generation, optimization, or creating domain-specific languages (DSLs).

Goal: Dynamic Code Manipulation Using AST

We will:

  1. Parse Python code into an AST.
  2. Modify the AST (for instance, injecting code, optimizing, or transforming it).
  3. Compile the modified AST back into executable Python code.

Step 1: Parse and Inspect Code with ast

Let’s start by parsing Python code into an AST and inspecting it.

import ast

source_code = """
def greet(name):
    return f"Hello, {name}!"
"""

# Parse the code into an AST
parsed_ast = ast.parse(source_code)

# Pretty print the AST
print(ast.dump(parsed_ast, indent=4))

Explanation:

  • ast.parse(source_code) parses the Python source code into an AST.
  • ast.dump(parsed_ast) provides a visual tree-like structure of the code. You can see how the function, its arguments, and the return statement are represented as nodes in the tree.

Step 2: Manipulating the AST

Now let's modify the AST to inject a logging statement into the function dynamically.

import ast

# The source code to be modified
source_code = """
def greet(name):
    return f"Hello, {name}!"
"""

# Parse the code into an AST
parsed_ast = ast.parse(source_code)

# Create a new node: adding a logging print statement
logging_node = ast.parse('print(f"Calling greet with {name}")').body[0]

# Find the greet function in the AST
class FunctionTransformer(ast.NodeTransformer):
    def visit_FunctionDef(self, node):
        # Inject the logging node at the start of the function
        node.body.insert(0, logging_node)
        return node

# Apply the transformation
transformed_ast = FunctionTransformer().visit(parsed_ast)
ast.fix_missing_locations(transformed_ast)

# Compile the AST back into a code object
code = compile(transformed_ast, filename="<ast>", mode="exec")

# Execute the transformed code
exec(code)

# Now you can call the modified function
greet("Alice")  # Output will include: Calling greet with Alice

Explanation:

  • AST Transformation: We defined a class FunctionTransformer that extends ast.NodeTransformer, which allows us to manipulate specific nodes in the AST.
  • Inserting the Logging Statement: We create an AST node for the logging statement (print(f"Calling greet with {name}")) and inject it at the start of the function body.
  • AST Re-compilation: The transformed AST is compiled back into executable Python code using compile().
  • Execution: When calling greet("Alice"), it first logs "Calling greet with Alice", then executes the original function.

Step 3: More Advanced Transformations (Auto-Memoization)

Let’s use AST manipulation to dynamically add memoization (a technique to cache function results) to a function, so it avoids recomputation when given the same arguments.

import ast

source_code = """
def expensive_computation(x):
    return x * 2
"""

# Parse the code into an AST
parsed_ast = ast.parse(source_code)

# Memoization code template
memoize_code = """
cache = {}
def memoized_func(x):
    if x in cache:
        return cache[x]
    result = expensive_computation(x)
    cache[x] = result
    return result
"""

# Parse the memoization code
memoize_ast = ast.parse(memoize_code)

# Modify the original function to call the memoized version
class MemoizeTransformer(ast.NodeTransformer):
    def visit_FunctionDef(self, node):
        # Rename original function
        node.name = "original_" + node.name
        
        # Add memoized function definition
        memo_func = memoize_ast.body[1]
        memo_func.name = node.name.replace("original_", "memoized_")
        
        # Insert the memoization logic
        return [node, memo_func]

# Apply the transformation
transformed_ast = MemoizeTransformer().visit(parsed_ast)
ast.fix_missing_locations(transformed_ast)

# Compile the AST back into executable code
code = compile(transformed_ast, filename="<ast>", mode="exec")

# Execute the modified code
exec(code)

# Example usage
print(memoized_func(10))  # Computation occurs
print(memoized_func(10))  # Cached result is used

Explanation:

  1. Memoization Logic: We first define the memoization logic (memoized_func) using an AST. This logic checks if the result for a given argument is cached, and if not, computes and caches it.
  2. NodeTransformer: We extend NodeTransformer again to rename the original function and inject the memoized version of the function.
  3. Execution: The modified code dynamically wraps the original function with a memoization layer.

Step 4: Abstract Syntax Trees for Static Code Analysis

Another common use of ASTs is static code analysis. We can traverse the AST to detect certain patterns, such as inefficient code, security vulnerabilities, or usage of certain APIs.

Let’s build an example where we analyze the code to find all function definitions and their argument names.

class FunctionAnalyzer(ast.NodeVisitor):
    def visit_FunctionDef(self, node):
        # Output function name and arguments
        arg_names = [arg.arg for arg in node.args.args]
        print(f"Function {node.name} with arguments {arg_names}")
        self.generic_visit(node)

source_code = """
def add(a, b):
    return a + b

def multiply(x, y):
    return x * y
"""

# Parse the source code into an AST
parsed_ast = ast.parse(source_code)

# Analyze the AST
analyzer = FunctionAnalyzer()
analyzer.visit(parsed_ast)

Explanation:

  • AST NodeVisitor: We extend NodeVisitor to walk through the AST nodes and inspect each FunctionDef node, which corresponds to function definitions.
  • Static Analysis: This approach allows us to gather information about the code (e.g., function names and their arguments) without executing it. This is especially useful for building linters, security checkers, or refactoring tools.

Key Advanced Concepts in This Example:

  1. AST Manipulation: You’re dynamically modifying the structure of code at runtime by transforming its AST representation.
  2. AST Parsing and Compilation: You can parse Python source code into an AST, manipulate it, and compile it back into executable code.
  3. Memoization via AST: We dynamically inject memoization logic using the AST without modifying the original source code.
  4. Static Analysis: ASTs are perfect for analyzing code for patterns or errors without running it, as used by tools like PyLint, MyPy, and other static analyzers.

Use Cases for AST Manipulation:

  • Code Optimization: Dynamically optimizing code by removing dead code, inlining functions, or constant folding.
  • Custom DSLs: Building domain-specific languages by transforming Python into specialized code that can be executed or interpreted differently.
  • Security Auditing: Automatically detecting dangerous or insecure patterns in code, such as the use of unsafe functions.
  • Code Generation: Automatically generating new code based on templates or predefined patterns.

This approach is quite advanced and incredibly powerful, as it allows for runtime code manipulation, optimizations, and analysis, which are features used by many compilers and sophisticated frameworks.