`Op`s#

An Op is a graph object that defines and performs computations in a graph.

It has to define the following methods.

make_node(*inputs)#

This method is responsible for creating output Variables of a suitable symbolic Type to serve as the outputs of this Op’s application. The Variables found in *inputs must be operated on using PyTensor’s symbolic language to compute the symbolic output Variables. This method should put these outputs into an Apply instance, and return the Apply instance.

This method creates an Apply node representing the application of the Op on the inputs provided. If the Op cannot be applied to these inputs, it must raise an appropriate exception.

The inputs of the Apply instance returned by this call must be ordered correctly: a subsequent self.make_node(*apply.inputs) must produce something equivalent to the first apply.

perform(node, inputs, output_storage)#

This method computes the function associated with this Op. node is an Apply node created by the Op’s Op.make_node() method. inputs is a list of references to data to operate on using non-symbolic statements (i.e., statements in Python or NumPy). output_storage is a list of storage cells where the results of the computation must be put.

More specifically:

node: This is a reference to an Apply node which was previously obtained via the Op.make_node() method. It is typically not used in simple Ops, but it contains symbolic information that could be required for complex Ops.

inputs: This is a list of data from which the values stored in output_storage are to be computed using non-symbolic language.

output_storage: This is a list of storage cells where the output is to be stored. A storage cell is a one-element list. It is forbidden to change the length of the list(s) contained in output_storage. There is one storage cell for each output of the Op.

The data put in output_storage must match the type of the symbolic output. This is a situation where the node argument can come in handy.

A function Mode may allow output_storage elements to persist between evaluations, or it may reset output_storage cells to hold a value of None. It can also pre-allocate some memory for the Op to use. This feature can allow Op.perform() to reuse memory between calls, for example. If something is preallocated in output_storage, it will have the correct dtype, but it may have the wrong shape and any stride pattern.

The outputs of this method must be fully determined by its inputs. That is to say, if it is evaluated once on inputs A and returns B, then whenever inputs C, equal to A, are presented again, outputs equal to B must be returned again.

You must be careful about aliasing outputs to inputs, and about making modifications to any of the inputs. See Views and inplace operations before writing an Op.perform() implementation that does either of these things.
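The storage-cell contract described above can be sketched in plain Python. This is only an illustration: it does not import PyTensor, it uses Python lists where a real Op would typically produce NumPy arrays, and `perform` is written as a free function (in a real Op it is a method and takes `self` first). The doubling computation is a hypothetical example.

```python
def perform(node, inputs, output_storage):
    """Double every element of the single input (plain-Python stand-in)."""
    (x,) = inputs
    (out_cell,) = output_storage  # one storage cell per output
    buf = out_cell[0]
    if isinstance(buf, list) and len(buf) == len(x):
        # A buffer of the right size was left in the cell: reuse it in place.
        for i, v in enumerate(x):
            buf[i] = 2 * v
    else:
        # Nothing reusable: bind a fresh value into the cell.
        # The cell itself stays a one-element list.
        out_cell[0] = [2 * v for v in x]


storage = [[None]]                       # one cell, initially empty
perform(None, [[1.0, 2.0, 3.0]], storage)
first = storage[0][0]
perform(None, [[4.0, 5.0, 6.0]], storage)
second = storage[0][0]                   # same list object, new contents
```

Note that the function never writes into `x`, so it neither destroys nor aliases its input.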

__eq__(other)#

other is also an Op.

Returning True here is a promise to the rewrite system that the other Op will produce exactly the same graph effects (e.g. from its Op.perform()) as this one, given identical inputs. This means it will produce the same output values, it will destroy the same inputs (same Op.destroy_map), and will alias outputs to the same inputs (same Op.view_map). For more details, see Views and inplace operations.

Note

If you set __props__, this will be automatically generated.

__hash__()#

If two Op instances compare equal, then they must return the same hash value.

Equally important, this hash value must not change during the lifetime of self. Op instances should be immutable in this sense.

Note

If you set Op.__props__, this will be automatically generated.

Optional methods or attributes#

__props__#

Default: Undefined

Must be a tuple. Lists the names of the attributes which influence the computation performed. Setting it also enables the automatic generation of appropriate __eq__, __hash__ and __str__ methods. Set it to () if no attributes are relevant to the computation but you still want these methods generated.

New in version 0.7.
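To make the relationship between __props__, __eq__, __hash__ and __str__ concrete, here is a hand-written sketch of methods that satisfy the contracts above. It is a plain-Python stand-in and not PyTensor's actual generated code; the class name `ScaleBy`, the `_props` helper and the exact string format are assumptions made for illustration.

```python
class ScaleBy:
    """Plain-Python stand-in for an Op with one parameter."""

    __props__ = ("factor",)

    def __init__(self, factor):
        self.factor = factor

    def _props(self):
        # Values of the attributes that influence the computation.
        return tuple(getattr(self, name) for name in self.__props__)

    def __eq__(self, other):
        # Equal only if same class and same relevant attributes.
        return type(self) is type(other) and self._props() == other._props()

    def __hash__(self):
        # Built from the same data as __eq__, so equal instances hash equally.
        return hash((type(self), self._props()))

    def __str__(self):
        args = ", ".join(f"{n}={getattr(self, n)!r}" for n in self.__props__)
        return f"{type(self).__name__}{{{args}}}"


a, b, c = ScaleBy(2), ScaleBy(2), ScaleBy(3)
```

Because the hash depends on `factor`, the attribute should not be reassigned after construction; this is the immutability requirement stated for __hash__.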

default_output#

Default: None

If this member variable is an integer, then the default implementation of __call__ will return node.outputs[self.default_output], where node was returned by Op.make_node(). Otherwise, the entire list of outputs will be returned, unless it is of length 1, in which case the single element will be returned by itself.
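The selection rule just described can be written out as a small function. This is a sketch of the rule as stated in the text, not PyTensor's implementation; `select_outputs` is a hypothetical name and the strings stand in for output Variables.

```python
def select_outputs(outputs, default_output=None):
    """Pick what a default __call__ would return, per the rule above."""
    if isinstance(default_output, int):
        return outputs[default_output]   # a single, designated output
    if len(outputs) == 1:
        return outputs[0]                # unwrap a lone output
    return outputs                       # otherwise the whole list


picked = select_outputs(["out0", "out1"], default_output=1)
single = select_outputs(["out0"])
both = select_outputs(["out0", "out1"])
```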

make_thunk(node, storage_map, compute_map, no_recycling, impl=None)#

This function must return a thunk, that is, a zero-argument function that encapsulates the computation to be performed by this Op on the arguments of the node.

Parameters:

node – Apply instance. The node for which a thunk is requested.
storage_map – dict of lists. This maps variables to one-element lists holding the variable’s current value. The one-element list acts as a pointer to the value and allows sharing that “pointer” with other nodes and instances.
compute_map – dict of lists. This maps variables to one-element lists holding booleans. If the value is 0, the variable has not been computed and the value should not be considered valid. If the value is 1, the variable has been computed and the value is valid. If the value is 2, the variable has been garbage-collected and is no longer valid, but shouldn’t be required anymore for this call.
no_recycling – WRITEME WRITEME
impl – None, ‘c’ or ‘py’. Which implementation to use.

The returned function must ensure that it marks the variables it computes as computed in the compute_map.

Defining this function removes the requirement for perform() or C code, as you will define the thunk for the computation yourself.
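As a rough illustration of how a thunk interacts with storage_map and compute_map, the sketch below uses plain dictionaries keyed by strings. In PyTensor the keys are Variables and node is an Apply instance; the dictionary-based `node`, the negation computation and the empty `no_recycling` list are all assumptions made to keep the example self-contained.

```python
def make_thunk(node, storage_map, compute_map, no_recycling, impl=None):
    """Return a zero-argument function that negates the node's input."""
    in_cell = storage_map[node["inputs"][0]]
    out_name = node["outputs"][0]
    out_cell = storage_map[out_name]

    def thunk():
        # Read through the shared one-element "pointer" cells.
        out_cell[0] = -in_cell[0]
        # Mark the output as computed (True compares equal to 1).
        compute_map[out_name][0] = True

    return thunk


node = {"inputs": ["x"], "outputs": ["y"]}   # hypothetical stand-in for Apply
storage_map = {"x": [3.0], "y": [None]}
compute_map = {"x": [True], "y": [False]}

thunk = make_thunk(node, storage_map, compute_map, no_recycling=[])
thunk()
```

Because the thunk closes over the cells rather than the values, updating `storage_map["x"][0]` and calling `thunk()` again recomputes the output from the new value.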

__call__(*inputs, **kwargs)#

By default this is a convenience function which calls make_node() with the supplied arguments and returns the result indexed by default_output. This can be overridden by subclasses to do anything else, but must return either a PyTensor Variable or a list of Variables.

If you feel the need to override __call__ to change the graph based on the arguments, write a helper function instead: have it use your Op to build the graphs that you want, and call that function rather than the Op instance directly.

infer_shape(fgraph, node, shapes)#

This function is needed for shape rewrites. shapes is a list with one tuple for each input of the Apply node (which corresponds to the inputs of the Op). Each tuple contains as many elements as the number of dimensions of the corresponding input. The value of each element is the shape (number of items) along the corresponding dimension of that specific input.

While this might sound complicated, it is nothing more than the shape of each input as symbolic variables (one per dimension).

The function should return a list with one tuple for each output. Each tuple should contain the corresponding output’s computed shape.

Implementing this method will allow PyTensor to compute the output’s shape without computing the output itself, potentially sparing you a costly recomputation.
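For instance, consider a hypothetical Op that takes two vectors and returns their outer product. A sketch of its shape inference follows; plain integers stand in for the symbolic scalars PyTensor would pass, and the function is written without `self` so it runs on its own.

```python
def infer_shape(fgraph, node, shapes):
    """Shape inference for a hypothetical outer-product Op."""
    (m,), (n,) = shapes      # one tuple per input, one entry per dimension
    return [(m, n)]          # one tuple per output


out_shapes = infer_shape(None, None, [(3,), (5,)])
```

The output shape is obtained from the input shapes alone, without forming the outer product.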

flops(inputs, outputs)#

This method is only used to let the memory profiler print more information: with it, the profiler prints the mega flops and giga flops per second for each Apply node. It takes two lists as arguments, one for the inputs and one for the outputs. They contain tuples that are the shapes of the corresponding inputs/outputs.
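A minimal sketch for a hypothetical matrix-multiplication Op is shown below. It assumes that the method returns a single operation count and that each multiply and each add counts as one operation; neither convention is stated above, so treat both as assumptions.

```python
def flops(inputs, outputs):
    """Operation count for an (m, k) by (k, n) matrix product."""
    (m, k), (k2, n) = inputs             # shapes of the two inputs
    assert k == k2, "inner dimensions must agree"
    return 2 * m * k * n                 # k multiplies and k adds per output entry


count = flops([(2, 3), (3, 4)], [(2, 4)])
```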

__str__()#

This allows you to specify a more informative string representation of your Op. If an Op has parameters, it is highly recommended to have the __str__ method include the name of the Op and the Op’s parameters’ values.

Note

If you set __props__, this will be automatically generated. You can still override it for custom output.

do_constant_folding(fgraph, node)#

Default: Return True

By default, when rewrites are enabled, Apply nodes whose inputs are all constants are removed during function compilation and replaced with a PyTensor constant variable. This way, the Apply node is not executed at each function call. If you want to force the execution of an Op during the function call, make do_constant_folding return False.

As done in the Alloc Op, you can return False only in some cases by analyzing the graph from the node parameter.

debug_perform(node, inputs, output_storage)#

Undefined by default.

If you define this function then it will be used instead of C code or Op.perform() to do the computation while debugging (currently DebugMode, but others may also use it in the future). It has the same signature and contract as Op.perform().

This lets Ops whose normal behaviour causes trouble under DebugMode adopt a different behaviour when run under that mode. If your Op doesn’t have any problems, don’t implement this.

If you want your Op to work with pytensor.gradient.grad() you also need to implement the functions described below.

Automatic Differentiation#

These are the functions required to work with pytensor.gradient.grad() and pytensor.gradient.pushforward().

pullback(inputs, outputs, cotangents)#

Implements the vector-Jacobian product (VJP) for reverse-mode automatic differentiation. This is the primary method for gradient computation.

Given a function \(f\) with inputs \(x\) and outputs \(y = f(x)\), the pullback computes \(\bar{x} = \bar{y} J\) where \(J = \frac{\partial f}{\partial x}\) is the Jacobian and \(\bar{y}\) are the cotangent (row) vectors (upstream gradients).

inputs, outputs, and cotangents are all lists of symbolic PyTensor Variables, which must be operated on using PyTensor’s symbolic language. The Op.pullback() method must return a list containing one Variable for each input. Each returned Variable represents the cotangent with respect to that input computed based on the symbolic cotangents with respect to each output.

If the output is not differentiable with respect to an input then this method should be defined to return a variable of type NullType for that input. Likewise, if you have not implemented the gradient computation for some input, you may return a variable of type NullType for that input. pytensor.gradient contains convenience methods that can construct the variable for you: pytensor.gradient.grad_undefined() and pytensor.gradient.grad_not_implemented(), respectively.

If an element of cotangents is of type pytensor.gradient.DisconnectedType, it means that the cost is not a function of this output. If any of the Op’s inputs participate in the computation of only disconnected outputs, then Op.pullback() should return DisconnectedType variables for those inputs.

If Op.pullback() is not defined, then PyTensor assumes it has been forgotten. Symbolic differentiation will fail on a graph that includes this Op.

If an Op has a single vector-valued output y and a single vector-valued input x, then Op.pullback() will be passed x, y, and a cotangent vector z. Define J to be the Jacobian of y with respect to x. The method should return dot(z, J). When pytensor.grad() calls Op.pullback(), it will set z to be the gradient of the cost C with respect to y. If this Op is the only Op that acts on x, then dot(z, J) is the gradient of C with respect to x. If there are other Ops that act on x, pytensor.grad() will add up the terms of x’s gradient contributed by each Op.pullback().

In practice, it is probably not a good idea to explicitly construct the Jacobian, which might be very large and very sparse. However, the returned value should be equal to the vector-Jacobian product. Note that an Op’s inputs and outputs may include scalars, matrices, sparse arrays, or higher-dimensional tensors — not just vectors. The VJP contract still applies: conceptually, each of these can be viewed as a vector that has been reshaped into a tensor. The returned cotangent for each input must have the same shape as that input, and each incoming cotangent in cotangents will have the same shape as the corresponding output.
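The point about not building the Jacobian can be checked numerically. The sketch below uses plain Python floats instead of symbolic Variables and a hypothetical elementwise square, whose Jacobian is diagonal with entries 2x; the direct formula and the explicit dot(z, J) agree.

```python
def pullback_square(x, z):
    """VJP of y = x**2 (elementwise), computed without forming J."""
    return [zi * 2.0 * xi for zi, xi in zip(z, x)]


def explicit_vjp(x, z):
    """Same quantity via dot(z, J) with the full Jacobian, for comparison."""
    n = len(x)
    J = [[2.0 * x[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return [sum(z[i] * J[i][j] for i in range(n)) for j in range(n)]


x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]          # cotangent with the same shape as the output
direct = pullback_square(x, z)
via_jacobian = explicit_vjp(x, z)
```

The direct version does work proportional to the input size, whereas the explicit version builds an n-by-n matrix that is almost entirely zeros.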

PyTensor currently imposes the following constraints on the values returned by the Op.pullback() method:

They must be Variable instances.
When they are types that have dtypes, they must never have an integer dtype.

The output cotangents passed to Op.pullback() will also obey these constraints.

Integers are a tricky subject. Integers are the main reason for having DisconnectedType, NullType or zero gradient. When you have an integer as an argument to your Op.pullback() method, recall the definition of a derivative to help you decide what value to return:

\(\frac{d f}{d x} = \lim_{\epsilon \rightarrow 0} (f(x+\epsilon)-f(x))/\epsilon\).

Suppose your function \(f\) has an integer-valued output. For most functions you’re likely to implement in PyTensor, this means your gradient should be zero, because \(f(x+\epsilon) = f(x)\) for almost all \(x\). (The only other option is that the gradient could be undefined, if your function is discontinuous everywhere, like the rational indicator function.)
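The limit definition can be probed with a small finite difference. This is only a numerical illustration in plain Python, with `math.floor` standing in for an integer-valued function and a fixed, arbitrarily chosen step size.

```python
import math


def finite_difference(f, x, eps=1e-6):
    """Approximate df/dx at x using the limit definition with a small eps."""
    return (f(x + eps) - f(x)) / eps


step_slope = finite_difference(math.floor, 2.3)          # integer-valued output
smooth_slope = finite_difference(lambda v: v * v, 2.3)   # real-valued output
```

Away from the jump points, the integer-valued function gives a difference quotient of exactly zero, while the real-valued one gives a value close to 2x.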

Suppose your function \(f\) has an integer-valued input. This is a little trickier, because you need to think about what you mean mathematically when you make a variable integer-valued in PyTensor. Most of the time in machine learning we mean “\(f\) is a function of a real-valued \(x\), but we are only going to pass in integer values of \(x\)”. In this case, \(f(x+\epsilon)\) exists, so the gradient through \(f\) should be the same whether \(x\) is an integer or a floating point variable. Sometimes what we mean is “\(f\) is a function of an integer-valued \(x\), and \(f\) is only defined where \(x\) is an integer.” Since \(f(x+\epsilon)\) doesn’t exist, the gradient is undefined. Finally, many times in PyTensor, integer-valued inputs don’t actually affect the elements of the output, only its shape.

If your function \(f\) has both an integer-valued input and an integer-valued output, then both rules have to be combined:

If \(f\) is defined at \(x + \epsilon\), then the input gradient is defined. Since \(f(x+\epsilon)\) would be equal to \(f(x)\) almost everywhere, the gradient should be zero (first rule).
If \(f\) is only defined where \(x\) is an integer, then the gradient is undefined, regardless of what the gradient with respect to the output is.

Examples:

\(f(x,y)\) is a dot product between \(x\) and \(y\). \(x\) and \(y\) are integers. Since the output is also an integer, \(f\) is a step function. Its gradient is zero almost everywhere, so Op.pullback() should return zeros in the shape of \(x\) and \(y\).
\(f(x,y)\) is a dot product between \(x\) and \(y\). \(x\) is floating point and \(y\) is an integer. In this case the output is floating point. It doesn’t matter that \(y\) is an integer. We consider \(f\) to still be defined at \(f(x,y+\epsilon)\). The gradient is exactly the same as if \(y\) were floating point.
\(f(x,y)\) is the argmax of \(x\) along axis \(y\). The gradient with respect to \(y\) is undefined, because \(f(x,y)\) is not defined for floating point \(y\). How could you take an argmax along a fractional axis? The gradient with respect to \(x\) is 0, because \(f(x+\epsilon, y) = f(x, y)\) almost everywhere.
\(f(x,y)\) is a vector with \(y\) elements, each of which takes on the value \(x\). The Op.pullback() method should return DisconnectedType for \(y\), because the elements of \(f\) don’t depend on \(y\). Only the shape of \(f\) depends on \(y\). You probably also want to implement a connection_pattern method to encode this.
\(f(x) = int(x)\) converts float \(x\) into an integer. \(g(y) = float(y)\) converts an integer \(y\) into a float. If the final cost \(C = 0.5 * g(y) = 0.5 g(f(x))\), then the gradient with respect to \(y\) will be 0.5, even if \(y\) is an integer. However, the gradient with respect to \(x\) will be 0, because the output of \(f\) is integer-valued.

pushforward(inputs, outputs, tangents)#

Implements the Jacobian-vector product (JVP) for forward-mode automatic differentiation.

Given a function \(f\) with inputs \(x\) and outputs \(y = f(x)\), the pushforward computes \(\dot{y} = J \dot{x}\) where \(J = \frac{\partial f}{\partial x}\) is the Jacobian and \(\dot{x}\) are the tangent vectors.

inputs are the symbolic variables corresponding to the input values, outputs are the symbolic variables corresponding to the output values, and tangents are the symbolic variables corresponding to the tangent vectors to right-multiply the Jacobian with. Tangent entries of type DisconnectedType indicate that the corresponding input is not being differentiated.

The method must return the same number of outputs as there are outputs of the Op. For each output, the result is the Jacobian of that output with respect to the inputs, right-multiplied by the tangent vector. Return a variable of type DisconnectedType for outputs that are disconnected from all inputs.
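A numerical sketch of the Jacobian-vector product follows, again with plain Python floats in place of symbolic Variables. The function f(x0, x1) = (x0 * x1, x0 + x1) is a hypothetical example; its Jacobian is [[x1, x0], [1, 1]], and the hand-written JVP is compared against a finite difference along the tangent direction.

```python
def f(x):
    x0, x1 = x
    return [x0 * x1, x0 + x1]


def pushforward_f(x, t):
    """JVP of f: the Jacobian [[x1, x0], [1, 1]] right-multiplied by t."""
    x0, x1 = x
    t0, t1 = t
    return [x1 * t0 + x0 * t1, t0 + t1]


x = [2.0, 3.0]
t = [1.0, -1.0]                       # tangent with the same shape as the input
jvp = pushforward_f(x, t)

# Finite-difference check along the tangent direction.
eps = 1e-6
shifted = f([xi + eps * ti for xi, ti in zip(x, t)])
approx = [(s - b) / eps for s, b in zip(shifted, f(x))]
```

The result has one entry per output of f, which mirrors the requirement that the method return one value per Op output.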

connection_pattern(node)#

Sometimes needed for proper operation of pytensor.gradient.grad().

Returns a list of lists of booleans.

The entry at [input_idx][output_idx] of the returned list is True if the elements of inputs[input_idx] have an effect on the elements of outputs[output_idx].

The node parameter is needed to determine the number of inputs. Some Ops such as Subtensor take a variable number of inputs.

If no connection_pattern is specified, pytensor.gradient.grad() will assume that all inputs have some elements connected to some elements of all outputs.

This method conveys two pieces of information that are otherwise not part of the PyTensor graph:

Which of the Op’s inputs are truly ancestors of each of the Op’s outputs. Suppose an Op has two inputs, \(x\) and \(y\), and outputs \(f(x)\) and \(g(y)\). \(y\) is not really an ancestor of \(f\), but it appears to be so in the PyTensor graph.
Whether the actual elements of each input/output are relevant to a computation. For example, the shape Op does not read its input’s elements, only its shape metadata. \(\frac{d shape(x)}{dx}\) should thus raise a disconnected input exception (if these exceptions are enabled). As another example, the elements of the Alloc Op’s outputs are not affected by the shape arguments to the Alloc Op.

Failing to implement this function for an Op that needs it can result in two types of incorrect behavior:

pytensor.gradient.grad() erroneously raising a TypeError reporting that a gradient is undefined.
pytensor.gradient.grad() failing to raise a ValueError reporting that an input is disconnected.

Even if connection_pattern is not implemented correctly, if pytensor.gradient.grad() returns an expression, that expression will be numerically correct.
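For the two-input example above, with outputs \(f(x)\) and \(g(y)\), the pattern could look like the following sketch. The class `SplitApply` is a hypothetical stand-in that does not inherit from PyTensor's Op, and it ignores node because its number of inputs is fixed.

```python
class SplitApply:
    """Hypothetical Op-like class with inputs (x, y) and outputs (f(x), g(y))."""

    def connection_pattern(self, node):
        # Rows index inputs, columns index outputs.
        return [
            [True, False],   # x affects f(x) but not g(y)
            [False, True],   # y affects g(y) but not f(x)
        ]


pattern = SplitApply().connection_pattern(node=None)
```

An Op with a variable number of inputs would instead build the outer list from the inputs of node.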
