Troubleshooting#

Here are Linux troubleshooting instructions. There is a specific MacOS section.

Why do I get a network error when I install PyTensor
Why is my code so slow/uses so much memory
How to solve TypeError: object of type ‘TensorVariable’ has no len()
How to solve Out of memory Error
pytensor.function returns a float64 when the inputs are float32 and int{32, 64}
How to test that PyTensor works properly
How do I configure/test my BLAS library

Why do I get a network error when I install PyTensor#

If you are behind a proxy, you must configure it before starting the installation. Set the environment variable http_proxy to the proxy address. In bash:

export http_proxy="http://user:pass@my.site:port/"

You can also pass the --proxy=[user:pass@]url:port parameter to pip. The [user:pass@] portion is optional.

How to solve TypeError: object of type ‘TensorVariable’ has no len()#

If you receive the following error, it is because the Python function __len__ cannot be implemented on PyTensor variables:

TypeError: object of type 'TensorVariable' has no len()

Python requires __len__ to return an integer, which is not possible because PyTensor variables are symbolic: their length is not known until the graph is evaluated. Use var.shape[0] as a workaround.

This error message cannot be made more explicit because the relevant aspects of Python’s internals cannot be modified.

How to solve Out of memory Error#

Occasionally PyTensor may fail to allocate memory even though more than enough appears to be free, reporting:

Error allocating X bytes of device memory (out of memory). Driver report Y bytes free and Z total.

where X is far less than Y and Z (i.e. X << Y < Z).

This scenario arises when an operation requires allocation of a large contiguous block of memory but no blocks of sufficient size are available.

A known example is related to writing data to shared variables. When updating a shared variable, PyTensor will allocate new space if the size of the data does not match the size of the space already assigned to the variable. This can lead to memory fragmentation, which means that a contiguous block of memory of sufficient capacity may not be available even if the overall free memory is large enough.

pytensor.function returns a float64 when the inputs are float32 and int{32, 64}#

Mixing float32 and int{32, 64} inside a function produces float64 output.

To help you find where float64 values are created, see the warn_float64 PyTensor flag.

How to test that PyTensor works properly#

A quick first check is to make sure that PYTENSOR_FLAGS and ~/.pytensorrc have the desired values.

Also, check the following outputs in an IPython session:

ipython

import pytensor
pytensor.__file__
pytensor.__version__

Once you have installed PyTensor, you should run the test suite in the tests directory.

python -c "import numpy; numpy.test()"
python -c "import scipy; scipy.test()"
pip install pytest
PYTENSOR_FLAGS='' pytest tests/

All PyTensor tests should pass (skipped tests and known failures are normal). If some test fails on your machine, you are encouraged to tell us what went wrong in the GitHub issues.

Why is my code so slow/uses so much memory#

There are a few things you can easily do to change the trade-off between speed and memory usage.

Could raise memory usage but speed up computation:

config.allow_gc = False

Could lower the memory usage, but raise computation time:

config.scan__allow_gc = True
config.scan__allow_output_prealloc = False
Disable one or more scan rewrites:
- optimizer_excluding=scan_pushout_seqs_ops
- optimizer_excluding=scan_pushout_dot1
- optimizer_excluding=scan_pushout_add
Disable all rewrites tagged as raising memory usage: optimizer_excluding=more_mem (currently only the 3 scan rewrites above)
Use float16.

If you want to analyze the memory usage during computation, the simplest approach is to set the PyTensor flag exception_verbosity=high and let the memory error happen during PyTensor execution.

How do I configure/test my BLAS library#

There are many ways to configure BLAS for PyTensor. This is done with the PyTensor flag blas__ldflags (see config – PyTensor Configuration). If not specified, PyTensor will attempt to find a local BLAS library to link against, prioritizing specialized implementations. The details can be found in pytensor.link.c.cmodule.default_blas_ldflags().

Users can manually set the PyTensor flag blas__ldflags to link against a specific version. This is useful even if the default version is the desired one, as it avoids the costly work of searching for the best BLAS library at runtime.

The PyTensor flags can be set in a few ways:

In the ${HOME}/.pytensorrc file.

# other stuff can go here
[blas]
ldflags = -llapack -lblas -lcblas  # put your flags here

# other stuff can go here

In bash before running your script:

export PYTENSOR_FLAGS="blas__ldflags='-llapack -lblas -lcblas'"

In an IPython/Jupyter notebook before importing PyTensor:

%set_env PYTENSOR_FLAGS=blas__ldflags='-llapack -lblas -lcblas'

In pytensor.config directly:

import pytensor
pytensor.config.blas__ldflags = '-llapack -lblas -lcblas'

(For more information on the formatting of ~/.pytensorrc and the configuration options that you can put there, see config – PyTensor Configuration.)

You can find the default BLAS library that PyTensor is linking against by checking pytensor.config.blas__ldflags or running pytensor.link.c.cmodule.default_blas_ldflags().

Here are some different ways to configure BLAS:

0) Do nothing and use the default config. This will usually work great for installation via conda/mamba/pixi (conda-forge channel). It will usually fail to link altogether for installation via pip.

1) Disable the usage of BLAS and fall back on NumPy for dot products. To do this, set blas__ldflags to the empty string. Depending on the kind of matrix operations your PyTensor code performs, this might slow some things down (vs. linking with BLAS directly).

2) You can install the default (reference) version of BLAS if the NumPy version (against which PyTensor links) does not work. If you have root or sudo access, on Fedora you can run sudo yum install blas blas-devel; under Ubuntu/Debian, run sudo apt-get install libblas-dev. Then use the PyTensor flag blas__ldflags=-lblas. Note that the default version of BLAS is not optimized. Using an optimized version can give up to 10x speedups in the BLAS functions that we use.

3) Install the ATLAS library. ATLAS is an open source optimized version of BLAS. You can install a precompiled version on most OSes, but if you’re willing to invest the time, you can compile it to have a faster version (we have seen speed-ups of up to 3x, especially on more recent computers, against the precompiled one). On Fedora, sudo yum install atlas-devel. Under Ubuntu, sudo apt-get install libatlas-base-dev libatlas-base or libatlas3gf-sse2 if your CPU supports SSE2 instructions. Then set the PyTensor flags blas__ldflags to -lf77blas -latlas -lgfortran. Note that these flags are sometimes OS-dependent.

4) Use a faster version like MKL, GOTO, … You are on your own to install it. See that software's documentation and set the PyTensor flag blas__ldflags correctly (for example, for MKL this might be -lmkl -lguide -lpthread or -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lguide -liomp5 -lmkl_mc -lpthread).

5) Use another backend such as Numba or JAX that perform their own BLAS optimizations, by setting the configuration mode to "NUMBA" or "JAX" and making sure those packages are installed. This configuration mode can be set in all the ways that the BLAS flags can be set, described above.

Alternatively, you can pass mode='NUMBA' when compiling individual PyTensor functions without changing the default, or use the config.change_flags context manager.

from pytensor import function, config
from pytensor.tensor import matrix

x = matrix('x')
y = x @ x.T
f = function([x], y, mode='NUMBA')

with config.change_flags(mode='NUMBA'):
    # compiling function that benefits from BLAS using NUMBA
    f = function([x], y)

Note

Make sure your BLAS libraries are available as dynamically-loadable libraries. ATLAS is often installed only as a static library. PyTensor is not able to use this static library. Your ATLAS installation might need to be modified to provide dynamically loadable libraries. (On Linux this typically means a library whose name ends with .so. On Windows this will be a .dll, and on OS-X it might be either a .dylib or a .so.)

This might be just a problem with the way PyTensor passes compilation arguments to g++, but the problem is not fixed yet.

Note

If you have problems linking with MKL, Intel's Link Line Advisor and the MKL User Guide can help you find the correct flags to use.

Note

If you get an error that mentions “gfortran”, like this one:

ImportError: (‘/home/Nick/.pytensor/compiledir_Linux-2.6.35-31-generic-x86_64-with-Ubuntu-10.10-maverick–2.6.6/tmpIhWJaI/0c99c52c82f7ddc775109a06ca04b360.so: undefined symbol: _gfortran_st_write_done’

The problem is probably that NumPy is linked with a different BLAS than the one currently available (probably ATLAS). There are two possible fixes:

1) Uninstall ATLAS and install OpenBLAS.
2) Use the PyTensor flag “blas__ldflags=-lblas -lgfortran”

Option 1) is better, as OpenBLAS is faster than ATLAS and NumPy is probably already linked with it, so you won’t need any other change in PyTensor files or PyTensor configuration.

Testing BLAS#

It is recommended to test your PyTensor/BLAS integration. There are many versions of BLAS that exist and there can be up to 10x speed difference between them. Also, having PyTensor link directly against BLAS instead of using NumPy/SciPy as an intermediate layer reduces the computational overhead. This is important for BLAS calls to ger, gemv and small gemm operations (automatically called when needed when you use dot()). To run the PyTensor/BLAS speed test:

python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")

This will print a table with different versions of BLAS/numbers of threads on multiple CPUs. It will also print some PyTensor/NumPy configuration information. Then, it will print the running time of the same benchmarks for your installation. Try to find a CPU similar to yours in the table, and check that the single-threaded timings are roughly the same.

PyTensor should link to a parallel version of BLAS and, by default, use all cores when possible. Set the environment variable OMP_NUM_THREADS=N to use N threads.

Mac OS#

Although the above steps should be enough, running PyTensor on a Mac may sometimes cause unexpected crashes, typically due to multiple versions of Python or other system libraries. If you encounter such problems, you may try the following.

You can ensure MacPorts shared libraries are given priority at run-time with export LD_LIBRARY_PATH=/opt/local/lib:$LD_LIBRARY_PATH. In order to do the same at compile time, you can add to your ~/.pytensorrc:

[gcc]
cxxflags = -L/opt/local/lib

More generally, to investigate library issues, you can use the otool -L command on .so files found under your ~/.pytensor directory. This will list shared library dependencies, and may help identify incompatibilities.
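To find candidate files for otool -L, the compiled modules can be listed with a few lines of standard-library Python. This sketch assumes the compiled files live under ~/.pytensor, as stated above. It only prints the commands and does not run otool itself.

```python
from pathlib import Path

# Directory holding PyTensor's compiled modules, per the text above.
compile_root = Path.home() / ".pytensor"

# Collect every compiled shared object; empty if nothing has been compiled yet.
so_files = sorted(compile_root.rglob("*.so")) if compile_root.is_dir() else []

# Print the inspection command for the first few files.
for so_file in so_files[:10]:
    print(f"otool -L {so_file}")
```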