Creating NumPy universal functions

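There are two types of universal functions:

Functions which operate on scalars; these are “universal functions” or ufuncs (see @vectorize below).
Functions which operate on higher-dimensional arrays and scalars; these are “generalized universal functions” or gufuncs (see @guvectorize below).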

The `@vectorize` decorator

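Numba's vectorize allows Python functions taking scalar input arguments to be used as NumPy ufuncs. Creating a traditional NumPy ufunc is not the most straightforward process and involves writing some C code. Numba makes this easy. Using the vectorize() decorator, Numba can compile a pure Python function into a ufunc that operates over NumPy arrays as fast as traditional ufuncs written in C.

Using vectorize(), you write your function as operating over input scalars, rather than arrays. Numba will generate the surrounding loop (or kernel), allowing efficient iteration over the actual inputs.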

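The vectorize() decorator has two modes of operation:

Eager, or decoration-time, compilation: if you pass one or more type signatures to the decorator, you will be building a NumPy universal function (ufunc). The rest of this subsection describes building ufuncs using decoration-time compilation.
Lazy, or call-time, compilation: when not given any signatures, the decorator will give you a Numba dynamic universal function (DUFunc) that dynamically compiles a new kernel when called with a previously unsupported input type. A later subsection, “Dynamic universal functions”, describes this mode in more depth.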

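As described above, if you pass a list of signatures to the vectorize() decorator, your function will be compiled into a NumPy ufunc. In the basic case, only one signature is passed:

from test_vectorize_one_signature of numba/tests/doc_examples/test_examples.py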

from numba import vectorize, float64

@vectorize([float64(float64, float64)])
def f(x, y):
    return x + y

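If you pass several signatures, beware that you have to pass the most specific signatures before the least specific ones (e.g., single-precision floats before double-precision floats), otherwise type-based dispatching will not work as expected:

from test_vectorize_multiple_signatures of numba/tests/doc_examples/test_examples.py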

from numba import vectorize, int32, int64, float32, float64
import numpy as np

@vectorize([int32(int32, int32),
            int64(int64, int64),
            float32(float32, float32),
            float64(float64, float64)])
def f(x, y):
    return x + y

The function will work as expected over the specified array types:

from test_vectorize_multiple_signatures of numba/tests/doc_examples/test_examples.py

a = np.arange(6)
result = f(a, a)
# result == array([ 0,  2,  4,  6,  8, 10])

from test_vectorize_multiple_signatures of numba/tests/doc_examples/test_examples.py

a = np.linspace(0, 1, 6)
result = f(a, a)
# Now, result == array([0. , 0.4, 0.8, 1.2, 1.6, 2. ])

But it will fail on other types:

>>> a = np.linspace(0, 1+1j, 6)
>>> f(a, a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'ufunc' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

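You might ask yourself, “why would I go through this instead of compiling a simple iteration loop using the @jit decorator?” The answer is that NumPy ufuncs automatically get other features such as reduction, accumulation or broadcasting. Using the example above:

from test_vectorize_multiple_signatures of numba/tests/doc_examples/test_examples.py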

a = np.arange(12).reshape(3, 4)
# a == array([[ 0,  1,  2,  3],
#             [ 4,  5,  6,  7],
#             [ 8,  9, 10, 11]])

result1 = f.reduce(a, axis=0)
# result1 == array([12, 15, 18, 21])

result2 = f.reduce(a, axis=1)
# result2 == array([ 6, 22, 38])

result3 = f.accumulate(a)
# result3 == array([[ 0,  1,  2,  3],
#                   [ 4,  6,  8, 10],
#                   [12, 15, 18, 21]])

result4 = f.accumulate(a, axis=1)
# result4 == array([[ 0,  1,  3,  6],
#                   [ 4,  9, 15, 22],
#                   [ 8, 17, 27, 38]])

See also

Standard features of ufuncs (NumPy documentation).

Note

Only the broadcasting and reduce features of ufuncs are supported in compiled code.

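The vectorize() decorator supports multiple ufunc targets:

Target	Description
cpu	Single-threaded CPU
parallel	Multi-core CPU
cuda	CUDA GPU. Note: this creates a ufunc-like object. See the documentation for CUDA ufuncs for details.

A general guideline is to choose different targets for different data sizes and algorithms. The “cpu” target works well for small data sizes (approx. less than 1KB) and low compute intensity algorithms. It has the least amount of overhead. The “parallel” target works well for medium data sizes (approx. less than 1MB). Threading adds a small delay. The “cuda” target works well for big data sizes (approx. greater than 1MB) and high compute intensity algorithms. Transferring memory to and from the GPU adds significant overhead.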

Starting in Numba 0.59, the cpu target supports the following attributes and methods in compiled code:

ufunc.nin
ufunc.nout
ufunc.nargs
ufunc.identity
ufunc.signature
ufunc.reduce() (only the first 5 arguments; experimental feature)

The `@guvectorize` decorator

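While vectorize() allows you to write ufuncs that work on one element at a time, the guvectorize() decorator takes the concept one step further and allows you to write ufuncs that work on an arbitrary number of elements of input arrays, and take and return arrays of differing dimensions. The typical example is a running median or a convolution filter.

Contrary to vectorize() functions, guvectorize() functions don't return their result value: they take it as an array argument, which must be filled in by the function. This is because the array is actually allocated by NumPy's dispatch mechanism, which calls into the Numba-generated code.

Similar to the vectorize() decorator, guvectorize() also has two modes of operation: eager (or decoration-time) compilation and lazy (or call-time) compilation.

Here is a very simple example:

from test_guvectorize of numba/tests/doc_examples/test_examples.py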

from numba import guvectorize, int64
import numpy as np

@guvectorize([(int64[:], int64, int64[:])], '(n),()->(n)')
def g(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i] + y

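The underlying Python function simply adds a given scalar (y) to all elements of a 1-dimensional array. What's more interesting is the declaration. There are two things there:

the declaration of input and output layouts, in symbolic form: (n),()->(n) tells NumPy that the function takes an n-element one-dimensional array and a scalar (symbolically denoted by the empty tuple ()), and returns an n-element one-dimensional array;
the list of supported concrete signatures, as for @vectorize; here, as in the above example, we demonstrate int64 arrays.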

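Note

A 1D array type can also receive scalar arguments (those with shape ()). In the above example, the second argument could also be declared as int64[:]. In that case, the value must be read as y[0].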

We can now check what the compiled ufunc does, over a simple example:

from test_guvectorize of numba/tests/doc_examples/test_examples.py

a = np.arange(5)
result = g(a, 2)
# result == array([2, 3, 4, 5, 6])

The nice thing is that NumPy will automatically dispatch over more complicated inputs, depending on their shapes:

from test_guvectorize of numba/tests/doc_examples/test_examples.py

a = np.arange(6).reshape(2, 3)
# a == array([[0, 1, 2],
#             [3, 4, 5]])

result1 = g(a, 10)
# result1 == array([[10, 11, 12],
#                   [13, 14, 15]])

result2 = g(a, np.array([10, 20]))
# result2 == array([[10, 11, 12],
#                   [23, 24, 25]])

Note

Both vectorize() and guvectorize() support passing nopython=True, as in the @jit decorator. Use it to ensure the generated code does not fall back to object mode.

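Scalar return values

Now suppose we want to return a scalar value from guvectorize(). To do this, we need to:

in the signatures, declare the scalar return with [:] like a 1-dimensional array (e.g. int64[:]),
in the layout, declare it as (),
in the implementation, write to the first element (e.g. res[0] = acc).

The following example function adds the scalar (y) to every element of the 1-dimensional array (x) and returns the sum of the results as a scalar:

from test_guvectorize_scalar_return of numba/tests/doc_examples/test_examples.py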

from numba import guvectorize, int64
import numpy as np

@guvectorize([(int64[:], int64, int64[:])], '(n),()->()')
def g(x, y, res):
    acc = 0
    for i in range(x.shape[0]):
        acc += x[i] + y
    res[0] = acc

Now if we apply the wrapped function over the array, we get a scalar value as the output:

from test_guvectorize_scalar_return of numba/tests/doc_examples/test_examples.py

a = np.arange(5)
result = g(a, 2)
# At this point, result == 20.

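Overwriting input values

In most cases, writing to inputs may also appear to work. However, this behaviour cannot be relied on. Consider the following example function:

from test_guvectorize_overwrite of numba/tests/doc_examples/test_examples.py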

from numba import guvectorize, float64
import numpy as np

@guvectorize([(float64[:], float64[:])], '()->()')
def init_values(invals, outvals):
    invals[0] = 6.5
    outvals[0] = 4.2

Calling the init_values function with an array of float64 type results in visible changes to the input:

from test_guvectorize_overwrite of numba/tests/doc_examples/test_examples.py

invals = np.zeros(shape=(3, 3), dtype=np.float64)
# invals == array([[0., 0., 0.],
#                  [0., 0., 0.],
#                  [0., 0., 0.]])

outvals = init_values(invals)
# invals == array([[6.5, 6.5, 6.5],
#                  [6.5, 6.5, 6.5],
#                  [6.5, 6.5, 6.5]])
# outvals == array([[4.2, 4.2, 4.2],
#                   [4.2, 4.2, 4.2],
#                   [4.2, 4.2, 4.2]])

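This works because NumPy can pass the input data directly into the init_values function, as the data dtype matches that of the declared argument. However, it may also create and pass in a temporary array, in which case changes to the input are lost. For example, this can occur when casting is required. To demonstrate, we can use an array of float32 with the init_values function:

from test_guvectorize_overwrite of numba/tests/doc_examples/test_examples.py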

invals = np.zeros(shape=(3, 3), dtype=np.float32)
# invals == array([[0., 0., 0.],
#                  [0., 0., 0.],
#                  [0., 0., 0.]], dtype=float32)
outvals = init_values(invals)
# outvals == array([[4.2, 4.2, 4.2],
#                   [4.2, 4.2, 4.2],
#                   [4.2, 4.2, 4.2]])
print(invals)
# invals == array([[0., 0., 0.],
#                  [0., 0., 0.],
#                  [0., 0., 0.]], dtype=float32)

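In this case, there is no change to the invals array because the temporary cast array was mutated instead.

To solve this problem, one needs to tell the GUFunc engine that the invals argument is writable. This can be achieved by passing writable_args=('invals',) (specifying by name) or writable_args=(0,) (specifying by position) to @guvectorize. Now the code above works as expected:

from test_guvectorize_overwrite of numba/tests/doc_examples/test_examples.py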

@guvectorize(
    [(float64[:], float64[:])],
    '()->()',
    writable_args=('invals',)
)
def init_values(invals, outvals):
    invals[0] = 6.5
    outvals[0] = 4.2

invals = np.zeros(shape=(3, 3), dtype=np.float32)
# invals == array([[0., 0., 0.],
#                  [0., 0., 0.],
#                  [0., 0., 0.]], dtype=float32)
outvals = init_values(invals)
# outvals == array([[4.2, 4.2, 4.2],
#                   [4.2, 4.2, 4.2],
#                   [4.2, 4.2, 4.2]])
print(invals)
# invals == array([[6.5, 6.5, 6.5],
#                  [6.5, 6.5, 6.5],
#                  [6.5, 6.5, 6.5]], dtype=float32)