0x0. Preface

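This article grew out of my work implementing the interpolate op for the OneFlow framework. OneFlow's interpolate op has the same functionality as PyTorch's: both perform upsampling or downsampling by interpolation. While implementing this op I also fixed a bug in PyTorch, and the fix was merged into the main repository; see https://github.com/pytorch/pytorch/commit/6ab3a210983b7eee417e7cd92a8ad2677065e470. As a result, the interpolate operator in OneFlow is functionally equivalent to the one in PyTorch. Using OneFlow's implementation of this operator as the running example, this article surveys the interpolation algorithms found in deep learning frameworks.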

0x1. Documentation and interface

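To understand the interpolation algorithms behind the interpolate operator, we start with the documentation and the Python frontend interface. The interface documentation is at https://oneflow.readthedocs.io/en/master/functional.html?highlight=interpolate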

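As the documentation shows, OneFlow's interpolate operator performs upsampling or downsampling by interpolation. It accepts 3-D, 4-D and 5-D input Tensors and offers several interpolation modes for inputs of different shapes. The parameters are:

input: the input Tensor.
size: the spatial size of the output Tensor, i.e. the dimensions that remain after removing the Batch and Channel dimensions. For example, the spatial size of an NCHW tensor is HW, and that of an NCDHW tensor is DHW.
scale_factor (float or Tuple[float]): multiplier for the spatial size. If it is a tuple, its length must match the number of spatial dimensions of the input.
mode (str): the upsampling algorithm, one of 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear' | 'area'. Default: 'nearest'.
align_corners (bool): geometrically, the pixels of the input and output are treated as squares rather than points. If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. If set to False, they are aligned by the corner points of their corner pixels, and the interpolation uses edge-value padding for out-of-boundary values, which makes the operation independent of the input size when scale_factor is kept the same. This only has an effect when mode is 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: False. (Don't worry if this is unclear; section 0x2 explains it in detail.)
recompute_scale_factor (bool): recompute the scale_factor used in the interpolation. When scale_factor is passed, it is used to compute output_size. If recompute_scale_factor is False or not specified, the passed-in scale_factor is used directly in the interpolation. Otherwise, a new scale_factor is computed from the output and input sizes (i.e. it is equivalent to passing output_size explicitly). Note that when scale_factor is a floating-point number, the recomputed scale_factor may differ from the one passed in because of rounding and precision issues.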
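
As a concrete illustration of how size, scale_factor and recompute_scale_factor interact, here is a minimal pure-Python sketch for a single spatial dimension. The helper resolve_size_and_scale is hypothetical (not a OneFlow API), and it assumes PyTorch-style rules: the output size is floor(input_size * scale_factor), and the kernel uses 1/scale_factor input pixels per output pixel unless the scale is recomputed (or size is given), in which case it uses input_size / output_size.

```python
import math


def resolve_size_and_scale(in_size, size=None, scale_factor=None,
                           recompute_scale_factor=None):
    """Hypothetical helper for ONE spatial dimension (not a OneFlow API).

    Returns (out_size, src_scale): the output length and the number of input
    pixels per output pixel that an align_corners=False kernel would use.
    """
    if (size is None) == (scale_factor is None):
        raise ValueError("pass exactly one of size and scale_factor")
    if size is not None:
        # Explicit output size: the scale follows from the two sizes.
        return size, in_size / size
    # Assumed PyTorch-style rule: output size is floor(in * scale_factor).
    out_size = math.floor(in_size * scale_factor)
    if recompute_scale_factor:
        # Same as if out_size had been passed explicitly.
        return out_size, in_size / out_size
    # Otherwise the user's scale_factor is handed to the kernel as-is.
    return out_size, 1.0 / scale_factor


print(resolve_size_and_scale(5, scale_factor=1.5))
print(resolve_size_and_scale(5, scale_factor=1.5, recompute_scale_factor=True))
```

Both calls produce 7 output pixels, but the kernel scale is 1/1.5 ≈ 0.667 in the first case and 5/7 ≈ 0.714 in the second, which is exactly the rounding discrepancy the recompute_scale_factor note warns about.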

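Besides the functional and parameter descriptions, the documentation also contains several notes and warnings that you can read for yourself. Here is a short usage example: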

>>> import oneflow as flow
>>> import numpy as np

>>> input = flow.Tensor(np.arange(1, 5).reshape((1, 1, 4)), dtype=flow.float32)
>>> output = flow.nn.functional.interpolate(input, scale_factor=2.0, mode="linear")
>>> output
tensor([[[1.0000, 1.2500, 1.7500, 2.2500, 2.7500, 3.2500, 3.7500, 4.0000]]],
       dtype=oneflow.float32)

With the documentation covered, let's look at the Python frontend of this op: https://github.com/Oneflow-Inc/oneflow/blob/master/python/oneflow/nn/modules/interpolate.py#L25-L193 . Its main job is to recompute scale_factor depending on whether recompute_scale_factor was passed; once scale_factor is known, it dispatches to a different interpolation kernel according to the input rank and mode:

        if len(x.shape) == 3 and self.mode == "nearest":
            return flow._C.upsample_nearest_1d(
                x, scale_factor=scale_factors[0], data_format="channels_first"
            )
        if len(x.shape) == 4 and self.mode == "nearest":
            return flow._C.upsample_nearest_2d(
                x,
                height_scale=scale_factors[0],
                width_scale=scale_factors[1],
                data_format="channels_first",
            )
        if len(x.shape) == 5 and self.mode == "nearest":
            return flow._C.upsample_nearest_3d(
                x,
                depth_scale=scale_factors[0],
                height_scale=scale_factors[1],
                width_scale=scale_factors[2],
                data_format="channels_first",
            )
        if len(x.shape) == 3 and self.mode == "area":
            assert output_size is not None
            return flow._C.adaptive_avg_pool1d(x, output_size)
        if len(x.shape) == 4 and self.mode == "area":
            assert output_size is not None
            return flow._C.adaptive_avg_pool2d(x, output_size)
        if len(x.shape) == 5 and self.mode == "area":
            assert output_size is not None
            return flow._C.adaptive_avg_pool3d(x, output_size)
        if len(x.shape) == 3 and self.mode == "linear":
            assert self.align_corners is not None
            return flow._C.upsample_linear_1d(
                x,
                scale_factor=scale_factors[0],
                align_corners=self.align_corners,
                data_format="channels_first",
            )
        if len(x.shape) == 4 and self.mode == "bilinear":
            assert self.align_corners is not None
            return flow._C.upsample_bilinear_2d(
                x,
                height_scale=scale_factors[0],
                width_scale=scale_factors[1],
                align_corners=self.align_corners,
                data_format="channels_first",
            )
        if len(x.shape) == 4 and self.mode == "bicubic":
            assert self.align_corners is not None
            return flow._C.upsample_bicubic_2d(
                x,
                height_scale=scale_factors[0],
                width_scale=scale_factors[1],
                align_corners=self.align_corners,
                data_format="channels_first",
            )
        if len(x.shape) == 5 and self.mode == "trilinear":
            assert self.align_corners is not None
            return flow._C.upsample_trilinear_3d(
                x,
                depth_scale=scale_factors[0],
                height_scale=scale_factors[1],
                width_scale=scale_factors[2],
                align_corners=self.align_corners,
                data_format="channels_first",
            )

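So the Python frontend only sorts out the relationships between the parameters and then calls the C++ API to do the actual computation. Below we go through the principle of each interpolation algorithm and its implementation in OneFlow.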

0x2. align_corners explained

align_corners is a very important parameter of the interface above, so before walking through each kernel we first explain what it means, using image resizing as an example.

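Suppose the source image has size $m \times n$ and the target image has size $a \times b$, so the side-length ratios of the two images are $m/a$ and $n/b$. Through these ratios, the target pixel at $(i, j)$ maps back to the source coordinate $(x, y) = (i \cdot m/a,\ j \cdot n/b)$. This coordinate is generally not an integer. Forcing it to an integer gives ordinary nearest-neighbor interpolation, whereas bilinear interpolation computes the value from the four source pixels nearest to the mapped coordinate: writing $x = x_0 + u$ and $y = y_0 + v$ with integers $x_0, y_0$ and $0 \le u, v < 1$, the four nearest pixels are $(x_0, y_0)$, $(x_0+1, y_0)$, $(x_0, y_0+1)$ and $(x_0+1, y_0+1)$. For a grayscale image the value at that point is $f(x, y) = \sum_{k=1}^{4} w_k p_k = (1-u)(1-v)\,f(x_0, y_0) + u(1-v)\,f(x_0+1, y_0) + (1-u)v\,f(x_0, y_0+1) + uv\,f(x_0+1, y_0+1)$, where $p_k$ are the 4 nearest pixels and $w_k$ their weights.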

That is not the whole story, though. A bilinear interpolation implemented naively from the formula above does not match the results of OpenCV/Matlab. Why?

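The reason is the choice of coordinate system. Many implementations found online place the origin of both the source and the target image at the top-left corner and then compute every target pixel with the interpolation formula. When an image is shrunk this way, each target pixel maps to a source position shifted toward the top-left instead of to the center of the source region it represents.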

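So with the top-left corner as the origin, the rightmost and bottommost pixels do not take part in the computation, and the result cannot match OpenCV/Matlab. What is the correct approach?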

The answer is to make the geometric centers of the two images coincide, with the target pixels evenly spaced and each border leaving the same margin.

So when computing the coordinates, we only need to change:

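float x = i * (float)m / a;  // source coordinate of target column i
float y = j * (float)n / b;  // source coordinate of target row j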

to:

float x = (i + 0.5f) * m / a - 0.5f;  // align the geometric centers
float y = (j + 0.5f) * n / b - 0.5f;  // keep the fraction for the weights

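This half-pixel mapping is what the interpolate op uses when align_corners=False (the source coordinate is additionally clamped at 0, as in GetLinearInputIndex below). With align_corners=True, the centers of the corner pixels of the input and output are aligned instead, and the source coordinate becomes $x = i \cdot (m-1)/(a-1)$, so the first and last pixels coincide exactly and the corner values are preserved. The align_corners parameter of the interpolate op thus lets the user choose between these two coordinate mappings.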
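
To see the mappings side by side, here is a small pure-Python sketch. The function name and the 5-to-3 example sizes are illustrative choices of mine, not taken from OneFlow. It maps every target index to a source coordinate with the naive top-left formula, the half-pixel formula (align_corners=False, unclamped) and the corner-aligned formula (align_corners=True), then compares the mean sampled coordinate with the source image center (m-1)/2.

```python
def source_coords(in_size, out_size, mode):
    """Map target indices 0..out_size-1 to (unclamped) source coordinates."""
    coords = []
    for i in range(out_size):
        if mode == "naive":            # top-left origin: i * m / a
            x = i * in_size / out_size
        elif mode == "half_pixel":     # align_corners=False
            x = (i + 0.5) * in_size / out_size - 0.5
        elif mode == "align_corners":  # align_corners=True
            scale = (in_size - 1) / (out_size - 1) if out_size > 1 else 0.0
            x = i * scale
        else:
            raise ValueError(mode)
        coords.append(x)
    return coords


m, a = 5, 3  # shrink 5 source pixels to 3
for mode in ("naive", "half_pixel", "align_corners"):
    xs = source_coords(m, a, mode)
    center = sum(xs) / len(xs)
    print(f"{mode:13s} coords={[round(x, 3) for x in xs]} center={center:.3f}")
print("source center:", (m - 1) / 2)
```

For 5 to 3, the naive mapping samples around 1.667 instead of the source center 2.0, i.e. it is shifted toward the top-left, while both align_corners variants are centered; align_corners=True additionally pins the first and last target pixels to source pixels 0 and 4.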

0x3. Linear interpolation

Geometrically, linear interpolation approximates the original function by the straight line through two known points $A = (x_0, y_0)$ and $B = (x_1, y_1)$.

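Since $\frac{y - y_0}{x - x_0} = \frac{y_1 - y_0}{x_1 - x_0}$, expanding gives $y = \frac{x_1 - x}{x_1 - x_0}\, y_0 + \frac{x - x_0}{x_1 - x_0}\, y_1$.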

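The OneFlow code for linear interpolation is at https://github.com/Oneflow-Inc/oneflow/blob/master/oneflow/user/kernels/upsample_linear_1d_kernel.cpp; we only look at the forward pass. Because neighboring pixels are one unit apart ($x_1 - x_0 = 1$), h1lambda in the code corresponds to the weight $\frac{x - x_0}{x_1 - x_0} = x - x_0$ in this formula, and h0lambda to 1 - h1lambda.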

template<typename T>
OF_DEVICE_FUNC T GetLinearInputIndex(const int64_t out_dim_idx, const T scale, bool align_corners) {
  if (align_corners) {
    return static_cast<T>(scale * out_dim_idx);
  } else {
    T src_idx = scale * (out_dim_idx + 0.5) - 0.5;
    return static_cast<T>(src_idx < 0 ? 0 : src_idx);
  }
}

template<typename T>
static void UpsampleLinear1DForward(const int64_t elem_cnt, const T* in_dptr,
                                    NdIndexOffsetHelper<int64_t, 3> in_helper,
                                    NdIndexOffsetHelper<int64_t, 3> out_helper, const int in_height,
                                    const float scale_factor, bool align_corners, T* out_dptr) {
  for (int64_t index = 0; index < elem_cnt; ++index) {
    int64_t n, c, h;
    out_helper.OffsetToNdIndex(index, n, c, h);
    const T h1r = GetLinearInputIndex(h, scale_factor, align_corners);
    const int64_t h1 = h1r;
    const int64_t h1p = (h1 < in_height - 1) ? 1 : 0;
    const T h1lambda = h1r - h1;
    const T h0lambda = static_cast<T>(1.) - h1lambda;
    out_dptr[index] = h0lambda * in_dptr[in_helper.NdIndexToOffset(n, c, h1)]
                      + h1lambda * in_dptr[in_helper.NdIndexToOffset(n, c, h1 + h1p)];
  }
}

Linear interpolation supports 3-D (NCW) input Tensors.
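
For reference, here is the same forward computation as a pure-Python sketch for a single (n, c) row. The function name is hypothetical; it mirrors GetLinearInputIndex and UpsampleLinear1DForward above and assumes the kernel receives scale = 1/scale_factor when align_corners=False and (in-1)/(out-1) when align_corners=True. With the documentation example (input [1, 2, 3, 4], scale_factor=2) it reproduces the output shown in section 0x1.

```python
import math


def upsample_linear_1d(row, scale_factor, align_corners=False):
    """Sketch of 1-D linear upsampling for one (n, c) row."""
    in_w = len(row)
    out_w = math.floor(in_w * scale_factor)
    if align_corners:
        scale = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    else:
        scale = 1.0 / scale_factor
    out = []
    for w in range(out_w):
        if align_corners:
            src = scale * w
        else:
            src = max(scale * (w + 0.5) - 0.5, 0.0)  # clamp like the kernel
        w1 = int(src)                      # left neighbor (src >= 0)
        w1p = 1 if w1 < in_w - 1 else 0    # step to right neighbor, 0 at border
        lam1 = src - w1                    # h1lambda in the kernel
        lam0 = 1.0 - lam1                  # h0lambda
        out.append(lam0 * row[w1] + lam1 * row[w1 + w1p])
    return out


print(upsample_linear_1d([1.0, 2.0, 3.0, 4.0], 2.0))
# [1.0, 1.25, 1.75, 2.25, 2.75, 3.25, 3.75, 4.0]
```

The w1p trick is what keeps the last output pixel at 4.0: at the right border the "right neighbor" is the pixel itself, so no out-of-range read occurs.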

0x4. Nearest interpolation

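When enlarging an image, nearest-neighbor interpolation fills each new pixel with the value of the nearest source pixel. Section 0x2 already described the procedure: with a source image of size $m \times n$ and a target image of size $a \times b$, the target pixel $(i, j)$ maps back to the source coordinate $(i \cdot m/a,\ j \cdot n/b)$. Converting this coordinate directly to an integer source position, without any weighting, is nearest-neighbor interpolation; note that the OneFlow kernel below truncates with floor rather than rounding. The drawback of this method is that pixel values change discontinuously, which produces jagged edges in the new image.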

The OneFlow code for nearest-neighbor interpolation is at https://github.com/Oneflow-Inc/oneflow/blob/master/oneflow/user/kernels/upsample_nearest_kernel.cpp. Taking an NCW input Tensor as an example:

OF_DEVICE_FUNC static int64_t GetNearestInputIndex(const int64_t out_dim_idx, const float scale,
                                                   const int64_t in_dim_size) {
  int64_t index = static_cast<int64_t>(std::floor((static_cast<float>(out_dim_idx) * scale)));
  index = index > in_dim_size - 1 ? in_dim_size - 1 : index;
  index = index < static_cast<int64_t>(0) ? static_cast<int64_t>(0) : index;
  return index;
}

template<typename T>
static void UpsampleNearest1DForward(const int64_t elem_cnt, const T* in_dptr,
                                     NdIndexOffsetHelper<int64_t, 3> in_helper,
                                     NdIndexOffsetHelper<int64_t, 3> out_helper,
                                     const int64_t in_height, const float scale_factor,
                                     T* out_dptr) {
  for (int64_t index = 0; index < elem_cnt; ++index) {
    int64_t n, c, h;
    out_helper.OffsetToNdIndex(index, n, c, h);
    const int64_t in_h = GetNearestInputIndex(h, scale_factor, in_height);
    out_dptr[index] = in_dptr[in_helper.NdIndexToOffset(n, c, in_h)];
  }
}

Nearest-neighbor interpolation supports 3-D (NCW), 4-D (NCHW) and 5-D (NCDHW) input Tensors.
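
A pure-Python sketch of the same logic for one row. The helper is hypothetical, and it assumes the caller passes scale = 1/scale_factor to GetNearestInputIndex:

```python
import math


def upsample_nearest_1d(row, scale_factor):
    """Sketch of the NCW nearest kernel for one (n, c) row."""
    in_w = len(row)
    out_w = math.floor(in_w * scale_factor)
    scale = 1.0 / scale_factor  # input pixels per output pixel
    out = []
    for w in range(out_w):
        # Same as GetNearestInputIndex: floor, then clamp to [0, in_w - 1].
        idx = math.floor(w * scale)
        idx = min(max(idx, 0), in_w - 1)
        out.append(row[idx])
    return out


print(upsample_nearest_1d([1, 2, 3, 4], 2.0))          # each value twice
print(upsample_nearest_1d([1, 2, 3, 4, 5, 6], 0.5))
```

Because the index is floored, downsampling [1, 2, 3, 4, 5, 6] by 0.5 keeps [1, 3, 5]: each output takes the first pixel of its 2-pixel block rather than a centered one.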

0x5. Bilinear interpolation

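As in section 0x2, with a source image of size $m \times n$ and a target image of size $a \times b$, the target pixel $(i, j)$ maps back to the source coordinate $(x, y) = (i \cdot m/a,\ j \cdot n/b)$ (or its half-pixel variant when align_corners=False). Rounding this coordinate would give nearest-neighbor interpolation; bilinear interpolation instead takes the four source pixels closest to $(x, y)$ and computes the output as the weighted sum $\sum_{k=1}^{4} w_k p_k$, where $p_k$ are those 4 pixels and $w_k$ their weights.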

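Baidu Baike explains how the weights are obtained quite clearly: interpolate linearly along x first, then along y. Let the four neighbors be $Q_{11} = (x_1, y_1)$, $Q_{21} = (x_2, y_1)$, $Q_{12} = (x_1, y_2)$ and $Q_{22} = (x_2, y_2)$. Interpolating along x at $R_1 = (x, y_1)$ and $R_2 = (x, y_2)$ gives $f(R_1) \approx \frac{x_2 - x}{x_2 - x_1} f(Q_{11}) + \frac{x - x_1}{x_2 - x_1} f(Q_{21})$ and $f(R_2) \approx \frac{x_2 - x}{x_2 - x_1} f(Q_{12}) + \frac{x - x_1}{x_2 - x_1} f(Q_{22})$; interpolating along y then gives $f(x, y) \approx \frac{y_2 - y}{y_2 - y_1} f(R_1) + \frac{y - y_1}{y_2 - y_1} f(R_2)$.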

Implementing this method gives OneFlow's code at https://github.com/Oneflow-Inc/oneflow/blob/master/oneflow/user/kernels/upsample_bilinear_2d_kernel.cpp. Again we only look at the forward pass:

template<typename T>
OF_DEVICE_FUNC void GetBilinearParam(const bool align_corners, const int64_t h, const int64_t w,
                                     const int64_t in_height, const int64_t in_width,
                                     const T scale_h, const T scale_w, BilinearParam<T>* params) {
  T h1r;
  if (align_corners) {
    h1r = scale_h * static_cast<T>(h);
  } else {
    h1r = (static_cast<T>(h) + 0.5f) * scale_h - 0.5f;
    h1r = h1r < 0 ? 0 : h1r;
  }
  const int64_t h1 = h1r;
  const int64_t h1p = (h1 < in_height - 1) ? 1 : 0;

  T w1r;
  if (align_corners) {
    w1r = scale_w * static_cast<T>(w);
  } else {
    w1r = (static_cast<T>(w) + 0.5f) * scale_w - 0.5f;
    w1r = w1r < 0 ? 0 : w1r;
  }
  const int64_t w1 = w1r;
  const int64_t w1p = (w1 < in_width - 1) ? 1 : 0;

  params->top_h_index = h1;
  params->bottom_h_index = h1 + h1p;
  params->h_lerp = h1r - h1;
  params->left_w_index = w1;
  params->right_w_index = w1 + w1p;
  params->w_lerp = w1r - w1;
}

template<typename T>
static void UpsampleBilinear2DForward(const int64_t elem_cnt, const T* in_dptr,
                                      NdIndexOffsetHelper<int64_t, 4> in_helper,
                                      NdIndexOffsetHelper<int64_t, 4> out_helper,
                                      const int64_t in_height, const int64_t in_width,
                                      const T scale_h, const T scale_w, const bool align_corners,
                                      T* out_dptr) {
  for (int64_t index = 0; index < elem_cnt; ++index) {
    int64_t n, c, h, w;
    out_helper.OffsetToNdIndex(index, n, c, h, w);
    BilinearParam<T> params;
    GetBilinearParam(align_corners, h, w, in_height, in_width, scale_h, scale_w, &params);
    const int64_t top_offset = in_helper.NdIndexToOffset(n, c, params.top_h_index, 0);
    const int64_t bottom_offset = in_helper.NdIndexToOffset(n, c, params.bottom_h_index, 0);
    const T top_left = in_dptr[top_offset + params.left_w_index];
    const T top_right = in_dptr[top_offset + params.right_w_index];
    const T bottom_left = in_dptr[bottom_offset + params.left_w_index];
    const T bottom_right = in_dptr[bottom_offset + params.right_w_index];
    const T top = top_left + (top_right - top_left) * params.w_lerp;
    const T bottom = bottom_left + (bottom_right - bottom_left) * params.w_lerp;
    out_dptr[index] = top + (bottom - top) * params.h_lerp;
  }
}

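The code corresponds step by step to this procedure: top and bottom are the two interpolations along the width, and the final line interpolates between them along the height. Compared with nearest-neighbor interpolation, each target pixel is interpolated from several source pixels, so the result is smoother and the jagged edges largely disappear.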

Bilinear interpolation supports 4-D (NCHW) input, i.e. two spatial dimensions.
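
To check the weights by hand, here is a pure-Python sketch of the same computation for a single H x W channel stored as a list of lists. The function is hypothetical; it assumes the scales are derived from the sizes (in/out, or (in-1)/(out-1) with align_corners=True) instead of being passed in, and otherwise follows GetBilinearParam and UpsampleBilinear2DForward.

```python
def upsample_bilinear_2d(img, out_h, out_w, align_corners=False):
    """Sketch of bilinear upsampling for one H x W channel (list of lists)."""
    in_h, in_w = len(img), len(img[0])

    def scale(in_size, out_size):
        # Input pixels per output pixel, derived from the sizes (assumption).
        if align_corners:
            return (in_size - 1) / (out_size - 1) if out_size > 1 else 0.0
        return in_size / out_size

    def source_index(dst, s):
        if align_corners:
            return s * dst
        return max((dst + 0.5) * s - 0.5, 0.0)  # half-pixel, clamped at 0

    sh, sw = scale(in_h, out_h), scale(in_w, out_w)
    out = [[0.0] * out_w for _ in range(out_h)]
    for h in range(out_h):
        h1r = source_index(h, sh)
        top = int(h1r)
        bottom = top + (1 if top < in_h - 1 else 0)
        h_lerp = h1r - top
        for w in range(out_w):
            w1r = source_index(w, sw)
            left = int(w1r)
            right = left + (1 if left < in_w - 1 else 0)
            w_lerp = w1r - left
            # Two interpolations along the width, then one along the height,
            # mirroring UpsampleBilinear2DForward.
            t = img[top][left] + (img[top][right] - img[top][left]) * w_lerp
            b = img[bottom][left] + (img[bottom][right] - img[bottom][left]) * w_lerp
            out[h][w] = t + (b - t) * h_lerp
    return out


for row in upsample_bilinear_2d([[1.0, 2.0], [3.0, 4.0]], 4, 4):
    print(row)
```

For this 2x2 input with align_corners=False, the first output row is [1.0, 1.25, 1.75, 2.0]; the clamp at 0 and the border step make the first and last output rows depend only on the first and last input rows.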

0x6. Bicubic interpolation

Bicubic interpolation is a more complex method that produces smoother image edges than bilinear interpolation.

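Wikipedia: in numerical analysis, bicubic interpolation is the most commonly used interpolation method in two-dimensional space. In this method, the value of the function f at point (x, y) is obtained as a weighted average of the sixteen nearest sample points on a rectangular grid, which requires two cubic polynomial interpolation functions, one for each direction.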

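This was the most complex interpolation mode to implement for the interpolate operator. For a sample point $(x, y)$ whose 4×4 neighborhood consists of the source pixels $p_{ij}$ at positions $(x_i, y_j)$, $i, j = 0, \dots, 3$, the result is computed as $B(x, y) = \sum_{i=0}^{3} \sum_{j=0}^{3} p_{ij}\, W(x - x_i)\, W(y - y_j)$.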

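The kernel $W$ is defined as $W(x) = \begin{cases} (A+2)|x|^3 - (A+3)|x|^2 + 1, & |x| \le 1 \\ A|x|^3 - 5A|x|^2 + 8A|x| - 4A, & 1 < |x| < 2 \\ 0, & \text{otherwise} \end{cases}$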

Here $A$ is usually taken as -0.5 or -0.75; like PyTorch and OpenCV, we use -0.75. The code computing $W$ is:

// Based on
// https://en.wikipedia.org/wiki/Bicubic_interpolation#Bicubic_convolution_algorithm

template<typename T>
OF_DEVICE_FUNC T cubic_convolution1(const T x, const T A) {
  return ((A + 2.0) * x - (A + 3.0)) * x * x + 1.0;
}

template<typename T>
OF_DEVICE_FUNC T cubic_convolution2(const T x, const T A) {
  return ((A * x - 5.0 * A) * x + 8.0 * A) * x - 4.0 * A;
}

template<typename T>
OF_DEVICE_FUNC void get_cubic_upsample_coefficients(T coeffs[4], const T t) {
  T A = -0.75;

  T x1 = t;
  coeffs[0] = cubic_convolution2<T>(x1 + 1.0, A);
  coeffs[1] = cubic_convolution1<T>(x1, A);

  // opposite coefficients
  T x2 = 1.0 - t;
  coeffs[2] = cubic_convolution1<T>(x2, A);
  coeffs[3] = cubic_convolution2<T>(x2 + 1.0, A);
}

template<typename T>
OF_DEVICE_FUNC T cubic_interp1d(const T x0, const T x1, const T x2, const T x3, const T t) {
  T coeffs[4];
  get_cubic_upsample_coefficients<T>(coeffs, t);
  return x0 * coeffs[0] * 1.0 + x1 * coeffs[1] * 1.0 + x2 * coeffs[2] * 1.0 + x3 * coeffs[3] * 1.0;
}
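
To sanity-check these weights numerically, here is a direct Python transcription of the three helpers (same formulas and the same constant A = -0.75; the Python function names are mine):

```python
A = -0.75  # same constant as the C++ code


def cubic_convolution1(x, a=A):
    # W(x) for |x| <= 1
    return ((a + 2.0) * x - (a + 3.0)) * x * x + 1.0


def cubic_convolution2(x, a=A):
    # W(x) for 1 < |x| < 2
    return ((a * x - 5.0 * a) * x + 8.0 * a) * x - 4.0 * a


def cubic_coefficients(t):
    """Weights for the neighbors at offsets -1, 0, +1, +2 from floor(x)."""
    return [
        cubic_convolution2(t + 1.0),   # distance 1 + t
        cubic_convolution1(t),         # distance t
        cubic_convolution1(1.0 - t),   # distance 1 - t
        cubic_convolution2(2.0 - t),   # distance 2 - t
    ]


def cubic_interp1d(x0, x1, x2, x3, t):
    c = cubic_coefficients(t)
    return x0 * c[0] + x1 * c[1] + x2 * c[2] + x3 * c[3]


print(cubic_coefficients(0.0))
print(cubic_coefficients(0.5), sum(cubic_coefficients(0.5)))
```

cubic_coefficients(0.0) gives [0.0, 1.0, 0.0, 0.0], so a sample that lands exactly on a source pixel is reproduced unchanged, and at t = 0.5 the weights are [-0.09375, 0.59375, 0.59375, -0.09375]. The four weights always sum to 1, so constant regions stay constant; the negative outer weights are why bicubic results can overshoot slightly near sharp edges.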

Building on these functions, the complete bicubic interpolation is implemented as:

  void Compute(user_op::KernelComputeContext* ctx) const override {
    const user_op::Tensor* x_tensor = ctx->Tensor4ArgNameAndIndex("x", 0);
    user_op::Tensor* y_tensor = ctx->Tensor4ArgNameAndIndex("y", 0);
    const T* in_ptr = x_tensor->dptr<T>();
    T* out_ptr = y_tensor->mut_dptr<T>();
    const float height_scale = ctx->Attr<float>("height_scale");
    const float width_scale = ctx->Attr<float>("width_scale");
    const bool align_corners = ctx->Attr<bool>("align_corners");

    const int nbatch = x_tensor->shape().At(0);
    const int channels = x_tensor->shape().At(1);
    const int64_t in_height = x_tensor->shape().At(2);
    const int64_t in_width = x_tensor->shape().At(3);
    const int64_t out_height = y_tensor->shape().At(2);
    const int64_t out_width = y_tensor->shape().At(3);

    if (in_height == out_height && in_width == out_width) {
      memcpy(out_ptr, in_ptr, sizeof(T) * nbatch * channels * in_height * in_width);
    } else {
      const T scale_height = GetAreaPixelScale(in_height, out_height, align_corners, height_scale);
      const T scale_width = GetAreaPixelScale(in_width, out_width, align_corners, width_scale);

      for (int64_t output_y = 0; output_y < out_height; output_y++) {
        for (int64_t output_x = 0; output_x < out_width; output_x++) {
          const T* in = in_ptr;
          T* out = out_ptr;

          const T real_x = GetAreaPixel(scale_width, output_x, align_corners, /*cubic=*/true);
          int64_t input_x = std::floor(real_x);
          const T t_x = real_x - input_x;

          const T real_y = GetAreaPixel(scale_height, output_y, align_corners, /*cubic=*/true);
          int64_t input_y = std::floor(real_y);
          const T t_y = real_y - input_y;

          for (int64_t c = 0; c < channels * nbatch; c++) {
            T coefficients[4];

            // Interpolate 4 times in the x direction
            for (int64_t i = 0; i < 4; i++) {
              coefficients[i] =
                  cubic_interp1d<T>(upsample_get_value_bounded<T>(in, in_width, in_height,
                                                                  input_x - 1, input_y - 1 + i),
                                    upsample_get_value_bounded<T>(in, in_width, in_height,
                                                                  input_x + 0, input_y - 1 + i),
                                    upsample_get_value_bounded<T>(in, in_width, in_height,
                                                                  input_x + 1, input_y - 1 + i),
                                    upsample_get_value_bounded<T>(in, in_width, in_height,
                                                                  input_x + 2, input_y - 1 + i),
                                    t_x);
            }

            // Interpolate in the y direction using x interpolations
            out[output_y * out_width + output_x] = cubic_interp1d<T>(
                coefficients[0], coefficients[1], coefficients[2], coefficients[3], t_y);

            // Move to next channel
            in += in_width * in_height;
            out += out_width * out_height;
          }
        }
      }
    }
  }

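As the code shows, the 2-D bicubic interpolation is computed separably: for each output pixel, four 1-D cubic interpolations are first performed along x (one per neighboring row), and one more 1-D cubic interpolation along y combines their results.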

Bicubic interpolation supports 4-D (NCHW) input data, and the resulting image is finer and smoother than with bilinear interpolation.
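
Putting it together, here is a compact pure-Python sketch of the separable scheme for one H x W channel (hypothetical function; align_corners=False with size-derived scales only). It assumes that GetAreaPixel(..., /*cubic=*/true) does not clamp the source coordinate to 0, as PyTorch's equivalent helper does, and it clamps neighbor indices to the border in the spirit of upsample_get_value_bounded.

```python
import math

A = -0.75


def _w_near(x):  # W(x) for |x| <= 1
    return ((A + 2.0) * x - (A + 3.0)) * x * x + 1.0


def _w_far(x):   # W(x) for 1 < |x| < 2
    return ((A * x - 5.0 * A) * x + 8.0 * A) * x - 4.0 * A


def cubic_interp1d(p0, p1, p2, p3, t):
    w = (_w_far(t + 1.0), _w_near(t), _w_near(1.0 - t), _w_far(2.0 - t))
    return p0 * w[0] + p1 * w[1] + p2 * w[2] + p3 * w[3]


def upsample_bicubic_2d(img, out_h, out_w):
    """Sketch: one H x W channel, align_corners=False, size-derived scales."""
    in_h, in_w = len(img), len(img[0])
    sh, sw = in_h / out_h, in_w / out_w

    def at(y, x):
        # Clamp neighbor indices to the border.
        return img[min(max(y, 0), in_h - 1)][min(max(x, 0), in_w - 1)]

    out = [[0.0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        real_y = (oy + 0.5) * sh - 0.5  # assumed: no clamp to 0 in cubic mode
        iy = math.floor(real_y)
        ty = real_y - iy
        for ox in range(out_w):
            real_x = (ox + 0.5) * sw - 0.5
            ix = math.floor(real_x)
            tx = real_x - ix
            # Four 1-D interpolations along x, one per neighboring row...
            rows = [cubic_interp1d(at(iy - 1 + i, ix - 1), at(iy - 1 + i, ix),
                                   at(iy - 1 + i, ix + 1), at(iy - 1 + i, ix + 2), tx)
                    for i in range(4)]
            # ...then one along y combines them.
            out[oy][ox] = cubic_interp1d(*rows, ty)
    return out


print(upsample_bicubic_2d([[0.0, 1.0], [1.0, 0.0]], 4, 4))
```

When the output size equals the input size, every t is 0 and the sketch returns the input unchanged, which is the case the OneFlow kernel short-circuits with memcpy.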

0x7. Trilinear interpolation

Trilinear interpolation is a linear interpolation method for computing values inside a 3-D cube from the values given at its vertices.

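One could first interpolate linearly along one axis, reducing the problem to bilinear interpolation, and then apply the bilinear formula. For simplicity, I directly implemented the final formula given on Wikipedia. With $x_d, y_d, z_d$ the fractional offsets of the sample point inside its cell and $c_{xyz}$ the values at the eight corners ($x, y, z \in \{0, 1\}$): $c = (1-z_d)\big[(1-y_d)\big((1-x_d)c_{000} + x_d c_{100}\big) + y_d\big((1-x_d)c_{010} + x_d c_{110}\big)\big] + z_d\big[(1-y_d)\big((1-x_d)c_{001} + x_d c_{101}\big) + y_d\big((1-x_d)c_{011} + x_d c_{111}\big)\big]$. In the code below, $x_d$, $y_d$ and $z_d$ are w1lambda, h1lambda and t1lambda.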

The OneFlow implementation is at https://github.com/Oneflow-Inc/oneflow/blob/master/oneflow/user/kernels/upsample_trilinear_3d_kernel.cpp#L25-L69; again we only look at the forward pass:

template<typename T>
static void UpsampleTrilinear3DForward(const int64_t elem_cnt, const T* in_dptr,
                                       NdIndexOffsetHelper<int64_t, 5> in_helper,
                                       NdIndexOffsetHelper<int64_t, 5> out_helper,
                                       const int64_t in_depth, const int64_t in_height,
                                       const int64_t in_width, const T rdepth, const T rheight,
                                       const T rwidth, const bool align_corners, T* out_dptr) {
  for (int64_t index = 0; index < elem_cnt; ++index) {
    int64_t n, c, d, h, w;
    out_helper.OffsetToNdIndex(index, n, c, d, h, w);

    const T t1r = GetAreaPixel(rdepth, d, align_corners);
    const int64_t t1 = t1r;
    const int64_t t1p = (t1 < in_depth - 1) ? 1 : 0;
    const T t1lambda = t1r - t1;
    const T t0lambda = static_cast<T>(1.) - t1lambda;

    const T h1r = GetAreaPixel(rheight, h, align_corners);
    const int64_t h1 = h1r;
    const int64_t h1p = (h1 < in_height - 1) ? 1 : 0;
    const T h1lambda = h1r - h1;
    const T h0lambda = static_cast<T>(1.) - h1lambda;

    const T w1r = GetAreaPixel(rwidth, w, align_corners);
    const int64_t w1 = w1r;
    const int64_t w1p = (w1 < in_width - 1) ? 1 : 0;
    const T w1lambda = w1r - w1;
    const T w0lambda = static_cast<T>(1.) - w1lambda;

    const T* pos1 = &in_dptr[in_helper.NdIndexToOffset(n, c, t1, h1, w1)];

    out_dptr[index] =
        t0lambda
            * (h0lambda * (w0lambda * pos1[0] + w1lambda * pos1[w1p])
               + h1lambda
                     * (w0lambda * pos1[h1p * in_width] + w1lambda * pos1[h1p * in_width + w1p]))
        + t1lambda
              * (h0lambda
                     * (w0lambda * pos1[t1p * in_height * in_width]
                        + w1lambda * pos1[t1p * in_height * in_width + w1p])
                 + h1lambda
                       * (w0lambda * pos1[t1p * in_height * in_width + h1p * in_width]
                          + w1lambda * pos1[t1p * in_height * in_width + h1p * in_width + w1p]));
  }
}

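The code above implements trilinear interpolation by decomposing it into independent linear interpolations along the three axes (width, height and depth).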

Trilinear interpolation supports 5-D (NCDHW) input data.
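
The decomposition can be sketched in pure Python for a single sample point. The helper is hypothetical, and it assumes the fractional source coordinates have already been computed and are non-negative (as GetAreaPixel produces). It performs seven 1-D linear interpolations (four along width, two along height, one along depth), which expands to the same weighted sum of the eight corners as the kernel.

```python
def lerp(a, b, t):
    return a + (b - a) * t


def trilinear_point(cube, z, y, x):
    """Sample cube[d][h][w] at fractional (z, y, x) >= 0, clamping at the far edge."""
    D, H, W = len(cube), len(cube[0]), len(cube[0][0])
    z0, y0, x0 = int(z), int(y), int(x)
    z1 = z0 + (1 if z0 < D - 1 else 0)  # like t1p / h1p / w1p in the kernel
    y1 = y0 + (1 if y0 < H - 1 else 0)
    x1 = x0 + (1 if x0 < W - 1 else 0)
    zd, yd, xd = z - z0, y - y0, x - x0
    # Four lerps along width; suffixes are (depth, height) corners.
    c00 = lerp(cube[z0][y0][x0], cube[z0][y0][x1], xd)
    c01 = lerp(cube[z0][y1][x0], cube[z0][y1][x1], xd)
    c10 = lerp(cube[z1][y0][x0], cube[z1][y0][x1], xd)
    c11 = lerp(cube[z1][y1][x0], cube[z1][y1][x1], xd)
    # Two lerps along height, one per depth slice.
    c0 = lerp(c00, c01, yd)
    c1 = lerp(c10, c11, yd)
    # One lerp along depth.
    return lerp(c0, c1, zd)


cube = [[[z + 2 * y + 4 * x for x in range(2)] for y in range(2)] for z in range(2)]
print(trilinear_point(cube, 0.5, 0.5, 0.5))  # 3.5, the mean of corners 0..7
```

Since every step is linear, a field that is itself linear in (z, y, x), like the one above, is reproduced exactly at any point inside the cell.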

0x8. Area interpolation

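The interpolate operator has one more mode, area interpolation. In the Python frontend shown in section 0x1, mode="area" simply calls flow._C.adaptive_avg_pool1d, adaptive_avg_pool2d or adaptive_avg_pool3d(x, output_size) for 3-D, 4-D and 5-D inputs respectively.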

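So area interpolation is just adaptive_avg_pool, i.e. adaptive average pooling. In adaptive average pooling each output pixel corresponds to a region of input pixels, which is why the mode is called area. I won't go into the details of adaptive_avg_pool here: the idea is to iterate over the output pixels, find the corresponding region of input pixels, and sum and average them. If you are interested, see OneFlow's implementation: https://github.com/Oneflow-Inc/oneflow/blob/master/oneflow/user/kernels/adaptive_pool_cpu_kernel.cpp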
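
A minimal 1-D sketch of adaptive average pooling in pure Python (hypothetical helper). The region bounds start = floor(i * in / out) and end = ceil((i + 1) * in / out) are an assumption on my part, taken from the usual PyTorch-style definition rather than read from the OneFlow source:

```python
def adaptive_avg_pool1d(row, out_size):
    """Sketch: average the input region [start, end) mapped to each output pixel."""
    in_size = len(row)
    out = []
    for i in range(out_size):
        start = (i * in_size) // out_size             # floor division
        end = -(-((i + 1) * in_size) // out_size)     # ceil division
        region = row[start:end]
        out.append(sum(region) / len(region))
    return out


print(adaptive_avg_pool1d([1.0, 2.0, 3.0, 4.0, 5.0], 3))  # overlapping regions
print(adaptive_avg_pool1d([1.0, 2.0], 4))                 # upsampling repeats values
```

Shrinking 5 pixels to 3 averages the overlapping regions [0, 2), [1, 4) and [3, 5), giving [1.5, 3.0, 4.5]; enlarging 2 pixels to 4 simply repeats each value, so area mode behaves like nearest when upsampling by an integer factor.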

0x9. Comparison of interpolation methods

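We have covered the interpolation algorithms of the interpolate op. Going from nearest to bilinear to bicubic, the results become smoother, but the computational cost grows accordingly (in 2-D, each output pixel reads 1, 4 and 16 source pixels respectively). Like PyTorch, OneFlow will build its various Upsample modules on top of this implementation. Note also that besides the methods in interpolate, upsampling can be done with transposed convolution (deconvolution), which I covered in an earlier article and won't repeat here.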

Also, all the code shown above is the CPU version; for the GPU version, look for the .cu file with the same name next to each linked file.

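Using the development of the interpolate operator as an example, this article has walked through essentially all the interpolation methods found in deep learning frameworks. I hope it helps.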

Follow GiantPandaCV for exclusive deep learning content: we stick to original work and share the new things we learn every day.

If you have questions about this article or would like to join the discussion group, feel free to add BBuf on WeChat.


Surveying the interpolation methods of deep learning frameworks, with OneFlow as an example