
Deep learning approaches to time series forecasting, as opposed to classical methods, have recently produced a wave of innovations. This post walks through a probabilistic time series forecasting example built with the HuggingFace Transformers library.

Probabilistic Forecasting

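Classical methods are typically fitted on each time series in a dataset individually. However, when dealing with a large number of time series, it is beneficial to train a single "global" model on all available time series, which lets the model learn latent representations from many different sources.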

Deep learning is well suited to training global probabilistic models, rather than local point-forecasting models, because neural networks can learn representations from several related time series and can also model the uncertainty in the data.

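In the probabilistic setting, it is common to learn the future parameters of some chosen parametric distribution, such as a Gaussian or a Student-T, or to learn the conditional quantile function, or to use the Conformal Prediction framework adapted to the time series setting. A probabilistic model can always be turned into a point-forecasting model by taking the empirical mean or median.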
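To make the last point concrete, here is a minimal plain-Python sketch that collapses sample paths into point forecasts. The five sample paths are made-up numbers standing in for draws from a trained probabilistic model; their layout, (num_samples, horizon), mirrors the forecast arrays used later in this post.

```python
import statistics

# Made-up sample paths with shape (num_samples, horizon) = (5, 3),
# standing in for draws from a trained probabilistic model.
sample_paths = [
    [10.0, 11.0, 12.0],
    [9.0, 12.0, 14.0],
    [11.0, 10.0, 13.0],
    [10.5, 11.5, 30.0],  # one outlying draw at the last step
    [9.5, 11.0, 12.5],
]


def point_forecast(paths, reducer):
    """Collapse sample paths into one value per time step using `reducer`."""
    horizon = len(paths[0])
    return [reducer([path[t] for path in paths]) for t in range(horizon)]


mean_forecast = point_forecast(sample_paths, statistics.fmean)
median_forecast = point_forecast(sample_paths, statistics.median)
print(median_forecast)  # [10.0, 11.0, 13.0]
```

At the last step the mean is pulled up to 16.3 by the single outlying draw, while the median stays at 13.0; this post later uses the median of the samples when computing metrics.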

Time Series Transformer

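In this blog post, we use the vanilla Transformer for the univariate probabilistic forecasting task (i.e., predicting a one-dimensional distribution for each time series). The Encoder-Decoder Transformer encapsulates several useful inductive biases, which makes it a natural choice for forecasting.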

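First, an Encoder-Decoder architecture is helpful at inference time. Typically, given some logged data, we want to forecast a number of steps into the future. Given a chosen distribution type, we can sample from it, feeding each sample back into the decoder, until we reach the desired prediction horizon. This is known as Greedy Sampling/Search.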
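A minimal sketch of this sampling loop, assuming a hypothetical `toy_predictive_params` function in place of a trained network (it returns the mean and standard deviation of a Gaussian for the next value). Each draw is appended to the history and conditions the next step, which is the same feed-back pattern described for `generate()` in the inference section.

```python
import random


def toy_predictive_params(history):
    """Hypothetical stand-in for a trained model: (mean, std) of the next value.

    A real Transformer conditions on the whole context window; this AR(1)-style
    rule only exists to make the sampling loop runnable.
    """
    return 0.9 * history[-1], 1.0


def sample_forecast(context, horizon, rng):
    """Draw one sample path of length `horizon`, feeding each draw back in."""
    history = list(context)
    path = []
    for _ in range(horizon):
        mean, std = toy_predictive_params(history)
        next_value = rng.gauss(mean, std)  # sample from the predictive distribution
        path.append(next_value)
        history.append(next_value)  # the draw becomes input for the next step
    return path


path = sample_forecast([5.0, 6.0, 7.0], horizon=4, rng=random.Random(0))
print(len(path))  # 4
```

Calling `sample_forecast` repeatedly with different random states gives many sample paths, which is how a probabilistic forecast is represented later in this post.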

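Second, a Transformer helps us train on time series data that may contain thousands of time points. Due to time and memory constraints, it may not be feasible to feed the entire history of every time series into the model at once. Instead, when constructing batches for stochastic gradient descent, we can choose a suitable context window size and sample windows of that size, together with the subsequent prediction-length window, from the training data. The context window is passed to the encoder and the prediction window to a causal-masked decoder.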
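The window sampling described above can be sketched in plain Python. `sample_training_window` is a hypothetical helper, not a GluonTS function: it draws one series from a collection (the "global" setting), then a random cut point, and returns the context window and the subsequent prediction window. Unlike the GluonTS InstanceSplitter used later, it skips series that are too short instead of padding them.

```python
import random


def sample_training_window(all_series, context_length, prediction_length, rng):
    """Pick a random series, then a random cut point; return (past, future) windows.

    Hypothetical helper: series shorter than context_length + prediction_length
    are skipped rather than padded.
    """
    window = context_length + prediction_length
    eligible = [s for s in all_series if len(s) >= window]
    series = rng.choice(eligible)
    start = rng.randrange(0, len(series) - window + 1)
    past = series[start : start + context_length]  # goes to the encoder
    future = series[start + context_length : start + window]  # goes to the decoder
    return past, future


series_collection = [list(range(100)), list(range(1000, 1050)), list(range(5))]
rng = random.Random(42)
batch = [sample_training_window(series_collection, 8, 4, rng) for _ in range(16)]
```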

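Another benefit of Transformers over other architectures is that missing values can be incorporated as an additional mask on the encoder or decoder inputs, so we can still train without resorting to filling in or imputing them.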
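A minimal sketch of the masking idea, assuming missing values are encoded as NaN: the NaNs are replaced by zeros, an observed mask records which positions held real values, and the per-step loss is averaged over observed positions only. The helper names are hypothetical; in the pipeline below, the same job is done by GluonTS's AddObservedValuesIndicator and by the loss weighting inside the model.

```python
import math


def fill_and_mask(values):
    """Replace NaNs with 0.0 and return (filled_values, observed_mask).

    observed_mask is 1.0 where a real value was present and 0.0 where it was missing.
    """
    filled, mask = [], []
    for v in values:
        missing = math.isnan(v)
        filled.append(0.0 if missing else v)
        mask.append(0.0 if missing else 1.0)
    return filled, mask


def masked_mean_loss(per_step_loss, mask):
    """Average a per-step loss over observed steps only; masked steps add nothing."""
    total = sum(step_loss * m for step_loss, m in zip(per_step_loss, mask))
    return total / max(sum(mask), 1.0)


values = [1.0, float("nan"), 3.0, float("nan")]
filled, mask = fill_and_mask(values)
# The large losses at the missing positions are ignored by the masked average.
loss = masked_mean_loss([0.5, 100.0, 1.5, 100.0], mask)
print(filled, mask, loss)  # [1.0, 0.0, 3.0, 0.0] [1.0, 0.0, 1.0, 0.0] 1.0
```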

Set Up the Environment

First, let's install the necessary libraries: Transformers, Datasets, Evaluate, Accelerate and GluonTS.

As we will show, GluonTS is used to transform the data to create features, as well as to create appropriate training, validation and test batches.

!pip install -q transformers
!pip install -q datasets
!pip install -q evaluate
!pip install -q accelerate
!pip install -q gluonts ujson

Load the Dataset

In this blog post, we use the tourism_monthly dataset, which is available on the Hugging Face Hub. It contains monthly tourism volumes for 366 regions in Australia.

This dataset is part of the Monash Time Series Forecasting repository, a collection of time series datasets from a number of domains. It can be viewed as the GLUE benchmark of time series forecasting.

from datasets import load_dataset

dataset = load_dataset("monash_tsf", "tourism_monthly")

As can be seen, the dataset contains 3 splits: train, validation and test.

dataset
>>> DatasetDict({
        train: Dataset({
            features: ['start', 'target', 'feat_static_cat', 'feat_dynamic_real', 'item_id'],
            num_rows: 366
        })
        test: Dataset({
            features: ['start', 'target', 'feat_static_cat', 'feat_dynamic_real', 'item_id'],
            num_rows: 366
        })
        validation: Dataset({
            features: ['start', 'target', 'feat_static_cat', 'feat_dynamic_real', 'item_id'],
            num_rows: 366
        })
    })

Each example contains a few keys, of which start and target are the most important. Let's look at the first time series in the dataset:

train_example = dataset['train'][0]
train_example.keys()

>>> dict_keys(['start', 'target', 'feat_static_cat', 'feat_dynamic_real', 'item_id'])

start simply indicates the start of the time series (as a datetime), while target contains the actual values of the time series.

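The start will be useful for adding time-related features to the time series values as extra input to the model (such as "month of year"). Since we know the frequency of the data is monthly, we know, for instance, that the second value has the timestamp 1979-02-01, and so on.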

print(train_example['start'])
print(train_example['target'])

>>> 1979-01-01 00:00:00
>>> [1149.8699951171875, 1053.8001708984375, ..., 5772.876953125]
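Because the frequency is fixed, the timestamp of any value follows from start and its position in target. Here is a stdlib-only sketch for monthly data, assuming (as in this dataset) that the series starts on the first day of a month; `monthly_timestamp` is a hypothetical helper, and the post itself lets pandas handle this through pd.Period.

```python
from datetime import date


def monthly_timestamp(start, step):
    """Date of the value at 0-based position `step` in a monthly series starting at `start`."""
    months = start.year * 12 + (start.month - 1) + step
    year, month_index = divmod(months, 12)
    return date(year, month_index + 1, 1)


start = date(1979, 1, 1)
print(monthly_timestamp(start, 1))   # 1979-02-01
print(monthly_timestamp(start, 12))  # 1980-01-01
```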

The validation set contains the same data as the training set, just extended by prediction_length time steps. This allows us to validate the model's predictions against the ground truth.

The test set, in turn, contains prediction_length more time steps than the validation set. (Alternatively, a test set that extends several multiples of prediction_length beyond the training set can be used to evaluate on multiple rolling windows.)
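The relationship between the three splits can be sketched with a hypothetical helper that rebuilds them from one full series. This only illustrates the lengths described above; it is not how the Monash dataset was actually constructed.

```python
def make_splits(full_series, prediction_length):
    """Hypothetical: rebuild (train, validation, test) so each is one horizon longer."""
    test_part = list(full_series)
    validation_part = test_part[:-prediction_length]
    train_part = validation_part[:-prediction_length]
    return train_part, validation_part, test_part


train_part, validation_part, test_part = make_splits(list(range(100)), prediction_length=24)
print(len(train_part), len(validation_part), len(test_part))  # 52 76 100
```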

validation_example = dataset['validation'][0]
validation_example.keys()

>>> dict_keys(['start', 'target', 'feat_static_cat', 'feat_dynamic_real', 'item_id'])

The initial values are exactly the same as those of the corresponding training example:

print(validation_example['start'])
print(validation_example['target'])

>>> 1979-01-01 00:00:00
>>> [1149.8699951171875, 1053.8001708984375, ..., 5985.830078125]

However, compared to the training example, this example has prediction_length=24 additional values. Let's verify this.

freq = "1M"
prediction_length = 24

assert len(train_example["target"]) + prediction_length == len(
    validation_example["target"]
)

Let's visualize this:

import matplotlib.pyplot as plt

figure, axes = plt.subplots()
axes.plot(train_example["target"], color="blue")
axes.plot(validation_example["target"], color="red", alpha=0.5)

plt.show()

Update start to pd.Period

The first thing we'll do is convert the start feature of each time series to a pandas Period index, using the freq of the data:

from functools import lru_cache

import pandas as pd
import numpy as np

@lru_cache(10_000)
def convert_to_pandas_period(date, freq):
    return pd.Period(date, freq)

def transform_start_field(batch, freq):
    batch["start"] = [convert_to_pandas_period(date, freq) for date in batch["start"]]
    return batch

We use the set_transform functionality of datasets to apply this on the fly:

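from functools import partial

# train_dataset and test_dataset are used from here on, so define them first
train_dataset = dataset["train"]
test_dataset = dataset["test"]

train_dataset.set_transform(partial(transform_start_field, freq=freq))
test_dataset.set_transform(partial(transform_start_field, freq=freq))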

Define the Model

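Next, let's instantiate a model. The model will be trained from scratch, so we won't use the from_pretrained method; instead, we randomly initialize the model from a config.

We specify a couple of additional parameters to the model: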

prediction_length (24 months in our case): the horizon that the Transformer's decoder will learn to predict;
context_length: the length of the encoder input; if context_length is not specified, the model sets it equal to prediction_length;
lags for a given frequency: these determine how far back the model "looks", and are also added as additional features. For example, for Daily data we might consider lags of [1, 2, 7, 30, ...], i.e. looking back 1, 2, ... days, while for Minute data we might consider [1, 30, 60, 60*24, ...];
the number of time features: 2 in our case, as we will add MonthOfYear and Age features;
the number of static categorical features: just 1 in our case, as we will add a single "time series ID" feature;
the cardinality: a list with the number of values of each static categorical feature, which here is [366], since we have 366 different time series;
the embedding dimension: a list with the embedding dimension of each static categorical feature. For example, [3] means the model will learn an embedding vector of size 3 for each of the 366 time series (regions).
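The lag mechanism can be illustrated with a small sketch. For a time step t, each lag k contributes the value k steps earlier as an extra feature, so the model needs at least max(lags) steps of history before the first context position. That is why the instance splitter defined below uses past_length = context_length + max(lags_sequence). The helper and toy series here are hypothetical.

```python
def lag_features(series, t, lags):
    """Return series[t - lag] for each lag; needs at least max(lags) steps of history."""
    if t < max(lags):
        raise ValueError("not enough history for the requested lags")
    return [series[t - lag] for lag in lags]


monthly = list(range(100, 140))  # toy monthly series: 100, 101, ..., 139
lags = [1, 2, 12]  # previous month, two months ago, same month last year
print(lag_features(monthly, 20, lags))  # [119, 118, 108]
```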

Let's use the default lags provided by GluonTS for the given frequency ("monthly"):

from gluonts.time_feature import get_lags_for_frequency

lags_sequence = get_lags_for_frequency(freq)
print(lags_sequence)

>>> [1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 23, 24, 25, 35, 36, 37]

This means that, for each time step, we look back up to 37 months of data as additional features. Let's also check the default time features that GluonTS provides:

from gluonts.time_feature import time_features_from_frequency_str

time_features = time_features_from_frequency_str(freq)
print(time_features)

>>> [<function month_of_year at 0x7fa496d0ca70>]

In this case there is only a single feature, namely "month of year". This means that for each time step we add the month as a scalar value (conceptually, 1 if the timestamp is in January, 2 if it is in February, and so on).
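A small sketch of the month feature. To our knowledge, GluonTS's month_of_year does not emit the raw month number but rescales it to the range [-0.5, 0.5] via (month - 1) / 11 - 0.5; treat that formula as an assumption and check it against your installed GluonTS version.

```python
def month_of_year_normalized(month):
    """Assumed GluonTS-style rescaling of a month number (1..12) into [-0.5, 0.5]."""
    return (month - 1) / 11.0 - 0.5


for m in (1, 7, 12):
    print(m, round(month_of_year_normalized(m), 3))
# 1 -0.5
# 7 0.045
# 12 0.5
```

Either encoding carries the same information; the rescaled version simply keeps the feature on a small, centered scale.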

We now have everything we need to define the model:

from transformers import TimeSeriesTransformerConfig, TimeSeriesTransformerForPrediction

config = TimeSeriesTransformerConfig(
    prediction_length=prediction_length,
    # context length:
    context_length=prediction_length * 2,
    # lags coming from helper given the freq:
    lags_sequence=lags_sequence,
    # we'll add 2 time features ("month of year" and "age", see further):
    num_time_features=len(time_features) + 1,
    # we have a single static categorical feature, namely time series ID:
    num_static_categorical_features=1,
    # it has 366 possible values:
    cardinality=[len(train_dataset)],
    # the model will learn an embedding of size 2 for each of the 366 possible values:
    embedding_dimension=[2],

    # transformer params:
    encoder_layers=4,
    decoder_layers=4,
    d_model=32,
)

model = TimeSeriesTransformerForPrediction(config)

Note that, as with other models in the Transformers library, TimeSeriesTransformerModel corresponds to the encoder-decoder Transformer without any head on top, while TimeSeriesTransformerForPrediction corresponds to TimeSeriesTransformerModel with a distribution head on top. By default, the model uses a Student-t distribution (this can be configured):

model.config.distribution_output
>>> student_t

This is an important implementation difference from Transformers for NLP, where the head typically consists of a fixed categorical distribution implemented as an nn.Linear layer.
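To illustrate what a distribution head is trained against, here is a plain-Python sketch of the Student-t negative log-likelihood. As we understand the TimeSeriesTransformerConfig documentation, its loss parameter defaults to negative log-likelihood. How the head maps raw network outputs to a positive df and scale is not shown in this post; the softplus and the "2 +" offset on df below are assumptions chosen only for illustration.

```python
import math


def softplus(u):
    """Numerically stable log(1 + exp(u)); maps any real number to a positive one."""
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))


def student_t_nll(x, df, loc, scale):
    """Negative log-likelihood of x under Student-t(df, loc, scale)."""
    z = (x - loc) / scale
    log_pdf = (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1.0) / 2.0 * math.log1p(z * z / df)
    )
    return -log_pdf


# Hypothetical raw outputs for one time step, made positive where required
# (assumed mapping: df = 2 + softplus(raw_df), scale = softplus(raw_scale)).
raw_df, raw_loc, raw_scale = 1.0, 10.0, 0.5
df, loc, scale = 2.0 + softplus(raw_df), raw_loc, softplus(raw_scale)
print(student_t_nll(10.0, df, loc, scale) < student_t_nll(20.0, df, loc, scale))  # True
```

Observations far from loc are penalized less harshly than under a Gaussian, which is one reason heavy-tailed outputs such as the Student-t are a common default.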

Define Transformations

Next, we define the transformations for the data, in particular the creation of the time features (based on the dataset or universal ones).

Again, we use the GluonTS library for this. We define a Chain of transformations (somewhat similar to torchvision.transforms.Compose for images), which lets us combine several transformations into a single pipeline.

from gluonts.time_feature import (
    time_features_from_frequency_str,
    TimeFeature,
    get_lags_for_frequency,
)
from gluonts.dataset.field_names import FieldName
from gluonts.transform import (
    AddAgeFeature,
    AddObservedValuesIndicator,
    AddTimeFeatures,
    AsNumpyArray,
    Chain,
    ExpectedNumInstanceSampler,
    InstanceSplitter,
    RemoveFields,
    SelectFields,
    SetField,
    TestSplitSampler,
    Transformation,
    ValidationSplitSampler,
    VstackFeatures,
    RenameFields,
)
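Conceptually, a chain just applies each transformation to the output of the previous one. The sketch below is a hypothetical, dependency-free analog that works on a single dict entry; it is not the GluonTS API (GluonTS transformations are applied to whole datasets via .apply(data, is_train=...), as the dataloader code further down shows).

```python
from functools import reduce


class ToyChain:
    """Hypothetical analog of a transformation chain: apply steps left to right."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __call__(self, entry):
        # Feed the output of each step into the next one.
        return reduce(lambda acc, step: step(acc), self.steps, entry)


def drop_field(name):
    """Return a step that removes `name` from an entry (similar in spirit to RemoveFields)."""
    def step(entry):
        return {k: v for k, v in entry.items() if k != name}
    return step


def rename_field(old, new):
    """Return a step that renames key `old` to `new` (similar in spirit to RenameFields)."""
    def step(entry):
        entry = dict(entry)
        entry[new] = entry.pop(old)
        return entry
    return step


pipeline = ToyChain([drop_field("feat_dynamic_real"), rename_field("target", "values")])
out = pipeline({"start": "1979-01", "target": [1.0, 2.0], "feat_dynamic_real": None})
print(out)  # {'start': '1979-01', 'values': [1.0, 2.0]}
```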

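The transformations below are annotated with comments explaining what each step does. At a high level, we iterate over the individual time series of the dataset and add or remove fields or features: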

from transformers import PretrainedConfig

def create_transformation(freq: str, config: PretrainedConfig) -> Transformation:
    remove_field_names = []
    if config.num_static_real_features == 0:
        remove_field_names.append(FieldName.FEAT_STATIC_REAL)
    if config.num_dynamic_real_features == 0:
        remove_field_names.append(FieldName.FEAT_DYNAMIC_REAL)
    if config.num_static_categorical_features == 0:
        remove_field_names.append(FieldName.FEAT_STATIC_CAT)

    # a bit like torchvision.transforms.Compose
    return Chain(
        # step 1: remove static/dynamic fields if not specified
        [RemoveFields(field_names=remove_field_names)]
        # step 2: convert the data to NumPy (potentially not needed)
        + (
            [
                AsNumpyArray(
                    field=FieldName.FEAT_STATIC_CAT,
                    expected_ndim=1,
                    dtype=int,
                )
            ]
            if config.num_static_categorical_features > 0
            else []
        )
        + (
            [
                AsNumpyArray(
                    field=FieldName.FEAT_STATIC_REAL,
                    expected_ndim=1,
                )
            ]
            if config.num_static_real_features > 0
            else []
        )
        + [
            AsNumpyArray(
                field=FieldName.TARGET,
                # we expect an extra dim for the multivariate case:
                expected_ndim=1 if config.input_size == 1 else 2,
            ),
            # step 3: handle the NaN's by filling in the target with zero
            # and return the mask (which is in the observed values)
            # true for observed values, false for nan's
            # the decoder uses this mask (no loss is incurred for unobserved values)
            # see loss_weights inside the xxxForPrediction model
            AddObservedValuesIndicator(
                target_field=FieldName.TARGET,
                output_field=FieldName.OBSERVED_VALUES,
            ),
            # step 4: add temporal features based on freq of the dataset
            # month of year in the case when freq="M"
            # these serve as positional encodings
            AddTimeFeatures(
                start_field=FieldName.START,
                target_field=FieldName.TARGET,
                output_field=FieldName.FEAT_TIME,
                time_features=time_features_from_frequency_str(freq),
                pred_length=config.prediction_length,
            ),
            # step 5: add another temporal feature (just a single number)
            # tells the model where in its life the value of the time series is,
            # sort of a running counter
            AddAgeFeature(
                target_field=FieldName.TARGET,
                output_field=FieldName.FEAT_AGE,
                pred_length=config.prediction_length,
                log_scale=True,
            ),
            # step 6: vertically stack all the temporal features into the key FEAT_TIME
            VstackFeatures(
                output_field=FieldName.FEAT_TIME,
                input_fields=[FieldName.FEAT_TIME, FieldName.FEAT_AGE]
                + (
                    [FieldName.FEAT_DYNAMIC_REAL]
                    if config.num_dynamic_real_features > 0
                    else []
                ),
            ),
            # step 7: rename to match HuggingFace names
            RenameFields(
                mapping={
                    FieldName.FEAT_STATIC_CAT: "static_categorical_features",
                    FieldName.FEAT_STATIC_REAL: "static_real_features",
                    FieldName.FEAT_TIME: "time_features",
                    FieldName.TARGET: "values",
                    FieldName.OBSERVED_VALUES: "observed_mask",
                }
            ),
        ]
    )
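Steps 5 and 6 can be mimicked in plain Python to see what ends up under time_features. The sketch assumes that the log-scaled age feature is log10(2 + t) for t = 0, 1, ...; this matches our reading of GluonTS's AddAgeFeature, but verify it against your installed version. The month row uses illustrative values. Stacking gives one row per feature, (num_features, length); in the batch printed further down, the time axis comes first ([256, 85, 2]).

```python
import math


def age_feature(length, log_scale=True):
    """Running-counter feature for positions 0..length-1 (assumed log10(2 + t) when log_scale)."""
    if log_scale:
        return [math.log10(2.0 + t) for t in range(length)]
    return [float(t) for t in range(length)]


def vstack_features(*rows):
    """Stack equal-length feature rows into a (num_features, length) nested list."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all feature rows must have the same length")
    return [list(row) for row in rows]


month_row = [-0.5, -0.409, -0.318, -0.227]  # illustrative month-of-year values for 4 steps
time_features = vstack_features(month_row, age_feature(4))
print(len(time_features), len(time_features[0]))  # 2 4
```

The log scale keeps the age feature small and slowly growing, so very long series do not produce large input values.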

InstanceSplitter

For training, validation and testing, we next create an InstanceSplitter, which is used to sample windows from the dataset (because, as noted earlier, time and memory constraints prevent us from passing the entire history of values to the Transformer).

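The instance splitter randomly samples windows of size context_length, together with the subsequent windows of size prediction_length, from the data, and adds a past_ or future_ prefix to every temporal key for the respective window. This ensures that values is split into past_values and the subsequent future_values, which serve as the encoder and decoder inputs respectively. The past window is also extended by max(lags_sequence) steps so that the lag features can be computed. The same happens to every key listed in the time_series_fields argument: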

from gluonts.transform.sampler import InstanceSampler
from typing import Optional

def create_instance_splitter(
    config: PretrainedConfig,
    mode: str,
    train_sampler: Optional[InstanceSampler] = None,
    validation_sampler: Optional[InstanceSampler] = None,
) -> Transformation:
    assert mode in ["train", "validation", "test"]

    instance_sampler = {
        "train": train_sampler
        or ExpectedNumInstanceSampler(
            num_instances=1.0, min_future=config.prediction_length
        ),
        "validation": validation_sampler
        or ValidationSplitSampler(min_future=config.prediction_length),
        "test": TestSplitSampler(),
    }[mode]

    return InstanceSplitter(
        target_field="values",
        is_pad_field=FieldName.IS_PAD,
        start_field=FieldName.START,
        forecast_start_field=FieldName.FORECAST_START,
        instance_sampler=instance_sampler,
        past_length=config.context_length + max(config.lags_sequence),
        future_length=config.prediction_length,
        time_series_fields=["time_features", "observed_mask"],
    )
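Here is the window arithmetic behind past_length, using the numbers from this post: context_length = 48 (prediction_length * 2) plus max(lags_sequence) = 37 gives 85, which matches the past_values shape [256, 85] in the batch printed further down. The splitting helper is a hypothetical sketch; we assume zero left-padding plus an is_pad flag when a window starts before the beginning of a series, in the spirit of the is_pad_field passed to InstanceSplitter.

```python
def split_past_future(values, cut, past_length, future_length):
    """Split a series around index `cut` into (past, is_pad, future).

    Assumption for illustration: when fewer than past_length values precede `cut`,
    the past window is left-padded with 0.0 and is_pad marks padded slots with 1.0.
    """
    history = values[max(0, cut - past_length):cut]
    pad = past_length - len(history)
    past = [0.0] * pad + list(history)
    is_pad = [1.0] * pad + [0.0] * len(history)
    future = values[cut:cut + future_length]
    return past, is_pad, future


prediction_length = 24
context_length = prediction_length * 2  # 48, as in the config above
lags_sequence = [1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 23, 24, 25, 35, 36, 37]
past_length = context_length + max(lags_sequence)  # 48 + 37 = 85

series = [float(i) for i in range(100)]
past, is_pad, future = split_past_future(
    series, cut=60, past_length=past_length, future_length=prediction_length
)
print(past_length, len(past), int(sum(is_pad)), len(future))  # 85 85 25 24
```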

Create DataLoaders

Next, it's time to create the PyTorch DataLoaders, which let us iterate over batches of (input, output) pairs, in other words (past_values, future_values).

from typing import Iterable

import torch
from gluonts.itertools import Cached, Cyclic
from gluonts.dataset.loader import as_stacked_batches


def create_train_dataloader(
    config: PretrainedConfig,
    freq,
    data,
    batch_size: int,
    num_batches_per_epoch: int,
    shuffle_buffer_length: Optional[int] = None,
    cache_data: bool = True,
    **kwargs,
) -> Iterable:
    PREDICTION_INPUT_NAMES = [
        "past_time_features",
        "past_values",
        "past_observed_mask",
        "future_time_features",
    ]
    if config.num_static_categorical_features > 0:
        PREDICTION_INPUT_NAMES.append("static_categorical_features")

    if config.num_static_real_features > 0:
        PREDICTION_INPUT_NAMES.append("static_real_features")

    TRAINING_INPUT_NAMES = PREDICTION_INPUT_NAMES + [
        "future_values",
        "future_observed_mask",
    ]

    transformation = create_transformation(freq, config)
    transformed_data = transformation.apply(data, is_train=True)
    if cache_data:
        transformed_data = Cached(transformed_data)

    # we initialize a Training instance
    instance_splitter = create_instance_splitter(config, "train")

    # the instance splitter will sample a window of
    # context length + lags + prediction length (from the 366 possible transformed time series)
    # randomly from within the target time series and return an iterator.
    stream = Cyclic(transformed_data).stream()
    training_instances = instance_splitter.apply(
        stream, is_train=True
    )

    return as_stacked_batches(
        training_instances,
        batch_size=batch_size,
        shuffle_buffer_length=shuffle_buffer_length,
        field_names=TRAINING_INPUT_NAMES,
        output_type=torch.tensor,
        num_batches_per_epoch=num_batches_per_epoch,
    )

def create_test_dataloader(
    config: PretrainedConfig,
    freq,
    data,
    batch_size: int,
    **kwargs,
):
    PREDICTION_INPUT_NAMES = [
        "past_time_features",
        "past_values",
        "past_observed_mask",
        "future_time_features",
    ]
    if config.num_static_categorical_features > 0:
        PREDICTION_INPUT_NAMES.append("static_categorical_features")

    if config.num_static_real_features > 0:
        PREDICTION_INPUT_NAMES.append("static_real_features")

    transformation = create_transformation(freq, config)
    transformed_data = transformation.apply(data, is_train=False)

    # we create a Test Instance splitter which will sample the very last
    # context window seen during training only for the encoder.
    instance_sampler = create_instance_splitter(config, "test")

    # we apply the transformations in test mode
    testing_instances = instance_sampler.apply(transformed_data, is_train=False)

    return as_stacked_batches(
        testing_instances,
        batch_size=batch_size,
        output_type=torch.tensor,
        field_names=PREDICTION_INPUT_NAMES,
    )


train_dataloader = create_train_dataloader(
    config=config,
    freq=freq,
    data=train_dataset,
    batch_size=256,
    num_batches_per_epoch=100,
)

test_dataloader = create_test_dataloader(
    config=config,
    freq=freq,
    data=test_dataset,
    batch_size=64,
)

Let's check the first batch:

batch = next(iter(train_dataloader))
for k, v in batch.items():
    print(k, v.shape, v.type())

>>> past_time_features torch.Size([256, 85, 2]) torch.FloatTensor
    past_values torch.Size([256, 85]) torch.FloatTensor
    past_observed_mask torch.Size([256, 85]) torch.FloatTensor
    future_time_features torch.Size([256, 24, 2]) torch.FloatTensor
    static_categorical_features torch.Size([256, 1]) torch.LongTensor
    future_values torch.Size([256, 24]) torch.FloatTensor
    future_observed_mask torch.Size([256, 24]) torch.FloatTensor

As can be seen, we don't feed input_ids and attention_mask to the encoder (as we would for NLP models). Instead, we provide past_values, along with past_observed_mask, past_time_features, static_categorical_features and, when configured, static_real_features.

The decoder inputs consist of future_values, future_observed_mask and future_time_features. future_values can be seen as the equivalent of decoder_input_ids in NLP.

Forward Pass

Let's perform a single forward pass with the batch we just created:

# perform forward pass
outputs = model(
    past_values=batch["past_values"],
    past_time_features=batch["past_time_features"],
    past_observed_mask=batch["past_observed_mask"],
    static_categorical_features=batch["static_categorical_features"]
    if config.num_static_categorical_features > 0
    else None,
    static_real_features=batch["static_real_features"]
    if config.num_static_real_features > 0
    else None,
    future_values=batch["future_values"],
    future_time_features=batch["future_time_features"],
    future_observed_mask=batch["future_observed_mask"],
    output_hidden_states=True,
)

print("Loss:", outputs.loss.item())
>>> Loss: 9.069628715515137

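Note that the model returns a loss. This is possible because the decoder automatically shifts future_values one position to the right to obtain the labels, which allows the error between the predictions and the label values to be computed.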
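A conceptual sketch of what "shifting future_values one position to the right" means in teacher forcing: at decoder position t the input is the value from step t - 1 (the last context value for t = 0), and the label is future_values[t]. This illustrates the idea only; it is not the Transformers implementation.

```python
def teacher_forcing_pairs(past_values, future_values):
    """Conceptual sketch: decoder inputs are the targets shifted right by one step."""
    decoder_inputs = [past_values[-1]] + list(future_values[:-1])
    labels = list(future_values)
    return decoder_inputs, labels


inputs, labels = teacher_forcing_pairs([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(inputs, labels)  # [3.0, 4.0, 5.0] [4.0, 5.0, 6.0]
```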

Also note that the decoder uses a causal mask so that it cannot look into the future, since the values it needs to predict are in the future_values tensor.
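The causal mask itself is simple to sketch: decoder position i may attend only to positions j <= i, and disallowed attention scores are set to negative infinity before the softmax so that they receive zero weight. Both helpers are illustrative, not library code.

```python
import math


def causal_mask(size):
    """mask[i][j] is True when decoder position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(size)] for i in range(size)]


def masked_softmax(scores, allowed):
    """Softmax over one row of attention scores, giving zero weight where not allowed."""
    masked = [s if ok else float("-inf") for s, ok in zip(scores, allowed)]
    top = max(masked)  # the diagonal entry is always allowed, so this is finite
    exps = [math.exp(s - top) for s in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


mask = causal_mask(3)
print(masked_softmax([1.0, 2.0, 3.0], mask[0]))  # [1.0, 0.0, 0.0]
```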

Train the Model

It's time to train the model! We'll use a standard PyTorch training loop.

Here we use the Accelerate library, which automatically places the model, optimizer and dataloader on the appropriate device.

from accelerate import Accelerator
from torch.optim import AdamW

accelerator = Accelerator()
device = accelerator.device

model.to(device)
optimizer = AdamW(model.parameters(), lr=6e-4, betas=(0.9, 0.95), weight_decay=1e-1)

model, optimizer, train_dataloader = accelerator.prepare(
    model,
    optimizer,
    train_dataloader,
)

model.train()
for epoch in range(40):
    for idx, batch in enumerate(train_dataloader):
        optimizer.zero_grad()
        outputs = model(
            static_categorical_features=batch["static_categorical_features"].to(device)
            if config.num_static_categorical_features > 0
            else None,
            static_real_features=batch["static_real_features"].to(device)
            if config.num_static_real_features > 0
            else None,
            past_time_features=batch["past_time_features"].to(device),
            past_values=batch["past_values"].to(device),
            future_time_features=batch["future_time_features"].to(device),
            future_values=batch["future_values"].to(device),
            past_observed_mask=batch["past_observed_mask"].to(device),
            future_observed_mask=batch["future_observed_mask"].to(device),
        )
        loss = outputs.loss

        # Backpropagation
        accelerator.backward(loss)
        optimizer.step()

        if idx % 100 == 0:
            print(loss.item())

Inference

At inference time, it is recommended to use the generate() method for autoregressive generation, similar to NLP models.

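Forecasting involves getting data from the test instance sampler, which takes the last window of each time series in the dataset (the final context_length values, plus the extra history needed for the lags) and passes it to the model. Note that we also pass future_time_features, which are known ahead of time, to the decoder.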

The model autoregressively samples a certain number of values from the predicted distribution and feeds them back into the decoder to produce the forecast:

model.eval()

forecasts = []

for batch in test_dataloader:
    outputs = model.generate(
        static_categorical_features=batch["static_categorical_features"].to(device)
        if config.num_static_categorical_features > 0
        else None,
        static_real_features=batch["static_real_features"].to(device)
        if config.num_static_real_features > 0
        else None,
        past_time_features=batch["past_time_features"].to(device),
        past_values=batch["past_values"].to(device),
        future_time_features=batch["future_time_features"].to(device),
        past_observed_mask=batch["past_observed_mask"].to(device),
    )
    forecasts.append(outputs.sequences.cpu().numpy())

The model outputs a tensor of shape (batch_size, number of samples, prediction length).

In this case, for each of the 64 examples in the batch, we get 100 possible values for the next 24 months:

forecasts[0].shape
>>> (64, 100, 24)

We stack them vertically to get forecasts for all time series in the test dataset:

forecasts = np.vstack(forecasts)
print(forecasts.shape)

>>> (366, 100, 24)
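The final shape follows from simple batch arithmetic, assuming (as the printed (366, 100, 24) shape indicates) that the test dataloader yields one window per series and keeps the last, partial batch. np.vstack then concatenates the six batches along the first axis.

```python
import math

num_series = 366  # one test window per time series
batch_size = 64   # as passed to create_test_dataloader
num_batches = math.ceil(num_series / batch_size)
batch_sizes = [min(batch_size, num_series - i * batch_size) for i in range(num_batches)]
print(num_batches, batch_sizes)  # 6 [64, 64, 64, 64, 64, 46]
```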

We can evaluate the resulting forecasts against the ground-truth out-of-sample values in the test set. Here we use the MASE and sMAPE metrics, computed for each time series in the dataset:

from evaluate import load
from gluonts.time_feature import get_seasonality

mase_metric = load("evaluate-metric/mase")
smape_metric = load("evaluate-metric/smape")

forecast_median = np.median(forecasts, 1)

mase_metrics = []
smape_metrics = []
for item_id, ts in enumerate(test_dataset):
    training_data = ts["target"][:-prediction_length]
    ground_truth = ts["target"][-prediction_length:]
    mase = mase_metric.compute(
        predictions=forecast_median[item_id],
        references=np.array(ground_truth),
        training=np.array(training_data),
        periodicity=get_seasonality(freq))
    mase_metrics.append(mase["mase"])

    smape = smape_metric.compute(
        predictions=forecast_median[item_id],
        references=np.array(ground_truth),
    )
    smape_metrics.append(smape["smape"])

print(f"MASE: {np.mean(mase_metrics)}")
>>> MASE: 1.2564196892177717
print(f"sMAPE: {np.mean(smape_metrics)}")
>>> sMAPE: 0.1609541520852549
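For reference, here are the textbook definitions of the two metrics as a plain-Python sketch. We assume evaluate's mase and smape follow these definitions (sMAPE as a fraction in [0, 2], not a percentage) and that get_seasonality(freq) returns 12 for monthly data; check the metric cards if exact agreement matters. Read this way, a MASE above 1 means the forecast's mean absolute error is larger than the in-sample error of a seasonal naive forecast.

```python
def mase(predictions, references, training, periodicity):
    """Forecast MAE divided by the in-sample MAE of a seasonal naive forecast."""
    mae = sum(abs(p - r) for p, r in zip(predictions, references)) / len(references)
    naive_errors = [
        abs(training[t] - training[t - periodicity])
        for t in range(periodicity, len(training))
    ]
    return mae / (sum(naive_errors) / len(naive_errors))


def smape(predictions, references):
    """Symmetric MAPE as a fraction in [0, 2]; undefined when both values are 0."""
    terms = [
        2.0 * abs(p - r) / (abs(p) + abs(r))
        for p, r in zip(predictions, references)
    ]
    return sum(terms) / len(terms)


print(mase([5.0, 6.0], [5.0, 7.0], training=[1.0, 2.0, 3.0, 4.0], periodicity=1))  # 0.5
print(round(smape([5.0, 6.0], [5.0, 7.0]), 4))  # 0.0769
```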

We can also plot the individual metrics of each time series in the dataset and observe that a handful of time series contribute a lot to the final test metric:

plt.scatter(mase_metrics, smape_metrics, alpha=0.3)
plt.xlabel("MASE")
plt.ylabel("sMAPE")
plt.show()

To plot the forecast for any time series against the ground-truth test data, we define the following helper function:

import matplotlib.dates as mdates

def plot(ts_index):
    fig, ax = plt.subplots()

    index = pd.period_range(
        start=test_dataset[ts_index][FieldName.START],
        periods=len(test_dataset[ts_index][FieldName.TARGET]),
        freq=freq,
    ).to_timestamp()

    # Major ticks every half year, minor ticks every month,
    ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))
    ax.xaxis.set_minor_locator(mdates.MonthLocator())

    ax.plot(
        index[-2*prediction_length:],
        test_dataset[ts_index]["target"][-2*prediction_length:],
        label="actual",
    )

    plt.plot(
        index[-prediction_length:],
        np.median(forecasts[ts_index], axis=0),
        label="median",
    )

    plt.fill_between(
        index[-prediction_length:],
        forecasts[ts_index].mean(0) - forecasts[ts_index].std(axis=0),
        forecasts[ts_index].mean(0) + forecasts[ts_index].std(axis=0),
        alpha=0.3,
        interpolate=True,
        label="+/- 1-std",
    )
    plt.legend()
    plt.show()
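Note that the helper draws the median as the line but builds the shaded band from the mean plus or minus one standard deviation, so for skewed samples the band need not be centered on the median. The band computation can be sketched without NumPy; statistics.pstdev is used because np.std defaults to the population standard deviation (ddof=0). The sample paths below are made up.

```python
import statistics


def mean_std_band(sample_paths):
    """Per-step (lower, upper) = mean -/+ population std across sample paths."""
    horizon = len(sample_paths[0])
    lower, upper = [], []
    for t in range(horizon):
        column = [path[t] for path in sample_paths]
        mu = statistics.fmean(column)
        sd = statistics.pstdev(column)  # ddof=0, matching NumPy's default np.std
        lower.append(mu - sd)
        upper.append(mu + sd)
    return lower, upper


lower, upper = mean_std_band([[1.0, 10.0], [3.0, 10.0]])
print(lower, upper)  # [1.0, 10.0] [3.0, 10.0]
```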

Conclusion

As time series researchers will know, there has been a lot of interest in applying Transformer-based models to time series problems. The vanilla Transformer is just one of many attention-based models, so there is a need to add more models to the library.

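Nothing stops us from modeling multivariate time series as well, but doing so requires instantiating the model with a multivariate distribution head. Diagonal independent distributions are currently supported, and other multivariate distributions will be added. Stay tuned for future blog posts and tutorials.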
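To illustrate what "diagonal independent" means: the joint density factorizes across dimensions, so the joint log-likelihood is the sum of the per-dimension log-likelihoods and no correlation between dimensions is modeled. The sketch uses Gaussians for brevity; that is an assumption for illustration, not a statement about which distribution the library uses per dimension.

```python
import math


def gaussian_log_prob(x, mean, std):
    """Log-density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def diagonal_log_prob(xs, means, stds):
    """Independent (diagonal) multivariate density: sum of per-dimension log-densities."""
    return sum(gaussian_log_prob(x, m, s) for x, m, s in zip(xs, means, stds))


print(round(diagonal_log_prob([0.0, 0.0], [0.0, 0.0], [1.0, 1.0]), 4))  # -1.8379
```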

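Finally, NLP and computer vision have benefited enormously from large pretrained models, whereas, as far as we are aware, this is not yet the case for time series. Transformer-based models seem like a natural choice for pursuing this line of research, and we can't wait to see what breakthroughs researchers and practitioners come up with!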

Source: https://huggingface.co/blog/time-series-transformers


Hands-on: Probabilistic Time Series Forecasting with Transformers