Time as a Machine Learning Feature

A method to represent cyclic continuous data

Many kinds of data naturally follow cycles, for example longitude, rotation, tides, or the positions and movements of astronomical objects. The way we measure time is one of the most ubiquitous applications of cyclic data.

But is the way we usually handle and display time values also suitable for conveying cyclicality to a machine learning model?

In short: no.

The reason is that "cyclicity" is only an implicit feature of time. We learned to cope with it in childhood, so we intuitively understand the distance between two neighbouring points in time on the cyclic scale. A machine learning algorithm that uses a linearly scaled time feature, e.g. 24h time, will interpret some of these distances wrongly.

On a 24h clock, 23:00 and 03:00 are only 4h apart across midnight, but treated as plain numbers, 23 and 3 differ by 20h. To prevent this kind of bias and to introduce cyclic behaviour into our machine learning models, representing time as sine-cosine value pairs has become established.
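The midnight problem can be made concrete with a few lines of Python. This is only a minimal sketch: the helper names `linear_gap_hours` and `circular_gap_hours` are illustrative and not part of the original code.

```python
def linear_gap_hours(a, b):
    """Gap between two hours as seen on a plain 0-24 number line."""
    return abs(a - b)


def circular_gap_hours(a, b, period=24):
    """Shortest gap between two hours when the scale wraps after `period`."""
    diff = abs(a - b) % period
    return min(diff, period - diff)


linear_gap = linear_gap_hours(23, 3)      # what a model sees on a linear feature
circular_gap = circular_gap_hours(23, 3)  # what we intuitively mean

print(linear_gap, circular_gap)  # prints: 20 4
```

A model that only receives the raw hour has no way of knowing that the second number is the meaningful one.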

The following Python code shows how time can be encoded as sine-cosine value pairs.

First we import the usual libraries, set up the plotting style, and generate some random points in time. The import list looks long, but only numpy, pandas and matplotlib.pyplot are strictly required.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

colors = ["windows blue", "amber", "greyish", "faded green", "dusty purple"]
sns.palplot(sns.xkcd_palette(colors))

plt.rcParams["figure.figsize"] = (20,10)
sns.set()

SECONDS_IN_DAY = 24 * 60 * 60  # 86,400 seconds in a 24h day
N_TIMES = 2500

def gimme_times(n):
    rnd_sec = np.random.randint(0, SECONDS_IN_DAY, n)
    return pd.Series(rnd_sec, name="Seconds").sort_values().reset_index(drop=True)


times = gimme_times(N_TIMES)
times.head()

The code above produces output similar to the following table.

I  Seconds
0  13
1  25
2  34
3  64
4  71
Name: Seconds, dtype: int64

Plotting this data only shows a linearly increasing line. Nothing in it indicates that two times on either side of midnight can be closer to each other than two times within the same day.

Time Features: linear plot of times without expected midnight gap.

Sine-Cosine Transformation of Time

To interpret time intervals correctly at the transition between two days, and thus to represent the cyclic characteristic correctly, we represent times as sine/cosine value tuples. The following code creates a pandas DataFrame from the times series generated above.

def sine_cosine_transform(times):
    # Map seconds since midnight onto one full turn of the unit circle.
    sine_time = np.sin(2 * np.pi * times / SECONDS_IN_DAY)
    cosine_time = np.cos(2 * np.pi * times / SECONDS_IN_DAY)

    return pd.DataFrame({"sine_time": sine_time,
                         "cosine_time": cosine_time
                        })

cyclic_times = sine_cosine_transform(times)
cyclic_times.head()

Its output will look similar to this table.

I  sine_time  cosine_time
0  0.000070   1.000000
1  0.000209   1.000000
2  0.003072   0.999995
3  0.004468   0.999990
4  0.004887   0.999988

Plotting the calculated sine and cosine values yields the well-known wave shapes.

cyclic_times.sine_time.plot()
cyclic_times.cosine_time.plot()
Time Features: Sine and cosine plot of time points.

Be aware that you cannot use a single value alone, i.e. only the sine or only the cosine, to represent time. Each curve is symmetric: a horizontal line drawn across the plot generally touches the graph at two points, which means that two non-adjacent times would receive the same value.
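The ambiguity of a single component can be checked directly. The sketch below assumes a day of 86,400 seconds and uses a hypothetical helper `encode` that is not part of the original code. It compares 03:00 and 09:00, two times that are six hours apart.

```python
import numpy as np

SECONDS_IN_DAY = 24 * 60 * 60  # assumption: a plain 24h day


def encode(hour):
    """Return the (sine, cosine) pair for a full hour of the day."""
    angle = 2 * np.pi * (hour * 3600) / SECONDS_IN_DAY
    return np.sin(angle), np.cos(angle)


sin_3, cos_3 = encode(3)
sin_9, cos_9 = encode(9)

same_sine = np.isclose(sin_3, sin_9)      # sine alone cannot tell them apart
same_cosine = np.isclose(cos_3, cos_9)    # the cosine differs

print(same_sine, same_cosine)  # prints: True False
```

Both times share the same sine value, and only the cosine separates them. The roles are reversed for pairs such as 03:00 and 21:00.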

Using sine_time and cosine_time together removes this ambiguity and introduces the cyclic properties correctly as a machine learning feature. The following graph of a randomly selected sample of sine-cosine time points illustrates the properties of the previous feature transformations.

cyclic_times.sample(75).plot.scatter('sine_time','cosine_time').set_aspect('equal');
Time Features: Circular plot of sine-cosine time points.
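Two properties of the circular layout can be verified numerically: equal time gaps give equal distances wherever they fall in the day, and every point maps back to exactly one time. The following is a sketch under the assumption of an 86,400-second day. The helpers `encode_seconds` and `decode_seconds` are illustrative names, not part of the original code.

```python
import numpy as np

SECONDS_IN_DAY = 24 * 60 * 60  # assumption: a plain 24h day


def encode_seconds(seconds):
    """Map seconds since midnight to (sine, cosine) points on the unit circle."""
    angle = 2 * np.pi * np.asarray(seconds, dtype=float) / SECONDS_IN_DAY
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)


def decode_seconds(points):
    """Recover seconds since midnight from (sine, cosine) points."""
    angle = np.arctan2(points[..., 0], points[..., 1])  # arctan2(sin, cos)
    return (angle % (2 * np.pi)) / (2 * np.pi) * SECONDS_IN_DAY


hours = np.array([23, 1, 11, 13])
points = encode_seconds(hours * 3600)

# A 2h gap across midnight and a 2h gap around noon are equally far apart.
across_midnight = np.linalg.norm(points[0] - points[1])
across_noon = np.linalg.norm(points[2] - points[3])
print(np.isclose(across_midnight, across_noon))

# Each (sine, cosine) pair decodes back to the time it came from.
print(np.allclose(decode_seconds(points), hours * 3600))
```

Note that the Euclidean distance between two encoded points is the chord of the circle, not the arc, so it grows monotonically with the time gap up to 12h but not linearly.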

Have fun evaluating the effects of your transformed cyclic values on your machine learning models.

Kind regards,
Henrik Hain

This article was first published on:
https://henrikhain.io/post/time-as-a-machine-learning-feature/
