We present a fully Bayesian approach to the data association problem. Our model is capable of correctly modelling data generated from multiple processes simultaneously solving the association problem of the observed data and the induced supervised learning problems. Underpinning our approach is the use of Gaussian process priors to encode the structure of both the data and the data associations. We present an efficient learning scheme based on doubly stochastic variational inference and discuss how it can be applied to deep Gaussian process priors.

We present a fully Bayesian approach to the data association problem.

Our model is capable of correctly modelling data generated from multiple processes simultaneously solving the association problem of the observed data and the induced supervised learning problems.

Underpinning our approach is the use of Gaussian process priors to encode the structure of both the data and the data associations.

We present an efficient learning scheme based on doubly stochastic variational inference and discuss how it can be applied to deep Gaussian process priors.

\end{abstract}

\section{Introduction}

\label{sec:introduction}

Many real-world modelling tasks can be cathegorized by multiple operational regimes. As an example, consider a model describing the lift resulting from airflow around a wing profile as a function of attack angle. At a low angle the lift increases linearly with attack angle until the wing stalls and the characteristic of the airflow fundamentally changes. Building a truthful model of such data requires learning two separate models and correctly associating the observed data to each of the dynamical regimes. A similar example would be if our sensors that measures the lift are faulty in a manner such that we either get a correct reading or a noisy one. Estimating a model in this scenario is often referred to as a \emph{data association problem}~\parencite{Bar-Shalom:1987, Cox93areview}, where we consider the data to have been generated by a mixture of processes and we are interested in factorising the data into these components.

Many real-world modelling tasks can be categorized by multiple operational regimes.

As an example, consider a model describing the lift resulting from airflow around a wing profile as a function of attack angle.

At a low angle the lift increases linearly with attack angle until the wing stalls and the characteristic of the airflow fundamentally changes.

Building a truthful model of such data requires learning two separate models and correctly associating the observed data to each of the dynamical regimes.

A similar example would be if our sensors that measures the lift are faulty in a manner such that we either get a correct reading or a noisy one.

Estimating a model in this scenario is often referred to as a \emph{data association problem}~\parencite{Bar-Shalom:1987, Cox93areview}, where we consider the data to have been generated by a mixture of processes and we are interested in factorising the data into these components.

\todo[inline]{I think this way of actually saying that the two problems are just the same will cut some introduction, what do you think?}