Geert Pingen @gpgn

Cloud & Machine Learning Engineer, Research Scientist
Causal Discovery
Machine Learning


Robotic platforms are employed for tasks that are too dangerous, costly, or simply too repetitive for humans to perform. A degree of autonomy is desirable to reduce the effort required from human operators to monitor and manage the system.

These autonomous systems can be deployed in constrained environments – as is the case for static robotic arms, or track-guided warehouse robots – or in unstructured open world environments which can contain unknowns. In the latter case, an autonomous system will encounter changes in the environment, and possibly in the performance of its own capabilities. In order to still operate effectively and safely under these dynamics, it must have the ability to assess its own performance and adapt itself in response to changing circumstances. Such systems are referred to as Self-Adaptive Systems (SAS).

Self-adaptivity typically involves some form of feedback loop to allow the system to react to external stimuli. Many approaches to engineering SAS adhere to the processes defined in the MAPE-K control model: Monitor, Analyse, Plan, Execute, using shared Knowledge.

Figure 1. MAPE-K control loop.

The ability to model the environment under uncertainty is critical to successful autonomous operation. Recently, machine learning (ML) methods, and especially deep-learning based models, have been used to support or complement processes in the MAPE-K control loop, for example in data pre-processing, world modeling, or policy optimization.

However, while these models have shown great performance in tasks ranging from recommendation to visual object detection and language modeling, they are intrinsically obfuscated, and require another layer of models to provide global or local interpretability – spawning the field of Explainable AI (XAI). In addition, they have difficulty in generalizing to novel situations or tasks, and in reacting to shifts in data distribution. It is not straightforward to reuse learned representations.

Causal models, in contrast, are less of a black box, and by default more illustrative to a human operator. A causal model captures the causal mechanism underlying observational data in a Directed Acyclic Graph (DAG), with nodes representing the variables and directed edges representing causal properties such as direction, strength, and time-lag. Although causal models can be generated from purely observational data, it is difficult to ascribe strong causal meaning to them due to possible (unobserved) confounding effects in the data. Many causal discovery methods based solely on observational data therefore rely on strong assumptions (e.g. the existence of a set of variables that, when measured, remove the bias introduced by confounders) or establish bounds for the treatment effect.

If an autonomous system is able to actively intervene in a situation, possibly altering the sensor data distribution of a target variable, it can attempt to account for confounders. Classically, the Randomized Controlled Trial (RCT) is the gold standard for establishing causal relations. Through randomization, confounding effects on the target variable from any known or unknown causes can be mitigated, so that the intervention is the only systematic cause of the observed effect.

Figure 2. Example of a DAG for a math camp example. Image taken from

Interventional data improves model robustness and generalizability compared to purely statistical methods such as ML. Real-world interventions, however, are typically expensive, and can even be infeasible operations in some domains. Simulations can circumvent these issues, but often struggle to realistically model the physical processes present in the real world, thus limiting their use. Therefore, one can combine observational and interventional data, thus compounding the statistical advantage of a large set of observational data with the causally stronger nature of interventions, if only a limited set.

Although there exists a plethora of studies into both self-adaptive systems and causal discovery, many consider only simulated data or systems. Even fewer investigate systems operating in real-life open-world scenarios. In this project, carried out during my time as a research scientist at TNO, some amazing colleagues and I set out to investigate the applicability of causal discovery methods for explainable self-assessment of autonomous systems operating in an open world, and integrated our solution into a continuous causal self-assessment pipeline on an operational autonomous system.

Specifically, we considered causal discovery methods for time-series data able to handle lagged correlations. Starting out with an initial causal structure derived from observational data, the system is able to introduce interventions to validate and update its internal causal world model. The goal here is not to establish a complete world model, or find the true causal model (if such a thing exists). As the system has a limited set of sensors and actuators, this is a futile task.

We are interested instead in producing a causal model that is robust to external changes and can generalize well to unseen environments, rather than one that captures world dynamics as accurately as possible. The model needs to be complete enough to generate effective adaptations against changes in the environment (reducing the number of human interventions), and decisions should be sufficiently explainable to a human operator.


We find many interpretations and frameworks of causality in literature, the most prominent of which are:

  • Pearl’s graphical causality, which is defined in terms of (perfect) interventions, by applying Structural Causal Models (SCMs) (or Structural Equation Models (SEMs)) and do-calculus.
  • Rubin’s Potential Outcomes framework which, while formally equivalent, does not deal with graphical representations but instead compares different observed outcomes and relates these to the causal model.
  • Wiener-Granger causality, which deals with time-series data, and only considers the predictive power of one time-series on the other, thus being an extremely weak notion of causality.

SCMs are a way of modeling causality that takes into account possible confounding effects and feedback loops. An SCM is a tuple $M := (I, J, X, E, f, P_{\epsilon})$ with $I$, a finite set of endogenous variables; $J$, a finite set of exogenous variables; $X = \Pi_{i\in I} X_{i}$, a product of standard measurable spaces; $E = \Pi_{j\in J} E_{j}$, a product of standard measurable spaces; $f : X \times E \rightarrow X$, the causal mechanism; and $P_{\epsilon} = \Pi_{j\in J} P_{\epsilon j}$ on $E$, the exogenous distribution. This allows for a straightforward definition of interventions, as used in do-calculus. With $do(X_{I} = \Xi_{I})$, the perfect intervention that sets $X_{I}$ to the value $\Xi_{I}$ (where, in a slight abuse of notation, $I$ now denotes the subset of endogenous variables under intervention), the SCM changes from $M = (X, E, f, P_{\epsilon})$ to $M_{do(X_{I} = \Xi_{I})} = (X, E, \tilde{f}, P_{\epsilon})$ with $\tilde{f}_{i} = f_{i}(X, E)$ for $i \notin I$ and $\tilde{f}_{i} = \Xi_{i}$ for $i \in I$, i.e. the intervention overrides specifically the causal mechanisms that would set the values of the variables under intervention.
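As a minimal sketch of a perfect intervention, assume a toy acyclic SCM with three hypothetical variables (smoke, light, detection). The linear mechanisms and their coefficients are invented for illustration and do not come from the project. The do-operation replaces the mechanism of the intervened variable with a constant and leaves all other mechanisms untouched.

```python
def evaluate(mechanisms, order, noise, do=None):
    """Evaluate an acyclic SCM in causal order.

    mechanisms: name -> function(values_so_far, noise_term)
    order:      variable names, parents listed before children
    noise:      name -> exogenous noise value
    do:         name -> fixed value (perfect intervention), optional
    """
    do = do or {}
    values = {}
    for name in order:
        if name in do:
            # Perfect intervention: the original mechanism is overridden.
            values[name] = do[name]
        else:
            values[name] = mechanisms[name](values, noise[name])
    return values


# Hypothetical mechanisms: smoke dims the light and also hurts detection.
mechanisms = {
    "smoke": lambda v, e: e,
    "light": lambda v, e: 1.0 - 0.5 * v["smoke"] + e,
    "detection": lambda v, e: 0.6 * v["light"] - 0.3 * v["smoke"] + e,
}
order = ["smoke", "light", "detection"]
noise = {"smoke": 1.0, "light": 0.0, "detection": 0.0}

observed = evaluate(mechanisms, order, noise)
intervened = evaluate(mechanisms, order, noise, do={"light": 1.0})
print(observed)
print(intervened)
```

Under the intervention the smoke level is unchanged, but the light no longer depends on it, so the change in detection can be attributed to the light alone.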

In order to reason about causality, especially when taking a statistical approach, several assumptions are made. The most prominent are the Causal Markov Condition, the Faithfulness assumption, and the Causal Sufficiency assumption.

  • The Causal Markov Condition states that all relevant probabilistic information about a variable that can be obtained from the system is contained in its direct causes. In other words, conditional on the set of all its direct causes, a node in the causal graph is independent of all variables which are not its effects (descendants) or direct causes. This assumption is used to reason causally about graphical models.
  • The Faithfulness assumption states that conditional independencies found in measurements do not arise from coincidence, but from causal structure. In other words, if variables $X_{I}$ and $E_{I}$ are found to be uncorrelated, they are really independent, rather than connected by causal effects that happen to cancel out. This assumption is often made for constraint-based causal discovery methods.
  • The Causal Sufficiency assumption states that measured variables include all of the common causes (i.e. latent confounders are not present). This assumption simplifies the problem significantly, and is frequently made by score-based methods, but at the same time often violated in real-world scenarios.

There exists much prior work on estimating causal structure based on purely observational data. These methods can roughly be categorized into constraint-based approaches, score-based approaches, and hybrid approaches.

Constraint-based methods, like the classic PC, IC, and FCI algorithms, test for conditional independence between variables, iteratively removing edges in the causal graph. Score-based methods, such as Bayesian EG or Adaptive Lasso, search for the DAG that best describes the statistical dependence relation in the observed data. Hybrid approaches combine both methods in order to reduce computational load, generally by pruning the search space used in score-based methods through conditional independence tests. The two main methods we considered in this project were PCMCI and TCDF.

PCMCI is a method for causal discovery specifically designed for large scale time-series data including both linear and non-linear (potentially time-lagged) dependencies. It is based on the classical PC algorithm but modifies it to a time-based variant by also taking into account data-points for a certain time-window in the past. This is the first step in the algorithm, but might lead to false positives in the case of taking especially liberal time-windows, or when time-series are highly interdependent or auto-correlated. Therefore in the second step, momentary conditional independence (MCI) testing, false positives are reduced by applying conditional independence tests including only the parent nodes in the conditioning set. Selecting an appropriate time-window therefore is important – which is a general notion when working with causal models and time-series data.
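To make the role of the time window concrete, the following sketch scans lagged Pearson correlations between two series up to a maximum lag. This is only the unconditional first look at lagged dependence; it is not PCMCI, which additionally conditions on the parents of both variables. The function names and the toy series are assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlations(cause, effect, max_lag):
    """Correlate cause[t - lag] with effect[t] for lag = 1..max_lag."""
    return {
        lag: pearson(cause[:-lag], effect[lag:])
        for lag in range(1, max_lag + 1)
    }


# Toy data: the effect copies the cause with a delay of two time steps.
cause = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
effect = [0, 0] + cause[:-2]

correlations = lagged_correlations(cause, effect, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(best_lag, correlations)
```

Choosing `max_lag` too small hides the true dependence entirely, while choosing it too large multiplies the number of candidate links that later have to be tested and pruned.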

Figure 3. Causal discovery problem with time-series data. Image taken from Runge et al.

TCDF is a deep learning-based method for causal discovery. More specifically, it is based around Attention-based Convolutional Neural Networks (CNNs). To discover a causal structure in the supplied dataset, first, for each time-series in the dataset, a CNN is trained which aims to predict that time-series. All time-series in the dataset, including the target time-series, are provided as input to the CNN. After training, the internal parameters of the network are analyzed to determine which input time-series could be causes of the target time-series for that CNN. In short, this analysis relies on the attention mechanism to determine which input time-series the CNN focused on for the prediction of the target time-series. Afterwards, all candidate cause-effect relations are validated using TCDF’s Permutation Importance Validation Method (PIVM), which measures how much the prediction error of a target time-series increases if the values of a potential cause are randomly permuted. This step tests if the assumption of chronological ordering between cause and effect, i.e. cause must occur before effect, holds. In the last step the lag between cause and effect is determined through the analysis of kernel weights. All remaining cause-effect relations after validation are merged into a final causal graph.

Figure 4. Overview of TCDF framework. Image taken from Nauta et al.

Recently, Joint Causal Inference (JCI) has been proposed by Mooij et al. as a framework for causal discovery on data collected in multiple distinct contexts. This approach pools data from multiple contexts in order to discover conditional independencies that can only be found when joining the measurements from those contexts, rather than only testing for conditional independence within each context and later combining the results, as done in most constraint-based methods.

Other work focuses on combining ML methods with causal learning, for example to learn causal variables (which are currently often designed, knowingly or unknowingly, by system engineers, or are part of encoded neural-network representations); to improve Reinforcement Learning robustness using causal world- or agent-models; and to understand, through a causal lens, biases in deep-learning models resulting from design decisions in data modeling or task selection.

System Overview

The main objective of the project was to investigate the effectiveness of current temporal causal discovery methods for application in real-world scenarios involving large-scale, uncertain and unstructured data for self-assessment of autonomous systems.

The ROS2-based system used in this work sits on top of Spot, the quadruped robotic platform designed by Boston Dynamics, and is comprised of several hard- and software components such as cameras, 3D sensing, object detection models, and audio components in order to communicate with humans. The Spot platform was selected for its agility and robust navigation of heterogeneous terrain types.

The system was tasked with a search-and-rescue mission to locate and assist a number of victims in an indoor and outdoor environment. Initialized with some prior knowledge about its environment, such as a rough map of the building, it may encounter unforeseen changes. Environmental dynamics include changes in lighting; decreased visibility due to fog or smoke; loud noise; blocked passages; or terrain variability (both in height variation and material density).


The system was initially deployed in a hostile indoor environment, considered too dangerous for humans to enter. In this environment, it was tasked by a human operator either to search for a known number of humans believed present in the environment, or to exhaustively search the space for an unknown number of victims. Victims can be of any gender or age, and may be positioned in varying orientations (e.g. standing, sitting, lying down) and be partially occluded from view. Additionally, they may be incapacitated, either physically or mentally, and may not respond to spoken queries by the system.

When a victim is found, the system will query them with a specific name in order to keep track of which victim is found. If the victim is unresponsive, and the system cannot assist, it will mark the victim’s location and delegate further responsibility to a human operator. The system may also have to navigate outdoor terrain, for example when it encounters blocked passages.

Figure 5. System architecture.

Figure 5 gives a high-level schematic overview of the components in the system.

After receiving a task (with optionally known locations of victims), the Tactical Planner component generates a flexible Markov Decision Process (MDP) from the concepts stored in the knowledge base. The MDP is then passed to a Monte-Carlo Tree-search based planner that is able to handle partially observable MDPs as well (where states are defined by distributions of possible actual states). The outcome of this process is an action, such as Go left, Go right, or Go to door. This is transferred via the Platform Server, which is responsible for operations relating to the robot’s navigation and other motor control (such as turning on a flashlight), to the navigational stack which interfaces with the Spot API and manages SLAM processes.

Microphone and audio devices are connected to the Audio Server, which transcribes speech signals into a textual representation. This text is then parsed using an NLP method to localize victims in the vicinity of the system. In addition, multiple cameras are used to generate imagery data and point clouds, the latter being used to assess terrain state. Imagery data is used to detect victims using a task-specific object characterization algorithm. The Environmental Driver is used to interface with light, ambient noise, and gas sensors.

The Behaviour Server schedules the readings and commands to the four subsystems.

Finally, the Task Performance and Causal Reasoning components monitor the system’s external and internal state, and will generate an intervention when performance drops below the expected values required by the operator. The Causal Reasoning component specifically manages the causal world models established during operation.

The system is backed by ROS2’s DDS message bus implementation and a TypeDB graph database.

Causal reasoning

We'll zoom in a little on the causal reasoning component. It is initialized with a set of causal models generated previously using a causal discovery method (we can switch between methods, as long as each outputs a standardized DAG). On receiving a signal from the Task Performance component, and after an initial data filtering step to prune any missing values (which are common in real-world environments), the component will first determine which causal model is currently appropriate in the given context. This is done by calculating the Wasserstein distance between the sensor distributions of the context associated with each causal model and the incoming data-vector (current environmental state). After selecting an active causal model, it will determine an appropriate intervention.
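The context selection step can be sketched as follows, assuming each stored causal model is labelled with per-sensor samples and that the per-sensor one-dimensional Wasserstein distances are simply summed. The context names, sensor names, and the unweighted sum are assumptions for illustration; a real system would likely normalize sensors with different units first.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """First Wasserstein distance between two 1-D empirical samples.

    Computed as the area between the two empirical CDFs.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    all_values = sorted(u_sorted + v_sorted)
    distance = 0.0
    for left, right in zip(all_values, all_values[1:]):
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        distance += abs(cdf_u - cdf_v) * (right - left)
    return distance


def select_context(contexts, current):
    """Return the name of the stored context closest to the current readings."""
    def total_distance(name):
        return sum(
            wasserstein_1d(contexts[name][sensor], samples)
            for sensor, samples in current.items()
        )
    return min(contexts, key=total_distance)


# Hypothetical stored contexts, each tied to one causal model.
contexts = {
    "dark_corridor": {"light": [2, 3, 5, 4], "noise": [40, 42, 41, 43]},
    "sunlit_hall": {"light": [80, 90, 85, 95], "noise": [50, 52, 51, 49]},
}
current = {"light": [4, 6, 3], "noise": [41, 44, 42]}

active = select_context(contexts, current)
print(active)
```

Because the distance compares whole distributions rather than means alone, two contexts with the same average light level but different variability can still be told apart.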

The intervention is generated by comparing the performance required (or demanded) for the current task, as defined in the initial signal from the Task Performance component, with the maximum expected effect of applying an intervention, as predicted by the causal model. The effect of each intervention on the current target variable (relating to the task, e.g. detector frequency in the case of a visual search task) is determined by classic graph traversal, summing the contributions of all pathways by which the intervention and target variables are connected. The strength of every contributing pathway is the product of the path coefficients along that pathway. The resulting value, together with the intervention (proposed behaviour), is returned to the Task Performance component when the expected value exceeds the signaled required/demanded value.
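A minimal sketch of this path-coefficient computation is given below. It assumes the causal model is a weighted DAG stored as a nested dictionary and considers directed paths from the intervention to the target only. The variable names, coefficients, and the `propose_intervention` helper are hypothetical.

```python
def total_effect(graph, source, target):
    """Sum over all directed paths of the product of path coefficients."""
    if source == target:
        return 1.0
    return sum(
        coefficient * total_effect(graph, child, target)
        for child, coefficient in graph.get(source, {}).items()
    )


def propose_intervention(graph, interventions, target, required):
    """Return (intervention, expected effect) if it exceeds the requirement."""
    effects = {name: total_effect(graph, name, target) for name in interventions}
    best = max(effects, key=effects.get)
    if effects[best] > required:
        return best, effects[best]
    return None  # No intervention is expected to be sufficient.


# Hypothetical causal model: parent -> {child: path coefficient}.
graph = {
    "flashlight": {"illumination": 0.8, "glare": 0.2},
    "illumination": {"detection_rate": 0.5},
    "glare": {"detection_rate": -0.5},
    "crawl_mode": {"speed": -0.4},
    "speed": {"detection_rate": 0.1},
}

proposal = propose_intervention(
    graph, ["flashlight", "crawl_mode"], "detection_rate", required=0.2
)
print(proposal)
```

Here the flashlight reaches the detection rate through two pathways, one positive via illumination and one negative via glare, and the two contributions are added.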

The generated causal models are stored and continuously updated on-the-fly with a causal discovery framework using a graph embedding approach. The full pipeline is specified in Figure 6.

All sensor components push their data onto the ROS2 DDS bus. This sensor data is first cleaned and aggregated in a pre-processing step, where missing values are filtered out, and sensor values generated in a single time step are grouped together. Next, causal models are created by sliding a time window over the data and generating causal models for every window. This results in a large dataset of graphs. Note that the specific causal discovery method is pluggable in this setup, and allows drop-in replacements. In the match & merge step, we prune that large set of causal graphs by using a graph embedding approach to determine significant differences between causal models by calculating relative distances in vector space. This results in a lower number of causal models that hold in the world in which the autonomous system operates. These can be viewed as the observed rules of the world that the system has learned.
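The windowing and match & merge steps described above could look roughly like this. The sketch assumes that a causal graph is a dictionary of weighted edges, that the "embedding" is simply the flattened adjacency matrix, and that graphs are merged greedily when their Euclidean distance falls below a threshold. None of these choices is taken from the actual implementation, and the `discover` argument stands in for whichever causal discovery method is plugged in.

```python
import math


def sliding_windows(rows, size, step):
    """Split time-ordered rows into overlapping windows."""
    return [rows[i:i + size] for i in range(0, len(rows) - size + 1, step)]


def discover_per_window(rows, size, step, discover):
    """Apply a pluggable causal discovery function to every window."""
    return [discover(window) for window in sliding_windows(rows, size, step)]


def embed(graph, variables):
    """Flatten the weighted adjacency matrix into a vector."""
    return [graph.get((a, b), 0.0) for a in variables for b in variables]


def match_and_merge(graphs, variables, threshold):
    """Greedily keep one representative per group of similar graphs."""
    representatives = []
    for graph in graphs:
        vector = embed(graph, variables)
        for rep in representatives:
            if math.dist(vector, rep["vector"]) < threshold:
                rep["count"] += 1  # Similar enough: merge into this one.
                break
        else:
            representatives.append({"graph": graph, "vector": vector, "count": 1})
    return representatives


variables = ["light", "detection", "terrain", "speed"]
graphs = [
    {("light", "detection"): 0.80},
    {("light", "detection"): 0.75},
    {("terrain", "speed"): -0.60},
]
merged = match_and_merge(graphs, variables, threshold=0.2)
print([(rep["graph"], rep["count"]) for rep in merged])
```

The threshold controls how aggressively graphs are pruned: a larger value yields fewer, coarser world models.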

Next, in the context association step, they are labelled with the distribution of sensor values under which the causal model was found (in order to later retrieve them). As described in the previous section, the system will select an active causal model when generating suitable interventions. The full process can be executed offline (which we did to create a base set of causal models), and then run in a continuous online fashion in order to keep updating the system’s world model on the fly.

Causal models can also be merged into a single world model when taking into account contextual variables, by applying Mooij et al.’s work on causal models in multiple contexts. The proposed framework is designed to be lightweight and modular, so that new implementations can be plugged in in the future for the causal discovery, graph embedding, and context association steps.

Figure 6. Continuous causal reasoning framework.

Challenges & Lessons learned

With the previously described framework in place, we investigated the challenges associated with the application of causal discovery methods to two intervention scenarios related to the visual search task.

First, the expected causal structures for both scenarios were defined as a baseline. The first scenario involved various lighting conditions, for which it was expected that the image detector would fail to detect humans in certain conditions (e.g. strong back light, or dark conditions). The optimal intervention here is for the system to activate a flashlight in order to improve the lighting conditions and proceed with its task. As this is a relatively simple use-case, we expected that these causal relations would be captured by the causal discovery algorithm without major issues.

The second scenario involved varying terrain conditions (loose sand, grass, flat gravel, wood chips, obstacles, etc.). The expectation here is that terrain variability influences walking stability and speed, which in turn influences coverage. Additionally, there are several options to influence the system’s walking behaviour, such as adjusting its walking height (i.e. low on the ground or high on its feet) or turning on a slow crawling walking mode. The system is more stable on rough terrain when positioning itself high on its legs due to the extra time it has to position its feet relative to the ground. When positioned low on its legs, this time is much shorter, leading to the robot being unstable and in extreme cases falling over. However, on flat terrain the low-positioned crawl mode is more stable and costs less effort.

Various combinations were recorded to be used as a dataset for terrain type. The full dataset includes point cloud data, local grid maps, efforts in joint states, and average speed. Additional metrics, such as local omnivariance and surface variation, were calculated from the point cloud and local grid maps, based on work by Hackel et al.
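For reference, these two features are commonly defined from the eigenvalues of the covariance matrix of a point's local neighbourhood. Assuming eigenvalues ordered as $\lambda_{1} \geq \lambda_{2} \geq \lambda_{3} \geq 0$ (the symbols $O$ and $C$ are labels chosen here, and the exact neighbourhood definition used in the project is not specified):

```latex
\begin{align}
  O &= \sqrt[3]{\lambda_{1} \, \lambda_{2} \, \lambda_{3}}
      && \text{(omnivariance)} \\
  C &= \frac{\lambda_{3}}{\lambda_{1} + \lambda_{2} + \lambda_{3}}
      && \text{(surface variation)}
\end{align}
```

Both values are small for flat, planar neighbourhoods (where $\lambda_{3}$ is close to zero) and grow as the local surface becomes rougher, which is what makes them candidate indicators of terrain variability.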

After initial experimentation on small test cases taken from above-mentioned datasets, we found that the PCMCI method by Runge et al. and TCDF method by Nauta et al. were promising candidate causal discovery methods.

These methods are among the few that can handle large-scale time-series data and have a solid implementation. TCDF, being a DNN-based method, is more compute-intensive, however. Since the aim was to run all processes autonomously on the robotic platform, which is constrained for compute resources, we integrated the more lightweight candidate, PCMCI.

When doing these experiments, we identified a number of challenges thus far, which should be addressed in an operational framework:

  • Robustness. While there are many causal discovery methods available, there is a lack of results on real-life open-world data. Especially for autonomous systems, there is a need for a solid benchmark to indicate how well they transfer to novel situations, and whether they enable effective intervention selection in real-life scenarios.
  • Model accuracy. Our initial experiments still showed a significant degree of false positive relations, requiring significance thresholding to be effective. Causal graphs need to be compared to predefined true graphs using a given graph distance metric (e.g. edit distance, or more advanced methods with special attention for missed confounders).
  • Explainability. Causal models should be evaluated on their ability to explain the current state of the autonomous system to a human operator. Further, it would be interesting to expose examples from the match-and-merge step and allow an operator to select a model by hand based on those examples.
  • Bias. There exists an inherent bias in design/engineering of (autonomous) systems, simply by forcing a selection of variables for operational use through incorporating sensor components. This increases the chance of latent confounders.
  • Scalability. It should be clear what the performance impact is of incorporating additional signals into the framework, and what further steps can be taken to mitigate that impact (e.g. by more aggressive pruning in the merge step).
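As an illustration of the graph comparison mentioned under model accuracy, the sketch below computes a simple edit distance between a predefined true graph and a discovered graph, in the style of the structural Hamming distance: every missing, extra, or reversed edge counts as one edit. The edge lists are hypothetical and the metric ignores edge weights and lags.

```python
def structural_hamming_distance(true_edges, found_edges):
    """Count edge additions, deletions, and reversals between two DAGs."""
    true_edges, found_edges = set(true_edges), set(found_edges)
    extra = found_edges - true_edges
    missing = true_edges - found_edges
    # A reversed edge shows up once as extra and once as missing;
    # count it as a single edit instead of two.
    reversed_edges = {(a, b) for (a, b) in extra if (b, a) in missing}
    return len(extra) + len(missing) - len(reversed_edges)


# Hypothetical expected structure versus a discovered structure.
true_graph = [
    ("light", "detection"),
    ("terrain", "speed"),
    ("speed", "coverage"),
]
found_graph = [
    ("light", "detection"),   # correct
    ("speed", "terrain"),     # reversed
    ("noise", "detection"),   # false positive
]                             # ("speed", "coverage") is missing

distance = structural_hamming_distance(true_graph, found_graph)
print(distance)
```

A metric like this treats all errors equally; as noted above, a practical benchmark would probably want to weight missed confounders more heavily.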

Future Work

Although there exists a lot of research on self-adaptive systems and causal discovery, most remains theoretical or validated on small simulated datasets. We found that causal models provide a straightforward and explainable way of reasoning about performance for an autonomous system, allowing the learned representation to be shared with other systems and reused for different tasks. However, some degree of feature engineering is currently still required in order to obtain useful variables for causal analysis, especially for dynamic environments. Causal Petri nets look like an interesting way forward.

Additionally, a system should be able to make an assessment of both the risk and the informational gain of a given intervention, for example by applying causal Bayesian networks. Furthermore, the integration of expert knowledge and human feedback in a structured manner merits attention, especially because the graphical models lend themselves well to that application. Next, not only should the system be able to handle uncertainties in sensor data and its environment, it should also be able to manage uncertainties related to the goals and tasks it receives from human operators. Finally, as Schölkopf et al. noted, one of the most important challenges for real-world causal systems is the representation problem, i.e. to automatically learn high-level causal variables from raw large-scale unstructured data.

Treading into causal territory should be done carefully. There is a continuous danger of blindly interpreting graphical causal models as a robust representation of causal reality, especially when they are generated and used by autonomous systems.


[1] Kephart, Jeffrey O., and David M. Chess. "The vision of autonomic computing." Computer 36.1 (2003): 41-50.

[2] Balke, Alexander, and Judea Pearl. "Bounds on treatment effects from studies with imperfect compliance." Journal of the American Statistical Association 92.439 (1997): 1171-1176.

[3] Pearl, Judea. Causality: Models, Reasoning, and Inference. Cambridge, UK: Cambridge University Press, 2000.

[4] Rubin, Donald B. "Causal inference using potential outcomes: Design, modeling, decisions." Journal of the American Statistical Association 100.469 (2005): 322-331.

[5] Granger, Clive WJ. "Investigating causal relations by econometric models and cross-spectral methods." Econometrica: journal of the Econometric Society (1969): 424-438.

[6] Bongers, Stephan, et al. "Structural causal models: Cycles, marginalizations, exogenous reparametrizations and reductions." arXiv preprint arXiv 1611 (2016).

[7] Vowels, Matthew J., Necati Cihan Camgoz, and Richard Bowden. "D’ya like dags? a survey on structure learning and causal discovery." ACM Computing Surveys 55.4 (2022): 1-36.

[8] Spirtes, Peter, Clark N. Glymour, and Richard Scheines. Causation, prediction, and search. MIT press, 2000.

[9] Elidan, Gal, and Stephen Gould. "Learning bounded treewidth Bayesian networks." Advances in neural information processing systems 21 (2008).

[10] Shojaie, Ali, and George Michailidis. "Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs." Biometrika 97.3 (2010): 519-538.

[11] Chickering, David Maxwell. "Optimal structure identification with greedy search." Journal of machine learning research 3.Nov (2002): 507-554.

[12] Wang, Yuhao, et al. "Causal discovery from incomplete data: a deep learning approach." arXiv preprint arXiv:2001.05343 (2020).

[13] Runge, Jakob. "Causal network reconstruction from time series: From theoretical assumptions to practical estimation." Chaos: An Interdisciplinary Journal of Nonlinear Science 28.7 (2018): 075310.

[14] Nauta, Meike, Doina Bucur, and Christin Seifert. "Causal discovery with attention-based convolutional neural networks." Machine Learning and Knowledge Extraction 1.1 (2019): 312-340.

[15] Mooij, Joris M., Sara Magliacane, and Tom Claassen. "Joint causal inference from multiple contexts." The Journal of Machine Learning Research 21.1 (2020): 3919-4026.

[16] Schölkopf, Bernhard, et al. "Toward causal representation learning." Proceedings of the IEEE 109.5 (2021): 612-634.

[17] Wright, Sewall. "The method of path coefficients." The annals of mathematical statistics 5.3 (1934): 161-215.

[18] Hackel, Timo, Jan D. Wegner, and Konrad Schindler. "Fast semantic segmentation of 3D point clouds with strongly varying density." ISPRS annals of the photogrammetry, remote sensing and spatial information sciences 3 (2016): 177-184.

[19] Anand, Ritwik, et al. "Petri Nets Enable Causal Reasoning in Dynamical Systems." A causal view on dynamical systems, NeurIPS 2022 workshop.