How HCI and human thinking combine

Combining Natural and Artificial Intelligence: An Experimental Test Environment for Explainable AI (xAI)


Artificial intelligence (AI) follows the concept of human intelligence, which unfortunately is not a clearly defined concept. The most common definition, given in cognitive science as a mental ability, includes the ability to think abstractly, logically and deductively and to solve given real-world problems. A current topic in AI is to find out whether and to what extent algorithms are capable of learning such abstract “thinking” and reasoning similar to humans, or whether the learning outcome is based on purely statistical correlation. In this paper we present a freely available, universal and extensible experimental test environment. These “Kandinsky Patterns”, named after the Russian painter and art theorist Wassily Kandinsky (1866–1944), represent a kind of “Swiss Army knife” for studying the problems mentioned above. The area that deals with these problems is called “explainable AI” (xAI). Explainability / interpretability aims to enable human experts to understand the underlying explanatory factors (causality), i.e. why an AI decision was made, thus paving the way for transparent and verifiable AI.

Introduction and motivation

Impressive successes have been achieved, especially thanks to the great progress made in statistical / probabilistic machine learning (ML) over the past few decades. After the long AI winter (Hendler 2008), AI not only became "socially acceptable" again, it even became an economic success. Whether speech recognition (from Alexa to Siri), translation systems (from Google Translate to DeepL), recommender systems, classification algorithms (medicine), or self-driving cars, there are successful examples in many economically highly relevant application domains. These successes are very impressive from a computer science point of view and are often viewed as milestones on the way to "human intelligence". However, even the currently most successful applications succeed only in narrowly limited and very specific tasks (Dowe and Hernández-Orallo 2012), while people understand context and can also deal with incomplete information in complex, changing problems (Holzinger 2016b). Perhaps the biggest problem with the currently most successful solutions, however, is their lack of comprehensibility and explainability / interpretability (Holzinger 2018b).

A practical example is the excellent work published in Nature by the Stanford group around Sebastian Thrun (Esteva et al. 2017). In Europe, this work is often cited as a prime example in medicine under the title “AI - as good as dermatologists”. It concerns the classification of dermatological images to identify malignant melanomas. The "deep learning" approach used for this, in the form of an Inception v3 convolutional neural network (Krizhevsky et al. 2012), delivered impressive results with 92% performance. However, such approaches are referred to as “black boxes”: they allow no traceability and thus no interpretability, and offer no opportunity to answer the question of why these 92% correct classifications and 8% misclassifications arose. In particular, due to the “right to explanation” in the European General Data Protection Regulation, there is increasing interest in research and industry in the subject of explainability / interpretability, which is summarized under the name explainable AI (ex-AI or xAI for short) (Holzinger 2018a). The traceability, interpretability and comprehensibility of the results of machine learning algorithms, and ways of evaluating the quality of AI, are thus becoming a focus of interest in science and industry. Because of the weaknesses of so-called “black box” algorithms, and because of social and ethical responsibility, there is growing motivation, at least in domains like medicine, to connect human intelligence and machine intelligence (“augmenting human intelligence with artificial intelligence”) and to combine the strengths of both (Holzinger 2016a; Holzinger et al. 2019b).

In the second section we first discuss some theoretical concepts around the term “intelligence” and describe the research gap that makes the Kandinsky Patterns relevant. In the third section we discuss some current ways of testing forms of "intelligence". Finally, in the fourth section, we introduce the Kandinsky Patterns.

From the concept of intelligence to explainable AI

A fundamental problem for AI is that definitions of the term intelligence are often vague and differ widely. This problem is particularly acute for artificial systems that differ significantly from humans, and it has been known for a long time (Legg and Hutter 2007). For this reason, intelligence tests for AI in general and ML in particular have not been the focus of extensive research in the international AI / ML community. The assessment of approaches and algorithms has been based primarily on certain comparative standards, so-called "benchmarks" (cf. Nambiar et al. 2019; Nambiar 2018).

The best-known approach, although strictly speaking not an intelligence test, is that of Alan Turing (Turing 1950): an algorithm is considered intelligent (enough) for a certain kind of task if and only if it could complete all possible tasks of that kind. The shortcoming of this approach, however, is that it is strongly task-oriented and that it requires extensive a priori knowledge of all possible tasks and the ability to define them. The latter in turn raises the problem of the granularity and precision of definitions. An indicative example is the “intelligence test” for autonomous driving (Li et al. 2019).

In cognitive science, the testing of human aptitude (intelligence is seen here as a form of cognitive aptitude) has a very long tradition. The idea of psychological measurement arose from general developments in 19th-century science, in particular physics, which placed significant emphasis on the precise measurement of variables. Human intelligence tests began around 1900: Alfred Binet (1857–1911) began developing assessment questions to identify children who were ready for school. Notably, Binet focused not only on aspects explicitly taught in schools, but also on more general and abstract skills, including attention span, memory, and problem-solving skills. Binet and colleagues found that the children's ability to answer the questions and solve the tasks was not necessarily a matter of chronological age. Based on this observation, Binet proposed a mental age, which was in effect the first measure of intelligence. The level of aptitude was seen in relation to the average aptitude of the entire population. Many different types of intelligence tests have evolved over time. What we want to emphasize in this article is that even these very early intelligence tests make the fundamental difference to the later task-oriented evaluation of AI very clear. Human intelligence was not seen as the ability to solve a specific task, such as a pure classification task, but as a much broader construct. In addition, human intelligence was generally not measured in isolation, but always in relation to a population. For self-driving cars, the question would therefore be whether a car can drive better than all other cars, or whether and to what extent the car is better than human drivers.

To aid the understanding of our Kandinsky Patterns, we mention two fundamental works.

The first work is that of John Raven (1902–1970), who developed the so-called "Raven’s Progressive Matrices". This is a nonverbal multiple-choice measure of the reasoning component of intelligence in two forms, namely (i) clear thinking and making sense of complexity and (ii) the ability to store and reproduce information (Raven 2000).

The second work was presented by Mikhail Bongard (1924–1971) in the form of the so-called Bongard problems (Bongard 1967). A Bongard problem presents two sets of relatively simple patterns in which, for example, all patterns from set A have a common factor or attribute that is absent from all patterns in set B. The task is to find the common factor, or at least to formulate it convincingly. Bongard problems were described in the well-known book by Douglas Hofstadter (born 1945) (Hofstadter 1979), but interestingly they had little influence on AI research. The theory behind these approaches is “concept learning” (Hunt 1962; Lake et al. 2015; Mao et al. 2015).

In the area of "explainable AI", methods are being developed to make deep learning models in particular interpretable. This can be done, for example, with the help of a simple sensitivity analysis (Hampel et al. 2011) in order to understand the prediction in relation to the input variables. A very well-known and successful method is Layer-wise Relevance Propagation (LRP), in which a heat map shows which input parameters contribute most to the result (Bach et al. 2015; Lapuschkin et al. 2016; see footnote 1). The heat map visualization, see e.g. (Sturm et al. 2015), indicates, for example, which pixels would have to be changed so that the image (from the point of view of the AI system!) looks roughly like the predicted class.
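The idea of a simple sensitivity analysis can be sketched in a few lines. The following is a minimal illustration, not the method of the cited works: it estimates by central finite differences how strongly a prediction reacts to each input variable. The function `toy_model` is a hypothetical stand-in for a trained predictor, and the step size `eps` is an arbitrary assumption.

```python
from typing import Callable, List, Sequence


def sensitivity(model: Callable[[Sequence[float]], float],
                x: Sequence[float],
                eps: float = 1e-4) -> List[float]:
    """Central finite-difference estimate of d(model)/d(x_i) for every input i."""
    scores = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        scores.append((model(up) - model(down)) / (2 * eps))
    return scores


def toy_model(x: Sequence[float]) -> float:
    # Hypothetical predictor (assumption): input 0 matters most,
    # input 1 matters a little, input 2 is ignored entirely.
    return 3.0 * x[0] - 0.5 * x[1] + 0.0 * x[2]


scores = sensitivity(toy_model, [1.0, 2.0, 3.0])
# Rank the inputs by the absolute size of their influence on the prediction.
ranking = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
print(scores, ranking)
```

For an image classifier, the same per-input scores, arranged in the shape of the image, would form a simple heat map; LRP computes its relevance scores in a different way.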

The key to effective human-AI interaction, and therefore also to the success of future human-AI interfaces, lies in an efficient and consistent mapping of explainability (provided by "artificial intelligence") to causality in the sense of (Pearl 2009) (provided by "human intelligence").

This “map metaphor” is about making connections and relationships between areas. It is not (!) about drawing a new map; rather, it is about identifying the same (or at least similar) areas in two completely different “maps”, which is why “mapping” is a very apt term. Effective mapping is necessary, but not sufficient, for understanding an explanation. Whether an explanation has been understood depends on other factors (such as prior knowledge). If the explanation was good, you understood something. In order to secure long-term economic success in various application domains, new human-AI interfaces will be necessary in the future that allow constant feedback as to whether something has been understood or not. In person-to-person interaction, this feedback is provided to a very large extent by facial expressions (emotion), which is why the topic of “emotion” (Picard et al. 2001; Stickel et al. 2009) and emotional interfaces will become an important issue for explainable AI.

It is also important to differentiate between explainability and causability. By explainability (as already mentioned above) we mean the property of an AI system to generate a “machine explanation” (e.g. through “heat mapping” (Sturm et al. 2015; Bach et al. 2015)). Causability, coined by analogy with the word usability, is the quality with which such a mapping takes place, i.e. a mapping, as on a map, between AI and humans, that is, between the "machine explanation" and the ability of humans to understand explanations (Holzinger et al. 2019a). In order to evaluate the quality of such explanations, we have developed a System Causability Scale (SCS) (Holzinger et al. 2020).

About testing intelligence

There is intense discussion within the machine learning community about whether, for example, neural networks can learn abstract thinking or whether they rely only on pure correlations. In a recent paper, the authors (Santoro et al. 2018) propose a data set and challenge for examining abstract thinking, inspired by a well-known human IQ test: the Raven test, or more precisely Raven's Progressive Matrices (RPM) and the Mill Hill Vocabulary Scales, which were developed in 1936 for basic research on both the genetic and the environmental factors of “intelligence”, as already mentioned above (Raven 2000). The premise behind RPMs is simple: one must reason about the relationships between perceptible visual features (such as shape positions or line colors) in order to choose the image that completes the matrix. For example, the size of the squares may increase along the rows, and the correct image is the one that adheres to that size progression.

RPMs are a powerful diagnostic of abstract verbal, spatial, and mathematical thinking skills. In order to meet the challenge successfully, models have to cope with different generalization regimes in which the training and test data clearly differ from one another. The rapidly advancing field of AI and ML technologies adds another dimension to the discourse on intelligence tests, namely the evaluation of artificial intelligence as opposed to human intelligence. Human intelligence tends to adapt to its environment based on various cognitive and neural processes. The field of AI, in turn, has a very strong focus on developing algorithms that can mimic human behavior (weak or narrow AI).

This is especially true for applied areas such as autonomous driving, robotics or games. It also leads to marked differences in what we consider intelligent. Humans have consciousness and can improvise, and human physiology exhibits plasticity, which leads to real learning in that humans can “change” this “consciousness” themselves. Although humans tend to make more mistakes, human intelligence as such is usually more reliable and more resilient to catastrophic mistakes, while AI is vulnerable even to minor glitches, such as software errors, hardware failures and power failures.

Human intelligence evolves based on infinite interactions with an infinite environment, while AI is limited to the small world of a particular task.

We want to illustrate this idea through the challenge of identifying and interpreting / explaining visual patterns. In essence, this refers to the human ability to make sense of the world (e.g., by identifying the nature of a series of visual patterns that need to be continued). Sensemaking is the active processing of sensations in order to gain an understanding of the outside world and involves acquiring information, learning about new areas, solving problems, acquiring situational awareness and participating in social knowledge exchange (Pirolli and Russell 2011). This ability can be applied to concrete areas such as various human-computer interactions, but also to abstract areas such as pattern recognition. This topic has been a focus of medical research. Kundel and Nodine (1983), for example, examined gaze paths on medical images (a sonogram, a tomogram, and two standard X-ray images), similar to modern studies (Pohn et al. 2019). The subjects were asked to summarize each of the images in one sentence. The results of this study showed that the correct interpretation of the images was related to fixating the relevant areas of the images rather than the visually dominant areas. The authors also found a strong correlation between explanations and experience with such images. A basic principle in the perception and interpretation of visual patterns is the likelihood principle originally formulated by Helmholtz, which states that the preferred perceptual organization of an abstract visual pattern is based on the likelihood of specific objects (Leeuwenberg and Boselie 1988).

A competing explanation, to a certain extent, is the minimum principle proposed by Gestalt psychology, which claims that humans perceive a visual pattern according to the simplest possible interpretation. The role of experience is also reflected in studies on the perception of abstract versus representational visual art; (Uusitalo et al. 2009) showed clear differences between art experts and laypeople in the perception of and preferences for visual art. Psychological studies have shown that the perception and interpretation of visual patterns is thus a function of expectations (Yanagisawa 2019). On the one hand, this often leads to misinterpretations or premature interpretations; on the other hand, it increases the explainability of interpretations, since visual perception is determined by existing conceptualizations.

Kandinsky Patterns

In developing our Kandinsky Patterns, we were inspired by our experience of working with pathologists. Pathologists describe, for example, histopathological images by identifying geometric objects; they speak of architectures and identify regularities and anomalies in these geometric structures. In a first step they describe what they see; in a second step they interpret their observations. If AI / ML is applied to such digital images, one quickly comes across a major problem: a lack of ground truth. This problem was a central motivation for the development of the Kandinsky Patterns.

Kandinsky Patterns (Müller and Holzinger 2019; see footnote 2) are mathematically describable, simple, self-contained and thus strictly controllable test data sets (images) for the development, validation and training of explainability in AI. At the same time (!), Kandinsky Patterns are easily distinguishable by humans, so that controlled patterns can be described and processed by humans as well as by algorithms. This is very important in order to compare and understand the explanatory processes of algorithms alongside those of humans. In this way we gain fundamental knowledge for the field of explainable AI. Most importantly, we can generate the “ground truth” (which unfortunately is very often missing in the real world), hide it during testing, and still compare against it at any time.
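The idea of generating the ground truth, hiding it during testing and comparing against it afterwards can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' generator: the names `Obj`, `random_figure`, `ground_truth`, `make_dataset` and `accuracy`, the attribute vocabularies, and the example rule ("the figure contains at least one red circle") are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

SHAPES = ("circle", "square", "triangle")   # assumed shape vocabulary
COLORS = ("red", "yellow", "blue")          # assumed color vocabulary


@dataclass(frozen=True)
class Obj:
    shape: str
    color: str
    size: float  # relative to the image width
    x: float     # horizontal position in [0, 1]
    y: float     # vertical position in [0, 1]


Figure = Tuple[Obj, ...]


def random_figure(rng: random.Random, n_objects: int = 4) -> Figure:
    """Draw a figure as a tuple of randomly parameterized objects."""
    return tuple(
        Obj(rng.choice(SHAPES), rng.choice(COLORS),
            rng.uniform(0.05, 0.2), rng.random(), rng.random())
        for _ in range(n_objects)
    )


def ground_truth(fig: Figure) -> bool:
    # Hypothetical rule defining the pattern: at least one red circle.
    return any(o.shape == "circle" and o.color == "red" for o in fig)


def make_dataset(rng: random.Random, n: int) -> List[Tuple[Figure, bool]]:
    """Generate figures together with their known (generated) labels."""
    figures = [random_figure(rng) for _ in range(n)]
    return [(fig, ground_truth(fig)) for fig in figures]


def accuracy(predict: Callable[[Figure], bool],
             dataset: List[Tuple[Figure, bool]]) -> float:
    # The predictor only ever sees the figure; the label stays hidden
    # and is used solely for the comparison.
    hits = sum(predict(fig) == label for fig, label in dataset)
    return hits / len(dataset)


dataset = make_dataset(random.Random(0), 200)
print(accuracy(lambda fig: any(o.color == "red" for o in fig), dataset))
```

Because the rule is known exactly, any explanation offered by a human or an algorithm can be checked against it, which is not possible for real-world images without ground truth.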

The term "ground truth" originally comes from cartography and remote sensing (Geographical Information Systems, GIS), where only the availability of so-called "ground truth data" makes it possible to check a result for correctness (Pickles 1995).

The results of a first, recently conducted study of human explanatory behavior (Holzinger et al. 2019c) showed that the majority of explanations were based on the properties of the individual elements in an image (i.e. shape, color, size) and on the number of occurrences of individual objects. Comparisons of elements (e.g. more, fewer, larger, smaller) were significantly less frequent and, interestingly, the position of objects played almost no role in explaining the images.

In a natural language statement about a Kandinsky figure, people use a number of basic concepts, which are combined by logical operators. The following (incomplete) examples illustrate some concepts of increasing complexity (see Fig. 1).

  • Basic concepts given by the definition of a Kandinsky figure: a set of objects described by shape, color, size and position, see Fig. 1a–d for color and Fig. 1e–h for shape.

  • Presence, numbers, proportions (number, amount or proportions of objects), e.g. "a Kandinsky figure contains 4 red triangles and more yellow objects than circles", see Fig. 1i–l.

  • Spatial concepts that describe the arrangement of objects, either absolutely (upper, lower, left, right, ...) or relatively (below, above, on top of, touching, ...), e.g. "in a Kandinsky figure there are red objects on the left, blue objects on the right and yellow objects below blue squares", see Fig. 1m–p.

  • Gestalt concepts (see below), e.g. closure, symmetry, continuity, proximity, similarity, as in "in a Kandinsky figure, objects are grouped in a circle", see Fig. 1q–t.

  • Domain concepts, e.g. "a group of objects is perceived as a 'flower'", see Fig. 1u–x.

These basic concepts can be used to select groups of objects, e.g. "all red circles in the upper left corner", and to further combine individual objects and groups in a statement with a logical operator, e.g. "if there is a red circle in the upper left corner, there is no blue object", or with complex domain-specific rules, e.g. "if the size of a red circle is smaller than the size of a yellow circle, red circles are arranged in a circle around yellow circles".
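The combination of basic concepts with logical operators can be sketched in code. The following is a minimal illustration, assuming a hypothetical representation of a figure as a list of objects with normalized coordinates; the names `Obj`, `all_of`, `exists`, `negate` and `implies` and the threshold for "upper left corner" are assumptions made for this sketch and are not taken from the Kandinsky Patterns software.

```python
from typing import Callable, List, NamedTuple


class Obj(NamedTuple):
    shape: str
    color: str
    size: float
    x: float  # 0 = left edge, 1 = right edge
    y: float  # 0 = top edge, 1 = bottom edge


Figure = List[Obj]
Selector = Callable[[Obj], bool]      # concept about a single object
Statement = Callable[[Figure], bool]  # statement about a whole figure


def is_red(o: Obj) -> bool:
    return o.color == "red"


def is_blue(o: Obj) -> bool:
    return o.color == "blue"


def is_circle(o: Obj) -> bool:
    return o.shape == "circle"


def in_upper_left(o: Obj) -> bool:
    # Assumed definition of "upper left corner": the top-left quadrant.
    return o.x < 0.5 and o.y < 0.5


def all_of(*selectors: Selector) -> Selector:
    """Conjunction of object-level concepts, e.g. 'red circle in the upper left'."""
    return lambda o: all(s(o) for s in selectors)


def exists(selector: Selector) -> Statement:
    return lambda fig: any(selector(o) for o in fig)


def negate(p: Statement) -> Statement:
    return lambda fig: not p(fig)


def implies(p: Statement, q: Statement) -> Statement:
    return lambda fig: (not p(fig)) or q(fig)


# "If there is a red circle in the upper left corner, there is no blue object."
rule = implies(exists(all_of(is_red, is_circle, in_upper_left)),
               negate(exists(is_blue)))

fig_ok = [Obj("circle", "red", 0.1, 0.2, 0.2), Obj("square", "yellow", 0.1, 0.7, 0.7)]
fig_bad = [Obj("circle", "red", 0.1, 0.2, 0.2), Obj("triangle", "blue", 0.1, 0.7, 0.7)]
fig_vacuous = [Obj("triangle", "blue", 0.1, 0.7, 0.7)]

print(rule(fig_ok), rule(fig_bad), rule(fig_vacuous))
```

Representing statements as composable predicates keeps the ground truth of a pattern explicit: the same expression that defines which figures belong to a pattern can later be compared with the explanation a human or an algorithm gives.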

See the picture below.

In their experiments, Hubel and Wiesel (1962) discovered, among other things, that the visual system builds up an image from very simple stimuli into more complex representations. This inspired the neural network community to see their so-called "deep learning" models as a cascading model of cell types that always follow similar simple rules: first lines are learned, then shapes, then objects are formed, which ultimately lead to conceptual representations (Schmidhuber 2015).

Using "backpropagation" (Lecun et al. 1989), such a model is able to discover intricate structures in large data sets by indicating how the internal parameters should be adjusted that are used to compute the representation in each layer from the representation in the previous layer. The representation of concepts relates to the human ability to learn categories for objects and to recognize new instances of those categories. In machine learning, concept learning is defined as the inference of a Boolean function from training examples of its inputs and outputs, i.e. an algorithm is trained to distinguish between examples and non-examples; we call the latter "counterfactuals" ("what if ..."). Concept learning has long been a relevant research area in machine learning and has its origins in cognitive science, where it is defined as the search for attributes that can be used to distinguish exemplars from non-exemplars of various categories (Bruner 1956).
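Concept learning as the inference of a Boolean function from examples and non-examples can be sketched with a very simple learner. The following is a minimal illustration in the style of the classic Find-S procedure for conjunctive concepts, swapped in here for clarity; it is not the method used in the cited works, and the attribute names and training examples are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Example = Dict[str, str]


def learn_conjunction(examples: Iterable[Tuple[Example, bool]]) -> Optional[Example]:
    """Return the attribute values shared by all positive examples.

    Like Find-S, this ignores negative examples during learning and
    returns None if no positive example was seen.
    """
    hypothesis: Optional[Example] = None
    for features, label in examples:
        if not label:
            continue
        if hypothesis is None:
            hypothesis = dict(features)
        else:
            # Drop every attribute on which this positive example disagrees.
            hypothesis = {k: v for k, v in hypothesis.items()
                          if features.get(k) == v}
    return hypothesis


def classify(hypothesis: Optional[Example], features: Example) -> bool:
    """The learned Boolean function: true if all retained attributes match."""
    if hypothesis is None:
        return False
    return all(features.get(k) == v for k, v in hypothesis.items())


training = [
    ({"shape": "circle", "color": "red", "size": "small"}, True),
    ({"shape": "circle", "color": "red", "size": "large"}, True),
    ({"shape": "square", "color": "red", "size": "small"}, False),
    ({"shape": "circle", "color": "blue", "size": "large"}, False),
]

concept = learn_conjunction(training)
print(concept)
print(classify(concept, {"shape": "circle", "color": "red", "size": "medium"}))
```

The learned hypothesis is directly readable as an explanation ("red circle"), in contrast to the internal parameters of a deep network trained on the same examples.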

The ability to think in abstractions is one of the most powerful tools humans have. Technically, people organize their experiences into coherent categories by defining a given situation as a member of that collection of situations for which answers x, y, etc. are most likely appropriate. This classification is not a passive process and understanding how people learn abstractions is essential not only for understanding human thought but also for building artificial intelligence machines (Hunt 1962).

When learning classification models for segmentation in particular, the task is to classify between “good” and “bad” segmentations, using the Gestalt cues as features (the priors) for training the learning model. Images manually segmented by humans are used as examples of “good” segmentations (ground truth), and “bad” segmentations are constructed by randomly assigning a human segmentation to a different image (Ren and Malik 2003).

Gestalt principles (Koffka 1935) can be seen as rules, i.e. they discriminate between competing segmentations only when everything else is equal. We therefore speak more generally of Gestalt laws. A particular group of these are the Gestalt laws of grouping, known as Prägnanz (conciseness) (Wertheimer 1938), which include the law of proximity (objects that are close together appear to form groups, even if they are completely different), the law of similarity (similar objects are grouped together) and the law of closure (objects can be perceived as such even if they are incomplete or hidden by other objects). As mentioned at the beginning, the currently most powerful machine learning methods unfortunately have a number of disadvantages, one of which is particularly relevant here: “black box” approaches are difficult to interpret because of their complexity. Image classifiers do not work with high-level concepts, but with low-level features (e.g. lines, circles, etc.) and with domain concepts, and this is precisely what makes their inner workings difficult to interpret and understand. However, the “why” is often much more useful than the pure classification result.
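The law of proximity described above can be turned into a simple computational rule. The following is a minimal sketch under stated assumptions, not a model of human perception: objects are reduced to their center points, and two objects belong to the same group if they are connected by a chain of neighbors closer than a threshold (single-linkage grouping with a union-find structure). The function name `group_by_proximity`, the threshold `max_dist` and the example coordinates are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def group_by_proximity(points: Sequence[Point], max_dist: float) -> List[List[int]]:
    """Group point indices so that neighbors closer than max_dist share a group."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Find the representative of i's group, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                parent[find(i)] = find(j)  # merge the two groups

    groups: Dict[int, List[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Object centers in normalized image coordinates (illustrative values).
centers = [(0.10, 0.10), (0.15, 0.12), (0.12, 0.18), (0.80, 0.80), (0.85, 0.78)]
groups = group_by_proximity(centers, max_dist=0.1)
print(groups)
```

Note that the grouping deliberately ignores shape and color, mirroring the statement that nearby objects appear to form groups even if they are completely different.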


By comparing the strengths of machine intelligence and human intelligence, it becomes possible to solve problems for which we currently lack suitable methods. A big general question is: "How can we accomplish a task using the knowledge gained in solving previous tasks?" To answer such questions it is necessary to gain insight into human explanatory behavior, not with the aim of mimicking human behavior, but in order to compare human learning methods with machine learning methods. We hope that our Kandinsky Patterns will challenge the international machine learning community, and we look forward to many comments and results. To take the step towards a more human-like and probably more thorough assessment of AI / ML, we suggest applying the principles of human intelligence testing as outlined in this article. A number of Kandinsky Patterns, each of which is complex in itself, can be used to assess AI / ML. A “real” intelligent achievement would be the identification of the concepts, and thus the meaning (!), of sequences of several Kandinsky Patterns. In any case, much more experimental and theoretical work is required.


  1. Sebastian Bach has been known as Sebastian Lapuschkin since 2016; both names refer to the same author.

  2.


  1. In: Bilham E (Ed) Advances in Cryptology-Eurocrypt 2003. Springer, Berlin, S 294–311

  2. Bach S, Binder A, Montavon G, Klauschen F, Müller K-R, Samek W (2015) On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10 (7): e0130140.

  3. In: Human factors in computing and informatics. Lecture notes in computer science, LNCS 7946. Springer, Berlin Heidelberg, S 409–426

  4. Bongard MM (1967) The problem of recognition. Nauka, Moscow (in Russian)

  5. Bruner JS (1956) On attributes and concepts. In: Bruner JS, Goodnow JJ, Austin GA (Eds) A study of thinking. Wiley, New York, pp. 25-49

  6. Dowe DL, Hernández-Orallo J (2012) IQ tests are not for machines, yet. Intelligence 40 (2): 77-81.

  7. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 (7639): 115-118.

  8. Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (2011) Robust statistics: the approach based on influence functions. Wiley, New York

  9. Hendler J (2008) Avoiding another AI winter. IEEE Intell Syst 23 (2): 2-4.

  10. Hofstadter DR (1979) Gödel, Escher, Bach: an eternal golden braid. Basic Books, New York

  11. Holzinger A (2016a) Interactive machine learning (iML). Inform Spektrum 39 (1): 64-68.

  12. Holzinger A (2016b) Interactive machine learning for health Informatics: when do we need the human-in-the-loop? Brain Inform 3 (2): 119-131.

  13. Holzinger A (2018a) Explainable AI (ex-AI). Inform Spektrum 41 (2): 138-143.

  14. Holzinger A (2018b) Interpretable AI: New methods reveal decision-making paths of artificial intelligence. c’t 22: 136-141

  15. Holzinger A, Carrington A, Müller H (2020) Measuring the quality of explanations: the system causability scale (SCS). Artificial Intell.

  16. Holzinger A, Kickmeier-Rust M, Müller H (2019c) KANDINSKY patterns as IQ-test for machine learning. In: International cross-domain conference for machine learning and knowledge extraction. Lecture notes in computer science LNCS 11713. Springer, Cham, pp 1–14

  17. Holzinger A, Langs G, Denk H, Zatloukal K, Müller H (2019a) Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip Rev Data Min Knowl Discov.

  18. Holzinger A, Plass M, Kickmeier-Rust M, Holzinger K, Crişan GC, Pintea C-M, Palade V (2019b) Interactive machine learning: experimental evidence for the human in the algorithmic loop. Appl Intell 49 (7): 2401-2414.

  19. Hubel DH, Wiesel TN (1962) Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol 160 (1): 106-154.

  20. Hunt EB (1962) Concept learning: an information processing problem. Wiley, Hoboken

  21. Koffka K (1935) Principles of gestalt psychology. Harcourt, New York

  22. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems (NIPS 2012). NIPS, Lake Tahoe, pp 1097–1105

  23. Kundel HL, Nodine CF (1983) A visual concept shapes image perception. Radiology 146 (2): 363-368

  24. Lake BM, Salakhutdinov R, Tenenbaum JB (2015) Human-level concept learning through probabilistic program induction. Science 350 (6266): 1332-1338.

  25. Lapuschkin S, Binder A, Montavon G, Müller K-R, Samek W (2016) The LRP toolbox for artificial neural networks. J Mach Learn Res 17 (1): 3938-3942

  26. Lecun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1 (4): 541-551.

  27. Leeuwenberg EL, Boselie F (1988) Against the likelihood principle in visual form perception. Psychol Rev 95 (4): 485-491.

  28. Legg S, Hutter M (2007) Universal intelligence: a definition of machine intelligence. Minds Mach 17 (4): 391-444.

  29. Li L, Wang X, Wang K, Lin Y, Xin J, Chen L, Xu L, Tian B, Ai Y, Wang J (2019) Parallel testing of vehicle intelligence via virtual-real interaction. Sci Robot 4 (eaaw4106): 1-3

  30. Mao J, Wei X, Yang Y, Wang J, Huang Z, Yuille AL (2015) Learning like a child: fast novel visual concept learning from sentence descriptions of images. In: Proceedings of the IEEE international conference on computer vision ICCV 2015.

  31. Müller H, Holzinger A (2019) Kandinsky patterns. arXiv: 1906.00657

  32. Nambiar R (2018) Towards an industry standard for benchmarking artificial intelligence systems. In: IEEE 34th international conference on data engineering ICDE 2018.

  33. Nambiar R, Ghandeharizadeh S, Little G, Boden C, Dholakia A (2019) Industry panel on defining industry standards for benchmarking artificial intelligence. Springer, Cham, pp 1–6

  34. Pearl J (2009) Causality: models, reasoning, and inference, 2nd ed. Cambridge University Press, Cambridge