Bayesian Networks

A Bayesian network is a probabilistic model that uses a graph to represent the relationships between variables. As artificial intelligence methods develop and spread, the ability of a model to explain and justify its conclusions is proving very important in some applications. Bayesian networks are among the models that can provide such explanations.

Bayesian networks first appeared under this name in the 1980s in the work of Judea Pearl. Papers by other authors followed in rapid succession, mainly dealing with methods for efficient computation in these models. Among the most important were papers by Steffen Lauritzen and David Spiegelhalter, as well as by Glenn Shafer and Prakash Shenoy. In the area of decision diagrams, the main contributions were the works of Ross Shachter.

Research on probabilistic graphical models, which include Bayesian networks, has a long tradition in the Czech Republic and in the former Czechoslovakia. As early as the 1960s, Albert Perez published methods for working with multidimensional probabilistic models, based on approximating probability tables by simplifying the dependence structure. Perez worked at the Institute of Information Theory and Automation of the then Czechoslovak Academy of Sciences, where a strong group formed around him and achieved internationally recognized results. Among the most important representatives of this group were Radim Jiroušek, Milan Studený and František Matúš. The Institute of Information Theory and Automation of the Czech Academy of Sciences continues this research tradition.

Bayesian networks, like some other graphical models, were proposed in the 1970s for applications in artificial intelligence. The goal was to create models that are easy to explain and understand. The basic building block of these models is conditional probability, which is manipulated using Bayes' theorem.

Bayes' theorem relates the conditional probability of an event \(A\) given that event \(B\) occurs to the reverse conditional probability of event \(B\) given that event \(A\) occurs. The simplest formulation of the theorem assumes that the prior probabilities of event \(A\) and event \(B\) are known, and requires that the prior probability of event \(B\) is non-zero. It can be written as:

\[P(A|B)=\frac{P(B|A) \cdot P(A)}{P(B)},\]

where \(P(A|B)\) denotes the conditional probability of the event \(A\) given the observed event \(B\), \(P(B|A)\) denotes the conditional probability of the event \(B\) given the event \(A\), \(P(A)\) denotes the prior probability of the event \(A\), and \(P(B)\) denotes the prior probability of the event \(B\).
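As a small numeric illustration of the formula, the following Python sketch computes \(P(A|B)\) for invented numbers. Here \(A\) is read as "the student has a skill" and \(B\) as "the student answers a task correctly"; the probabilities and the function name `bayes_posterior` are assumptions made for this example only. \(P(B)\) is obtained from the law of total probability.

```python
def bayes_posterior(prior_a, likelihood_b_given_a, prior_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B); requires P(B) > 0."""
    if prior_b <= 0.0:
        raise ValueError("P(B) must be non-zero")
    return likelihood_b_given_a * prior_a / prior_b


# Invented example values (not taken from any published model).
p_a = 0.6              # P(A): prior probability that the student has the skill
p_b_given_a = 0.9      # P(B|A): correct answer if the skill is present
p_b_given_not_a = 0.2  # P(B|not A): correct answer without the skill

# Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)

p_a_given_b = bayes_posterior(p_a, p_b_given_a, p_b)
print(round(p_b, 4), round(p_a_given_b, 4))
```

With these values a correct answer raises the probability of the skill from 0.6 to roughly 0.87.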

A Bayesian network is defined by a directed acyclic graph and a conditional probability table for each node. The table of a node gives the probability distribution over its states for every configuration of the states of its parents. An example of a Bayesian network is a model for testing the knowledge of students solving arithmetic problems with fractions, published in 2004. The nodes in the graph of the model represent the skills needed to solve problems with fractions and the typical misconceptions that arise when solving them. The model also includes the observed dependencies between these variables. It allows a detailed diagnosis of the skills and misconceptions of the tested students. The structure of the model is shown in the following figure. The nodes in brown correspond to hidden (unobserved) quantities, those in blue to skills, those in red to misconceptions, and those in light yellow to individual tasks.
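The definition above (a graph plus one conditional probability table per node) can be sketched in a few lines of Python. The toy network below is hypothetical and far smaller than the published fraction model: one skill node and one misconception node are the parents of a single task node, and all probabilities are invented. The joint probability of a full assignment is the product of the table entries of all nodes, each looked up for the states of its parents.

```python
from itertools import product

# Hypothetical toy structure: Skill -> Task <- Misconception.
parents = {
    "Skill": (),
    "Misconception": (),
    "Task": ("Skill", "Misconception"),
}

# Conditional probability tables: cpt[node][parent states][node state].
# All numbers are invented for illustration.
cpt = {
    "Skill": {(): {True: 0.7, False: 0.3}},
    "Misconception": {(): {True: 0.2, False: 0.8}},
    "Task": {
        (True, True): {True: 0.5, False: 0.5},
        (True, False): {True: 0.9, False: 0.1},
        (False, True): {True: 0.05, False: 0.95},
        (False, False): {True: 0.2, False: 0.8},
    },
}


def joint_probability(assignment):
    """Multiply P(node | parents) over all nodes for one full assignment."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_states = tuple(assignment[p] for p in node_parents)
        prob *= cpt[node][parent_states][assignment[node]]
    return prob


nodes = list(parents)
all_assignments = [
    dict(zip(nodes, values))
    for values in product([True, False], repeat=len(nodes))
]

# The joint distribution sums to one over all assignments.
total = sum(joint_probability(a) for a in all_assignments)

# Marginal probability that the task is solved correctly.
p_task_correct = sum(joint_probability(a) for a in all_assignments if a["Task"])
print(round(total, 6), round(p_task_correct, 6))
```

Enumerating all assignments is only feasible for very small networks; the efficient computation methods mentioned earlier exist precisely to avoid it.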

Such a model can be used to create an adaptive knowledge test, that is, a test without a predetermined list of questions. The questions are selected during the test, based on the previous answers, so as to provide as much information as possible about the particular student. Simply put, if a student has mastered the more difficult questions, we no longer ask them the easy ones, but rather the harder ones. Similarly, if their correct answers show that they have some of the necessary skills, we focus on other skills where there is still uncertainty. To select the next question, we use the conditional probability of the skills obtained from the Bayesian network, conditioned on the student's previous answers. A suitable selection criterion is then, for example, minimizing the expected entropy of the skill probability table. Adaptive knowledge testing finds applications not only in traditional student learning, but also in e-learning, intelligent tutoring, and wherever the knowledge or level of a system user needs to be ascertained efficiently.
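The entropy-based selection criterion can be illustrated with a minimal Python sketch. It is a strong simplification, not the published procedure: there is a single skill with two states (mastered, not mastered), the current skill distribution is given directly instead of being computed from a Bayesian network, and the three candidate questions with their probabilities of a correct answer are invented. For each question, the sketch averages the entropy of the updated skill distribution over both possible answers and picks the question with the smallest expected entropy.

```python
import math


def entropy(dist):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)


def expected_entropy(prior, p_correct):
    """Expected entropy of the skill distribution after seeing the answer.

    prior[i]     -- current P(skill state i | previous answers)
    p_correct[i] -- P(correct answer | skill state i) for one question
    """
    result = 0.0
    for correct in (True, False):
        likelihood = [pc if correct else 1.0 - pc for pc in p_correct]
        unnormalized = [p * l for p, l in zip(prior, likelihood)]
        p_answer = sum(unnormalized)
        if p_answer == 0.0:
            continue  # this answer cannot occur, so it contributes nothing
        updated = [u / p_answer for u in unnormalized]  # Bayes' theorem
        result += p_answer * entropy(updated)
    return result


# Invented values: skill states are (mastered, not mastered).
prior = [0.5, 0.5]
questions = {
    "easy": [0.95, 0.80],
    "medium": [0.85, 0.25],
    "hard": [0.30, 0.02],
}

scores = {name: expected_entropy(prior, pc) for name, pc in questions.items()}
best_question = min(scores, key=scores.get)
print(best_question)
```

With these invented numbers the medium question is selected: both the easy and the hard question are answered almost the same way regardless of the skill state, so their answers say little about it.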

Related publications:

  1. VOMLEL, Jiří. Bayesovské sítě. Věda kolem nás. Academia. 2024.
  2. VOMLEL, Jiří. Bayesian networks in educational testing. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2004, 12.supp01: 83-100.
     

Contact person

Jiří Vomlel