## Bayesian Networks

Bayesian networks (also called Bayesian belief networks) have been investigated and widely applied in student modeling for several decades. Bayesian network student models are capable of assessing student knowledge and predicting student actions. A Bayesian network is a directed acyclic graph in which nodes represent variables and edges represent probabilistic dependencies among the variables (Jensen and Nielsen 2007). It provides a mathematically sound formalism for handling uncertainty. Bayesian networks can be viewed as causal networks, in which the strength of each causal link is represented by conditional probabilities. For instance, if there is a link from X to Y, we say that X is a parent of Y and Y is a child of X. X has an influence on Y, and evidence about X will change the certainty of Y. To quantify the strength of this influence, it is natural to use the conditional probability P(Y|X). However, if Z is also a parent of Y, the two conditional probabilities P(Y|X) and P(Y|Z) alone give no indication of the impact when X and Z interact. They may cooperate or counteract, so we need the joint conditional probability P(Y|X,Z). Therefore, to define a Bayesian network, we have to specify:

A set of variables (also called chance variables), each of which has a sample space, i.e. a finite set of mutually exclusive states.

A set of directed edges between the variables, such that the resulting graph is acyclic.

For each variable Yi with parents X1,⋯,Xn, the conditional probability table P(Yi|X1,⋯,Xn).
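To illustrate why a joint table P(Y|X,Z) is needed, the following minimal Python sketch builds a three-node network X → Y ← Z. It computes the posterior P(X=1|Y) by enumerating the joint distribution. The variable meanings (two skills and one answer) and all probability values are hypothetical and chosen only for illustration. Exact enumeration is used because it is the simplest correct inference method for a network this small. It does not scale to large networks.

```python
from itertools import product

# Hypothetical priors for two binary parent variables X and Z
# (for example, mastery of two skills; 1 = mastered).
p_x = {1: 0.6, 0: 0.4}
p_z = {1: 0.7, 0: 0.3}

# Hypothetical CPT P(Y=1 | X, Z) for the child Y (1 = correct answer).
# One entry per joint parent configuration, so the interaction of X and Z
# is specified directly rather than derived from P(Y|X) and P(Y|Z).
p_y1_given_xz = {
    (1, 1): 0.9,
    (1, 0): 0.5,
    (0, 1): 0.4,
    (0, 0): 0.1,
}


def joint(x, y, z):
    """P(X=x, Y=y, Z=z), factorised along the DAG X -> Y <- Z."""
    p_y1 = p_y1_given_xz[(x, z)]
    p_y = p_y1 if y == 1 else 1.0 - p_y1
    return p_x[x] * p_z[z] * p_y


def posterior_x_given_y(y):
    """P(X=1 | Y=y) by enumeration: sum the joint over unobserved Z."""
    numerator = sum(joint(1, y, z) for z in (0, 1))
    evidence = sum(joint(x, y, z) for x, z in product((0, 1), repeat=2))
    return numerator / evidence


print(f"P(X=1 | Y=1) = {posterior_x_given_y(1):.3f}")  # prints 0.791
```

Observing a correct answer raises the belief in X from its prior of 0.6 to about 0.791. An incorrect answer lowers it.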

## Dynamic Bayesian Networks

The Bayesian network student models introduced in section 2.1.1.1 are static; that is, they can only evaluate student knowledge at a single point in time, such as a pre-test or post-test for a period of learning. To construct a model that tracks student knowledge during learning, we need to update the estimate of student knowledge each time a new behavior is observed. In this case, the variables in the Bayesian network are time-sensitive: their probability distributions evolve over time. Dynamic Bayesian Networks (DBNs) (Jensen and Nielsen 2007; Murphy 2002), which introduce discrete time steps, can be used in this case. The model at each time step of a DBN is called a time slice. Each time slice has the same structure as the static student model, except that some nodes have parents or children in other time slices.
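The notion of a repeated time slice can be made concrete by "unrolling" a DBN template. The minimal Python sketch below assumes a hypothetical two-variable template. It has a knowledge node K and an observed outcome node O within each slice, plus one edge linking K in consecutive slices. These names and edges are illustrative assumptions, not a model taken from the cited works.

```python
# Hypothetical DBN template for a student model.
# Within each slice, knowledge K_t influences the observed outcome O_t.
INTRA_SLICE_EDGES = [("K", "O")]
# Across slices, K_{t-1} influences K_t: (variable at t-1, variable at t).
INTER_SLICE_EDGES = [("K", "K")]


def unroll(num_slices):
    """Return the edge list of the template unrolled over num_slices slices."""
    edges = []
    for t in range(num_slices):
        # Every slice repeats the same static structure.
        for parent, child in INTRA_SLICE_EDGES:
            edges.append((f"{parent}_{t}", f"{child}_{t}"))
        # From the second slice on, some nodes have parents in the previous slice.
        if t > 0:
            for parent, child in INTER_SLICE_EDGES:
                edges.append((f"{parent}_{t - 1}", f"{child}_{t}"))
    return edges


for edge in unroll(3):
    print(edge)
```

Only the inter-slice edges distinguish the unrolled network from copies of the static model. This is the property described above.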

DBNs have been applied in many student models. Reye (1996, 1998) described the process of using a DBN student model to update student knowledge. His model assumed that a student's knowledge state after the nth interaction with the system depends on the student's knowledge state after the (n−1)th interaction and on the outcome of the nth interaction. The idea is to model a student's mastery of a knowledge component over time. The outcome of a student's nth attempt to apply the knowledge component depends on the previous belief about his knowledge state. The probability of mastering a skill, P(Si), depends on both that previous belief and the outcome of his nth attempt. However, in a time slice of his network, each interaction is related to only one knowledge component (in his application, a production rule).

## Bayesian Knowledge Tracing

Bayesian knowledge tracing (BKT) (Corbett and Anderson 1995) is a well-known technique for tracking the dynamic knowledge of students during learning. It is a hidden Markov model: it assumes that, given the current knowledge state, a student's past knowledge states have no influence on the future knowledge state. The classic BKT model evaluates student knowledge of one knowledge component at a time, with one latent variable and one observable variable per time slice. The observations are usually fine-grained, such as scaffolding questions or problem-solving steps, each of which is related to only one knowledge component. BKT models are based on the learning assumption (Corbett and Anderson 1995): with practice, student knowledge is strengthened in memory, and student performance becomes more reliable and rapid. This assumption is supported by empirical results such as learning curves, which are introduced in section 2.1.2.3.

The BKT model is a special case of a dynamic Bayesian network. We discuss it at the same section level as the DBN models for two reasons. First, it is the most commonly used student model in ITSs. Second, it differs from other DBN student models in that it includes a specific transition (learning) parameter. In the BKT model, a student's mastery of a knowledge component is in one of two states: learned or unlearned. Mastery can transition from the unlearned to the learned state at each opportunity to learn the knowledge component or to apply it in problem solving. In classic BKT there is no forgetting; that is, a student's knowledge state cannot transition in the other direction. As mentioned above, student performance is noisy. Students might make mistakes by slipping even though they know the related knowledge component. They might also respond correctly by guessing even though they do not know it. Hence, the classic BKT model specifies two learning parameters, P(L0) (initial knowledge) and P(T) (transition), and two performance parameters, P(G) (guess) and P(S) (slip). Figure 2.3 shows the structure of the classic BKT model and the parameters for the corresponding links.
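The update performed by classic BKT at each opportunity can be written in a few lines. The Python sketch below uses the standard formulation with four parameters: P(L0), P(T), P(G) and P(S). It first conditions the knowledge estimate on the observed response using Bayes' rule. It then applies the learning transition, with no forgetting. The class and function names, and the parameter values in the usage lines, are hypothetical and for illustration only. In practice the parameters are fitted per knowledge component.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    p_init: float   # P(L0): probability the KC is known before practice
    p_learn: float  # P(T): probability of moving from unlearned to learned
    p_guess: float  # P(G): probability of a correct answer while unlearned
    p_slip: float   # P(S): probability of an incorrect answer while learned


def predict_correct(p_known, params):
    """P(correct) = P(L)(1 - S) + (1 - P(L)) G."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def update(p_known, correct, params):
    """One BKT step: condition P(L) on the response, then apply learning."""
    if correct:
        num = p_known * (1 - params.p_slip)
        den = num + (1 - p_known) * params.p_guess
    else:
        num = p_known * params.p_slip
        den = num + (1 - p_known) * (1 - params.p_guess)
    posterior = num / den
    # No forgetting: a learned KC stays learned, and an unlearned KC
    # becomes learned with probability P(T).
    return posterior + (1 - posterior) * params.p_learn


def trace(responses, params):
    """P(L) before each response, plus the estimate after the last one."""
    p_known = params.p_init
    history = [p_known]
    for correct in responses:
        p_known = update(p_known, correct, params)
        history.append(p_known)
    return history


# Hypothetical parameters and response sequence (True = correct).
params = BKTParams(p_init=0.3, p_learn=0.2, p_guess=0.2, p_slip=0.1)
for step, p in enumerate(trace([True, True, False, True], params)):
    print(step, round(p, 3))
```

Guess and slip keep a single response from driving the estimate all the way to 0 or 1. This is how the model absorbs noisy performance.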

## Item Response Theory

Item Response Theory (IRT) (Lord 1980) is a well-known psychometric theory that models the response of a learner with a given ability to a test item. It has been investigated for several decades and is widely used in Computerized Adaptive Testing (Wainer 2001). IRT is based on the assumption that the probability of a correct response to an item is a mathematical function of the learner's ability and the item characteristics. The knowledge level, ability or proficiency of a student is assumed to be measured by a continuous variable called the trait, usually denoted by θ. IRT models are latent trait models, since the discrete responses to items are the observable manifestations of the latent trait. The item characteristics are described by the parameters of the IRT model. A commonly used model is the one-parameter logistic (1PL) IRT model, also called the Rasch model. It incorporates only one item parameter, the difficulty level, which describes how difficult a question is. Other IRT models include the 2PL-IRT and 3PL-IRT models, which involve two and three item parameters respectively. Besides the difficulty level, the 2PL-IRT model incorporates the discrimination power. This parameter describes how well an item discriminates between students of different ability levels. The 3PL-IRT model adds a third item parameter, the guess factor, which is the probability that a student answers the item correctly by guessing.

The item response function gives the probability of answering item i correctly given a student's ability θ and the item parameters. The item response function of the 3PL-IRT model is given in Equation 2.5 (Baker 2001):

$$P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}} \tag{2.5}$$

where $b_i$ is the difficulty level, $a_i$ the discrimination power and $c_i$ the guess factor of item i.
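The three logistic IRT models differ only in which item parameters are free. A single function can therefore express all of them. The Python sketch below implements the 3PL form P(θ) = c + (1 − c) / (1 + e^(−a(θ − b))). Setting c = 0 gives the 2PL model. Additionally fixing a = 1 gives the 1PL (Rasch) form. The function name and default values are illustrative assumptions. The sketch omits the scaling constant D ≈ 1.7 that some texts include in the exponent.

```python
import math


def irt_prob(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3PL-IRT model.

    theta: student ability; b: item difficulty;
    a: discrimination power; c: guess factor (lower asymptote).
    With c = 0 this reduces to 2PL; with c = 0 and a = 1, to 1PL (Rasch).
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic


# A hypothetical item: difficulty 0.5, discrimination 1.5, guess factor 0.2.
for theta in (-2.0, 0.0, 0.5, 2.0):
    print(theta, round(irt_prob(theta, b=0.5, a=1.5, c=0.2), 3))
```

When θ equals the item difficulty b, the probability is exactly halfway between c and 1. A larger a makes the curve steeper around that point, so the item separates students just below and just above b more sharply.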

## DINA and NIDA

DINA (Deterministic Input Noisy AND) (Junker and Sijtsma 2001) and NIDA (Noisy Input Deterministic AND) (Maris 1999) are two latent variable models developed in psychometrics. They were proposed to model the conjunctive relationship between a set of cognitive attributes to be assessed and student performance on particular items or tasks in the assessment. They are nonparametric models that only require the specification of an item-by-attribute association matrix. Since no statistical parameter estimation is required, the models can be used with a sample size as small as 1. Note that in psychometric terminology, a knowledge component or skill is called an attribute. In both models, the latent cognitive attributes and the observed student performance are represented by discrete variables. They are therefore also latent class models, which aim to estimate the class membership of a student's knowledge. Each latent class is a complete profile indicating which skills have been mastered and which have not. Both models require an accurate Q-matrix representing the mapping from items to attributes. An IRT model, in contrast, only requires a mapping between items and a coarse-grained subject.

Suppose that there are K cognitive attributes to be assessed. The attribute profile of a student (i.e. the knowledge state) is a K-dimensional vector, denoted by 𝜶. Each entry 𝛼𝑘, where k = 1,⋯,K, indicates the student's knowledge of attribute k and takes one of two values: mastered or not mastered. Hence, 𝜶 has 2^K possible values, which are the latent classes into which students are to be classified. To model the relationship between tasks and attributes, both models introduce additional variables, called latent response variables, though these have distinct meanings in the two models. The formal definitions of the two models are as follows:

Xij=1 or 0 denotes whether or not student i performs item j correctly.

Qjk=1 or 0 denotes whether or not attribute k is relevant to item j.

αik=1 or 0 denotes whether or not student i possesses attribute k.
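In DINA, the latent response of student i to item j is often written η_ij = ∏_k α_ik^(Q_jk). It equals 1 only if the student possesses every attribute the Q-matrix marks as relevant to item j. The text does not say how these ideal responses are turned into a classification. One parameter-free option is assumed here purely for illustration. It assigns each student the profile(s) among the 2^K latent classes whose ideal response pattern has the smallest Hamming distance to the observed X_ij. The Python sketch below implements that assumption. The Q-matrix in the usage lines is hypothetical. NIDA's attribute-level latent responses are not covered.

```python
from itertools import product


def ideal_response(alpha, q_row):
    """DINA latent response: 1 iff alpha_k = 1 for every k with Q_jk = 1."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))


def all_profiles(num_attributes):
    """All 2**K attribute profiles, i.e. the latent classes."""
    return list(product((0, 1), repeat=num_attributes))


def classify(responses, q_matrix):
    """Profiles whose ideal responses are closest (Hamming) to the responses.

    Returns (list_of_best_profiles, distance); ties are all returned.
    """
    num_attributes = len(q_matrix[0])
    best, best_dist = [], None
    for alpha in all_profiles(num_attributes):
        eta = [ideal_response(alpha, q_row) for q_row in q_matrix]
        dist = sum(x != e for x, e in zip(responses, eta))
        if best_dist is None or dist < best_dist:
            best, best_dist = [alpha], dist
        elif dist == best_dist:
            best.append(alpha)
    return best, best_dist


# Hypothetical Q-matrix: item 1 needs attribute 1, item 2 needs attribute 2,
# item 3 needs both. The student answers only item 1 correctly.
Q = [[1, 0], [0, 1], [1, 1]]
print(classify([1, 0, 0], Q))
```

Returning every tied profile makes it visible when the Q-matrix cannot separate some latent classes. This is one reason both models depend on an accurate Q-matrix.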

**Table of contents:**

List of Figures

List of Tables

**Chapter 1: Introduction**

1.1 Individualized Learning

1.2 Learning Sequence

1.3 Student Modeling

1.4 Issues and Challenges

1.5 Contribution of This Thesis

1.6 Structure of This Thesis

**Chapter 2: Review of Literature**

2.1 Evidence Models

2.1.1 Probabilistic Graphical Models

2.1.1.1 Bayesian Networks

2.1.1.2 Dynamic Bayesian Networks

2.1.1.3 Bayesian Knowledge Tracing

2.1.2 Latent Variable Models

2.1.2.1 Item Response Theory

2.1.2.2 DINA and NIDA

2.1.2.3 Factor Analysis

2.1.3 Integrated Models

2.1.4 Q-matrix

2.2 Skill Models

2.2.1 Granularity

2.2.2 Prerequisite Relationships

**Chapter 3: Towards Improving Evidence Model**

3.1 Diagnostic Features

3.2 A General Graphical Model

3.3 Improving Student Model with Diagnostic Items

3.3.1 A Diagnostic Model

3.3.2 Metrics for Student Model Evaluation

3.3.3 Evaluation

3.3.3.1 Data Sets

3.3.3.2 Comparison of Three Diagnostic Models

3.3.3.3 Diagnostic Models vs. Binary Models

3.4 Comparison of Existing Models

3.5 Summary

**Chapter 4: Towards Improving Skill Model**

4.1 Prerequisite Relationships

4.2 Discovering Prerequisite Structure of Skills

4.2.1 Association Rules Mining

4.2.2 Discovering Skill Structure from Knowledge States

4.2.3 Discovering Skill Structure from Performance Data

4.3 Evaluation of Our Method

4.3.1 The Experiment on Simulated Testing Data

4.3.2 The Experiment on Real Testing Data

4.3.3 The Experiment on Real Log Data

4.3.4 Joint Effect of Thresholds

4.4 Comparison with Existing Methods

4.5 Improvement of a Student Model via Prerequisite Structures

4.6 Summary

**Chapter 5: Conclusion**

5.1 Summary of This Thesis

5.2 Limitations and Future Research

**Bibliography**