<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Piccole Gioie </title>
    <link>https://livelovelet.tistory.com/</link>
    <description>Piccole Gioie: Finding Happiness in the Small Things!
-
Enjoy every moment of the journey, whatever happens and wherever it takes you.
One day, you will look back and be glad you did not waste it.
================================================</description>
    <language>ko</language>
    <pubDate>Fri, 10 Apr 2026 08:48:48 +0900</pubDate>
    <generator>TISTORY</generator>
    <ttl>100</ttl>
    <managingEditor>LiveLoveFlow</managingEditor>
    <item>
      <title>Markov chain Monte Carlo</title>
      <link>https://livelovelet.tistory.com/entry/Markov-chain-Monte-Carlo-1</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;Markov chain Monte Carlo is a powerful technique that combines the concepts of Markov chain and Monte Carlo methods.&amp;nbsp;It provides a way to sample from otherwise intractable probability distributions. MCMC methods are statistical techniques that provide a way to sample from complex, high-dimensional probability distributions that are otherwise intractable or difficult to analyze mathematically. MCMC methods are particularly useful in Bayesian inference, a framework for updating beliefs or making predictions based on prior knowledge and observed data.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;MCMC methods work by constructing a Markov chain - a sequence of random variables in which the distribution of each variable depends only on the previous one. The aim is to generate samples from a target probability distribution, often a posterior distribution in Bayesian inference. The key idea is to construct a Markov chain whose stationary distribution equals the target distribution; by iteratively generating samples from this chain, we obtain a representative set of samples from the target distribution.&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;548&quot; data-origin-height=&quot;395&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/kFivh/btsgFvW4tX7/6LbP5hD4LkQ5ZPVAsSr0t0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/kFivh/btsgFvW4tX7/6LbP5hD4LkQ5ZPVAsSr0t0/img.png&quot; data-alt=&quot;https://www.google.com/url?sa=i&amp;amp;amp;url=https%3A%2F%2Fwww.researchgate.net%2Ffigure%2FIllustration-of-Markov-Chain-Monte-Carlo-method_fig1_334001505&amp;amp;amp;psig=AOvVaw376MgcRDqSyf1XOBM9scPy&amp;amp;amp;ust=1684582332392000&amp;amp;amp;source=images&amp;amp;amp;cd=vfe&amp;amp;amp;ved=0CBMQjhxqFwoTCJiVlK6kgf8CFQAAAAAdAAAAABA5&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/kFivh/btsgFvW4tX7/6LbP5hD4LkQ5ZPVAsSr0t0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FkFivh%2FbtsgFvW4tX7%2F6LbP5hD4LkQ5ZPVAsSr0t0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;548&quot; height=&quot;395&quot; data-origin-width=&quot;548&quot; data-origin-height=&quot;395&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;https://www.google.com/url?sa=i&amp;amp;url=https%3A%2F%2Fwww.researchgate.net%2Ffigure%2FIllustration-of-Markov-Chain-Monte-Carlo-method_fig1_334001505&amp;amp;psig=AOvVaw376MgcRDqSyf1XOBM9scPy&amp;amp;ust=1684582332392000&amp;amp;source=images&amp;amp;cd=vfe&amp;amp;ved=0CBMQjhxqFwoTCJiVlK6kgf8CFQAAAAAdAAAAABA5&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The main advantage of MCMC methods is their ability to explore and sample from intractable probability distributions with no closed-form solution. This makes MCMC particularly valuable in cases where direct sampling or analytical approaches are not feasible.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Metropolis-Hastings algorithm&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1) Start with an arbitrary initial value for the chain's state.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2) Propose a new sample by randomly perturbing the current sample.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3) Evaluate the acceptance probability for the proposed sample based on the ratio of the target distribution values.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;4) Accept the proposed sample with that probability; otherwise reject it and keep the current sample.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;5) Repeat steps 2 to 4 for a large number of iterations to obtain a sequence of samples.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The Metropolis-Hastings algorithm leverages the Markov chain property to explore the target distribution iteratively. By generating proposals and accepting or rejecting them based on the acceptance probability, the algorithm allows for efficient sampling from high-dimensional and complicated distributions. Over a sufficient number of iterations, the sequence of samples produced by the algorithm converges to a representative sample from the target distribution.&lt;/p&gt;
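The five steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming a one-dimensional standard-normal target and a symmetric Gaussian random-walk proposal (so the proposal densities cancel in the acceptance ratio); the function names are invented for the example.

```python
import math
import random

def target_density(theta):
    # Unnormalized density of a standard-normal target; any
    # normalizing constant cancels in the acceptance ratio.
    return math.exp(-0.5 * theta * theta)

def random_walk_mh(n_samples, step=1.0, theta0=0.0, seed=42):
    rng = random.Random(seed)
    theta = theta0                       # step 1: arbitrary initial state
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)   # step 2: perturb current state
        # Step 3: acceptance ratio; the symmetric proposal density cancels.
        ratio = target_density(proposal) / target_density(theta)
        # Step 4: accept with probability min(1, ratio), else keep theta.
        if min(1.0, ratio) > rng.random():
            theta = proposal
        samples.append(theta)            # step 5: collect the chain
    return samples

samples = random_walk_mh(20000)
```

Discarding an initial burn-in portion of the chain and summarizing the rest approximates moments of the target distribution.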
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In Bayesian Inference, MCMC methods explore the posterior distribution of model parameters given observed data. By sampling from the posterior distribution, one can estimate the uncertainty in parameter values, assess model fit, and make predictions. It allows for the incorporation of prior knowledge and the quantification of uncertainty.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In the Random-Walk Metropolis-Hastings (RW-MH) algorithm, a proposed move from a point &amp;theta; to a point &amp;theta;' is accepted with probability min(1, (ϕ(&amp;theta;|&amp;theta;')&amp;eta;(&amp;theta;')) / (ϕ(&amp;theta;'|&amp;theta;)&amp;eta;(&amp;theta;))).&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;color: #374151; text-align: left; border-collapse: collapse; width: 100%; height: 316px;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 54px;&quot;&gt;
&lt;td style=&quot;width: 11.744186%; height: 54px;&quot;&gt;Proposal &lt;br /&gt;Distribution&lt;/td&gt;
&lt;td style=&quot;width: 88.139535%; height: 54px;&quot;&gt;The distribution used to generate candidate points in the Markov chain. In RW-MH it typically follows a random walk: the new point is obtained by perturbing the current point according to a specified distribution.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 36px;&quot;&gt;
&lt;td style=&quot;width: 11.744186%; height: 36px;&quot;&gt;ϕ(&amp;theta;'|&amp;theta;)&lt;/td&gt;
&lt;td style=&quot;width: 88.139535%; height: 36px;&quot;&gt;The&lt;b&gt; density of proposing &amp;theta;' given &amp;theta;&lt;/b&gt;. It represents the &lt;b&gt;likelihood of transitioning from the current point &amp;theta; to the proposed point &amp;theta;'&lt;/b&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 36px;&quot;&gt;
&lt;td style=&quot;width: 11.744186%; height: 36px;&quot;&gt;ϕ(&amp;theta;|&amp;theta;')&lt;/td&gt;
&lt;td style=&quot;width: 88.139535%; height: 36px;&quot;&gt;The density of proposing &amp;theta; given &amp;theta;'. It represents the &lt;b&gt;likelihood of transitioning from the proposed point &amp;theta;' back to the current point &amp;theta;.&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 54px;&quot;&gt;
&lt;td style=&quot;width: 11.744186%; height: 54px;&quot;&gt;Target &lt;br /&gt;Distribution&lt;/td&gt;
&lt;td style=&quot;width: 88.139535%; height: 54px;&quot;&gt;The distribution from which we want to sample. Often, it is the posterior distribution in Bayesian inference. &amp;eta;(&amp;theta;) represents the density of the target distribution at point &amp;theta;, and &amp;eta;(&amp;theta;') represents the density at the proposed point &amp;theta;'.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 36px;&quot;&gt;
&lt;td style=&quot;width: 11.744186%; height: 36px;&quot;&gt;&amp;eta;(&amp;theta;)&lt;/td&gt;
&lt;td style=&quot;width: 88.139535%; height: 36px;&quot;&gt;The density of the &lt;b&gt;target distribution at point &amp;theta; (The current state or parameter)&lt;/b&gt;. It indicates the relative probability of observing &amp;theta; according to the target distribution.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 36px;&quot;&gt;
&lt;td style=&quot;width: 11.744186%; height: 36px;&quot;&gt;&amp;eta;(&amp;theta;')&lt;/td&gt;
&lt;td style=&quot;width: 88.139535%; height: 36px;&quot;&gt;The density of the&lt;b&gt; target distribution at the proposed point &amp;theta;' (New proposed state or parameter).&lt;/b&gt; It indicates the relative probability of observing &amp;theta;' according to the target distribution.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 54px;&quot;&gt;
&lt;td style=&quot;width: 11.744186%; height: 54px;&quot;&gt;Acceptance Probability&lt;/td&gt;
&lt;td style=&quot;width: 88.139535%; height: 54px;&quot;&gt;The probability of accepting a proposed move from point &amp;theta; to point &amp;theta;', determined by the formula min(1, (ϕ(&amp;theta;|&amp;theta;')&amp;eta;(&amp;theta;')) / (ϕ(&amp;theta;'|&amp;theta;)&amp;eta;(&amp;theta;))). It ensures that the Markov chain transitions between points with a probability proportional to the target distribution.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;min(1, x) denotes taking the minimum of 1 and x. This caps the acceptance probability at one: the proposed state &amp;theta;' is accepted with probability equal to the ratio of the target and proposal densities, capped at one.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This acceptance probability determines whether the proposed state is incorporated into the Markov chain or rejected.&amp;nbsp;&lt;/p&gt;</description>
      <category>Data science/Machine Learning</category>
      <category>MCMC</category>
      <author>LiveLoveFlow</author>
      <guid isPermaLink="true">https://livelovelet.tistory.com/95</guid>
      <comments>https://livelovelet.tistory.com/entry/Markov-chain-Monte-Carlo-1#entry95comment</comments>
      <pubDate>Sat, 20 May 2023 04:28:36 +0900</pubDate>
    </item>
    <item>
      <title>Unsupervised or Supervised Classification</title>
      <link>https://livelovelet.tistory.com/entry/Unsupervised-or-Supervised-Classification</link>
      <description>&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Unsupervised&amp;nbsp;or&amp;nbsp;Supervised&amp;nbsp;Classification&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1830&quot; data-origin-height=&quot;596&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/lZXZ5/btsguiYLsVW/D8YiZ71MebRqkAnrgWCX8k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/lZXZ5/btsguiYLsVW/D8YiZ71MebRqkAnrgWCX8k/img.png&quot; data-alt=&quot;Comparison Table&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/lZXZ5/btsguiYLsVW/D8YiZ71MebRqkAnrgWCX8k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FlZXZ5%2FbtsguiYLsVW%2FD8YiZ71MebRqkAnrgWCX8k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1830&quot; height=&quot;596&quot; data-origin-width=&quot;1830&quot; data-origin-height=&quot;596&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Comparison Table&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This post will focus on classification algorithms, such as K-means, model-based clustering, K-nearest neighbours and probabilistic classifiers.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100.232558%; height: 15px;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot; data-ke-style=&quot;style14&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;width: 100%; height: 18px;&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;K-means&amp;nbsp;&lt;/b&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #374151; text-align: start;&quot;&gt;K-means clustering is a popular unsupervised machine learning algorithm that partitions a dataset into k distinct clusters. It aims to minimize the within-cluster sum of squares by assigning data points to clusters based on their proximity to the cluster centres.&amp;nbsp;&lt;/span&gt;* Distance can be measured by the standard Euclidean distance: d(x_j, x_k) = || x_j - x_k ||₂&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;-&amp;gt; Assigning each point to its nearest centre minimizes the total within-cluster sum of squares.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;601&quot; data-origin-height=&quot;844&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cSjKQW/btsgsQPDwVC/nKnJlwEHtuKfpM3YZjSBQ0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cSjKQW/btsgsQPDwVC/nKnJlwEHtuKfpM3YZjSBQ0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cSjKQW/btsgsQPDwVC/nKnJlwEHtuKfpM3YZjSBQ0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcSjKQW%2FbtsgsQPDwVC%2FnKnJlwEHtuKfpM3YZjSBQ0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;267&quot; height=&quot;375&quot; data-origin-width=&quot;601&quot; data-origin-height=&quot;844&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;How to do the K-means?&lt;/b&gt;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal; color: #374151; text-align: start;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Initialization: Start by selecting k initial cluster centres. Various methods can be used, such as choosing random points, points from the dataset, or a centrally located point.&lt;/li&gt;
&lt;li&gt;Assignment Step: For each data point, calculate the Euclidean distance to each cluster centre and assign the point to the cluster with the closest centre. The distance can be computed using the standard Euclidean distance formula.&lt;/li&gt;
&lt;li&gt;Update Step: After assigning all points to clusters, update the cluster centres by calculating the mean of the data points within each cluster. The new cluster centre represents the average position of the data points in that cluster.&lt;/li&gt;
&lt;li&gt;Convergence Check: Check if the cluster centres have converged. If the positions of the cluster centres remain unchanged or the change is below a predefined threshold, the algorithm stops. Otherwise, repeat steps 2 and 3.&lt;/li&gt;
&lt;/ol&gt;
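The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration with Forgy-style initialization and a made-up two-blob dataset, not a production implementation.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization (Forgy): pick k distinct data points as centres.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centre moves to the mean of its assigned points.
        new_centres = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Convergence check: stop when the centres no longer move.
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    wss = ((X - centres[labels]) ** 2).sum()   # within-cluster sum of squares
    return labels, centres, wss

# Two well-separated blobs; the algorithm should recover them.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
labels, centres, wss = kmeans(X, k=2)
```

Running this with several seeds and keeping the result with the smallest `wss` implements the multiple-restart advice below.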
&lt;p data-ke-size=&quot;size16&quot;&gt;** Clustering depends on the initial centres. Thus, run the algorithm several times with different initial centres, then choose the result with the smallest within-cluster sum of squares.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;** To decide how many clusters to keep, using an elbow plot can be helpful. Choose the value of k where the break occurs in the slope.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;How to choose centres?&amp;nbsp;&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; color: #374151; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Random Initialization: Choose k points uniformly and randomly from the data space. This approach is simple and suitable when the data distribution is uniform.&lt;/li&gt;
&lt;li&gt;Forgy's Method: Select k points from the dataset as the initial cluster centres. This method ensures that the initial centres are representative of the data points.&lt;/li&gt;
&lt;li&gt;Random Partition: Assign points to clusters uniformly at random, with each cluster initially having approximately n/k points. Use the cluster means as the initial centres.&lt;/li&gt;
&lt;li&gt;Centrally-Located Point: Choose the most centrally located point in the dataset as the first centre and subsequently select the points with the largest minimum distance to all existing centres.&lt;/li&gt;
&lt;li&gt;Maximin: Begin by selecting a centre arbitrarily and then choose subsequent centres as the data points with the largest minimum distance to all existing centres. This method ensures that the initial centres are well-separated.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Determining the Number of Clusters:&lt;/b&gt;&lt;br /&gt;The choice of the number of clusters, k, is crucial. One approach uses an&lt;span style=&quot;background-color: #9feec3;&quot;&gt; elbow plot&lt;/span&gt;, which displays the relationship between the number of clusters and the within-cluster sum of squares. Identify the value of k where the plot shows a significant break or reduction in the slope, indicating a good balance between compactness and separation of clusters.&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot; data-ke-style=&quot;style14&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 100%;&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;Model-based clustering; EM algorithms&lt;/b&gt;&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Model-based clustering is a powerful technique that allows us to discover underlying patterns in data by fitting a probabilistic model to the observed data. The Expectation-Maximization (EM) algorithm is commonly used to estimate the model's parameters in a maximum likelihood framework.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Model-based Clustering involves &lt;span style=&quot;background-color: #f6e199;&quot;&gt;specifying the number of clusters in advance&lt;/span&gt; and &lt;span style=&quot;background-color: #f6e199;&quot;&gt;then fitting a probabilistic model&lt;/span&gt; to the data. A probability distribution represents each cluster, and the goal is to estimate the parameters of these distributions that best explain the data.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;EM Algorithm:&lt;/b&gt; The EM algorithm is an iterative optimization technique for estimating the model parameters. It consists of two steps: the Expectation (E) step and the Maximization (M) step.&amp;nbsp;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal; color: #374151; text-align: start;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Initialization: Start by assigning points to clusters uniformly at random and construct an initial class membership matrix, z, using one-hot encoding.&lt;/li&gt;
&lt;li&gt;Maximization step: Compute the mixture parameters - the mixing coefficients (&amp;pi;), mean values (&amp;mu;), and covariance matrices (&amp;Sigma;) - from the current membership matrix.&lt;/li&gt;
&lt;li&gt;Expectation step: Update the cluster membership probabilities, z_ij, using the newly estimated mixture parameters. This involves calculating the probability of each point belonging to each cluster under the current model.&lt;/li&gt;
&lt;li&gt;Convergence check: If the cluster assignment probabilities, z_ij, are not changing significantly, stop the algorithm. Otherwise, repeat steps 2 and 3 until convergence is achieved.&lt;/li&gt;
&lt;/ol&gt;
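The loop above can be sketched for a one-dimensional two-component Gaussian mixture. This is a minimal illustration: the data are synthetic, and for a deterministic, well-separated start the means are initialized at evenly spaced quantiles rather than from a random membership matrix (the random start in step 1 also works but can converge slowly).

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=200):
    n = len(x)
    # Deterministic start: means at evenly spaced quantiles, shared
    # variance, equal mixing weights, uniform membership matrix z.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    z = np.full((n, k), 1.0 / k)
    for _ in range(n_iter):
        # E-step: membership probabilities z_ij under the current parameters.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        new_z = dens / dens.sum(axis=1, keepdims=True)
        converged = np.allclose(new_z, z, atol=1e-8)
        z = new_z
        # M-step: re-estimate weights, means and variances from z.
        nk = z.sum(axis=0)
        pi = nk / n
        mu = (z * x[:, None]).sum(axis=0) / nk
        var = (z * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-12
        # Convergence check: stop once z_ij stops changing appreciably.
        if converged:
            break
    return pi, mu, var, z

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])
pi, mu, var, z = em_gmm_1d(x)
```

Each row of z sums to one, giving the probability of class membership used for uncertainty quantification.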
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Benefits of Model-based Clustering&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; color: #374151; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Flexibility: Model-based clustering allows for more flexible cluster shapes and can accommodate various types of data distributions.&lt;/li&gt;
&lt;li&gt;Uncertainty Estimation: The algorithm provides probabilities of class membership for each data point, allowing uncertainty quantification.&lt;/li&gt;
&lt;li&gt;Scalability: Model-based clustering can handle large datasets using efficient algorithms and approximation techniques.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Choosing the number of clusters&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Selecting the appropriate number of clusters is a crucial step in model-based clustering. BIC or cross-validation can be used to determine the optimal number of clusters that best balance model complexity and goodness of fit to the data.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot; data-ke-style=&quot;style13&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 100%;&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;K-nearest neighbours&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;K-nearest neighbours (KNN) is a simple yet effective classification algorithm that makes predictions based on the proximity of training data points to a test feature vector.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;1) Given a test feature vector x, the KNN algorithm identifies the k nearest neighbours in the training set based on a specified distance metric. The distance metric can be Euclidean distance, Manhattan distance or any other suitable similarity measure.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2) Take a majority vote from those K for the predicted class of the test datapoint ( K should be odd).&lt;/p&gt;
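These two steps can be sketched directly; a minimal illustration on a made-up two-class dataset (the variable names are invented for the example):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # 1) Distance from the test vector to every training point (Euclidean).
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k nearest neighbours
    # 2) Majority vote among their labels (odd k avoids ties for two classes).
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_train = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X_train, y_train, np.array([0.1, 0.1])))   # prints a
```

Swapping the norm for another metric changes only the first line of the function, which is why KNN adapts easily to different similarity measures.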
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;KNN offers several advantages, including its simplicity and flexibility. It can handle both classification and regression tasks and adapt to various data types.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;** The choice of K: A small value of K may lead to overfitting, while a large value of K may result in underfitting (a smoother boundary with higher bias but lower variance, which reduces the impact of noise).&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;** KNN relies on distance calculations; it is crucial to normalize or scale the features to ensure that no single feature dominates the distance calculation. ( z-score normalization or min-max scaling)&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;** KNN is useful in recommendation systems, image recognition and anomaly detection, where the decision boundaries are complex or the training data is unbalanced.&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot; data-ke-style=&quot;style13&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 100%;&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Probabilistic Classifiers&amp;nbsp;&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Probabilistic Classifiers are a class of machine learning algorithms that assign probabilities to different classes or categories for a given input. Instead of providing a single predicted class, probabilistic classifiers estimate the &lt;span style=&quot;background-color: #f6e199;&quot;&gt;likelihood or probability of an input belonging to each class.&lt;/span&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1) Probability estimation: Probabilistic classifiers provide a probability distribution over all possible classes for a given input.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2) Decision boundary separates different classes based on the estimated probabilities.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3) Training process: The algorithm learns the relationship between input features and class labels using a labelled dataset. The process involves estimating the parameters of the probability distribution for each class.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;4) Bayesian framework: Often based on the Bayesian framework, which incorporates prior knowledge and updates it with observed data using Bayes' theorem.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;5) Evaluation: Evaluated using various metrics such as log loss, accuracy, precision, recall and F1-score.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
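Point 1 can be illustrated with a logistic-regression-style classifier; the weights below are made up for the example rather than learned from data:

```python
import numpy as np

def predict_proba(X, w, b):
    # Linear score passed through the sigmoid gives P(class 1 | x);
    # the two class probabilities sum to one for each input.
    score = X @ w + b
    p1 = 1.0 / (1.0 + np.exp(-score))
    return np.column_stack([1.0 - p1, p1])

w = np.array([2.0, -1.0])   # hypothetical learned weights
b = 0.5                     # hypothetical learned intercept
X = np.array([[3.0, 1.0], [-2.0, 2.0]])
probs = predict_proba(X, w, b)
```

Thresholding P(class 1 | x) at 0.5 (equivalently, score = 0) gives the decision boundary from point 2.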
&lt;p data-ke-size=&quot;size16&quot;&gt;** Examples: logistic regression, Naive Bayes, and support vector machines with probabilistic outputs.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;** Applications: spam detection, sentiment analysis, medical diagnosis, and fraud detection.&lt;/p&gt;</description>
      <category>Data science/Machine Learning</category>
      <category>emalgorithms</category>
      <category>kmeans</category>
      <category>KNN</category>
      <category>modelbasedclustering</category>
      <category>Probabilistic Classifiers</category>
      <category>supervisedclassification</category>
      <category>unsupervisedclassification</category>
      <author>LiveLoveFlow</author>
      <guid isPermaLink="true">https://livelovelet.tistory.com/93</guid>
      <comments>https://livelovelet.tistory.com/entry/Unsupervised-or-Supervised-Classification#entry93comment</comments>
      <pubDate>Fri, 19 May 2023 08:50:54 +0900</pubDate>
    </item>
    <item>
      <title>PCA (Principal Components Analysis)</title>
      <link>https://livelovelet.tistory.com/entry/PCA-Principal-Components-Analysis</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;Principal Components Analysis (PCA) is a popular technique for analysing high-dimensional data. PCA is a mathematical procedure that transforms a set of correlated variables into a new set of uncorrelated variables, called &quot;Principal Components&quot; while retaining most of the variability of the original data.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;To perform PCA on a dataset of N p-dimensional data points &lt;span style=&quot;background-color: #f6e199;&quot;&gt;x_j &amp;isin; Rᴾ&lt;/span&gt;, we follow these steps.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;1. &lt;b&gt;Standardize&lt;/b&gt; the data: To ensure all variables are in a similar range, subtract the mean and divide by the standard deviation for each variable.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;2. &lt;b&gt;Compute the covariance matrix&lt;/b&gt;: The covariance matrix summarizes the relationships between the variables in the data. The covariance matrix can be computed by using the formula, &quot;&lt;span style=&quot;color: #000000; background-color: #f6e199;&quot;&gt;cov(x)= 1/(n-1)* XᵀX&lt;/span&gt;&quot; (X is the standardized data matrix).&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;3. &lt;b&gt;Compute the eigenvectors and eigenvalues (Eigensystem)&lt;/b&gt;: These represent the directions and magnitudes of the maximum data variability (the eigenvectors v_i and eigenvalues &amp;lambda;_i).&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;4. &lt;b&gt;Sort the eigenvectors in descending&lt;/b&gt; order of their corresponding eigenvalues: The eigenvectors with the highest eigenvalues represent the directions of maximum variability in the data. These directions are called &quot;Principal components&quot;.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;5. &lt;b&gt;Choose the number of principal components&lt;/b&gt;: Use a scree plot to determine the number of components to keep. (scree plot: Plot of eigenvalues against the number of components)&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;6. &lt;b&gt;Compute the principal components&lt;/b&gt; by multiplying the standardized data matrix by the eigenvectors of the covariance matrix.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;7. &lt;b&gt;Calculate the percentage of the total sample variance explained by the i-th principal component&lt;/b&gt;: (&amp;lambda;_i / &amp;sum;_j &amp;lambda;_j) * 100%. The j-th element of the i-th eigenvector v_i is called the &quot;loading&quot; of the j-th variable onto the i-th principal component.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;(*Loading: the relationship between the original variables and the principal components; PC_i = v_i1 * x_1 + ... + v_ip * x_p)&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The result of PCA will produce principal components, which are linear combinations of original variables. These PCs will be uncorrelated.&amp;nbsp;&lt;/p&gt;
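The steps above can be sketched with NumPy's eigensolver; a minimal illustration on a made-up three-variable dataset in which the first two variables are strongly correlated:

```python
import numpy as np

def pca(X, n_components=2):
    # 1. Standardize: zero mean, unit standard deviation per variable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    n = len(Z)
    # 2. Covariance matrix of the standardized data: Z^T Z / (n - 1).
    C = Z.T @ Z / (n - 1)
    # 3-4. Eigensystem of the symmetric covariance matrix, sorted by
    # descending eigenvalue (eigh returns ascending order).
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 6. Principal components: project standardized data onto the eigenvectors.
    scores = Z @ eigvecs[:, :n_components]
    # 7. Percentage of total variance explained by each component.
    explained = 100.0 * eigvals / eigvals.sum()
    return scores, eigvecs, explained

rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t + rng.normal(scale=0.1, size=200),
                     rng.normal(size=200)])
scores, loadings, explained = pca(X)
```

Because the first two variables carry essentially the same signal, the first principal component captures roughly two-thirds of the total variance, and the score columns come out uncorrelated.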
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;PCA can be used to reduce the dimensionality of data, making it easier to visualize and analyze. It can also be used to identify patterns and relationships in the data (correlation), and data compression.&amp;nbsp;&lt;/p&gt;</description>
      <category>Data science/Machine Learning</category>
      <author>LiveLoveFlow</author>
      <guid isPermaLink="true">https://livelovelet.tistory.com/92</guid>
      <comments>https://livelovelet.tistory.com/entry/PCA-Principal-Components-Analysis#entry92comment</comments>
      <pubDate>Fri, 19 May 2023 02:41:48 +0900</pubDate>
    </item>
    <item>
      <title>Batch Normalization for Deep Learning</title>
      <link>https://livelovelet.tistory.com/entry/Batch-Normalization-for-Deep-Learning</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;*** This post is a summary of the website below; the original has more details and references.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://machinelearningmastery.com/batch-normalization-for-training-of-deep-neural-networks/&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;https://machinelearningmastery.com/batch-normalization-for-training-of-deep-neural-networks/&lt;/a&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;One possible cause of the difficulty posed by &lt;i&gt;changing inputs&lt;/i&gt; to layers deep in the network (&quot;internal covariate shift&quot;) is that the weights are updated per mini-batch. Batch Normalisation can stabilise the learning process and reduce the number of training epochs required.&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Batch Normalisation&lt;/b&gt;&lt;/h2&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;: scales a layer's output by standardising each input variable's activations per mini-batch: subtract the mini-batch mean from each value and divide by the mini-batch standard deviation. The rescaled activations then have a mean of zero and a standard deviation of one, so all inputs contribute on a comparable scale. This kind of standardisation is sometimes also called &quot;&lt;b&gt;whitening&lt;/b&gt;&quot;.&amp;nbsp;&lt;/p&gt;
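A minimal sketch of this per-mini-batch standardisation (illustrative only; real layers such as Keras' `BatchNormalization` additionally learn the scale `gamma` and shift `beta` during training, and keep running statistics for use at inference time):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardise each feature over the mini-batch dimension (axis 0),
    then apply a scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Mini-batch of 32 samples, 4 features on very different scales
rng = np.random.default_rng(1)
batch = rng.normal(loc=[0, 100, -5, 2], scale=[1, 50, 0.1, 2], size=(32, 4))
out = batch_norm(batch)
# Each feature of `out` now has (approximately) zero mean and unit variance
```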
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&amp;nbsp;&lt;/h2&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Use cases:&lt;/b&gt;&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Dramatic speed improvement of an inception-based convolutional neural network for photo classification&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Applied after each convolution and before the activation in a standard photo-classification model, with a significant outcome&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Used in the updated inception model (GoogleNet Inception-v3) for the ImageNet dataset, with an impressive outcome&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;- Used in recurrent neural networks in an end-to-end model for speech recognition, improving the final generalization error and accelerating training&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;How to Use&amp;nbsp;&lt;/b&gt;&lt;/h2&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Use before or after the activation function in the previous layer:
&lt;ol style=&quot;list-style-type: decimal;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;after it, for s-shaped functions ( logistic, hyperbolic tangent )&amp;nbsp;&lt;/li&gt;
&lt;li&gt;before it, for activations that may result in non-Gaussian distributions, such as the rectified linear activation function&amp;nbsp;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Use a Higher Learning Rate: training was observed to speed up when batch normalisation is combined with higher learning rates.&lt;/li&gt;
&lt;li&gt;It can also be used for data-preprocessing purposes when the variables have different scales.&lt;/li&gt;
&lt;li&gt;DO NOT use Dropout: as batch normalisation already reduces the generalization error, dropout is not required.&amp;nbsp;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1422&quot; data-origin-height=&quot;668&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/MGhdL/btr8axPwpLq/koWrUzcOHKmuVLrJjnKmB0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/MGhdL/btr8axPwpLq/koWrUzcOHKmuVLrJjnKmB0/img.png&quot; data-alt=&quot;https://gaussian37.github.io/dl-concept-batchnorm/&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/MGhdL/btr8axPwpLq/koWrUzcOHKmuVLrJjnKmB0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FMGhdL%2Fbtr8axPwpLq%2FkoWrUzcOHKmuVLrJjnKmB0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1422&quot; height=&quot;668&quot; data-origin-width=&quot;1422&quot; data-origin-height=&quot;668&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;https://gaussian37.github.io/dl-concept-batchnorm/&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Data science</category>
      <author>LiveLoveFlow</author>
      <guid isPermaLink="true">https://livelovelet.tistory.com/91</guid>
      <comments>https://livelovelet.tistory.com/entry/Batch-Normalization-for-Deep-Learning#entry91comment</comments>
      <pubDate>Wed, 5 Apr 2023 06:21:37 +0900</pubDate>
    </item>
    <item>
      <title>Some quotes that I like related to Happiness</title>
      <link>https://livelovelet.tistory.com/entry/Some-quotes-that-I-like-related-to-Happiness</link>
      <description>&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Keywords&amp;nbsp;&lt;/b&gt;&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Verbs&lt;/b&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;:&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;essere&amp;nbsp;(to&amp;nbsp;be)&lt;br /&gt;avere&amp;nbsp;(to&amp;nbsp;have)&lt;br /&gt;godere&amp;nbsp;(to&amp;nbsp;enjoy)&lt;br /&gt;trovare&amp;nbsp;(to&amp;nbsp;find)&lt;br /&gt;fare&amp;nbsp;(to&amp;nbsp;make/do)&lt;br /&gt;pensare&amp;nbsp;(to&amp;nbsp;think)&lt;br /&gt;aspettare&amp;nbsp;(to&amp;nbsp;wait)&lt;br /&gt;prendere&amp;nbsp;(to&amp;nbsp;take)&lt;br /&gt;durare&amp;nbsp;(to&amp;nbsp;last)&lt;br /&gt;vivere&amp;nbsp;(to&amp;nbsp;live)&lt;br /&gt;guardare&amp;nbsp;(to&amp;nbsp;look)&lt;br /&gt;rimpiangere&amp;nbsp;(to&amp;nbsp;regret)&lt;br /&gt;sprecare&amp;nbsp;(to&amp;nbsp;waste)&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;br /&gt;&lt;b&gt;Subjects&lt;/b&gt;:&lt;br /&gt;la&amp;nbsp;felicit&amp;agrave;&amp;nbsp;(happiness)&lt;br /&gt;le&amp;nbsp;cose&amp;nbsp;belle&amp;nbsp;(beautiful&amp;nbsp;things)&lt;br /&gt;l'occasione&amp;nbsp;(opportunity)&lt;br /&gt;la&amp;nbsp;vita&amp;nbsp;(life)&lt;br /&gt;il&amp;nbsp;cammino&amp;nbsp;(journey)&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;&lt;b&gt;Phrases&lt;/b&gt;&lt;/h2&gt;
&lt;ol style=&quot;list-style-type: decimal; color: #374151; text-align: start;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;&quot;La felicit&amp;agrave; &amp;egrave; un viaggio, non una meta.&quot; - Happiness is a journey, not a destination.&lt;/li&gt;
&lt;li&gt;&quot;La felicit&amp;agrave; &amp;egrave; fatta di piccole cose.&quot; - Happiness is made of small things.&lt;/li&gt;
&lt;li&gt;&quot;La felicit&amp;agrave; non &amp;egrave; avere tutto, ma saper godere di ci&amp;ograve; che si ha.&quot; - Happiness is not having everything, but knowing how to enjoy what you have.&lt;/li&gt;
&lt;li&gt;&quot;La felicit&amp;agrave; non &amp;egrave; qualcosa di gi&amp;agrave; fatto. Viene dalle tue azioni.&quot; - Happiness is not something already made. It comes from your actions.&lt;/li&gt;
&lt;li&gt;&quot;Essere felici non significa avere una vita perfetta, ma riuscire a trovare la perfezione nelle imperfezioni.&quot; - Being happy doesn't mean having a perfect life, but being able to find perfection in imperfections.&lt;/li&gt;
&lt;li&gt;&quot;Fallo e basta, non pensarci troppo. L'importante &amp;egrave; averci provato.&quot; (Just do it, don't think about it too much. The important thing is to have tried.)&lt;/li&gt;
&lt;li&gt;&quot;Goditi ogni momento del cammino, qualunque cosa ti capiti e ovunque ti porti. Un giorno, guarderai indietro e sarai contento di non averlo sprecato.&quot; (Enjoy every moment of the journey, whatever happens and wherever it takes you. One day, you will look back and be glad you didn't waste it.)&lt;/li&gt;
&lt;li&gt;&quot;Non aspettare l'occasione perfetta, prendi l'occasione e rendila perfetta.&quot; (Don't wait for the perfect opportunity, take the opportunity and make it perfect.)&lt;/li&gt;
&lt;li&gt;&quot;Le cose belle non durano per sempre, ma i ricordi di esse restano per sempre nella tua mente.&quot; (Beautiful things don't last forever, but the memories of them stay with you forever in your mind.)&lt;/li&gt;
&lt;li&gt;&quot;Vivi la tua vita in modo che, quando la guardi indietro, il tuo unico rimpianto sia di non aver vissuto abbastanza.&quot; (Live your life in such a way that when you look back, your only regret is not having lived enough.)&lt;/li&gt;
&lt;/ol&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Language/Italian</category>
      <author>LiveLoveFlow</author>
      <guid isPermaLink="true">https://livelovelet.tistory.com/90</guid>
      <comments>https://livelovelet.tistory.com/entry/Some-quotes-that-I-like-related-to-Happiness#entry90comment</comments>
      <pubDate>Tue, 4 Apr 2023 23:04:07 +0900</pubDate>
    </item>
    <item>
      <title>Lecture Note - Parameter Tuning (Deep Learning) - TBU</title>
      <link>https://livelovelet.tistory.com/entry/Lecture-Note-Parameter-Tuning-Deep-Learning-TBU</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://south-crown-f53.notion.site/Lecture-Note-Parameter-Tuning-Deep-Learning-109c66b3094c4bf08af1710947a5ea88&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;https://south-crown-f53.notion.site/Lecture-Note-Parameter-Tuning-Deep-Learning-109c66b3094c4bf08af1710947a5ea88&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1680127838357&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;Lecture Note - Parameter Tuning (Deep Learning)&quot; data-og-description=&quot;https://www.youtube.com/watch?v=1waHlpKiNyY&amp;amp;list=PLkDaE6sCZn6Hn0vK8co82zjQtt3T2Nkqc&quot; data-og-host=&quot;south-crown-f53.notion.site&quot; data-og-source-url=&quot;https://south-crown-f53.notion.site/Lecture-Note-Parameter-Tuning-Deep-Learning-109c66b3094c4bf08af1710947a5ea88&quot; data-og-url=&quot;https://south-crown-f53.notion.site/109c66b3094c4bf08af1710947a5ea88&quot; data-og-image=&quot;&quot;&gt;&lt;a href=&quot;https://south-crown-f53.notion.site/Lecture-Note-Parameter-Tuning-Deep-Learning-109c66b3094c4bf08af1710947a5ea88&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://south-crown-f53.notion.site/Lecture-Note-Parameter-Tuning-Deep-Learning-109c66b3094c4bf08af1710947a5ea88&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url();&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;Lecture Note - Parameter Tuning (Deep Learning)&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;https://www.youtube.com/watch?v=1waHlpKiNyY&amp;amp;list=PLkDaE6sCZn6Hn0vK8co82zjQtt3T2Nkqc&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;south-crown-f53.notion.site&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Data science</category>
      <author>LiveLoveFlow</author>
      <guid isPermaLink="true">https://livelovelet.tistory.com/89</guid>
      <comments>https://livelovelet.tistory.com/entry/Lecture-Note-Parameter-Tuning-Deep-Learning-TBU#entry89comment</comments>
      <pubDate>Thu, 30 Mar 2023 07:10:50 +0900</pubDate>
    </item>
    <item>
      <title>[Book Review] Methods for interpreting the deep neural networks - TBU</title>
      <link>https://livelovelet.tistory.com/entry/Book-Review-Methods-for-interpreting-the-deep-neural-networks-TBU</link>
      <description>&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Interpretation of the deep neural networks&lt;/span&gt;&lt;/span&gt;&lt;/h2&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;Deep neural networks can be challenging to interpret and understand with their multiple layers of nonlinear transformations. While they may produce highly accurate predictions, it can be challenging to determine how the model arrived at those predictions and which features of the input data were most important in making the prediction.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;This summarises a few methods that can be applied to interpreting deep neural networks.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;851&quot; data-origin-height=&quot;317&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/DyOcL/btr6ofdfMwX/5Gz7mu4WqPxAqaCYm0fFVk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/DyOcL/btr6ofdfMwX/5Gz7mu4WqPxAqaCYm0fFVk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/DyOcL/btr6ofdfMwX/5Gz7mu4WqPxAqaCYm0fFVk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FDyOcL%2Fbtr6ofdfMwX%2F5Gz7mu4WqPxAqaCYm0fFVk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;886&quot; height=&quot;330&quot; data-origin-width=&quot;851&quot; data-origin-height=&quot;317&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h2 data-ke-size=&quot;size26&quot;&gt;The following is the book &quot;Interpretable Machine Learning Chapter 10 - Neural Network&quot; summary.&amp;nbsp;&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;[Introduction]&amp;nbsp;&lt;/b&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Interpretation methods are used to visualize features and concepts learned by neural networks, explain individual predictions, and simplify neural networks&lt;/li&gt;
&lt;li&gt;Deep learning has been successful in image and text tasks, but the mapping from input to prediction is too complex for humans to understand without interpretation methods&lt;/li&gt;
&lt;li&gt;Specific interpretation methods are needed for neural networks because they learn hidden features and concepts and because the gradient can be utilized for more efficient methods&lt;/li&gt;
&lt;li&gt;The following techniques are covered in the following chapters:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Learned Features&lt;/li&gt;
&lt;li&gt;Pixel Attribution (Saliency Maps)&lt;/li&gt;
&lt;li&gt;Concepts&lt;/li&gt;
&lt;li&gt;Adversarial Examples&lt;/li&gt;
&lt;li&gt;Influential Instances&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;[10.1 Learned Features]&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;We can use &lt;i&gt;activation maximisation&lt;/i&gt; to visualize the learned features in a convolutional neural network. This involves starting with a random input image and iteratively adjusting its pixel values to &lt;span style=&quot;background-color: #f6e199;&quot;&gt;&lt;i&gt;maximize the activation of a specific feature or neuron in the network&lt;/i&gt;&lt;/span&gt;. By doing this for multiple features or neurons, we can &lt;i&gt;generate images highlighting the learned features in each network layer&lt;/i&gt;. These images can give us insights into what types of patterns and concepts the network is learning.&lt;/span&gt;&lt;/p&gt;
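The gradient-ascent loop behind activation maximisation can be illustrated on a toy, hand-written "network" consisting of a single linear layer (my own construction, so the gradient is available in closed form; real feature visualisation runs the same loop on a trained deep network using autograd, e.g. in PyTorch):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 16 * 16))    # toy layer: 5 "neurons" over a 16x16 input
x = rng.normal(size=16 * 16) * 0.01  # start from (near-)random noise

target = 2                           # neuron whose activation we maximise
lr = 0.1

def activation(x):
    return W @ x                     # linear layer: grad w.r.t. x is W[target]

before = activation(x)[target]
for _ in range(100):
    grad = W[target]                 # d(activation[target]) / dx for a linear unit
    x = x + lr * grad                # gradient ascent on the *input* pixels
after = activation(x)[target]
# `after` exceeds `before`; x now shows the pattern the target
# neuron responds to (here: proportional to W[target])
```

For a deep network the gradient is no longer constant, so each iteration recomputes it by backpropagation, but the loop structure is the same.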
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;639&quot; data-origin-height=&quot;536&quot;&gt;&lt;a href=&quot;https://christophm.github.io/interpretable-ml-book/cnn-features.html&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/l3BGX/btr53E6rKgH/ZnRyXlNwkdVYU1kiiwUpK0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fl3BGX%2Fbtr53E6rKgH%2FZnRyXlNwkdVYU1kiiwUpK0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;639&quot; height=&quot;536&quot; data-origin-width=&quot;639&quot; data-origin-height=&quot;536&quot;/&gt;&lt;/a&gt;&lt;figcaption&gt;https://christophm.github.io/interpretable-ml-book/cnn-features.html&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;[10.1.1. Feature Visualization] &lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Feature Visualization is making the learned features in a neural network explicit. It is done by &lt;span style=&quot;background-color: #f6e199;&quot;&gt;&lt;i&gt;finding the input that maximizes the activation of a unit&lt;/i&gt;&lt;/span&gt;, such as a single neuron, a channel, an entire layer, or the final class probability in classification. (Feature visualization can be carried out for different units: neurons, channels, layers, hidden layers, and pre-softmax neurons)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;color: #ee2323;&quot;&gt;TBD...&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Mathematically, feature visualization is an optimization problem, where the goal is to find a new image that maximizes the (mean) activation of a unit, assuming that the neural network weights are fixed. The optimization can be done by generating new images starting from random noise while applying constraints such as minor changes, jittering, rotation, scaling, frequency penalization, or generating images with learned priors.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Different units can be used for feature visualization, including individual neurons, channels, entire layers, or the final class probability in classification. Individual neurons are atomic units of the network, but visualizing each neuron&amp;rsquo;s feature would be time-consuming due to the millions of neurons in a network. Channels as units are a good choice for feature visualization, and we can visualize an entire convolutional layer. Layers as a unit are used for Google&amp;rsquo;s DeepDream, which repeatedly adds the visualized features of a layer to the original image, resulting in a dream-like version of the input.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;TBD...&amp;nbsp;&lt;/p&gt;</description>
      <category>Data science</category>
      <author>LiveLoveFlow</author>
      <guid isPermaLink="true">https://livelovelet.tistory.com/88</guid>
      <comments>https://livelovelet.tistory.com/entry/Book-Review-Methods-for-interpreting-the-deep-neural-networks-TBU#entry88comment</comments>
      <pubDate>Thu, 30 Mar 2023 07:06:44 +0900</pubDate>
    </item>
    <item>
      <title>[Literature Review] Estimating Networks of Sustainable Development Goals</title>
      <link>https://livelovelet.tistory.com/entry/Literature-Review-Estimating-Networks-of-Sustainable-Development-Goals</link>
      <description>&lt;h2 data-ke-size=&quot;size26&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;&quot;Estimating Networks of Sustainable Development Goals&quot; by Luis Ospina-Forero&lt;/span&gt;&lt;/h2&gt;
&lt;p style=&quot;text-align: justify;&quot; data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;The paper &quot;Estimating Networks of Sustainable Development Goals&quot; by Luis Ospina-Forero focuses on the interlinkages and interactions between the 17 Sustainable Development Goals (SDGs) outlined by the United Nations. The paper presents a &lt;span style=&quot;background-color: #f6e199;&quot;&gt;&lt;i&gt;method for estimating networks of SDGs&lt;/i&gt;&lt;/span&gt;, which can be used to identify the &lt;i&gt;most influential SDGs and how they interact.&lt;/i&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: justify;&quot; data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;text-align: justify;&quot; data-ke-size=&quot;size18&quot;&gt;The paper begins with an introduction to the SDGs and their importance in global development efforts. It then discusses the challenges in measuring the progress towards achieving the SDGs and the need to understand the interlinkages between them better.&lt;/p&gt;
&lt;p style=&quot;text-align: justify;&quot; data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;text-align: justify;&quot; data-ke-size=&quot;size18&quot;&gt;The proposed method for estimating SDG networks involves using Bayesian networks, which can model the causal relationships between the SDGs. The paper describes the data used in the analysis, which includes data from the World Bank and the United Nations, as well as the statistical methods used to estimate the network.&lt;/p&gt;
&lt;p style=&quot;text-align: justify;&quot; data-ke-size=&quot;size18&quot;&gt;The analysis results show that the SDGs are highly interconnected, with some SDGs being more influential than others. The paper discusses the implications of the findings for policymakers and suggests that the approach can be used to prioritize interventions and measure progress towards achieving the SDGs.&lt;/p&gt;
&lt;p style=&quot;text-align: justify;&quot; data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;text-align: justify;&quot; data-ke-size=&quot;size18&quot;&gt;Overall, the paper provides a novel method for estimating networks of SDGs and contributes to understanding the interlinkages and interactions between the SDGs. The findings have important implications for global development efforts and can inform policymakers' decision-making processes.&lt;/p&gt;
&lt;p style=&quot;text-align: justify;&quot; data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ol style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;/ol&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Background: Provides an overview of the Sustainable Development Goals (SDGs) and their importance in global development efforts - &lt;span style=&quot;color: #006dd7;&quot;&gt;introduction to the SDGs and their importance in global development efforts&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;Research question: States the main research question addressed in the study - &lt;span&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;span style=&quot;color: #006dd7;&quot;&gt;challenges in measuring the progress towards achieving the SDGs and the need to understand the interlinkages between them better.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;Objectives: Outline the objectives of the study&lt;/li&gt;
&lt;li&gt;Contribution: Summarizes the contributions of the study&lt;/li&gt;
&lt;li&gt;Outline: Provides a brief overview of the structure of the paper&lt;/li&gt;
&lt;/ul&gt;
&lt;ol style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Literature review&lt;/li&gt;
&lt;/ol&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Describes the concept of the SDGs and their importance in sustainable development&lt;/li&gt;
&lt;li&gt;Discusses previous research on the SDGs, including studies that have examined their relationships and interdependencies&lt;/li&gt;
&lt;li&gt;Identifies gaps in the existing knowledge and explains how the current study addresses those gaps&lt;/li&gt;
&lt;/ul&gt;
&lt;ol style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Methods&lt;/li&gt;
&lt;/ol&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Methodologies mentioned in the paper
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Bayesian networks: a probabilistic graphical model used to represent and reason about uncertain relationships between variables. This paper uses Bayesian networks to model the causal relationships between the SDGs.&lt;/li&gt;
&lt;li&gt;Partial correlation analysis: a method for identifying correlations between variables while controlling for the effects of other variables.&lt;/li&gt;
&lt;li&gt;Ridge regression: a linear regression method used to analyze multiple regression data that suffer from multicollinearity.&lt;/li&gt;
&lt;li&gt;Network inference using time-delayed mutual information: a method for inferring causal relationships between variables from time-series data.&lt;/li&gt;
&lt;li&gt;Granger causality: a statistical method for inferring causal relationships between variables in time-series data.&lt;/li&gt;
&lt;li&gt;Structural equation modelling: a statistical technique for modelling complex relationships between variables using a combination of observed variables and latent variables.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Data sources: Describes the data sources used in the study, including the SDG indicators from the UN Sustainable Development Goals Knowledge Platform&lt;/li&gt;
&lt;li&gt;Analytical techniques: Describes the analytical techniques used to estimate the networks of SDGs, including correlation analysis and network analysis&lt;/li&gt;
&lt;li&gt;Statistical models: Describes the statistical models used to estimate the networks of SDGs, including partial correlation and graphical LASSO&lt;/li&gt;
&lt;/ul&gt;
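To make the partial-correlation idea above concrete: partial correlations can be read off the precision (inverse covariance) matrix, which is the same object the graphical LASSO estimates sparsely. The following is my own toy sketch, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Toy chain x -> y -> z: x and z correlate only *through* y
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)
data = np.column_stack([x, y, z])

# Partial correlation from the precision matrix P = inv(Cov):
#   rho_{ij . rest} = -P_ij / sqrt(P_ii * P_jj)
P = np.linalg.inv(np.cov(data, rowvar=False))
d = np.sqrt(np.diag(P))
partial = -P / np.outer(d, d)
np.fill_diagonal(partial, 1.0)

marginal = np.corrcoef(data, rowvar=False)
# marginal[0, 2] is large, but partial[0, 2] (controlling for y)
# is near zero, correctly suggesting no direct x-z link
```

In a network-estimation setting, near-zero partial correlations are the candidate missing edges; the graphical LASSO automates this by penalising entries of the precision matrix toward exact zero.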
&lt;ol style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Results&lt;/li&gt;
&lt;/ol&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Describes the networks of SDGs estimated using the analytical techniques and statistical models described in the methods section&lt;/li&gt;
&lt;li&gt;Presents data visualizations of the networks, including network graphs and heatmaps&lt;/li&gt;
&lt;li&gt;Discusses the findings of the study, including the most important SDGs and their relationships to other SDGs&lt;/li&gt;
&lt;/ul&gt;
&lt;ol style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Discussion&lt;/li&gt;
&lt;/ol&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Interprets the results and discusses their implications, including the policy implications of the findings&lt;/li&gt;
&lt;li&gt;Highlights the strengths and limitations of the study&lt;/li&gt;
&lt;li&gt;Suggests avenues for future research&lt;/li&gt;
&lt;/ul&gt;
&lt;ol style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ol&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Summarizes the main findings of the study&lt;/li&gt;
&lt;li&gt;Restates the objectives of the study&lt;/li&gt;
&lt;li&gt;Highlights the contributions of the study&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Data science</category>
      <author>LiveLoveFlow</author>
      <guid isPermaLink="true">https://livelovelet.tistory.com/87</guid>
      <comments>https://livelovelet.tistory.com/entry/Literature-Review-Estimating-Networks-of-Sustainable-Development-Goals#entry87comment</comments>
      <pubDate>Tue, 28 Mar 2023 09:18:28 +0900</pubDate>
    </item>
    <item>
      <title>[Literature Summary] Causal Discovery and Inference: concepts and recent methodological advances</title>
      <link>https://livelovelet.tistory.com/entry/Literature-Summary-Causal-discovery-and-inference-concepts-and-recent-methodological-advances</link>
      <description>&lt;h1 style=&quot;color: #000000; text-align: start;&quot;&gt;Ref1. Causal Discovery and Inference: concepts and&amp;nbsp;recent methodological advances&lt;/h1&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;Notes&lt;/h2&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size23&quot;&gt;[What to expect from this paper]&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Provides an overview of&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;automated causal inference&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;and emerging approaches to causal discovery from i.i.d data and time series&lt;/li&gt;
&lt;li&gt;Reviews fundamental concepts such as manipulations, causal models, sample predictive modelling, causal predictive modelling, and structural equation models&lt;/li&gt;
&lt;li&gt;Discusses the&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;constraint-based approach to causal discovery,&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;which relies on conditional independence relationships in the data, and the assumptions underlying its validity&lt;/li&gt;
&lt;li&gt;Focuses on&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;causal discovery based on structural equation models&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;and discusses the identifiability of the causal structure implied by appropriately defined structural equation models&lt;/li&gt;
&lt;li&gt;Shows that&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;independence between the error term and causes,&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;together with appropriate structural constraints on the structural equation, makes it possible to identify the causal direction between two variables in the two-variable case&lt;/li&gt;
&lt;li&gt;Discusses&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;recent advances in causal discovery from time series&lt;/b&gt;, mentioning traditionally challenging problems such as causal discovery from subsampled data and in the presence of confounding time series&lt;/li&gt;
&lt;li&gt;Lists several open questions in the field of causal discovery and inference&lt;/li&gt;
&lt;li&gt;Provides a valuable resource for researchers interested in automated causal inference and emerging approaches to causal discovery from i.i.d. data and time series&lt;/li&gt;
&lt;/ul&gt;
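The constraint-based approach listed above rests on conditional independence tests. A minimal sketch, assuming a simulated linear chain x → z → y with Gaussian noise (the variable names and coefficients are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
z = x + rng.normal(size=n)   # x -> z
y = z + rng.normal(size=n)   # z -> y, so the chain is x -> z -> y

def partial_corr(a, b, c):
    """Correlation of a and b after removing the linear effect of c."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

marginal = np.corrcoef(x, y)[0, 1]   # clearly nonzero
conditional = partial_corr(x, y, z)  # near zero: x is independent of y given z
print(round(marginal, 2), round(conditional, 2))
```

Here x and y are marginally dependent but become independent once z is conditioned on; constraint-based algorithms such as PC exploit exactly these (in)dependence patterns to prune and orient edges.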
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;Key takeaways&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Keywords&lt;/b&gt;: Causal inference, Causal discovery, Structural equation model, Conditional independence, Statistical independence, Identifiability&lt;/li&gt;
&lt;/ul&gt;
&lt;ol style=&quot;list-style-type: decimal; color: #000000; text-align: start;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;Manipulation: changing one or more variables in a system and observing the effect on other variables to establish causality.&lt;/li&gt;
&lt;li&gt;Causal models: mathematical models that describe the causal relationships between variables in a system.&lt;/li&gt;
&lt;li&gt;Sample predictive modelling: predicting the values of variables for new samples drawn from the same, unmanipulated population, without reference to interventions.&lt;/li&gt;
&lt;li&gt;Causal predictive modelling: predicting how variables will respond when the system is manipulated, typically estimated from observational data.&lt;/li&gt;
&lt;li&gt;Structural equation models: a statistical model that describes the relationships between variables in terms of structural equations.&lt;/li&gt;
&lt;li&gt;Constraint-based approach: a method of causal discovery that relies on identifying conditional independence relationships in the data.&lt;/li&gt;
&lt;li&gt;Causal discovery based on structural equations: a method of causal discovery that involves fitting structural equation models to the data and using constraints to identify causal relationships.&lt;/li&gt;
&lt;li&gt;Linear causal models with non-Gaussian noise: a class of models that assumes linear causal relationships between variables with non-Gaussian noise.&lt;/li&gt;
&lt;li&gt;Causal discovery from time series: a causal discovery method that analyzes time series data to identify causal relationships.&lt;/li&gt;
&lt;li&gt;Causal discovery from subsampled data: a causal discovery method for time series observed at a coarser time resolution than the underlying causal process.&lt;/li&gt;
&lt;li&gt;Confounding time series: a problem in causal discovery from time series data where the observed data is influenced by unobserved variables that affect both the cause and effect.&lt;/li&gt;
&lt;li&gt;Counterfactual analysis: a method of causal inference that estimates what the outcome would have been under an alternative, hypothetical treatment and compares it with the observed outcome.&lt;/li&gt;
&lt;li&gt;Bayesian networks: a probabilistic graphical model representing the causal relationships between variables as a directed acyclic graph.&lt;/li&gt;
&lt;/ol&gt;
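The two-variable identifiability result noted above (linear model, non-Gaussian noise, as in LiNGAM) can be made concrete: only in the true causal direction is the regression residual independent of the cause. A hedged sketch on simulated data, using a small hand-rolled distance correlation as the independence measure (all numbers are hypothetical):

```python
import numpy as np

def dcor(a, b):
    """Distance correlation: near zero only when a and b are independent."""
    def centered(m):
        d = np.abs(m[:, None] - m[None, :])
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A, B = centered(a), centered(b)
    return np.sqrt((A * B).mean() / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, n)                # non-Gaussian cause
y = 0.8 * x + rng.uniform(-0.3, 0.3, n)  # linear effect + non-Gaussian noise

# Regress in both directions and keep the residuals
r_fwd = y - np.polyfit(x, y, 1)[0] * x   # residual of y ~ x (true direction)
r_bwd = x - np.polyfit(y, x, 1)[0] * y   # residual of x ~ y (wrong direction)

fwd_dep = dcor(r_fwd, x)  # markedly smaller: residual independent of cause
bwd_dep = dcor(r_bwd, y)  # larger: backward residual depends on the regressor
print(round(fwd_dep, 3), round(bwd_dep, 3))
```

With Gaussian noise both directions would fit equally well; non-Gaussianity breaks the symmetry, which is what makes the causal direction identifiable here.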
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;Summary&lt;/h2&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;Causal discovery and inference are related but distinct concepts in statistics and machine learning.&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Causal discovery involves&lt;/b&gt;&amp;nbsp;identifying the causal relationships between variables in a system, while causal inference involves using these relationships to make predictions or draw conclusions about the system. Recent methodological advances have improved our ability to perform causal discovery and inference in complex systems, using techniques such as&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Bayesian networks, structural equation modelling, and counterfactual analysis.&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Bayesian networks&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;are a popular method for performing causal discovery, allowing researchers to model the relationships between variables in a probabilistic framework. Bayesian networks can be used both to infer causal relationships from observational data and to predict the system's behaviour under different conditions. Recent advances in Bayesian network modelling include efficient algorithms for learning the network's structure from data and the integration of causal discovery with other statistical techniques such as clustering and regression.&lt;/p&gt;
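As a concrete illustration of the probabilistic framework, here is a tiny hand-specified Bayesian network with two independent causes of one effect; the structure and conditional probability tables are invented for illustration:

```python
# Hypothetical network: Rain -> WetGrass, Sprinkler -> WetGrass
P_rain = {True: 0.2, False: 0.8}
P_sprk = {True: 0.1, False: 0.9}
P_wet = {(True, True): 0.99, (True, False): 0.90,   # P(wet | rain, sprinkler)
         (False, True): 0.80, (False, False): 0.05}

def p_rain_given_wet():
    """P(Rain = True | Wet = True) by enumerating the full joint."""
    num = den = 0.0
    for r in (True, False):
        for s in (True, False):
            joint = P_rain[r] * P_sprk[s] * P_wet[(r, s)]  # P(r, s, wet=True)
            den += joint
            if r:
                num += joint
    return num / den

print(round(p_rain_given_wet(), 3))  # 0.645: wet grass raises P(rain) from 0.2
```

Exact enumeration scales exponentially in the number of variables, which is why practical Bayesian-network tools rely on message passing or sampling instead.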
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Structural equation modelling (SEM)&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;is another method for performing causal inference, which involves modelling the relationships between variables in terms of their underlying causal mechanisms. SEM allows researchers to test hypotheses about the causal relationships between variables, and to estimate the strength and direction of these relationships. Recent advances in SEM have included the development of new methods for dealing with missing data and the integration of SEM with machine learning techniques such as deep neural networks.&lt;/p&gt;
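To make the SEM idea concrete, here is a toy linear structural equation model fit by ordinary least squares; the graph and coefficients are invented for illustration and are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
# Structural equations: x -> z -> y, plus a direct x -> y path
x = rng.normal(size=n)
z = 0.7 * x + rng.normal(scale=0.5, size=n)
y = 1.5 * z - 0.4 * x + rng.normal(scale=0.5, size=n)

# Estimate each structural equation separately by least squares
a_hat = np.linalg.lstsq(x[:, None], z, rcond=None)[0][0]
b_hat, c_hat = np.linalg.lstsq(np.column_stack([z, x]), y, rcond=None)[0]
print(round(a_hat, 2), round(b_hat, 2), round(c_hat, 2))  # near 0.7, 1.5, -0.4
```

Fitting equation by equation is valid here because each error term is independent of that equation's regressors; with latent confounding one would instead need instrumental variables or other identification strategies.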
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Counterfactual analysis&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;is a third approach to causal inference, which involves modelling the system's behaviour under hypothetical scenarios. Counterfactual analysis allows researchers to estimate the causal effects of interventions by comparing the system's behaviour with and without the intervention. Recent advances in counterfactual analysis include new methods for estimating causal effects from observational data and the integration of counterfactual analysis with machine learning techniques such as causal forests and propensity score matching.&lt;/p&gt;</description>
      <category>Data science</category>
      <category>causal inference</category>
      <category>인과추론</category>
      <author>LiveLoveFlow</author>
      <guid isPermaLink="true">https://livelovelet.tistory.com/86</guid>
      <comments>https://livelovelet.tistory.com/entry/Literature-Summary-Causal-discovery-and-inference-concepts-and-recent-methodological-advances#entry86comment</comments>
      <pubDate>Tue, 28 Mar 2023 09:10:19 +0900</pubDate>
    </item>
    <item>
      <title>[Korean Teaching Material] I can speak Korean, I can do!</title>
      <link>https://livelovelet.tistory.com/entry/Korean-Teaching-Material-I-can-speak-Korean-I-can-do</link>
      <description>&lt;p style=&quot;text-align: center;&quot; data-ke-size=&quot;size16&quot;&gt;This is the lecture material I made for the teaching demonstration: &lt;span style=&quot;background-color: #f3c000;&quot;&gt;&lt;b&gt;&quot;I can do ~ / I cannot do~.&quot;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;i&gt;&lt;b&gt;-을 수 있어요/없어요!&lt;/b&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;720&quot; data-origin-height=&quot;540&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/zH8AS/btr53F3E5wF/Voz3GDAKqDL2hKx2kmG701/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/zH8AS/btr53F3E5wF/Voz3GDAKqDL2hKx2kmG701/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/zH8AS/btr53F3E5wF/Voz3GDAKqDL2hKx2kmG701/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FzH8AS%2Fbtr53F3E5wF%2FVoz3GDAKqDL2hKx2kmG701%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;720&quot; height=&quot;540&quot; data-origin-width=&quot;720&quot; data-origin-height=&quot;540&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;fileblock&quot; data-ke-align=&quot;alignCenter&quot;&gt;&lt;a href=&quot;https://blog.kakaocdn.net/dn/bTj4Wq/btr6eWX59Wa/Pmk2wIdtzY11yB93FACbIK/%ED%95%9C%EA%B5%AD%EB%A7%90%EC%9D%84%20%ED%95%A0%20%EC%88%98%20%EC%9E%88%EC%96%B4%EC%9A%94%202.pdf?attach=1&amp;amp;knm=tfile.pdf&quot; class=&quot;&quot;&gt;
    &lt;div class=&quot;image&quot;&gt;&lt;/div&gt;
    &lt;div class=&quot;desc&quot;&gt;&lt;div class=&quot;filename&quot;&gt;&lt;span class=&quot;name&quot;&gt;한국말을 할 수 있어요 2.pdf&lt;/span&gt;&lt;/div&gt;
&lt;div class=&quot;size&quot;&gt;1.70MB&lt;/div&gt;
&lt;/div&gt;
  &lt;/a&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>한국말을할수있어요</category>
      <category>한국어수업</category>
      <author>LiveLoveFlow</author>
      <guid isPermaLink="true">https://livelovelet.tistory.com/84</guid>
      <comments>https://livelovelet.tistory.com/entry/Korean-Teaching-Material-I-can-speak-Korean-I-can-do#entry84comment</comments>
      <pubDate>Mon, 27 Mar 2023 03:33:01 +0900</pubDate>
    </item>
  </channel>
</rss>