
Categorical Data Analysis

Study Course Description

Course Description Status: Approved
Course Description Version: 5.00
Study Course Accepted: 14.03.2024 11:48:07
Study Course Information
Course Code: SL_117
LQF level: Level 7
Credit Points: 2.00
ECTS: 3.00
Branch of Science: Mathematics; Theory of Probability and Mathematical Statistics
Target Audience: Life Science
Study Course Supervisor
Course Supervisor: Maksims Zolovs
Study Course Implementer
Structural Unit: Statistics Unit
The Head of Structural Unit:
Contacts: 14 Baložu street, Block A, Riga, statistika[at]rsu[dot]lv, +371 67060897
Study Course Planning
Full-Time - Semester No.1
Lectures (count): 7; Lecture Length (academic hours): 2; Total Contact Hours of Lectures: 14
Classes (count): 5; Class Length (academic hours): 2; Total Contact Hours of Classes: 10
Total Contact Hours: 24
Part-Time - Semester No.1
Lectures (count): 7; Lecture Length (academic hours): 1; Total Contact Hours of Lectures: 7
Classes (count): 5; Class Length (academic hours): 2; Total Contact Hours of Classes: 10
Total Contact Hours: 17
Study course description
Preliminary Knowledge:
• Familiarity with probability theory and mathematical statistics.
• Basic knowledge of the Jamovi software.
• Basic knowledge of data analysis.
Objective:
Most statistical data are categorical in nature, so the course aims to point out the specific features of this type of data and to teach the corresponding methods of statistical analysis. The course focuses mainly on methods and their application; the mathematical background and justification of the methodology are provided to some extent. The statistics software Jamovi is used in computer labs, where students analyse the real datasets covered in lectures in order to connect theory with practice and become confident in applying the methodology to practical data analysis.
Topic Layout (Full-Time)
No. | Topic | Type of Implementation | Number | Venue
1 | The nature of categorical data. Classification by purpose and scale. Types of studies. Probability distributions. Overdispersion. | Lectures | 1.00 | auditorium
2 | Joint distribution of categorical variables. Conditional and marginal distributions. Maximum likelihood estimates of probabilities. | Lectures | 1.00 | auditorium
3 | Independence. Measures of dependence: relative risk, odds, odds ratio. 2 x 2 frequency tables. Estimates from a frequency table. Conditional probabilities – sensitivity, specificity. True negative, false positive. | Lectures | 1.00 | auditorium
4 | Introduction to Jamovi. Visualising categorical data. Comparing distributions. Frequency tables. Conditional frequencies. Estimating measures of dependence. | Classes | 1.00 | computer room
5 | Tables larger than 2 x 2. Measures of dependence for ordinal and nominal data. Hypotheses about the population distribution. Chi-square test. | Lectures | 1.00 | auditorium
6 | Measures of dependence for ordinal and nominal data. Hypotheses about independence and conditional independence. | Classes | 1.00 | computer room
7 | Large-sample case: hypotheses about independence, chi-square and likelihood ratio tests. Small-sample case: Fisher's exact test. | Lectures | 1.00 | auditorium
8 | Hypotheses about independence and conditional independence. | Classes | 1.00 | computer room
9 | Asymptotic distribution of multinomial frequencies. Confidence intervals for the odds ratio and relative risk. Testing homogeneity of marginal distributions for paired observations. | Lectures | 1.00 | auditorium
10 | Interval estimation for measures of dependence. McNemar test. | Classes | 1.00 | computer room
11 | Models for a binary outcome variable – logit and log-linear models. Models for retrospective studies. Decision trees for classification. | Lectures | 1.00 | auditorium
12 | Modelling categorical data. Classified data and raw data. Modelling and decision trees. Classification error. | Classes | 1.00 | computer room
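The measures of dependence named in topics 3 and 9 (relative risk, odds ratio, and a confidence interval for the odds ratio) can be illustrated with a minimal Python sketch. The course itself uses Jamovi, so this is only an illustration. The function name `two_by_two_measures`, the cell labels `a`, `b`, `c`, `d`, and the counts are hypothetical, and the interval is the standard large-sample Wald interval on the log scale, which assumes all four cells are non-zero.

```python
import math

def two_by_two_measures(a, b, c, d, z=1.96):
    """Measures of dependence for a 2 x 2 frequency table.

    Assumed layout (hypothetical labels):
        a = exposed,   outcome present    b = exposed,   outcome absent
        c = unexposed, outcome present    d = unexposed, outcome absent
    All four counts must be positive.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    # Large-sample standard error of log(OR): sqrt(1/a + 1/b + 1/c + 1/d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    ci_low = math.exp(log_or - z * se_log_or)
    ci_high = math.exp(log_or + z * se_log_or)
    return {
        "relative_risk": relative_risk,
        "odds_ratio": odds_ratio,
        "odds_ratio_ci": (ci_low, ci_high),
    }

# Hypothetical counts, not taken from any course dataset.
result = two_by_two_measures(20, 80, 10, 90)
print(result)
```

With these made-up counts the odds ratio is 20·90 / (80·10) = 2.25, while the approximate 95% interval includes 1, so the sample would not rule out independence at the 5% level.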
Topic Layout (Part-Time)
No. | Topic | Type of Implementation | Number | Venue
1 | The nature of categorical data. Classification by purpose and scale. Types of studies. Probability distributions. Overdispersion. | Lectures | 1.00 | auditorium
2 | Joint distribution of categorical variables. Conditional and marginal distributions. Maximum likelihood estimates of probabilities. | Lectures | 1.00 | auditorium
3 | Independence. Measures of dependence: relative risk, odds, odds ratio. 2 x 2 frequency tables. Estimates from a frequency table. Conditional probabilities – sensitivity, specificity. True negative, false positive. | Lectures | 1.00 | auditorium
4 | Introduction to Jamovi. Visualising categorical data. Comparing distributions. Frequency tables. Conditional frequencies. Estimating measures of dependence. | Classes | 1.00 | computer room
5 | Tables larger than 2 x 2. Measures of dependence for ordinal and nominal data. Hypotheses about the population distribution. Chi-square test. | Lectures | 1.00 | auditorium
6 | Measures of dependence for ordinal and nominal data. Hypotheses about independence and conditional independence. | Classes | 1.00 | Pool
7 | Large-sample case: hypotheses about independence, chi-square and likelihood ratio tests. Small-sample case: Fisher's exact test. | Lectures | 1.00 | auditorium
8 | Hypotheses about independence and conditional independence. | Classes | 1.00 | computer room
9 | Asymptotic distribution of multinomial frequencies. Confidence intervals for the odds ratio and relative risk. Testing homogeneity of marginal distributions for paired observations. | Lectures | 1.00 | auditorium
10 | Interval estimation for measures of dependence. McNemar test. | Classes | 1.00 | computer room
11 | Models for a binary outcome variable – logit and log-linear models. Models for retrospective studies. Decision trees for classification. | Lectures | 1.00 | auditorium
12 | Modelling categorical data. Classified data and raw data. Modelling and decision trees. Classification error. | Classes | 1.00 | computer room
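The large-sample tests of independence named in topics 5 and 7 (Pearson chi-square and likelihood ratio) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the course's Jamovi workflow: the function name `independence_statistics` and the example counts are hypothetical, and the sketch returns only the test statistics and degrees of freedom, leaving the comparison with a chi-square distribution to the reader.

```python
import math

def independence_statistics(table):
    """Pearson X^2 and likelihood-ratio G^2 for an r x c frequency table.

    `table` is a list of rows of observed counts. Expected counts under
    independence are (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            # Cells with zero observed count contribute 0 to G^2.
            if observed > 0:
                g2 += 2 * observed * math.log(observed / expected)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return pearson, g2, df

# Hypothetical 2 x 2 counts, not taken from any course dataset.
pearson, g2, df = independence_statistics([[20, 80], [10, 90]])
print(pearson, g2, df)
```

Both statistics are referred to a chi-square distribution with (r − 1)(c − 1) degrees of freedom when expected counts are large enough; for small samples the syllabus points to Fisher's exact test instead.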
Assessment
Unaided Work:
1. Individual work with the course material in preparation for lectures, according to the plan.
2. Independently prepared homework assignments that practise the concepts studied in the course.
To evaluate the quality of the study course as a whole, students should fill out the study course evaluation questionnaire on the Student Portal.
Assessment Criteria:
Assessment on the 10-point scale according to the RSU Educational Order:
• 2 independent homework assignments – 50%.
• Attendance and active participation during practical classes – 25%.
• Final written exam – 25%.
Final Examination (Full-Time): Exam (Written)
Final Examination (Part-Time): Exam (Written)
Learning Outcomes
Knowledge:
• On successful course completion, students will be familiar with a range of statistical methods available for categorical data. They will know and be able to interpret large-sample as well as small-sample tests.
• Students will recognise the nature of categorical data and know how to measure dependence between categorical variables based on the type of study and the type of variables (nominal or ordinal).
• Students will be able to demonstrate how to model a binary outcome variable using continuous or categorical variables.
Skills:
• Students will understand and explain how different data collection methods affect the random nature of the frequency table, and will interpret the distributional models for the frequency table and for its rows and columns.
• Students can explain the dependence measures defined on the joint distribution of two categorical variables (relative risk, odds ratio, etc.), and can interpret and estimate them.
• Students can test the goodness of fit of data to an assumed distributional model and can test the independence of categorical variables.
• Students can model a categorical variable (binary, as a special case) using other variables.
• Students can independently apply their knowledge to real data.
Competencies:
• On successful course completion, students will be competent to read and critically assess scientific publications that use categorical data in their analysis.
• Students will be competent to plan and execute data analysis with categorical data.
Bibliography
No. | Reference
Required Reading
1 | Agresti, Alan. Categorical Data Analysis. Wiley, 2012 (or 1990, 2002 editions).
Additional Reading
1 | Agresti, Alan. An Introduction to Categorical Data Analysis. Wiley, 2019 (or 1996, 2007 editions).