Week 1. MetaData & ParaData

2022. 10. 9. 06:44 | Data science/Database


1. Metadata: data about data, describing its attributes

- Data that provides information about a specific dataset, focusing on different aspects such as source, resolution, production or accuracy

- Serves multiple purposes, such as managing datasets and reporting information about them

- It is stored in metadata repositories (which are themselves databases)

 

1) General properties of metadata

- Sufficiency: can an object describe itself? Images are one example.

- Scalability: searching the metadata is much faster than searching the large data files themselves

- Interoperability: data can be exchanged using a mutually agreed metadata format
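
The scalability point above can be illustrated with a minimal sketch: a small catalog of metadata records is searched without ever opening the underlying data files. The record fields, dataset names, and the `find_datasets` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    """A hypothetical metadata record describing one dataset."""
    name: str
    source: str
    resolution: str
    year: int


# Hypothetical catalog; in practice this would live in a metadata repository.
CATALOG = [
    DatasetMetadata("rainfall_daily", "weather_agency", "1km", 2020),
    DatasetMetadata("census_blocks", "statistics_office", "block", 2021),
    DatasetMetadata("rainfall_hourly", "weather_agency", "5km", 2021),
]


def find_datasets(catalog, **criteria):
    """Return names of datasets whose metadata matches every criterion."""
    return [
        record.name
        for record in catalog
        if all(getattr(record, key) == value for key, value in criteria.items())
    ]


# Only the small metadata records are scanned, not the data files.
print(find_datasets(CATALOG, source="weather_agency", year=2021))
```

Because both sides agree on the record fields, the same catalog could also be exchanged between systems, which is the interoperability property in miniature.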

 

2) Types of metadata (https://atlan.com/what-is-metadata/)

By Bretherton and Singley (1994)

  1. Structural metadata: describes the structure of datasets (tables, attributes, indexes, etc.) and helps establish object-to-object relationships and the hierarchical structure between data assets. This includes table names, data types, data sources, foreign key cardinality, and referential integrity.
  2. Guide metadata: describes contents in a common language; the metadata should be readable and interpretable by anyone.

By Kimball (1996)

  1. Technical metadata: covers the main internal attributes, i.e. information about the data itself (design, schema structure, table and column information, column size, validation rules, and data quality profiles of data assets).
  2. Business metadata: covers the external attributes related to the industrial process, such as a glossary of terms and definitions that helps business users understand a particular data asset.
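
As a rough illustration of structural/technical metadata, the sketch below reads column and foreign-key information from an in-memory SQLite database using the standard `sqlite3` module and SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`. The `customer` and `purchase` tables, the `describe_table` helper, and the glossary entry are made up for this example; the glossary stands in for business metadata.

```python
import sqlite3

# Hypothetical schema, used only to have some metadata to inspect.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE purchase (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    amount REAL
);
""")


def describe_table(connection, table):
    """Collect technical metadata for one table (trusted table names only)."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = [
        {"name": row[1], "type": row[2], "not_null": bool(row[3]), "primary_key": bool(row[5])}
        for row in connection.execute(f"PRAGMA table_info({table})")
    ]
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
    foreign_keys = [
        {"column": row[3], "references": f"{row[2]}.{row[4]}"}
        for row in connection.execute(f"PRAGMA foreign_key_list({table})")
    ]
    return {"table": table, "columns": columns, "foreign_keys": foreign_keys}


# Business metadata: a hypothetical glossary entry written for business users.
GLOSSARY = {"purchase.amount": "Total value of one order"}

info = describe_table(conn, "purchase")
print(info["columns"])
print(info["foreign_keys"])
print(GLOSSARY["purchase.amount"])
```

The technical metadata comes from the database itself, while the glossary has to be written and maintained by people who know the business meaning of each column.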

... Refer to the ppt slides

 

2. Paradata

Paradata are auxiliary data about the process of data collection, including keystroke files and time stamps.

Paradata, which provide information about the survey data collection process (Couper, 2000a), lend insight into errors and costs that can impact the quality of a survey data collection, often at a low cost of collection to researchers.

Despite the problems currently associated with web surveys, such as coverage and nonresponse errors, web surveys have at least one particularly interesting feature: they record a wide array of paradata.

Creating Rich, Structured Metadata: Lessons Learned in the Metadata Portal Project, by Mary Vardigan, Darrell Donakowski, Pascal Heus, Sanda Ionescu, and Julia Rotondo
Paradata analyses to inform population-based survey capture of pregnancy outcomes: EN-INDEPTH study

Couper (1998) was the first person to introduce the term “paradata” to the field of survey methodology. He initially used the term to refer to automatically generated process data, such as the audit trails produced by Blaise. Since then, the term has expanded to cover all types of data about the process of collecting survey data, such as interviewer call records, length of the interview, keystroke data, and interviewer characteristics. Although interviewer observations and information from interviewer questionnaires do not describe processes, this kind of information is also often referred to as paradata. Not included under the term paradata are the actual survey questionnaire data.

ESRC National Centre for Research Methods Review paper: Survey Paradata: A Review, by Gerry Nicolaas, National Centre for Social Research (NatCen)

 

  • Server-side paradata: describe server events, but not the respondent's actions on the web page
  • Client-side paradata: collected on the respondent's client machine through a scripting tool; they capture the respondent's behaviour on the web page and how the survey was filled in
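
A minimal sketch of how client-side paradata might be summarized, assuming a script on the page has already logged timestamped events. The event format, the question ids, and the `summarize` helper are hypothetical and not taken from any particular survey tool.

```python
from collections import defaultdict

# Hypothetical client-side event log:
# (seconds since survey start, question id, event type)
EVENTS = [
    (0.0, "q1", "focus"),
    (4.2, "q1", "change"),
    (6.0, "q1", "change"),
    (7.5, "q2", "focus"),
    (9.0, "q2", "change"),
    (12.5, "submit", "click"),
]


def summarize(events):
    """Return (time spent per question, number of answer changes per question)."""
    time_spent = defaultdict(float)
    answer_changes = defaultdict(int)
    # Attribute each gap between consecutive events to the earlier event's question.
    for (start, question, _), (end, _, _) in zip(events, events[1:]):
        time_spent[question] += end - start
    # Count how often the respondent changed an answer.
    for _, question, kind in events:
        if kind == "change":
            answer_changes[question] += 1
    return dict(time_spent), dict(answer_changes)


time_spent, answer_changes = summarize(EVENTS)
print(time_spent)
print(answer_changes)
```

Server-side paradata would only see the page request and the final submission, so the per-question timing and answer changes above are only available from the client side.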

Paradata can be used to improve the management of data collection in the following ways:

 
