2022. 10. 9. 06:44ㆍData science/Database
1. Metadata: data about data, described through attributes
- Data that provides information about a specific dataset, focusing on different aspects such as source, resolution, production or accuracy
- Multiple purposes: to manage datasets, to report information
- It is stored in metadata repositories (which are databases in themselves)
1) Metadata General properties
- Sufficiency: Can an object describe itself? Images are one example.
- Scalability: Searching the metadata is much faster than searching the large data files themselves.
- Inter-operability: Data can be exchanged using a mutually agreed metadata format.
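The scalability property can be illustrated with a small sketch: a search runs over short metadata records instead of opening the large data files. The `DatasetRecord` class, the `CATALOG` entries and the `find_datasets` helper are all hypothetical names invented for this example, and the values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One entry in a hypothetical metadata repository."""
    name: str
    source: str
    resolution: str
    size_bytes: int


# Illustrative entries only; names and values are assumptions.
CATALOG = [
    DatasetRecord("rainfall_daily", "weather stations", "daily", 4_000_000_000),
    DatasetRecord("rainfall_monthly", "weather stations", "monthly", 150_000_000),
    DatasetRecord("census_households", "national census", "yearly", 9_000_000_000),
]


def find_datasets(catalog, **criteria):
    """Return the records whose attributes equal every given criterion."""
    return [
        rec for rec in catalog
        if all(getattr(rec, key) == value for key, value in criteria.items())
    ]


# The search touches only the small records, never the multi-gigabyte files.
matches = find_datasets(CATALOG, source="weather stations", resolution="daily")
print([rec.name for rec in matches])
```

Because every record uses the same agreed attribute names, the same lookup would also work on records exchanged from another repository, which is the inter-operability point.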
2) Types of metadata (https://atlan.com/what-is-metadata/)
By Bretherton and Singley (1994)
- Structural metadata: Describes the datasets (tables, attributes, indexes, etc.). It is the information that helps establish object-to-object relationships and the hierarchical structure between different data assets. This includes table names, data types, data sources, foreign key cardinality, and referential integrity.
- Guide metadata: Describes the contents in a common language; the metadata should be readable and interpretable by anyone
By Kimball (1996)
- Technical metadata: Covers the main internal attributes. Information about the data itself (design, structure of the schema, table and column information, column size, validation rules, and data quality profiles on data assets)
- Business metadata: Covers the external attributes. Related to the industrial process; a glossary of terms and definitions that helps business users understand a particular data asset.
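As a rough illustration of structural/technical metadata, the sketch below reads table names, column types and foreign keys from a database's own catalog using Python's standard `sqlite3` module. The `customer` and `purchase` tables, the helper functions and the `BUSINESS_GLOSSARY` dictionary are assumptions made up for this example; other database systems expose the same kind of information through their own catalog views.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount      REAL
    );
    """
)


def table_names(connection):
    """Table names, read from SQLite's schema catalog."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def column_info(connection, table):
    """(column name, declared type, NOT NULL flag) for each column."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated, so pass trusted names only.
    rows = connection.execute(f"PRAGMA table_info({table})")
    return [(row[1], row[2], bool(row[3])) for row in rows]


def foreign_keys(connection, table):
    """(local column, referenced table, referenced column) per foreign key."""
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...).
    rows = connection.execute(f"PRAGMA foreign_key_list({table})")
    return [(row[3], row[2], row[4]) for row in rows]


# Business metadata is written by people, not derived from the schema.
# This glossary entry is purely illustrative.
BUSINESS_GLOSSARY = {
    ("purchase", "amount"): "Total paid by the customer for one purchase.",
}

for table in table_names(conn):
    print(table, column_info(conn, table), foreign_keys(conn, table))
```

The first three helpers return technical metadata that the database already holds, while the glossary stands in for business metadata that has to be maintained separately.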
... Refer to the ppt slides
2. Paradata
Paradata is auxiliary data about the process of data collection; it includes keystroke files and time stamps.
Paradata, which provides information about the survey data collection process (Couper, 2000a), lends insight into errors and costs that can impact the quality of a survey data collection, often at a low cost of collection to researchers.
Despite the problems currently associated with web surveys, such as coverage and nonresponse errors, web surveys have at least one particularly interesting feature: They record a wide array of paradata.
Couper (1998) was the first person to introduce the term “paradata” to the field of survey methodology. He initially used the term to refer to automatically generated process data, such as the audit trails produced by Blaise. Since then, the term has expanded to cover all types of data about the process of collecting survey data, such as interviewer call records, length of the interview, keystroke data, and interviewer characteristics. Although interviewer observations and information from interviewer questionnaires do not describe processes, this kind of information is also often referred to as paradata. Not included under the term paradata are the actual survey questionnaire data.
- Server-side paradata: describes server events, but not the respondent's actions on the web page
- Client-side paradata: collected on the respondent's client machine through some scripting tool; it records the respondent's behaviour on the web page and how they filled in the survey
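A minimal sketch of how client-side paradata might be analysed, assuming the scripting tool has already delivered one timestamped record per event. The `EVENTS` records, their field names and both helper functions are hypothetical; real survey software uses its own formats.

```python
from datetime import datetime

# Hypothetical client-side paradata: one record per respondent action.
EVENTS = [
    {"respondent": "r1", "question": "q1", "event": "focus", "at": "2022-10-09T06:00:00"},
    {"respondent": "r1", "question": "q1", "event": "answer", "at": "2022-10-09T06:00:12"},
    {"respondent": "r1", "question": "q2", "event": "focus", "at": "2022-10-09T06:00:12"},
    {"respondent": "r1", "question": "q2", "event": "answer", "at": "2022-10-09T06:00:15"},
    {"respondent": "r1", "question": "q2", "event": "answer", "at": "2022-10-09T06:00:40"},
]


def response_seconds(events):
    """Seconds from first focus to last answer, per (respondent, question)."""
    first_focus = {}
    last_answer = {}
    for e in events:
        key = (e["respondent"], e["question"])
        ts = datetime.fromisoformat(e["at"])
        if e["event"] == "focus":
            first_focus[key] = min(ts, first_focus.get(key, ts))
        elif e["event"] == "answer":
            last_answer[key] = max(ts, last_answer.get(key, ts))
    return {
        key: (last_answer[key] - first_focus[key]).total_seconds()
        for key in first_focus
        if key in last_answer
    }


def answer_changes(events):
    """How many times each answer was changed after it was first given."""
    counts = {}
    for e in events:
        if e["event"] == "answer":
            key = (e["respondent"], e["question"])
            counts[key] = counts.get(key, 0) + 1
    return {key: n - 1 for key, n in counts.items()}


print(response_seconds(EVENTS))
print(answer_changes(EVENTS))
```

Server-side paradata would typically only show when each page was requested, so the changed answer on `q2` would not be visible there.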
Paradata can be used to improve the management of data collection in the following ways: