1. What is a database?

Motivation

Database
- Organized collection of inter-related data that models some aspect of the real world.
- Databases are one of the core components of most computer applications
Database Examples
- Universities: Registration, grades
- Financial market
  - Credit card transactions
  - Sales and purchase information for stocks and bonds
  - Real-time market data
- Enterprise information
  - Sales: customers, products, purchases
  - Accounting: payments, receipts, assets
  - Human resources: employee profile, salaries, taxes

File System

In the early days, database applications were built directly on top of file systems, which led to the following problems:

Data redundancy and inconsistency
- Data is stored in multiple file formats and locations
- → Resulting in duplication of information
Difficulty in accessing data
- Need to write a new program to carry out each new task
Data isolation
- Meaning: a property that determines when and how changes made by one operation become visible to others
- Cannot be controlled with files
Integrity problems
- Integrity constraints (e.g. account balance ≥ 0) become "buried" in program code, rather than being stated explicitly
Concurrency problems
- Uncontrolled concurrent accesses can lead to inconsistencies
Security problems
- Hard to provide fine-grained user access control

→ Database systems offer solutions to all the above problems
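
As a rough illustration of the integrity problem listed above, the sketch below contrasts a rule buried in application code with the same rule declared to the database. It uses Python's built-in sqlite3 module; the table name account, its columns, and the helpers withdraw_from_file_record and try_overdraw are hypothetical names chosen only for this example.

```python
import sqlite3

# File-style approach: the rule "balance >= 0" lives only inside this
# function. Any other program touching the same data must re-implement it.
def withdraw_from_file_record(record, amount):
    if record["balance"] - amount < 0:  # constraint buried in program code
        raise ValueError("balance would become negative")
    record["balance"] -= amount
    return record

# DBMS approach: the rule is stated once, explicitly, in the schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "  id INTEGER PRIMARY KEY,"
    "  balance REAL NOT NULL CHECK (balance >= 0)"
    ")"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100.0)")

def try_overdraw(connection):
    """Return True if the database rejects an update that violates the CHECK."""
    try:
        connection.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
    except sqlite3.IntegrityError:
        return True
    return False

rejected = try_overdraw(conn)
balance_after = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

With the constraint declared in the schema, every application that writes to the table is subject to the same rule, without having to repeat the check itself.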

Brief history

Up to the early 1960s:
- Data processing using magnetic tapes for storage
  - Tapes provided only sequential access
- Punched cards for input
- System/360 (IBM), random access, sequential access
Late 1960s and 1970s
- Hard disks allowed direct access to data
- Network and hierarchical data models in use
- Ted Codd defined the relational data model
  - The work won the ACM Turing Award (1981)
  - IBM Research began System R prototype
  - UC Berkeley (Michael Stonebraker) began Ingres prototype
  - Oracle released first commercial relational database
1980s
- Research relational prototypes evolve into commercial systems
  - SQL becomes an industry standard
- Parallel and distributed database systems
  - Wisconsin, IBM, Teradata
- Object-oriented database systems
1990s
- Large decision support and data-mining applications
- Large multi-terabyte data warehouses
- Emergence of Web commerce
2000s
- Big data storage systems: Google Bigtable, Yahoo PNUTS, Amazon, NoSQL systems
- Big data analysis: beyond SQL
  - MapReduce
2010s
- SQL reloaded
  - SQL front-end to MapReduce systems
  - Massively parallel database systems
  - Multi-core main-memory databases

Relational Data

(↔OODB; object-oriented database system)

Database Systems

Database
- Organized collection of inter-related data that models some aspect of the real world (A. Pavlo)
  - Related things are kept together (cf. plain files, which are not organized this way)
- Database system: Informal definition
  - Magnetic tapes (storage)
  - Storage: random access, hard disk drives, spinning
  - File system: magnetic data
  - Data: paper, PDF, ...
  - Flat file strawman
    - Store a database as comma-separated value (CSV) files
    - Manage the CSV files using our own code
      - Use a separate file per entity
      - The application has to parse the CSV files each time it wants to read or update records
    - How to examine the validity of the values?
      → Issue: data integrity
    - How to find a particular record?
    - How to write a new application that uses the same data?
      → Issue: implementation
    - What if the machine crashes while a file is being written?
      → Issue: durability
- Database management system (DBMS)
  - Software that allows applications to store and analyze information in a database
    - Access data without worrying about the file I/O-level details
  - A general-purpose DBMS is designed to allow the definition, creation, querying, update, and administration of databases
- DBMS as data storage
  - Data abstraction to avoid low-level implementation and maintenance chores
    - Store database in simple data structures
    - Access data through high-level language
  - Database abstraction does not include:
    - How to implement the storage, relations, ...
    → Clear separation between the logical and physical layers
- DBMS as interface
  - Data definition language (DDL)
  - Data manipulation language (DML)
  → Structured Query Language (SQL) includes both DDL and DML
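
To make the DDL/DML split above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The instructor table and its columns are used purely as an illustration, and the second row is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of a relation.
conn.execute(
    "CREATE TABLE instructor ("
    "  ID INTEGER PRIMARY KEY,"
    "  name TEXT NOT NULL,"
    "  dept_name TEXT,"
    "  salary REAL"
    ")"
)

# DML: store information ...
conn.execute("INSERT INTO instructor VALUES (76766, 'Crick', 'Biology', 72000.00)")
conn.execute("INSERT INTO instructor VALUES (1, 'Example', 'Physics', 50000.00)")  # made-up row

# ... and retrieve it, without touching any file I/O details.
well_paid = conn.execute(
    "SELECT name FROM instructor WHERE salary > 70000"
).fetchall()
```

Note that the application never says how the rows are laid out on disk or how the matching row is found; that is left to the DBMS.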

Data Model

Data model: a notion for describing data or information
- Has three parts: structure, operations, constraints
Examples
- Relational data model: used by most DBMSs (and most of what we cover this semester)
- NoSQL
  - Key/value pair
    - The hash-table view: every piece of data is assumed to have a key.
  - Graph
  - Document
    - MongoDB (JSON)
  - Column-family
- Machine learning
  - Array/matrix
- Misc.: hierarchical, network
  - These are all different types of data models.
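
As a loose illustration of how these data models differ in structure, the sketch below writes similar information in several of the styles listed above, using plain Python values only. The key format "instructor:76766" and the nested dept field are hypothetical choices, not conventions of any particular system.

```python
import json

# Relational: a set of fixed-arity tuples described by a schema.
instructor_schema = ("ID", "name", "dept_name", "salary")
instructor_relation = {(76766, "Crick", "Biology", 72000.00)}

# Key/value: every item is reachable only through its key, as in a hash table.
# The value is opaque to the store.
kv_store = {"instructor:76766": "Crick,Biology,72000.0"}

# Document: a self-describing, possibly nested record (JSON here).
document = json.dumps(
    {"ID": 76766, "name": "Crick", "dept": {"name": "Biology"}, "salary": 72000.0}
)

# Array/matrix: numeric data addressed by position.
matrix = [[1.0, 0.0], [0.0, 1.0]]
```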

Relational Data model

Relational data model: a data model in which data is described in terms of relations
Relation
- An unordered set.
  - It contains the relationship among the attributes that represent entities.

Relation(Table)

Attribute(Column)
- Must be atomic (an indivisible data type) → data types: int, float, char, string
  - string is an atomic data type in most database systems
Tuple(row)
- A set of attribute values (also called its domain in the relation).
- Each tuple has exactly one value for each attribute.
- Values are usually atomic/scalar.
- Domain: the set of allowed values for each attribute
- Null: a member of every domain, indicating that the value is unknown (missing / not available)
  - Null values cause complications in many operations.
- n-ary relation = table with n columns
- attributes = columns = features
- tuples = rows = records = data instances = data points
- [Figure: an example relation]
A relation is unordered.
- The order of tuples does not matter.
- Two tables that differ only in the order of their tuples are the same table.
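
The note above that null values complicate many operations can be seen in a small sketch with Python's built-in sqlite3 module. The table and the two rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID INTEGER, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [(1, "A", 100.0), (2, "B", None)],  # Python None is stored as SQL NULL
)

# Comparing with NULL yields NULL (unknown), so this matches no rows.
eq_null = conn.execute("SELECT name FROM instructor WHERE salary = NULL").fetchall()

# IS NULL is the test that actually finds missing values.
is_null = conn.execute("SELECT name FROM instructor WHERE salary IS NULL").fetchall()

# Aggregates over a column skip NULLs, so COUNT(salary) differs from COUNT(*).
count_all, count_salary = conn.execute(
    "SELECT COUNT(*), COUNT(salary) FROM instructor"
).fetchone()
```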

Notation

Using a table
Using set notation
- Structure: instructor(ID, name, dept_name, salary)
- Tuples: (76766, Crick, Biology, 72000.00),
  - Attribute values are placed inside a tuple like this. The order of the tuples does not matter.
Mathematically, a set has no order and allows no duplicates.
Within a tuple, however, order matters.
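
The set notation maps directly onto Python's set and tuple types, which gives a quick way to check these properties. In this sketch the Instructor named tuple and the second, made-up tuple are assumptions added for illustration.

```python
from collections import namedtuple

Instructor = namedtuple("Instructor", ["ID", "name", "dept_name", "salary"])

r1 = {
    Instructor(76766, "Crick", "Biology", 72000.00),
    Instructor(1, "Example", "Physics", 50000.00),  # made-up tuple
}
r2 = {
    Instructor(1, "Example", "Physics", 50000.00),
    Instructor(76766, "Crick", "Biology", 72000.00),
    Instructor(76766, "Crick", "Biology", 72000.00),  # duplicate collapses
}

# Tuple order and duplicates do not matter: both sets are the same relation.
same_relation = r1 == r2

# Inside a tuple, position decides which attribute a value belongs to,
# so swapping two values produces a different tuple.
swapped = (("Crick", 76766, "Biology", 72000.00))
order_matters = Instructor(76766, "Crick", "Biology", 72000.00) != swapped
```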

Key

A type of constraint.
One or more attributes form a key.
A key for a relation → no two tuples may have the same values for the key attributes.
Definition
- Let K ⊆ R, i.e. K is a subset of the attributes of R.
- If the values of K are sufficient to identify a unique tuple in each possible relation r,
  - K is a superkey of R.
- A superkey K is a candidate key if K is minimal. (Satisfying uniqueness alone makes it a superkey.)
- One of the candidate keys is selected as the primary key.
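
The definition can be sketched as two small Python checks. One caveat: a key is defined over every possible relation r, whereas the code below can only inspect one instance, so it can refute a key but never prove one. The function names and sample rows are hypothetical.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, attrs):
    """A superkey is a candidate key if no proper subset is also a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(attrs))          # sizes 0 .. len(attrs) - 1
        for subset in combinations(attrs, size)
    )

rows = [
    {"ID": 1, "name": "Kim", "dept_name": "Biology"},
    {"ID": 2, "name": "Kim", "dept_name": "Physics"},
    {"ID": 3, "name": "Lee", "dept_name": "Physics"},
]
```

On these rows, ("ID", "name") is a superkey but not a candidate key, because ("ID",) alone already identifies each tuple.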

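Summary

Super key: a set of one or more attributes that can uniquely identify each row in a table
- Satisfies uniqueness only.
Candidate key: a minimal set of attributes that can uniquely identify each row
- Satisfies both uniqueness and minimality.
Primary key: the one candidate key that is selected; it satisfies both minimality and uniqueness
- Exactly one per table; no null or duplicate values.
Alternate key: a candidate key that was not selected as the primary key; it still satisfies minimality and uniqueness
Foreign key: a table references data in another table, linking the two tables
- The referenced column is set as the primary key in the referenced table
- It is a constraint that prevents referencing a value that does not exist in the other table
- referencing data → referenced relation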
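
A minimal sketch of primary-key and foreign-key enforcement, using Python's built-in sqlite3 module. The department and instructor tables and the violates helper are hypothetical; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

# Referenced relation: dept_name is its primary key.
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY)")

# Referencing relation: instructor.dept_name must exist in department.
conn.execute(
    "CREATE TABLE instructor ("
    "  ID INTEGER PRIMARY KEY,"
    "  name TEXT NOT NULL,"
    "  dept_name TEXT REFERENCES department(dept_name)"
    ")"
)
conn.execute("INSERT INTO department VALUES ('Biology')")
conn.execute("INSERT INTO instructor VALUES (76766, 'Crick', 'Biology')")

def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Same primary key value twice: rejected (uniqueness).
duplicate_pk = violates("INSERT INTO instructor VALUES (76766, 'Other', 'Biology')")

# Department that does not exist in the referenced table: rejected.
dangling_fk = violates("INSERT INTO instructor VALUES (2, 'Other', 'Physics')")
```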

Data Language

DDL; data definition language
- represent relations
DML; data manipulation language
- store and retrieve information
- Based on relational algebra.
cf. non-procedural DML
- Specifies what data is wanted, not how to find it.
- Based on relational calculus. → query optimization
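
The difference between the two styles can be sketched in Python. The select and project functions below are simplified, hypothetical stand-ins for the relational algebra operators of the same names, and the SQL query states only what is wanted. The second row is made-up sample data.

```python
import sqlite3

instructor = [
    {"ID": 76766, "name": "Crick", "dept_name": "Biology", "salary": 72000.00},
    {"ID": 1, "name": "Example", "dept_name": "Physics", "salary": 50000.00},
]

def select(relation, predicate):
    """Keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """Keep only the listed attributes; the result is a set, so no duplicates."""
    return {tuple(t[a] for a in attrs) for t in relation}

# Procedural style: we spell out the sequence of operations ourselves.
procedural = project(select(instructor, lambda t: t["salary"] > 60000), ["name"])

# Non-procedural style: we state what we want and the DBMS chooses the plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID INTEGER, name TEXT, dept_name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO instructor VALUES (:ID, :name, :dept_name, :salary)", instructor
)
declarative = set(
    conn.execute("SELECT DISTINCT name FROM instructor WHERE salary > 60000")
)
```

Both forms produce the same result here. Because the declarative form does not fix an evaluation order, the DBMS is free to optimize the query.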

Database Schema

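Database: a collection of relations.
Database schema: the logical structure of the database.
Database instance: a snapshot of the data in the database at a particular moment.
- Relation instance: the collection of information about attributes and tuples at a particular moment
→ The collection of information stored in the database at a particular moment (a time-dependent concept).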
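
A small sketch of the schema/instance distinction, using Python's built-in sqlite3 module: the schema stays the same while the instance changes as data is inserted. The helper names schema and instance are hypothetical; sqlite_master is SQLite's built-in catalog table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID INTEGER PRIMARY KEY, name TEXT)")

def schema(connection):
    """Logical structure: the stored CREATE TABLE statement."""
    return connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'instructor'"
    ).fetchone()[0]

def instance(connection):
    """Snapshot of the tuples stored at this moment."""
    return connection.execute("SELECT ID, name FROM instructor ORDER BY ID").fetchall()

schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO instructor VALUES (76766, 'Crick')")
schema_after, instance_after = schema(conn), instance(conn)
```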
