Database Systems Lab, Seoul National University

Current Projects

Multi-Model Multi-Storage Engine Database Management System (2020 ~)

Integrated storage engine layer: Various open-source storage engines are combined to support six data models.
Inter-Model Query and Inter-Model Foreign Key: These new concepts are proposed to manage and analyze different models effectively.
Smart Distribution: Exploits the characteristics of the data to store it in the most efficient form.

Past Projects

Construction and Demonstration of Fine Particle Monitoring System using Scanning LIDAR (2019 ~ 2020)

We have built a monitoring and visualization framework for fine particle concentration. Data is collected by a 3D scanning LiDAR installed in Siheung Smart City.
The scope of this project includes the development of real-time analysis processes that transform LiDAR signals into fine particle concentration levels for real-time detection of hot spots.

Figure: LiDAR system overview

Predicting Potential Drug-Drug Interactions via Big Data Analysis and Machine Learning (2019)

Each drug is characterized by various features such as targets, enzymes, and side effects. Using these drug representations, a machine learning model is trained to predict drug-drug interactions.
In addition to predicting the type of each drug-drug interaction with a multi-class classification model, we aim to provide diverse analyses that help users understand the results.

Big Data Big Computing Engine for High-Performance Computer (2016 ~ 2021)

This project studies an array processing system built on top of Apache Spark that represents multi-dimensional arrays and supports matrix/tensor computation.

Optimize data processing performance.
Accelerate computation using GPUs.

The project also includes building an integrated environment for Apache Spark and NVIDIA GPUs.

Dynamic generation of CUDA code for a given Spark operator.
GPU-accelerated version of the GraphX algorithm.

Development Plan for Utilization of Artificial Intelligence to Construct an Innovative Biological License Application Approval System (2019)

The current biopharmaceutical approval and review process is investigated to identify the subcomponents where artificial intelligence can be applied.
State-of-the-art artificial intelligence technologies (natural language processing, entity and relation extraction, chatbots, data management, and more) are studied as part of a long-term roadmap proposal.

Development of an Algorithm System for Drug Repositioning Using Drug Adverse Event Bigdata (2018 ~ 2019)

Diverse matching and machine learning algorithms are used to extract drug-side effect pairs from structured data (databases) and unstructured data (text and table files from published literature).
Similarity analysis of drug pairs based on their side effects is then applied to find potential repositioning candidates.
The project aims to build a pipeline to continue the candidate discovery process as more data are supplied.
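
The side-effect-based similarity analysis described above can be illustrated with a minimal sketch. It assumes Jaccard similarity over sets of side effects, which is a common choice but not necessarily the measure used in the project. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as dissimilar here (an assumption)
    return len(a & b) / len(union)


def rank_drug_pairs(side_effects):
    """Return (drug_a, drug_b, score) tuples sorted by descending similarity."""
    pairs = [
        (a, b, jaccard(side_effects[a], side_effects[b]))
        for a, b in combinations(sorted(side_effects), 2)
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical toy data, not taken from the project.
toy = {
    "drug_a": {"nausea", "headache", "dizziness"},
    "drug_b": {"nausea", "headache"},
    "drug_c": {"rash"},
}
ranked = rank_drug_pairs(toy)
top_pair = ranked[0]  # the most similar pair is the strongest candidate to inspect
```

Drug pairs with highly overlapping side-effect profiles rank first, so they are the first candidates to examine for a shared mechanism. A real pipeline would rerun this ranking as new drug-side effect pairs are extracted.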

Study of Software Platform for Data Storing/Management in Smart Campus (2015 ~ 2017)

For a comprehensive and efficient design, both relational DBMSs and NoSQL stores (key-value, column-family, document, graph) are considered.
The integrated warehouse stores formal/informal/multimedia data.
As part of the project, an optimal warehouse for storing life-logging data generated by various smart devices and sensors on campus is investigated and developed.

Performance Study of Multidimensional Array DBMS (SciDB) (2014 ~ 2016)

The storage structure, query processing, and indexing of SciDB are analyzed in depth.
The performance of queries on multidimensional data is measured with respect to selectivity, query shape, and data location, and an optimal chunking guideline is suggested.
Aggregation queries on individual attributes are implemented with a hash tree structure, and array filtering is accelerated with B+-tree indexing.
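
To show why query shape and chunking interact, the following minimal sketch counts how many regular chunks a rectangular range query overlaps. It assumes non-overlapping chunks, array coordinates starting at 0, and inclusive query bounds. The function name `chunks_touched` is hypothetical and is not part of SciDB.

```python
def chunks_touched(query_lo, query_hi, chunk_shape):
    """Count the regular chunks overlapped by an inclusive rectangular query.

    Assumes non-negative integer coordinates with the array origin at 0.
    """
    count = 1
    for lo, hi, size in zip(query_lo, query_hi, chunk_shape):
        first_chunk = lo // size  # index of the chunk holding the lower bound
        last_chunk = hi // size   # index of the chunk holding the upper bound
        count *= last_chunk - first_chunk + 1
    return count


# The same 100 x 10 query region under three hypothetical chunk shapes.
query_lo, query_hi = (0, 0), (99, 9)
square = chunks_touched(query_lo, query_hi, (100, 100))
wide = chunks_touched(query_lo, query_hi, (10, 1000))
tall = chunks_touched(query_lo, query_hi, (1000, 10))
```

For a query of fixed size, the number of chunks read depends on how well the chunk shape matches the query shape, which is one reason a chunking guideline has to account for the expected workload.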

In-memory DBMS Flash Acceleration (2014 ~ 2016)

The project optimizes an in-memory DBMS using NVMe/PCIe flash storage.
The fundamentals of H-Store are studied to verify its features and performance. The following features are the primary optimization targets: anti-caching, snapshots, and command logging.

Anti-caching refers to the migration of some data from memory to disk when the total data size exceeds the memory limit.
The snapshot feature takes an image of the in-memory data at a specific time and stores it on disk for recovery.
Command logging stores every transaction command on disk for REDO.
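
The anti-caching idea can be sketched as a toy key-value store that moves the coldest entries to a disk stand-in once a memory limit is exceeded. This is an illustrative sketch only: the class name `AntiCache`, the per-entry limit, and the use of a plain dictionary as "disk" are assumptions, and H-Store's actual implementation differs in many details.

```python
from collections import OrderedDict


class AntiCache:
    """Toy store that evicts the least recently used entries to 'disk'."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit  # maximum number of entries kept in memory
        self.memory = OrderedDict()       # ordered from coldest to hottest
        self.disk = {}                    # stand-in for flash or disk storage

    def put(self, key, value):
        self.disk.pop(key, None)          # drop any stale evicted copy
        self.memory[key] = value
        self.memory.move_to_end(key)      # mark as most recently used
        self._evict()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        value = self.disk.pop(key)        # raises KeyError if the key is unknown
        self.memory[key] = value          # bring the entry back into memory
        self._evict()
        return value

    def _evict(self):
        # Move cold entries to disk until memory is within its limit.
        while len(self.memory) > self.memory_limit:
            cold_key, cold_value = self.memory.popitem(last=False)
            self.disk[cold_key] = cold_value


store = AntiCache(memory_limit=2)
for k in ("a", "b", "c"):
    store.put(k, k.upper())   # inserting "c" pushes the coldest entry "a" to disk
value = store.get("a")        # reading "a" reloads it and evicts "b" instead
```

Unlike a conventional buffer cache, memory is the primary home of the data here and disk only receives what no longer fits, which is why faster flash devices make the eviction and reload path a natural optimization target.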

Figure: H-Store architecture

Optimizing DBMS on SSD (2014 ~ 2015)

This project studies how to optimize a DBMS (e.g., PostgreSQL) to be aware of the underlying flash storage device.
Conversely, it also studies how to optimize an open-channel SSD to be aware of the host.

Big Data-based Silver Robot (2014 ~ 2015)

The project incorporates big data collection, management, and distributed processing technology. To efficiently process and manage the collected data, a NoSQL-based hybrid data handling technology built on MongoDB is applied.
Multi-attribute data indexing, modeling, and processing technologies are implemented.