Current Projects
Multi-Model Multi-Storage Engine Database Management System (2020 ~)
In this project, we develop a new multi-model DBMS that can efficiently handle complex problems involving various data models and queries.
- Integrated storage engine layer: Various open-source storage engines are combined to support six data models.
- Inter-Model Query and Inter-Model Foreign Key: These new concepts are proposed to manage and analyze data across different models effectively.
- Smart Distribution: Exploits the characteristics of the data to store it in the most efficient form.
Past Projects
Construction and Demonstration of Fine Particle Monitoring System using Scanning LIDAR (2019 ~ 2020)
This project primarily aims to develop a visualization and monitoring system with efficient storage for multi-dimensional and spatial data on fine particles. Through visualization of the concentration of fine particles at the district level, we plan to implement a specialized information system for protective services concerning fine particles.
- We have built a monitoring and visualization framework for fine particle concentration. Data is collected by a 3D scanning LiDAR installed in Siheung Smart City.
- The scope of this project includes the development of real-time analysis processes that transform LiDAR signals into fine particle concentration levels for real-time detection of hot spots.
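As a rough illustration of the hot-spot detection step, the sketch below flags grid cells whose concentration exceeds a threshold. It assumes the LiDAR signals have already been converted to a 2D concentration grid; the function name `find_hot_spots`, the toy values, and the threshold are all made up for illustration and are not taken from the project.

```python
def find_hot_spots(grid, threshold):
    """Return (row, col, value) for every cell whose concentration exceeds threshold."""
    return [
        (r, c, value)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value > threshold
    ]


# Toy concentration grid (made-up values, one number per spatial cell).
toy_grid = [
    [12.0, 18.5, 22.0],
    [30.0, 81.0, 40.5],
    [15.0, 76.5, 20.0],
]

# The threshold is an arbitrary illustrative value, not a regulatory limit.
hot = find_hot_spots(toy_grid, threshold=75.0)
for row, col, value in hot:
    print(f"hot spot at cell ({row}, {col}): {value}")
```

A real-time system would apply the same check to each new scan as it arrives rather than to a static grid.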
System Overview
Predicting Potential Drug-Drug Interactions via Big Data Analysis and Machine Learning (2019)
This project aims to develop a system for predicting potential drug-drug interactions (DDI). Drug-drug interactions can occur during the co-administration of medications and cause unexpected pharmacological effects, such as adverse drug events (ADEs). Therefore, drug-drug interaction prediction is critical to prevent unexpected results. Moreover, it can provide insights to optimize drug treatments in clinical trials and drug design.
- Each drug is characterized by various features such as targets, enzymes, and side effects. With drug representations, a machine learning model is trained to predict drug-drug interactions.
- Along with predictions on the types of drug-drug interactions with a multi-classification model, we aim to provide diverse analyses to assist users in understanding the result.
Big Data Big Computing Engine for High-Performance Computer (2016 ~ 2021)
The primary purpose of this project is to develop a Big Data Big Computing (BDBC) engine for massive application programs based on high-performance computing. It aims to overcome the limitations of large-scale computations, such as long running times and massive memory consumption.
- We study an array processing system on top of Apache Spark that supports matrix/tensor computation and represents multi-dimensional arrays.
- Optimize the performance for data processing.
- Accelerate the computation using GPU.
- The project also establishes an integrated environment of Apache Spark and NVIDIA GPUs.
- Dynamic generation of CUDA code for a given Spark operator.
- GPU-accelerated version of the GraphX algorithm.
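The idea of generating CUDA code for an operator can be sketched with a simple source template. This is only an illustration under our own assumptions, not the project's code generator: the template, the function `generate_map_kernel`, and the example expression are hypothetical, and compiling the generated source (for example with a runtime compiler such as NVRTC) and wiring it into Spark are out of scope here.

```python
from string import Template

# CUDA source template for an element-wise (map-style) operator.
# ${expr} is a C expression over the input element `x`.
KERNEL_TEMPLATE = Template("""\
extern "C" __global__
void ${name}(const ${ctype}* in, ${ctype}* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        ${ctype} x = in[i];
        out[i] = ${expr};
    }
}
""")


def generate_map_kernel(name, expr, ctype="float"):
    """Return CUDA source text for a kernel that applies `expr` to each element."""
    if not name.isidentifier():
        raise ValueError(f"invalid kernel name: {name!r}")
    return KERNEL_TEMPLATE.substitute(name=name, ctype=ctype, expr=expr)


# Hypothetical operator: multiply each element by 2 and add 1.
source = generate_map_kernel("scale_and_shift", "2.0f * x + 1.0f")
print(source)
```

Generating the source as text keeps the sketch independent of any GPU toolkit; only the compilation step would need CUDA to be installed.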
Development Plan for Utilization of Artificial Intelligence to Construct an Innovative Biological License Application Approval System (2019)
The traditional biological license approval and review process requires massive human labor to identify core qualification components in applications. The goal of this project is to investigate the possible utilization of artificial intelligence technologies in the approval and review system to facilitate the process.
- To identify applicable subcomponents, the current biopharmaceuticals approval and review process is investigated.
- Current state-of-the-art artificial intelligence technologies (Natural Language Processing, Entity and Relation Extraction, Chatbot, Data Management, and more) are studied as a part of a long-term roadmap proposal.
Development of an Algorithm System for Drug Repositioning Using Drug Adverse Event Bigdata (2018 ~ 2019)
This project aims to develop a system for drug repositioning using drug adverse events big data. Three major aspects of the project involve Big Data, Similarity Analysis, and Continuous Discovery.
- Diverse matching and machine learning algorithms are utilized to extract drug-side effect pairs from structured data (databases) and unstructured data (text and table files from published literature).
- Similarity analysis of drug pairs based on side effects is applied to find potential repositioning candidates.
- The project aims to build a pipeline to continue the candidate discovery process as more data are supplied.
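The side-effect-based similarity analysis described above can be illustrated with a minimal sketch. The source does not state which similarity measure the project uses; Jaccard similarity over side-effect sets is assumed here as one common choice, and the drug names, side effects, and threshold are made-up toy values.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_drug_pairs(side_effects, threshold=0.5):
    """Return (drug1, drug2, score) for pairs scoring at least `threshold`, best first."""
    pairs = []
    for d1, d2 in combinations(sorted(side_effects), 2):
        score = jaccard(side_effects[d1], side_effects[d2])
        if score >= threshold:
            pairs.append((d1, d2, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy data: drug -> set of reported side effects (all values are invented).
toy = {
    "drug_a": {"nausea", "headache", "dizziness"},
    "drug_b": {"nausea", "headache", "rash"},
    "drug_c": {"insomnia"},
}

candidates = rank_drug_pairs(toy, threshold=0.4)
print(candidates)
```

Because the ranking is recomputed from the full drug-to-side-effect mapping, it can simply be rerun as new pairs are extracted, which matches the continuous discovery aspect in spirit.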
Study of Software Platform for Data Storing/Management in Smart Campus (2015 ~ 2017)
This project aims to design a software platform and data-driven technologies for the smart campus. The data warehouse is built to fulfill the demands of diversity, scalability, functionality, and availability.
- For comprehensive and efficient design, both relational DBMS and NoSQL (key-value, column family, document, graph) are considered.
- The integrated warehouse stores formal/informal/multimedia data.
- The optimal warehouse to save life-logging data generated from various smart devices and sensors on the campus is investigated and developed as a part of the project.
Performance Study of Multidimensional Array DBMS (SciDB) (2014 ~ 2016)
The long-term objective of the project is to boost the research capability of scientific database management technology. The short-term goals include addressing various challenges encountered in managing and analyzing remote-sensing data from satellites, and developing a multidimensional array model DBMS, based on the open-source SciDB, that is optimized for these tasks.
SciDB is a database system that supports an array data model. A SciDB database is a collection of multidimensional arrays and supports an extensible type system with user-defined types (e.g., probability distribution functions). Open Source: www.scidb.org
- The storage structure, query processing, and indexing of SciDB are analyzed in depth.
- The performance of queries on multidimensional data is measured with respect to selectivity, query shape, and data location, and an optimal chunking guideline is suggested.
- Aggregation queries on individual attributes are implemented with a hash tree structure, and array filtering is accelerated with B+-tree indexing.
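To make the interaction between query shape and chunking concrete, the sketch below counts how many cells must be read for a range query under different chunk shapes. It is a simplified model under our own assumptions, not SciDB's API: arrays are regularly chunked, a chunk is always read in full, array boundaries are ignored, and the helper names are hypothetical.

```python
from itertools import product
from math import prod


def chunk_of(cell, chunk_shape):
    """Map a cell coordinate to the coordinate of the chunk that contains it."""
    return tuple(c // s for c, s in zip(cell, chunk_shape))


def chunks_touched(lo, hi, chunk_shape):
    """List chunk coordinates overlapped by the range query [lo, hi] (inclusive corners)."""
    ranges = [range(l // s, h // s + 1) for l, h, s in zip(lo, hi, chunk_shape)]
    return list(product(*ranges))


def read_amplification(lo, hi, chunk_shape):
    """Cells fetched (whole chunks) divided by cells actually requested."""
    requested = prod(h - l + 1 for l, h in zip(lo, hi))
    fetched = len(chunks_touched(lo, hi, chunk_shape)) * prod(chunk_shape)
    return fetched / requested


# A long, thin query: 100 x 10 cells starting at the origin.
lo, hi = (0, 0), (99, 9)
for shape in [(100, 100), (100, 10), (10, 10)]:
    print(shape, len(chunks_touched(lo, hi, shape)), read_amplification(lo, hi, shape))
```

In this toy model, the square 100 x 100 chunk forces ten times more cells to be read than the query needs, while chunk shapes aligned with the query read no extra cells. The number of chunks touched differs as well, which is the kind of trade-off a chunking guideline has to weigh.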
In-memory DBMS Flash Acceleration (2014 ~ 2016)
Because an in-memory DBMS stores data mainly in memory rather than on disk, it performs better in environments with many real-time transactions. However, in-memory DBMSs still require improvement in durability. The project also intends to optimize an in-memory DBMS and design software modules based on NVMe/PCIe devices.
- The in-memory DBMS is optimized for NVMe/PCIe devices.
- The fundamentals of H-Store are studied to verify its features and performance. The following features are the primary optimization targets: anti-caching, snapshot, and command logging.
- Anti-caching refers to the migration of some data from memory to disk when the size of the total data grows bigger than the memory limit.
- The snapshot feature takes an image of the data in memory at a specific time and stores it on disk for recovery.
- Command logging stores every transaction command on disk for REDO.
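The three features above can be sketched with a toy key-value store. This is not H-Store code: the class `TinyStore` and its methods are hypothetical, "disk" is simulated with ordinary Python objects, eviction uses a simple least-recently-used order, and for simplicity the snapshot covers all data (hot and evicted) rather than only what is in memory.

```python
import copy
from collections import OrderedDict


class TinyStore:
    """Toy store illustrating anti-caching, snapshot, and command logging."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit  # max keys kept in memory (assumed >= 1)
        self.memory = OrderedDict()       # hot data, least recently used first
        self.disk_cold = {}               # anti-cache: tuples evicted from memory
        self.disk_snapshot = {}           # last snapshot image
        self.disk_log = []                # commands logged since the last snapshot

    def _evict_if_needed(self):
        # Anti-caching: move the coldest tuples to disk once memory is over the limit.
        while len(self.memory) > self.memory_limit:
            key, value = self.memory.popitem(last=False)
            self.disk_cold[key] = value

    def put(self, key, value, replay=False):
        if not replay:
            self.disk_log.append(("put", key, value))  # command logging
        self.disk_cold.pop(key, None)
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._evict_if_needed()

    def get(self, key):
        if key in self.disk_cold:  # fetch an evicted tuple back into memory
            self.memory[key] = self.disk_cold.pop(key)
            self._evict_if_needed()
        self.memory.move_to_end(key)  # raises KeyError if the key does not exist
        return self.memory[key]

    def snapshot(self):
        # Store an image of the current data and truncate the command log.
        self.disk_snapshot = copy.deepcopy({**self.disk_cold, **self.memory})
        self.disk_log.clear()

    def recover(self):
        # Simulate a crash: rebuild state from the snapshot, then REDO the log.
        self.memory.clear()
        self.disk_cold.clear()
        for key, value in copy.deepcopy(self.disk_snapshot).items():
            self.put(key, value, replay=True)
        for _op, key, value in list(self.disk_log):
            self.put(key, value, replay=True)


store = TinyStore(memory_limit=2)
store.put("a", 1)
store.put("b", 2)
store.snapshot()
store.put("c", 3)    # exceeds the limit, so "a" is evicted to the anti-cache
store.put("b", 20)   # logged after the snapshot
store.recover()      # snapshot restores a and b; the log replays c and the new b
```

After recovery, the store holds the post-snapshot update to "b" because the command log is replayed on top of the snapshot image.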
H-Store architecture
Optimizing DBMS on SSD (2014 ~ 2015)
Replacing HDDs with SSDs in DBMS servers is a common performance upgrade. However, DBMSs are not capable of utilizing SSD’s full potential due to its current interface (i.e., SSD acts as a black box). The main goal of this project is vertical optimization between DBMS and SSD.
- We study how to optimize a DBMS (e.g., PostgreSQL) to be aware of the underlying flash storage device.
- Conversely, we study how to optimize the SSD to be aware of the host through an open-channel interface.
Big Data-based Silver Robot (2014 ~ 2015)
By developing big data-based robots for real-world environments, the project aims to provide a customized welfare system to the silver generation. Using leading ICT-based technology, we aim to create a silver robot service market.
- The project incorporates big data collection, management, and distributed processing technology. To efficiently process and manage the collected data, NoSQL-based hybrid data handling technology with MongoDB is applied.
- Multi-attribute data indexing, modeling, and processing technologies are implemented.