Current Projects
Predicting Potential Drug-Drug Interactions Using Big Data and Machine Learning (2019 ~ )
This project aims to develop a system for predicting potential drug-drug interactions (DDIs). DDIs occur when medications are co-administered. They can cause unexpected pharmacological effects, such as adverse drug events (ADEs). DDI prediction can therefore help prevent such outcomes, and it can also help optimize treatments in clinical trials and drug design.
- Each drug can be characterized by various features such as targets, enzymes, and side effects. Using these features, we train a machine learning model to predict drug-drug interactions.
- We aim not only to predict various kinds of drug-drug interactions with the trained multi-class classification model, but also to provide diverse analyses that help users understand the predicted results in depth.
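The feature-based, multi-class idea described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual model: the drug names, feature values, interaction labels, and the nearest-neighbour classifier are all assumptions made for the example. Each drug pair is turned into a vector of Jaccard similarities (one per feature type), and an unseen pair receives the label of its closest training pair.

```python
# Hypothetical toy data: every drug is described by sets of targets,
# enzymes, and side effects (all identifiers are made up).
DRUGS = {
    "D1": {"targets": {"T1", "T2"}, "enzymes": {"E1"}, "side_effects": {"S1", "S2"}},
    "D2": {"targets": {"T1"}, "enzymes": {"E1"}, "side_effects": {"S1"}},
    "D3": {"targets": {"T3"}, "enzymes": {"E1"}, "side_effects": {"S3"}},
    "D4": {"targets": {"T4"}, "enzymes": {"E2"}, "side_effects": {"S4"}},
    "D5": {"targets": {"T1", "T2"}, "enzymes": {"E1"}, "side_effects": {"S2"}},
    "D6": {"targets": {"T5"}, "enzymes": {"E1"}, "side_effects": {"S5"}},
}
FEATURE_TYPES = ("targets", "enzymes", "side_effects")

# Hypothetical labelled pairs; the labels stand in for interaction types.
TRAIN = [
    (("D1", "D2"), "type_A"),
    (("D1", "D3"), "type_B"),
    (("D1", "D4"), "none"),
    (("D2", "D4"), "none"),
]


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_vector(d1, d2):
    """Represent a drug pair by one similarity score per feature type."""
    return [jaccard(DRUGS[d1][f], DRUGS[d2][f]) for f in FEATURE_TYPES]


def predict(train, query):
    """Return the label of the training pair closest to the query pair."""
    qv = pair_vector(*query)
    scored = sorted(
        (sum((x - y) ** 2 for x, y in zip(qv, pair_vector(*pair))), label)
        for pair, label in train
    )
    return scored[0][1]
```

Because the pair vector keeps one score per feature type, it also gives a simple handle for explaining a prediction, for example by reporting which feature type contributed most to the match.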
Development of an Algorithm System for Drug Repositioning Using Drug Adverse Event Big Data
This project aims to develop a system for drug repositioning using big data on drug adverse events.
- Big Data: Utilize diverse matching algorithms and machine learning algorithms to extract drug-side effect (SE) pairs from structured data (databases) and unstructured data (text, table files).
- Similarity Analysis: Apply drug-SE similarity analysis to find potential candidate drugs for repositioning.
- Continuous Discovery: Discover new candidate drugs as more unstructured data are added.
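The similarity-analysis and continuous-discovery steps above can be illustrated with a small sketch. It is only an assumption about how such an analysis might look: the drug names, side-effect identifiers, the Jaccard measure, and the threshold are hypothetical and not taken from the project. Drugs whose side-effect profiles overlap strongly with a reference drug are ranked as candidates, and the ranking is recomputed when newly extracted drug-SE pairs arrive.

```python
# Hypothetical drug -> side-effect sets, as produced by an extraction step.
DRUG_SE = {
    "reference_drug": {"S1", "S2", "S3"},
    "cand_1": {"S1", "S2", "S3", "S4"},
    "cand_2": {"S2", "S5"},
    "cand_3": {"S6"},
}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(reference, drug_se, threshold=0.2):
    """Rank other drugs by side-effect similarity to the reference drug."""
    ref = drug_se[reference]
    scored = [
        (name, jaccard(ref, effects))
        for name, effects in drug_se.items()
        if name != reference
    ]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: -item[1])


def add_pairs(drug_se, pairs):
    """Merge newly extracted (drug, side effect) pairs into the profiles."""
    for drug, effect in pairs:
        drug_se.setdefault(drug, set()).add(effect)
```

Calling `rank_candidates` again after `add_pairs` shows how a drug that was previously below the threshold can become a candidate once more data are available.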
Big Data Big Computing Engine for High Performance Computer (2016 ~ )
The main purpose of this project is to develop a Big Data Big Computing (BDBC) engine for massive application programs based on high-performance computing.
It aims to overcome the limits imposed by the long running times and large data sizes that accompany large-scale computation.
- Implement a snapshot-based recovery system to recover from master failure on Apache Spark.
- Study an array processing system on top of Apache Spark to support matrix/tensor computation and represent multi-dimensional arrays.
- Optimize the performance for data processing.
- Accelerate the computation using GPUs.
- Develop an environment in which Apache Spark and NVIDIA GPUs can be used together.
- Dynamically generate CUDA code for a given Spark operator.
- Develop GPU-accelerated versions of GraphX algorithms.
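The array-processing item in the list above can be pictured with a small block-matrix sketch. It is plain Python rather than Spark code, and it is not the project's engine: the block layout and all function names are hypothetical. A matrix is stored as key-value pairs of the form (block row, block column) -> block, which is the kind of representation a key-value engine can partition, and multiplication is expressed as a join on the inner block index followed by a per-key sum.

```python
from collections import defaultdict


def to_blocks(matrix, bs):
    """Split a dense list-of-lists matrix into {(block_row, block_col): block}."""
    blocks = {}
    for i in range(0, len(matrix), bs):
        for j in range(0, len(matrix[0]), bs):
            blocks[(i // bs, j // bs)] = [row[j:j + bs] for row in matrix[i:i + bs]]
    return blocks


def matmul(a, b):
    """Dense product of two small blocks."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two blocks of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_multiply(a_blocks, b_blocks):
    """Multiply two blocked matrices: join on inner index k, reduce by (i, j)."""
    by_k = defaultdict(list)
    for (k, j), blk in b_blocks.items():
        by_k[k].append((j, blk))
    out = {}
    for (i, k), a_blk in a_blocks.items():
        for j, b_blk in by_k[k]:
            partial = matmul(a_blk, b_blk)
            out[(i, j)] = add(out[(i, j)], partial) if (i, j) in out else partial
    return out


def from_blocks(blocks):
    """Reassemble a blocked matrix into a dense list of lists."""
    n_block_rows = max(i for i, _ in blocks) + 1
    n_block_cols = max(j for _, j in blocks) + 1
    rows = []
    for i in range(n_block_rows):
        for r in range(len(blocks[(i, 0)])):
            row = []
            for j in range(n_block_cols):
                row.extend(blocks[(i, j)][r])
            rows.append(row)
    return rows
```

In a distributed setting the join and the per-key sum would run across workers; here both happen in a single process so that the data layout is easy to inspect.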
Past Projects
Study of Software Platform for Data Storing/Management in Smart Campus (2015 ~ 2017)
This project aims to design a software platform and data-based technologies for a smart campus.
We analyze the requirements in terms of diversity, scalability, functionality, and availability, and develop a data warehouse that meets them.
- Considering both relational DBMSs and NoSQL stores (key-value, column-family, document, graph) for a comprehensive design.
- Building an integrated warehouse for formal / informal / multimedia data.
- Developing an optimal warehouse for campus life-logging data generated by various smart devices and sensors.
Performance Study of Multidimensional Array DBMS (SciDB) (2014 ~ 2016)
SciDB is a revived effort to build a database system that supports an array data model.
- A SciDB database is a collection of multidimensional arrays.
- Each array cell contains a tuple of attribute values, whose types can be numerical or fixed-length strings.
- It will support an extensible type system, including user-defined types (e.g., probability distribution functions).
- The data model is nested: an array cell can itself contain another array.
- Like RasDaMan, SciDB supports array definition and array manipulation languages.
- Open Source: www.scidb.org
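The array data model described in the list above can be pictured with a short sketch. This is a toy Python model written for illustration only; it does not use SciDB's query languages or API, and the class name, method names, and sample values are hypothetical. It shows named dimensions with bounds, cells that hold one value per attribute, a box selection as an example of array manipulation, and a cell that contains another array.

```python
class ArraySketch:
    """Toy model of a multidimensional array; not SciDB's actual API."""

    def __init__(self, dims, attrs):
        self.dims = dims      # {dimension name: (low, high)}, bounds inclusive
        self.attrs = attrs    # attribute names, one value per name in each cell
        self.cells = {}       # coordinate tuple -> tuple of attribute values

    def _check(self, coords):
        if len(coords) != len(self.dims):
            raise IndexError("wrong number of coordinates")
        for c, (low, high) in zip(coords, self.dims.values()):
            if not low <= c <= high:
                raise IndexError(f"coordinate {c} outside [{low}, {high}]")

    def set(self, coords, values):
        self._check(coords)
        if len(values) != len(self.attrs):
            raise ValueError("one value per attribute is required")
        self.cells[tuple(coords)] = tuple(values)

    def get(self, coords):
        return self.cells.get(tuple(coords))  # None for an empty cell

    def select_box(self, low, high):
        """Return a new array holding only the cells inside the box [low, high]."""
        out = ArraySketch(dict(zip(self.dims, zip(low, high))), self.attrs)
        for coords, values in self.cells.items():
            if all(l <= c <= h for c, l, h in zip(coords, low, high)):
                out.cells[coords] = values
        return out


# A 2-D array whose cells hold a (radiance, quality) tuple.
image = ArraySketch({"x": (0, 3), "y": (0, 3)}, ("radiance", "quality"))
image.set((0, 0), (0.82, "ok"))
image.set((2, 3), (0.40, "cloud"))

# Nesting: a cell of one array holds another array as its attribute value.
spectrum = ArraySketch({"band": (0, 2)}, ("value",))
spectrum.set((1,), (0.7,))
nested = ArraySketch({"x": (0, 1)}, ("spectrum",))
nested.set((0,), (spectrum,))
```

Cells are stored sparsely in a dictionary, so empty cells cost nothing; a real array DBMS would instead split the array into chunks for storage and parallel processing.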
The long-term objective of the project is to boost research capability in scientific database management technology. The short-term goal is to address the challenges encountered in managing and analyzing remote-sensing data from satellites, and to develop a multidimensional array DBMS optimized for these tasks, based on the open-source SciDB.
In-memory DBMS Flash Acceleration (2014 ~ 2016)
An in-memory DBMS stores data mainly in memory rather than on disk. It therefore performs well under a high volume of real-time transactions, but it needs additional features to guarantee durability.
- Optimizing an in-memory database based on NVMe/PCIe
- Anti-caching, snapshots, and command logging are the major issues.
- Anti-caching: migrate some data from memory to disk when the total data size exceeds the memory limit.
- Snapshot: take an image of the in-memory data at a certain point in time and store it on disk for recovery.
- Command logging: store every transaction command on disk so that it can be replayed (REDO) during recovery.
- Target IMDB: H-Store (open source)
Figure: H-Store architecture
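The three mechanisms listed above (anti-caching, snapshot, and command logging) can be sketched together in a toy key-value store. This is an illustration under simplifying assumptions and not H-Store's implementation: the class, file names, and the LRU eviction policy are hypothetical, values are assumed to be JSON-serializable, and keys are assumed to be safe to use as file names.

```python
import json
import os


class TinyStore:
    """Toy in-memory key-value store; all names here are hypothetical."""

    def __init__(self, directory, memory_limit=2):
        self.memory_limit = memory_limit  # max number of keys kept in memory
        self.hot = {}                     # in-memory data; dict order doubles as LRU order
        self.cold_dir = os.path.join(directory, "cold")
        self.log_path = os.path.join(directory, "commands.log")
        self.snap_path = os.path.join(directory, "snapshot.json")
        os.makedirs(self.cold_dir, exist_ok=True)

    def _cold_file(self, key):
        return os.path.join(self.cold_dir, key + ".json")

    def _apply(self, key, value):
        # Drop any stale evicted copy, then store the value as most recently used.
        if os.path.exists(self._cold_file(key)):
            os.remove(self._cold_file(key))
        self.hot.pop(key, None)
        self.hot[key] = value
        # Anti-caching: move the least recently used keys from memory to disk.
        while len(self.hot) > self.memory_limit:
            victim = next(iter(self.hot))
            with open(self._cold_file(victim), "w") as f:
                json.dump(self.hot.pop(victim), f)

    def put(self, key, value):
        # Command logging: persist the command before applying it.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(["put", key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(key, value)

    def get(self, key):
        if key in self.hot:
            self.hot[key] = self.hot.pop(key)  # refresh LRU position
            return self.hot[key]
        if os.path.exists(self._cold_file(key)):
            with open(self._cold_file(key)) as f:
                value = json.load(f)
            self._apply(key, value)            # bring the evicted value back
            return value
        return None

    def snapshot(self):
        # Snapshot: write the full logical state (memory plus evicted data).
        state = dict(self.hot)
        for name in os.listdir(self.cold_dir):
            with open(os.path.join(self.cold_dir, name)) as f:
                state[name[:-len(".json")]] = json.load(f)
        tmp = self.snap_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self.snap_path)    # atomic swap of the snapshot file
        open(self.log_path, "w").close()   # logged commands are now in the snapshot

    def recover(self):
        # Recovery: load the last snapshot, then REDO the logged commands.
        self.hot.clear()
        for name in os.listdir(self.cold_dir):
            os.remove(os.path.join(self.cold_dir, name))
        if os.path.exists(self.snap_path):
            with open(self.snap_path) as f:
                for key, value in json.load(f).items():
                    self._apply(key, value)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    _, key, value = json.loads(line)
                    self._apply(key, value)
```

Logging the command rather than the changed data keeps each log record small; the trade-off is that recovery has to re-execute every command recorded since the last snapshot.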
Optimizing DBMS on SSD (2014 ~ 2015)
Replacing HDDs with SSDs in DBMS servers is a common performance upgrade. However, DBMSs cannot exploit the SSD's full potential, because the current storage interface exposes the SSD as a black box. The main goal of this project is vertical optimization between the DBMS and the SSD.
- Optimizing DBMS (e.g. PostgreSQL) to be aware of the underlying flash storage device.
- Optimizing SSD to be aware of the host.
Big Data-based Silver Robot (2014 ~ 2015)
- Through the development of a big data-based robot for real-world environments, we provide a customized welfare system for the silver generation. Using leading ICT-based technology, we also create a silver robot service market.
- Big data collection, management, and distributed processing technology
- Multi-attribute big data indexing and modeling technology
- Big data fusion recognition analysis and processing technology
- NoSQL based hybrid data handling technology with MongoDB