Development of an Algorithm System for Drug Repositioning Using Drug Adverse Event Big Data
This project aims to develop a system for drug repositioning using big data on drug adverse events.
- Big Data: Utilize diverse matching algorithms and machine learning algorithms to extract drug-side effect (drug-SE) pairs from structured data (databases) and unstructured data (text, table files).
- Similarity Analysis: Apply drug-SE similarity analysis to find potential candidate drugs for repositioning.
- Continuous Discovery: Keep discovering candidate drugs as more unstructured data is added.
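The similarity analysis step can be illustrated with a minimal sketch. It assumes Jaccard similarity over each drug's set of side effects, which is only one possible measure; the project description does not say which measure is used. The function names and the example pairs are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_similar_drugs(pairs):
    """Group (drug, side_effect) pairs by drug and rank drug pairs by similarity."""
    profiles = {}
    for drug, side_effect in pairs:
        profiles.setdefault(drug, set()).add(side_effect)
    scored = [
        (jaccard(profiles[d1], profiles[d2]), d1, d2)
        for d1, d2 in combinations(sorted(profiles), 2)
    ]
    # Highest similarity first; ties are broken by drug name for determinism.
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


# Made-up drug-SE pairs, for illustration only (assumption, not project data).
example_pairs = [
    ("drug_a", "nausea"), ("drug_a", "headache"), ("drug_a", "dizziness"),
    ("drug_b", "nausea"), ("drug_b", "headache"),
    ("drug_c", "rash"),
]
ranking = rank_similar_drugs(example_pairs)
```

Drug pairs at the top of the ranking share the most side effects and would be the first candidates to examine. As new drug-SE pairs are extracted, the ranking can simply be recomputed.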
Big Data Big Computing Engine for High Performance Computer (2016 ~ )
The main purpose of this project is to develop a Big Data Big Computing (BDBC) engine for massive application programs based on high-performance computing.
It aims to overcome the long running times and data-size limits that accompany large-scale computation.
- Implement a snapshot-based recovery system to recover from master failure on Apache Spark.
- Study an array processing system on top of Apache Spark to support matrix/tensor computation and represent multi-dimensional arrays.
- Optimize the performance for data processing.
- Accelerate the computation using GPU.
- Develop an environment where you can use Apache Spark and NVIDIA GPUs together.
- Dynamically generate CUDA code for a given Spark operator.
- Develop GPU-accelerated versions of GraphX algorithms.
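One common way to represent a multi-dimensional array on a key-value engine such as Spark is to split it into fixed-size chunks keyed by chunk coordinates. The sketch below shows that idea for a 2-D array in plain Python, without Spark. It is an assumption about how such a layout could look, not the project's actual design, and the names `to_chunks` and `get_cell` are hypothetical.

```python
def to_chunks(array_2d, chunk_rows, chunk_cols):
    """Split a 2-D list into a dict mapping (chunk_row, chunk_col) -> sub-block."""
    n_rows = len(array_2d)
    n_cols = len(array_2d[0]) if n_rows else 0
    chunks = {}
    for r0 in range(0, n_rows, chunk_rows):
        for c0 in range(0, n_cols, chunk_cols):
            block = [row[c0:c0 + chunk_cols] for row in array_2d[r0:r0 + chunk_rows]]
            chunks[(r0 // chunk_rows, c0 // chunk_cols)] = block
    return chunks


def get_cell(chunks, chunk_rows, chunk_cols, r, c):
    """Look up cell (r, c) by locating its chunk, then its offset inside the chunk."""
    block = chunks[(r // chunk_rows, c // chunk_cols)]
    return block[r % chunk_rows][c % chunk_cols]


# A 4 x 6 example matrix split into 2 x 3 chunks.
matrix = [[r * 6 + c for c in range(6)] for r in range(4)]
chunks = to_chunks(matrix, 2, 3)
```

Each (key, block) entry could become one record in a distributed collection, so that block-wise matrix or tensor operations run in parallel over the chunks.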
Study of Software Platform for Data Storing/Management in Smart Campus (2015 ~ 2017)
This project aims to design a software platform and data-based technologies for a smart campus.
We analyze the requirements for diversity, scalability, functionality, and availability, and develop a data warehouse that meets them.
- Considering both relational DBMS and NoSQL (key-value, column-family, document, graph) for a comprehensive design.
- Building an integrated warehouse for formal, informal, and multimedia data.
- Developing an optimized warehouse for campus life-logging data generated by various smart devices and sensors.
Performance Study of Multidimensional Array DBMS (SciDB) (2014 ~ 2016)
SciDB is a revived effort to build a database system that supports an array data model.
- A SciDB database is a collection of multidimensional arrays.
- Each array cell contains a tuple of attribute values, whose types can be numerical or fixed-length strings.
- SciDB will support an extensible type system with user-defined types (e.g., probability distribution functions).
- The data model is nested: an array cell can itself contain another array.
- Like RasDaMan, SciDB supports array definition and array manipulation languages.
- Open Source: www.scidb.org
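The array data model described in the list above can be pictured with a small Python sketch. This is only an illustration of the concepts (named dimensions, a tuple of attribute values per cell, nesting); it does not use SciDB's own definition or manipulation languages, and the classes `ArraySchema` and `MultiDimArray` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ArraySchema:
    name: str
    dimensions: tuple  # dimension names, e.g. ("x", "y")
    attributes: tuple  # attribute names, e.g. ("temp", "detail")


@dataclass
class MultiDimArray:
    schema: ArraySchema
    cells: dict = field(default_factory=dict)  # coordinates -> attribute tuple

    def set_cell(self, coords, values):
        """Store one tuple of attribute values at the given coordinates."""
        if len(coords) != len(self.schema.dimensions):
            raise ValueError("wrong number of coordinates")
        if len(values) != len(self.schema.attributes):
            raise ValueError("wrong number of attribute values")
        self.cells[tuple(coords)] = tuple(values)

    def get_attribute(self, coords, attribute):
        """Return a single attribute value of the cell at the given coordinates."""
        index = self.schema.attributes.index(attribute)
        return self.cells[tuple(coords)][index]


# A nested example: the "detail" attribute of the outer cell is itself an array.
inner = MultiDimArray(ArraySchema("inner", ("i",), ("v",)))
inner.set_cell((0,), (1.5,))
outer = MultiDimArray(ArraySchema("outer", ("x", "y"), ("temp", "detail")))
outer.set_cell((2, 3), (21.0, inner))
```

Only non-empty cells are stored in the dictionary, which keeps the sketch short; a real array DBMS uses a far more elaborate physical layout.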
The long-term objective of the project is to boost research capability in scientific database management technology. The short-term goal is to address the challenges encountered in managing and analyzing remote-sensing data from satellites, and to develop a multidimensional array DBMS optimized for these tasks, based on the open-source SciDB.
In-memory DBMS Flash Acceleration (2014 ~ 2016)
An in-memory DBMS stores data mainly in memory rather than on disk. It therefore performs well under heavy real-time transaction loads, but it needs additional features for durability.
- Optimizing an in-memory database for NVMe/PCIe-based storage
- Anti-caching, snapshots, and command logging are the major issues.
- Anti-caching: migrate some data from memory to disk when the total data size exceeds the memory limit.
- Snapshot: take an image of the in-memory data at a certain time and store it on disk for recovery.
- Command logging: store every transaction command on disk so it can be replayed (REDO) during recovery.
- Target IMDB: H-Store (open source)
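How a snapshot and a command log work together for recovery can be sketched with a toy key-value store. This is a simplified illustration and not H-Store's implementation: the class `TinyStore` is hypothetical, in-memory structures stand in for the on-disk snapshot and log, and anti-caching is not shown.

```python
import copy


class TinyStore:
    """Toy in-memory key-value store with a snapshot and a command log."""

    def __init__(self):
        self.data = {}          # live in-memory data
        self.snapshot = {}      # stands in for the on-disk snapshot image
        self.command_log = []   # stands in for the on-disk command log

    def put(self, key, value):
        self.command_log.append(("put", key, value))  # log before applying
        self.data[key] = value

    def delete(self, key):
        self.command_log.append(("delete", key, None))
        self.data.pop(key, None)

    def take_snapshot(self):
        """Save an image of the data; earlier commands are no longer needed."""
        self.snapshot = copy.deepcopy(self.data)
        self.command_log.clear()

    def recover(self):
        """Rebuild memory from the snapshot, then redo the logged commands."""
        self.data = copy.deepcopy(self.snapshot)
        for op, key, value in self.command_log:
            if op == "put":
                self.data[key] = value
            else:
                self.data.pop(key, None)


store = TinyStore()
store.put("a", 1)
store.take_snapshot()
store.put("b", 2)
store.delete("a")
store.data = {}   # simulate losing memory contents in a crash
store.recover()
```

Taking snapshots more often shortens the log that must be replayed, at the cost of more write traffic to storage.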
Optimizing DBMS on SSD (2014 ~ 2015)
Replacing HDDs with SSDs in DBMS servers is a common performance upgrade. However, DBMSs cannot exploit the SSD's full potential because of the current storage interface (i.e., the SSD acts as a black box). The main goal of this project is vertical optimization between the DBMS and the SSD.
- Optimizing the DBMS (e.g., PostgreSQL) to be aware of the underlying flash storage device.
- Optimizing SSD to be aware of the host.
Big Data-based Silver Robot (2014 ~ 2015)
By developing big-data-based robots for real-world environments, we provide a customized welfare system for the silver generation. Using leading ICT-based technology, we also aim to create a silver robot service market.
- Big data collection, management, and distributed processing technology
- Multi-attribute big data indexing and modeling technology
- Big data fusion recognition analysis and processing technology
- NoSQL-based hybrid data handling technology with MongoDB