Youngmok Jung | Google Scholar | LinkedIn | CV

Youngmok Jung

I am an AI research scientist at Genome Insight, where I collaborate with geneticists to develop secondary and tertiary analysis solutions for genomics data, aiming to uncover novel biological insights to enhance human health.
I received my PhD in Electrical Engineering from KAIST, where I was advised by Dongsu Han and Young Seok Ju. My dissertation introduced new methods for improving genome analysis with AI and ML. Before that, I received my BS in Electrical Engineering from KAIST, South Korea.

My interests are in crafting high-performance systems that incorporate AI/ML approaches. I concentrate on exploiting the underlying assumptions of real-world systems to better integrate AI/ML techniques. My work aims at 1) designing efficient systems built on AI/ML approaches and 2) improving AI/ML approaches by exploiting system-specific assumptions. All my work involves several months of implementation followed by thorough testing on real-world data.

Recently, I have been developing secondary and tertiary analysis solutions for genomics data using AI/ML approaches. Previously, I worked on various topics in high-performance networked systems, including AI-enhanced video delivery and data-center networking.

Keywords: Deep learning and machine learning, genomics software, high-performance networked systems

Awards and Honors

Feb 2023 Samsung Electronics 29th Humantech Paper Award (Silver Prize, Communication & Networks)
Feb 2022 Samsung Electronics PhD Scholarship
Spring 2021 KAIST Breakthrough of the Year (LiveNAS, NEMO)

Selected Publications

  1. Generalizing deep variant callers via domain adaptation and semi-supervised learning
     Youngmok Jung, Jinwoo Park, Hwijoon Lim, Jeong Seok Lee, Young Seok Ju, and Dongsu Han
     Preprint, 2023
  2. BWA-MEME: BWA-MEM emulated with a machine learning approach
     Youngmok Jung and Dongsu Han
     Oxford Bioinformatics, 2022
  3. Neural-Enhanced Live Streaming: Improving Live Video Ingest via Online Learning (LiveNAS)
     Jaehong Kim, Youngmok Jung (co-first author), Hyunho Yeo, Juncheol Ye, and Dongsu Han
     In Proceedings of the ACM SIGCOMM Conference, 2020
  4. Co-optimizing for Flow Completion Time in Radio Access Network
     Jaehong Kim, Yunheon Lee, Hwijoon Lim, Youngmok Jung, Song Min Kim, and Dongsu Han
     In Proceedings of the 18th International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2022
  5. NeuroScaler: Neural Video Enhancement at Scale
     Hyunho Yeo, Hwijoon Lim, Jaehong Kim, Youngmok Jung, Juncheol Ye, and Dongsu Han
     In Proceedings of the ACM SIGCOMM Conference, 2022
  6. Towards Timeout-less Transport in Commodity Datacenter Networks
     Hwijoon Lim, Wei Bai, Yibo Zhu, Youngmok Jung, and Dongsu Han
     In Proceedings of the 16th European Conference on Computer Systems (EuroSys), 2021
  7. NEMO: Enabling Neural-enhanced Video Streaming on Commodity Mobile Devices
     Hyunho Yeo, Chan Ju Chong, Youngmok Jung, Juncheol Ye, and Dongsu Han
     In Proceedings of the 26th Annual International Conference on Mobile Computing and Networking (MobiCom), 2020
  8. Neural Adaptive Content-aware Internet Video Delivery
     Hyunho Yeo, Youngmok Jung, Jaehong Kim, Jinwoo Shin, and Dongsu Han
     In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2018


Enhancing genome analysis pipelines with AI and ML techniques

BWA-MEME: Machine-learning Enhanced Read Alignment Software
• BWA-MEM is an industry-standard alignment software for next-generation sequencing data, developed by the Broad Institute of MIT and Harvard.
• We developed BWA-MEME, which achieves up to a 3.45x speedup in seeding throughput over its predecessor, BWA-MEM2 from Intel and the Broad Institute, while ensuring identical output.
• BWA-MEME is now operational in the production environments of several institutions and is projected to lower alignment costs by 35%. This efficiency translates into millions of dollars in cost reductions for projects at the million-genome scale.
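The seeding speedup comes from predicting where a query lands in a sorted index instead of binary-searching for it. A minimal sketch of that learned-index idea, using a toy sorted key array and a least-squares linear model (all names, the data, and the single linear model are illustrative, not BWA-MEME's actual partitioned index):

```python
import bisect

def build_linear_index(sorted_keys):
    """Fit a least-squares line mapping key value -> array position."""
    n = len(sorted_keys)
    xs, ys = sorted_keys, list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var = sum((x - mean_x) ** 2 for x in xs) or 1.0
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var
    return slope, mean_y - slope * mean_x

def lookup(sorted_keys, model, query, window=8):
    """Predict a position, then search a small window around it."""
    slope, intercept = model
    guess = int(slope * query + intercept)
    lo = max(0, guess - window)
    hi = min(len(sorted_keys), guess + window + 1)
    i = bisect.bisect_left(sorted_keys, query, lo, hi)
    if i < len(sorted_keys) and sorted_keys[i] == query:
        return i
    # Fall back to a full binary search if the window missed.
    return bisect.bisect_left(sorted_keys, query)

keys = [2 * i for i in range(1000)]   # stand-in for encoded suffixes
model = build_linear_index(keys)
pos = lookup(keys, model, 500)        # exact index of key 500
```

When the model predicts well, each lookup touches only a handful of entries instead of the ~log n probes of binary search, which is the intuition behind the seeding speedup.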
Generalizing Deep Variant Callers via Domain Adaptation and Semi-supervised Learning (RUN-DVC)
• Deploying deep learning-based variant callers (DVCs) on a sequencing method with a different error profile requires generalization, which is challenging due to their reliance on extensive labeled data.
• We developed a generalization framework that enables deep learning-based variant callers (e.g., Google DeepVariant, Clair3) to accommodate diverse sequencing methods by leveraging semi-supervised learning and domain adaptation techniques.
• RUN-DVC improves SNP and INDEL F1 scores by up to 6.40 %p and 9.36 %p over the supervised training approach using only unlabeled data, or achieves the same variant calling accuracy using merely half of the labeled data.
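The semi-supervised ingredient can be illustrated with the classic pseudo-labeling recipe: train on labeled data, keep only confident predictions on unlabeled data, and retrain on the union. The toy 1-D threshold "model", data, and confidence margin below are invented for illustration and are not RUN-DVC's actual method:

```python
def fit_threshold(xs, ys):
    """Pick the split point that best separates class 0 from class 1."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(xs):
        acc = sum(int(x > t) == y for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def pseudo_label(threshold, xs, margin=1.0):
    """Keep only unlabeled points far from the decision boundary."""
    return [(x, int(x > threshold)) for x in xs if abs(x - threshold) >= margin]

labeled_x = [0.0, 1.0, 4.0, 5.0]
labeled_y = [0, 0, 1, 1]
unlabeled = [-0.3, 1.5, 4.4, 5.1]   # 1.5 sits near the boundary

t0 = fit_threshold(labeled_x, labeled_y)   # boundary from labels only
extra = pseudo_label(t0, unlabeled)        # confident pseudo-labels
xs2 = labeled_x + [x for x, _ in extra]
ys2 = labeled_y + [y for _, y in extra]
t1 = fit_threshold(xs2, ys2)               # retrain on the union
```

The ambiguous point is dropped rather than mislabeled, which is the essential property that lets unlabeled data help rather than hurt.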

      Designing AI-enhanced high-performance networked systems

LiveNAS: Streaming high-quality live video even when the network is congested
• Developed a neural-enhanced live video streaming system (LiveNAS) that incorporates 1) an online training and inference system for a super-resolution DNN during live streaming and 2) a bandwidth allocation algorithm that maximizes user Quality of Experience (QoE).
• LiveNAS delivers live video at the same quality as Google WebRTC using only 45.9% of the bandwidth on average. The system is built on top of Google's WebRTC C++ codebase.
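The bandwidth allocation step, splitting the uplink between the video stream and training traffic according to which currently yields more quality, can be sketched as a greedy marginal-utility split. The gain curves, step size, and numbers here are made up for illustration; the real controller is driven by measured quality gains:

```python
import math

def allocate(total_kbps, video_gain, train_gain, delta=10):
    """Greedily shift bandwidth from video to training traffic while the
    marginal quality gain of training exceeds the marginal loss of video."""
    video, train = total_kbps, 0
    while train + delta <= total_kbps:
        gain = train_gain(train + delta) - train_gain(train)
        loss = video_gain(video) - video_gain(video - delta)
        if gain <= loss:
            break
        video, train = video - delta, train + delta
    return video, train

# Illustrative diminishing-returns quality curves (invented).
video_q = lambda kbps: math.log1p(kbps)
train_q = lambda kbps: 0.5 * math.log1p(kbps)

video, train = allocate(1000, video_q, train_q)
```

Because both curves have diminishing returns, the split settles where the two marginal gains meet, instead of starving either stream.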
NAS: Neural Adaptive Streaming
• Developed an adaptive bitrate streaming algorithm based on reinforcement learning (A3C) for NAS, a video-on-demand system akin to YouTube or Netflix.
• NAS enhances average QoE by 43.08% under the same bandwidth budget, or saves 17.13% of bandwidth while providing the same user QoE.
TLT: Timeout-less Transport in Commodity Datacenter Networks
• Implemented datacenter network protocols (TLT, PFC) in the NS-3 network simulator; TLT was also implemented in the Linux kernel and tested on a real-world testbed. TLT augments diverse datacenter transports, from widely used ones (TCP, DCTCP, DCQCN) to state-of-the-art ones (IRN and HPCC), achieving up to 81% lower tail latency.


Main Contributor
RUN-DVC
- Generalizing deep learning-based variant callers via domain adaptation and semi-supervised learning.

BWA-MEME
- Short-read alignment software, released for the benefit of the research community.

        Collaborative Projects
        NAS: Neural Adaptive Content-aware Internet Video Delivery

        NEMO: Enabling Neural-enhanced Video Streaming on Commodity Mobile Devices

        TLT: Towards Timeout-less Transport in Commodity Datacenter Networks

        NeuroScaler: Neural Video Enhancement at Scale


Skills

Programming Languages: Python, C++, Rust, SQL, shell scripting
Frameworks: TensorFlow, PyTorch, Spark, Hadoop, Django, Node.js, CloudStack