Personal Profile

I am a mathematics undergraduate specializing in AI for Science and deep learning research. I have led and contributed to several projects at the intersection of deep learning and bioinformatics, as well as projects on multimodal large models. My academic and research work centers on applying artificial intelligence methods to scientific problems. My English resume is available for download.

Educational Background

Nanjing Normal University High School

2019 - 2022

School of Mathematics and Statistics, Lanzhou University

2022 - Present
Bachelor of Science (Mathematics Foundation Theory Program)

Publications

Research group: ai4sci_bioinfo team led by Tianchi Lu, PhD candidate at City University of Hong Kong

  • A General Language Model for Peptide Identification
  • Jixiu Zhai, Tianchi Lu, Haitian Zhong, Ziyang Xu, Yuhuan Liu, Shengrui Xu, Jingwan Wang, Dan Huang

    Briefings in Bioinformatics, In submission

  • PhosF3C: A Feature Fusion Architecture with Fine-Tuned Protein Language Model and Conformer for Prediction of General Phosphorylation Sites
  • Yuhuan Liu, Haitian Zhong, Jixiu Zhai, Xueying Wang, Tianchi Lu

    Briefings in Bioinformatics, Accepted

  • SCMPPI: Supervised Contrastive Multimodal Framework for Predicting Protein-Protein Interactions
  • Shengrui Xu, Tianchi Lu, Zikun Wang, Jixiu Zhai, Jingwan Wang

    NeurIPS, In submission

  • PHbinder and PSGM: A Cascaded Framework for Epitope Prediction and HLA-I Allele Identification
  • Zikun Wang, Xueying Wang, Jixiu Zhai, Shengrui Xu, Tianchi Lu

    Advanced Science, In submission

Research Experience

    Deep Learning-Based Prediction of Protein Sequence Biological Characteristics

    Principal Investigator
    National Innovation and Entrepreneurship Training Program
    2024 - 2025
    • Advisor: Professor Wang Yejuan
    • Project Content: PDeepPP is a general deep learning framework for peptide identification. It combines pretrained protein language models (such as ESM-2) with a parallel Transformer-CNN architecture, so that peptide function prediction and post-translational modification (PTM) site identification are handled by one model. The architecture captures both local structural and global sequence features, and the TIM loss function improves recognition accuracy and generalization on imbalanced datasets. Across 37 biological recognition tasks, PDeepPP improves average AUC by 4.2% on 25 tasks, with accuracy reaching up to 0.97 in some cases, outperforming existing mainstream methods. PDeepPP also supports rapid analysis of ultra-large-scale proteomic data (a 218-fold speed increase) and reduces dependence on feature engineering and manual annotation. Intended applications include large-scale protein function annotation and drug target screening.
    • Main Responsibilities: Modeling, data collection, experiment design, visualization, paper writing, and defense; implemented all of the above in Python and LaTeX.

    Exploring Protein-Organic Molecule Affinity using SMILES-based Pretrained Deep Learning Frameworks

    Principal Investigator
    National Innovation and Entrepreneurship Training Program
    2025 - 2026
    • Advisor: Professor Wang Yejuan
    • Project Content: This project develops a deep learning framework based on SMILES pretraining to improve the prediction accuracy of protein-organic molecule binding affinity. It introduces the concept of a principal value, which integrates multi-dimensional features of molecules and proteins into a single decision metric to simplify feature fusion and reduce computational cost. Gaussian noise modeling is used to simulate uncertainty in the molecule-protein binding process, improving the model’s adaptability to real biological environments. Interaction features of molecules and proteins are transformed into two-dimensional matrices for CNN input, so that the model can extract complex interaction information while remaining interpretable. PubChem dataset processing and preliminary model development are complete, supported by an interdisciplinary team and high-performance computing resources. The results are expected to be applicable to drug development, target prediction, and related fields.
    • Main Responsibilities: Guiding team members on data alignment and experiment design, and helping to write and revise papers.
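The matrix-style CNN input and Gaussian noise described in this project can be illustrated with a minimal sketch. This is not the project's actual code. It assumes that the two-dimensional interaction matrix is the outer product of a molecule embedding and a protein embedding, and that uncertainty is modeled as additive zero-mean Gaussian noise. The function name `interaction_matrix` and its parameters are hypothetical.

```python
import random


def interaction_matrix(mol_vec, prot_vec, noise_std=0.0, seed=None):
    """Build a 2D interaction matrix from two feature vectors.

    Assumption: entry (i, j) is mol_vec[i] * prot_vec[j] (an outer product),
    optionally perturbed by zero-mean Gaussian noise with standard
    deviation noise_std. The project's real construction may differ.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    matrix = []
    for m in mol_vec:
        row = []
        for p in prot_vec:
            noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
            row.append(m * p + noise)
        matrix.append(row)
    return matrix


# Toy embeddings (made-up numbers, for illustration only).
clean = interaction_matrix([1.0, 2.0], [3.0, 4.0, 5.0])
noisy = interaction_matrix([1.0, 2.0], [3.0, 4.0, 5.0], noise_std=0.1, seed=0)
```

The resulting matrix has one row per molecule feature and one column per protein feature, so it can be passed to a 2D convolution as a single-channel image.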

    Predicting Protein-Protein Interactions Using a Supervised Contrastive Learning Multimodal Framework

    Core Member
    Lanzhou University Cuiying Fund
    2025 - 2026
    • Advisor: Professor Zhao Xuejing
    • Project Content: This project addresses the challenges of accuracy and generalization in protein-protein interaction (PPI) prediction by proposing SCMPPI, a supervised contrastive multimodal deep learning framework. The project designed a dynamically weighted contrastive loss function and a multimodal feature projection head to integrate three heterogeneous protein features: sequence, structure, and network. Protein network features are extracted with the Node2vec algorithm and combined with an ESMC-based sequence encoder to obtain multi-scale, multimodal embedding vectors, improving the model’s discriminative ability. The contrastive learning module uses the TM-score as a sample similarity metric to dynamically adjust the weights of negative samples, reducing the false negative rate and improving model performance and interpretability.
    • Main Responsibilities: Proposed integrating CKSAAP into ESMC pre-training, which gave more effective feature extraction than the standard ESMC approach.
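For readers unfamiliar with CKSAAP (composition of k-spaced amino acid pairs), the descriptor itself can be sketched as follows. This shows only the standard feature computation. How it is combined with ESMC pre-training in the project is not described here, so the sketch makes no claim about that step. The function name `cksaap` and the default `k_max=2` are illustrative choices.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(sequence, k_max=2):
    """Return the CKSAAP feature vector of a protein sequence.

    For each gap k in 0..k_max, count residue pairs separated by exactly
    k positions and normalize by the number of counted pairs. The output
    has 400 * (k_max + 1) entries. Pairs containing non-standard residues
    are skipped.
    """
    features = []
    for gap in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(sequence) - gap - 1):
            pair = sequence[i] + sequence[i + gap + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


vec = cksaap("ACAC", k_max=1)  # toy sequence, for illustration only
```

In the toy call, the gap-0 block counts the adjacent pairs AC, CA, AC, and the gap-1 block counts AA and CC, each normalized within its own block.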

    Visual Large Language Model Knowledge Editing

    Member
    Lanzhou University Cuiying Fund
    2024 - 2025
    • Advisor: Professor Huang Yumei
    • Project Content: This project addresses overfitting in knowledge editing for large language models (LLMs). With traditional methods, modifying a specific piece of knowledge often affects the model’s understanding of related content, which reduces generalization ability. To address this, we propose the REACT framework, which uses a two-stage process of “representation extraction” and “selective perturbation” to locate and selectively adjust the model’s internal knowledge representations. This approach enables efficient updating or correction of knowledge within the model while avoiding the side effects and knowledge confusion caused by editing. The method has a clear structure and strong controllability, which also makes the key techniques and practical significance of knowledge editing in large models easier for beginners to understand.
    • Main Responsibilities: Gained a preliminary understanding of visual large language models and reproduced large-model knowledge editing methods using EasyEdit.

    Deep Learning-Based Protein-RNA Binding Affinity Prediction

    Core Member
    Lanzhou University Cuiying Fund
    2025 - 2026
    • Advisor: Professor Zhao Xuejing
    • Project Content: This project focuses on accurate prediction of protein-RNA binding affinity, aiming to support the elucidation of gene regulatory mechanisms and the development of RNA-targeted drugs. It proposes a multimodal deep learning framework that integrates sequence, structural, and dynamic features. The framework combines a protein language model (ESM-2), an RNA language model (RNA-BERT), and dynamic k-mer feature extraction with molecular dynamics simulation and graph neural networks, giving a feature representation that spans static to dynamic and sequence to spatial dimensions. The model uses a weighted Focal Loss and a contrastive loss to handle both hard-to-classify samples and the global feature distribution, improving prediction accuracy and generalization ability. The project has integrated multiple high-quality datasets and is expected to produce publications and open-source tools.
    • Main Responsibilities: Guiding team members in data collection, data processing, and experiment design, and assisting with paper writing and revisions.
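The weighted Focal Loss mentioned in this project can be illustrated with a minimal sketch of the common alpha-balanced binary form, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). Whether the project uses exactly this form is an assumption, as are the function name `weighted_focal_loss` and the default values of `alpha` and `gamma`.

```python
import math


def weighted_focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-12):
    """Mean alpha-balanced binary focal loss.

    probs  : predicted probabilities of the positive class
    labels : ground-truth labels (0 or 1)
    alpha  : weight of the positive class (negatives get 1 - alpha)
    gamma  : focusing parameter; larger values down-weight easy samples
    """
    losses = []
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p          # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        p_t = min(max(p_t, eps), 1.0)           # avoid log(0)
        losses.append(-alpha_t * (1.0 - p_t) ** gamma * math.log(p_t))
    return sum(losses) / len(losses)


# Toy predictions (made-up numbers, for illustration only).
easy = weighted_focal_loss([0.9], [1])
hard = weighted_focal_loss([0.6], [1])
```

With `gamma=0` the loss reduces to class-weighted cross-entropy. Increasing `gamma` shrinks the contribution of well-classified samples, which is why the hard example above receives a larger loss than the easy one.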

Academic Awards

    • National Scholarship (2023 - 2024)

      Rank in major: 3/50

    • Lanzhou University Learning Model (2023 - 2024)

    • National Undergraduate Mathematics Contest (2023)

      Gansu Province Second Prize

    • National Undergraduate Mathematical Modeling Contest (2024)

      Gansu Province First Prize

    • Challenge Cup Startup Plan Competition (2024)

      Jilin Region Silver Award

Skills & Proficiency

    Python

    C++

    PyTorch

    Linux

    LaTeX