Natural Language Processing (NLP) Core

The NLP Core provides NLP resources and infrastructure to facilitate text analytics at the University of Florida (UF) Clinical and Translational Science Institute (CTSI) and the OneFlorida+ Clinical Research Consortium.

  • Providing key NLP resources and infrastructure in close collaboration with the Biomedical Informatics (BMI) program, the UF Integrated Data Repository (IDR), OneFlorida, and the Cancer Informatics Shared Resource at the Cancer Center.
  • Supporting OneFlorida as one of the NLP-enabled Clinical Data Research Networks (CDRNs) within PCORnet.
  • Providing consultation on NLP solutions and tools for clinical research and applications.
  • Providing NLP capability as a service at UF CTSI to bridge the gap in using unstructured clinical text for research.
  • Developing NLP research and education programs to accelerate the UF Artificial Intelligence (AI) initiative.


  • Clinical language models. The NLP Core has developed GatorTron using >90 billion words of text, including >82 billion words of de-identified clinical text collected from over 126 departments, approximately 2 million patients, and 50 million encounters at UF Health. GatorTron is currently the largest language model in the clinical domain, achieving state-of-the-art performance for clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference, and medical question answering. Language models are the key technology for medical AI systems that use clinical narratives. One pretrained language model can be applied to many NLP tasks through fine-tuning, an approach known as transfer learning.
  • Social determinants of health. The NLP Core provides a service to extract individual-level social determinants of health (SDoH) from clinical narratives. For simplicity, the term SDoH is used here to cover both social (e.g., education) and behavioral (e.g., smoking) determinants of health. SDoH are increasingly recognized as important factors affecting a wide range of health, functional, and quality-of-life outcomes, as well as healthcare fairness and disparities.
  • Medical determinants of health. The NLP Core provides a service to extract medical determinants of health such as over-the-counter medications, family history, adverse drug events, and early symptoms of diseases.
  • NLP-powered computable phenotyping. Computable phenotyping is critical for accurately identifying a research-standard patient cohort for clinical studies. The NLP Core provides NLP-powered computable phenotyping solutions to identify disease phenotypes (e.g., Alzheimer’s disease) that depend heavily on unstructured information from clinical narratives.
  • NLP-powered disease screening. The NLP Core identifies unstructured data elements to improve disease screening. For example, quantitative smoking information (e.g., pack-years, packs per day, years of smoking) can be extracted to improve lung cancer screening.
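
To make the disease-screening example concrete, here is a minimal rule-based sketch of pulling quantitative smoking information out of a sentence. It is not the NLP Core's method, which builds on language models. The regular expressions, the function names `extract_smoking_info` and `estimate_pack_years`, and the sample sentence are all hypothetical and written only for illustration. The one fixed fact the sketch relies on is the standard definition of pack-years: packs per day multiplied by years of smoking.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for illustration only; real clinical notes
# are far more varied than these three expressions can cover.
_NUM = r"(\d+(?:\.\d+)?)"
_PATTERNS = {
    "pack_years": re.compile(_NUM + r"\s*-?\s*pack[\s-]*years?\b", re.I),
    "packs_per_day": re.compile(
        _NUM + r"\s*(?:packs?\s*(?:per|a|/)\s*day|ppd)\b", re.I
    ),
    "years_smoked": re.compile(r"\bfor\s+" + _NUM + r"\s*(?:years?|yrs?)\b", re.I),
}


def extract_smoking_info(text: str) -> Dict[str, Optional[float]]:
    """Return the first match for each smoking quantity, or None if absent."""
    result: Dict[str, Optional[float]] = {}
    for name, pattern in _PATTERNS.items():
        match = pattern.search(text)
        result[name] = float(match.group(1)) if match else None
    return result


def estimate_pack_years(info: Dict[str, Optional[float]]) -> Optional[float]:
    """Use the stated pack-years if present; otherwise derive it as
    packs per day multiplied by years smoked, when both are available."""
    if info["pack_years"] is not None:
        return info["pack_years"]
    if info["packs_per_day"] is not None and info["years_smoked"] is not None:
        return info["packs_per_day"] * info["years_smoked"]
    return None


note = "Patient smoked 1.5 packs per day for 20 years (30 pack-year history)."
info = extract_smoking_info(note)
pack_years = estimate_pack_years(info)
```

A rule-based approach like this is easy to inspect but brittle, since it misses negation ("denies smoking"), quit dates, and unusual phrasing. That brittleness is one reason model-based extraction is attractive for this task.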


Yonghui Wu, PhD, Core Director
Phone: (352) 294-8436