Data Management Guidance, Tools & Resources

Last revised: 12/30/2020

The National Science Foundation (NSF) requires that all grant proposals be accompanied by a data management plan, and beginning in 2023 the National Institutes of Health (NIH) will as well. This page is a starting point to guide researchers through the data life cycle and to highlight available tools for data organization and planning.

Data Collection Tools

Research Electronic Data Capture (REDCap)

Web-based, HIPAA-compliant, secure electronic data capture and storage for research studies.

  • Develop data entry forms and surveys
  • Data validation
  • Database reports
  • Free for UCSF Community
  • Consultation available

Data Access

UCSF/ZSFG Clinical Data for Research

Request electronic health record data from APeX and data from ZSFG and other DPH facilities. Begin the process with a CTSI Data Management and Extraction Consultation Request.

Population Health and Health Services Research Datasets

A searchable database with more than 100 dataset resources for population health, health services research, and health equity research. Consultation available.

Data Analysis Platforms

RAE

Formerly known as MyResearch, RAE offers secure hosting for sensitive data with web-based management and collaboration tools, including support for study-specific large datasets and AWS resource options.

  • View, manipulate, and save data entirely in a protected environment without storing files on personal computers
  • Free access to research software applications, such as SAS, Stata, SPSS, R, TreeAge, Atlas.ti, MS Office, and MATLAB (see full list)
  • Collaboration tools, such as SharePoint, facilitate the conduct of multi-site research studies
  • Free up to 10GB/month for UCSF PIs

Data Management Resources

  • CTSI Consultation Services provides expert advice in a wide range of subject areas, including data management issues such as designing databases and workflows to support studies, obtaining electronic health record extracts from the APeX system, querying existing databases, Comparative Effectiveness Large Data Set Analysis, and other data management needs (the initial hour of consultation is free of charge).
  • Library Data Science Initiative provides free training and consultations to help you make a data management plan, publish data to meet journal requirements, or get started organizing and documenting your research data.
  • Data Systems Services in the Department of Epidemiology and Biostatistics provides data collection, cleaning, and storage services to research investigators, including:
    • Cloud computing and server/desktop virtualization, hosted within the UCSF network and compliant with NIST-mandated security protocols
    • Customized programming and data services
    • Customized databases for outcome ascertainment studies
  • Dryad is a data repository service for the University of California that accepts a wide variety of data formats and meets funder and publisher requirements for data sharing. The Library's Data Science Initiative provides support.
  • Data Management for Clinical Research is a free online course offered by Vanderbilt University via Coursera. You do not receive a grade for the class, but if you participate and successfully complete all assignments and tests, you receive a certificate.
  • San Francisco Coordinating Center combines scientific expertise with broad experience in managing multi-center studies, and offers access to a network of high-quality, experienced clinical centers. The following services are offered:
    • Study design, coordination and implementation
    • Measurement selection
    • Protocol development
    • Database design
    • Research study data collection via fax
    • Data quality control
  • UCSF's NLP community curates knowledge as participants experiment with, learn, and implement NLP tools in clinical and biomedical research projects. Join the Slack channel and regular meetups, and explore recommended tools for textual analysis of clinical notes.
  • Information Commons is a high-performance compute cluster on AWS for working with clinical data at scale, and an environment suited to pattern recognition and machine learning. Info Commons offers:
    • Access to de-identified structured EHR data; additional data sets coming soon, including de-identified clinical notes and images
    • Spark analytics engine, which enables fast data queries via Spark SQL, machine learning via Spark MLlib, and R via SparkR
    • Query data using PatientExploreR
    • Free for UCSF Community
  • DMPTool is an online application that helps researchers create data management plans
    • Meets funder requirements
    • Quick-start guide
    • General data management guidance
    • Free for UCSF Community