A central problem in genomic medicine is understanding the effect of individual DNA variants. One approach is to measure the function of these variants in the lab, but measuring variants individually is both time- and resource-intensive. Multiplexed Assays of Variant Effect (MAVEs) are a family of experimental techniques that allow researchers to measure all possible variants in a disease-relevant gene at once. This field has expanded dramatically in recent years and is poised to become an integral part of clinical variant interpretation. Because these data are usable by a wide community of scientists and clinicians, the results must be auditable, widely available, and reproducible.
We developed MaveDB, which has been adopted by the field as the database of record for MAVEs. It currently contains over 250 individual datasets, many shared by study authors pre-publication, and over three million individual variant effect measurements. We have curated the majority of published studies for which data were made available, and have received additional direct submissions from research groups around the world. As part of this effort, we have created software tools and APIs that simplify data submission and enable other researchers to download and work with MAVE data. MaveDB is already becoming a hub for related tools developed by others, and we continue to promote the growth of a software ecosystem to plan, execute, analyze, and disseminate the results of these impactful experiments.
Here we describe our ongoing efforts to improve the sharing of data and annotations for whole MAVE datasets and individual variants, while maintaining the critical metadata, contextual information, and provenance tracking that are required for use within variant curation and genomic research workflows.