A Unified Standard Format for Proteomics Mass Spectrometry Data
5 Jun 2006
The Human Proteome Organisation’s Proteomics Standards Initiative (HUPO-PSI) announced a roadmap for creating a unified data interchange format for proteomics mass spectrometry at the Conference of the American Society for Mass Spectrometry. The new format will combine the current HUPO-PSI format (mzData) with the mzXML format.
The new format will include features from both formats:
- An interchange schema which has split data vectors compatible with other analytical interchange formats
- Support for both random access indexes and digital signatures via a wrapper schema
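The idea of "split data vectors" can be illustrated with a short Python sketch: m/z values and intensities are kept as two separate arrays rather than as interleaved pairs. The announcement does not specify an encoding, so the byte order, the 64-bit precision, the base64 step, and the names `encode_vector`, `decode_vector`, and `spectrum` are all assumptions made for illustration only.

```python
import base64
import struct


def encode_vector(values):
    """Pack floats as little-endian 64-bit doubles, then base64-encode them.

    The byte order and precision are assumptions, not part of the announcement.
    """
    raw = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_vector(text):
    """Reverse encode_vector: base64-decode, then unpack the doubles."""
    raw = base64.b64decode(text)
    count = len(raw) // 8  # 8 bytes per 64-bit double
    return list(struct.unpack(f"<{count}d", raw))


# Hypothetical spectrum: two parallel vectors instead of (m/z, intensity) pairs.
mz = [100.5, 200.25, 300.125]
intensity = [10.0, 250.0, 42.5]

spectrum = {
    "mz": encode_vector(mz),
    "intensity": encode_vector(intensity),
}
```

Keeping the vectors separate means a reader that only needs one of them, for example the m/z axis, can decode it without touching the other.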
The project will also provide tools to support developers and users of the format:
- A program to normalize XML files for random access and digital signatures
- A validation program to ensure that the use of controlled vocabulary terms matches minimum reporting (“MIAPE”) requirements
- An ‘Application Programming Interface’ (API) including language bindings for popular programming languages
- Abstract data models and other documentation to assist software developers who wish to implement systems based on the interchange format
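To see why files would need to be normalized before indexing and signing, consider the minimal Python sketch below. It builds a byte-offset index over a tiny XML document and computes a SHA-256 checksum of it. The element name `spectrum`, the `id` attribute, the functions `build_index`, `checksum`, and `read_element`, and the sample document are all hypothetical; the announcement does not describe the actual wrapper schema. A plain hash also only stands in for a real digital signature here.

```python
import hashlib
import re


def build_index(document: bytes, tag: bytes = b"spectrum"):
    """Map each element's id attribute to the byte offset of its opening tag."""
    pattern = re.compile(rb"<" + re.escape(tag) + rb'\s+id="([^"]+)"')
    return {m.group(1).decode("ascii"): m.start() for m in pattern.finditer(document)}


def checksum(document: bytes) -> str:
    """Hex SHA-256 digest of the exact bytes; a stand-in for a signature."""
    return hashlib.sha256(document).hexdigest()


def read_element(document: bytes, offset: int, tag: bytes = b"spectrum") -> bytes:
    """Jump straight to an offset and return the element that starts there."""
    end_tag = b"</" + tag + b">"
    end = document.index(end_tag, offset) + len(end_tag)
    return document[offset:end]


# Hypothetical document, not a real mzData or mzXML file.
doc = (
    b"<run>"
    b'<spectrum id="1"><peaks>AAAA</peaks></spectrum>'
    b'<spectrum id="2"><peaks>BBBB</peaks></spectrum>'
    b"</run>"
)

index = build_index(doc)
digest = checksum(doc)
second = read_element(doc, index["2"])
```

Both the offsets and the digest depend on the exact bytes of the file, so any change in whitespace or line endings invalidates them. That dependency is the reason a normalization step is useful before an index or signature is written.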
In addition to the interchange format and software to help read and validate documents, the project will develop reference implementations of data converters that produce the new format from the output of as many mass spectrometry instruments as possible. These converters will be developed as open source software projects with the assistance of mass spectrometry instrument vendors and the community of software developers working in mass spectrometry informatics.
The timeline for the project calls for most of the deliverables to be completed by the end of 2006:
August:
- Data model (UML)
- Ontology models
September:
- Documentation
- Draft specification of schema
- Language bindings (Parsing API)
December:
- Binary indexing & signatures programs
- Validation program
- Reference implementations of converters
This is a major undertaking for the proteomics informatics community and represents widespread agreement on the need to improve data interchange. It would not be possible without the support of the leadership of HUPO, and specifically the HUPO Publications Committee, which helped organize the joint meeting between the informatics community and the journal editors and publishers at the HUPO-PSI Spring meeting held in San Francisco in April 2006.