Neha Patil (Editor)

Software Heritage

Headquarters: Inria
Affiliations: Inria
Founded: June 30, 2016
Location: Rocquencourt, France
Website: softwareheritage.org
Staff: 4

Formation: June 30, 2016
Scientific Advisors: Gérard Berry, Jean-François Abramatic, Serge Abiteboul
Founders: Roberto Di Cosmo, Stefano Zacchiroli


Software Heritage is an initiative whose goal is to collect, preserve, and share software source code, both freely licensed and not, in a universal software archive.

History

Although the initiative started in 2015, it had already been worked on as a research project for the two years before that. Software Heritage began public operations on June 30, 2016. It was formed under the auspices of the French Institute for Research in Computer Science and Automation (Inria), which hosts the initiative on its servers. Inria is providing a budget of €500,000 over three years for the project.

Software Heritage was founded by computer scientists Roberto Di Cosmo and Stefano Zacchiroli. As of July 2016, its repository held over 20 million software projects, comprising more than 2.7 billion unique source files.

Additional sponsors of the Software Heritage initiative include Microsoft and Data Archiving and Networked Services (DANS), an institute of the Royal Netherlands Academy of Arts and Sciences and the Netherlands Organisation for Scientific Research. Creative Commons, the Free Software Foundation, GitHub, Jason Scott, the Linux Foundation, and Microsoft, among others, have endorsed the project.

Overview

Software Heritage's goal is to preserve free/open source software (FOSS) in its original source code form. The initiative collects, preserves, and shares software across the cultural heritage, industry, education, science, and research communities, out of concern that the technical and scientific knowledge embodied in software will be lost without preservation. The project came about because software code is seen as even more vulnerable to corruption and obsolescence than typical archival holdings such as books, video, and film.

The interface is built using open source code. Its initial focus is search, with end users looking up content by SHA-1 hash. The initiative is open to scientific researchers, with the idea that it would serve as a Library of Alexandria for software. Software Heritage is also intended as an infrastructure resource, so that developers can build applications on top of the archive. Another goal is to get guidance from researchers on which features would be valuable for structuring output and curating the collection.

Other grass-roots initiatives exist, such as archivist Jason Scott's Textfiles.com project, the Code Archive (which attempts to archive GitHub), and the Internet Archive's Wayback Machine. Software Heritage gathers freely licensed software from sources that include GitHub, the Debian package archive, and the GNU Project FTP archive, as well as from Gitorious and Google Code, hosting services that no longer exist.

The archive is structured so that knowledge can be preserved and digital information remains continuously accessible, and so that it can serve as a building block for thematic portals and collections of software. It can also help industry build better software in cases where the original software has been lost, which happens often. Software Heritage aims to ensure long-term preservation of software and to make software provenance more traceable, integrated, and reusable. That includes the ability to determine licensing (which is not always stated) and use constraints, track security vulnerabilities, and discover prior code assets.
