Trisha Shetty (Editor)

Gensim

Original author(s): Radim Řehůřek
Development status: active
Initial release: 2009
Developer(s): RaRe Technologies, various
Stable release: 0.13.4 / 25 December 2016
Repository: github.com/RaRe-Technologies/gensim

Gensim is a mature open-source vector space modeling and topic modeling toolkit implemented in Python. It uses NumPy, SciPy and, optionally, Cython for performance. It is specifically designed to handle large text collections through data streaming and efficient incremental algorithms, which differentiates it from most other scientific software packages, which target only batch and in-memory processing.
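The streaming idea can be illustrated without Gensim itself. The following is a minimal pure-Python sketch, not Gensim's API: the names `StreamedCorpus` and `update_document_frequencies` are hypothetical, and whitespace tokenization is an assumption made for brevity. It shows documents being consumed one at a time and statistics being updated incrementally as new batches arrive, so the full collection never has to sit in memory.

```python
from collections import Counter
from typing import Iterable, Iterator, List


class StreamedCorpus:
    """Yield one tokenized document at a time instead of loading them all.

    Hypothetical helper for illustration; not part of Gensim.
    """

    def __init__(self, lines: Iterable[str]):
        # `lines` can be a list, an open file handle, or any other
        # source that yields one document per line.
        self.lines = lines

    def __iter__(self) -> Iterator[List[str]]:
        for line in self.lines:
            # Assumption: lowercase + whitespace split is enough here.
            yield line.lower().split()


def update_document_frequencies(df: Counter, docs: Iterable[List[str]]) -> int:
    """Fold a batch of documents into running document-frequency counts.

    Returns the number of documents consumed from `docs`.
    """
    seen = 0
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
        seen += 1
    return seen


# Two batches arrive at different times; the counts are updated in place.
df = Counter()
n_docs = 0
first_batch = StreamedCorpus(["human machine interface", "graph of trees"])
second_batch = StreamedCorpus(["graph minors survey", "human graph interface"])
n_docs += update_document_frequencies(df, first_batch)
n_docs += update_document_frequencies(df, second_batch)
```

Only the running counts and the current document are held in memory at any point, which is the property that separates a streamed, incremental design from a batch one.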

Main features

Gensim includes implementations of tf-idf, random projections, word2vec and doc2vec algorithms, hierarchical Dirichlet processes (HDP), latent semantic analysis (LSA) and latent Dirichlet allocation (LDA), including distributed parallel versions.
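As an illustration of the simplest item in that list, here is a minimal pure-Python sketch of tf-idf weighting. It is one common variant (raw term frequency times a base-2 log of inverse document frequency, followed by L2 normalization) and is not presented as Gensim's implementation; the function name `tfidf_vectors` and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one L2-normalized tf-idf vector (term -> weight) per document."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        # weight = term frequency * log2(N / document frequency)
        weights = {t: c * math.log2(n_docs / df[t]) for t, c in tf.items()}
        # Scale to unit length so documents of different sizes are comparable.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors


# Toy pre-tokenized documents (illustrative data only).
docs = [
    ["human", "machine", "interface"],
    ["graph", "trees", "graph"],
    ["human", "graph", "survey"],
]
vectors = tfidf_vectors(docs)
```

In the first document, "machine" and "interface" each occur in only one document and so receive a higher weight than "human", which occurs in two.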

Some of the online algorithms in Gensim were also published in the 2011 PhD dissertation Scalability of Semantic Analysis in Natural Language Processing by Radim Řehůřek, the creator of Gensim.

Uses of Gensim

Gensim has been used and cited in over 400 commercial and academic applications. The software has been covered in several news articles, podcasts and interviews since its initial release in 2009.

Free and commercial support

The code is developed and hosted on GitHub, and public support forums are maintained on Google Groups and Gitter.

Gensim is commercially supported by RaRe Technologies (rare-technologies.com), which also provides student mentorship and open-source thesis projects for Gensim through its free Incubator programme.
