SkELL

Developer(s): Lexical Computing Ltd.
Written in: jQuery, JavaScript
Development status: Active
Available in: English
Original author(s): Vít Baisa, Vít Suchomel
Initial release: November 2014

SkELL (Sketch Engine for Language Learning) is a web interface for learning English. Its main purpose is to help students and teachers of English. SkELL has its own corpus, compiled to contain texts covering everyday, standard, formal, and professional English. The corpus contains more than 60 million sentences and more than one billion words.

The SkELL interface offers three functions. The first is a simple search that shows words in context; it displays at most 40 concordance lines. The frequency of the searched query is shown below the search box as the number of hits per million words. The second function, word sketch, shows the collocates of a given word or phrase. The third, similar words, displays words similar to the searched word in a word cloud.
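The two statistics behind the search and word sketch views can be illustrated with a short Python sketch. Normalizing a raw hit count to a per-million figure is standard practice. For collocates, the sketch uses the logDice association score, which Sketch Engine uses to rank collocates in its word sketches; it is an assumption here that SkELL's simplified word sketch ranks them the same way. The function names (`per_million`, `log_dice`, `rank_collocates`) and all counts are hypothetical.

```python
import math


def per_million(hits: int, corpus_size: int) -> float:
    """Normalize a raw hit count to occurrences per million tokens."""
    # Multiply before dividing so exact integer inputs give exact results.
    return hits * 1_000_000 / corpus_size


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy is the co-occurrence count, f_x and f_y the individual word
    frequencies. The theoretical maximum is 14.
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def rank_collocates(
    node_freq: int, candidates: dict[str, tuple[int, int]]
) -> list[tuple[str, float]]:
    """Rank candidate collocates of a node word by logDice, best first.

    `candidates` maps each collocate to a pair
    (co-occurrence count with the node word, collocate frequency).
    """
    scored = [
        (word, log_dice(f_xy, node_freq, f_y))
        for word, (f_xy, f_y) in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored
```

With a hypothetical corpus of exactly one billion tokens (the source says only "more than one billion words"), a query with 2,500 hits would be reported as 2.5 per million. Because logDice depends only on the three counts and not on corpus size, its values remain comparable across corpora of different sizes.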

Since 2015, the tool has also been available for Russian.

Features

SkELL offers three types of searches.

  • Examples – searching for words and phrases, including all their derived forms
  • Word sketch – a simplified version of the original word sketch page
  • Similar words – based on the distributional thesaurus in Sketch Engine; the results are not necessarily synonyms

Data

The corpus consists of English Wikipedia (130,000 specially selected articles), the English collection of Project Gutenberg, a subset of the web corpus enTenTen14, the whole British National Corpus, and free news sources.

Processing the data

After gathering and pre-cleaning (all structures except sentences were removed), the data was run through a processing pipeline: normalization, tokenization, tagging with TreeTagger for English, and deduplication. The corpus was then compiled using the Manatee indexing library. Finally, all sentences were scored with the GDEX tool.
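The order of the processing steps can be sketched in Python. This is a hypothetical, simplified illustration, not the actual SkELL pipeline: normalization is assumed to mean Unicode NFKC plus whitespace collapsing, the regular-expression tokenizer stands in for a real one, TreeTagger tagging and Manatee indexing are omitted because they are external tools, and `toy_score` is an invented heuristic that only loosely imitates the idea behind GDEX (rating how suitable a sentence is as an example). It does not reproduce GDEX's actual rules.

```python
import re
import unicodedata

# Hypothetical tokenizer: a word (optionally with an apostrophe part such as
# "Don't") or a single punctuation mark. Real pipelines use a dedicated tool.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def normalize(sentence: str) -> str:
    """Apply NFKC Unicode normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", sentence).split())


def tokenize(sentence: str) -> list[str]:
    """Split a normalized sentence into word and punctuation tokens."""
    return TOKEN_RE.findall(sentence)


def deduplicate(token_lists: list[list[str]]) -> list[list[str]]:
    """Keep the first occurrence of each sentence, compared case-insensitively."""
    seen: set[tuple[str, ...]] = set()
    unique = []
    for tokens in token_lists:
        key = tuple(t.lower() for t in tokens)
        if key not in seen:
            seen.add(key)
            unique.append(tokens)
    return unique


def toy_score(tokens: list[str], lo: int = 10, hi: int = 25) -> float:
    """Hypothetical GDEX-like heuristic; NOT the actual GDEX scoring."""
    score = 1.0
    # Penalize sentences that are too short or too long to serve as examples.
    if len(tokens) < lo:
        score -= 0.05 * (lo - len(tokens))
    elif len(tokens) > hi:
        score -= 0.05 * (len(tokens) - hi)
    # Penalize fragments: no initial capital or no final punctuation.
    if not tokens or not tokens[0][:1].isupper():
        score -= 0.2
    if not tokens or tokens[-1] not in {".", "!", "?"}:
        score -= 0.2
    return max(score, 0.0)


def pipeline(raw: list[str]) -> list[tuple[float, list[str]]]:
    """normalize -> tokenize -> (tagging omitted) -> deduplicate -> score."""
    tokenized = [tokenize(normalize(s)) for s in raw]
    # Part-of-speech tagging with TreeTagger would happen here; it is an
    # external program and is left out of this sketch.
    unique = deduplicate(tokenized)
    scored = [(toy_score(tokens), tokens) for tokens in unique]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Deduplicating after normalization and tokenization means that sentences differing only in spacing or letter case collapse into a single entry, so "The  cat sat on the mat." and "the cat sat on the mat." count once.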
