Buckeye Corpus

Updated on Apr 25, 2026

The Buckeye Corpus of conversational speech is a speech corpus created by a team of linguists and psychologists at Ohio State University led by Prof. Mark Pitt. It contains high-quality recordings from 40 speakers in Columbus, Ohio, conversing freely with an interviewer. The interviewer's voice is heard only faintly in the background of these recordings. The sessions were conducted as sociolinguistic interviews and are essentially monologues. The speech has been orthographically transcribed and phonetically labeled. The audio and text files, together with time-aligned phonetic labels, are stored in a format for use with speech analysis software (Xwaves and Wavesurfer). Software for searching the transcription files is also available at the project web site. The corpus is available to researchers in academia and industry.

The project was funded by the National Institute on Deafness and Other Communication Disorders and the Office of Research at Ohio State University.

References

Buckeye Corpus Wikipedia

(Text) CC BY-SA