The BASE text files (transcribed and tagged) are available from this site. However, the video and audio recordings are part of the BASE Plus collection. For enquiries about access to the BASE Plus collection, please e-mail baseplus@warwick.ac.uk
Holdings were distributed across four broad disciplinary groups, each represented by 40 lectures and 10 seminars. These groups are:
Arts and Humanities transcripts
Life and Medical Sciences transcripts
Physical Sciences transcripts
Social Sciences transcripts
BASE recordings were transcribed and tagged using a system devised in accordance with the TEI Guidelines. The marked-up transcripts of the BASE corpus are also available as XML files, in zipped folders. To download the data, click on one of the following links, which will let you open or save a zipped folder containing the XML files of all lectures and seminars for one of the disciplinary groups in the corpus. The BASE DTD is included in each folder, and it must always be present in the same folder as any XML file being viewed. File names consist of five letters and three digits: the first two letters indicate the disciplinary group, the next three indicate whether the file is a transcript of a lecture (lct) or a seminar (sem), and the digits are a unique identifier.
ah [Arts and Humanities] XML files
ls [Life and Medical Sciences] XML files
ps [Physical Sciences] XML files
ss [Social Sciences] XML files
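The file naming scheme described above can be checked mechanically when sorting downloaded files. The following is a minimal sketch, not part of the BASE distribution: the names `parse_base_filename` and `BaseFile` are hypothetical, and it assumes lower-case file names with an optional `.xml` extension, which this page does not confirm.

```python
import re
from typing import NamedTuple

# Group codes and event-type codes as listed on this page.
GROUPS = {
    "ah": "Arts and Humanities",
    "ls": "Life and Medical Sciences",
    "ps": "Physical Sciences",
    "ss": "Social Sciences",
}
EVENT_TYPES = {"lct": "lecture", "sem": "seminar"}

# Assumption: names are lower case and may carry a ".xml" extension.
# Pattern: two-letter group code, three-letter event code, three digits.
_NAME = re.compile(r"(ah|ls|ps|ss)(lct|sem)(\d{3})(?:\.xml)?")


class BaseFile(NamedTuple):
    """Hypothetical container for the parts of a BASE-style file name."""
    group: str
    event_type: str
    identifier: str


def parse_base_filename(name: str) -> BaseFile:
    """Split a BASE-style file name into group, event type and identifier."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a BASE-style file name: {name!r}")
    group, kind, digits = match.groups()
    return BaseFile(GROUPS[group], EVENT_TYPES[kind], digits)
```

For example, `parse_base_filename("pssem004.xml")` would yield the group "Physical Sciences", the event type "seminar" and the identifier "004". The file name in this example is constructed from the pattern for illustration, not taken from the corpus listing.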
explains the transcription and mark-up conventions used in the corpus
BASE-in-Sketch-Engine can be used as a query tool for analysis of original BASE lecture transcripts.
The lecture portion of the BASE corpus can be accessed through the corpus analysis interface Sketch Engine. All 160 lectures are included, 40 for each disciplinary group. This interface allows the user to view concordance lines, form complex queries, collect word frequency data (including word lists) and more. The service requires a subscription; for details, visit the Sketch Engine website (http://corpora.sketchengine.co.uk/). The service can be obtained initially on a 30-day trial subscription with full access to all resources.
The British Academic Spoken English (BASE) corpus is available to non-commercial researchers who agree to the following conditions:
Researchers must acknowledge their use of the BASE corpus project using the following form of words:
The transcriptions used in this study come from the British Academic Spoken English (BASE) corpus project. The corpus was developed at the Universities of Warwick and Reading under the directorship of Hilary Nesi and Paul Thompson. Corpus development was assisted by funding from BALEAP, EURALEX, the British Academy and the Arts and Humanities Research Council.
Companies that wish to use BASE for research and/or commercial purposes should contact the BASE Plus team: baseplus@warwick.ac.uk