CHILDES CA data corpus


Made available by Mike Forrester

Available at CHILDES (Child Language Data Exchange System): http://childes.psy.cmu.edu/

For CA researchers interested in working with transcripts and recordings of a child learning how to talk between the ages of 1 and 3.5 years, the CHILDES data corpus has recently made available 31 recordings, amounting to over 11 hours of transcribed material. The data were collected by Mike Forrester at the University of Kent and document his daughter Ella interacting with her family, for the most part at mealtimes. The video recordings were digitised and transcribed (originally in CA format), and the utterance lines in the files are linked to the corresponding points in the video recordings. The CHILDES suite of programs (when used alongside QuickTime) allows the transcripts to be 'played' simultaneously with the video recordings.

The CHILDES program team, along with colleagues at TALKBANK (http://talkbank.org/ca/), have been working to make available 'translation' facilities so that transcriptions originally made in the 'chat' format, often used in developmental psycholinguistics, can be transformed into 'CA' characters. The 'chat' transcripts are available at http://childes.psy.cmu.edu/data/Eng-UK/Forrester.zip and, when downloaded alongside the associated movies from http://childes.psy.cmu.edu/data/video/Forrester/, can be translated into CA transcript files by running instructions in the CLAN command window. Details of the CLAN environment and associated manuals are available to researchers from the CHILDES web site. For any of the files in the Forrester corpus (*.cha files) the first instruction to run is:

chat2ca filename.cha

which creates an output file called 'filename.ca'

and then, to line up the formatting of the CA characters, a second command to run is

indent +1 filename.ca

which outputs an appropriately formatted 'filename.ca' file. The transcript contains links to the video files (which should be stored in the same folder as the transcript files); these can be viewed at the utterance level or in whatever way suits the user/analyst.
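Because the same two instructions have to be repeated for each transcript in the corpus, it may help to generate the command lines in one go. The following is only a minimal sketch in Python: it does not run CLAN itself, it simply lists the two instructions described above for every *.cha file in a folder so that they can be pasted into the CLAN command window. The function name `clan_commands` and the folder name "Forrester" are assumptions made for illustration, not part of CLAN or the corpus documentation.

```python
from pathlib import Path


def clan_commands(folder):
    """Return the two CLAN command lines for every .cha file in `folder`.

    Hypothetical helper: it only builds text, it does not invoke CLAN.
    """
    commands = []
    for cha in sorted(Path(folder).glob("*.cha")):
        # Step 1: convert the chat transcript into CA characters
        # (creates an output file with the same name and a .ca extension).
        commands.append(f"chat2ca {cha.name}")
        # Step 2: line up the formatting of the resulting CA file.
        commands.append(f"indent +1 {cha.with_suffix('.ca').name}")
    return commands


if __name__ == "__main__":
    # "Forrester" is an assumed name for the folder holding the unzipped corpus.
    for line in clan_commands("Forrester"):
        print(line)
```

Each pair of printed lines corresponds to one transcript; the commands are meant to be run in that order within the CLAN command window, with the working directory set to the folder containing the transcripts and movies.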

Alternatively, the data can be viewed through 'streaming' facilities by selecting 'Web Data' in the 'Windows' menu of CLAN and then navigating to the data available at

'Eng-UK/Forrester/', where a list of the files (in *.cha chat format) can be viewed.

Further details on CLAN and the facilities for CA can be found at the CHILDES and TALKBANK websites. Specific queries concerning the 'Forrester' data corpus can be directed to m.a.forrester@kent.ac.uk.