Start | Recordings | MP3 listing | Transcripts | Background | Pictures | Movies
CORPUS BACKGROUND
The Pear-Chaplin Basque Corpus
© Jon Aske
The transcriptions are in a blend of standard Basque orthography and phonemic formats. Thus, for example, Txaplin is capitalized as a proper noun (Chaplin), but h’s and elided consonants are not written, as they would be in standard Basque orthography. Because the purpose of these transcriptions was to study grammatical constructions and, in particular, word order, detailed phonemic and/or phonetic transcription was generally avoided as unnecessary.
In general, each numbered line corresponds to a single intonation unit, which occasionally is complex, consisting of several sub-tone units.
Embedded clauses inside an intonation unit are extracted and repeated in the following line and the line receives the same number with a + sign. This was done to facilitate the coding of the clauses (the coding is not included here). You may ignore those lines when reading the transcript. For example:
05-HEND-05-SPKR-02(02)-CHAP-01(01) | 7 | eta= bo ez du ikusten <kamioia galdu duela>~,
05-HEND-05-SPKR-02(02)-CHAP-01(01) | 7+ | ~<kamioia galdu duela>
Here 7+ is just a repetition of a clause inside intonation unit number 7.
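For readers who process the transcripts programmatically, the repeated lines can be filtered out by their label. What follows is only a minimal sketch in Python, assuming each transcript row has already been read into a (line label, text) pair; the function names, the row layout, and the label pattern (digits, an optional lowercase letter, an optional +) are hypothetical and not part of the corpus itself.

```python
import re

# Assumed label shape: a number, an optional part letter (a, b, ...),
# and an optional "+" marking a repeated embedded clause.
LABEL_RE = re.compile(r"^(\d+)([a-z]?)(\+?)$")


def parse_label(label):
    """Split a line label into (number, part letter, is_repetition)."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized line label: {label!r}")
    number, part, plus = match.groups()
    return int(number), part, plus == "+"


def drop_repeated_clauses(rows):
    """Keep only rows whose label does not carry a '+' sign."""
    return [row for row in rows if not parse_label(row[0])[2]]


rows = [
    ("7", "eta= bo ez du ikusten <kamioia galdu duela>~,"),
    ("7+", "~<kamioia galdu duela>"),
]
reading_rows = drop_repeated_clauses(rows)
```

With the two rows from the example, only the row labeled 7 remains in reading_rows.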
Occasionally we find more than one clause in what seems to be a single intonation unit. Although these complex units may very well be blends of more than one unit, I have treated them as a single intonation unit and thus a single line number is used (e.g. 25). However, I have taken the liberty of splitting these complex units into their component clauses for the purpose of coding the clauses (e.g. 25a, 25b, etc.). A tilde (~) is also used at the end of one part and the beginning of the next to indicate the continuity. Thus, for instance:
05-HEND-05-SPKR-02(02)-CHAP-01(01) | 20a | orduan kartzelan, ~
05-HEND-05-SPKR-02(02)-CHAP-01(01) | 20b | ~ badago indartsu bat. -
Here 20a and 20b are both part of the same intonation unit, but were split into two units for the purpose of coding the clauses.
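If the original intonation units are needed rather than the coded clauses, the split parts can be joined back together. The following is a rough sketch in Python under stated assumptions: rows are (line label, text) pairs in transcript order, parts of one unit share the same leading number, and lines whose label ends in + are repetitions to be skipped. The helper names are hypothetical.

```python
import re
from itertools import groupby


def unit_number(label):
    """Return the leading number of a label such as '20a' or '7+'."""
    match = re.match(r"\d+", label.strip())
    if match is None:
        raise ValueError(f"unrecognized line label: {label!r}")
    return int(match.group())


def strip_continuity(text):
    """Remove a continuity tilde at the very start or end of a part."""
    text = text.strip()
    if text.startswith("~"):
        text = text[1:].lstrip()
    if text.endswith("~"):
        text = text[:-1].rstrip()
    return text


def merge_split_units(rows):
    """Join consecutive parts (20a, 20b, ...) into one unit per number."""
    kept = [row for row in rows if not row[0].strip().endswith("+")]
    merged = []
    for number, group in groupby(kept, key=lambda row: unit_number(row[0])):
        parts = [strip_continuity(text) for _, text in group]
        merged.append((number, " ".join(parts)))
    return merged


rows = [
    ("20a", "orduan kartzelan, ~"),
    ("20b", "~ badago indartsu bat. -"),
]
units = merge_split_units(rows)
```

Only tildes at the outer edges of a part are removed, so a tilde that appears inside a line is left untouched.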
The symbol # in the transcripts signals a presumed boundary between a setting element (typically adverbial) and the assertion proper (the rheme or predicate). This was something for me to go back and check. It can be safely ignored. Thus for instance:
57-IKAS-21-SPKR-12(23)-CHAP-06(12) | 8 | Eta ordun,
57-IKAS-21-SPKR-12(23)-CHAP-06(12) | 9 | ortikan # pasa zen kotxe ba=t-
When a clause is split into two intonation units, I sometimes use the symbols >> at the end of the first part and << at the beginning of the second part to indicate that the two form a clausal unit. Like #, these symbols do not represent any aspect of the speech itself.
Finally, XXX indicates that the speech could not be accurately identified (one X per syllable).
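Since #, >> and << are analyst's marks rather than part of the speech, they can be removed before further processing, and runs of X can be counted as unidentified syllables. The following Python sketch illustrates one way to do this; the function names are hypothetical, and it assumes that an unidentified stretch is written as a stand-alone run of capital X's.

```python
import re


def strip_analyst_marks(text):
    """Remove '#', '>>' and '<<' and tidy up the leftover spacing."""
    text = re.sub(r"#|>>|<<", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def unintelligible_syllables(text):
    """Count X's in stand-alone runs (one X per unidentified syllable)."""
    return sum(len(run) for run in re.findall(r"\bX+\b", text))


cleaned = strip_analyst_marks("ortikan # pasa zen kotxe ba=t-")
```

Here cleaned holds the line from the example without the # mark. The word-boundary test keeps a capital X at the start of an ordinary word from being counted as an unidentified syllable.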
If you identify other marks which are not explained here, please let me know.
Symbol | Explanation
text - (text) | Short pause (< ½ sec.)
text .. (text) | Medium length pause (~ ½-1 sec.)
text … (text) | Long pause (> 1-1½ sec.)
@ | Laughter
, | At the end of an intonation unit it indicates continuing (non-final) rising intonation contour; inside an intonation unit it indicates a lesser rising intonation contour
. | At the end of an intonation unit it indicates final, falling intonational contour
^ | Sharp rise-fall intonation contour associated with the following word
/ | Sharp rising intonational contour (similar to ‘,’)
\ | Sharp falling intonational contour (similar to ‘.’)
text= | Lengthened syllable (in hesitations)
=text | There is no pause between this and the preceding intonation unit
<X … > | Period in which a certain feature X holds
text- | Sudden interruption of a word
text-text | Dash indicates that the two morphemes form a single phonological word; not used in Basque orthography (see below)
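The three pause symbols can be told apart from the other uses of the dash and the period because they stand alone, separated from the surrounding text by spaces. The following is a tentative Python sketch of a pause counter built on that observation. The function name is hypothetical, and treating three ASCII periods as equivalent to the single character … is an assumption about how some files might be encoded, not something stated here.

```python
from collections import Counter

# Stand-alone tokens taken to be pauses. "..." is an assumed ASCII
# variant of the ellipsis character used for long pauses.
PAUSE_TOKENS = {"-": "short", "..": "medium", "…": "long", "...": "long"}


def pause_counts(text):
    """Count stand-alone pause symbols in one transcript line."""
    counts = Counter()
    for token in text.split():
        kind = PAUSE_TOKENS.get(token)
        if kind is not None:
            counts[kind] += 1
    return dict(counts)


example = pause_counts("badago indartsu bat. -")
```

A dash attached to a word, as in ba=t- (an interrupted word) or in a hyphenated verb, is not a separate token and so is not counted as a pause.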
In a departure from Basque orthography, I have in most cases used a hyphen (-) to connect the two parts of periphrastic verbs. I have done this in order to emphasize the fact that the two parts act as a single word in affirmative clauses for purposes of word order. Occasionally and for simplicity’s sake I have also given the gloss for the whole verbal complex instead of for each of its parts. Similarly, I typically use a hyphen to connect the negative word ez to the finite verb that follows to indicate that it’s cliticized to it. This also is not part of standard practice in Basque orthography. When I do not use the hyphen it is not because of any differences between these and the other sentences, but for the most part to respect the orthography of sources other than my own texts, or because it is not relevant in that context.
Dotted underline: denotes the variable accent of topics and other pre-rhematic setting constituents. This accent varies with the constituent’s degree of intonational integration with the rest of the clause. If fully accented, they would be dislocated. If unaccented, they would be ‘cliticized’ to the following rheme. A minor, secondary accent indicates an intermediate status, in which the topic (or setting) is part of the clause’s overall intonation unit.
For simplicity, absolutive noun phrases have been left uncoded for grammatical role. Thus all nominals which do not have a grammatical case role mark in the gloss are absolutives. Plurality has also been coded inside the English gloss itself whenever possible for simplicity (e.g. apples instead of apple:pl).
Occasionally I have glossed some grammatical morphemes with a ‘meaningful’ grammatical label, such as more (for -ago-), if (for conditional ba-), whether, because, etc. I have also done this with occasional examples from other sources.