Corpus info

General stats

Tokens 80184
Words 80184
Types 8045
Lemmas 6
Hapax legomena 4955
Dis legomena 1263
POS tags 6

Tokens = strings separated by whitespace (punctuation marks included).
Words = strings separated by whitespace (punctuation marks excluded).
Types = unique words (based on standardized spelling; case-insensitive).
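
The three definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `corpus_counts` and the `PUNCT` set are hypothetical, the text is assumed to be pre-tokenized so that punctuation marks are their own whitespace-separated strings, and spelling standardization is not modeled (only case folding is applied).

```python
import string

# Assumption: ASCII punctuation plus a few marks common in Spanish texts.
PUNCT = set(string.punctuation) | set("¿¡«»…")


def corpus_counts(text):
    """Return (tokens, words, types) for a whitespace-tokenized text."""
    # Tokens: every whitespace-separated string, punctuation included.
    tokens = text.split()
    # Words: tokens that are not made up solely of punctuation marks.
    words = [t for t in tokens if not all(ch in PUNCT for ch in t)]
    # Types: unique words, compared case-insensitively.
    types = {w.lower() for w in words}
    return len(tokens), len(words), len(types)


print(corpus_counts("El rey , el Rey ."))
```

In the sample string there are six tokens, four words once the comma and full stop are dropped, and two types ("el" and "rey") after case folding.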

Documents

Number of documents 259
Average (tokens per document) 310
Median (tokens per document) 267
Longest document (tokens) 1071
Shortest document (tokens) 6
Oldest document (year)
Most recent document (year)

Group by part of speech

Main POS tag    N      %
untagged        80183  100.00
common noun     1      0.00
proper noun     0      0.00
verb            0      0.00
adjective       0      0.00
adverb          0      0.00
determiner      0      0.00
pronoun         0      0.00
preposition     0      0.00
numeral         0      0.00
interjection    0      0.00
conjunction     0      0.00
foreign word    0      0.00
punctuation     0      0.00
Total           80184  100.00

Group by project

Project  N  %
Total    0  100.00

Group by text type

Text type  N  %
Total      0  100.00

Group by century

Century  N  %
Total    0  100.00

Group by province

Province            N      %
Granada             19284  24.05
Madrid              12935  16.13
_                   12286  15.32
Córdoba             9873   12.31
Salamanca           8862   11.05
Huesca              7344   9.16
País Vasco          2339   2.92
Andalucía           2099   2.62
Navarra             1124   1.40
Sevilla             656    0.82
Cuenca              568    0.71
Madrid              566    0.71
Vizcaya             490    0.61
Hueca               471    0.59
huesca              430    0.54
Valladolid          365    0.46
Castilla-La Mancha  254    0.32
Castilla-la Mancha  238    0.30
Total               80184  100.00

Group by institution

Institution  N  %
Total        0  100.00

Group by century and province (absolute frequencies)

Province         XV    XVI   XVII  XVIII  XIX   Total (province)  Total (area)
Almería                                         0                 0
Granada                                         0
Jaén                                            0
Málaga                                          0                 0
Córdoba                                         0
Cádiz                                           0                 0
Sevilla                                         0
Huelva                                          0
Madrid                                          0                 0
Burgos                                          0
others                                          0                 0
Total (century)  0     0     0     0      0     0                 0

Group by century and province (relative frequencies)

Province         XV    XVI   XVII  XVIII  XIX   Total (province)  Total (area)
Almería          0.00  0.00  0.00  0.00   0.00  0.00              0.00
Granada          0.00  0.00  0.00  0.00   0.00  0.00
Jaén             0.00  0.00  0.00  0.00   0.00  0.00
Málaga           0.00  0.00  0.00  0.00   0.00  0.00              0.00
Córdoba          0.00  0.00  0.00  0.00   0.00  0.00
Cádiz            0.00  0.00  0.00  0.00   0.00  0.00              0.00
Sevilla          0.00  0.00  0.00  0.00   0.00  0.00
Huelva           0.00  0.00  0.00  0.00   0.00  0.00
Madrid           0.00  0.00  0.00  0.00   0.00  0.00              0.00
Burgos           0.00  0.00  0.00  0.00   0.00  0.00
others           0.00  0.00  0.00  0.00   0.00  0.00              0.00
Total (century)  0.00  0.00  0.00  0.00   0.00  100.00            100.00

Measures of lexical diversity

Measure  Description                           Formula  Result
TTR      type-token ratio                      $TTR = \frac{V}{N}$  0.100
RTTR     Guiraud's root type-token ratio       $RTTR = \frac{V}{\sqrt{N}}$  28.411
CTTR     Carroll's corrected type-token ratio  $CTTR = \frac{V}{\sqrt{2N}}$  20.089
C        Herdan's C index                      $C = \frac{\log V}{\log N}$  0.796
S        Somers' S index                       $S = \frac{\log(\log V)}{\log(\log N)}$  0.906
M        Maas' index                           $M = \frac{\log N - \log V}{(\log N)^2}$  0.028
H        Honoré's index                        $H = 100 \cdot \frac{\log N}{1 - \frac{V_1}{V}}$  2939.960
K        Yule's K index                        $K = 10^4 \left[ -\frac{1}{N} + \sum_{i=1}^{V} f_v(i, N) \left( \frac{i}{N} \right)^2 \right]$  122.965
D        Simpson's D index                     $D = \sum_{i=1}^{V} f_v(i, N) \cdot \frac{i}{N} \cdot \frac{i - 1}{N - 1}$  0.012
HTR      Hapax-token ratio                     $HTR = \frac{V_1}{V}$  0.616
DTR      Dis-token ratio                       $DTR = \frac{V_2}{V}$  0.157
VGR      Vocabulary growth rate                $VGR = \frac{V_1}{N}$  0.062
VGR Vocabulary growth rate VGR = V 1 N 0.062

$N$ = number of words; $V$ = number of types; $V_1$ = number of hapax legomena; $V_2$ = number of dis legomena; $f_v(i, N)$ = number of types occurring $i$ times in a sample of length $N$.
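
As a rough cross-check, the measures above can be recomputed from the summary counts. The sketch below is not the site's implementation: the function names `summary_measures` and `spectrum_measures` are hypothetical, and natural logarithms are an assumption (they reproduce the tabulated S and H values; C does not depend on the base). The tabulated M of 0.028 is not reproduced by this reading of the formula with either natural or base-10 logarithms, so the source may compute it differently. K and D need the full frequency list, which this section does not give, so they are shown on a toy word list only.

```python
import math
from collections import Counter


def summary_measures(n, v, v1, v2):
    """Measures that need only N, V, V1 and V2 (natural logs assumed)."""
    log_n, log_v = math.log(n), math.log(v)
    return {
        "TTR": v / n,
        "RTTR": v / math.sqrt(n),
        "CTTR": v / math.sqrt(2 * n),
        "C": log_v / log_n,
        "S": math.log(log_v) / math.log(log_n),
        "M": (log_n - log_v) / log_n ** 2,
        "H": 100 * log_n / (1 - v1 / v),
        "HTR": v1 / v,
        "DTR": v2 / v,
        "VGR": v1 / n,
    }


def spectrum_measures(words):
    """Yule's K and Simpson's D from a list of words."""
    freqs = Counter(words)
    n = sum(freqs.values())
    # Frequency spectrum: i -> number of types occurring exactly i times.
    spectrum = Counter(freqs.values())
    k = 1e4 * (-1 / n + sum(f * (i / n) ** 2 for i, f in spectrum.items()))
    d = sum(f * (i / n) * ((i - 1) / (n - 1)) for i, f in spectrum.items())
    return {"K": k, "D": d}


# Summary counts taken from the "General stats" block of this page.
for name, value in summary_measures(80184, 8045, 4955, 1263).items():
    print(f"{name:5s}{value:.3f}")

# Toy input, only to show the call; not data from the corpus.
print(spectrum_measures("a b b c c c d".split()))
```

Splitting the computation in two keeps the part that can be checked against the page (everything derived from N, V, V1, V2) separate from the part that requires per-type frequencies.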