Corpus info

General stats

Tokens          210138
Words           201787
Types           14369
Lemmas          5012
Hapax legomena  8213
Dis legomena    2607
POS tags        546

Tokens = strings separated by whitespace (punctuation marks included).
Words = strings separated by whitespace (punctuation marks excluded).
Types = unique words (based on standardized spelling, case-insensitive).
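
The three definitions above can be illustrated with a minimal Python sketch. It assumes the text is already tokenized so that punctuation marks stand as separate whitespace-delimited strings, and it treats any string made up only of punctuation characters as a punctuation mark. The function name `corpus_counts` and the extra punctuation set are hypothetical, and spelling standardization is not modeled, so this is only an approximation of how the figures might be obtained.

```python
import string

# Assumption: ASCII punctuation plus a few marks common in Spanish texts.
PUNCTUATION = set(string.punctuation) | set("¿¡«»…")

def corpus_counts(text: str) -> dict:
    """Count tokens, words and types following the definitions above."""
    # Tokens: whitespace-separated strings, punctuation marks included.
    tokens = text.split()
    # Words: tokens that are not made up solely of punctuation characters.
    words = [t for t in tokens if not all(ch in PUNCTUATION for ch in t)]
    # Types: unique words, compared case-insensitively.
    # (Spelling standardization is NOT applied in this sketch.)
    types = {w.lower() for w in words}
    return {"tokens": len(tokens), "words": len(words), "types": len(types)}

print(corpus_counts("La casa , la Casa ."))
# → {'tokens': 6, 'words': 4, 'types': 2}
```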

Documents

Number of documents 401
Average (tokens per document) 524
Median (tokens per document) 350
Longest document (tokens) 9105
Shortest document (tokens) 30
Oldest document (year)
Most recent document (year)

Group by part of speech

Main POS tag    N        %
untagged        104842   49.89
common noun     19383    9.22
preposition     17854    8.50
verb            16503    7.85
pronoun         13130    6.25
determiner      11383    5.42
conjunction     8888     4.23
punctuation     8351     3.97
adverb          4704     2.24
adjective       2914     1.39
proper noun     1750     0.83
interjection    436      0.21
numeral         0        0.00
foreign word    0        0.00
Total           210138   100.00
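
The % column is each count as a share of the 210138 tokens, rounded to two decimals. A minimal sketch of that arithmetic follows; the helper name `relative_frequency` is hypothetical.

```python
def relative_frequency(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)

TOTAL_TOKENS = 210138  # "Total" row of the table above

print(relative_frequency(104842, TOTAL_TOKENS))  # untagged    → 49.89
print(relative_frequency(19383, TOTAL_TOKENS))   # common noun → 9.22
```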

Group by project

Project   N   %
Total     0   100.00

Group by text type

Text type   N   %
Total       0   100.00

Group by century

Century   N   %
Total     0   100.00

Group by province

Province                 N        %
_                        98571    46.91
Guipúzcoa                20865    9.93
Valladolid               20256    9.64
Nápoles                  17001    8.09
Barcelona                7857     3.74
Zaragoza                 5765     2.74
s.l.                     4304     2.05
¿?                       3666     1.74
Huesca                   2814     1.34
Madrid                   2660     1.27
Valencia                 2496     1.19
Vizcaya                  2408     1.15
Nueva Aquitania          2390     1.14
Caller                   1751     0.83
Barcerlona               1688     0.80
Roma                     1622     0.77
Lacio                    1312     0.62
Piacenza                 1122     0.53
Viena                    1121     0.53
Álava                    1105     0.53
Isla de Francia          950      0.45
Toledo                   792      0.38
Tirol                    693      0.33
Figueras                 602      0.29
Perpiñán                 572      0.27
Mühldorf                 567      0.27
Salamanca                550      0.26
Amberes                  510      0.24
Castellón                449      0.21
Sevilla                  431      0.21
Navarra                  406      0.19
París                    380      0.18
Castilnuovo de Nápoles   352      0.17
Murcia                   330      0.16
Liguria                  317      0.15
Guipúzcoa?               290      0.14
Bruselas-Capital         260      0.12
Aragón                   250      0.12
Jaén                     248      0.12
Badajoz                  212      0.10
Lisboa                   203      0.10
Total                    210138   100.00

Group by institution

Institution   N   %
Total         0   100.00

Group by century and province (absolute frequencies)

XV XVI XVII XVIII XIX Total (province) Total (area)
Almería 0 0
Granada 0
Jaén 0
Málaga 0 0
Córdoba 0
Cádiz 0 0
Sevilla 0
Huelva 0
Madrid 0 0
Burgos 0
others 0 0
Total (century) 0 0 0 0 0 0 0

Group by century and province (relative frequencies)

XV XVI XVII XVIII XIX Total (province) Total (area)
Almería 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Granada 0.00 0.00 0.00 0.00 0.00 0.00
Jaén 0.00 0.00 0.00 0.00 0.00 0.00
Málaga 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Córdoba 0.00 0.00 0.00 0.00 0.00 0.00
Cádiz 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Sevilla 0.00 0.00 0.00 0.00 0.00 0.00
Huelva 0.00 0.00 0.00 0.00 0.00 0.00
Madrid 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Burgos 0.00 0.00 0.00 0.00 0.00 0.00
others 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Total (century) 0.00 0.00 0.00 0.00 0.00 100.00 100.00

Measures of lexical diversity

Measure   Description                            Formula                                                                                                Result
TTR       Type-token ratio                       $\mathrm{TTR} = \frac{V}{N}$                                                                           0.071
RTTR      Guiraud's root type-token ratio        $\mathrm{RTTR} = \frac{V}{\sqrt{N}}$                                                                   31.987
CTTR      Carroll's corrected type-token ratio   $\mathrm{CTTR} = \frac{V}{\sqrt{2N}}$                                                                  22.619
C         Herdan's C index                       $C = \frac{\log V}{\log N}$                                                                            0.784
S         Somers' S index                        $S = \frac{\log(\log V)}{\log(\log N)}$                                                                0.903
M         Maas' index                            $M = \frac{\log N - \log V}{(\log N)^{2}}$                                                             0.029
H         Honoré's index                         $H = 100 \cdot \frac{\log N}{1 - \frac{V_1}{V}}$                                                       2851.151
K         Yule's K index                         $K = 10^{4} \left[ -\frac{1}{N} + \sum_{i=1}^{V} f_v(i, N) \left( \frac{i}{N} \right)^{2} \right]$     110.745
D         Simpson's D index                      $D = \sum_{i=1}^{V} f_v(i, N) \cdot \frac{i}{N} \cdot \frac{i - 1}{N - 1}$                             0.011
HTR       Hapax-token ratio                      $\mathrm{HTR} = \frac{V_1}{V}$                                                                         0.572
DTR       Dis-token ratio                        $\mathrm{DTR} = \frac{V_2}{V}$                                                                         0.181
VGR       Vocabulary growth rate                 $\mathrm{VGR} = \frac{V_1}{N}$                                                                         0.041

N = number of words; V = number of types; V_1 = number of hapax legomena; V_2 = number of dis legomena; f_v(i, N) = number of types occurring i times in a sample of length N.
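
As an illustration of the formulas, the following Python sketch recomputes the closed-form measures from the counts under General stats (N = 201787, V = 14369, V_1 = 8213, V_2 = 2607). Natural logarithms are assumed; with that choice the reported C, S and H values are reproduced. M is omitted because the logarithm base behind its reported value is not stated. K and D need the full frequency spectrum, which is not listed here, so they are shown on a toy word list only. The function names are hypothetical.

```python
import math
from collections import Counter

def measures_from_counts(N: int, V: int, V1: int, V2: int) -> dict:
    """Measures that depend only on N, V, V1 and V2 (natural log assumed)."""
    return {
        "TTR": V / N,
        "RTTR": V / math.sqrt(N),
        "CTTR": V / math.sqrt(2 * N),
        "C": math.log(V) / math.log(N),
        "S": math.log(math.log(V)) / math.log(math.log(N)),
        "H": 100 * math.log(N) / (1 - V1 / V),
        "HTR": V1 / V,
        "DTR": V2 / V,
        "VGR": V1 / N,
    }

def yule_k_simpson_d(words: list) -> tuple:
    """Yule's K and Simpson's D from a list of words (needs at least 2 words)."""
    N = len(words)
    type_freqs = Counter(words)              # type -> number of occurrences
    spectrum = Counter(type_freqs.values())  # i -> f_v(i, N)
    # Only frequency classes that actually occur contribute to the sums.
    k = 1e4 * (-1 / N + sum(f * (i / N) ** 2 for i, f in spectrum.items()))
    d = sum(f * (i / N) * ((i - 1) / (N - 1)) for i, f in spectrum.items())
    return k, d

# Counts taken from the General stats table.
for name, value in measures_from_counts(201787, 14369, 8213, 2607).items():
    print(f"{name}: {value:.3f}")

# Toy sample: "a" occurs twice, "b" and "c" once each.
k, d = yule_k_simpson_d(["a", "a", "b", "c"])
print(f"K = {k:.3f}, D = {d:.3f}")  # K = 1250.000, D = 0.167
```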