Corpus info

General stats

Tokens              210151
Words               201792
Types                14370
Lemmas                4965
Hapax legomena        8213
Dis legomena          2607
POS tags               555

Tokens = strings separated by whitespace (punctuation marks included).
Words = strings separated by whitespace (punctuation marks excluded).
Types = unique words (based on standardized spelling; case-insensitive).
Hapax legomena / dis legomena = types that occur exactly once / exactly twice.
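A minimal Python sketch of these definitions. It assumes punctuation marks are already separated from words by whitespace, which the counts suggest (tokens minus words, 210151 - 201792 = 8359, equals the punctuation row of the part-of-speech table). It uses lowercasing as a stand-in for the corpus's spelling standardization, which is not described here; `basic_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: a token that contains no letter or digit is a punctuation mark.
PUNCTUATION_ONLY = re.compile(r"[^\w]+")


def basic_counts(text: str) -> dict[str, int]:
    """Count tokens, words, types, hapax and dis legomena in a whitespace-tokenized text."""
    tokens = text.split()  # any run of whitespace separates tokens
    words = [t for t in tokens if not PUNCTUATION_ONLY.fullmatch(t)]
    # Lowercasing approximates "case-insensitive"; real spelling standardization is not modeled.
    type_freqs = Counter(w.lower() for w in words)
    spectrum = Counter(type_freqs.values())  # frequency -> number of types with that frequency
    return {
        "tokens": len(tokens),
        "words": len(words),
        "types": len(type_freqs),
        "hapax_legomena": spectrum[1],
        "dis_legomena": spectrum[2],
    }


print(basic_counts("El rey , dixo : el Rey vino ."))
```

For the sample sentence, the nine tokens include three punctuation marks, leaving six words and four types (`el`, `rey`, `dixo`, `vino`): two hapax legomena and two dis legomena.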

Documents

Number of documents              401
Average tokens per document      524
Median tokens per document       350
Longest document (tokens)       9105
Shortest document (tokens)        30
Oldest document (year)
Most recent document (year)
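These figures follow from the per-document token counts; for instance, 210151 tokens / 401 documents ≈ 524.07, which matches the reported average. A sketch with Python's `statistics` module, assuming the page rounds the mean to the nearest integer (`document_stats` is a hypothetical name, and the input list is illustrative, not corpus data):

```python
from statistics import mean, median


def document_stats(doc_lengths: list[int]) -> dict[str, float]:
    """Summarize a list of per-document token counts."""
    return {
        "documents": len(doc_lengths),
        "average": round(mean(doc_lengths)),  # assumption: rounded to the nearest integer
        "median": median(doc_lengths),  # mean of the two middle values for even-length input
        "longest": max(doc_lengths),
        "shortest": min(doc_lengths),
    }


print(document_stats([30, 120, 350, 9105]))
```

The median is reported alongside the mean because a few very long documents (up to 9105 tokens here) pull the average well above the typical document length.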

Group by part of speech

Main POS tag           N       %
untagged          104872   49.90
common noun        19242    9.16
preposition        17838    8.49
verb               16504    7.85
pronoun            13070    6.22
determiner         11376    5.41
conjunction         8968    4.27
punctuation         8359    3.98
adverb              4701    2.24
adjective           2944    1.40
proper noun         1841    0.88
interjection         436    0.21
numeral                0    0.00
foreign word           0    0.00
Total             210151  100.00
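Each percentage is the row count divided by the table total, times 100, rounded to two decimals; for example, 104872 / 210151 × 100 ≈ 49.90 for untagged tokens. A short Python sketch using the counts above (`percentages` is a hypothetical helper, and the zero-total guard is an added assumption):

```python
def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each count as a percentage of the total, rounded to two decimals."""
    total = sum(counts.values())
    if total == 0:  # assumption: an empty table yields 0.0 for every row
        return {label: 0.0 for label in counts}
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


POS_COUNTS = {
    "untagged": 104872, "common noun": 19242, "preposition": 17838, "verb": 16504,
    "pronoun": 13070, "determiner": 11376, "conjunction": 8968, "punctuation": 8359,
    "adverb": 4701, "adjective": 2944, "proper noun": 1841, "interjection": 436,
    "numeral": 0, "foreign word": 0,
}

# Print the table in the same column layout as above.
for tag, pct in percentages(POS_COUNTS).items():
    print(f"{tag:<16}{POS_COUNTS[tag]:>8}{pct:>8.2f}")
```

The same function applies unchanged to the province table below.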

Group by project

Project                N       %
Total                  0  100.00

Group by text type

Text type              N       %
Total                  0  100.00

Group by century

Century                N       %
Total                  0  100.00

Group by province

Province                       N       %
_                          98584   46.91
Guipúzcoa                  20865    9.93
Valladolid                 20256    9.64
Nápoles                    17001    8.09
Barcelona                   7857    3.74
Zaragoza                    5765    2.74
s.l.                        4304    2.05
¿?                          3666    1.74
Huesca                      2814    1.34
Madrid                      2660    1.27
Valencia                    2496    1.19
Vizcaya                     2408    1.15
Nueva Aquitania             2390    1.14
Caller                      1751    0.83
Barcerlona                  1688    0.80
Roma                        1622    0.77
Lacio                       1312    0.62
Piacenza                    1122    0.53
Viena                       1121    0.53
Álava                       1105    0.53
Isla de Francia              950    0.45
Toledo                       792    0.38
Tirol                        693    0.33
Figueras                     602    0.29
Perpiñán                     572    0.27
Mühldorf                     567    0.27
Salamanca                    550    0.26
Amberes                      510    0.24
Castellón                    449    0.21
Sevilla                      431    0.21
Navarra                      406    0.19
París                        380    0.18
Castilnuovo de Nápoles       352    0.17
Murcia                       330    0.16
Liguria                      317    0.15
Guipúzcoa?                   290    0.14
Bruselas-Capital             260    0.12
Aragón                       250    0.12
Jaén                         248    0.12
Badajoz                      212    0.10
Lisboa                       203    0.10
Total                     210151  100.00

Group by institution

Institution            N       %
Total                  0  100.00

Group by century and province (absolute frequencies)

Province         XV     XVI    XVII   XVIII  XIX    Total (province)  Total (area)
Almería                                             0                 0
Granada                                             0
Jaén                                                0
Málaga                                              0                 0
Córdoba                                             0
Cádiz                                               0                 0
Sevilla                                             0
Huelva                                              0
Madrid                                              0                 0
Burgos                                              0
others                                              0                 0
Total (century)  0      0      0      0      0      0                 0

Group by century and province (relative frequencies)

Province         XV     XVI    XVII   XVIII  XIX    Total (province)  Total (area)
Almería          0.00   0.00   0.00   0.00   0.00   0.00              0.00
Granada          0.00   0.00   0.00   0.00   0.00   0.00
Jaén             0.00   0.00   0.00   0.00   0.00   0.00
Málaga           0.00   0.00   0.00   0.00   0.00   0.00              0.00
Córdoba          0.00   0.00   0.00   0.00   0.00   0.00
Cádiz            0.00   0.00   0.00   0.00   0.00   0.00              0.00
Sevilla          0.00   0.00   0.00   0.00   0.00   0.00
Huelva           0.00   0.00   0.00   0.00   0.00   0.00
Madrid           0.00   0.00   0.00   0.00   0.00   0.00              0.00
Burgos           0.00   0.00   0.00   0.00   0.00   0.00
others           0.00   0.00   0.00   0.00   0.00   0.00              0.00
Total (century)  0.00   0.00   0.00   0.00   0.00   100.00            100.00

Measures of lexical diversity

Measure  Description                           Result    Formula
TTR      Type-token ratio                      0.071     $\mathrm{TTR} = V / N$
RTTR     Guiraud's root type-token ratio       31.989    $\mathrm{RTTR} = V / \sqrt{N}$
CTTR     Carroll's corrected type-token ratio  22.620    $\mathrm{CTTR} = V / \sqrt{2N}$
C        Herdan's C index                      0.784     $C = \log V / \log N$
S        Somers' S index                       0.903     $S = \log(\log V) / \log(\log N)$
M        Maas' index                           0.029     $M = (\log N - \log V) / \log^2 N$
H        Honoré's index                        2850.892  $H = 100 \cdot \log N / (1 - V_1 / V)$
K        Yule's K index                        110.748   $K = 10^4 \left[ -\frac{1}{N} + \sum_{i=1}^{V} f_v(i, N) \left( \frac{i}{N} \right)^2 \right]$
D        Simpson's D index                     0.011     $D = \sum_{i=1}^{V} f_v(i, N) \cdot \frac{i}{N} \cdot \frac{i - 1}{N - 1}$
HTR      Hapax-token ratio                     0.572     $\mathrm{HTR} = V_1 / V$
DTR      Dis-token ratio                       0.181     $\mathrm{DTR} = V_2 / V$
VGR      Vocabulary growth rate                0.041     $\mathrm{VGR} = V_1 / N$

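$N$ = number of words; $V$ = number of types; $V_1$ = number of hapax legomena (types occurring once); $V_2$ = number of dis legomena (types occurring twice); $f_v(i, N)$ = number of types occurring exactly $i$ times in a sample of length $N$. All logarithms are natural logarithms.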
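The formulas above translate directly into code. Below is a Python sketch under the footnote's definitions, using natural logarithms; `summary_measures` and `spectrum_measures` are hypothetical helper names. With the General stats figures (N = 201792 words, V = 14370, V1 = 8213, V2 = 2607), the printed TTR, RTTR, CTTR, C, S, H, HTR, DTR and VGR values round to those in the table. K and D need the full frequency spectrum f_v(i, N), which this page does not list, so they are shown only on a toy spectrum.

```python
import math


def summary_measures(N: int, V: int, V1: int, V2: int) -> dict[str, float]:
    """Measures that need only N, V, V1 and V2 (assumes N >= 3, V >= 2 and V1 < V)."""
    log_n, log_v = math.log(N), math.log(V)  # natural logarithms
    return {
        "TTR": V / N,
        "RTTR": V / math.sqrt(N),
        "CTTR": V / math.sqrt(2 * N),
        "C": log_v / log_n,
        "S": math.log(log_v) / math.log(log_n),
        "M": (log_n - log_v) / log_n ** 2,
        "H": 100 * log_n / (1 - V1 / V),
        "HTR": V1 / V,
        "DTR": V2 / V,
        "VGR": V1 / N,
    }


def spectrum_measures(spectrum: dict[int, int]) -> dict[str, float]:
    """Yule's K and Simpson's D from a frequency spectrum {i: f_v(i, N)}."""
    N = sum(i * f for i, f in spectrum.items())  # each word belongs to exactly one type
    K = 1e4 * (-1 / N + sum(f * (i / N) ** 2 for i, f in spectrum.items()))
    D = sum(f * (i / N) * ((i - 1) / (N - 1)) for i, f in spectrum.items())
    return {"K": K, "D": D}


# Figures from "General stats": N = words, V = types, V1 = hapax, V2 = dis legomena.
for name, value in summary_measures(201792, 14370, 8213, 2607).items():
    print(f"{name:<5}{value:.3f}")

# Hypothetical toy spectrum: one type seen 3 times, one type seen once (N = 4).
print(spectrum_measures({3: 1, 1: 1}))
```

Working from the spectrum rather than from raw word lists keeps K and D cheap to compute: the sums run over distinct frequencies, not over every type.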