Corpus info

General stats

Tokens 210150
Words 201790
Types 14373
Lemmas 4947
Hapax legomena 8213
Dis legomena 2608
POS tags 566

Tokens = strings separated by whitespace (punctuation marks included).
Words = strings separated by whitespace (punctuation marks excluded).
Types = unique words (based on standardized spelling, case-insensitive).
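
The three definitions above can be illustrated with a minimal Python sketch. It assumes the text is already tokenized so that punctuation marks stand alone between spaces, and it treats a token as a punctuation mark when every character in it belongs to a Unicode punctuation category. The function name `corpus_counts` is hypothetical, and the spelling standardization mentioned above is not modelled: only case is folded.

```python
import unicodedata


def is_punctuation(token):
    """True if every character of the token is a Unicode punctuation character."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def corpus_counts(text):
    """Return (tokens, words, types) for a whitespace-tokenized text."""
    tokens = text.split()                       # punctuation marks included
    words = [t for t in tokens if not is_punctuation(t)]  # punctuation excluded
    types = {w.lower() for w in words}          # case-insensitive unique words
    return len(tokens), len(words), len(types)


sample = "La carta , la Carta y el sello ."
print(corpus_counts(sample))  # (9, 7, 5)
```

In the sample, the comma and the full stop count as tokens but not as words, and "La", "la", "carta" and "Carta" collapse into two types.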

Documents

Number of documents 401
Average (tokens per document) 524
Median (tokens per document) 350
Longest document (tokens) 9105
Shortest document (tokens) 30
Oldest document (year)
Most recent document (year)
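
As a rough illustration of how the per-document figures relate to each other, the sketch below derives them from a list of token counts. The list `lengths` is invented for the example and is not taken from the corpus; `document_stats` is a hypothetical helper name.

```python
from statistics import mean, median


def document_stats(token_counts):
    """Summarize a list of per-document token counts."""
    return {
        "documents": len(token_counts),
        "average": round(mean(token_counts)),
        "median": median(token_counts),
        "longest": max(token_counts),
        "shortest": min(token_counts),
    }


# Hypothetical token counts for five documents (not corpus data).
lengths = [30, 350, 9105, 120, 500]
print(document_stats(lengths))

# Consistency check on the reported figures: total tokens / number of documents.
print(round(210150 / 401))  # 524
```

A median well below the average, as reported here (350 against 524), is what a few very long documents produce.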

Group by part of speech

Main POS tag    N        %
untagged        104770   49.85
common noun     19131    9.10
preposition     17874    8.51
verb            16503    7.85
pronoun         13053    6.21
determiner      11394    5.42
conjunction     9020     4.29
punctuation     8360     3.98
adverb          4700     2.24
adjective       3027     1.44
proper noun     1882     0.90
interjection    436      0.21
numeral         0        0.00
foreign word    0        0.00
Total           210150   100.00
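
The percentage column is each count divided by the token total. A short sketch, using the counts from the table above and a hypothetical helper `shares`, shows the arithmetic and confirms that the counts add up to the reported total:

```python
def shares(counts):
    """Map each label to its percentage of the total, rounded to two decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


pos_counts = {
    "untagged": 104770, "common noun": 19131, "preposition": 17874,
    "verb": 16503, "pronoun": 13053, "determiner": 11394,
    "conjunction": 9020, "punctuation": 8360, "adverb": 4700,
    "adjective": 3027, "proper noun": 1882, "interjection": 436,
    "numeral": 0, "foreign word": 0,
}

print(sum(pos_counts.values()))        # 210150
print(shares(pos_counts)["untagged"])  # 49.85
```

The punctuation count (8360) also equals the difference between tokens (210150) and words (201790), as the definitions of tokens and words imply.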

Group by project

Project    N    %
Total      0    100.00

Group by text type

Text type    N    %
Total        0    100.00

Group by century

Century    N    %
Total      0    100.00

Group by province

Province                  N        %
_                         98583    46.91
Guipúzcoa                 20865    9.93
Valladolid                20256    9.64
Nápoles                   17001    8.09
Barcelona                 7857     3.74
Zaragoza                  5765     2.74
s.l.                      4304     2.05
¿?                        3666     1.74
Huesca                    2814     1.34
Madrid                    2660     1.27
Valencia                  2496     1.19
Vizcaya                   2408     1.15
Nueva Aquitania           2390     1.14
Caller                    1751     0.83
Barcerlona                1688     0.80
Roma                      1622     0.77
Lacio                     1312     0.62
Piacenza                  1122     0.53
Viena                     1121     0.53
Álava                     1105     0.53
Isla de Francia           950      0.45
Toledo                    792      0.38
Tirol                     693      0.33
Figueras                  602      0.29
Perpiñán                  572      0.27
Mühldorf                  567      0.27
Salamanca                 550      0.26
Amberes                   510      0.24
Castellón                 449      0.21
Sevilla                   431      0.21
Navarra                   406      0.19
París                     380      0.18
Castilnuovo de Nápoles    352      0.17
Murcia                    330      0.16
Liguria                   317      0.15
Guipúzcoa?                290      0.14
Bruselas-Capital          260      0.12
Aragón                    250      0.12
Jaén                      248      0.12
Badajoz                   212      0.10
Lisboa                    203      0.10
Total                     210150   100.00

Group by institution

Institution    N    %
Total          0    100.00

Group by century and province (absolute frequencies)

XV XVI XVII XVIII XIX Total (province) Total (area)
Almería 0 0
Granada 0
Jaén 0
Málaga 0 0
Córdoba 0
Cádiz 0 0
Sevilla 0
Huelva 0
Madrid 0 0
Burgos 0
others 0 0
Total (century) 0 0 0 0 0 0 0

Group by century and province (relative frequencies)

XV XVI XVII XVIII XIX Total (province) Total (area)
Almería 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Granada 0.00 0.00 0.00 0.00 0.00 0.00
Jaén 0.00 0.00 0.00 0.00 0.00 0.00
Málaga 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Córdoba 0.00 0.00 0.00 0.00 0.00 0.00
Cádiz 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Sevilla 0.00 0.00 0.00 0.00 0.00 0.00
Huelva 0.00 0.00 0.00 0.00 0.00 0.00
Madrid 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Burgos 0.00 0.00 0.00 0.00 0.00 0.00
others 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Total (century) 0.00 0.00 0.00 0.00 0.00 100.00 100.00

Measures of lexical diversity

Measure  Description                           Formula  Result
TTR      Type-token ratio                      $\mathrm{TTR} = \frac{V}{N}$  0.071
RTTR     Guiraud's root type-token ratio       $\mathrm{RTTR} = \frac{V}{\sqrt{N}}$  31.996
CTTR     Carroll's corrected type-token ratio  $\mathrm{CTTR} = \frac{V}{\sqrt{2N}}$  22.625
C        Herdan's C index                      $C = \frac{\log V}{\log N}$  0.784
S        Somers' S index                       $S = \frac{\log(\log V)}{\log(\log N)}$  0.903
M        Maas' index                           $M = \frac{\log N - \log V}{(\log N)^{2}}$  0.029
H        Honoré's index                        $H = 100 \cdot \frac{\log N}{1 - \frac{V_1}{V}}$  2850.097
K        Yule's K index                        $K = 10^{4} \cdot \left[ -\frac{1}{N} + \sum_{i=1}^{V} f_v(i, N) \cdot \left( \frac{i}{N} \right)^{2} \right]$  110.898
D        Simpson's D index                     $D = \sum_{i=1}^{V} f_v(i, N) \cdot \frac{i}{N} \cdot \frac{i - 1}{N - 1}$  0.011
HTR      Hapax-token ratio                     $\mathrm{HTR} = \frac{V_1}{V}$  0.571
DTR      Dis-token ratio                       $\mathrm{DTR} = \frac{V_2}{V}$  0.181
VGR      Vocabulary growth rate                $\mathrm{VGR} = \frac{V_1}{N}$  0.041

$N$ = number of words; $V$ = number of types; $V_1$ = number of hapax legomena; $V_2$ = number of dis legomena; $f_v(i, N)$ = number of types occurring $i$ times in a sample of length $N$.
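
The measures that depend only on N, V, V1 and V2 can be checked against the general statistics with a short Python sketch. It assumes natural logarithms, which is an inference rather than something the page states: the reported values of S and H are reproduced with that base. Maas' M is left out because the page does not say which logarithm it uses. K and D need the full frequency spectrum, which is not listed here, so the second part runs on an invented five-word sample. All function names are hypothetical.

```python
import math
from collections import Counter


def closed_form_measures(N, V, V1, V2):
    """Measures computable from the word, type, hapax and dis legomena counts."""
    ln = math.log  # assumption: natural logarithm
    return {
        "TTR": V / N,
        "RTTR": V / math.sqrt(N),
        "CTTR": V / math.sqrt(2 * N),
        "C": ln(V) / ln(N),
        "S": ln(ln(V)) / ln(ln(N)),
        "H": 100 * ln(N) / (1 - V1 / V),
        "HTR": V1 / V,
        "DTR": V2 / V,
        "VGR": V1 / N,
    }


def frequency_spectrum(words):
    """Map i -> number of types occurring i times (case-insensitive)."""
    type_freq = Counter(w.lower() for w in words)
    return Counter(type_freq.values())


def spectrum_measures(spectrum, N):
    """Yule's K and Simpson's D from a frequency spectrum i -> f_v(i, N)."""
    K = 1e4 * (-1 / N + sum(f * (i / N) ** 2 for i, f in spectrum.items()))
    D = sum(f * (i / N) * ((i - 1) / (N - 1)) for i, f in spectrum.items())
    return K, D


# Counts taken from the general statistics of this corpus.
for name, value in closed_form_measures(201790, 14373, 8213, 2608).items():
    print(f"{name}: {value:.3f}")

# Invented sample (not corpus data): three hapax legomena and one type seen twice.
toy_words = ["a", "b", "c", "d", "d"]
toy_K, toy_D = spectrum_measures(frequency_spectrum(toy_words), len(toy_words))
print(toy_K, toy_D)
```

With the corpus counts, the printed values should agree with the Result column to within rounding for the nine measures covered.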