File:Zipf-euro-2 Biblical texts - Latin Vulgate OT, Greek NT, Russian OT.svg

Original file (SVG file, nominally 512 × 504 pixels, file size: 2.01 MB)

Captions

Zipf's law plot for Biblical texts: Latin Vulgate Pentateuch, Greek NT, Russian Pentateuch

Summary

Description
English: Zipf's law plot (word frequency as a function of frequency rank) for the words in Biblical texts in Latin, Greek, and Russian.

The languages, texts and the frequency files are:

  • Latin. The first five books (the Pentateuch) from the Latin version (Vulgate) of the Old Testament, edited by St. Jerome around 400 CE. Converted to lowercase. Sample: in principio creavit deus caelum et terram terra autem erat inanis et [...] chananei. File latn/ptt/tot.1/gud.wfr (original 96870 words, truncated/filtered to 35027 words, N = 6633 distinct).
  • Greek. The full Byzantine text-type or Majority Text version of the New Testament (27 books) in vulgar Byzantine Greek (koiné), from 300 CE or earlier. Transcribed by Mark Fuller. Converted to an ad hoc encoding of the Greek alphabet in ISO Latin-1. Sample: biblos geneseôs iësou qristou uiou dauid uiou abraam abraam egennësen [...] marðas tës. File grek/nwt/tot.1/gud.wfr (original 66183 words, truncated/filtered to 35027 words, N = 5436 distinct).
  • Russian. The first five books (the Pentateuch) from the Synodal Russian Bible (1876). Translated from Old Slavonic, with many archaic words. Romanized, all lowercase. Sample: v nachale sotvoril bog nebo i zemlyu zemlya zhe byla bezvidna i pusta i [...] v den' sobraniya i otdal ikh gospod' mne i. File russ/ptr/tot.1/gud.wfr (original 111824 words, truncated/filtered to 35027 words, N = 5520 distinct).
The word frequency files '*/*/*/gud.wfr' are available at the UNICAMP website. The original annotated full texts, before truncation/filtering, are in the companion files */*/org/main.src. The truncated/filtered texts -- one word per line, without punctuation -- are in */*/*/gud.tlw.
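
The rank-frequency data behind a plot like this can be sketched as follows. This is a minimal illustration, not the author's actual pipeline: the function names `read_words` and `rank_frequency` are hypothetical, the `limit` parameter models only a plain truncation (the filtering applied to the real texts is not described here), and the file encoding is an assumption. The input is taken to be one word per line, as described for the gud.tlw files.

```python
from collections import Counter


def read_words(path, encoding="latin-1"):
    """Read a one-word-per-line file; the encoding is an assumption."""
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]


def rank_frequency(words, limit=None):
    """Return (rank, frequency, word) tuples, most frequent word first.

    If limit is given, only the first `limit` words are counted
    (a simple truncation, assumed here for illustration).
    """
    if limit is not None:
        words = words[:limit]
    counts = Counter(words)
    # Sort by descending frequency; break ties alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, freq, word)
            for rank, (word, freq) in enumerate(ordered, start=1)]


# Opening words of the Latin sample quoted above.
sample = ("in principio creavit deus caelum et terram "
          "terra autem erat inanis et").split()
table = rank_frequency(sample)
for rank, freq, word in table[:3]:
    print(rank, freq, word)
```

Plotting frequency against rank on log-log axes gives the Zipf plot; the number of rows in the table corresponds to N, the count of distinct words.
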
Date
Source Own work
Author Jorge Stolfi

Licensing

I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.

File history

Date/Time | Dimensions | User | Comment
22:31, 15 May 2023 (current) | 512 × 504 (2.01 MB) | Jorge Stolfi | Rebuilt the file with small changes in dataset, colors
14:45, 9 May 2023 | 512 × 504 (2.01 MB) | Jorge Stolfi | Uploaded own work with UploadWizard

There are no pages that use this file.
