Fun with Text
This page demonstrates how to read a text file and perform simple text manipulation.
Did Shakespeare write the Bible? According to some numerologists he did.
The theory states that the Authorized King James version of the Bible was published in 1611, when Shakespeare was 46 years old. If you look at the text of Psalm 46, the 46th word from the beginning of the Psalm is 'Shake' and the 46th word from the end is 'Spear'.
So can we use APL to prove it without all that tedious counting?
Let's imagine that you have a text file containing the text of Psalm 46. (The words are included at the end of this page for reference, and you can get a text file version by downloading the attachment psalm46.txt.)
Reading a Text File
The first thing we need to do is to read the text file into an APL variable. The way you do this varies slightly from one APL to another. Most APLs support a system function called ⎕NREAD. For example, in APLX (no longer under development) you can read the whole file like this:
      'C:\simon\psalm46.txt' ⎕NTIE ¯1
      text←⎕NREAD ¯1 4
      ⎕NUNTIE ¯1
      ⍝ Check it worked OK:
      ⍴text
1101
      20↑text
1: God is our refuge
Here's the same thing in APL+Win:
      'C:\simon\psalm46.txt' ⎕NTIE ¯1
      text←⎕NREAD ¯1 82,(⎕NSIZE ¯1),0   ⍝ in APL+WIN
      ⎕NUNTIE ¯1
...and Dyalog APL is also very similar.
The ⎕NTIE function is used to open an existing file (you would use ⎕NCREATE / ⎕NWRITE / ⎕NUNTIE to make a new one). The right argument is the tie number, which identifies the file in subsequent calls like ⎕NREAD, since you can have more than one file open at a time. (For some APLs, the tie number must be negative.)
The ⎕NREAD function reads the contents of the text file. The two parameters used here are the tie number and a conversion parameter, which in this case tells ⎕NREAD to read the file's contents as text. You could also read the file as a series of raw bytes, UTF-8, integers, etc.
⎕NREAD takes a number of extra parameters to specify whereabouts in the file to start reading, the number of bytes to read, etc. If your APL allows these to be omitted, the defaults are to start at the beginning of the file and read it all.
Finally we close the file using ⎕NUNTIE.
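For comparison, the same open, read, close sequence can be sketched for Dyalog APL. This is a minimal sketch under assumptions not stated above: Dyalog's Unicode Edition, conversion code 80 (read each byte as one character), and an explicit byte count and start position rather than relying on defaults.

```apl
      ⍝ Assumes Dyalog APL (Unicode Edition); native tie numbers are negative
      'C:\simon\psalm46.txt' ⎕NTIE ¯1       ⍝ open the existing file on tie ¯1
      ⍝ Arguments: tie number, conversion code, number of bytes, start byte
      text←⎕NREAD ¯1 80 (⎕NSIZE ¯1) 0       ⍝ read the whole file as characters
      ⎕NUNTIE ¯1                            ⍝ close the file
```

Because code 80 maps bytes to characters one for one, this suits a plain ASCII file like the Psalm; a UTF-8 file containing non-ASCII characters would need decoding afterwards.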
A Note on Tie Numbers
In the example above we used a hard-coded ¯1 as the file tie number. This is OK for a simple example, but in a larger application the APL programmer often allocates the tie number dynamically. The system function ⎕NNUMS returns an integer vector of the tie numbers currently in use by open files, so one approach is to use the following expression:
      ⍝ For APLs which expect positive tie numbers
      ⍝ Returns an integer larger than any currently in use
      1+⌈/0,⎕NNUMS
      ⍝ For APLs which expect negative tie numbers
      ⍝ Returns an integer smaller than any currently in use
      ¯1+⌊/0,⎕NNUMS
However, Phil Last suggests one of the following because the expressions above can slip into floating point format when you have a long-running process which opens and closes multiple files:
If you are using Dyalog APL you can let the interpreter choose a free tie number for you by using 0:
      tie←'C:\simon\psalm46.txt' ⎕NTIE 0
Breaking the text up into words
Having read the text, we want to break it up into words so we can find the 46th word.
In most APLs you can use ⎕A to get the alphabet in uppercase, and ⎕a to get it in lowercase, so ⎕A,⎕a is equivalent to 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'. We can therefore find out which characters in the text are letters, and which are other characters like white space, punctuation, etc:
      t←text∊⎕A,⎕a
      20↑text
1: God is our refuge
      20↑t
0 0 0 1 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 1
We can use this information in a number of ways. First we can find out what the text is with all non-alphabetic characters removed:
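One way to do this, assuming t and text as defined above, is to use the Boolean vector t as a compression mask so that only the letters survive. The sketch shows just the first 20 characters of the result to keep the output short.

```apl
      ⍝ Keep only the characters of text where t is 1 (i.e. the letters)
      20↑t/text
Godisourrefugeandstr
```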
Secondly we can use it to partition the text into words - i.e. to create a nested vector where each entry is a separate word:
      t⊂text
 God is our refuge and strength a very present help in trouble ...etc
This is a little bit more obvious when displayed like this:
      words←t⊂text
      ⎕display 6↑words
┌→───────────────────────────────────────────┐
│ ┌→──┐ ┌→─┐ ┌→──┐ ┌→─────┐ ┌→──┐ ┌→───────┐ │
│ │God│ │is│ │our│ │refuge│ │and│ │strength│ │
│ └───┘ └──┘ └───┘ └──────┘ └───┘ └────────┘ │
└∊───────────────────────────────────────────┘
So now we're ready to test the theory that Shakespeare wrote Psalm 46:
      46⌷words
 shake
      46⌷⌽words
 in
Well, half right. The 46th word from the beginning is 'shake', but the 46th word from the end doesn't look right. Where does the word 'spear' appear, counting from the end?
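One way to answer this, assuming words holds the nested vector built above, index origin 1, and a file that matches the reference text at the end of this page, is to look up 'spear' in the reversed word list with dyadic ⍳ (index of):

```apl
      ⍝ Position of 'spear' counting from the end of the Psalm
      (⌽words)⍳⊂'spear'
47
```

So 'spear' is one place too far from the end: 46 words follow it.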
Looks like that extra word 'Selah' at the end might be the problem:
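A quick check of the last word, again assuming words as built above and index origin 1:

```apl
      ⍝ First word of the reversed list, i.e. the last word of the Psalm
      1⌷⌽words
 Selah
```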
However, it seems to occur several times in the Psalm. If we remove it, is 'shake' still the 46th word from the start?
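A sketch of the check, assuming words as built above and an APL2-style dialect such as APLX. Dyadic ~ (without) removes every occurrence of 'Selah'. The first 'Selah' comes at the end of verse 3, after 'shake', so the count from the start is unaffected, while the count from the end loses only the final 'Selah', the one occurrence that follows 'spear'.

```apl
      ⍝ Drop every occurrence of the word 'Selah'
      words←words~⊂'Selah'
      46⌷words
 shake
      46⌷⌽words
 spear
```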
There it is! Proof that Shakespeare wrote Psalm 46. I know I'm convinced.
Here's the text of Psalm 46 for reference:
1: God is our refuge and strength, a very present help in trouble.
2: Therefore will not we fear, though the earth be removed, and though the mountains be carried into the midst of the sea;
3: Though the waters thereof roar and be troubled, though the mountains shake with the swelling thereof. Selah.
4: There is a river, the streams whereof shall make glad the city of God, the holy place of the tabernacles of the most High.
5: God is in the midst of her; she shall not be moved: God shall help her, and that right early.
6: The heathen raged, the kingdoms were moved: he uttered his voice, the earth melted.
7: The LORD of hosts is with us; the God of Jacob is our refuge. Selah.
8: Come, behold the works of the LORD, what desolations he hath made in the earth.
9: He maketh wars to cease unto the end of the earth; he breaketh the bow, and cutteth the spear in sunder; he burneth the chariot in the fire.
10: Be still, and know that I am God: I will be exalted among the heathen, I will be exalted in the earth.
11: The LORD of hosts is with us; the God of Jacob is our refuge. Selah.