APL hardware: Difference between revisions

From APL Wiki

Latest revision as of 22:13, 10 September 2022

APL hardware is hardware that has been designed to natively support APL array operations. This runs counter to the popular understanding of APL as an interpreted language. Unlike x86, which is designed to operate on individual scalars one at a time, a native APL architecture would operate on entire arrays at a time, thereby increasing the speed of APL processing.

The APL Machine

Main article: APL Machine

The APL Machine was an actual (as opposed to theoretical) hardware implementation, created with the explicit purpose of making it easier to program an analog array processor.

Cellular APL Computer

A 1970 paper describes a possible general design for a computer which implements a dialect of APL as its machine language. The purpose of the design was to take advantage of the inherent parallelism in APL by being flexible enough to operate on entire arrays. The design is cellular, meaning that each component handles a separate part of the APL logic.[1]

Design

The specified design contains:

  • a matrix logic-in-memory unit (MLIM)
  • sixteen memory arrays (MA1, MA2..., MA16)
  • an instruction memory unit (IMU)
  • routing logic (RL)
  • thirty-two vector accumulators (VA1, VA2..., VA32)
  • input-output controllers (IOC)
  • a pre-processor (PP)

The MLIM is a 32×32 array of memory cells. Each memory cell contains four shift registers named A, B, C, and T, which is equivalent to four arrays of memory cells with one shift register each. The arrays formed by registers A, B, and C are used to store and operate on data, while T is temporary array storage for the result of an operation. Each MLIM operation either reads from A and B and stores the result in C, or reads from C and B and stores the result in A. The RL helps place each memory array in the correct location among A, B, and C so that the operands line up before the MLIM performs its operations.
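
The two data paths described above can be pictured with a small software model. The following is a minimal sketch in Python, not the paper's design: the class name `MLIM`, the method `apply`, and the use of plain integers as cell contents are assumptions made for illustration, and the temporary register T is left unused.

```python
import operator

SIZE = 32  # the MLIM is a 32×32 array of cells


class MLIM:
    """Toy model: four SIZE×SIZE planes standing in for registers A, B, C, T."""

    def __init__(self):
        self.planes = {
            name: [[0] * SIZE for _ in range(SIZE)] for name in "ABCT"
        }

    def apply(self, op, source):
        """Elementwise op: source 'A' computes C←A op B, source 'C' computes A←C op B."""
        if source not in ("A", "C"):
            raise ValueError("source must be 'A' or 'C'")
        target = "C" if source == "A" else "A"
        left, right = self.planes[source], self.planes["B"]
        self.planes[target] = [
            [op(x, y) for x, y in zip(left_row, right_row)]
            for left_row, right_row in zip(left, right)
        ]
        return target  # name of the plane that received the result
```

For example, after setting one cell of A to 5 and the matching cell of B to 7, `apply(operator.add, "A")` leaves 12 in the same cell of C, and a following `apply(operator.sub, "C")` writes 5 back into A.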

MA1 through MA16 are each 32×32 arrays of memory cells which can each store one word (16 bits). This means the total array storage of the computer is 16,384 words.
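
The storage figure follows directly from the dimensions given. As a quick arithmetic check (the variable names are illustrative only):

```python
MEMORY_ARRAYS = 16    # MA1 through MA16
ROWS, COLS = 32, 32   # each memory array is 32×32 cells
WORD_BITS = 16        # one 16-bit word per cell

total_words = MEMORY_ARRAYS * ROWS * COLS
total_bits = total_words * WORD_BITS

print(total_words)  # 16384
print(total_bits)   # 262144
```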

The IMU is a temporary location for instructions, to "give the programmer a usable memory of 16,384 words." Each cell is a 32-bit read-only memory cell. The RL is a specialized transfer system that can perform row- or column-wise transfer of data to the MLIM or VAs. It can also index into matrices and vectors during transfer.

The IOC is a generalized input/output system that can be modified for any purpose. Input from the IOC is fed through the PP before it is fed into the RL. The PP would handle storage allocation, basic operations, and other operations which the MLIM cannot perform.

VA1 through VA32 are each composed of two 32-bit registers, A and B. Register A of each accumulator is connected to the Routing and Control Logic board via a decoder. Each decoder is connected to its VA via a 32-bit bus, to the Routing and Control Logic board via a 32-bit output bus, and to the PP via a 5-bit input logic bus. If p is the value on the 5-bit bus, with 0 ≤ p ≤ 31, then the 32-bit bus shifts the bits in the output such that it returns 32−p, 32−p+1, …, 32. In effect, it shifts the input left by p bits, masking out the indices that are greater than or equal to 32. For this reason, this type of register is called a shift register. The decoders are considered a part of the RL cell. Register B is directly connected to register A, and has a direct vector routing bus connected to the Routing and Control Logic board.
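
The shift-and-mask behaviour of the decoder can be sketched as follows. This is an illustration under assumptions rather than the paper's circuit: register A is modelled as a 32-bit unsigned integer, bit positions are numbered 0 to 31, and the function name `decoder_output` is hypothetical.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1  # keeps only bit positions 0..31


def decoder_output(register_a: int, p: int) -> int:
    """Shift register A left by p positions, dropping bits pushed past position 31."""
    if not 0 <= p < WIDTH:
        raise ValueError("p must fit on the 5-bit bus (0..31)")
    return (register_a << p) & MASK
```

With this model, `decoder_output(0b1011, 2)` gives `0b101100`, and shifting `0x80000001` by one position gives `0x2`, because the top bit is masked out.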

An important feature is that a vector can be loaded right-justified into a VA and then read at an offset, provided its length is ≤ 32. Because of register B, each accumulator can perform reductions by repeatedly adding register A.
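
The reduction mechanism can be pictured as a running total. The sketch below assumes that register B holds the accumulated sum while successive operands arrive in register A; that division of roles, the function name, and the absence of any word-width overflow handling are assumptions for illustration.

```python
def reduce_with_accumulator(values):
    """Sum a sequence by repeatedly adding register A into register B."""
    register_b = 0  # running total
    for value in values:
        register_a = value          # next operand arrives in register A
        register_b += register_a    # B accumulates A
    return register_b
```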

Supported Operations

Values can be transferred using the ← operator. The MLIM natively supports the monadic operations: (A,B,C,T)←~(A,B,C,T) and the dyadic operations:

(C,A)←(A,C)+B
(C,A)←(A,C)-B
(C,A)←(A,C)×B
(C,A)←(A,C)÷B  ⍝ when B ≠ 0
(C,A)←(A,C)∨B
(C,A)←(A,C)∧B
(C,A)←(A,C)<B
(C,A)←(A,C)=B
(C,A)←(A,C)>B

The operations can be combined to create the following operations:

+M
×M
|M
?M
~M
!M
M+N
M-N
M×N
M÷N
M⌊N
M⌈N
M!N
M*N
M<N
M≤N
M=N
M≥N
M>N
M≠N
M∨N
M∧N
M⍱N
M⍲N
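
As an illustration of how such combinations might work, the sketch below builds a few of the listed functions from the native primitives only. The source does not give the paper's actual compositions, so these are one possible choice, written for scalar Python integers with Booleans represented as 0 and 1; all function names are hypothetical.

```python
# Native primitives, modelled on Python ints (Booleans are 0/1).
def not_(a):
    return 1 - a          # ~ (Boolean argument)


def or_(a, b):
    return a | b          # ∨


def and_(a, b):
    return a & b          # ∧


def lt(a, b):
    return int(a < b)     # <


def eq(a, b):
    return int(a == b)    # =


def gt(a, b):
    return int(a > b)     # >


# Derived functions, each built only from the primitives above plus + and ×.
def le(a, b):
    return or_(lt(a, b), eq(a, b))               # M≤N


def ne(a, b):
    return not_(eq(a, b))                        # M≠N


def nor(a, b):
    return not_(or_(a, b))                       # M⍱N


def nand(a, b):
    return not_(and_(a, b))                      # M⍲N


def minimum(a, b):
    return a * lt(a, b) + b * not_(lt(a, b))     # M⌊N


def maximum(a, b):
    return a * gt(a, b) + b * not_(gt(a, b))     # M⌈N
```

On the hardware each of these would apply elementwise across a whole array; the scalar form is used here only to keep the composition visible.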

The masking functionality of the RL combined with the native ability for Scan (reduce while reading the output in between each step) and Reduce (\ and / respectively) allows for Index Generator (monadic ⍳) to be defined, while its generalized indexing functionality allows Reverse and Transpose (monadic ⌽ and ⍉) to be defined. Shape and Ravel (monadic ⍴ and ,) can also be defined by using the RL and PP in parallel. Thus, a list of complex operators can be defined:

⍳N
,N
⍴N
⌽N
⍉N

M⍴N
M,N
M⌽N
M⍳N
M/N
M\N
M⍉N
M↑N
M↓N
M∊N
M∘.b N
M[N]
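
A few of these definitions can be sketched in software. The example below assumes index origin 1 and uses the well-known identity that Index Generator equals a plus-scan over a vector of ones; whether the paper builds it exactly this way is not stated in the text above. Reverse and Transpose are shown as pure index remapping, standing in for the RL's indexing during transfer. The function names are illustrative.

```python
from itertools import accumulate


def iota(n):
    """Index Generator as a plus-scan over n ones (index origin 1 assumed)."""
    return list(accumulate([1] * n))


def reverse(vector):
    """Reverse by reading the elements through descending indices."""
    return [vector[i] for i in range(len(vector) - 1, -1, -1)]


def transpose(matrix):
    """Transpose by swapping row and column indices while reading."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    return [[matrix[r][c] for r in range(rows)] for c in range(cols)]
```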

The paper does not outline floating-point arithmetic. Many functions may be missing because floating-point arithmetic has not been defined.

All Applications Digital Computer

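The All Applications Digital Computer (AADC) is described in a 1973 paper by Stanley M. Nissen and Steven J. Wallach (http://www.bitsavers.org/pdf/raytheon/military/aadc/The_All_Application_Digital_Computer_Nov73.pdf), which details a modular computer architecture that can process APL natively.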

References

  1. Thurber, Kenneth J. and Myrna, John W. System Design of a Cellular APL Computer. IEEE Transactions on Computers, volume C-19, issue 4. Institute of Electrical and Electronics Engineers. April 1970. https://ieeexplore.ieee.org/document/1671509