Dyalog APL
Revision as of 09:49, 22 November 2019
Dyalog APL, or simply Dyalog, is a modern APL in the APL2 tradition, first released by British company Dyadic Systems Ltd. (now Dyalog Ltd.) in 1983 for the Zilog Z8000 processor (the name Dyalog is a portmanteau of Dyadic and Zilog). Dyalog supports several platforms and interfaces with many languages and runtimes including native shared libraries, .NET, the JVM, R, and Python. It is actively developed and has introduced many new primitives and concepts to array programming. Major categories of features introduced to APL by Dyalog are tacit programming (first by allowing named derived functions, and later with trains), lexically-scoped functional programming using dfns, namespaces and object-oriented programming, and the addition of leading axis theory and the Rank operator to the nested array paradigm.
In 1995, two Dyalog developers—John Scholes and Peter Donnelly—were awarded the Iverson Award for their work on the interpreter. Gitte Christensen and Morten Kromberg were joint recipients of the Iverson Award in 2016.
Versions
- Main article: Dyalog APL versions
Dyalog lists historical versions, along with release notes since 14.0, on its website. Its early history is recounted in more detail by Pete Donnelly in Dyalog APL: A Personal History (pdf).
Number | Year | Month | Features |
---|---|---|---|
1 | 1983 | April | (Zilog S8000 only) |
2 | 1984 | | (Many more platforms) |
3.0 | 1985 | | (More platforms) Rectangular display of arrays |
4.0 | 1986 | October | User-defined operators, Assignment for functions (including derived functions), ⎕MONITOR |
5.0 | 1987 | April | Nested array editor |
5.1 | 1988 | April | (first version for DOS) User-defined input/output tables, ⎕SM and ⎕SR, windowed editor/tracer, interface to GSS/CGI |
5.2 | 1990 | January | Naked trace |
6.0 | 1990 | April | GUI IDE |
6.1 | 1990 | October | ⎕ED |
6.2.1 | 1992 | July | (first version for Windows) ⎕WC, ⎕DQ, etc. |
6.3.1 | 1993 | April | ⎕NA, graphical, clipboard and printer objects |
7.0.1 | 1994 | August | Namespaces, additional GUI objects |
7.1 | 1995 | May | ⎕CS, GUI objects as namespaces, greater APL2 compatibility |
8.0 | 1996 | May | Keywords (:If/:Else, :Repeat/:Until, :Trap, and so on), ⎕PATH, additional GUI objects, OLE |
8.1 | 1997 | March | dfns with lexical scope, syntax colouring, TCPSocket object, OLE client/server, automatic file tie numbers |
8.2 | 1999 | January | Windowed Reduction and scalar functions with axis (from APL2), Threading with Spawn (&), ActiveX, :With, additional GUI objects |
9.0 | 2000 | September | Namespace references (instead of string names) and dot syntax, context-sensitive help (F1), additional GUI objects with animation |
9.0.1 | 2001 | January | (Windows CE) Pocket APL |
9.0.2 | 2002 | January | .NET support |
9.5 | 2002 | September | |
10.0 | 2003 | March | ⎕NULL, ⎕MAP (mapped files), idiom recognition, retained hash tables, .NET support built-in, run-time workspace as .exe, auto-completion |
10.1 | 2004 | July | Multiple arguments in tradfn headers, thread tokens, 64-bit component files, value tips |
11.0 | 2006 | October | Object-oriented programming (classes, objects, interfaces) modelled after C#, Index (⌷), Power operator (⍣), GCD (∨), LCM (∧) |
12.0 | 2008 | August | Unicode support (⎕AVU, ⎕UCS), ⎕FCOPY, ⎕FPROPS |
12.1 | 2009 | November | I-Beam (⌶), Table (⍪), ⎕XML, ⎕FCHK, User commands |
13.0 | 2011 | April | Left (⊣), Right (⊢), Variant (⍠), ⎕OPT, ⎕R, ⎕S, ⎕PROFILE, ⎕RSI, complex number and decimal float support, short arguments for Take, Drop, and Index (↑, ↓, ⌷) |
13.1 | 2012 | April | ⎕DMX, ⎕FHIST |
13.2 | 2013 | January | Array Editor |
14.0 | 2014 | June | Trains, Tally (≢), Key (⌸), Rank operator (⍤), multi-threading with futures and isolates |
14.1 | 2015 | June | :Disposable, .NET objects and resources, gesture support, many new I-beams |
15.0 | 2016 | June | ⎕MKDIR, ⎕NDELETE, ⎕NEXISTS, ⎕NGET, ⎕NINFO, ⎕NPARTS, ⎕NPUT |
16.0 | 2017 | June | At (@), Interval Index (⍸), Where (⍸), Nest (⊆), Partition (⊆), Stencil (⌺), ⎕JSON, ⎕CSV |
17.0 | 2018 | July | ⎕NCOPY, ⎕NMOVE |
17.1 | 2019 | October | Duplicates in Interval Index (⍸) look-up array |
18.0 | Unreleased | | Atop (⍤), Over (⍥), Constant (⍨), Unique Mask (≠), duplicates from Where (⍸), empty partitions from Partitioned Enclose (⊂), multi-line session input, date-time conversion, case folding/mapping (⎕C) |
Primitives
Functions
Glyph | Monadic | Dyadic |
---|---|---|
+ | Conjugate | Plus |
- | Negate | Minus |
× | Signum | Times |
÷ | Reciprocal | Divide |
\| | Magnitude | Residue |
⌊ | Floor | Minimum |
⌈ | Ceiling | Maximum |
* | Exponential | Power |
⍟ | Natural Logarithm | Logarithm |
! | Factorial | Binomial |
○ | Pi Times | Circular |
~ | Not | Without |
? | Roll | Deal |
∧ | And | |
∨ | Or | |
⍲ | Nand | |
⍱ | Nor | |
< | Less | |
≤ | Less Or Equal | |
= | Equal | |
≥ | Greater Or Equal | |
> | Greater | |
≠ | Unique Mask | Not Equal |
⍴ | Shape | Reshape |
, | Ravel | Catenate |
⍪ | Table | Catenate First |
⌽ | Reverse | Rotate |
⊖ | Reverse First | Rotate First |
⍉ | Transpose | |
↑ | Mix/Disclose | Take |
↓ | Split | Drop |
⊂ | Enclose | Partitioned Enclose |
⊆ | Nest | Partition |
∊ | Enlist/Type | Membership |
⊃ | Disclose/Mix | Pick |
/ | Replicate | |
⌿ | Replicate First | |
\ | Expand | |
⍀ | Expand First | |
∩ | Intersection | |
∪ | Unique | Union |
⊣ | Same | Left |
⊢ | Same | Right |
⍳ | Index Generator | Index Of |
⍸ | Where | Interval Index |
⍒ | Grade Down | |
⍋ | Grade Up | |
⍷ | Find | |
≡ | Depth | Match |
≢ | Tally | Not Match |
⍎ | Execute | |
⍕ | Format | |
⊥ | Base | |
⊤ | Represent | |
⌹ | Matrix Inverse | Matrix Divide |
⌷ | Materialise | Squad Indexing |
Operators
Syntax | Monadic call | Dyadic call |
---|---|---|
f/ | Reduction | Windowed Reduction |
f⌿ | Reduction First | Windowed Reduction First |
f\ | Scan | |
f⍀ | Scan First | |
f¨ | Each | |
f⍨ | Commute | |
A⍨ | Constant | |
f⍣v | Power | |
f.g | Inner Product | |
∘.f | Outer Product | |
A∘g | Bind | |
f∘B | Bind | |
f∘g | Beside | |
f⍤B | Rank | |
f⍤g | Atop | |
f⍥g | Over | |
f@v | At | |
f⍠B | Variant | |
f⌸ | Key | |
f⌺B | Stencil | |
A⌶ | I-Beam | |
f& | Spawn | |
f[B] | Axis | |
Implementation
Internal types
Dyalog uses the following numeric types:
- 1-bit packed Boolean
- 1-byte integer
- 2-byte integer
- 4-byte integer
- 8-byte double
- 16-byte complex (one double for each component)
- 16-byte decimal float (BID or DPD)
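To illustrate how a list of numbers might map onto these types, here is a minimal Python sketch that picks the narrowest listed type able to hold every value. The function name `narrowest_type`, the signed two's-complement integer ranges, and the selection order are assumptions made for illustration; the list above gives only the widths and does not describe how the interpreter actually chooses a type. Decimal floats are left out.

```python
# Candidate integer types, narrowest first.
# The signed ranges are an assumption; the source lists only the widths.
INT_TYPES = [
    ("1-bit Boolean", 0, 1),
    ("1-byte integer", -2**7, 2**7 - 1),
    ("2-byte integer", -2**15, 2**15 - 1),
    ("4-byte integer", -2**31, 2**31 - 1),
]


def narrowest_type(values):
    """Return the name of the narrowest listed type that holds every value."""
    values = list(values)
    if not values:
        return INT_TYPES[0][0]
    if any(isinstance(v, complex) for v in values):
        return "16-byte complex"
    # Whole numbers may fit an integer type; anything else needs a double.
    if all(isinstance(v, int) or float(v).is_integer() for v in values):
        lo, hi = min(values), max(values)
        for name, tmin, tmax in INT_TYPES:
            if tmin <= lo and hi <= tmax:
                return name
    return "8-byte double"


print(narrowest_type([0, 1, 1]))   # fits in packed Booleans
print(narrowest_type([0, 300]))    # needs two bytes per element
print(narrowest_type([2**31]))     # too large for a 4-byte integer
```

Under these assumptions a Boolean array uses one eighth of the space of a 1-byte integer array, which is why the packed representation matters for the Boolean-specific optimisations described later.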
Character encodings differ between Classic and Unicode interpreters: Classic interpreters use a custom 1-byte encoding for all characters and are limited to a 256-character set, while Unicode interpreters store characters as 1-, 2-, or 4-byte unsigned code point values.
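As a rough illustration of the Unicode case, the following Python sketch computes the smallest of the three widths that can hold every code point in a string. The function name `char_width` is hypothetical, and the thresholds simply follow from treating the widths as unsigned; how the interpreter selects a width is not stated here.

```python
def char_width(text):
    """Bytes per character needed to store every code point in `text` unsigned."""
    top = max(map(ord, text), default=0)
    if top < 2**8:
        return 1
    if top < 2**16:
        return 2
    return 4


print(char_width("abc"))         # ASCII fits in one byte per character
print(char_width("⍳⍴"))          # APL glyphs lie above U+00FF
print(char_width("\U0001F600"))  # code points above U+FFFF need four bytes
```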
Instruction set usage
Dyalog makes heavy use of vector instructions on all platforms, as well as other special instruction sets primarily on x86. Instruction set availability is checked at runtime, so that the minimum required instruction set remains low:
- For 32-bit x86, only SSE2 is required.
- For x86_64, there is no additional minimum requirement, as every x86_64 processor supports SSE2. On macOS, SSE4.1 is required, as all x86 Apple machines support this instruction set.
- For ARM32, there is no minimum requirement.
- As of version 17.1, POWER7 and above are supported. Support for older systems is dropped because Dyalog compiles separate binaries for each POWER architecture.
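The runtime check described above can be sketched as a small dispatch table. This is only an illustration in Python: the names `PREFERENCE`, `select_kernel`, and `maximum_baseline`, as well as the feature strings, are hypothetical, and the set of available features is passed in as a parameter rather than detected from the CPU.

```python
# Hypothetical feature names, ordered from most to least capable.
PREFERENCE = ("avx2", "sse4.1", "sse2")


def maximum_baseline(xs, ys):
    """Portable element-wise maximum; stands in for code needing no extension."""
    return [x if x > y else y for x, y in zip(xs, ys)]


def select_kernel(available, kernels, baseline):
    """Pick the most capable kernel whose instruction set is available."""
    for feature in PREFERENCE:
        if feature in available and feature in kernels:
            return feature, kernels[feature]
    return "baseline", baseline


# Stand-ins: real kernels would compute the same result, only faster.
KERNELS = {"avx2": maximum_baseline, "sse4.1": maximum_baseline}

name, kernel = select_kernel({"sse2", "sse4.1"}, KERNELS, maximum_baseline)
print(name, kernel([1, 5, 2], [3, 4, 2]))
```

Because the choice is made when the program runs, one binary can serve both old and new processors, which is what keeps the minimum requirement low.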
In Dyalog 17.0, the code for vectorised scalar functions was unified and extended to allow Intel AVX2 and ARM NEON in addition to Intel SSE2 and SSE4.1, and AltiVec VMX for IBM POWER. This code is also used for operations involving the scalar dyadics Plus, Minus, Times, Divide, Maximum, Minimum, and comparison functions, as well as some functions derived from operators applied to these functions, such as the Outer Product and Inner Product.
Dyalog also uses many other x86 extensions:
- Since at least 12.1, SSE2 is used for scalar dyadics.
- Since 17.0, AVX2 is used for scalar dyadics if available.
- Since 14.1, SSE4.1 is used for Minimum and Maximum, and finding the range of an array. AVX2 can also be used for these purposes in 18.0.
- Since 17.0, SSSE3 is used primarily for the shuffle instruction for permuting arrays and searching small lookup tables.
- Since 14.0, SSE4.2 POPCNT is used to sum Boolean arrays.
- Since 14.0, SSE4.2 CRC32 is used to compute fast hash functions.
- Since 15.0, BMI2 is used for Boolean matrix transpose. Since 16.0, it is used for Boolean Compress and Expand, and several structural functions on Boolean arrays.
- Since 18.0, CLMUL is used for xor reductions and scans.
- Since 18.0, FMA3 is used to implement division by a singleton.
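Several of these extensions operate on packed Boolean data. The Python sketch below models, in plain software, what three of the instructions compute: a population count (for summing Booleans), a parallel bit extract as provided by BMI2 (for Boolean Compress), and a carry-less multiplication by an all-ones word (which yields an xor scan). The helper names and the bit order (element i stored in bit i, least significant first) are assumptions for illustration, not a description of Dyalog's code or memory layout.

```python
def pack(bits):
    """Pack a list of 0/1 values into an integer, element i in bit i."""
    return sum(b << i for i, b in enumerate(bits))


def unpack(word, n):
    """Unpack the low n bits of an integer into a list of 0/1 values."""
    return [(word >> i) & 1 for i in range(n)]


def popcount(word):
    """Number of set bits: the sum of the packed Booleans (models POPCNT)."""
    return bin(word).count("1")


def pext(src, mask):
    """Gather the bits of src selected by mask into the low bits (models PEXT)."""
    out, k = 0, 0
    while mask:
        low = mask & -mask          # lowest set bit of the mask
        if src & low:
            out |= 1 << k
        k += 1
        mask &= mask - 1            # clear that bit
    return out


def clmul(a, b):
    """Carry-less (xor-based) multiplication, as in the CLMUL extension."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def xor_scan(bits):
    """Running xor of a Boolean list via one carry-less multiply by all ones."""
    n = len(bits)
    ones = (1 << n) - 1
    return unpack(clmul(pack(bits), ones) & ones, n)


data = [1, 0, 1, 1, 0, 0, 1, 0]
mask = [1, 0, 1, 1, 0, 0, 1, 0]
print(popcount(pack(data)))                                     # sum of data
print(unpack(pext(pack(data), pack(mask)), popcount(pack(mask))))  # mask/data
print(xor_scan(data))                                           # ≠\data
```

In each case one instruction handles a whole machine word of Booleans at once, whereas the loops here process a bit at a time.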
It also uses the POWER8 gather-bits-by-bytes instruction, which is equivalent to transposing an 8×8 bit matrix, for Boolean Transpose since version 15.0 (with expanded applicability in 16.0). In 18.0 it uses the POWER fused multiply-add instruction for division, in the same way as x86 FMA3.
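The 8×8 bit-matrix transpose mentioned above can be modelled in a few lines of Python. This is a plain software sketch of the operation's effect, not of the hardware instruction or of Dyalog's code; the function name `transpose8x8` and the choice of most-significant-bit-first ordering within each byte are assumptions.

```python
def transpose8x8(rows):
    """Transpose an 8x8 bit matrix given as 8 bytes, one byte per row.

    Element [r][c] is bit (7 - c) of rows[r], that is, most significant bit first.
    """
    out = []
    for c in range(8):
        byte = 0
        for r in range(8):
            bit = (rows[r] >> (7 - c)) & 1
            byte |= bit << (7 - r)   # bit from row r lands in column r
        out.append(byte)
    return out


# A matrix whose first row is all ones becomes one whose first column is all ones.
print([hex(b) for b in transpose8x8([0xFF, 0, 0, 0, 0, 0, 0, 0])])
```

A larger Boolean matrix can be transposed by applying this step to each 8×8 tile and writing the tiles back in transposed positions, which is one way such an instruction becomes useful for Boolean Transpose.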