Dyalog APL: Difference between revisions

2,437 bytes added, 15:46, 21 November 2019
→‎Implementation: Instruction sets
(Internal types)
(→‎Implementation: Instruction sets)


Character encodings differ between classic and Unicode interpreters: classic interpreters use a custom 1-byte encoding for all characters and are limited to a 256-character set, while Unicode interpreters store characters as 1-, 2-, or 4-byte unsigned [[wikipedia:code point|code point]] values.
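The choice between the three Unicode element widths can be illustrated with a minimal sketch. It assumes that an array is stored at the narrowest width able to hold its largest code point; that selection rule and the name <code>char_width</code> are illustrative assumptions, not a description of Dyalog's source code.

```python
def char_width(text: str) -> int:
    """Return the smallest of 1, 2, or 4 bytes able to hold every code point in text.

    Assumption for illustration: the narrowest sufficient width is chosen.
    """
    top = max((ord(c) for c in text), default=0)
    if top < 2 ** 8:
        return 1
    if top < 2 ** 16:
        return 2
    return 4
```

Under this assumption, plain ASCII text needs 1 byte per character, APL glyphs such as <code>⍳</code> (U+2373) need 2, and characters outside the Basic Multilingual Plane need 4.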
=== Instruction set usage ===
Dyalog makes heavy use of [[vector instructions]] on all platforms, as well as other special instruction sets primarily on x86. Instruction set availability is checked at runtime, so that the minimum required instruction set remains low:
* For 32-bit x86, only [[wikipedia:SSE2|SSE2]] is required.
* For x86_64, there is no additional requirement, as every x86_64 processor supports SSE2. On macOS, SSE4.1 is required, since all x86 Apple machines support this instruction set.
* For ARM32, there is no minimum requirement.
* As of version 17.1, POWER7 and above are supported. Support for older systems is dropped because Dyalog compiles separate binaries for each POWER architecture.
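The runtime check described above follows a common dispatch pattern: detect the available features once, then pick the most capable kernel. The sketch below shows only the pattern. The kernel functions, the feature names, and <code>select_kernel</code> are hypothetical stand-ins; a real interpreter would query the processor (for example with CPUID on x86) and call compiled SIMD code.

```python
from typing import Callable, List, Sequence, Set, Tuple

Kernel = Callable[[Sequence[int], Sequence[int]], List[int]]


def plus_generic(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Fallback that needs no special instruction set."""
    return [x + y for x, y in zip(a, b)]


def make_chunked(width: int) -> Kernel:
    """Stand-in for a SIMD kernel that handles `width` elements per step."""
    def kernel(a: Sequence[int], b: Sequence[int]) -> List[int]:
        out: List[int] = []
        for i in range(0, len(a), width):
            out.extend(x + y for x, y in zip(a[i:i + width], b[i:i + width]))
        return out
    return kernel


# Ordered from most to least capable (hypothetical feature names).
KERNELS: List[Tuple[str, Kernel]] = [
    ("avx2", make_chunked(8)),
    ("sse4.1", make_chunked(4)),
    ("sse2", make_chunked(4)),
]


def select_kernel(features: Set[str]) -> Tuple[str, Kernel]:
    """Pick the best kernel whose required feature is present at runtime."""
    for name, kernel in KERNELS:
        if name in features:
            return name, kernel
    return "generic", plus_generic
```

Because every kernel computes the same result, the binary only has to require the lowest feature level, and faster paths are used when the machine offers them.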
In Dyalog 17.0, the code for vectorised [[scalar function]]s was unified and extended to allow Intel [[wikipedia:AVX2|AVX2]] and ARM NEON in addition to Intel [[wikipedia:SSE2|SSE2]] and [[wikipedia:SSE4.1|SSE4.1]], and AltiVec VMX for IBM POWER. This code is also used for operations involving the scalar dyadics [[Plus]], [[Minus]], [[Times]], [[Divide]], [[Maximum]], [[Minimum]], and [[comparison function]]s, as well as some functions derived from operators applied to these functions, such as the [[Outer Product]] and [[Inner Product]].
Dyalog also uses many other x86 extensions. In version 18.0:
* [[wikipedia:SSE2|SSE2]], [[wikipedia:SSE4.1|SSE4.1]], and [[wikipedia:AVX2|AVX2]] are used for [[scalar dyadic]]s.
* [[wikipedia:SSSE3|SSSE3]] is used primarily for the shuffle instruction for permuting arrays and searching small lookup tables.
* [[wikipedia:SSE4.2|SSE4.2]] POPCNT is used to sum Boolean arrays.
* [[wikipedia:SSE4.2|SSE4.2]] CRC32 is used to compute fast hash functions.
* [[wikipedia:BMI2|BMI2]] is used for Boolean [[Compress]] and [[Expand]], and several [[structural function]]s on Boolean arrays.
* [[wikipedia:CLMUL instruction set|CLMUL]] is used for [[xor]] [[reduction]]s and [[scan]]s (new in 18.0).
* [[wikipedia:FMA instruction set|FMA3]] is used to implement [[Divide|division]] by a [[singleton]] (new in 18.0).
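To show why some of these instructions suit Boolean arrays, the following sketch emulates three of them on Python integers: POPCNT for a Boolean sum, BMI2 PEXT for Boolean Compress, and a carry-less multiply for an xor scan. It assumes the Boolean array is packed with element <code>i</code> in bit <code>i</code> of a word; that bit order and all function names are illustrative assumptions and do not reflect Dyalog's internal layout or code.

```python
from typing import List


def pack(bits: List[int]) -> int:
    """Pack a Boolean list into an integer, element i in bit i (assumed order)."""
    word = 0
    for i, b in enumerate(bits):
        word |= (b & 1) << i
    return word


def unpack(word: int, n: int) -> List[int]:
    """Unpack the low n bits of word into a Boolean list."""
    return [(word >> i) & 1 for i in range(n)]


def popcount(word: int) -> int:
    """Emulates POPCNT: the number of set bits, i.e. the sum of a Boolean array."""
    return bin(word).count("1")


def pext(src: int, mask: int) -> int:
    """Emulates PEXT: gather the bits of src selected by mask into the low bits."""
    out = 0
    k = 0
    while mask:
        low = mask & -mask          # lowest set bit of the mask
        if src & low:
            out |= 1 << k
        k += 1
        mask &= mask - 1            # clear that bit
    return out


def clmul(a: int, b: int) -> int:
    """Emulates a carry-less multiply: shifted copies are combined with xor."""
    out = 0
    shift = 0
    while b:
        if b & 1:
            out ^= a << shift
        b >>= 1
        shift += 1
    return out


def xor_scan(word: int, n: int) -> int:
    """Prefix xor of n packed bits: carry-less multiply by n one-bits, truncated."""
    ones = (1 << n) - 1
    return clmul(word, ones) & ones
```

For example, with <code>data = [1, 0, 1, 1, 0, 1, 0, 0]</code> and <code>mask = [1, 1, 0, 1, 0, 1, 1, 0]</code>, <code>popcount(pack(data))</code> is 4, <code>unpack(pext(pack(data), pack(mask)), 5)</code> gives <code>[1, 0, 1, 1, 0]</code> (the result of <code>mask/data</code>), and <code>unpack(xor_scan(pack(data), 8), 8)</code> gives <code>[1, 1, 0, 1, 1, 0, 0, 0]</code>. In each case a single instruction processes a whole word of Boolean elements.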
On POWER, Dyalog also uses the POWER8 [https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/compiler_ref/vec_gbb.html gather-bits-by-bytes] instruction, which is equivalent to transposing an 8x8 bit matrix, for [[Boolean]] [[Transpose]] since version 15.0 (with expanded applicability in 16.0). In 18.0 it also uses the fused multiply-add instruction for division, as with FMA3 on x86.
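The 8x8 bit-matrix transpose mentioned above can be written out bit by bit to show what a single hardware instruction replaces. This is a plain reference sketch, assuming each row is one byte with element <code>(i, j)</code> in bit <code>j</code> of row <code>i</code>; the name <code>transpose8x8</code> and the bit order are illustrative assumptions.

```python
from typing import List


def transpose8x8(rows: List[int]) -> List[int]:
    """Transpose an 8x8 bit matrix given as 8 bytes.

    Bit j of rows[i] is element (i, j); bit i of the result's entry j is the
    same element, so the result is the transposed matrix.
    """
    cols = [0] * 8
    for i, row in enumerate(rows):
        for j in range(8):
            cols[j] |= ((row >> j) & 1) << i
    return cols
```

A matrix whose first column is all ones, <code>[1] * 8</code>, becomes a matrix whose first row is all ones, <code>[255, 0, 0, 0, 0, 0, 0, 0]</code>. The loop performs 64 single-bit moves, which is the work the hardware instruction does in one step.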


== External links ==
