Character encodings differ for classic and Unicode interpreters: classic interpreters use a custom 1-byte encoding for all characters and are therefore limited to a 256-character set, while Unicode interpreters store characters as 1-, 2-, or 4-byte unsigned [[wikipedia:code point|code point]] values.
=== Instruction set usage ===
Dyalog makes heavy use of [[vector instructions]] on all platforms, as well as other special-purpose instruction sets, primarily on x86. Instruction set availability is checked at runtime, which keeps the minimum required instruction set low:
* For 32-bit x86, only [[wikipedia:SSE2|SSE2]] is required.
* For x86_64, there is no minimum requirement, as every x86_64 processor supports SSE2. On macOS, SSE4.1 is required, since all x86 Apple machines support it.
* For ARM32, there is no minimum requirement.
* As of version 17.1, POWER7 and above are supported. Support for older systems was dropped because Dyalog compiles separate binaries for each POWER architecture.
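Runtime checking of this kind is commonly implemented by querying the CPU once and then dispatching through a function pointer. The following is a minimal sketch in C, not Dyalog's code: it assumes GCC or Clang on x86, where <code>__builtin_cpu_supports</code> and the <code>target</code> function attribute are compiler extensions, and the function names are hypothetical. On other platforms it simply falls back to the baseline loop.

```c
#include <stddef.h>

/* Baseline version: a plain loop that the compiler can map to SSE2 on x86_64. */
static void add_baseline(const double *x, const double *y, double *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = x[i] + y[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop, but the compiler may use AVX2 for this function only.
   It must not be called unless the CPU reports AVX2 support. */
__attribute__((target("avx2")))
static void add_avx2(const double *x, const double *y, double *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = x[i] + y[i];
}
#endif

typedef void (*add_fn)(const double *, const double *, double *, size_t);

/* Choose an implementation at runtime from what the CPU reports. */
static add_fn select_add(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return add_avx2;
#endif
    return add_baseline;
}
```

Because the check is made at runtime rather than at compile time, one binary can require only the baseline instruction set while still using newer extensions on machines that have them. A caller would do <code>select_add()(x, y, out, n)</code>, or cache the returned pointer.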
In Dyalog 17.0, the code for vectorised [[scalar function]]s was unified and extended to support Intel [[wikipedia:AVX2|AVX2]] and ARM NEON, in addition to Intel [[wikipedia:SSE2|SSE2]] and [[wikipedia:SSE4.1|SSE4.1]] and AltiVec VMX for IBM POWER. This code is also used for operations involving the scalar dyadics [[Plus]], [[Minus]], [[Times]], [[Divide]], [[Maximum]], [[Minimum]], and the [[comparison function]]s, as well as for some functions derived by applying operators to these functions, such as [[Outer Product]] and [[Inner Product]].
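One way to write a single vectorised loop that targets several instruction sets is a compiler's generic vector type. The sketch below uses the GCC/Clang <code>vector_size</code> extension for a vectorised [[Plus]] on 64-bit floats. It is an illustration of the idea only, not Dyalog's implementation, and the function name is hypothetical. The compiler lowers the vector <code>+</code> to SSE2 on x86_64 or NEON on AArch64, and otherwise to whatever the target offers, possibly scalar code.

```c
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: two doubles in one 16-byte vector. */
typedef double v2d __attribute__((vector_size(16)));

/* Element-wise x + y: two elements per vector step, then a scalar tail. */
static void plus_f64(const double *x, const double *y, double *out, size_t n) {
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        v2d a, b, c;
        memcpy(&a, x + i, sizeof a);   /* load without alignment assumptions */
        memcpy(&b, y + i, sizeof b);
        c = a + b;                     /* a single vector addition */
        memcpy(out + i, &c, sizeof c); /* store without alignment assumptions */
    }
    for (; i < n; i++) out[i] = x[i] + y[i]; /* leftover element, if any */
}
```

Widening the vector type (for example to 32 bytes for AVX2) changes only the typedef and the step size, which is the kind of sharing between instruction sets that unified code allows.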
In version 18.0, Dyalog uses the following x86 instruction set extensions:
* [[wikipedia:SSE2|SSE2]], [[wikipedia:SSE4.1|SSE4.1]], and [[wikipedia:AVX2|AVX2]] are used for [[scalar dyadic]]s.
* [[wikipedia:SSSE3|SSSE3]] is used primarily for its shuffle instruction, to permute arrays and to search small lookup tables.
* [[wikipedia:SSE4.2|SSE4.2]] POPCNT is used to sum Boolean arrays.
* [[wikipedia:SSE4.2|SSE4.2]] CRC32 is used to compute fast hash functions.
* [[wikipedia:BMI2|BMI2]] is used for Boolean [[Compress]] and [[Expand]], and for several [[structural function]]s on Boolean arrays.
* [[wikipedia:CLMUL instruction set|CLMUL]] is used for [[xor]] [[reduction]]s and [[scan]]s (new in 18.0).
* [[wikipedia:FMA instruction set|FMA3]] is used to implement [[Divide|division]] by a [[singleton]] (new in 18.0).
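The SSSE3 shuffle instruction (PSHUFB) performs 16 byte-table lookups at once. The following portable C function models its semantics one byte at a time; it is a sketch for illustration, with a hypothetical name, and is not how Dyalog calls the instruction.

```c
#include <stdint.h>

/* Scalar model of SSSE3 PSHUFB on 16 bytes: each output byte is
   table[idx & 0x0F], or 0 if the index byte has its high bit set. */
static void pshufb_model(const uint8_t table[16], const uint8_t idx[16], uint8_t out[16]) {
    for (int i = 0; i < 16; i++)
        out[i] = (idx[i] & 0x80) ? 0 : table[idx[i] & 0x0F];
}
```

With a 16-entry table (for example, the number of set bits in each 4-bit value) this is a small lookup table; passing data as <code>table</code> and a permutation as <code>idx</code> instead permutes 16 bytes.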
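Summing a bit-packed Boolean array amounts to counting set bits, 64 at a time. A minimal sketch, assuming bits are packed into 64-bit words with unused trailing bits possibly nonzero; <code>__builtin_popcountll</code> is a GCC/Clang builtin that compiles to POPCNT when the target enables it (e.g. with <code>-mpopcnt</code>) and to a software routine otherwise. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum (count the 1s of) a Boolean array of nbits bits packed into 64-bit words. */
static uint64_t sum_bool(const uint64_t *words, size_t nbits) {
    size_t full = nbits / 64, rest = nbits % 64;
    uint64_t total = 0;
    for (size_t i = 0; i < full; i++)
        total += (uint64_t)__builtin_popcountll(words[i]);
    if (rest) /* mask off padding bits beyond nbits in the last word */
        total += (uint64_t)__builtin_popcountll(words[full] & ((UINT64_C(1) << rest) - 1));
    return total;
}
```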
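The SSE4.2 CRC32 instruction computes CRC-32C (the Castagnoli polynomial), updating the checksum with 1 to 8 bytes per instruction. The bitwise C version below computes the same checksum with the conventional initial and final inversion; how Dyalog turns this into a hash function is not described here, so this only shows what the instruction calculates.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C with reflected polynomial 0x82F63B78, the polynomial used
   by the SSE4.2 CRC32 instruction. Pass crc = 0 to start; pass a previous
   result to continue the checksum over more data. */
static uint32_t crc32c(uint32_t crc, const void *data, size_t len) {
    const uint8_t *p = data;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++) /* xor in the polynomial when the low bit is 1 */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

The standard check value is <code>crc32c(0, "123456789", 9) == 0xE3069283</code>. In hardware, the inner loop over bits disappears, which is what makes CRC-based hashing cheap.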
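BMI2's PEXT instruction gathers the bits of one word selected by a mask and packs them at the low end, which is exactly Boolean [[Compress]] on a 64-bit chunk (its counterpart PDEP corresponds to [[Expand]]). The loop below is a portable model of PEXT with a hypothetical name; it assumes bit 0 holds the first element, and it is not Dyalog's code.

```c
#include <stdint.h>

/* Scalar model of BMI2 PEXT: for each set bit of mask (low to high), append
   the corresponding bit of src to the result. The result has
   popcount(mask) meaningful bits, like mask/src in APL. */
static uint64_t pext_model(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    int k = 0;
    for (int i = 0; i < 64; i++) {
        if ((mask >> i) & 1) {
            out |= ((src >> i) & 1) << k;
            k++;
        }
    }
    return out;
}
```

For example, <code>pext_model(0xB6, 0xF0)</code> keeps bits 4 to 7 of <code>0xB6</code> and gives <code>0xB</code>; the hardware instruction produces the same result in a single operation.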
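An xor scan of a Boolean vector is a prefix xor, and bit i of the carry-less product of a word with an all-ones word is the xor of bits 0 through i, which is why CLMUL applies. The sketch below computes the same prefix xor portably with shifts rather than CLMUL, assuming bit 0 holds the first element; names are hypothetical and this is not Dyalog's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Prefix xor within one word: bit i of the result is the xor of bits 0..i.
   Equal to the low 64 bits of the carry-less product of x and ~0. */
static uint64_t xor_scan64(uint64_t x) {
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

/* Xor reduction of one word: the parity, i.e. the last bit of the scan. */
static int xor_reduce64(uint64_t x) {
    return (int)(xor_scan64(x) >> 63);
}

/* Xor scan over several words: complement each word's scan when the
   running xor of all earlier bits is 1. */
static void xor_scan(const uint64_t *in, uint64_t *out, size_t nwords) {
    uint64_t carry = 0; /* all ones if the running xor so far is 1 */
    for (size_t i = 0; i < nwords; i++) {
        out[i] = xor_scan64(in[i]) ^ carry;
        carry = (uint64_t)0 - (out[i] >> 63);
    }
}
```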
On POWER, Dyalog uses the POWER8 [https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/compiler_ref/vec_gbb.html gather-bits-by-bytes] instruction, which is equivalent to transposing an 8x8 bit matrix, for [[Boolean]] [[Transpose]]; this has been the case since version 15.0, and its applicability was expanded in 16.0. Since version 18.0, it also uses the POWER fused multiply-add instruction for division, in the same way as x86 FMA3.
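Without a gather-bits-by-bytes instruction, an 8x8 bit transpose can be done with three rounds of masks and shifts that swap off-diagonal 1x1, 2x2, and then 4x4 blocks. The following C sketch uses this well-known technique; it assumes row i is byte i and column j is bit j (least significant first), which may differ from POWER's big-endian bit numbering, and the function name is hypothetical.

```c
#include <stdint.h>

/* Transpose an 8x8 bit matrix packed into 64 bits; element (i, j) is bit 8*i + j. */
static uint64_t transpose8x8(uint64_t x) {
    /* swap single bits across the diagonal of each 2x2 block */
    x = (x & 0xAA55AA55AA55AA55ULL)
      | ((x & 0x00AA00AA00AA00AAULL) << 7)
      | ((x >> 7) & 0x00AA00AA00AA00AAULL);
    /* swap 2x2 blocks across the diagonal of each 4x4 block */
    x = (x & 0xCCCC3333CCCC3333ULL)
      | ((x & 0x0000CCCC0000CCCCULL) << 14)
      | ((x >> 14) & 0x0000CCCC0000CCCCULL);
    /* swap the two off-diagonal 4x4 blocks */
    x = (x & 0xF0F0F0F00F0F0F0FULL)
      | ((x & 0x00000000F0F0F0F0ULL) << 28)
      | ((x >> 28) & 0x00000000F0F0F0F0ULL);
    return x;
}
```

For instance, a matrix whose first row is all ones (<code>0xFF</code>) becomes one whose first column is all ones (<code>0x0101010101010101</code>). A larger Boolean transpose can be built from such 8x8 blocks.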
== External links ==