Dyalog APL: Difference between revisions

Dyalog APL (view source)

Revision as of 15:46, 21 November 2019

2,437 bytes added , 15:46, 21 November 2019

→‎Implementation: Instruction sets

Marshall

@@ Line 290: / Line 290: @@
 Character encodings differ for classic and Unicode interpreters: classic interpreters use a custom 1-byte encoding for all characters and are limited to a 256-character set, while Unicode interpreters store characters as 1-, 2-, or 4-byte unsigned [[wikipedia:code point|code point]] values.
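As a rough illustration of the 1-, 2-, or 4-byte storage described above, the following Python sketch picks the narrowest width that can hold every code point in a string. The function name <code>char_width</code> and the thresholds at 2^8 and 2^16 are assumptions made for illustration; they are not taken from Dyalog's implementation.

```python
def char_width(text: str) -> int:
    """Return the smallest of 1, 2 or 4 bytes that holds every code point.

    Hypothetical helper: it only mirrors the idea of storing characters
    as 1-, 2-, or 4-byte unsigned code point values.
    """
    highest = max(map(ord, text), default=0)
    if highest < 1 << 8:
        return 1
    if highest < 1 << 16:
        return 2
    return 4
```

Under these assumptions, a string containing only ASCII needs 1 byte per character, while one containing APL glyphs such as U+2373 needs 2.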
+=== Instruction set usage ===
+Dyalog makes heavy use of [[vector instructions]] on all platforms, as well as other special instruction sets primarily on x86. Instruction set availability is checked at runtime, so that the minimum required instruction set remains low:
+* For 32-bit x86, only [[wikipedia:SSE2|SSE2]] is required.
+* For x86_64, there is no additional minimum requirement, as every x86_64 processor supports SSE2. On macOS, SSE4.1 is required, as all x86 Apple machines support this instruction set.
+* For ARM32, there is no minimum requirement.
+* As of version 17.1, POWER7 and above are supported. Support for older systems was dropped because Dyalog compiles separate binaries for each POWER architecture.
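The runtime check described above can be pictured as a dispatch table that is consulted once and yields the best kernel the processor supports. The Python sketch below is only a model of that idea: the feature strings, the <code>KERNELS</code> table, and the function names are hypothetical, and every "kernel" here is the same plain loop standing in for a specialised routine.

```python
def _add_generic(a, b):
    """Portable fallback: element-wise addition with no special instructions."""
    return [x + y for x, y in zip(a, b)]

# Stand-ins for specialised kernels (hypothetical names). In a real
# interpreter each would be compiled for a particular instruction set.
def _add_sse2(a, b):
    return _add_generic(a, b)

def _add_avx2(a, b):
    return _add_generic(a, b)

# Ordered from most to least capable; the first supported entry wins.
KERNELS = [("avx2", _add_avx2), ("sse2", _add_sse2)]

def select_kernel(available):
    """Pick the best kernel for the given set of detected feature names."""
    for feature, kernel in KERNELS:
        if feature in available:
            return feature, kernel
    return "generic", _add_generic
```

Because selection happens at runtime, one binary can run on a machine with only the minimum instruction set while still using wider instructions where they exist.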
+In Dyalog 17.0, the code for vectorised [[scalar function]]s was unified and extended to allow Intel [[wikipedia:AVX2|AVX2]] and ARM NEON in addition to Intel [[wikipedia:SSE2|SSE2]] and [[wikipedia:SSE4.1|SSE4.1]], and AltiVec VMX for IBM POWER. This code is also used for operations involving the scalar dyadics [[Plus]], [[Minus]], [[Times]], [[Divide]], [[Maximum]], [[Minimum]], and [[comparison function]]s, as well as some functions derived from operators applied to these functions, such as the [[Outer Product]] and [[Inner Product]].
+Dyalog also uses many other x86 extensions. As of version 18.0:
+* [[wikipedia:SSE2|SSE2]], [[wikipedia:SSE4.1|SSE4.1]], and [[wikipedia:AVX2|AVX2]] are used for [[scalar dyadic]]s.
+* [[wikipedia:SSSE3|SSSE3]] is used primarily for the shuffle instruction for permuting arrays and searching small lookup tables.
+* [[wikipedia:SSE4.2|SSE4.2]] POPCNT is used to sum Boolean arrays.
+* [[wikipedia:SSE4.2|SSE4.2]] CRC32 is used to compute fast hash functions.
+* [[wikipedia:BMI2|BMI2]] is used for Boolean [[Compress]] and [[Expand]], and several [[structural function]]s on Boolean arrays.
+* [[wikipedia:CLMUL instruction set|CLMUL]] is used for [[xor]] [[reduction]]s and [[scan]]s (new in 18.0).
+* [[wikipedia:FMA instruction set|FMA3]] is used to implement [[Divide|division]] by a [[singleton]] (new in 18.0).
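Two of the uses listed above can be modelled in a few lines of Python. This is a sketch of the underlying bit tricks, not Dyalog's code: <code>clmul</code> and <code>pext</code> are software emulations of a carry-less multiply and a parallel bit extract, and the choice to store the first array element in bit 0 is an assumption made for simplicity. A carry-less multiply by an all-ones word yields a prefix xor (an xor scan), and a parallel bit extract with the mask as selector performs Boolean Compress.

```python
def pack(bits):
    """Pack a list of 0/1 values into an int, first element in bit 0."""
    word = 0
    for i, bit in enumerate(bits):
        word |= (bit & 1) << i
    return word

def unpack(word, n):
    """Unpack the low n bits of word into a list, bit 0 first."""
    return [(word >> i) & 1 for i in range(n)]

def clmul(a, b):
    """Carry-less multiply: like long multiplication, but adding with xor."""
    result, shift = 0, 0
    while b:
        if b & 1:
            result ^= a << shift
        b >>= 1
        shift += 1
    return result

def xor_scan(bits):
    """Prefix xor of a Boolean list via carry-less multiply by all ones."""
    n = len(bits)
    ones = (1 << n) - 1
    return unpack(clmul(pack(bits), ones) & ones, n)

def pext(src, mask):
    """Gather the bits of src selected by mask into the low bits."""
    result, out = 0, 0
    while mask:
        low = mask & -mask          # lowest set bit of the mask
        if src & low:
            result |= 1 << out
        out += 1
        mask ^= low                 # clear that bit
    return result

def compress(mask_bits, data_bits):
    """Boolean Compress: keep data bits where the mask is 1 (equal lengths)."""
    mask = pack(mask_bits)
    kept = bin(mask).count("1")
    return unpack(pext(pack(data_bits), mask), kept)
```

For example, <code>xor_scan([1, 0, 1, 1, 0, 0, 1, 0])</code> gives <code>[1, 1, 0, 1, 1, 1, 0, 0]</code>, and <code>compress([1, 0, 1, 1, 0], [1, 1, 0, 1, 1])</code> gives <code>[1, 0, 1]</code>. With hardware support, each of these handles a whole machine word in a few instructions rather than one bit per loop iteration.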
+On POWER, Dyalog uses the POWER8 [https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/compiler_ref/vec_gbb.html gather-bits-by-bytes] instruction, which is equivalent to transposing an 8x8 bit matrix, for [[Boolean]] [[Transpose]] since version 15.0 (expanded in applicability in 16.0). As of 18.0 it also uses the fused multiply-add instruction for division, as with FMA3 on x86.
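To make the 8x8 bit matrix transpose concrete, here is a minimal Python sketch of the operation itself. It assumes each row is one byte with the leftmost column in the most significant bit; the function name <code>transpose8x8</code> is hypothetical, and the hardware instruction's exact operand layout is not reproduced here.

```python
def transpose8x8(rows):
    """Transpose an 8x8 bit matrix given as 8 bytes, one byte per row.

    Assumption: column 0 is the most significant bit of each byte.
    """
    if len(rows) != 8:
        raise ValueError("expected exactly 8 row bytes")
    out = [0] * 8
    for i, row in enumerate(rows):
        for j in range(8):
            bit = (row >> (7 - j)) & 1      # element at row i, column j
            out[j] |= bit << (7 - i)        # becomes row j, column i
    return out
```

A matrix whose first row is all ones and whose other rows are zero, <code>[0xFF, 0, 0, 0, 0, 0, 0, 0]</code>, transposes to eight rows of <code>0x80</code>: a first column of ones. A Boolean Transpose of a larger array can be assembled from such 8x8 blocks.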
 == External links ==
