Dyalog APL: Difference between revisions

4 bytes added, 13:58, 25 April 2022
→Instruction set usage: 18.0 portions don't apply to 18.2
m (Reverted edits by Adám Brudzewsky (talk) to last revision by Marshall)
Tag: Rollback
(→Instruction set usage: 18.0 portions don't apply to 18.2)
Line 361:
* Since [[Dyalog APL versions#14.0|14.0]], [[wikipedia:SSE4.2|SSE4.2]] CRC32 is used to compute fast hash functions.
* Since [[Dyalog APL versions#15.0|15.0]], [[wikipedia:BMI2|BMI2]] is used for Boolean matrix transpose. Since [[Dyalog APL versions#16.0|16.0]], it is used for Boolean [[Compress]] and [[Expand]], and several [[structural function]]s on Boolean arrays.
− * Since [[Dyalog APL versions#18.0|18.0]], [[wikipedia:CLMUL instruction set|CLMUL]] is used for [[xor]] [[reduction]]s and [[scan]]s.
+ * In [[Dyalog APL versions#18.0|18.0]] only, [[wikipedia:CLMUL instruction set|CLMUL]] is used for [[xor]] [[reduction]]s and [[scan]]s.
− * Since [[Dyalog APL versions#18.0|18.0]], [[wikipedia:FMA instruction set|FMA3]] is used to implement [[Divide|division]] by a [[singleton]].
+ * In [[Dyalog APL versions#18.0|18.0]] only, [[wikipedia:FMA instruction set|FMA3]] is used to implement [[Divide|division]] by a [[singleton]].
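To illustrate why these instructions suit the listed primitives, the following is a minimal software model in Python, not Dyalog's actual implementation. It assumes a Boolean vector is packed into an integer with element ''i'' stored at bit ''i''; the function names (<code>pext</code>, <code>compress</code>, <code>clmul</code>, <code>xor_scan</code>, <code>xor_reduce</code>) are hypothetical and chosen only for this sketch. BMI2's PEXT gathers the bits selected by a mask, which matches Boolean Compress on a Boolean argument, and a carry-less multiplication by an all-ones word yields a prefix xor, which matches an xor scan.

```python
def pext(src: int, mask: int) -> int:
    """Software model of BMI2 PEXT: pack the bits of src selected by mask
    into the low-order bits of the result, preserving their order."""
    out = 0
    k = 0
    while mask:
        low = mask & -mask      # isolate the lowest set bit of mask
        if src & low:
            out |= 1 << k
        k += 1
        mask ^= low             # clear that bit and continue
    return out


def compress(mask: int, data: int) -> int:
    """Boolean Compress (mask/data) on packed bits, assuming element i is bit i."""
    return pext(data, mask)


def clmul(a: int, b: int) -> int:
    """Carry-less multiplication: polynomial product over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def xor_scan(x: int, width: int = 64) -> int:
    """Prefix xor: bit i of the result is the xor of bits 0..i of x.
    Multiplying by all ones xors together every left shift of x."""
    ones = (1 << width) - 1
    return clmul(x, ones) & ones


def xor_reduce(x: int, width: int = 64) -> int:
    """Xor reduction: the last element of the xor scan (parity of x)."""
    return (xor_scan(x, width) >> (width - 1)) & 1
```

For example, under the stated bit order, <code>compress(0b1010, 0b1100)</code> selects elements 1 and 3 of the data, and <code>xor_scan(0b0101, 4)</code> gives <code>0b0011</code>. A hardware version would replace the loops with a single PEXT or PCLMULQDQ instruction per machine word.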


It also uses the POWER8 [https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/compiler_ref/vec_gbb.html gather-bits-by-bytes] instruction, which is equivalent to transposing an 8×8 bit matrix, for [[Boolean]] [[Transpose]] since version 15.0 (with applicability expanded in 16.0), and, in 18.0, the fused multiply-add instruction for division, as with x86 FMA3.
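As a reference for what "transposing an 8×8 bit matrix" means, here is a deliberately naive Python sketch. It assumes the matrix is packed into a 64-bit integer with row ''r'', column ''c'' stored at bit <code>8*r + c</code>; that layout and the name <code>transpose8x8</code> are assumptions made for illustration and do not describe the operand layout of the POWER8 instruction or Dyalog's code.

```python
def transpose8x8(m: int) -> int:
    """Transpose an 8x8 bit matrix packed into a 64-bit integer.

    Assumed layout: bit (8*r + c) holds the element at row r, column c.
    """
    out = 0
    for r in range(8):
        for c in range(8):
            if (m >> (8 * r + c)) & 1:
                out |= 1 << (8 * c + r)   # element (r, c) moves to (c, r)
    return out
```

With this layout, a matrix whose first row is all ones (<code>0xFF</code>) becomes one whose first column is all ones (<code>0x0101010101010101</code>), and applying the function twice returns the original value. A single hardware instruction performs the same rearrangement without the 64 loop iterations.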
