Performance

To make array operations fast, arrays are stored in a regular format in memory, so that all data about the format can be kept in the array's "header" and read only before, not during, data processing. This means that elements in the array must have the same storage type for efficient operation. [[Flat array model]] languages with or without boxes require that arrays have a homogeneous type (all numbers, all characters, or all boxes); [[Nested array model|nested]] APLs do not, but store homogeneous arrays differently and aren't as efficient when processing mixed-type arrays. Fast APLs tend to use several [[internal type]]s to operate efficiently on subsets of the available numbers like integers or booleans, and may [[Array model#Numeric type coercion|coerce]] numbers to a common type to ensure that an array is homogeneous in memory.
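
As a rough sketch of why this matters, the following C code shows a hypothetical flat-array layout: a header records the storage type, rank, shape, and element count, and a sum function dispatches on the type code once before running a tight, type-specialized loop. The struct layout, type names, and function are invented for illustration and do not describe any particular APL interpreter.

<syntaxhighlight lang="c">
#include <stdint.h>

/* Hypothetical storage types an implementation might distinguish. */
typedef enum { T_BOOL, T_INT8, T_INT32, T_FLOAT64, T_CHAR, T_BOX } Type;

/* A flat array: one header describing type and shape, then homogeneous data.
   The header is read once; the element loops below never re-check types. */
typedef struct {
    Type type;        /* storage type shared by every element        */
    uint32_t rank;    /* number of axes                              */
    uint64_t count;   /* total number of elements (product of shape) */
    uint64_t *shape;  /* length of each axis                         */
    void *data;       /* contiguous, homogeneous element data        */
} Array;

/* Dispatch once on the stored type, then run a tight specialized loop. */
double sum(const Array *a) {
    double total = 0;
    switch (a->type) {
    case T_INT32: {
        const int32_t *p = a->data;
        for (uint64_t i = 0; i < a->count; i++) total += p[i];
        break;
    }
    case T_FLOAT64: {
        const double *p = a->data;
        for (uint64_t i = 0; i < a->count; i++) total += p[i];
        break;
    }
    default:   /* remaining types handled similarly, or after coercion */
        break;
    }
    return total;
}
</syntaxhighlight>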


In the simplest case, homogeneous array storage allows an implementation to use specialized code to run a primitive on each type. Because this special code is written in a low-level language with complete knowledge of the types in use, it can attain a throughput similar to that of the implementation language (usually a low-level language such as [[wikipedia:C (programming language)|C]]) for the primitive, although overall algorithms will tend to run slower than in that language because in APL they have been split into multiple passes. APL's array operations are also ideal for implementation with [[wikipedia:SIMD|SIMD]] ("single instruction, multiple data") operations, which perform a single action on several values at once. In some cases, such as [[scalar function]]s, the primitives map directly onto SIMD operations; in others, such as [[Reverse]], they are easily implemented using SIMD (for Reverse, a SIMD selection or "shuffle"). While experimental SIMD machines (such as the APL-influenced [[wikipedia:CDC Star-100|CDC Star-100]]) were created as early as the 1960s, SIMD computing first entered the personal computing mainstream in the 1990s and has steadily grown in prominence in high-performance computing since then. In APL, CPU vector instruction sets such as Intel's [[wikipedia:Streaming SIMD Extensions|SSE]] are the most common way to access SIMD optimization, although [[Co-dfns]] instead runs on a GPU to attain much higher throughput at the cost of increased overhead and a restricted set of available algorithms.
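
As an illustration of the Reverse example, the C sketch below reverses a byte vector sixteen elements at a time with the SSSE3 byte shuffle <code>_mm_shuffle_epi8</code>, handling any leftover elements with a scalar loop. It assumes an x86 CPU with SSSE3 support and non-overlapping source and destination buffers; it shows the technique only and is not taken from any existing implementation.

<syntaxhighlight lang="c">
#include <stddef.h>
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */

/* Reverse a byte vector, 16 elements at a time, using a SIMD shuffle.
   dst and src must not overlap. */
void reverse_bytes(uint8_t *dst, const uint8_t *src, size_t n) {
    /* Shuffle mask 15,14,...,0: reverses the bytes within a 16-byte register. */
    const __m128i rev = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                       7,  6,  5,  4,  3,  2, 1, 0);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        v = _mm_shuffle_epi8(v, rev);
        /* A block read from the front of src lands, reversed, at the
           mirrored position near the back of dst. */
        _mm_storeu_si128((__m128i *)(dst + n - i - 16), v);
    }
    for (; i < n; i++)   /* scalar tail for the last n mod 16 elements */
        dst[n - 1 - i] = src[i];
}
</syntaxhighlight>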


Not all primitives are equally amenable to SIMD implementation or other high-performance techniques. To get the most out of an APL implementation, the programmer must take advantage of the fast operations available. This may require significant testing to determine when primitives perform well, as well as alternative strategies, either to work around cases where an operation is slow or to exploit information the programmer knows but the language doesn't.
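
Such testing ultimately comes down to measurement. The C program below imitates the kind of comparison involved: it times a computation performed primitive-at-a-time, materializing an intermediate array the way an array interpreter would, against a fused single pass over the data. The expression (the equivalent of <code>2×a+b</code>), the array size, and the timing harness are arbitrary choices made for this example.

<syntaxhighlight lang="c">
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000  /* number of elements per array */

/* Monotonic wall-clock time in seconds. */
static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void) {
    double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b);
    double *t = malloc(N * sizeof *t), *r = malloc(N * sizeof *r);
    for (size_t i = 0; i < N; i++) { a[i] = (double)i; b[i] = (double)(N - i); }

    /* Primitive-at-a-time: each pass is a fast specialized loop, but the
       intermediate result t is written out to memory and read back in. */
    double t0 = now();
    for (size_t i = 0; i < N; i++) t[i] = a[i] + b[i];
    for (size_t i = 0; i < N; i++) r[i] = 2 * t[i];
    double t1 = now();
    double check1 = r[N / 2];   /* read a result so the passes stay live */

    /* Fused single pass: what a C programmer would write directly. */
    for (size_t i = 0; i < N; i++) r[i] = 2 * (a[i] + b[i]);
    double t2 = now();

    printf("two passes: %.3fs  one pass: %.3fs  (check %g %g)\n",
           t1 - t0, t2 - t1, check1, r[N / 2]);
    free(a); free(b); free(t); free(r);
    return 0;
}
</syntaxhighlight>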
