Dyalog APL

Dyalog APL, or simply Dyalog, is a nested APL based on NARS and APL2, first released by British company Dyadic Systems Ltd. (now Dyalog Ltd.) in 1983 for the Zilog Z8000 processor (the name Dyalog is a portmanteau of Dyadic and Zilog). Continuously developed since, Dyalog has added support for many programming paradigms including object-oriented programming based on .NET, Lisp-style lexically scoped anonymous functions (dfns), and leading axis and tacit programming support based on J. It supports several platforms and interoperability between them, and interfaces with other languages and runtimes including native shared libraries, .NET, the JVM, R, and Python.

Although it initially received very little commercial interest, Dyalog has steadily grown in prominence and in the 2010s has been the basis of several new APL dialects including ngn/apl, APL\iv, and dzaima/APL. Even in APLs not derived from Dyalog such as GNU APL and NARS2000, dfn-style function syntax has become common, and Dyalog has also popularised SHARP APL and J innovations such as the Rank operator and trains among nested APLs.

History
Work on Dyalog was begun in 1981 by APL consulting company Dyadic Systems, which by that time had grown to support about 15 employees. In partnership with Zilog UK, Dyadic developed an interpreter using the C programming language for the Zilog Z8000's UNIX operating system—both obscure technologies at the time. Dyadic employees John Scholes and Geoff Streeter worked full-time on the implementation, while David Crossley managed its development as a part-time role. Initially aiming to produce something like SHARP APL, they eventually chose a nested model rather than adding boxes to the flat array model like SHARP, and drew most design decisions from STSC's experimental NARS dialect and the material available at the time regarding IBM's plans for APL2.

Released at APL83, Dyalog sold very few copies due to the lack of interest in either Unix or the nested array model. Subsequent sales were also limited, with only a single licence sold through Zilog, partly due to the Z8000's limited popularity. Supported by Dyadic's APL consulting and later by sales of Unix hardware, Scholes and Streeter continued work on Dyalog by porting it to a wide variety of Unix systems in response to requests from users; in 1995, Scholes was awarded the Iverson Award jointly with Peter Donnelly for his work on the Dyalog interpreter. Although it continued to run significant losses every year, Dyalog slowly acquired users during the 1980s, including current client SimCorp. In 1991, Dyadic hired John Daintree to begin work on the graphical user interface for Microsoft Windows; Dyalog for Windows debuted at APL92 and quickly became Dyalog's main platform.

In 1996, John Scholes introduced a new form of function definition to Dyalog based on his studies of the functional programming language Scheme, which he called dfns, for "direct functions". Rarely used for many years, dfns have become a common APL feature, with many newer APLs removing traditional defined functions from the language in favour of dfns. Another major addition to the language began in 2000, when John Daintree was invited to participate in the design of Microsoft's .NET. Based on this work, and the namespaces which he had added to the language in 1994, Daintree developed an object model for Dyalog, which was released in 2006. These efforts also led to a new language called APL#, which was first released in 2010 but abandoned in 2012 when Microsoft deprecated Silverlight.

In the 2010s, Dyalog development began to focus on performance, which had been improved out of necessity in early releases but had not been a major focus. In 2010 Dyalog Ltd. hired Jay Foad, a compiler developer who initially created a bytecode compiler for APL and later improved performance in other ways; he served as CTO from 2016 until his departure in 2019. In 2011 the company hired Roger Hui, developer of J, and in 2016 it also hired J programmer and language implementor Marshall Lochbaum. Both developers improved the performance of Dyalog's primitives on flat arrays, and brought concepts such as the Rank operator, trains, and the composition operators Atop and Over from J to APL.

The Dyalog interpreter has also incorporated significant components written in APL in the 2000s and 2010s. Dan Baronet, hired in 2006, introduced the SALT (Simple APL Library Toolkit) system to distribute APL code, and user commands based on it, in version 12.0 in 2008. Work on APL components of Dyalog has also been done by Brian Becker and Adám Brudzewsky. Brudzewsky, hired in 2015, has also driven the adoption of new functionality such as Nest and array notation in Dyalog.

Platforms
Besides working out of the box on Windows, macOS, and AIX, Dyalog APL runs on many Linux distributions. On some of these, however, it requires additional action beyond simple installation before it will function. As of 2019-05-15, the necessary actions for versions 16.0, 17.0, and 17.1 are:

Versions
Dyalog lists historical versions, along with release notes since 14.0, on its website. Its early history is recounted in more detail by Pete Donnelly in Dyalog APL: A Personal History (pdf).

Implementation
Dyalog APL is implemented primarily in C, with some parts implemented in C++ in order to use templates. C intrinsics are used to access instruction set extensions. Some architecture-specific assembly, both compiled separately and inline from C, is used for functionality, such as exception flags, that is not easily accessible from C. Prior to version 17.0, assembly was also used for vectorized arithmetic. In 17.0, this code was replaced by a new C++ implementation.

Internal types
Dyalog uses the following numeric types:
 * 1-bit packed Boolean
 * 1-byte integer
 * 2-byte integer
 * 4-byte integer
 * 8-byte double
 * 16-byte complex (one double for each component)
 * 16-byte decimal float "decf" (BID or DPD)
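
The first entry in this list, the packed Boolean, stores eight elements per byte. A minimal Python sketch of such packing follows; the function names are hypothetical, and the most-significant-bit-first order within each byte is an assumption made for illustration, not a statement about Dyalog's internal layout.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, eight elements per byte.

    Assumption: the first element goes into the most significant bit.
    Unused bits in the final byte are left as zero.
    """
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def unpack_bits(data, n):
    """Recover the first n Boolean elements from packed bytes."""
    return [(data[i // 8] >> (7 - i % 8)) & 1 for i in range(n)]
```

Nine Booleans occupy two bytes in this scheme, compared with nine bytes when stored as 1-byte integers.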

Character encodings differ between Classic and Unicode interpreters: Classic interpreters use a custom 1-byte encoding for all characters, and are limited to a 256-character set, while characters in Unicode interpreters are stored as 1-, 2-, or 4-byte unsigned Unicode code point values.
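
As an illustration of the three character widths, the following Python sketch picks the narrowest width that can hold every code point in a string. The function name `char_width` is hypothetical, and the sketch only models the width choice, not how the interpreter makes it.

```python
def char_width(text):
    """Return the bytes per character (1, 2, or 4) needed for `text`.

    The width is determined by the largest code point present.
    """
    top = max(map(ord, text), default=0)
    if top < 1 << 8:
        return 1
    if top < 1 << 16:
        return 2
    return 4
```

For example, a plain ASCII string needs 1 byte per character, while a string containing APL glyphs such as `⍳` (U+2373) needs 2.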

Nested and mixed arrays (that is, pointer arrays) are always stored as arrays of pointers, while simple numeric or character arrays are always stored using one of the above types. For both numbers and characters, an array may be represented using any type that can contain all the values. The interpreter may reduce the type of an array to the minimum possible ("squeeze" the array) during execution.
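
The idea behind squeezing can be sketched in Python as choosing the narrowest type that contains all values. This is a simplified model under stated assumptions: integer types are taken to be signed two's complement, complex and decf values are left out, and the type names and the function `squeezed_type` are hypothetical.

```python
# Signed integer types from the list above, narrowest first (name, bits).
INT_TYPES = (("int8", 8), ("int16", 16), ("int32", 32))


def _is_whole(v):
    """True for ints and for floats with no fractional part."""
    return isinstance(v, int) or (isinstance(v, float) and v.is_integer())


def squeezed_type(values):
    """Name the narrowest type that can hold every value in `values`."""
    if all(v in (0, 1) for v in values):
        return "bool"
    if all(_is_whole(v) for v in values):
        lo, hi = min(values), max(values)
        for name, bits in INT_TYPES:
            if -(1 << (bits - 1)) <= lo and hi < (1 << (bits - 1)):
                return name
    return "double"
```

Under this model an array of doubles whose values all happen to be 0 or 1 can be stored as a packed Boolean array, using one sixty-fourth of the space.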

Because there is no complex representation using decimal floats for the components, arrays containing both decimal floats and complex numbers have no common representation. Dyalog converts such arrays to complex numbers, resulting in a loss of precision for decf elements.
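
The precision loss can be reproduced outside APL. The Python sketch below is an analogy rather than Dyalog's conversion code: it uses the standard `decimal` module with 34 significant digits (the precision of a 128-bit decimal float) and converts a value to a complex number whose components are binary doubles.

```python
from decimal import Context, Decimal

# 34 significant digits, matching a 128-bit decimal float.
DECF = Context(prec=34)

third = DECF.divide(Decimal(1), Decimal(3))   # 0.3333... to 34 digits
as_complex = complex(float(third), 0.0)       # components are binary doubles
round_trip = Decimal(as_complex.real)         # exact decimal value of that double
error = abs(round_trip - third)               # digits lost in the conversion
```

The double retains only about 16 or 17 significant decimal digits, so `round_trip` differs from `third` from roughly the seventeenth digit onward.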

Instruction set usage
Dyalog makes heavy use of vector instructions on all platforms, as well as other special instruction sets primarily on x86. Instruction set availability is checked at runtime, so that the minimum required instruction set remains low:
 * For 32-bit x86, only SSE2 is required.
 * For x86_64, there is no additional requirement on most platforms, as every x86_64 processor supports SSE2. On macOS, SSE4.1 is required, as all x86 Apple machines support this instruction set.
 * For ARM32, there is no minimum requirement.
 * As of version 17.1, POWER7 and above are supported. Support for older systems is dropped because Dyalog compiles separate binaries for each POWER architecture.
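
The runtime check described above follows a common dispatch pattern: detect the available extensions once, then select the most capable code path while insisting only on a low baseline. A minimal Python sketch of that pattern follows; the level names, their ordering, and the function `choose_kernel` are hypothetical and do not reflect Dyalog's actual dispatch code.

```python
# Hypothetical x86 levels, ordered from most to least demanding.
PREFERENCE = ("avx2", "sse4.1", "sse2")


def choose_kernel(available, baseline="sse2"):
    """Return the most capable instruction set level usable on this machine.

    `available` is the set of extensions the CPU reports at startup.
    Only `baseline` is mandatory; anything better is used opportunistically.
    """
    if baseline not in available:
        raise RuntimeError(f"CPU lacks required baseline {baseline!r}")
    for level in PREFERENCE:
        if level in available:
            return level
    return baseline
```

Because the choice is made at runtime, a single binary can run on an SSE2-only machine and still use AVX2 where it exists.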

In Dyalog 17.0, the code for vectorized scalar functions was unified and extended to use Intel AVX2 and ARM NEON in addition to Intel SSE2 and SSE4.1, and AltiVec VMX for IBM POWER. This code is used for operations involving the scalar dyadics Plus, Minus, Times, Divide, Maximum, Minimum, and the comparison functions, as well as for some functions derived from operators applied to these functions, such as the Outer Product and Inner Product.

Dyalog also uses many other x86 extensions:
 * Since at least 12.1, SSE2 is used for scalar dyadics.
 * Since 17.0, AVX2 is used for scalar dyadics if available.
 * Since 14.1, SSE4.1 is used for Minimum and Maximum, and finding the range of an array. AVX2 can also be used for these purposes in 18.0.
 * Since 17.0, SSSE3 is used primarily for the shuffle instruction for permuting arrays and searching small lookup tables.
 * Since 14.0, SSE4.2 POPCNT is used to sum Boolean arrays.
 * Since 14.0, SSE4.2 CRC32 is used to compute fast hash functions.
 * Since 15.0, BMI2 is used for Boolean matrix transpose. Since 16.0, it is used for Boolean Compress and Expand, and several structural functions on Boolean arrays.
 * Since 18.0, CLMUL is used for xor reductions and scans.
 * Since 18.0, FMA3 is used to implement division by a singleton.
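
Two of the techniques in this list can be modelled in a few lines of Python. The sketch below is an illustration under assumptions, not Dyalog's code: `popcount_sum` sums a packed Boolean array by counting set bits, and `xor_scan` computes an xor prefix scan of one machine word as a carry-less multiplication by an all-ones word, which is the operation CLMUL provides in hardware. All function names are hypothetical, and element i of the Boolean vector is assumed to sit in bit i of the word.

```python
def popcount_sum(packed):
    """Sum a packed Boolean array by counting the set bits of each byte."""
    return sum(bin(byte).count("1") for byte in packed)


def clmul(a, b):
    """Carry-less (polynomial over GF(2)) multiplication of two integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # xor replaces addition, so no carries occur
        a <<= 1
        b >>= 1
    return result


def xor_scan(word, width):
    """Xor prefix scan of the low `width` bits of `word`.

    Bit i of the result is the xor of bits 0..i of the input, because
    multiplying by all ones xors every lower bit into each position.
    """
    mask = (1 << width) - 1
    return clmul(word, mask) & mask
```

For instance, `xor_scan(0b01001101, 8)` treats the word as the vector 1 0 1 1 0 0 1 0 and yields the word for 1 1 0 1 1 1 0 0. An xor reduction is the last bit of the same scan.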

On POWER, Dyalog also uses the POWER8 gather-bits-by-bytes instruction, which is equivalent to transposing an 8x8 bit matrix, for Boolean Transpose since version 15.0 (with expanded applicability in 16.0). Since 18.0 it also uses the POWER fused multiply-add instruction for division, in the same way as FMA3 on x86.
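
The effect of an 8x8 bit matrix transpose can be sketched in Python as follows. This models only the result of the operation, not the instruction's operand layout; the function name `transpose8x8` and the convention that each row is one byte with column 0 in the most significant bit are assumptions for illustration.

```python
def transpose8x8(rows):
    """Transpose an 8x8 bit matrix given as 8 row bytes.

    Assumption: column 0 is the most significant bit of each byte.
    Bit j of input row i becomes bit i of output row j.
    """
    out = [0] * 8
    for i, row in enumerate(rows):
        for j in range(8):
            if row & (0x80 >> j):
                out[j] |= 0x80 >> i
    return out
```

A larger Boolean matrix can be transposed by applying such a kernel to each 8x8 tile and writing the tiles back in transposed positions, which is why a single-instruction version of the kernel is valuable.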