Array model: Difference between revisions

From APL Wiki
Jump to navigation Jump to search
(→‎External links: Add references)
m (→‎Based array theory: Link to WP instead of APL Wiki for NumPy and Julia)
 
(5 intermediate revisions by 2 users not shown)
Line 37: Line 37:


A second version of the APL array model was developed in order to more transparently handle nested data, without the need to explicitly box and unbox arrays. The Nested Array Research System ([[NARS]]) was developed to study this model. In it, arrays contain other arrays directly. In this way they resemble the [[wikipedia:inductive type|inductive type]]s used in type theory.
A second version of the APL array model was developed in order to more transparently handle nested data, without the need to explicitly box and unbox arrays. The Nested Array Research System ([[NARS]]) was developed to study this model. In it, arrays contain other arrays directly. In this way they resemble the [[wikipedia:inductive type|inductive type]]s used in type theory.
 
{| class=wikitable style="float:right"
|{{quote | "An array is a rectangular collection of numbers, characters and arrays, arranged along zero or more axes."|[[John Scholes]]}}
|}
In nested APLs each individual number or character is encapsulated in a [[simple scalar]]. Such a scalar may be referred to as "a number" or "a character" but it maintains the properties of an array. Other arrays used by the language are defined inductively: an array can be formed which contains as elements any array which has already been defined. Such an array cannot, by the nature of inductive definition, contain itself, even within many levels of nesting inside it. Arrays which contain only simple scalars, or are themselves simple scalars, are called [[Simple array|simple]]. Non-simple arrays are called "nested". The simple arrays are a superset of the arrays allowed in flat array theory without boxes: they include all arrays of numbers and characters, as well as arrays which mix numbers and characters. Arrays which would not be representable in flat array theory—those which contain a mixture of simple scalar types, or contain both simple scalars and other arrays—are called [[Mixed array|mixed]].
In nested APLs each individual number or character is encapsulated in a [[simple scalar]]. Such a scalar may be referred to as "a number" or "a character" but it maintains the properties of an array. Other arrays used by the language are defined inductively: an array can be formed which contains as elements any array which has already been defined. Such an array cannot, by the nature of inductive definition, contain itself, even within many levels of nesting inside it. Arrays which contain only simple scalars, or are themselves simple scalars, are called [[Simple array|simple]]. Non-simple arrays are called "nested". The simple arrays are a superset of the arrays allowed in flat array theory without boxes: they include all arrays of numbers and characters, as well as arrays which mix numbers and characters. Arrays which would not be representable in flat array theory—those which contain a mixture of simple scalar types, or contain both simple scalars and other arrays—are called [[Mixed array|mixed]].


Line 56: Line 58:
== Based array theory ==
== Based array theory ==


Based array theory discards the principle that all data should be stored in arrays, instead defining basic types such as characters and numbers independently of arrays and arrays as a collection type—possibly one of many—that can contain any data. This model does not have any widely accepted name, with the term "based system" introduced in an [[APL Quote Quad]] paper in 1981.<ref>Randall Mercer. [https://dl.acm.org/doi/abs/10.1145/586656.586663 "A based system for general arrays"]. [[APL Quote Quad]] Volume 12, Issue 2. 1981-12.</ref> However, as it is the natural model when arrays are added to an existing programming system, it is common in array libraries such as [[NumPy]], [[wikipedia:ILNumerics|ILNumerics]], and [[wikipedia:Haskell (programming language)|Haskell]]'s [https://hackage.haskell.org/package/repa Repa], as well as the language [[Julia]]. It is used by the APL-family language [[BQN]].
Based array theory discards the principle that all data should be stored in arrays, instead defining basic types such as characters and numbers independently of arrays and arrays as a collection type—possibly one of many—that can contain any data. This model does not have any widely accepted name, with the term "based system" introduced in an [[APL Quote Quad]] paper in 1981.<ref>Randall Mercer. [https://dl.acm.org/doi/abs/10.1145/586656.586663 "A based system for general arrays"]. [[APL Quote Quad]] Volume 12, Issue 2. 1981-12.</ref> However, as it is the natural model when arrays are added to an existing programming system, it is common in array libraries such as [[wikipedia:NumPy|NumPy]], [[wikipedia:ILNumerics|ILNumerics]], and [[wikipedia:Haskell (programming language)|Haskell]]'s [https://hackage.haskell.org/package/repa Repa], as well as the language [[wikipedia:Julia (programming language)|Julia]]. It has also been promoted with the mnemonic "APL+1" by Jacob Brickman<ref>Jacob Brickman. "Behavior of APL primitives in a system with depth-1 scalars (APL+1)". [[APL-Germany]] ''APL-Journal'' 2019 1-2 ([https://apl-germany.de/wp-content/uploads/2020/06/APL_Journal_2019_12.pdf#page=5 pdf]).</ref>, and is used by the APL-family language [[BQN]].


=== Mutable based arrays ===
=== Mutable based arrays ===
Line 93: Line 95:
In nested APLs, a simple non-scalar array has depth 1, an array containing only depth 1 arrays has depth 2, and a simple scalar (e.g a number or character) has depth 0.
In nested APLs, a simple non-scalar array has depth 1, an array containing only depth 1 arrays has depth 2, and a simple scalar (e.g a number or character) has depth 0.


Most APLs provide a Depth function <source lang=apl inline>≡</source> to find an array's depth. For example:
Most APLs provide a Depth function <syntaxhighlight lang=apl inline>≡</syntaxhighlight> to find an array's depth. For example:
<source lang=apl>
<syntaxhighlight lang=apl>
       ≡('ab' 'cde')('fg' 'hi')
       ≡('ab' 'cde')('fg' 'hi')
3
3
</source>
</syntaxhighlight>


APLs vary in their definition of depth: for example some may return the depth with a sign to indicate that some level of the array mixes elements of different depths.
APLs vary in their definition of depth: for example some may return the depth with a sign to indicate that some level of the array mixes elements of different depths.
Line 106: Line 108:
* [https://chat.stackexchange.com/rooms/52405/conversation/lesson-1-introduction-to-arrays-in-apl APL Cultivation]
* [https://chat.stackexchange.com/rooms/52405/conversation/lesson-1-introduction-to-arrays-in-apl APL Cultivation]
* [https://www.sacrideo.us/tag/apl-a-day/ APL a Day] series
* [https://www.sacrideo.us/tag/apl-a-day/ APL a Day] series
* [https://www.jsoftware.com/papers/array.htm What is an Array?] by [[Roger Hui]] (in [[J]])
* [https://www.jsoftware.com/papers/array.htm What is an Array?] by [[Roger Hui]]


== References ==
== References ==
<references />
<references />
{{APL features}}[[Category:Arrays| ]]
{{APL features}}[[Category:Arrays| ]]

Latest revision as of 20:27, 3 March 2024

This page describes the array datatype as defined by array languages. For the role of arrays in APL syntax, see Array.

The distinguishing feature of APL and the array language family is its focus on arrays. In most array languages the array is the only first class datatype. While this sounds like a very strict model of language design, in fact it imposes no restrictions at all: any kind of data can be treated as a scalar, or array with rank 0!

APL's array model is distinct from and richer than the one-dimensional data structures given the name "array" in languages such as Python, Javascript, and Java. These structures correspond to APL vectors, sometimes with the requirement that all elements have the same type. In APL it is the arrangement of data into a multidimensional shape, and not any requirement about the way it is stored or the type of its elements, that defines an array. APL arrays are most closely related to multidimensional FORTRAN or C arrays.

An array is a rectangular collection of elements, arranged along zero or more axes. The number of axes is called the array's rank while their lengths make up the shape. Names are given to arrays with particular ranks:

  • An array with 0 axes is a scalar.
  • An array with 1 axis is a vector.
  • An array with 2 axes is a matrix.

An array's shape is then a vector whose elements are axis lengths, and its rank is a scalar.

The largest divide among APLs hinges on the definition of "element" above. In flat array theory elements are not arrays: they are simple data such as characters and numbers. In nested array theory the elements can only be arrays, and characters and numbers are held to be contained in simple scalars, arrays which by convention contain themselves as elements. In each case arrays may be considered to be homogeneous because all of their elements have the same type. In flat array languages this means arrays are restricted to contain only one type of data; in nested arrays it means that the elements of an array are all of the array "type"—that is, the type of all first-class values in the language!

In a language with stranding, such as APL2, creating a 1-dimensional array is very simple: just write the elements next to each other, separated by spaces. Nested APLs such as APL2 allow any array to be used as an element, including scalar numbers or characters (written with quotes) as well as larger arrays. In order to include a stranded array inside another array it must be parenthesized.

In most APLs, "string" is just a term for a character vector, and a string may be written with single quotes around the entire string.

"Array languages" without arrays

Some languages, despite deriving from APL, do not use APL-style arrays at all! Examples include K and I, which only have vectors, and MATLAB, which has true multidimensional arrays but usually treats data as matrices. Languages like K are usually considered part of the array language or APL family, but may or may not be considered array languages themselves.

Flat array theory

In Iverson notation arrays were considered to contain numbers (or Booleans, before these were unified with ordinary numbers), which are not themselves arrays. The property that array elements are some non-array type is the defining feature of flat array theory.

Flat APLs impose the rule that all elements of arrays have the same type, such as all character or all numeric. IBM's APL\360 was likely the first to specify this rule explicitly, and it has been maintained in newer languages such as SHARP APL and J. Although it is possible to discard this rule, resulting in an inhomogeneous array theory that allows arrays to contain elements of mixed type, but not other arrays, no APL to date has done this.

Boxes

Main article: Box

In order to allow programmers to work with inhomogeneous or nested data, flat array languages may define a special kind of element which "encloses" or "boxes" an array. Then there are three allowed element types for an array: character, numeric, and boxed.

While a boxed array represents a collection of arrays, it is not considered to contain those arrays—its elements are boxes, and not their contents. For this reason scalar functions do not reach into boxes: they act on the elements of an array directly. Thus Equal to on two boxes compares them, with a single Boolean result indicating whether the arrays inside the boxes match.

Nested array theory

A second version of the APL array model was developed in order to more transparently handle nested data, without the need to explicitly box and unbox arrays. The Nested Array Research System (NARS) was developed to study this model. In it, arrays contain other arrays directly. In this way they resemble the inductive types used in type theory.

"An array is a rectangular collection of numbers, characters and arrays, arranged along zero or more axes."

John Scholes

In nested APLs each individual number or character is encapsulated in a simple scalar. Such a scalar may be referred to as "a number" or "a character" but it maintains the properties of an array. Other arrays used by the language are defined inductively: an array can be formed which contains as elements any array which has already been defined. Such an array cannot, by the nature of inductive definition, contain itself, even within many levels of nesting inside it. Arrays which contain only simple scalars, or are themselves simple scalars, are called simple. Non-simple arrays are called "nested". The simple arrays are a superset of the arrays allowed in flat array theory without boxes: they include all arrays of numbers and characters, as well as arrays which mix numbers and characters. Arrays which would not be representable in flat array theory—those which contain a mixture of simple scalar types, or contain both simple scalars and other arrays—are called mixed.

The programmer can Pick an element of an array directly. The resulting element is always an array, even for simple arrays: Pick never returns something which is "just a number". This may be viewed in multiple ways: either an array's elements are in fact always arrays, or Pick and similar functions wrap non-array elements so they are still arrays. An APL could be defined which gives an error rather than allow a program to pick into a simple scalar array. Another choice might be to return a non-array character, but an APL which allowed such values to be used might no longer be considered an array language.

Whether an array language is flat or nested depends only on the language's behavior from the programmer's perspective. Nearly all APLs use homogeneous flat arrays for implementation purposes, with pointers to enclose elements. However, the language's array model is determined by what is presented to the programmer and not what is stored in memory. If a programmer can create and manipulate arrays with both character and numeric data the same way they would work with completely numeric arrays, then the language is nested!

Floating arrays

All nested APLs to date define simple scalars to be "floating", that is, a simple scalar is identical to its enclose (or any scalar array containing only that simple scalar). It is this property that justifies the term "simple scalar": a simple array contains only simple scalars, so a simple array which is scalar must contain a single simple scalar and thus be identical to it. Therefore the only arrays which are both simple and scalar are the simple scalars.

With floating arrays a simple scalar may be thought of as an infinite stack of scalar arrays. Any attempt to Enclose or Disclose this array results in the same kind of infinite stack, so it should be identical to the scalar itself.

Floating arrays represent a departure from a true inductive type. To produce floating array theory from type theory, fixed arrays must be defined using simple scalars and inductive definition, and then simple scalars and scalar arrays containing them must be explicitly identified. Not making this identification would result in an array model not present in any APL: a "fixed" rather than "floating" nested array theory.

Flat array theory is often called "grounded" in contrast to "floating" nested array theory.

Based array theory

Based array theory discards the principle that all data should be stored in arrays, instead defining basic types such as characters and numbers independently of arrays and arrays as a collection type—possibly one of many—that can contain any data. This model does not have any widely accepted name, with the term "based system" introduced in an APL Quote Quad paper in 1981.[1] However, as it is the natural model when arrays are added to an existing programming system, it is common in array libraries such as NumPy, ILNumerics, and Haskell's Repa, as well as the language Julia. It has also been promoted with the mnemonic "APL+1" by Jacob Brickman[2], and is used by the APL-family language BQN.

Mutable based arrays

In many languages with this array style, such as NumPy and Julia, the arrays are mutable, meaning that copies of an array can be made, so that one copy reflects changes made to any copy. In contrast, APL operations that appear to modify an array, like indexed assignment, will only change the particular copy of the array used, and can be said to create a new array rather than change an existing one: there is no special connection between the old and modified array. Mutable arrays make it possible for an array to contain itself, by replacing one element of an existing array with the whole array. This means that more values are possible than in an immutable based array language, and that some properties of immutable arrays, such as a finite depth, do not hold.

Other features of the array model

Array prototypes

Main article: Array prototype

An empty array may carry additional information to indicate what type its elements would be, if it had any. In flat array theory, this is typically just the type: character, numeric, or boxed. In a nested array language, the prototype may be quite complex, containing an entire nested structure.

An array's prototype is used to determine the value of fills when they are required by the language.

Numeric type coercion

Most APLs, flat or nested, implicitly store simple numeric arrays as one of many numeric types. When a numeric array is formed from numbers with different types, all numbers are converted to a common type in order to be represented as a flat array. If the hierarchy of numeric types is not strict, that is, there are some pairs of numeric types for which neither type is a subset of the other, then this coercion may affect the behavior of the numbers in the array. For example, J on a 64-bit machine uses both 64-bit integers and double-precision floats. Catenating the two results in an array of doubles, which will lose precision for integers whose absolute value is larger than 253. In Dyalog APL a similar issue occurs with decimal floats and complex numbers: combining the two results in an array of complex numbers, but this loses precision since Dyalog's complex numbers are stored as pairs of double-precision floats and its 128-bit decimal floats have higher precision than doubles.

Array characteristics

APL defines the following characteristics of an array. All information about an array is contained in two vectors: its shape and ravel, including, for empty arrays, the ravel's prototype.

  • The rank is the number of dimensions or axes it has. It is the length of the shape.
  • The shape gives its length along each dimension.
  • The bound is the total number of elements in an array, that is, the product of the shape.
  • The ravel is a vector containing all of the array's elements. Its length is the bound.
  • The prototype is a special "null" element for the array. It is derived from the first element for non-empty arrays.
  • The depth is a number indicating how deeply nested an array is.

Depth

Main article: Depth

The depth is the level of nesting or boxing in an array. It is defined differently in nested and flat APLs.

In nested APLs, a simple non-scalar array has depth 1, an array containing only depth 1 arrays has depth 2, and a simple scalar (e.g a number or character) has depth 0.

Most APLs provide a Depth function to find an array's depth. For example:

      ≡('ab' 'cde')('fg' 'hi')
3

APLs vary in their definition of depth: for example some may return the depth with a sign to indicate that some level of the array mixes elements of different depths.

External links

References

  1. Randall Mercer. "A based system for general arrays". APL Quote Quad Volume 12, Issue 2. 1981-12.
  2. Jacob Brickman. "Behavior of APL primitives in a system with depth-1 scalars (APL+1)". APL-Germany APL-Journal 2019 1-2 (pdf).
APL features [edit]
Built-ins Primitives (functions, operators) ∙ Quad name
Array model ShapeRankDepthBoundIndex (Indexing) ∙ AxisRavelRavel orderElementScalarVectorMatrixSimple scalarSimple arrayNested arrayCellMajor cellSubarrayEmpty arrayPrototype
Data types Number (Boolean, Complex number) ∙ Character (String) ∙ BoxNamespaceFunction array
Concepts and paradigms Conformability (Scalar extension, Leading axis agreement) ∙ Scalar function (Pervasion) ∙ Identity elementComplex floorArray ordering (Total) ∙ Tacit programming (Function composition, Close composition) ∙ GlyphLeading axis theoryMajor cell searchFirst-class function
Errors LIMIT ERRORRANK ERRORSYNTAX ERRORDOMAIN ERRORLENGTH ERRORINDEX ERRORVALUE ERROREVOLUTION ERROR