Array model: Difference between revisions

2,423 bytes added ,  20:04, 29 September 2020
→‎External links: Add references
Miraheze>Marshall
(→‎External links: Add references)
(19 intermediate revisions by 6 users not shown)
Line 1: Line 1:
The distinguishing feature of APL and the array language family is its focus on arrays. In most array languages the array is the only first class datatype. While this sounds like a very strict model of language design, in fact it imposes no restrictions at all: any kind of data can be treated as a [[scalar]], or array with rank 0!
:''This page describes the array datatype as defined by array languages. For the role of arrays in [[APL syntax]], see [[Array]].''
 
The distinguishing feature of APL and the array language family is its focus on '''arrays'''. In most array languages the array is the only first class datatype. While this sounds like a very strict model of language design, in fact it imposes no restrictions at all: any kind of data can be treated as a [[scalar]], or array with rank 0!


APL's array model is distinct from and richer than the one-dimensional data structures given the name "array" in languages such as Python, JavaScript, and Java. These structures correspond to APL vectors, sometimes with the requirement that all elements have the same type. In APL it is the arrangement of data into a multidimensional shape, and not any requirement about the way it is stored or the type of its elements, that defines an array. APL arrays are most closely related to multidimensional FORTRAN or C arrays.
APL's array model is distinct from and richer than the one-dimensional data structures given the name "array" in languages such as Python, JavaScript, and Java. These structures correspond to APL vectors, sometimes with the requirement that all elements have the same type. In APL it is the arrangement of data into a multidimensional shape, and not any requirement about the way it is stored or the type of its elements, that defines an array. APL arrays are most closely related to multidimensional FORTRAN or C arrays.


An array is a rectangular collection of [[Element|elements]], arranged along zero or more [[Axis|axes]]. The number of axes is called the array's [[rank]] while their lengths make up the [[shape]]. Names are given to arrays with particular ranks:
An array is a rectangular collection of [[element]]s, arranged along zero or more [[Axis|axes]]. The number of axes is called the array's [[rank]] while their lengths make up the [[shape]]. Names are given to arrays with particular ranks:
* An array with 0 axes is a [[scalar]].
* An array with 0 axes is a [[scalar]].
* An array with 1 axis is a [[vector]].
* An array with 1 axis is a [[vector]].
Line 9: Line 11:
An array's shape is then a vector whose elements are axis lengths, and its rank is a scalar.
An array's shape is then a vector whose elements are axis lengths, and its rank is a scalar.


The largest divide among APLs hinges on the definition of "element" above. In [[#Flat array theory|flat array theory]] elements are not arrays: they are simple data such as characters and numbers. In [[#Nested array theory|nested array theory]] the elements can only be arrays, and characters and numbers are held to be contained in [[simple scalars]], arrays which by convention contain themselves as elements. In each case arrays may be considered to be homogeneous because all of their elements have the same type. In flat array languages this means arrays are restricted to contain only one type of data; in nested arrays it means that the elements of an array are all of the array "type"—that is, the type of all first-class values in the language!
The largest divide among APLs hinges on the definition of "element" above. In [[#Flat array theory|flat array theory]] elements are not arrays: they are simple data such as characters and numbers. In [[#Nested array theory|nested array theory]] the elements can only be arrays, and characters and numbers are held to be contained in [[simple scalar]]s, arrays which by convention contain themselves as elements. In each case arrays may be considered to be homogeneous because all of their elements have the same type. In flat array languages this means arrays are restricted to contain only one type of data; in nested arrays it means that the elements of an array are all of the array "type"—that is, the type of all first-class values in the language!


In a language with [[stranding]], such as [[APL2]], creating a 1-dimensional array is very simple: just write the elements next to each other, separated by spaces. Nested APLs such as APL2 allow any array to be used as an element, including scalar numbers or characters (written with quotes) as well as larger arrays. In order to include a stranded array inside another array it must be parenthesized.
In a language with [[stranding]], such as [[APL2]], creating a 1-dimensional array is very simple: just write the elements next to each other, separated by spaces. Nested APLs such as APL2 allow any array to be used as an element, including scalar numbers or characters (written with quotes) as well as larger arrays. In order to include a stranded array inside another array it must be parenthesized.
Line 21: Line 23:
== Flat array theory ==
== Flat array theory ==


In [[Iverson notation]] arrays were considered to contain numbers (or [[Booleans]], before these were unified with ordinary numbers), which are not themselves arrays. The property that array elements are some non-array type is the defining feature of flat array theory.
In [[Iverson notation]] arrays were considered to contain numbers (or [[Boolean]]s, before these were unified with ordinary numbers), which are not themselves arrays. The property that array elements are some non-array type is the defining feature of flat array theory.


Flat APLs impose the rule that all elements of arrays have the same type, such as all character or all numeric. IBM's [[APL\360]] was likely the first to specify this rule explicitly, and it has been maintained in newer languages such as [[SHARP APL]] and [[J]]. Although it is possible to discard this rule, resulting in an inhomogeneous array theory that allows arrays to contain elements of mixed type, but not other arrays, no APL to date has done this.
Flat APLs impose the rule that all elements of arrays have the same type, such as all character or all numeric. IBM's [[APL\360]] was likely the first to specify this rule explicitly, and it has been maintained in newer languages such as [[SHARP APL]] and [[J]]. Although it is possible to discard this rule, resulting in an inhomogeneous array theory that allows arrays to contain elements of mixed type, but not other arrays, no APL to date has done this.
Line 30: Line 32:
In order to allow programmers to work with inhomogeneous or nested data, flat array languages may define a special kind of element which "encloses" or "boxes" an array. Then there are three allowed element types for an array: character, numeric, and boxed.
In order to allow programmers to work with inhomogeneous or nested data, flat array languages may define a special kind of element which "encloses" or "boxes" an array. Then there are three allowed element types for an array: character, numeric, and boxed.


While a boxed array represents a collection of arrays, it is not considered to contain those arrays—its [[elements]] are boxes, and not their contents. For this reason [[scalar functions]] do not reach into boxes: they act on the elements of an array directly. Thus [[Equals]] on two boxed arrays compares them with a single Boolean result to indicate whether the boxed arrays [[match]].
While a boxed array represents a collection of arrays, it is not considered to contain those arrays—its [[element]]s are boxes, and not their contents. For this reason [[scalar function]]s do not reach into boxes: they act on the elements of an array directly. Thus [[Equal to]] on two boxes compares them, with a single Boolean result indicating whether the arrays inside the boxes [[match]].
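
The difference between a box and its contents can be sketched in Python. This is only an illustration under stated assumptions: the <code>Box</code> class and the <code>equal_to</code> function are hypothetical names, not part of any APL or J implementation, and array contents are modelled as plain Python tuples.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a boxed element: an opaque wrapper around one array."""
    contents: Any


def equal_to(x, y):
    """Sketch of a scalar comparison that does not reach into boxes.

    Two boxes give a single Boolean (1 or 0) saying whether their contents
    match as a whole; the contents are never compared element by element.
    """
    if isinstance(x, Box) and isinstance(y, Box):
        return int(x.contents == y.contents)
    return int(x == y)


a = Box((1, 2, 3))
b = Box((1, 2, 3))
c = Box((1, 2, 4))
print(equal_to(a, b))  # 1: the boxed arrays match
print(equal_to(a, c))  # 0: one result, not one result per element
```

Comparing <code>a</code> with <code>c</code> gives one result for the pair of boxes rather than the three element-wise results that comparing the unboxed contents would give.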


== Nested array theory ==
== Nested array theory ==


A second version of the APL array model was developed in order to more transparently handle nested data, without the need to explicitly box and unbox arrays. The Nested Array Research System ([[NARS]]) was developed to study this model. In it, arrays contain other arrays directly. In this way they resemble the [https://en.wikipedia.org/wiki/Inductive_type inductive types] used in type theory.
A second version of the APL array model was developed in order to more transparently handle nested data, without the need to explicitly box and unbox arrays. The Nested Array Research System ([[NARS]]) was developed to study this model. In it, arrays contain other arrays directly. In this way they resemble the [[wikipedia:inductive type|inductive type]]s used in type theory.


In nested APLs each individual number or character is encapsulated in a [[simple scalar]]. Such a scalar may be referred to as "a number" or "a character" but it maintains the properties of an array. Other arrays used by the language are defined inductively: an array can be formed which contains as elements any array which has already been defined. Such an array cannot, by the nature of inductive definition, contain itself, even within many levels of nesting inside it. Arrays which contain only simple scalars, or are themselves simple scalars, are called [[Simple array|simple]]. Non-simple arrays are called "nested". The simple arrays are a superset of the arrays allowed in flat array theory without boxes: they include all arrays of numbers and characters, as well as arrays which mix numbers and characters. Arrays which would not be representable in flat array theory—those which contain a mixture of simple scalar types, or contain both simple scalars and other arrays—are called [[Mixed array|mixed]].
In nested APLs each individual number or character is encapsulated in a [[simple scalar]]. Such a scalar may be referred to as "a number" or "a character" but it maintains the properties of an array. Other arrays used by the language are defined inductively: an array can be formed which contains as elements any array which has already been defined. Such an array cannot, by the nature of inductive definition, contain itself, even within many levels of nesting inside it. Arrays which contain only simple scalars, or are themselves simple scalars, are called [[Simple array|simple]]. Non-simple arrays are called "nested". The simple arrays are a superset of the arrays allowed in flat array theory without boxes: they include all arrays of numbers and characters, as well as arrays which mix numbers and characters. Arrays which would not be representable in flat array theory—those which contain a mixture of simple scalar types, or contain both simple scalars and other arrays—are called [[Mixed array|mixed]].
Line 48: Line 50:
With floating arrays a simple scalar may be thought of as an infinite stack of scalar arrays. Any attempt to [[Enclose]] or [[Disclose]] this array results in the same kind of infinite stack, so it should be identical to the scalar itself.
With floating arrays a simple scalar may be thought of as an infinite stack of scalar arrays. Any attempt to [[Enclose]] or [[Disclose]] this array results in the same kind of infinite stack, so it should be identical to the scalar itself.


Floating arrays represent a departure from a true [https://en.wikipedia.org/wiki/Inductive_type inductive type]. To produce floating array theory from type theory, fixed arrays must be defined using simple scalars and inductive definition, and then simple scalars and scalar arrays containing them must be explicitly identified. Not making this identification would result in an array model not present in any APL: a "fixed" rather than "floating" nested array theory.
Floating arrays represent a departure from a true [[wikipedia:inductive type|inductive type]]. To produce floating array theory from type theory, fixed arrays must be defined using simple scalars and inductive definition, and then simple scalars and scalar arrays containing them must be explicitly identified. Not making this identification would result in an array model not present in any APL: a "fixed" rather than "floating" nested array theory.


Flat array theory is often called "grounded" in contrast to "floating" nested array theory.
Flat array theory is often called "grounded" in contrast to "floating" nested array theory.
== Based array theory ==
Based array theory discards the principle that all data should be stored in arrays, instead defining basic types such as characters and numbers independently of arrays and arrays as a collection type—possibly one of many—that can contain any data. This model does not have any widely accepted name, with the term "based system" introduced in an [[APL Quote Quad]] paper in 1981.<ref>Randall Mercer. [https://dl.acm.org/doi/abs/10.1145/586656.586663 "A based system for general arrays"]. [[APL Quote Quad]] Volume 12, Issue 2. 1981-12.</ref> However, as it is the natural model when arrays are added to an existing programming system, it is common in array libraries such as [[NumPy]], [[wikipedia:ILNumerics|ILNumerics]], and [[wikipedia:Haskell (programming language)|Haskell]]'s [https://hackage.haskell.org/package/repa Repa], as well as the language [[Julia]]. It is used by the APL-family language [[BQN]].
=== Mutable based arrays ===
In many languages with this array style, such as NumPy and Julia, the arrays are [[wikipedia:Immutable object|mutable]], meaning that several references to the same array can exist, and a change made through any one of them is visible through all the others. In contrast, APL operations that appear to modify an array, like [[indexed assignment]], will only change the particular copy of the array used, and can be said to create a new array rather than change an existing one: there is no special connection between the old and modified array. Mutable arrays make it possible for an array to contain itself, by replacing one element of an existing array with the whole array. This means that more values are possible than in an immutable based array language, and that some properties of immutable arrays, such as a finite [[depth]], do not hold.
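
A minimal Python sketch of both effects follows, assuming a built-in Python list as a stand-in for a mutable based array. The <code>depth</code> helper is a hypothetical illustration, not a library function.

```python
# A Python list stands in for a mutable based array (an assumption for illustration).
a = [1, 2, 3]
b = a              # a second reference to the same array, not an independent copy
b[0] = 99
print(a[0])        # 99: the change is visible through both names

a[1] = a           # replace one element of the array with the whole array
print(a[1] is a)   # True: the array now contains itself


def depth(x):
    """Naive nesting depth: 0 for non-lists, else 1 + the deepest element."""
    if not isinstance(x, list):
        return 0
    return 1 + max((depth(e) for e in x), default=0)


try:
    depth(a)
    finite = True
except RecursionError:
    finite = False     # the recursion never bottoms out for a self-containing array
print(finite)          # False
```

The recursion error is how the loss of a finite depth shows up in this sketch: the naive definition never reaches a base case once the array contains itself.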


== Other features of the array model ==
== Other features of the array model ==
Line 64: Line 74:
=== Numeric type coercion ===
=== Numeric type coercion ===


Most APLs, flat or nested, implicitly store simple numeric arrays as one of many [[Numeric type|numeric types]]. When a numeric array is formed from numbers with different types, all numbers are converted to a common type in order to be represented as a flat array. If the hierarchy of numeric types is not strict, that is, there are some pairs of numeric types for which neither type is a subset of the other, then this coercion may affect the behavior of the numbers in the array. For example, [[J]] on a 64-bit machine uses both 64-bit integers and [https://en.wikipedia.org/wiki/IEEE_754 double-precision floats]. [[Catenate|Catenating]] the two results in an array of doubles, which will lose precision for integers whose absolute value is larger than 2<sup>53</sup>. In [[Dyalog APL]] a similar issue occurs with [[decimal floats]] and [[complex numbers]]: combining the two results in an array of complex numbers, but this loses precision since Dyalog's complex numbers are stored as pairs of double-precision floats and its 128-bit decimal floats have higher precision that doubles.
Most APLs, flat or nested, implicitly store simple numeric arrays as one of many [[numeric type]]s. When a numeric array is formed from numbers with different types, all numbers are converted to a common type in order to be represented as a flat array. If the hierarchy of numeric types is not strict, that is, there are some pairs of numeric types for which neither type is a subset of the other, then this coercion may affect the behavior of the numbers in the array. For example, [[J]] on a 64-bit machine uses both 64-bit integers and [[wikipedia:IEEE_754|double-precision floats]]. [[Catenate|Catenating]] the two results in an array of doubles, which will lose precision for integers whose absolute value is larger than 2<sup>53</sup>. In [[Dyalog APL]] a similar issue occurs with [[decimal float]]s and [[complex number]]s: combining the two results in an array of complex numbers, but this loses precision since Dyalog's complex numbers are stored as pairs of double-precision floats and its 128-bit decimal floats have higher precision than doubles.
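
The precision loss above 2<sup>53</sup> can be reproduced in Python, whose <code>float</code> is an IEEE 754 double. This sketch models only the integer-to-double conversion; it is not J code, and the variable names are illustrative.

```python
big = 2**53 + 1              # fits in a 64-bit integer
as_double = float(big)       # the value a double-precision array would store
print(int(as_double))        # 9007199254740992, that is, 2**53
print(int(as_double) == big) # False: the low bit was lost in the conversion

# Coercing a mixed list to a common double type, as catenation would:
mixed = [big, 0.5]
coerced = [float(x) for x in mixed]
print(coerced[0] == 2.0**53) # True: the integer no longer has its original value
```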


== Array characteristics ==
== Array characteristics ==
APL defines the following characteristics of an array. All information about an array is contained in two [[vector]]s: its [[shape]] and [[ravel]], including, for empty arrays, the ravel's [[prototype]].
* The [[rank]] is the number of dimensions or [[axes]] it has. It is the length of the shape.
* The [[shape]] gives its length along each dimension.
* The [[bound]] is the total number of elements in an array, that is, the product of the shape.
* The [[ravel]] is a [[vector]] containing all of the array's [[element]]s. Its length is the bound.
* The [[prototype]] is a special "null" element for the array. It is derived from the first element for non-empty arrays.
* The [[depth]] is a number indicating how deeply nested an array is.
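
The relationships among rank, shape, bound and ravel can be sketched as a small Python class. The <code>Array</code> class is a hypothetical model for illustration only; it leaves out the prototype and depth, and assumes the ravel is given as a flat sequence.

```python
from math import prod


class Array:
    """Hypothetical model: an array described by its shape and its ravel."""

    def __init__(self, shape, ravel):
        shape, ravel = tuple(shape), list(ravel)
        if prod(shape) != len(ravel):
            raise ValueError("ravel length must equal the product of the shape")
        self.shape = shape
        self.ravel = ravel

    @property
    def rank(self):
        # The rank is the length of the shape.
        return len(self.shape)

    @property
    def bound(self):
        # The bound is the product of the shape (1 for a scalar's empty shape).
        return prod(self.shape)


m = Array((2, 3), [1, 2, 3, 4, 5, 6])   # a 2-by-3 matrix
s = Array((), [7])                      # a scalar: empty shape, one element
print(m.rank, m.bound)  # 2 6
print(s.rank, s.bound)  # 0 1
```

The scalar case works because the product of an empty shape is 1, so a rank-0 array still has exactly one element in its ravel.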


=== Depth ===
=== Depth ===
Line 75: Line 93:
In nested APLs, a simple non-scalar array has depth 1, an array containing only depth 1 arrays has depth 2, and a simple scalar (e.g. a number or character) has depth 0.
In nested APLs, a simple non-scalar array has depth 1, an array containing only depth 1 arrays has depth 2, and a simple scalar (e.g. a number or character) has depth 0.


Most APLs provide a Depth function <code>≡</code> to find an array's depth. For example:
Most APLs provide a Depth function <source lang=apl inline>≡</source> to find an array's depth. For example:
<source lang=apl>
<source lang=apl>
       ≡('ab' 'cde')('fg' 'hi')
       ≡('ab' 'cde')('fg' 'hi')
Line 82: Line 100:


APLs vary in their definition of depth: for example some may return the depth with a sign to indicate that some level of the array mixes elements of different depths.
APLs vary in their definition of depth: for example some may return the depth with a sign to indicate that some level of the array mixes elements of different depths.
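
One way to read the nested-APL convention above is as a recursive function. The Python sketch below is an assumption-laden illustration, not any APL's definition: vectors are modelled as Python lists, anything else is treated as a simple scalar, and the signed-depth variant is ignored by always taking the maximum over the elements.

```python
def depth(x):
    """Depth of a value: 0 for a simple scalar, else 1 + the deepest element."""
    if not isinstance(x, list):
        return 0
    return 1 + max((depth(e) for e in x), default=0)


print(depth(5))            # 0: a simple scalar
print(depth([1, 2, 3]))    # 1: a simple vector

# Character vectors are modelled as lists of single characters.
nested = [[list("ab"), list("cde")], [list("fg"), list("hi")]]
print(depth(nested))       # 3: vectors of vectors of character vectors
```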
=== Rank ===
{{Main|Rank}}
The concept of rank is central to APL and is absent from many other languages. It refers to the number of dimensions in the array. As noted above, rank-0 arrays are scalars and rank-1 arrays are vectors; rank-2 arrays are usually called matrices or tables.


== External links ==
== External links ==
[http://help.dyalog.com/latest/Content/Language/Introduction/Variables/Arrays.htm Formal definition]
[https://chat.stackexchange.com/rooms/52405/conversation/lesson-1-introduction-to-arrays-in-apl Chat lesson]


[http://help.dyalog.com/latest/Content/Language/Introduction/Variables/Vector%20Notation.htm Vector notation]
* [https://help.dyalog.com/latest/index.htm#Language/Introduction/Variables/Arrays.htm Dyalog array model]
* [https://chat.stackexchange.com/rooms/52405/conversation/lesson-1-introduction-to-arrays-in-apl APL Cultivation]
* [https://www.sacrideo.us/tag/apl-a-day/ APL a Day] series
* [https://www.jsoftware.com/papers/array.htm What is an Array?] by [[Roger Hui]] (in [[J]])


{{APL programming language}}
== References ==
<references />
{{APL features}}[[Category:Arrays| ]]
