Array model: Difference between revisions

From APL Wiki
Jump to navigation Jump to search
Miraheze>Adám Brudzewsky
m (Text replacement - "<code>" to "<source lang=apl inline>")
(Disambiguate)
(12 intermediate revisions by 6 users not shown)
Line 1: Line 1:
The distinguishing feature of APL and the array language family is its focus on arrays. In most array languages the array is the only first class datatype. While this sounds like a very strict model of language design, in fact it imposes no restrictions at all: any kind of data can be treated as a [[scalar]], or array with rank 0!
:''This page describes the array datatype as defined by array languages. For the role of arrays in [[APL syntax]], see [[Array]].''
 
The distinguishing feature of APL and the array language family is its focus on '''arrays'''. In most array languages the array is the only first class datatype. While this sounds like a very strict model of language design, in fact it imposes no restrictions at all: any kind of data can be treated as a [[scalar]], or array with rank 0!


APL's array model is distinct from and richer than the one-dimensional data structures given the name "array" in languages such as Python, Javascript, and Java. These structures correspond to APL vectors, sometimes with the requirement that all elements have the same type. In APL it is the arrangement of data into a multidimensional shape, and not any requirement about the way it is stored or the type of its elements, that defines an array. APL arrays are most closely related to multidimensional FORTRAN or C arrays.
APL's array model is distinct from and richer than the one-dimensional data structures given the name "array" in languages such as Python, Javascript, and Java. These structures correspond to APL vectors, sometimes with the requirement that all elements have the same type. In APL it is the arrangement of data into a multidimensional shape, and not any requirement about the way it is stored or the type of its elements, that defines an array. APL arrays are most closely related to multidimensional FORTRAN or C arrays.


An array is a rectangular collection of [[Element|elements]], arranged along zero or more [[Axis|axes]]. The number of axes is called the array's [[rank]] while their lengths make up the [[shape]]. Names are given to arrays with particular ranks:
An array is a rectangular collection of [[element]]s, arranged along zero or more [[Axis|axes]]. The number of axes is called the array's [[rank]] while their lengths make up the [[shape]]. Names are given to arrays with particular ranks:
* An array with 0 axes is a [[scalar]].
* An array with 0 axes is a [[scalar]].
* An array with 1 axis is a [[vector]].
* An array with 1 axis is a [[vector]].
Line 9: Line 11:
An array's shape is then a vector whose elements are axis lengths, and its rank is a scalar.
An array's shape is then a vector whose elements are axis lengths, and its rank is a scalar.


The largest divide among APLs hinges on the definition of "element" above. In [[#Flat array theory|flat array theory]] elements are not arrays: they are simple data such as characters and numbers. In [[#Nested array theory|nested array theory]] the elements can only be arrays, and characters and numbers are held to be contained in [[simple scalars]], arrays which by convention contain themselves as elements. In each case arrays may be considered to be homogeneous because all of their elements have the same type. In flat array languages this means arrays are restricted to contain only one type of data; in nested arrays it means that the elements of an array are all of the array "type"—that is, the type of all first-class values in the language!
The largest divide among APLs hinges on the definition of "element" above. In [[#Flat array theory|flat array theory]] elements are not arrays: they are simple data such as characters and numbers. In [[#Nested array theory|nested array theory]] the elements can only be arrays, and characters and numbers are held to be contained in [[simple scalar]]s, arrays which by convention contain themselves as elements. In each case arrays may be considered to be homogeneous because all of their elements have the same type. In flat array languages this means arrays are restricted to contain only one type of data; in nested arrays it means that the elements of an array are all of the array "type"—that is, the type of all first-class values in the language!


In a language with [[stranding]], such as [[APL2]], creating a 1-dimensional array is very simple: just write the elements next to each other, separated by spaces. Nested APLs such as APL2 allow any array to be used as an element, including scalar numbers or characters (written with quotes) as well as larger arrays. In order to include a stranded array inside another array it must be parenthesized.
In a language with [[stranding]], such as [[APL2]], creating a 1-dimensional array is very simple: just write the elements next to each other, separated by spaces. Nested APLs such as APL2 allow any array to be used as an element, including scalar numbers or characters (written with quotes) as well as larger arrays. In order to include a stranded array inside another array it must be parenthesized.
Line 16: Line 18:


== "Array languages" without arrays ==
== "Array languages" without arrays ==
:''See also: [[Timeline of influential array languages]]''


Some languages, despite deriving from APL, do not use APL-style arrays at all! Examples include [[K]] and [[I]], which only have vectors, and [[MATLAB]], which has true multidimensional arrays but usually treats data as matrices. Languages like K are usually considered part of the array language or APL family, but may or may not be considered array languages themselves.
Some languages, despite deriving from APL, do not use APL-style arrays at all! Examples include [[K]] and [[I]], which only have vectors, and [[MATLAB]], which has true multidimensional arrays but usually treats data as matrices. Languages like K are usually considered part of the array language or APL family, but may or may not be considered array languages themselves.
Line 21: Line 24:
== Flat array theory ==
== Flat array theory ==


In [[Iverson notation]] arrays were considered to contain numbers (or [[Booleans]], before these were unified with ordinary numbers), which are not themselves arrays. The property that array elements are some non-array type is the defining feature of flat array theory.
In [[Iverson notation]] arrays were considered to contain numbers (or [[Boolean]]s, before these were unified with ordinary numbers), which are not themselves arrays. The property that array elements are some non-array type is the defining feature of flat array theory.


Flat APLs impose the rule that all elements of arrays have the same type, such as all character or all numeric. IBM's [[APL\360]] was likely the first to specify this rule explicitly, and it has been maintained in newer languages such as [[SHARP APL]] and [[J]]. Although it is possible to discard this rule, resulting in an inhomogeneous array theory that allows arrays to contain elements of mixed type, but not other arrays, no APL to date has done this.
Flat APLs impose the rule that all elements of arrays have the same type, such as all character or all numeric. IBM's [[APL\360]] was likely the first to specify this rule explicitly, and it has been maintained in newer languages such as [[SHARP APL]] and [[J]]. Although it is possible to discard this rule, resulting in an inhomogeneous array theory that allows arrays to contain elements of mixed type, but not other arrays, no APL to date has done this.
Line 30: Line 33:
In order to allow programmers to work with inhomogeneous or nested data, flat array languages may define a special kind of element which "encloses" or "boxes" an array. Then there are three allowed element types for an array: character, numeric, and boxed.
In order to allow programmers to work with inhomogeneous or nested data, flat array languages may define a special kind of element which "encloses" or "boxes" an array. Then there are three allowed element types for an array: character, numeric, and boxed.


While a boxed array represents a collection of arrays, it is not considered to contain those arrays—its [[elements]] are boxes, and not their contents. For this reason [[scalar functions]] do not reach into boxes: they act on the elements of an array directly. Thus [[Equals]] on two boxed arrays compares them with a single Boolean result to indicate whether the boxed arrays [[match]].
While a boxed array represents a collection of arrays, it is not considered to contain those arrays—its [[element]]s are boxes, and not their contents. For this reason [[scalar function]]s do not reach into boxes: they act on the elements of an array directly. Thus [[Equal to]] on two boxes compares them, with a single Boolean result indicating whether the arrays inside the boxes [[match]].


== Nested array theory ==
== Nested array theory ==


A second version of the APL array model was developed in order to more transparently handle nested data, without the need to explicitly box and unbox arrays. The Nested Array Research System ([[NARS]]) was developed to study this model. In it, arrays contain other arrays directly. In this way they resemble the [https://en.wikipedia.org/wiki/Inductive_type inductive types] used in type theory.
A second version of the APL array model was developed in order to more transparently handle nested data, without the need to explicitly box and unbox arrays. The Nested Array Research System ([[NARS]]) was developed to study this model. In it, arrays contain other arrays directly. In this way they resemble the [[wikipedia:inductive type|inductive type]]s used in type theory.


In nested APLs each individual number or character is encapsulated in a [[simple scalar]]. Such a scalar may be referred to as "a number" or "a character" but it maintains the properties of an array. Other arrays used by the language are defined inductively: an array can be formed which contains as elements any array which has already been defined. Such an array cannot, by the nature of inductive definition, contain itself, even within many levels of nesting inside it. Arrays which contain only simple scalars, or are themselves simple scalars, are called [[Simple array|simple]]. Non-simple arrays are called "nested". The simple arrays are a superset of the arrays allowed in flat array theory without boxes: they include all arrays of numbers and characters, as well as arrays which mix numbers and characters. Arrays which would not be representable in flat array theory—those which contain a mixture of simple scalar types, or contain both simple scalars and other arrays—are called [[Mixed array|mixed]].
In nested APLs each individual number or character is encapsulated in a [[simple scalar]]. Such a scalar may be referred to as "a number" or "a character" but it maintains the properties of an array. Other arrays used by the language are defined inductively: an array can be formed which contains as elements any array which has already been defined. Such an array cannot, by the nature of inductive definition, contain itself, even within many levels of nesting inside it. Arrays which contain only simple scalars, or are themselves simple scalars, are called [[Simple array|simple]]. Non-simple arrays are called "nested". The simple arrays are a superset of the arrays allowed in flat array theory without boxes: they include all arrays of numbers and characters, as well as arrays which mix numbers and characters. Arrays which would not be representable in flat array theory—those which contain a mixture of simple scalar types, or contain both simple scalars and other arrays—are called [[Mixed array|mixed]].
Line 48: Line 51:
With floating arrays a simple scalar may be thought of as an infinite stack of scalar arrays. Any attempt to [[Enclose]] or [[Disclose]] this array results in the same kind of infinite stack, so it should be identical to the scalar itself.
With floating arrays a simple scalar may be thought of as an infinite stack of scalar arrays. Any attempt to [[Enclose]] or [[Disclose]] this array results in the same kind of infinite stack, so it should be identical to the scalar itself.


Floating arrays represent a departure from a true [https://en.wikipedia.org/wiki/Inductive_type inductive type]. To produce floating array theory from type theory, fixed arrays must be defined using simple scalars and inductive definition, and then simple scalars and scalar arrays containing them must be explicitly identified. Not making this identification would result in an array model not present in any APL: a "fixed" rather than "floating" nested array theory.
Floating arrays represent a departure from a true [[wikipedia:inductive type|inductive type]]. To produce floating array theory from type theory, fixed arrays must be defined using simple scalars and inductive definition, and then simple scalars and scalar arrays containing them must be explicitly identified. Not making this identification would result in an array model not present in any APL: a "fixed" rather than "floating" nested array theory.


Flat array theory is often called "grounded" in contrast to "floating" nested array theory.
Flat array theory is often called "grounded" in contrast to "floating" nested array theory.
Line 64: Line 67:
=== Numeric type coercion ===
=== Numeric type coercion ===


Most APLs, flat or nested, implicitly store simple numeric arrays as one of many [[Numeric type|numeric types]]. When a numeric array is formed from numbers with different types, all numbers are converted to a common type in order to be represented as a flat array. If the hierarchy of numeric types is not strict, that is, there are some pairs of numeric types for which neither type is a subset of the other, then this coercion may affect the behavior of the numbers in the array. For example, [[J]] on a 64-bit machine uses both 64-bit integers and [https://en.wikipedia.org/wiki/IEEE_754 double-precision floats]. [[Catenate|Catenating]] the two results in an array of doubles, which will lose precision for integers whose absolute value is larger than 2<sup>53</sup>. In [[Dyalog APL]] a similar issue occurs with [[decimal floats]] and [[complex numbers]]: combining the two results in an array of complex numbers, but this loses precision since Dyalog's complex numbers are stored as pairs of double-precision floats and its 128-bit decimal floats have higher precision that doubles.
Most APLs, flat or nested, implicitly store simple numeric arrays as one of many [[numeric type]]s. When a numeric array is formed from numbers with different types, all numbers are converted to a common type in order to be represented as a flat array. If the hierarchy of numeric types is not strict, that is, there are some pairs of numeric types for which neither type is a subset of the other, then this coercion may affect the behavior of the numbers in the array. For example, [[J]] on a 64-bit machine uses both 64-bit integers and [[wikipedia:IEEE_754|double-precision floats]]. [[Catenate|Catenating]] the two results in an array of doubles, which will lose precision for integers whose absolute value is larger than 2<sup>53</sup>. In [[Dyalog APL]] a similar issue occurs with [[decimal float]]s and [[complex number]]s: combining the two results in an array of complex numbers, but this loses precision since Dyalog's complex numbers are stored as pairs of double-precision floats and its 128-bit decimal floats have higher precision than doubles.


== Array characteristics ==
== Array characteristics ==


APL defines the following characteristics of an array. All information about an array is contained in two [[Vector|vectors]]: its [[shape]] and [[ravel]], including, for empty arrays, the ravel's [[prototype]].
APL defines the following characteristics of an array. All information about an array is contained in two [[vector]]s: its [[shape]] and [[ravel]], including, for empty arrays, the ravel's [[prototype]].
* The [[rank]] is the number of dimensions or [[axes]] it has. It is the length of the shape.
* The [[rank]] is the number of dimensions or [[axes]] it has. It is the length of the shape.
* The [[shape]] gives its length along each dimension.
* The [[shape]] gives its length along each dimension.
* The [[bound]] is the total number of elements in an array, that is, the product of the shape.
* The [[bound]] is the total number of elements in an array, that is, the product of the shape.
* The [[ravel]] is a [[vector]] containing all of the array's [[Element|elements]]. It's length is the bound.
* The [[ravel]] is a [[vector]] containing all of the array's [[element]]s. Its length is the bound.
* The [[Array prototype|prototype]] is a special "null" element for the array. It is derived from the first element for non-empty arrays.
* The [[prototype]] is a special "null" element for the array. It is derived from the first element for non-empty arrays.
* The [[depth]] is a number indicating how deeply nested an array is.
* The [[depth]] is a number indicating how deeply nested an array is.


Line 83: Line 86:
In nested APLs, a simple non-scalar array has depth 1, an array containing only depth 1 arrays has depth 2, and a simple scalar (e.g a number or character) has depth 0.
In nested APLs, a simple non-scalar array has depth 1, an array containing only depth 1 arrays has depth 2, and a simple scalar (e.g a number or character) has depth 0.


Most APLs provide a Depth function <source lang=apl inline>≡</code> to find an array's depth. For example:
Most APLs provide a Depth function <source lang=apl inline>≡</source> to find an array's depth. For example:
<source lang=apl>
<source lang=apl>
       ≡('ab' 'cde')('fg' 'hi')
       ≡('ab' 'cde')('fg' 'hi')
Line 92: Line 95:


== External links ==
== External links ==
[http://help.dyalog.com/latest/Content/Language/Introduction/Variables/Arrays.htm Formal definition]
[https://chat.stackexchange.com/rooms/52405/conversation/lesson-1-introduction-to-arrays-in-apl Chat lesson]
[http://help.dyalog.com/latest/Content/Language/Introduction/Variables/Vector%20Notation.htm Vector notation]


{{APL programming language}}
* [http://help.dyalog.com/latest/index.htm#Language/Introduction/Variables/Arrays.htm Dyalog array model]
* [https://chat.stackexchange.com/rooms/52405/conversation/lesson-1-introduction-to-arrays-in-apl APL Cultivation]
* [https://www.sacrideo.us/tag/apl-a-day/ APL a Day] series
* [https://www.jsoftware.com/papers/array.htm What is an Array?] by [[Roger Hui]] (in [[J]])
{{APL features}}[[Category:Arrays| ]]

Revision as of 09:04, 13 May 2020

This page describes the array datatype as defined by array languages. For the role of arrays in APL syntax, see Array.

The distinguishing feature of APL and the array language family is its focus on arrays. In most array languages the array is the only first class datatype. While this sounds like a very strict model of language design, in fact it imposes no restrictions at all: any kind of data can be treated as a scalar, or array with rank 0!

APL's array model is distinct from and richer than the one-dimensional data structures given the name "array" in languages such as Python, Javascript, and Java. These structures correspond to APL vectors, sometimes with the requirement that all elements have the same type. In APL it is the arrangement of data into a multidimensional shape, and not any requirement about the way it is stored or the type of its elements, that defines an array. APL arrays are most closely related to multidimensional FORTRAN or C arrays.

An array is a rectangular collection of elements, arranged along zero or more axes. The number of axes is called the array's rank while their lengths make up the shape. Names are given to arrays with particular ranks:

  • An array with 0 axes is a scalar.
  • An array with 1 axis is a vector.
  • An array with 2 axes is a matrix.

An array's shape is then a vector whose elements are axis lengths, and its rank is a scalar.

The largest divide among APLs hinges on the definition of "element" above. In flat array theory elements are not arrays: they are simple data such as characters and numbers. In nested array theory the elements can only be arrays, and characters and numbers are held to be contained in simple scalars, arrays which by convention contain themselves as elements. In each case arrays may be considered to be homogeneous because all of their elements have the same type. In flat array languages this means arrays are restricted to contain only one type of data; in nested arrays it means that the elements of an array are all of the array "type"—that is, the type of all first-class values in the language!

In a language with stranding, such as APL2, creating a 1-dimensional array is very simple: just write the elements next to each other, separated by spaces. Nested APLs such as APL2 allow any array to be used as an element, including scalar numbers or characters (written with quotes) as well as larger arrays. In order to include a stranded array inside another array it must be parenthesized.

In most APLs, "string" is just a term for a character vector, and a string may be written with single quotes around the entire string.

"Array languages" without arrays

See also: Timeline of influential array languages

Some languages, despite deriving from APL, do not use APL-style arrays at all! Examples include K and I, which only have vectors, and MATLAB, which has true multidimensional arrays but usually treats data as matrices. Languages like K are usually considered part of the array language or APL family, but may or may not be considered array languages themselves.

Flat array theory

In Iverson notation arrays were considered to contain numbers (or Booleans, before these were unified with ordinary numbers), which are not themselves arrays. The property that array elements are some non-array type is the defining feature of flat array theory.

Flat APLs impose the rule that all elements of arrays have the same type, such as all character or all numeric. IBM's APL\360 was likely the first to specify this rule explicitly, and it has been maintained in newer languages such as SHARP APL and J. Although it is possible to discard this rule, resulting in an inhomogeneous array theory that allows arrays to contain elements of mixed type, but not other arrays, no APL to date has done this.

Boxes

Main article: Box

In order to allow programmers to work with inhomogeneous or nested data, flat array languages may define a special kind of element which "encloses" or "boxes" an array. Then there are three allowed element types for an array: character, numeric, and boxed.

While a boxed array represents a collection of arrays, it is not considered to contain those arrays—its elements are boxes, and not their contents. For this reason scalar functions do not reach into boxes: they act on the elements of an array directly. Thus Equal to on two boxes compares them, with a single Boolean result indicating whether the arrays inside the boxes match.

Nested array theory

A second version of the APL array model was developed in order to more transparently handle nested data, without the need to explicitly box and unbox arrays. The Nested Array Research System (NARS) was developed to study this model. In it, arrays contain other arrays directly. In this way they resemble the inductive types used in type theory.

In nested APLs each individual number or character is encapsulated in a simple scalar. Such a scalar may be referred to as "a number" or "a character" but it maintains the properties of an array. Other arrays used by the language are defined inductively: an array can be formed which contains as elements any array which has already been defined. Such an array cannot, by the nature of inductive definition, contain itself, even within many levels of nesting inside it. Arrays which contain only simple scalars, or are themselves simple scalars, are called simple. Non-simple arrays are called "nested". The simple arrays are a superset of the arrays allowed in flat array theory without boxes: they include all arrays of numbers and characters, as well as arrays which mix numbers and characters. Arrays which would not be representable in flat array theory—those which contain a mixture of simple scalar types, or contain both simple scalars and other arrays—are called mixed.

The programmer can Pick an element of an array directly. The resulting element is always an array, even for simple arrays: Pick never returns something which is "just a number". This may be viewed in multiple ways: either an array's elements are in fact always arrays, or Pick and similar functions wrap non-array elements so they are still arrays. An APL could be defined which gives an error rather than allow a program to pick into a simple scalar array. Another choice might be to return a non-array character, but an APL which allowed such values to be used might no longer be considered an array language.

Whether an array language is flat or nested depends only on the language's behavior from the programmer's perspective. Nearly all APLs use homogeneous flat arrays for implementation purposes, with pointers to enclose elements. However, the language's array model is determined by what is presented to the programmer and not what is stored in memory. If a programmer can create and manipulate arrays with both character and numeric data the same way they would work with completely numeric arrays, then the language is nested!

Floating arrays

All nested APLs to date define simple scalars to be "floating", that is, a simple scalar is identical to its enclose (or any scalar array containing only that simple scalar). It is this property that justifies the term "simple scalar": a simple array contains only simple scalars, so a simple array which is scalar must contain a single simple scalar and thus be identical to it. Therefore the only arrays which are both simple and scalar are the simple scalars.

With floating arrays a simple scalar may be thought of as an infinite stack of scalar arrays. Any attempt to Enclose or Disclose this array results in the same kind of infinite stack, so it should be identical to the scalar itself.

Floating arrays represent a departure from a true inductive type. To produce floating array theory from type theory, fixed arrays must be defined using simple scalars and inductive definition, and then simple scalars and scalar arrays containing them must be explicitly identified. Not making this identification would result in an array model not present in any APL: a "fixed" rather than "floating" nested array theory.

Flat array theory is often called "grounded" in contrast to "floating" nested array theory.

Other features of the array model

Array prototypes

Main article: Array prototype

An empty array may carry additional information to indicate what type its elements would be, if it had any. In flat array theory, this is typically just the type: character, numeric, or boxed. In a nested array language, the prototype may be quite complex, containing an entire nested structure.

An array's prototype is used to determine the value of fills when they are required by the language.

Numeric type coercion

Most APLs, flat or nested, implicitly store simple numeric arrays as one of many numeric types. When a numeric array is formed from numbers with different types, all numbers are converted to a common type in order to be represented as a flat array. If the hierarchy of numeric types is not strict, that is, there are some pairs of numeric types for which neither type is a subset of the other, then this coercion may affect the behavior of the numbers in the array. For example, J on a 64-bit machine uses both 64-bit integers and double-precision floats. Catenating the two results in an array of doubles, which will lose precision for integers whose absolute value is larger than 253. In Dyalog APL a similar issue occurs with decimal floats and complex numbers: combining the two results in an array of complex numbers, but this loses precision since Dyalog's complex numbers are stored as pairs of double-precision floats and its 128-bit decimal floats have higher precision than doubles.

Array characteristics

APL defines the following characteristics of an array. All information about an array is contained in two vectors: its shape and ravel, including, for empty arrays, the ravel's prototype.

  • The rank is the number of dimensions or axes it has. It is the length of the shape.
  • The shape gives its length along each dimension.
  • The bound is the total number of elements in an array, that is, the product of the shape.
  • The ravel is a vector containing all of the array's elements. Its length is the bound.
  • The prototype is a special "null" element for the array. It is derived from the first element for non-empty arrays.
  • The depth is a number indicating how deeply nested an array is.

Depth

Main article: Depth

The depth is the level of nesting or boxing in an array. It is defined differently in nested and flat APLs.

In nested APLs, a simple non-scalar array has depth 1, an array containing only depth 1 arrays has depth 2, and a simple scalar (e.g a number or character) has depth 0.

Most APLs provide a Depth function to find an array's depth. For example:

      ≡('ab' 'cde')('fg' 'hi')
3

APLs vary in their definition of depth: for example some may return the depth with a sign to indicate that some level of the array mixes elements of different depths.

External links

APL features [edit]
Built-ins Primitives (functions, operators) ∙ Quad name
Array model ShapeRankDepthBoundIndex (Indexing) ∙ AxisRavelRavel orderElementScalarVectorMatrixSimple scalarSimple arrayNested arrayCellMajor cellSubarrayEmpty arrayPrototype
Data types Number (Boolean, Complex number) ∙ Character (String) ∙ BoxNamespaceFunction array
Concepts and paradigms Conformability (Scalar extension, Leading axis agreement) ∙ Scalar function (Pervasion) ∙ Identity elementComplex floorArray ordering (Total) ∙ Tacit programming (Function composition, Close composition) ∙ GlyphLeading axis theoryMajor cell search
Errors LIMIT ERRORRANK ERRORSYNTAX ERRORDOMAIN ERRORLENGTH ERRORINDEX ERRORVALUE ERROREVOLUTION ERROR