Array notation design considerations

[[File:Array notation syntax.png|thumb|right|[[wikipedia:Railroad diagram|Railroad diagram]] for Dyalog's syntax.]]
This article details the design considerations for [[array notation]] in APL. It is also intended to solicit feedback, via the [[{{TALKPAGENAME}}|Discussion page]]. Feedback from other media will also be posted to that page.  
It also describes the notation as proposed for [[Dyalog APL]], summarises its history, and compares it with similar notations in other languages.


== Objectives ==
The following requirements were proposed as objectives for an APL array notation:<ref>[[Adám Brudzewsky]]. Internal documents. [[Dyalog Ltd.]] 30 Jun 2017.</ref>

# No new [[glyph]]s
# Reusing existing glyphs for similar purposes
# Similarity to other languages ([[K]], [[wikipedia:JSON|JSON]], [[wikipedia:CSS|CSS]])
# Visual attractiveness
# Intuitive syntax
# As little [[wikipedia:syntactic sugar|syntactic sugar]] as possible

== Description ==
In the design proposed by [[Dyalog Ltd.]], the notation is added to the language by giving meaning to previously invalid statements. The added syntax consists of three constructs that were previously [[SYNTAX ERROR]]s:

* ''broken'' round parentheses
* ''broken'' square brackets
* empty round parentheses: <source lang=apl inline>()</source>

where ''broken'' means interrupted by one or more [[diamond]]s (<source lang=apl inline>⋄</source>) or line breaks (outside of [[dfn]]s).
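The following is a brief illustrative sketch of the three constructs, assuming the semantics described in this article. The variable names are arbitrary and chosen only for this example. Each broken construct is shown on one line using diamonds; line breaks can be used instead, as the last statement shows.

<source lang=apl>
⍝ Illustrative sketch; the names v, m, ns and e are arbitrary
v←(1 2 3 ⋄ 'abc')          ⍝ broken round parentheses: same as (1 2 3)'abc'
m←[1 2 3 ⋄ 4 5 6]          ⍝ broken square brackets: same as 2 3⍴⍳6 (with ⎕IO←1)
ns←(name:'Joe' ⋄ age:42)   ⍝ name-value pairs: a namespace with members name and age
e←()                       ⍝ empty round parentheses: a new, empty namespace
⍝ The same matrix as m, using line breaks as separators:
m←[
  1 2 3
  4 5 6
]
</source>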
 
== Specific considerations ==
 
Various alternatives were considered. The following subsections detail each design decision.
 
=== Glyphs ===
 
The design requirement for no new glyphs was contentious, and both [[bi-glyph]] and non-ASCII brackets were considered. Bi-glyphs were rejected because of readability concerns, especially when nested. For example, <source lang=apl inline>1 1 3⍴2</source> could have been written as <source lang=apl inline>[[[[2 2 2]]]]</source>. Non-ASCII brackets were rejected for font and keyboarding reasons, as well as to make it easier for non-APL systems to generate APL data. For example, <source lang=apl inline>⟦</source>…<source lang=apl inline>⟧</source> was proposed to denote a collection of [[major cells]], forming a new array of rank one higher than the rank of the highest-[[rank]] constituent [[cell]]. However, few [[fonts]] support these glyphs.
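For comparison, here is a sketch of how the adopted ASCII notation can express the same rank-3 array, assuming the Dyalog design, where every major cell in square brackets is given a rank of at least 1 (see the section on the minimum rank of major cells). The name <code>x</code> is arbitrary.

<source lang=apl>
⍝ Sketch assuming the Dyalog design; x is an arbitrary name
x←[[2 2 2⋄]⋄]     ⍝ inner brackets: a 1×3 matrix; outer brackets: a 1×1×3 array
x≡1 1 3⍴2         ⍝ gives 1
</source>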
 
The eventual choice was to use existing symbols, and this had important implications for the specifics of the notation. Ideally, a single notation would have been introduced for a collection of major cells, thereby handling both vectors and higher-rank arrays. However, a problem presents itself with [[axis|axes]] of length 1, because both square brackets and round parentheses already have a meaning when surrounding a single statement (namely [[function axis]]/[[bracket indexing]] and [[precedence]]/[[function train]]s). Thus, while <source lang=apl inline>2 ⟦3⟧</source> could have denoted the [[nested array]] <source lang=apl inline>2 (1⍴3)</source>, this isn't viable with <source lang=apl inline>2 [1⍴3]</source>, because this already denotes indexing <source lang=apl inline>2</source> using the indices <source lang=apl inline>1⍴3</source>. To disambiguate, at least one statement separator or line break must be present in each level of array notation brackets and parentheses.
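A minimal illustration of this separator requirement, using round parentheses and assuming the rules described in this article:

<source lang=apl>
2 (3⋄)      ⍝ the diamond makes (3⋄) array notation: a 1-element vector, so this is 2 (1⍴3)
2 (3)       ⍝ without a separator, the parentheses only express precedence: the simple vector 2 3
</source>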
 
=== Disambiguating square brackets ===
 
Overloading square brackets, which are currently used only for [[function axis]] and [[bracket indexing]], to also denote higher-rank arrays poses a problem of disambiguation when there is only one major cell. For example, <source lang=apl inline>'abc'[3 3]</source> could be equivalent to <source lang=apl inline>'cc'</source> or to <source lang=apl inline>'abc'(1 2⍴3)</source>, depending on whether the brackets are interpreted as indexing or as an array. Two proposals have been made, and it is possible to support either or both:
# Square brackets are interpreted as representing an array if no other interpretation is possible, e.g. immediately following an opening round parenthesis, curly brace, or square bracket, or beginning a statement.
# Square brackets are interpreted as representing an array if they are "broken", i.e. contain a diamond or newline that isn't enclosed in another round parenthesis, curly brace, or square bracket.
Option 1 depends on the outer context of the notation, while option 2 depends on its inner content. The latter is similar to the way a [[dfn]] is determined to be a function, a monadic operator, or a dyadic operator: if the curly braces ''contain'' <source lang=apl inline>⍵⍵</source>, then the dfn is a dyadic operator; otherwise, <source lang=apl inline>⍺⍺</source> indicates a monadic operator; any other dfn is a function.
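The two readings of the example above, together with the dfn analogy, can be sketched as follows. This assumes that only option 2 (broken brackets) is in effect; under option 1 alone, the second line would still be indexing.

<source lang=apl>
⍝ Sketch assuming option 2 only
'abc'[3 3]      ⍝ no separator: bracket indexing, giving 'cc'
'abc'[3 3⋄]     ⍝ broken brackets: the 2-element vector 'abc'(1 2⍴3)
⍝ Dfn analogy: the kind of dfn is determined by its content
{⍵⍵ ⍺⍺ ⍵}       ⍝ contains ⍵⍵: a dyadic operator
{⍺⍺ ⍵}          ⍝ contains ⍺⍺ but not ⍵⍵: a monadic operator
{⍺+⍵}           ⍝ contains neither: a function
</source>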
 
=== Minimum rank of major cells ===


While <source lang=apl inline>⟦⟦3⟧⟧</source> could denote <source lang=apl inline>1 1⍴3</source> using non-ASCII glyphs, an equivalent ASCII scheme would instead have required <source lang=apl inline>[[3⋄]⋄]</source>, where the inner brackets create a vector and the outer brackets create a [[matrix]]. When line breaks were used instead of diamonds, it was found counter-intuitive that <source lang=apl>[
3
5
]</source> would denote a two-[[element]] vector while <source lang=apl>[
3 4
5 6
]</source> would be a two-row matrix. This is indeed the case in [[dzaima/APL]], as opposed to [[Dyalog APL]], where a special rule was added to the effect that in such collections of major cells, every cell is considered to have a rank of at least 1, even if it is a [[scalar]]. However, this choice introduced the need for a separate notation to allow vectors to be written over multiple lines, and therefore round parentheses were extended from their traditional use in [[strand notation]] to also denote collections of [[enclose]]d elements.
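A short sketch contrasting the two kinds of collection under the special rule just described, as in Dyalog APL:

<source lang=apl>
⍝ Sketch assuming Dyalog's minimum-rank rule for square brackets
(3⋄5)        ⍝ round parentheses: the 2-element vector 3 5
[3⋄5]        ⍝ square brackets: each scalar counts as a 1-element vector, giving a 2×1 matrix
[3 4⋄5 6]    ⍝ a 2×2 matrix, the same as 2 2⍴3 4 5 6
</source>

In dzaima/APL, which lacks this rule, the second line would instead denote the vector <source lang=apl inline>3 5</source>.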


=== Formal syntax ===
The array notation proposed for [[Dyalog APL]] can be described using [[wikipedia:Extended Backus–Naur form|Extended Backus–Naur form]], where an <code>expression</code> is any traditional APL expression:
<pre>
value   ::= expression | list | block | space
list    ::= '(' ( ( value sep )+ value? | ( sep value )+ sep? ) ')'
block   ::= '[' ( ( value sep )+ value? | ( sep value )+ sep? ) ']'
space   ::= '(' sep? ( name ':' value ( sep name ':' value )* )? sep? ')'
sep     ::= [⋄#x000A#x000D#x0085]+
</pre>

=== Name-value pairs ===
As a notation for [[namespace]]s, several details were debated, as described in the following subsections.
 
==== Separators between name-value pairs ====
 
Should <source lang=apl inline>⋄</source> or <source lang=apl inline>;</source> be used to separate [[wikipedia:name-value pair|name-value pair]]s (in addition to line breaks)?
 
The <source lang=apl inline>⋄</source> was chosen to separate name-value pairs, as it is generally interchangeable with a line break, whereas <source lang=apl inline>;</source> is used to separate names (without values) in [[Defined_function_(traditional)#Semi-colons|headers]] and in [[locals lines]]. Furthermore, it was seen as natural that the values would be computed in reading order (left to right), just as multiple statements are; <source lang=apl inline>⋄</source> implies this, while <source lang=apl inline>;</source> doesn't. Indeed, in the statement <source lang=apl inline>A[B;C]</source>, expression <source lang=apl inline>C</source> is evaluated before expression <source lang=apl inline>B</source>. Computing the values from the right, just like stranding, was briefly considered, but this was rejected because replacing the semicolons with line breaks would then require evaluation to begin with the last line and work upwards!
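The difference in evaluation order can be sketched by displaying intermediate values with <source lang=apl inline>⎕←</source>. This assumes the evaluation orders stated above; the names <code>m</code>, <code>a</code> and <code>b</code> are arbitrary.

<source lang=apl>
⍝ Sketch; m, a and b are arbitrary names
m←3 3⍴⍳9
m[⎕←1;⎕←2]         ⍝ bracket indexing: the column index (2) is evaluated, and displayed, first
(a:⎕←1 ⋄ b:⎕←2)    ⍝ name-value pairs: evaluated in reading order, so 1 is displayed before 2
</source>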
 
==== Namespace delimiters ====
 
Should round parentheses (<source lang=apl inline>(</source>…<source lang=apl inline>)</source>) or square brackets (<source lang=apl inline>[</source>…<source lang=apl inline>]</source>) be used to enclose namespaces?
 
Round parentheses were chosen because namespaces are seen as (unordered) lists, and so are more similar to vectors than higher-rank arrays. Furthermore, <source lang=apl inline>[]</source> already had meaning (indexing all elements of a vector) while <source lang=apl inline>()</source> didn't have any existing use, and so could be used to denote a new empty namespace, equivalent to <source lang=apl inline>⎕NS 0⍴⊂''</source>.
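A brief sketch of the existing meaning of <source lang=apl inline>[]</source> and of the new meaning of <source lang=apl inline>()</source>, assuming the equivalence stated above; the names are arbitrary.

<source lang=apl>
⍝ Sketch; v, ns and old are arbitrary names
v←1 2 3
v[]               ⍝ [] already selects all elements: 1 2 3
ns←()             ⍝ a new, empty namespace
ns.x←42           ⍝ which can be populated like any other namespace
old←⎕NS 0⍴⊂''     ⍝ the traditional equivalent of ()
</source>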
 
==== Separator between name and value ====
 
Should <source lang=apl inline>:</source> or <source lang=apl inline>←</source> separate the name from the value?
 
Initially, <source lang=apl inline>←</source> was seen as the obvious choice to separate the name and the value, but it was soon discovered that a namespace with only one member would be indistinguishable from a parenthesised [[assignment]]. Furthermore, it was noted that value expressions could contain intermediary assignments, and that such assignments were of a fundamentally different nature from the name-value declaration. The intermediary assignments would happen in a temporary scope, with any created variables disappearing once the namespace member value was established.
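The ambiguity with <source lang=apl inline>←</source>, and the distinction between a member and an intermediary assignment, can be sketched as follows, assuming the design described above (the names are arbitrary):

<source lang=apl>
⍝ Sketch; x, y and t are arbitrary names
(x←1)            ⍝ parenthesised assignment: sets x in the current scope and gives 1
(x:1)            ⍝ array notation: a new namespace whose only member, x, is 1
(y:t×t←3)        ⍝ t is an intermediary assignment: only y becomes a member
</source>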
 
==== Scoping ====
 
In which scope should the value expressions be evaluated?
 
Value expressions could be evaluated in the newly established namespace (similar to expressions in <source lang=apl inline>:Namespace</source> scripts), or in the surrounding scope (similar to inline expressions in [[wikipedia:JavaScript|JavaScript]]'s object notation). It was envisioned that a main use of the literal notation would be to collect existing values into a namespace, and evaluating inside the new namespace would force the use of <source lang=apl inline>##.</source> to fetch values from the surrounding scope; the surrounding scope was therefore preferred. In a departure from JavaScript, it was found most natural that intermediate assignments within a value expression be local to that expression, similar to assignments in dfns. Global assignment is still available using <source lang=apl inline>⎕THIS.name←value</source>, just as in dfns.
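A sketch of the two scoping alternatives, assuming evaluation in the surrounding scope as described above. All names are arbitrary, and the last three lines are a <source lang=apl inline>:Namespace</source> script shown only for comparison.

<source lang=apl>
⍝ Sketch; a, b, d, ns and ns2 are arbitrary names
a←10
ns←(b:a×2)          ⍝ a is found in the surrounding scope, so ns.b is 20
ns←(b:⎕THIS.d←5)    ⍝ explicit global assignment: d is also set in the surrounding namespace
⍝ For comparison, in a :Namespace script the surrounding a must be reached through ##.
:Namespace ns2
    b←##.a×2
:EndNamespace
</source>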
 
== Timeline ==


=== 1996 ===
=== 2013 ===


Phil Last sent a proposal to Dyalog outlining two possible executable notations for creating multi-dimensional arrays without function application: one using potential new system constructs <source lang=text inline>:Array</source> and <source lang=text inline>:Cell</source> in tradfns, and another using line-ends between balanced brackets to define arrays of rank 2 or greater in both dfns and tradfns.


It became RFE 9458: Large and higher rank literal values.  See [[File:Embedding data.pdf]]
[[APL Germany]]'s 2020 journal also included a description of the notation, including a discussion of potential issues with [[assignment]].<ref>Brudzewsky, Adám. [https://apl-germany.de/wp-content/uploads/2021/11/APL_Journal_2020_1u2.pdf#page=34 A Notation for APL Arrays]. APL-Journal, Volume 2020, number 1-2. [[APL Germany|APL-Germany e.V.]] 2020.</ref>


== Language comparison ==


The following systems support list or vector notation in some form, beyond simple [[strand notation]]. The separators <code>;</code> in A+ and K, and <code>⋄</code> in APL and BQN, indicate any separator, including a line break.


{| class=wikitable
! Language              !! Vectors          !! High-rank        !! [[Namespace]]s          !! [[Function array]]s   !! Assignable
|-
| [[Nial]]              || <code>[,]</code> ||                  ||                        || {{Yes}}              || {{No}}
|-
| [[A+]]                || <code>(;)</code> ||                  ||                        || {{Maybe|First-class}} || {{Yes}}
|-
| [[K]]                 || <code>(;)</code> ||                  ||                        || {{Maybe|First-class}} || {{Yes}}
|-
| [[dzaima/APL]]        || <code>(⋄)</code> || <code>[⋄]</code> || <code>(key:val⋄)</code> || {{Yes}}              || {{Maybe|N/A}}
|-
| [[BQN]]<ref>[[Marshall Lochbaum|Lochbaum, Marshall]]. [https://mlochbaum.github.io/BQN/doc/arrayrepr.html#array-literals BQN: Array notation and display; Array literals]. Retrieved 2022-09-01.</ref> || <code>⟨⋄⟩</code> || <code>[⋄]</code> || <code>{key⇐val⋄}</code> || {{Maybe|First-class}} || {{Yes}}
|-
| [[Dyalog Link]]       || <code>(⋄)</code> || <code>[⋄]</code> || <code>(key:val⋄)</code> || {{No|No (indirect)}}  || {{No}}
|-
| Acre Desktop<ref>The Carlisle Group. [https://github.com/the-carlisle-group/Acre-Desktop/wiki/APL-Array-Notation APL Array Notation]. Acre Desktop Wiki. GitHub. Retrieved 2022-09-01.</ref> || <code>(⋄)</code> || <code>[⋄]</code> || <code>[key←val⋄]</code> || {{No}}  || {{No}}
|}


The "Function arrays" column indicates whether functions can be placed in array notation ([[function array]]s can be created in Dyalog by another method). "First-class" indicates that functions are first-class values, so this is possible without special consideration; in Nial and dzaima/APL, vectors of functions are a special form that can be applied to arguments to return a list of results. The "Assignable" column indicates whether array notation can be used as an assignment target to perform destructuring. BQN's namespaces don't use a dedicated construction; instead, any block (like a [[dfn]]) with <code>⇐</code> statements returns a namespace reference. Acre Desktop only uses array notation for storing literal arrays; it cannot appear in executable code.
 


== References ==
<references/>
{{APL syntax}}[[Category:APL syntax]][[Category:Nested array model]]
