Key: Difference between revisions

Latest revision as of 00:56, 16 April 2024

Key (⌸) is a primitive monadic operator which takes a dyadic function operand and groups the indices or major cells of an argument by key, applying the operand once per group. It was introduced in Dyalog APL version 14.0 and is commonly compared to SQL's GROUP BY clause.

Description

Monadically, Key groups identical major cells together and applies the function operand once for each unique major cell. The function is called with the unique major cell as its left argument and the indices of the major cells that match it as its right argument:

      {⍺⍵}⌸'Mississippi'
┌─┬────────┐
│M│1       │
├─┼────────┤
│i│2 5 8 11│
├─┼────────┤
│s│3 4 6 7 │
├─┼────────┤
│p│9 10    │
└─┴────────┘
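Because Key works on major cells, the argument does not have to be a vector. The following is a small sketch in which the matrix m is an illustrative assumption, not taken from the text above. Monadic Key groups the identical rows of a character matrix. Returning the tally counts how often each distinct row occurs, and returning the left argument recovers the unique rows in order of first appearance:

```apl
      m←4 2⍴'abcdabef'   ⍝ rows 'ab' 'cd' 'ab' 'ef'; rows 1 and 3 are identical
      {≢⍵}⌸m             ⍝ one count per unique row: 2 1 1
      {⍺}⌸m              ⍝ 3×2 matrix of the unique rows: ab, cd, ef
```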

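In the dyadic case, the major cells of the left argument act as keys. Key applies the function once for each unique key, with that key as its left argument and, as its right argument, the major cells of the right argument at the positions where that key occurs: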

      'Mississippi'{⍺⍵}⌸'ABCDEFGHIJK' 
┌─┬────┐
│M│A   │
├─┼────┤
│i│BEHK│
├─┼────┤
│s│CDFG│
├─┼────┤
│p│IJ  │
└─┴────┘
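The following sketch makes the GROUP BY comparison concrete. The names `names` and `sales` and their values are invented for illustration. Summing ⍵ for each key gives per-name totals, roughly what `SELECT name, SUM(sale) FROM t GROUP BY name` would return for a hypothetical table t with those two columns, but without the key column:

```apl
      names←'ann' 'bob' 'ann' 'cat' 'bob'   ⍝ hypothetical keys, one per row
      sales←10 20 30 40 50                  ⍝ hypothetical values aligned with names
      names{+/⍵}⌸sales                      ⍝ totals for ann, bob, cat: 40 70 40
      names{≢⍵}⌸sales                       ⍝ rows per name: 2 2 1
```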

The monadic case f⌸Y is equivalent to Y f⌸ ⍳≢Y.
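This equivalence can be checked directly with Match. Here it is applied to the Mississippi example, with Y as a local name chosen for the sketch. Both sides depend on ⎕IO in the same way, so the check should hold in either index origin:

```apl
      Y←'Mississippi'
      ({⍺⍵}⌸Y)≡Y{⍺⍵}⌸⍳≢Y   ⍝ 1: monadic Key groups the indices ⍳≢Y by the major cells of Y
```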

Problems

Vocabulary

Two common problems with Key concern the result. First, its order cannot be controlled, because Key always uses the order of first appearance. Second, its "vocabulary" cannot be controlled, because Key never includes an entry for a major cell that doesn't occur. For example, suppose we want to count occurrences of the letters A, C, G and T:

      {⍺,≢⍵}⌸'TCCGCGGTGGCG'
T 2
C 4
G 6

Since A is entirely missing from the argument, it isn't mentioned in the result either. Likewise, the result is not in alphabetical order, because Key lists the letters in order of first appearance and T appears before the first C. A common solution is to prepend the vocabulary to the actual data, so that every letter occurs at least once and in the desired order, and then subtract 1 from each count:

      {⍺,¯1+≢⍵}⌸'ACGT','TCCGCGGTGGCG'
A 0
C 4
G 6
T 2

Since the position of each count now identifies its letter, the operand's left argument can be ignored, and the decrement can be factored out of the operand:

      ¯1+{≢⍵}⌸'ACGT','TCCGCGGTGGCG'
0 4 6 2
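The vocabulary technique can be packaged as a small function. The sketch below uses a hypothetical name, Count, and assumes the vocabulary (left argument) contains no duplicates. The (≢⍺)↑ guards against data values outside the vocabulary. Without it, such values would add extra entries at the end, each one too low by 1:

```apl
      Count←{(≢⍺)↑¯1+{≢⍵}⌸⍺,⍵}     ⍝ hypothetical helper; ⍺ is a duplicate-free vocabulary
      'ACGT' Count 'TCCGCGGTGGCG'  ⍝ 0 4 6 2
      'ACGT' Count 'TCCGXT'        ⍝ 0 2 1 2 (the stray X is dropped)
```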

Computing the unique

Key computes the set of unique major cells. This collection is often needed alongside the occurrence information, but can be hard to extract. For example, to find the most frequently occurring letter, we can first find its position among the unique letters:

      ⊃⍒{≢⍵}⌸'TCCGCGGTGGCG'
3

Here 3 is an index into the unique letters (in order of first appearance), so it is tempting to write:

      {(⊃⍒{≢⍵}⌸⍵)⌷∪⍵}'TCCGCGGTGGCG'
G

While this code works, it is inefficient, because the unique letters are computed twice: once inside Key and once by ∪. This can be avoided by letting Key return the unique letters along with the counts:

      (keys counts)←,⌿{⍺,≢⍵}⌸'TCCGCGGTGGCG'
      keys⌷⍨⊃⍒counts
G

Unfortunately, this introduces a different inefficiency. The operand's result, and therefore Key's result, is a heterogeneous array (containing multiple datatypes). Such arrays are stored as pointer arrays, which consume memory for one pointer per element and force "pointer chasing" when the data is accessed. A possible workaround is to collect the unique keys separately from the counts:

      data←'TCCGCGGTGGCG'
      keys←0⌿data
      counts←{keys⍪←⍺ ⋄ ≢⍵}⌸data
      keys⌷⍨⊃⍒counts
G

If there are many unique values, repeatedly appending to the accumulating keys variable can become a performance issue in itself.
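One way to avoid both the heterogeneous result and the accumulating variable is to have the operand return only integers. This is a suggestion, not a technique taken from the text above. In the monadic case ⍵ holds ascending indices, so ⊃⍵ is the position of each key's first occurrence. Key's result is then a simple integer matrix, and indexing the data with those positions recovers the keys. The names firsts, counts and keys are chosen for this sketch. The example assumes vector data with ⎕IO←1. For higher-rank data, the major cells would instead be selected with (⊂firsts)⌷data:

```apl
      data←'TCCGCGGTGGCG'
      (firsts counts)←↓⍉{(⊃⍵),≢⍵}⌸data   ⍝ first index and tally per unique letter: 1 2 4 and 2 4 6
      keys←data[firsts]                   ⍝ 'TCG': unique letters in order of first appearance
      keys⌷⍨⊃⍒counts                      ⍝ G
```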

History

A key operator was first defined in J by 1990. J's implementer Roger Hui had written a J model taking a monadic function operand and two arguments in January following a discussion at I.P. Sharp Associates; he mentioned that such an operator had been proposed in the past by himself as well as Joey Tuttle and Bob Bernecky.^[1]

Hui also implemented the operator in Dyalog APL 14.0, released in 2014. Based on experience with J, this version passes the key to the operand as its left argument. It also defines a monadic case; in J, the monadic form of the primitive is instead Oblique, which applies its operand to the diagonals of a matrix argument. This definition was adopted by Dyalog-like dialects including ngn/apl, dzaima/APL (with changes), and April. In 2023, Henry Rich added a new key primitive /.. to J that also passes the unique value as the left argument, although the derived function was not given a monadic definition.

Other variations on Key have been implemented. Since 2023, Kap has had a primitive function ⌸ that behaves like Dyalog's {⍺⍵}⌸. Dyalog APL Vision defines an extension to Key for an array operand. The operand then defines a vocabulary, so that each element of the result corresponds to a cell of the operand. In the monadic case, that element contains the list of indices where the value was found. In the dyadic case, it contains the right argument cells at the positions where the value was found in the left argument. The monadic form is similar to K3's Group, except that the list of keys is specified explicitly rather than being taken from the unique values of the argument.

External links

Lessons

APL Cultivation

Documentation

Dyalog
Kap (a primitive function variation)
J NuVoc (/. and /..), Dictionary (/. only)

[1] Roger Hui. Essays/Key. "History".