Unique: Difference between revisions
m (→History) |
(Simpler fix for length-1 case) |
||
(16 intermediate revisions by 5 users not shown) | |||
Line 1: | Line 1: | ||
{{Built- | {{Built-ins|Unique|∪|↑}}, or '''Nub''', is a [[monadic]] [[set function]] which removes duplicate [[major cells]] from an array. It returns only the distinct cells (those which do not [[match]] an earlier distinct cell) in the order they appeared in the array. | ||
== Examples == | == Examples == | ||
When called on the [[string]] "Mississippi", Unique keeps the first three letters and a later "p", but removes all subsequent occurrences. | When called on the [[string]] "Mississippi", Unique keeps the first three letters and a later "p", but removes all subsequent occurrences. | ||
< | <syntaxhighlight lang=apl> | ||
∪ 'Mississippi' | ∪ 'Mississippi' | ||
Misp | Misp | ||
</ | </syntaxhighlight> | ||
It works on nested arrays as well: | It works on nested arrays as well: | ||
< | <syntaxhighlight lang=apl> | ||
∪ v←'CAT' 'DOG' 'CAT' 'DUCK' 'DOG' 'DUCK' | ∪ v←'CAT' 'DOG' 'CAT' 'DUCK' 'DOG' 'DUCK' | ||
┌───┬───┬────┐ | ┌───┬───┬────┐ | ||
│CAT│DOG│DUCK│ | │CAT│DOG│DUCK│ | ||
└───┴───┴────┘ | └───┴───┴────┘ | ||
</ | </syntaxhighlight> | ||
In some languages, such as [[J]] and [[Dyalog APL 17.0]] and later, Unique applies to the [[major cells]] of an array, including those with rank greater than 1: | In some languages, such as [[J]] and [[Dyalog APL 17.0]] and later, Unique applies to the [[major cells]] of an array, including those with rank greater than 1: | ||
< | <syntaxhighlight lang=apl> | ||
⎕←x ← (⍳10)∘.∨2 3 6 | |||
1 1 1 | 1 1 1 | ||
2 1 2 | 2 1 2 | ||
Line 33: | Line 33: | ||
1 3 3 | 1 3 3 | ||
2 3 6 | 2 3 6 | ||
</ | </syntaxhighlight> | ||
== Definition == | == Definition == | ||
Line 45: | Line 45: | ||
=== Tolerant comparison === | === Tolerant comparison === | ||
The following example shows why we must be careful about how cells are matched when tolerant comparison is involved. Here we produce a vector in which the first and second elements match, as do the second and third, but the first and third elements do not! Unique produces an array | The following example shows why we must be careful about how cells are matched when tolerant comparison is involved. Here we produce a vector in which the first and second elements match, as do the second and third, but the first and third elements do not! Unique produces an array Z such that each element of the original array matches at least one element of Z, but a less careful definition (for example, excluding every element which matches an earlier one) would fail to satisfy this property. | ||
< | <syntaxhighlight lang=apl> | ||
⎕CT←1e¯14 | ⎕CT←1e¯14 | ||
⎕PP←18 ⍝ Print all digits | ⎕PP←18 ⍝ Print all digits | ||
⎕←x←1+⎕CT×0 0.6 1.2 | |||
1 1.000000000000006 1.000000000000012 | 1 1.000000000000006 1.000000000000012 | ||
⍳⍨x ⍝ Each element equal to the previous | ⍳⍨x ⍝ Each element equal to the previous | ||
Line 59: | Line 59: | ||
x∊1↑x ⍝ ...which wouldn't happen if the last were excluded | x∊1↑x ⍝ ...which wouldn't happen if the last were excluded | ||
1 1 0 | 1 1 0 | ||
</ | </syntaxhighlight> | ||
{{Works in|[[Dyalog APL]]}} | {{Works in|[[Dyalog APL]]}} | ||
== History == | == History == | ||
The function Nub was proposed in | The function Nub was proposed in [[Ken Iverson|Iverson]]'s 1978 [[Operators and Functions]] along with other set functions including [[Union]], [[Intersection]], and [[Difference]]. There it had the symbol <syntaxhighlight lang=apl inline>∪</syntaxhighlight> and definition <syntaxhighlight lang=apl inline>∪w</syntaxhighlight> {{←→}} <syntaxhighlight lang=apl inline>((⍳⍴w)=w⍳w)/w←,w</syntaxhighlight>. It was implemented as part of [[STSC]]'s [[NARS]] in 1981 with the name Unique and the same definition, except that the argument was restricted to be a vector or scalar.<ref>Cheney, Carl. ''APL*PLUS Nested Arrays Reference Manual. [[STSC]] Inc. 1981.</ref> Later APLs such as [[Dyalog APL]] generally adopted this version of the primitive, and it is featured in the [[ISO/IEC 13751:2001]] standard. Dyalog extended Unique to higher-rank arrays in [[Dyalog APL 17.0]], following the much earlier extension made by [[J]]. | ||
Iverson continued to refine his definition of Nub. He included it in his [[Rationalized APL]] specification with no changes in 1983, but modified it to work on [[major cell]]s (thus aligning it with [[leading axis theory]]) in his 1987 [[A Dictionary of APL]]. Iverson's dictionary also changed the symbol for Nub to < | Iverson continued to refine his definition of Nub. He included it in his [[Rationalized APL]] specification with no changes in 1983, but modified it to work on [[major cell]]s (thus aligning it with [[leading axis theory]]) in his 1987 [[A Dictionary of APL]]. Iverson's dictionary also changed the symbol for Nub to <syntaxhighlight lang=apl inline>↑</syntaxhighlight>, added the functions [[Nub Sieve]] (<syntaxhighlight lang=apl inline>≠</syntaxhighlight>) and [[Nub in]] (<syntaxhighlight lang=apl inline>=</syntaxhighlight>), and defined Nub in terms of Nub Sieve. These definitions were adopted by [[IPSA]] in the [[SHARP APL]] successor [[SAX]]. [[J]] uses similar definitions: its Nub-related functions are Nub (<syntaxhighlight lang=j inline>~.</syntaxhighlight>), [[Nub Sieve]] (<syntaxhighlight lang=j inline>~:</syntaxhighlight>), and [[Self-classify]] (<syntaxhighlight lang=j inline>=</syntaxhighlight>). However, Self-classify is indicated as deprecated in J's NuVoc because its result can be much larger than its argument. | ||
[[A+]] does not include Unique or any related functions, but [[K]]'s function range (<code>?</code>) gives the unique elements of a list argument. In later versions it is called "distinct" or "unique". While APLs allow a [[scalar]] argument ([[scalar rank extension]]), K gives a [[RANK ERROR]] if the argument is not a list. | [[A+]] does not include Unique or any related functions, but [[K]]'s function range (<code>?</code>) gives the unique elements of a list argument. In later versions it is called "distinct" or "unique". While APLs allow a [[scalar]] argument ([[scalar rank extension]]), K gives a [[RANK ERROR]] if the argument is not a list. | ||
Line 72: | Line 72: | ||
== APL model == | == APL model == | ||
For [[Vector|vectors]] the following implementation based on [[Union]] can be used. It repeatedly adds cells of the argument to an accumulated unique vector < | For [[Vector|vectors]] the following implementation based on [[Union]] can be used. It repeatedly adds cells of the argument to an accumulated unique vector <syntaxhighlight lang=apl inline>u</syntaxhighlight>, using Union so that duplicate cells are never added. | ||
< | <syntaxhighlight lang=apl> | ||
VecUnique ← { | VecUnique ← { | ||
u ← 0↑⍵ | u ← 0↑⍵ | ||
u ⊣ {u∪←⍵}⍤¯1 ⊢1/⍵ | u ⊣ {u∪←⍵}⍤¯1 ⊢1/⍵ | ||
} | } | ||
</ | </syntaxhighlight> | ||
{{Works in|[[Dyalog APL]]}} | {{Works in|[[Dyalog APL]]}} | ||
The accumulation above resembles a [[reduction]]—in fact, it is a reverse [[insertion]]. The following implementation written with Reduce works for [[simple]] vectors in [[Nested array model|nested]] APLs because reduction is equivalent to insertion followed by [[Enclose]] on such vectors: | The accumulation above resembles a [[reduction]]—in fact, it is a reverse [[insertion]] (reversing is necessary when using tolerant comparison as cells need to be added in order). The following implementation written with Reduce works for non-empty [[simple]] vectors in [[Nested array model|nested]] APLs because reduction is equivalent to insertion followed by [[Enclose]] on such vectors: | ||
< | <syntaxhighlight lang=apl> | ||
VecUnique ← {⊃∪⍨/⌽⍵} | VecUnique ← {⊃∪⍨/⌽⍵} | ||
</ | </syntaxhighlight> | ||
{{Works in|[[Dyalog APL]],[[ngn/apl]]}} | {{Works in|[[Dyalog APL]],[[ngn/apl]]}} | ||
It | It is extended to empty, nested, and arbitrary rank inputs as follows: | ||
< | <syntaxhighlight lang=apl> | ||
m ← {0=≢⍵:⍵ ⋄ ↑,⊃∪⍨/⌽⊂¨⊂⍤¯1⊢⍵} | |||
</ | </syntaxhighlight> | ||
{{Works in|[[Dyalog APL]]}} | {{Works in|[[Dyalog APL]]}} | ||
The guard checks for length zero because <syntaxhighlight lang=apl inline>∪⍨</syntaxhighlight> lacks an [[identity element]]. The [[Replicate]] call <syntaxhighlight lang=apl inline>1/⍵</syntaxhighlight> converts a scalar to a vector, and the encloses ensure that the array added by Union is always a scalar. | |||
While the model < | If modelling the primitive via Union is not desirable due to how similar these two primitives are, it is possible to replace this particular usage of union (not generally, as it relies on the right argument being scalar) as follows: | ||
<syntaxhighlight lang=apl> | |||
m ← {0=≢⍵:⍵ ⋄ ↑,⊃{⍺,(∧/⍺≢¨⍵)/⍵}⍨/⌽⊂¨⊂⍤¯1⊢⍵} | |||
</syntaxhighlight> | |||
{{Works in|[[Dyalog APL]]}} | |||
While the model <syntaxhighlight lang=apl inline>{((⍳≢⍵)=(⍵⍳⍵))⌿⍵}</syntaxhighlight> for vector arguments, based on [[Nub Sieve]], is often described as an implementation of Unique, it does not correctly handle [[tolerant comparison]].<ref>Hui, Roger. [https://dyalog.tv/Dyalog17/?v=fPWky9IOG40 Tolerant Unique]. [[Dyalog '17]].</ref> A correct implementation of Nub Sieve could be used to implement Unique, but writing such an implementation is no easier than implementing Unique directly. | |||
== Properties == | == Properties == | ||
Line 103: | Line 110: | ||
=== Documentation === | === Documentation === | ||
* [ | * [https://help.dyalog.com/latest/index.htm#Language/Primitive%20Functions/Unique.htm Dyalog] | ||
* [http://microapl.com/apl_help/ch_020_020_392.htm APLX] | * [http://microapl.com/apl_help/ch_020_020_392.htm APLX] | ||
* [https://www.jsoftware.com/help/dictionary/d221.htm J Dictionary], [https://code.jsoftware.com/wiki/Vocabulary/tildedot J NuVoc] (as <syntaxhighlight lang=apl inline>~.</syntaxhighlight>) | |||
* [https://mlochbaum.github.io/BQN/doc/selfcmp.html#deduplicate BQN] (as <code>⍷</code>) | |||
== References == | |||
{{APL built-ins}} | <references /> | ||
{{APL built-ins}}[[Category:Primitive functions]][[Category:Set functions]] |
Latest revision as of 12:57, 23 November 2023
∪ ↑
|
Unique (∪
, ↑
), or Nub, is a monadic set function which removes duplicate major cells from an array. It returns only the distinct cells (those which do not match an earlier distinct cell) in the order they appeared in the array.
Examples
When called on the string "Mississippi", Unique keeps the first three letters and a later "p", but removes all subsequent occurrences.
∪ 'Mississippi' Misp
It works on nested arrays as well:
∪ v←'CAT' 'DOG' 'CAT' 'DUCK' 'DOG' 'DUCK' ┌───┬───┬────┐ │CAT│DOG│DUCK│ └───┴───┴────┘
In some languages, such as J and Dyalog APL 17.0 and later, Unique applies to the major cells of an array, including those with rank greater than 1:
⎕←x ← (⍳10)∘.∨2 3 6 1 1 1 2 1 2 1 3 3 2 1 2 1 1 1 2 3 6 1 1 1 2 1 2 1 3 3 2 1 2 ∪x 1 1 1 2 1 2 1 3 3 2 3 6
Definition
Unique returns an array which is composed of some major cells of its argument; thus the shape of the result is identical to the shape of the argument in all but the leading axis, and less than or equal to it in that axis. A scalar argument is subject to scalar rank extension; the result for a scalar will always be its ravel. Often the argument of Unique is required to be a vector, but the result is still computed in the way described here.
To construct the Unique of an array, the major cells are considered in order of increasing index. Each cell is included if it does not match any cell which was already included.
The array matching is subject to tolerant comparison. In the intolerant case, that is, when equality is transitive, a cell is included if and only if it does not match any other cell which appears earlier in the array. That's because all the discarded duplicate cells must have matched an earlier included cell; if equality is transitive then a cell which matches the duplicate would also match the earlier cell.
Tolerant comparison
The following example shows why we must be careful about how cells are matched when tolerant comparison is involved. Here we produce a vector in which the first and second elements match, as do the second and third, but the first and third elements do not! Unique produces an array Z such that each element of the original array matches at least one element of Z, but a less careful definition (for example, excluding every element which matches an earlier one) would fail to satisfy this property.
⎕CT←1e¯14 ⎕PP←18 ⍝ Print all digits ⎕←x←1+⎕CT×0 0.6 1.2 1 1.000000000000006 1.000000000000012 ⍳⍨x ⍝ Each element equal to the previous 1 1 2 ∪x ⍝ Includes the last element! 1 1.000000000000012 x∊∪x ⍝ Every element matches some unique element 1 1 1 x∊1↑x ⍝ ...which wouldn't happen if the last were excluded 1 1 0
History
The function Nub was proposed in Iverson's 1978 Operators and Functions along with other set functions including Union, Intersection, and Difference. There it had the symbol ∪
and definition ∪w
((⍳⍴w)=w⍳w)/w←,w
. It was implemented as part of STSC's NARS in 1981 with the name Unique and the same definition, except that the argument was restricted to be a vector or scalar.[1] Later APLs such as Dyalog APL generally adopted this version of the primitive, and it is featured in the ISO/IEC 13751:2001 standard. Dyalog extended Unique to higher-rank arrays in Dyalog APL 17.0, following the much earlier extension made by J.
Iverson continued to refine his definition of Nub. He included it in his Rationalized APL specification with no changes in 1983, but modified it to work on major cells (thus aligning it with leading axis theory) in his 1987 A Dictionary of APL. Iverson's dictionary also changed the symbol for Nub to ↑
, added the functions Nub Sieve (≠
) and Nub in (=
), and defined Nub in terms of Nub Sieve. These definitions were adopted by IPSA in the SHARP APL successor SAX. J uses similar definitions: its Nub-related functions are Nub (~.
), Nub Sieve (~:
), and Self-classify (=
). However, Self-classify is indicated as deprecated in J's NuVoc because its result can be much larger than its argument.
A+ does not include Unique or any related functions, but K's function range (?
) gives the unique elements of a list argument. In later versions it is called "distinct" or "unique". While APLs allow a scalar argument (scalar rank extension), K gives a RANK ERROR if the argument is not a list.
APL model
For vectors the following implementation based on Union can be used. It repeatedly adds cells of the argument to an accumulated unique vector u
, using Union so that duplicate cells are never added.
VecUnique ← { u ← 0↑⍵ u ⊣ {u∪←⍵}⍤¯1 ⊢1/⍵ }
The accumulation above resembles a reduction—in fact, it is a reverse insertion (reversing is necessary when using tolerant comparison as cells need to be added in order). The following implementation written with Reduce works for non-empty simple vectors in nested APLs because reduction is equivalent to insertion followed by Enclose on such vectors:
VecUnique ← {⊃∪⍨/⌽⍵}
It is extended to empty, nested, and arbitrary rank inputs as follows:
m ← {0=≢⍵:⍵ ⋄ ↑,⊃∪⍨/⌽⊂¨⊂⍤¯1⊢⍵}
The guard checks for length zero because ∪⍨
lacks an identity element. The Replicate call 1/⍵
converts a scalar to a vector, and the encloses ensure that the array added by Union is always a scalar.
If modelling the primitive via Union is not desirable due to how similar these two primitives are, it is possible to replace this particular usage of union (not generally, as it relies on the right argument being scalar) as follows:
m ← {0=≢⍵:⍵ ⋄ ↑,⊃{⍺,(∧/⍺≢¨⍵)/⍵}⍨/⌽⊂¨⊂⍤¯1⊢⍵}
While the model {((⍳≢⍵)=(⍵⍳⍵))⌿⍵}
for vector arguments, based on Nub Sieve, is often described as an implementation of Unique, it does not correctly handle tolerant comparison.[2] A correct implementation of Nub Sieve could be used to implement Unique, but writing such an implementation is no easier than implementing Unique directly.
Properties
The first major cell of a non-empty argument to Unique is included in the result, since there are no earlier cells for it to match. It follows that the result of Unique is empty only if the argument was empty.
Every major cell in the argument to Unique matches at least one cell in its result: if a cell didn't match any cells in the Unique, then it would have been included itself. A cell may match multiple cells if it matches both but they do not match each other.
External links
Documentation
- Dyalog
- APLX
- J Dictionary, J NuVoc (as
~.
) - BQN (as
⍷
)
References
- ↑ Cheney, Carl. APL*PLUS Nested Arrays Reference Manual. STSC Inc. 1981.
- ↑ Hui, Roger. Tolerant Unique. Dyalog '17.