Unique

From APL Wiki
Jump to navigation Jump to search

Unique (), or Nub, is a monadic set function which removes duplicate major cells from an array. It returns only the distinct cells (those which do not match an earlier distinct cell) in the order they appeared in the array.

Examples

When called on the string "Mississippi", Unique keeps the first three letters and a later "p", but removes all subsequent occurrences.

       'Mississippi'
Misp

It works on nested arrays as well:

       v'CAT' 'DOG' 'CAT' 'DUCK' 'DOG' 'DUCK'
┌───┬───┬────┐
CATDOGDUCK
└───┴───┴────┘

In some languages, such as J and Dyalog APL 17.0 and later, Unique applies to the major cells of an array, including those with rank greater than 1:

      x  (10)∘.2 3 6
1 1 1
2 1 2
1 3 3
2 1 2
1 1 1
2 3 6
1 1 1
2 1 2
1 3 3
2 1 2
      x
1 1 1
2 1 2
1 3 3
2 3 6

Definition

Unique returns an array which is composed of some major cells of its argument; thus the shape of the result is identical to the shape of the argument in all but the leading axis, and less than or equal to it in that axis. A scalar argument is subject to scalar rank extension; the result for a scalar will always be its ravel. Often the argument of Unique is required to be a vector, but the result is still computed in the way described here.

To construct the Unique of an array, the major cells are considered in order of increasing index. Each cell is included if it does not match any cell which was already included.

The array matching is subject to tolerant comparison. In the intolerant case, that is, when equality is transitive, a cell is included if and only if it does not match any other cell which appears earlier in the array. That's because all the discarded duplicate cells must have matched an earlier included cell; if equality is transitive then a cell which matches the duplicate would also match the earlier cell.

Tolerant comparison

The following example shows why we must be careful about how cells are matched when tolerant comparison is involved. Here we produce a vector in which the first and second elements match, as do the second and third, but the first and third elements do not! Unique produces an array which contains an element matching every element of the original array, but a less careful definition (for example, excluding every element which matches an earlier one) would fail to satisfy this property.

      ⎕CT1e¯14
      ⎕PP18 ⍝ Print all digits
      x1+⎕CT×0 0.6 1.2
1 1.000000000000006 1.000000000000012
      x    ⍝ Each element equal to the previous
1 1 2
      x     ⍝ Includes the last element!
1 1.000000000000012
      x∊∪x   ⍝ Every element matches some unique element
1 1 1
      x1x  ⍝ ...which wouldn't happen if the last were excluded
1 1 0
Works in: Dyalog APL

APL model

For vectors the following implementation based on Union can be used. It repeatedly adds cells of the argument to an accumulated unique vector u, using Union so that duplicate cells are never added.

VecUnique  {
    u  0
    u  {u}¯1 1/
}
Works in: Dyalog APL

The accumulation above resembles a reduction—in fact, it is a reverse insertion. The following implementation written with Reduce works for simple vectors in nested APLs because reduction is equivalent to insertion followed by Enclose on such vectors:

VecUnique  {⊃∪⍨/}

It can be modified to obtain a full model for Unique by enclosing enough times, and converting scalars to vectors with Replicate at the end:

Unique  {1/↑↑⌽∪/⌽⊂¯1}
Works in: Dyalog APL

While the model {((⍳≢)=())}, based on Nub Sieve, is often described as an implementation of Unique, it does not correctly handle tolerant comparison. A correct implementation of Nub Sieve could be used to implement Unique, but writing such an implementation is no easier than implementing Unique directly.

Properties

The first major cell of a non-empty argument to Unique is included in the result, since there are no earlier cells for it to match. It follows that the result of Unique is empty only if the argument was empty.

Every major cell in the argument to Unique matches at least one cell in its result: if a cell didn't match any cells in the Unique, then it would have been included itself. A cell may match multiple cells if it matches both but they do not match each other.

External links

Documentation

APL built-ins [edit]
Primitive functions
Scalar
Monadic ConjugateNotRollType
Dyadic PlusEqual to (Xnor) ∙ Not Equal to (Xor)
Non-Scalar
Structural ShapeReshapeDepthRavelReverse
Selection TakeDropUnique
Selector
Computational
Primitive operators Each
Quad names
Constants Index originMigration level
Functions
Operators
Other ZildeHigh minus