Nub Sieve: Difference between revisions

Revision as of 21:02, 10 September 2022

Nub Sieve (≠), Nubsieve, or Unique Mask, is a monadic function that returns a Boolean vector indicating which major cells of its argument would be included in the result of Unique (Nub). More precisely, it indicates all cells which do not match any earlier indicated cell. The Nub Sieve is more informative than Unique because it encodes not only which cells are unique but where they appear in the argument. This means it can be used for tasks such as filtering secondary arrays and working with duplicates that are difficult to do with Unique and slow to compute with Key.

Examples

Nub Sieve returns a 1 for each major cell in the argument that is unique, and a 0 if it's a duplicate: <source lang=apl>

     ≠ 'Hello, World'

1 1 1 0 1 1 1 1 0 1 0 1 </syntaxhighlight> Using Key we can see which elements were labelled unique and duplicate: <source lang=apl>

     (≠ {⍺⍵}⌸ ⊢) 'Hello, World!'

┌─┬──────────┐ │1│Helo, Wrd!│ ├─┼──────────┤ │0│lol │ └─┴──────────┘ </syntaxhighlight>

Works in: Dyalog APL

Nub Sieve can be used to filter a matrix by some of its columns, or find its unique rows. <source lang=apl>

     ⎕← a ← (⍳6) ∘.! 3+⍳3

4 5 6 6 10 15 4 10 20 1 5 15 0 1 6 0 0 1

     (≠1↑⍤1⊢a) ⌿ a           ⍝ Keep only one row for each first column value

4 5 6 6 10 15 1 5 15 0 1 6

     ≠ a                     ⍝ Which entire rows are unique?

1 1 1 1 1 1

     ∧⌿ ≠ a                  ⍝ All of them.

1 </syntaxhighlight>

Description

While the Nub Sieve satisfies <source lang=apl inline>(∪A) ≡ (≠A)⌿A</syntaxhighlight>, this criterion is not a full definition because it doesn't specify which cell should be selected from among duplicates. The Nub Sieve should indicate cells by following the same procedure as Unique, that is, by considering all major cells in order and indicating those which do not match any cell already indicated.

Formally, let <source lang=apl inline>x</syntaxhighlight> be an array of rank at least 1. The Nub Sieve <source lang=apl inline>m</syntaxhighlight> of <source lang=apl inline>x</syntaxhighlight> is the array which satisfies the following two axioms (with <source lang=apl inline>⎕IO←0</syntaxhighlight>) for all <source lang=apl inline>i</syntaxhighlight> in <source lang=apl inline>⍳≢x</syntaxhighlight>. <source lang=apl> notin ← (⍳=≢⍤⊣)⍨ ⍝ Extension of (~∊) to rank >1 (⍴m) ≡ ,≢x ⍝ Shape of m m[i] ≡ (i⌷x) notin m⌿⍥(i∘↑)x ⍝ Items of m </syntaxhighlight> The search <source lang=apl inline>⍳</syntaxhighlight> in <source lang=apl inline>notin</syntaxhighlight> depends on <source lang=apl inline>⎕CT</syntaxhighlight> and so the Nub Sieve depends on comparison tolerance. The two comparisons <source lang=apl inline>≡</syntaxhighlight> are taken to be independent of <source lang=apl inline>⎕CT</syntaxhighlight>; <source lang=apl inline>m</syntaxhighlight> is exactly Boolean.

When the argument is a scalar, Unique returns its ravel, an instance of scalar rank extension. Correspondingly, Nub Sieve of a scalar is defined to be the singleton vector <source lang=apl inline>,1</syntaxhighlight>.

APL models

An implementation based on the above definition follows: <source lang=apl> UniqueMask ← {

   ⎕IO←0
   notin ← (⍳=≢⍤⊣)⍨
   x ← 1/⍵                          ⍝ Treat scalar as vector
   m ← ⍬                            ⍝ Initial mask
   m ⊣ {m,←(⍵⌷x) notin m⌿⍵↑x}¨ ⍳≢x

} </syntaxhighlight>

The Unique Mask can also be derived from the self-classify or index-in-nub function <source lang=apl inline>(∪⍳1/⊢)</syntaxhighlight>. The following two implementations arise from the fact that 1s in the Unique Mask appear exactly when a new value appears in the self-classify. <source lang=apl> (⍳⍤≢=⍳⍨)⍤(∪⍳1/⊢) ⍝ Pre-correction of "bad meme" (⍳≢⍵)=⍵⍳⍵ (2≠/¯1,⌈\)⍤(∪⍳1/⊢) ⍝ Optimised self-classify → Nub Sieve conversion </syntaxhighlight>

Use cases

One use of Nub Sieve is to remove duplicates from one array and also corresponding arrays, which may contain attributes of cells in that array. The statement <source lang=apl inline>a b⌿⍨←⊂≠a</syntaxhighlight> filters <source lang=apl inline>a</syntaxhighlight> and corresponding <source lang=apl inline>b</syntaxhighlight> in this way. A minor variation is to filter an array based on only one attribute of its major cells: <source lang=apl inline>a⌿⍨≠3⌷⍤1⊢a</syntaxhighlight> gets the rows of matrix <source lang=apl inline>a</syntaxhighlight> which introduce new values in column 3. Code like this might be used to compute, for example, the first Nobel prize winner from each country given a list of winners sorted by date. While Key can be used for such computations (for instance <source lang=apl inline>{⊣⌿⍵}⌸⍨</syntaxhighlight> in place of <source lang=apl inline>⌿⍨∘≠</syntaxhighlight>), the Nub Sieve is slightly simpler and considerably faster in most implementations: Key recomputes uniqueness information for its left argument each time.

The combination with Indices <source lang=apl inline>⍸≠A</syntaxhighlight> finds the indices of unique elements of an array, which might be useful to iterate through these elements without extracting them into a separate array (it could even be used to modify them in their original positions). Again, this combination can be computed with Key, using <source lang=apl inline>{⊣⌿⍵}⌸A</syntaxhighlight>, but the Nub Sieve implementation is more obvious and readable ("Where is A unique?") and much faster (while the particular Key operand <source lang=apl inline>{⊣⌿⍵}</syntaxhighlight> could be sped up, the function can be written in many ways, and recognising all of them is difficult).

Nub Sieve can be used to find all elements of an array that are duplicates, by inverting the result. This is useful in order to see or manipulate what is being discarded by Unique. A program might print duplicate values before discarding them: <source lang=apl> m←≠vals ⎕←'Discarding duplicate elements: ',⍕(~m)⌿vals vals⌿⍨←m </syntaxhighlight> The same method can be used with a sieve derived from a function of an array rather than the array itself, for instance to keep only one namespace with a given <source lang=apl inline>id</syntaxhighlight> property but retain the discarded ones for future use.

History

See also Unique#History.

In A Programming Language, the forward set selector $\sigma /\mathbf {v}$ is a special notation indicating the nub sieve of $\mathbf {v}$ , so that unique elements are found with a Compression $(\sigma /\mathbf {v} )/\mathbf {v}$ .

The function Nubsieve (<source lang=apl inline>≠</syntaxhighlight>) first appeared in A Dictionary of APL in 1987, and was eventually added to SHARP APL. It appears in J as <source lang=j inline>~:</syntaxhighlight> or Nub Sieve. It is also included in Dyalog APL 18.0, released June 2020.

External links

Documentation

Dyalog
J Dictionary, NuVoc (as <source lang=j inline>~:</syntaxhighlight>)
BQN (as ∊)

APL built-ins [edit]
Primitives (Timeline)	Functions
Scalar
		Monadic	Conjugate ∙ Negate ∙ Signum ∙ Reciprocal ∙ Magnitude ∙ Exponential ∙ Natural Logarithm ∙ Floor ∙ Ceiling ∙ Factorial ∙ Not ∙ Pi Times ∙ Roll ∙ Type ∙ Imaginary ∙ Square Root ∙ Round
		Dyadic	Add ∙ Subtract ∙ Times ∙ Divide ∙ Residue ∙ Power ∙ Logarithm ∙ Minimum ∙ Maximum ∙ Binomial ∙ Comparison functions ∙ Boolean functions (And, Or, Nand, Nor) ∙ GCD ∙ LCM ∙ Circular ∙ Complex ∙ Root
Non-Scalar
		Structural	Shape ∙ Reshape ∙ Tally ∙ Depth ∙ Ravel ∙ Enlist ∙ Table ∙ Catenate ∙ Reverse ∙ Rotate ∙ Transpose ∙ Raze ∙ Mix ∙ Split ∙ Enclose ∙ Nest ∙ Cut (K) ∙ Pair ∙ Link ∙ Partitioned Enclose ∙ Partition
		Selection	First ∙ Pick ∙ Take ∙ Drop ∙ Unique ∙ Identity ∙ Stop ∙ Select ∙ Replicate ∙ Expand ∙ Set functions (Intersection ∙ Union ∙ Without) ∙ Bracket indexing ∙ Index ∙ Cartesian Product ∙ Sort
		Selector	Index generator ∙ Grade ∙ Index Of ∙ Interval Index ∙ Indices ∙ Deal ∙ Prefix and suffix vectors
		Computational	Match ∙ Not Match ∙ Membership ∙ Find ∙ Nub Sieve ∙ Encode ∙ Decode ∙ Matrix Inverse ∙ Matrix Divide ∙ Format ∙ Execute ∙ Materialise ∙ Range
Operators	Monadic	Each ∙ Commute ∙ Constant ∙ Replicate ∙ Expand ∙ Reduce ∙ Windowed Reduce ∙ Scan ∙ Outer Product ∙ Key ∙ I-Beam ∙ Spawn ∙ Function axis ∙ Identity (Null, Ident)
Operators	Dyadic	Bind ∙ Compositions (Compose, Reverse Compose, Beside, Withe, Atop, Over) ∙ Inner Product ∙ Determinant ∙ Power ∙ At ∙ Under ∙ Rank ∙ Depth ∙ Variant ∙ Stencil ∙ Cut ∙ Direct definition (operator) ∙ Identity (Lev, Dex)
Quad names	Index origin ∙ Comparison tolerance ∙ Migration level ∙ Atomic vector

@@ Line 7: / Line 7: @@
        ≠ 'Hello, World'
 1 1 0 1 1 1 1 0 1 0 1
-</source>
+</syntaxhighlight>
 Using [[Key]] we can see which elements were labelled unique and duplicate:
 <source lang=apl>
@@ Line 16: / Line 16: @@
 │0│lol       │
 └─┴──────────┘
-</source>
+</syntaxhighlight>
 {{Works in|[[Dyalog APL]]}}
 Nub Sieve can be used to filter a [[matrix]] by some of its columns, or find its unique rows.
@@ Line 36: / Line 36: @@
        ∧⌿ ≠ a                  ⍝ All of them.
-</source>
+</syntaxhighlight>
 == Description ==
-While the Nub Sieve satisfies <source lang=apl inline>(∪A) ≡ (≠A)⌿A</source>, this criterion is not a full definition because it doesn't specify which cell should be selected from among duplicates. The Nub Sieve should indicate cells by following the same procedure as [[Unique]], that is, by considering all [[major cells]] in order and indicating those which do not match any cell already indicated.
+While the Nub Sieve satisfies <source lang=apl inline>(∪A) ≡ (≠A)⌿A</syntaxhighlight>, this criterion is not a full definition because it doesn't specify which cell should be selected from among duplicates. The Nub Sieve should indicate cells by following the same procedure as [[Unique]], that is, by considering all [[major cells]] in order and indicating those which do not match any cell already indicated.
-Formally, let <source lang=apl inline>x</source> be an array of [[rank]] at least 1. The Nub Sieve <source lang=apl inline>m</source> of <source lang=apl inline>x</source> is the array which satisfies the following two axioms (with <source lang=apl inline>⎕IO←0</source>) for all <source lang=apl inline>i</source> in <source lang=apl inline>⍳≢x</source>.
+Formally, let <source lang=apl inline>x</syntaxhighlight> be an array of [[rank]] at least 1. The Nub Sieve <source lang=apl inline>m</syntaxhighlight> of <source lang=apl inline>x</syntaxhighlight> is the array which satisfies the following two axioms (with <source lang=apl inline>⎕IO←0</syntaxhighlight>) for all <source lang=apl inline>i</syntaxhighlight> in <source lang=apl inline>⍳≢x</syntaxhighlight>.
 <source lang=apl>
 notin ← (⍳=≢⍤⊣)⍨               ⍝ Extension of (~∊) to rank >1
 (⍴m) ≡ ,≢x                     ⍝ Shape of m
 m[i] ≡ (i⌷x) notin m⌿⍥(i∘↑)x   ⍝ Items of m
-</source>
+</syntaxhighlight>
-The search <source lang=apl inline>⍳</source> in <source lang=apl inline>notin</source> depends on <source lang=apl inline>⎕CT</source> and so the Nub Sieve depends on [[comparison tolerance]]. The two comparisons <source lang=apl inline>≡</source> are taken to be independent of <source lang=apl inline>⎕CT</source>; <source lang=apl inline>m</source> is exactly [[Boolean]].
+The search <source lang=apl inline>⍳</syntaxhighlight> in <source lang=apl inline>notin</syntaxhighlight> depends on <source lang=apl inline>⎕CT</syntaxhighlight> and so the Nub Sieve depends on [[comparison tolerance]]. The two comparisons <source lang=apl inline>≡</syntaxhighlight> are taken to be independent of <source lang=apl inline>⎕CT</syntaxhighlight>; <source lang=apl inline>m</syntaxhighlight> is exactly [[Boolean]].
-When the argument is a [[scalar]], Unique returns its [[ravel]], an instance of [[scalar rank extension]]. Correspondingly, Nub Sieve of a scalar is defined to be the [[singleton]] [[vector]] <source lang=apl inline>,1</source>.
+When the argument is a [[scalar]], Unique returns its [[ravel]], an instance of [[scalar rank extension]]. Correspondingly, Nub Sieve of a scalar is defined to be the [[singleton]] [[vector]] <source lang=apl inline>,1</syntaxhighlight>.
 === APL models ===
@@ Line 63: / Line 63: @@
      m ⊣ {m,←(⍵⌷x) notin m⌿⍵↑x}¨ ⍳≢x
 }
-</source>
+</syntaxhighlight>
-The Unique Mask can also be derived from the [[self-classify]] or index-in-nub function <source lang=apl inline>(∪⍳1/⊢)</source>. The following two implementations arise from the fact that 1s in the Unique Mask appear exactly when a new value appears in the self-classify.
+The Unique Mask can also be derived from the [[self-classify]] or index-in-nub function <source lang=apl inline>(∪⍳1/⊢)</syntaxhighlight>. The following two implementations arise from the fact that 1s in the Unique Mask appear exactly when a new value appears in the self-classify.
 <source lang=apl>
 (⍳⍤≢=⍳⍨)⍤(∪⍳1/⊢)      ⍝ Pre-correction of "bad meme" (⍳≢⍵)=⍵⍳⍵
 (2≠/¯1,⌈\)⍤(∪⍳1/⊢)    ⍝ Optimised self-classify → Nub Sieve conversion
-</source>
+</syntaxhighlight>
 == Use cases ==
-One use of Nub Sieve is to remove duplicates from one array and also corresponding arrays, which may contain attributes of cells in that array. The statement <source lang=apl inline>a b⌿⍨←⊂≠a</source> filters <source lang=apl inline>a</source> and corresponding <source lang=apl inline>b</source> in this way. A minor variation is to filter an array based on only one attribute of its major cells: <source lang=apl inline>a⌿⍨≠3⌷⍤1⊢a</source> gets the rows of matrix <source lang=apl inline>a</source> which introduce new values in column 3. Code like this might be used to compute, for example, the first Nobel prize winner from each country given a list of winners sorted by date. While [[Key]] can be used for such computations (for instance <source lang=apl inline>{⊣⌿⍵}⌸⍨</source> in place of <source lang=apl inline>⌿⍨∘≠</source>), the Nub Sieve is slightly simpler and considerably faster in most implementations: Key recomputes uniqueness information for its left argument each time.
+One use of Nub Sieve is to remove duplicates from one array and also corresponding arrays, which may contain attributes of cells in that array. The statement <source lang=apl inline>a b⌿⍨←⊂≠a</syntaxhighlight> filters <source lang=apl inline>a</syntaxhighlight> and corresponding <source lang=apl inline>b</syntaxhighlight> in this way. A minor variation is to filter an array based on only one attribute of its major cells: <source lang=apl inline>a⌿⍨≠3⌷⍤1⊢a</syntaxhighlight> gets the rows of matrix <source lang=apl inline>a</syntaxhighlight> which introduce new values in column 3. Code like this might be used to compute, for example, the first Nobel prize winner from each country given a list of winners sorted by date. While [[Key]] can be used for such computations (for instance <source lang=apl inline>{⊣⌿⍵}⌸⍨</syntaxhighlight> in place of <source lang=apl inline>⌿⍨∘≠</syntaxhighlight>), the Nub Sieve is slightly simpler and considerably faster in most implementations: Key recomputes uniqueness information for its left argument each time.
-The combination with [[Indices]] <source lang=apl inline>⍸≠A</source> finds the [[Index|indices]] of unique elements of an array, which might be useful to iterate through these elements without extracting them into a separate array (it could even be used to modify them in their original positions). Again, this combination can be computed with Key, using <source lang=apl inline>{⊣⌿⍵}⌸A</source>, but the Nub Sieve implementation is more obvious and readable ("Where is A unique?") and much faster (while the particular Key operand <source lang=apl inline>{⊣⌿⍵}</source> could be sped up, the function can be written in many ways, and recognising all of them is difficult).
+The combination with [[Indices]] <source lang=apl inline>⍸≠A</syntaxhighlight> finds the [[Index|indices]] of unique elements of an array, which might be useful to iterate through these elements without extracting them into a separate array (it could even be used to modify them in their original positions). Again, this combination can be computed with Key, using <source lang=apl inline>{⊣⌿⍵}⌸A</syntaxhighlight>, but the Nub Sieve implementation is more obvious and readable ("Where is A unique?") and much faster (while the particular Key operand <source lang=apl inline>{⊣⌿⍵}</syntaxhighlight> could be sped up, the function can be written in many ways, and recognising all of them is difficult).
 Nub Sieve can be used to find all elements of an array that ''are'' duplicates, by inverting the result. This is useful in order to see or manipulate what is being discarded by Unique. A program might print duplicate values before discarding them:
@@ Line 82: / Line 82: @@
 ⎕←'Discarding duplicate elements: ',⍕(~m)⌿vals
 vals⌿⍨←m
-</source>
+</syntaxhighlight>
-The same method can be used with a sieve derived from a function of an array rather than the array itself, for instance to keep only one namespace with a given <source lang=apl inline>id</source> property but retain the discarded ones for future use.
+The same method can be used with a sieve derived from a function of an array rather than the array itself, for instance to keep only one namespace with a given <source lang=apl inline>id</syntaxhighlight> property but retain the discarded ones for future use.
 == History ==
@@ Line 91: / Line 91: @@
 In [[A Programming Language]], the ''forward set selector'' <math>\sigma/\mathbf{v}</math> is a special notation indicating the nub sieve of <math>\mathbf{v}</math>, so that unique elements are found with a [[Compress]]ion <math>(\sigma/\mathbf{v})/\mathbf{v}</math>.
-The function Nubsieve (<source lang=apl inline>≠</source>) first appeared in [[A Dictionary of APL]] in 1987, and was eventually added to [[SHARP APL]]. It appears [https://code.jsoftware.com/wiki/Vocabulary/tildeco in J] as <source lang=j inline>~:</source> or Nub Sieve. It is also included in [[Dyalog APL 18.0]], released June 2020.
+The function Nubsieve (<source lang=apl inline>≠</syntaxhighlight>) first appeared in [[A Dictionary of APL]] in 1987, and was eventually added to [[SHARP APL]]. It appears [https://code.jsoftware.com/wiki/Vocabulary/tildeco in J] as <source lang=j inline>~:</syntaxhighlight> or Nub Sieve. It is also included in [[Dyalog APL 18.0]], released June 2020.
 == External links ==
@@ Line 98: / Line 98: @@
 * [https://help.dyalog.com/18.0/#Language/Primitive%20Functions/Unique%20Mask.htm Dyalog]
-* J [https://www.jsoftware.com/help/dictionary/d222.htm Dictionary], [https://code.jsoftware.com/wiki/Vocabulary/tildeco NuVoc] (as <source lang=j inline>~:</source>)
+* J [https://www.jsoftware.com/help/dictionary/d222.htm Dictionary], [https://code.jsoftware.com/wiki/Vocabulary/tildeco NuVoc] (as <source lang=j inline>~:</syntaxhighlight>)
 * [https://mlochbaum.github.io/BQN/doc/selfcmp.html#mark-firsts BQN] (as <code>∊</code>)
 {{APL built-ins}}[[Category:Primitive functions]][[Category:Set functions]]

Nub Sieve: Difference between revisions

Revision as of 21:02, 10 September 2022

Contents

Examples

Description

APL models

Use cases

History

External links

Documentation

Navigation menu

Nub Sieve: Difference between revisions

Revision as of 21:02, 10 September 2022

Examples

Description

APL models

Use cases

History

External links

Documentation

Navigation menu

Search