Key: Difference between revisions

Revision as of 20:59, 10 September 2022

108 bytes added, 20:59, 10 September 2022

m

Text replacement - "</source>" to "</syntaxhighlight>"

Adám Brudzewsky

@@ Line 15: / Line 15: @@
 │p│9 10    │
 └─┴────────┘
-</source>
+</syntaxhighlight>
 In the [[dyadic]] case, Key applies the function to collections of major cells from the right argument corresponding to unique elements of the left argument:
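
The dyadic grouping described above can be modelled outside APL. The following is a minimal Python sketch, not the APL implementation: the name `key` and its argument order are hypothetical, and it assumes the keys are hashable and that groups are reported in order of first appearance, as in the examples on this page.

```python
def key(f, keys, values):
    """Rough model of dyadic Key (keys f⌸ values).

    Groups the items of `values` by the corresponding item of `keys`,
    then calls f(unique_key, group) once per unique key, in order of
    first appearance. `key` is a hypothetical name for illustration.
    """
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    return [f(k, g) for k, g in groups.items()]


# Pair each unique key with the values that share it.
pairs = key(lambda k, g: (k, g), "abab", [10, 20, 30, 40])
print(pairs)
```

Here `pairs` holds one entry per unique letter: `'a'` with `[10, 30]` and `'b'` with `[20, 40]`.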
@@ Line 30: / Line 30: @@
 │p│IJ  │
 └─┴────┘
-</source>
+</syntaxhighlight>
-The monadic case, <source lang=apl inline>f⌸Y</source> is equivalent to <source lang=apl inline>Y f⌸ ⍳≢Y</source>.
+The monadic case, <source lang=apl inline>f⌸Y</syntaxhighlight> is equivalent to <source lang=apl inline>Y f⌸ ⍳≢Y</syntaxhighlight>.
 == Problems ==
@@ Line 42: / Line 42: @@
 C 4
 G 6
-</source>
+</syntaxhighlight>
 Since A is entirely missing in the argument, it isn't mentioned in the result either. Likewise, the result is mis-ordered due to G and T appearing before the first C. A common solution is to inject the vocabulary before the actual data, and then decrement from the counts:
 <source lang=apl>      {⍺,¯1+≢⍵}⌸'ACGT','TCCGCGGTGGCG'
@@ Line 49: / Line 49: @@
 G 6
 T 2
-</source>
+</syntaxhighlight>
 Now that the meaning of each count is known, the operand's left argument can be ignored, and the decrementing can be factored out from the operand:
 <source lang=apl>
        ¯1+{≢⍵}⌸'ACGT','TCCGCGGTGGCG'
 0 4 6 2
-</source>
+</syntaxhighlight>
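
The vocabulary-injection technique can be sketched in Python as a cross-check of the counts. This is an illustration under stated assumptions, not a translation of the APL: the function name `counts_with_vocabulary` is hypothetical, the vocabulary is assumed to contain no duplicates, and any data item outside the vocabulary would simply be appended after it.

```python
from collections import Counter


def counts_with_vocabulary(vocabulary, data):
    """Count items of `data`, reporting every vocabulary item in order.

    The vocabulary is prepended so that each of its items is seen once
    before the data; subtracting 1 afterwards removes that extra count.
    """
    counts = Counter(vocabulary + data)  # keeps first-appearance order
    return [(item, n - 1) for item, n in counts.items()]


table = counts_with_vocabulary("ACGT", "TCCGCGGTGGCG")
print(table)
print([n for _, n in table])  # counts only, in vocabulary order
```

Because `A` never occurs in the data, its count after decrementing is 0, so the counts-only form has four entries, one per vocabulary item.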
 === Computing the unique ===
 Key computes the set of [[unique]] major cells. Often, this collection is needed separately from the occurrence information, but can be hard to extract. For example, to get the most frequently occurring letter:
@@ Line 60: / Line 60: @@
        ⊃⍒{≢⍵}⌸'TCCGCGGTGGCG'
-</source>
+</syntaxhighlight>
 Notice that 3 is the index in the unique set of letters, and so it is tempting to write:
 <source lang=apl>
        {(⊃⍒{≢⍵}⌸⍵)⌷∪⍵}'TCCGCGGTGGCG'
 G
-</source>
+</syntaxhighlight>
 However, while this code works, it is inefficient in that the unique is computed twice. This can be avoided by letting Key return the unique and using that:
 <source lang=apl>
@@ Line 71: / Line 71: @@
        keys⌷⍨⊃⍒counts
 G
-</source>
+</syntaxhighlight>
 Unfortunately, this can introduce a different inefficiency, in that the result of Key's operand can end up being a [[heterogeneous array]] (containing multiple [[datatype]]s), and these are stored as pointer arrays, consuming memory for one pointer per element, and forcing "pointer chasing" when addressing the data. A possible work-around is to collect the unique keys separately from the result of counts:
 <source lang=apl>
@@ Line 79: / Line 79: @@
        keys⌷⍨⊃⍒counts
 G
-</source>
+</syntaxhighlight>
-If there are a large number of unique values, the repeated updating of the accumulating <source lang=apl inline>keys</source> variable can be an issue in itself.
+If there are a large number of unique values, the repeated updating of the accumulating <source lang=apl inline>keys</syntaxhighlight> variable can be an issue in itself.
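
The idea of keeping the unique keys apart from the counts, so that each collection stays homogeneous, can be sketched in Python. This is only a model of the approach: `most_frequent` is a hypothetical name, items are assumed hashable, and ties are resolved in favour of the key that appeared first.

```python
def most_frequent(data):
    """Return the most frequently occurring item of `data`.

    Keys and counts are accumulated in two parallel lists during a
    single pass, so the unique set is computed only once.
    Raises ValueError if `data` is empty.
    """
    keys, counts = [], []
    position = {}  # item -> index into keys/counts
    for item in data:
        if item not in position:
            position[item] = len(keys)
            keys.append(item)
            counts.append(0)
        counts[position[item]] += 1
    # Index of the largest count; max() returns the first one on ties.
    best = max(range(len(counts)), key=counts.__getitem__)
    return keys[best]


print(most_frequent("TCCGCGGTGGCG"))
```

For the DNA string used on this page, the function returns `G`.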
 == External links ==
