m (Text replacement - "</source>" to "</syntaxhighlight>") |
m (Text replacement - "<source" to "<syntaxhighlight") Tags: Mobile edit Mobile web edit |
||
Line 4: | Line 4: | ||
[[Monadic]]ally, Key groups identical [[major cell]]s together and applies the [[function]] operand once for each unique major cell. The function receives the unique major cell as left argument and the indices of the major cells that match it as right argument: | [[Monadic]]ally, Key groups identical [[major cell]]s together and applies the [[function]] operand once for each unique major cell. The function receives the unique major cell as left argument and the indices of the major cells that match it as right argument: | ||
<source lang=apl> | <syntaxhighlight lang=apl> | ||
{⍺⍵}⌸'Mississippi' | {⍺⍵}⌸'Mississippi' | ||
┌─┬────────┐ | ┌─┬────────┐ | ||
Line 19: | Line 19: | ||
In the [[dyadic]] case, Key applies the function to collections of major cells from the right argument corresponding to unique elements of the left argument: | In the [[dyadic]] case, Key applies the function to collections of major cells from the right argument corresponding to unique elements of the left argument: | ||
<source lang=apl> | <syntaxhighlight lang=apl> | ||
'Mississippi'{⍺⍵}⌸'ABCDEFGHIJK' | 'Mississippi'{⍺⍵}⌸'ABCDEFGHIJK' | ||
┌─┬────┐ | ┌─┬────┐ | ||
Line 32: | Line 32: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
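As a further illustration of the dyadic case, the sketch below sums values per key. The key vector <syntaxhighlight lang=apl inline>'aabab'</syntaxhighlight> and the values are invented for this example, and the output assumes a Dyalog-style session display:
<syntaxhighlight lang=apl>
      ⍝ ⍺ is each unique key; ⍵ holds the right-argument items filed under it
      'aabab'{⍺,+/⍵}⌸1 2 3 4 5
a 7
b 8
</syntaxhighlight>
The key <syntaxhighlight lang=apl inline>'a'</syntaxhighlight> occurs at positions 1, 2 and 4, so its values 1, 2 and 4 sum to 7; the remaining values 3 and 5 belong to <syntaxhighlight lang=apl inline>'b'</syntaxhighlight>.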
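The monadic case, <source lang=apl inline>f⌸Y</syntaxhighlight>, is equivalent to <source lang=apl inline>Y f⌸ ⍳≢Y</syntaxhighlight>. | The monadic case, <syntaxhighlight lang=apl inline>f⌸Y</syntaxhighlight>, is equivalent to <syntaxhighlight lang=apl inline>Y f⌸ ⍳≢Y</syntaxhighlight>. | ||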
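This equivalence can be checked directly in a session. The following is a minimal sketch that reuses the operand and data from the examples above; the variable name <syntaxhighlight lang=apl inline>Y</syntaxhighlight> is chosen only for illustration:
<syntaxhighlight lang=apl>
      Y←'Mississippi'
      ⍝ compare the monadic result with the dyadic form applied to the indices of Y
      ({⍺⍵}⌸Y)≡Y{⍺⍵}⌸⍳≢Y
1
</syntaxhighlight>
Both sides generate indices according to the same [[index origin]], so the match should hold regardless of its setting.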
== Problems == | == Problems == | ||
=== Vocabulary === | === Vocabulary === | ||
A common problem with Key is the inability to control the order of the result (as Key will use the order of appearance) and the "vocabulary" (as Key will never include information for a major cell that doesn't occur). For example, here we want to count occurrences of the letters A, C, G, T: | A common problem with Key is the inability to control the order of the result (as Key will use the order of appearance) and the "vocabulary" (as Key will never include information for a major cell that doesn't occur). For example, here we want to count occurrences of the letters A, C, G, T: | ||
<source lang=apl> | <syntaxhighlight lang=apl> | ||
{⍺,≢⍵}⌸'TCCGCGGTGGCG' | {⍺,≢⍵}⌸'TCCGCGGTGGCG' | ||
T 2 | T 2 | ||
Line 44: | Line 44: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
Since A is entirely missing from the argument, it isn't mentioned in the result either. Likewise, the result is mis-ordered because T appears before the first C and G. A common solution is to inject the vocabulary before the actual data, and then subtract one from each count: | Since A is entirely missing from the argument, it isn't mentioned in the result either. Likewise, the result is mis-ordered because T appears before the first C and G. A common solution is to inject the vocabulary before the actual data, and then subtract one from each count: | ||
<source lang=apl> {⍺,¯1+≢⍵}⌸'ACGT','TCCGCGGTGGCG' | <syntaxhighlight lang=apl> {⍺,¯1+≢⍵}⌸'ACGT','TCCGCGGTGGCG' | ||
A 0 | A 0 | ||
C 4 | C 4 | ||
Line 51: | Line 51: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
Now that the meaning of each count is known, the operand's left argument can be ignored, and the decrementing can be factored out from the operand: | Now that the meaning of each count is known, the operand's left argument can be ignored, and the decrementing can be factored out from the operand: | ||
<source lang=apl> | <syntaxhighlight lang=apl> | ||
¯1+{≢⍵}⌸'ACGT','TCCGCGGTGGCG' | ¯1+{≢⍵}⌸'ACGT','TCCGCGGTGGCG' | ||
0 4 6 2 | 0 4 6 2 | ||
Line 57: | Line 57: | ||
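The vocabulary technique can be wrapped in a small function. The sketch below uses a hypothetical helper named <syntaxhighlight lang=apl inline>Count</syntaxhighlight> (not part of any library) and assumes that the vocabulary contains no duplicates and that every element of the data occurs in the vocabulary; otherwise the counts would be off or extra counts would be appended:
<syntaxhighlight lang=apl>
      ⍝ hypothetical helper: ⍺ is the vocabulary, ⍵ is the data
      Count←{¯1+{≢⍵}⌸⍺,⍵}
      'ACGT' Count 'TCCGCGGTGGCG'
0 4 6 2
</syntaxhighlight>
Because the vocabulary is catenated in front of the data, the counts come out in vocabulary order, and subtracting one removes the injected occurrence of each letter.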
=== Computing the unique === | === Computing the unique === | ||
Key computes the set of [[unique]] major cells. Often, this collection is needed separately from the occurrence information, but can be hard to extract. For example, to get the most frequently occurring letter: | Key computes the set of [[unique]] major cells. Often, this collection is needed separately from the occurrence information, but can be hard to extract. For example, to get the most frequently occurring letter: | ||
<source lang=apl> | <syntaxhighlight lang=apl> | ||
⊃⍒{≢⍵}⌸'TCCGCGGTGGCG' | ⊃⍒{≢⍵}⌸'TCCGCGGTGGCG' | ||
3 | 3 | ||
</syntaxhighlight> | </syntaxhighlight> | ||
Notice that 3 is the index in the unique set of letters, and so it is tempting to write: | Notice that 3 is the index in the unique set of letters, and so it is tempting to write: | ||
<source lang=apl> | <syntaxhighlight lang=apl> | ||
{(⊃⍒{≢⍵}⌸⍵)⌷∪⍵}'TCCGCGGTGGCG' | {(⊃⍒{≢⍵}⌸⍵)⌷∪⍵}'TCCGCGGTGGCG' | ||
G | G | ||
</syntaxhighlight> | </syntaxhighlight> | ||
However, while this code works, it is inefficient in that the unique is computed twice. This can be avoided by letting Key return the unique and using that: | However, while this code works, it is inefficient in that the unique is computed twice. This can be avoided by letting Key return the unique and using that: | ||
<source lang=apl> | <syntaxhighlight lang=apl> | ||
(keys counts)←,⌿{⍺,≢⍵}⌸'TCCGCGGTGGCG' | (keys counts)←,⌿{⍺,≢⍵}⌸'TCCGCGGTGGCG' | ||
keys⌷⍨⊃⍒counts | keys⌷⍨⊃⍒counts | ||
Line 73: | Line 73: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
Unfortunately, this can introduce a different inefficiency, in that the result of Key's operand can end up being a [[heterogeneous array]] (containing multiple [[datatype]]s), and these are stored as pointer arrays, consuming memory for one pointer per element, and forcing "pointer chasing" when addressing the data. A possible work-around is to collect the unique keys separately from the result of counts: | Unfortunately, this can introduce a different inefficiency, in that the result of Key's operand can end up being a [[heterogeneous array]] (containing multiple [[datatype]]s), and these are stored as pointer arrays, consuming memory for one pointer per element, and forcing "pointer chasing" when addressing the data. A possible work-around is to collect the unique keys separately from the result of counts: | ||
<source lang=apl> | <syntaxhighlight lang=apl> | ||
data←'TCCGCGGTGGCG' | data←'TCCGCGGTGGCG' | ||
keys←0⌿data | keys←0⌿data | ||
Line 80: | Line 80: | ||
G | G | ||
</syntaxhighlight> | </syntaxhighlight> | ||
If there are a large number of unique values, the repeated updating of the accumulating <source lang=apl inline>keys</syntaxhighlight> variable can be an issue in itself. | If there are a large number of unique values, the repeated updating of the accumulating <syntaxhighlight lang=apl inline>keys</syntaxhighlight> variable can be an issue in itself. | ||
== External links == | == External links == |