m (Text replacement - "</source>" to "</syntaxhighlight>") |
Line 15: | Line 15: | ||
│p│9 10 │ | │p│9 10 │ | ||
└─┴────────┘ | └─┴────────┘ | ||
</source> | </syntaxhighlight> | ||
In the [[dyadic]] case, Key applies the function to collections of major cells from the right argument corresponding to unique elements of the left argument: | In the [[dyadic]] case, Key applies the function to collections of major cells from the right argument corresponding to unique elements of the left argument: | ||
Line 30: | Line 30: | ||
│p│IJ │ | │p│IJ │ | ||
└─┴────┘ | └─┴────┘ | ||
</source> | </syntaxhighlight> | ||
The monadic case, <source lang=apl inline>f⌸Y</source> is equivalent to <source lang=apl inline>Y f⌸ ⍳≢Y</source>. | The monadic case, <source lang=apl inline>f⌸Y</syntaxhighlight> is equivalent to <source lang=apl inline>Y f⌸ ⍳≢Y</syntaxhighlight>. | ||
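The equivalence between the monadic and dyadic forms can be checked directly. The following is a minimal sketch, assuming a dialect that implements Key (for example Dyalog APL); the sample vector <syntaxhighlight lang=apl inline>Y</syntaxhighlight> and the operand <syntaxhighlight lang=apl inline>{⍺ ⍵}</syntaxhighlight> are chosen here for illustration only:
<syntaxhighlight lang=apl>
      Y←'Mississippi'
      ⍝ monadic Key pairs each unique element with the indices where it occurs;
      ⍝ the dyadic form gets the same result by keying the indices ⍳≢Y on Y
      ({⍺ ⍵}⌸Y) ≡ Y {⍺ ⍵}⌸ ⍳≢Y
1
</syntaxhighlight>
The comparison holds for either index origin, since both sides generate the indices in the same way.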
== Problems == | == Problems == | ||
Line 42: | Line 42: | ||
C 4 | C 4 | ||
G 6 | G 6 | ||
</source> | </syntaxhighlight> | ||
Since A is entirely missing in the argument, it isn't mentioned in the result either. Likewise, the result is mis-ordered due to G and T appearing before the first C. A common solution is to inject the vocabulary before the actual data, and then decrement from the counts: | Since A is entirely missing in the argument, it isn't mentioned in the result either. Likewise, the result is mis-ordered due to G and T appearing before the first C. A common solution is to inject the vocabulary before the actual data, and then decrement from the counts: | ||
<source lang=apl> {⍺,¯1+≢⍵}⌸'ACGT','TCCGCGGTGGCG' | <source lang=apl> {⍺,¯1+≢⍵}⌸'ACGT','TCCGCGGTGGCG' | ||
Line 49: | Line 49: | ||
G 6 | G 6 | ||
T 2 | T 2 | ||
</source> | </syntaxhighlight> | ||
Now that the meaning of each count is known, the operand's left argument can be ignored, and the decrementing can be factored out from the operand: | Now that the meaning of each count is known, the operand's left argument can be ignored, and the decrementing can be factored out from the operand: | ||
<source lang=apl> | <source lang=apl> | ||
¯1+{≢⍵}⌸'ACGT','TCCGCGGTGGCG' | ¯1+{≢⍵}⌸'ACGT','TCCGCGGTGGCG' | ||
0 4 6 2 | 0 4 6 2 | ||
</source> | </syntaxhighlight> | ||
=== Computing the unique === | === Computing the unique === | ||
Key computes the set of [[unique]] major cells. Often, this collection is needed separately from the occurrence information, but can be hard to extract. For example, to get the most frequently occurring letter: | Key computes the set of [[unique]] major cells. Often, this collection is needed separately from the occurrence information, but can be hard to extract. For example, to get the most frequently occurring letter: | ||
Line 60: | Line 60: | ||
⊃⍒{≢⍵}⌸'TCCGCGGTGGCG' | ⊃⍒{≢⍵}⌸'TCCGCGGTGGCG' | ||
3 | 3 | ||
</source> | </syntaxhighlight> | ||
Notice that 3 is the index in the unique set of letters, and so it is tempting to write: | Notice that 3 is the index in the unique set of letters, and so it is tempting to write: | ||
<source lang=apl> | <source lang=apl> | ||
{(⊃⍒{≢⍵}⌸⍵)⌷∪⍵}'TCCGCGGTGGCG' | {(⊃⍒{≢⍵}⌸⍵)⌷∪⍵}'TCCGCGGTGGCG' | ||
G | G | ||
</source> | </syntaxhighlight> | ||
However, while this code works, it is inefficient in that the unique is computed twice. This can be avoided by letting Key return the unique and using that: | However, while this code works, it is inefficient in that the unique is computed twice. This can be avoided by letting Key return the unique and using that: | ||
<source lang=apl> | <source lang=apl> | ||
Line 71: | Line 71: | ||
keys⌷⍨⊃⍒counts | keys⌷⍨⊃⍒counts | ||
G | G | ||
</source> | </syntaxhighlight> | ||
Unfortunately, this can introduce a different inefficiency, in that the result of Key's operand can end up being a [[heterogeneous array]] (containing multiple [[datatype]]s), and these are stored as pointer arrays, consuming memory for one pointer per element, and forcing "pointer chasing" when addressing the data. A possible work-around is to collect the unique keys separately from the result of counts: | Unfortunately, this can introduce a different inefficiency, in that the result of Key's operand can end up being a [[heterogeneous array]] (containing multiple [[datatype]]s), and these are stored as pointer arrays, consuming memory for one pointer per element, and forcing "pointer chasing" when addressing the data. A possible work-around is to collect the unique keys separately from the result of counts: | ||
<source lang=apl> | <source lang=apl> | ||
Line 79: | Line 79: | ||
keys⌷⍨⊃⍒counts | keys⌷⍨⊃⍒counts | ||
G | G | ||
</source> | </syntaxhighlight> | ||
If there are a large number of unique values, the repeated updating of the accumulating <source lang=apl inline>keys</source> variable can be an issue in itself. | If there are a large number of unique values, the repeated updating of the accumulating <source lang=apl inline>keys</syntaxhighlight> variable can be an issue in itself. | ||
== External links == | == External links == |