Key: Difference between revisions

108 bytes added, 20:59, 10 September 2022
m (Text replacement - "</source>" to "</syntaxhighlight>")
Line 15: Line 15:
│p│9 10    │
│p│9 10    │
└─┴────────┘
└─┴────────┘
</source>
</syntaxhighlight>


In the [[dyadic]] case, Key applies the function to collections of major cells from the right argument corresponding to unique elements of the left argument:
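A minimal illustration of the dyadic case, using made-up data (the keys <syntaxhighlight lang=apl inline>'ABAB'</syntaxhighlight> and the values <syntaxhighlight lang=apl inline>10 20 30 40</syntaxhighlight> are assumptions chosen for this sketch, not taken from the article). Each unique key is paired with the sum of the right-argument elements filed under it:
<syntaxhighlight lang=apl>
      ⍝ ⍺ is a unique key; ⍵ holds the right-argument items that share it
      'ABAB' {⍺,+/⍵}⌸ 10 20 30 40
A 40
B 60
</syntaxhighlight>
Here <syntaxhighlight lang=apl inline>'A'</syntaxhighlight> collects 10 and 30, while <syntaxhighlight lang=apl inline>'B'</syntaxhighlight> collects 20 and 40.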
Line 30: Line 30:
│p│IJ  │
│p│IJ  │
└─┴────┘
└─┴────┘
</source>
</syntaxhighlight>


The monadic case, <source lang=apl inline>f⌸Y</source> is equivalent to <source lang=apl inline>Y f⌸ ⍳≢Y</source>.
The monadic case, <source lang=apl inline>f⌸Y</syntaxhighlight> is equivalent to <source lang=apl inline>Y f⌸ ⍳≢Y</syntaxhighlight>.
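The equivalence can be checked directly in a session. The following is a small sketch, assuming a sample vector <syntaxhighlight lang=apl inline>'Mississippi'</syntaxhighlight> and the operand <syntaxhighlight lang=apl inline>{⍺ ⍵}</syntaxhighlight> (both chosen here for illustration only):
<syntaxhighlight lang=apl>
      Y←'Mississippi'
      ⍝ monadic Key passes the indices of each unique item as ⍵
      ({⍺ ⍵}⌸Y) ≡ Y {⍺ ⍵}⌸ ⍳≢Y
1
</syntaxhighlight>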


== Problems ==
Line 42: Line 42:
C 4
C 4
G 6
G 6
</source>
</syntaxhighlight>
Since A is entirely missing in the argument, it isn't mentioned in the result either. Likewise, the result is mis-ordered due to G and T appearing before the first C. A common solution is to inject the vocabulary before the actual data, and then decrement from the counts:
<source lang=apl>      {⍺,¯1+≢⍵}⌸'ACGT','TCCGCGGTGGCG'
<source lang=apl>      {⍺,¯1+≢⍵}⌸'ACGT','TCCGCGGTGGCG'
Line 49: Line 49:
G 6
G 6
T 2
T 2
</source>
</syntaxhighlight>
Now that the meaning of each count is known, the operand's left argument can be ignored, and the decrementing can be factored out from the operand:
<source lang=apl>
<source lang=apl>
       ¯1+{≢⍵}⌸'ACGT','TCCGCGGTGGCG'
       ¯1+{≢⍵}⌸'ACGT','TCCGCGGTGGCG'
0 4 6 2
0 4 6 2
</source>
</syntaxhighlight>
=== Computing the unique ===
Key computes the set of [[unique]] major cells. Often, this collection is needed separately from the occurrence information, but can be hard to extract. For example, to get the most frequently occurring letter:
Line 60: Line 60:
       ⊃⍒{≢⍵}⌸'TCCGCGGTGGCG'
       ⊃⍒{≢⍵}⌸'TCCGCGGTGGCG'
3
3
</source>
</syntaxhighlight>
Notice that 3 is the index in the unique set of letters, and so it is tempting to write:
<source lang=apl>
<source lang=apl>
       {(⊃⍒{≢⍵}⌸⍵)⌷∪⍵}'TCCGCGGTGGCG'
       {(⊃⍒{≢⍵}⌸⍵)⌷∪⍵}'TCCGCGGTGGCG'
G
G
</source>
</syntaxhighlight>
However, while this code works, it is inefficient in that the unique is computed twice. This can be avoided by letting Key return the unique and using that:
<source lang=apl>
<source lang=apl>
Line 71: Line 71:
       keys⌷⍨⊃⍒counts
       keys⌷⍨⊃⍒counts
G
G
</source>
</syntaxhighlight>
Unfortunately, this can introduce a different inefficiency, in that the result of Key's operand can end up being a [[heterogeneous array]] (containing multiple [[datatype]]s), and these are stored as pointer arrays, consuming memory for one pointer per element, and forcing "pointer chasing" when addressing the data. A possible work-around is to collect the unique keys separately from the result of counts:
<source lang=apl>
<source lang=apl>
Line 79: Line 79:
       keys⌷⍨⊃⍒counts
       keys⌷⍨⊃⍒counts
G
G
</source>
</syntaxhighlight>
If there are a large number of unique values, the repeated updating of the accumulating <source lang=apl inline>keys</source> variable can be an issue in itself.
If there are a large number of unique values, the repeated updating of the accumulating <source lang=apl inline>keys</syntaxhighlight> variable can be an issue in itself.


== External links ==
