Interval Index: Difference between revisions

From APL Wiki
Jump to navigation Jump to search
mNo edit summary
m (Text replacement - "High-rank set functions" to "Major cell search")
 
(3 intermediate revisions by 2 users not shown)
Line 1: Line 1:
:''This page is about classifying data. See [[Indexing]], [[Indices]], [[Index Generator]], and [[Index of]] for other operations named after indices.''
:''This page is about classifying data. See [[Indexing]], [[Indices]], [[Index Generator]], and [[Index of]] for other operations named after indices.''


{{Built-in|Interval Index|⍸}}, or '''Bins''' (<source lang=apl inline>⍋</source>) in [[A+]] and [[BQN]], is a [[primitive function|primitive]] [[search function]] which locates [[cell]]s of its right [[argument]] relative to cells of a [[sorted]] left argument. Like searching for the right page for a word in a dictionary, knowing the first word on each page, Interval Index finds for each cell (word to search for) an [[index]] (page number) in the left argument (dictionary) so that the target cell is between the [[major cell]] at that index (first word on the page) and the major cell at the following index (first word on the next page). In general computer science, this problem is known as a '''predecessor search''' (see [[wikipedia:predecessor problem|predecessor problem]]), since it searches for the predecessor—entry immediately preceding—of an array, but may be referred to by the name of a common technique used to solve it, the [[wikipedia:binary search|binary search]]. Interval Index may also be considered a way to perform [[wikipedia:data binning|data binning]].
{{Built-in|Interval Index|⍸}}, or '''Bins''' (<syntaxhighlight lang=apl inline>⍋</syntaxhighlight>) in [[A+]] and [[BQN]], is a [[primitive function|primitive]] [[search function]] which locates [[cell]]s of its right [[argument]] relative to cells of a [[sorted]] left argument. Like searching for the right page for a word in a dictionary, knowing the first word on each page, Interval Index finds for each cell (word to search for) an [[index]] (page number) in the left argument (dictionary) so that the target cell is between the [[major cell]] at that index (first word on the page) and the major cell at the following index (first word on the next page). In general computer science, this problem is known as a '''predecessor search''' (see [[wikipedia:predecessor problem|predecessor problem]]), since it searches for the predecessor—entry immediately preceding—of an array, but may be referred to by the name of a common technique used to solve it, the [[wikipedia:binary search|binary search]]. Interval Index may also be considered a way to perform [[wikipedia:data binning|data binning]].


Like [[Grade]], Interval Index relies on [[array ordering]], but is not subject to [[comparison tolerance]]. Its result depends on [[index origin]] when present.
Like [[Grade]], Interval Index relies on [[array ordering]], but is not subject to [[comparison tolerance]]. Its result depends on [[index origin]] when present.
Line 10: Line 10:


Suppose a directory groups people by first name: those starting with A–D form one group, and similarly E–I, J–Q, and R–Z are each grouped together. A few APL developers want to know which section they are in. Interval Index is a perfect fit for this task. The left argument should be the first letter of each group (the last letters are not needed, as they are derived from the first letter of the next group), and the right argument should be the list of names. The five results correspond to the five right argument elements.
Suppose a directory groups people by first name: those starting with A–D form one group, and similarly E–I, J–Q, and R–Z are each grouped together. A few APL developers want to know which section they are in. Interval Index is a perfect fit for this task. The left argument should be the first letter of each group (the last letters are not needed, as they are derived from the first letter of the next group), and the right argument should be the list of names. The five results correspond to the five right argument elements.
<source lang=apl>
<syntaxhighlight lang=apl>
       'AEJR' ⍸ 'Ken' 'Adin' 'Larry' 'Phil' 'Roger'
       'AEJR' ⍸ 'Ken' 'Adin' 'Larry' 'Phil' 'Roger'
3 1 3 3 4
3 1 3 3 4
</source>
</syntaxhighlight>
{{Works in|[[Dyalog APL]]}}
{{Works in|[[Dyalog APL]]}}
In [[J]] the ''last'' letter of each group should be used instead, and the left argument must be converted to a list of [[box]]ed strings: <source lang=j inline>(<@,"0'DIQZ') I. 'Ken';'Adin';'Larry';'Phil';'Roger'</source>. The results are one smaller, because J uses [[index origin]] 0.
In [[J]] the ''last'' letter of each group should be used instead, and the left argument must be converted to a list of [[box]]ed strings: <syntaxhighlight lang=j inline>(<@,"0'DIQZ') I. 'Ken';'Adin';'Larry';'Phil';'Roger'</syntaxhighlight>. The results are one smaller, because J uses [[index origin]] 0.


Left argument elements with multiple characters can also be used. Here, Ken and Larry are placed in the same group because their first names fall between "Ke" and "Lo".
Left argument elements with multiple characters can also be used. Here, Ken and Larry are placed in the same group because their first names fall between "Ke" and "Lo".
<source lang=apl>
<syntaxhighlight lang=apl>
       ('A' 'Ke' 'Lo' 'Pa') ⍸ 'Ken' 'Adin' 'Larry' 'Phil' 'Roger'
       ('A' 'Ke' 'Lo' 'Pa') ⍸ 'Ken' 'Adin' 'Larry' 'Phil' 'Roger'
2 1 2 4 4
2 1 2 4 4
</source>
</syntaxhighlight>
{{Works in|[[Dyalog APL]]}}
{{Works in|[[Dyalog APL]]}}


Interval Index is useful for constructing [[wikipedia:Histogram|Histogram]]s. Below, the intervals 1–9, 10–99, 100–999 are used to group several numbers. Again, the right endpoints of intervals are not needed (and in this case, they are not wanted, as they introduce confusion about where numbers such as 99.5 belong). For numbers before the first interval boundary, index 0 is returned, and for numbers after the last, index 4 is returned.
Interval Index is useful for constructing [[wikipedia:Histogram|Histogram]]s. Below, the intervals 1–9, 10–99, 100–999 are used to group several numbers. Again, the right endpoints of intervals are not needed (and in this case, they are not wanted, as they introduce confusion about where numbers such as 99.5 belong). For numbers before the first interval boundary, index 0 is returned, and for numbers after the last, index 4 is returned.
<source lang=apl>
<syntaxhighlight lang=apl>
       1 10 100 1000 ⍸ 44 2 1 0 2481
       1 10 100 1000 ⍸ 44 2 1 0 2481
2 1 1 0 4
2 1 1 0 4
</source>
</syntaxhighlight>
{{Works in|[[Dyalog APL]]}}
{{Works in|[[Dyalog APL]]}}


You can use Interval Index with [[Matrix|matrices]] and other high-[[rank]] arrays as well, in order to compare cells of those array. For example, if you took several pictures at a programming conference, and have the conference schedule as well as image timestamps, then you can find which talk each picture is of. Use the start time for each talk as the left argument, and the timestamp for each picture in the right argument.
You can use Interval Index with [[Matrix|matrices]] and other high-[[rank]] arrays as well, in order to compare cells of those array. For example, if you took several pictures at a programming conference, and have the conference schedule as well as image timestamps, then you can find which talk each picture is of. Use the start time for each talk as the left argument, and the timestamp for each picture in the right argument.
<source lang=apl>
<syntaxhighlight lang=apl>
       ⎕←schedule ← 4 2 ⍴ 1 00  1 45  2 15  2 30
       ⎕←schedule ← 4 2 ⍴ 1 00  1 45  2 15  2 30
1  0
1  0
Line 44: Line 44:
       schedule ⍸ timestamps
       schedule ⍸ timestamps
1 2 2
1 2 2
</source>
</syntaxhighlight>


== Description ==
== Description ==
Line 55: Line 55:
* The [[index]] of the last left argument major cell preceding that cell, or
* The [[index]] of the last left argument major cell preceding that cell, or
* The number of left argument major cells which precede it.
* The number of left argument major cells which precede it.
Either definition may be adjusted to use either strict or non-strict precedence, and the result may be offset by some amount. Note that the result in the first case is an [[index]], and should be subject to [[index origin]], while the result in the second is merely a count, and should not. Therefore the definitions are incompatible in any language with variable index origin, and at least one must be adjusted by <source lang=apl inline>⎕IO</source>.
Either definition may be adjusted to use either strict or non-strict precedence, and the result may be offset by some amount. Note that the result in the first case is an [[index]], and should be subject to [[index origin]], while the result in the second is merely a count, and should not. Therefore the definitions are incompatible in any language with variable index origin, and at least one must be adjusted by <syntaxhighlight lang=apl inline>⎕IO</syntaxhighlight>.


== History ==
== History ==
Line 61: Line 61:
In 1987 [[Roger Hui]] presented the function ''classify'', with a definition very similar to that later introduced in [[Dyalog APL]], but restricted to [[vector]]s.<ref>[[Roger Hui|Hui, Roger]], [https://www.jsoftware.com/papers/from.htm Some Uses of { and }], §1.2. [[APL87]]. APL Quote Quad, Volume 17, Number 4. 1987-05.</ref>
In 1987 [[Roger Hui]] presented the function ''classify'', with a definition very similar to that later introduced in [[Dyalog APL]], but restricted to [[vector]]s.<ref>[[Roger Hui|Hui, Roger]], [https://www.jsoftware.com/papers/from.htm Some Uses of { and }], §1.2. [[APL87]]. APL Quote Quad, Volume 17, Number 4. 1987-05.</ref>


[[A+]] was the first APL to feature Interval Index as a primitive. Called Bins (<source lang=apl inline>⍋</source>), it assumed the left argument to be sorted ascending with no duplicates and returned the number of cells strictly less than the target cell, giving right-closed indices. A+ used the same [[High-rank set functions|cell rank convention]] as Find ([[Index Of]]) for Interval Index, a choice that was also adopted in later implementations of the primitive.
[[A+]] was the first APL to feature Interval Index as a primitive. Called Bins (<syntaxhighlight lang=apl inline>⍋</syntaxhighlight>), it assumed the left argument to be sorted ascending with no duplicates and returned the number of cells strictly less than the target cell, giving right-closed indices. A+ used the same [[Major cell search|cell rank convention]] as Find ([[Index Of]]) for Interval Index, a choice that was also adopted in later implementations of the primitive.


Interval Index (<source lang=j inline>I.</source>) in [[J]] was implemented in release 6.01, available in 2006.<ref>Jsoftware. [https://www.jsoftware.com/docs/archive/release/binsearch.htm I. ''Interval Index'' Implemented]. 2005.</ref> The choice of <source lang=j inline>I.</source> for a symbol introduced the pairing of Interval Index with the monadic function [[Indices]]. In release 6.02 (2008), [[Index Of]] was extended to allow non-matching cell shapes, but Interval Index was not similarly extended.
Interval Index (<syntaxhighlight lang=j inline>I.</syntaxhighlight>) in [[J]] was implemented in release 6.01, available in 2006.<ref>Jsoftware. [https://www.jsoftware.com/docs/archive/release/binsearch.htm I. ''Interval Index'' Implemented]. 2005.</ref> The choice of <syntaxhighlight lang=j inline>I.</syntaxhighlight> for a symbol introduced the pairing of Interval Index with the monadic function [[Indices]]. In release 6.02 (2008), [[Index Of]] was extended to allow non-matching cell shapes, but Interval Index was not similarly extended.


Interval Index was also introduced in [[Dyalog APL 16.0]], initially with the requirement that the left argument be sorted ascending with no duplicates. Dyalog extended it by adding [[total array ordering]] in [[Dyalog APL 17.0]], and by removing the no-duplicate requirement in [[Dyalog APL 17.1]]. Dyalog's introduction was the first time the primitive was added to a language with [[index origin]]. The primitive was chosen to depend on <source lang=apl inline>⎕IO</source>, with its result decreased by 1 relative to A+ and J with index origin 0 in order to accommodate the index origin 1 case (thus, the index origin 1 case matches A+ and J in terms of the possible result indices).
Interval Index was also introduced in [[Dyalog APL 16.0]], initially with the requirement that the left argument be sorted ascending with no duplicates. Dyalog extended it by adding [[total array ordering]] in [[Dyalog APL 17.0]], and by removing the no-duplicate requirement in [[Dyalog APL 17.1]]. Dyalog's introduction was the first time the primitive was added to a language with [[index origin]]. The primitive was chosen to depend on <syntaxhighlight lang=apl inline>⎕IO</syntaxhighlight>, with its result decreased by 1 relative to A+ and J with index origin 0 in order to accommodate the index origin 1 case (thus, the index origin 1 case matches A+ and J in terms of the possible result indices).


{|class=wikitable
{|class=wikitable
! Language      !! Name          !! [[Glyph]]                          !! Minimum result                        !! Interval shape !! Left argument requirements
! Language      !! Name          !! [[Glyph]]                          !! Minimum result                        !! Interval shape !! Left argument requirements
|-
|-
| [[A+]]        || Bins          || <source lang=apl inline>⍋</source>  || <source lang=apl inline>0</source>    || (]            || Ascending; no duplicates (unchecked)
| [[A+]]        || Bins          || <syntaxhighlight lang=apl inline>⍋</syntaxhighlight>  || <syntaxhighlight lang=apl inline>0</syntaxhighlight>    || (]            || Ascending; no duplicates (unchecked)
|-
|-
| [[J]]          || Interval Index || <source lang=j inline>I.</source>  || <source lang=j inline>0</source>      || (]            || Ascending or descending (unchecked)
| [[J]]          || Interval Index || <syntaxhighlight lang=j inline>I.</syntaxhighlight>  || <syntaxhighlight lang=j inline>0</syntaxhighlight>      || (]            || Ascending or descending (unchecked)
|-
|-
| [[Dyalog APL]] || Interval Index || <source lang=apl inline>⍸</source>  || <source lang=apl inline>⎕IO-1</source> || [)            || Ascending; no duplicates prior to [[Dyalog APL 17.1|17.1]]
| [[Dyalog APL]] || Interval Index || <syntaxhighlight lang=apl inline>⍸</syntaxhighlight>  || <syntaxhighlight lang=apl inline>⎕IO-1</syntaxhighlight> || [)            || Ascending; no duplicates prior to [[Dyalog APL 17.1|17.1]]
|-
|-
| [[BQN]]        || Bins          || <source lang=apl inline>⍋⍒</source> || <source lang=apl inline>0</source>    || [)            || Ascending (<source lang=apl inline>⍋</source>) or descending (<source lang=apl inline>⍒</source>)
| [[BQN]]        || Bins          || <syntaxhighlight lang=apl inline>⍋⍒</syntaxhighlight> || <syntaxhighlight lang=apl inline>0</syntaxhighlight>    || [)            || Ascending (<syntaxhighlight lang=apl inline>⍋</syntaxhighlight>) or descending (<syntaxhighlight lang=apl inline>⍒</syntaxhighlight>)
|}
|}


Line 89: Line 89:
* [https://help.dyalog.com/latest/index.htm#Language/Primitive%20Functions/Interval%20Index.htm Dyalog]
* [https://help.dyalog.com/latest/index.htm#Language/Primitive%20Functions/Interval%20Index.htm Dyalog]
* J [https://www.jsoftware.com/help/dictionary/dicapdot.htm Dictionary], [https://code.jsoftware.com/wiki/Vocabulary/icapdot#dyadic NuVoc]
* J [https://www.jsoftware.com/help/dictionary/dicapdot.htm Dictionary], [https://code.jsoftware.com/wiki/Vocabulary/icapdot#dyadic NuVoc]
* [https://mlochbaum.github.io/BQN/doc/order.html#bins BQN]


== References ==
== References ==

Latest revision as of 23:27, 10 March 2024

This page is about classifying data. See Indexing, Indices, Index Generator, and Index of for other operations named after indices.

Interval Index (), or Bins () in A+ and BQN, is a primitive search function which locates cells of its right argument relative to cells of a sorted left argument. Like searching for the right page for a word in a dictionary, knowing the first word on each page, Interval Index finds for each cell (word to search for) an index (page number) in the left argument (dictionary) so that the target cell is between the major cell at that index (first word on the page) and the major cell at the following index (first word on the next page). In general computer science, this problem is known as a predecessor search (see predecessor problem), since it searches for the predecessor—entry immediately preceding—of an array, but may be referred to by the name of a common technique used to solve it, the binary search. Interval Index may also be considered a way to perform data binning.

Like Grade, Interval Index relies on array ordering, but is not subject to comparison tolerance. Its result depends on index origin when present.

Examples

There are many variations on Interval Index. Examples here are based primarily on Dyalog APL's model.

Suppose a directory groups people by first name: those starting with A–D form one group, and similarly E–I, J–Q, and R–Z are each grouped together. A few APL developers want to know which section they are in. Interval Index is a perfect fit for this task. The left argument should be the first letter of each group (the last letters are not needed, as they are derived from the first letter of the next group), and the right argument should be the list of names. The five results correspond to the five right argument elements.

      'AEJR' ⍸ 'Ken' 'Adin' 'Larry' 'Phil' 'Roger'
3 1 3 3 4
Works in: Dyalog APL

In J the last letter of each group should be used instead, and the left argument must be converted to a list of boxed strings: (<@,"0'DIQZ') I. 'Ken';'Adin';'Larry';'Phil';'Roger'. The results are one smaller, because J uses index origin 0.

Left argument elements with multiple characters can also be used. Here, Ken and Larry are placed in the same group because their first names fall between "Ke" and "Lo".

      ('A' 'Ke' 'Lo' 'Pa') ⍸ 'Ken' 'Adin' 'Larry' 'Phil' 'Roger'
2 1 2 4 4
Works in: Dyalog APL

Interval Index is useful for constructing Histograms. Below, the intervals 1–9, 10–99, 100–999 are used to group several numbers. Again, the right endpoints of intervals are not needed (and in this case, they are not wanted, as they introduce confusion about where numbers such as 99.5 belong). For numbers before the first interval boundary, index 0 is returned, and for numbers after the last, index 4 is returned.

      1 10 100 1000 ⍸ 44 2 1 0 2481
2 1 1 0 4
Works in: Dyalog APL

You can use Interval Index with matrices and other high-rank arrays as well, in order to compare cells of those array. For example, if you took several pictures at a programming conference, and have the conference schedule as well as image timestamps, then you can find which talk each picture is of. Use the start time for each talk as the left argument, and the timestamp for each picture in the right argument.

      ⎕←schedule ← 4 2 ⍴ 1 00  1 45  2 15  2 30
1  0
1 45
2 15
2 30
      ⎕←timestamps ← 3 2 ⍴ 1 16  2 02  1 50
1 16
2  2
1 50
      schedule ⍸ timestamps
1 2 2

Description

Like Index Of, Interval Index splits its right argument into cells with the same rank as the major cells of the left argument (in Dyalog APL, a scalar left argument is not allowed; in other languages it is treated as a list of a single scalar major cell). One scalar value is returned for each cell. In all current implementations, the cell shapes of left and right arguments must match; otherwise, a LENGTH ERROR is given. In A+ and J, arguments are also required to have the same type (character, numeric, or boxed).

Interval Index uses array ordering to order cells. Note that array ordering is always defined in terms of intolerant comparison, so that Interval Index does not depend on comparison tolerance. Any cell which comes before another cell in this ordering is said to precede it. If the cells do not intolerantly match then the first strictly precedes the second.

There are two major ways to define the value corresponding to a single cell:

  • The index of the last left argument major cell preceding that cell, or
  • The number of left argument major cells which precede it.

Either definition may be adjusted to use either strict or non-strict precedence, and the result may be offset by some amount. Note that the result in the first case is an index, and should be subject to index origin, while the result in the second is merely a count, and should not. Therefore the definitions are incompatible in any language with variable index origin, and at least one must be adjusted by ⎕IO.

History

In 1987 Roger Hui presented the function classify, with a definition very similar to that later introduced in Dyalog APL, but restricted to vectors.[1]

A+ was the first APL to feature Interval Index as a primitive. Called Bins (), it assumed the left argument to be sorted ascending with no duplicates and returned the number of cells strictly less than the target cell, giving right-closed indices. A+ used the same cell rank convention as Find (Index Of) for Interval Index, a choice that was also adopted in later implementations of the primitive.

Interval Index (I.) in J was implemented in release 6.01, available in 2006.[2] The choice of I. for a symbol introduced the pairing of Interval Index with the monadic function Indices. In release 6.02 (2008), Index Of was extended to allow non-matching cell shapes, but Interval Index was not similarly extended.

Interval Index was also introduced in Dyalog APL 16.0, initially with the requirement that the left argument be sorted ascending with no duplicates. Dyalog extended it by adding total array ordering in Dyalog APL 17.0, and by removing the no-duplicate requirement in Dyalog APL 17.1. Dyalog's introduction was the first time the primitive was added to a language with index origin. The primitive was chosen to depend on ⎕IO, with its result decreased by 1 relative to A+ and J with index origin 0 in order to accommodate the index origin 1 case (thus, the index origin 1 case matches A+ and J in terms of the possible result indices).

Language Name Glyph Minimum result Interval shape Left argument requirements
A+ Bins 0 (] Ascending; no duplicates (unchecked)
J Interval Index I. 0 (] Ascending or descending (unchecked)
Dyalog APL Interval Index ⎕IO-1 [) Ascending; no duplicates prior to 17.1
BQN Bins ⍋⍒ 0 [) Ascending () or descending ()

External links

Lessons

Documentation

References

  1. Hui, Roger, Some Uses of { and }, §1.2. APL87. APL Quote Quad, Volume 17, Number 4. 1987-05.
  2. Jsoftware. I. Interval Index Implemented. 2005.
APL built-ins [edit]
Primitives (Timeline) Functions
Scalar
Monadic ConjugateNegateSignumReciprocalMagnitudeExponentialNatural LogarithmFloorCeilingFactorialNotPi TimesRollTypeImaginarySquare Root
Dyadic AddSubtractTimesDivideResiduePowerLogarithmMinimumMaximumBinomialComparison functionsBoolean functions (And, Or, Nand, Nor) ∙ GCDLCMCircularComplexRoot
Non-Scalar
Structural ShapeReshapeTallyDepthRavelEnlistTableCatenateReverseRotateTransposeRazeMixSplitEncloseNestCut (K)PairLinkPartitioned EnclosePartition
Selection FirstPickTakeDropUniqueIdentityStopSelectReplicateExpandSet functions (IntersectionUnionWithout) ∙ Bracket indexingIndexCartesian ProductSort
Selector Index generatorGradeIndex OfInterval IndexIndicesDealPrefix and suffix vectors
Computational MatchNot MatchMembershipFindNub SieveEncodeDecodeMatrix InverseMatrix DivideFormatExecuteMaterialiseRange
Operators Monadic EachCommuteConstantReplicateExpandReduceWindowed ReduceScanOuter ProductKeyI-BeamSpawnFunction axis
Dyadic BindCompositions (Compose, Reverse Compose, Beside, Withe, Atop, Over) ∙ Inner ProductDeterminantPowerAtUnderRankDepthVariantStencilCutDirect definition (operator)
Quad names Index originComparison toleranceMigration levelAtomic vector