Interval Index

From APL Wiki
(Redirected from Bins)
Jump to navigation Jump to search
This page is about classifying data. See Indexing, Indices, Index Generator, and Index of for other operations named after indices.

Interval Index (), or Bins () in A+ and BQN, is a primitive search function which locates cells of its right argument relative to cells of a sorted left argument. Like searching for the right page for a word in a dictionary, knowing the first word on each page, Interval Index finds for each cell (word to search for) an index (page number) in the left argument (dictionary) so that the target cell is between the major cell at that index (first word on the page) and the major cell at the following index (first word on the next page). In general computer science, this problem is known as a predecessor search (see predecessor problem), since it searches for the predecessor—entry immediately preceding—of an array, but may be referred to by the name of a common technique used to solve it, the binary search. Interval Index may also be considered a way to perform data binning.

Like Grade, Interval Index relies on array ordering, but is not subject to comparison tolerance. Its result depends on index origin when present.

Examples

There are many variations on Interval Index. Examples here are based primarily on Dyalog APL's model.

Suppose a directory groups people by first name: those starting with A–D form one group, and similarly E–I, J–Q, and R–Z are each grouped together. A few APL developers want to know which section they are in. Interval Index is a perfect fit for this task. The left argument should be the first letter of each group (the last letters are not needed, as they are derived from the first letter of the next group), and the right argument should be the list of names. The five results correspond to the five right argument elements.

      'AEJR' ⍸ 'Ken' 'Adin' 'Larry' 'Phil' 'Roger'
3 1 3 3 4
Works in: Dyalog APL

In J the last letter of each group should be used instead, and the left argument must be converted to a list of boxed strings: (<@,"0'DIQZ') I. 'Ken';'Adin';'Larry';'Phil';'Roger'. The results are one smaller, because J uses index origin 0.

Left argument elements with multiple characters can also be used. Here, Ken and Larry are placed in the same group because their first names fall between "Ke" and "Lo".

      ('A' 'Ke' 'Lo' 'Pa') ⍸ 'Ken' 'Adin' 'Larry' 'Phil' 'Roger'
2 1 2 4 4
Works in: Dyalog APL

Interval Index is useful for constructing Histograms. Below, the intervals 1–9, 10–99, 100–999 are used to group several numbers. Again, the right endpoints of intervals are not needed (and in this case, they are not wanted, as they introduce confusion about where numbers such as 99.5 belong). For numbers before the first interval boundary, index 0 is returned, and for numbers after the last, index 4 is returned.

      1 10 100 1000 ⍸ 44 2 1 0 2481
2 1 1 0 4
Works in: Dyalog APL

You can use Interval Index with matrices and other high-rank arrays as well, in order to compare cells of those array. For example, if you took several pictures at a programming conference, and have the conference schedule as well as image timestamps, then you can find which talk each picture is of. Use the start time for each talk as the left argument, and the timestamp for each picture in the right argument.

      ⎕←schedule ← 4 2 ⍴ 1 00  1 45  2 15  2 30
1  0
1 45
2 15
2 30
      ⎕←timestamps ← 3 2 ⍴ 1 16  2 02  1 50
1 16
2  2
1 50
      schedule ⍸ timestamps
1 2 2

Description

Like Index Of, Interval Index splits its right argument into cells with the same rank as the major cells of the left argument (in Dyalog APL, a scalar left argument is not allowed; in other languages it is treated as a list of a single scalar major cell). One scalar value is returned for each cell. In all current implementations, the cell shapes of left and right arguments must match; otherwise, a LENGTH ERROR is given. In A+ and J, arguments are also required to have the same type (character, numeric, or boxed).

Interval Index uses array ordering to order cells. Note that array ordering is always defined in terms of intolerant comparison, so that Interval Index does not depend on comparison tolerance. Any cell which comes before another cell in this ordering is said to precede it. If the cells do not intolerantly match then the first strictly precedes the second.

There are two major ways to define the value corresponding to a single cell:

  • The index of the last left argument major cell preceding that cell, or
  • The number of left argument major cells which precede it.

Either definition may be adjusted to use either strict or non-strict precedence, and the result may be offset by some amount. Note that the result in the first case is an index, and should be subject to index origin, while the result in the second is merely a count, and should not. Therefore the definitions are incompatible in any language with variable index origin, and at least one must be adjusted by ⎕IO.

History

In 1987 Roger Hui presented the function classify, with a definition very similar to that later introduced in Dyalog APL, but restricted to vectors.[1]

A+ was the first APL to feature Interval Index as a primitive. Called Bins (), it assumed the left argument to be sorted ascending with no duplicates and returned the number of cells strictly less than the target cell, giving right-closed indices. A+ used the same cell rank convention as Find (Index Of) for Interval Index, a choice that was also adopted in later implementations of the primitive.

Interval Index (I.) in J was implemented in release 6.01, available in 2006.[2] The choice of I. for a symbol introduced the pairing of Interval Index with the monadic function Indices. In release 6.02 (2008), Index Of was extended to allow non-matching cell shapes, but Interval Index was not similarly extended.

Interval Index was also introduced in Dyalog APL 16.0, initially with the requirement that the left argument be sorted ascending with no duplicates. Dyalog extended it by adding total array ordering in Dyalog APL 17.0, and by removing the no-duplicate requirement in Dyalog APL 17.1. Dyalog's introduction was the first time the primitive was added to a language with index origin. The primitive was chosen to depend on ⎕IO, with its result decreased by 1 relative to A+ and J with index origin 0 in order to accommodate the index origin 1 case (thus, the index origin 1 case matches A+ and J in terms of the possible result indices).

Language Name Glyph Minimum result Interval shape Left argument requirements
A+ Bins 0 (] Ascending; no duplicates (unchecked)
J Interval Index I. 0 (] Ascending or descending (unchecked)
Dyalog APL Interval Index ⎕IO-1 [) Ascending; no duplicates prior to 17.1
BQN Bins ⍋⍒ 0 [) Ascending () or descending ()

External links

Lessons

Documentation

References

  1. Hui, Roger, Some Uses of { and }, §1.2. APL87. APL Quote Quad, Volume 17, Number 4. 1987-05.
  2. Jsoftware. I. Interval Index Implemented. 2005.
APL built-ins [edit]
Primitives (Timeline) Functions
Scalar
Monadic ConjugateNegateSignumReciprocalMagnitudeExponentialNatural LogarithmFloorCeilingFactorialNotPi TimesRollTypeImaginarySquare RootRound
Dyadic AddSubtractTimesDivideResiduePowerLogarithmMinimumMaximumBinomialComparison functionsBoolean functions (And, Or, Nand, Nor) ∙ GCDLCMCircularComplexRoot
Non-Scalar
Structural ShapeReshapeTallyDepthRavelEnlistTableCatenateReverseRotateTransposeRazeMixSplitEncloseNestCut (K)PairLinkPartitioned EnclosePartition
Selection FirstPickTakeDropUniqueIdentityStopSelectReplicateExpandSet functions (IntersectionUnionWithout) ∙ Bracket indexingIndexCartesian ProductSort
Selector Index generatorGradeIndex OfInterval IndexIndicesDealPrefix and suffix vectors
Computational MatchNot MatchMembershipFindNub SieveEncodeDecodeMatrix InverseMatrix DivideFormatExecuteMaterialiseRange
Operators Monadic EachCommuteConstantReplicateExpandReduceWindowed ReduceScanOuter ProductKeyI-BeamSpawnFunction axisIdentity (Null, Ident)
Dyadic BindCompositions (Compose, Reverse Compose, Beside, Withe, Atop, Over) ∙ Inner ProductDeterminantPowerAtUnderRankDepthVariantStencilCutDirect definition (operator)Identity (Lev, Dex)
Quad names Index originComparison toleranceMigration levelAtomic vector