Partition representations: Difference between revisions

Jump to navigation Jump to search
→‎Target indices: Add BQN link
(Created page with "Partition, Partitioned Enclose, and the Cut operator all allow a programmer to split a vector into differently sized pieces. The left argument to these ope...")
 
(→‎Target indices: Add BQN link)
(11 intermediate revisions by 3 users not shown)
Line 1: Line 1:
[[Partition]], [[Partitioned Enclose]], and the [[Cut]] operator all allow a programmer to split a [[vector]] into differently sized pieces. The left [[argument]] to these operations defines the structure of the partition. This page discusses possible ways partitions can be represented.
[[Partition]], [[Partitioned Enclose]], the [[Cut]] operator, and [[Cut (K)]] all allow a programmer to split a [[vector]] into differently sized pieces. The left [[argument]] to these operations defines the structure of the partition. This article discusses possible ways partitions can be represented, using [[index origin]] 0 throughout.
 
This page uses [[index origin]] 0.


== Definitions ==
== Definitions ==


In this page, a '''partition''' of a [[vector]] is defined to be a [[nested]] or [[box]]ed vector containing vectors, such that the [[Raze]] (or equivalently <source lang=apl inline>{⊃⍪/⍵}</source> in [[Nested array model|nested]] APLs) of the partition intolerantly [[match]]es the original vector. A partition contains all the original [[element]]s of the vector, in the same order, but at one greater [[depth]].
In this page, a '''partition''' of a [[vector]] is defined to be a non-[[empty]] [[nested]] or [[box]]ed vector containing vectors, such that the [[Raze]] (or equivalently <source lang=apl inline>{⊃⍪/⍵}</source> in [[Nested array model|nested]] APLs) of the partition intolerantly [[match]]es the original vector. A partition contains all the original [[element]]s of the vector, in the same order, but at one greater [[depth]].


The vectors contained in a partition are called '''divisions''', and the boundaries between them are called '''dividers'''. Although the English word "partition" can be used for either of these, the ambiguity of using one word for three different objects could be confusing. Only boundaries between divisions are called dividers: we do not name the two outermost edges, which must exist in any partition.
The vectors contained in a partition are called '''divisions''', and the boundaries between them are called '''dividers'''. Although the English word "partition" can be used for either of these, the ambiguity of using one word for three different objects could be confusing. Only boundaries between divisions are called dividers: we do not name the two outermost edges, which must exist in any partition.
Line 26: Line 24:
{{Works in|[[Dyalog APL]]}}
{{Works in|[[Dyalog APL]]}}


The division endpoints are another way to represent partitions. The endpoints <source lang=apl inline>+\l</source> computed above form a non-decreasing vector of non-negative integers whose last element is equal to the length of the partitioned vector. It would also be possible to use the partition starting points <source lang=apl inline>+\¯1↓0,l</source>; here we use only endpoints for consistency.
The division endpoints are another way to represent partitions. The endpoints <source lang=apl inline>+\l</source> computed above form a non-decreasing vector of non-negative integers whose last element is equal to the length of the partitioned vector. It would also be possible to use the partition starting points <source lang=apl inline>+\¯1↓0,l</source>; here we use only endpoints for consistency. The latter representation matches [[K]]'s [[Cut (K)|Cut]] function, which produces a complete partition exactly when the left argument contains a zero (otherwise, some initial elements are omitted).


Vectors of division endpoints correspond exactly to vectors of division lengths, because the [[Windowed Reduce|windowed]] [[difference]] <source lang=apl inline>¯2-/0,e</source> is an exact [[inverse]] of Plus Scan which obtains division lengths <source lang=apl inline>l</source> from endpoints <source lang=apl inline>e</source>.
Vectors of division endpoints correspond exactly to vectors of division lengths, because the [[Windowed Reduce|windowed]] [[difference]] <source lang=apl inline>¯2-/0,e</source> is an exact [[inverse]] of Plus Scan which obtains division lengths <source lang=apl inline>l</source> from endpoints <source lang=apl inline>e</source>.
Line 32: Line 30:
== Representation using divider locations ==
== Representation using divider locations ==


Existing APLs rarely define partitioning functions using properties of the resulting divisions. Instead, most partition functions use a vector of the same length as the partitioned vector to control how it is partitioned. For example, [[Partitioned Enclose]] starts a new partition whenever a 1 is encountered in the [[Boolean]] left argument:
Existing APLs rarely define partitioning functions using properties of the resulting divisions. Instead, most partition functions use a vector of the same length as the partitioned vector to control how it is partitioned. For example, [[Partitioned Enclose]] starts a new division whenever a 1 is encountered in the [[Boolean]] left argument:
<source lang=apl>
<source lang=apl>
       1 0 1 1 0 0 1 ⊂ 'abcdefg'
       1 0 1 1 0 0 1 ⊂ 'abcdefg'
Line 41: Line 39:
{{Works in|[[Dyalog APL]]}}
{{Works in|[[Dyalog APL]]}}
This style of partition function is slightly harder to understand and implement, but does have advantages:
This style of partition function is slightly harder to understand and implement, but does have advantages:
* The left argument can sometimes be obtained directly from the right argument. For example, a partition might be started whenever a space is encountered.
* The left argument can sometimes be obtained directly from the right argument. For example, a division might be started whenever a space is encountered.
* Left arguments can be merged using arithmetic. For two [[Partitioned Enclose]] arguments <source lang=apl inline>a</source> and <source lang=apl inline>b</source>, the [[Maximum]] <source lang=apl inline>a⌈b</source> gives a common [[wikipedia:Partition of a set#Refinement of partitions|refinement]] of the two corresponding partitions.
* Left arguments can be merged using arithmetic. For two [[Partitioned Enclose]] arguments <source lang=apl inline>a</source> and <source lang=apl inline>b</source>, the [[Maximum]] <source lang=apl inline>a⌈b</source> gives a common [[wikipedia:Partition of a set#Refinement of partitions|refinement]] of the two corresponding partitions.


Line 55: Line 53:
└──┴────┴─┘
└──┴────┴─┘
</source>
</source>
A more rigorous approach is to allow the left argument to specify the [[index]] of the division to which the corresponding element belongs. In order for this to produce a partition, the left argument must be a vector of non-decreasing non-negative integers (as with the division endpoint representation). Then for a left argument <source lang=apl inline>l</source> the number of partitions is <source lang=apl inline>⊃⌽l</source>.
A more rigorous approach is to allow the left argument to specify the [[index]] of the division to which the corresponding element belongs. In order for this to produce a partition, the left argument must be a vector of non-decreasing non-negative integers (as with the division endpoint representation). Then for a left argument <source lang=apl inline>l</source> the number of divisions is <source lang=apl inline>⊃⌽l</source>. BQN's [[Group (BQN)|Group]] uses an extension of this representation.
<source lang=apl>
<source lang=apl>
       1 1 3 3 3 3 6 part 'abcdefg'
       1 1 3 3 3 3 6 part 'abcdefg'
Line 84: Line 82:
└┴──┴┴────┴┴┴─┘
└┴──┴┴────┴┴┴─┘
</source>
</source>
As with the Partition-based representation, we must allow an additional element in the left argument after the end of the right argument in order to encode empty partitions after the end of the data.
As with the Partition-based representation, we must allow an additional element in the left argument after the end of the right argument in order to encode empty divisions after the end of the data.


This definition is nearly identical to the one used in [[Partitioned Enclose]] when the divider counts are [[Boolean]]. The only difference is in the first element: while in Partitioned Enclose a first element of 0 indicates that no division is started, and elements before the first 1 are excluded from the final partition, in the version here, a first element of 0 is like a 1 in Partitioned Enclose indicating that a division is started, while an initial 1 indicates that an empty division precedes the first non-empty division. The number used in the representation here is one higher than the one used by Partitioned Enclose. This encoding better represents complete partitions of the right argument because every vector on non-negative integers corresponds to a partition. In Partitioned Enclose vectors beginning with 0 correspond to incomplete partitions where some initial elements are not included. However, the partitioning used in Partitioned Enclose can still be produced by [[drop]]ping the first division.
This definition is nearly identical to the one used in [[Partitioned Enclose]] when the divider counts are [[Boolean]]. The only difference is in the first element: while in Partitioned Enclose a first element of 0 indicates that no division is started, and elements before the first 1 are excluded from the final division, in the version here, a first element of 0 is like a 1 in Partitioned Enclose indicating that a division is started, while an initial 1 indicates that an empty division precedes the first non-empty division. The number used in the representation here is one higher than the one used by Partitioned Enclose. This encoding better represents complete partitions of the right argument because every vector on non-negative integers corresponds to a partition. In Partitioned Enclose vectors beginning with 0 correspond to incomplete partitions where some initial elements are not included. However, the partitioning used in Partitioned Enclose can still be produced by [[drop]]ping the first division.


== Unification ==
== Unification ==


The division-based and divider-based representations are dual to one another: counting the number of elements with no divisions in between and the number of dividers with no elements in between are symmetric concepts. We can convert between then using the [[Indices]] primitive and its inverse. The following diagram shows the relationships between the four representations. Note the symmetry in the graphs for target indices and division endpoints: the graphs are essentially [[transpose]]s of one another. The [[Interval Index]] function can be used to convert between these two representations directly.
The division-based and divider-based representations are dual to one another: counting the number of elements with no divisions in between and the number of dividers with no elements in between are symmetric concepts. We can convert between them using the [[Indices]] primitive and its inverse. The following diagram shows the relationships between the four representations. Note the symmetry in the graphs for target indices and division endpoints: the graphs are essentially [[transpose]]s of one another. The [[Interval Index]] function can be used to convert between these two representations directly.


[[File:Partition representations.png|center|600px]]
[[File:Partition representations.png|center|600px]]
Line 98: Line 96:
The partitioning mechanisms shown above can be grouped in categories in several ways. The left and right halves of the diagram each contain vectors of the same length. This is because the vectors have the same domain: on the left, a value is given for each element to be partitioned, while on the right a value is given for each division or divider. The top and bottom halves also share common properties: representations on the top specify where elements should be placed (using either dense indices giving a position for each element or sparse indices giving how many elements go in each slot) while those on the bottom specify where dividers should be placed in the same sense.
The partitioning mechanisms shown above can be grouped in categories in several ways. The left and right halves of the diagram each contain vectors of the same length. This is because the vectors have the same domain: on the left, a value is given for each element to be partitioned, while on the right a value is given for each division or divider. The top and bottom halves also share common properties: representations on the top specify where elements should be placed (using either dense indices giving a position for each element or sparse indices giving how many elements go in each slot) while those on the bottom specify where dividers should be placed in the same sense.


{|class=wikitable style="margin: 1em auto 1em auto"
{|class=wikitable style="margin: 1em auto 1em auto; text-align:center"
|colspan=2 rowspan=2|
|colspan=2 rowspan=2|
!colspan=2| Domain
!colspan=2| Domain
Line 106: Line 104:
!rowspan=2|Positions
!rowspan=2|Positions
! Elements
! Elements
| Target indices || Division lengths
| Target indices<br/>([[Partition]])          || Division lengths
|-
|-
! Dividers
! Dividers
| Divider counts || Division endpoints
| Divider counts<br/>([[Partitioned Enclose]]) || Division endpoints<br/>([[Cut (K)]])
|}
|}
Another grouping appears when considering how values are indicated. The divider counts and division lengths are counts, while the target indices and division endpoints are [[Index|indices]]. This difference is reflected in the domains of these representations: while counts are allowed to be any vector of non-negative integers, indices must additionally be non-decreasing.
== Mesh representation ==
Because it interleaves elements from a vector with dividers, partitioning is closely related to the [[Mesh]] operation. In fact, a partition can be used to produce a mesh by prepending an element to each division and [[Raze]]ing:
<source lang=apl>
      ⎕←P
┌┬──┬┬────┬┬┬─┐
││ab││cdef│││g│
└┴──┴┴────┴┴┴─┘
      ⊃⍪/ 'ABCDEFG' ,¨ P
ABabCDcdefEFGg
</source>
The above vector is identical to the mesh of <source lang=apl inline>'ABCDEFG'</source> with <source lang=apl inline>'abcdefg'</source> using mesh vector <source lang=apl inline>0 0 1 1 0 0 1 1 1 1 0 0 0 1</source>.
A partition can also be extracted from the mesh using the [[Cut]] operator and the mesh vector. In [[J]]:
<source lang=j>
  (-.0 0 1 1 0 0 1 1 1 1 0 0 0 1) <;._1 'ABabCDcdefEFGg'
┌┬──┬┬────┬┬┬─┐
││ab││cdef│││g│
└┴──┴┴────┴┴┴─┘
</source>
{{Works in|[[J]]}}
Here <source lang=j inline>-.</source> is [[Not]], <source lang=j inline><</source> is [[Box]], and <source lang=j inline>;._1</source> uses the Cut operator <source lang=j inline>;.</source> with dropped initial elements.
The [[Boolean]] control vector used by Mesh serves as another way to represent partitions. Unlike the case with [[Partitioned Enclose]], a Boolean vector here is sufficient to represent any partition including those with empty divisions. This is possible because of the vector's increased length: while Partitioned Enclose uses only one Boolean for each element in the partitioned vector, the mesh vector has a 1 for each element in the partitioned vector as well as a 0 for each divider.
The mesh vector can be obtained from the other representations. In the diagram above, it contains a 1 for each horizontal line segment in the target indices graph, and a 0 for each vertical one. The directions are reversed for division endpoints.
<source lang=apl>
      {⍵/0 1⍴⍨≢⍵},1,⍤0⍨ 0 0 0 2 1 0  ⍝ From divider counts
1 1 1 0 0 1 0 1 1
      {⍵/1 0⍴⍨≢⍵},1,⍤0⍨ 3 0 1 2      ⍝ From division lengths
1 1 1 0 0 1 0 1 1 0
</source>
In the above functions, the only difference in code is exchanging 0 and 1, and the only difference in output is an extra trailing 0 when computed from the division lengths. The mesh representation makes the partition representation duality especially clear, as the dual representation can be obtained simply by [[Not|negating]] the mesh vector.
[[Category:Essays]]

Navigation menu