Partition representations: Difference between revisions

Jump to navigation Jump to search
m
Text replacement - "<source" to "<syntaxhighlight"
m (Text replacement - "</source>" to "</syntaxhighlight>")
m (Text replacement - "<source" to "<syntaxhighlight")
Line 3: Line 3:
== Definitions ==
== Definitions ==


In this page, a '''partition''' of a [[vector]] is defined to be a non-[[empty]] [[nested]] or [[box]]ed vector containing vectors, such that the [[Raze]] (or equivalently <source lang=apl inline>{⊃⍪/⍵}</syntaxhighlight> in [[Nested array model|nested]] APLs) of the partition intolerantly [[match]]es the original vector. A partition contains all the original [[element]]s of the vector, in the same order, but at one greater [[depth]].
In this page, a '''partition''' of a [[vector]] is defined to be a non-[[empty]] [[nested]] or [[box]]ed vector containing vectors, such that the [[Raze]] (or equivalently <syntaxhighlight lang=apl inline>{⊃⍪/⍵}</syntaxhighlight> in [[Nested array model|nested]] APLs) of the partition intolerantly [[match]]es the original vector. A partition contains all the original [[element]]s of the vector, in the same order, but at one greater [[depth]].


The vectors contained in a partition are called '''divisions''', and the boundaries between them are called '''dividers'''. Although the English word "partition" can be used for either of these, the ambiguity of using one word for three different objects could be confusing. Only boundaries between divisions are called dividers: we do not name the two outermost edges, which must exist in any partition.
The vectors contained in a partition are called '''divisions''', and the boundaries between them are called '''dividers'''. Although the English word "partition" can be used for either of these, the ambiguity of using one word for three different objects could be confusing. Only boundaries between divisions are called dividers: we do not name the two outermost edges, which must exist in any partition.
Line 11: Line 11:
== Representation using division lengths ==
== Representation using division lengths ==


Perhaps the most obvious way to represent a partition of a vector <source lang=apl inline>X</syntaxhighlight> is using a vector of lengths <source lang=apl inline>l</syntaxhighlight> such that <source lang=apl inline>(+/l) = ≢X</syntaxhighlight>. <source lang=apl inline>l</syntaxhighlight> gives the length of each division in order, that is, for a partition <source lang=apl inline>P</syntaxhighlight> of <source lang=apl inline>X</syntaxhighlight>, <source lang=apl inline>l ≡ ≢¨P</syntaxhighlight>. Because the length vector is determined by the partition in this way, two different length vectors cannot lead to the same partition.
Perhaps the most obvious way to represent a partition of a vector <syntaxhighlight lang=apl inline>X</syntaxhighlight> is using a vector of lengths <syntaxhighlight lang=apl inline>l</syntaxhighlight> such that <syntaxhighlight lang=apl inline>(+/l) = ≢X</syntaxhighlight>. <syntaxhighlight lang=apl inline>l</syntaxhighlight> gives the length of each division in order, that is, for a partition <syntaxhighlight lang=apl inline>P</syntaxhighlight> of <syntaxhighlight lang=apl inline>X</syntaxhighlight>, <syntaxhighlight lang=apl inline>l ≡ ≢¨P</syntaxhighlight>. Because the length vector is determined by the partition in this way, two different length vectors cannot lead to the same partition.


We can obtain a partition from the division lengths by using [[Take]] twice to obtain each division. In the process, we must compute the division endpoints <source lang=apl inline>+\l</syntaxhighlight> using a [[Plus]] [[Scan]].
We can obtain a partition from the division lengths by using [[Take]] twice to obtain each division. In the process, we must compute the division endpoints <syntaxhighlight lang=apl inline>+\l</syntaxhighlight> using a [[Plus]] [[Scan]].
<source lang=apl>
<syntaxhighlight lang=apl>
       l ← 2 0 3 3
       l ← 2 0 3 3
       X ← 'abcdefgh'
       X ← 'abcdefgh'
Line 24: Line 24:
{{Works in|[[Dyalog APL]]}}
{{Works in|[[Dyalog APL]]}}


The division endpoints are another way to represent partitions. The endpoints <source lang=apl inline>+\l</syntaxhighlight> computed above form a non-decreasing vector of non-negative integers whose last element is equal to the length of the partitioned vector. It would also be possible to use the partition starting points <source lang=apl inline>+\¯1↓0,l</syntaxhighlight>; here we use only endpoints for consistency. The latter representation matches [[K]]'s [[Cut (K)|Cut]] function, which produces a complete partition exactly when the left argument contains a zero (otherwise, some initial elements are omitted).
The division endpoints are another way to represent partitions. The endpoints <syntaxhighlight lang=apl inline>+\l</syntaxhighlight> computed above form a non-decreasing vector of non-negative integers whose last element is equal to the length of the partitioned vector. It would also be possible to use the partition starting points <syntaxhighlight lang=apl inline>+\¯1↓0,l</syntaxhighlight>; here we use only endpoints for consistency. The latter representation matches [[K]]'s [[Cut (K)|Cut]] function, which produces a complete partition exactly when the left argument contains a zero (otherwise, some initial elements are omitted).


Vectors of division endpoints correspond exactly to vectors of division lengths, because the [[Windowed Reduce|windowed]] [[difference]] <source lang=apl inline>¯2-/0,e</syntaxhighlight> is an exact [[inverse]] of Plus Scan which obtains division lengths <source lang=apl inline>l</syntaxhighlight> from endpoints <source lang=apl inline>e</syntaxhighlight>.
Vectors of division endpoints correspond exactly to vectors of division lengths, because the [[Windowed Reduce|windowed]] [[difference]] <syntaxhighlight lang=apl inline>¯2-/0,e</syntaxhighlight> is an exact [[inverse]] of Plus Scan which obtains division lengths <syntaxhighlight lang=apl inline>l</syntaxhighlight> from endpoints <syntaxhighlight lang=apl inline>e</syntaxhighlight>.


== Representation using divider locations ==
== Representation using divider locations ==


Existing APLs rarely define partitioning functions using properties of the resulting divisions. Instead, most partition functions use a vector of the same length as the partitioned vector to control how it is partitioned. For example, [[Partitioned Enclose]] starts a new division whenever a 1 is encountered in the [[Boolean]] left argument:
Existing APLs rarely define partitioning functions using properties of the resulting divisions. Instead, most partition functions use a vector of the same length as the partitioned vector to control how it is partitioned. For example, [[Partitioned Enclose]] starts a new division whenever a 1 is encountered in the [[Boolean]] left argument:
<source lang=apl>
<syntaxhighlight lang=apl>
       1 0 1 1 0 0 1 ⊂ 'abcdefg'
       1 0 1 1 0 0 1 ⊂ 'abcdefg'
┌──┬─┬───┬─┐
┌──┬─┬───┬─┐
Line 40: Line 40:
This style of partition function is slightly harder to understand and implement, but does have advantages:
This style of partition function is slightly harder to understand and implement, but does have advantages:
* The left argument can sometimes be obtained directly from the right argument. For example, a division might be started whenever a space is encountered.
* The left argument can sometimes be obtained directly from the right argument. For example, a division might be started whenever a space is encountered.
* Left arguments can be merged using arithmetic. For two [[Partitioned Enclose]] arguments <source lang=apl inline>a</syntaxhighlight> and <source lang=apl inline>b</syntaxhighlight>, the [[Maximum]] <source lang=apl inline>a⌈b</syntaxhighlight> gives a common [[wikipedia:Partition of a set#Refinement of partitions|refinement]] of the two corresponding partitions.
* Left arguments can be merged using arithmetic. For two [[Partitioned Enclose]] arguments <syntaxhighlight lang=apl inline>a</syntaxhighlight> and <syntaxhighlight lang=apl inline>b</syntaxhighlight>, the [[Maximum]] <syntaxhighlight lang=apl inline>a⌈b</syntaxhighlight> gives a common [[wikipedia:Partition of a set#Refinement of partitions|refinement]] of the two corresponding partitions.


A shortcoming of existing [[Partition]] and [[Partitioned Enclose]] is that they do not allow empty divisions to be produced. While the length representation makes empty divisions easy to represent, and the obvious implementation produces them with no trouble, it is less obvious how empty divisions should be represented in APL-style partitions. However, each can be modified to obtain a representation which is complete and one-to-one. In the case of Partitioned Enclose the modification is mostly [[backwards compatible]]: it extends the domain from [[Boolean]] vectors to vectors of non-negative integers, and the only incompatibility between our definition and the existing APL definition will be an offset of one for the first element.
A shortcoming of existing [[Partition]] and [[Partitioned Enclose]] is that they do not allow empty divisions to be produced. While the length representation makes empty divisions easy to represent, and the obvious implementation produces them with no trouble, it is less obvious how empty divisions should be represented in APL-style partitions. However, each can be modified to obtain a representation which is complete and one-to-one. In the case of Partitioned Enclose the modification is mostly [[backwards compatible]]: it extends the domain from [[Boolean]] vectors to vectors of non-negative integers, and the only incompatibility between our definition and the existing APL definition will be an offset of one for the first element.
Line 47: Line 47:


First, we give a representation related to the one used in Partition. Recall that Partition starts a new division whenever values in its left argument increase (similar to [[Key]], except that it cares about whether values are adjacent):
First, we give a representation related to the one used in Partition. Recall that Partition starts a new division whenever values in its left argument increase (similar to [[Key]], except that it cares about whether values are adjacent):
<source lang=apl>
<syntaxhighlight lang=apl>
       1 1 3 3 3 3 6 ⊆ 'abcdefg'
       1 1 3 3 3 3 6 ⊆ 'abcdefg'
┌──┬────┬─┐
┌──┬────┬─┐
Line 53: Line 53:
└──┴────┴─┘
└──┴────┴─┘
</syntaxhighlight>
</syntaxhighlight>
A more rigorous approach is to allow the left argument to specify the [[index]] of the division to which the corresponding element belongs. In order for this to produce a partition, the left argument must be a vector of non-decreasing non-negative integers (as with the division endpoint representation). Then for a left argument <source lang=apl inline>l</syntaxhighlight> the number of divisions is <source lang=apl inline>⊃⌽l</syntaxhighlight>. BQN's [[Group (BQN)|Group]] uses an extension of this representation.
A more rigorous approach is to allow the left argument to specify the [[index]] of the division to which the corresponding element belongs. In order for this to produce a partition, the left argument must be a vector of non-decreasing non-negative integers (as with the division endpoint representation). Then for a left argument <syntaxhighlight lang=apl inline>l</syntaxhighlight> the number of divisions is <syntaxhighlight lang=apl inline>⊃⌽l</syntaxhighlight>. BQN's [[Group (BQN)|Group]] uses an extension of this representation.
<source lang=apl>
<syntaxhighlight lang=apl>
       1 1 3 3 3 3 6 part 'abcdefg'
       1 1 3 3 3 3 6 part 'abcdefg'
┌┬──┬┬────┬┬┬─┐
┌┬──┬┬────┬┬┬─┐
Line 60: Line 60:
└┴──┴┴────┴┴┴─┘
└┴──┴┴────┴┴┴─┘
</syntaxhighlight>
</syntaxhighlight>
Note that the non-empty divisions above are identical to those produced by Partition. The difference is that empty divisions are inserted when the index increases (including from an implicit initial index of <source lang=apl inline>¯1</syntaxhighlight>) based on how many indices were skipped (since the divisions with those indices can't contain any elements).
Note that the non-empty divisions above are identical to those produced by Partition. The difference is that empty divisions are inserted when the index increases (including from an implicit initial index of <syntaxhighlight lang=apl inline>¯1</syntaxhighlight>) based on how many indices were skipped (since the divisions with those indices can't contain any elements).


Target indices can be converted to division endpoints using [[Interval Index]], and then to division lengths with a pair-wise difference:
Target indices can be converted to division endpoints using [[Interval Index]], and then to division lengths with a pair-wise difference:
<source lang=apl>
<syntaxhighlight lang=apl>
       {1+⍵⍸⍳1+⊃⌽⍵}1 1 3 3 3 3 6
       {1+⍵⍸⍳1+⊃⌽⍵}1 1 3 3 3 3 6
0 2 2 6 6 6 7
0 2 2 6 6 6 7
Line 74: Line 74:


The target index of an element is the same as the number of dividers which fall before it. Motivated by this fact, we might use the pair-wise differences of the target indices as another partition representation. Taking differences converts the total number of dividers before each element to the number of dividers immediately preceding it. For example, the vector below encodes the same partition as that shown in the previous section.
The target index of an element is the same as the number of dividers which fall before it. Motivated by this fact, we might use the pair-wise differences of the target indices as another partition representation. Taking differences converts the total number of dividers before each element to the number of dividers immediately preceding it. For example, the vector below encodes the same partition as that shown in the previous section.
<source lang=apl>
<syntaxhighlight lang=apl>
       ¯2-/0,1 1 3 3 3 3 6
       ¯2-/0,1 1 3 3 3 3 6
1 0 2 0 0 0 3
1 0 2 0 0 0 3
Line 115: Line 115:


Because it interleaves elements from a vector with dividers, partitioning is closely related to the [[Mesh]] operation. In fact, a partition can be used to produce a mesh by prepending an element to each division and [[Raze]]ing:
Because it interleaves elements from a vector with dividers, partitioning is closely related to the [[Mesh]] operation. In fact, a partition can be used to produce a mesh by prepending an element to each division and [[Raze]]ing:
<source lang=apl>
<syntaxhighlight lang=apl>
       ⎕←P
       ⎕←P
┌┬──┬┬────┬┬┬─┐
┌┬──┬┬────┬┬┬─┐
Line 123: Line 123:
ABabCDcdefEFGg
ABabCDcdefEFGg
</syntaxhighlight>
</syntaxhighlight>
The above vector is identical to the mesh of <source lang=apl inline>'ABCDEFG'</syntaxhighlight> with <source lang=apl inline>'abcdefg'</syntaxhighlight> using mesh vector <source lang=apl inline>0 0 1 1 0 0 1 1 1 1 0 0 0 1</syntaxhighlight>.
The above vector is identical to the mesh of <syntaxhighlight lang=apl inline>'ABCDEFG'</syntaxhighlight> with <syntaxhighlight lang=apl inline>'abcdefg'</syntaxhighlight> using mesh vector <syntaxhighlight lang=apl inline>0 0 1 1 0 0 1 1 1 1 0 0 0 1</syntaxhighlight>.


A partition can also be extracted from the mesh using the [[Cut]] operator and the mesh vector. In [[J]]:
A partition can also be extracted from the mesh using the [[Cut]] operator and the mesh vector. In [[J]]:
<source lang=j>
<syntaxhighlight lang=j>
   (-.0 0 1 1 0 0 1 1 1 1 0 0 0 1) <;._1 'ABabCDcdefEFGg'
   (-.0 0 1 1 0 0 1 1 1 1 0 0 0 1) <;._1 'ABabCDcdefEFGg'
┌┬──┬┬────┬┬┬─┐
┌┬──┬┬────┬┬┬─┐
Line 133: Line 133:
</syntaxhighlight>
</syntaxhighlight>
{{Works in|[[J]]}}
{{Works in|[[J]]}}
Here <source lang=j inline>-.</syntaxhighlight> is [[Not]], <source lang=j inline><</syntaxhighlight> is [[Box]], and <source lang=j inline>;._1</syntaxhighlight> uses the Cut operator <source lang=j inline>;.</syntaxhighlight> with dropped initial elements.
Here <syntaxhighlight lang=j inline>-.</syntaxhighlight> is [[Not]], <syntaxhighlight lang=j inline><</syntaxhighlight> is [[Box]], and <syntaxhighlight lang=j inline>;._1</syntaxhighlight> uses the Cut operator <syntaxhighlight lang=j inline>;.</syntaxhighlight> with dropped initial elements.


The [[Boolean]] control vector used by Mesh serves as another way to represent partitions. Unlike the case with [[Partitioned Enclose]], a Boolean vector here is sufficient to represent any partition including those with empty divisions. This is possible because of the vector's increased length: while Partitioned Enclose uses only one Boolean for each element in the partitioned vector, the mesh vector has a 1 for each element in the partitioned vector as well as a 0 for each divider.
The [[Boolean]] control vector used by Mesh serves as another way to represent partitions. Unlike the case with [[Partitioned Enclose]], a Boolean vector here is sufficient to represent any partition including those with empty divisions. This is possible because of the vector's increased length: while Partitioned Enclose uses only one Boolean for each element in the partitioned vector, the mesh vector has a 1 for each element in the partitioned vector as well as a 0 for each divider.


The mesh vector can be obtained from the other representations. In the diagram above, it contains a 1 for each horizontal line segment in the target indices graph, and a 0 for each vertical one. The directions are reversed for division endpoints.
The mesh vector can be obtained from the other representations. In the diagram above, it contains a 1 for each horizontal line segment in the target indices graph, and a 0 for each vertical one. The directions are reversed for division endpoints.
<source lang=apl>
<syntaxhighlight lang=apl>
       {⍵/0 1⍴⍨≢⍵},1,⍤0⍨ 0 0 0 2 1 0  ⍝ From divider counts
       {⍵/0 1⍴⍨≢⍵},1,⍤0⍨ 0 0 0 2 1 0  ⍝ From divider counts
1 1 1 0 0 1 0 1 1
1 1 1 0 0 1 0 1 1
Line 147: Line 147:


Conversion from index-based representations is less obvious but more efficient, and also easily invertible. These representations index into either elements or dividers, but can be converted to index into the mesh vector, which combines both, by adding one for each previous index. These adjusted indices give the locations of all ones, or all zeros, in the mesh vector, depending on the initial representation.
Conversion from index-based representations is less obvious but more efficient, and also easily invertible. These representations index into either elements or dividers, but can be converted to index into the mesh vector, which combines both, by adding one for each previous index. These adjusted indices give the locations of all ones, or all zeros, in the mesh vector, depending on the initial representation.
<source lang=apl>
<syntaxhighlight lang=apl>
       ⍸⍣¯1{⍵+⍳≢⍵} 0 0 0 2 3 3  ⍝ From target indices
       ⍸⍣¯1{⍵+⍳≢⍵} 0 0 0 2 3 3  ⍝ From target indices
1 1 1 0 0 1 0 1 1
1 1 1 0 0 1 0 1 1

Navigation menu