Unicode: Difference between revisions

From APL Wiki
(Created page with "The advent of '''wikipedia:Unicode''' solved many problems with dealing with APL glyphs, however there was still some wiggle room as to which Unicode wikipedia:code...")
 
No edit summary
Line 1: Line 1:
The advent of '''[[wikipedia:Unicode|Unicode]]''' solved many problems with dealing with APL [[glyph]]s, however there was still some wiggle room as to which Unicode [[wikipedia:code point|code point]] were to be used in a Unicode implementation of APL, and different implementers made different choices.  This article, which documents these differences, is adapted from an original paper by [[Bob Smith]]<ref>Smith, Bob. [http://www.sudleyplace.com/APL/APL%20Characters%20and%20Their%20Aliases.pdf APL Characters and Their Aliases]. 14 Dec 2013–25 Dec 2019. Sudley Place Software.</ref> that attempted to raise awareness of these issues because the differences impede transfer of information.
The advent of '''[[wikipedia:Unicode|Unicode]]''' solved many problems in dealing with APL [[glyph|character]]s. However, there was still some wiggle room as to which Unicode [[wikipedia:code point|codepoints]] were to be used in a Unicode implementation of APL, and different implementors made different choices.  This article, which documents these differences, is adapted from an original paper by [[Bob Smith]]<ref>Smith, Bob. [http://www.sudleyplace.com/APL/APL%20Characters%20and%20Their%20Aliases.pdf APL Characters and Their Aliases]. 14 Dec 2013–25 Dec 2019. Sudley Place Software.</ref> that attempted to raise awareness of these issues because the differences impede transfer of information.


The relevant document for the APL character set is the ''APL Character Repertoire'' (ACR)<ref>ISO-IEC/JTC1/SC22/WG3. [http://std.dkuug.dk/jtc1/sc22/open/n3067.pdf N3067]: APL Character Repertoire. 28 Dec 1999.</ref>. For whatever reasons, that document never became a standard, but it does provide some guidance, and is better than each implementor making separate choices.
==Introduction==
There are a surprising number of similar APL characters in Unicode and in a number of cases some implementors went one way, others the other way.  The following table lists the characters in question, along with the way [[APL2]], [[Dyalog]], [[GNU APL]], [[NARS2000]], [[ngn/apl]], and [[dzaima/APL]] behave. APL2000 states that ''Generally the default codepoint scheme for the VisualAPL product follows the IBM APL2 workstation scheme''. Please [https://aplwiki.com/index.php?title=Unicode&action=edit edit] this page if you believe there are other characters that should be included in the table.
When there are differences among APL implementations, users can become confused. They type something into one APL system, copy it to another and are greeted by a [[SYNTAX ERROR]] or the like.
The whole basis for the confusion in a lengthy thread on comp.lang.apl entitled ''caret vs and''<ref>[https://groups.google.com/forum/#!forum/comp.lang.apl comp.lang.apl]. [https://groups.google.com/d/msg/comp.lang.apl/LTV-HTxEZI0/DAPcTrVPnmwJ caret vs and]. 28 Oct 2013&ndash;9 Dec 2013</ref> is that in some implementations the symbol for the logical [[And]] function is U+005E only, in some implementations it's U+2227 only, and in some both characters work. The original poster encountered some APL text from the [[APL Wiki]] that had been produced by a system that supports U+005E and copied it into a system that uses U+2227 only and fails on U+005E.
When our systems differ in the set of acceptable characters for the same function, it serves only to confuse the end user to the detriment of the community.
==Comparison of implementations==
{| class=wikitable  
{| class=wikitable  
! APL name !! Glyph !! Code point !! Unicode name !! [[APL2]] !! [[Dyalog APL]] !! [[GNU APL]] !! [[NARS2000]] !! [[ngn/apl]] !! [[dzaima/APL]]
! APL name !! Glyph !! Codepoint !! Unicode name !! [[APL2]] !! [[Dyalog APL]] !! [[GNU APL]] !! [[NARS2000]] !! [[ngn/apl]] !! [[dzaima/APL]]
|-
! rowspan=2 | Epsilon
| style=text-align:center | <source lang=apl inline>∈</source><ref>Found by Hanspeter Moser in [https://www.gnu.org/software/apl/Bits_and_Pieces/toronto-toolkit.apl.html The Toronto Toolkit]</ref> || style=text-align:center | U+2208 || Element of || {{Yes}} || {{Yes}} || {{No}} || {{No}} || {{Yes}} || {{Yes}}
|-
| style=text-align:center | <source lang=apl inline>∊</source> || style=text-align:center | U+220A || Small Element of || {{Yes}} || {{Yes}} || {{Yes}} || {{Yes}} || {{Yes}} || {{Yes}}
|-
! style=background:red;text-align:center colspan=10 | main content awaiting approval
|-
|-
! rowspan=2 | Circle
! rowspan=2 | Circle
Line 8: Line 26:
|-
|-
| style=text-align:center | <source lang=apl inline>⚪</source> || style=text-align:center | U+26AA || Medium white circle || {{No}} || {{No}} || {{No}} || {{Yes}} || {{No}} || {{No}}
| style=text-align:center | <source lang=apl inline>⚪</source> || style=text-align:center | U+26AA || Medium white circle || {{No}} || {{No}} || {{No}} || {{Yes}} || {{No}} || {{No}}
|-
|-
|-
! rowspan=4 | Diamond
! rowspan=4 | Diamond
Line 19: Line 36:
| style=text-align:center | <source lang=apl inline>⬦</source> || style=text-align:center | U+2B26 || Diamond          || {{No}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
| style=text-align:center | <source lang=apl inline>⬦</source> || style=text-align:center | U+2B26 || Diamond          || {{No}} || {{No}} || {{Yes}} || {{Yes}} || {{No}} || {{No}}
|}
|}
These characters are included here because they have been encountered in APL code displayed somewhere on the Internet or in a PDF file. Thus blindly copying them into an APL [[session]] can produce an error which might well confuse the user.
== Functionality ==
The following statements can be used to test the functionality of the symbols:
<source lang=apl>
⍎⎕← '1',(⎕UCS 16⊥2  2  0 10),'1'  ⍝ Epsilon
⍎⎕← '1',(⎕UCS 16⊥2  2  0  8),'1'  ⍝ Epsilon
⍎⎕← '1',(⎕UCS 16⊥2  2  1  2),'1'  ⍝ Minus
⍎⎕← '1',(⎕UCS 16⊥0  0  2 13),'1'  ⍝ Minus
⍎⎕← '1',(⎕UCS 16⊥2  2  2  3),'1'  ⍝ Modulus
⍎⎕← '1',(⎕UCS 16⊥0  0  7 12),'1'  ⍝ Modulus
⍎⎕← '1',(⎕UCS 16⊥2  2 12  6),'1'  ⍝ Star
⍎⎕← '1',(⎕UCS 16⊥0  0  2 10),'1'  ⍝ Star
⍎⎕←    (⎕UCS 16⊥2  2  3 12),'1'  ⍝ Tilde
⍎⎕←    (⎕UCS 16⊥0  0  7 14),'1'  ⍝ Tilde
⍎⎕← '1',(⎕UCS 16⊥2  3  7  1),'1'  ⍝ Nor
⍎⎕← '1',(⎕UCS 16⊥2  2 11 13),'1'  ⍝ Nor
⍎⎕← '1',(⎕UCS 16⊥2  3  7  2),'1'  ⍝ Nand
⍎⎕← '1',(⎕UCS 16⊥2  2 11 12),'1'  ⍝ Nand
⍎⎕← '1',(⎕UCS 16⊥2  2  2  7),'1'  ⍝ And
⍎⎕← '1',(⎕UCS 16⊥0  0  5 14),'1'  ⍝ And
⍎⎕← '1',(⎕UCS 16⊥2  2  6  4),'1'  ⍝ Not More
⍎⎕← '1',(⎕UCS 16⊥2 10  7 13),'1'  ⍝ Not More
⍎⎕← '1',(⎕UCS 16⊥2  2  6  5),'1'  ⍝ Not Less
⍎⎕← '1',(⎕UCS 16⊥2 10  7 14),'1'  ⍝ Not Less
⍎⎕← '1',(⎕UCS 16⊥2  5 14  6),'.=1' ⍝ Jot
⍎⎕← '1',(⎕UCS 16⊥2  2  1  8),'.=1' ⍝ Jot
⍎⎕← '1',(⎕UCS 16⊥2  6 10 10),'1'  ⍝ Circle
⍎⎕← '1',(⎕UCS 16⊥2  5 12 11),'1'  ⍝ Circle
⍎⎕← '1',(⎕UCS 16⊥2 11  2  6),'1'  ⍝ Diamond
⍎⎕← '1',(⎕UCS 16⊥2  2 12  4),'1'  ⍝ Diamond
⍎⎕← '1',(⎕UCS 16⊥2  5 12  7),'1'  ⍝ Diamond
⍎⎕← '1',(⎕UCS 16⊥2  5 12 10),'1'  ⍝ Diamond
⍎⎕←    (⎕UCS 16⊥2  5 10 15),'←1'  ⍝ Quad
⍎⎕←    (⎕UCS 16⊥2  3  9  5),'←1'  ⍝ Quad
⍎⎕←'1{',(⎕UCS 16⊥2  3  7 10),'}1'  ⍝ Alpha
⍎⎕←'1{',(⎕UCS 16⊥0  3 11  1),'}1'  ⍝ Alpha
⍎⎕← '{',(⎕UCS 16⊥2  3  7  5),'}1'  ⍝ Omega
⍎⎕← '{',(⎕UCS 16⊥0  3 12  9),'}1'  ⍝ Omega
</source>
Note that the last four lines will not work on a system that doesn’t support [[dfns]].
==Atomic Vector==
If the [[Atomic vector]] (<source lang=apl inline>⎕AV</source>) has no room in which to include these new characters, an implementation can translate them on entry to the corresponding symbol that is in <source lang=apl inline>⎕AV</source>. [[NARS2000]] even has a means of translating symbols on the way out via Copy (<kbd>Ctrl</kbd>+<kbd>C</kbd> in Windows) to various other APL systems that don't support the same set of principal characters NARS2000 uses for the functions in the above table.
==Considerations==
Unicode was a great start to enabling APL characters to be used. However, for there to be interoperability, implementors have to agree upon which characters are functional. It doesn't matter if one's system can change the mapping of glyphs to codepoints, as the vast majority of users won't change from the default behavior. Implementors therefore have to decide whether it is worthwhile to support the above codepoints.


== References ==
== References ==
<references/>
<references/>

Revision as of 14:24, 26 December 2019

The advent of Unicode solved many problems in dealing with APL characters. However, there was still some wiggle room as to which Unicode codepoints were to be used in a Unicode implementation of APL, and different implementors made different choices. This article, which documents these differences, is adapted from an original paper by Bob Smith[1] that attempted to raise awareness of these issues because the differences impede transfer of information.

The relevant document for the APL character set is the APL Character Repertoire (ACR)[2]. For whatever reasons, that document never became a standard, but it does provide some guidance, and is better than each implementor making separate choices.

Introduction

There are a surprising number of similar APL characters in Unicode and in a number of cases some implementors went one way, others the other way. The following table lists the characters in question, along with the way APL2, Dyalog, GNU APL, NARS2000, ngn/apl, and dzaima/APL behave. APL2000 states that "Generally the default codepoint scheme for the VisualAPL product follows the IBM APL2 workstation scheme". Please edit this page if you believe there are other characters that should be included in the table.

When there are differences among APL implementations, users can become confused. They type something into one APL system, copy it to another and are greeted by a SYNTAX ERROR or the like.

The whole basis for the confusion in a lengthy thread on comp.lang.apl entitled caret vs and[3] is that in some implementations the symbol for the logical And function is U+005E only, in some implementations it's U+2227 only, and in some both characters work. The original poster encountered some APL text from the APL Wiki that had been produced by a system that supports U+005E and copied it into a system that uses U+2227 only and fails on U+005E.

When our systems differ in the set of acceptable characters for the same function, it serves only to confuse the end user to the detriment of the community.

Comparison of implementations

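{| class=wikitable
! APL name !! Glyph !! Codepoint !! Unicode name !! APL2 !! Dyalog APL !! GNU APL !! NARS2000 !! ngn/apl !! dzaima/APL
|-
! rowspan=2 | Epsilon
| ∈[4] || U+2208 || Element of || Yes || Yes || No || No || Yes || Yes
|-
| ∊ || U+220A || Small Element of || Yes || Yes || Yes || Yes || Yes || Yes
|-
! colspan=10 | main content awaiting approval
|-
! rowspan=2 | Circle
| ○ || U+25CB || White circle || Yes || Yes || Yes || Yes || Yes || Yes
|-
| ⚪ || U+26AA || Medium white circle || No || No || No || Yes || No || No
|-
! rowspan=4 | Diamond
| ⋄ || U+22C4 || Diamond operator || Yes || Yes || Yes || Yes || Yes || Yes
|-
| ◇ || U+25C7 || White Diamond || No || No || No || Yes || No || No
|-
| ◊ || U+25CA || Lozenge || No || No || Yes || Yes || No || No
|-
| ⬦ || U+2B26 || Diamond || No || No || Yes || Yes || No || No
|}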

These characters are included here because they have been encountered in APL code displayed somewhere on the Internet or in a PDF file. Thus blindly copying them into an APL session can produce an error which might well confuse the user.

Functionality

The following statements can be used to test the functionality of the symbols:

⍎⎕← '1',(⎕UCS 16⊥2  2  0 10),'1'   ⍝ Epsilon
⍎⎕← '1',(⎕UCS 16⊥2  2  0  8),'1'   ⍝ Epsilon
⍎⎕← '1',(⎕UCS 16⊥2  2  1  2),'1'   ⍝ Minus
⍎⎕← '1',(⎕UCS 16⊥0  0  2 13),'1'   ⍝ Minus
⍎⎕← '1',(⎕UCS 16⊥2  2  2  3),'1'   ⍝ Modulus
⍎⎕← '1',(⎕UCS 16⊥0  0  7 12),'1'   ⍝ Modulus
⍎⎕← '1',(⎕UCS 16⊥2  2 12  6),'1'   ⍝ Star
⍎⎕← '1',(⎕UCS 16⊥0  0  2 10),'1'   ⍝ Star
⍎⎕←     (⎕UCS 16⊥2  2  3 12),'1'   ⍝ Tilde
⍎⎕←     (⎕UCS 16⊥0  0  7 14),'1'   ⍝ Tilde
⍎⎕← '1',(⎕UCS 16⊥2  3  7  1),'1'   ⍝ Nor
⍎⎕← '1',(⎕UCS 16⊥2  2 11 13),'1'   ⍝ Nor
⍎⎕← '1',(⎕UCS 16⊥2  3  7  2),'1'   ⍝ Nand
⍎⎕← '1',(⎕UCS 16⊥2  2 11 12),'1'   ⍝ Nand
⍎⎕← '1',(⎕UCS 16⊥2  2  2  7),'1'   ⍝ And
⍎⎕← '1',(⎕UCS 16⊥0  0  5 14),'1'   ⍝ And
⍎⎕← '1',(⎕UCS 16⊥2  2  6  4),'1'   ⍝ Not More
⍎⎕← '1',(⎕UCS 16⊥2 10  7 13),'1'   ⍝ Not More
⍎⎕← '1',(⎕UCS 16⊥2  2  6  5),'1'   ⍝ Not Less
⍎⎕← '1',(⎕UCS 16⊥2 10  7 14),'1'   ⍝ Not Less
⍎⎕← '1',(⎕UCS 16⊥2  5 14  6),'.=1' ⍝ Jot
⍎⎕← '1',(⎕UCS 16⊥2  2  1  8),'.=1' ⍝ Jot
⍎⎕← '1',(⎕UCS 16⊥2  6 10 10),'1'   ⍝ Circle
⍎⎕← '1',(⎕UCS 16⊥2  5 12 11),'1'   ⍝ Circle
⍎⎕← '1',(⎕UCS 16⊥2 11  2  6),'1'   ⍝ Diamond
⍎⎕← '1',(⎕UCS 16⊥2  2 12  4),'1'   ⍝ Diamond
⍎⎕← '1',(⎕UCS 16⊥2  5 12  7),'1'   ⍝ Diamond
⍎⎕← '1',(⎕UCS 16⊥2  5 12 10),'1'   ⍝ Diamond
⍎⎕←     (⎕UCS 16⊥2  5 10 15),'←1'  ⍝ Quad
⍎⎕←     (⎕UCS 16⊥2  3  9  5),'←1'  ⍝ Quad
⍎⎕←'1{',(⎕UCS 16⊥2  3  7 10),'}1'  ⍝ Alpha
⍎⎕←'1{',(⎕UCS 16⊥0  3 11  1),'}1'  ⍝ Alpha
⍎⎕← '{',(⎕UCS 16⊥2  3  7  5),'}1'  ⍝ Omega
⍎⎕← '{',(⎕UCS 16⊥0  3 12  9),'}1'  ⍝ Omega

Note that the last four lines will not work on a system that doesn’t support dfns.
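
As a rough guide to reading these statements, assuming an interpreter that provides ⎕UCS and base decode (⊥) as the lines above do: each statement decodes four hexadecimal digits into a codepoint, turns that codepoint into a character, and executes the resulting expression. The And pair, taken apart step by step:

<source lang=apl>
16⊥2 2 2 7                     ⍝ hex digits 2 2 2 7 decode to 8743, i.e. U+2227
16⊥0 0 5 14                    ⍝ hex digits 0 0 5 E decode to 94, i.e. U+005E
'1',(⎕UCS 16⊥2 2 2 7),'1'      ⍝ three-character vector: 1, U+2227, 1
⍎'1',(⎕UCS 16⊥2 2 2 7),'1'     ⍝ 1 on a system that accepts U+2227 as And; an error otherwise
</source>

The ⎕← in the original statements only displays the generated text before it is executed, so the character under test stays visible even when execution fails.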

Atomic Vector

If the Atomic vector (⎕AV) has no room in which to include these new characters, an implementation can translate them on entry to the corresponding symbol that is in ⎕AV. NARS2000 even has a means of translating symbols on the way out via Copy (Ctrl+C in Windows) to various other APL systems that don't support the same set of principal characters NARS2000 uses for the functions in the above table.
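
A minimal sketch of such a translation on entry, assuming a dialect with dfns and ⎕UCS. The names from, to and Normalize are hypothetical, the alias list is only a small subset of the codepoints discussed on this page, and the choice of which character counts as the principal one is an assumption that each implementation would make for itself:

<source lang=apl>
⍝ Hypothetical alias table: each character in from is replaced by the
⍝ character at the same position in to (assumed principal characters).
from ← ⎕UCS 94 8712 9898 9671      ⍝ U+005E U+2208 U+26AA U+25C7
to   ← ⎕UCS 8743 8714 9675 8900    ⍝ U+2227 U+220A U+25CB U+22C4

⍝ Look each character up in from,⍵: an alias is found in from and picks the
⍝ matching element of to; any other character finds itself in ⍵ and is kept.
Normalize ← {(to,⍵)[(from,⍵)⍳⍵]}

⎕UCS Normalize ⎕UCS 49 94 49       ⍝ 49 8743 49: the caret became U+2227
</source>

The input is built from codepoints rather than typed, so that the sketch does not depend on what the session itself does to a caret on entry.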

Considerations

Unicode was a great start to enabling APL characters to be used. However, for there to be interoperability, implementors have to agree upon which characters are functional. It doesn't matter if one's system can change the mapping of glyphs to codepoints, as the vast majority of users won't change from the default behavior. Implementors therefore have to decide whether it is worthwhile to support the above codepoints.

References

  1. Smith, Bob. APL Characters and Their Aliases. 14 Dec 2013–25 Dec 2019. Sudley Place Software.
  2. ISO-IEC/JTC1/SC22/WG3. N3067: APL Character Repertoire. 28 Dec 1999.
  3. comp.lang.apl. caret vs and. 28 Oct 2013–9 Dec 2013
  4. Found by Hanspeter Moser in The Toronto Toolkit