SC22/WG20 N883R

From: Keld Jørn Simonsen

Sent: Friday, October 05, 2001 12:19 AM

Draft for approval/Keld

Title: Disposition of comments on DTR 14652.

Source: WG20

Date: 2001-10-04

Cross reference: JTC1 N6483

Draft disposition of comments on JTC 1 N 6404, ISO/IEC DTR 14652 -

Functionality for Internationalization - Specification Method for Cultural

Conventions

The DTR 14652 ballot closed with 10 votes approving the draft

and 8 votes disapproving.

Although this ballot received a majority of approval votes, there

were a significant number of negative votes and comments from

national bodies. Therefore, the summary of voting was sent

to SC 22 for resolution of the comments and preparation of a revised DTR text.

SC22 passed a resolution where WG20 was requested to produce

a repacked DTR where the non-controversial and controversial

portions are identified.

This will be done by marking all controversial clauses of the specification,

as noted below.

The following are the dispositions of comments.

> Finland

> DISAPPROVAL OF THE DRAFT FOR REASONS ON THE ATTACHED (Please

> indicate if acceptance of these reasons and appropriate changes

> in the text will change your vote to approval)

> In our consultation with other national bodies on this, we have

> become aware of the forthcoming negative US comments that we

> find both relevant and appropriate. We believe that it would

> serve no useful purpose and be utmost inefficient if we were

> to reformulate them by ourselves, especially since we believe

> that it would be difficult if not impossible to try to solve

> the problems inherent in this DTR with pointed, individual

> editing instructions.

Response: The Finnish comments are implicitly responded to via

other responses, as they are defined implicitly via other comments.

The Finnish member body will be asked if they would like to

have these comments or other comments in annex D.

> France

> Please note that the acceptance of these reasons and appropriate changes

> in the text will change your vote to approval.

> 1-Audience expectation

> The objective of DTR 14652 set forth in line 148 :

> [..] that are expected to be developed for a number of programming languages

> cannot be reached because the DTR 4652 is kept compatible with POSIX:1996,

> as stated in its "Scope" section. POSIX:1996 architecture is not fit for

> a general and modern specification of cultural services, and its next

> revision is not expected to improve on that particular matter.

> Conversely, keeping POSIX compatibility will doubtlessly serve

> POSIX audience better, so we believe the DTR shall insists it

> belong to the POSIX culture. Therefore, we kindly request the

> line 148 to specify its audience is POSIX culture, such as :

> The descriptions are intended to be coded in text files

> to be used via Application Programming Interfaces,

> that are expected to be developed for a number of

> systems which comply with ISO/IEC 9945.

Response: accepted. Will replace the text.

> 2-POSIX 200X compatibility

> At the same time, every reasonable step must be taken to

> ensure that the DTR will accommodate POSIX 200x. We kindly

> request this effort is asserted or documented somewhere.

Response: Accepted. The work has been aligned with the POSIX:200x work.

POSIX:200x will be added to the normative references, and a statement that an

alignment effort has been made will be added to the scope.

> 3-Multiple currency

> The currency multiple value is not expected to be used because

> the DTR will be published after the most important dual

> currency period. It is expected that the solutions that

> will be implemented using (hopefully) POSIX until 2002-02-17

> (end of dual currency for the first round of "Euroland"

> countries) will be used in future currency-switching countries,

> in Europe or elsewhere. In short this solution arrives too late,

> and is not proven to be appropriate. We kindly request it is withdrawn.

Response: Other member bodies want this functionality.

The LC_MONETARY clause will be marked as controversial.

> 4-Iswgraph

> We are not sure that the DTR allows for intelligent categories, e.g.

> that one can handle multiple "space" characters, like C/C++ does

> with iswgraph. If not so, we kindly request the DTR to do so,

> in particular (but not only) for multiple space.

Response: The TR caters for multiple "space" characters as

C/C++ does, as the information provided via the classes is meant

to also be used by C/C++ APIs. No change.

> Germany

> Vote: Disapproval with comments

> If the comments are satisfactorily resolved, the vote will change to

> approval for this TR.

> For the record it should be stated that Germany will not support the

> transformation of this TR into an international standard even if all of

> these comments are resolved.

> Comments:

> General: The draft technical report has now reached a certain stage of

> maturity that might possibly make it useful for guidance in certain

> communities. However, it still contains a considerable number of errors and

> shortcomings, some of a systematic nature, that make it unsuitable for

> acceptance.

> * There are multiple errors in the membership of LC_CTYPE classes. For

> example, the draft introduces two new classes that are meant to be related

> to the ISO/IEC 10646-1 descriptions of combining characters. However, the

> draft has its own, somewhat peculiar interpretation of combining

> characters: Simply "combining" are, quite properly, "Characters to form

> composite graphic symbols, such as characters listed in ISO/IEC 10646:1993

> annex B.1." (l. 939f). This is, what one would intuitively understand also

> combining3 (i. e. combining characters allowed in a level 3 implementation

> of 10646, i. e. all) to mean. However, combining3 is combining -

> "combining2", i. e. minus those combining characters in a level 2

> implementation. The terminology should be adapted accordingly, e. g.

> combining for all combining characters (with combining3 as an equivalent)

> and combining2 to mean specifically those allowed in a level 2

> implementation - if combining2 is indeed needed at all.

> Most ideographs are not included in the <graph> class (why?). Also, the

> draft includes only the repertoire of 10646 as it was in 1998 and should be

> extended to cover at least 10646-1:2000.

Response: Partly accepted. Some errors have been found in the

tables for the LC_CTYPE classes. They will be corrected as follows:

Include in line 1670: U06D6

Reason: this character is in the combining class in 10646

Exclude in line 1683: U0F88..U0F89

Reason: these characters are not in the combining class in 10646

line 1263 change U20AA to U20AC

Reason: to include the Dong sign and Euro sign in class punct

Also include FFFC in class punct.

For combining classes: The draft follows the tables in ISO/IEC 10646-1:1993,

and the tables of annex B in that standard and the draft are the same.

It seems to be most secure to closely follow the originating standard.

The keywords reflect a widely implemented industry practice.

No change.

For the ideographs: As they are in the "alpha" class, they are automatically

included in the "graph" class. But it is agreed that this would be

better documented if these characters were explicitly included.

Thus for line 1338 add Hangul and CJK: UAC00..UD7A3 U4E00..U9FA5

It was decided to follow closely a specific repertoire of IS 10646,

as also done for IS 14651 and TR 10176. The repertoire can be updated

in a future revision of this TR.

> *The changes to the monetary section that are incompatible with the current

> POSIX.2 standard (ISO/IEC 9945-2) must be removed, in particular all cases

> where it has previously only been allowed to insert one value, but now a

> semicolon-delimited list of values. This is true in particular for the

> definition of multiple national currencies.

Response: Not accepted. LC_MONETARY will be marked as controversial.

A new specification will not be downward compatible in the sense that

older applications can use the new specifications.

This is also stated in the scope: it is not expected that

POSIX will be able to handle all new constructs in this specification.

> a) This breaks implementations that expect the single values defined in

> POSIX.2.

Response: 14652 is an enhancement to POSIX. Applications that use the new

format can also use the older POSIX locales. However, it is

not the intention that old applications can use the new format

without change.

> b) It does not specify which of the currencies should be selected,

> unless the "valid_from" and "valid_to" keywords are meant to be such a

> mechanism. In this case, the mechanism would be highly unsuitable

> especially for the case of the Euro where the currency co-exists over a

> period of time (now virtually over), and the correct currency sign is

> selected on a case by case basis. The Java approach of multiple locales for

> one language and locale, differing only with respect to a certain point -

> currency in this case - is far more flexible and user friendly.

Response: Yes, the "valid_from" and "valid_to" keywords are meant

to be such a mechanism. It caters for the Euro case, which may occur in the

future for other European countries switching to the Euro currency.

The Java approach is not something that is generally available in

SC22 language standards, while the specification here may be used with

more programming languages, even with Java.
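
Illustration (not part of the TR): the selection mechanism described in this

response can be sketched as below. The entry layout, the currency symbols and

the dates are assumptions made for the example only; the TR's own syntax for

"valid_from" and "valid_to" is not reproduced here.

```python
from datetime import date

# Hypothetical entries: (currency symbol, valid_from, valid_to).
# Symbols and dates are illustrative only and are not taken from the TR.
CURRENCIES = [
    ("OLD", date(1900, 1, 1), date(2002, 2, 17)),
    ("EUR", date(1999, 1, 1), date(9999, 12, 31)),
]

def valid_currencies(entries, on_day):
    """Return the symbols whose validity period includes on_day."""
    return [sym for sym, start, end in entries if start <= on_day <= end]
```

During an overlap period the function returns more than one symbol, so an

application would still have to choose among them case by case.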

> * The fixed, locale-based currency-rate must be removed, as repeatedly

> discussed in the past (ll 2275ff and also ll 6504ff). It is an unsuitable

> mechanism that will not work even for those European countries in the

> Euro-zone.

Response: the fixed currency-rate will work for the Euro-zone.

> * The changes to the LC_TIME section (section 4.7) that are incompatible

> with POSIX.2 must be undone. This includes the issue of "twelve or thirteen

> semicolon-separated" (2574f) months, whereas previously only twelve months

> were allowed. Implementations that expect exactly twelve entries here will

> break.

Response: noted. The LC_TIME section will be marked as controversial.

14652 is upwards compatible with POSIX. It allows for 13 months

because some lunar calendars prescribe this.

Old-fashioned POSIX-compliant locale parsers cannot expect to

be able to parse 14652 FDCC-sets. There are many new features that

old POSIX-compliant parsers would not understand.

One such thing is the hexadecimal ranges.
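
Illustration (not part of the TR): a parser that tolerates both forms might

look like the sketch below. The function name is hypothetical, and the sketch

assumes the value is a plain semicolon-separated list of quoted names with

no escaped semicolons inside a name.

```python
def parse_month_names(value):
    """Split a semicolon-separated month list; accept 12 or 13 entries."""
    names = [item.strip().strip('"') for item in value.split(";")]
    if len(names) not in (12, 13):
        raise ValueError(f"expected 12 or 13 month names, got {len(names)}")
    return names
```

A parser that insists on exactly twelve entries would reject the second form,

which is the incompatibility raised in the comment.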

> * The value of the timezone keyword (2663ff) in LC_TIME is difficult to

> see for countries that span more than one timezone. The relevant timezone

> is in any case present in the TIMEZONE environment variable.

Response: noted. The LC_TIME section will be marked as controversial.

No change.

> * The usefulness of the LC_XLITERATE category (section 4.9) has repeatedly

> been questioned. As the TR freely admits, it is suitable only to "simple

> transliteration based on substring substitution" (2938). There is often if

> not usually more than one transliteration scheme from a source script to a

> target script even within one culture. To hardcode one of these into a

> locale makes little sense. Therefore, LC_XLITERATE should be removed.

Response: noted. The LC_XLITERATE section will be marked as controversial.

The transliteration facilities provided by the specification are implemented

and used in some Linux implementations.
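
Illustration (not part of the TR): "simple transliteration based on substring

substitution" can be sketched as below. The substitution table is a

hypothetical example, and the longest-match-first rule is an assumption of

this sketch rather than something taken from the TR.

```python
# Hypothetical substitution table; not taken from the TR.
TABLE = {"æ": "ae", "ø": "oe", "å": "aa"}

def transliterate(text, table):
    """Replace each source substring by its target, longest match first."""
    # Empty keys are skipped so the scan always advances.
    keys = sorted((k for k in table if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry matches here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

A different transliteration scheme for the same script would simply be a

different table, which is the point of contention in the comment.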

> Ireland

> Ireland votes NO on DTR 14652. In consultation with members in the

> Irish IT industry, we became aware of the forthcoming negative US

> comments. These comments are extensive and exhaustive, and it seems

> clear that this project, which has been on the books for a very long

> time indeed, still lacks consensus and technical accuracy. We do not

> feel that it would be useful to publish our own litany of what is

> wrong with this standard; rather, in this case, we consider the US

> comments to state the case quite clearly.

Response: These comments will thus be dealt with implicitly via the response

to the US comments.

> It is not clear that this matter ought to be standardized. It seems

> far more appropriate for it to be formulated and published in another

> medium, such as an RFC or a UTR.

Response: Ireland will be contacted for a possible entry in annex D.

> Slovenia

> Standards and Metrology Institute of Slovenia (SMIS) as a full

> member of JTC1 would like to vote "against" for the document

> ISO/IEC 14652 with the folowing techical comments:

> GENERAL: There is no consistency with existing practice in the

> technical part of the document. In particular:

Response: the document reflects the most widespread existing

practice on Unix/Linux today, measured by the number of computers running

with this behaviour.

> OBJECTION 1

> Section 4.1.4.1 comment_char (lines 652-653, and affecting the FDCC-set definition)

> Current text:

> "The comment character defaults to the number_sign "#". All examples

> in this Technical Report uses "%" as the comment character,

> except where otherwise noted."

> Problem and Action:

> ISO/IEC 9945-2:1992 (POSIX.2) uses the default comment_char, and for

> consistency with existing practice, this document should as well.

> Change the sentence "All examples..." to "All examples in this

> Technical Report use the default comment character." Also,

> revise the FDCC-set definition.

Response: not accepted. The specification reflects current practice on

Linux or other platforms using the GNU C compiler. Also, it does not

change any behaviour, and

it would be tedious and error-prone to change in the specification.

It also reflects use in IS 14651. This use removes some

problems that occurred with some communications protocols, and

some problems with presentation in some countries.
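
Illustration (not part of the TR): a reader that honours the declarations

rather than hard-coding the defaults handles both conventions. The sketch

below assumes that a declaration is a line of the form "comment_char %" or

"escape_char /" and that a comment is a line beginning with the current

comment character; the function name is hypothetical.

```python
def strip_comment_lines(source):
    """Drop comment lines, honouring comment_char/escape_char declarations."""
    comment_char, escape_char = "#", "\\"  # the defaults quoted in the comment
    kept = []
    for line in source.splitlines():
        words = line.split()
        if len(words) == 2 and words[0] == "comment_char":
            comment_char = words[1]
        elif len(words) == 2 and words[0] == "escape_char":
            escape_char = words[1]
        elif line.startswith(comment_char):
            continue  # comment line under the currently declared character
        else:
            kept.append(line)
    return comment_char, escape_char, kept
```

With such a reader, the choice of "%" and "/" in the examples changes no

behaviour, as long as the file declares them.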

> OBJECTION 2

> Section 4.1.4.2 escape_char (lines 666-667, and affecting the FDCC-set definition)

> Current text:

> "The escape character defaults to backslash "\". All examples in this

> Technical Report uses "/" as the escape character, except where otherwise noted."

> Problem and Action:

> ISO/IEC 9945-2:1992 (POSIX.2) uses the default escape_char, and for

> consistency with existing practice, this document should as well.

> Change the sentence "All examples..." to "All examples in this

> Technical Report use the default escape character." Also, revise the FDCC-set

> definition.

Response: Not accepted. See response to previous comment.

> Sweden

> We find that this Technical Report of type 1 are not up-to-date with

> modern internationalisation techniques. Incremental changes are unlikely

> to result in anything sufficiently up-to-date. We therefore suggest

> that this project be discontinued. A new internationalisation format

> report could be taken up at a later date, should resources and

> sufficient consensus arise.

Response: It is acknowledged that this TR has controversial sections.

The Swedish member body is invited to contribute to annex D

if they wish.

> SE 1. MAJOR:

> There is no character encoding declaration for a FDCC set file

> itself, nor any requirement to use an encoding scheme for the

> universal character set (e.g. UTF-8). Instead there is

> essentially a limitation to POSIX so-called portable

> characters (a subset of ASCII), otherwise the encoding

> is in principle undefined ("implementation defined") and

> that cannot be relied upon. Therefore expressing some of

> the things covered by 14652, like weekday names, are needlessly

> cumbersome, using various kinds of character references.

> Instead such items should be expressed directly as

> the strings that one wishes to have output (or parsed).

Response: not accepted. This follows POSIX practice.

> SE 2. MAJOR, LC_CTYPE:

> Draft 14652 suggests to tie character properties to

> locales (FDCC sets). This will surely lead to

> inconsistencies among locales for property

> assignments for the same characters. Instead character

> properties should be defined on the universal character set (UCS).

> Together with well defined mappings between various character

> encodings and the UCS one can get consistent property assignments.

> In particular some properties may be defined only for a subset

> of the UCS characters in many locales, which works very badly

> together with programming paradigm where all character string

> processing is done on UCS strings, and other encodings are

> handled via conversion (this is the modern approach to

> character processing).

Response: not accepted. It is recommended that the

character properties of the i18n FDCC set be used, which

provides a full set of character attributes for UCS in

a specific version. But the model allows for deviance as it

is known that some deviance is needed in some cases.

It is agreed that it is advisable to do all string handling

in UCS and then convert to the actual encoding, but it

is also known that this may not always be doable, eg in embedded

systems, and the model thus allows for that. No change.

> SE 3. MAJOR, REPERTOIREMAP:

> More than 25 pages (in small print) are devoted to a

> so-called repertoiremap (clause 6), with non-mnemonic

> arcane "names" for characters. This list of names should

> be removed. Instead, for these names, for the few instances

> really needed (like invisible characters), use the code

> point number (in hexadecimal). But for the majority of

> cases use the character itself, as mentioned in the first point above.

Response: Not accepted. Clause 6 will be marked as controversial.

> SE 4. MAJOR:

> Many of the components formats presupposes a C-like API,

> using format strings with % followed by a letter. Not all

> systems may wish to use such format strings. Further,

> the character classes are insufficient for many purposes,

> assuming an "encoding independent" paradigm (which

> is assumed for standard C, but cannot be used for the

> most modern character encodings, i.e. the UCS encoding forms,

> since the UCS has many features not present in most or

> any legacy character encoding.

Response: Not accepted. No changes. No alternates were provided.

It is agreed that the syntax is C-like, but no better

syntax has been proposed, and it is too late to come up with a new

scheme at this stage. The C syntax has been well established.

This is only a TR type 1 and a better syntax may be proposed

for a revision of the TR. This is existing POSIX practice.

> SE 5. MAJOR:

> The locale (FDCC set) layout structure is very much geared

> towards having fixed premade locales. It's not geared

> towards having data in one layer and user preference selections

> in another layer. Instead these layers are mixed up, needlessly

> complicating things for users that may wish to compose their

> own "locale", i.e. formatting preferences.

Response: Not accepted. There are mechanisms for modifying data with user

preferences. It does not seem very complicated for users to

produce their own specifications, or even for programs to aid

users setting up their preferences.

> SE 6. MAJOR:

> LC_COLLATE: The format and semantics are described in 14651.

> Only a reference to that is to be made, no conflicting (as

> is presently given in DTR 14652) specification can be allowed.

Response: Not accepted. The format and semantics are not fully described in

14651, there are some features missing in 14651. The syntax

and semantics in 14651 are only described for use with the

Default template, not generally. It is decided for completeness that

LC_COLLATE be described in full in 14652.

> SE 7. MAJOR:

> Charmaps: charmaps are not in any way related to 'locales'

> (FDCC sets), and locales should thus never specify any charmap.

> Any character encoding can occur for any locale (compare XML

> and its 'encoding' pseudo attribute). Further the "xliterate"

> 'category' seems to be more related to character mapping

> fallbacks than to real transliteration.

Response: Not accepted. It is agreed that charmaps should be seen as orthogonal

to the rest of the FDCC set description, but it is a required

element to make FDCC sets work in many environments.

It is agreed that 'xliterate' can be used for character mapping,

but it can also be used for many common transliteration purposes.

The LC_XLITERATE clause will be marked as controversial.

> United Kingdom

> Technical Comments

> The work is still premature and does not represent any industry

> practice. Although some useful possibilities are documented which

> would benefit from further development, many of these should not be

> considered for development within a standard, nor in a technical report

> which can be referred to normatively.

Response: The work at hand is in most parts implemented

in the Linux operating system, as part of the GNU C and

C++ compilers, so it represents widespread industry practice.

It is thus relevant to document this practice. The area is a good

candidate for standardization as an interface is needed to supply

information in the area, and to be able to process the information.

Several clauses will be marked as controversial, and the UK is invited

to contribute to annex D.

> In addition, there has been sustained opposition to this from various

> industry sources who participate in ISO/IEC JTC1/SC22.

Response: True, but there has also been a majority behind issuing the

Technical report.

> Given the lack of consensus, this item should be withdrawn from the

> ISO/IEC JTC1/SC22/WG20 work programme. It may be useful for this to be

> developed in other fora, e.g. some Linux development groups, but it

> should not be developed further in ISO/IEC JTC1/SC22/WG20.

Response: There is consensus, as defined by JTC 1 rules, to approve

this as a TR. JTC 1 and SC22 have instructed WG20 to go forward with

the TR for a second DTR ballot.

> There are also errors in this draft which have not been corrected to

> take account of previous comments in the meetings. The editorial

> comments below list just a few of these.

Response: All comments on the earlier drafts have been addressed

through dispositions of comments that have been approved by WG20,

and the resulting changes have been applied by the editor.

> Editorial comments

> There remain errors in new tables in LC_CTYPE classes: these use

> different descriptions and different character groups than those

> defined in ISO/IEC 10646-1:2000.

Response: An effort was made to correct this in DTR 1.

The U.K. is invited to come forward with a list of changes

in this respect.

> In the monetary and time sections, (a) definitions of multiple

> currencies are introduced, which conflict with implementations which

> anticipate only the single values defined in the POSIX.2 standard

> ISO/IEC 9945-2, and (b) different definitions for the number of

> months, and the start day of the week, are introduced.

Response: Not accepted. The specification cannot be backward compatible,

when new issues are handled. The POSIX standard has no way of

understanding more than one currency nor a year with more than 12 months.

The clauses are marked as controversial.

> The LC_XLITERATE section for character transliteration does not include

> the corrections suggested at previous meetings of ISO/IEC

> JTC1/SC22/WG20. The conversions proposed are somewhat idiosyncratic, and

> do not represent any consensus for conversion within ISO/TC46/SC2

> (Conversion of Written Languages) which develops standards on

> transliteration, and other alternative transliteration conventions are

> not catered for.

Response: Partially accepted. There have previously been some corrections proposed

wrt transliterations and they have all been accommodated.

The section will be described more like character string substitution

which can be used for culturally dependent issues like transliteration

and fallback. The clause is marked as controversial.

> United States

> OBJECTION #1

> Section 4.1.4.1 comment_char (lines 652-653, and affecting the FDCC-set

> definition)

> Current text:

> "The comment character defaults to the number_sign "#". All examples in this

> Technical Report uses "%" as the comment character, except where otherwise

> noted."

> Problem and Action:

> ISO/IEC 9945-2:1992 (POSIX.2) uses the default comment_char, and for

> consistency with existing practice, this document should as well. Change the

> sentence "All examples..." to "All examples in this Technical Report

> use the default comment character." Also, revise the FDCC-set definition.

Response: not accepted. The document reflects widespread existing

practice, which enhances the portability of the specifications.

See also response to Slovenian comments.

> OBJECTION #2

> Section 4.1.4.2 escape_char (lines 666-667, and affecting the FDCC-set

> definition)

> Current text:

> "The escape character defaults to backslash "\". All examples in this

> Technical Report uses "/" as the escape character, except where

> otherwise noted."

> Problem and Action:

> ISO/IEC 9945-2:1992 (POSIX.2) uses the default escape_char, and for

> consistency with existing practice, this document should as well. Change the

> sentence "All examples..." to "All examples in this Technical Report

> use the default escape character." Also, revise the FDCC-set definition.

Response: not accepted. The document reflects widespread existing

practice, which enhances the portability of the specifications.

See also response to Slovenian comments.

> OBJECTION #3

> Section 4.2 LC_IDENTIFICATION (lines 698-777)

> Problem:

> The text defines a list of properties for an FDCC-set, and states that

> "All keywords are mandatory unless otherwise noted." (lines 701-702) However,

> at lines 728-729, it states "If information required for any of the

> mandatory keywords above is not available, then the corresponding string

> is an empty string." Further, the i18n LC_IDENTIFICATION section defined

> at lines 748-777 contains empty strings for six `mandatory' keywords.

> This is confusing. What the text is trying to say is that certain keywords

> must be present, as opposed to requiring that values be assigned to certain

> keywords. But when most people think of "mandatory", they think of it in

> terms of values, not keywords. Besides, what is the rationale of requiring

> that certain keywords be present, but NOT requiring that they include a

> value? If values are not required, they are not mandatory.

> Action:

> Make the following changes.

> 1. Change the sentence "All keywords are mandatory..." to "Values must be

> supplied for all keywords, unless otherwise noted."

> 2. Add the sentence "This keyword is optional." to the description of

> keywords email, tel, fax, language, and territory.

> 3. Remove the sentence at lines 728-729 ("If information required for

> any of the mandatory keywords...").

Response: accepted.

> OBJECTION #4

> Section 4.3 LC_CTYPE (lines 787-788 and 817-821 and affecting

> Section 4.3.2 "i18n" LC_CTYPE category)

> Current wording:

> "The double increment hexadecimal symbolic ellipses ("..(2)..") works

> like the hexadecimal symbolic ellipses, but generates only every other

> of the symbolic character names. As an example. <U01AC>..(2)..<U01B2>

> is interpreted as the symbolic character names <U01AC>, <U01AE>, <U01B0>,

> and <U01B2>, in that order."

> Problem:

> This type of symbolic ellipses allows an FDCC-set author to save a little

> typing for some scripts if letters for those scripts are arranged in a code set

> in uppercase/lowercase pairs. Using this type of ellipses, the author can

> indicate a start and end point for a range, and pick up every other

> entry.

> The problem is that this is extremely confusing, especially considering

> that there already are three other types of ellipses. It will be extremely

> easy for authors to make mistakes, and difficult to implement and maintain

> all these variations. The work saved by adding this type of ellipses is

> overshadowed by the implementation, maintenance, readability, and

> potential for mistakes that it adds.

> Action:

> Remove lines 817-821. Remove the reference to double increment

> hexadecimal symbolic ellipses in lines 787-788. Change the entries

> in Section 4.3.2 to eliminate usage of this type of ellipses.

Response: accepted. Text will be removed and related tables will

be modified accordingly.
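
Illustration (not part of the TR): rewriting the related tables without this

notation amounts to expanding each range into explicit names. A minimal

sketch follows, assuming uppercase hexadecimal names of the form <Uxxxx>; the

function name is hypothetical.

```python
import re

def expand_double_increment(spec):
    """Expand '<Uxxxx>..(2)..<Uyyyy>' into every other symbolic name."""
    m = re.fullmatch(r"<U([0-9A-F]{4,8})>\.\.\(2\)\.\.<U([0-9A-F]{4,8})>", spec)
    if m is None:
        raise ValueError(f"not a double increment ellipsis: {spec!r}")
    start, end = int(m.group(1), 16), int(m.group(2), 16)
    width = len(m.group(1))  # keep the zero padding of the start name
    return [f"<U{cp:0{width}X}>" for cp in range(start, end + 1, 2)]
```

Applied to the example quoted in the comment, <U01AC>..(2)..<U01B2>, this

yields <U01AC>, <U01AE>, <U01B0> and <U01B2>, in that order.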

> EDITORIAL #5

> Section 4.3.1 Character classification keywords (line 834)

> Problem and Action:

> Grammar; change existing text to "...the interpreting system provides

> them if missing and accepts them silently..."

Response: Not accepted. This is wording from 9945 and it is not foreseen that

the wording creates problems.

> OBJECTION #6

> Section 4.3.1 Character classification keywords (lines 855-857)

> Current wording for digit class:

> "Define the characters to be classified as numeric digits. Digits

> corresponding to the values 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be

> specified in groups of 10 digits,..."

> Problem:

> The text was not quite accurate in POSIX.2, and it definitely is not

> accurate here. The first sentence is copied from POSIX.2, but in that

> standard, *only* the portable digits 0-9 could be specified. This proposal

> extends the definition, but only allows decimal digits. The restriction

> should be spelled out.

> Action:

> Change the first sentence to "Define the characters to be classified as

> decimal digits."

Response: accepted.

> OBJECTION #7

> Section 4.3.1 Character classification keywords (line 867)

> Problem and Action:

> Incorrect class name; change "digits" to "digit".

Response: accepted.

> OBJECTION #8

> Section 4.3.1 Character classification keywords (lines 869-878)

> Current wording for "outdigit" class:

> "Define the characters to be classified as numeric digits for output from an

> application, such as to a printer or a display or a output text file. Digits

> corresponding to the values <0>, <1>, <2>, <3>, <4>, <5>, <6>, <7>, <8>,

> and <9> can be specified, and in ascending order of the values they

> represent. The intended use is for all places where digits are used for

> output, including numeric and monetary formatting, and date and time

> formatting. Only one set of 10 digits may be specified. If this keyword is

> not specified, the digits 0 through 9 of the portable character set

> automatically belong to this class, with application-defined character

> values..."

> Problem:

> This keyword as defined is insufficient for its stated use. Assume someone

> wants to define Roman numerals for use in dates. Since only the values 0-9

> can be specified, there is no way to list the Roman numerals X, XI, and XII

> for the 10th-12th months. Or suppose someone wants to write Chinese monetary

> values. There is a single character for "ten", a single character for

> "hundred", and so on. To express 10, you use the "ten" character; to

> express 20, you use the "two" character plus the "ten" character (two 10s).

> The outdigit keyword does not allow for the Chinese "ten" or "hundred"

> (and so on) characters, and so does not fulfill the intended use for

> "all places where digits are used for output, including numeric and

> monetary..."

> Action:

> Remove this keyword since it does not satisfy the stated need.

Response: not accepted. Roman numerals are not digits, and neither are

Chinese numerals. Add "decimal" before digits to clarify this.

> OBJECTION #9

> Section 4.3.1 Character classification keywords (lines 902-905)

> Current wording in description of "xdigit" class:

> "...If this keyword is not specified, the digits <0> through <9>, the

> uppercase letters "A" through <F>, and the lowercase letters <a> through

> <f>, automatically belong to this class, with application-defined

> character values..."

> Problem:

> As written, this is different from the POSIX.2 requirement that the xdigit

> class must contain the portable digits 0-9 and the portable letters A-F

> and a-f. This only says that if the keyword is not specified, these

> portable characters are included, but with this text, a person could

> write an xdigit class that included only Hindi digits and some subset

> of Greek letters, and it would be legal. This is inconsistent with

> POSIX.2, and therefore must be changed.

> Action:

> Remove the clause "If this keyword is not specified," from the sentence

> beginning at line 902. The revised sentence will read "The digits <0>

> through <9>..."

> Also note that "A" in the sentence should be <A>.

Response: accepted.

> OBJECTION #10

> Section 4.3.1 Character classification keywords (lines 929-932)

> Current wording in tolower description:

> "...If this keyword is specified,

> the uppercase letters <A> through <Z>, and their corresponding lowercase

> letter, are specified. If this keyword is not specified, the mapping is the

> reverse mapping of the one specified for toupper."

> Problem:

> The description is incorrect for what happens when the keyword is

> specified. This is what happens if the keyword is NOT specified.

> However, the sentence (if fixed) still would be unnecessary because

> the second sentence "If this keyword is not specified, the mapping is

> the reverse..." implies that <A> to <Z> will be included.

> Action:

> Remove the sentence on lines 929-931 ("If this keyword is specified,...")

Response: not accepted. When the keyword is specified, the sentence states

that A-Z is always included.
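
The fallback rule quoted above ("If this keyword is not specified, the mapping is the reverse mapping of the one specified for toupper") can be sketched as follows. The function name and the tie-break for characters sharing one uppercase form are assumptions for this example, not something the draft defines.

```python
def default_tolower(toupper: dict) -> dict:
    """Derive tolower as the reverse of toupper, for when tolower is absent."""
    tolower = {}
    for lower, upper in toupper.items():
        # Assumption: if two characters share an uppercase form,
        # the first pair listed in toupper wins.
        tolower.setdefault(upper, lower)
    return tolower


# Illustrative toupper pairs (lowercase -> uppercase).
toupper = {"a": "A", "b": "B", "\u00e6": "\u00c6"}
tolower = default_tolower(toupper)
```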

> OBJECTION #11

> Section 4.3.1 Character classification keywords (lines 933-946)

> (and see also Section 4.3.2 "i18n" LC_CTYPE category [class "combining" and

> class "combining_level3; lines 1664-1694])

> Current wording for "class" class:

> "Define characters to be classified in the class with the name given in the

> first operand, which is a string. This string only contains characters of the

> portable character set that either has the string "LETTER" in its description,

> or is a digit or <hyphen-minus> or <low-line>. The following operands are

> characters. This keyword is optional. The keyword can only be specified

> once per named class. The following two names are recognized:

> combining Characters to form composite graphic symbols, such

> as characters listed in ISO/IEC 10646:1993 annex B.1.

> combining_level3 Characters to form composite graphic symbols, that

> may also be represented by other characters, such as

> characters listed in ISO/IEC 10646-1:1993 annex B.2."

> And also current wording from the "i18n" FDCC-set definition, lines 1664-1694:

> "% The "combining" class reflects ISO/IEC 10646-1 annex B.1

> % That is, all combining characters (level 2+3).

> class "combining" /

> <U0300>..<U036F>; <U20D0>..<U20FF>; <UFE20>..<UFE2F>;/

> <U0483>..<U0486>;<U0591>..<U05A1>;<U05A3>..<U05B9>;/

> <U05BB>..<U05BD>;<U05BF>;<U05C1>;<U05C2>;<U05C4>;<U064B>..<U0652>;<U0670>;/

> <U06D7>..<U06E4>;<U06E7>;<U06E8>;<U06EA>..<U06ED>;<U0901>..<U0903>;<U093C>;/

> <U093E>..<U094D>;<U0951>..<U0954>;<U0962>;<U0963>;<U0981>..<U0983>;<U09BC>;/

> ...

> <U0F97>;<U0F99>..<U0FAD>;<U0FB1>..<U0FB7>;<U0FB9>;<U302A>..<U302F>;/

> <U3099>;<U309A>;<UFB1E>

> %

> % The "combining_level3" class reflects ISO/IEC 10646-1 annex B.2

> % That is, combining characters of level 3.

> class "combining_level3"; /

> <U0300>..<U036F>;<U20D0>..<U20FF>;<U1100>..<U11FF>;<UFE20>..<UFE2F>;/

> <U0483>..<U0486>;<U0591>..<U05A1>;<U05A3>..<U05AE>;<U05C4>;/

> <U05AF>;<U093C>;<U0953>;<U0954>;<U09BC>;<U09D7>;<U0A3C>;/

> <U0A70>;<U0A71>;<U0ABC>;<U0B3C>;<U0B56>;<U0B57>;<U0BD7>;<U0C55>;<U0C56>;/

> <U0CD5>;<U0CD6>;<U0D57>;<U0F39>;<U302A>..<U302F>;<U3099>;<U309A>"

> Problem:

> I've quoted a lot of original text here, because this is a confusing

> problem. I could not understand from the description what the classes

> were supposed to be for, so I looked at the i18n FDCC-set example.

> It turns out the description and definition of the two combining

> classes is exactly backward. ISO 10646 defines three levels:

> Level 1 -- most restrictive; shall not contain any characters listed

> in Annex B.1

> Level 2 -- less restrictive; shall not contain any characters listed

> in Annex B.2

> Level 3 -- least restrictive; can contain any coded character.

> The members listed of the classes in the FDCC-set, however, do not

> match the definitions. What is called combining_level3 is the group

> of characters that canNOT appear in a Level 1 or 2 implementation. What

> is called "combining", and described as being "all combining characters

> (level 2 + 3)", actually is the list of characters that canNOT

> appear in a Level 1 implementation.

> Action:

> These classes do not exist in other standards and are so ill-defined that

> it is impossible to say what characters are supposed to be defined in

> which class. Remove lines 933-946 and lines 1664-1694 from the draft.

Response: not accepted. The classes exist in another standard, namely

IS 10646. This faithfully reflects the definition in that standard,

as it is defined in its annex B. LC_CTYPE keywords class, map, width

will be marked as controversial.

> OBJECTION #12

> Section 4.3.1 Character classification keywords (lines 947-955)

> Current wording in width description:

> "Define the column width of characters, for example for use of the C

> function wcwidth(). The operands are first a list for characters, possibly

> using various ellipses, and semicolon separated, then a <colon>, and then

> the width of these characters given as an unsigned positive integer. Such

> width-lists separated by <semicolon> may be given for the various widths.

> The default value of width of characters in class "cntrl" and class

> "combining" is 0, else the default value of width is 1. A width for a

> character may be overridden by a WIDTH specification in a charmap. This

> keyword is optional."

> Problem:

> This description is very confusing. What does it mean that a "...width

> for a character may be overridden by a WIDTH specification in a charmap"?

> Does that mean if it's one thing in the charmap and another in the

> FDCC-set, the charmap wins? Why should width specifications be in two

> places?

Response: Partially accepted. This is what it means. It seems that the wording is

clear enough, as the text has been understood correctly. The reason for a

mechanism to override a default is that in many cases the default

would suffice, while there are some exceptions to this rule.

It is thus efficient to have a place to specify a default, and

places to specify exceptions. Keyword width will be marked as controversial.

> Also, this class is quite different from other LC_CTYPE classes. For other

> classes, one lists which characters are in that class, or a one-to-one

> mapping between uppercase and lowercase. This is different; you list a

> group of characters, and then define what value their width is. Each

> character in this class can have a different value, as opposed to other

> classes where it simply is a Boolean function -- if you're listed, you're in.

Response: yes, this is a correct observation. The specification is different

because it is a different subject.

> This class is confusingly-defined, and seems out-of-place in the

> Boolean-oriented LC_CTYPE section.

> Action:

> Remove lines 947-955.

Response: Not accepted. The specification satisfies a clear objective, and

it is well understood, as also indicated in the comments above.

LC_CTYPE also contains mapping to other characters, as toupper and tolower,

and with this keyword mappings to integers.
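
The precedence discussed above (a charmap WIDTH overrides the width keyword, which in turn overrides the class-based default of 0 or 1) can be sketched as a simple lookup chain. All names and the sample data are hypothetical; in particular the class sets are passed in explicitly and are not taken from the "i18n" FDCC-set.

```python
def char_width(ch, charmap_width, fdcc_width, cntrl, combining):
    """Resolve the column width of ch using the override order discussed above."""
    if ch in charmap_width:      # WIDTH specification in the charmap wins
        return charmap_width[ch]
    if ch in fdcc_width:         # then the LC_CTYPE width keyword
        return fdcc_width[ch]
    if ch in cntrl or ch in combining:
        return 0                 # default for classes "cntrl" and "combining"
    return 1                     # default for everything else


# Hypothetical sample data for the sketch.
cntrl = {"\t"}
combining = {"\u0301"}                      # COMBINING ACUTE ACCENT
fdcc_width = {"\u4e00": 2, "\uff21": 2}     # width keyword entries
charmap_width = {"\uff21": 1}               # charmap exception for one character
```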

> OBJECTION #13

> Section 4.3.1 Character classification keywords (lines 956-973)

> Problem:

> The map keyword is poorly described. According to Annex A, it is supposed

> to provide the functionality associated with the C library function

> towctrans(), but that's not clear from the text here ("Define the

> mapping of characters." What?).

> Action:

> Either remove this keyword, or rewrite the description to make it

> clearer that this is designed to allow mapping of one type of characters

> to another, related type. For example, you might want to map hiragana

> to katakana. Or Hindi digits to portable digits. Etc.

Response: Partially accepted. The specification will be clarified to say that it

maps characters to other characters, as is also described three

lines further on. Keyword map will be marked as controversial.

> OBJECTION #14

> Section 4.3.1 Character classification keywords (lines 975-1002)

> Problem:

> The mapping table of character class combinations duplicates information in

> POSIX.2 without adding any new data about classes included in this

> document.

> Action:

> Either remove the table completely, since the information already is

> available in another standard, or update it to include combination information

> about classes added for this document.

Response: not accepted. It is the intention of 14652 to carry all

information pertinent to i18n that was also included in IS 9945.

> OBJECTION #15

> Section 4.3.2 "i18n" LC_CTYPE category

> Problem:

> The membership of classes is inconsistent and confusing. With a few

> exceptions, it should match the classifications in the Unicode standard,

> where the classes/properties are comparable. Right now, class memberships

> are similar, but not identical to, comparable Unicode classes. For

> example:

> * the digit class includes a large group of digits that Unicode

> also identifies as being decimal, but is missing these groups:

> Myanmar (U1040..U1049)

> Ethiopic (U1369..U1371)

> Khmer (U17E0..U17E9)

> Mongolian (U1810..U1819)

> Fullwidth (UFF10..UFF19)

> Why should these be omitted, when the others are included?

Because they are not in the covered repertoire, except for

the fullwidth digits, which will be added.

> * the space class includes many of those that Unicode identifies

> as being space, but is missing:

> U00A0 -- No-Break Space

> U2007 -- Figure Space

> U202F -- Narrow No-Break Space

These spaces are not considered part of the space class, as they

are spaces that are regarded as graphical characters - nonremovable,

e.g. they cannot be removed at line breaks, and cannot be used for

word breaks. This is aligned with some existing practice.

> Note that this class also has several control characters, like <tab> and

> <carriage-return>, that Unicode does not consider part of the space class.

> However there is much existing practice on POSIX-based systems for

> including those controls, so it is understandable why they are here.

> * the punct class includes some, but not all, characters that Unicode

> identifies as being punctuation. For example:

> + it includes U2030..U2046, which are in the Unicode general punctuation

> block, but omits

> U2048 -- Question Exclamation Mark

> U2049 -- Exclamation Question Mark

> U204A -- Tironian Sign Et

> U204B -- Reversed Pilcrow Sign

> These also are in the general punctuation block.

Yes, but not in the covered repertoire

> + it includes the currency symbols in the range U20A0..U20AA, but

> omits these other currency symbols in the same block:

> U20AB -- Dong Sign

> U20AC -- Euro Sign

> U20AD -- Kip Sign

> U20AE -- Tugrik Sign

> U20AF -- Drachma Sign

U20AD, U20AE and U20AF are not part of the repertoire covered.

U20AB and U20AC are in the repertoire, and will be added.

> + unlike Unicode 3.0, it includes most of the "Letterlike Symbols"

> from the range U2100..U213A in the punct class. This includes

> characters like U210B (Script Capital H), U2115 (Double-Struck

> Capital N), etc., but omits those that happen to have the word

> "LETTER" in their name; e.g.,

> U210C -- Black-Letter Capital H

> U2111 -- Black-Letter Capital I

> This range also omits U2139 (Information Source), and U213A

> (Rotated Capital Q), which are also in this Letterlike Symbols

> block.

> It's not clear why any in this range are included in punct, but

> the particular subset of characters listed is even more confusing.

U2139 and U213A are not in the covered repertoire.

> There are many more differences between this i18n FDCC-set and

> Unicode, but the point is that the differences exist. This document

> should use the Unicode values where they exist instead of inventing

> another group of classifications that differ in dozens of small ways.

> Action:

> Revise the membership of all classes to match the lists Unicode provides,

> where they exist. HOWEVER, in the few cases where the common practice in

> POSIX systems differs from Unicode (for example, including some control

> characters in the space class), retain that existing practice for

> members of the portable character set.

> Note, too, that 14652 defines some classes for which there are no

> matching Unicode properties. Obviously, in these cases, the i18n FDCC-set

> cannot match Unicode.

Partially accepted, with changes as indicated above.

> Section 4.4 LC_COLLATE

> This is a placeholder for the content of Section 4.4 (LC_COLLATE).

> See TECHNICAL #61, TECHNICAL #62 and TECHNICAL #63 later in this document.

> OBJECTION #16

> Section 4.5 LC_MONETARY (entire section)

> Problem:

> This section includes multiple keywords that were defined in POSIX.2,

> but it changes their definitions in such a way that existing applications

> would be invalid. This is incorrect. The changes allow the rules for

> multiple currencies to be specified in existing keywords, but in POSIX.2,

> only rules for single currencies can be defined.

Yes, this is an enhancement of POSIX.2.

> While the need to handle multiple currencies is real, the method defined

> here is significantly different than what has been done when other

> LC_ categories have had to be extended. When expanding LC_TIME to allow

> for multiple calendars, new keywords were added (era, era_year, etc.),

> rather than simply tacking new entries on to the end of existing keywords.

No, this is not different from the way other categories have been

extended. This is an extension to the local currency symbol,

where there is more than one at one time, or there are different symbols

at different times.

> Consider the previously existing LC_MONETARY keyword currency_symbol.

> It is defined in POSIX.2 as "The string that shall be used as the local

> currency symbol," while here it is defined as "One or more strings

> separated by semicolons that are used as the local currency symbol." (lines

> 2293-2294). Assume I'm defining French currency and the euro. I might

> have something like this:

> currency_symbol "<F>";"<euro>"

> However, the description of this category no longer is correct -- these are

> not strings "that are used as the local currency symbol". That implies the

> two strings are synonyms for each other. The reality is that these are

> strings that represent different currencies used for this locale. They

> should not be glommed together in one keyword. It would be more accurate to

> separate these (and all other keywords that in this draft can take multiple

> values) into something like

> currency_symbol "<F>";

> alt_currency_symbol "<euro>";

The <F> and <euro> symbols are both local symbols.

The euro is not an alternate currency symbol, but

an additional local currency symbol, which is just as

necessary as the first mentioned.

If the proposed extension scheme would be followed, an

indefinite number of keywords would need to be defined,

for which no well defined API support could be specified.

> As defined in this draft, it is not clear how application programs parse

> or use these values. Existing implementations request *the* currency symbol

> and use it to format values. What would happen to a previously conforming

> application if it requested the (single) currency_symbol value, but an array

> of strings was returned? Lines 6509-6510 of the rationale state:

> "Also the same application call can be made to be valid for countries with

> a single currency and countries with dual currencies." That's only true

> if the application is expecting one *or more* values. Existing applications

> expect exactly one value for most of these keywords.

There are no previously conforming implementations of this specification.

You cannot make the POSIX standards forward compatible, when

that standard does not have the capability to handle the problems

at hand. That standard would give wrong results in any case.

> Now, suppose an application is rewritten to allow for multiple currency

> symbols. Now what? What rules does it use to decide which currency_symbol

> value it should use to format a monetary quantity? If the section were

> designed so that the existing definitions had not changed, but alt_*

> keywords were added when needed, an application could request currency_symbol

> when formatting national currency values, and alt_currency_symbol when

> formatting euros (or another alternate currency).

The euro is not an alternate. It must be displayed together

with the other currency. You have a double-currency in these countries.

> Also, *because* this section allows multiple currencies to be specified,

> there is an implied tie between keywords. If currency_symbol includes

> French francs and euros (in that order), frac_digits, ps_cs_precedes,

> etc., must also specify the rules for francs and euros in the SAME order.

> The valid_from keyword attempts to explain this dependency, but the

> wording is very confusing and not restricted to that keyword.

> Moving to other keywords, there are a new set of int_* keywords. Under

> POSIX.2, there were only two such keywords -- int_currency_symbol and

> int_frac_digits. They were for formatting monetary values using the

> international currency strings (e.g., "USD " rather than "$" for the

> U.S. dollar; "DEM " rather than "DM" for the German mark; etc.). Under

> POSIX.2, quantities that used the international currency string and

> those that used the local currency symbol used the same values for

> keywords such as p_cs_precedes, p_sep_by_space, etc. Annex A says these

> have been added to accommodate "differences between local and international

> formats." For example?

For example for Finnish, where the international currency symbol

is written in front of the amount, where the local symbol

is written after the amount.

> At the end of this section, the "i18n" FDCC-set does nothing to illuminate

> the many new keywords and revised definitions of existing keywords.

> Since attempted support for multiple currencies is the reason for the

> many changes and additions to this section (as compared to POSIX.2),

> an example in this section that illustrates how multiple currencies might

> actually be specified must be provided. There is an example in the

> rationale section, but the information needs to be available here.

There are examples in the rationale.

> See below for additional comments on specific keywords.

> Action:

> Restore the original definitions of keywords that exist in the LC_MONETARY

> section of POSIX.2. Add new keywords for defining alternate currencies.

> Remove the additional int_* keywords, unless a concrete rationale with

> examples of real differences between local and international formats, is

> provided. Add an example that shows how to specify multiple currencies.

Response: not accepted. Clause 4.5 will be marked as controversial.
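
The Finnish case mentioned above, where the international currency symbol is written in front of the amount and the local symbol after it, can be sketched with separate placement values for the two formats. This is an illustration only: the function is hypothetical, the sep_by_space handling is reduced to "space or no space", and the symbols "FIM" and "mk" are used as sample values.

```python
def format_money(amount: str, symbol: str, cs_precedes: int, sep_by_space: int) -> str:
    """Place the currency symbol before or after an already formatted amount."""
    sep = " " if sep_by_space else ""
    if cs_precedes:
        return f"{symbol}{sep}{amount}"
    return f"{amount}{sep}{symbol}"


# International format: symbol precedes the amount (int_* style values).
international = format_money("12,50", "FIM", cs_precedes=1, sep_by_space=1)
# Local format: symbol follows the amount.
local = format_money("12,50", "mk", cs_precedes=0, sep_by_space=1)
```

With a single shared cs_precedes value, as in POSIX.2, one of the two results could not be produced.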

> EDITORIAL #17

> Section 4.5 LC_MONETARY (lines 2250-2252)

> Current wording:

> "...Keywords that are not provided, string values set to the empty

> string "", or integer keywords set to -1, are used to indicate that

> the value is unspecified, and then no default is implied."

> Problem:

> This wording is unclear.

> Action:

> To follow POSIX.2 more closely, revise the sentence as follows: "Keywords

> that are not provided, string values set to the empty string (""), or

> integer keywords set to -1 shall be used to indicate that the value is

> not available. No defaults are implied."

Response: not accepted. These are wordings coming from POSIX, so we cannot follow

POSIX more closely than this.

> OBJECTION #18

> Section 4.5 LC_MONETARY (lines 2258-2268)

> Current wording of valid_from keyword:

> "One or more strings separated by semicolons, representing a

> Gregorian date in the form "YYYYMMDD" according to

> ISO 8601, specifying the beginning date (inclusive from the

> beginning of day local time) of the validity of a currency.

> The position of the string in the list corresponds to the

> position of operands in other keywords in the

> LC_MONETARY category. The currencies should be

> ordered in terms of validity dates, and for each validity

> period with the currency that the amounts are stored in first.

> If not specified, it is taken to be an implementation-defined

> beginning of time. This keyword is optional."

> Problem:

> This wording is unclear and confusing. I think *part* of what this is

> trying to say is:

> "One or more strings, separated by semicolons, of Gregorian dates in

> the form "YYYYMMDD" that specify the date on which a currency became

> or becomes valid. Dates are inclusive from the beginning of the day

> local time....If not specified, the value of this keyword is an

> implementation-defined beginning of time. This keyword is optional."

> The earlier overall objection to this section notes that information

> about dependencies on the order of values is not restricted to this

> keyword. Thus, the sentence "The position of the string..." should not appear

> in this description.

The sentence "The position of the string..." is needed to identify which

currency specification each value relates to. No change.

> There is no reason to mention ISO 8601 here; specifying the YYYYMMDD

> order is sufficient.

Yes, but it does not hurt. No change.

> The sentence "The currencies should be ordered in terms of validity

> dates..." is unclear; I have no idea what it means.

> Action:

> Revise the text as recommended, rewrite the sentence about "The currencies

> should be ordered...", and add an example to show how this might be

> defined.

Response: not accepted. Clause 4.5 will be marked as controversial.

> OBJECTION #19

> Section 4.5 LC_MONETARY (lines 2269-2274)

> Current wording of the valid_to keyword:

> "One or more strings separated by semicolons, representing a

> Gregorian date in the form "YYYYMMDD" according to

> ISO 8601, specifying the end date (inclusive to the end of

> day local time) of the validity of a currency. If not specified,

> it is taken to be an implementation-defined end of time. This

> keyword is optional."

> Problem:

> The current wording is unclear, and the default value is inappropriate

> since not all systems define an end of time.

> Action:

> Rewrite as follows:

> "One or more strings, separated by semicolons, of Gregorian dates in

> the form "YYYYMMDD" that specify the last day on which a currency was

> or will be valid. Dates are inclusive to the end of the day local time.

> This keyword is optional."

Response: partially accepted. Clause 4.5 will be marked as controversial.

The systems need to define an end of time for this purpose.

Some rewording will be done.
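
As a sketch of how valid_from and valid_to could be used together with the position rule (the n-th entry of each keyword describes the n-th currency), the following selects the currencies valid on a given date. The names, the use of an empty string for "not specified", the mapping of "beginning/end of time" to date.min and date.max, and the sample dates are all assumptions for this example.

```python
from datetime import date, datetime

# Assumption: implementation-defined beginning and end of time.
BEGINNING_OF_TIME = date.min
END_OF_TIME = date.max


def parse_bound(text: str, default: date) -> date:
    """Parse a YYYYMMDD string; an empty string means 'not specified'."""
    return datetime.strptime(text, "%Y%m%d").date() if text else default


def valid_currencies(on: date, symbols, valid_from, valid_to):
    """Return the symbols whose validity period (inclusive) contains 'on'."""
    result = []
    for symbol, start, end in zip(symbols, valid_from, valid_to):
        first_day = parse_bound(start, BEGINNING_OF_TIME)
        last_day = parse_bound(end, END_OF_TIME)
        if first_day <= on <= last_day:
            result.append(symbol)
    return result


# Illustrative values only; positions correspond across the three lists.
symbols = ["<F>", "<euro>"]
valid_from = ["", "19990101"]
valid_to = ["20020217", ""]
```

A date in the overlap period yields both symbols, which is the dual-currency situation the responses above describe.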

> OBJECTION #20

> Section 4.5 LC_MONETARY (lines 2275-2292)

> Current wording:

> "one or more pairs of integers separated by a <semicolon>

> specifying the fixed conversion rate between the current

> currency (determined by the parameter number) and the first

> currency that is valid, determined by a date provided by the

> application. If the currency is not the first valid currency for

> the period in question, the first integer is for multiplying the

> first valid currency, and the second for dividing this result to

> get the amount in the current currency. The currency to be

> the current currency is selected by the application from the

> date applicable and the currency number (first, second, third

> etc valid currency at that date); and whether domestic or

> international formatting is used is also determined by the

> application. Each pair of integers are separated by a <slash>.

> The default value is "1/100". This keyword is optional..."

> Problem:

> The description of the conversion_rate keyword is incomprehensible.

> However, an example in the rationale section shows this definition

> for Deutsch marks and euros:

> conversion_rate 1; 195/100

> From this, it appears that the first value for conversion rate should

> be 1, because it is the "primary" currency, and the value for the

> second currency should be the true conversion value. However, this

> example does not match the current keyword text. Note, for example,

> that an entry is supposed to be "one or more *pairs* of integers", but

> that the first value in the example is a single integer.

The "1" entry will be corrected to "1/1".

> It also is not clear that conversion rates should be in locales, since

> they often change over time. The fact that euro conversion rates are

> fixed in relation to certain national currencies is a specific instance

> of currency rules, but is not applicable around the world.

For the Euro conversion it does not change over time.

This has been the most complicated changeover for official

currencies, and it is not anticipated that more complicated

schemes would be envisaged. The specification only addresses

currencies that are cultural conventions for a given culture,

not general currency exchange, nor customized currency exchanges.

> Action:

> Remove this keyword. The description is not clear, the example does not

> match the description, and it addresses a euro-specific feature, as

> opposed to being generally applicable to multiple currencies and locales.

> Further, since previous recommendations are to define different currencies

> in separate keywords, it would not be consistent to continue defining

> rules for multiple currencies in conversion_rate.

Response: not accepted. The description is general as it addresses

all official currency changeovers known.

The clause 4.5 will be marked as controversial.
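
Read literally, the quoted description gives each currency a pair "multiplier/divisor" that is applied to an amount in the first valid currency. The sketch below follows that wording only, using the rationale values with the first entry written as "1/1"; the function names are hypothetical, and whether multiplier and divisor are in the intended order for the mark/euro example is not settled by the text quoted here.

```python
from fractions import Fraction


def parse_conversion_rate(value: str):
    """Parse 'm1/d1;m2/d2;...' into a list of (multiplier, divisor) integer pairs."""
    pairs = []
    for entry in value.split(";"):
        multiplier, divisor = entry.strip().split("/")
        pairs.append((int(multiplier), int(divisor)))
    return pairs


def convert(amount_in_first: Fraction, pair) -> Fraction:
    """Multiply by the first integer, then divide by the second, as described."""
    multiplier, divisor = pair
    return amount_in_first * multiplier / divisor


rates = parse_conversion_rate("1/1; 195/100")
```

Exact rational arithmetic is used so that no rounding policy has to be assumed.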

> OBJECTION #21

> Section 4.5 LC_MONETARY (multiple keyword entries)

> Current wording at end of all non-optional keywords:

> "This keyword is specified, unless the "copy" keyword is used."

> Problem:

> The multiple appearances of this sentence all are unnecessary. The

> description of the "copy" keyword states that if it "...is specified, no

> other keyword is specified." Thus, it is redundant to spell out the

> restriction about the "copy" keyword at the end of the other keywords.

> It also is inconsistent. Keyword descriptions in other sections of this

> draft do not include the redundant sentence.

> Action:

> Remove the sentence at lines 2317-2318, 2320-2321, 2323-2324, 2328-2329,

> 2333-2334, 2339-2340, 2344-2345, 2350-2351, 2367, 2384, 2392-2393, and

> 2397-2398.

Response: not accepted.

This is taken directly from POSIX. In some cases you may

use other keywords when "copy" is used. There is no need

to remove it.

> OBJECTION #22

> Section 4.7 LC_TIME (lines 2540-2543 and 2547-2551)

> Current wording in abday and day keyword descriptions:

> "... The first string is the [abbreviated|full] name of the day

> corresponding to the first day of the week (default Sunday), the

> second the [abbreviated|full] name of the day corresponding to the second

> day of the week (default Monday), and so on."

> Problem:

> This wording implies that the first day of the week is locale-specific,

> and that the %a and %A descriptors may produce the locale-equivalent

> of "Sunday" if Sunday is defined as the first day of the week, *or* the

> locale-equivalent of "Monday" if Monday is defined as the first day of

> the week, etc. This differs from the existing POSIX.2 definition and

> the descriptions in ISO C for the keywords and the meaning of the format

> descriptors. In the other standards, abday, day, %a, and %A all are

> defined in terms of a week that begins on Sunday.

> Of course, many locales use a week that begins on Monday, and it is

> understandable that some want to support this within abday, day, and

> the format descriptors. But this is an incompatible change with existing

> practice that will break existing implementations. Further, support for

> Monday-first locales already exists with the %u, %V, and %W format

> descriptors.

> Action:

> Revise the text at 2540-2543 as follows: "The first string is the

> abbreviated name of the day corresponding to Sunday, the second the

> abbreviated name of the day corresponding to Monday, and so on."

> Revise the text at 2547-2551 as follows: ""The first string is the

> full name of the day corresponding to Sunday, the second the

> full name of the day corresponding to Monday, and so on."

Response: not accepted. It cannot be expected that POSIX will

understand new features of 14652.

The LC_TIME clause will be marked as controversial.

> OBJECTION #23

> Section 4.7 LC_TIME (lines 2552-2567)

> Current wording for week keyword:

> "Is used to define the number of days in a week, and which weekday

> is the first weekday (the first weekday has the value 1), and which

> week is to be considered the first in a year. The first operand is an

> integer specifying the number of days in the week. The second

> operand is an integer specifying the Gregorian date in the format

> YYYYMMDD, and it specifies a day that is a first weekday (all

> other first weekdays may then be calculated by adding or subtracting

> a whole multiple of the number of days in the week as specified

> with the first operand). The third operand is an integer specifying the

> weekday number to be contained in the first week of the year. The

> third operand may also be understood as the number of days required

> in a week for it to be considered the first week of the year. If the

> keyword is not specified the values are taken as 7, 19971130 (a

> Sunday), and 7 (Saturday), respectively. ISO 8601 conforming

> applications should use the values 7, 19971201 (a Monday), and 4

> (Thursday), respectively. This keyword is optional."

> Problems:

> There are multiple problems with this description.

> 1. There is no need to define the number of days in a week, because

> the seven-day week is common to all major calendars.

There are some minor calendars with other than 7 days in a week.

> 2. The description says this keyword defines "...which weekday

> is the first weekday (the first weekday has the value 1)" which is

> confusing but probably is supposed to define which day of the week is

> considered the first (for example, Sunday is the first day of the week

> in some cultures, while Monday is in others). Assuming this interpretation

> is correct, the second operand here is ill-defined to meet this requirement.

> It requires picking a random date that falls on the first day of the week

> for this FDCC-set. In this example, November 30, 1997 falls on a Sunday,

> so it is the value used for locales that have a Sunday-first rule.

> Implementors then are required to calculate ALL other first weekdays

> (before and after) from the randomly chosen date. This is hogwash.

The calculating of the first day of all weeks is straightforward

and part of normal date handling.

> 3. The description further says the keyword defines "...which week is

> to be considered the first in a year." It is more accurately defined

> later in the description as "the number of days required in a week

> for it to be considered the first in a year." The first definition

> is unclear and should be changed.

It is an explanation of the purpose.

> Action:

> Remove the operand for the number of days in a week. Remove the operand

> that defines a date of a (random) first weekday. Change the description

> of the keyword to be defining "the number of days required in a week for

> it to be considered the first in a year."

Response: not accepted. The LC_TIME clause will be marked as controversial.
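
The response to problem 2 above says that deriving every first weekday from one reference date is ordinary date arithmetic. A minimal Python sketch of that calculation follows, using the operand values quoted in the comment. The function names week_start and first_week_start are illustrative only and do not come from the draft, and the third operand is read here as "the number of days required in a week for it to be considered the first week of the year", which is the alternative reading quoted above.

```python
from datetime import date, timedelta


def week_start(d, ref, days_in_week=7):
    """Return the first day of the week that contains d.

    ref is any date known to fall on the first weekday
    (the second operand discussed in the comment).
    """
    # For a positive modulus, Python's % never returns a negative value,
    # so dates earlier than ref are handled by the same expression.
    offset = (d - ref).days % days_in_week
    return d - timedelta(days=offset)


def first_week_start(year, ref, days_in_week=7, min_days=7):
    """Return the first day of week 1 of the given year (hypothetical helper).

    Week 1 is taken to be the first week with at least min_days
    of its days inside the year (assumed reading of the third operand).
    """
    jan1 = date(year, 1, 1)
    start = week_start(jan1, ref, days_in_week)
    days_inside_year = days_in_week - (jan1 - start).days
    if days_inside_year >= min_days:
        return start
    return start + timedelta(days=days_in_week)


if __name__ == "__main__":
    # Default values quoted in the comment: 7, 19971130 (a Sunday), 7.
    print(first_week_start(1998, date(1997, 11, 30), 7, 7))
    # ISO 8601 values quoted in the comment: 7, 19971201 (a Monday), 4.
    print(first_week_start(1998, date(1997, 12, 1), 7, 4))
```

Under these assumptions the default values give Sunday 1998-01-04 as the start of week 1 of 1998, while the ISO 8601 values give Monday 1997-12-29.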

> OBJECTION #24

> Section 4.7 LC_TIME (lines 2569 and 2574)

> Problem:

> The descriptions of the abmon and mon keywords say they consist of

> "twelve or thirteen" month names. POSIX.2 and ISO C only support

> twelve-month calendars, and existing implementations will break if

> this is changed.

> Action:

> Change the descriptions of the keywords to say the operands consist

> of twelve month names, not "twelve or thirteen."

Response: not accepted. It is not expected that the POSIX standard

can handle new features of 14652.

The LC_TIME clause will be marked as controversial.

> OBJECTION! #25

> Section 4.7 LC_TIME (timezone section; lines 2663-2757)

> Problem:

> It is completely inappropriate to specify timezone information in a

> FDCC-set. The draft says this is for specifying cultural conventions,

> but timezones can cross national boundaries and many time zones can

> exist within a single country. For countries like the U.S., Canada,

> Russia, Australia, and others that span many time zones, there is

> no way to determine which time zone to include in an FDCC-set, or, if

> multiple zones are included, how to figure out which one to use in

> what area.

> As the draft notes, the TZ (timezone) environment variable already exists

> for specifying time zone information. It absolutely does not belong

> within a locale or FDCC-set.

> Action:

> Remove lines 2663-2757.

Response: not accepted. See response to Germany on this issue.

The 4.7 section will be marked as controversial.

> EDITORIAL #26

> Section 4.7 LC_TIME (line 2767)

> Problem:

> Table 3 is called "Escape sequences for the date field", but all

> other text calls these values "field descriptors".

> Action:

> Change "Escape sequences" to "Field descriptors".

Response: accepted.

> OBJECTION #27

> Section 4.7 LC_TIME (line 2780)

> Current wording for the %F descriptor:

> "The date in the format YYYY-MM-DD (ISO 8601 format)."

> Problem:

> Multiple other places in this draft describe "ISO 8601" format as

> YYYYMMDD.

> Action:

> Make all references to ISO 8601 consistent.

Response: Accepted. ISO 8601 allows multiple formats for dates. All the

formats used in the draft conform to it; different formats are used for

different purposes. A note to this effect will be added at the top of clause 4.7.

> OBJECTION #28

> Section 4.7 LC_TIME (lines 2781-2782)

> Current wording for the %g and %G descriptors, respectively:

> "Week-based year within century, as a decimal number (00-99).

> Week-based year with century, as a decimal number (for example 1997)."

> Problem:

> There is no explanation of how a "week-based year" differs from any other

> year. The existing %y and %Y descriptors specify the year within a

> century, and the year with century, so there is no need for these new

> descriptors.

> Action:

> Remove lines 2781-2782.

Response: Not accepted.

There is a need. A day at the beginning of a calendar year may, by week

numbering, belong to a week of the preceding year.
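
The difference between the calendar year and the week-based year can be seen with a short Python sketch. It relies on date.isocalendar() from the standard library, which implements the ISO 8601 week-based calendar; the helper name calendar_and_week_based_year is illustrative only and is not part of the draft.

```python
from datetime import date


def calendar_and_week_based_year(d):
    """Return (calendar year, ISO 8601 week-based year, ISO week number)."""
    iso_year, iso_week, _iso_weekday = d.isocalendar()
    return d.year, iso_year, iso_week


if __name__ == "__main__":
    for d in (date(1997, 12, 30), date(1999, 1, 1)):
        print(d.isoformat(), calendar_and_week_based_year(d))
```

Tuesday 30 December 1997 has calendar year 1997 but week-based year 1998 (week 1), and Friday 1 January 1999 has calendar year 1999 but week-based year 1998 (week 53). These are the values that %Y and %G, respectively, are meant to report.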

> OBJECTION #29

> Section 4.7 LC_TIME (line 2787)

> Current wording for the %m descriptor:

> "Month, as a decimal number (01-13)."

> Problem:

> As described previously, existing implementations support a 12-month

> calendar.

> Action:

> Change the text as follows: "Month, as a decimal number (01-12)."

Response: not accepted, as also previously responded.

> OBJECTION #30

> Section 4.7 LC_TIME (lines 2819-2826)

> Current wording:

> "NOTE: %g, %G and %V give values according to the ISO 8601 week-based year.

> In this system, weeks begin on a Monday and week 1 of the year is the week

> that includes 4th January, which is also the week that includes the first

> Thursday of the year, and is also the first week that contains at least four

> days in the year. . . If the 29th, 30th or 31st January is a Monday, it and

> any following days are part of week 1 of the following year. Thus, for

> Tuesday 30th December 1997, %G is replaced by 1998 and %V is replaced by 1."

> Problem:

> The month name in one example is wrong. The sentence should read "...If

> the 29th, 30th, or 31st of December is a Monday,..."

Accepted.

> Also, since an earlier objection recommends removing %g and %G, this

> text should remove references to the descriptors, too.

Not accepted.

This note is the explanation of %g, %G and %V that was asked for in OBJECTION #28.

> Action:

> Revise the text as indicated.

Response: not accepted, except for editorial correction as noted above.
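
The rule stated in the quoted note (weeks begin on a Monday, and week 1 is the week that includes 4th January) can be turned into a short calculation. The Python sketch below is one possible way to do so, not the method of the draft; the names iso_week1_monday and week_based_year_and_week are hypothetical.

```python
from datetime import date, timedelta


def iso_week1_monday(year):
    """Return the Monday that starts week 1 of the week-based year."""
    jan4 = date(year, 1, 4)
    # date.weekday() returns 0 for Monday, so this steps back to Monday.
    return jan4 - timedelta(days=jan4.weekday())


def week_based_year_and_week(d):
    """Return the values that %G and %V stand for, as (year, week)."""
    # A date in late December may already belong to the following
    # week-based year, so start there and step back as needed.
    year = d.year + 1
    while d < iso_week1_monday(year):
        year -= 1
    week = (d - iso_week1_monday(year)).days // 7 + 1
    return year, week


if __name__ == "__main__":
    print(week_based_year_and_week(date(1997, 12, 30)))
```

For Tuesday 30th December 1997 this gives year 1998 and week 1, matching the example in the note.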

> EDITORIAL #31

> Section 4.8 LC_MESSAGES (lines 2931-2932)

> Current wording:

> "Note: This uses regular expression syntax with brackets ([]) to for example

> specify the both <+> and <1> is allowed as an affirmative answer."

> Problem:

> Grammatically incorrect sentence that doesn't say what it means to say.

> Inconsistent use of symbolic names. Also, since the definitions of yesexpr

> and noexpr say they are "extended regular expression[s]", it is not

> necessary to repeat that in the note.

Response: not accepted.

"the" will be changed to "that".

Why is the use of symbolic names inconsistent?

It does no harm to repeat that this is a regular expression; some readers

do not take in all facts at once.

> Action:

> Rewrite the text as follows: "For yesexpr, this specifies that either

> <plus-sign> or <one> is considered an affirmative answer. For noexpr, the

> supported negative responses are defined as <hyphen-minus> or <zero>."

Response: not accepted. The symbolic names used in the draft are the ones

that are preferred for the specification.
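
As a rough illustration of what the note describes, the Python sketch below uses a bracket expression to accept either of two characters as an affirmative answer. The pattern values "^[+1]" and "^[-0]" are assumptions chosen to match the characters named in the comment, not values quoted from the draft, and Python's re module is used here in place of POSIX extended regular expressions (the two agree for simple bracket expressions like these).

```python
import re

# Assumed example values; the draft's actual yesexpr/noexpr operands
# are not quoted in this comment.
YESEXPR = r"^[+1]"   # <plus-sign> or <one>
NOEXPR = r"^[-0]"    # <hyphen-minus> or <zero>; a leading "-" is literal


def classify(answer):
    """Return "yes", "no", or None for an unrecognised answer."""
    if re.match(YESEXPR, answer):
        return "yes"
    if re.match(NOEXPR, answer):
        return "no"
    return None


if __name__ == "__main__":
    for text in ("+", "1", "-", "0", "maybe"):
        print(repr(text), classify(text))
```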

> OBJECTION #32

> Section 4.9 LC_XLITERATE (lines 2934-3047)

> Problem:

> The ability to transliterate characters from one writing system and/or

> language to another is something users might think of as a "wow, cool"

> bit of functionality. However, this is an extremely complex problem. The

> keywords and syntax defined in this section are completely inadequate to

> handle this problem, so this section should be removed from the document.

> Consider the example provided and the way it is described to work. Of course,

> the example is not intended to be a complete functioning transliteration

> section, but it raises enough questions to point out how inadequate this

> proposal is.

> Current wording:

> [begin]

> "4.9.3 Example of use of transliteration

> LC_XLITERATE

> include "de_DE";"de_repmap"

> default_missing <?>

> translit_ignore <U3200>..<UFAFF>

> <ae> <a:>;<e*>;"<a><e>";"<e>"

> <s> <s*>;<s=>

> "<K><O>" <KO>

> END LC_XLITERATE

> ...

> The "include" keyword specifies that the FDCC-set "de_DE" is copied and that

> the repertoiremap "de_repmap" is used to define the symbolic character names

> in the FDCC-set "de_DE".

> ...

> The first transliteration statement defines a number of transliterations

> for the LATIN LETTER AE, including into LATIN LETTER A WITH DIAERESIS,

> GREEK LETTER EPSILON, the two Latin letters A and E, and finally the

> LATIN LETTER E.

> The second transliteration statement defines transliteration of the LATIN

> LETTER S into GREEK LETTER SIGMA, and CYRILLIC LETTER ES.

> The third transliteration statement transliterates the two Latin letters

> K and O into the Japanese Hiragana character KO."

> [end]

> Start with the "include" keyword. The example shows including the de_DE

> FDCC-set, and according to the keyword description this is "the name of

> the FDCC-set...to transliterate from." So the plan here is to transliterate

> from German into something else. But what part or parts of a FDCC-set

> are supposed to be included here? The entire FDCC-set, with all sections

> as defined in 14652? If so, what purpose does it serve to include

> LC_CTYPE, LC_COLLATE, etc., sections here? If not, what exactly from

> the de_DE set is supposed to be included? There is no information in

> the LC_XLITERATE section to explain this.

It only includes the LC_XLITERATE category of the "de_DE" FDCC-set.

This is clarified in the section on the include statement,

which says that only transliteration statements are

included. The source language does not need to be German.

What is normally expected here is actually the target language,

but that is not required either: a German specification can be

used as a base for, say, a Danish target.

> Under what circumstances might users define in a locale (FDCC-set) that

> they want to transliterate from German? Suppose this excerpt appears in

> a Japanese FDCC-set. It's easy to imagine users wanting to transliterate

> from Japanese to a number of other writing systems and/or languages. But

> under this design, a finite set of transliterations would have to be

> hard-coded into each FDCC-set, seriously limiting users' choices.

The scheme defines a target culture, not a source language.

Text will be added at the beginning of clause 4.9 to clarify this.
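
To make the idea of target-oriented transliteration concrete, here is a minimal Python sketch built on the <ae> statement from the quoted example. It rests on several assumptions that are not stated in the text quoted here: that the alternatives in a statement are tried in order, that the first one representable in the target coded character set is chosen, and that default_missing supplies the final fallback. The table TRANSLIT and the function names are hypothetical.

```python
# Alternatives for LATIN SMALL LETTER AE, in the order of the quoted example:
# a with diaeresis, Greek epsilon, "ae", and finally "e".
TRANSLIT = {
    "\u00e6": ["\u00e4", "\u03b5", "ae", "e"],
}
DEFAULT_MISSING = "?"  # corresponds to "default_missing <?>" in the example


def representable(text, encoding):
    """Return True if text can be encoded in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def transliterate(text, encoding):
    """Replace characters the target encoding lacks (assumed semantics)."""
    out = []
    for ch in text:
        if representable(ch, encoding):
            out.append(ch)
            continue
        for candidate in TRANSLIT.get(ch, []):
            if representable(candidate, encoding):
                out.append(candidate)
                break
        else:
            # No alternative fits the target: use the default_missing value.
            out.append(DEFAULT_MISSING)
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("K\u00e6se", "ascii"))
```

With an ASCII target the first two alternatives are skipped and "ae" is chosen, so the choice depends on the target rather than on the source language.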