ISO/IEC JTC1/SC22/WG20 N706R 2000-05-24 Title: Disposition of comments to PDTR ballot 14652 Source: WG20 Status: As approved by WG20 editing group > SUMMARY OF VOTING ON > > Letter Ballot Reference No: SC22 N2955 > Circulated by: JTC 1/SC22 > Circulation Date: 1999-07-07 > Closing Date: 1999-10-08 > > SUBJECT: PDTR Ballot for PDTR 14652: Information technology - > Programming languages, their environments and system software interfaces - > Specification Method for Cultural Conventions (Technical Report Type 1) > > ----------------------------------------------------------------------- > The following responses have been received on the subject of approval: > > > "P" Members supporting approval > without comment 7 > > "P" Members supporting approval > with comment 1 > > "P" Members not supporting approval 4 > > "P" Members abstaining 1 > > "P" Members not voting 8 > > "O" Members supporting approval 1 It is noted that if this were a DTR ballot, the ballot would have passed, as there was a majority of P members voting in favour of the document, as per JTC 1 directives clause 9.8.3 > > ---------------------------------------------------------------------- > Secretariat Action: > > The comment accompanying the abstention vote from France was: "Due to > lack of resources." > > The comments accompanying the affirmative vote from Canada and the > comments accompanying the negative votes from Denmark, Germany, Japan and > the United States of America are attached. > > WG20 is requested to prepare a Disposition of Comments Report and make a > recommendation on the further processing of the PDTR. 
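Editorial illustration: the tally behind the note that the ballot would have passed as a DTR ballot can be checked with a short sketch. The figures are those of the overall summary above. The function name and the "simple majority of P-members voting" test are assumptions made for this illustration only; the exact criterion of JTC 1 Directives clause 9.8.3 is not reproduced in this document.

```python
# Illustrative tally of the 'P' member responses reported in the summary.
# The majority test below is an assumed simplification, not the Directives text.

P_MEMBER_RESPONSES = {
    "approve_without_comment": 7,
    "approve_with_comment": 1,
    "disapprove": 4,
    "abstain": 1,
    "not_voting": 8,
}


def tally(responses):
    """Return the approval count, votes cast, member total and a majority flag."""
    approve = responses["approve_without_comment"] + responses["approve_with_comment"]
    disapprove = responses["disapprove"]
    # Votes cast for or against; the abstention is not counted either way.
    cast = approve + disapprove
    total = sum(responses.values())
    return {
        "approve": approve,
        "disapprove": disapprove,
        "cast": cast,
        "total_p_members": total,
        "majority_in_favour": approve > cast / 2,
    }


if __name__ == "__main__":
    result = tally(P_MEMBER_RESPONSES)
    print(result)
```

Under these assumptions, 8 of the 12 votes cast for or against are in favour, which is consistent with the note above.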
> > > ______ end of overall summary; beginning of detail summary _____________ > > ISO/IEC JTC1/SC22 LETTER BALLOT SUMMARY > > > PROJECT NO: JTC 1.22.30.02.03 > > SUBJECT: PDTR Ballot for PDTR 14652: Information technology - Programming > languages, their environments and system software interfaces - > Specification Method for Cultural Conventions (Technical Report, > Type 1) > > Reference Document No: N2955 Ballot Document No: N2955 > Circulation Date: 1999-07-07 Closing Date: 1999-10-08 > > Circulated To: SC22 P, O, L Circulated By: Secretariat > > > SUMMARY OF VOTING AND COMMENTS RECEIVED > > Approve Disapprove Abstain Comments Not Voting > 'P' Members > > Austria ( ) ( ) ( ) ( ) (X) > Belgium (X) ( ) ( ) ( ) ( ) > Brazil ( ) ( ) ( ) ( ) (X) > Canada (X) ( ) ( ) (X) ( ) > China ( ) ( ) ( ) ( ) (X) > Czech Republic (X) ( ) ( ) ( ) ( ) > Denmark ( ) (X) ( ) (X) ( ) > Egypt ( ) ( ) ( ) ( ) (X) > Finland (X) ( ) ( ) ( ) ( ) > France ( ) ( ) (X) (X) ( ) > Germany ( ) (X) ( ) (X) ( ) > Ireland (X) ( ) ( ) ( ) ( ) > Japan ( ) (X) ( ) (X) ( ) > Netherlands (X) ( ) ( ) ( ) ( ) > Norway (X) ( ) ( ) ( ) ( ) > Romania ( ) ( ) ( ) ( ) (X) > Russian Federation (X) ( ) ( ) ( ) ( ) > Slovenia ( ) ( ) ( ) ( ) (X) > UK ( ) ( ) ( ) ( ) (X) > Ukraine ( ) ( ) ( ) ( ) (X) > USA ( ) (X) ( ) (X) ( ) > > 'O' Members Voting > > Korea Republic (X) ( ) ( ) ( ) ( ) > > _________ end of detail summary; beginning of Canada comments _________ > > COMMENTS ACCOMPANYING THE CANADA AFFIRMATIVE VOTE ON > SC22 LETTER BALLOT N2955 > > We are pleased to note that the editor has taken a number of our > suggestions for improvement of the document when it was in its previous > incarnation, and incorporated them in this Technical Report. Our comments > specific to the current draft follow. > > > 1. There are instances throughout this document where it refers to itself > as a standard - sometimes on the same page! 
For instance, the introduction > on page iv, line 70 says "This Technical Report" yet page iv, lines 77, > 110 (twice) call it a standard. Another one is on page 1, line 133 > (calling it a Technical Report) and line 139 (calling it a standard). Some > more are mentioned below. These have to be fixed; our suggestion is that > the entire document be searched and all instances be corrected. Accepted. > 2. In many places, this document expresses things in a way suggesting > conformance to a standard. The pedigree of this technical report was a > standard but it is no longer a standard and artifacts of this pedigree > must be removed. It should not state the prescriptive "...shall be used.." > or "..shall contain ..." but rather just describe and specify things. In a > number of places the editor has already made the change but not > consistently throughout the document and that creates problems. An > example, amongst many throughout this document, can be found on page 7, > line 485: > > "A category source definition shall consist of a category > header, a category body, and a category trailer. A category > header shall consist of the character string ......." > > This could be replaced by: > > " A category source definition consists of a category header, > a category body, and a category trailer. A category header > consists of the character string ...." accepted. The editor will remove "shalls". > > Specific comments: > > 1. page iv line 74, 75 "..., and a way to specify how much is covered and > the status of it ." If this is an indirect reference to LC_IDENTIFICATION > then it is wide of the mark. It needs to be reworded in that case. If it a > reference to something else, it still fails as it is not clear from > reading the document what this refers to. The sentence will be removed. > > 2. page iv line 91 "With those data.." ==> "With that data .." accepted. > 3. page iv line 103 "..a built-in nature .." does not make sense. 
Perhaps > it was supposed to be " ..built-in .." Accepted in principle. It means "or provided by the application or operating system". > 4. pages 2 and 3 Section 3.1 - it would be better to order the terms and > definitions or at least group them Noted. They are supposed to be grouped together in related terms, this will be put as notes. > 5. page 3 line 268 "..this standard .." ==> "..this TR .." accepted. > 6. page 3 line 272 "..this standard .." ==> "..this TR.." accepted. > 7. page 6 line 462 " ...while it in .." ==> "..while in many cases, it > could ..." accepted. > 8. page 11 line 691 remove the words "..need not be a national extent .." > because it does not add anything; the extent is tied to ISO 3166. changed to "where applicable". > 9. page 11 line 704 + "If any of the above information is non-existent, > it must be stated; the corresponding string is then the empty string " > should be restated as "If information required for any of the mandatory > keywords above is not available, then the corresponding string is an > empty string." accepted. > 10. page 11 line 709+ "Note: Only one language can be addressed with > the concepts of a FDCC-set; to address for example a bilingual culture, > one need to have 2 FDCC-sets" Here, language and culture are being > confused and it would be better to state: "Note: Only one language per > territory can be addressed with a single FDCC-set; an additional FDCC-set > is required for each additional language for that territory." accepted. > 11. page 12 line 738 "ISO" ==> incorrect; there is no entry like this in > ISO 3166 today Accepted, will be changed to empty entries. > 12. page 12 line 742+ It may be better to change the "i18n:1999" entries > to "i18n:2000" now rather than later. Also applies to page 11, line 717. Accepted. > 13. page 13 line 808 States "Basic keywords" Does this imply that the > keywords that relate to transliteration are somehow of the "advanced" > variety? 
It would be better, if a division is desired, to label these as " > Character Classification Keywords". Accepted. The basic keywords are defined in this specification; there may also be user-defined keywords. The terminology is consistent with POSIX. In any case, the keywords will be renamed "character classification keywords" and "transliteration keywords". > 14. page 14 line 874+ xdigit. To allow xdigit to take on script and digit > variants is a bad, and dangerous concept and an unnecessary addition. > The xdigit area should be fenced off to be only 0-9, a..f, and A..F. Accepted. > 15. page 15 line 926+ map. Remove this keyword. This is dangerous > character encoding territory and one that does not belong in LC_CTYPE, > if at all in this TR. Have there ever been any requests to have this type > of stuff included in this category? Not accepted. It maps to the 15435 APIs. There are things that need to be done this way, like mapping to the primary form of Arabic letters, and width of characters. > 16. page 16 line 967 "A Automatically included; see text" - this is only > for the portable character set and only parts of it are relevant for a > class. Perhaps it would be better to state it as such. Not accepted. It is for all of UCS. This is in line with POSIX. > 17. page 16 line 976 section 4.3.2 Opposition to having transliteration > specified in this document has been stated before. Having that same > transliteration specification under LC_CTYPE does not make it better; in > fact we are vehemently opposed to having this in LC_CTYPE. Of the two > evils, it is better to have it as a separate category (say LC_XLITERATE) > than bundled into LC_CTYPE. Accepted. > > 18. page 19+ line 1108+ section 4.3.3 The comments lines, of the type > "Table 9 Basic Greek", inserted in the "upper" "lower" and "alpha" etc. > keywords to break up the sections is a good idea. It should be continued > in the "punct", "graph", "toupper", "tolower", etc. keywords. Accepted. > 19. 
page 19 general Are all of these ranges accurate? Do we need to > continue to specify these or should this TR point elsewhere for this > information? They should be correct. They have been reviewed for two drafts and have been quite stable. > 20. page 22 line 1371 Why are currency signs and miscellaneous symbols > included under "punct"? This is POSIX/industry tradition. > 21. page 28 line 1846+ This is a little unclear particularly the last > sentence; it does not make much sense, so better wording cannot be > suggested. This is POSIX wording. > 22. page 29 line 1877+ A few of the keywords are defined to be optional. > Are all the others mandatory? This should be made clear as was done under > LC_IDENTIFICATION and LC_CTYPE. Accepted. The keywords will be marked as mandatory, unless otherwise noted. > 23. page 39 line 2082 "COLL_WEIGHT_MAX limit. The minimum value is 7." As > commented on before and accepted last time, the minimum value cannot be 7; > it can be a maximum value of 7 (and that is what the MAX in the > COLL_WEIGHT_MAX is!). As noted previously, 14651 states this value as 4 - > so, since one cannot have a value less than the minimum value, it would be > in error. This has to be a maximum value of 7. Accepted. It will be clarified using line 1889-1890 wording. > 24. page 34 line 2140+ Please use the same font size as the other > "Example" text. Accepted. > 25. page 34+ line 2171/73 The statement should be amended to use > "sort-rule" instead of "sort-rules" in both cases. This would then be in > line with the example where each "sort-rule" is either the rule for > forward sorting or backward sorting etc. on a level. Accepted. > 26. page 41 line 2607 The keywords "valid_from" and "valid-to" should be > removed. There are new entries relating to dual currency support. Dual > currencies, in almost all cases, are only present for a relatively short > period of time and as such should not be entertained in this manner. 
There > are other well-established mechanisms for handling these transient > situations. Not accepted. The dual currency rollover is an important period and it is important that this cultural convention be documented. The keywords will be optional. > 27. page 41 line 2622 The "conversion_rate" keyword should be removed. The > reason for removal is that a currency conversion rate is not a constant; > it fluctuates by the second. Conversion rate handling is best done outside > of this category. Not accepted. This is for the case of dual currency - where the rate is stable. In any case, referenced FDCC-set data could itself fluctuate by the second, e.g. when it is referenced over the Internet. The keywords will be optional. > 28. page 43 line 2680 int_curr_symbol ==> "int_curr_symbol" for > consistency. Accepted. > 29. page 43+ line 2677+ Group the keywords, rather than having them > scattered as at present, by "int_curr_symbol" and "currency_symbol" usage. > For example, list all of the "int_*" keywords first, followed by all of > the non-"int_*" keywords. The order of the keywords within these groups > should be the same. Accepted. > 30. page 44 line 2773 Replace "..formatted international monetary > quantity.." with "..formatted monetary quantity using the > "int_curr_symbol"." for consistency. Accepted. > 31. page 45 line 2791 as above for comment 30. Accepted. > 32. page 45 line 2815 To be culturally neutral, the value for > "mon_decimal_point" should be an empty string i.e. the same value as the > "mon_thousands_sep" keyword. Not accepted. This follows ISO standards and JTC1 directives, and it is most confusing to have mon_decimal_point as the empty string. Cultures that deviate from the ISO standard should specify this accordingly in their FDCC-set specification. > 33. page 46 line 2871 As above, to be culturally neutral, the value for > the "decimal_point" should be an empty string, as per the value for the > "thousands_sep" keyword. Same response as to CA-32. > 34. 
page 47 line 2905+ "The second operand is an integer specifying the > Gregorian date in the format YYYYMMDD". This suggests that one can enter > any date. Is that the intent? If so, what relevance does it have with the > "week" keyword? If not, then this needs to be explained, and the tie-in > with the "week" keyword needs to be elaborated. As it stands, one cannot > determine its usefulness or validity. Accepted. The second parameter indicates a date which is a first weekday. Given the week length, all other first weekdays can then be derived from that date. This will be clarified. > 35. page 47 line 2907+ "The third operand is an integer specifying the > weekday number to be contained in the first week of the year". The first > problem with this is that the term "weekday number" has not been defined up > to this point. The second problem is with what is the intent and the > statement reflecting that intent. The intent, if we understand this > correctly, is to ascertain how many days are required in a week for it to > be considered the first week of the year. If this is correct, then state > it as such. If not, please elaborate so it can be evaluated. Accepted. The Canadian explanation will be added as "The third operand may also be understood as the number of days required in a week for it to be considered the first week of the year". > 36. page 48 line 2980+ This keyword seems to be geared towards the notion > of a single timezone per LC_TIME. While it is true that many countries in > the world have a single timezone, there are others where multiple > timezones are used. One such country is Canada. This keyword needs to be > changed to allow specification of multiple timezone usage. Accepted. This will be modified so that "rule" can have more timezones. > 37. page 49 line 2989+ " and Indicates no less than three, nor > more than 10 characters that are the designation for the standard or > summer time zone" . 
> > The first problem with this is to do with the range of characters allowed > in the name of the timezone; with this restriction as it stands, one > cannot enter any timezone name - for example, one timezone name, for the > standard time, is "Eastern Standard Time" which clearly extends beyond the > 10 character limit! Restricting to just common abbreviations or acronyms > does not always work either. What is needed are two keywords each for each > of and . The first keyword indicates the full name of the > timezone, and the second keyword indicates the abbreviations or acronym. Accepted. Text to be supplied from Baldev. > The second problem is with the term "summer" for the . This should > be replaced by "Daylight saving time or summer time". Remember that the > acronym "dst" does stand for "Daylight saving time". Accepted. > 38. page 49 line 3020+ This only deals with the rule based change > to/from Daylight saving time. All countries in the world that observe > do not all have rule based criteria for its observance. The keyword, > or the "timezone" keyword, should indicate that it ONLY applies to rule > based observance. Accepted. > 39. page 53 line 3237+ Section 4.9 If this is to be useful, allowance has > to be made for different cultural conventions used in different parts of > the world. First , all countries do not use millimetres. Second, common > country-specific industry standard names should be allowed - such as A4 > in the U.K., and "letter", "legal" in Canada. LC_PAPER will be removed. > The other problem is that the current statements for "height" and "width" > suggest specification of a single size. That is not the common practice. > Multiple paper sizes have to be allowed for - for example, one needs to be > able to specify the dimension of both "letter" and "legal" sizes following > the example above. LC_PAPER will be removed. > 40. 
page 55/56 line 3356+ The keywords "lang_term" and "lang_lib" are > specifying exactly the same thing - "a three-letter abbreviation of the > language for *** use, according to ISO 639-2." The only difference > between the two statements is the term "terminal" and "library" replacing > "***" for the appropriate keyword. These keywords should be replaced > with a single keyword "lang_ab3" (and the keyword "lang_ab" changed to > "lang_ab2" to indicate the two-letter abbreviations from ISO 639) > following the same convention as adopted for the "country_ab2" and > "country_ab3" keyword in this section. Accepted in principle. lang_term and lang_lib may be different, thus the two keywords, e.g. "esp" and "spa" for Spanish. A note will be added to clarify this. The names will be changed to ab2 and ab3_term and ab3_lib. > > ______ end of Canada comments; beginning of Denmark comments ___________ > > > DENMARK COMMENTS ACCOMPANYING NEGATIVE VOTE ON SC22 LETTER BALLOT N2955 > > > We can inform you that Denmark disapproves the balloted text for PDTR > 14652. However, if the report type is changed to Type 2 and the following > general and specific observations are implemented, Denmark will change its > vote to approval. It is understood that the Danish NB wants to have the possibility to change the TR to an IS, if consensus is reached. This can be done both for TR type 1 and 2, as per Directives clause 15.4.2.1. The history of the document is that the NP was approved as for an IS, and it later turned out that consensus could not be reached. Such a project can be changed into a TR type 1, as per Directives clause 15.2.1. A TR type 2 needs to go through the JTC 1 five-stage development process as prescribed in Directives clause 15.1. > > From an overall perspective the PDTR should be rewritten in order to > achieve the goal of being a useful Technical Report. 
The rewriting should > pay special attention to the use of 'it', 'its', spoken language forms, > "short-hand" sentences and the like, which makes the text hard to read > and, thus, error prone. Much of the text is taken from other ISO standards, such as ISO/IEC 9945-2, and as such is deemed appropriate for ISO standards. > All usage of the term 'standard', 'this standard', etc. should be replaced > by 'Technical Report', 'this Technical Report', etc. Accepted. > Definitions should be sorted alphanumeric and renumbered accordingly. Not accepted. Sorted alphanumerically by which language? JTC 1 specifications may be translated into a number of languages, and the structure of the document must be preserved. > Definitions (and other material) copied from other > standards/specifications/TRs should carry a reference to their origin > with a clear indication of differences from the original, if applicable. Accepted where ambiguous definitions occur. > The definition of FDCC-set is confusing because it (sort of) defines > itself. A definition of FDCC would probably help. Accepted. Cultural convention is defined. > The definition of 'cultural convention' as 'A data item for information > technology that may vary dependent on language, territory, or other > cultural habits' is particularly helpful for the reader in identifying > the actual subject of the Technical Report. Noted. This will be referred to in a definition of FDCC. > Line 307/308 states 'The first eight entries in Table 1 are defined in > ISO/IEC 6429 and others are defined in ISO/IEC 10646-1'. It is unclear if > this indicates that the remaining entries stem from somewhere else, in > which case the text should be adjusted. Accepted. The rest stems from ISO/IEC 9945-2. This will be clarified. > Plans for future editions of this TR should be moved to Notes or removed. Accepted. The plans will be moved to notes. Intentions will be noted as reserved. 
> The title of subclause 4.1 should be changed to FDCC-set description, > because FDCC-set already is defined in subclause 3.1.6. accepted. > The use of the word 'shall' seems inconsistent with the concept of a TR > > The use of the word 'define' (all forms) in the running text of the TR > should, in general, be replaced by 'specify' or ' describe' (similar > forms). Definitions belong to subclause 3. Not accepted. This is a type 1 TR, and it is written as international standards, as per directives clause 15.3.1. If the TR is converted to an IS as per Directives clause 15.4.1.2 then these changes need to be reversed. > References to other International Standards should be normalised (e.g. > see
in International Standard ISO/IEC yyyyy-z) throughout > the text. Not accepted. All efforts will be taken to be consistent with JTC1 directives. > Line 556: The expression 'shall be escaped' is unclear. accepted. add "by the escape character". > Line 605: Change 'wth ' to 'with' Accepted. > > _____ end of Denmark comments; beginning of Germany comments _______ > > > DIN Vote on SC 22 N 2955 > > Disapproval with Comments: > > Germany is in favour of the transformation of the former FCD 14652.2 > into a TR. In Malvern, WG20 opted for this transformation to safeguard > the valuable information inherent in the draft which should, while in > Germany's views not ripe for standardization, not be withheld from the > implementor community. > > However, it notes in many places that the conversion is as yet incomplete. > > This includes the frequent self-description of the document as a > "standard". A sample of this occurs right on the cover page ("Document > type: International standard"), followed by p. iv ("There are a number > of benefits coming from this standard" and "This standard specifies > ..."), p. 1 ("specifications in this Standard") and so forth. Accepted. The front page will be provided by ITTF. The other text will be changed. > > Germany also disapproves of the LC_CTYPE category of the FDCC set > (mainly section 4.3). No character classifications should be doublicated > between SC22/WG20 and the Unicode Consortium, as inconsistencies are > likely to slip in to the detriment of implementers. Not accepted. There are many other classifications of characters available in the industry, eg. what has been produced by national bodies and X/Open. What is listed here represent the consensus of the WG and has been checked with Unicode data. > The need for a conformance clause is also to be decided upon by the WG. There is a need for a conformance clause. > Numerous other details need to be settled before the PDTR can be > finalized. 
Many of those have been brought forth in the national body > comments to the last FCD and should be clarified before any progress > can be made. Not accepted. The disposition of comments on FCD2 was approved by the WG20 and has been implemented in the PDTR, and thus all previous comments have been resolved. > > ___________ end of Germany comments; beginning of Japan comments ________ > > > Comments on PDTR 14652 > > The National Body of Japan disapproves PDTR 14652 for the reason below. > > If the comment is satisfactorily resolved, Japan will change its vote to > approval. > > ------- > > The reason why this document becomes a technical report instead of a > standard should be explained in this document. accepted. > Japan proposes to add some paragraphs for this purpose as follows. > > ---- beginning of the proposed text --- > > This Technical Reports presents a trial for defining a general mechanism > to specify cultural conventions. Though its contents are developed in > order to form a standard, it is decided to be a technical report in order > to give information to public earlier instead of resolving the issues > coming from the National Bodies. The information is really as resolved issues from the national bodies. A number of NBs have been participating in this work and decided the contents of the TR. > The issues includes but are not limited to > > 1) Whether the features which have their origin in ISO/IEC 9945-2 -- > POSIX Part 2 -- works well after its separation from ISO/IEC 9945-2 or > not. > 2) Whether it makes sense or not to have a default value, which may be > considered as a recommendation, for each cultural convention item. > 3) Whether each specification form fits for world-wide cultural > variations or not. 
> The preparer of this report, ISO/IEC JTC1 SC22, expects the rapid > progress of internationalization in the field of information technology > will solve the above mentioned issues and this technical report will be > used as a base for a new standard in near future. Accepted. An annex will be added with this text and member body concerns will be listed. Member bodies will provide this text. > > ______ end of Japan comments; beginning of USA comments ________________ > > > The US National Body votes to disapprove ISO/IEC PDTR 14652, Information > technology - Programming languages, their environments and system > software interfaces - Specification Method for Cultural Conventions, see > below for comments > > > US comments to the PDTR ballot on TR 14652 > Documents: SC22 N2955 (SC22/WG20 N690, L2/99-209) > September 24, 1999 > > > General Comments > > The US is in favor of the change of status of this document from a > draft standard to a draft technical report, which seems appropriate > under ISO directives, given the disagreements within the committee > regarding how to proceed and the lack of consensus to make the > document an International Standard. However, the conversion from > a draft standard to a draft technical report does not seem to have > been completed. Two issues stand out: > > 1. There are a number of places in the document where it refers to > itself as a "standard". These must, of course, all be corrected. > Places we noted were: p. iv, lines 77, and 110 (two times) and > p. 3, line 268. The entire document should be searched to guarantee > that no other instances occur. Accepted. > 2. While it is not unheard of for a Technical Report to contain > a conformance clause, and makes a certain amount of sense, given > the history of this document, it is still unusual. Furthermore, the > language throughout the Technical Report is expressed in terms that > express conformance: "...shall be used as...", "...shall contain...", > and so on. 
The more appropriate rhetorical structure for a Technical > Report is simply to describe and specify things, rather than > strive to sound as if it is a standard even when it isn't. To > pick a typical example from page 7 of the document: It is usual for a TR type 1 to have conformance clauses. > "The FDCC-set definition text shall contain one or more FDCC-set > category source definitions, and shall not contain more than one > definition for the same FDCC-set category." > > That is a prescriptive specification, tied to the conformance clause. > The descriptive reformulation would be: > > "The FDCC-set definition contains one or more FDCC-set category > source definitions, and does not contain more than one definition > for the same FDCC-set category." > > We would prefer that the text be rewritten to remove its prescriptive > character. But failing that, it would make sense to at least add a > prominent paragraph to the Forward that describes the origin > of the document, explains why it is written *like* a standard, but > is not actually a standard. What is there now is just directives > boiler plate, but does not explain why the text is still written > in its prescriptive manner and why it still has a conformance clause. It is accepted to make a prominent paragraph that describes the origin and history of the document. (This is actually prescribed by the Directives). > Technical Comments > > p. v, line 126 > > "A standard set of values for all the categories has been defined > covering the repertoire of ISO/IEC 10646-1." > > This may not be correct, depending on what is meant by "covering > the repertoire". LC_COLLATE, by reference to FCD3 14651, does indeed > provide full coverage for 10646-1 (plus Amendments 1-7), plus > two characters from Amendment 18. LC_CTYPE also intends to cover > the same repertoire, although it is difficult to determine whether > it is actually complete. 
In any case, PDTR 14652 is insufficiently > precise in defining the repertoire to be covered, since the > "repertoire of ISO/IEC 10646-1" is constantly changing. > > It would be helpful to refer to specific versions of the Unicode > Standard to define repertoire, rather than 10646-1 plus amendments, > since that would be more precise (and mechanically checkable, since > a fixed, machine-readable data file exists for each Unicode version). > For example, 10646-1 plus Amendments 1-7 (or 1-9, for that matter, > since Amendments 8 and 9 did not add characters), corresponds > to Unicode 2.0.0. But the addition of the euro sign and object > replacement character can be specified exactly as Unicode 2.1.9, > but does not correspond to any particular configuration of 10646-1:1993 > plus amendments. Accepted in principle. The precise repertoire of 10646 is identified in the normative reference clause. Reference to ISO standards is preferred to references to liking industry standards. The repertoire used in 14652 is intended to be the same as for 14651. > p. 1, line 169 in Normative References > > This lists 10646-1:1993 as the normative reference. We think this > is a mistake. Given the timing of this Technical Report, it is > clear that the reference should be to 10646-1:2000, the > second edition. Then the specification of the particular repertoire > of that standard to be covered should be by reference to > a particular version (or versions) of the Unicode Standard, > or by reference to the numbered collections of the standard > (e.g. collection BMP-AMD.7 is explicitly provided in 10646-1:2000 > to specify the repertoire that matches Unicode 2.0.0.). Not accepted. The aim is as stated in the references, and it is aligned with 14651 and 10176. The same as is done for 14651 will be done here. > p. 3, line 270, section 3.2.1 Notation for defining syntax. 
> > This entire syntax is unhelpful and could easily be dispensed > with if the thrust of this document were not to maintain > upward compatibility with POSIX specifications. In a > descriptive specification of cultural conventions, it is > far better to simply specify a tagged format of some sort. > This would allow dispensing with the superfluous C-style > printf specifiers and "\n" line terminators, etc. Such a > format would be more useful to other users of the > cultural conventions, and if properly designed could be > easily converted to XML, where it could be transmitted > and verified in standard ways. Not accepted. The syntax is verified in standard ways, and the outline suggested is not state of the art. An effort will be made to make the TR consistent with the specifications mentioned. > Furthermore, use of the printf argument specifiers in this > meta-syntax conflicts with the definition of other > similar format specifiers for date, time, and other > format specifications within particular FDCC-set categories. > This is most confusing in a Technical Report. The TR > should describe the specification for cultural conventions > in an implementation neutral but clear way -- and then the > POSIX++ implementation is free to make use of printf-style > specifiers for its actual implementation on UNIX platforms. Not accepted. This is how it has also been done in other ISO/IEC specifications. > p. 4, line 304, section 3.2.3 Portable character set. > > This is completely unnecessary for the specification of > cultural conventions. A neutral specification would simply > make use of the UCS identifications of characters. The > requirement to specify the Portable character set is a > POSIX artifact that should itself be restricted to the > POSIX specifications. Not accepted. This is necessary for backwards compatibility with POSIX specifications. > p. 7, line 491ff > > The definition of category body is not consistent with the > allowance for comment lines at any point. 
Not accepted. Comments are not allowed everywhere, as per earlier Japanese comments. > p. 9, line 585 > > "Concatenated constants can include a mix > of the above character representations." > > This is just a silly idea. At the very least this should > be deprecated. Not accepted. This is useful for specifying a number of coded character sets. A number of specifications already use these capabilities. > p. 11, line 677ff, > > Section 4.2 LC_IDENTIFICATION > "All keywords are mandatory unless otherwise noted, and > the operands are strings." > > This is a *good* example of how the specification should > work. This should be carried through for all the other > FDCC-set categories, so that the printf-style format > specifications can be dropped. Noted. Syntax needs to be shown in a number of places. > > p. 11, line 717 > > "i18n:1999" ==> "i18n:2000" Accepted. > p. 12, line 738, > > the "i18n" LC_IDENTIFICATION category. > > "ISO" does not match the spec for territory, which > required this to be a 2-letter 3166 code. Accepted. "ISO" will be changed to the empty string. > p. 12, line 740ff. > > Update all 1999's to 2000's. Accepted. > p. 13, line 795ff. > > The "double increment hexadecimal symbolic ellipses" are > a clever, but still goofy convention. They make the > tables for LC_CTYPE less mind-numbing, but a better > strategy is to define all such property categories by > reference to a specified level of the UnicodeData.txt > database, and then to define only the deltas from that > to satisfy the committee on certain categories that might > differ in usage from that specified in UnicodeData.txt. > This would be far, far easier to verify and to > implement than what is currently provided. Not accepted. The ISO TR should not refer to one specific set of data, such as Unicode data, where other data sets have at least as much practical use and more legal status, such as POSIX specs and member body locales, and POSIX industry practice. The TR needs to be self-consistent.
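For readers unfamiliar with the "double increment hexadecimal symbolic ellipses" discussed above, the following is a minimal sketch of how such a range could be expanded. It assumes the surface form <Uxxxx>..(2)..<Uyyyy> with a decimal step in parentheses; the regular expression and the function name expand_stepped_ellipsis are illustrative only and are not taken from the TR.

```python
import re

# Assumed surface form: <Uhhhh>..(step)..<Uhhhh>, hexadecimal code points and
# a decimal step. Both the pattern and the function name are hypothetical.
_ELLIPSIS = re.compile(
    r"<U([0-9A-Fa-f]{4,8})>\.\.\((\d+)\)\.\.<U([0-9A-Fa-f]{4,8})>"
)


def expand_stepped_ellipsis(spec):
    """Expand a stepped symbolic ellipsis into the list of symbolic names."""
    match = _ELLIPSIS.fullmatch(spec)
    if match is None:
        raise ValueError(f"not a stepped symbolic ellipsis: {spec!r}")
    start = int(match.group(1), 16)
    step = int(match.group(2))
    end = int(match.group(3), 16)
    if step < 1 or end < start:
        raise ValueError(f"empty or malformed range: {spec!r}")
    # Every step-th code point, both end points included when reachable.
    return [f"<U{cp:04X}>" for cp in range(start, end + 1, step)]


if __name__ == "__main__":
    for name in expand_stepped_ellipsis("<U0100>..(2)..<U0106>"):
        print(name)
```

With a step of 2 the range selects every second code point, which is why such ranges are convenient for blocks where uppercase and lowercase letters alternate.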
> p. 14, line 874ff, xdigit > > It is an incredibly bad idea to allow hex digits to > have script variants. xdigit should always refer to > 0..9, a..f, A..F, period, full stop. Anything else is > inviting implementation disasters. Accepted (as also done for CA-14). > p. 15, line 926 map > > The "map" keyword is an unnecessary extension of the concept > of an FDCC-set. The FDCC-set is not the place that all > possible relations between characters are defined. The > concept of an FDCC-set should be restricted to the > specification of *cultural* practices for formatting > quantities, names, addresses, and such, which are > clearly relevant to localization of software. Mixing it > all up with an architecturally unsound approach to the > specification of character encodings and character semantics > is a very, very, bad idea. This should just be removed. Not accepted. See response to CA-15. > p. 16, line 976ff, > > section 4.3.2 Character string transliteration > > It is no more acceptable to have a bad transliteration > specification tucked away inside the LC_CTYPE category > than it was to have it specified as a separate category. > In fact, it is even less coherent than before, when > treated as a part of the LC_CTYPE specifications of > cultural conventions. This section should be removed, as > it has nothing to do with the specification of cultural > conventions. Accepted in part; see response to CA-17. The transliteration specification is culturally dependent: for example, German transliteration of Cyrillic characters is different from English transliteration of the same characters. > By the way, if this is included in the Technical Report, it > is quite likely that it will be widely ignored by those > implementing transliteration (outside the POSIX community > at least), and will just make the committee look silly. Response: It is not silly to provide support for needed functionality. > p. 18, line 1091. > > ..
is not the correct > range for ideographic characters in UCS, in whatever > version. Accepted in principle, and added text to clarify the 3200-FAFF range as "and Hangul and East Asian symbols and private use area etc". > p. 19, line 1108ff "i18n" LC_CTYPE category. > > Once again, for the Technical Report to try to define all this > is an incredible waste of time and error-prone. Now that > 14652 is a Technical Report and not an International Standard, > the allergy it has previously shown towards referring to > the industry standard implementation of these properties > should be shrugged off in favor of a more useful and > accurate approach. Not accepted. The data references a number of de jure and industry specifications, but not a specific specification made in the USA. > We have not tried to locate *every* error in these tables, but > some examples will show the problem. > > For the "upper" category: 01A6 should be added; 01C5, 01C8, > 01CB, 01F2 should be removed (those are titlecase, not uppercase); > all the IPA extensions should be removed (the IPA small-caps > letters are notionally lowercase, and should not be included > in the "upper" category); 03D2..03D4 should be added; > the range ..(2).. has the wrong start point -- > it should be , ... and so on. These data will be checked. > On p. 21, line 1315, > the CJK unified ideographs range starts > at the wrong point. It should be 4E00, not 4E01. Accepted. > On p. 22, line 1369, > the cntrl range is .., > not ... Accepted. > For the "punct" category on p. 22, this departs very strongly > from the UnicodeData definition of punctuation. "punct" here > includes currency signs and miscellaneous symbols as well > as true punctuation. This should be justified or corrected. Not accepted. The specification follows de jure and POSIX industry practice.
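The General_Category values behind the "upper" and "punct" examples above can be checked mechanically. The following is a minimal sketch, assuming Python's standard unicodedata module. It reports the categories of the Unicode version bundled with the interpreter, which is later than the version the TR refers to, and the helper name general_category is purely illustrative. U+0262 is an arbitrary pick from the IPA small capitals, chosen here only as an example.

```python
import unicodedata


def general_category(code_point):
    """Return the Unicode General_Category abbreviation for a code point."""
    return unicodedata.category(chr(code_point))


# Code points named in the comment, plus two illustrative picks (marked).
samples = {
    0x01A6: "proposed addition to upper",
    0x01C5: "proposed removal from upper",
    0x01C8: "proposed removal from upper",
    0x01CB: "proposed removal from upper",
    0x01F2: "proposed removal from upper",
    0x03D2: "proposed addition to upper",
    0x0262: "IPA small capital (illustrative pick, not named in the comment)",
    0x0024: "currency sign (illustrative pick, not named in the comment)",
}

if __name__ == "__main__":
    for cp, note in samples.items():
        name = unicodedata.name(chr(cp))
        print(f"U+{cp:04X} {general_category(cp)} {name} ({note})")
```

In this notation Lu is an uppercase letter, Lt a titlecase letter, Ll a lowercase letter, and Sc a currency symbol, so the output shows directly which of the listed characters fall outside the uppercase and punctuation categories of the Unicode data.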
> LC_CTYPE category (continued) > > By the way, the consistent use of UCS designations to refer > to characters in the "i18n" LC_CTYPE specification puts the > lie to the usefulness and requirement for 14652 to define > and make use of the bizarre repertoiremap of section 6 of > 14652 (p. 62 ff). Consistent use of ASCII characters as > themselves, a judicious use of a few other symbolic names > for the few other characters explicitly referred to in > ways where that would help (e.g. for examples for > collation, etc.) and use of UCS symbolic names for everything > else would be far, far preferable for this Technical Report, > to the way it currently stands. Not accepted. This practice is used by MBs, on the internet, and in other ISO/IEC standards, and it is industry standard. > p. 28, line 1848 > > "This ordering is used by regular expressions... > also as the collation weight to be used in sorting." > > The intent of this sentence is unclear. What is the > effect on pattern matching if collation weights *are* > explicitly specified? This item must be clarified. Accepted; "explicitly specified" will be deleted. > p. 29, line 1888 > > "This value is elsewhere referred [to] as > the COLL_WEIGHT_MAX limit." (also on p. 33, line 2081) > > Where is this elsewhere? Either specify the elsewhere, or > drop this non sequitur. Accepted. A check will also be done for other occurrences of this limit, e.g. line 2185. (Line 2085: correct to "exceeds the number of levels the keyword supports".) > p. 31, line 1964 > > "lower than the coded character set value" > > Since this is talking about a *symbolic* ellipsis here, > should this be lower in the sequence of symbolic names, > rather than lower in the *coded character set value* ? Accepted. > p. 31 ff., general > > All this specification of *how* to assign weights and deal > with the IGNORE's, and so on should be left to 14651, and > acquired in 14652 by reference. This discussion here is > just inviting inconsistencies in implementation.
The appropriate > scope for 14652 is to specify the additions to the LC_COLLATE > syntax over and above 14651, i.e. specification of limits > on numbers of levels, introduction of the symbol-equivalence > keyword, and use of the copy keyword. Not accepted. We need the full specification to be defined here, for completeness, and for upwards compatibility with POSIX. > p. 32, line 2014. > > "A occurring where > the delimiter ";" may occur, terminates the collating > statement." > > This is a *good* example of the way the entire text of the TR > should be written. This is descriptive, and does not resort > to the prescriptive "shall" when describing a format. Noted. > p. 34, line 2121 Note. > > This claim is not generally true. It depends on what type of > decomposition is done. This claim is for canonical decomposition. > Clarify the text if this note is to remain. Accepted. > p. 35, line 2203 F and B, and B and P > > The way these "and"s are used here, this claim is very difficult > to parse and understand. This should be broken out to explicit > single statements about what terms are mutually exclusive. Accepted. > p. 36, line 2226 ". > > ..and shall be present in the source FDCC-set > copied via the "copy" keyword." > > This is not required by 14651, and should not be, because it > restricts tailorings unnecessarily. The point is, that a > has to be defined before it is referred > to by a reorder-after statement, but there is no reason why > that definition has to come from a particular FDCC-set copied > via the "copy" keyword. If this is POSIX-specific stuff, once > again, the implementation details should be elsewhere, and not > in the neutral specification of FDCC-sets for cultural > conventions. Not accepted. You need to know where the definitions come from. > p. 37, line 2293, section 4.4.12.1 section reordering statements > > This syntax extends the syntax of 14651 for this statement type.
> That should be explicitly pointed out and explained, since it > is not going to be the expectation of those using the TR. Accepted. > p. 38 ff., "i18n" LC_COLLATE category > > The long list of collating symbols is not needed here. This > completely duplicates the list present in the 14651 common > tailorable template, but with the introduction of just > four collating-symbols: , , , > and . The introduction of is not necessary. > That is the result of an error in the 14651 table for two > entries, where "" should be corrected to "". > The other three are simply unexplained additions by the > editor. And even *if* they were needed for some further > purpose, the correct specification for 14652 is simply to > specify the 3 additional collating-symbols, along with > any desired symbol-equivalences, rather than duplicating > the rest of the list in 14652. Accepted in principle; all collating-symbols will be reviewed. > p. 53, line 3277 ff. "i18n" LC_MESSAGES > > Why is "[+1]" and "[-0]" suggested as the default > definition for yes and no in the "i18n" LC_MESSAGES category? > "[1]" and "[0]" might make some sense, though "y" and "n" > would be better, given the status of English as the > international language. But at the very least, the > minus sign on the zero makes no sense and should be > removed. The line number is taken to be line 3233 ff. Not accepted. The notation means that both + and 1 are regarded as positive responses, while both - and 0 are regarded as negative responses. The notation is a regular expression using brackets; a comment on this will be added. > p. 53, line 3237, Section 4.9 LC_PAPER > > Paper size conventions certainly are cultural conventions, > but there is a mismatch between what the requirements should > be for specifying these cultural conventions and what is > presented here.
> > For the U.S., for example, paper size conventions are > "letter" (8-1/2"x11") and "legal" (8-1/2"x14"), whereas > for the U.K., normal printing paper is A4 (210mm x 297mm). > In addition to the metric series defined in DIN 66008, > there may be other local traditional sizes not defined in > terms of the metric system, as for example, traditional > sizes used in book publishing. > > A specification for cultural conventions should allow the > expression of all this, and not attempt to reduce everything > to one height and one width expressed in millimeters (which > won't even be accurate for U.S. paper sizes, by the way). > > This FDCC-set category seems to be, instead, > another implementation-driven category that could be used > transiently to convey some "locale-based" information to > a printing process through some API. Effectively, the > FDCC-set is being conceived of a one gigantic C struct > for storing anything an application might ever conceivably > need for cultural adaptability for access to an API, rather than > as a "specification for cultural conventions", which is > nominally what this Technical Report is about. Accepted. The LC_PAPER category will be removed. > p. 54, line 3262, Section LC_NAME > > This entire specification is very weak and basically useless. > The list of keywords provided is Western-centric. > > This is a good example of a category that should be > reorganized to *first* provide a discursive listing of > name formatting and salutation conventions in different > cultures, before trying to invent a syntax to make it > machine-readable. Otherwise this is "standards-fishing"-- > promulgating a standard syntax in the hope that it may > be sufficient and implementable, but in the absence of > evidence of use or enough data to demonstrate its appropriateness > to the field of application. Not accepted. The category builds on information from 10 south-east Asian countries along with European conventions. 
It is thus better researched than a number of the original POSIX categories. > p. 54, line 3298 %m Middle names > > It is unclear whether this is intended to be one string > associated with potentially multiple names. If so, this > should be clarified. Also, what is the relationship between > the potentially multiple names of %m and the apparent > single initial letter of %M "Middle initial"? Accepted in principle. %m is intended to be used with multiple names. %M is one or more initials. > p. 54, line 3200 %p Profession. > > This is also unclear as to intent. Does this, combined > with the "i18n" LC_NAME category value for name_fmt > imply that "Lawyer John C. Smith" and > "Assistant Manager at McDonald's James T. Peabody" are > valid formatted names? Accepted; guidance on its use will be added to the rationale. > p. 54, line 3274 name_gen > > The example given for the use of Japanese "-sama" as a > salutation is incorrect. While it is true that "-sama" is > not gender-specific, it is not appropriate "for all > persons", and certainly not in all contexts. Furthermore, > it is an honorific, and not a salutation, since it cannot > be used independently of a name. Accepted. The example will be removed. > p. 55, line 3305 > > "The va[lu]e may be stored in the database > with the person information." > > What database? Once again, this specification is referring > to some unclear context outside the scope of the document, > with the implication that this is designed for some particular > implementation, rather than standing independently as a > specification. Either delete the reference to "the database" > or make it clear in this document what is intended by it. Accepted; an example of its use with an API will be given. > p. 55, line 3318 name_fmt specification > > This is essentially just a mask definition for a format. > It would be better stated explicitly as a mask, rather than as a > printf style format string.
And there is nothing gained > by use of the symbolic name "<p>", rather than just "p"; > this just clutters the specification and makes it hard > to read. > > The same comment applies to the format masks for > LC_ADDRESS postal_fmt (p. 56, line 3400), and the > LC_TELEPHONE tel_int_fmt (p. 57, line 3443). The <p> provides portability when compiling the FDCC-set into a target native binary representation. A note on use of <> syntax will be added to clause 4. > p. 55, section 4.11 LC_ADDRESS > > There is no justification for why the LC_ADDRESS category, > which should be focussed on specification of cultural > specifications for addresses, is cluttered up with a bunch > of keywords related to ISO 3166 country codes, motor > vehicle country codes, ISO 2108 ISBN codes, and ISO 639 > language codes. Once again this looks like a case of > tossing in the kitchen sink, rather than designing the > category appropriate to its usage. All of these superfluous > keywords should be removed from the category. If an > FDCC-set needs to specify any of this information, it > should be elsewhere, and not just tossed into LC_ADDRESS > for no apparent reason. > > The Rationale for this category in Annex B does not > discuss this, either. This will be removed. > p. 57, section 5. CHARMAP > > As we have indicated before, specification of a CHARMAP > is completely out of scope for cultural conventions. > This is just a bad design that continues the way POSIX > deals with implementation of character set encodings. > There is no good reason for the 14652 Technical Report > on the specification method for cultural conventions to > be continuing and expanding on this bad design. It > should be removed. (See further comments on the Annex B > Rationale below.) Not accepted. Coded character sets are cultural conventions, and necessary to make the FDCC-sets work. > p. 58, line 3502, > > Even with the example now provided on p. 61, this > explanation for is still almost incomprehensible. > The same applies to the discussion of operands for > the keyword on p. 59, which also refer to > "the g-set or c-set to be defined" and the "range > of characters in the referenced charmap". This is > another reason why this entire section should be dropped > from the TR. Not accepted.
is needed to describe some encodings, such as ISO 2022. The keyword will be changed to reflect that it applies to 2022 encoding. > p. 60, line 3593 ff., > > "The encoding part shall be expressed as..." > > Even if this is specified for upward compatibility with > existing POSIX practice, it is ridiculous in this day > and age to be encouraging the use of octal or decimal > in the specification of character encoding values. > Hexadecimal is clearly the radix of choice, and should > be encouraged by the technical report. The others should > be deprecated. Not accepted. Octal and decimal are also used in some circles. A note on the preference for hexadecimal will be added. > p. 61, line 3625 ff., > > "Example of using ISO 2022 techniques" > > These examples do now provide enough information to be > able to make a guess at how the CHARMAP keyword "" > is intended to be used. But the examples also illustrate > the reason why this entire CHARMAP section should be > removed from the Technical Report. This is an implementation > format for extension of the POSIX architecture for > character set encoding definitions, rather than any > information useful for the specification of cultural > conventions. Anyone wanting to understand the structure > of these character encodings would be far better off going > to the easily available (and far more complete and accurate) > book by Ken Lunde on CJKV Information Processing. And as > for cultural conventions, this section of CHARMAP (as well > as the subsequent one on REPERTOIREMAP) just get in the > way and confuse the issues. Not accepted. Encodings are cultural conventions. > p. 61, lines 3661, 3662. > > If the CHARMAP section and examples stay in, at the very > least, the example characters used should be legal, > assigned characters. The comment says "the character > codes are only examples", but in fact and > are not valid, assigned characters in the version of > 10646-1 normatively referred to by the Technical Report.
> At the least, make the effort to use assigned characters. Accepted. > p. 62, Section 6 REPERTOIREMAP > > Once again, we object to this ridiculous REPERTOIREMAP, > which does not belong in 14652. It should be removed. > > The justification on p. 63 keeps getting longer, but is > no more convincing than before. The *concept* of a > repertoiremap is prior art, but the wholesale extension > of that prior art to include thousands of the editor's > whole cloth inventions can hardly be characterized as > prior art. It is simply an example of dogged persistence. The concept is to describe a repertoire, a basic concept in character set terminology. > At most, perhaps 100 or so of these symbols beyond > the range of Latin-1 characters have any usefulness. > The rest are just the result of a mediocre concept > extended at least two standard deviations beyond the > range of reasonableness. Why would anyone want to > use the symbol "" instead of "" to > refer to U+1FAF GREEK CAPITAL LETTER OMEGA WITH DASIA AND > PERISPOMENI AND PROSGEGRAMMENI? Or "<_./>//>" instead > of "" for U+25E2 BLACK LOWER RIGHT TRIANGLE? > > The silliness of this REPERTOIREMAP, which takes up nearly > one quarter of a 114 page Technical Report, is illustrated > by its incompleteness. After inventing 2318 of these > symbols, the editor apparently ran out of gas, and didn't > extend them to cover Georgian, Thai, Lao, or any of the > Indic scripts. He included symbols for compatibility > Arabic positional shape characters, but not symbols for > compatibility fullwidth ASCII or halfwidth katakana > characters. Why? And it is completely unclear how this > mechanism would be extended, except by more arbitrary > and nearly random string assignments, to cover the > repertoire of Unicode 3.0 (with 1165 Yi syllables and > 630 Canadian Aboriginal Syllabics syllables, just to name > the most problematical). Not accepted. This reflects practice, as stated. > p.
64, line 3823 - 3848 > > The introduction of these "" .. "" symbols > here is problematical. First of all, these are intended > for use with LC_COLLATE for tailoring of the table from > 14651, but that is not explained here or anywhere. Accepted in principle, will be explained. > Second, the particular assignments of these symbol > weights to particular Unicode characters is invalid > in general. There is nothing that ensures that U+0252 > is going to be the "last A" in the common tailorable > template table. This could be affected either by a > revision of that table or the addition of new characters > to 10646 that get incorporated into the table. It is > inadvisable in any case to tie a hack for tailoring > Latin characters to particular Unicode values. The > proper way to do this is to introduce "", etc. > as collating-symbols in LC_COLLATE and then introduce > the appropriate tailoring with reorder-after statements > based on a particular version of the 14651 table. The references will be updated when the updates are made. This is for making deltas stable. > p. 92 Annex B (informative) Rationale > > The entire Technical Report now is informative. Restructuring > the document as a Technical Report should move most of this > material into the main body of the text. There is no > rhetorical reason why it should be separated off in > an annex, since it provides explanatory material that > is often needed at the point where concepts are introduced. > > In particular, the LC_MONETARY Rationale (B.1.4) is a > better start toward what the Technical Report *should* > contain about monetary formatting than the definition > of the LC_MONETARY category itself. Not accepted. It is intended that the TR will be an IS eventually and there is no need to do this wholesale reediting two times. > p. 92, line 6183 > > "an ISO/IEC 10646 system that has > defined 16-bit bytes may..." > > There is no reason for the document to be obtuse about > this. 
No Unicode (or 10646) system implements 16-bit > characters by defining 16-bit bytes. "Byte" in the > industry has an unshakeable meaning of an 8-bit quantity. > It is only character standards diehards who keep insisting > that "byte" is variable width and refers to the width > in bits that a character is encoded in. The entire > industry has chosen the other route, and measures data > in bytes, which is a fixed-size quantity. The days when > bytes did differ in size on different machine architecture > are long gone. Not accepted. The ISO/IEC 9899 C standard, and ISO POSIX standards allow for bytes with bitsizes different from 8. > p. 93, Section B.1.3 LC_COLLATE Rationale > > Most of this discussion is out of scope. It is arguing > about 14651, rather than providing the particulars for > the rationale for the LC_COLLATE category. The text will be reviewed with a specific eye on differences from 14651 and POSIX. > p. 94, line 6286 > > "The syntax for the LC_COLLATE > category source is the result of a cooperative effort > between representatives for many countries and organizations > working with international issues, such as UniForum, X/Open, > and ISO, ..." > > If this rationale is to contain all the general discussion > about collation weighting and contents relevant to 14651, > rather than just the discussion of the specific keywords > for the LC_COLLATE category in the FDCC-set, then why > is not the Unicode Consortium on that list? Either add > it or remove all the out-of-scope discussion of 14651- > related issues from this rationale. Accepted, Unicode Consortium Inc will be added. > p. 95, line 6315 > > "It is estimated that the Technical > Report covers the requirements for all European languages, > and no particular problems are anticipated for Cyrillic > or Middle Eastern scripts." > > First of all, it is not PDTR 14652 that covers these > requirements, but FCD3 14651 that does. 
PDTR 14652 simply > provides the metasyntactic shell to incorporate the > 14651 framework inside the LC_COLLATE category. Accepted in principle. The text will be rewritten. > Secondly, 14651 *does* cover Cyrillic, Arabic, and Hebrew, > as well as the rest of the scripts included in Unicode 2.0, > so there is no reason to make this imprecise statement > about anticipations. Accepted. The statement can be made more firm. > p. 98, Section B.1.3.3 > > Sample FDCC-set specification for Danish. > > line 6485: The symbol "" is not defined. > > In general, the introduction of the particular symbols > needed for this Danish collation specification should be > done *here*, and not in the "i18n" LC_COLLATE category > definition on p. 38 and in the "i18nrep" REPERTOIREMAP > on p. 64. Accepted. The symbol will be fixed. Because of the general requirements for portability, it is better to support the industry practice that is already there, and that is also recorded de jure in the cultural registry, than to do this for every localization, with the non-standardization and errors that may follow. > p. 103, line 6753. > > "It is expected that National Standards > Bodies will provide specifications." > > This seems a rather forlorn hope, given a bad format > metasyntax in the first place. Many national bodies are in the process of providing these specifications. > And why insert a plaintive cry for other NB's to do the > work, instead of just digging into the IBM Green Book > discussion of Date Format as a starting point? Because we do not want specific firms or consortia to define very crucial information for nations. Let the nations define this themselves. > p. 103, line 6762. > > "The internationalization working > group is developing an interface..." > > Who? What internationalization working group, affiliated > with whom? This is not made clear in the document. This is WG20 and ISO/IEC 15435. "ISO" may be added. > p.
104, Section B.2 Character Set Rationale > > This rationale is completely unconvincing as a rationale > for including the CHARMAP mechanism in the Technical > Report 14652. > > In particular, the claim that the "charmap was introduced > to resolve problems with the portability of, especially, > FDCC-set sources" should have been addressed simply by > using 10646 (i.e. Unicode) as the *reference* character > set for the Technical Report. This is the obvious solution, > as taken by the HTML and XML standards. This approach allows for > a source document to be represented in any character encoding, > but it is interpreted *as if* it were converted to the > reference character set, i.e., the UCS. In fact, most > implementations will *actually* convert the source document > to Unicode, to simplify their parsing/lexing engines. > The same approach should now be taken towards all such > specifications, including that of 14652, now that the UCS > is a reality. Continuing to fob off this old model of > "source portability" from POSIX on new standards and > technical reports does a disservice to those who are trying > to understand and implement them. Instead of assisting in > source portability, it really only succeeds again and again > in unnecessarily importing all the complexities of legacy > character encodings into standards and technical reports > that have nothing to do with the details of character > encoding. Not accepted. Charmaps describe cultural conventions. > p. 105, Section B.3 Repertoiremap Rationale > > "The repertoiremap was introduced to make FDCC-sets > independent of the availability of charmaps." > > Once again, this entire argument would be obviated by making > the UCS the reference character set for the document. All > the complexity of CHARMAP and REPERTOIREMAP would be > pushed off to where it belongs, in POSIX implementations > of the technical report, rather than in the technical report > itself.
It is specified both to describe a character repertoire, which is a central character set object, and to make bindings to encodings and FDCC-sets. > > Technical Comments re Section 4.5 LC_MONETARY > > We consider the specification in section 4.5 to illustrate the > nature of the technical problems endemic to the TR's entire > approach to the specification of cultural conventions. The > metasyntax provided for LC_MONETARY is designed to be an > extension of existing POSIX implementations of currency formatting. > However, the extensions are: > > A. Over-elaborate in terms of particular parameter specifications. > > B. Insufficiently precise to be well-defined or to avoid undecidable > cases of conflicting parameter specifications. > > C. Result in specifications of cultural conventions for monetary > formatting that are incomprehensible to the human reader, but > rather are designed to facilitate a programmer's task of > parsing out parameters and setting a number of boolean settings > in a localedef implementation. It is not expected that end users will specify these things, but rather experienced persons at national bodies or IT suppliers. > > D. Have a "data normalization" problem forced by insisting that > a single FDCC-set must have *two* monetary formats specifiable-- > one for the "local" currency and one for the "international" > currency. This is the Unix euro hack. It seriously complicates > specification of cultural conventions by pushing the euro > problem onto the procrustean bed of the single locale. This > is another sign of implementation considerations driving a > complex and unclear specification, rather than considerations > of clear and simple exposition driving the specification. The two monetary forms are needed to support ISO C, C++ and POSIX standards. They do describe cultural conventions in a country.
> We consider the third problem to be particularly egregious, since it > means that registered FDCC-set's will be effectively incomprehensible > in the registry and be unmaintainable by visual inspection. This is > likely to result in duplicate, overlapping, and/or mistaken registrations, > where the formatting intent of the human trying to produce these > FDCC-set definitions may not match the behavior of the implementations > using them. Many skilled people can read and understand the specifications. For more narrative descriptions, please refer to registrations of Narrative Cultural Specifications with IS 15897. > The problems can be illustrated by examining the "i18n" FDCC-set > definition proposed for LC_MONETARY, and then by giving another > example where the mind flows freely, posing an outlandish case > to see what the metasyntax allows (and thereby what it *requires* of > implementations). > > LC_MONETARY > % This is the 14652 i18n fdcc-set definition for > % the LC_MONETARY category. > % > int_curr_symbol "" > currency_symbol "" > mon_decimal_point "<,>" > mon_thousands_sep "" > mon_grouping -1 > positive_sign "" > negative_sign "" > int_frac_digits -1 > frac_digits -1 > p_cs_precedes -1 > p_sep_by_space -1 > n_cs_precedes -1 > n_sep_by_space -1 > p_sign_posn -1 > n_sign_posn -1 > % > END LC_MONETARY > > O.k., but what does this mean? It apparently is an attempt to provide > a vanilla default that effectively does nothing except specify that > "," is the decimal separator. But rather than being conceived in > terms of a visual international currency format that would make any > sense whatsoever, it is conceived in terms of the implementation of > localedef, with values to match variable initializations in that > program (null strings, unused values undefined).
> But if you think in
> terms of actual formatting recommendations, this implies that
> the currency number 1000 would format as:
>
> "1000"
>
> (or possibly "1000,0" or "1000,00" or ..., since frac_digits in not
> specified!)
>
> But the negative value for a currency number -1000 would *also* format as:
>
> "1000"
>
> (or possibly "1000,0" or "1000,00" or ...)
>
> We don't think it was the intention of the editor of 14652 to recommend
> that positive and negative currency values be formatted exactly the
> same, but that is the implication of this LC_MONETARY definition. So
> an unreasonable default has crept in, simply because thinking in terms
> of program parameter initialization for setting values in a cryptic
> metasyntax does not lend itself to specification of real formats that
> would make sense as defaults or recommendations.
The negative sign will be added.
> By the way, there is a definite technical problem which *must* be
> addressed to even make the specification comprehensible. The meaning
> of -1 as an argument value for int_frac_digits, frac_digits, p_cs_precedes,
> etc. through n_sep_by_space, is not defined in the TR. So as it stands,
> the LC_MONETARY definition is anomalous. It doesn't mean anything to
> have a negative one number of fractional digits to the right of a
> decimal separator, for example.
The meaning of -1 is defined in line 2600.
> Now for the overpermissiveness and imprecision of the metasyntax
> suggested. Let's consider the case of Lower Slobovia. Lower Slobovia
> used the tugrik as its local currency until the end of 1998, but
> had an unplanned currency reform starting earlier this year. Things
> being rather chaotic in Lower Slobovia, the currency reform didn't
> complete until 1999-09-16, when Lower Slobovia officially declared the
> slobovik as its currency and established the conversion rate of
> 1 slobovik for 9 tugrik, suggested by the court astrologer because
> the King of Slobovia has 9 daughters.
> Now since Lower Slobovia
> derives most of its income by rather shady money-laundering, they
> have adopted the USD as their international money formatting
> convention. This tends to obscure the conversions back and forth
> from dollars to sloboviks (or tugriks). Lower Slobovia has chosen
> to register the following LC_MONETARY specification to reflect
> their local cultural conventions:
>
> LC_MONETARY
> % This is the official Lower Slobovian fdcc-set definition for
> % the LC_MONETARY category.
> %
> valid_from ;"19990916"
> valid_to "19981231";
> conversion_rate 1;9
> int_curr_symbol "USD$";"USD!"
> currency_symbol "";"
> % For substitute U+20AE.
> % For substitute U+2445 (chosen to honor the
> % King of Lower Slobovia's sartorial style)
> mon_decimal_point "?"
> mon_thousands_sep "6"
> % The use of "6" as a thousands separator was dictated by
> % the king's astrologer for its felicitous sound, but has
> % proven quite profitable when foreign computers unfamiliar
> % with our conventions misinterpret it as a digit when
> % parsing out tugriks and sloboviks.
> mon_grouping 1;2;3;2;1
> positive_sign "-"
> negative_sign "+"
> int_frac_digits 6;1
> frac_digits 4;9
> p_cs_precedes 0
> p_sep_by_space 2
> n_cs_precedes 0;1
> n_sep_by_space 0;1
> int_p_cs_precedes 1;0
> int_p_sep_by_space 2;0
> int_n_cs_precedes 0;1
> int_n_sep_by_space 0;2
> p_sign_posn 0;1
> n_sign_posn 2;4
> int_p_sign_posn 1;3
> int_n_sign_posn 3
> %
> END LC_MONETARY
>
> The implication for valid_from and valid_to is that the tugrik was valid
> from "the beginning of time" to 19981231, and that the slobovik is valid
> from 19990916 to "the end of time". (The metasyntax for valid_from and
> valid_to in fuzzy on this point, so this is just a stab at what it might
> mean to make use of those implicit infinities when defining more than own
> currency in the LC_MONETARY specification.)
> What an implementation will do
> when faced with a currency formatting for a date in between is unclear --
> but perhaps that is o.k., since Lower Slobovia was in a state of financial
> chaos at the time, anyway.
>
> O.k., now let's see what happens to the currency values when we
> use this definition to describe a money-laundering operation that
> had a credit of a little over 2 billion tugrik last year (2,000,600,609
> to be exact) and then had to enter a negative value in the books for
> the same amount today. When you work it all out, the fully formatted
> currency strings come out to be:
>
> Positive amount in 1998:
>
> Local format: "(260600660066069?0000 )"
>
> Internat format: "- USD$ 260600660066069?0"
>
> Negative amount today:
>
> Local format: "+ 2622628869566?566666666"
>
> Internat format: "2622628869566?6+!USD"
>
> Is this absurd? Well, of course. But it is *allowed* by the current
> specification. That means, in principle, that implementations of this
> specification have to be able to produce garbage like this without
> choking on various bizarre combinations of parameters. And legions
> of programmers are going to be scratching their heads over such things
> as what to do with the fourth character of the int_curr_symbol which
> "shall be the character used to separate the international currency
> symbol from the monetary quantity" when int_p_cs_precedes specifies
> that the currency sign goes to the *right* of the monetary quantity
> and int_p_sep_by_space specifies that a *space* is used for
> separation. Hmm.
Much of this is POSIX, C and C++ heritage. The spec accommodates existing issues, such as the euro, and is not generalized beyond what is needed for known examples.
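Illustrative sketch (not part of the ballot comments or of the TR): the point raised above about the "i18n" definition, namely that 1000 and -1000 format identically while negative_sign is empty, can be reproduced with a few lines of Python. The function name format_money, the dictionary layout, and the fallbacks chosen for -1 (no fractional digits, sign placed before the quantity, symbol placed after it) are assumptions made for this sketch only. Sign positions 0 to 4 follow the usual ISO C and POSIX lconv meanings. Grouping and the international (int_*) fields are left out.

```python
from decimal import Decimal

UNSPECIFIED = -1  # locale-source value meaning "no value given"


def format_money(amount, lc):
    """Format a Decimal with a small subset of LC_MONETARY keywords (sketch)."""
    negative = amount < 0
    frac = lc["frac_digits"]
    if frac == UNSPECIFIED:
        frac = 0  # assumption of this sketch: print no fractional digits
    digits = str(abs(amount).quantize(Decimal(1).scaleb(-frac)))
    digits = digits.replace(".", lc["mon_decimal_point"])

    prefix = "n" if negative else "p"
    sign = lc["negative_sign"] if negative else lc["positive_sign"]
    posn = lc[prefix + "_sign_posn"]
    cs_precedes = lc[prefix + "_cs_precedes"]
    sep = " " if lc[prefix + "_sep_by_space"] == 1 else ""
    symbol = lc["currency_symbol"]

    # Positions 3 and 4 attach the sign directly to the currency symbol.
    if posn == 3:
        symbol = sign + symbol
    elif posn == 4:
        symbol = symbol + sign

    if symbol == "":
        body = digits
    elif cs_precedes == 1:
        body = symbol + sep + digits
    else:  # 0 or unspecified: symbol follows the quantity (assumption)
        body = digits + sep + symbol

    if posn == 0:
        return "(" + body + ")"  # parentheses surround quantity and symbol
    if posn == 2:
        return body + sign       # sign follows quantity and symbol
    if posn in (3, 4):
        return body              # sign already attached to the symbol
    return sign + body           # 1 or unspecified: sign comes first


# The "i18n" LC_MONETARY values quoted above, restricted to the keys used here.
I18N = dict(currency_symbol="", mon_decimal_point=",",
            positive_sign="", negative_sign="", frac_digits=-1,
            p_cs_precedes=-1, p_sep_by_space=-1, p_sign_posn=-1,
            n_cs_precedes=-1, n_sep_by_space=-1, n_sign_posn=-1)

print(format_money(Decimal("1000"), I18N))
print(format_money(Decimal("-1000"), I18N))
print(format_money(Decimal("-1000"), dict(I18N, negative_sign="-")))
```

With the quoted values the first two calls print the same string; supplying a non-empty negative_sign, as the response above promises, is enough to tell them apart in this sketch.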
> To avoid this kind of nonsense, a usable specification should be
> constraining the allowable values for parameters and not allowing any
> possible combination in the forlorn hope of not offending the Lower
> Slobovians when they finally come to register their cultural conventions
> and find that their system wasn't accounted for by some constraint on
> allowed values.
>
> It would be far, far better to specify in detail the *actual*
> cultural conventions for currency formatting, with a clear method
> for *describing* those conventions in detail. The IBM Green Book
> (August 1994, pp. 11 - 19) does so already in great detail in a far
> more useful format for implementers. That actual number of combinations
> in use is far, far less than an unconstrained metasyntax such as
> that of section 4.5 of PDTR 15642 allows. A better approach would
> be to simply define a hierarchy of:
>
> A. Unsigned positive numeric formats.
>
> B. Positive and negative signed numeric formats.
>
> C. Currency sign placements in A. and B.
>
> D. A list of known local currency signs with their country
> association and relation to official international banking
> signs.
>
> From that, any competent software designer can create a mechanism
> for formatting and parsing currency strings that is adaptive to
> local cultural conventions. And it can be related to the TR's
> specification of values for A, B, C, and D above, so that
> one implementer of A:2;B:3;C:1;D:Pts can relate that to another
> implementation of the same format.
The intent of the TR is to allow data suppliers to provide the data, not for data engineers to specify data for a number of countries. The metasyntax is as defined by C, C++ and POSIX, and needs to be implemented anyway on any system that uses these standards.
> But PDTR 14652 is attempting something quite different -- it is
> attempting to *standardize* the Son-of-POSIX specification for
> LC_MONETARY to support portable implementations of localdef for
> Linux.
And possibly a number of other systems, such as systems using C, C++ and possibly other programming languages.
> It is our opinion that the two goals--clear and effective
> exposition of cultural conventions versus extension of POSIX
> mechanisms for portable implementations on UNIX systems--
> are effectively at odds, and that the technical quality of
> PDTR 14652 is suffering from mixing of incompatible goals.
The TR does not try to collect all of this information; that is done by other means, such as the cultural registry standard and its narrative cultural specifications.
>
>
> Minor Technical Comments re Section 4.5 LC_MONETARY
>
> p. 41, line 2616 and line 2621.
> The definition of "the beginning
> of time" and "the end of time" is not clear here. As a cultural
> convention, these should just be UNSPECIFIED. It is then up
> to implementations to decide what to do about that. A reasonable
> option, of course, is to equate UNSPECIFIED time with either the
> first possible or last possible "time" value in a machine
> implementation of time, but that is platform-specific in terms
> of actual dates that get associated with those values.
Accepted in principle: the beginning and end of time will be implementation-defined.
>
> p. 41, line 2622ff.
> The convention for using two integers
> in a specification of conversion rate also seems implementation-driven,
> rather than designed for clarity. If a conversion rate is set
> at 1.86301 (not at all unusual as a possibility), why should not
> a cultural specification be able to use "1.86301" to express that,
> rather than 186301;100000 ? This has to do with avoiding
> float parsing complications on Unix rather than with clarity
> of specification.
This also allows for the specific multiplication and division required by the European Commission when converting between national currencies in the Eurozone. Not accepted. It seems that dual-currency problems are not appreciated in the USA. A note on European rules will be added.
>
> Editorial Comments
>
> 1. p. iv, line 84 cultural ==> culturally
>
> 2. p. iv, line 104 become ==> becomes
>
> 3. p. iv, line 116 backwards ==> upward (cf. usage on p. 1, line 137)
>
> 4. p. iv, line 117 particulary ==> particularly
>
> 5. p. 2, line 221 Change final quotation mark to "."
>
> 6. p. 4, line 290 represent ==> represents
>
> 7. p. 10, line 652 indicate ==> indicates
>
> 8. p. 20, line 1167 does ==> do
>
> 9. p. 29, line 1883 reorder-sections-after ==> reorder-section-after
> line 1884 reorder-sections-end ==> reorder-section-end
>
> (This mistake is made many times on subsequent pages. The entire
> document should be searched and all instances fixed. See, for
> example, p. 33, lines 2075, 2089; p. 37 multiple times, etc.)
>
> 10. p. 29, line 1888 referred as ==> referred to as
>
> 11. p. 30, line 1925 replace-after ==> reorder-after
> line 1934 (same mistake -- search entire document)
>
> 12. p. 30, line 1941 " specified by its place." Add
> "in the list of collating statements." to the end of
> the sentence.
>
> 13. p. 31, line 1962 ellipsises ==> ellipses
>
> 14. p. 33, line 2078 & line 2084 col_weight_max ==> coll_weight_max
>
> 15. p. 33, line 2097 with the ==> within the
>
> (Same error on p. 34, lines 2136 and 2157)
>
> 16. p. 34, line 2155 collating-symbol-2 ==> collating-symbol-1
>
> 17. p. 36, line 2236 on ==> in
>
> 18. p. 55, line 3305 vaule ==> value
>
> 19. p. 56, line 3391 start ==> starting
>
> 20. p. 58, line 3512 added the ==> added to the
>
> 21. p. 90, line 6091 done ==> made
>
> 22. p. 90, line 6129 introduce ==> introduced
>
> 23. p. 91, line 6132 elipsises ==> ellipses
>
> Editorial Comments re Section 4.5 LC_MONETARY
>
> p. 41, line 2601 "is taken" ==> "is implied"
>
> p. 43, line 2680 For consistency, "int_curr_symbol" should be
> in double quotes.
>
Accepted.

---------

Further changes to the document as a result of discussion in the Copenhagen WG20 meeting:

As the TR has a stated goal to be aligned with IS 14651 - string ordering - and IS 9945-2 POSIX shell and utilities, the editing group took a look at the latest versions of these standards (including the recent POSIX amendment (.2b)) and determined some areas to be aligned, and furthermore instructed the editor to look through the documents to add text to align with these standards. This is documented in an annex to the disposition of comments.

The new draft will include accommodation of change requests from ISO/CS.

-----------------------------------------------------------------------

Annex to WG20 N706R - disposition of comments on PDTR 14652 ballot

Comments from the editing group that were acceded to by the editing group in order to make the document more consistent in the light of alignment principles as noted in ed.1 and ed.2.

> ed.1 As one of the stated purposes of the TR is to be aligned with
> POSIX standards, the TR should be aligned with the new amendment
> to IS 9945-2 POSIX shell and utilities (.2b).
Accepted.
1. The LC_CTYPE "class" keyword mechanism will be changed to "charclass".
2. "alnum" will be added as a new LC_CTYPE keyword.
3. LC_TIME era_d_t_fmt era_t_fmt era_fmt_ampm keywords will be added.
4. The charmap WIDTH section will be added, together with a width keyword in LC_CTYPE.

> ed.2 As one of the stated purposes of the TR is to be aligned with
> the 14651 standard, on which it depends, the TR should be aligned
> with the latest changes in that standard (as agreed in the same
> editing meeting)
Accepted.
1. The BNF for EOL was recursively defined in 7124-7125, and will be broken up so that EOL includes a comment and refers to a new eol-function rule that only includes the semantics for end of line.
2. In line 6999 reorder_sector_end will be removed.
3. In line 6985 collating_element is undefined, and in line 6986 collation-element is unused; the latter will be changed to "collation_element".
4. Line 6970 has a spelling error, "spece+", which is corrected to "space+".
5. In line 6968 - BNF rule for collating_element, the words "'from' SP" will be added.
6. In line 6977 "section-symbol" will be changed to "section".
7. ucs-symbols in LC_COLLATE "i18n" will be reviewed.
8. Ranges will be introduced on collating_symbol keyword.
9. In the BNF two characters in the collelem-string keyword will be changed to "one or more".

End of disposition of comments.