JTC1/SC22/WG14 N895

	Document: WG14 N895

WG14,

 Japan has the following three items for the agenda of the WG14 Kona meeting.
 Please discuss them during the meeting.


 1) Rationale of 64-bit integer type

    Japan needs to confirm whether the result of the discussion at the
    London meeting is completely reflected in the revised Rationale
    (especially the topic of elimination of long long overhead).
    (Japan would like to know whether or not the draft written by Mr. Gwyn
    after the London meeting (see attached email SC22WG14.7328) is going to
    be included in the revised Rationale.)


 2) Guarantee the behavior of the extended conversion utilities even after
    a change of LC_CTYPE under specific conditions

    Japan wants to propose a correction to the behavior of the extended
    multibyte and wide-character conversion utilities in 7.24.6.
    Please discuss this issue at the WG14 meeting.

    Current :   The behavior of the extended multibyte and wide-character
                conversion utilities in 7.24.6 after a change of LC_CTYPE
                is undefined.

    Proposal:   The behavior of the extended multibyte and wide-character
                conversion utilities in 7.24.6 after a change of LC_CTYPE
                is guaranteed.


   Details:

     The FDIS of C9X has the following specification:

        > 7.24.6  Extended multibyte and wide-character conversion
        > utilities
        > [...]
        > [#2] Most of the following functions   --   those  that  are
        > listed as ``restartable'', 7.24.6.3 and 7.24.6.4  -- take as
        > a last argument a pointer to an  object  of  type  mbstate_t
        > that is used to describe the current conversion state from a
        > particular multibyte character sequence to a  wide-character
        > sequence  (or  the  reverse) under the rules of a particular
        > setting for the LC_CTYPE category of the current locale.
        >
        > [#3]  The  initial  conversion  state  corresponds,  for   a
        > conversion  in  either  direction, to the beginning of a new
        > multibyte character in the initial  shift  state.   A  zero-
        > valued mbstate_t object is (at least) one way to describe an
        > initial conversion state.  A  zero-valued  mbstate_t  object
        > can  be  used to initiate conversion involving any multibyte
        > character sequence, in any LC_CTYPE category setting.  If an
        > mbstate_t  object  has  been altered by any of the functions
        > described in  this  subclause,  and  is  then  used  with  a
        > different  multibyte  character  sequence,  or  in the other
        > conversion direction, or with a different LC_CTYPE  category
        >                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        > setting  than  on  earlier  function  calls, the behavior is
        > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        > undefined.290)
        > ^^^^^^^^^
        > 290) Thus, a particular mbstate_t object can be used, for
        >    example, with both the mbrtowc and mbsrtowcs functions as
        >    long as they are used to step  sequentially  through  the
        >    same multibyte character string.

     The above specification does not reflect the original intention of
     Japan, the author of the Multibyte Support Extension (MSE), a.k.a.
     Amendment 1.  Japan's original intention about the behavior of the
     extended multibyte and wide-character conversion utilities is
     described in the Rationale of Amendment 1:

        "Annex H9.2.3 Multiple encoding environment" in the Amendment 1:
        > [...]
        > The encoding rule information is effectively a part of the
        > conversion state. Thus, the encoding information should be stored
        > with the hidden mbstate_t object with the FILE object. (Some
        > implementation may even choose to store the encoding rule as
        > part of the value of an fpos_t object.)  The conversion state
        > just created when a file is opened is said to have *unbound*
        > state because it has no relations to any of the encoding
        > rules. Just after the first wide-character input/output operation,
        > the conversion state is *bound* to the encoding rule which
        > correspond to LC_CTYPE category of the current locale.  The
        > following is a summary of the relations between various objects,
        > the shift state, and the encoding rules.
        >                      | fpos_t    | FILE      |
        >  shift state         | included  | included  |
        >  encoding rule       | maybe     | included  |
        >  changing LC_CTYPE
        >          (unbound)   | no effect | affected  |
        >          (bound)     | no effect | no effect |
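
     As an illustration only, the bound/unbound relation summarized in
     the table above can be modeled in C roughly as follows.  This is a
     sketch, not an interface from Amendment 1 or C9X: struct conv_state,
     model_setlocale, model_open, model_wide_io and model_bound_to are
     hypothetical names, and a plain string stands in for the LC_CTYPE
     setting of the current locale.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the conversion state hidden in a FILE object. */
struct conv_state {
    const char *encoding;   /* encoding rule; NULL while unbound */
    int shift_state;        /* current shift state (0 = initial) */
};

/* Stands in for the LC_CTYPE category of the current locale. */
static const char *current_ctype = "C";

/* Hypothetical stand-in for setlocale(LC_CTYPE, name). */
static void model_setlocale(const char *name)
{
    current_ctype = name;
}

/* A freshly opened stream has an unbound conversion state. */
static void model_open(struct conv_state *cs)
{
    cs->encoding = NULL;
    cs->shift_state = 0;
}

/* Called for every wide-character input/output operation.  The first
   call binds the state to the current encoding rule; later calls keep
   that rule even if LC_CTYPE has been changed in between. */
static const char *model_wide_io(struct conv_state *cs)
{
    if (cs->encoding == NULL)
        cs->encoding = current_ctype;
    return cs->encoding;
}

/* Nonzero if the state is bound to the named encoding rule. */
static int model_bound_to(const struct conv_state *cs, const char *name)
{
    return cs->encoding != NULL && strcmp(cs->encoding, name) == 0;
}
```

     In this model, a stream that was opened and used once under
     setting A stays bound to A after model_setlocale("B"), while a
     stream that was opened but not yet used picks up B on its first
     operation.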


        "Annex H.13.1 Conversion state" in the Amendment 1:
        > To handle multiple strings with a state-dependent encoding, the
        > committee introduced the concept of conversion state.  The
        > conversion state determines the behavior of a conversion between
        > multibyte and wide-character encodings.  For conversion from
        > multibyte to wide character, the conversion state stores
        > information such as the position within the current multibyte
        > character(as a sequence of characters or a wide-character
        > accumulator). And for conversions in either direction, the
        > conversion state stores the current shift state (if any) and
        > possible the encoding rule.
        > [...]


     Please consider the following program:

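         setlocale(LC_CTYPE, A);      /* A: locale using encoding rule A */
         f1 = fopen("...", "r");
         wc = fgetwc(f1);             /* first wide-character operation on f1 */

         setlocale(LC_CTYPE, B);      /* B: locale using a different encoding rule */
         f2 = fopen("...", "w");

         while (wc != WEOF) {
             fputwc(wc, f2);          /* written under LC_CTYPE setting B */
             wc = fgetwc(f1);         /* read after LC_CTYPE was changed */
         }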

     According to the current C9X FDIS, the behavior of this program
     is undefined.  However, from the viewpoint of the design goal
     of the MSE as described in the Rationale of Amendment 1, it should
     be well defined.  Namely, a conversion state in the bound state
     should not be affected by a change of LC_CTYPE, so that multiple
     strings with a state-dependent encoding can be supported in a
     multiple encoding environment.
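
     For the restartable functions themselves, a program that wants to
     stay within the FDIS wording quoted earlier has to go back to a
     zero-valued mbstate_t whenever it changes LC_CTYPE.  The following
     sketch shows that pattern.  switch_ctype and next_wide are
     hypothetical helper names, and the input is assumed to be a
     null-terminated multibyte string.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: change LC_CTYPE and reset *ps to the initial
   conversion state.  Returns 0 on success, or -1 if the locale is not
   available (in which case *ps is left untouched). */
static int switch_ctype(const char *locale_name, mbstate_t *ps)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    memset(ps, 0, sizeof *ps);   /* zero-valued: initial conversion state */
    return 0;
}

/* Hypothetical helper: convert the next multibyte character of *src
   with mbrtowc and advance *src past it.  Returns what mbrtowc
   returned: the byte count, 0 at the terminating null character, or
   (size_t)-1 / (size_t)-2 on an invalid or incomplete sequence. */
static size_t next_wide(wchar_t *out, const char **src, mbstate_t *ps)
{
    /* +1 so that the terminating null character can be converted too */
    size_t n = mbrtowc(out, *src, strlen(*src) + 1, ps);

    if (n != (size_t)-1 && n != (size_t)-2)
        *src += n;
    return n;
}
```

     Resetting the object discards any pending shift state, so this is
     only appropriate at the beginning of a new multibyte character
     sequence.  It does not help a program that has to continue an
     already started sequence after LC_CTYPE has changed.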

     Please discuss and consider this issue at the WG14 meeting.
     If more detail is needed, please send email to [email protected].
     Further technical discussion is welcome.


 3) WG14 Tokyo meeting 2000/Apr

    ITSCJ (Information Technology Standards Commission of Japan)
         http://www.itscj.ipsj.or.jp/eg/index.html
    will host the WG14 meeting, which will be held on 2000-04-10/14.

    The meeting will take place in the KIKAI-SHINKO-KAIKAN Building:
         3-5-8, Shiba-Koen, Minato-Ku, Tokyo 105-0011 Japan

    One room (Room #68, capacity 18 people, on the 6th floor) is already
    booked for the WG14 meeting. (In Japan, the 1st floor is the ground floor.)

    Traffic information, hotel and other accommodation information, and so
    on will be sent to WG14 via email soon by ITSCJ or Makoto Noda.



 Thank you,
 Makoto Noda, Chair of ITSCJ/SC22/C WG



 --------------------------
> Date:     Fri, 16 Jul 99 10:39:59 EDT
> From: "Douglas A. Gwyn (IST)" <[email protected]>
> X-Sequence: [email protected] 7328
> X-Errors-To: [email protected]
> To: Randy Meyers <[email protected]>
> Cc: sc22wg14 <[email protected]>
> Subject: (c.wg 8975) (SC22WG14.7328)   (SC22WG14.7294) Rationale for elimination of long long overhead
>
> As a specific illustration of an implementation technique: the compiler
> can emit an external reference to a symbol ".i64used" (for example) if
> and only if the source code makes any use of a 64-bit integer type, and
> the C run-time object library can contain two versions of the _doprnt
> module (which performs the actual work for the *printf family).  The first
> version of _doprnt would define the symbol ".i64used" and contain support
> for formatting 64-bit integers, while the second version would do neither.
> If the two versions of _doprnt and the *printf modules are ordered
> properly in the library, then the linker would automatically include the
> 64-bit supporting version to satisfy the reference to ".i64used", before
> seeing any reference to _doprnt from the *printf modules, and then the
> *printf modules would use the already-defined 64-bit version of _doprnt.
> If there were no reference to ".i64used" (because the program made no
> use of any 64-bit integer type), then the linker would skip the first
> _doprnt module, would include *printf modules, then would include the
> non-64-bit _doprnt module in order to satisfy the reference to _doprnt
> from the *printf modules.
>