JTC1/SC22/WG14 N861

SC22/WG14 N861 1998-11-03

               Issues with CD2 (FCD1)
                Clive Feather
               [email protected]


The following 13 items represent further issues with CD2. They are in order
of location within the CD.


========
[Item 01]
Category: Inconsistency
Committee Draft subsection: 6.2.5, 6.5.3.4, 6.7

Title: Issues with prototypes and completeness.

Detailed description:

6.2.5p23 says "An array type of unknown size is an incomplete type". Is the
type "int [*]" (which can only occur within a prototype) complete or
incomplete?

If it is complete, then what is its size? This can occur in the construct
    int f (int a [sizeof (int [*][*])]);

If it is incomplete, then the type "int [*][*]" is not permitted, which is
clearly wrong.

Now consider the prototype:

    int g (int a []);

The parameter clearly has an incomplete type, but since a parameter is an
object (see 3.16) this is forbidden by 6.7p7:

    If an identifier for an object is declared with no linkage, the type
    for the object shall be complete by the end of its declarator, or by
    the end of its init-declarator if it has an initializer.

This is also clearly not what was intended.

One way to fix the first item would be to change 6.5.3.4p1 to read:

    [#1]  The  sizeof  operator  shall  not  be  applied  to  an
    expression that has function type or an incomplete type,  to
 || an array type with unspecified size 72a), to
    the  parenthesized name of such a type, or to an lvalue that
    designates a bit-field object.

 || 72a) An array type with unspecified size occurs in function
 || prototypes when the notation [*] is used, as in "int [*][5][*]".

One way to fix the second item would be to change 6.7p7 to read:

    If an identifier for an object is declared with no linkage, the type
    for the object shall be complete by the end of its declarator, or by
    the end of its init-declarator if it has an initializer;
 || in the case of function arguments (including in prototypes) this shall
 || be after making the adjustments of 6.7.5.3 (from array and function
 || types to pointer types).
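
The adjustment that the proposed 6.7p7 wording relies on can be seen in a
small sketch. It is an illustration only, not part of the proposed text. The
function g is the one from the prototype above; its body is an assumption
made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Prototype written with an incomplete array type, as in the text. */
int g(int a[]);

/* Definition written with the adjusted type. The two declarations are
   compatible only because 6.7.5.3 adjusts "int a[]" to "int *a", so the
   parameter object itself has a complete (pointer) type. */
int g(int *a)
{
    assert(a != NULL);
    return a[0] + a[1];   /* assumed body: sums the first two elements */
}
```

If completeness were checked before the adjustment, the first declaration
would be rejected even though it declares the same function as the second.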


========
[Item 02]
Category: Normative change to existing feature retaining the original intent
Committee Draft subsection: 6.2.5, 6.7

Title: Problems with flexible array members

Detailed description:

Sometime after CD1 the following wording was added to 6.2.5p23:

    A structure type containing a flexible array member is an
    incomplete type that cannot be completed.

Presumably this was done to eliminate some conceptual problems with
structures that contain such members. However, this change makes almost all
use of such structures forbidden, because it is no longer possible to take
their size, and it is unclear what other operations are valid. This was
also not the intent behind the original proposal.

On the other hand, if such a structure is a complete type, there are a
number of issues to be defined, such as what happens when the structure is
copied or initialized. These need to be addressed.

The wording defining flexible array members is in 6.7.2.1p15:

    [#15] As a special case, the last  element  of  a  structure
    with more than one named member may have an incomplete array
    type.  This is called a flexible array member, and the  size
    of  the  structure  shall be equal to the offset of the last
    element of an otherwise identical  structure  that  replaces
    the  flexible  array  member  with  an  array of unspecified
    length.95)  When an lvalue whose type is a structure with  a
    flexible  array  member  is  used  to  access  an object, it
    behaves as if that member were  replaced  with  the  longest
    array,  with  the same element type, that would not make the
    structure larger than the object being accessed; the  offset
    of the array shall remain that of the flexible array member,
    even if this would  differ  from  that  of  the  replacement
    array.   If  this  array  would  have  no  elements, then it
    behaves as if it  had  one  element,  but  the  behavior  is
    undefined  if  any attempt is made to access that element or
    to generate a pointer one past it.

A solution to the problem is to leave the structure as complete but have
the flexible member ignored in most contexts. To do this, delete the last
sentence of 6.2.5p23, and change 6.7.2.1p15 as follows:
 
    [#15] As a special case, the last  element  of  a  structure
    with more than one named member may have an incomplete array
 || type.  This is called a flexible array member. With two
 || exceptions the flexible array member is ignored. Firstly, the
 || size of the structure shall be equal to the offset of the last
    element of an otherwise identical  structure  that  replaces
    the  flexible  array  member  with  an  array of unspecified
 || length.95) Secondly, when the . or -> operator has a left
 || operand which is, or is a pointer to, a structure with a flexible
 || array member and the right operand names that member, it
    behaves as if that member were  replaced  with  the  longest
    array,  with  the same element type, that would not make the
    structure larger than the object being accessed; the  offset
    of the array shall remain that of the flexible array member,
    even if this would  differ  from  that  of  the  replacement
    array.   If  this  array  would  have  no  elements, then it
    behaves as if it  had  one  element,  but  the  behavior  is
    undefined  if  any attempt is made to access that element or
    to generate a pointer one past it.

Finally, add further example text after 6.7.2.1p18:

    The assignment:

        *s1 = *s2;

    only copies the member n, and not any of the array elements.
    Similarly:

        struct s  t1 = { 0 };          // valid
        struct s  t2 = { 2 };          // valid
        struct ss tt = { 1, { 4.2 }};  // valid
        struct s  t3 = { 1, { 4.2 }};  // error; there is nothing
                                       // for the 4.2 to initialize

        t1.n = 4;                      // valid
        t1.d [0] = 4.2;                // undefined behavior
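
The intended usage pattern can be sketched as follows. This is a minimal
illustration, assuming that struct s is declared as "int n" followed by a
flexible array member "double d[]". The helper names s_alloc and
s_copy_header are hypothetical and do not appear in the draft.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct s { int n; double d[]; };   /* d is the flexible array member */

/* Hypothetical helper: allocate a struct s with room for count elements.
   This only works if sizeof (struct s) is permitted, i.e. if the
   structure is a complete type. */
struct s *s_alloc(int count)
{
    assert(count >= 0);
    struct s *p = malloc(sizeof *p + (size_t)count * sizeof p->d[0]);
    if (p != NULL) {
        p->n = count;
        for (int i = 0; i < count; i++)
            p->d[i] = 0.0;
    }
    return p;
}

/* Hypothetical helper: structure assignment is relied on here only to
   copy the member n; the array elements must be copied separately. */
void s_copy_header(struct s *dst, const struct s *src)
{
    assert(dst != NULL && src != NULL);
    *dst = *src;
}
```

A caller that wants the elements as well would follow s_copy_header with an
explicit loop or memcpy over the d members.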


========
[Item 03]
Category: Inconsistency
Committee Draft subsection: 6.2.5, 6.7.2.2

Title: Circular definition of enumerated types

Detailed description:

6.7.2.2 para 4 says:

    Each enumerated type shall be compatible with an integer type. 

However, 6.2.5 para 17 says:

    The type char, the signed and unsigned integer types,
    and the enumerated types are collectively called integer types.

Thus we have a circular definition. To fix this, change the former to one
of:

    Each enumerated type shall be compatible with a signed or unsigned
    integer type. 

or:

    Each enumerated type shall be compatible with a standard integer type
    or an extended integer type. 


========
[Item 04]
Category: Clarification
Committee Draft subsection: 6.2.6.2

Title: Clarify aspects of negative zeros and related situations

Detailed description:

Subclause 6.2.6.2p2 makes it clear that there are only three permitted
representations for signed integers - two's complement, one's complement,
and sign-and-magnitude. It is reported, however, that certain
implementations have problems with the "minus zero" representation;
furthermore, because signed and unsigned integer types have the same
representation over the common value range, it is useful to know when a
minus zero can appear.

The suggested change is to alter the last part of this paragraph:

    If the sign bit is one, then the value shall be modified in one of the
    following ways:
    -- the corresponding value with sign bit 0 is negated;
    -- the sign bit has the value -2^N;
    -- the sign bit has the value 1-2^N.

to:
 
    If the sign bit is one, then the value shall be modified in one of the
    following ways:
    -- the corresponding value with sign bit 0 is negated (/sign and
       magnitude/);
    -- the sign bit has the value -2^N (/two's complement/);
    -- the sign bit has the value 1-2^N (/one's complement/).
    The implementation shall document which shall apply, and whether the
    value with sign bit 1 and all value bits 0 (for the first two), or
    with sign bit and all value bits 1 (for one's complement) is a trap
    representation or a normal value. In the case of sign and magnitude
    and one's complement, if this representation is a normal value it is
    called a /negative zero/.

    If the implementation supports negative zeros, then they shall only be
    generated by:
    - the & | ^ ~ << and >> operators with appropriate arguments;
    - the + - * / and % operators where one argument is a negative zero
      and the result is zero;
    - compound assignment operators based on the above cases.
    It is unspecified if these cases actually generate negative zero or
    normal zero, and whether a negative zero becomes a normal zero or
    remains a negative zero when stored in an object.

    If the implementation does not support negative zeros, the behavior
    of an & | ^ ~ << or >> operator with appropriate arguments is undefined.
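
The three rules can be made concrete with a small decoder. This is a sketch
only: it assumes that N is the number of value bits and that the sign bit
weights are -2^N and 1-2^N. The names decode and sign_scheme are
hypothetical.

```c
#include <assert.h>

enum sign_scheme { SIGN_MAGNITUDE, TWOS_COMPLEMENT, ONES_COMPLEMENT };

/* Decode a representation made of one sign bit and n value bits.
   value_bits holds the n value bits as an unsigned quantity. */
long decode(unsigned sign, unsigned long value_bits, unsigned n,
            enum sign_scheme scheme)
{
    assert(n >= 1 && n <= 30);
    assert(sign <= 1);
    assert(value_bits < (1UL << n));

    long v = (long)value_bits;
    long two_n = 1L << n;          /* 2^N */

    if (sign == 0)
        return v;
    switch (scheme) {
    case SIGN_MAGNITUDE:
        return -v;                 /* value with sign bit 0, negated */
    case TWOS_COMPLEMENT:
        return v - two_n;          /* sign bit has the value -2^N */
    default:
        return v + 1 - two_n;      /* sign bit has the value 1-2^N */
    }
}
```

With N = 7, sign bit 1 and all value bits 0 decodes to zero under sign and
magnitude, and sign bit 1 with all value bits 1 decodes to zero under one's
complement. These are the two "negative zero" patterns named above. The same
all-zero value bits under two's complement decode to -128, the pattern that
the text allows to be a trap representation.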


========
[Item 05]
Category: Normative change to existing feature retaining the original intent
Committee Draft subsection: 6.3.2.3

Title: Null pointer constants should be castable to pointer types

Detailed description:

6.3.2.3p3 says that:

    If a null pointer constant is assigned
    to or compared for equality to a pointer,  the  constant  is
    converted to a pointer of that type.  Such a pointer, called
    a null pointer,

However, this doesn't cover cases such as:

    (char *) 0

which is neither an assignment nor a comparison. Therefore this is not a
null pointer constant, but rather an implementation-defined conversion from
an integer to a pointer. This is clearly an oversight and should be fixed.

Either change:

    If a null pointer constant is assigned
    to or compared for equality to a pointer,  the  constant  is
    converted to a pointer of that type.  Such a pointer, called
    a null pointer,  is  guaranteed  to  compare  unequal  to  a
    pointer to any object or function.

to:

    If a null pointer constant is assigned
    to or compared for equality to a pointer,  the  constant  is
    converted to a pointer of that type. When a null pointer
    constant is converted to a pointer, the result (called a /null
    pointer/) is guaranteed to compare unequal to a pointer to any
    object or function.

or change:

    If a null pointer constant is assigned
    to or compared for equality to a pointer,  the  constant  is
    converted to a pointer of that type.  Such a pointer, called
    a null pointer,  is  guaranteed  to  compare  unequal  to  a
    pointer to any object or function.

    [#4]  Conversion  of  a null pointer to another pointer type
    yields a null pointer of that type.  Any two  null  pointers
    shall compare equal.

to:

    If a null pointer constant is assigned
    to or compared for equality to a pointer,  the  constant  is
    converted to a pointer of that type.

    A /null pointer/ is a special value of any given pointer type
    that is guaranteed to compare unequal to a pointer to any
    object or function. Conversion of a null pointer constant to a
    pointer type, or of a null pointer to another pointer type,
    yields a null pointer of that type.  Any two null pointers
    shall compare equal.
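
The behavior that either change is meant to guarantee can be sketched as
below. This is an illustration under the assumption that one of the proposed
wordings is in force; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the result of casting the constant 0 behaves as a null
   pointer: equal to NULL and unequal to the address of an object. */
int cast_zero_is_null(void)
{
    static char object;
    char *p = (char *) 0;   /* the cast itself is the construct at issue */
    return p == NULL && p != &object;
}

/* Hypothetical self-check wrapping the function above. */
void check_cast_zero(void)
{
    assert(cast_zero_is_null());
}
```

Under the unmodified wording the value of p would depend on the
implementation-defined integer-to-pointer conversion, so neither comparison
would be guaranteed.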


========
[Item 06]
Category: Inconsistency
Committee Draft subsection: 6.4

Title: UCNs as preprocessing-tokens

Detailed description:

In 6.4 the syntax for "preprocessing-token" includes:
    identifier
    each universal-character-name that cannot be one of the above

In 6.4.2.1 the syntax for "identifier" includes:
    identifier:
        identifier-nondigit
        identifier identifier-nondigit
        identifier digit

    identifier-nondigit:
        nondigit
        universal-character-name
        other implementation-defined characters

Therefore a universal-character-name is always a valid identifier
preprocessing token, and so the second alternative can never apply. It is
true that 6.4.2.3p3 makes certain constructs undefined, but this does not
alter the tokenisation.

There are two ways to fix this situation. The first is to delete the second
alternative for preprocessing-token. The second would be to add text to
6.4p3, or as a footnote, along the following lines:

    The alternative "each universal-character-name that cannot be one of
    the above" can never occur in the initial tokenisation of a program in
    translation phase 3. However, if an identifier includes a
    universal-character-name that is not listed in Annex I, the
    implementation may choose to retokenise using this alternative.


========
[Item 07]
Category: Normative change where the intent is unclear
Committee Draft subsection: 6.7.5.2

Title: Side effects in VLAs

Detailed description:

There has been a long discussion on both the reflector and comp.std.c
about the issue of side effects in VLA types. I do not intend to repeat
all the arguments for and against. However, after considering all the
issues it seems to me that the problems are to do with function prototypes
more than they are to do with VLAs or side effects. In particular, code
such as:

    int n;
    /* ... */
    int vla [n++];

is both meaningful and easy to compile; the problems come with constructs
like:

    void f (int n, int a [n++]);

Therefore the changes proposed below implement the following principles:

* If a declarator or an abstract-declarator is not part of a parameter-
  declaration, any expressions occurring in variably modified types are
  evaluated in the normal way, including all function calls and side
  effects.
* If the declarator or abstract-declarator *is* part of a parameter-
  declaration, any expressions within array declarators are *not* evaluated,
  and thus the behaviour is as if all such expressions were replaced with
  "*" (though the latter would be forbidden in a function definition).

An optional addition to the proposal enforces the latter rule with a
constraint that forbids side effects and function calls in such expressions.

Recommended changes:

In 6.7.5.2, add a new heading and paragraph before the constraints:

    Definitions

    The direct-declarator or direct-abstract-declarator which forms the
    array declarator can occur in one of three contexts, each of which
    is given a name:
    (1) It is derived from the parameter-type-list of the declarator of
    a function definition. This is a /parameter-array-declarator/.
    (2) It is derived from some other parameter-type-list. This is a
    /prototype-array-declarator/.
    (3) It is not derived from a parameter-type-list. This is an
    /actual-array-declarator/.

In the existing paragraph 3, replace the two sentences:

    If the size expression is not a constant expression, and it is
    evaluated at program execution time, it shall evaluate to a value
    greater than zero. It is unspecified whether side effects are
    produced when the size expression is evaluated.

with:

    If the size expression is part of a parameter-array-declarator or
    of an actual-array-declarator, and is not a constant expression, it
    shall be evaluated at program execution time (in the former case on
    each entry to the function) and shall evaluate to a value greater
    than zero. Otherwise it shall not be evaluated.

Optional addition: add a further constraint:

    A parameter-array-declarator or prototype-array-declarator shall not
    contain a function call operator, nor shall it contain any operator
    which can modify an object.
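
The actual-array-declarator case can be sketched as follows. This is a
minimal example assuming the proposed rule that the size expression is
evaluated, side effects included, when the declaration is reached. The
function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Declares a VLA whose size expression increments *n, and returns the
   number of elements the array was given. */
size_t vla_elements_after_increment(int *n)
{
    assert(n != NULL && *n > 0);
    int vla[(*n)++];        /* sized from the old value of *n */
    vla[0] = 0;
    return sizeof vla / sizeof vla[0];
}
```

The corresponding parameter form, "void f (int n, int a [n++])", is
deliberately not shown: under the proposal its size expression would be
evaluated only in a function definition, and the optional constraint would
reject it outright.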


========
[Item 08]
Category: Normative change to existing feature retaining the original intent
Committee Draft subsection: 6.8.5

Title: Error in new for syntax

Detailed description:

C9X adds a new form of syntax for for statements:

    for ( declaration ; expr-opt ; expr-opt ) statement

However, 6.7 states that /declaration/ *includes* the trailing semicolon.
The simplest solution is to remove the corresponding semicolon in 6.8.5 and
not worry about the informal use of the term in 6.8.5.3p1. Alternatively
the syntax needs to be completely reviewed to allow the term to exclude the
trailing semicolon.
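
For reference, a use of the new form is sketched below; the function sum_to
is hypothetical. The text "int i = 1;" is a complete declaration that ends
with its own semicolon, so only one further semicolon appears inside the
parentheses.

```c
#include <assert.h>

/* Sums the integers 1..n using a loop variable declared in the for
   statement itself. */
int sum_to(int n)
{
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```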


========
[Item 09]
Category: Inconsistency
Committee Draft subsection: 6.9

Title: References to sizeof not allowing for VLAs

Detailed description:

6.9p3 and p5 use sizeof without allowing for VLAs. In each case, change
the parenthetical remark:

    (other than as a part of the operand of a sizeof operator)

to:

    (other than as a part of the operand of a sizeof operator which
    is not evaluated)


========
[Item 10, based on PC-UK0027]
Category: Normative change to existing feature retaining the original intent
Committee Draft subsection: 6.10.3

Title: Problems with extended characters in object-like macros

Detailed description:

When an object-like macro is #defined, there is no requirement for a
delimiter between the macro identifier and the replacement list. This can
be a problem when extended characters are involved - for example, some
implementations view $ as valid in a macro identifier while others do not.
Thus the line:

    #define THIS$AND$THAT(x) ((x)+42)

can be parsed in either of two ways:

    Identifier       Arguments   Replacement list
    THIS             -           $AND$THAT(x) ((x)+42)
    THIS$AND$THAT    x           ((x)+42)

TC1 addressed this by requiring the use of a space in certain circumstances
so as to eliminate the ambiguity. However, this requirement has been
removed in C9X for good reasons. Regrettably this reintroduces the original
ambiguity.

The simplest solution seems to be to require a space - or possibly one of
the basic graphic characters - between the identifier and the replacement
list. This change is unlikely to affect much code, and is not a Quiet
Change - it will require a diagnostic for any affected code. It has the big
advantage of eliminating the ambiguity.

Insert a new Constraint in 6.10.3, either:

    In the definition of an object-like macro there shall be white space
    between the identifier and the replacement list.

or

    In the definition of an object-like macro there shall be white space
    between the identifier and the replacement list unless the replacement
    list begins with one of the 26 graphic characters in the required
    character set other than ( _ or \.


========
[Item 11]
Category: Feature that should be included
Committee Draft subsection: 7.8

Title: Missing functions for intmax_t values

Detailed description:

Several utility functions have versions for types int and long int, and when
long long was added corresponding versions were added. Then when intmax_t
was added to C9X, further versions were provided for some of these
functions. However, three cases were missed. For intmax_t to be useful to
the same audience as other features of the Standard, these three functions
should be added. Obviously they should be added to <inttypes.h>.

Add a new subclause 7.8.3:

    7.8.3  Miscellaneous functions

    7.8.3.1  The atoimax function

    Synopsis

        #include <inttypes.h>
        intmax_t atoimax(const char *nptr);

    Description

    The atoimax function converts the initial portion of the string
    pointed to by nptr to intmax_t representation. Except for the
    behaviour on error, it is equivalent to

        strtoimax(nptr, (char **)NULL, 10)

    The function atoimax need not affect the value of the integer
    expression errno on an error. If the value of the result cannot
    be represented, the behavior is undefined.

    Returns

    The atoimax function returns the converted value.

    7.8.3.2  The imaxabs function

    Synopsis

        #include <inttypes.h>
        intmax_t imaxabs(intmax_t j);

    Description

    The imaxabs function computes the absolute value of an integer j.
    If the result cannot be represented, the behavior is undefined.

    Returns

    The imaxabs function returns the absolute value.

    7.8.3.3  The imaxdiv function

    Synopsis

        #include <inttypes.h>
        imaxdiv_t imaxdiv(intmax_t numer, intmax_t denom);

    Description

    The imaxdiv function computes numer/denom and numer%denom in a
    single operation.

    Returns

    The imaxdiv function returns a structure of type imaxdiv_t,
    comprising both the quotient and the remainder. The structure shall
    contain (in either order) the members quot (the quotient) and rem
    (the remainder), each of which has the type intmax_t. If either
    part of the result cannot be represented, the behavior is undefined.
 
7.8 paragraph 2 will need consequential changes.
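
The intended use of the three functions can be sketched as below. The sketch
assumes an <inttypes.h> that provides strtoimax, imaxabs, imaxdiv and
imaxdiv_t with the signatures given above. The names atoimax_sketch and
distance_in_blocks are hypothetical, and atoimax itself is written as a
wrapper rather than assumed to exist.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed atoimax, built on the
   equivalence stated in its description. */
intmax_t atoimax_sketch(const char *nptr)
{
    assert(nptr != NULL);
    return strtoimax(nptr, (char **)NULL, 10);
}

/* Splits the distance between a and b into whole blocks and a leftover,
   using imaxabs and imaxdiv on intmax_t values. */
intmax_t distance_in_blocks(intmax_t a, intmax_t b, intmax_t block,
                            intmax_t *leftover)
{
    assert(block != 0 && leftover != NULL);
    imaxdiv_t r = imaxdiv(imaxabs(a - b), block);
    *leftover = r.rem;
    return r.quot;
}
```

Without these functions the same code would have to fall back on the long
long versions and hope that intmax_t is no wider than long long.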


========
[Item 12]
Category: Clarification
Committee Draft subsection: 7.19.5.1

Title: Clarify meaning of a failed fclose

Detailed description:

If a call to fclose() fails it is not clear whether:
- it is still possible to access the stream;
- fflush(NULL) will attempt to flush the stream.
It is probably best to take the view that fclose always "closes" the stream
as far as the program is concerned, whether or not the process was fully
successful.

The existing wording - read strictly - also requires the full list of
actions to be carried out successfully whether or not the call fails. This
is clearly an oversight.

Change 7.19.5.1p2 to read:

    A successful call to the fclose function causes the stream pointed
    to by stream to be flushed and the associated file to be closed.
    Any unwritten buffered data for the stream are delivered to the
    host environment to be written to the file; any unread buffered
    data are discarded. Whether or not the call succeeds, the stream
    is disassociated from the file, and if the associated buffer was
    automatically allocated, it is deallocated.
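
A program written to the proposed rule would look roughly like the sketch
below. The function write_and_close is hypothetical. The point is only that
the FILE pointer is never used again after fclose, whatever fclose returns.

```c
#include <assert.h>
#include <stdio.h>

/* Writes msg to a temporary file and closes it. Returns 0 on success and
   -1 if the file could not be created, written, or closed. */
int write_and_close(const char *msg)
{
    assert(msg != NULL);
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    int write_failed = (fputs(msg, fp) == EOF);
    int close_failed = (fclose(fp) == EOF);
    fp = NULL;   /* the stream is treated as closed even if fclose failed */
    return (write_failed || close_failed) ? -1 : 0;
}
```

A failed fclose is reported to the caller, but no attempt is made to retry
the close or to flush the stream again.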


========
[Item 13]
Category: Correction restoring original intent
Committee Draft subsection: 7.23.3.7

Title: Wrong time system notation used

Detailed description:

In 7.23.3.7p2, the expression "UTC-UT1" appears. This should read "TAI-UTC".




-- 
Clive D.W. Feather   | Regulation Officer, LINX | Work: <[email protected]>
Tel: +44 1733 705000 | (on secondment from      | Home: <[email protected]>
Fax: +44 1733 353929 |  Demon Internet)         | <http://i.am/davros>
Written on my laptop; please observe the Reply-To address