Doc. No.: | WG14/N1253 |
---|---|
Date: | 2007-09-06 |
Reply to: | Clark Nelson |
Phone: | +1-503-712-8433 |
Email: | [email protected] |
Consider the following program:

    #include <stdio.h>

    struct X { char a[8]; };

    struct X salutation()
    {
        struct X result = { "Hello" };
        return result;
    }

    struct X addressee()
    {
        struct X result = { "world" };
        return result;
    }

    int main()
    {
        printf("%s, %s!\n", salutation().a, addressee().a);
        return 0;
    }

In C++, this is a perfectly valid program, and prints "Hello, world!".
As far as C99 is concerned, it is syntactically valid, and violates no constraint. However, it has undefined behavior, because of the last sentence of 6.5.2.2p5: "If an attempt is made to modify the result of a function call or to access it after the next sequence point, the behavior is undefined." And in this example, for each call to a struct-returning function, the sequence point preceding the call to `printf` must necessarily come between the call and the access by `printf` of the string in the returned object.
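For comparison, here is a minimal sketch of how the same text could be produced without accessing a function result after a sequence point: each returned struct is first copied into a named object. The helper name `format_greeting`, and the use of `snprintf` in place of `printf` so that the result can be inspected, are assumptions made for illustration; they are not part of the original program.

```c
#include <stdio.h>

struct X { char a[8]; };

struct X salutation(void)
{
    struct X result = { "Hello" };
    return result;
}

struct X addressee(void)
{
    struct X result = { "world" };
    return result;
}

/* Hypothetical helper: writes the greeting into buf and returns the
   value produced by snprintf. */
int format_greeting(char *buf, size_t size)
{
    /* Copy each function result into a named object. The copies are
       ordinary lvalues with automatic storage duration, so reading
       them after later sequence points is well defined. */
    struct X s = salutation();
    struct X w = addressee();

    return snprintf(buf, size, "%s, %s!", s.a, w.a);
}
```

Called with a buffer of at least 14 bytes, `format_greeting` stores the same text that the original program is intended to print, at the cost of two extra declarations.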
What purpose is served by declaring programs such as this one to be non-conformant?
Let's consider another example. With the exception of `main`, all functions and declarations from the previous example are part of this example as well:

    int f() { return 'i'; }

    int main()
    {
        printf("%c%c!\n", salutation().a[0], f());
        return 0;
    }

As before, in C++ the behavior of this program is well-defined: it prints "Hi!". But in C99 it apparently has undefined behavior. This time, there is only one access to an object resulting from a call to a struct-returning function, and it is right in plain sight. But unfortunately, the standard allows the possibility that the sequence point preceding the call to `f` might come between the call to `salutation` and accessing (a single byte of) the value of the returned object.
It should also be noted that in this case the undefined behavior doesn't depend on the fact that the member being accessed from the returned struct is an array; as far as the standard is concerned, undefined behavior is equally possible if a scalar member is selected. In other words, in adding to C99 the possibility of referencing an array member of a non-lvalue struct, accessing any member of a non-lvalue struct was turned into undefined behavior, if the full expression containing the non-lvalue struct contains any other operation that might introduce a sequence point. Formally speaking, this "broke" some conforming C89 programs -- depending on the strictness with which one interprets the phrase "after the next sequence point".
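The scalar case can be sketched as follows. The names `struct P`, `origin`, and `sum_member_and_call` are invented for illustration. The expression in the return statement is of a kind that was unremarkable in C89; under the strict reading described above, an implementation that calls `origin`, then calls `f`, and only then reads the member `x` would be accessing the function result after a sequence point.

```c
struct P { int x; char a[8]; };

struct P origin(void)
{
    struct P result = { 7, "Hi" };
    return result;
}

int f(void) { return 'i'; }

/* Hypothetical example: one full expression that selects a scalar
   member of a non-lvalue struct and also contains another call. */
int sum_member_and_call(void)
{
    /* The order of evaluation of the two operands of + is
       unspecified, so the sequence point before the call to f may
       fall between the call to origin and the read of .x. */
    return origin().x + f();
}
```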
It might be possible to "patch" this problem, while substantially retaining the rule that a non-lvalue (a.k.a. temporary) struct can't be accessed after the "next" sequence point. But C++ has a very similar problem (in spades, actually), and the rule in C++ is that the lifetime of a temporary ends at the end of evaluation of the containing full expression. If the rule from C99 needs to be adjusted (and I think it does, if only to recover the ground that was lost from C89), then serious consideration should be given to switching to the C++ rule. This would tend to increase compatibility of both programs and implementations between C and C++.
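As an illustration of what an end-of-full-expression rule would and would not permit, consider the sketch below. The helper name `copy_salutation` is hypothetical. Under the current C99 wording the call to `strcpy` has the same problem as the call to `printf` in the first example; under a rule like that of C++, the temporary would remain alive until the whole expression statement has been evaluated.

```c
#include <string.h>

struct X { char a[8]; };

struct X salutation(void)
{
    struct X result = { "Hello" };
    return result;
}

/* Hypothetical helper: dest must point to at least 8 bytes. */
void copy_salutation(char *dest)
{
    /* The temporary returned by salutation() is read inside strcpy,
       that is, before the end of the containing full expression. */
    strcpy(dest, salutation().a);

    /* By contrast, saving the pointer, as in
           const char *p = salutation().a;
       and dereferencing p in a later statement would use the
       temporary after the end of its full expression, and so would
       remain undefined under either rule. */
}
```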
Assuming the committee agrees that this is a problem that should be addressed, I am not certain what editorial approach the committee would prefer. Two different approaches have occurred to me.
The simplest and most direct would be to change the five relevant occurrences of the phrase "after the next sequence point" (in the descriptions of function call, conditional operator, assignment operator, comma operator, and the summary of undefined behavior in annex J) to instead read something like "after the end of the containing full expression". Unfortunately that wouldn't quite handle cases where the operations in question are contained in a full declarator instead of a full expression. But it would probably work to say that all the subexpressions of a full declarator together constitute a full expression.
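A small sketch of the full-declarator case, assuming an implementation that supports variable length arrays; the names `struct Dims`, `dims`, and `scratch_size` are invented for illustration. The size expression is part of a declarator, so by the definition in 6.8p4 it is not a full expression, and wording that speaks only of "the containing full expression" would not cover it.

```c
#include <stddef.h>

struct Dims { int count; };

struct Dims dims(void)
{
    struct Dims result = { 4 };
    return result;
}

/* Hypothetical example: a member of a non-lvalue struct is accessed
   inside an array declarator rather than inside a full expression. */
size_t scratch_size(void)
{
    /* dims().count is evaluated as part of the full declarator of
       scratch; the sequence point that follows it is the end of
       that full declarator. */
    int scratch[dims().count];

    return sizeof scratch / sizeof scratch[0];
}
```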
The other approach would be more basic and more thorough. It would ascribe storage duration and lifetime to these non-lvalue objects. Then the five specific references described above would simply vanish, being subsumed by the general statement that accessing an object after the end of its lifetime gives undefined behavior. Besides getting rid of four special-case rules (the fifth occurrence being only their summary in annex J), this approach would have the advantage that the terms of description would be more compatible with those used in C++.