ISO/IEC JTC1/SC22/WG14 N685


                             C9X Revision Proposal
                             =====================

     WG14/N685 (J11/97-048)                 WG14/N685 (J11/97-048)

     Title: Compatibility Issues with Union Members
     Author: Tom MacDonald and Bill Homer
     Author Affiliation: Cray Research, an SGI company
     Postal Address:     Cray Research Park
                         655F Lone Oak Drive
                         Eagan,  MN  55121
                         USA
     E-mail Address:     [email protected]      [email protected]
     Document Number:    WG14/N685   (J11/97-048)
     Telephone Number:   +1-612-683-5818
     Fax Number:         +1-612-683-5307
     Sponsor: J11
     Date: 08-May-1997
     Proposal Category:
        __ Editorial change/non-normative contribution
        __ Correction
        XX New feature
        __ Addition to obsolescent feature list
        __ Addition to Future Directions
        __ Other (please specify)  Change of current behavior
     Area of Standard Affected:
        __ Environment
        XX Language
        __ Preprocessor
        __ Library
           __ Macro/typedef/tag name
           __ Function
           __ Header
        __ Other (please specify)  ______________________________
     Prior Art:  Tag Compatibility
     Target Audience:  all C programmers
     Related Documents (if any):  NONE
     Proposal Attached: XX Yes   __ No, but what's your interest?

     Abstract: This paper discusses how common initial sequences
               in union members reintroduce the problem solved by
               the "tag compatibility" changes.  Proposed edits to
               C9X to ameliorate this problem are also provided.

     ======================= Cover sheet ends here ==============

 Introduction:

 A new issue related to the tag compatibility issue has surfaced.
 As mentioned in recent email discussion of the representations of
 pointers, there is an "aliasing loophole" for union objects having
 structure members with a common initial sequence of members.

 The issue is that the loophole as written has the (apparently
 unintended) consequence of making alias analysis more difficult for
 _any_ pair of pointers to different structures that happen to share a
 common initial sequence.  See below for details and a proposal to limit
 the consequences of the loophole while still allowing programmers to
 take advantage of it when convenient.

 Historical perspective:

 If you remember, the following example is no longer considered to be
 strictly conforming:


		 x.c                        |        y.c
	 ___________________________________|________________________________
					    |
	  #include <stdio.h>                |  struct tag2 {
	  int func();                       |     int m1, m2;
					    |  };
	  struct tag1 {                     |  struct tag3 {
	     int m1, m2;                    |     int m1, m2;
	  } st1;                            |  };
					    |
	  main() {                          |  int func(struct tag2 *pst2,
	     if (func(&st1, &st1)) {        |           struct tag3 *pst3) {
		printf("optimized\n");      |     pst2->m1 = 2;
	     } else {                       |     pst3->m1 = 0;   /* alias? */
		printf("unoptimized\n");    |     return pst2->m1;
	     }                              |  }
	  }                                 |


 in that a highly optimizing compiler might produce "optimized" as
 the output of the program, while the same compiler with optimization
 turned off might produce "unoptimized" as the output.  This is because
 translation unit (TU) y.c defines "func" with two parameters that are
 pointers to different structure types, while TU x.c calls "func" with
 the address of the same structure for both arguments.

 We made this change to help optimizers, debuggers, lint-like tools, etc.
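
 To make the permitted optimization concrete, here is a minimal sketch
 of the transformation a translator may apply to "func" once struct tag2
 and struct tag3 are no longer compatible types.  The names
 func_as_written and func_as_if_no_alias are hypothetical and are used
 only for illustration; no particular compiler is claimed to generate
 exactly this code.  When the two pointers designate distinct objects,
 both forms return the same value.

```c
struct tag2 { int m1, m2; };
struct tag3 { int m1, m2; };

/* The function from y.c, unchanged apart from its (hypothetical) name. */
int func_as_written(struct tag2 *pst2, struct tag3 *pst3) {
    pst2->m1 = 2;
    pst3->m1 = 0;      /* affects the reload below only if the pointers alias */
    return pst2->m1;   /* reload from memory */
}

/* What a translator may produce when it assumes the pointers do not alias:
   the reload of pst2->m1 is replaced by the constant that was just stored. */
int func_as_if_no_alias(struct tag2 *pst2, struct tag3 *pst3) {
    pst2->m1 = 2;
    pst3->m1 = 0;
    return 2;
}
```

 The two forms differ only when both arguments refer to the same storage,
 which is exactly the call made in x.c.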

 New issue:

 Consider the following:


		 w.c
	 ___________________________________

	  #include <stdio.h>
	  int similar_func();

	  union utag {
	    struct tag1 {
	       int m1;
	       double d2;
	    } st1;
	    struct tag2 {
	       int m1;
	       char c2;
	    } st2;
	  } un1;

	  main() {
	     if (similar_func(&un1.st1, &un1.st2)) {
		printf("optimized\n");
	     } else {
		printf("unoptimized\n");
	     }
	  }


 Since unions are allowed to have structure members with a common initial
 sequence, and you can modify one structure and then inspect the common
 initial part through a different structure, most of the perceived benefit
 of the new tag compatibility rules is lost: a translator must assume that
 any two pointers to such structures might designate members of the same
 union object.
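
 The special guarantee being relied on can be sketched as follows.  This
 is an illustration only: the function name inspect_common_part is
 hypothetical, and both accesses are written through the union object
 itself, which is the uncontroversial use of the rule.

```c
union utag {
    struct tag1 { int m1; double d2; } st1;
    struct tag2 { int m1; char c2; } st2;
};

/* Store through one structure member of the union, then inspect the
   common initial part (m1) through the other structure member. */
int inspect_common_part(union utag *pu) {
    pu->st1.m1 = 2;
    pu->st1.d2 = 1.5;
    return pu->st2.m1;   /* m1 belongs to the common initial sequence */
}
```

 Only m1 is shared; reading pu->st2.c2 after storing pu->st1.d2 is not
 covered by the guarantee.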

 Proposed solution:

 The proposed solution is to require that a union declaration be visible
 wherever aliases through a common initial sequence (like the above) are
 possible.  Therefore the following TU permits this kind of aliasing,
 because the declaration of union utag is visible in similar_func:

 union utag {
   struct tag1 { int m1; double d2; } st1;
   struct tag2 { int m1; char c2; } st2;
 };

 int similar_func(struct tag1 *pst2, struct tag2 *pst3) {
   pst2->m1 = 2;
   pst3->m1 = 0;   /* might be an alias for pst2->m1 */
   return pst2->m1;
 }


 Here are some proposed words for C9X (with change bars in the left
 margin):

 ----------------------------------------------------------------------

	C9X Draft 9-pre3, 1997-02-15, WG14/Nxxx J11/97-xxx

	6.3.2.3  Structure and union members

	Constraints

	[#5] With one exception, if a member of a  union  object  is
	accessed after a value has been stored in a different member
	of the object, the  behavior  is  implementation-defined.54
	One  special  guarantee is made in order to simplify the use
	of unions:  If a  union  contains  several  structures  that
	share  a  common  initial  sequence  (see below), and if the
	union object currently contains one of these structures,  it
	is  permitted  to  inspect the common initial part of any of
	them.  Two structures share a  common  initial  sequence  if
	    ^
 |          anywhere that a declaration of the completed type of the
 |          union is visible
	corresponding  members  have compatible types (and, for bit-
	fields, the same widths) for  a  sequence  of  one  or  more
	initial members.


	__________

	54. The ``byte orders'' for scalar types  are  invisible  to
	    isolated  programs  that  do not indulge in type punning
	    (for example, by assigning to one member of a union  and
	    inspecting  the storage by accessing another member that
	    is an appropriately sized array of character type),  but
	    must  be  accounted  for  when  conforming to externally
	    imposed storage layouts.


	Examples                                                      |

	[#6]

	  1.  If f is a function returning a structure or union, and
	      x  is  a member of that structure or union, f().x is a
	      valid postfix expression but is not an lvalue.

	  2.  The following is a valid fragment:                      |

		      union {
			      struct {
				      int     alltypes;
			      } n;
			      struct {
				      int     type;
				      int     intnode;
			      } ni;
			      struct {
				      int     type;
				      double  doublenode;
			      } nf;
		      } u;
		      u.nf.type = 1;
		      u.nf.doublenode = 3.14;
		      /*...*/
		      if (u.n.alltypes == 1)
			      /*...*/ sin(u.nf.doublenode) /*...*/

 |          3.  The following is not a valid fragment (because
 |              the union type is not visible within function f):
 |
 |                    struct t1 { int m; };
 |                    struct t2 { int m; };
 |                    int f(struct t1 * p1, struct t2 * p2) {
 |                        if ( p1->m < 0 ) {
 |                            p2->m = -p2->m;
 |                        }
 |                        return p1->m;
 |                    }
 |                    int g() {
 |                        union { struct t1 s1; struct t2 s2; } u;
 |                        /* ... */
 |                        return f(&u.s1, &u.s2);
 |                    }
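
 For contrast (commentary, not part of the proposed wording): a variant
 of example 3 that does not depend on the visibility rule at all passes
 a pointer to the union, so that every access names the union object.
 The tag u12 and the function name f_via_union are hypothetical.

```c
struct t1 { int m; };
struct t2 { int m; };
union u12 { struct t1 s1; struct t2 s2; };   /* hypothetical tag */

/* Same logic as f in example 3, but each access goes through the union,
   so the common initial sequence guarantee plainly applies. */
int f_via_union(union u12 *pu) {
    if (pu->s1.m < 0) {
        pu->s2.m = -pu->s2.m;
    }
    return pu->s1.m;
}
```

 Called with a union whose s1.m holds a negative value, f_via_union
 returns the negated (positive) value, because the store through s2 and
 the reads through s1 refer to the same member of the common initial
 sequence.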