* fun with ?:
@ 2007-05-19  2:52 Al Viro
  2007-05-22 21:40 ` Josh Triplett
  0 siblings, 1 reply; 24+ messages in thread
From: Al Viro @ 2007-05-19  2:52 UTC (permalink / raw)
  To: linux-sparse; +Cc: Linus Torvalds

There's an unpleasant case in conditional operator we are getting
wrong.
	int *p;
	const void *v;
	int n;

	n ? p : (const void *)0

According to C standard, the type of that expression is const void *.  Note
that
	n ? p : (void *)0
is an entirely different story - it's int *.

What's going on here is pretty simple: there are two degenerate cases of
conditional operator: pointer vs. null pointer constant and pointer vs.
possibly qualified pointer to void.  Look at these cases:
	n ? p : NULL	=> should be the same type as p
	n ? p : v	=> clearly const void * - pointer to void with union of
qualifiers; in this case we obviously lose any information about the type
of object being pointed to.

The tricky part comes from definition of what null pointer constant _is_.
C allows two variants - integer constant expression with value 0 (we accept
it, but warn about bad taste) and the same cast to void * (we also accept
that, of course).

Note that this is specific type - pointer to void.  Without any qualifiers.
We are guaranteed that we can convert it to any pointer type and get
a pointer distinct from address of any object.  So (const void *)0 is the same
thing as (const void *)(void *)0 and it is the null pointer to const void.
*HOWEVER*, it is not a null pointer constant.  The standard is clear here and
frankly, it's reasonable.  If you cast to anything other than void *, then
you presumably mean it and want the conversion rules as for any pointer
of that type.  Think of something like
	#ifdef FOO
	const void *f(int n);
	#else
	#define f(n) ((const void *)NULL)
	#endif
You don't want to have types suddenly change under you depending on FOO.

sparse is more liberal than standard C in what it accepts as null pointer
constant.  It almost never matters; however, in case of conditional operator
we end up with a different type for an expression both sparse and any
C compiler will accept as valid.

I'm fixing other fun stuff in that area (e.g. we ought to take a union of
qualifiers, ought _not_ to mix different structs or unions, etc.), so
unless there are serious objections I'd rather go with standard behaviour
in that case.  What will change:

	int n;
	int *p;

	n ? p : (const void *)NULL    int * => const void *
	n ? p : (const void *)0       ditto
	n ? p : (char *)0             int * => a warning on mixing int * with char *
	n ? p : (char *)NULL          ditto
	n ? p : (void *)NULL          int * => void *
	n ? p : (void *)0             unchanged
	n ? p : NULL                  unchanged

Objections?

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: fun with ?: 2007-05-19 2:52 fun with ?: Al Viro @ 2007-05-22 21:40 ` Josh Triplett 2007-05-22 22:46 ` Al Viro 0 siblings, 1 reply; 24+ messages in thread From: Josh Triplett @ 2007-05-22 21:40 UTC (permalink / raw) To: Al Viro; +Cc: linux-sparse, Linus Torvalds [-- Attachment #1: Type: text/plain, Size: 3462 bytes --] Al Viro wrote: > There's an unpleasant case in conditional operator we are getting > wrong. > int *p; > const void *v; > int n; > > n ? p : (const void *)0 > > According to C standard, the type of that expression is const void *. Note > that > n ? p : (void *)0 > is an entirely different story - it's int *. That much actually makes sense to me. You can convert from (int *) to (const void *), but not from (const void *) to (int *), so I'd expect the behavior of the first case. You *can* convert bidirectionally between (void *) and (int *), so I expect the behavior of the second case as well. > What's going on here is pretty simple: there are two degenerate cases of > conditional operator: pointer vs. null pointer constant and pointer vs. > possibly qualified pointer to void. Look at these cases: > n ? p : NULL => should be the same type as p > n ? p : v => clearly const void * - pointer to void with union of > qualifiers; in this case we obviously lose any information about the type > of object being pointed to. I didn't actually know about the special case for a null pointer constant. > The tricky part comes from definition of what null pointer constant _is_. > C allows two variants - integer constant expression with value 0 (we accept > it, but warn about bad taste) and the same cast to void * (we also accept > that, of course). Right. > Note that this is specific type - pointer to void. Without any qualifiers. > We are guaranteed that we can convert it to any pointer type and get > a pointer distinct from address of any object. So (const void *)0 is the same > thing as (const void *)(void *)0 and it is the null pointer to const void. 
> *HOWEVER*, it is not a null pointer constant. The standard is clear here and > frankly, it's reasonable. If you cast to anything other than void *, then > you presumably mean it and want the conversion rules as for any pointer > of that type. Think of something like > #ifdef FOO > const void *f(int n); > #else > #define f(n) ((const void *)NULL) > #endif > You don't want to have types suddenly change under you depending on FOO. Definitely not. I don't want qualifiers to disappear just because I applied them to NULL. > sparse is more liberal than standard C in what it accepts as null pointer > constant. It almost never matters; however, in case of conditional operator > we end up with a different type for an expression both sparse and any > C compiler will accept as valid. > > I'm fixing other fun stuff in that area (e.g. we ought to take a union of > qualifiers, ought _not_ to mix different structs or unions, etc.), so > unless there are serious objections I'd rather go with standard behaviour > in that case. What will change: > > int n; > int *p; > > n ? p : (const void *)NULL int * => const void * > n ? p : (const void *)0 ditto > n ? p : (char *)0 int * => a warning on mixing int * with char * > n ? p : (char *)NULL ditto > n ? p : (void *)NULL int * => void * > n ? p : (void *)0 unchanged > n ? p : NULL unchanged > > Objections? Thanks for the clear explanation. I think following the standard seems like a good idea here, though I find some of the cases somewhat unintuitive and potentially error-prone. In particular: > n ? p : (void *)NULL int * => void * Shouldn't this have type int * just like n ? p : NULL ? - Josh Triplett [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: fun with ?:
  2007-05-22 21:40 ` Josh Triplett
@ 2007-05-22 22:46   ` Al Viro
  2007-05-22 23:24     ` Josh Triplett
  0 siblings, 1 reply; 24+ messages in thread
From: Al Viro @ 2007-05-22 22:46 UTC (permalink / raw)
  To: Josh Triplett; +Cc: linux-sparse, Linus Torvalds

On Tue, May 22, 2007 at 02:40:11PM -0700, Josh Triplett wrote:
> > What's going on here is pretty simple: there are two degenerate cases of
> > conditional operator: pointer vs. null pointer constant and pointer vs.
> > possibly qualified pointer to void.  Look at these cases:
> > 	n ? p : NULL	=> should be the same type as p
> > 	n ? p : v	=> clearly const void * - pointer to void with union of
> > qualifiers; in this case we obviously lose any information about the type
> > of object being pointed to.
>
> I didn't actually know about the special case for a null pointer constant.

Rationale is pretty simple: normally if you have void * in the mix, you
_can't_ expect more type information from the result; i.e. you are not
promised that result of ?: will point to int.  However, null pointer constant
is a chameleon - it accepts whatever pointer type you might need in given
context.  So in that case you do _not_ lose the type information.

> In particular:
> > 	n ? p : (void *)NULL		int * => void *
> Shouldn't this have type int * just like n ? p : NULL ?

No.  It's "void * and I _mean_ it".  Well... actually (void *)(void *)0 if
you want to be 100% portable and protect yourself against cretinous systems
that define NULL to 0.

Again, null pointer constant is not the same thing as null pointer to void.

BTW, there's another painful area: what do we do to somebody who uses
(void *)(69 + 1 - 70) as null pointer constant?  Currently sparse doesn't
recognize it as such; C standard does.  IMO the right thing to do is
to add a flag that would switch to full-blown standard rules in that area
("integer constant expression returning 0" instead of basically "0 in some
layers of ()") and flame to the crisp any wanker caught at actually doing
that.  Any suggestions re sufficiently violent warning messages?

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: fun with ?: 2007-05-22 22:46 ` Al Viro @ 2007-05-22 23:24 ` Josh Triplett 2007-05-23 0:02 ` Al Viro 0 siblings, 1 reply; 24+ messages in thread From: Josh Triplett @ 2007-05-22 23:24 UTC (permalink / raw) To: Al Viro; +Cc: linux-sparse, Linus Torvalds [-- Attachment #1: Type: text/plain, Size: 2918 bytes --] Al Viro wrote: > On Tue, May 22, 2007 at 02:40:11PM -0700, Josh Triplett wrote: >>> What's going on here is pretty simple: there are two degenerate cases of >>> conditional operator: pointer vs. null pointer constant and pointer vs. >>> possibly qualified pointer to void. Look at these cases: >>> n ? p : NULL => should be the same type as p >>> n ? p : v => clearly const void * - pointer to void with union of >>> qualifiers; in this case we obviously lose any information about the type >>> of object being pointed to. >> I didn't actually know about the special case for a null pointer constant. > > Rationale is pretty simple: normally if you have void * in the mix, you > _can't_ expect more type information from the result; i.e. you are not > promised that result of ?: will point to int. However, null pointer constant > is a chameleon - it accepts whatever pointer type you might need in given > context. So in that case you do _not_ lose the type information. Makes sense, except that in C you can assign a void pointer to an arbitrary * without a warning, so why can't conditional expressions do the equivalent? >> In particular: >>> n ? p : (void *)NULL int * => void * >> Shouldn't this have type int * just like n ? p : NULL ? > > No. It's "void * and I _mean_ it". Well... actually (void *)(void *)0 if > you want to be 100% portable and protect yourself against cretinous systems > that define NULL to 0. > > Again, null pointer constant is not the same thing as null pointer to void. I see. I find it very strange that (void *)0 and (void *)(void *)0 have different behavior. 
I also find it strange that conditional expressions can't convert void * to an arbitrary pointer as assignment can. > BTW, there's another painful area: what do we do to somebody who uses > (void *)(69 + 1 - 70) as null pointer constant? Currenly sparse doesn't > recognize it as such; C standard does. IMO the right thing to do is > to add a flag that would switch to full-blown standard rules in that area > ("integer constant expression returning 0" instead of basically "0 in some > layers of ()") and flame to the crisp any wanker caught at actually doing > that. Any suggestions re sufficiently violent warning messages? I didn't know that the C standard actually *required* constant folding. Interesting. Would it add excessively to compilation time to apply the usual Sparse constant folding here? If so, and if you really think this case matters, let's have an option to turn on this constant folding, and warn whenever we see it. I'll let you come up with the wording; flame away. :) Anyone expecting that behavior has some serious dain-bramage. If constant folding *wouldn't* add excessively to the compilation time, go ahead and handle the insanity the standard way by default, but still warn for the insane case. - Josh Triplett [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: fun with ?: 2007-05-22 23:24 ` Josh Triplett @ 2007-05-23 0:02 ` Al Viro 2007-05-23 0:25 ` Al Viro ` (2 more replies) 0 siblings, 3 replies; 24+ messages in thread From: Al Viro @ 2007-05-23 0:02 UTC (permalink / raw) To: Josh Triplett; +Cc: linux-sparse, Linus Torvalds On Tue, May 22, 2007 at 04:24:49PM -0700, Josh Triplett wrote: > Makes sense, except that in C you can assign a void pointer to an arbitrary * > without a warning, so why can't conditional expressions do the equivalent? {in sparse it's obviously not true due to address_space, but that hadn't affected C standard decisions; the rest applies to standard C} For one thing, no, you can't (pointers to functions have every right to be unrelated to void *). For another, you are not guaranteed that conversion from void * to int * will not yield undefined behaviour. It is OK if the value of void * is properly aligned; then you know that conversion back to void * will give the original pointer. Otherwise you are in nasal demon country. The other way round (int * to void * to int *) you are always safe. Rationale: systems that have pointers to words and char smaller than word. There you can very well have pointers to void and pointers to less-than-word objects bigger than pointers to anything word-sized or bigger (e.g. they can be represented as word address + bit offset). Conversion to the latter will lose offset and compiler is allowed to throw up on that at runtime. Moral: void * -> int * may lose parts of value and trigger undefined behaviour; int * -> void * loses information about type, is always revertible and always safe. In assignment operator it's your responsibility to make sure that void * you are converting to int * is properly aligned (anything that started its life as int * will be). In ?: C could * lose type information, allowing any values (result is void *); if you are sure that it's well-aligned, you can cast void * argument to int * or cast the result. 
* require the void * argument to be well-aligned, keep the type. The former makes more sense... > > Again, null pointer constant is not the same thing as null pointer to void. > > I see. I find it very strange that (void *)0 and (void *)(void *)0 have > different behavior. I also find it strange that conditional expressions can't > convert void * to an arbitrary pointer as assignment can. It would be nicer if C had __null__ as the *only* null pointer constant (with flexible type) and could refuse to accept anything else. Too late for that, unfortunately. As for conversions - see above. > > BTW, there's another painful area: what do we do to somebody who uses > > (void *)(69 + 1 - 70) as null pointer constant? Currenly sparse doesn't > > recognize it as such; C standard does. IMO the right thing to do is > > to add a flag that would switch to full-blown standard rules in that area > > ("integer constant expression returning 0" instead of basically "0 in some > > layers of ()") and flame to the crisp any wanker caught at actually doing > > that. Any suggestions re sufficiently violent warning messages? > > I didn't know that the C standard actually *required* constant folding. > Interesting. Would it add excessively to compilation time to apply the usual > Sparse constant folding here? If so, and if you really think this case > matters, let's have an option to turn on this constant folding, and warn > whenever we see it. Usual sparse constant folding is _almost_ OK, provided that we play a bit with evaluate_expression() and let it decide if subexpression is an integer constant expression. No prototype changes, we just get a global flag and save/restore it around sizeof argument handling and several other places. It's actually pretty easy. And if we get "it is an integer constant expression", well, then caller can call expand stuff. 
"Almost" bit above refers to another bit of insanity, fortunately easily handled; we need division by zero to raise no error and just yield "the entire thing is not an integer constant expression with value 0". That's exactly what longjmp() is for... Speaking of C standard, if you need access to the current one - yell. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: fun with ?: 2007-05-23 0:02 ` Al Viro @ 2007-05-23 0:25 ` Al Viro 2007-05-23 1:05 ` Josh Triplett 2007-05-23 4:53 ` Al Viro 2007-05-23 1:03 ` Josh Triplett 2007-05-23 14:25 ` Neil Booth 2 siblings, 2 replies; 24+ messages in thread From: Al Viro @ 2007-05-23 0:25 UTC (permalink / raw) To: Josh Triplett; +Cc: linux-sparse, Linus Torvalds On Wed, May 23, 2007 at 01:02:34AM +0100, Al Viro wrote: > Moral: void * -> int * may lose parts of value and trigger undefined behaviour; > int * -> void * loses information about type, is always revertible and always > safe. In assignment operator it's your responsibility to make sure that > void * you are converting to int * is properly aligned (anything that started > its life as int * will be). In ?: C could > * lose type information, allowing any values (result is void *); if > you are sure that it's well-aligned, you can cast void * argument to int * > or cast the result. > * require the void * argument to be well-aligned, keep the type. > > The former makes more sense... PS: that's a design decision that had to be made back when void * got added to the language and it had to work for all architectures. IOW, the text above is explanation why it had to be done that way and why we have to stick to it on all targets. > > I see. I find it very strange that (void *)0 and (void *)(void *)0 have > > different behavior. I also find it strange that conditional expressions can't > > convert void * to an arbitrary pointer as assignment can. > > It would be nicer if C had __null__ as the *only* null pointer constant > (with flexible type) and could refuse to accept anything else. Too late > for that, unfortunately. As for conversions - see above. To clarify: all mess with null pointer constants comes from lack of explicit token and need to avoid massive breakage of old programs. 
That's what it's all about - compiler recognizing some subexpressions as representations of that missing token and trying to do that in a way that would break as little as possible of existing C code. It's an old story - decisions had to be made in 80s and now we are stuck with them. IOW, (void *)0 in contexts that allow null pointer constant is *not* a 0 cast to pointer to void; it's a compiler-recognized kludge for __null__. And it's not a pointer to void. It can become a pointer to any type, including void. If converted to a pointer type it gives the same value you get if you convert 0 to that type ("null pointer to type"). But unlike null pointer to type it retains full flexibility. NULL is required to expand to null pointer constant and that's one of the reasons why sane code should be using it instead of explicitly spelled variants. The next best thing to actually having __null__ in the language... ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: fun with ?: 2007-05-23 0:25 ` Al Viro @ 2007-05-23 1:05 ` Josh Triplett 2007-05-23 4:53 ` Al Viro 1 sibling, 0 replies; 24+ messages in thread From: Josh Triplett @ 2007-05-23 1:05 UTC (permalink / raw) To: Al Viro; +Cc: linux-sparse, Linus Torvalds [-- Attachment #1: Type: text/plain, Size: 1492 bytes --] Al Viro wrote: > On Wed, May 23, 2007 at 01:02:34AM +0100, Al Viro wrote: >> It would be nicer if C had __null__ as the *only* null pointer constant >> (with flexible type) and could refuse to accept anything else. Too late >> for that, unfortunately. As for conversions - see above. > > To clarify: all mess with null pointer constants comes from lack of > explicit token and need to avoid massive breakage of old programs. That's > what it's all about - compiler recognizing some subexpressions as > representations of that missing token and trying to do that in a way that > would break as little as possible of existing C code. It's an old story - > decisions had to be made in 80s and now we are stuck with them. > > IOW, (void *)0 in contexts that allow null pointer constant is *not* a > 0 cast to pointer to void; it's a compiler-recognized kludge for __null__. > And it's not a pointer to void. It can become a pointer to any type, > including void. If converted to a pointer type it gives the same value > you get if you convert 0 to that type ("null pointer to type"). But > unlike null pointer to type it retains full flexibility. > > NULL is required to expand to null pointer constant and that's one of > the reasons why sane code should be using it instead of explicitly spelled > variants. The next best thing to actually having __null__ in the language... That makes perfect sense now. Thanks for the explanation. - Josh Triplett [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: fun with ?:
  2007-05-23  0:25 ` Al Viro
  2007-05-23  1:05   ` Josh Triplett
@ 2007-05-23  4:53   ` Al Viro
  2007-05-23 12:26     ` Morten Welinder
  1 sibling, 1 reply; 24+ messages in thread
From: Al Viro @ 2007-05-23  4:53 UTC (permalink / raw)
  To: Josh Triplett; +Cc: linux-sparse, Linus Torvalds

On Wed, May 23, 2007 at 01:25:06AM +0100, Al Viro wrote:
> To clarify: all mess with null pointer constants comes from lack of
> explicit token and need to avoid massive breakage of old programs.  That's
> what it's all about - compiler recognizing some subexpressions as
> representations of that missing token and trying to do that in a way that
> would break as little as possible of existing C code.  It's an old story -
> decisions had to be made in 80s and now we are stuck with them.
>
> IOW, (void *)0 in contexts that allow null pointer constant is *not* a
> 0 cast to pointer to void; it's a compiler-recognized kludge for __null__.
> And it's not a pointer to void.  It can become a pointer to any type,
> including void.  If converted to a pointer type it gives the same value
> you get if you convert 0 to that type ("null pointer to type").  But
> unlike null pointer to type it retains full flexibility.

PPS: original idiom for that sucker was, of course, simply 0.  My
understanding (and I'm seriously risking playing wrongbot here, so take
it with big grain of salt) is that rise of "cast 0 to pointer" idiom
was mostly due to targets where sizeof(pointer) > sizeof(int); there
the lack of prototypes meant that passing 0 in pointer argument would
do the wrong thing.  I _think_ it predates the introduction of void *,
but not by much; 32V source is available, so that would be a useful data
point.  When use of void * stopped being serious portability issue
it became the best variant in that family (the most type-neutral of
available options).  Then (void *)0 went into Feb 86 ANSI draft as
acceptable alternative for null pointer constant (i.e. recognized by
compiler and converted to type appropriate by context).

BTW, note that null pointer constant is *not* recognized in variable
part of argument list when you are calling a vararg function, so there
you get what you get - integer-type 0 or null pointer to void.  Which
may make life bloody interesting wrt portability.  Fortunately, these
days userland headers tend to have NULL defined to (void *)0 or integer 0
of the right size, so you are probably OK as long as you use
va_arg(..., void *) to get it, but $DEITY help you if you pass NULL
in vararg in place of pointer to function - on minimally weird targets
it will be ugly.  I don't think that sparse can catch that kind of
braindamage, though...

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: fun with ?: 2007-05-23 4:53 ` Al Viro @ 2007-05-23 12:26 ` Morten Welinder 0 siblings, 0 replies; 24+ messages in thread From: Morten Welinder @ 2007-05-23 12:26 UTC (permalink / raw) To: Al Viro; +Cc: Josh Triplett, linux-sparse, Linus Torvalds >[...] but $DEITY help you if you pass NULL > in vararg in place of pointer to function - on minimally weird targets > it will be ugly. I don't think that sparse can catch that kind of > braindamage, though... If the rules for the types in any given function's varargs can be taught to sparse, I don't see why not. I did some vararg checking -- "must end with -1" kind of thing -- early for sparse. It should be in the archives. (An early prototype with sub-standard monkey-see-monkey-do coding, but still.) Morten ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: fun with ?: 2007-05-23 0:02 ` Al Viro 2007-05-23 0:25 ` Al Viro @ 2007-05-23 1:03 ` Josh Triplett 2007-06-03 1:05 ` Al Viro 2007-05-23 14:25 ` Neil Booth 2 siblings, 1 reply; 24+ messages in thread From: Josh Triplett @ 2007-05-23 1:03 UTC (permalink / raw) To: Al Viro; +Cc: linux-sparse, Linus Torvalds [-- Attachment #1: Type: text/plain, Size: 4946 bytes --] Al Viro wrote: > On Tue, May 22, 2007 at 04:24:49PM -0700, Josh Triplett wrote: >> Makes sense, except that in C you can assign a void pointer to an arbitrary * >> without a warning, so why can't conditional expressions do the equivalent? > > {in sparse it's obviously not true due to address_space, but that hadn't > affected C standard decisions; the rest applies to standard C} > > For one thing, no, you can't (pointers to functions have every right to > be unrelated to void *). For another, you are not guaranteed that > conversion from void * to int * will not yield undefined behaviour. > It is OK if the value of void * is properly aligned; then you know > that conversion back to void * will give the original pointer. Otherwise > you are in nasal demon country. The other way round (int * to void * > to int *) you are always safe. > > Rationale: systems that have pointers to words and char smaller than word. > There you can very well have pointers to void and pointers to less-than-word > objects bigger than pointers to anything word-sized or bigger (e.g. they > can be represented as word address + bit offset). Conversion to the latter > will lose offset and compiler is allowed to throw up on that at runtime. Go Cray go. > Moral: void * -> int * may lose parts of value and trigger undefined behaviour; > int * -> void * loses information about type, is always revertible and always > safe. In assignment operator it's your responsibility to make sure that > void * you are converting to int * is properly aligned (anything that started > its life as int * will be). 
In ?: C could > * lose type information, allowing any values (result is void *); if > you are sure that it's well-aligned, you can cast void * argument to int * > or cast the result. > * require the void * argument to be well-aligned, keep the type. > > The former makes more sense... Fair enough. >>> Again, null pointer constant is not the same thing as null pointer to void. >> I see. I find it very strange that (void *)0 and (void *)(void *)0 have >> different behavior. I also find it strange that conditional expressions can't >> convert void * to an arbitrary pointer as assignment can. > > It would be nicer if C had __null__ as the *only* null pointer constant > (with flexible type) and could refuse to accept anything else. Too late > for that, unfortunately. As for conversions - see above. Agreed; that seems far saner. (Or, as long as we wishfully redefine the original C spec, just NULL or null, sans underscores.) >>> BTW, there's another painful area: what do we do to somebody who uses >>> (void *)(69 + 1 - 70) as null pointer constant? Currenly sparse doesn't >>> recognize it as such; C standard does. IMO the right thing to do is >>> to add a flag that would switch to full-blown standard rules in that area >>> ("integer constant expression returning 0" instead of basically "0 in some >>> layers of ()") and flame to the crisp any wanker caught at actually doing >>> that. Any suggestions re sufficiently violent warning messages? >> I didn't know that the C standard actually *required* constant folding. >> Interesting. Would it add excessively to compilation time to apply the usual >> Sparse constant folding here? If so, and if you really think this case >> matters, let's have an option to turn on this constant folding, and warn >> whenever we see it. > > Usual sparse constant folding is _almost_ OK, provided that we play a bit > with evaluate_expression() and let it decide if subexpression is an integer > constant expression. 
No prototype changes, we just get a global flag > and save/restore it around sizeof argument handling and several other > places. It's actually pretty easy. And if we get "it is an integer > constant expression", well, then caller can call expand stuff. Sounds reasonable to me. Possibly better written as some kind of generic parse-tree-walking operation, but this approach should work fine. > "Almost" bit above refers to another bit of insanity, fortunately easily > handled; we need division by zero to raise no error and just yield "the > entire thing is not an integer constant expression with value 0". That's > exactly what longjmp() is for... $ egrep -r 'setjmp|longjmp' ~/src/sparse/* $ Let's keep it that way, please. :) Evaluation almost certainly should warn for a compile-time divide-by-zero regardless, though we don't want to warn twice for the same expression. With the global "check for integer constant expression" flag set, if evaluation encounters a compile-time divide-by-zero, evaluation could just set a divided_by_zero flag. Or something like that. > Speaking of C standard, if you need access to the current one - yell. I use http://c0x.coding-guidelines.com/ , which has the C99 spec and some subsequent updates. - Josh Triplett [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: fun with ?: 2007-05-23 1:03 ` Josh Triplett @ 2007-06-03 1:05 ` Al Viro
From: Al Viro @ 2007-06-03 1:05 UTC
To: Josh Triplett; +Cc: linux-sparse, Linus Torvalds

On Tue, May 22, 2007 at 06:03:31PM -0700, Josh Triplett wrote:
> > Usual sparse constant folding is _almost_ OK, provided that we play a bit
> > with evaluate_expression() and let it decide if subexpression is an integer
> > constant expression.  No prototype changes, we just get a global flag
> > and save/restore it around sizeof argument handling and several other
> > places.  It's actually pretty easy.  And if we get "it is an integer
> > constant expression", well, then caller can call expand stuff.
>
> Sounds reasonable to me.  Possibly better written as some kind of generic
> parse-tree-walking operation, but this approach should work fine.

Hrm...  Actually, "global flag" approach doesn't work at all, since we
mishandle already-evaluated subtrees (happens with inlined functions).

AFAICS, here's what we can do with relatively little PITA:

Split some of the EXPR_... cases in two; additional variant would differ
by | EXPR_INT_CONST, which would be 1U<<31 (let's call it i-bit).
EXPR_CAST gets split in three - normal, EXPR_CAST with i-bit and EXPR_CAST
with f-bit (1U<<30).

At parse time, do the following:
    * EXPR_VALUE, EXPR_TYPE, EXPR_SIZEOF, EXPR_PTRSIZEOF and EXPR_ALIGNOF
      get i-bit.
    * EXPR_COMPARE, EXPR_BINOP, EXPR_LOGICAL, EXPR_CONDITIONAL, EXPR_PREOP
      with !, + , -, ~ or ( and EXPR_CAST get i-bit if all their arguments
      have it.
    * EXPR_CAST also gets both i- and f-bit if its argument is EXPR_FVALUE,
      possibly wrapped into some EXPR_PREOP[(].

That won't really complicate the parser.

Now, at that point i-bit is weaker than "subexpression is an integer
constant expression", but not by much - the only things we are missing are
"it typechecks OK" and "all casts are to integer types" (note that
sizeof(VLA) would have to be caught when we start handling those; as it is,
it fails typechecking, plain and simple).

So what we do at evaluate_expression() is simple - we remove some i-bits.
Rules:
    * EXPR_COMPARE, EXPR_BINOP, EXPR_LOGICAL, EXPR_CONDITIONAL, EXPR_PREOP:
      lose i-bit if some argument doesn't have it after evaluation.
    * EXPR_IMPLIED_CAST to an integer type inherits i-bit from argument.
    * EXPR_CAST loses i-bit if the type we are casting to is not an integer
      one; it also loses i-bit if evaluated argument doesn't have i-bit
      *and* EXPR_CAST itself doesn't have f-bit.
    * in cannibalizing EXPR_PREOP in &*p and *&p simplifications, keep the
      (lack of) i-bit and f-bit on the overwritten node.

In expand_expression() we keep i-bit through the replacement.

That's it.  Now, "has i-bit after evaluate_expression()" == "expression is
an integer constant one".

Now, we do the following:
    * static struct symbol null_ctype, initialized to SYM_PTR over void.
    * evaluation of EXPR_CAST with target type being a pointer to void and
      argument bearing an i-bit should check if argument is in fact 0; if
      it is, replace the node with EXPR_VALUE[0] and &null_ctype as type.
    * int is_null_pointer_constant(expr) - return 1 if type is &null_ctype,
      2 if argument bears i-bit and is in fact 0, return 0 otherwise.
    * have degenerate() turn &null_ctype into a normal pointer to void
    * callers of degenerate() in contexts where we care about null pointer
      constants (?:, assignment, argument of function call, pointer
      comparison, cast) should do is_null_pointer_constant() first.  Act
      accordingly, warn if it had returned 2.

Note that we can't blindly generate a warning in is_null_pointer_constant() -
sometimes 0 is simply 0 and we don't want it to become a pointer at all,
let alone generate any warnings.

AFAICS, that will do the right thing with little pain.  I'm not all that
happy about splitting EXPR_... (in effect, stealing bit from expr->type);
perhaps reducing the size of expr->type and adding expr->flags would be
better.  Hell knows...

Comments?
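The two-pass flag scheme in the message above (hand out i-bits at parse time, take some away at evaluation time) can be sketched as a toy model. This is only an illustration under stated assumptions, not sparse's code: the node kinds, the `struct expr` layout, the `to_integer` field and all function names are hypothetical, only the two bit positions come from the mail, and the EXPR_PREOP[(] wrapping and most node types are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Bit positions as named in the mail; the macro names are hypothetical. */
#define I_BIT (1U << 31)	/* "may be an integer constant expression" */
#define F_BIT (1U << 30)	/* cast applied directly to a floating constant */

/* Simplified node kinds; sparse's real EXPR_... set is much larger. */
enum kind { VALUE, FVALUE, SYMBOL, BINOP, CAST };

struct expr {
	enum kind kind;
	unsigned flags;
	int to_integer;			/* CAST only: target is an integer type */
	struct expr *left, *right;	/* BINOP uses both, CAST uses left */
};

/* Parse time: hand out i-bits (and f-bits) bottom-up. */
static void mark_at_parse(struct expr *e)
{
	switch (e->kind) {
	case VALUE:
		e->flags |= I_BIT;
		break;
	case BINOP:
		mark_at_parse(e->left);
		mark_at_parse(e->right);
		if (e->left->flags & e->right->flags & I_BIT)
			e->flags |= I_BIT;
		break;
	case CAST:
		mark_at_parse(e->left);
		if (e->left->kind == FVALUE)
			e->flags |= I_BIT | F_BIT;
		else if (e->left->flags & I_BIT)
			e->flags |= I_BIT;
		break;
	default:
		break;	/* FVALUE and SYMBOL never get the i-bit */
	}
}

/* Evaluation time: bits are only ever removed, never added. */
static void strip_at_evaluate(struct expr *e)
{
	switch (e->kind) {
	case BINOP:
		strip_at_evaluate(e->left);
		strip_at_evaluate(e->right);
		if (!(e->left->flags & e->right->flags & I_BIT))
			e->flags &= ~I_BIT;
		break;
	case CAST:
		strip_at_evaluate(e->left);
		if (!e->to_integer)
			e->flags &= ~I_BIT;	/* cast to a non-integer type */
		else if (!(e->left->flags & I_BIT) && !(e->flags & F_BIT))
			e->flags &= ~I_BIT;	/* operand is not constant */
		break;
	default:
		break;
	}
}

/* Nonzero if the toy tree still carries the i-bit after both passes. */
int is_integer_constant_expression(struct expr *e)
{
	assert(e != NULL);
	mark_at_parse(e);
	strip_at_evaluate(e);
	return (e->flags & I_BIT) != 0;
}
```

In this model `(int)1.5 + 2` keeps its i-bit because the cast carries the f-bit, `1 + n` never gets one, and a cast of `2` to a non-integer type loses it during evaluation.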
* Re: fun with ?: 2007-05-23 0:02 ` Al Viro 2007-05-23 0:25 ` Al Viro 2007-05-23 1:03 ` Josh Triplett @ 2007-05-23 14:25 ` Neil Booth 2007-05-23 14:32 ` Al Viro
From: Neil Booth @ 2007-05-23 14:25 UTC
To: Al Viro; +Cc: Josh Triplett, linux-sparse, Linus Torvalds

Al Viro wrote:-

> > Sparse constant folding here?  If so, and if you really think this case
> > matters, let's have an option to turn on this constant folding, and warn
> > whenever we see it.
>
> Usual sparse constant folding is _almost_ OK, provided that we play a bit
> with evaluate_expression() and let it decide if subexpression is an integer
> constant expression.  No prototype changes, we just get a global flag
> and save/restore it around sizeof argument handling and several other
> places.  It's actually pretty easy.  And if we get "it is an integer
> constant expression", well, then caller can call expand stuff.
>
> "Almost" bit above refers to another bit of insanity, fortunately easily
> handled; we need division by zero to raise no error and just yield "the
> entire thing is not an integer constant expression with value 0".  That's
> exactly what longjmp() is for...

I respect you too much Al to doubt you, but I do warn you that
getting the rules for integer constant expressions right in C is
harder than it looks.  GCC is not very close.  The immediate
cast of float bit is painful in recursive descent parsers.  I've managed
to find cases where Comeau's online compiler doesn't get it right,
and they're pretty good.

I have a test suite that tests these things to an unhealthy level of
pedantry for my own implementation (the only one I know passes the lot,
of course 8-); happy to run sparse when you've finished if you like.

Neil.
* Re: fun with ?: 2007-05-23 14:25 ` Neil Booth @ 2007-05-23 14:32 ` Al Viro 2007-05-23 14:47 ` Neil Booth 2007-05-23 21:16 ` Derek M Jones
From: Al Viro @ 2007-05-23 14:32 UTC
To: Neil Booth; +Cc: Josh Triplett, linux-sparse, Linus Torvalds

On Wed, May 23, 2007 at 11:25:44PM +0900, Neil Booth wrote:
> I respect you too much Al to doubt you, but I do warn you that
> getting the rules for integer constant expressions right in C is
> harder than it looks.  GCC is not very close.  The immediate
> cast of float bit is painful in recursive descent parsers.  I've managed
> to find cases where Comeau's online compiler doesn't get it right,
> and they're pretty good.
>
> I have a test suite that tests these things to an unhealthy level of
> pedantry for my own implementation (the only one I know passes the lot,
> of course 8-); happy to run sparse when you've finished if you like.

gcc integer constant expressions handling is a bad joke.

    extern int n;
    struct {
        int x : 1 + n - n;
    } y;

passes with -pedantic -std=c99.  Replacing that with 1 + n - n + n - n
is still OK with gcc; 1 + n + n - n - n is not.

So that's hardly an example of, well, anything.
* Re: fun with ?: 2007-05-23 14:32 ` Al Viro @ 2007-05-23 14:47 ` Neil Booth 2007-05-23 15:32 ` Al Viro 2007-05-23 21:16 ` Derek M Jones
From: Neil Booth @ 2007-05-23 14:47 UTC
To: Al Viro; +Cc: Josh Triplett, linux-sparse, Linus Torvalds

Al Viro wrote:-

> gcc integer constant expressions handling is a bad joke.
>
> extern int n;
> struct {
>     int x : 1 + n - n;
> } y;
>
> passes with -pedantic -std=c99.  Replacing that with 1 + n - n + n - n
> is still OK with gcc; 1 + n + n - n - n is not.
>
> So that's hardly an example of, well, anything.

Consistency? :-)  I wasn't aware of the quirk of the second example.

Apparently these expressions are only folded if their net result
is obvious as parsed by the grammar.  Fixing this in GCC is a
horrendous amount of work; so much internal logic relies on this
early simplification, which is why it's not been done I guess.

Neil.
* Re: fun with ?: 2007-05-23 14:47 ` Neil Booth @ 2007-05-23 15:32 ` Al Viro 2007-05-23 23:01 ` Neil Booth
From: Al Viro @ 2007-05-23 15:32 UTC
To: Neil Booth; +Cc: Josh Triplett, linux-sparse, Linus Torvalds

On Wed, May 23, 2007 at 11:47:16PM +0900, Neil Booth wrote:
> > gcc integer constant expressions handling is a bad joke.
> >
> > extern int n;
> > struct {
> >     int x : 1 + n - n;
> > } y;
> >
> > passes with -pedantic -std=c99.  Replacing that with 1 + n - n + n - n
> > is still OK with gcc; 1 + n + n - n - n is not.
> >
> > So that's hardly an example of, well, anything.
>
> Consistency? :-)  I wasn't aware of the quirk of the second example.
>
> Apparently these expressions are only folded if their net result
> is obvious as parsed by the grammar.  Fixing this in GCC is a
> horrendous amount of work; so much internal logic relies on this
> early simplification, which is why it's not been done I guess.

With sparse that's easier - we have parsing and typechecking separated
enough.  And yes, blind evaluate + expand will suffer similar problems.
However, it's not hard to have evaluate_expression() set a flag when
it steps into a construct prohibited in integer constant expressions.

BTW, immediate cast from float is not hard - you need all casts other
than under sizeof to go to integer type *and* you need any operations
involving floating point types to set "not an integer constant expression".
Since comma and function calls are also banned, float will either go all
the way up or it will be eaten by cast.

BTW, the fun question is whether (int)(1.1) is allowed; the same goes
for "is ((void *)0) a null pointer constant".  6.5.1 is sloppy ;-)
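The "immediate cast from float" rule can be tried out with enumerator values, which have to be integer constant expressions. A minimal sketch, assuming a compiler that, like the implementations discussed in this thread, accepts the parenthesized form; the enumerator and function names are made up for the illustration.

```c
#include <assert.h>

/*
 * Each initializer below must be an integer constant expression,
 * otherwise the enum does not compile.
 */
enum {
	PLAIN_CAST  = (int)1.1,		/* floating constant as the immediate operand of a cast */
	PAREN_CAST  = (int)(1.1),	/* the "fun question": same constant inside parentheses */
	SIZEOF_CAST = (int)sizeof(double)	/* sizeof result cast to an integer type */
};

/*
 * Not shown: (int)(1.1 + 0.5).  The addition is a floating-point operation,
 * so under the rule described above the whole thing is not an integer
 * constant expression, although some compilers fold it anyway.
 */

int plain_cast(void) { return PLAIN_CAST; }
int paren_cast(void) { return PAREN_CAST; }

void check_casts(void)
{
	/* conversion to int truncates toward zero */
	assert(PLAIN_CAST == 1);
	assert(PAREN_CAST == 1);
	assert(SIZEOF_CAST > 0);
}
```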
* Re: fun with ?: 2007-05-23 15:32 ` Al Viro @ 2007-05-23 23:01 ` Neil Booth 2007-05-24 0:10 ` Derek M Jones 2007-05-24 0:14 ` Al Viro
From: Neil Booth @ 2007-05-23 23:01 UTC
To: Al Viro; +Cc: Josh Triplett, linux-sparse, Linus Torvalds

Al Viro wrote:-

> BTW, the fun question is whether (int)(1.1) is allowed; the same goes
> for "is ((void *)0) a null pointer constant".  6.5.1 is sloppy ;-)

I believe that the intent is that parentheses are not viewed as
operators, just a grouping tool, so yes to both.  All implementations
I'm aware of take that view.

Neil.
* Re: fun with ?: 2007-05-23 23:01 ` Neil Booth @ 2007-05-24 0:10 ` Derek M Jones 2007-05-24 0:14 ` Al Viro
From: Derek M Jones @ 2007-05-24 0:10 UTC
To: Neil Booth; +Cc: Al Viro, Josh Triplett, linux-sparse, Linus Torvalds

Neil,

>> BTW, the fun question is whether (int)(1.1) is allowed; the same goes
>> for "is ((void *)0) a null pointer constant".  6.5.1 is sloppy ;-)
>
> I believe that the intent is that parentheses are not viewed as
> operators,

I suspect that many implementations simply throw them away (or rather
don't bother hanging onto them) in this case, and so work by 'accident'.

> just a grouping tool, so yes to both.  All implementations
> I'm aware of take that view.

The word 'immediate' in sentence 1318
http://c0x.coding-guidelines.com/6.6.html
"... floating constants that are the immediate operands of casts"
could be interpreted to mean that no other tokens occur between
the cast and its operand.

I suspect that the behavior of existing implementations would be enough
to push any implementation that did not treat (int)(1.1) as an
integer constant into accepting it as such.

--
Derek M. Jones                              tel: +44 (0) 1252 520 667
Knowledge Software Ltd                      mailto:derek@knosof.co.uk
Applications Standards Conformance Testing    http://www.knosof.co.uk
* Re: fun with ?: 2007-05-23 23:01 ` Neil Booth 2007-05-24 0:10 ` Derek M Jones @ 2007-05-24 0:14 ` Al Viro
From: Al Viro @ 2007-05-24 0:14 UTC
To: Neil Booth; +Cc: Josh Triplett, linux-sparse, Linus Torvalds

On Thu, May 24, 2007 at 08:01:43AM +0900, Neil Booth wrote:
> Al Viro wrote:-
>
> > BTW, the fun question is whether (int)(1.1) is allowed; the same goes
> > for "is ((void *)0) a null pointer constant".  6.5.1 is sloppy ;-)
>
> I believe that the intent is that parentheses are not viewed as
> operators, just a grouping tool, so yes to both.  All implementations
> I'm aware of take that view.

Sure - especially for the second example ;-)  And no, I don't think
anybody can be arsed to file a DR about that.
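Whether a compiler treats a given operand as a null pointer constant can be observed through the type of a conditional expression. The sketch below uses C11 `_Generic`, which postdates this thread, so it is an illustration rather than anything the participants used; the `COND_TYPE` macro and the function names are hypothetical. The expected classifications assume a compiler that follows the standard rules discussed in this thread.

```c
#include <assert.h>

/*
 * Classify the type of "1 ? (int *)0 : e" without evaluating it:
 *   1 -> int *         (e was a null pointer constant)
 *   2 -> const void *  (e was a pointer to const void)
 *   3 -> void *        (e was an ordinary pointer to void)
 */
#define COND_TYPE(e) \
	_Generic((1 ? (int *)0 : (e)), \
		int *: 1, const void *: 2, void *: 3, default: 0)

int with_plain_zero(void)        { return COND_TYPE(0); }
int with_void_cast(void)         { return COND_TYPE((void *)0); }
int with_paren_void_cast(void)   { return COND_TYPE(((void *)0)); }
int with_const_void_cast(void)   { return COND_TYPE((const void *)0); }
int with_void_variable(void *v)  { return COND_TYPE(v); }

void check_conditional_types(void)
{
	assert(with_plain_zero() == 1);
	assert(with_void_cast() == 1);
	assert(with_paren_void_cast() == 1);	/* parentheses change nothing */
	assert(with_const_void_cast() == 2);	/* not a null pointer constant */
	assert(with_void_variable((void *)0) == 3);
}
```

The parenthesized `((void *)0)` lands in the same class as `(void *)0`, which matches the "yes to both" reading, while `(const void *)0` produces `const void *`.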
* Re: fun with ?: 2007-05-23 14:32 ` Al Viro 2007-05-23 14:47 ` Neil Booth @ 2007-05-23 21:16 ` Derek M Jones 2007-05-23 21:59 ` Linus Torvalds 2007-05-24 1:36 ` Brett Nash
From: Derek M Jones @ 2007-05-23 21:16 UTC
To: Al Viro; +Cc: Neil Booth, Josh Triplett, linux-sparse, Linus Torvalds

Al,

>> cast of float bit is painful in recursive descent parsers.  I've managed
>> to find cases where Comeau's online compiler doesn't get it right,
>> and they're pretty good.

I would be interested to know what these cases were.

> extern int n;
> struct {
>     int x : 1 + n - n;
> } y;
>
> passes with -pedantic -std=c99.  Replacing that with 1 + n - n + n - n
> is still OK with gcc; 1 + n + n - n - n is not.
>
> So that's hardly an example of, well, anything.

It is an example of order of evaluation mattering when overflow
occurs.

What with game programming growing and growing in importance I think
it won't be long before saturated integer arithmetic overflow will
be encountered just as often as the 'conventional' wrapping behavior.

--
Derek M. Jones                              tel: +44 (0) 1252 520 667
Knowledge Software Ltd                      mailto:derek@knosof.co.uk
Applications Standards Conformance Testing    http://www.knosof.co.uk
* Re: fun with ?: 2007-05-23 21:16 ` Derek M Jones @ 2007-05-23 21:59 ` Linus Torvalds 2007-05-23 23:29 ` Derek M Jones 2007-05-24 1:36 ` Brett Nash
From: Linus Torvalds @ 2007-05-23 21:59 UTC
To: Derek M Jones; +Cc: Al Viro, Neil Booth, Josh Triplett, linux-sparse

On Wed, 23 May 2007, Derek M Jones wrote:
> >
> > passes with -pedantic -std=c99.  Replacing that with 1 + n - n + n - n
> > is still OK with gcc; 1 + n + n - n - n is not.
> >
> > So that's hardly an example of, well, anything.
>
> It is an example of order of evaluation mattering when overflow
> occurs.

No it isn't.

"1 + n - n" can overflow equally as "1 + n + n - n - n" can, and if you
want them to do saturation or something, you cannot optimize _either_ of
them to just "1".  If "n" is INT_MAX, then with saturating arithmetic,
neither of them results in 1.

Not that signed overflow is even specified by the C standard (and
unsigned is specified to be well-behaved).

So it seems to be purely a compiler misfeature.  No excuses.

        Linus
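The saturation argument is easy to check by hand. A minimal sketch, assuming clamping to INT_MIN/INT_MAX as the saturating behaviour; the helper names `sat_add`, `sat_sub`, `short_form` and `long_form` are invented for the example, and evaluation is strictly left to right as the C grammar dictates for + and -.

```c
#include <assert.h>
#include <limits.h>

/* Saturating addition: clamp instead of overflowing. */
static int sat_add(int a, int b)
{
	if (b > 0 && a > INT_MAX - b)
		return INT_MAX;
	if (b < 0 && a < INT_MIN - b)
		return INT_MIN;
	return a + b;
}

/* Saturating subtraction: clamp instead of overflowing. */
static int sat_sub(int a, int b)
{
	if (b < 0 && a > INT_MAX + b)
		return INT_MAX;
	if (b > 0 && a < INT_MIN + b)
		return INT_MIN;
	return a - b;
}

/* 1 + n - n, i.e. ((1 + n) - n) */
int short_form(int n)
{
	return sat_sub(sat_add(1, n), n);
}

/* 1 + n + n - n - n, i.e. ((((1 + n) + n) - n) - n) */
int long_form(int n)
{
	return sat_sub(sat_sub(sat_add(sat_add(1, n), n), n), n);
}

void check_saturation(void)
{
	/* small values: nothing saturates, both forms give 1 */
	assert(short_form(5) == 1);
	assert(long_form(5) == 1);
	/* n == INT_MAX: neither form gives 1 */
	assert(short_form(INT_MAX) == 0);
	assert(long_form(INT_MAX) == -INT_MAX);
}
```

With n equal to INT_MAX the short form saturates once and yields 0, and the long form ends at -INT_MAX, so neither expression could be folded to 1 under saturating semantics.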
* Re: fun with ?: 2007-05-23 21:59 ` Linus Torvalds @ 2007-05-23 23:29 ` Derek M Jones 2007-05-24 0:02 ` Al Viro 2007-05-24 0:29 ` Linus Torvalds
From: Derek M Jones @ 2007-05-23 23:29 UTC
To: Linus Torvalds; +Cc: Al Viro, Neil Booth, Josh Triplett, linux-sparse

Linus,

>>> passes with -pedantic -std=c99.  Replacing that with 1 + n - n + n - n
>>> is still OK with gcc; 1 + n + n - n - n is not.
>>>
>>> So that's hardly an example of, well, anything.
>> It is an example of order of evaluation mattering when overflow
>> occurs.
>
> No it isn't.

It was intended as a probability statement (ok, I did not make that
clear).  An expression containing n+n is more likely to overflow
than one containing n-n.

Anyway, getting away from nit-picking of what was intended to be a
throwaway remark.

> "1 + n - n" can overflow equally as "1 + n + n - n -n" can, and if you
> want them to do saturation or something, you cannot optimize _either_ of
> them to just "1".  If "n" is MAX_INT, then with saturating arithmetic,
> neither of them results in 1.

Saturated arithmetic kills off so many optimizations because reordering
an expression might produce different results.

> Not that signed overflow is even specified by the C standard (and
> unsigned is specified to be well-behaved).

Overflow for signed integer types is undefined behavior (well
technically this is an instance of "... not in the range of
representable values for its type", sentence 490
http://c0x.coding-guidelines.com/6.5.html).

> So it seems to be purely a compiler misfeature.  No excuses.

This is the point of the discussion that has got me confused.
What compiler misfeature?  Perhaps I am using the 'wrong' version
of gcc (version 4.0.2), but I get the expected wrapping behavior (ie,
the compiler tries to behave at translate time the same way as at
runtime).

--
Derek M. Jones                              tel: +44 (0) 1252 520 667
Knowledge Software Ltd                      mailto:derek@knosof.co.uk
Applications Standards Conformance Testing    http://www.knosof.co.uk
* Re: fun with ?: 2007-05-23 23:29 ` Derek M Jones @ 2007-05-24 0:02 ` Al Viro 2007-05-24 0:29 ` Linus Torvalds
From: Al Viro @ 2007-05-24 0:02 UTC
To: Derek M Jones; +Cc: Linus Torvalds, Neil Booth, Josh Triplett, linux-sparse

On Thu, May 24, 2007 at 12:29:43AM +0100, Derek M Jones wrote:
> It was intended as a probability statement (ok, I did not make that
> clear).  An expression containing n+n is more likely to overflow
> than one containing n-n.

Gimme a break.
    a) s/int/unsigned and run that through gcc; no change in behaviour
    b) no fscking way in hell *either* is acceptable for bitfield width -
       definitely not with -std=c99 -pedantic.  Violates 6.6p6 and
       6.7.2.1p3.
    c) what's happening is pretty obvious - the difference is not in
       overflows, it's in expression tree structure (remember, + and -
       are left-to-right).  gcc throws several cheap optimizations at
       the expression and checks if it has come up with a constant.
       Simple common factors are taken out (n*m - n*m is seen as 0),
       common subexpressions are not recognized ((n+m)-(n+m) is not
       seen as constant).
    d) (c) is an exercise in software proctology - gcc has an obvious
       bug in that area (mishandling recognition of integer constant
       expressions), period.
* Re: fun with ?: 2007-05-23 23:29 ` Derek M Jones 2007-05-24 0:02 ` Al Viro @ 2007-05-24 0:29 ` Linus Torvalds
From: Linus Torvalds @ 2007-05-24 0:29 UTC
To: Derek M Jones; +Cc: Al Viro, Neil Booth, Josh Triplett, linux-sparse

On Thu, 24 May 2007, Derek M Jones wrote:
>
> Saturated arithmetic kills off so many optimizations because reordering
> an expression might produce different results.

Yes.  However, the FP people do have solutions to that, for all the same
reasons.  (ie, you can add rules like:
 - parentheses have meaning outside of just precedence, and disable
   associativity-based ordering optimizations.
 - the normal C side effect boundaries also act as ordering boundaries for
   arithmetic.)

> This is the point of the discussion that has got me confused.
> What compiler misfeature?  Perhaps I am using the 'wrong' version
> of gcc (version 4.0.2), but I get the expected wrapping behavior (ie,
> the compiler tries to behave at translate time the same way as at
> runtime).

The problem that Al was pointing to is that sometimes gcc will do
*constant folding* at parse time, and treat "1+n-n" as a compile-time
constant in situations where that simply isn't valid C.

Try this:

    [torvalds@woody ~]$ cat t.c
    extern int n;
    int a[1 + n - n];
    [torvalds@woody ~]$ gcc -c -Wall t.c

and notice the lack of any error what-so-ever.  With sparse, you get

    [torvalds@woody ~]$ sparse -Wall t.c
    t.c:2:13: error: bad constant expression

now, change the "1+n-n" into "1+n+n-n-n", and notice what gcc says:

    [torvalds@woody ~]$ gcc -c -Wall t.c
    t.c:2: error: variably modified ‘a’ at file scope

iow, gcc actually thinks that "1+n-n" is somehow different from
"1+n+n-n-n".

Which was Al's point.

        Linus
* Re: fun with ?: 2007-05-23 21:16 ` Derek M Jones 2007-05-23 21:59 ` Linus Torvalds @ 2007-05-24 1:36 ` Brett Nash
From: Brett Nash @ 2007-05-24 1:36 UTC
To: Derek M Jones; +Cc: linux-sparse

On Wed, 2007-05-23 at 22:16 +0100, Derek M Jones wrote:
> > extern int n;
> > struct {
> >     int x : 1 + n - n;
> > } y;
> >
> > passes with -pedantic -std=c99.  Replacing that with 1 + n - n + n - n
> > is still OK with gcc; 1 + n + n - n - n is not.
> >
> > So that's hardly an example of, well, anything.
>
> It is an example of order of evaluation mattering when overflow
> occurs.

I shudder to think of the architecture where integer overflow and
bit-field declarations are mentioned in the same sentence.

    nash
    [Going back to lurking, on his new amd2147483648]
end of thread, other threads:[~2007-06-03 1:05 UTC | newest]

Thread overview: 24+ messages
2007-05-19 2:52 fun with ?: Al Viro
2007-05-22 21:40 ` Josh Triplett
2007-05-22 22:46 ` Al Viro
2007-05-22 23:24 ` Josh Triplett
2007-05-23 0:02 ` Al Viro
2007-05-23 0:25 ` Al Viro
2007-05-23 1:05 ` Josh Triplett
2007-05-23 4:53 ` Al Viro
2007-05-23 12:26 ` Morten Welinder
2007-05-23 1:03 ` Josh Triplett
2007-06-03 1:05 ` Al Viro
2007-05-23 14:25 ` Neil Booth
2007-05-23 14:32 ` Al Viro
2007-05-23 14:47 ` Neil Booth
2007-05-23 15:32 ` Al Viro
2007-05-23 23:01 ` Neil Booth
2007-05-24 0:10 ` Derek M Jones
2007-05-24 0:14 ` Al Viro
2007-05-23 21:16 ` Derek M Jones
2007-05-23 21:59 ` Linus Torvalds
2007-05-23 23:29 ` Derek M Jones
2007-05-24 0:02 ` Al Viro
2007-05-24 0:29 ` Linus Torvalds
2007-05-24 1:36 ` Brett Nash