[PATCHSET] fouled-bitwise handling

linux-sparse.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCHSET] fouled-bitwise handling
@ 2006-10-01 13:57 Al Viro
  2006-10-01 14:09 ` Al Viro
  2006-10-01 16:21 ` Linus Torvalds
  0 siblings, 2 replies; 9+ messages in thread
From: Al Viro @ 2006-10-01 13:57 UTC (permalink / raw)
  To: linux-sparse

This stuff comes from handling smaller-than-int bitwise types (e.g. __le16).
The problem is in handling things like
	__be16 x, y;
	...
	if (x == (x & ~y))
The code is bitwise-clean, but current sparse can't deduce that.  Operations
allowed on bitwise types have the following property: (type)(x <op> y) can
be substituted for x <op> y in any expression other than sizeof.  That allows
us to ignore usual arithmetical conversions for those types and treat e.g.
| as __be16 x __be16 -> __be16, despite the promotion rules; resulting
semantics will be the same.  However, ~ on smaller-than-int does not have
such property; indeed, ~y is guaranteed to _not_ fit into range of __be16
in the example above.

That causes a lot of unpleasant problems when dealing with e.g. networking
code - IP checksums are 16bit and ~ is often used in their (re)calculations.
The way to deal with that is based on the observation that even though we do
get junk in upper bits, it normally ends up being discarded and sparse can
be taught to prove that.  To do that we need "fouled" conterparts for short
bitwise types.  They will be assigned to (sub)expressions that might carry
junk in upper bits, but trimming those bits would result in the value we'd
get if all operations had been done within the bitwise type.  E.g. in the
example above y would be __be16, ~y - fouled __be16, x & ~y - __be16 again
and x == (x & ~y) - boolean.

Basically, we delay reporting an error on ~<short bitwise> for as long as
possible in hope that taint will be cleansed later.

This patchset can be pulled from
git://git.kernel.org/pub/scm/linux/kernel/git/viro/sparse.git for-linus

shortlog:
Al Viro:
      casting null pointer constant to non-zero address space is always OK
      introduce classify_type(), use it in obvious places
      evaluate_compare() can just use evaluate_arith() for non-pointer cases
      beginning of SYM_RESTRICT rewrite: restricted_binop_type()
      merged compatible_..._binop() into single function
      saner recovery from endianness errors, part 1.
      handle fouled-bitwise

diffstat:
 evaluate.c                |  442 ++++++++++++++++++++++++++-------------------
 parse.c                   |    1 
 show-parse.c              |    8 +
 symbol.c                  |   38 ++++
 symbol.h                  |    5 +
 validation/foul-bitwise.c |   20 ++
 6 files changed, 324 insertions(+), 190 deletions(-)
 create mode 100644 validation/foul-bitwise.c

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHSET] fouled-bitwise handling
  2006-10-01 13:57 [PATCHSET] fouled-bitwise handling Al Viro
@ 2006-10-01 14:09 ` Al Viro
  2006-10-01 16:21 ` Linus Torvalds
  1 sibling, 0 replies; 9+ messages in thread
From: Al Viro @ 2006-10-01 14:09 UTC (permalink / raw)
  To: linux-sparse

>       casting null pointer constant to non-zero address space is always OK

... gah.  That patch has nothing to do with the series, but it's probably
not worth pulling out (cast handling is seriously affected by the patches
that do belong there).  My apologies for missing it.

OTOH, the patch is trivial and obviously correct, so...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHSET] fouled-bitwise handling
  2006-10-01 13:57 [PATCHSET] fouled-bitwise handling Al Viro
  2006-10-01 14:09 ` Al Viro
@ 2006-10-01 16:21 ` Linus Torvalds
  2006-10-01 16:45   ` Al Viro
  2006-10-01 16:52   ` Linus Torvalds
  1 sibling, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2006-10-01 16:21 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-sparse

On Sun, 1 Oct 2006, Al Viro wrote:
> 
> This stuff comes from handling smaller-than-int bitwise types (e.g. __le16).
> The problem is in handling things like
> 	__be16 x, y;
> 	...
> 	if (x == (x & ~y))
> The code is bitwise-clean, but current sparse can't deduce that.

Al, I understand why you did it the way you did, but I think it's wrong.

Why do I think it's wrong? Let me count the ways.

Umm. Ok, I counted, and there's only one way.

The only reason I think you're doing it wrong is that you're using this 
totally new separate mechanism for it (which is not a bad mechanism, but 
it's very much a special case), and you should not need to!

The fact is, regardless of any bitwise issues, the

	(cast) narrower-type <op> (wider-type "&" (cast) narrower-type)

should _always_ be simplified to a small-type.

See? Even from a purely optimization angle, we're actually better off just 
noticing that the whole comparison can be done in the narrower type.

Imagine this piece of code:

	int mask;
	char c, *p;

	if (*p == (mask & c))

where there is no bitwise stuff, a good compiler should notice on its own 
that it should just be done in 8 bits rather than having to convert "c" 
and "*p" to integers.

And once you do that _generic_ optimization, your whole "fouled" thing is 
useless.

Btw, we _already_ do this optimization for a few cases, we just didn't do 
it for enough cases.  We do it purely for assignment statements, ie notice 
how git already does not warn for

	__le16 a, b;

	a = ~b;

exactly because we have done a simplification of

	a = (implicit cast back to __le16)~(implicit cast to int)b;

to just keeping everything in the narrower type..

See commit 4b6c2e63d3423931ac687f113882cec427df9f91 for details.

And I _think_ we should be able to do exactly the same for this one, by 
simply noticing that doing a

	(widening-cast) a == b & (widening-cast) c

is by definition throwing away the higher bits on both sides, and we can 
just turn it into the simpler

	a == c & (narrowing-cast) b

instead. Of course, we can only do this with ops that don't have overflow 
from the narrower type (but comparisons and bitops are ok).

No?

		Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHSET] fouled-bitwise handling
  2006-10-01 16:21 ` Linus Torvalds
@ 2006-10-01 16:45   ` Al Viro
  2006-10-01 16:52   ` Linus Torvalds
  1 sibling, 0 replies; 9+ messages in thread
From: Al Viro @ 2006-10-01 16:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-sparse

On Sun, Oct 01, 2006 at 09:21:21AM -0700, Linus Torvalds wrote:
> The only reason I think you're doing it wrong is that you're using this 
> totally new separate mechanism for it (which is not a bad mechanism, but 
> it's very much a special case), and you should not need to!

> The fact is, regardless of any bitwise issues, the
> 
> 	(cast) narrower-type <op> (wider-type "&" (cast) narrower-type)
> 
> should _always_ be simplified to a small-type.

Sure.  Now try to carry that to more complex expressions.

> And I _think_ we should be able to do exactly the same for this one, by 
> simply noticing that doing a
> 
> 	(widening-cast) a == b & (widening-cast) c
> 
> is by definition throwing away the higher bits on both sides, and we can 
> just turn it into the simpler
> 
> 	a == c & (narrowing-cast) b
> 
> instead. Of course, we can only do this with ops that don't have overflow 
> from the narrower type (but comparisons and bitops are ok).
> 
> No?

At which phase would you do that?  We assign types pretty much leaves
to root - type of node is derived from the types of children.

Now, you suggest basically going *other* way.  I.e. noticing that
we have
	<complex expression in narrow> & <complex expression in wide>
and trying to propagate narrowing down another subexpression.

Note that we need *both* arguments typed - we don't know which one
might be narrow here.  So it's really re-evaluation of subexpression.

OK, so we have expand phase that could do that.  _However_, we'd have
to postpone the bitwise checks until that phase, or we'd get them
anyway.  And that is not to mention the fun with evaluate_expression()
mangling the tree, so re-walking it won't be easy.

IOW, I very much prefer to deal with that by _existing_ mechanism;
namely, the type assignment.  It's easier to propagate the information
along with the rest of type information, in the direction we are
already doing.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHSET] fouled-bitwise handling
  2006-10-01 16:21 ` Linus Torvalds
  2006-10-01 16:45   ` Al Viro
@ 2006-10-01 16:52   ` Linus Torvalds
  2006-10-01 17:22     ` Al Viro
  1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2006-10-01 16:52 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-sparse

On Sun, 1 Oct 2006, Linus Torvalds wrote:
>
> See? Even from a purely optimization angle, we're actually better off just 
> noticing that the whole comparison can be done in the narrower type.

So here's a test-case:

	typedef unsigned short le16;

	int cmp(le16 a, le16 b)
	{
		return a == (a & ~b);
	}

and doing a "./test-linearize" on this gets us:

	cmp:
	.L0xf7fa500c:
	        <entry-point>
	        dead        %arg1
	        scast.32    %r2 <- (16) %arg1
	        dead        %arg2
	        scast.32    %r6 <- (16) %arg2
	        dead        %r6
	        not.32      %r7 <- %r6
	        dead        %r7
	        and.32      %r8 <- %r2, %r7
	        dead        %r8
	        dead        %r2
	        seteq.32    %r9 <- %r2, %r8
	        dead        %r9
	        scast.32    %r10 <- (1) %r9
	        dead        %r10
	        ret.32      %r10

which is just horrible code anyway.

I think it _should_ be

	cmp:
	.L0xf7fa500c:
	        <entry-point>
	        dead        %arg2
	        not         %r4 <- %arg2
	        dead        %r4
	        and         %r5 <- %arg1, %r4
	        dead        %r5
	        dead        %arg1
	        seteq.16    %r6 <- %arg1, %r5
	        dead        %r6
	        scast.32    %r7 <- (1) %r6
	        dead        %r7
	        ret.32      %r7

instead, which is obviously smaller and avoids the two unnecessary casts.

(Now, it's entirely possible that we warn so early that we can't 
reasonably do this optimization until after we've already warned, but I've 
been able to remove these kinds of warnings before, so I think it should 
be possible to do it here too).

I much prefer (if possible) the "make sparse so much smarter that it sees 
that it's ok" approach over "let's add a magic special case". For example, 
a lot of the work I did to make the lock acquire/release logic useful was 
very much about teaching sparse to simplify code-flow rather than anything 
else.

But hey, maybe this case is too hard or nasty..

		Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHSET] fouled-bitwise handling
  2006-10-01 16:52   ` Linus Torvalds
@ 2006-10-01 17:22     ` Al Viro
  2006-10-01 17:29       ` Al Viro
  2006-10-01 17:33       ` Linus Torvalds
  0 siblings, 2 replies; 9+ messages in thread
From: Al Viro @ 2006-10-01 17:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-sparse

On Sun, Oct 01, 2006 at 09:52:31AM -0700, Linus Torvalds wrote:
> (Now, it's entirely possible that we warn so early that we can't 
> reasonably do this optimization until after we've already warned, but I've 
> been able to remove these kinds of warnings before, so I think it should 
> be possible to do it here too).
> 
> I much prefer (if possible) the "make sparse so much smarter that it sees 
> that it's ok" approach over "let's add a magic special case". For example, 
> a lot of the work I did to make the lock acquire/release logic useful was 
> very much about teaching sparse to simplify code-flow rather than anything 
> else.
> 
> But hey, maybe this case is too hard or nasty..

Hrm...  The gut feeling: we mangle expression trees too much to be able
to do that decently.

We would have to carry "this might be le16"/"this might be be16"/"this
might be foo_t" up *anyway*.  Because what we see several nodes up is
be16 | int.  And even if we guess that it might be expanded 16bit,
we _still_ need to know which 16bit it was.  So you have to propagate
that up and that's exactly what I'm doing.

"Find how much we could narrow it" is a separate question; _that_ is
best done by moving the casts down.  But I would not mix that with
type warnings.  We _do_ mangle the tree a lot, so by the time we
start walking down, we might have large chunks in a very inconvenient
shape.

The bottom line: your variant either requires to carry "which 16bit
it might have been" along with the expression type (in which case it's
exactly what I've done, modulo data representation), _OR_ it requires
3 passes - normal type evaluation + walking down trying to narrow the
things down + walking up *again* trying to generate those postponed
warnings.  Because you won't be able to do them on the second pass -
not enough information.

I'd rather handle them on the first pass when we can easily do that
and then do exact equivalent of the second pass (top to bottom) of
your approach.  Endianness type checks can be easily combined with
the first pass, but not with the second...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHSET] fouled-bitwise handling
  2006-10-01 17:22     ` Al Viro
@ 2006-10-01 17:29       ` Al Viro
  2006-10-01 17:58         ` Linus Torvalds
  2006-10-01 17:33       ` Linus Torvalds
  1 sibling, 1 reply; 9+ messages in thread
From: Al Viro @ 2006-10-01 17:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-sparse

PS: "fouled(le16)" is exactly "int, and if you narrow it to 16bit, it'd
better be le16".  We could separate that from type information, but hey,
if we want to carry reference to some type + indication that we do have
that reference + int as type, we might as well introduce a type node
saying "I'm int, might be base_type".  Which is exactly what I've done...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHSET] fouled-bitwise handling
  2006-10-01 17:29       ` Al Viro
@ 2006-10-01 17:58         ` Linus Torvalds
  0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2006-10-01 17:58 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-sparse

On Sun, 1 Oct 2006, Al Viro wrote:
>
> PS: "fouled(le16)" is exactly "int, and if you narrow it to 16bit, it'd
> better be le16".  We could separate that from type information, but hey,
> if we want to carry reference to some type + indication that we do have
> that reference + int as type, we might as well introduce a type node
> saying "I'm int, might be base_type".  Which is exactly what I've done...

Ok, I'm convinced.

The "fouled" bit may be a special case, but it's less of a special case 
than I thought. In fact, it could be used as a preliminary kind of "expand 
this op to int" that would entirely replace the explicit "implicit cast" 
we do now. 

(Although doing the implicit C casts as _explicit_ casts in sparse does 
have a lot of advantages, and avoids us having to test for things, so I'm 
not convinced we would want to actually expand "fouled" bit usage past the 
bitwise ops).

So I'll pull your tree.

Thanks,

		Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHSET] fouled-bitwise handling
  2006-10-01 17:22     ` Al Viro
  2006-10-01 17:29       ` Al Viro
@ 2006-10-01 17:33       ` Linus Torvalds
  1 sibling, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2006-10-01 17:33 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-sparse

On Sun, 1 Oct 2006, Al Viro wrote:
> 
> Hrm...  The gut feeling: we mangle expression trees too much to be able
> to do that decently.

Having looked at it a bit now, I think I agree.

We warn about the whole "~" thing too early, so we'd have to remove that 
warning anyway (easy enough to do), but then we'd have to track the ops 
that don't like using a "~" because of its C semantics.

And at that point, your "fouled" bit is actually probably simpler.

Let me think about this a bit more, but I'm starting to think that you're 
right.

As usual.

		Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2006-10-01 17:58 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-01 13:57 [PATCHSET] fouled-bitwise handling Al Viro
2006-10-01 14:09 ` Al Viro
2006-10-01 16:21 ` Linus Torvalds
2006-10-01 16:45   ` Al Viro
2006-10-01 16:52   ` Linus Torvalds
2006-10-01 17:22     ` Al Viro
2006-10-01 17:29       ` Al Viro
2006-10-01 17:58         ` Linus Torvalds
2006-10-01 17:33       ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).