netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* fixing sk_stream_rfree()
@ 2006-04-15  3:59 David S. Miller
  2006-04-15  8:23 ` David S. Miller
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: David S. Miller @ 2006-04-15  3:59 UTC (permalink / raw)
  To: netdev; +Cc: herbert


Herbert, as you may have noticed we found some missing
locking in sk_stream_rfree().  It could explain the
"!sk_forward_alloc" BUG() we thought was caused by e1000's
TSO implementation and the Intel folks have provided enough
datapoints to prove that it is indeed not an e1000 specific
problem.

sk_stream_rfree() modifies sk->sk_forward_alloc without holding any
locks from the __kfree_skb() callback.

Now, _most_ of the time it is OK because we're being invoked with the
socket lock held as ACKs come back to liberate the transmit queue or
during socket shutdown which also holds the socket locked.  (I did
audit this, but I may have missed something, please double check this
assertion)

However, if skb->users is incremented or similar, that entity could
end up being the context in which the skb is freed up and that will
not necessarily have the socket locked correctly when the
skb->destructor callback invokes sk_stream_rfree().  Consider tcpdump
or similar.

But we're stuck if we try to cure this:

1) We can't take bh_lock_sock().  Well, we could if we disabled
   BH explicitly first but even then if the socket is locked
   by the user we can't tell if that's the current cpu and there
   is no easy way to handle deferring this work.

   We really can't consider using a workqueue or something like
   that to handle this, that's way too heavy handed.

2) We can't turn sk_forward_alloc easily into an atomic_t.  The
   masking operation in __sk_stream_mem_reclaim() does not translate
   readily into an atomic_t op:

		sk->sk_forward_alloc &= SK_STREAM_MEM_QUANTUM - 1;

   That line has always driven me nuts, but I know that it is there
   to handle partial page allocations existing when that function
   is called.

I guess for #2 we could change those two lines into:

		int n = atomic_read(&sk->sk_forward_alloc) /
				SK_STREAM_MEM_QUANTUM;

		atomic_sub(n, sk->sk_prot->memory_allocated);
		sk->sk_forward_alloc -= n * SK_STREAM_MEM_QUANTUM;

in order to make it "atomic_t op" translatable.

But even if we did this with sk->sk_forward_alloc as an atomic_t
it would have the same kinds of races we're trying to cure here,
namely that we test the value and then act upon it while an unlocked
asynchronous context can modify the value in another thread of
execution in between the test and the action.

Any ideas?  Come to think of it, __sk_stream_mem_reclaim() looks
really racey even as-is.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2006-04-18 11:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-15  3:59 fixing sk_stream_rfree() David S. Miller
2006-04-15  8:23 ` David S. Miller
2006-04-15  9:03   ` David S. Miller
2006-04-16 11:18 ` Herbert Xu
2006-04-17  5:32   ` David S. Miller
2006-04-17  6:17     ` Herbert Xu
2006-04-17 20:27       ` Jesse Brandeburg
2006-04-17 20:29       ` David S. Miller
2006-04-17 21:09         ` Jesse Brandeburg
2006-04-17 21:40           ` Herbert Xu
2006-04-17 21:56             ` Jesse Brandeburg
2006-04-18 11:49 ` Ingo Oeser

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).