Re: [PATCH] arp_queue: serializing unlink + kfree_skb

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "David S. Miller" <davem@davemloft.net>
To: Herbert Xu <herbert@gondor.apana.org.au>,
	anton@samba.org, anton@au.ibm.com
Cc: okir@suse.de, netdev@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] arp_queue: serializing unlink + kfree_skb
Date: Thu, 3 Feb 2005 16:34:58 -0800	[thread overview]
Message-ID: <20050203163458.6282c3fe.davem@davemloft.net> (raw)
In-Reply-To: <20050203111224.GA3267@gondor.apana.org.au>

On Thu, 3 Feb 2005 22:12:24 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> This paradigm is repeated throughout the kernel.  I bet the
> same race can be found in a lot of those places.  So we really
> need to sit down and audit them one by one or else come up with
> a magical solution apart from disabling SMP :) 

Ok.  I'm commenting now considering Anton's atomic_t rules.
Anton, please read down below, I think there is a hole in the
PPC memory barriers used for atomic ops on SMP.

I don't see what changes are needed anywhere given those
rules.  Olaf says the problem shows up on PPC SMP system,
and since Anton knows the proper rules we hopefully can
safely assume he implemented them correctly on PPC :-)

I thought for a moment that the atomic_read() might be
an issue, and I'd really hate to kill that optimization.
But I can't see how it is.  Let us restate Olaf's original
guess as to the problematic sequence of events:

	cpu 0			cpu 1
	skb_get(skb)
	unlock(neigh)
				lock(neigh)
				__skb_unlink(skb)
				kfree_skb(sb)
	kfree_skb(skb)

First, __skb_unlink(skb) does an unlocked queue unlink, and
these memory operations may have their visibility freely reordered
by the processor.

However, cpu 1 will see the refcount at 2, so it will execute:

	atomic_dec_and_test(&skb->users)

which has the implicit memory barriers, as Anton stated.  This
means that the cpu will make the __skb_unlink(skb) results visible
globally before the decrement.

Now the kfree_skb() on cpu 0 executes, the atomic_read() sees it
at 1, we do the __kfree_skb() and since the __skb_unlink() has been
made visible before the decrement on the count to "1" the BUG()
should not trigger in __kfree_skb().

This all assumes, again, that PPC implements these things properly.

Let's take a look (Anton, start reading here).  My understanding of PPC
memory barriers, wrt. the physical memory coherency domain, is as follows:

	sync	! All previous read/write execute before all subsequent read/write
	lwsync	! All previous reads execute before all subsequent read/write
	eieio	! All previous writes execute before all subsequent read/write
	isync	! All previous memory operations and instructions execute and
		! reach global visibility before any subsequent instructions execute

What guarentees does isync really make about "read" reordering around
the atomic increment?  Any descrepencies here would account for the
case Olaf observed.

All the atomic ops returning values on PPC do this on SMP:

	eieio
	atomic_op()
	isync

At a minimum, it seems that the eieio is not strong enough a
memory barrier.  It is defined to order previous writes against
future memory operations, but we also need to strictly order
previous reads as well don't we?

Also, if my understanding of isync is not correct, that could
have implications as well.

I guess for performance reasons, ppc doesn't use "sync" both
before and after the atomic ops requiring ordering.  But I
suspect that might be what is actually needed for proper conformity
to the atomic_t memory ordering rules.

Anton?

next prev parent reply	other threads:[~2005-02-04  0:34 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-01-31 10:29 [PATCH] arp_queue: serializing unlink + kfree_skb Olaf Kirch
2005-01-31 11:33 ` Herbert Xu
2005-02-03  0:20   ` David S. Miller
2005-02-03 11:12     ` Herbert Xu
2005-02-04  0:34       ` David S. Miller [this message]
2005-02-03 14:27   ` Anton Blanchard
2005-02-03 18:14     ` David S. Miller
2005-02-03 20:30     ` Herbert Xu
2005-02-03 22:19       ` David S. Miller
2005-02-03 23:50         ` Herbert Xu
2005-02-04  0:49           ` David S. Miller
2005-02-04  1:20             ` Herbert Xu
2005-02-04  1:23               ` David S. Miller
2005-02-04  1:55                 ` Herbert Xu
2005-02-04 11:16                   ` Olaf Kirch
2005-02-06  1:14                   ` David S. Miller
2005-02-03 23:08     ` David S. Miller
2005-02-04 11:33       ` Herbert Xu
2005-02-04 23:48         ` David S. Miller
2005-02-05  6:24           ` David S. Miller
2005-02-05  6:50             ` Herbert Xu
2005-02-10  4:23             ` Werner Almesberger
2005-02-10  4:56               ` Herbert Xu
2005-02-11  3:46                 ` David S. Miller
2005-02-11  3:50               ` David S. Miller
2005-02-11  4:27                 ` Werner Almesberger
2005-02-11  5:04                 ` Dmitry Torokhov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050203163458.6282c3fe.davem@davemloft.net \
    --to=davem@davemloft.net \
    --cc=anton@au.ibm.com \
    --cc=anton@samba.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@oss.sgi.com \
    --cc=okir@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).