* atomic_dec_and_test for child dst needed in dst_destroy?
@ 2005-04-05 18:55 Christoph Lameter
2005-04-05 19:34 ` David S. Miller
2005-04-05 21:45 ` Herbert Xu
0 siblings, 2 replies; 27+ messages in thread
From: Christoph Lameter @ 2005-04-05 18:55 UTC (permalink / raw)
To: netdev
It seems that the atomic_dec_and_test for the child dst in net/core/dst.c
is really not needed since dst_destroy cannot be called simultanously for
the same dst (frees dst unconditionally via kmem_cache_free). If a dst can
only have one child then the dst may only be deleted from its parent dst.
The refcnt is also incremented via dst_hold and decremented via
dst_release elsewhere (seems that they may race with dst_destroy?) but
there is no provisions to do something special if __refcnt would reach
zero in dst_release. __refcnt is always expected to be always
greater than zero there.
However, it is checked without dec_and_test for zero in various other
locations (dst_free, dst_run_gc, dn_dst_check_expire).
Is the atomic_dec_and_test in dst_destroy just there to join two atomic
operations into one without being necessary for the correctness of freeing
dsts?
Then we really would not need atomic_dec_and_test in dst_destroy() and the
following patch would be safe (There are some ideas for optimizations that
would not be able to preserve the atomic_dec_and_test).
Index: linux-2.6.11/net/core/dst.c
===================================================================
--- linux-2.6.11.orig/net/core/dst.c 2005-03-01 23:38:12.000000000 -0800
+++ linux-2.6.11/net/core/dst.c 2005-04-05 11:04:41.000000000 -0700
@@ -198,7 +198,8 @@ again:
dst = child;
if (dst) {
- if (atomic_dec_and_test(&dst->__refcnt)) {
+ atomic_dec(&dst->__refcnt);
+ if (!atomic_read(&dst->__refcnt)) {
/* We were real parent of this dst, so kill child. */
if (dst->flags&DST_NOHASH)
goto again;
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-05 18:55 atomic_dec_and_test for child dst needed in dst_destroy? Christoph Lameter
@ 2005-04-05 19:34 ` David S. Miller
2005-04-05 19:47 ` Christoph Lameter
2005-04-05 21:45 ` Herbert Xu
1 sibling, 1 reply; 27+ messages in thread
From: David S. Miller @ 2005-04-05 19:34 UTC (permalink / raw)
To: Christoph Lameter; +Cc: netdev
On Tue, 5 Apr 2005 11:55:45 -0700 (PDT)
Christoph Lameter <christoph@lameter.com> wrote:
> Is the atomic_dec_and_test in dst_destroy just there to join two atomic
> operations into one without being necessary for the correctness of freeing
> dsts?
Otimizing big SGI NUM systems again, are you? :-)
atomic_dec() has no memory ordering guarentees, only the atomic
routines returning values do the proper SMP memory barriers.
So, based upon this alone I don't think your change is valid.
I've even documented this fully, see Documentation/atomic_ops.txt
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-05 19:34 ` David S. Miller
@ 2005-04-05 19:47 ` Christoph Lameter
2005-04-05 19:50 ` David S. Miller
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2005-04-05 19:47 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
On Tue, 5 Apr 2005, David S. Miller wrote:
> On Tue, 5 Apr 2005 11:55:45 -0700 (PDT)
> Christoph Lameter <christoph@lameter.com> wrote:
>
> > Is the atomic_dec_and_test in dst_destroy just there to join two atomic
> > operations into one without being necessary for the correctness of freeing
> > dsts?
> atomic_dec() has no memory ordering guarentees, only the atomic
> routines returning values do the proper SMP memory barriers.
> So, based upon this alone I don't think your change is valid.
Correct that applies in general. But what could go wrong if the atomic_dec
is separated from the atomic_read in this specific location?
I fail to see what the point of having a single instance of
atomic_dec_and_test for __refcnt is. In particular since the upper layers
guarantee that dst_destroy is not called multiple times for the same dst
entry.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-05 19:47 ` Christoph Lameter
@ 2005-04-05 19:50 ` David S. Miller
2005-04-05 19:58 ` Christoph Lameter
0 siblings, 1 reply; 27+ messages in thread
From: David S. Miller @ 2005-04-05 19:50 UTC (permalink / raw)
To: Christoph Lameter; +Cc: netdev
On Tue, 5 Apr 2005 12:47:09 -0700 (PDT)
Christoph Lameter <christoph@lameter.com> wrote:
> Correct that applies in general. But what could go wrong if the atomic_dec
> is separated from the atomic_read in this specific location?
>
> I fail to see what the point of having a single instance of
> atomic_dec_and_test for __refcnt is. In particular since the upper layers
> guarantee that dst_destroy is not called multiple times for the same dst
> entry.
If this is true, what performance improvement could you possibly be
seeing from this change?
I know you are making this change for performance reasons, yet you
aren't mentioning any details about this. That information is
part of what we need to know to judge this change.
I've very hesistant to undo atomic operation memory barriers, after
all of the weird problems we had in the neighbour cache.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-05 19:50 ` David S. Miller
@ 2005-04-05 19:58 ` Christoph Lameter
2005-04-05 20:12 ` David S. Miller
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2005-04-05 19:58 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
On Tue, 5 Apr 2005, David S. Miller wrote:
> > I fail to see what the point of having a single instance of
> > atomic_dec_and_test for __refcnt is. In particular since the upper layers
> > guarantee that dst_destroy is not called multiple times for the same dst
> > entry.
>
> If this is true, what performance improvement could you possibly be
> seeing from this change?
We could make refcnt into an array of pointers that point to node specific
memory. This avoids cache line bouncing. However, you cannot atomically
dec_and_test an array. This is the only location where a dec_and_test is
used on dst->__refcnt.
> I know you are making this change for performance reasons, yet you
> aren't mentioning any details about this. That information is
> part of what we need to know to judge this change.
I figured that it is not worth posting the patch if there is a reason
for the dec_and_test here. I was sure if my assessment of the role
of this atomic_dec_and_test here is correct.
> I've very hesistant to undo atomic operation memory barriers, after
> all of the weird problems we had in the neighbour cache.
We can put an explicit barier in if that is the only reason for the
atomic_dec_and_test.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-05 19:58 ` Christoph Lameter
@ 2005-04-05 20:12 ` David S. Miller
0 siblings, 0 replies; 27+ messages in thread
From: David S. Miller @ 2005-04-05 20:12 UTC (permalink / raw)
To: Christoph Lameter; +Cc: netdev
On Tue, 5 Apr 2005 12:58:15 -0700 (PDT)
Christoph Lameter <christoph@lameter.com> wrote:
> On Tue, 5 Apr 2005, David S. Miller wrote:
>
> > > I fail to see what the point of having a single instance of
> > > atomic_dec_and_test for __refcnt is. In particular since the upper layers
> > > guarantee that dst_destroy is not called multiple times for the same dst
> > > entry.
> >
> > If this is true, what performance improvement could you possibly be
> > seeing from this change?
>
> We could make refcnt into an array of pointers that point to node specific
> memory. This avoids cache line bouncing. However, you cannot atomically
> dec_and_test an array. This is the only location where a dec_and_test is
> used on dst->__refcnt.
The dst object is already too large. You have to show a serious
performance improvement to justify bloating it up further.
> We can put an explicit barier in if that is the only reason for the
> atomic_dec_and_test.
Then we lose the optimizations of those memory barriers that platforms
do in their atomic_op assembly.
Let me check out if your assertions about dst_destroy() usage are correct
first, hold on for a bit.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-05 18:55 atomic_dec_and_test for child dst needed in dst_destroy? Christoph Lameter
2005-04-05 19:34 ` David S. Miller
@ 2005-04-05 21:45 ` Herbert Xu
2005-04-05 21:48 ` David S. Miller
` (2 more replies)
1 sibling, 3 replies; 27+ messages in thread
From: Herbert Xu @ 2005-04-05 21:45 UTC (permalink / raw)
To: Christoph Lameter; +Cc: netdev, davem
Christoph Lameter <christoph@lameter.com> wrote:
>
> Index: linux-2.6.11/net/core/dst.c
> ===================================================================
> --- linux-2.6.11.orig/net/core/dst.c 2005-03-01 23:38:12.000000000 -0800
> +++ linux-2.6.11/net/core/dst.c 2005-04-05 11:04:41.000000000 -0700
> @@ -198,7 +198,8 @@ again:
>
> dst = child;
> if (dst) {
> - if (atomic_dec_and_test(&dst->__refcnt)) {
> + atomic_dec(&dst->__refcnt);
> + if (!atomic_read(&dst->__refcnt)) {
> /* We were real parent of this dst, so kill child. */
> if (dst->flags&DST_NOHASH)
> goto again;
This is racy (albeit very unlikely) because dst may be freed by
dst_run_gc after the atomic_dec.
When we are not the real parent of the dst (e.g., when we're xfrm_dst
and the child is an rtentry), it may already be on the GC list.
In fact the current code is buggy to, we need to check dst->flags
before the dec as dst may no longer be valid afterwards.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
===== net/core/dst.c 1.28 vs edited =====
--- 1.28/net/core/dst.c 2005-02-16 09:23:10 +11:00
+++ edited/net/core/dst.c 2005-04-06 07:41:02 +10:00
@@ -198,13 +198,15 @@
dst = child;
if (dst) {
+ int nohash = dst->flags & DST_NOHASH;
+
if (atomic_dec_and_test(&dst->__refcnt)) {
/* We were real parent of this dst, so kill child. */
- if (dst->flags&DST_NOHASH)
+ if (nohash)
goto again;
} else {
/* Child is still referenced, return it for freeing. */
- if (dst->flags&DST_NOHASH)
+ if (nohash)
return dst;
/* Child is still in his hash table */
}
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-05 21:45 ` Herbert Xu
@ 2005-04-05 21:48 ` David S. Miller
2005-04-05 22:14 ` Christoph Lameter
[not found] ` <b82a8917050406002339f732ca@mail.gmail.com>
2 siblings, 0 replies; 27+ messages in thread
From: David S. Miller @ 2005-04-05 21:48 UTC (permalink / raw)
To: Herbert Xu; +Cc: christoph, netdev
On Wed, 06 Apr 2005 07:45:06 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:
> When we are not the real parent of the dst (e.g., when we're xfrm_dst
> and the child is an rtentry), it may already be on the GC list.
>
> In fact the current code is buggy to, we need to check dst->flags
> before the dec as dst may no longer be valid afterwards.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Good catch Herbert, I'll apply this.
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-05 21:45 ` Herbert Xu
2005-04-05 21:48 ` David S. Miller
@ 2005-04-05 22:14 ` Christoph Lameter
2005-04-06 2:19 ` Herbert Xu
[not found] ` <b82a8917050406002339f732ca@mail.gmail.com>
2 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2005-04-05 22:14 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev, davem
On Wed, 6 Apr 2005, Herbert Xu wrote:
> > - if (atomic_dec_and_test(&dst->__refcnt)) {
> > + atomic_dec(&dst->__refcnt);
> > + if (!atomic_read(&dst->__refcnt)) {
>
> This is racy (albeit very unlikely) because dst may be freed by
> dst_run_gc after the atomic_dec.
If that is so then it is also possible that the gc frees after
atomic_dec_and_test:
cpu0 dst_destroy cpu1 dst_run_gc
atomic_dec_and_test(refcnt)
if (atomic_read(refcnt)) ...
..
dst_destroy(dst)
kmem_cache_free(dst)
..
goto again
...
kmem_cache_free(dst)
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-05 22:14 ` Christoph Lameter
@ 2005-04-06 2:19 ` Herbert Xu
2005-04-06 3:19 ` Christoph Lameter
0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2005-04-06 2:19 UTC (permalink / raw)
To: Christoph Lameter; +Cc: herbert, netdev, davem
Christoph Lameter <christoph@lameter.com> wrote:
>
> If that is so then it is also possible that the gc frees after
> atomic_dec_and_test:
No this is prevented by the nohash check.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-06 2:19 ` Herbert Xu
@ 2005-04-06 3:19 ` Christoph Lameter
2005-04-06 8:32 ` Herbert Xu
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2005-04-06 3:19 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev, davem
On Wed, 6 Apr 2005, Herbert Xu wrote:
> Christoph Lameter <christoph@lameter.com> wrote:
> >
> > If that is so then it is also possible that the gc frees after
> > atomic_dec_and_test:
>
> No this is prevented by the nohash check.
Ok then the purpose of this whole thing is to decrement the usage counter
and at the same time figure out if this is the last child on an unhashed
entry. That unhashed entry is not handled by the garbage collector and
must be freed separately.
The atomic_dec_and_test is needed because as soon as the refcnt reaches
zero the dst entry may potentially be freed by the garbage collector. Thus
no future access to the dst entry is possible and the dec_and_test is the
only safe way to figure out if the counter reached zero.
I hope I got it now. Thanks.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-06 3:19 ` Christoph Lameter
@ 2005-04-06 8:32 ` Herbert Xu
2005-04-06 18:17 ` David S. Miller
0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2005-04-06 8:32 UTC (permalink / raw)
To: Christoph Lameter; +Cc: herbert, netdev, davem
Christoph Lameter <christoph@lameter.com> wrote:
>
> The atomic_dec_and_test is needed because as soon as the refcnt reaches
> zero the dst entry may potentially be freed by the garbage collector. Thus
> no future access to the dst entry is possible and the dec_and_test is the
> only safe way to figure out if the counter reached zero.
>
> I hope I got it now. Thanks.
Yep you're totally correct.
In fact, the atomic_dec_and_test is really only needed for unhashed
entries (i.e., IPsec). So we could do something like this so that
all hashed entries undergo atomic_dec.
This would only make sense if there were architectures where
atomic_dec is significantly cheaper compared to atomic_dec_and_test.
Do such beasts exist?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
--- linux-2.6/net/core/dst.c.orig 2005-04-06 18:29:00.000000000 +1000
+++ linux-2.6/net/core/dst.c 2005-04-06 18:26:46.000000000 +1000
@@ -197,20 +197,21 @@
kmem_cache_free(dst->ops->kmem_cachep, dst);
dst = child;
- if (dst) {
- int nohash = dst->flags & DST_NOHASH;
+ if (!dst)
+ return NULL;
+ if (dst->flags & DST_NOHASH) {
if (atomic_dec_and_test(&dst->__refcnt)) {
/* We were real parent of this dst, so kill child. */
- if (nohash)
- goto again;
+ goto again;
} else {
/* Child is still referenced, return it for freeing. */
- if (nohash)
- return dst;
- /* Child is still in his hash table */
+ return dst;
}
}
+
+ /* Child is still in his hash table or on the GC list. */
+ atomic_dec(&dst->__refcnt);
return NULL;
}
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-06 8:32 ` Herbert Xu
@ 2005-04-06 18:17 ` David S. Miller
2005-04-06 18:48 ` Christoph Lameter
2005-04-07 11:07 ` Herbert Xu
0 siblings, 2 replies; 27+ messages in thread
From: David S. Miller @ 2005-04-06 18:17 UTC (permalink / raw)
To: Herbert Xu; +Cc: christoph, herbert, netdev
On Wed, 06 Apr 2005 18:32:54 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:
> In fact, the atomic_dec_and_test is really only needed for unhashed
> entries (i.e., IPsec). So we could do something like this so that
> all hashed entries undergo atomic_dec.
>
> This would only make sense if there were architectures where
> atomic_dec is significantly cheaper compared to atomic_dec_and_test.
>
> Do such beasts exist?
See his other emails in this thread. If it can be converted to
atomic_dec() then he wants to change the counter into an array
of counters on NUMA systems.
But his trick only works if the atomic_dec_and_test() can be eliminated
for all cases, which we're now quite certain is not possible.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-06 18:17 ` David S. Miller
@ 2005-04-06 18:48 ` Christoph Lameter
2005-04-07 11:07 ` Herbert Xu
1 sibling, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2005-04-06 18:48 UTC (permalink / raw)
To: David S. Miller; +Cc: Herbert Xu, netdev
On Wed, 6 Apr 2005, David S. Miller wrote:
> See his other emails in this thread. If it can be converted to
> atomic_dec() then he wants to change the counter into an array
> of counters on NUMA systems.
Correct.
> But his trick only works if the atomic_dec_and_test() can be eliminated
> for all cases, which we're now quite certain is not possible.
Some changes to the way locking is done between the garbage collector
and dst_destroy would be necessary. Lets see if the author
of the patch can come up with a solution to this.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-06 18:17 ` David S. Miller
2005-04-06 18:48 ` Christoph Lameter
@ 2005-04-07 11:07 ` Herbert Xu
2005-04-07 16:00 ` Christoph Lameter
1 sibling, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2005-04-07 11:07 UTC (permalink / raw)
To: David S. Miller; +Cc: christoph, netdev
[-- Attachment #1: Type: text/plain, Size: 1182 bytes --]
On Wed, Apr 06, 2005 at 11:17:21AM -0700, David S. Miller wrote:
>
> See his other emails in this thread. If it can be converted to
> atomic_dec() then he wants to change the counter into an array
> of counters on NUMA systems.
>
> But his trick only works if the atomic_dec_and_test() can be eliminated
> for all cases, which we're now quite certain is not possible.
OK I've thought more about this and indeed it is possible to get rid of
atomic_dec_and_test.
As you can see from my previous patch, it is possible to restrict the
use of atomic_dec_and_test to the NOHASH path.
However, for NOHASH dst entries we know that
1) They're not in a hash table.
2) They're not on the GC list.
Therefore it is safe to look at the entry even after dropping our
reference count.
This is what the following patch does.
I added the unlikely on dst since entries with child pointers are
rare in most systems.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p --]
[-- Type: text/plain, Size: 944 bytes --]
--- linux-2.6/net/core/dst.c.orig 2005-04-07 20:48:25.000000000 +1000
+++ linux-2.6/net/core/dst.c 2005-04-07 20:57:33.000000000 +1000
@@ -169,9 +169,9 @@
struct neighbour *neigh;
struct hh_cache *hh;
+again:
smp_rmb();
-again:
neigh = dst->neighbour;
hh = dst->hh;
child = dst->child;
@@ -197,19 +197,19 @@
kmem_cache_free(dst->ops->kmem_cachep, dst);
dst = child;
- if (dst) {
+ if (unlikely(dst)) {
int nohash = dst->flags & DST_NOHASH;
- if (atomic_dec_and_test(&dst->__refcnt)) {
+ atomic_dec(&dst->__refcnt);
+
+ if (nohash) {
/* We were real parent of this dst, so kill child. */
- if (nohash)
+ if (!atomic_read(&dst->__refcnt))
goto again;
- } else {
/* Child is still referenced, return it for freeing. */
- if (nohash)
- return dst;
- /* Child is still in his hash table */
+ return dst;
}
+ /* Child is still in his hash table or on the GC list. */
}
return NULL;
}
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-07 11:07 ` Herbert Xu
@ 2005-04-07 16:00 ` Christoph Lameter
2005-04-07 21:25 ` Herbert Xu
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2005-04-07 16:00 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, netdev, pravins, shai
On Thu, 7 Apr 2005, Herbert Xu wrote:
> However, for NOHASH dst entries we know that
>
> 1) They're not in a hash table.
> 2) They're not on the GC list.
>
> Therefore it is safe to look at the entry even after dropping our
> reference count.
In an earlier email you said that there was a slight chance that child
entries (typically NOHASH from what I can see in the code) could be on the
gc list.
If its not on the hash table and not on the gc list then how could the
refcnt be > 1 ? Doesnt the refcnt > 1 imply that multiple concurrent
accesses are possible?
Also if this is as you say then both types of entries better be
treated in a distinct way for clarity in the code. This also avoids
having a nohash variable:
if ((dst->flags & DST_NOHASH)) {
/* dst not on hash and also not on gc list */
atomic_dec(&dst->__refcnt);
if (!atomic_read(&dst->refcnt))
/* We were the only reference kill child */
goto again;
/* return child to put it on the gc list */
return dst;
}
/* dst on the hashed list. Decrement refcnt and rely on garbage collector
* to eventually remove it
*/
atomic_dec(&dst->__refcnt);
return NULL;
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-07 16:00 ` Christoph Lameter
@ 2005-04-07 21:25 ` Herbert Xu
2005-04-07 22:30 ` Christoph Lameter
2005-04-08 5:45 ` Christoph Lameter
0 siblings, 2 replies; 27+ messages in thread
From: Herbert Xu @ 2005-04-07 21:25 UTC (permalink / raw)
To: Christoph Lameter; +Cc: David S. Miller, netdev, pravins, shai
On Thu, Apr 07, 2005 at 09:00:51AM -0700, Christoph Lameter wrote:
>
> In an earlier email you said that there was a slight chance that child
> entries (typically NOHASH from what I can see in the code) could be on the
> gc list.
Only child entries with NOHASH unset can be on the GC list.
> If its not on the hash table and not on the gc list then how could the
> refcnt be > 1 ? Doesnt the refcnt > 1 imply that multiple concurrent
> accesses are possible?
For NOHASH entries refcnt can be greater than one due to dst_pop.
Of course concurrent access is possible. However, the important
thing is that only the "owner" of an entry can call dst_destroy.
In the case of NOHASH entries, we are the owner.
> Also if this is as you say then both types of entries better be
> treated in a distinct way for clarity in the code. This also avoids
> having a nohash variable:
I never said that it is better to treat them differently. I wrote
it like this earlier because I still had atomic_dec_and_test.
> if ((dst->flags & DST_NOHASH)) {
> /* dst not on hash and also not on gc list */
> atomic_dec(&dst->__refcnt);
> if (!atomic_read(&dst->refcnt))
> /* We were the only reference kill child */
> goto again;
> /* return child to put it on the gc list */
> return dst;
> }
BTW this patch is missing a barrier.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-07 21:25 ` Herbert Xu
@ 2005-04-07 22:30 ` Christoph Lameter
2005-04-07 23:07 ` Herbert Xu
2005-04-08 5:45 ` Christoph Lameter
1 sibling, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2005-04-07 22:30 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, netdev, pravins, shai
On Fri, 8 Apr 2005, Herbert Xu wrote:
> Only child entries with NOHASH unset can be on the GC list.
>
> > If its not on the hash table and not on the gc list then how could the
> > refcnt be > 1 ? Doesnt the refcnt > 1 imply that multiple concurrent
> > accesses are possible?
>
> For NOHASH entries refcnt can be greater than one due to dst_pop.
> Of course concurrent access is possible. However, the important
> thing is that only the "owner" of an entry can call dst_destroy.
>
> In the case of NOHASH entries, we are the owner.
So in case f.e. the refcnt was 2 for a child NOHASH entry, then we
decrement the refcnt for the child but do not free it. However after
dst_destroy the parent is gone and presumably some skb->dst are still
pointing to the dst entry (or is there something else accounting for the
remaining __refcnt)?
What happens to the child dst entry? Will the system simply never
deallocate it?
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-07 22:30 ` Christoph Lameter
@ 2005-04-07 23:07 ` Herbert Xu
0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2005-04-07 23:07 UTC (permalink / raw)
To: Christoph Lameter; +Cc: David S. Miller, netdev, pravins, shai
On Thu, Apr 07, 2005 at 03:30:28PM -0700, Christoph Lameter wrote:
>
> So in case f.e. the refcnt was 2 for a child NOHASH entry, then we
> decrement the refcnt for the child but do not free it. However after
> dst_destroy the parent is gone and presumably some skb->dst are still
> pointing to the dst entry (or is there something else accounting for the
> remaining __refcnt)?
>
> What happens to the child dst entry? Will the system simply never
> deallocate it?
If the refcnt is non-zero after the atomic_dec and the entry is NOHASH,
we return it and the caller will add it to the GC list.
See the relevant section in dst_run_gc and dst_free.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-07 21:25 ` Herbert Xu
2005-04-07 22:30 ` Christoph Lameter
@ 2005-04-08 5:45 ` Christoph Lameter
2005-04-08 5:48 ` Herbert Xu
1 sibling, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2005-04-08 5:45 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, netdev, pravins, shai
On Fri, 8 Apr 2005, Herbert Xu wrote:
> I never said that it is better to treat them differently. I wrote
> it like this earlier because I still had atomic_dec_and_test.
>
> > if ((dst->flags & DST_NOHASH)) {
> > /* dst not on hash and also not on gc list */
> > atomic_dec(&dst->__refcnt);
> > if (!atomic_read(&dst->refcnt))
> > /* We were the only reference kill child */
> > goto again;
> > /* return child to put it on the gc list */
> > return dst;
> > }
>
> BTW this patch is missing a barrier.
That was just some C code to explain what I meant. Here is a patch
against 2.6.12-rc2 with explanations as to what exactly is going on with
__refcnt so that we can avoid future confusion about this piece of code.
It removes the atomic_dec_and_test like the initial patch. I hope this
is acceptable now.
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Index: linux-2.6.11/net/core/dst.c
===================================================================
--- linux-2.6.11.orig/net/core/dst.c 2005-03-01 23:38:12.000000000 -0800
+++ linux-2.6.11/net/core/dst.c 2005-04-07 22:39:09.000000000 -0700
@@ -191,23 +191,47 @@ again:
dst->ops->destroy(dst);
if (dst->dev)
dev_put(dst->dev);
-#if RT_CACHE_DEBUG >= 2
+#if RT_CACHE_DEBUG >= 2
atomic_dec(&dst_total);
#endif
kmem_cache_free(dst->ops->kmem_cachep, dst);
dst = child;
if (dst) {
- if (atomic_dec_and_test(&dst->__refcnt)) {
- /* We were real parent of this dst, so kill child. */
- if (dst->flags&DST_NOHASH)
+ /*
+ * Note that the check for NO_HASH must be done before
+ * decrementing the __refcnt. If __refcnt reaches zero
+ * then the dst entry may be freed immediately by the gc
+ * running on another cpu. Therefore no field of the dst
+ * entry may be accessed after decrementing __refcnt
+ */
+ if (dst->flags&DST_NOHASH) {
+ /*
+ * The child is not on a hash table or on the gc list.
+ * We are the owner and are the only ones who could
+ * free the dst. There is no possibility of racing
+ * with the gc code.
+ */
+ atomic_dec(&dst->__refcnt);
+ smp_mb__after_atomic_dec();
+ if (!atomic_read(&dst->__refcnt))
+ /*
+ * There are no other references and therefore
+ * the dst can be freed now
+ */
goto again;
- } else {
- /* Child is still referenced, return it for freeing. */
- if (dst->flags&DST_NOHASH)
- return dst;
- /* Child is still in his hash table */
+
+ /*
+ * Child is still referenced and will be put on the gc
+ * list by the function invoking dst_destroy.
+ */
+ return dst;
}
+
+ /*
+ * Child is on the hash table. Just decrement the use counter.
+ */
+ atomic_dec(&dst->__refcnt);
}
return NULL;
}
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-08 5:45 ` Christoph Lameter
@ 2005-04-08 5:48 ` Herbert Xu
2005-04-08 15:05 ` Christoph Lameter
0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2005-04-08 5:48 UTC (permalink / raw)
To: Christoph Lameter; +Cc: David S. Miller, netdev, pravins, shai
On Thu, Apr 07, 2005 at 10:45:02PM -0700, Christoph Lameter wrote:
>
> + atomic_dec(&dst->__refcnt);
> + smp_mb__after_atomic_dec();
> + if (!atomic_read(&dst->__refcnt))
> + /*
> + * There are no other references and therefore
> + * the dst can be freed now
> + */
> goto again;
This is wrong. The barrier needs to be after the atomic_read.
Please see my patch to Dave.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-08 5:48 ` Herbert Xu
@ 2005-04-08 15:05 ` Christoph Lameter
2005-04-08 21:45 ` Herbert Xu
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2005-04-08 15:05 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, netdev
On Fri, 8 Apr 2005, Herbert Xu wrote:
> This is wrong. The barrier needs to be after the atomic_read.
> Please see my patch to Dave.
Ok. How does one get onto the netdev list? The personal cc's of my mail
to you also always bounces with admin prohibition.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-08 15:05 ` Christoph Lameter
@ 2005-04-08 21:45 ` Herbert Xu
2005-04-09 15:28 ` Christoph Lameter
0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2005-04-08 21:45 UTC (permalink / raw)
To: Christoph Lameter; +Cc: David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 736 bytes --]
On Fri, Apr 08, 2005 at 08:05:00AM -0700, Christoph Lameter wrote:
> On Fri, 8 Apr 2005, Herbert Xu wrote:
>
> > This is wrong. The barrier needs to be after the atomic_read.
> > Please see my patch to Dave.
>
> Ok. How does one get onto the netdev list? The personal cc's of my mail
echo subscribe netdev | mail majordomo@oss.sgi.com
> to you also always bounces with admin prohibition.
Strange. I'm getting your direct emails just fine. In fact I'm
replying to it now.
Anyway, here is my last patch again.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p --]
[-- Type: text/plain, Size: 944 bytes --]
--- linux-2.6/net/core/dst.c.orig 2005-04-07 20:48:25.000000000 +1000
+++ linux-2.6/net/core/dst.c 2005-04-07 20:57:33.000000000 +1000
@@ -169,9 +169,9 @@
struct neighbour *neigh;
struct hh_cache *hh;
+again:
smp_rmb();
-again:
neigh = dst->neighbour;
hh = dst->hh;
child = dst->child;
@@ -197,19 +197,19 @@
kmem_cache_free(dst->ops->kmem_cachep, dst);
dst = child;
- if (dst) {
+ if (unlikely(dst)) {
int nohash = dst->flags & DST_NOHASH;
- if (atomic_dec_and_test(&dst->__refcnt)) {
+ atomic_dec(&dst->__refcnt);
+
+ if (nohash) {
/* We were real parent of this dst, so kill child. */
- if (nohash)
+ if (!atomic_read(&dst->__refcnt))
goto again;
- } else {
/* Child is still referenced, return it for freeing. */
- if (nohash)
- return dst;
- /* Child is still in his hash table */
+ return dst;
}
+ /* Child is still in his hash table or on the GC list. */
}
return NULL;
}
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-08 21:45 ` Herbert Xu
@ 2005-04-09 15:28 ` Christoph Lameter
0 siblings, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2005-04-09 15:28 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, netdev
On Sat, 9 Apr 2005, Herbert Xu wrote:
> Anyway, here is my last patch again.
And here is my fixed up one:
Index: linux-2.6.11/net/core/dst.c
===================================================================
--- linux-2.6.11.orig/net/core/dst.c 2005-03-01 23:38:12.000000000 -0800
+++ linux-2.6.11/net/core/dst.c 2005-04-09 08:25:13.000000000 -0700
@@ -169,9 +169,8 @@ struct dst_entry *dst_destroy(struct dst
struct neighbour *neigh;
struct hh_cache *hh;
- smp_rmb();
-
again:
+ smp_rmb();
neigh = dst->neighbour;
hh = dst->hh;
child = dst->child;
@@ -191,23 +190,46 @@ again:
dst->ops->destroy(dst);
if (dst->dev)
dev_put(dst->dev);
-#if RT_CACHE_DEBUG >= 2
+#if RT_CACHE_DEBUG >= 2
atomic_dec(&dst_total);
#endif
kmem_cache_free(dst->ops->kmem_cachep, dst);
dst = child;
if (dst) {
- if (atomic_dec_and_test(&dst->__refcnt)) {
- /* We were real parent of this dst, so kill child. */
- if (dst->flags&DST_NOHASH)
+ /*
+ * Note that the check for NO_HASH must be done before
+ * decrementing the __refcnt. If __refcnt reaches zero
+ * then the dst entry may be freed immediately by the gc
+ * running on another cpu. Therefore no field of the dst
+ * entry may be accessed after decrementing __refcnt
+ */
+ if (dst->flags&DST_NOHASH) {
+ /*
+ * The child is not on a hash table or on the gc list.
+ * We are the owner and are the only ones who could
+ * free the dst. There is no possibility of racing
+ * with the gc code.
+ */
+ atomic_dec(&dst->__refcnt);
+ if (!atomic_read(&dst->__refcnt))
+ /*
+ * There are no other references and therefore
+ * the dst can be freed now
+ */
goto again;
- } else {
- /* Child is still referenced, return it for freeing. */
- if (dst->flags&DST_NOHASH)
- return dst;
- /* Child is still in his hash table */
+
+ /*
+ * Child is still referenced and will be put on the gc
+ * list by the function invoking dst_destroy.
+ */
+ return dst;
}
+
+ /*
+ * Child is on the hash table. Just decrement the use counter.
+ */
+ atomic_dec(&dst->__refcnt);
}
return NULL;
}
^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <b82a8917050406002339f732ca@mail.gmail.com>]
* Re: Fwd: atomic_dec_and_test for child dst needed in dst_destroy?
[not found] ` <b82a8917050406002339f732ca@mail.gmail.com>
@ 2005-04-06 8:53 ` pravin b shelar
2005-04-07 11:23 ` Herbert Xu
0 siblings, 1 reply; 27+ messages in thread
From: pravin b shelar @ 2005-04-06 8:53 UTC (permalink / raw)
To: "Christoph Lameter", netdev, davem
[-- Attachment #1: Type: text/plain, Size: 576 bytes --]
>
>
>This is racy (albeit very unlikely) because dst may be freed by
>dst_run_gc after the atomic_dec.
>
>When we are not the real parent of the dst (e.g., when we're xfrm_dst
>and the child is an rtentry), it may already be on the GC list.
>
>In fact the current code is buggy to, we need to check dst->flags
>before the dec as dst may no longer be valid afterwards.
>
>
>
>
This patch will remove the need for atomic_dec_and_test for this
particular case.
Now we can break down the atomic_dec_and_test to atomic_dec & atomic_read.
Please comment.
Regards
Pravin.
[-- Attachment #2: dst-atomic --]
[-- Type: text/plain, Size: 991 bytes --]
Index: linux-2.6.11-dsta/net/core/dst.c
===================================================================
--- linux-2.6.11-dsta.orig/net/core/dst.c 2005-04-06 00:18:32.000000000 -0700
+++ linux-2.6.11-dsta/net/core/dst.c 2005-04-06 01:31:10.000000000 -0700
@@ -198,17 +198,19 @@
dst = child;
if (dst) {
- if (atomic_dec_and_test(&dst->__refcnt)) {
- /* We were real parent of this dst, so kill child. */
- if (dst->flags&DST_NOHASH)
- goto again;
- } else {
- /* Child is still referenced, return it for freeing. */
- if (dst->flags&DST_NOHASH)
- return dst;
- /* Child is still in his hash table */
+ if (dst->flags&DST_NOHASH) {
+ atomic_dec(&dst->__refcnt);
+ if (atomic_read(&dst->__refcnt))
+ /* Child is still referenced, return it for freeing. */
+ return dst;
+
+ /*We were real parent of this dst, so kill child. */
+ goto again;
}
+ else /* Child is still in his hash table */
+ atomic_dec(&dst->__refcnt);
}
+
return NULL;
}
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: Fwd: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-06 8:53 ` Fwd: " pravin b shelar
@ 2005-04-07 11:23 ` Herbert Xu
2005-04-07 12:30 ` pravin b shelar
0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2005-04-07 11:23 UTC (permalink / raw)
To: pravin b shelar; +Cc: christoph, netdev, davem
pravin b shelar <pravins@calsoftinc.com> wrote:
>
> This patch will remove the need for atomic_dec_and_test for this
> particular case.
> Now we can break down the atomic_dec_and_test to atomic_dec & atomic_read.
Sorry I overlooked your patch. Otherwise we could have solved this
one day earlier :)
> Please comment.
There is just one problem. We need an extra barrier before or after
the goto since we no longer have the implicit barrier given by
atomic_dec_and_test.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: Fwd: atomic_dec_and_test for child dst needed in dst_destroy?
2005-04-07 11:23 ` Herbert Xu
@ 2005-04-07 12:30 ` pravin b shelar
0 siblings, 0 replies; 27+ messages in thread
From: pravin b shelar @ 2005-04-07 12:30 UTC (permalink / raw)
To: Herbert Xu; +Cc: christoph, netdev, davem
[-- Attachment #1: Type: text/plain, Size: 637 bytes --]
Herbert Xu wrote:
>pravin b shelar <pravins@calsoftinc.com> wrote:
>
>
>>This patch will remove the need for atomic_dec_and_test for this
>>particular case.
>>Now we can break down the atomic_dec_and_test to atomic_dec & atomic_read.
>>
>>
>
>Sorry I overlooked your patch. Otherwise we could have solved this
>one day earlier :)
>
>
ah ... I was just waiting :)
>
>
>>Please comment.
>>
>>
>
>There is just one problem. We need an extra barrier before or after
>the goto since we no longer have the implicit barrier given by
>atomic_dec_and_test.
>
>
I am attaching updated patch with barrier.
Regards,
Pravin.
[-- Attachment #2: dst-atomic_dec_and_test-removed.patch --]
[-- Type: text/x-patch, Size: 1176 bytes --]
This patch removes atomic_dec_and_test from dst_destroy, used to test child dst __refcnt.
Signed-off by: Pravin B. Shelar <pravins@calsoftinc.com>
Index: linux-2.6.11-dsta/net/core/dst.c
===================================================================
--- linux-2.6.11-dsta.orig/net/core/dst.c 2005-04-06 00:18:32.000000000 -0700
+++ linux-2.6.11-dsta/net/core/dst.c 2005-04-07 04:53:28.000000000 -0700
@@ -198,17 +198,21 @@
dst = child;
if (dst) {
- if (atomic_dec_and_test(&dst->__refcnt)) {
- /* We were real parent of this dst, so kill child. */
- if (dst->flags&DST_NOHASH)
- goto again;
- } else {
- /* Child is still referenced, return it for freeing. */
- if (dst->flags&DST_NOHASH)
- return dst;
- /* Child is still in his hash table */
+ if (dst->flags&DST_NOHASH) {
+ atomic_dec(&dst->__refcnt);
+ smp_mb__after_atomic_dec();
+
+ if (atomic_read(&dst->__refcnt))
+ /*-- Child is still referenced, return it for freeing. */
+ return dst;
+
+ /*We were real parent of this dst, so kill child. */
+ goto again;
}
+ else /* Child is still in his hash table */
+ atomic_dec(&dst->__refcnt);
}
+
return NULL;
}
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2005-04-09 15:28 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-05 18:55 atomic_dec_and_test for child dst needed in dst_destroy? Christoph Lameter
2005-04-05 19:34 ` David S. Miller
2005-04-05 19:47 ` Christoph Lameter
2005-04-05 19:50 ` David S. Miller
2005-04-05 19:58 ` Christoph Lameter
2005-04-05 20:12 ` David S. Miller
2005-04-05 21:45 ` Herbert Xu
2005-04-05 21:48 ` David S. Miller
2005-04-05 22:14 ` Christoph Lameter
2005-04-06 2:19 ` Herbert Xu
2005-04-06 3:19 ` Christoph Lameter
2005-04-06 8:32 ` Herbert Xu
2005-04-06 18:17 ` David S. Miller
2005-04-06 18:48 ` Christoph Lameter
2005-04-07 11:07 ` Herbert Xu
2005-04-07 16:00 ` Christoph Lameter
2005-04-07 21:25 ` Herbert Xu
2005-04-07 22:30 ` Christoph Lameter
2005-04-07 23:07 ` Herbert Xu
2005-04-08 5:45 ` Christoph Lameter
2005-04-08 5:48 ` Herbert Xu
2005-04-08 15:05 ` Christoph Lameter
2005-04-08 21:45 ` Herbert Xu
2005-04-09 15:28 ` Christoph Lameter
[not found] ` <b82a8917050406002339f732ca@mail.gmail.com>
2005-04-06 8:53 ` Fwd: " pravin b shelar
2005-04-07 11:23 ` Herbert Xu
2005-04-07 12:30 ` pravin b shelar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).