netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] net: only run neigh_forced_gc() from one cpu
@ 2012-09-19  9:27 Eric Dumazet
  2012-09-19 10:50 ` Neil Horman
  2012-09-19 17:45 ` David Miller
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2012-09-19  9:27 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Maciej Żenczykowski, Tom Herbert, Lorenzo Colitti

From: Eric Dumazet <edumazet@google.com>

With multiqueue NIC or RPS, we can have situation where all cpus are
spending huge amount of cycles in neigh_forced_gc(), and machine can
crash.

Since we are under probable attack, its better to let only one cpu
do the scan, and other cpus immediately return from neigh_forced_gc()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
---
Google-Bug-Id: 7121897

 include/net/neighbour.h |    1 +
 net/core/neighbour.c    |    9 +++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 0dab173..ba21e93 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -178,6 +178,7 @@ struct neigh_table {
 	struct neigh_statistics	__percpu *stats;
 	struct neigh_hash_table __rcu *nht;
 	struct pneigh_entry	**phash_buckets;
+	spinlock_t		forced_gc_lock;
 };
 
 #define NEIGH_PRIV_ALIGN	sizeof(long long)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index c160adb..1f7d8fa 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -134,9 +134,12 @@ static int neigh_forced_gc(struct neigh_table *tbl)
 	int i;
 	struct neigh_hash_table *nht;
 
+	if (!spin_trylock_bh(&tbl->forced_gc_lock))
+		return 0;
+
 	NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
 
-	write_lock_bh(&tbl->lock);
+	write_lock(&tbl->lock);
 	nht = rcu_dereference_protected(tbl->nht,
 					lockdep_is_held(&tbl->lock));
 	for (i = 0; i < (1 << nht->hash_shift); i++) {
@@ -169,7 +172,8 @@ static int neigh_forced_gc(struct neigh_table *tbl)
 
 	tbl->last_flush = jiffies;
 
-	write_unlock_bh(&tbl->lock);
+	write_unlock(&tbl->lock);
+	spin_unlock_bh(&tbl->forced_gc_lock);
 
 	return shrunk;
 }
@@ -1545,6 +1549,7 @@ static void neigh_table_init_no_netlink(struct neigh_table *tbl)
 		panic("cannot allocate neighbour cache hashes");
 
 	rwlock_init(&tbl->lock);
+	spin_lock_init(&tbl->forced_gc_lock);
 	INIT_DELAYED_WORK_DEFERRABLE(&tbl->gc_work, neigh_periodic_work);
 	schedule_delayed_work(&tbl->gc_work, tbl->parms.reachable_time);
 	setup_timer(&tbl->proxy_timer, neigh_proxy_process, (unsigned long)tbl);

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] net: only run neigh_forced_gc() from one cpu
  2012-09-19  9:27 [PATCH net-next] net: only run neigh_forced_gc() from one cpu Eric Dumazet
@ 2012-09-19 10:50 ` Neil Horman
  2012-09-19 11:07   ` Eric Dumazet
  2012-09-19 17:45 ` David Miller
  1 sibling, 1 reply; 8+ messages in thread
From: Neil Horman @ 2012-09-19 10:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Maciej Żenczykowski, Tom Herbert,
	Lorenzo Colitti

On Wed, Sep 19, 2012 at 11:27:07AM +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> With multiqueue NIC or RPS, we can have situation where all cpus are
> spending huge amount of cycles in neigh_forced_gc(), and machine can
> crash.
> 
> Since we are under probable attack, its better to let only one cpu
> do the scan, and other cpus immediately return from neigh_forced_gc()
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Lorenzo Colitti <lorenzo@google.com>
> Cc: Maciej Żenczykowski <maze@google.com>
> Cc: Tom Herbert <therbert@google.com>
> ---
> Google-Bug-Id: 7121897
> 
>  include/net/neighbour.h |    1 +
>  net/core/neighbour.c    |    9 +++++++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
> index 0dab173..ba21e93 100644
> --- a/include/net/neighbour.h
> +++ b/include/net/neighbour.h
> @@ -178,6 +178,7 @@ struct neigh_table {
>  	struct neigh_statistics	__percpu *stats;
>  	struct neigh_hash_table __rcu *nht;
>  	struct pneigh_entry	**phash_buckets;
> +	spinlock_t		forced_gc_lock;
>  };
>  
>  #define NEIGH_PRIV_ALIGN	sizeof(long long)
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index c160adb..1f7d8fa 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -134,9 +134,12 @@ static int neigh_forced_gc(struct neigh_table *tbl)
>  	int i;
>  	struct neigh_hash_table *nht;
>  
> +	if (!spin_trylock_bh(&tbl->forced_gc_lock))
> +		return 0;
> +
This is going to cause callers in neigh_alloc to immediately fail their
allocation attempts.  Would it be a good idea to modify that call site so that
instead of returning NULL, instead reread tbl->entries before comparing to
gc_thresh3, on the hope that the cpu in the garbage collecting routine has freed
some entries?

Neil

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] net: only run neigh_forced_gc() from one cpu
  2012-09-19 10:50 ` Neil Horman
@ 2012-09-19 11:07   ` Eric Dumazet
  2012-09-19 11:09     ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2012-09-19 11:07 UTC (permalink / raw)
  To: Neil Horman
  Cc: David Miller, netdev, Maciej Żenczykowski, Tom Herbert,
	Lorenzo Colitti

On Wed, 2012-09-19 at 06:50 -0400, Neil Horman wrote:

> This is going to cause callers in neigh_alloc to immediately fail their
> allocation attempts.  Would it be a good idea to modify that call site so that
> instead of returning NULL, instead reread tbl->entries before comparing to
> gc_thresh3, on the hope that the cpu in the garbage collecting routine has freed
> some entries?

neigh_alloc() fails only if gc_thresh3 is hit, and if it is hit, we are
under attack by definition.

(the gc is run every 5 seconds is above gc_thresh2, and below
gc_thresh3)

No matter what you try, the attacker is going to be the winner.

The best thing here is to drop packets, not spending several milli
seconds to serve one packet, as queues are going to tail drop anyway.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] net: only run neigh_forced_gc() from one cpu
  2012-09-19 11:07   ` Eric Dumazet
@ 2012-09-19 11:09     ` Eric Dumazet
  2012-09-19 12:54       ` Neil Horman
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2012-09-19 11:09 UTC (permalink / raw)
  To: Neil Horman
  Cc: David Miller, netdev, Maciej Żenczykowski, Tom Herbert,
	Lorenzo Colitti

On Wed, 2012-09-19 at 13:07 +0200, Eric Dumazet wrote:
> On Wed, 2012-09-19 at 06:50 -0400, Neil Horman wrote:
> 
> > This is going to cause callers in neigh_alloc to immediately fail their
> > allocation attempts.  Would it be a good idea to modify that call site so that
> > instead of returning NULL, instead reread tbl->entries before comparing to
> > gc_thresh3, on the hope that the cpu in the garbage collecting routine has freed
> > some entries?
> 
> neigh_alloc() fails only if gc_thresh3 is hit, and if it is hit, we are
> under attack by definition.
> 
> (the gc is run every 5 seconds is above gc_thresh2, and below
> gc_thresh3)
> 
> No matter what you try, the attacker is going to be the winner.
> 
> The best thing here is to drop packets, not spending several milli
> seconds to serve one packet, as queues are going to tail drop anyway.
> 

I meant several hundred of milli seconds per packet.

In our tests we even trigger a softlockup, so thats more than 10 seconds
waiting for the rwlock, for a single packet.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] net: only run neigh_forced_gc() from one cpu
  2012-09-19 11:09     ` Eric Dumazet
@ 2012-09-19 12:54       ` Neil Horman
  0 siblings, 0 replies; 8+ messages in thread
From: Neil Horman @ 2012-09-19 12:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Maciej Żenczykowski, Tom Herbert,
	Lorenzo Colitti

On Wed, Sep 19, 2012 at 01:09:17PM +0200, Eric Dumazet wrote:
> On Wed, 2012-09-19 at 13:07 +0200, Eric Dumazet wrote:
> > On Wed, 2012-09-19 at 06:50 -0400, Neil Horman wrote:
> > 
> > > This is going to cause callers in neigh_alloc to immediately fail their
> > > allocation attempts.  Would it be a good idea to modify that call site so that
> > > instead of returning NULL, instead reread tbl->entries before comparing to
> > > gc_thresh3, on the hope that the cpu in the garbage collecting routine has freed
> > > some entries?
> > 
> > neigh_alloc() fails only if gc_thresh3 is hit, and if it is hit, we are
> > under attack by definition.
> > 
> > (the gc is run every 5 seconds is above gc_thresh2, and below
> > gc_thresh3)
> > 
> > No matter what you try, the attacker is going to be the winner.
> > 
> > The best thing here is to drop packets, not spending several milli
> > seconds to serve one packet, as queues are going to tail drop anyway.
> > 
> 
> I meant several hundred of milli seconds per packet.
> 
> In our tests we even trigger a softlockup, so thats more than 10 seconds
> waiting for the rwlock, for a single packet.
> 
Ok, thanks for the explination
Acked-by: Neil Horman <nhorman@tuxdriver.com>

> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] net: only run neigh_forced_gc() from one cpu
  2012-09-19  9:27 [PATCH net-next] net: only run neigh_forced_gc() from one cpu Eric Dumazet
  2012-09-19 10:50 ` Neil Horman
@ 2012-09-19 17:45 ` David Miller
       [not found]   ` <CAKD1Yr11twYakD=-Mn8FiB11-xtyOLb-Q6XWmiE3H63SFRbzUw@mail.gmail.com>
  1 sibling, 1 reply; 8+ messages in thread
From: David Miller @ 2012-09-19 17:45 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, maze, therbert, lorenzo

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 19 Sep 2012 11:27:07 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> With multiqueue NIC or RPS, we can have situation where all cpus are
> spending huge amount of cycles in neigh_forced_gc(), and machine can
> crash.
> 
> Since we are under probable attack, its better to let only one cpu
> do the scan, and other cpus immediately return from neigh_forced_gc()
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

I suspect you're only seeing this with ipv6 traffic present.

And you should mention that because it determines how we really
should handle this.

I seriously doubt you can trigger this with a pure-ipv4 workload
because all neigh entries are ref-less and thus trivially reclaimable.

If you can trigger it with ipv4, then reclaim needs to be rewritten so
that for ref-less entries it's more efficient and scalable.

I'm not applying this patch.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] net: only run neigh_forced_gc() from one cpu
       [not found]   ` <CAKD1Yr11twYakD=-Mn8FiB11-xtyOLb-Q6XWmiE3H63SFRbzUw@mail.gmail.com>
@ 2012-09-20  3:51     ` David Miller
  2012-09-20 11:22       ` Lorenzo Colitti
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2012-09-20  3:51 UTC (permalink / raw)
  To: lorenzo; +Cc: eric.dumazet, netdev, maze, therbert

From: Lorenzo Colitti <lorenzo@google.com>
Date: Thu, 20 Sep 2012 12:48:09 +0900

> Sorry to butt in, but just to clarify - are you just saying that the
> description should mention IPv6 explicitly, or is there something else
> wrong with the patch?
> 
> If this patch makes IPv6 performance better without affecting IPv4, it's a
> good idea to apply it anyway, right? IPv6 dst entry garbage collection can
> potentially cause serious performance issues on any server with a public
> IPv6 address, and this patch substantially improves the situation.

He's targetting net-next, and I've told him both in previous public
discussions and in recent private communication that the correct fix
is to make ipv6 routes use ref-count-less neighbour handling schemes
like ipv4.

This patch will not be applied as-is, it's not the correct way to fix
this problem.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] net: only run neigh_forced_gc() from one cpu
  2012-09-20  3:51     ` David Miller
@ 2012-09-20 11:22       ` Lorenzo Colitti
  0 siblings, 0 replies; 8+ messages in thread
From: Lorenzo Colitti @ 2012-09-20 11:22 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, maze, therbert

On Thu, Sep 20, 2012 at 12:51 PM, David Miller <davem@davemloft.net> wrote:
>> If this patch makes IPv6 performance better without affecting IPv4, it's a
>> good idea to apply it anyway, right? IPv6 dst entry garbage collection can
>> potentially cause serious performance issues on any server with a public
>> IPv6 address, and this patch substantially improves the situation.
>
> He's targetting net-next, and I've told him both in previous public
> discussions and in recent private communication that the correct fix
> is to make ipv6 routes use ref-count-less neighbour handling schemes
> like ipv4.

Fair enough. Removing the cache is a better solution - requiring a
separate cache entry for every address you want to send a packet to is
not suited to a world where every user has 2^64 addresses or more. But
if removing the route cache for IPv6 is a large amount of work that
nobody will sign up for, then fixing the symptoms might be better than
nothing.

The performance degradation could become an attack vector. Of course
the people that run IPv6 servers today can maintain their own patches,
but that's sort of suboptimal.

Is there something else that can be done other than moving to
non-refcounted neighbours?

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-09-20 11:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-19  9:27 [PATCH net-next] net: only run neigh_forced_gc() from one cpu Eric Dumazet
2012-09-19 10:50 ` Neil Horman
2012-09-19 11:07   ` Eric Dumazet
2012-09-19 11:09     ` Eric Dumazet
2012-09-19 12:54       ` Neil Horman
2012-09-19 17:45 ` David Miller
     [not found]   ` <CAKD1Yr11twYakD=-Mn8FiB11-xtyOLb-Q6XWmiE3H63SFRbzUw@mail.gmail.com>
2012-09-20  3:51     ` David Miller
2012-09-20 11:22       ` Lorenzo Colitti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).