All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: Roman Mindalev <r000n@r000n.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>,
	Patrick McHardy <kaber@trash.net>,
	LKML <linux-kernel@vger.kernel.org>,
	netfilter@vger.kernel.org
Subject: Re: Regression caused by commit "netfilter: iptables: lock free counters"
Date: Mon, 30 Mar 2009 09:32:51 +0200	[thread overview]
Message-ID: <49D075A3.1050202@cosmosbay.com> (raw)
In-Reply-To: <20090329234702.4988017f@desktop.r000n.info>

Roman Mindalev a écrit :
> Hi.
> Since 2.6.29-git1 I got many debug messages in kernel log.
> One from them:
> 
> Mar 29 12:06:11 desktop kernel: [  917.414369] BUG: using
> smp_processor_id() in preemptible [00000000] code: hddtemp/2820 Mar 29
> 12:06:11 desktop kernel: [  917.414378] caller is
> ipt_do_table+0xd0/0x3a8 [ip_tables] Mar 29 12:06:11 desktop kernel:
> [  917.414382] Pid: 2820, comm: hddtemp Not tainted
> 2.6.29-05093-gc31f403 #1 Mar 29 12:06:11 desktop kernel: [  917.414386]
> Call Trace: Mar 29 12:06:11 desktop kernel: [  917.414393]
> [<ffffffff8042d037>] debug_smp_processor_id+0xe7/0xf0 Mar 29 12:06:11
> desktop kernel: [  917.414400]  [<ffffffffa0171f10>]
> ipt_do_table+0xd0/0x3a8 [ip_tables] Mar 29 12:06:11 desktop kernel:
> [  917.414414]  [<ffffffffa0180d0e>] ? nf_conntrack_in+0x23e/0x610
> [nf_conntrack] Mar 29 12:06:11 desktop kernel: [  917.414422]
> [<ffffffff8026d47d>] ? trace_hardirqs_off+0xd/0x10 Mar 29 12:06:11
> desktop kernel: [  917.414429]  [<ffffffffa008e0b3>]
> ipt_local_out_hook+0x53/0x60 [iptable_filter] Mar 29 12:06:11 desktop
> kernel: [  917.414435]  [<ffffffff80546a07>] nf_iterate+0x67/0xa0 Mar
> 29 12:06:11 desktop kernel: [  917.414441]  [<ffffffff80551140>] ?
> dst_output+0x0/0x10 Mar 29 12:06:11 desktop kernel: [  917.414446]
> [<ffffffff80546af5>] nf_hook_slow+0xb5/0x100 Mar 29 12:06:11 desktop
> kernel: [  917.414452]  [<ffffffff80551140>] ? dst_output+0x0/0x10 Mar
> 29 12:06:11 desktop kernel: [  917.414458]  [<ffffffff80552f7f>]
> __ip_local_out+0x9f/0xb0 Mar 29 12:06:11 desktop kernel:
> [  917.414464]  [<ffffffff8058c4db>] ? bictcp_cong_avoid+0x4b/0x2d0 Mar
> 29 12:06:11 desktop kernel: [  917.414469]  [<ffffffff80552fa1>]
> ip_local_out+0x11/0x30 Mar 29 12:06:11 desktop kernel: [  917.414474]
> [<ffffffff8055389c>] ip_queue_xmit+0x21c/0x410 Mar 29 12:06:11 desktop
> kernel: [  917.414480]  [<ffffffff80272825>] ?
> print_lock_contention_bug+0x25/0x120 Mar 29 12:06:11 desktop kernel:
> [  917.414486]  [<ffffffff8056810e>] ?
> tcp_established_options+0x3e/0xe0 Mar 29 12:06:11 desktop kernel:
> [  917.414492]  [<ffffffff80272825>] ?
> print_lock_contention_bug+0x25/0x120 Mar 29 12:06:11 desktop kernel:
> [  917.414498]  [<ffffffff80568ef5>] tcp_transmit_skb+0x535/0x820 Mar
> 29 12:06:11 desktop kernel: [  917.414504]  [<ffffffff8056b5fb>]
> tcp_write_xmit+0x20b/0xaa0 Mar 29 12:06:11 desktop kernel:
> [  917.414510]  [<ffffffff8055d451>] ? sk_stream_alloc_skb+0x41/0x110
> Mar 29 12:06:11 desktop kernel: [  917.414515]  [<ffffffff8052799e>] ?
> __alloc_skb+0x6e/0x150 Mar 29 12:06:11 desktop kernel: [  917.414521]
> [<ffffffff8056befb>] __tcp_push_pending_frames+0x2b/0x90 Mar 29
> 12:06:11 desktop kernel: [  917.414527]  [<ffffffff8055ddc3>]
> tcp_sendmsg+0x8a3/0xbb0 Mar 29 12:06:11 desktop kernel: [  917.414533]
> [<ffffffff8051d2bb>] sock_aio_write+0x12b/0x140 Mar 29 12:06:11 desktop
> kernel: [  917.414539]  [<ffffffff802df6f1>] do_sync_write+0xf1/0x140
> Mar 29 12:06:11 desktop kernel: [  917.414545]  [<ffffffff8025e470>] ?
> autoremove_wake_function+0x0/0x40 Mar 29 12:06:11 desktop kernel:
> [  917.414551]  [<ffffffff802e0259>] vfs_write+0x179/0x190 Mar 29
> 12:06:11 desktop kernel: [  917.414557]  [<ffffffff802e0360>]
> sys_write+0x50/0x90 Mar 29 12:06:11 desktop kernel: [  917.414563]
> [<ffffffff8020b51b>] system_call_fastpath+0x16/0x1b
> 
> Result of the bisection:
> 
> 784544739a25c30637397ace5489eeb6e15d7d49 is first bad commit
> commit 784544739a25c30637397ace5489eeb6e15d7d49             
> Author: Stephen Hemminger <shemminger@vyatta.com>           
> Date:   Fri Feb 20 10:35:32 2009 +0100                      
> 
>     netfilter: iptables: lock free counters
>                                            
>     The reader/writer lock in ip_tables is acquired in the critical
>     path of processing packets and is one of the reasons just loading
>     iptables can cause a 20% performance loss. The rwlock serves two
>     functions:
> 
>     1) it prevents changes to table state (xt_replace) while table is
>     in use. This is now handled by doing rcu on the xt_table. When
>     table is replaced, the new table(s) are put in and the old one
>     table(s) are freed after RCU period.
> 
>     2) it provides synchronization when accesing the counter values.
>        This is now handled by swapping in new table_info entries for
>     each cpu then summing the old values, and putting the result back
>     onto one cpu.  On a busy system it may cause sampling to occur at
>     different times on each cpu, but no packet/byte counts are lost in
>     the process.
> 
>     Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
>     Sucessfully tested on my dual quad core machine too, but iptables
>     only (no ipv6 here) BTW, my new "tbench 8" result is 2450 MB/s, (it
>     was 2150 MB/s not so long ago)
> 
>     Acked-by: Eric Dumazet <dada1@cosmosbay.com>
>     Signed-off-by: Patrick McHardy <kaber@trash.net>
> 
> :040000 040000 67ef5761d009330e8d538dd1a3aff77c86995172
> 0d8d6398ced2117bb6995b0c414fd8013685324e M      include :040000 040000
> 0317c745a0722ade9a6c0cbe9d6cb92abbd40efa
> 0eb195a365164c05e7c764ab1c2a7da891276837 M      net
> 
> After reverting commit no more these messages.
> 
> This issue in bugzilla: http://bugzilla.kernel.org/show_bug.cgi?id=12960

Thanks Roman for this excellent report 

Could you try following patch ?

[PATCH] netfilter: use rcu_read_bh() in ipt_do_table()

Commit 784544739a25c30637397ace5489eeb6e15d7d49
(netfilter: iptables: lock free counters) forgot to disable BH
in arpt_do_table(), ipt_do_table() and  ip6t_do_table()

Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.

Reported-and-bisected-by: Roman Mindalev <r000n@r000n.net>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 35c5f6a..5ba533d 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -253,7 +253,7 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 	indev = in ? in->name : nulldevname;
 	outdev = out ? out->name : nulldevname;
 
-	rcu_read_lock();
+	rcu_read_lock_bh();
 	private = rcu_dereference(table->private);
 	table_base = rcu_dereference(private->entries[smp_processor_id()]);
 
@@ -329,7 +329,7 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 		}
 	} while (!hotdrop);
 
-	rcu_read_unlock();
+	rcu_read_unlock_bh();
 
 	if (hotdrop)
 		return NF_DROP;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 82ee7c9..810c0b6 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -339,7 +339,7 @@ ipt_do_table(struct sk_buff *skb,
 
 	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
 
-	rcu_read_lock();
+	rcu_read_lock_bh();
 	private = rcu_dereference(table->private);
 	table_base = rcu_dereference(private->entries[smp_processor_id()]);
 
@@ -437,7 +437,7 @@ ipt_do_table(struct sk_buff *skb,
 		}
 	} while (!hotdrop);
 
-	rcu_read_unlock();
+	rcu_read_unlock_bh();
 
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index e89cfa3..dfed176 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -365,7 +365,7 @@ ip6t_do_table(struct sk_buff *skb,
 
 	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
 
-	rcu_read_lock();
+	rcu_read_lock_bh();
 	private = rcu_dereference(table->private);
 	table_base = rcu_dereference(private->entries[smp_processor_id()]);
 
@@ -466,7 +466,7 @@ ip6t_do_table(struct sk_buff *skb,
 #ifdef CONFIG_NETFILTER_DEBUG
 	((struct ip6t_entry *)table_base)->comefrom = NETFILTER_LINK_POISON;
 #endif
-	rcu_read_unlock();
+	rcu_read_unlock_bh();
 
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;

  reply	other threads:[~2009-03-30  7:32 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-29 19:47 Regression caused by commit "netfilter: iptables: lock free counters" Roman Mindalev
2009-03-30  7:32 ` Eric Dumazet [this message]
2009-03-30 10:50   ` Roman Mindalev
2009-03-30 10:50     ` Roman Mindalev
2009-03-30 12:08     ` Patrick McHardy
2009-03-30 14:56       ` Stephen Hemminger
2009-04-02  7:54       ` David Miller
2009-03-30 12:06 ` Patrick McHardy
2009-04-05  7:05 ` Graham Murray
2009-04-05  8:22   ` David Miller
2009-04-05 10:01     ` Eric Dumazet
2009-04-05 10:01       ` Eric Dumazet
2009-04-05 10:12       ` Graham Murray
2009-04-05 10:30         ` Eric Dumazet
2009-04-05 12:29           ` [PATCH] netfilter: ip6tables fix Eric Dumazet
2009-04-05 12:40             ` Eric Dumazet
2009-04-06 15:08               ` Patrick McHardy
2009-04-05 11:36       ` Regression caused by commit "netfilter: iptables: lock free counters" Jan Engelhardt
2009-04-05 12:34         ` Eric Dumazet
2009-04-05 12:34           ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49D075A3.1050202@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=kaber@trash.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netfilter@vger.kernel.org \
    --cc=r000n@r000n.net \
    --cc=shemminger@vyatta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.