* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
[not found] ` <CAB9dFds7KQxihReHhW9CXJeY9+=4BPema3ZawVA89U45QL5uBw@mail.gmail.com>
@ 2016-07-05 12:28 ` Pablo Neira Ayuso
2016-07-10 19:48 ` Marc Dionne
0 siblings, 1 reply; 5+ messages in thread
From: Pablo Neira Ayuso @ 2016-07-05 12:28 UTC (permalink / raw)
To: Marc Dionne; +Cc: Florian Westphal, netdev, regressions, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 1537 bytes --]
Hi,
On Mon, Jul 04, 2016 at 09:35:28AM -0300, Marc Dionne wrote:
> If there is no quick fix, seems like a revert should be considered:
> - Looks to me like the commit attempts to fix a long-standing bug
> (exists at least as far back as 3.5,
> https://bugzilla.kernel.org/show_bug.cgi?id=52991)
> - The above bug has a simple workaround (at least for us) that we
> implemented more than 3 years ago

I guess the workaround consists of using a rule to NOTRACK this
traffic. Or is there a custom patch that you've used on your side to
resolve this?

> - The commit reverts cleanly, restoring the original behaviour
> - From that bug report, bind was one of the affected applications; I
> would suspect that this regression is likely to affect bind as well
>
> I'd be more than happy to test suggested fixes or give feedback with
> debugging patches, etc.

Could you monitor the conntrack statistics via:

# conntrack -S

or, alternatively (if the conntrack utility is not available on your
system):

# cat /proc/net/stat/nf_conntrack

Please watch the insert_failed and drop statistics.
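
If it helps, below is a quick userspace sketch that sums the
insert_failed and drop counters across CPUs, so a spike is easy to
spot while your test runs. It is untested on my side and assumes the
usual /proc/net/stat/nf_conntrack layout: one header row of column
names followed by one row of hexadecimal counters per CPU.

/* ctstat.c - hypothetical helper, not part of any existing tool.
 * Assumes /proc/net/stat/nf_conntrack prints a header row of column
 * names, then one row of hex counters per CPU; it locates the
 * insert_failed and drop columns by name and sums them over all CPUs.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
        FILE *f = fopen("/proc/net/stat/nf_conntrack", "r");
        char line[1024];
        char *tok;
        int col, fail_col = -1, drop_col = -1;
        unsigned long fails = 0, drops = 0;

        if (!f) {
                perror("/proc/net/stat/nf_conntrack");
                return 1;
        }

        /* Header row: find the two columns of interest by name. */
        if (fgets(line, sizeof(line), f)) {
                col = 0;
                for (tok = strtok(line, " \t\n"); tok;
                     tok = strtok(NULL, " \t\n"), col++) {
                        if (strcmp(tok, "insert_failed") == 0)
                                fail_col = col;
                        else if (strcmp(tok, "drop") == 0)
                                drop_col = col;
                }
        }

        /* One row of counters per CPU: accumulate both columns. */
        while (fgets(line, sizeof(line), f)) {
                col = 0;
                for (tok = strtok(line, " \t\n"); tok;
                     tok = strtok(NULL, " \t\n"), col++) {
                        if (col == fail_col)
                                fails += strtoul(tok, NULL, 16);
                        else if (col == drop_col)
                                drops += strtoul(tok, NULL, 16);
                }
        }
        fclose(f);

        printf("insert_failed=%lu drop=%lu\n", fails, drops);
        return 0;
}

Run it before and after the test and compare the two readings.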

Are you observing any splat, or just large packet drops? Could you
compile your kernel with lockdep enabled and retest?

Is there any chance I can get the test program that generates the UDP
client threads, so that I can reproduce this here?

I'm also attaching a patch that moves the release of the old ct (the
one that lost the race) out from under the hashtable locks, to avoid
freeing the ct object while the locks are held, although so far I
couldn't come up with any interaction that triggers the condition
you're observing.

Thanks.
[-- Attachment #2: x.patch --]
[-- Type: text/x-diff, Size: 1917 bytes --]
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 62c42e9..98a71f1 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -638,7 +638,8 @@ static void nf_ct_acct_merge(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
 /* Resolve race on insertion if this protocol allows this. */
 static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
                                enum ip_conntrack_info ctinfo,
-                               struct nf_conntrack_tuple_hash *h)
+                               struct nf_conntrack_tuple_hash *h,
+                               struct nf_conn **old_ct)
 {
         /* This is the conntrack entry already in hashes that won race. */
         struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
@@ -649,7 +650,7 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
             !nf_ct_is_dying(ct) &&
             atomic_inc_not_zero(&ct->ct_general.use)) {
                 nf_ct_acct_merge(ct, ctinfo, (struct nf_conn *)skb->nfct);
-                nf_conntrack_put(skb->nfct);
+                *old_ct = (struct nf_conn *)skb->nfct;
                 /* Assign conntrack already in hashes to this skbuff. Don't
                  * modify skb->nfctinfo to ensure consistent stateful filtering.
                  */
@@ -667,7 +668,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
         const struct nf_conntrack_zone *zone;
         unsigned int hash, reply_hash;
         struct nf_conntrack_tuple_hash *h;
-        struct nf_conn *ct;
+        struct nf_conn *ct, *old_ct = NULL;
         struct nf_conn_help *help;
         struct nf_conn_tstamp *tstamp;
         struct hlist_nulls_node *n;
@@ -771,11 +772,14 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 
 out:
         nf_ct_add_to_dying_list(ct);
-        ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
+        ret = nf_ct_resolve_clash(net, skb, ctinfo, h, &old_ct);
 dying:
         nf_conntrack_double_unlock(hash, reply_hash);
         NF_CT_STAT_INC(net, insert_failed);
         local_bh_enable();
+        if (old_ct)
+                nf_ct_put(old_ct);
+
         return ret;
 }
 EXPORT_SYMBOL_GPL(__nf_conntrack_confirm);
* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
2016-07-05 12:28 ` Multi-thread udp 4.7 regression, bisected to 71d8c47fc653 Pablo Neira Ayuso
@ 2016-07-10 19:48 ` Marc Dionne
2016-07-11 16:26 ` Pablo Neira Ayuso
0 siblings, 1 reply; 5+ messages in thread
From: Marc Dionne @ 2016-07-10 19:48 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Florian Westphal, netdev, Thorsten Leemhuis, netfilter-devel
On Tue, Jul 5, 2016 at 9:28 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi,
>
> On Mon, Jul 04, 2016 at 09:35:28AM -0300, Marc Dionne wrote:
>> If there is no quick fix, seems like a revert should be considered:
>> - Looks to me like the commit attempts to fix a long-standing bug
>> (exists at least as far back as 3.5,
>> https://bugzilla.kernel.org/show_bug.cgi?id=52991)
>> - The above bug has a simple workaround (at least for us) that we
>> implemented more than 3 years ago
>
> I guess the workaround consists of using a rule to NOTRACK this
> traffic. Or is there a custom patch that you've used on your side to
> resolve this?
>
>> - The commit reverts cleanly, restoring the original behaviour
>> - From that bug report, bind was one of the affected applications; I
>> would suspect that this regression is likely to affect bind as well
>>
>> I'd be more than happy to test suggested fixes or give feedback with
>> debugging patches, etc.
>
> Could you monitor the conntrack statistics via:
>
> # conntrack -S
>
> or, alternatively (if the conntrack utility is not available on your
> system):
>
> # cat /proc/net/stat/nf_conntrack
>
> Please watch the insert_failed and drop statistics.
>
> Are you observing any splat, or just large packet drops? Could you
> compile your kernel with lockdep enabled and retest?
>
> Is there any chance I can get the test program that generates the UDP
> client threads, so that I can reproduce this here?
>
> I'm also attaching a patch that moves the release of the old ct (the
> one that lost the race) out from under the hashtable locks, to avoid
> freeing the ct object while the locks are held, although so far I
> couldn't come up with any interaction that triggers the condition
> you're observing.
>
> Thanks.
An update here since I've had some interactions with Pablo off list.

Further testing shows that the underlying cause of the different test
results is a UDP packet that has a bogus source port number. In the
test, the server process tries to send an ack to the bogus port, and
the flow is disrupted.

Notes:
- The packet with the bad source port is from a sendmsg() call that
has hit the connection tracker clash code introduced by 71d8c47fc653
- Packets are successfully sent after the bad one, from the same
socket, with the correct source port number
- The problem does not reproduce with 71d8c47fc653 reverted, or
without nf_conntrack loaded
- The bogus port numbers start at 1024, bumping up by 1 every few
times the problem occurs (1025, 1026, etc.)
- The patch above does not change the behaviour
- Enabling lockdep does not show anything

Our workaround for the original race was to retry sendmsg() once on
EPERM errors, and that had been effective.
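
A minimal sketch of that workaround, for illustration (this is not
the code we actually deployed):

#include <errno.h>
#include <sys/socket.h>

/* Illustrative only: retry sendmsg() once when the conntrack
 * insertion race surfaces as EPERM; by the second attempt the
 * winning entry has been confirmed, so the retry normally succeeds.
 */
static ssize_t sendmsg_retry(int fd, const struct msghdr *msg, int flags)
{
        ssize_t ret = sendmsg(fd, msg, flags);

        if (ret < 0 && errno == EPERM)
                ret = sendmsg(fd, msg, flags);
        return ret;
}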

I can trigger the insertion clash easily with some simple test code,
but so far I have not been able to reproduce the packets with bad
source port numbers in simpler code that I could share.
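
For what it's worth, here is a hypothetical sketch of the kind of
test that triggers the insertion clash (simplified; not our actual
test code): several threads race to send the first datagrams on one
shared, unconnected UDP socket, so the initial packets all carry the
same tuple and their conntrack entries race to be confirmed. It
assumes a receiver on 127.0.0.1:9999 and builds with -pthread.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

#define NTHREADS 8

static int sock;
static struct sockaddr_in dst;
static pthread_barrier_t barrier;

static void *sender(void *arg)
{
        const char buf[] = "ping";

        (void)arg;
        /* Release all threads at once to widen the race window. */
        pthread_barrier_wait(&barrier);
        if (sendto(sock, buf, sizeof(buf), 0,
                   (struct sockaddr *)&dst, sizeof(dst)) < 0)
                perror("sendto");  /* EPERM: lost the insert race */
        return NULL;
}

int main(void)
{
        pthread_t th[NTHREADS];
        int i;

        sock = socket(AF_INET, SOCK_DGRAM, 0);
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9999);
        dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

        pthread_barrier_init(&barrier, NULL, NTHREADS);
        for (i = 0; i < NTHREADS; i++)
                pthread_create(&th[i], NULL, sender, NULL);
        for (i = 0; i < NTHREADS; i++)
                pthread_join(th[i], NULL);
        return 0;
}

Before 71d8c47fc653 the losing sendto() would fail with EPERM; with
clash resolution in place the loser is accepted instead.
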
Thanks,
Marc
* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
2016-07-10 19:48 ` Marc Dionne
@ 2016-07-11 16:26 ` Pablo Neira Ayuso
2016-07-11 21:17 ` Marc Dionne
0 siblings, 1 reply; 5+ messages in thread
From: Pablo Neira Ayuso @ 2016-07-11 16:26 UTC (permalink / raw)
To: Marc Dionne; +Cc: Florian Westphal, netdev, Thorsten Leemhuis, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 1834 bytes --]
On Sun, Jul 10, 2016 at 04:48:26PM -0300, Marc Dionne wrote:
> An update here since I've had some interactions with Pablo off list.
>
> Further testing shows that the underlying cause of the different test
> results is a UDP packet that has a bogus source port number. In the
> test, the server process tries to send an ack to the bogus port, and
> the flow is disrupted.
>
> Notes:
> - The packet with the bad source port is from a sendmsg() call that
> has hit the connection tracker clash code introduced by 71d8c47fc653
> - Packets are successfully sent after the bad one, from the same
> socket, with the correct source port number
> - The problem does not reproduce with 71d8c47fc653 reverted, or
> without nf_conntrack loaded
> - The bogus port numbers start at 1024, bumping up by 1 every few
> times the problem occurs (1025, 1026, etc.)
> - The patch above does not change the behaviour
> - Enabling lockdep does not show anything
>
> Our workaround for the original race was to retry sendmsg() once on
> EPERM errors, and that had been effective.
> I can trigger the insertion clash easily with some simple test code,
> but so far I have not been able to reproduce the packets with bad
> source port numbers in simpler code that I could share.

The NAT null binding is triggering the source port mangling, even if
there are no NAT rules in place. The following patch skips clash
resolution when NAT is in place for now, since we don't see a simple
solution for this case at the moment.

Could you give this patch a try in these two cases?

1) No NAT in place: make sure the iptable_nat module is not loaded,
   or, if you're using nf_tables, make sure you have no nat chains at
   all.

2) With NAT in place: check that you hit the EPERM errors again, as
   you've observed so far.

Please test both scenarios and report back. Thanks.
[-- Attachment #2: 0001-netfilter-conntrack-skip-clash-resolution-if-nat-is-.patch --]
[-- Type: text/x-diff, Size: 1559 bytes --]
From c3b9dfbcf35ea38a3dce22daf7450fc23c620aea Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Mon, 11 Jul 2016 17:28:54 +0200
Subject: [PATCH] netfilter: conntrack: skip clash resolution if nat is in
place
Clash resolution is not easy to apply if the NAT table is registered.
Even if no NAT rules are installed, the null binding ensures that a
unique tuple is used; thus, the packet that loses the race gets a
different source port number, as described in:

http://marc.info/?l=netfilter-devel&m=146818011604484&w=2

Clash resolution with NAT is also problematic if address/port ranges
are used, since the conntrack that wins the race may describe a
different mangling than the one we may have already applied to the
packet via nf_nat_setup_info().
Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_conntrack_core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index e0e9c9a..2b0f80a 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -657,6 +657,7 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
 
         l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
         if (l4proto->allow_clash &&
+            !nfct_nat(ct) &&
             !nf_ct_is_dying(ct) &&
             atomic_inc_not_zero(&ct->ct_general.use)) {
                 nf_ct_acct_merge(ct, ctinfo, (struct nf_conn *)skb->nfct);
--
2.1.4
* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
2016-07-11 16:26 ` Pablo Neira Ayuso
@ 2016-07-11 21:17 ` Marc Dionne
2016-07-12 14:25 ` Pablo Neira Ayuso
0 siblings, 1 reply; 5+ messages in thread
From: Marc Dionne @ 2016-07-11 21:17 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Florian Westphal, netdev, Thorsten Leemhuis, netfilter-devel
On Mon, Jul 11, 2016 at 1:26 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Sun, Jul 10, 2016 at 04:48:26PM -0300, Marc Dionne wrote:
>> An update here since I've had some interactions with Pablo off list.
>>
>> Further testing shows that the underlying cause of the different test
>> results is a UDP packet that has a bogus source port number. In the
>> test, the server process tries to send an ack to the bogus port, and
>> the flow is disrupted.
>>
>> Notes:
>> - The packet with the bad source port is from a sendmsg() call that
>> has hit the connection tracker clash code introduced by 71d8c47fc653
>> - Packets are successfully sent after the bad one, from the same
>> socket, with the correct source port number
>> - The problem does not reproduce with 71d8c47fc653 reverted, or
>> without nf_conntrack loaded
>> - The bogus port numbers start at 1024, bumping up by 1 every few
>> times the problem occurs (1025, 1026, etc.)
>> - The patch above does not change the behaviour
>> - Enabling lockdep does not show anything
>>
>> Our workaround for the original race was to retry sendmsg() once on
>> EPERM errors, and that had been effective.
>> I can trigger the insertion clash easily with some simple test code,
>> but so far I have not been able to reproduce the packets with bad
>> source port numbers in simpler code that I could share.
>
> The NAT null binding is triggering the source port mangling, even if
> there are no NAT rules in place. The following patch skips clash
> resolution when NAT is in place for now, since we don't see a simple
> solution for this case at the moment.
>
> Could you give this patch a try in these two cases?
>
> 1) No NAT in place: make sure the iptable_nat module is not loaded,
>    or, if you're using nf_tables, make sure you have no nat chains at
>    all.
>
> 2) With NAT in place: check that you hit the EPERM errors again, as
>    you've observed so far.
>
> Please test both scenarios and report back. Thanks.
Hi Pablo,

Testing out your patch:

1) With no NAT in place, the clash resolution happens, with no side
effects. No EPERM errors are seen.

2) With ip(6)table_nat loaded, the clash resolution fails and I get
some EPERM errors from sendmsg(), same as before 71d8c47fc653.

It turns out that even though I have no NAT rules in my iptables
config, the system also had firewalld active, and that caused the NAT
modules to be loaded.

So the bottom line is that the patch looks good to me.

Thanks,
Marc
* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
2016-07-11 21:17 ` Marc Dionne
@ 2016-07-12 14:25 ` Pablo Neira Ayuso
0 siblings, 0 replies; 5+ messages in thread
From: Pablo Neira Ayuso @ 2016-07-12 14:25 UTC (permalink / raw)
To: Marc Dionne; +Cc: Florian Westphal, netdev, Thorsten Leemhuis, netfilter-devel
On Mon, Jul 11, 2016 at 06:17:39PM -0300, Marc Dionne wrote:
> Hi Pablo,
>
> Testing out your patch:
>
> 1) With no NAT in place, the clash resolution happens, with no side
> effects. No EPERM errors are seen.
>
> 2) With ip(6)table_nat loaded, the clash resolution fails and I get
> some EPERM errors from sendmsg(), same as before 71d8c47fc653.
>
> It turns out that even though I have no NAT rules in my iptables
> config, the system also had firewalld active, and that caused the NAT
> modules to be loaded.
>
> So the bottom line is that the patch looks good to me.

Thanks, Marc. I'm going to apply this, then.