Multi-thread udp 4.7 regression, bisected to 71d8c47fc653

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
@ 2016-06-27 13:41 Marc Dionne
  2016-06-27 14:22 ` Florian Westphal
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Dionne @ 2016-06-27 13:41 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netdev

Hi,

I've been seeing issues with 4.7-rc kernels with some of our
multi-thread test cases.  I've bisected it down to this commit:

  commit 71d8c47fc653711c41bc3282e5b0e605b3727956
  Author: Pablo Neira Ayuso <pablo@netfilter.org>
  Date:   Sun May 1 00:28:40 2016 +0200

      netfilter: conntrack: introduce clash resolution on insertion race

.. and verified that reverting the commit restores the expected behaviour.

The test spins up 30 threads that start communicating simultaneously
with a server at the same target port, using a UDP based protocol over
loopback.  This test is known to easily trigger the insertion race
that the commit is trying to fix - we check for EPERM errors and retry
in that case.

On my usual test rig the test gets about 700Mb/s.  On current
mainline, the result alternates between a somewhat normal number,
something much lower (~150Mb/s), or complete failure following an
internal timeout.

Any thoughts on what may be going on?

Thanks,
Marc

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
  2016-06-27 13:41 Multi-thread udp 4.7 regression, bisected to 71d8c47fc653 Marc Dionne
@ 2016-06-27 14:22 ` Florian Westphal
  2016-06-27 14:46   ` Marc Dionne
  0 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2016-06-27 14:22 UTC (permalink / raw)
  To: Marc Dionne; +Cc: Pablo Neira Ayuso, netdev

Marc Dionne <marc.c.dionne@gmail.com> wrote:
> Hi,
> 
> I've been seeing issues with 4.7-rc kernels with some of our
> multi-thread test cases.  I've bisected it down to this commit:
> 
>   commit 71d8c47fc653711c41bc3282e5b0e605b3727956
>   Author: Pablo Neira Ayuso <pablo@netfilter.org>
>   Date:   Sun May 1 00:28:40 2016 +0200
> 
>       netfilter: conntrack: introduce clash resolution on insertion race
> 
> 
> .. and verified that reverting the commit restores the expected behaviour.

Does this change help?

Subject: Restrict clash resolution to original direction

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -721,13 +721,18 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	   not in the hash.  If there is, we lost race. */
 	hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
-				    zone, net))
-			goto out;
+				    zone, net)) {
+			nf_ct_add_to_dying_list(ct);
+			ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
+			goto dying;
+		}
 
 	hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[reply_hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
-				    zone, net))
-			goto out;
+				    zone, net)) {
+			nf_ct_add_to_dying_list(ct);
+			goto dying;
+		}
 
 	/* Timer relative to confirmation time, not original
 	   setting time, otherwise we'd get timer wrap in
@@ -763,9 +768,6 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 				 IPCT_RELATED : IPCT_NEW, ct);
 	return NF_ACCEPT;
 
-out:
-	nf_ct_add_to_dying_list(ct);
-	ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
 dying:
 	nf_conntrack_double_unlock(hash, reply_hash);
 	NF_CT_STAT_INC(net, insert_failed);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
  2016-06-27 14:22 ` Florian Westphal
@ 2016-06-27 14:46   ` Marc Dionne
  2016-06-27 15:38     ` Florian Westphal
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Dionne @ 2016-06-27 14:46 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netdev

On Mon, Jun 27, 2016 at 11:22 AM, Florian Westphal <fw@strlen.de> wrote:
> Marc Dionne <marc.c.dionne@gmail.com> wrote:
>> Hi,
>>
>> I've been seeing issues with 4.7-rc kernels with some of our
>> multi-thread test cases.  I've bisected it down to this commit:
>>
>>   commit 71d8c47fc653711c41bc3282e5b0e605b3727956
>>   Author: Pablo Neira Ayuso <pablo@netfilter.org>
>>   Date:   Sun May 1 00:28:40 2016 +0200
>>
>>       netfilter: conntrack: introduce clash resolution on insertion race
>>
>>
>> .. and verified that reverting the commit restores the expected behaviour.
>
> Does this change help?
>
> Subject: Restrict clash resolution to original direction
>
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -721,13 +721,18 @@ __nf_conntrack_confirm(struct sk_buff *skb)
>            not in the hash.  If there is, we lost race. */
>         hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode)
>                 if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
> -                                   zone, net))
> -                       goto out;
> +                                   zone, net)) {
> +                       nf_ct_add_to_dying_list(ct);
> +                       ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
> +                       goto dying;
> +               }
>
>         hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[reply_hash], hnnode)
>                 if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
> -                                   zone, net))
> -                       goto out;
> +                                   zone, net)) {
> +                       nf_ct_add_to_dying_list(ct);
> +                       goto dying;
> +               }
>
>         /* Timer relative to confirmation time, not original
>            setting time, otherwise we'd get timer wrap in
> @@ -763,9 +768,6 @@ __nf_conntrack_confirm(struct sk_buff *skb)
>                                  IPCT_RELATED : IPCT_NEW, ct);
>         return NF_ACCEPT;
>
> -out:
> -       nf_ct_add_to_dying_list(ct);
> -       ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
>  dying:
>         nf_conntrack_double_unlock(hash, reply_hash);
>         NF_CT_STAT_INC(net, insert_failed);

No, that doesn't seem to make any noticeable difference.

Marc

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
  2016-06-27 14:46   ` Marc Dionne
@ 2016-06-27 15:38     ` Florian Westphal
  2016-06-27 17:21       ` Marc Dionne
  0 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2016-06-27 15:38 UTC (permalink / raw)
  To: Marc Dionne; +Cc: Florian Westphal, Pablo Neira Ayuso, netdev

Marc Dionne <marc.c.dionne@gmail.com> wrote:
> On Mon, Jun 27, 2016 at 11:22 AM, Florian Westphal <fw@strlen.de> wrote:
> > Marc Dionne <marc.c.dionne@gmail.com> wrote:
> >> Hi,

> >         hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode)
> >                 if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
> > -                                   zone, net))
> > -                       goto out;
> > +                                   zone, net)) {
> > +                       nf_ct_add_to_dying_list(ct);
> > +                       ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
> > +                       goto dying;
> > +               }

This is bogus as h can be a reply too (key compare does not deal
with it).

Below is what I actually intended; I can't come up with a reason why
you experience this issue other than that we're getting confused over
reply/original direction.

If the patch doesn't help either, can you tell us what kind of iptables
rules are installed on the affected system or perhaps report perf drop
monitor stat when things go wrong?

Thanks!

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -638,6 +638,12 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
 	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
 	struct nf_conntrack_l4proto *l4proto;
 
+	/* skb being confirmed is always original dir; do not attach to
+	 * a reply tuple.
+	 */
+	if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
+		goto out;
+
 	l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
 	if (l4proto->allow_clash &&
 	    !nf_ct_is_dying(ct) &&
@@ -650,6 +656,7 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
 		skb->nfct = &ct->ct_general;
 		return NF_ACCEPT;
 	}
+ out:
 	NF_CT_STAT_INC(net, drop);
 	return NF_DROP;
 }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
  2016-06-27 15:38     ` Florian Westphal
@ 2016-06-27 17:21       ` Marc Dionne
  2016-07-04 12:35         ` Marc Dionne
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Dionne @ 2016-06-27 17:21 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netdev

[-- Attachment #1: Type: text/plain, Size: 1482 bytes --]

On Mon, Jun 27, 2016 at 12:38 PM, Florian Westphal <fw@strlen.de> wrote:
> Marc Dionne <marc.c.dionne@gmail.com> wrote:
>> On Mon, Jun 27, 2016 at 11:22 AM, Florian Westphal <fw@strlen.de> wrote:
>> > Marc Dionne <marc.c.dionne@gmail.com> wrote:
>> >> Hi,
>
>> >         hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode)
>> >                 if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
>> > -                                   zone, net))
>> > -                       goto out;
>> > +                                   zone, net)) {
>> > +                       nf_ct_add_to_dying_list(ct);
>> > +                       ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
>> > +                       goto dying;
>> > +               }
>
> This is bogus as h can be a reply too (key compare does not deal
> with it).
>
> Below is what I actually intended; I can't come up with a reason why
> you experience this issue other than that we're getting confused over
> reply/original direction.
>
> If the patch doesn't help either, can you tell us what kind of iptables
> rules are installed on the affected system or perhaps report perf drop
> monitor stat when things go wrong?
>
> Thanks!

The additional patch didn't help either.

I had a lot of iptables bloat, but I reverted to old simple iptables
and ip6tables configs (attached), and still see the problem.  Note
that the test normally uses ipv6, but the behaviour is the same with
ipv4.

Marc

[-- Attachment #2: iptables --]
[-- Type: application/octet-stream, Size: 550 bytes --]

# sample configuration for iptables service
# you can edit this manually or use system-config-firewall
# please do not ask us to add additional ports/services to this default configuration
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT

[-- Attachment #3: ip6tables --]
[-- Type: application/octet-stream, Size: 969 bytes --]

# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 546 -d fe80::/64 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 631 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 5353 -d ff02::fb -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 631 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 631 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 137 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 138 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp6-adm-prohibited
-A FORWARD -j REJECT --reject-with icmp6-adm-prohibited
COMMIT

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
  2016-06-27 17:21       ` Marc Dionne
@ 2016-07-04 12:35         ` Marc Dionne
  2016-07-05 12:28           ` Pablo Neira Ayuso
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Dionne @ 2016-07-04 12:35 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netdev, regressions

On Mon, Jun 27, 2016 at 2:21 PM, Marc Dionne <marc.c.dionne@gmail.com> wrote:
> On Mon, Jun 27, 2016 at 12:38 PM, Florian Westphal <fw@strlen.de> wrote:
>> Marc Dionne <marc.c.dionne@gmail.com> wrote:
>>> On Mon, Jun 27, 2016 at 11:22 AM, Florian Westphal <fw@strlen.de> wrote:
>>> > Marc Dionne <marc.c.dionne@gmail.com> wrote:
>>> >> Hi,
>>
>>> >         hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode)
>>> >                 if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
>>> > -                                   zone, net))
>>> > -                       goto out;
>>> > +                                   zone, net)) {
>>> > +                       nf_ct_add_to_dying_list(ct);
>>> > +                       ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
>>> > +                       goto dying;
>>> > +               }
>>
>> This is bogus as h can be a reply too (key compare does not deal
>> with it).
>>
>> Below is what I actually intended; I can't come up with a reason why
>> you experience this issue other than that we're getting confused over
>> reply/original direction.
>>
>> If the patch doesn't help either, can you tell us what kind of iptables
>> rules are installed on the affected system or perhaps report perf drop
>> monitor stat when things go wrong?
>>
>> Thanks!
>
> The additional patch didn't help either.
>
> I had a lot of iptables bloat, but I reverted to old simple iptables
> and ip6tables configs (attached), and still see the problem.  Note
> that the test normally uses ipv6, but the behaviour is the same with
> ipv4.
>
> Marc

Hi,

Any other ideas for this issue?  I would hate to see the 4.7 release
go out without a fix or revert, and have a kernel that we have to tell
our clients not to use.

If there is no quick fix, seems like a revert should be considered:
- Looks to me like the commit attempts to fix a long standing bug
(exists at least as far back as 3.5,
https://bugzilla.kernel.org/show_bug.cgi?id=52991)
- The above bug has a simple workaround (at least for us) that we
implemented more than 3 years ago
- The commit reverts cleanly, restoring the original behaviour
- From that bug report, bind was one of the affected applications; I
would suspect that this regression is likely to affect bind as well

I'd be more than happy to test suggested fixes or give feedback with
debugging patches, etc.

Thanks,
Marc

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
  2016-07-04 12:35         ` Marc Dionne
@ 2016-07-05 12:28           ` Pablo Neira Ayuso
  2016-07-10 19:48             ` Marc Dionne
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Neira Ayuso @ 2016-07-05 12:28 UTC (permalink / raw)
  To: Marc Dionne; +Cc: Florian Westphal, netdev, regressions, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1537 bytes --]

Hi,

On Mon, Jul 04, 2016 at 09:35:28AM -0300, Marc Dionne wrote:
> If there is no quick fix, seems like a revert should be considered:
> - Looks to me like the commit attempts to fix a long standing bug
> (exists at least as far back as 3.5,
> https://bugzilla.kernel.org/show_bug.cgi?id=52991)
> - The above bug has a simple workaround (at least for us) that we
> implemented more than 3 years ago

I guess the workaround consists of using a rule to NOTRACK this
traffic. Or there is any custom patch that you've used on your side to
resolve this?

> - The commit reverts cleanly, restoring the original behaviour
> - From that bug report, bind was one of the affected applications; I
> would suspect that this regression is likely to affect bind as well
> 
> I'd be more than happy to test suggested fixes or give feedback with
> debugging patches, etc.

Could you monitor

# conntrack -S

or alternatively (if conntrack utility not available in your system):

# cat /proc/net/stat/nf_conntrack

?

Please, watch for insert_failed and drop statistics.

Are you observing any splat or just large packet drops? Could you
compile your kernel with lockdep on and retest?

Is there any chance I can get your test file that generates the UDP
client threads to reproduce this here?

I'm also attaching a patch to drop old ct that lost race path out from
hashtable locks to avoid releasing the ct object while holding the
locks, although I couldn't come up with any interaction so far
triggering the condition that you're observing.

Thanks.

[-- Attachment #2: x.patch --]
[-- Type: text/x-diff, Size: 1917 bytes --]

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 62c42e9..98a71f1 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -638,7 +638,8 @@ static void nf_ct_acct_merge(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
 /* Resolve race on insertion if this protocol allows this. */
 static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
 			       enum ip_conntrack_info ctinfo,
-			       struct nf_conntrack_tuple_hash *h)
+			       struct nf_conntrack_tuple_hash *h,
+			       struct nf_conn **old_ct)
 {
 	/* This is the conntrack entry already in hashes that won race. */
 	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
@@ -649,7 +650,7 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
 	    !nf_ct_is_dying(ct) &&
 	    atomic_inc_not_zero(&ct->ct_general.use)) {
 		nf_ct_acct_merge(ct, ctinfo, (struct nf_conn *)skb->nfct);
-		nf_conntrack_put(skb->nfct);
+		*old_ct = (struct nf_conn *)skb->nfct;
 		/* Assign conntrack already in hashes to this skbuff. Don't
 		 * modify skb->nfctinfo to ensure consistent stateful filtering.
 		 */
@@ -667,7 +668,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	const struct nf_conntrack_zone *zone;
 	unsigned int hash, reply_hash;
 	struct nf_conntrack_tuple_hash *h;
-	struct nf_conn *ct;
+	struct nf_conn *ct, *old_ct = NULL;
 	struct nf_conn_help *help;
 	struct nf_conn_tstamp *tstamp;
 	struct hlist_nulls_node *n;
@@ -771,11 +772,14 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 
 out:
 	nf_ct_add_to_dying_list(ct);
-	ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
+	ret = nf_ct_resolve_clash(net, skb, ctinfo, h, &old_ct);
 dying:
 	nf_conntrack_double_unlock(hash, reply_hash);
 	NF_CT_STAT_INC(net, insert_failed);
 	local_bh_enable();
+	if (old_ct)
+		nf_ct_put(old_ct);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(__nf_conntrack_confirm);

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
  2016-07-05 12:28           ` Pablo Neira Ayuso
@ 2016-07-10 19:48             ` Marc Dionne
  2016-07-11 16:26               ` Pablo Neira Ayuso
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Dionne @ 2016-07-10 19:48 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, netdev, Thorsten Leemhuis, netfilter-devel

On Tue, Jul 5, 2016 at 9:28 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi,
>
> On Mon, Jul 04, 2016 at 09:35:28AM -0300, Marc Dionne wrote:
>> If there is no quick fix, seems like a revert should be considered:
>> - Looks to me like the commit attempts to fix a long standing bug
>> (exists at least as far back as 3.5,
>> https://bugzilla.kernel.org/show_bug.cgi?id=52991)
>> - The above bug has a simple workaround (at least for us) that we
>> implemented more than 3 years ago
>
> I guess the workaround consists of using a rule to NOTRACK this
> traffic. Or there is any custom patch that you've used on your side to
> resolve this?
>
>> - The commit reverts cleanly, restoring the original behaviour
>> - From that bug report, bind was one of the affected applications; I
>> would suspect that this regression is likely to affect bind as well
>>
>> I'd be more than happy to test suggested fixes or give feedback with
>> debugging patches, etc.
>
> Could you monitor
>
> # conntrack -S
>
> or alternatively (if conntrack utility not available in your system):
>
> # cat /proc/net/stat/nf_conntrack
>
> ?
>
> Please, watch for insert_failed and drop statistics.
>
> Are you observing any splat or just large packet drops? Could you
> compile your kernel with lockdep on and retest?
>
> Is there any chance I can get your test file that generates the UDP
> client threads to reproduce this here?
>
> I'm also attaching a patch to drop old ct that lost race path out from
> hashtable locks to avoid releasing the ct object while holding the
> locks, although I couldn't come up with any interaction so far
> triggering the condition that you're observing.
>
> Thanks.

An update here since I've had some interactions with Pablo off list.

Further testing shows that the underlying cause of the different test
results is a udp packet that has a bogus source port number.  In the
test the server process tries to send an ack to the bogus port and the
flow is disrupted.

Notes:
- The packet with the bad source port is from a sendmsg() call that
has hit the connection tracker clash code introduced by 71d8c47fc653
- Packets are successfully sent after the bad one, from the same
socket, with the correct source port number
- The problem does not reproduce with 71d8c47fc653 reverted, or
without nf_conntrack loaded
- The bogus port numbers start at 1024, bumping up by 1 every few
times the problem occurs (1025, 1026, etc.)
- The patch above does not change the behaviour
- Enabling lockdep does not show anything

Our workaround for the original race was to retry sendmsg() once on
EPERM errors, and that had been effective.
I can trigger the insertion clash easily with some simple test code,
but I have not been able so far to reproduce the packets with bad
source port numbers with some simpler code that I could share.

Thanks,
Marc

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
  2016-07-10 19:48             ` Marc Dionne
@ 2016-07-11 16:26               ` Pablo Neira Ayuso
  2016-07-11 21:17                 ` Marc Dionne
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Neira Ayuso @ 2016-07-11 16:26 UTC (permalink / raw)
  To: Marc Dionne; +Cc: Florian Westphal, netdev, Thorsten Leemhuis, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1834 bytes --]

On Sun, Jul 10, 2016 at 04:48:26PM -0300, Marc Dionne wrote:
> An update here since I've had some interactions with Pablo off list.
> 
> Further testing shows that the underlying cause of the different test
> results is a udp packet that has a bogus source port number.  In the
> test the server process tries to send an ack to the bogus port and the
> flow is disrupted.
> 
> Notes:
> - The packet with the bad source port is from a sendmsg() call that
> has hit the connection tracker clash code introduced by 71d8c47fc653
> - Packets are successfully sent after the bad one, from the same
> socket, with the correct source port number
> - The problem does not reproduce with 71d8c47fc653 reverted, or
> without nf_conntrack loaded
> - The bogus port numbers start at 1024, bumping up by 1 every few
> times the problem occurs (1025, 1026, etc.)
> - The patch above does not change the behaviour
> - Enabling lockdep does not show anything
> 
> Our workaround for the original race was to retry sendmsg() once on
> EPERM errors, and that had been effective.
> I can trigger the insertion clash easily with some simple test code,
> but I have not been able so far to reproduce the packets with bad
> source port numbers with some simpler code that I could share.

The NAT nul-binding is triggering the source port mangling, even if
there is no NAT rules in place. The following patch skips the clash
resolution for NAT by now since we don't see a simple solution for
this case at the moment.

Could you give a try to this patch in these two cases?

1) No NAT in place: Make sure iptable_nat module is not there. Or if
   you're using nf_tables, just make sure you have no nat chains at
   all.

2) With NAT in place, you hit back the EPERM errors that you've
   observed so far.

Please, test both scenarios and report back. Thanks.

[-- Attachment #2: 0001-netfilter-conntrack-skip-clash-resolution-if-nat-is-.patch --]
[-- Type: text/x-diff, Size: 1559 bytes --]

>From c3b9dfbcf35ea38a3dce22daf7450fc23c620aea Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Mon, 11 Jul 2016 17:28:54 +0200
Subject: [PATCH] netfilter: conntrack: skip clash resolution if nat is in
 place

The clash resolution is not easy to apply if the NAT table is
registered. Even if no NAT rules are installed, the nul-binding ensures
that a unique tuple is used, thus, the packet that loses race gets a
different source port number, as described by:

http://marc.info/?l=netfilter-devel&m=146818011604484&w=2

Clash resolution with NAT is also problematic if addresses/port range
ports are used since the conntrack that wins race may describe a
different mangling that we may have earlier applied to the packet via
nf_nat_setup_info().

Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index e0e9c9a..2b0f80a 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -657,6 +657,7 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
 
 	l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
 	if (l4proto->allow_clash &&
+	    !nfct_nat(ct) &&
 	    !nf_ct_is_dying(ct) &&
 	    atomic_inc_not_zero(&ct->ct_general.use)) {
 		nf_ct_acct_merge(ct, ctinfo, (struct nf_conn *)skb->nfct);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
  2016-07-11 16:26               ` Pablo Neira Ayuso
@ 2016-07-11 21:17                 ` Marc Dionne
  2016-07-12 14:25                   ` Pablo Neira Ayuso
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Dionne @ 2016-07-11 21:17 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, netdev, Thorsten Leemhuis, netfilter-devel

On Mon, Jul 11, 2016 at 1:26 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Sun, Jul 10, 2016 at 04:48:26PM -0300, Marc Dionne wrote:
>> An update here since I've had some interactions with Pablo off list.
>>
>> Further testing shows that the underlying cause of the different test
>> results is a udp packet that has a bogus source port number.  In the
>> test the server process tries to send an ack to the bogus port and the
>> flow is disrupted.
>>
>> Notes:
>> - The packet with the bad source port is from a sendmsg() call that
>> has hit the connection tracker clash code introduced by 71d8c47fc653
>> - Packets are successfully sent after the bad one, from the same
>> socket, with the correct source port number
>> - The problem does not reproduce with 71d8c47fc653 reverted, or
>> without nf_conntrack loaded
>> - The bogus port numbers start at 1024, bumping up by 1 every few
>> times the problem occurs (1025, 1026, etc.)
>> - The patch above does not change the behaviour
>> - Enabling lockdep does not show anything
>>
>> Our workaround for the original race was to retry sendmsg() once on
>> EPERM errors, and that had been effective.
>> I can trigger the insertion clash easily with some simple test code,
>> but I have not been able so far to reproduce the packets with bad
>> source port numbers with some simpler code that I could share.
>
> The NAT nul-binding is triggering the source port mangling, even if
> there is no NAT rules in place. The following patch skips the clash
> resolution for NAT by now since we don't see a simple solution for
> this case at the moment.
>
> Could you give a try to this patch in these two cases?
>
> 1) No NAT in place: Make sure iptable_nat module is not there. Or if
>    you're using nf_tables, just make sure you have no nat chains at
>    all.
>
> 2) With NAT in place, you hit back the EPERM errors that you've
>    observed so far.
>
> Please, test both scenarios and report back. Thanks.

Hi Pablo,

Testing out your patch:

1) With no NAT in place, the clash resolution happens, with no side
effects.  No EPERM errors are seen.

2) With ip(6)table_nat loaded, the clash resolution fails and I get
some EPERM errors from sendmsg(), same as before 71d8c47fc653.

Turns out that even though I have no NAT rules in my iptables config,
the system also had firewalld active and that caused the modules to be
loaded.

So the bottom line is that the patch looks good to me..

Thanks,
Marc

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
  2016-07-11 21:17                 ` Marc Dionne
@ 2016-07-12 14:25                   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2016-07-12 14:25 UTC (permalink / raw)
  To: Marc Dionne; +Cc: Florian Westphal, netdev, Thorsten Leemhuis, netfilter-devel

On Mon, Jul 11, 2016 at 06:17:39PM -0300, Marc Dionne wrote:
> Hi Pablo,
> 
> Testing out your patch:
> 
> 1) With no NAT in place, the clash resolution happens, with no side
> effects.  No EPERM errors are seen.
> 
> 2) With ip(6)table_nat loaded, the clash resolution fails and I get
> some EPERM errors from sendmsg(), same as before 71d8c47fc653.
> 
> Turns out that even though I have no NAT rules in my iptables config,
> the system also had firewalld active and that caused the modules to be
> loaded.
> 
> So the bottom line is that the patch looks good to me..

Thanks Marc, I'm going to apply this then.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-07-12 14:25 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-27 13:41 Multi-thread udp 4.7 regression, bisected to 71d8c47fc653 Marc Dionne
2016-06-27 14:22 ` Florian Westphal
2016-06-27 14:46   ` Marc Dionne
2016-06-27 15:38     ` Florian Westphal
2016-06-27 17:21       ` Marc Dionne
2016-07-04 12:35         ` Marc Dionne
2016-07-05 12:28           ` Pablo Neira Ayuso
2016-07-10 19:48             ` Marc Dionne
2016-07-11 16:26               ` Pablo Neira Ayuso
2016-07-11 21:17                 ` Marc Dionne
2016-07-12 14:25                   ` Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).