From: Benjamin Poirier <bpoirier@nvidia.com>
To: Mike Manning <mvrmanning@gmail.com>
Cc: Netdev <netdev@vger.kernel.org>, David Ahern <dsahern@gmail.com>,
Saikrishna Arcot <sarcot@microsoft.com>,
Craig Gallek <kraig@google.com>
Subject: Re: [PATCH] net: prefer socket bound to interface when not in VRF
Date: Mon, 27 Jun 2022 09:05:37 +0900 [thread overview]
Message-ID: <Yrj0UWOQM8QNqJqu@d3> (raw)
In-Reply-To: <9b37fcae-214c-3a5b-d976-8c94632184d0@gmail.com>
On 2022-06-26 23:25 +0100, Mike Manning wrote:
> On 13/06/2022 04:14, Benjamin Poirier wrote:
> > On 2021-10-05 14:03 +0100, Mike Manning wrote:
[...]
> > Hi Mike,
> >
> > I was looking at this commit, 8d6c414cd2fb ("net: prefer socket bound to
> > interface when not in VRF"), and I get the feeling that it is only
> > partially effective. It works with UDP connected sockets but it doesn't
> > work for TCP and UDP unconnected sockets.
> >
> > The compute_score() functions are a bit misleading. Because of the
> > reuseport shortcut in their callers (inet_lhash2_lookup() and the like),
> > the first socket with score > 0 may be chosen, not necessarily the
> > socket with highest score. In order to prefer certain sockets, I think
> > an approach like commit d894ba18d4e4 ("soreuseport: fix ordering for
> > mixed v4/v6 sockets") would be needed. What do you think?
>
> Hi Benjamin,
>
> We had never observed any issues with any of our configurations. The VRF changes introduced
>
> in 7e225619e8af result in a failure being returned when there is no device match, which satisfies
>
> the requirements for VRF handling so unbound vs. bound to an l3mdev - the score is irrelevant.
>
> However, 8d6c414cd2fb was subsequently needed as unbound and bound sockets were scored
>
> equally, so that fix reinstated a higher score needed for sockets bound to an interface. Wrt to
>
> your query, the scoring resolved the issue. I am unaware of any problematic use-cases, but in
>
> any case, my changes are in line with the current approach.
The problematic use case involves sockets that have SO_REUSEPORT +
SO_BINDTODEVICE. Earlier in the thread I've included a test that
demonstrates the issue.
For the Cumulus kernel I've put in place a workaround that removes the
reuseport optimization (see below). I probably won't have time to work
on a proper upstream solution.
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index b9d995b5ce24..1765ac837358 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -253,16 +253,20 @@ static struct sock *inet_lhash2_lookup(struct net *net,
sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) {
score = compute_score(sk, net, hnum, daddr, dif, sdif);
if (score > hiscore) {
- result = lookup_reuseport(net, sk, skb, doff,
- saddr, sport, daddr, hnum);
- if (result)
- return result;
-
result = sk;
hiscore = score;
}
}
+ if (result) {
+ struct sock *reuse_sk;
+
+ reuse_sk = lookup_reuseport(net, result, skb, doff,
+ saddr, sport, daddr, hnum);
+ if (reuse_sk)
+ result = reuse_sk;
+ }
+
return result;
}
> > Extra info:
> > 1) fcnal-test.sh results
> >
> > I tried to reproduce the fcnal-test.sh test results quoted above but in
> > my case the test cases already pass at 8d6c414cd2fb^ and 9e9fb7655ed5.
> > Moreover I believe those test cases don't have multiple listening
> > sockets. So that just added to my confusion.
>
> The fix was not targeting those 2 failed test cases, the output was only to show the before/after
>
> test results. It is unclear why they failed for me with with the 9e9fb7655ed5 baseline,
Thanks for taking a look, that was also my guess.
prev parent reply other threads:[~2022-06-27 0:05 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-05 13:03 [PATCH] net: prefer socket bound to interface when not in VRF Mike Manning
2021-10-07 14:07 ` Jakub Kicinski
2021-10-07 14:10 ` David Ahern
2021-10-07 14:27 ` Jakub Kicinski
2022-06-13 3:14 ` Benjamin Poirier
2022-06-13 3:52 ` David Ahern
2022-06-14 0:39 ` Benjamin Poirier
2022-06-26 22:25 ` Mike Manning
2022-06-27 0:05 ` Benjamin Poirier [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yrj0UWOQM8QNqJqu@d3 \
--to=bpoirier@nvidia.com \
--cc=dsahern@gmail.com \
--cc=kraig@google.com \
--cc=mvrmanning@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=sarcot@microsoft.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox