From: Kuniyuki Iwashima <kuniyu@amazon.com>
To: <pabeni@redhat.com>, <edumazet@google.com>
Cc: <davem@davemloft.net>, <kuba@kernel.org>, <kuni1840@gmail.com>,
<kuniyu@amazon.com>, <netdev@vger.kernel.org>
Subject: Re: [PATCH v3 net-next 3/5] tcp: Access &tcp_hashinfo via net.
Date: Thu, 1 Sep 2022 14:25:20 -0700 [thread overview]
Message-ID: <20220901212520.11421-1-kuniyu@amazon.com> (raw)
In-Reply-To: <f154fcd1d7e9c856c46dbf00ef4998773574a5cc.camel@redhat.com>
From: Paolo Abeni <pabeni@redhat.com>
Date: Thu, 01 Sep 2022 12:57:33 +0200
> On Tue, 2022-08-30 at 12:15 -0700, Kuniyuki Iwashima wrote:
> > We will soon introduce an optional per-netns ehash.
> >
> > This means we cannot use tcp_hashinfo directly in most places.
> >
> > Instead, access it via net->ipv4.tcp_death_row->hashinfo.
> >
> > The access will be valid only while initialising tcp_hashinfo
> > itself and creating/destroying each netns.
> >
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > ---
> > .../chelsio/inline_crypto/chtls/chtls_cm.c | 5 +-
> > net/core/filter.c | 5 +-
> > net/ipv4/esp4.c | 3 +-
> > net/ipv4/inet_hashtables.c | 2 +-
> > net/ipv4/netfilter/nf_socket_ipv4.c | 2 +-
> > net/ipv4/netfilter/nf_tproxy_ipv4.c | 17 +++--
> > net/ipv4/tcp_diag.c | 18 +++++-
> > net/ipv4/tcp_ipv4.c | 63 +++++++++++--------
> > net/ipv4/tcp_minisocks.c | 2 +-
> > net/ipv6/esp6.c | 3 +-
> > net/ipv6/inet6_hashtables.c | 4 +-
> > net/ipv6/netfilter/nf_socket_ipv6.c | 2 +-
> > net/ipv6/netfilter/nf_tproxy_ipv6.c | 5 +-
> > net/ipv6/tcp_ipv6.c | 16 ++---
> > net/mptcp/mptcp_diag.c | 7 ++-
> > 15 files changed, 92 insertions(+), 62 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
> > index ddfe9208529a..f90bfba4b303 100644
> > --- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
> > +++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
> > @@ -1069,8 +1069,7 @@ static void chtls_pass_accept_rpl(struct sk_buff *skb,
> > cxgb4_l2t_send(csk->egress_dev, skb, csk->l2t_entry);
> > }
> >
> > -static void inet_inherit_port(struct inet_hashinfo *hash_info,
> > - struct sock *lsk, struct sock *newsk)
> > +static void inet_inherit_port(struct sock *lsk, struct sock *newsk)
> > {
> > local_bh_disable();
> > __inet_inherit_port(lsk, newsk);
> > @@ -1240,7 +1239,7 @@ static struct sock *chtls_recv_sock(struct sock *lsk,
> > ipv4.sysctl_tcp_window_scaling),
> > tp->window_clamp);
> > neigh_release(n);
> > - inet_inherit_port(&tcp_hashinfo, lsk, newsk);
> > + inet_inherit_port(lsk, newsk);
> > csk_set_flag(csk, CSK_CONN_INLINE);
> > bh_unlock_sock(newsk); /* tcp_create_openreq_child ->sk_clone_lock */
>
> I looks to me that the above chunks are functionally a no-op and I
> think that omitting the 2 drivers from the v2:
>
> https://lore.kernel.org/netdev/20220829161920.99409-4-kuniyu@amazon.com/
>
> should break mlx5/nfp inside a netns. I don't understand why including
> the above and skipping the latters?!? I guess is a question mostly for
> Eric :)
My best guess is that it's ok unless it does not touch TCP stack deeply
and if it does, the driver developer must catch up with the core changes
not to burden maintainers...?
If so, I understand that take. OTOH, I also don't want to break anything
when we know the change would do.
So, I'm fine to either stay as is or add the change in v4 again.
>
> > @@ -1728,6 +1728,7 @@ EXPORT_SYMBOL(tcp_v4_do_rcv);
> >
> > int tcp_v4_early_demux(struct sk_buff *skb)
> > {
> > + struct net *net = dev_net(skb->dev);
> > const struct iphdr *iph;
> > const struct tcphdr *th;
> > struct sock *sk;
> > @@ -1744,7 +1745,7 @@ int tcp_v4_early_demux(struct sk_buff *skb)
> > if (th->doff < sizeof(struct tcphdr) / 4)
> > return 0;
> >
> > - sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
> > + sk = __inet_lookup_established(net, net->ipv4.tcp_death_row->hashinfo,
> > iph->saddr, th->source,
> > iph->daddr, ntohs(th->dest),
> > skb->skb_iif, inet_sdif(skb));
>
> /Me is thinking aloud...
>
> I'm wondering if the above has some measurable negative effect for
> large deployments using only the main netns?
>
> Specifically, are net->ipv4.tcp_death_row and net->ipv4.tcp_death_row-
> >hashinfo already into the working set data for established socket?
> Would the above increase the WSS by 2 cache-lines?
Currently, the death_row and hashinfo are touched around tw sockets or
connect(). If connections on the deployment are short-lived or frequently
initiated by itself, that would be host and included in WSS.
If the workload is server and there's no active-close() socket or
connections are long-lived, then it might not be included in WSS.
But I think it's not likely than the former if the deployment is
large enough.
If this change had large impact, then we could revert fbb8295248e1
which converted net->ipv4.tcp_death_row into pointer for 0dad4087a86a
that tried to fire a TW timer after netns is freed, but 0dad4087a86a
has already reverted.
>
> Thanks!
>
> Paolo
next prev parent reply other threads:[~2022-09-01 21:25 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-30 19:15 [PATCH v3 net-next 0/5] tcp: Introduce optional per-netns ehash Kuniyuki Iwashima
2022-08-30 19:15 ` [PATCH v3 net-next 1/5] tcp: Clean up some functions Kuniyuki Iwashima
2022-08-30 19:15 ` [PATCH v3 net-next 2/5] tcp: Set NULL to sk->sk_prot->h.hashinfo Kuniyuki Iwashima
2022-08-30 19:15 ` [PATCH v3 net-next 3/5] tcp: Access &tcp_hashinfo via net Kuniyuki Iwashima
2022-09-01 10:57 ` Paolo Abeni
2022-09-01 21:25 ` Kuniyuki Iwashima [this message]
2022-09-01 21:30 ` Eric Dumazet
2022-09-01 22:12 ` Kuniyuki Iwashima
2022-09-03 0:44 ` Kuniyuki Iwashima
2022-09-03 0:53 ` Eric Dumazet
2022-09-03 1:12 ` Kuniyuki Iwashima
2022-09-03 1:44 ` Kuniyuki Iwashima
2022-09-03 2:30 ` Eric Dumazet
2022-09-03 2:50 ` Kuniyuki Iwashima
2022-09-03 3:16 ` Eric Dumazet
2022-09-03 3:25 ` Kuniyuki Iwashima
2022-09-01 21:49 ` Jakub Kicinski
2022-09-01 22:19 ` Kuniyuki Iwashima
2022-08-30 19:15 ` [PATCH v3 net-next 4/5] tcp: Save unnecessary inet_twsk_purge() calls Kuniyuki Iwashima
2022-08-30 19:15 ` [PATCH v3 net-next 5/5] tcp: Introduce optional per-netns ehash Kuniyuki Iwashima
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220901212520.11421-1-kuniyu@amazon.com \
--to=kuniyu@amazon.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=kuni1840@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).