netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Dumazet <edumazet@google.com>
To: Qian Cai <quic_qiancai@quicinc.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH v2 net-next] net: generalize skb freeing deferral to per-cpu lists
Date: Fri, 29 Apr 2022 09:23:13 -0700	[thread overview]
Message-ID: <CANn89iLNpxXJMyDM-xLE_=N5jZ4jw6tfu+U++LSduUmnX7UE+g@mail.gmail.com> (raw)
In-Reply-To: <20220429161810.GA175@qian>

On Fri, Apr 29, 2022 at 9:18 AM Qian Cai <quic_qiancai@quicinc.com> wrote:
>
> On Fri, Apr 22, 2022 at 01:12:37PM -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Logic added in commit f35f821935d8 ("tcp: defer skb freeing after socket
> > lock is released") helped bulk TCP flows to move the cost of skbs
> > frees outside of critical section where socket lock was held.
> >
> > But for RPC traffic, or hosts with RFS enabled, the solution is far from
> > being ideal.
> >
> > For RPC traffic, recvmsg() has to return to user space right after
> > skb payload has been consumed, meaning that BH handler has no chance
> > to pick the skb before recvmsg() thread. This issue is more visible
> > with BIG TCP, as more RPC fit one skb.
> >
> > For RFS, even if BH handler picks the skbs, they are still picked
> > from the cpu on which user thread is running.
> >
> > Ideally, it is better to free the skbs (and associated page frags)
> > on the cpu that originally allocated them.
> >
> > This patch removes the per socket anchor (sk->defer_list) and
> > instead uses a per-cpu list, which will hold more skbs per round.
> >
> > This new per-cpu list is drained at the end of net_action_rx(),
> > after incoming packets have been processed, to lower latencies.
> >
> > In normal conditions, skbs are added to the per-cpu list with
> > no further action. In the (unlikely) cases where the cpu does not
> > run net_action_rx() handler fast enough, we use an IPI to raise
> > NET_RX_SOFTIRQ on the remote cpu.
> >
> > Also, we do not bother draining the per-cpu list from dev_cpu_dead()
> > This is because skbs in this list have no requirement on how fast
> > they should be freed.
> >
> > Note that we can add in the future a small per-cpu cache
> > if we see any contention on sd->defer_lock.
>
> Hmm, we started to see some memory leak reports from kmemleak that have
> been around for hours without being freed since yesterday's linux-next
> tree which included this commit. Any thoughts?
>
> unreferenced object 0xffff400610f55cc0 (size 216):
>   comm "git-remote-http", pid 781180, jiffies 4314091475 (age 4323.740s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 c0 7e 87 ff 3f ff ff 00 00 00 00 00 00 00 00  ..~..?..........
>   backtrace:
>      kmem_cache_alloc_node
>      __alloc_skb
>      __tcp_send_ack.part.0
>      tcp_send_ack
>      tcp_cleanup_rbuf
>      tcp_recvmsg_locked
>      tcp_recvmsg
>      inet_recvmsg
>      __sys_recvfrom
>      __arm64_sys_recvfrom
>      invoke_syscall
>      el0_svc_common.constprop.0
>      do_el0_svc
>      el0_svc
>      el0t_64_sync_handler
>      el0t_64_sync
> unreferenced object 0xffff4001e58f0c40 (size 216):
>   comm "git-remote-http", pid 781180, jiffies 4314091483 (age 4323.968s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 c0 7e 87 ff 3f ff ff 00 00 00 00 00 00 00 00  ..~..?..........
>   backtrace:
>      kmem_cache_alloc_node
>      __alloc_skb
>      __tcp_send_ack.part.0
>      tcp_send_ack
>      tcp_cleanup_rbuf
>      tcp_recvmsg_locked
>      tcp_recvmsg
>      inet_recvmsg
>      __sys_recvfrom
>      __arm64_sys_recvfrom
>      invoke_syscall
>      el0_svc_common.constprop.0
>      do_el0_svc
>      el0_svc
>      el0t_64_sync_handler
>      el0t_64_sync

Which tree are you using ?

Ido said the leak has been fixed in
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=f3412b3879b4f7c4313b186b03940d4791345534

  reply	other threads:[~2022-04-29 16:23 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-22 20:12 [PATCH v2 net-next] net: generalize skb freeing deferral to per-cpu lists Eric Dumazet
2022-04-26  7:38 ` Paolo Abeni
2022-04-26 13:11   ` Eric Dumazet
2022-04-26 15:28     ` Paolo Abeni
2022-04-26 16:13       ` Eric Dumazet
2022-04-27  0:50 ` patchwork-bot+netdevbpf
2022-04-27 15:34 ` Ido Schimmel
2022-04-27 16:53   ` Eric Dumazet
2022-04-27 17:11     ` Eric Dumazet
2022-04-27 17:59       ` Eric Dumazet
2022-04-27 18:54         ` Eric Dumazet
2022-04-29 16:18 ` Qian Cai
2022-04-29 16:23   ` Eric Dumazet [this message]
2022-04-29 20:44     ` Qian Cai
2022-05-06  6:44 ` [net] 72fd55c0db: invoked_oom-killer:gfp_mask=0x kernel test robot
2022-05-06  8:15   ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANn89iLNpxXJMyDM-xLE_=N5jZ4jw6tfu+U++LSduUmnX7UE+g@mail.gmail.com' \
    --to=edumazet@google.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=quic_qiancai@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).