netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	kernel-team <kernel-team@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH net] net: memcontrol: charge allocated memory after mem_cgroup_sk_alloc()
Date: Thu, 1 Feb 2018 22:55:50 +0000	[thread overview]
Message-ID: <20180201225542.GA13072@castle> (raw)
In-Reply-To: <CANn89iLBOUZNxNqrECvZ7ktX8uu0j3REiZaqTwr=SP-U=vAPpQ@mail.gmail.com>

On Thu, Feb 01, 2018 at 01:17:56PM -0800, Eric Dumazet wrote:
> On Thu, Feb 1, 2018 at 12:22 PM, Roman Gushchin <guro@fb.com> wrote:
> > On Thu, Feb 01, 2018 at 10:16:55AM -0500, David Miller wrote:
> >> From: Roman Gushchin <guro@fb.com>
> >> Date: Wed, 31 Jan 2018 21:54:08 +0000
> >>
> >> > So I really start thinking that reverting 9f1c2674b328
> >> > ("net: memcontrol: defer call to mem_cgroup_sk_alloc()")
> >> > and fixing the original issue differently might be easier
> >> > and a proper way to go. Does it makes sense?
> >>
> >> You'll need to work that out with Eric Dumazet who added the
> >> change in question which you think we should revert.
> >
> > Eric,
> >
> > can you, please, provide some details about the use-after-free problem
> > that you've fixed with commit 9f1c2674b328 ("net: memcontrol: defer call
> > to mem_cgroup_sk_alloc()" ? Do you know how to reproduce it?
> >
> > Deferring mem_cgroup_sk_alloc() breaks socket memory accounting
> > and makes it much more fragile in general. So, I wonder, if there are
> > solutions for the use-after-free problem.
> >
> > Thank you!
> >
> > Roman
> 
> Unfortunately bug is not public (Google-Bug-Id 67556600 for Googlers
> following this thread )
> 
> Our kernel has a debug feature on percpu_ref_get_many() which detects
> the typical use-after-free problem of
> doing atomic_long_add(nr, &ref->count); while ref->count is 0, or
> memory already freed.
> 
> Bug was serious because css_put() will release the css a second time.
> 
> Stack trace looked like :
> 
> Oct  8 00:23:14 lphh23 kernel: [27239.568098]  <IRQ>
> [<ffffffff909d2fb1>] dump_stack+0x4d/0x6c
> Oct  8 00:23:14 lphh23 kernel: [27239.568108]  [<ffffffff906df6e3>] ?
> cgroup_get+0x43/0x50
> Oct  8 00:23:14 lphh23 kernel: [27239.568114]  [<ffffffff906f2f35>]
> warn_slowpath_common+0xac/0xc8
> Oct  8 00:23:14 lphh23 kernel: [27239.568117]  [<ffffffff906f2f6b>]
> warn_slowpath_null+0x1a/0x1c
> Oct  8 00:23:14 lphh23 kernel: [27239.568120]  [<ffffffff906df6e3>]
> cgroup_get+0x43/0x50
> Oct  8 00:23:14 lphh23 kernel: [27239.568123]  [<ffffffff906e07a4>]
> cgroup_sk_alloc+0x64/0x90

Hm, that looks strange... It's cgroup_sk_alloc(),
not mem_cgroup_sk_alloc(), which was removed by 9f1c2674b328.

I thought, that it's css_get() in mem_cgroup_sk_alloc(), which
you removed, but the stacktrace you've posted is different.

void mem_cgroup_sk_alloc(struct sock *sk) {
	/*
	 * Socket cloning can throw us here with sk_memcg already
	 * filled. It won't however, necessarily happen from
	 * process context. So the test for root memcg given
	 * the current task's memcg won't help us in this case.
	 *
	 * Respecting the original socket's memcg is a better
	 * decision in this case.
	 */
	if (sk->sk_memcg) {
		BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
>>>		css_get(&sk->sk_memcg->css);
		return;
	}

Is it possible to reproduce the issue on an upstream kernel?
Any ideas of what can trigger it?

Btw, with the following patch applied (below) and cgroup v2 enabled,
the issue, which I'm talking about, can be reproduced in seconds after reboot
by doing almost any network activity. Just sshing to a machine is enough.
The corresponding warning will be printed to dmesg.

What is a proper way to fix the socket memory accounting in this case,
what do you think?

Thank you!

Roman

--

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c51c589..55fb890 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -276,6 +276,8 @@ struct mem_cgroup {
 	struct list_head event_list;
 	spinlock_t event_list_lock;
 
+	atomic_t tcpcnt;
+
 	struct mem_cgroup_per_node *nodeinfo[0];
 	/* WARNING: nodeinfo must be the last member here */
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 19eea69..c69ff04 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5623,6 +5623,8 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	seq_printf(m, "workingset_nodereclaim %lu\n",
 		   stat[WORKINGSET_NODERECLAIM]);
 
+	seq_printf(m, "tcpcnt %d\n", atomic_read(&memcg->tcpcnt));
+
 	return 0;
 }
 
@@ -6139,6 +6141,8 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	gfp_t gfp_mask = GFP_KERNEL;
 
+	atomic_add(nr_pages, &memcg->tcpcnt);
+
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
 		struct page_counter *fail;
 
@@ -6171,6 +6175,11 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
  */
 void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
+	int v = atomic_sub_return(nr_pages, &memcg->tcpcnt);
+	if (v < 0) {
+		pr_info("@@@ %p %d \n", memcg, v);
+	}
+
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
 		page_counter_uncharge(&memcg->tcpmem, nr_pages);
 		return;

  reply	other threads:[~2018-02-01 22:55 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-25  0:19 [PATCH net] net: memcontrol: charge allocated memory after mem_cgroup_sk_alloc() Roman Gushchin
2018-01-25 17:03 ` David Miller
2018-01-25 17:15   ` Roman Gushchin
2018-01-31 21:54   ` Roman Gushchin
2018-02-01 15:16     ` David Miller
2018-02-01 20:22       ` Roman Gushchin
2018-02-01 21:17         ` Eric Dumazet
2018-02-01 22:55           ` Roman Gushchin [this message]
2018-02-01 23:27             ` Eric Dumazet
2018-02-01 23:42               ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180201225542.GA13072@castle \
    --to=guro@fb.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).