From: Vladimir Davydov <vdavydov@virtuozzo.com>
To: Sudeep K N <sudeepholla.maillist@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"David S. Miller" <davem@davemloft.net>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>, <linux-mm@kvack.org>,
	<linux-fsdevel@vger.kernel.org>, netdev <netdev@vger.kernel.org>,
	<x86@kernel.org>, open list <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Sudeep Holla <sudeep.holla@arm.com>
Subject: Re: [PATCH RESEND 8/8] af_unix: charge buffers to kmemcg
Date: Tue, 23 Aug 2016 19:44:59 +0300
Message-ID: <20160823164459.GD1863@esperanza>
In-Reply-To: <CAPKp9uY9kFTqPT+9rkAcWACWrnE-FbGbuU=6mw715X6eCC4PVg@mail.gmail.com>

Hello,

On Tue, Aug 23, 2016 at 02:48:11PM +0100, Sudeep K N wrote:
> On Tue, May 24, 2016 at 5:36 PM, Vladimir Davydov
> <vdavydov@virtuozzo.com> wrote:
> > On Tue, May 24, 2016 at 06:02:06AM -0700, Eric Dumazet wrote:
> >> On Tue, 2016-05-24 at 11:49 +0300, Vladimir Davydov wrote:
> >> > Unix sockets can consume a significant amount of system memory, hence
> >> > they should be accounted to kmemcg.
> >> >
> >> > Since unix socket buffers are always allocated from process context,
> >> > all we need to do to charge them to kmemcg is set __GFP_ACCOUNT in
> >> > sock->sk_allocation mask.
> >>
> >> I have two questions :
> >>
> >> 1) What happens when a buffer allocated from socket <A> lands in a
> >> different socket <B>, maybe owned by another user/process.
> >>
> >> Who owns it now, in terms of kmemcg accounting?
> >
> > We never move memcg charges. E.g. if two processes from different
> > cgroups are sharing a memory region, each page will be charged to the
> > process that touched it first. Or if two processes are working with the
> > same directory tree, inodes and dentries will be charged to the first
> > user. The same holds for unix socket buffers - they will be charged to
> > the sender.
> >
> >>
> >> 2) Has the performance impact been evaluated?
> >
> > I ran netperf STREAM_STREAM with default options in a kmemcg on
> > a 4 core x 2 HT box. The results are below:
> >
> >  # clients            bandwidth (10^6bits/sec)
> >                     base              patched
> >          1      67643 +-  725      64874 +-  353    - 4.0 %
> >          4     193585 +- 2516     186715 +- 1460    - 3.5 %
> >          8     194820 +-  377     187443 +- 1229    - 3.7 %
> >
> > So the accounting doesn't come for free - it costs ~4% of throughput.
> > I believe we could optimize it by using per-cpu batching not only on
> > charge, but also on uncharge in the memcg core, but that's beyond the
> > scope of this patch set - I'll take a look at this later.
> >
> > Anyway, if the performance impact is found to be unacceptable, it is
> > always possible to disable kmem accounting at boot time
> > (cgroup.memory=nokmem) or not to use memory cgroups at runtime at all
> > (thanks to jump labels there'll be no overhead even if they are
> > compiled in).
> >
> 
> I started seeing an almost 10% degradation in the hackbench score with
> v4.8-rc1. Bisecting it pointed to this patch, i.e. commit 3aa9799e1364
> ("af_unix: charge buffers to kmemcg") in the mainline.
> 
> As per the commit log, that seems to be expected, but I was not sure
> about the margin. I also see that the hackbench score is more
> inconsistent after this patch, though I may be wrong as that is based on
> limited observation.
> 
> Is this something we can ignore, given that hackbench is synthetic
> compared to the gains this patch provides in some real workloads?

AFAIU hackbench essentially measures the rate at which data is sent back
and forth over a unix socket between processes running on different cpus,
so it isn't surprising that the patch results in a degradation: it makes
every skb page allocation/deallocation increment/decrement an atomic
counter inside the memcg. The more processes/cpus in the same cgroup this
test involves, the more significant the overhead of that atomic counter
becomes.
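
For reference, the patch itself boils down to a one-liner in
unix_create1() - something like the following (quoting from memory, so
take it as a sketch of the idea rather than the exact diff):

	static struct sock *unix_create1(struct net *net, struct socket *sock,
					 int kern)
	{
		...
		/*
		 * Pass __GFP_ACCOUNT for every allocation done on behalf of
		 * this socket (most importantly skb data pages), so that it
		 * gets charged to the allocating task's memcg.
		 */
		sk->sk_allocation	= GFP_KERNEL_ACCOUNT;
		sk->sk_write_space	= unix_write_space;
		...
	}

IIRC sock_alloc_send_pskb() uses sk->sk_allocation for the skb and its
data pages, so every send/receive cycle in hackbench now goes through the
memcg charge/uncharge path.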

The degradation is avoidable, though - it can be fixed by making the kmem
charge/uncharge code use per-cpu batches. The infrastructure for this
already exists in memcontrol.c. If it were not for the legacy
mem_cgroup->kmem counter (which is essentially useless and will be dropped
in cgroup v2), the fix would be pretty easy. As it is, this legacy counter
makes a possible implementation quite messy, so I'd like to postpone it
until cgroup v2 has finally settled down.
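
To illustrate what I mean by per-cpu batching (a simplified sketch only,
names made up, not actual memcontrol.c code - the real implementation
would also have to update the legacy kmem counter, hold css references and
drain the stock on cgroup removal), uncharges could be accumulated per cpu
and returned to the page counter in bulk:

	#define UNCHARGE_BATCH	64	/* some reasonable batch size */

	struct uncharge_stock {
		struct mem_cgroup *cached;	/* memcg the stock belongs to */
		unsigned int nr_pages;		/* pages not yet returned */
	};
	static DEFINE_PER_CPU(struct uncharge_stock, uncharge_stock);

	static void memcg_uncharge_batched(struct mem_cgroup *memcg,
					   unsigned int nr_pages)
	{
		struct uncharge_stock *stock;
		unsigned long flags;

		local_irq_save(flags);
		stock = this_cpu_ptr(&uncharge_stock);
		if (stock->cached != memcg) {
			/* flush pages accumulated for a different memcg */
			if (stock->cached && stock->nr_pages)
				page_counter_uncharge(&stock->cached->memory,
						      stock->nr_pages);
			stock->cached = memcg;
			stock->nr_pages = 0;
		}
		stock->nr_pages += nr_pages;
		if (stock->nr_pages >= UNCHARGE_BATCH) {
			/* one atomic update instead of a batch of them */
			page_counter_uncharge(&memcg->memory, stock->nr_pages);
			stock->nr_pages = 0;
		}
		local_irq_restore(flags);
	}

The charge path already works roughly this way (see consume_stock and
refill_stock in memcontrol.c); the point is to do the same on uncharge.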

Regarding your problem: as a workaround you can either run your workload
in the root memory cgroup or disable kmem accounting for memory cgroups
altogether (via the cgroup.memory=nokmem boot option). If you find the
issue critical, I don't mind reverting the patch - we can always re-apply
it once per-cpu batches are implemented for kmem charges.
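
Concretely (assuming the usual cgroup v1 mount point under
/sys/fs/cgroup/memory):

	# on the kernel command line, to disable kmem accounting entirely:
	... cgroup.memory=nokmem

	# or, at runtime, keep the benchmark in the root memory cgroup:
	echo $$ > /sys/fs/cgroup/memory/tasks
	./hackbench ...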

Thanks,
Vladimir

Thread overview: 22+ messages
2016-05-24  8:49 [PATCH RESEND 0/8] More stuff to charge to kmemcg Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 1/8] mm: remove pointless struct in struct page definition Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 2/8] mm: clean up non-standard page->_mapcount users Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 3/8] mm: memcontrol: cleanup kmem charge functions Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 4/8] mm: charge/uncharge kmemcg from generic page allocator paths Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 5/8] mm: memcontrol: teach uncharge_list to deal with kmem pages Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 6/8] arch: x86: charge page tables to kmemcg Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 7/8] pipe: account " Vladimir Davydov
2016-05-24 12:59   ` Eric Dumazet
2016-05-24 16:13     ` Vladimir Davydov
2016-05-24 20:04       ` Eric Dumazet
2016-05-25 10:30         ` Vladimir Davydov
2016-05-26  7:04           ` Minchan Kim
2016-05-26 13:59             ` Vladimir Davydov
2016-05-26 14:15               ` Eric Dumazet
2016-05-27 15:03                 ` Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 8/8] af_unix: charge buffers " Vladimir Davydov
2016-05-24 13:02   ` Eric Dumazet
2016-05-24 16:36     ` Vladimir Davydov
2016-08-23 13:48       ` Sudeep K N
2016-08-23 16:44         ` Vladimir Davydov [this message]
2016-08-23 16:50           ` Sudeep Holla
