From: Vlastimil Babka <vbabka@suse.cz>
To: Rik van Riel <riel@redhat.com>, Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Michel Lespinasse <walken@google.com>,
	Andrea Arcangeli <andrea@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Daniel Forrest <dan.forrest@ssec.wisc.edu>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: anon_vma accumulating for certain load still not addressed
Date: Fri, 14 Nov 2014 18:10:47 +0100
Message-ID: <54663797.1060106@suse.cz>
In-Reply-To: <54661A8C.5050806@redhat.com>

On 11/14/2014 04:06 PM, Rik van Riel wrote:
> On 11/14/2014 08:08 AM, Michal Hocko wrote:
>> Hi,
>> back in 2012 [1] there was a discussion about a forking load which
>> accumulates anon_vmas. There was a trivial test case that triggers this
>> and could let a local user deplete memory.
>>
>> We have a report for an older enterprise distribution where nsd is
>> most probably suffering from this issue (I haven't debugged it
>> thoroughly, but accumulating anon_vma structs over time sounds like a
>> good enough fit) and has to be restarted after a while to release the
>> accumulated anon_vma objects.
>>
>> There was a patch which tried to work around the issue [2], but I do
>> not see any follow-ups or any indication that the issue would be
>> addressed in another way.
>>
>> The test program from [1] ran for around 39 minutes on my laptop, and
>> here is the result:
>>
>> $ date +%s; grep anon_vma /proc/slabinfo
>> 1415960225
>> anon_vma           11664  11900    160   25    1 : tunables    0    0    0 : slabdata    476    476      0
>>
>> $ ./a # The reproducer
>>
>> $ date +%s; grep anon_vma /proc/slabinfo
>> 1415962592
>> anon_vma           34875  34875    160   25    1 : tunables    0    0    0 : slabdata   1395   1395      0
>>
>> $ killall a
>> $ date +%s; grep anon_vma /proc/slabinfo
>> 1415962607
>> anon_vma           11277  12175    160   25    1 : tunables    0    0    0 : slabdata    487    487      0
>>
>> So 23211 objects (34875 - 11664) accumulated over that period, and
>> killing the offender released all of them.
>>
>> The proposed workaround is kind of ugly, but does anyone have a better
>> idea than reference counting? If not, should we merge it?
>
> I believe we should just merge that patch.
>
> I have not seen any better ideas come by.
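
The slabinfo numbers quoted above can be cross-checked with a short script. This is a sketch, assuming the version 2.1 /proc/slabinfo column layout (name, active_objs, num_objs, objsize, objperslab, pagesperslab, then the tunables and slabdata groups) and 4 KiB pages; parse_slab_row and slab_bytes are illustrative helpers, not part of any existing tool.

```python
import re

# Assumed /proc/slabinfo (version 2.1) row layout:
# name active_objs num_objs objsize objperslab pagesperslab
#   : tunables limit batchcount sharedfactor
#   : slabdata active_slabs num_slabs sharedavail
SLAB_ROW = re.compile(
    r"(?P<name>\S+)\s+(?P<active_objs>\d+)\s+(?P<num_objs>\d+)\s+(?P<objsize>\d+)"
    r"\s+(?P<objperslab>\d+)\s+(?P<pagesperslab>\d+)"
    r"\s*:\s*tunables\s+\d+\s+\d+\s+\d+"
    r"\s*:\s*slabdata\s+(?P<active_slabs>\d+)\s+(?P<num_slabs>\d+)\s+\d+"
)


def parse_slab_row(line):
    """Return the fields of one slabinfo row as a dict (numbers as ints)."""
    m = SLAB_ROW.match(line.strip())
    if m is None:
        raise ValueError(f"not a slabinfo row: {line!r}")
    return {k: (v if k == "name" else int(v)) for k, v in m.groupdict().items()}


def slab_bytes(row, page_size=4096):
    """Memory held by the cache's slabs, assuming page_size-byte pages."""
    return row["num_slabs"] * row["pagesperslab"] * page_size


# The two samples and timestamps quoted in Michal's mail.
before = parse_slab_row("anon_vma           11664  11900    160   25    1 : tunables    0    0    0 : slabdata    476    476      0")
after = parse_slab_row("anon_vma           34875  34875    160   25    1 : tunables    0    0    0 : slabdata   1395   1395      0")
t0, t1 = 1415960225, 1415962592

grown = after["active_objs"] - before["active_objs"]
elapsed = t1 - t0
print(f"{grown} anon_vma objects in {elapsed} s ({grown / elapsed:.1f}/s)")
print(f"slab memory: {slab_bytes(before)} -> {slab_bytes(after)} bytes")
```

For these two samples it reports 23211 objects in 2367 seconds (about 9.8 per second), with slab memory growing from 1949696 to 5713920 bytes, roughly 3.6 MiB.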

I have a very vague idea: if we could distinguish (with a flag?) an
anon_vma_chain (avc) that points to the parent's anon_vma from the avcs
created for new anon_vmas in the child, we could perhaps detect, when a
"child-type" avc is removed, that the only avcs left for a non-root
anon_vma are "parent-type" ones pointing from children. Then we could go
through all pages whose mapping points to that anon_vma and change it to
the root anon_vma. The root would have to stay, orphaned or not, because
of the lock there.
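
A minimal user-space sketch of the flag idea, for illustration only. It assumes each anon_vma can count its "child-type" and "parent-type" avcs separately; ToyAnonVma and unlink_child_avc are hypothetical names, and the locking and fast-path costs mentioned further down are deliberately not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyAnonVma:
    """Hypothetical stand-in for an anon_vma; no locking is modelled."""
    name: str
    root: Optional["ToyAnonVma"] = None  # None means this object is the root
    child_avcs: int = 0    # "child-type" avcs: links from the VMAs that created it
    parent_avcs: int = 0   # "parent-type" avcs: links cloned into descendants at fork
    pages: int = 0         # pages whose mapping points here
    freed: bool = False

    def is_root(self):
        return self.root is None


def unlink_child_avc(av):
    """Drop one child-type avc. If only parent-type avcs remain on a
    non-root anon_vma, move its pages to the root and retire it.
    Returns True when the anon_vma was retired."""
    av.child_avcs -= 1
    if av.is_root() or av.child_avcs > 0:
        # The root must stay (the lock lives there); otherwise an owner remains.
        return False
    av.root.pages += av.pages  # repoint the pages' mapping to the root anon_vma
    av.pages = 0
    av.parent_avcs = 0         # descendants' links to it would be dropped as well
    av.freed = True
    return True
```

Retiring a middle-level anon_vma this way leaves the descendants linked only to the root and their own anon_vmas, so no magic constant is involved; whether this can be done cheaply under the real locking is exactly the open question.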

That would remove the need to pick a magic constant. It would also avoid
still leaving useless "orphaned" anon_vmas at the top levels of the fork
hierarchy while all the bottom levels have to share the last anon_vmas
that were allowed to be created. I'm not sure whether that is the case
for nsd, i.e. whether, besides the "orphaned parent" forks, it also
forks some workers that would then no longer get the benefit of
private anon_vmas.
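
A toy model of that trade-off, assuming (only from Rik's mention of a chain length of 5) that the workaround caps how many anon_vmas a forked VMA can be linked to and, once the cap is reached, lets each new child keep using the newest existing anon_vma. live_anon_vmas and cap are hypothetical; this is not the code from the patch in [2].

```python
def live_anon_vmas(generations, cap=None):
    """Toy model of a fork-then-parent-exits chain.

    Each child clones its parent's links (one avc per ancestor anon_vma)
    and, unless the hypothetical cap is reached, allocates a fresh
    anon_vma of its own. The parent then exits, so only the newest
    process's links keep anon_vmas alive.

    Returns (number of live anon_vmas, id of the anon_vma the newest
    process uses for its own new pages)."""
    links = [0]  # generation 0 owns anon_vma #0 (the root)
    next_id = 1
    for _ in range(generations):
        child = list(links)  # clone the parent's avcs
        if cap is None or len(child) < cap:
            child.append(next_id)  # fresh anon_vma for the child
            next_id += 1
        links = child  # the parent exits; only the child's links remain
    return len(links), links[-1]


print(live_anon_vmas(100))         # no cap
print(live_anon_vmas(100, cap=5))  # capped chain
```

Without a cap, 100 generations leave 101 live anon_vmas. With cap=5 only five stay alive, but in this model all five belong to processes that have already exited, and every later generation reuses the anon_vma created by the last generation that was allowed to allocate one.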

Of course, the downside is that the idea could turn out to be too
complicated with respect to locking and add overhead to some fast paths
(process exit?). And I admit I'm not very familiar with the code (which
is perhaps a euphemism :)
Still, what do you think, Rik?

Vlastimil

> The comment should probably be fixed to reflect the
> chain length of 5 though :)
>



