Re: crash during oom reaper - Kirill A. Shutemov

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Michal Hocko <mhocko@kernel.org>
Cc: Vegard Nossum <vegard.nossum@oracle.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Rik van Riel <riel@redhat.com>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Ingo Molnar <mingo@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: crash during oom reaper
Date: Fri, 16 Dec 2016 16:07:30 +0300	[thread overview]
Message-ID: <20161216130730.GF27758@node> (raw)
In-Reply-To: <20161216125650.GJ13940@dhcp22.suse.cz>

On Fri, Dec 16, 2016 at 01:56:50PM +0100, Michal Hocko wrote:
> On Fri 16-12-16 15:35:55, Kirill A. Shutemov wrote:
> > On Fri, Dec 16, 2016 at 12:42:43PM +0100, Michal Hocko wrote:
> > > On Fri 16-12-16 13:44:38, Kirill A. Shutemov wrote:
> > > > On Fri, Dec 16, 2016 at 11:11:13AM +0100, Michal Hocko wrote:
> > > > > On Fri 16-12-16 10:43:52, Vegard Nossum wrote:
> > > > > [...]
> > > > > > I don't think it's a bug in the OOM reaper itself, but either of the
> > > > > > following two patches will fix the problem (without my understand how or
> > > > > > why):
> > > > > > 
> > > > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > > > > index ec9f11d4f094..37b14b2e2af4 100644
> > > > > > --- a/mm/oom_kill.c
> > > > > > +++ b/mm/oom_kill.c
> > > > > > @@ -485,7 +485,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk,
> > > > > > struct mm_struct *mm)
> > > > > >  	 */
> > > > > >  	mutex_lock(&oom_lock);
> > > > > > 
> > > > > > -	if (!down_read_trylock(&mm->mmap_sem)) {
> > > > > > +	if (!down_write_trylock(&mm->mmap_sem)) {
> > > > > 
> > > > > __oom_reap_task_mm is basically the same thing as MADV_DONTNEED and that
> > > > > doesn't require the exlusive mmap_sem. So this looks correct to me.
> > > > 
> > > > BTW, shouldn't we filter out all VM_SPECIAL VMAs there? Or VM_PFNMAP at
> > > > least.
> > > > 
> > > > MADV_DONTNEED doesn't touch VM_PFNMAP, but I don't see anything matching
> > > > on __oom_reap_task_mm() side.
> > > 
> > > I guess you are right and we should match the MADV_DONTNEED behavior
> > > here. Care to send a patch?
> > 
> > Below. Testing required.
> > 
> > > > Other difference is that you use unmap_page_range() witch doesn't touch
> > > > mmu_notifiers. MADV_DONTNEED goes via zap_page_range(), which invalidates
> > > > the range. Not sure if it can make any difference here.
> > > 
> > > Which mmu notifier would care about this? I am not really familiar with
> > > those users so I might miss something easily.
> > 
> > No idea either.
> > 
> > Is there any reason not to use zap_page_range here too?
> 
> Yes, zap_page_range is much more heavy and performs operations which
> might lock AFAIR which I really would like to prevent from.

What exactly can block there? I don't see anything with that potential.

> > Few more notes:
> > 
> > I propably miss something, but why do we need details->ignore_dirty?
> >
> > It only appiled for non-anon pages, but since we filter out shared
> > mappings, how can we have pte_dirty() for !PageAnon()?
> 
> Why couldn't we have dirty pages on the private file mappings? The
> underlying page might be still in the page cache, right?

The check is about dirty PTE, not dirty page.

> > check_swap_entries is also sloppy: the behavior doesn't match the comment:
> > details == NULL makes it check swap entries. I removed it and restore
> > details->check_mapping test as we had before.
> 
> the reason is unmap_mapping_range which didn't use to check swap entries
> so I wanted to have it opt in AFAIR.

details == NULL would give you it in both cases.

> > @@ -531,8 +519,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
> >  		 * count elevated without a good reason.
> >  		 */
> >  		if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
> > -			unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end,
> > -					 &details);
> > +			madvise_dontneed(vma, &vma, vma->vm_start, vma->vm_end);
> 
> I would rather keep the unmap_page_range because it is the bare minumum
> we have to do. Currently we are doing 
> 
> 		if (is_vm_hugetlb_page(vma))
> 			continue;
> 
> so I would rather do something like
> 		if (!can_vma_madv_dontneed(vma))
> 			continue;
> instead.

We can do that.
But let's first understand why code should differ from madvise_dontneed().
It's not obvious to me.

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Michal Hocko <mhocko@kernel.org>
Cc: Vegard Nossum <vegard.nossum@oracle.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Rik van Riel <riel@redhat.com>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Ingo Molnar <mingo@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: crash during oom reaper
Date: Fri, 16 Dec 2016 16:07:30 +0300	[thread overview]
Message-ID: <20161216130730.GF27758@node> (raw)
In-Reply-To: <20161216125650.GJ13940@dhcp22.suse.cz>

On Fri, Dec 16, 2016 at 01:56:50PM +0100, Michal Hocko wrote:
> On Fri 16-12-16 15:35:55, Kirill A. Shutemov wrote:
> > On Fri, Dec 16, 2016 at 12:42:43PM +0100, Michal Hocko wrote:
> > > On Fri 16-12-16 13:44:38, Kirill A. Shutemov wrote:
> > > > On Fri, Dec 16, 2016 at 11:11:13AM +0100, Michal Hocko wrote:
> > > > > On Fri 16-12-16 10:43:52, Vegard Nossum wrote:
> > > > > [...]
> > > > > > I don't think it's a bug in the OOM reaper itself, but either of the
> > > > > > following two patches will fix the problem (without my understand how or
> > > > > > why):
> > > > > > 
> > > > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > > > > index ec9f11d4f094..37b14b2e2af4 100644
> > > > > > --- a/mm/oom_kill.c
> > > > > > +++ b/mm/oom_kill.c
> > > > > > @@ -485,7 +485,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk,
> > > > > > struct mm_struct *mm)
> > > > > >  	 */
> > > > > >  	mutex_lock(&oom_lock);
> > > > > > 
> > > > > > -	if (!down_read_trylock(&mm->mmap_sem)) {
> > > > > > +	if (!down_write_trylock(&mm->mmap_sem)) {
> > > > > 
> > > > > __oom_reap_task_mm is basically the same thing as MADV_DONTNEED and that
> > > > > doesn't require the exlusive mmap_sem. So this looks correct to me.
> > > > 
> > > > BTW, shouldn't we filter out all VM_SPECIAL VMAs there? Or VM_PFNMAP at
> > > > least.
> > > > 
> > > > MADV_DONTNEED doesn't touch VM_PFNMAP, but I don't see anything matching
> > > > on __oom_reap_task_mm() side.
> > > 
> > > I guess you are right and we should match the MADV_DONTNEED behavior
> > > here. Care to send a patch?
> > 
> > Below. Testing required.
> > 
> > > > Other difference is that you use unmap_page_range() witch doesn't touch
> > > > mmu_notifiers. MADV_DONTNEED goes via zap_page_range(), which invalidates
> > > > the range. Not sure if it can make any difference here.
> > > 
> > > Which mmu notifier would care about this? I am not really familiar with
> > > those users so I might miss something easily.
> > 
> > No idea either.
> > 
> > Is there any reason not to use zap_page_range here too?
> 
> Yes, zap_page_range is much more heavy and performs operations which
> might lock AFAIR which I really would like to prevent from.

What exactly can block there? I don't see anything with that potential.

> > Few more notes:
> > 
> > I propably miss something, but why do we need details->ignore_dirty?
> >
> > It only appiled for non-anon pages, but since we filter out shared
> > mappings, how can we have pte_dirty() for !PageAnon()?
> 
> Why couldn't we have dirty pages on the private file mappings? The
> underlying page might be still in the page cache, right?

The check is about dirty PTE, not dirty page.

> > check_swap_entries is also sloppy: the behavior doesn't match the comment:
> > details == NULL makes it check swap entries. I removed it and restore
> > details->check_mapping test as we had before.
> 
> the reason is unmap_mapping_range which didn't use to check swap entries
> so I wanted to have it opt in AFAIR.

details == NULL would give you it in both cases.

> > @@ -531,8 +519,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
> >  		 * count elevated without a good reason.
> >  		 */
> >  		if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
> > -			unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end,
> > -					 &details);
> > +			madvise_dontneed(vma, &vma, vma->vm_start, vma->vm_end);
> 
> I would rather keep the unmap_page_range because it is the bare minumum
> we have to do. Currently we are doing 
> 
> 		if (is_vm_hugetlb_page(vma))
> 			continue;
> 
> so I would rather do something like
> 		if (!can_vma_madv_dontneed(vma))
> 			continue;
> instead.

We can do that.
But let's first understand why code should differ from madvise_dontneed().
It's not obvious to me.

-- 
 Kirill A. Shutemov

next prev parent reply	other threads:[~2016-12-16 13:07 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-16  8:21 [PATCH 1/4] mm: add new mmgrab() helper Vegard Nossum
2016-12-16  8:21 ` Vegard Nossum
2016-12-16  8:22 ` [PATCH 2/4] mm: add new mmget() helper Vegard Nossum
2016-12-16  8:22   ` Vegard Nossum
2016-12-16  9:26   ` Michal Hocko
2016-12-16  9:26     ` Michal Hocko
2016-12-16  8:22 ` [PATCH 3/4] mm: use mmget_not_zero() helper Vegard Nossum
2016-12-16  8:22   ` Vegard Nossum
2016-12-16  9:27   ` Michal Hocko
2016-12-16  9:27     ` Michal Hocko
2016-12-16  8:22 ` [PATCH 4/4] [RFC!] mm: 'struct mm_struct' reference counting debugging Vegard Nossum
2016-12-16  8:22   ` Vegard Nossum
2016-12-16  9:01   ` Michal Hocko
2016-12-16  9:01     ` Michal Hocko
2016-12-16  9:43     ` Vegard Nossum
2016-12-16  9:43       ` Vegard Nossum
2016-12-16 10:11       ` crash during oom reaper (was: Re: [PATCH 4/4] [RFC!] mm: 'struct mm_struct' reference counting debugging) Michal Hocko
2016-12-16 10:11         ` Michal Hocko
2016-12-16 10:44         ` Kirill A. Shutemov
2016-12-16 10:44           ` Kirill A. Shutemov
2016-12-16 11:42           ` crash during oom reaper Michal Hocko
2016-12-16 11:42             ` Michal Hocko
2016-12-16 12:12             ` Michal Hocko
2016-12-16 12:12               ` Michal Hocko
2016-12-16 12:35             ` Kirill A. Shutemov
2016-12-16 12:35               ` Kirill A. Shutemov
2016-12-16 12:56               ` Michal Hocko
2016-12-16 12:56                 ` Michal Hocko
2016-12-16 13:07                 ` Kirill A. Shutemov [this message]
2016-12-16 13:07                   ` Kirill A. Shutemov
2016-12-16 13:14                   ` Michal Hocko
2016-12-16 13:14                     ` Michal Hocko
2016-12-18 13:47                     ` Tetsuo Handa
2016-12-18 13:47                       ` Tetsuo Handa
2016-12-18 16:06                       ` Michal Hocko
2016-12-18 16:06                         ` Michal Hocko
2016-12-16 13:14         ` Vegard Nossum
2016-12-16 13:14           ` Vegard Nossum
2016-12-16 14:00           ` Michal Hocko
2016-12-16 14:00             ` Michal Hocko
2016-12-16 14:25             ` Vegard Nossum
2016-12-16 14:25               ` Vegard Nossum
2016-12-16 14:32               ` Michal Hocko
2016-12-16 14:32                 ` Michal Hocko
2016-12-16 14:53                 ` Vegard Nossum
2016-12-16 14:53                   ` Vegard Nossum
2016-12-16 14:04           ` Vegard Nossum
2016-12-16 14:04             ` Vegard Nossum
2016-12-16  9:14 ` [PATCH 1/4] mm: add new mmgrab() helper Michal Hocko
2016-12-16  9:14   ` Michal Hocko
2016-12-16  9:56 ` Peter Zijlstra
2016-12-16  9:56   ` Peter Zijlstra
2016-12-16 10:19   ` Kirill A. Shutemov
2016-12-16 10:19     ` Kirill A. Shutemov
2016-12-16 10:20     ` Vegard Nossum
2016-12-16 10:20       ` Vegard Nossum
2016-12-16 10:36       ` Michal Hocko
2016-12-16 10:36         ` Michal Hocko
2016-12-16 11:14   ` Vegard Nossum
2016-12-16 11:14     ` Vegard Nossum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161216130730.GF27758@node \
    --to=kirill@shutemov.name \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mawilcox@microsoft.com \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vegard.nossum@oracle.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.