* Re: [Intel-gfx] memcontrol.c BUG
From: Michal Hocko @ 2015-01-28 14:32 UTC (permalink / raw)
To: Chris Wilson, Dave Airlie, Johannes Weiner,
intel-gfx@lists.freedesktop.org, Hugh Dickins, Tejun Heo,
Vladimir Davydov, Jet Chen, Felipe Balbi, Andrew Morton
Cc: linux-mm
On Wed 28-01-15 08:48:52, Chris Wilson wrote:
> On Wed, Jan 28, 2015 at 08:13:06AM +1000, Dave Airlie wrote:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1165369
> >
> > Nov 18 09:23:22 elissa.gathman.org kernel: page:f5e36a40 count:2
> > mapcount:0 mapping: (null) index:0x0
> > Nov 18 09:23:22 elissa.gathman.org kernel: page flags:
> > 0x80090029(locked|uptodate|lru|swapcache|swapbacked)
> > Nov 18 09:23:22 elissa.gathman.org kernel: page dumped because:
> > VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage))
> > Nov 18 09:23:23 elissa.gathman.org kernel: ------------[ cut here ]------------
> > Nov 18 09:23:23 elissa.gathman.org kernel: kernel BUG at mm/memcontrol.c:6733!
I guess this matches the following bugon in your kernel:
VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage), oldpage);
so the oldpage is on the LRU list already. I am completely unfamiliar
with 965GM but is the page perhaps shared with somebody with a different
gfp mask requirement (e.g. userspace accessing the memory via mmap)? So
the other (racing) caller didn't need to move the page and put it on
LRU.
If yes we need to tell shmem_replace_page to do the lrucare handling.
diff --git a/mm/shmem.c b/mm/shmem.c
index 339e06639956..e3cdc1a16c0f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1013,7 +1013,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 		 */
 		oldpage = newpage;
 	} else {
-		mem_cgroup_migrate(oldpage, newpage, false);
+		mem_cgroup_migrate(oldpage, newpage, true);
 		lru_cache_add_anon(newpage);
 		*pagep = newpage;
 	}
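
To make the failing condition concrete, here is a minimal userspace sketch of the precondition that the VM_BUG_ON_PAGE above encodes. It is not kernel code: sketch_migrate_allowed() is a hypothetical helper and the two booleans stand in for PageLRU(oldpage) and the lrucare argument.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the check
 *     VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage), oldpage);
 * Returns true when the call would pass the check, false when the
 * assertion would fire.
 */
static bool sketch_migrate_allowed(bool old_on_lru, bool lrucare)
{
	return !(!lrucare && old_on_lru);
}
```

With an old page that is already on the LRU, the call is only allowed when lrucare is true, which is the one-argument change the diff makes.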
[...]
> 965GM and that it uniquely uses
>
> 	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
> 	if (IS_CRESTLINE(dev) || IS_BROADWATER(dev)) {
> 		/* 965gm cannot relocate objects above 4GiB. */
> 		mask &= ~__GFP_HIGHMEM;
> 		mask |= __GFP_DMA32;
> 	}
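
For readers less used to gfp mask manipulation, the two steps in the quoted snippet (clear the highmem bit, then set the DMA32 bit) can be sketched in userspace as follows. The SKETCH_* bit values and the function name are invented for this illustration and do not match the kernel's real __GFP_* constants.

```c
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define SKETCH_GFP_HIGHMEM 0x02u
#define SKETCH_GFP_DMA32   0x04u

/*
 * Mirrors the shape of the quoted i915 logic: on the affected
 * hardware, drop the "may use highmem" bit and require DMA32 instead.
 * All other bits of the mask are left untouched.
 */
static uint32_t sketch_restrict_to_dma32(uint32_t mask, int is_965)
{
	if (is_965) {
		mask &= ~SKETCH_GFP_HIGHMEM;
		mask |= SKETCH_GFP_DMA32;
	}
	return mask;
}
```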
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [Intel-gfx] memcontrol.c BUG
From: Chris Wilson @ 2015-01-29 8:16 UTC (permalink / raw)
To: Michal Hocko
Cc: Dave Airlie, Johannes Weiner, intel-gfx@lists.freedesktop.org,
Hugh Dickins, Tejun Heo, Vladimir Davydov, Jet Chen, Felipe Balbi,
Andrew Morton, linux-mm
On Wed, Jan 28, 2015 at 03:32:43PM +0100, Michal Hocko wrote:
> On Wed 28-01-15 08:48:52, Chris Wilson wrote:
> > On Wed, Jan 28, 2015 at 08:13:06AM +1000, Dave Airlie wrote:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1165369
> > >
> > > Nov 18 09:23:22 elissa.gathman.org kernel: page:f5e36a40 count:2
> > > mapcount:0 mapping: (null) index:0x0
> > > Nov 18 09:23:22 elissa.gathman.org kernel: page flags:
> > > 0x80090029(locked|uptodate|lru|swapcache|swapbacked)
> > > Nov 18 09:23:22 elissa.gathman.org kernel: page dumped because:
> > > VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage))
> > > Nov 18 09:23:23 elissa.gathman.org kernel: ------------[ cut here ]------------
> > > Nov 18 09:23:23 elissa.gathman.org kernel: kernel BUG at mm/memcontrol.c:6733!
>
> I guess this matches the following bugon in your kernel:
> VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage), oldpage);
>
> so the oldpage is on the LRU list already. I am completely unfamiliar
> with 965GM but is the page perhaps shared with somebody with a different
> gfp mask requirement (e.g. userspace accessing the memory via mmap)? So
> the other (racing) caller didn't need to move the page and put it on
> LRU.
Generally, yes. The shmemfs filp is exported through a vm_mmap() as well
as pinned into the GPU via shmem_read_mapping_page_gfp(). But I would
not expect that to be the case very often, if at all, on 965GM as the
two access paths are incoherent. Still it sounds promising, hopefully
Dave can put it into a Fedora kernel for testing?
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [Intel-gfx] memcontrol.c BUG
From: Dave Airlie @ 2015-01-29 23:26 UTC (permalink / raw)
To: Chris Wilson, Michal Hocko, Dave Airlie, Johannes Weiner,
intel-gfx@lists.freedesktop.org, Hugh Dickins, Tejun Heo,
Vladimir Davydov, Jet Chen, Felipe Balbi, Andrew Morton,
Linux Memory Management List
On 29 January 2015 at 18:16, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Wed, Jan 28, 2015 at 03:32:43PM +0100, Michal Hocko wrote:
>> On Wed 28-01-15 08:48:52, Chris Wilson wrote:
>> > On Wed, Jan 28, 2015 at 08:13:06AM +1000, Dave Airlie wrote:
>> > > https://bugzilla.redhat.com/show_bug.cgi?id=1165369
>> > >
>> > > Nov 18 09:23:22 elissa.gathman.org kernel: page:f5e36a40 count:2
>> > > mapcount:0 mapping: (null) index:0x0
>> > > Nov 18 09:23:22 elissa.gathman.org kernel: page flags:
>> > > 0x80090029(locked|uptodate|lru|swapcache|swapbacked)
>> > > Nov 18 09:23:22 elissa.gathman.org kernel: page dumped because:
>> > > VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage))
>> > > Nov 18 09:23:23 elissa.gathman.org kernel: ------------[ cut here ]------------
>> > > Nov 18 09:23:23 elissa.gathman.org kernel: kernel BUG at mm/memcontrol.c:6733!
>>
>> I guess this matches the following bugon in your kernel:
>> VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage), oldpage);
>>
>> so the oldpage is on the LRU list already. I am completely unfamiliar
>> with 965GM but is the page perhaps shared with somebody with a different
>> gfp mask requirement (e.g. userspace accessing the memory via mmap)? So
>> the other (racing) caller didn't need to move the page and put it on
>> LRU.
>
> Generally, yes. The shmemfs filp is exported through a vm_mmap() as well
> as pinned into the GPU via shmem_read_mapping_page_gfp(). But I would
> not expect that to be the case very often, if at all, on 965GM as the
> two access paths are incoherent. Still it sounds promising, hopefully
> Dave can put it into a Fedora kernel for testing?
http://kojipkgs.fedoraproject.org/scratch/airlied/task_8760024/
done, also asked on the bug for testers.
Dave.
* Re: [Intel-gfx] memcontrol.c BUG
From: Hugh Dickins @ 2015-01-30 2:04 UTC (permalink / raw)
To: Michal Hocko
Cc: Chris Wilson, Dave Airlie, Johannes Weiner,
intel-gfx@lists.freedesktop.org, Hugh Dickins, Tejun Heo,
Vladimir Davydov, Jet Chen, Felipe Balbi, Andrew Morton, linux-mm
On Wed, 28 Jan 2015, Michal Hocko wrote:
> On Wed 28-01-15 08:48:52, Chris Wilson wrote:
> > On Wed, Jan 28, 2015 at 08:13:06AM +1000, Dave Airlie wrote:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1165369
> > >
> > > Nov 18 09:23:22 elissa.gathman.org kernel: page:f5e36a40 count:2
> > > mapcount:0 mapping: (null) index:0x0
> > > Nov 18 09:23:22 elissa.gathman.org kernel: page flags:
> > > 0x80090029(locked|uptodate|lru|swapcache|swapbacked)
> > > Nov 18 09:23:22 elissa.gathman.org kernel: page dumped because:
> > > VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage))
> > > Nov 18 09:23:23 elissa.gathman.org kernel: ------------[ cut here ]------------
> > > Nov 18 09:23:23 elissa.gathman.org kernel: kernel BUG at mm/memcontrol.c:6733!
>
> I guess this matches the following bugon in your kernel:
> VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage), oldpage);
>
> so the oldpage is on the LRU list already. I am completely unfamiliar
> with 965GM but is the page perhaps shared with somebody with a different
> gfp mask requirement (e.g. userspace accessing the memory via mmap)? So
> the other (racing) caller didn't need to move the page and put it on
> LRU.
It would be surprising (but not impossible) for oldpage not to be on
the LRU already: it's a swapin readahead page that has every right to
be on LRU, but turns out to have been allocated from an unsuitable zone,
once we discover that it's needed in one of these odd hardware-limited
mappings. (Whereas newpage is newly allocated and not yet on LRU.)
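
One way to picture the "unsuitable zone" test described above is a comparison of zone ranks. The following is only a sketch, assuming a page needs replacing when its zone ranks above the highest zone the mapping's gfp mask permits; the enum and the function name are hypothetical and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical zone ordering for this sketch, lowest to highest. */
enum sketch_zone {
	SKETCH_ZONE_DMA,
	SKETCH_ZONE_DMA32,
	SKETCH_ZONE_NORMAL,
	SKETCH_ZONE_HIGHMEM
};

/*
 * A swapin readahead page may have been allocated from any zone.
 * It only has to be replaced when it sits above the highest zone
 * the restricted mapping can use.
 */
static bool sketch_should_replace(enum sketch_zone page_zone,
				  enum sketch_zone highest_allowed)
{
	return page_zone > highest_allowed;
}
```

In this model a page read ahead into the highest zone must be replaced for a DMA32-limited mapping, while a page that already sits in a low enough zone can be used as it is.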
>
> If yes we need to tell shmem_replace_page to do the lrucare handling.
Absolutely, thanks Michal. It would also be good to change the comment
on mem_cgroup_migrate() in mm/memcontrol.c, from "@lrucare: both pages..."
to "@lrucare: either or both pages..." - though I certainly won't pretend
that the corrected wording would have prevented this bug creeping in!
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 339e06639956..e3cdc1a16c0f 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1013,7 +1013,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
>  		 */
>  		oldpage = newpage;
>  	} else {
> -		mem_cgroup_migrate(oldpage, newpage, false);
> +		mem_cgroup_migrate(oldpage, newpage, true);
>  		lru_cache_add_anon(newpage);
>  		*pagep = newpage;
>  	}
Acked-by: Hugh Dickins <hughd@google.com>
* [PATCH] memcg, shmem: fix shmem migration to use lrucare. (was: Re: [Intel-gfx] memcontrol.c BUG)
From: Michal Hocko @ 2015-02-02 15:00 UTC (permalink / raw)
To: Hugh Dickins
Cc: Chris Wilson, Dave Airlie, Johannes Weiner,
intel-gfx@lists.freedesktop.org, Tejun Heo, Vladimir Davydov,
Jet Chen, Felipe Balbi, Andrew Morton, linux-mm
On Thu 29-01-15 18:04:15, Hugh Dickins wrote:
> On Wed, 28 Jan 2015, Michal Hocko wrote:
> > On Wed 28-01-15 08:48:52, Chris Wilson wrote:
> > > On Wed, Jan 28, 2015 at 08:13:06AM +1000, Dave Airlie wrote:
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1165369
> > > >
> > > > Nov 18 09:23:22 elissa.gathman.org kernel: page:f5e36a40 count:2
> > > > mapcount:0 mapping: (null) index:0x0
> > > > Nov 18 09:23:22 elissa.gathman.org kernel: page flags:
> > > > 0x80090029(locked|uptodate|lru|swapcache|swapbacked)
> > > > Nov 18 09:23:22 elissa.gathman.org kernel: page dumped because:
> > > > VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage))
> > > > Nov 18 09:23:23 elissa.gathman.org kernel: ------------[ cut here ]------------
> > > > Nov 18 09:23:23 elissa.gathman.org kernel: kernel BUG at mm/memcontrol.c:6733!
> >
> > I guess this matches the following bugon in your kernel:
> > VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage), oldpage);
> >
> > so the oldpage is on the LRU list already. I am completely unfamiliar
> > with 965GM but is the page perhaps shared with somebody with a different
> > gfp mask requirement (e.g. userspace accessing the memory via mmap)? So
> > the other (racing) caller didn't need to move the page and put it on
> > LRU.
>
> It would be surprising (but not impossible) for oldpage not to be on
> the LRU already: it's a swapin readahead page that has every right to
> be on LRU,
True, thanks for pointing this out.
> but turns out to have been allocated from an unsuitable zone,
> once we discover that it's needed in one of these odd hardware-limited
> mappings. (Whereas newpage is newly allocated and not yet on LRU.)
>
> >
> > If yes we need to tell shmem_replace_page to do the lrucare handling.
>
> Absolutely, thanks Michal. It would also be good to change the comment
> on mem_cgroup_migrate() in mm/memcontrol.c, from "@lrucare: both pages..."
> to "@lrucare: either or both pages..." - though I certainly won't pretend
> that the corrected wording would have prevented this bug creeping in!
Yes, I have updated the wording.
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 339e06639956..e3cdc1a16c0f 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1013,7 +1013,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
> >  		 */
> >  		oldpage = newpage;
> >  	} else {
> > -		mem_cgroup_migrate(oldpage, newpage, false);
> > +		mem_cgroup_migrate(oldpage, newpage, true);
> >  		lru_cache_add_anon(newpage);
> >  		*pagep = newpage;
> >  	}
>
> Acked-by: Hugh Dickins <hughd@google.com>
Thanks! The full patch is below. I wasn't sure who was the one to report
the issue so I hope the credits are right. I have marked the patch for
stable because some people are running with VM debugging enabled. AFAICS
the issue is not so harmful without debugging on because the stale
oldpage would be removed from the LRU list eventually.
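
The difference between debug and non-debug kernels described above can be summarised in a small userspace model. This is a sketch only: the enum, the function name, and the debug_vm flag (standing in for CONFIG_DEBUG_VM) are hypothetical, and the "stale" outcome simply encodes the statement that the old page lingers on the LRU until it is eventually removed.

```c
#include <stdbool.h>

/* Hypothetical outcomes of migrating the charge away from the old page. */
enum sketch_outcome {
	SKETCH_OK,        /* LRU state handled correctly */
	SKETCH_BUG,       /* debug assertion fires */
	SKETCH_STALE_LRU  /* old page lingers on the LRU for a while */
};

static enum sketch_outcome sketch_migrate_outcome(bool debug_vm, bool lrucare,
						  bool old_on_lru)
{
	/* The problematic case: old page on the LRU, caller did not ask for lrucare. */
	if (!lrucare && old_on_lru)
		return debug_vm ? SKETCH_BUG : SKETCH_STALE_LRU;
	return SKETCH_OK;
}
```

Passing lrucare as true yields SKETCH_OK in both configurations, which is why the fix is the same regardless of whether VM debugging is enabled.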
---
* Re: [PATCH] memcg, shmem: fix shmem migration to use lrucare. (was: Re: [Intel-gfx] memcontrol.c BUG)
From: Johannes Weiner @ 2015-02-02 16:18 UTC (permalink / raw)
To: Michal Hocko
Cc: Hugh Dickins, Chris Wilson, Dave Airlie,
intel-gfx@lists.freedesktop.org, Tejun Heo, Vladimir Davydov,
Jet Chen, Felipe Balbi, Andrew Morton, linux-mm
On Mon, Feb 02, 2015 at 04:00:51PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.cz>
> Date: Mon, 2 Feb 2015 15:22:19 +0100
> Subject: [PATCH] memcg, shmem: fix shmem migration to use lrucare.
>
> It has been reported that 965GM might trigger
>
> VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage), oldpage)
>
> in mem_cgroup_migrate when shmem wants to replace a swap cache page
> because of shmem_should_replace_page (the page is allocated from an
> inappropriate zone). shmem_replace_page expects that the oldpage is not
> on LRU list and calls mem_cgroup_migrate without lrucare. This is obviously
> incorrect because swapcache pages might be on the LRU list (e.g. swapin
> readahead page).
>
> Fix this by enabling lrucare for the migration in shmem_replace_page.
> Also clarify that lrucare should be used even if one of the pages might
> be on LRU list.
>
> The BUG_ON will trigger only when CONFIG_DEBUG_VM is enabled but even
> without that the migration code might leave the old page on an
> inappropriate memcg's LRU which is not that critical because the page
> would get removed with its last reference but it is still confusing.
>
> Fixes: 0a31bc97c80c (mm: memcontrol: rewrite uncharge API)
> Cc: stable@vger.kernel.org # 3.17+
> Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reported-by: Dave Airlie <airlied@gmail.com>
> Acked-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Thanks, Michal.