From: Minchan Kim <minchan@kernel.org>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>,
Sasha Levin <sasha.levin@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
Michal Hocko <mhocko@suse.cz>,
Johannes Weiner <hannes@cmpxchg.org>,
Vlastimil Babka <vbabka@suse.cz>
Subject: Re: kernel oops on mmotm-2015-10-15-15-20
Date: Fri, 30 Oct 2015 16:03:50 +0900
Message-ID: <20151030070350.GB16099@bbox>
In-Reply-To: <20151029095206.GB29870@node.shutemov.name>
On Thu, Oct 29, 2015 at 11:52:06AM +0200, Kirill A. Shutemov wrote:
> On Thu, Oct 29, 2015 at 04:58:29PM +0900, Minchan Kim wrote:
> > On Thu, Oct 29, 2015 at 02:25:24AM +0200, Kirill A. Shutemov wrote:
> > > On Thu, Oct 22, 2015 at 06:00:51PM +0900, Minchan Kim wrote:
> > > > On Thu, Oct 22, 2015 at 10:21:36AM +0900, Minchan Kim wrote:
> > > > > Hello Hugh,
> > > > >
> > > > > On Wed, Oct 21, 2015 at 05:59:59PM -0700, Hugh Dickins wrote:
> > > > > > On Thu, 22 Oct 2015, Minchan Kim wrote:
> > > > > > >
> > > > > > > > I added the code to check it and queued it again, but this time I hit
> > > > > > > > another oops; the symptom is related to anon_vma, too.
> > > > > > > > (The kernel is based on recent mmotm + unconditional mkdirty for the bug fix.)
> > > > > > > > It seems page_get_anon_vma() returned NULL because the page was not
> > > > > > > > page_mapped() at that time, but the second page_mapped() check right before
> > > > > > > > try_to_unmap() seems to have been true.
> > > > > > >
> > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > > Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
> > > > > > > page:ffffea0001cfbfc0 count:3 mapcount:1 mapping:ffff88007f1b5f51 index:0x600000aff
> > > > > > > flags: 0x4000000000048019(locked|uptodate|dirty|swapcache|swapbacked)
> > > > > > > page dumped because: VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma)
> > > > > >
> > > > > > That's interesting, that's one I added in my page migration series.
> > > > > > Let me think on it, but it could well relate to the one you got before.
> > > > >
> > > > > I will roll back to mm/madv_free-v4.3-rc5-mmotm-2015-10-15-15-20
> > > > > instead of next-20151021 to remove noise from your migration cleanup
> > > > > series and will test it again.
> > > > > If it is fixed, I will test again with your migration patchset, then.
> > > >
> > > > I tested mmotm-2015-10-15-15-20 for a long time with the test program I attach.
> > > > So there is nothing from Hugh's migration patchset in there.
> > > > And I added the debug code below, at Kirill's request, to all test kernels.
> > >
> > > It took too long (and a lot of printk()), but I think I have finally
> > > tracked it down.
> > >
> > > The patch below seems to fix the issue for me. It's not yet properly tested,
> > > but it looks like it works.
> > >
> > > The problem was my wrong assumption about how migration works: I thought that
> > > the kernel would wait for migration to finish before tearing down the mapping.
> > >
> > > But it turns out that's not true.
> > >
> > > As a result, if zap_pte_range() races with split_huge_page(), we can end up
> > > with a page which is not mapped anymore but has _count and _mapcount
> > > elevated. The page is on the LRU too, so it's still reachable by vmscan and by
> > > pfn scanners (Sasha showed a few similar traces from compaction too).
> > > It's likely that page->mapping in this case would point to a freed anon_vma.
> > >
> > > BOOM!
> > >
> > > The patch modifies the freeze/unfreeze_page() code to match the normal
> > > migration entries logic: on setup we remove the page from rmap and drop the
> > > pin; on removal we get the pin back and put the page on rmap. This way, even
> > > if the migration entry is removed under us, we don't corrupt the page's state.
> > >
> > > Please test.
> > >
> >
> > Kernel: mmotm-2015-10-15-15-20 + pte_mkdirty patch + your new patch. I ran
> > the test I sent to you (i.e., oops.c + memcg_test.sh):
> >
> > page:ffffea00016a0000 count:3 mapcount:0 mapping:ffff88007f49d001 index:0x600001800 compound_mapcount: 0
> > flags: 0x4000000000044009(locked|uptodate|head|swapbacked)
> > page dumped because: VM_BUG_ON_PAGE(!page_mapcount(page))
> > page->mem_cgroup:ffff88007f613c00
>
> Ignore my previous answer. Still sleeping.
>
> I think the right way to fix it is something like:
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 35643176bc15..f2d46792a554 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1173,20 +1173,12 @@ void do_page_add_anon_rmap(struct page *page,
>  	bool compound = flags & RMAP_COMPOUND;
>  	bool first;
>  
> -	if (PageTransCompound(page)) {
> +	if (PageTransCompound(page) && compound) {
> +		atomic_t *mapcount;
>  		VM_BUG_ON_PAGE(!PageLocked(page), page);
> -		if (compound) {
> -			atomic_t *mapcount;
> -
> -			VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> -			mapcount = compound_mapcount_ptr(page);
> -			first = atomic_inc_and_test(mapcount);
> -		} else {
> -			/* Anon THP always mapped first with PMD */
> -			first = 0;
> -			VM_BUG_ON_PAGE(!page_mapcount(page), page);
> -			atomic_inc(&page->_mapcount);
> -		}
> +		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> +		mapcount = compound_mapcount_ptr(page);
> +		first = atomic_inc_and_test(mapcount);
>  	} else {
>  		VM_BUG_ON_PAGE(compound, page);
>  		first = atomic_inc_and_test(&page->_mapcount);
> --
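
To spell out what the hunk above changes, here is a minimal userspace sketch
of the branch structure after the patch. It is not kernel code: struct
toy_page, toy_page_init(), toy_add_anon_rmap() and inc_and_test() are
hypothetical stand-ins, and the only kernel facts it relies on are that the
mapcounts start at -1 and that atomic_inc_and_test() returns true when the
incremented value is zero. With the patch, a PTE mapping of a THP subpage
takes the same path as a small page, so "first" is computed from _mapcount
instead of being forced to 0 behind VM_BUG_ON_PAGE(!page_mapcount(page)).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the kernel's atomic_inc_and_test(): true if result is 0. */
static bool inc_and_test(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1 == 0;
}

/* Hypothetical, stripped-down page: only the counters the hunk touches. */
struct toy_page {
	atomic_int mapcount;          /* per-subpage count, starts at -1 */
	atomic_int compound_mapcount; /* PMD-level count, starts at -1 */
	bool trans_compound;          /* models PageTransCompound() */
};

static void toy_page_init(struct toy_page *page, bool trans_compound)
{
	atomic_init(&page->mapcount, -1);
	atomic_init(&page->compound_mapcount, -1);
	page->trans_compound = trans_compound;
}

/* Mirrors the patched if/else; returns the value of "first". */
static bool toy_add_anon_rmap(struct toy_page *page, bool compound)
{
	if (page->trans_compound && compound)
		return inc_and_test(&page->compound_mapcount);

	/* Models VM_BUG_ON_PAGE(compound, page) in the else branch. */
	assert(!compound);
	return inc_and_test(&page->mapcount);
}
```

Calling toy_add_anon_rmap(&p, false) on a freshly initialised THP subpage
returns true the first time and false the second time, which is the case the
old code did not allow for.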
With mmotm-2015-10-15-15-20 + pte_mkdirty patch + freeze/unfreeze patch + the
above patch, I now get:
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
<SNIP>
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5. Priority:-1 extents:1 across:4191228k FS
BUG: Bad rss-counter state mm:ffff880046980700 idx:1 val:511
BUG: Bad rss-counter state mm:ffff880046980700 idx:2 val:1
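
As a reading aid for the numbers above, here is a small sketch of how idx and
val can be interpreted. It rests on assumptions that are not stated in the
log: x86-64 with 4k base pages and 2M huge pages, and the 4.3-era counter
order MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS. The TOY_* macros and
toy_counter_name() are hypothetical names. Under those assumptions val:512 at
idx:1 is exactly one huge page's worth of anon pages, and the second report
(511 at idx:1 plus 1 at idx:2) also adds up to 512.

```c
#include <assert.h>
#include <string.h>

/* Assumed counter order for this kernel generation. */
enum { MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, NR_MM_COUNTERS };

/* Assumed x86-64 geometry: 4k base pages, 2M PMD-sized huge pages. */
#define TOY_PAGE_SHIFT       12
#define TOY_HPAGE_PMD_SHIFT  21
#define TOY_HPAGE_PMD_NR     (1L << (TOY_HPAGE_PMD_SHIFT - TOY_PAGE_SHIFT))

/* Map the idx field of a "Bad rss-counter state" line to a counter name. */
static const char *toy_counter_name(int idx)
{
	switch (idx) {
	case MM_FILEPAGES: return "MM_FILEPAGES";
	case MM_ANONPAGES: return "MM_ANONPAGES";
	case MM_SWAPENTS:  return "MM_SWAPENTS";
	default:           return "unknown";
	}
}

/* Sum the leftover counts reported for one mm. */
static long toy_leaked_total(const long *vals, int n)
{
	long total = 0;

	for (int i = 0; i < n; i++)
		total += vals[i];
	return total;
}
```

If that reading is right, both reports are consistent with the accounting for
one whole huge page being left behind at mm teardown.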