* kernel BUG at mm/huge_memory.c:212!
From: Jiri Slaby @ 2012-11-27 21:18 UTC
To: linux-mm, LKML

Hi,

I've hit BUG_ON(atomic_dec_and_test(&huge_zero_refcount)) in put_huge_zero_page right now. There are some "Bad rss-counter state" before that, but those are perhaps unrelated as I saw many of them in the previous -next. But even with yesterday's next I got the BUG.

[ 7395.654928] BUG: Bad rss-counter state mm:ffff8800088289c0 idx:1 val:-1
[ 7417.652911] BUG: Bad rss-counter state mm:ffff880008829a00 idx:1 val:-1
[ 7423.317027] BUG: Bad rss-counter state mm:ffff8800088296c0 idx:1 val:-1
[ 7463.737596] BUG: Bad rss-counter state mm:ffff88000882ad80 idx:1 val:-2
[ 7486.462237] BUG: Bad rss-counter state mm:ffff880008829040 idx:1 val:-2
[ 7499.118560] BUG: Bad rss-counter state mm:ffff880008829040 idx:1 val:-2
[ 7507.000464] BUG: Bad rss-counter state mm:ffff880008828000 idx:1 val:-2
[ 7512.898902] BUG: Bad rss-counter state mm:ffff880008829380 idx:1 val:-2
[ 7522.299066] BUG: Bad rss-counter state mm:ffff8800088296c0 idx:1 val:-2
[ 7530.471048] BUG: Bad rss-counter state mm:ffff8800088296c0 idx:1 val:-2
[ 7597.602661] BUG: 'atomic_dec_and_test(&huge_zero_refcount)' is true!
[ 7597.602683] ------------[ cut here ]------------
[ 7597.602711] kernel BUG at /l/latest/linux/mm/huge_memory.c:212!
[ 7597.602732] invalid opcode: 0000 [#1] SMP
[ 7597.602751] Modules linked in: vfat fat dvb_usb_dib0700 dib0090 dib7000p dib7000m dib0070 dib8000 dib3000mc dibx000_common microcode
[ 7597.602811] CPU 1
[ 7597.602823] Pid: 1221, comm: java Not tainted 3.7.0-rc6-next-20121126_64+ #1698 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
[ 7597.602867] RIP: 0010:[<ffffffff8116839e>] [<ffffffff8116839e>] put_huge_zero_page+0x2e/0x30
[ 7597.602902] RSP: 0000:ffff8801a58cdd48 EFLAGS: 00010292
[ 7597.602921] RAX: 0000000000000038 RBX: ffff880183cc0d00 RCX: 0000000000000007
[ 7597.602944] RDX: 00000000000000b5 RSI: 0000000000000046 RDI: ffffffff81dc605c
[ 7597.602967] RBP: ffff8801a58cdd48 R08: 746127203a475542 R09: 000000000000047b
[ 7597.602990] R10: 6365645f63696d6f R11: 7365745f646e615f R12: 00007fd4b3e00000
[ 7597.603014] R13: 00007fd4b3dcc000 R14: ffff8801bdebab00 R15: 8000000001d94225
[ 7597.603037] FS: 00007fd4c7ebe700(0000) GS:ffff8801cbc80000(0000) knlGS:0000000000000000
[ 7597.603064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7597.603083] CR2: 00007fd4b3dcc498 CR3: 000000017d6bc000 CR4: 00000000000007e0
[ 7597.603106] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7597.603129] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7597.603152] Process java (pid: 1221, threadinfo ffff8801a58cc000, task ffff8801a4655be0)
[ 7597.603178] Stack:
[ 7597.603187] ffff8801a58cddc8 ffffffff8116b8d4 ffff8801a38cb000 ffff8801bdebab00
[ 7597.603219] ffff880183cc0d00 00000001a38cb067 ffffea0006cccb40 ffff8801a3911cf0
[ 7597.603250] 00000001b332d000 00007fd4b3c00000 ffff880183cc0d00 00007fd4b3dcc498
[ 7597.603282] Call Trace:
[ 7597.603293] [<ffffffff8116b8d4>] do_huge_pmd_wp_page+0x7e4/0x900
[ 7597.603316] [<ffffffff81148755>] handle_mm_fault+0x145/0x330
[ 7597.603337] [<ffffffff81071e45>] __do_page_fault+0x145/0x480
[ 7597.603358] [<ffffffff810b42c5>] ? sched_clock_local+0x25/0xa0
[ 7597.603378] [<ffffffff810b4ec8>] ? __enqueue_entity+0x78/0x80
[ 7597.603400] [<ffffffff810d0efd>] ? sys_futex+0x8d/0x190
[ 7597.603420] [<ffffffff810721be>] do_page_fault+0xe/0x10
[ 7597.603440] [<ffffffff816b7c72>] page_fault+0x22/0x30
[ 7597.603458] Code: 66 90 f0 ff 0d c0 05 cf 00 0f 94 c0 84 c0 75 02 f3 c3 55 48 c7 c6 60 51 97 81 48 c7 c7 1a 82 94 81 48 89 e5 31 c0 e8 25 60 54 00 <0f> 0b 66 66 66 66 90 55 48 89 e5 53 48 83 ec 08 48 83 7e 08 00
[ 7597.603640] RIP [<ffffffff8116839e>] put_huge_zero_page+0x2e/0x30
[ 7597.603664] RSP <ffff8801a58cdd48>
[ 7597.636299] ---[ end trace 241e96a56fc0cf87 ]---
[ 7612.907136] SysRq : Keyboard mode set to system default

thanks,
-- 
js
suse labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: kernel BUG at mm/huge_memory.c:212!
From: David Rientjes @ 2012-11-27 23:47 UTC
To: Jiri Slaby, Kirill A. Shutemov; +Cc: linux-mm, LKML

On Tue, 27 Nov 2012, Jiri Slaby wrote:

> Hi,
>
> I've hit BUG_ON(atomic_dec_and_test(&huge_zero_refcount)) in
> put_huge_zero_page right now. There are some "Bad rss-counter state"
> before that, but those are perhaps unrelated as I saw many of them in
> the previous -next. But even with yesterday's next I got the BUG.
>
> [oops trace snipped]

Thanks for the report. Adding Kirill to the cc since this is from the huge zero page patchset sitting in next and is due to the refcounting on lazy allocation.
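David's diagnosis points at the refcounting of the lazily allocated huge zero page. As we understand the -next code, the counter is set to 2 when the page is first allocated (one reference for the caller, one base reference that only the shrinker drops), which is why put_huge_zero_page() treats reaching zero as a bug. The sketch below is a simplified, single-threaded userspace model of that scheme; hzp_get(), hzp_put() and hzp_shrink() are hypothetical names, not the kernel implementation. It shows how one unbalanced put, for example two racing faults that each drop the reference held by the same huge zero pmd, trips the check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Single-threaded model of a lazily allocated, refcounted singleton
 * such as the huge zero page. Names are hypothetical.
 * A counter value of 0 means "not allocated".
 */
static atomic_int hzp_refcount;
static bool hzp_allocated;

/* Take a reference, allocating on first use. */
static void hzp_get(void)
{
	int old = atomic_load(&hzp_refcount);

	/* Fast path, in the spirit of atomic_inc_not_zero(). */
	while (old != 0) {
		if (atomic_compare_exchange_weak(&hzp_refcount, &old, old + 1))
			return;
	}
	/*
	 * Slow path: "allocate" and start at 2, one reference for the
	 * caller plus a base reference that only the shrinker may drop.
	 * Racing allocators are not modeled here.
	 */
	hzp_allocated = true;
	atomic_store(&hzp_refcount, 2);
}

/*
 * Drop a reference. Returns false where put_huge_zero_page() would hit
 * its BUG_ON(): a put must never take the counter to zero.
 */
static bool hzp_put(void)
{
	return atomic_fetch_sub(&hzp_refcount, 1) - 1 != 0;
}

/* Shrinker: free only when the base reference is the last one left. */
static bool hzp_shrink(void)
{
	int expected = 1;

	if (!atomic_compare_exchange_strong(&hzp_refcount, &expected, 0))
		return false;
	hzp_allocated = false;
	return true;
}
```

With balanced get/put pairs the counter never drops below the base reference, and hzp_shrink() succeeds only once every mapping is gone. A second put for a single get is what drives the count to zero.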
* Re: kernel BUG at mm/huge_memory.c:212!
From: Bob Liu @ 2012-11-29 7:38 UTC
To: Jiri Slaby; +Cc: linux-mm, LKML, kirill.shutemov

Hi Jiri,

On Wed, Nov 28, 2012 at 5:18 AM, Jiri Slaby <jslaby@suse.cz> wrote:
> Hi,
>
> I've hit BUG_ON(atomic_dec_and_test(&huge_zero_refcount)) in
> put_huge_zero_page right now. There are some "Bad rss-counter state"
> before that, but those are perhaps unrelated as I saw many of them in
> the previous -next. But even with yesterday's next I got the BUG.
>

Could you please give more details about your test, or about how to trigger this BUG? I'm using a kernel with the huge zero page feature but haven't seen it yet.

> [oops trace snipped]

Btw: could you try the patch below? I think it might be related, but I'm not sure. Thank you!
(Sorry, I can only use web email currently, so the patch format may be incorrect.)

------------
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4489e16..d282d80 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1096,7 +1096,7 @@ pgtable_t get_pmd_huge_pte(struct mm_struct *mm)
 
 static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long address,
-		pmd_t *pmd, unsigned long haddr)
+		pmd_t *pmd, pmd_t orig_pmd, unsigned long haddr)
 {
 	pgtable_t pgtable;
 	pmd_t _pmd;
@@ -1125,6 +1125,10 @@ static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 
 	spin_lock(&mm->page_table_lock);
+	if (unlikely(!pmd_same(*pmd, orig_pmd))) {
+		WARN_ON(1);
+		goto out_free_page;
+	}
 	pmdp_clear_flush(vma, haddr, pmd);
 	/* leave pmd empty until pte is filled */
 
@@ -1156,6 +1160,14 @@ static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
 	ret |= VM_FAULT_WRITE;
 out:
 	return ret;
+out_free_page:
+	spin_unlock(&mm->page_table_lock);
+	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mem_cgroup_uncharge_start();
+	mem_cgroup_uncharge_page(page);
+	put_page(page);
+	mem_cgroup_uncharge_end();
+	goto out;
 }
 
 static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
@@ -1302,7 +1314,7 @@ alloc:
 		count_vm_event(THP_FAULT_FALLBACK);
 		if (is_huge_zero_pmd(orig_pmd)) {
 			ret = do_huge_pmd_wp_zero_page_fallback(mm, vma,
-					address, pmd, haddr);
+					address, pmd, orig_pmd, haddr);
 		} else {
 			ret = do_huge_pmd_wp_page_fallback(mm, vma, address,
 					pmd, orig_pmd, page, haddr);

-- 
Regards,
--Bob
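Bob's patch relies on a common page-table idiom: the fault path remembers the pmd it saw (orig_pmd), does work that may sleep (allocating the replacement page), then retakes mm->page_table_lock and backs out if pmd_same() reports that the entry changed in the meantime. A minimal userspace sketch of that snapshot-and-revalidate pattern follows. struct fake_mm, the enum states, cow_prepare() and cow_commit() are hypothetical stand-ins, a pthread mutex plays the role of the spinlock, and the two faults are interleaved by hand so the outcome is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a pmd entry and mm->page_table_lock. */
enum pmd_state { PMD_NONE, PMD_HUGE_ZERO, PMD_PTE_TABLE };

struct fake_mm {
	pthread_mutex_t page_table_lock;
	enum pmd_state pmd;
	int zero_refs;     /* zero page references held for the mapping */
	void *new_page;    /* page installed by the winning COW fault */
};

/* Phase 1: remember what we saw, then allocate without the lock held. */
static enum pmd_state cow_prepare(struct fake_mm *mm, void **page)
{
	enum pmd_state orig;

	pthread_mutex_lock(&mm->page_table_lock);
	orig = mm->pmd;                 /* like orig_pmd */
	pthread_mutex_unlock(&mm->page_table_lock);
	*page = malloc(64);             /* like allocating the new page */
	return orig;
}

/* Phase 2: retake the lock and commit only if nothing changed. */
static bool cow_commit(struct fake_mm *mm, enum pmd_state orig, void *page)
{
	bool committed = false;

	pthread_mutex_lock(&mm->page_table_lock);
	if (mm->pmd == orig) {          /* like pmd_same(*pmd, orig_pmd) */
		mm->pmd = PMD_PTE_TABLE;
		mm->new_page = page;
		mm->zero_refs--;        /* like put_huge_zero_page() */
		committed = true;
	}
	pthread_mutex_unlock(&mm->page_table_lock);
	if (!committed)
		free(page);             /* like the out_free_page path */
	return committed;
}
```

Without the mm->pmd == orig check, the losing fault would also decrement zero_refs, taking it to -1 in this model: the same kind of unbalanced put that the BUG_ON in put_huge_zero_page() caught.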
* [PATCH 0/2] kernel BUG at mm/huge_memory.c:212!
From: Kirill A. Shutemov @ 2012-11-30 15:03 UTC
To: Jiri Slaby
Cc: linux-mm, LKML, David Rientjes, Bob Liu, Andrew Morton, Andrea Arcangeli, Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Hi Jiri,

Sorry for the late answer. It took time to reproduce and debug the issue.

Could you test the two patches below in this thread? I expect them to fix both issues: put_huge_zero_page() and Bad rss-counter state.

Kirill A. Shutemov (2):
  thp: fix anononymous page accounting in fallback path for COW of HZP
  thp: avoid race on multiple parallel page faults to the same page

 mm/huge_memory.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

-- 
1.7.11.7
* [PATCH 1/2] thp: fix anononymous page accounting in fallback path for COW of HZP
From: Kirill A. Shutemov @ 2012-11-30 15:03 UTC
To: Jiri Slaby
Cc: linux-mm, LKML, David Rientjes, Bob Liu, Andrew Morton, Andrea Arcangeli, Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Don't forget to account the newly allocated page in the fallback path for copy-on-write of the huge zero page.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 57f0024..9d6f521 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1164,6 +1164,7 @@ static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
 	pmd_populate(mm, pmd, pgtable);
 	spin_unlock(&mm->page_table_lock);
 	put_huge_zero_page();
+	inc_mm_counter(mm, MM_ANONPAGES);
 
 	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 
-- 
1.7.11.7
* Re: [PATCH 1/2] thp: fix anononymous page accounting in fallback path for COW of HZP
From: Bob Liu @ 2012-12-03 3:14 UTC
To: Kirill A. Shutemov
Cc: Jiri Slaby, linux-mm, LKML, David Rientjes, Andrew Morton, Andrea Arcangeli

On Fri, Nov 30, 2012 at 11:03 PM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> Don't forget to account newly allocated page in fallback path for
> copy-on-write of huge zero page.
>

What about the fallback path in do_huge_pmd_wp_page_fallback()?
I think we should also account the newly allocated page there.

-- 
Regards,
--Bob
* Re: [PATCH 1/2] thp: fix anononymous page accounting in fallback path for COW of HZP
From: Kirill A. Shutemov @ 2012-12-03 8:15 UTC
To: Bob Liu
Cc: Jiri Slaby, linux-mm, LKML, David Rientjes, Andrew Morton, Andrea Arcangeli

On Mon, Dec 03, 2012 at 11:14:38AM +0800, Bob Liu wrote:
> On Fri, Nov 30, 2012 at 11:03 PM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> >
> > Don't forget to account newly allocated page in fallback path for
> > copy-on-write of huge zero page.
> >
>
> What about fallback path in do_huge_pmd_wp_page_fallback()?
> I think we should also account newly allocated page in it.

No. Normal huge pages have already been accounted on fork(); see copy_huge_pmd().

The huge zero page (like the 4k zero page) doesn't contribute to RSS, so we need to account the page that replaces the huge zero page on COW.

-- 
 Kirill A. Shutemov
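Kirill's explanation can be checked with simple bookkeeping. In kernels of this era the per-mm counters are indexed MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, so idx:1 in Jiri's "Bad rss-counter state" lines is MM_ANONPAGES. The toy model below (struct rss_model and the helper names are hypothetical) follows the fallback as we understand it: one new anonymous page replaces the huge zero page at the faulting address, the remaining PTEs map the 4k zero page (not counted in RSS), and unmapping the new page later decrements MM_ANONPAGES. Without the increment from patch 1/2, each such COW leaves the counter at -1 at exit, consistent with the val:-1 and val:-2 lines in the report.

```c
#include <assert.h>
#include <stdbool.h>

/* Counter indices in the order used by 3.7-era kernels. */
enum { MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, NR_MM_COUNTERS };

/* Hypothetical stand-in for the per-mm RSS counters. */
struct rss_model {
	long rss[NR_MM_COUNTERS];
};

/*
 * COW fallback for the huge zero page: one new anonymous page is mapped;
 * the other PTEs point at the 4k zero page, which is not counted.
 */
static void zero_page_cow_fallback(struct rss_model *mm, bool with_fix)
{
	if (with_fix)
		mm->rss[MM_ANONPAGES]++;   /* the added inc_mm_counter() */
}

/* Unmapping the new anonymous page always decrements the counter. */
static void unmap_anon_page(struct rss_model *mm)
{
	mm->rss[MM_ANONPAGES]--;
}

/* Like the exit-time check: index of the first nonzero counter, or -1. */
static int first_bad_counter(const struct rss_model *mm)
{
	for (int i = 0; i < NR_MM_COUNTERS; i++)
		if (mm->rss[i] != 0)
			return i;
	return -1;
}
```

This also shows why the normal huge page fallback needs no extra increment: its pages were already counted when the huge page was mapped (or copied on fork), whereas the huge zero page never contributed anything to subtract from.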
* [PATCH 2/2] thp: avoid race on multiple parallel page faults to the same page
From: Kirill A. Shutemov @ 2012-11-30 15:03 UTC
To: Jiri Slaby
Cc: linux-mm, LKML, David Rientjes, Bob Liu, Andrew Morton, Andrea Arcangeli, Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

The pmd value is stable only with mm->page_table_lock taken. After taking the lock, we need to check that nobody modified the pmd before changing it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9d6f521..51cb8fe 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -770,17 +770,20 @@ static inline struct page *alloc_hugepage(int defrag)
 }
 #endif
 
-static void set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
+static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd,
 		unsigned long zero_pfn)
 {
 	pmd_t entry;
+	if (!pmd_none(*pmd))
+		return false;
 	entry = pfn_pmd(zero_pfn, vma->vm_page_prot);
 	entry = pmd_wrprotect(entry);
 	entry = pmd_mkhuge(entry);
 	set_pmd_at(mm, haddr, pmd, entry);
 	pgtable_trans_huge_deposit(mm, pgtable);
 	mm->nr_ptes++;
+	return true;
 }
 
 int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
@@ -800,6 +803,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			transparent_hugepage_use_zero_page()) {
 		pgtable_t pgtable;
 		unsigned long zero_pfn;
+		bool set;
 		pgtable = pte_alloc_one(mm, haddr);
 		if (unlikely(!pgtable))
 			return VM_FAULT_OOM;
@@ -810,9 +814,13 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			goto out;
 		}
 		spin_lock(&mm->page_table_lock);
-		set_huge_zero_page(pgtable, mm, vma, haddr, pmd,
+		set = set_huge_zero_page(pgtable, mm, vma, haddr, pmd,
 				zero_pfn);
 		spin_unlock(&mm->page_table_lock);
+		if (!set) {
+			pte_free(mm, pgtable);
+			put_huge_zero_page();
+		}
 		return 0;
 	}
 	page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
@@ -1046,14 +1054,16 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 */
 	if (is_huge_zero_pmd(pmd)) {
 		unsigned long zero_pfn;
+		bool set;
 		/*
 		 * get_huge_zero_page() will never allocate a new page here,
 		 * since we already have a zero page to copy. It just takes a
 		 * reference.
 		 */
 		zero_pfn = get_huge_zero_page();
-		set_huge_zero_page(pgtable, dst_mm, vma, addr, dst_pmd,
+		set = set_huge_zero_page(pgtable, dst_mm, vma, addr, dst_pmd,
 				zero_pfn);
+		BUG_ON(!set); /* unexpected !pmd_none(dst_pmd) */
 		ret = 0;
 		goto out_unlock;
 	}
@@ -1110,7 +1120,7 @@ unlock:
 
 static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long address,
-		pmd_t *pmd, unsigned long haddr)
+		pmd_t *pmd, pmd_t orig_pmd, unsigned long haddr)
 {
 	pgtable_t pgtable;
 	pmd_t _pmd;
@@ -1139,6 +1149,9 @@ static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 
 	spin_lock(&mm->page_table_lock);
+	if (unlikely(!pmd_same(*pmd, orig_pmd)))
+		goto out_free_page;
+
 	pmdp_clear_flush(vma, haddr, pmd);
 	/* leave pmd empty until pte is filled */
 
@@ -1171,6 +1184,12 @@ static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
 	ret |= VM_FAULT_WRITE;
 out:
 	return ret;
+out_free_page:
+	spin_unlock(&mm->page_table_lock);
+	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mem_cgroup_uncharge_page(page);
+	put_page(page);
+	goto out;
 }
 
 static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
@@ -1317,7 +1336,7 @@ alloc:
 		count_vm_event(THP_FAULT_FALLBACK);
 		if (is_huge_zero_pmd(orig_pmd)) {
 			ret = do_huge_pmd_wp_zero_page_fallback(mm, vma,
-					address, pmd, haddr);
+					address, pmd, orig_pmd, haddr);
 		} else {
 			ret = do_huge_pmd_wp_page_fallback(mm, vma, address,
 					pmd, orig_pmd, page, haddr);
-- 
1.7.11.7
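The first hunks of this patch make set_huge_zero_page() refuse to overwrite a populated pmd, and make the read-fault caller undo its own preparation (free the preallocated page table, drop its huge zero page reference) when it loses the race. A minimal userspace model of that "install only if still empty" step follows. struct slot_model, struct fault_prep and the helper functions are hypothetical; malloc()/free() stand in for pte_alloc_one()/pte_free(), and page_table_lock is elided because the two faults are interleaved by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of one pmd slot and its side state. */
struct slot_model {
	const void *pmd;     /* NULL plays the role of pmd_none() */
	void *deposited;     /* page table kept for a later split */
	int nr_ptes;         /* like mm->nr_ptes */
	int zero_refs;       /* huge zero page references */
};

struct fault_prep {
	void *pgtable;
};

/* Work done before taking page_table_lock: allocate and take a ref. */
static struct fault_prep fault_prepare(struct slot_model *s)
{
	struct fault_prep p = { malloc(64) };   /* like pte_alloc_one() */

	s->zero_refs++;                         /* like get_huge_zero_page() */
	return p;
}

/* Like the patched set_huge_zero_page(): install only into an empty pmd. */
static bool install_if_none(struct slot_model *s, const void *entry,
			    void *pgtable)
{
	if (s->pmd != NULL)
		return false;
	s->pmd = entry;
	s->deposited = pgtable;   /* like pgtable_trans_huge_deposit() */
	s->nr_ptes++;
	return true;
}

/* Done under page_table_lock in the kernel; returns whether we won. */
static bool fault_commit(struct slot_model *s, struct fault_prep *p,
			 const void *entry)
{
	bool set = install_if_none(s, entry, p->pgtable);

	if (!set) {
		free(p->pgtable);         /* like pte_free() */
		s->zero_refs--;           /* like put_huge_zero_page() */
	}
	return set;
}
```

Without the emptiness check, the second fault in this model would overwrite the pmd, deposit a second page table, bump nr_ptes again and keep a zero page reference that no unmap would ever drop.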
* Re: [PATCH 2/2] thp: avoid race on multiple parallel page faults to the same page
From: Bob Liu @ 2012-12-03 2:29 UTC
To: Kirill A. Shutemov
Cc: Jiri Slaby, linux-mm, LKML, David Rientjes, Andrew Morton, Andrea Arcangeli

On Fri, Nov 30, 2012 at 11:03 PM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> pmd value is stable only with mm->page_table_lock taken. After taking
> the lock we need to check that nobody modified the pmd before change it.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Reviewed-by: Bob Liu <lliubbo@gmail.com>

> [patch snipped]

-- 
Regards,
--Bob
* Re: [PATCH 0/2] kernel BUG at mm/huge_memory.c:212!
From: Jiri Slaby @ 2012-12-03 13:02 UTC
To: Kirill A. Shutemov
Cc: linux-mm, LKML, David Rientjes, Bob Liu, Andrew Morton, Andrea Arcangeli

On 11/30/2012 04:03 PM, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> Hi Jiri,
>
> Sorry for late answer. It took time to reproduce and debug the issue.
>
> Could you test two patches below by thread. I expect it to fix both
> issues: put_huge_zero_page() and Bad rss-counter state.

Hi, yes: since I applied the patches last Thursday, it hasn't recurred.

> Kirill A. Shutemov (2):
>   thp: fix anononymous page accounting in fallback path for COW of HZP
>   thp: avoid race on multiple parallel page faults to the same page
>
>  mm/huge_memory.c | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)

thanks,
-- 
js
suse labs
* Re: [PATCH 0/2] kernel BUG at mm/huge_memory.c:212!
From: Bob Liu @ 2012-12-12 5:36 UTC
To: Jiri Slaby
Cc: Kirill A. Shutemov, linux-mm, LKML, David Rientjes, Andrew Morton, Andrea Arcangeli

On Mon, Dec 3, 2012 at 9:02 PM, Jiri Slaby <jslaby@suse.cz> wrote:
> On 11/30/2012 04:03 PM, Kirill A. Shutemov wrote:
>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>
>> Hi Jiri,
>>
>> Sorry for late answer. It took time to reproduce and debug the issue.
>>
>> Could you test two patches below by thread. I expect it to fix both
>> issues: put_huge_zero_page() and Bad rss-counter state.
>
> Hi, yes, since applying the patches on the last Thu, it didn't recur.
>
>> Kirill A. Shutemov (2):
>>   thp: fix anononymous page accounting in fallback path for COW of HZP
>>   thp: avoid race on multiple parallel page faults to the same page
>>
>>  mm/huge_memory.c | 30 +++++++++++++++++++++++++-----
>>  1 file changed, 25 insertions(+), 5 deletions(-)
>

I still saw this bug on 3.7.0-rc8, but it's hard to reproduce it.
It appears only once.

-- 
Regards,
--Bob
* Re: [PATCH 0/2] kernel BUG at mm/huge_memory.c:212!
From: Kirill A. Shutemov @ 2012-12-12 10:59 UTC
To: Bob Liu
Cc: Jiri Slaby, linux-mm, LKML, David Rientjes, Andrew Morton, Andrea Arcangeli

On Wed, Dec 12, 2012 at 01:36:36PM +0800, Bob Liu wrote:
> On Mon, Dec 3, 2012 at 9:02 PM, Jiri Slaby <jslaby@suse.cz> wrote:
> > On 11/30/2012 04:03 PM, Kirill A. Shutemov wrote:
> >> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> >>
> >> Hi Jiri,
> >>
> >> Sorry for late answer. It took time to reproduce and debug the issue.
> >>
> >> Could you test two patches below by thread. I expect it to fix both
> >> issues: put_huge_zero_page() and Bad rss-counter state.
> >
> > Hi, yes, since applying the patches on the last Thu, it didn't recur.
> >
> >> Kirill A. Shutemov (2):
> >>   thp: fix anononymous page accounting in fallback path for COW of HZP
> >>   thp: avoid race on multiple parallel page faults to the same page
> >>
> >>  mm/huge_memory.c | 30 +++++++++++++++++++++++++-----
> >>  1 file changed, 25 insertions(+), 5 deletions(-)
> >
>
> I still saw this bug on 3.7.0-rc8, but it's hard to reproduce it.
> It appears only once.

I guess the patch you've posted fixes the issue, right?

It's useful to enable debug_cow to test fallback path:

echo 1 > /sys/kernel/mm/transparent_hugepage/debug_cow

-- 
 Kirill A. Shutemov
Thread overview: 12+ messages
2012-11-27 21:18 kernel BUG at mm/huge_memory.c:212! Jiri Slaby
2012-11-27 23:47 ` David Rientjes
2012-11-29 7:38 ` Bob Liu
2012-11-30 15:03 ` [PATCH 0/2] " Kirill A. Shutemov
2012-11-30 15:03 ` [PATCH 1/2] thp: fix anononymous page accounting in fallback path for COW of HZP Kirill A. Shutemov
2012-12-03 3:14 ` Bob Liu
2012-12-03 8:15 ` Kirill A. Shutemov
2012-11-30 15:03 ` [PATCH 2/2] thp: avoid race on multiple parallel page faults to the same page Kirill A. Shutemov
2012-12-03 2:29 ` Bob Liu
2012-12-03 13:02 ` [PATCH 0/2] kernel BUG at mm/huge_memory.c:212! Jiri Slaby
2012-12-12 5:36 ` Bob Liu
2012-12-12 10:59 ` Kirill A. Shutemov