From: Zhouping Liu <zliu@redhat.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Rik van Riel <riel@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Mel Gorman <mgorman@suse.de>,
Thomas Gleixner <tglx@linutronix.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Ingo Molnar <mingo@kernel.org>, CAI Qian <caiqian@redhat.com>
Subject: Re: [PATCH 00/31] numa/core patches
Date: Tue, 30 Oct 2012 14:29:25 +0800
Message-ID: <508F73C5.7050409@redhat.com>
In-Reply-To: <20121028175615.GC29827@cmpxchg.org>
On 10/29/2012 01:56 AM, Johannes Weiner wrote:
> On Fri, Oct 26, 2012 at 11:08:00AM +0200, Peter Zijlstra wrote:
>> On Fri, 2012-10-26 at 17:07 +0800, Zhouping Liu wrote:
>>> [ 180.918591] RIP: 0010:[<ffffffff8118c39a>] [<ffffffff8118c39a>] mem_cgroup_prepare_migration+0xba/0xd0
>>> [ 182.681450] [<ffffffff81183b60>] do_huge_pmd_numa_page+0x180/0x500
>>> [ 182.775090] [<ffffffff811585c9>] handle_mm_fault+0x1e9/0x360
>>> [ 182.863038] [<ffffffff81632b62>] __do_page_fault+0x172/0x4e0
>>> [ 182.950574] [<ffffffff8101c283>] ? __switch_to_xtra+0x163/0x1a0
>>> [ 183.041512] [<ffffffff8101281e>] ? __switch_to+0x3ce/0x4a0
>>> [ 183.126832] [<ffffffff8162d686>] ? __schedule+0x3c6/0x7a0
>>> [ 183.211216] [<ffffffff81632ede>] do_page_fault+0xe/0x10
>>> [ 183.293705] [<ffffffff8162f518>] page_fault+0x28/0x30
>> Johannes, this looks like the thp migration memcg hookery gone bad,
>> could you have a look at this?
> Oops. Here is an incremental fix, feel free to fold it into #31.
Hello Johannes,
I don't think the patch below completely fixes this issue, as I hit a
new error (possibly similar to the original one):
[88099.923724] ------------[ cut here ]------------
[88099.924036] kernel BUG at mm/memcontrol.c:1134!
[88099.924036] invalid opcode: 0000 [#1] SMP
[88099.924036] Modules linked in: lockd sunrpc kvm_amd kvm amd64_edac_mod edac_core ses enclosure serio_raw bnx2 pcspkr shpchp joydev i2c_piix4 edac_mce_amd k8temp dcdbas ata_generic pata_acpi megaraid_sas pata_serverworks usb_storage radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core
[88099.924036] CPU 7
[88099.924036] Pid: 3441, comm: stress Not tainted 3.7.0-rc2Jons+ #3 Dell Inc. PowerEdge 6950/0WN213
[88099.924036] RIP: 0010:[<ffffffff81188e97>] [<ffffffff81188e97>] mem_cgroup_update_lru_size+0x27/0x30
[88099.924036] RSP: 0000:ffff88021b247ca8 EFLAGS: 00010082
[88099.924036] RAX: ffff88011d310138 RBX: ffffea0002f18000 RCX: 0000000000000001
[88099.924036] RDX: fffffffffffffe00 RSI: 000000000000000e RDI: ffff88011d310138
[88099.924036] RBP: ffff88021b247ca8 R08: 0000000000000000 R09: a8000bc600000000
[88099.924036] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffe00
[88099.924036] R13: ffff88011ffecb40 R14: 0000000000000286 R15: 0000000000000000
[88099.924036] FS: 00007f787d0bf740(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
[88099.924036] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[88099.924036] CR2: 00007f7873a00010 CR3: 000000021bda0000 CR4: 00000000000007e0
[88099.924036] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[88099.924036] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[88099.924036] Process stress (pid: 3441, threadinfo ffff88021b246000, task ffff88021b399760)
[88099.924036] Stack:
[88099.924036] ffff88021b247cf8 ffffffff8113a9cd ffffea0002f18000 ffff88011d310138
[88099.924036] 0000000000000200 ffffea0002f18000 ffff88019bace580 00007f7873c00000
[88099.924036] ffff88021aca0cf0 ffffea00081e0000 ffff88021b247d18 ffffffff8113aa7d
[88099.924036] Call Trace:
[88099.924036] [<ffffffff8113a9cd>] __page_cache_release.part.11+0xdd/0x140
[88099.924036] [<ffffffff8113aa7d>] __put_compound_page+0x1d/0x30
[88099.924036] [<ffffffff8113ac4d>] put_compound_page+0x5d/0x1e0
[88099.924036] [<ffffffff8113b1a5>] put_page+0x45/0x50
[88099.924036] [<ffffffff8118378c>] do_huge_pmd_numa_page+0x2ec/0x4e0
[88099.924036] [<ffffffff81158089>] handle_mm_fault+0x1e9/0x360
[88099.924036] [<ffffffff8162cd22>] __do_page_fault+0x172/0x4e0
[88099.924036] [<ffffffff810958b9>] ? task_numa_work+0x1c9/0x220
[88099.924036] [<ffffffff8107c56c>] ? task_work_run+0xac/0xe0
[88099.924036] [<ffffffff8162d09e>] do_page_fault+0xe/0x10
[88099.924036] [<ffffffff816296d8>] page_fault+0x28/0x30
[88099.924036] Code: 00 00 00 00 66 66 66 66 90 44 8b 1d 1c 90 b5 00 55 48 89 e5 45 85 db 75 10 89 f6 48 63 d2 48 83 c6 0e 48 01 54 f7 08 78 02 5d c3 <0f> 0b 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 48 83 ec
[88099.924036] RIP [<ffffffff81188e97>] mem_cgroup_update_lru_size+0x27/0x30
[88099.924036] RSP <ffff88021b247ca8>
[88099.924036] ---[ end trace c8d6b169e0c3f25a ]---
[88108.054610] ------------[ cut here ]------------
[88108.054610] WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xd0()
[88108.054610] Hardware name: PowerEdge 6950
[88108.054610] Watchdog detected hard LOCKUP on cpu 3
[88108.054610] Modules linked in: lockd sunrpc kvm_amd kvm amd64_edac_mod edac_core ses enclosure serio_raw bnx2 pcspkr shpchp joydev i2c_piix4 edac_mce_amd k8temp dcdbas ata_generic pata_acpi megaraid_sas pata_serverworks usb_storage radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core
[88108.054610] Pid: 3429, comm: stress Tainted: G D 3.7.0-rc2Jons+ #3
[88108.054610] Call Trace:
[88108.054610] <NMI> [<ffffffff8105c29f>] warn_slowpath_common+0x7f/0xc0
[88108.054610] [<ffffffff8105c396>] warn_slowpath_fmt+0x46/0x50
[88108.054610] [<ffffffff81093fa8>] ? sched_clock_cpu+0xa8/0x120
[88108.054610] [<ffffffff810e95c0>] ? touch_nmi_watchdog+0x80/0x80
[88108.054610] [<ffffffff810e965c>] watchdog_overflow_callback+0x9c/0xd0
[88108.054610] [<ffffffff81124e6d>] __perf_event_overflow+0x9d/0x230
[88108.054610] [<ffffffff81121f44>] ? perf_event_update_userpage+0x24/0x110
[88108.054610] [<ffffffff81125a74>] perf_event_overflow+0x14/0x20
[88108.054610] [<ffffffff8102440a>] x86_pmu_handle_irq+0x10a/0x160
[88108.054610] [<ffffffff8162ac4d>] perf_event_nmi_handler+0x1d/0x20
[88108.054610] [<ffffffff8162a411>] nmi_handle.isra.0+0x51/0x80
[88108.054610] [<ffffffff8162a5b9>] do_nmi+0x179/0x350
[88108.054610] [<ffffffff81629a30>] end_repeat_nmi+0x1e/0x2e
[88108.054610] [<ffffffff816290c2>] ? _raw_spin_lock_irqsave+0x32/0x40
[88108.054610] [<ffffffff816290c2>] ? _raw_spin_lock_irqsave+0x32/0x40
[88108.054610] [<ffffffff816290c2>] ? _raw_spin_lock_irqsave+0x32/0x40
[88108.054610] <<EOE>> [<ffffffff8113b087>] pagevec_lru_move_fn+0x97/0x110
[88108.054610] [<ffffffff8113a5f0>] ? pagevec_move_tail_fn+0x80/0x80
[88108.054610] [<ffffffff8113b11c>] __pagevec_lru_add+0x1c/0x20
[88108.054610] [<ffffffff8113b4e8>] __lru_cache_add+0x68/0x90
[88108.054610] [<ffffffff8113b71b>] lru_cache_add_lru+0x3b/0x60
[88108.054610] [<ffffffff81161151>] page_add_new_anon_rmap+0xc1/0x170
[88108.054610] [<ffffffff811854b2>] do_huge_pmd_anonymous_page+0x242/0x330
[88108.054610] [<ffffffff81158162>] handle_mm_fault+0x2c2/0x360
[88108.054610] [<ffffffff8162cd22>] __do_page_fault+0x172/0x4e0
[88108.054610] [<ffffffff8109520f>] ? __dequeue_entity+0x2f/0x50
[88108.054610] [<ffffffff810125d1>] ? __switch_to+0x181/0x4a0
[88108.054610] [<ffffffff8162d09e>] do_page_fault+0xe/0x10
[88108.054610] [<ffffffff816296d8>] page_fault+0x28/0x30
[88108.054610] ---[ end trace c8d6b169e0c3f25b ]---
......
......
It's easy to reproduce with the stress[1] workload.
The command I used is '# stress -i 20 -m 30 -v'.
I will report it under a new subject if it turns out to be a new issue.
Let me know if you need any other info.
[1] http://weather.ou.edu/~apw/projects/stress/
Thanks,
Zhouping
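A note on reading the first oops above: the Code: bytes appear to show EDX being sign-extended into RDX, added to a memory location, and then a conditional jump on the sign flag to the trapping <0f> 0b instruction. RDX holds fffffffffffffe00, which as a signed 64-bit value is -512, and the stack dump contains 0000000000000200 (512). That would be consistent with the 512 base pages of one 2 MiB huge page being subtracted from an LRU size counter that then goes negative. The sketch below is only a user-space model of that arithmetic; the helper names are hypothetical and it is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Interpret a raw 64-bit register value as a signed quantity
 * (two's complement), without relying on implementation-defined casts.
 */
static int64_t reg_as_signed(uint64_t raw)
{
	if (raw > (uint64_t)INT64_MAX)
		return -(int64_t)(~raw) - 1;
	return (int64_t)raw;
}

/*
 * Hypothetical model of an LRU size counter update: apply the delta
 * and report whether the counter went negative, which is the
 * condition the trapping check in the oops seems to guard against.
 * Returns 0 if the counter is still sane, -1 otherwise.
 */
static int model_update_lru_size(long *lru_size, int nr_pages)
{
	assert(lru_size != NULL);
	*lru_size += nr_pages;
	return (*lru_size < 0) ? -1 : 0;
}
```

Feeding the RDX value from the oops through reg_as_signed() gives -512. Applying that delta to a counter that holds fewer than 512 pages makes model_update_lru_size() report the failure case, while a counter that holds exactly 512 ends at zero and passes.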
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 5c30a14..0d7ebd3 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -801,8 +801,6 @@ void do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
> if (!new_page)
> goto alloc_fail;
>
> - mem_cgroup_prepare_migration(page, new_page, &memcg);
> -
> lru = PageLRU(page);
>
> if (lru && isolate_lru_page(page)) /* does an implicit get_page() */
> @@ -835,6 +833,14 @@ void do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
>
> return;
> }
> + /*
> + * Traditional migration needs to prepare the memcg charge
> + * transaction early to prevent the old page from being
> + * uncharged when installing migration entries. Here we can
> + * save the potential rollback and start the charge transfer
> + * only when migration is already known to end successfully.
> + */
> + mem_cgroup_prepare_migration(page, new_page, &memcg);
>
> entry = mk_pmd(new_page, vma->vm_page_prot);
> entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> @@ -845,6 +851,12 @@ void do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
> set_pmd_at(mm, haddr, pmd, entry);
> update_mmu_cache_pmd(vma, address, entry);
> page_remove_rmap(page);
> + /*
> + * Finish the charge transaction under the page table lock to
> + * prevent split_huge_page() from dividing up the charge
> + * before it's fully transferred to the new page.
> + */
> + mem_cgroup_end_migration(memcg, page, new_page, true);
> spin_unlock(&mm->page_table_lock);
>
> put_page(page); /* Drop the rmap reference */
> @@ -856,18 +868,14 @@ void do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
>
> unlock_page(new_page);
>
> - mem_cgroup_end_migration(memcg, page, new_page, true);
> -
> unlock_page(page);
> put_page(page); /* Drop the local reference */
>
> return;
>
> alloc_fail:
> - if (new_page) {
> - mem_cgroup_end_migration(memcg, page, new_page, false);
> + if (new_page)
> put_page(new_page);
> - }
>
> unlock_page(page);
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7acf43b..011e510 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3255,15 +3255,18 @@ void mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
> struct mem_cgroup **memcgp)
> {
> struct mem_cgroup *memcg = NULL;
> + unsigned int nr_pages = 1;
> struct page_cgroup *pc;
> enum charge_type ctype;
>
> *memcgp = NULL;
>
> - VM_BUG_ON(PageTransHuge(page));
> if (mem_cgroup_disabled())
> return;
>
> + if (PageTransHuge(page))
> + nr_pages <<= compound_order(page);
> +
> pc = lookup_page_cgroup(page);
> lock_page_cgroup(pc);
> if (PageCgroupUsed(pc)) {
> @@ -3325,7 +3328,7 @@ void mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
> * charged to the res_counter since we plan on replacing the
> * old one and only one page is going to be left afterwards.
> */
> - __mem_cgroup_commit_charge(memcg, newpage, 1, ctype, false);
> + __mem_cgroup_commit_charge(memcg, newpage, nr_pages, ctype, false);
> }
>
> /* remove redundant charge if migration failed*/
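
For reference, the nr_pages computation in the quoted memcontrol.c hunk scales the charge by the compound order of the page. The sketch below reproduces just that shift in user space; the function names are hypothetical, and the order of 9 used in the usage note assumes a 2 MiB transparent huge page built from 4 KiB base pages, as on x86-64.

```c
#include <assert.h>

/*
 * Number of base pages covered by a compound page of the given order,
 * mirroring the "nr_pages = 1; nr_pages <<= order" pattern from the
 * quoted patch. Hypothetical helper, not a kernel function.
 */
static unsigned int pages_for_order(unsigned int order)
{
	unsigned int nr_pages = 1;

	assert(order < 32);	/* keep the shift within a 32-bit unsigned int */
	nr_pages <<= order;
	return nr_pages;
}

/* Bytes covered by such a page for a given base page size. */
static unsigned long long bytes_for_order(unsigned int order,
					  unsigned long long page_size)
{
	return (unsigned long long)pages_for_order(order) * page_size;
}
```

With order 9, pages_for_order() yields 512, and bytes_for_order(9, 4096) yields 2097152 bytes (2 MiB). Charging 1 page instead of 512 for such a page is the mismatch the patch removes from mem_cgroup_prepare_migration().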