From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Dave Jones <davej@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: mm: hangs in collapse_huge_page
Date: Fri, 1 May 2015 01:24:21 +0300
Message-ID: <20150430222421.GA18890@node.dhcp.inet.fi>
In-Reply-To: <5542A9FE.1000604@oracle.com>

On Thu, Apr 30, 2015 at 06:17:34PM -0400, Sasha Levin wrote:
> On 04/30/2014 11:42 AM, Kirill A. Shutemov wrote:
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index b4b1feba6472..1c6ace5207b9 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1986,6 +1986,8 @@ static void insert_to_mm_slots_hash(struct mm_struct *mm,
> >  
> >  static inline int khugepaged_test_exit(struct mm_struct *mm)
> >  {
> > +       VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem) &&
> > +                       !spin_is_locked(&khugepaged_mm_lock));
> >         return atomic_read(&mm->mm_users) == 0;
> >  }
> 
> I've managed to hit this during testing:
> 
> [ 8048.304275] kernel BUG at mm/huge_memory.c:2060!
> [ 8048.305878] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
> [ 8048.307479] Modules linked in: quota_v2 quota_tree xfs libcrc32c x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ast kvm ttm drm_kms_helper crct10dif_pclmul crc32_pclmul drm ghash_clmulni_intel aesni_
> intel aes_x86_64 lrw glue_helper ablk_helper cryptd joydev i2c_algo_bit sb_edac syscopyarea sysfillrect edac_core sysimgblt lpc_ich ipmi_si ipmi_msghandler ioatdma shpchp mac_hid btrfs xor mlx4_en vxlan raid6_pq
> hid_generic ixgbe mlx4_core usbhid hid dca megaraid_sas ahci ptp libahci pps_core mdio
> [ 8048.314422] CPU: 31 PID: 13065 Comm: thp01 Not tainted 4.1.0-rc1-next-20150430+ #8
> [ 8048.316215] Hardware name: Oracle Corporation OVCA X3-2             /ASSY,MOTHERBOARD,1U   , BIOS 17021300 06/19/2012
> [ 8048.318070] task: ffff8837ba9b3b40 ti: ffff8837bfcf8000 task.ti: ffff8837bfcf8000
> [ 8048.319941] RIP: __khugepaged_enter (mm/huge_memory.c:2059 mm/huge_memory.c:2075)
> [ 8048.321856] RSP: 0018:ffff8837bfcff8a0  EFLAGS: 00010246
> [ 8048.323752] RAX: 000000000000d800 RBX: ffff8837b8314b00 RCX: 0000000000000000
> [ 8048.325665] RDX: 00000000000000d8 RSI: 00000000000000fc RDI: ffff8837b8314ba8
> [ 8048.327570] RBP: ffff8837bfcff8e0 R08: ffff8837df1e5040 R09: ffffed06f4b701b8
> [ 8048.329486] R10: 000000002a82d01f R11: 1ffff106f82c0f77 R12: ffff8837a5b80d98
> [ 8048.331414] R13: ffff8837c6c58b80 R14: ffff8837c6c58bd0 R15: 0000000000000000
> [ 8048.333357] FS:  00007f238e593740(0000) GS:ffff8837df1c0000(0000) knlGS:0000000000000000
> [ 8048.335329] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8048.337304] CR2: 00007f8a8d2c4740 CR3: 00000037c7c10000 CR4: 00000000000407e0
> [ 8048.339369] Stack:
> [ 8048.341343]  0000000000000000 00000007fffffffe ffff883700000001 ffff8837b8314b00
> [ 8048.343382]  00007fffffc00000 ffff8837c6c58b80 ffff8837c6c58bd0 0000000000000000
> [ 8048.345421]  ffff8837bfcff910 ffffffff815cfa60 ffff8837bfcff910 ffffffff81249cd3
> [ 8048.347473] Call Trace:
> [ 8048.349502] khugepaged_enter_vma_merge (include/linux/khugepaged.h:46 mm/huge_memory.c:2115)
> [ 8048.351584] ? up_write (kernel/locking/rwsem.h:9 kernel/locking/rwsem.c:93)
> [ 8048.353654] expand_downwards (mm/mmap.c:2278)
> [ 8048.355719] ? __mem_cgroup_count_vm_event (mm/memcontrol.c:1156)
> [ 8048.357791] handle_mm_fault (mm/memory.c:2673 mm/memory.c:3250 mm/memory.c:3371 mm/memory.c:3400)
> [ 8048.359886] ? follow_page_pte (mm/gup.c:48)
> [ 8048.361952] ? __pmd_alloc (mm/memory.c:3382)
> [ 8048.364020] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:154 kernel/locking/spinlock.c:183)
> [ 8048.366083] ? follow_page_pte (mm/gup.c:125)
> [ 8048.368139] ? follow_page_mask (mm/gup.c:209)
> [ 8048.370181] __get_user_pages (mm/gup.c:285 mm/gup.c:477)
> [ 8048.372214] ? follow_page_mask (mm/gup.c:420)
> [ 8048.374242] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3762)
> [ 8048.376269] get_user_pages (mm/gup.c:818)

We call get_user_pages() here without holding ->mmap_sem. That violates the
get_user_pages() interface, but it should not cause a problem because there
is no concurrency on the mm yet: this is the exec path.

Not sure if we should correct it.

Hm. __bprm_mm_init() in the same exec path takes ->mmap_sem.

Any comments?

> [ 8048.378295] copy_strings.isra.20 (fs/exec.c:197 fs/exec.c:510)
> [ 8048.380392] ? count.isra.18.constprop.36 (fs/exec.c:454)
> [ 8048.382439] ? copy_strings_kernel (fs/exec.c:556)
> [ 8048.384464] do_execveat_common.isra.32 (fs/exec.c:1577)
> [ 8048.386469] ? do_execveat_common.isra.32 (include/linux/spinlock.h:312 fs/exec.c:1263 fs/exec.c:1518)
> [ 8048.388448] ? prepare_bprm_creds (fs/exec.c:1475)
> [ 8048.390395] ? kmem_cache_alloc (include/trace/events/kmem.h:53 mm/slub.c:2524)
> [ 8048.392309] ? getname_flags (fs/namei.c:135)
> [ 8048.394187] ? up_read (./arch/x86/include/asm/rwsem.h:156 kernel/locking/rwsem.c:81)
> [ 8048.396027] ? getname_flags (fs/namei.c:146)
> [ 8048.397869] SyS_execve (fs/exec.c:1701)
> [ 8048.399715] stub_execve (arch/x86/kernel/entry_64.S:510)
> [ 8048.401482] ? system_call_fastpath (arch/x86/kernel/entry_64.S:261)
> [ 8048.403207] Code: 1f 84 00 00 00 00 00 b8 f4 ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f b7 05 a9 fb db 01 0f b6 d4 31 d0 a8 fe 0f 85 3e fe ff ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 48 89 df e8 18 88 f6 ff 0f
> All code
> ========
>    0:   1f                      (bad)
>    1:   84 00                   test   %al,(%rax)
>    3:   00 00                   add    %al,(%rax)
>    5:   00 00                   add    %al,(%rax)
>    7:   b8 f4 ff ff ff          mov    $0xfffffff4,%eax
>    c:   c3                      retq
>    d:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
>   14:   00 00 00
>   17:   0f b7 05 a9 fb db 01    movzwl 0x1dbfba9(%rip),%eax        # 0x1dbfbc7
>   1e:   0f b6 d4                movzbl %ah,%edx
>   21:   31 d0                   xor    %edx,%eax
>   23:   a8 fe                   test   $0xfe,%al
>   25:   0f 85 3e fe ff ff       jne    0xfffffffffffffe69
>   2b:*  0f 0b                   ud2             <-- trapping instruction
>   2d:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
>   34:   00 00 00
>   37:   48 89 df                mov    %rbx,%rdi
>   3a:   e8 18 88 f6 ff          callq  0xfffffffffff68857
>   3f:
> 
> Code starting with the faulting instruction
> ===========================================
>    0:   0f 0b                   ud2
>    2:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
>    9:   00 00 00
>    c:   48 89 df                mov    %rbx,%rdi
>    f:   e8 18 88 f6 ff          callq  0xfffffffffff6882c
>   14:
> [ 8048.406837] RIP __khugepaged_enter (mm/huge_memory.c:2059 mm/huge_memory.c:2075)
> [ 8048.408525]  RSP <ffff8837bfcff8a0>
> 
> 
> Thanks,
> Sasha
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
 Kirill A. Shutemov


Thread overview: 16+ messages
2014-04-16  2:06 mm: hangs in collapse_huge_page Sasha Levin
2014-04-16  2:06 ` Sasha Levin
2014-04-24 16:46 ` Sasha Levin
2014-04-24 16:46   ` Sasha Levin
2014-04-30 15:42 ` Kirill A. Shutemov
2014-04-30 15:42   ` Kirill A. Shutemov
2014-05-01 14:38   ` Hillf Danton
2014-05-01 14:38     ` Hillf Danton
2014-05-11  0:34   ` Sasha Levin
2014-05-11  0:34     ` Sasha Levin
2014-05-14 21:29     ` Kirill A. Shutemov
2014-05-14 21:29       ` Kirill A. Shutemov
2015-04-30 22:17   ` Sasha Levin
2015-04-30 22:17     ` Sasha Levin
2015-04-30 22:24     ` Kirill A. Shutemov [this message]
2015-04-30 22:24       ` Kirill A. Shutemov
