Re: [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages

From: Michael Ellerman <michael@ellerman.id.au>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linuxppc-dev@ozlabs.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages
Date: Sat, 25 Apr 2009 01:24:50 +1000
Message-ID: <1240586690.12551.31.camel@localhost>
In-Reply-To: <20090424095116.GB14283@csn.ul.ie>

On Fri, 2009-04-24 at 10:51 +0100, Mel Gorman wrote:
> On Tue, Apr 21, 2009 at 08:27:57PM -0700, Linus Torvalds wrote:
> > Another week, another -rc.
> > 
> 
> I'm seeing some tests with sysbench+postgres+large pages fail on ppc64
> although a very clear pattern is not forming as to what exactly is
> causing it. However, the libhugetlbfs regression tests (make && make
> func) are triggering the following oops when calling mlock() and so are
> likely related.
> 
> ------------[ cut here ]------------
> kernel BUG at arch/powerpc/mm/pgtable.c:243!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=128 NUMA pSeries
> Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log qla2xxx
> loop nfnetlink iptable_filter iptable_nat nf_nat ip_tables
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
> xt_tcpudp xt_limit ipt_LOG xt_pkttype x_tables
> NIP: c00000000002becc LR: c00000000002c02c CTR: 0000000000000000
> REGS: c0000000ea92b4c0 TRAP: 0700   Not tainted  (2.6.30-rc3-autokern1)
> MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 28000484  XER: 20000020
> TASK = c00000000395b660[7611] 'mlock' THREAD: c0000000ea928000 CPU: 3
> GPR00: 0000000000000001 c0000000ea92b740 c0000000008ea170 c0000000ec7d4980 
> GPR04: 000000003f000000 c0000001e2278cf8 0000001900000393 0000000000000001 
> GPR08: f000000002bc0000 0000000000000000 0000000000000113 c0000001e2278c81 
> GPR12: 0000000044000482 c00000000093b880 0000000028004422 0000000000000000 
> GPR16: c0000000ea92bbf0 c0000000009f06f0 0000001900000113 c0000000ec7d4980 
> GPR20: 0000000000000000 f000000002bc0000 000000003f000000 c0000001e2278cf8 
> GPR24: c0000000eaa90bb0 0000000000000000 c0000000eaa90bb0 c0000000ea928000 
> GPR28: f000000002bc0000 0000001900000393 0000000000000001 c0000001e2278cf8 
> NIP [c00000000002becc] .assert_pte_locked+0x54/0x8c
> LR [c00000000002c02c] .ptep_set_access_flags+0x50/0x8c
> Call Trace:
> [c0000000ea92b740] [c0000000eaa90bb0] 0xc0000000eaa90bb0 (unreliable)
> [c0000000ea92b7d0] [c0000000000ed1b0] .hugetlb_cow+0xd4/0x654
> [c0000000ea92b900] [c0000000000edbf0] .hugetlb_fault+0x4c0/0x708
> [c0000000ea92b9f0] [c0000000000ee890] .follow_hugetlb_page+0x174/0x364
> [c0000000ea92bae0] [c0000000000d8d30] .__get_user_pages+0x288/0x4c0
> [c0000000ea92bbb0] [c0000000000da10c] .make_pages_present+0xa0/0xe0
> [c0000000ea92bc40] [c0000000000db758] .mlock_fixup+0x90/0x228
> [c0000000ea92bd00] [c0000000000dbb38] .do_mlock+0xc4/0x128
> [c0000000ea92bda0] [c0000000000dbccc] .SyS_mlock+0xb0/0xec
> [c0000000ea92be30] [c00000000000852c] syscall_exit+0x0/0x40
> Instruction dump:
> 0b000000 78892662 79291f24 7d69582a 7d600074 7800d182 0b000000 78895e62 
> 79291f24 7d29582a 7d200074 7800d182 <0b000000> 3c004000 3960ffff
> 780007c6 
> ---[ end trace 36a7faa04fa9452b ]---
> 
> This corresponds to
> 
> #ifdef CONFIG_DEBUG_VM
> void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
> {
>         pgd_t *pgd;
>         pud_t *pud;
>         pmd_t *pmd;
> 
>         if (mm == &init_mm)
>                 return;
>         pgd = mm->pgd + pgd_index(addr);
>         BUG_ON(pgd_none(*pgd));
>         pud = pud_offset(pgd, addr);
>         BUG_ON(pud_none(*pud));
>         pmd = pmd_offset(pud, addr);
>         BUG_ON(!pmd_present(*pmd));			<----- THIS LINE
>         BUG_ON(!spin_is_locked(pte_lockptr(mm, pmd)));
> }
> #endif /* CONFIG_DEBUG_VM */
> 
> This area was last changed by commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36
> in the 2.6.30-rc1 timeframe. I think there was another hugepage-related
> problem with this patch but I can't remember what it was.

It broke modules, but I don't remember anything hugepage related.

So the code changed from:

-#define  ptep_set_access_flags(__vma, __address, __ptep, __entry, __dirty) \
-({                                                                        \
-       int __changed = !pte_same(*(__ptep), __entry);                     \
-       if (__changed) {                                                   \
-               __ptep_set_access_flags(__ptep, __entry, __dirty);         \
-               flush_tlb_page_nohash(__vma, __address);                   \
-       }                                                                  \
-       __changed;                                                         \
-})

to:

+int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
+                         pte_t *ptep, pte_t entry, int dirty)
+{
+       int changed;
+       if (!dirty && pte_need_exec_flush(entry, 0))
+               entry = do_dcache_icache_coherency(entry);
+       changed = !pte_same(*(ptep), entry);
+       if (changed) {
+               assert_pte_locked(vma->vm_mm, address);
+               __ptep_set_access_flags(ptep, entry);
+               flush_tlb_page_nohash(vma, address);
+       }
+       return changed;
+}

So the call to assert_pte_locked() is new, and it's never going to work
for huge pages: the page table structure is different, right? Notice
that pte_update() checks for this (arch/powerpc/include/asm/pgtable-ppc64.h):

        /* huge pages use the old page table lock */
        if (!huge)
                assert_pte_locked(mm, addr);

But unlike pte_update(), ptep_set_access_flags() has no way of knowing
it's been called from huge_ptep_set_access_flags().

So my guess is we either remove the call to assert_pte_locked() from
ptep_set_access_flags(), or have assert_pte_locked() check whether it's
being called for a huge pte.
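
As a rough illustration of skipping the assertion for huge ptes, here is
a user-space C model (not kernel code) that passes a `huge` flag down,
mirroring the pte_update() check quoted above. Everything prefixed
model_ is hypothetical, and the two booleans are simplified stand-ins
for pmd_present() and spin_is_locked(pte_lockptr(mm, pmd)). The real fix
may well look quite different.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state assert_pte_locked() inspects. */
struct model_pmd {
	bool present;    /* models pmd_present(*pmd) */
	bool pte_locked; /* models spin_is_locked(pte_lockptr(mm, pmd)) */
};

/* Returns true if the checks pass, false where the kernel would BUG(). */
static bool model_assert_pte_locked(const struct model_pmd *pmd)
{
	if (!pmd->present)
		return false;
	return pmd->pte_locked;
}

/*
 * Models ptep_set_access_flags() with an extra, hypothetical `huge`
 * argument. Returns false where the assertion would have fired.
 */
static bool model_set_access_flags(const struct model_pmd *pmd, bool huge)
{
	/* huge pages use the old page table lock, so skip the check */
	if (!huge && !model_assert_pte_locked(pmd))
		return false;
	/* the real function would update the PTE and flush the TLB here */
	return true;
}

/* Small self-check exercising both paths. */
static void model_demo(void)
{
	struct model_pmd huge_pmd = { .present = false, .pte_locked = false };
	struct model_pmd normal_pmd = { .present = true, .pte_locked = true };

	/* huge caller: the check is skipped entirely */
	assert(model_set_access_flags(&huge_pmd, true));
	/* same entry without the flag: trips the pmd_present() stand-in */
	assert(!model_set_access_flags(&huge_pmd, false));
	/* normal page with its PTE lock held: the check passes */
	assert(model_set_access_flags(&normal_pmd, false));
}
```

The catch, as noted above, is that the real ptep_set_access_flags() has
no such argument today, so either its signature or its huge-page caller
would have to change for this to apply.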

cheers

