From: Bharata B Rao <bharata@linux.ibm.com>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>,
	aneesh.kumar@linux.ibm.com, bharata@linux.vnet.ibm.com,
	linux-kernel@vger.kernel.org, linux-next@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org,
	srikanth <sraithal@linux.vnet.ibm.com>
Subject: Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
Date: Mon, 20 May 2019 11:26:22 +0530
Message-ID: <20190520055622.GC22939@in.ibm.com>
In-Reply-To: <1558327521.633yjtl8ki.astroid@bobo.none>

On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> >> > git bisect points to
> >> >
> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> >> > Author: Nicholas Piggin <npiggin@gmail.com>
> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
> >> >
> >> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
> >> >
> >> >     The page table fragment allocator uses the main page refcount racily
> >> >     with respect to speculative references. A customer observed a BUG due
> >> >     to page table page refcount underflow in the fragment allocator. This
> >> >     can be caused by the fragment allocator set_page_count stomping on a
> >> >     speculative reference, and then the speculative failure handler
> >> >     decrements the new reference, and the underflow eventually pops when
> >> >     the page tables are freed.
> >> >
> >> >     Fix this by using a dedicated field in the struct page for the page
> >> >     table fragment allocator.
> >> >
> >> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> >> >     Cc: stable@vger.kernel.org # v3.10+
> >> 
> >> That's the commit that added the BUG_ON(), so prior to that you won't
> >> see the crash.
> > 
> > Right, but the commit says it fixes page table page refcount underflow by
> > introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
> > for this pt_frag_refcount.
> 
> The fixed underflow is caused by a bug (race on page count) that got 
> fixed by that patch. You are hitting a different underflow here. It's
> not certain my patch caused it, I'm just trying to reproduce now.

Ok.

> 
> > 
> > BTW, if I go below this commit, I don't hit the pagecount
> > 
> > VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
> > 
> > which is in pte_fragment_free() path.
> 
> Do you have CONFIG_DEBUG_VM=y?

Yes.

Regards,
Bharata.
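
For readers following the quoted commit message, here is a minimal
user-space sketch of the race it describes. It is only a sequential
model, not kernel code: struct model_page, FRAGS_PER_PAGE and both
simulate_*() functions are hypothetical names made up for illustration,
and plain ints stand in for the kernel's atomic counters. It does not
model the different pt_frag_refcount underflow being chased in this
thread.

```c
#include <stdio.h>

/* Hypothetical value; stands in for the number of fragments per page. */
#define FRAGS_PER_PAGE 4

/* Toy stand-in for struct page; plain ints instead of atomics. */
struct model_page {
	int refcount;         /* models the main page refcount */
	int pt_frag_refcount; /* models the dedicated fragment counter */
};

/* Old scheme: the fragment count is stored in the main refcount. */
struct model_page simulate_shared_counter(void)
{
	struct model_page page = { .refcount = 1, .pt_frag_refcount = 0 };
	int i;

	page.refcount++;                /* speculative reference is taken */
	page.refcount = FRAGS_PER_PAGE; /* allocator overwrites the count */
	page.refcount--;                /* speculative path fails, drops its ref */

	for (i = 0; i < FRAGS_PER_PAGE; i++)
		page.refcount--;        /* each fragment is freed in turn */

	return page;
}

/* New scheme: fragments are counted in a field of their own. */
struct model_page simulate_dedicated_counter(void)
{
	struct model_page page = { .refcount = 1, .pt_frag_refcount = 0 };
	int i;

	page.refcount++;                        /* speculative reference is taken */
	page.pt_frag_refcount = FRAGS_PER_PAGE; /* main refcount is left alone */
	page.refcount--;                        /* speculative path drops its ref */

	for (i = 0; i < FRAGS_PER_PAGE; i++)
		page.pt_frag_refcount--;        /* each fragment is freed in turn */

	return page;
}
```

In this model, simulate_shared_counter() ends with refcount at -1, which
is the underflow the commit message talks about, while
simulate_dedicated_counter() ends with refcount at 1 and
pt_frag_refcount at 0.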

