Linux MIPS Architecture development
From: David Daney <ddaney@caviumnetworks.com>
To: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: Ralf Baechle <ralf@linux-mips.org>,
	Aaro Koskinen <aaro.koskinen@nokia.com>,
	Joshua Kinard <kumba@gentoo.org>, <linux-mips@linux-mips.org>,
	"Hill, Steven" <Steven.Hill@cavium.com>
Subject: Re: THP broken on OCTEON?
Date: Mon, 23 May 2016 12:03:34 -0700
Message-ID: <57435406.1060104@caviumnetworks.com>
In-Reply-To: <20160523185226.GA1253@raspberrypi.musicnaut.iki.fi>

On 05/23/2016 11:52 AM, Aaro Koskinen wrote:
> On Mon, May 23, 2016 at 09:21:22AM -0700, David Daney wrote:
>> On 05/23/2016 08:20 AM, Ralf Baechle wrote:
>>> On Mon, May 23, 2016 at 06:13:46PM +0300, Aaro Koskinen wrote:
>>>> I'm getting kernel crashes (see below) reliably when building Perl in
>>>> parallel (make -j16) on OCTEON EBH5600 board (8 cores, 4 GB RAM) with
>>>> Linux 4.6.
>>>>
>>>> It seems that CONFIG_TRANSPARENT_HUGEPAGE has something to do with the
>>>> issue - disabling it makes build go through fine.
>>>>
>>>> Any ideas?
>>>
>>> I thought it was working except on SGI Origin 200/2000 aka IP27 where
>>> Joshua Kinard (added to cc) was hitting issues as well.
>>>
>>> Joshua, does that look similar to the issues you were hitting?
>>
>> There is nothing OCTEON specific in the THP code, or huge pages in general.
>>
>> That said, we have seen other THP related failures, and have never been able
>> to find the cause.
>>
>> If someone can come up with a reproducible test case that triggers quickly,
>> we can run it in our simulator and easily find the problem.
>
> Trying to build Perl is a reliable reproducer. Is that too heavyweight
> for your simulator?
>
> I was able to reproduce this also on EdgeRouter Pro, but there the kernel
> does not fail, only compiler dies with SIGBUS:
>
> [  315.095264] Data bus error, epc == 0000000000a801c4, ra == 0000000000a80624
>
> And without THP the build is fine.
>
> I also tried CN68XX board with 16 GB RAM and also there I get SIGBUS failure
> instead of Machine Check.
>

Yes.  I think the problem is some sort of corruption of the page tables.
This may show up as Machine Check errors, bus errors, or SIGSEGV.

David.
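
Since the failures go away when CONFIG_TRANSPARENT_HUGEPAGE is disabled,
it can help to confirm whether THP is actually active on a given board
before running the Perl build. A minimal sketch, assuming the standard
sysfs file /sys/kernel/mm/transparent_hugepage/enabled is present; the
helper names parse_thp_mode and current_thp_mode are hypothetical and do
not come from this thread:

```python
import re
from pathlib import Path

# Standard sysfs location of the THP mode switch on Linux.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the bracketed (active) mode from e.g. 'always [madvise] never'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def current_thp_mode(path=THP_ENABLED):
    """Return the active THP mode, or None if the file cannot be read
    (for example when the kernel was built without THP support)."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("THP mode:", current_thp_mode())
```

Writing "never" to the same sysfs file as root turns THP off at runtime,
which may be a quicker way to compare runs than rebuilding the kernel.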
