From: Shawn Anastasio <shawn@anastas.io>
To: Alexey Kardashevskiy <aik@ozlabs.ru>, linuxppc-dev@lists.ozlabs.org
Cc: Sam Bobroff <sbobroff@linux.ibm.com>,
Alistair Popple <alistair@popple.id.au>,
Oliver O'Halloran <oohall@gmail.com>,
David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
Date: Tue, 18 Jun 2019 02:00:06 -0500
Message-ID: <553b21c2-c57f-2b27-3210-a957158d22dc@anastas.io>
In-Reply-To: <d4a8d06e-aa5b-dab7-4b20-d1aa77b5304a@ozlabs.ru>
On 6/18/19 1:39 AM, Alexey Kardashevskiy wrote:
>
>
> On 18/06/2019 14:26, Shawn Anastasio wrote:
>> On 6/12/19 2:15 PM, Shawn Anastasio wrote:
>>> On 6/12/19 2:07 AM, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 12/06/2019 15:05, Shawn Anastasio wrote:
>>>>> On 6/5/19 11:11 PM, Shawn Anastasio wrote:
>>>>>> On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote:
>>>>>>> This is an attempt to allow DMA masks between 32..59 which are not
>>>>>>> large
>>>>>>> enough to use either a PHB3 bypass mode or a sketchy bypass.
>>>>>>> Depending
>>>>>>> on the max order, up to 40 is usually available.
>>>>>>>
>>>>>>>
>>>>>>> This is based on v5.2-rc2.
>>>>>>>
>>>>>>> Please comment. Thanks.
>>>>>>
>>>>>> I have tested this patch set with an AMD GPU that's limited to <64bit
>>>>>> DMA (I believe it's 40 or 42 bit). It successfully allows the card to
>>>>>> operate without falling back to 32-bit DMA mode as it does without
>>>>>> the patches.
>>>>>>
>>>>>> Relevant kernel log message:
>>>>>> ```
>>>>>> [ 0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass
>>>>>> ```
>>>>>>
>>>>>> Tested-by: Shawn Anastasio <shawn@anastas.io>
>>>>>
>>>>> After a few days of further testing, I've started to run into stability
>>>>> issues with the patch applied and used with an AMD GPU. Specifically,
>>>>> the system sometimes spontaneously crashes. Not just EEH errors either,
>>>>> the whole system shuts down in what looks like a checkstop.
>>>>>
>>>>> Perhaps some subtle corruption is occurring?
>>>>
>>>> Have you tried this?
>>>>
>>>> https://patchwork.ozlabs.org/patch/1113506/
>>>
>>> I have not. I'll give it a shot and try it out for a few days to see
>>> if I'm able to reproduce the crashes.
>>
>> A few days later and I was able to reproduce the checkstop while
>> watching a video in mpv. At this point the system had ~4 day
>> uptime and this wasn't the first video I watched during that time.
>>
>> This is with https://patchwork.ozlabs.org/patch/1113506/ applied, too.
>
>
> Any logs left? What was the reason for the checkstop and what is the
> hardware? "lscpu" and "lspci -vv" for the starter would help. Thanks,

The machine is a Talos II with 2x 8-core DD2.2 Sforza modules.
I've added the output of lscpu and lspci below. As for logs,
there don't appear to be any kernel logs of the event.
The opal-gard utility shows some error records, which I have
also included below.

opal-gard:
```
$ sudo ./opal-gard show 1
Record ID: 0x00000001
========================
Error ID: 0x9000000b
Error Type: Fatal (0xe3)
Path Type: physical
>Sys, Instance #0
>Node, Instance #0
>Proc, Instance #1
>EQ, Instance #0
>EX, Instance #0
$ sudo ./opal-gard show 2
Record ID: 0x00000002
========================
Error ID: 0x90000021
Error Type: Fatal (0xe3)
Path Type: physical
>Sys, Instance #0
>Node, Instance #0
>Proc, Instance #1
>EQ, Instance #2
>EX, Instance #1
```
lscpu:
```
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 52
On-line CPU(s) list: 0-3,8-31,36-47,52-63
Thread(s) per core: 4
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Model: 2.2 (pvr 004e 1202)
Model name: POWER9, altivec supported
CPU max MHz: 3800.0000
CPU min MHz: 2154.0000
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-3,8-31
NUMA node8 CPU(s): 36-47,52-63
```
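
As a rough cross-check of the lscpu output above, the on-line CPU list can be
expanded to see which CPU numbers are absent. This is only a sketch: the helper
name parse_cpu_list is made up for illustration, and the total of 64 hardware
threads is an assumption derived from "2x 8 core" and "Thread(s) per core: 4",
not something lscpu reports directly. It makes no claim about which absent CPU
corresponds to which opal-gard record.

```python
def parse_cpu_list(spec):
    """Expand a CPU list string such as '0-3,8-31' into a sorted list of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


# Values copied from the lscpu output above.
ONLINE_LIST = "0-3,8-31,36-47,52-63"
THREADS_PER_CORE = 4

# Assumption (not reported by lscpu): 2 sockets x 8 cores x 4 threads.
ASSUMED_TOTAL_CPUS = 2 * 8 * THREADS_PER_CORE

online = parse_cpu_list(ONLINE_LIST)
missing = sorted(set(range(ASSUMED_TOTAL_CPUS)) - set(online))
missing_cores = len(missing) // THREADS_PER_CORE

print("online CPUs:", len(online))
print("absent CPU numbers:", missing)
print("absent cores (assuming 4 threads each):", missing_cores)
```

Under that assumption the count of on-line CPUs matches the "CPU(s): 52" line,
and the absent CPU numbers fall into three groups of four threads.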
lspci -vv:
Output at: https://upaste.anastas.io/IwVQzt