linuxppc-dev.lists.ozlabs.org archive mirror
From: Shawn Anastasio <shawn@anastas.io>
To: Alexey Kardashevskiy <aik@ozlabs.ru>, linuxppc-dev@lists.ozlabs.org
Cc: Sam Bobroff <sbobroff@linux.ibm.com>,
	Alistair Popple <alistair@popple.id.au>,
	Oliver O'Halloran <oohall@gmail.com>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
Date: Tue, 18 Jun 2019 02:00:06 -0500
Message-ID: <553b21c2-c57f-2b27-3210-a957158d22dc@anastas.io>
In-Reply-To: <d4a8d06e-aa5b-dab7-4b20-d1aa77b5304a@ozlabs.ru>

On 6/18/19 1:39 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 18/06/2019 14:26, Shawn Anastasio wrote:
>> On 6/12/19 2:15 PM, Shawn Anastasio wrote:
>>> On 6/12/19 2:07 AM, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 12/06/2019 15:05, Shawn Anastasio wrote:
>>>>> On 6/5/19 11:11 PM, Shawn Anastasio wrote:
>>>>>> On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote:
>>>>>>> This is an attempt to allow DMA masks between 32..59 which are not
>>>>>>> large enough to use either a PHB3 bypass mode or a sketchy bypass.
>>>>>>> Depending on the max order, up to 40 is usually available.
>>>>>>>
>>>>>>>
>>>>>>> This is based on v5.2-rc2.
>>>>>>>
>>>>>>> Please comment. Thanks.
>>>>>>
>>>>>> I have tested this patch set with an AMD GPU that's limited to <64bit
>>>>>> DMA (I believe it's 40 or 42 bit). It successfully allows the card to
>>>>>> operate without falling back to 32-bit DMA mode as it does without
>>>>>> the patches.
>>>>>>
>>>>>> Relevant kernel log message:
>>>>>> ```
>>>>>> [    0.311211] pci 0033:01     : [PE# 00] Enabling 64-bit DMA bypass
>>>>>> ```
>>>>>>
>>>>>> Tested-by: Shawn Anastasio <shawn@anastas.io>
>>>>>
>>>>> After a few days of further testing, I've started to run into stability
>>>>> issues with the patch applied and used with an AMD GPU. Specifically,
>>>>> the system sometimes spontaneously crashes. Not just EEH errors either,
>>>>> the whole system shuts down in what looks like a checkstop.
>>>>>
>>>>> Perhaps some subtle corruption is occurring?
>>>>
>>>> Have you tried this?
>>>>
>>>> https://patchwork.ozlabs.org/patch/1113506/
>>>
>>> I have not. I'll give it a shot and try it out for a few days to see
>>> if I'm able to reproduce the crashes.
>>
>> A few days later and I was able to reproduce the checkstop while
>> watching a video in mpv. At this point the system had ~4 day
>> uptime and this wasn't the first video I watched during that time.
>>
>> This is with https://patchwork.ozlabs.org/patch/1113506/ applied, too.
> 
> 
> Any logs left? What was the reason for the checkstop and what is the
> hardware? "lscpu" and "lspci -vv" for the starter would help. Thanks,

The machine is a Talos II with two 8-core DD2.2 Sforza modules.
I've included the output of lscpu and lspci below. As for logs,
there don't appear to be any kernel logs of the event.
The opal-gard utility does show some error records, which I have
also included below.

opal-gard:
```
$ sudo ./opal-gard show 1
Record ID:    0x00000001
========================
Error ID:     0x9000000b
Error Type:   Fatal (0xe3)
Path Type: physical
 >Sys, Instance #0
  >Node, Instance #0
   >Proc, Instance #1
    >EQ, Instance #0
     >EX, Instance #0

$ sudo ./opal-gard show 2
Record ID:    0x00000002
========================
Error ID:     0x90000021
Error Type:   Fatal (0xe3)
Path Type: physical
 >Sys, Instance #0
  >Node, Instance #0
   >Proc, Instance #1
    >EQ, Instance #2
     >EX, Instance #1

```

lscpu:
```
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              52
On-line CPU(s) list: 0-3,8-31,36-47,52-63
Thread(s) per core:  4
Core(s) per socket:  6
Socket(s):           2
NUMA node(s):        2
Model:               2.2 (pvr 004e 1202)
Model name:          POWER9, altivec supported
CPU max MHz:         3800.0000
CPU min MHz:         2154.0000
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-3,8-31
NUMA node8 CPU(s):   36-47,52-63

```
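
As a quick sanity check on the lscpu output above, the on-line CPU list can
be expanded to confirm that it adds up to the reported 52 CPUs and to see
which thread IDs are absent. This is only a minimal sketch: the helper name
`expand_cpu_list` is hypothetical, and the total of 64 hardware threads is an
assumption derived from the machine description (2 sockets x 8 cores x 4
threads per core), not something lscpu reports directly.

```python
def expand_cpu_list(spec):
    """Expand a Linux CPU list string such as '0-3,8-31' into a sorted list of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like '8-31' is inclusive on both ends.
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


# "On-line CPU(s) list" copied from the lscpu output.
online = expand_cpu_list("0-3,8-31,36-47,52-63")

# Assumption: 2 sockets x 8 cores x 4 threads per core = 64 hardware threads.
total_threads = 2 * 8 * 4
online_set = set(online)
offline = [cpu for cpu in range(total_threads) if cpu not in online_set]

# Assumption: threads are numbered consecutively per core (4 per core).
offline_cores = sorted({cpu // 4 for cpu in offline})

print("online count:", len(online))
print("offline threads:", offline)
print("offline core indices:", offline_cores)
```

Under those assumptions the missing thread IDs are 4-7, 32-35 and 48-51,
i.e. three whole SMT4 cores. The sketch does not say why they are offline.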

lspci -vv:
Output at: https://upaste.anastas.io/IwVQzt
