Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

From: Chunyou Tang <tangchunyou@163.com>
To: Steven Price <steven.price@arm.com>
Cc: tomeu.vizoso@collabora.com, airlied@linux.ie,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	alyssa.rosenzweig@collabora.com,
	ChunyouTang <tangchunyou@icubecorp.cn>
Subject: Re: [PATCH v2] drm/panfrost:report the full raw fault information instead
Date: Tue, 29 Jun 2021 11:04:53 +0800
Message-ID: <20210629110453.00007ace@163.com>
In-Reply-To: <14b2a3c8-4bc2-c8f9-627b-9ac5840cad11@arm.com>

Hi Steve,
	Thanks for your reply.
	I set the PTE in arm_lpae_prot_to_pte():
***********************************************************************
	/*
	 * Also Mali has its own notions of shareability wherein its Inner
	 * domain covers the cores within the GPU, and its Outer domain is
	 * "outside the GPU" (i.e. either the Inner or System domain in CPU
	 * terms, depending on coherency).
	 */
	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
		pte |= ARM_LPAE_PTE_SH_IS;
	else
		pte |= ARM_LPAE_PTE_SH_OS;
***********************************************************************
I changed this to pte |= ARM_LPAE_PTE_SH_NS.

	If I set the PTE to ARM_LPAE_PTE_SH_OS or ARM_LPAE_PTE_SH_IS,
	a GPU fault occurs whether I use a single-core GPU or a
	multi-core GPU.
	If I set the PTE to ARM_LPAE_PTE_SH_NS, no GPU fault occurs,
	again on both single-core and multi-core GPUs.
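
To make the three settings concrete, here is a minimal standalone sketch of the shareability field. It assumes the usual LPAE encoding of SH in bits [9:8] (0 = non-shareable, 2 = outer shareable, 3 = inner shareable), which is how I read the ARM_LPAE_PTE_SH_* definitions in io-pgtable-arm.c. The names pte_t, sh_for(), pte_set_sh(), sh_name() and the is_mali flag are made up for illustration and are not kernel API.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;	/* hypothetical stand-in for arm_lpae_iopte */

/* SH field, assumed to live in bits [9:8] of an LPAE descriptor. */
#define PTE_SH_SHIFT	8
#define PTE_SH_MASK	((pte_t)3 << PTE_SH_SHIFT)
#define PTE_SH_NS	((pte_t)0 << PTE_SH_SHIFT)	/* non-shareable */
#define PTE_SH_OS	((pte_t)2 << PTE_SH_SHIFT)	/* outer shareable */
#define PTE_SH_IS	((pte_t)3 << PTE_SH_SHIFT)	/* inner shareable */

/* Same decision as the branch quoted above; is_mali is an illustrative flag. */
static pte_t sh_for(int cacheable, int is_mali)
{
	if (cacheable && !is_mali)
		return PTE_SH_IS;
	return PTE_SH_OS;
}

/* Replace whatever SH value a PTE carries with a new one. */
static pte_t pte_set_sh(pte_t pte, pte_t sh)
{
	assert((sh & ~PTE_SH_MASK) == 0);
	return (pte & ~PTE_SH_MASK) | sh;
}

/* Human-readable name of the SH value held in a PTE. */
static const char *sh_name(pte_t pte)
{
	switch (pte & PTE_SH_MASK) {
	case PTE_SH_NS: return "non-shareable";
	case PTE_SH_OS: return "outer shareable";
	case PTE_SH_IS: return "inner shareable";
	default:        return "reserved";
	}
}
```

In this sketch sh_for(1, 1) yields the outer-shareable encoding that the quoted branch selects for the Mali format, and pte_set_sh(pte, PTE_SH_NS) corresponds to the local change described above.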

Thanks

Chunyou

On Mon, 28 Jun 2021 11:48:59 +0100
Steven Price <steven.price@arm.com> wrote:

> On 25/06/2021 10:49, Chunyou Tang wrote:
> > Hi Steve,
> > 	Thanks for your reply.
> > 	When I only set pte |= ARM_LPAE_PTE_SH_NS, there is no
> > "GPU Fault". When I set pte |= ARM_LPAE_PTE_SH_IS (or
> > ARM_LPAE_PTE_SH_OS), there is a "GPU Fault". I don't know how the PTE
> > affects this issue.
> > 	Can you give me some more suggestions?
> > 
> > Thanks.
> > 
> > Chunyou
> 
> Hi Chunyou,
> 
> You haven't given me much context so I'm not entirely sure which PTE
> you are talking about (GPU or CPU), or indeed where you are changing
> the PTE values.
> 
> The PTEs control whether a page is shareable or not. The GPU requires
> that accesses are consistent (i.e. either all accesses to a page are
> shareable or all are non-shareable) and will raise a fault if it
> detects this isn't the case. Mali also has a quirk for its version of
> 'LPAE' where inner shareable actually means only within the GPU and
> outer shareable means outside the GPU (which I think usually means
> Inner Shareable on the external bus).
> 
> Steve
> 
> > On Thu, 24 Jun 2021 14:22:04 +0100
> > Steven Price <steven.price@arm.com> wrote:
> > 
> >> On 22/06/2021 02:40, Chunyou Tang wrote:
> >>> Hi Steve,
> >>> 	I will send a new patch with a suitable subject/commit
> >>> message. But should I send a V3 or a new patch?
> >>
> >> Send a V3 - it is a new version of this patch.
> >>
> >>> 	I have hit a GPU bug and I have no idea how to fix it.
> >>> If you can give me some suggestions, that would be perfect.
> >>>
> >>> You can see it in this kernel log:
> >>>
> >>> Jun 20 10:20:13 icube kernel: [  774.566760] mvp_gpu 0000:05:00.0: GPU Fault 0x00000088 (SHAREABILITY_FAULT) at 0x000000000310fd00
> >>> Jun 20 10:20:13 icube kernel: [  774.566764] mvp_gpu 0000:05:00.0: There were multiple GPU faults - some have not been reported
> >>> Jun 20 10:20:13 icube kernel: [  774.667542] mvp_gpu 0000:05:00.0: AS_ACTIVE bit stuck
> >>> Jun 20 10:20:13 icube kernel: [  774.767900] mvp_gpu 0000:05:00.0: AS_ACTIVE bit stuck
> >>> Jun 20 10:20:13 icube kernel: [  774.868546] mvp_gpu 0000:05:00.0: AS_ACTIVE bit stuck
> >>> Jun 20 10:20:13 icube kernel: [  774.968910] mvp_gpu 0000:05:00.0: AS_ACTIVE bit stuck
> >>> Jun 20 10:20:13 icube kernel: [  775.069251] mvp_gpu 0000:05:00.0: AS_ACTIVE bit stuck
> >>> Jun 20 10:20:22 icube kernel: [  783.693971] mvp_gpu 0000:05:00.0: gpu sched timeout, js=1, config=0x7300, status=0x8, head=0x362c900, tail=0x362c100, sched_job=000000003252fb84
> >>>
> >>> In
> >>> https://lore.kernel.org/dri-devel/20200510165538.19720-1-peron.clem@gmail.com/
> >>> there is the same bug as mine, and I found you on that mailing
> >>> list thread. I don't know how it was fixed?
> >>
> >> The GPU_SHAREABILITY_FAULT error means that a cache line has been
> >> accessed both as shareable and non-shareable and therefore
> >> coherency cannot be guaranteed. Although the "multiple GPU faults"
> >> means that this may not be the underlying cause.
> >>
> >> The fact that your dmesg log has PCI style identifiers
> >> ("0000:05:00.0") suggests this is an unusual platform - I've not
> >> previously been aware of a Mali device behind PCI. Is this device
> >> working with the kbase/DDK proprietary driver? It would be worth
> >> looking at the kbase kernel code for the platform to see if there
> >> is anything special done for the platform.
> >>
> >> From the dmesg logs all I can really tell is that the GPU seems
> >> unhappy about the memory system.
> >>
> >> Steve
> >>
> >>> I need your help!
> >>>
> >>> Thanks very much!
> >>>
> >>> Chunyou
> >>>
> >>> On Mon, 21 Jun 2021 11:45:20 +0100
> >>> Steven Price <steven.price@arm.com> wrote:
> >>>
> >>>> On 19/06/2021 04:18, Chunyou Tang wrote:
> >>>>> Hi Steve,
> >>>>> 	1. Now I know how to write the subject.
> >>>>> 	2. The low 8 bits are the exception type in the spec.
> >>>>>
> >>>>> You can see this in panfrost_exception_name():
> >>>>>
> >>>>> switch (exception_code) {
> >>>>>                 /* Non-Fault Status code */
> >>>>> case 0x00: return "NOT_STARTED/IDLE/OK";
> >>>>> case 0x01: return "DONE";
> >>>>> case 0x02: return "INTERRUPTED";
> >>>>> case 0x03: return "STOPPED";
> >>>>> case 0x04: return "TERMINATED";
> >>>>> case 0x08: return "ACTIVE";
> >>>>> ........
> >>>>> ........
> >>>>> case 0xD8: return "ACCESS_FLAG";
> >>>>> case 0xD9 ... 0xDF: return "ACCESS_FLAG";
> >>>>> case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
> >>>>> case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
> >>>>> }
> >>>>> return "UNKNOWN";
> >>>>> }
> >>>>>
> >>>>> The exception_code in the case labels is only 8 bits, so if
> >>>>> fault_status in panfrost_gpu_irq_handler() is not masked with
> >>>>> & 0xFF, it can't get the correct exception reason; it will
> >>>>> always be UNKNOWN.
> >>>>
> >>>> Yes, I'm happy with the change - I just need a patch that I can
> >>>> apply. At the moment this patch only changes the first '0x%08x'
> >>>> output rather than the call to panfrost_exception_name() as well.
> >>>> So we just need a patch which does:
> >>>>
> >>>> - fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status),
> >>>> + fault_status, panfrost_exception_name(pfdev, fault_status & 0xFF),
> >>>>
> >>>> along with a suitable subject/commit message describing the
> >>>> change. If you can send me that I can apply it.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Steve
> >>>>
> >>>> PS. Sorry for going round in circles here - I'm trying to help
> >>>> you get set up so you'll be able to contribute patches easily in
> >>>> future. An important part of that is ensuring you can send a
> >>>> properly formatted patch to the list.
> >>>>
> >>>> PPS. I'm still not receiving your emails directly. I don't think
> >>>> it's a problem at my end because I'm receiving other emails, but
> >>>> if you can somehow fix the problem you're likely to receive a
> >>>> faster response.
> >>>>
> >>>>> On Fri, 18 Jun 2021 13:43:24 +0100
> >>>>> Steven Price <steven.price@arm.com> wrote:
> >>>>>
> >>>>>> On 17/06/2021 07:20, ChunyouTang wrote:
> >>>>>>> From: ChunyouTang <tangchunyou@icubecorp.cn>
> >>>>>>>
> >>>>>>> of the low 8 bits.
> >>>>>>
> >>>>>> Please don't split the subject like this. The first line of the
> >>>>>> commit should be a (very short) summary of the patch. Then a
> >>>>>> blank line and then a longer description of what the purpose of
> >>>>>> the patch is and why it's needed.
> >>>>>>
> >>>>>> Also you previously had this as part of a series (the first
> >>>>>> part adding the "& 0xFF" in the panfrost_exception_name()
> >>>>>> call). I'm not sure we need two patches for the single line,
> >>>>>> but as it stands this patch doesn't apply.
> >>>>>>
> >>>>>> Also I'm still not receiving any emails from you directly (only
> >>>>>> via the list), so it's possible I might have missed something
> >>>>>> you sent.
> >>>>>>
> >>>>>> Steve
> >>>>>>
> >>>>>>>
> >>>>>>> Signed-off-by: ChunyouTang <tangchunyou@icubecorp.cn>
> >>>>>>> ---
> >>>>>>>  drivers/gpu/drm/panfrost/panfrost_gpu.c | 2 +-
> >>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> >>>>>>> index 1fffb6a0b24f..d2d287bbf4e7 100644
> >>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> >>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> >>>>>>> @@ -33,7 +33,7 @@ static irqreturn_t panfrost_gpu_irq_handler(int irq, void *data)
> >>>>>>>  		address |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO);
> >>>>>>>  
> >>>>>>>  		dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) at 0x%016llx\n",
> >>>>>>> -			 fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status & 0xFF),
> >>>>>>> +			 fault_status, panfrost_exception_name(pfdev, fault_status & 0xFF),
> >>>>>>>  			 address);
> >>>>>>>  
> >>>>>>>  		if (state & GPU_IRQ_MULTIPLE_FAULT)
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> > 
> >
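
The change discussed in the quoted thread (print the full raw fault_status, but pass only its low 8 bits to the name lookup) can be sketched as a standalone program. This is an illustration under assumptions, not the driver code: exception_name() is an abbreviated, hypothetical stand-in for panfrost_exception_name() that only knows codes appearing in this thread, and format_fault() and fault_is_known() are invented helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Abbreviated, hypothetical stand-in for panfrost_exception_name(). */
static const char *exception_name(uint32_t exception_code)
{
	switch (exception_code) {
	case 0x00: return "NOT_STARTED/IDLE/OK";
	case 0x01: return "DONE";
	case 0x88: return "SHAREABILITY_FAULT";
	case 0xD8: return "ACCESS_FLAG";
	}
	return "UNKNOWN";
}

/* True when the low 8 bits of the raw status map to a known name. */
static int fault_is_known(uint32_t fault_status)
{
	return strcmp(exception_name(fault_status & 0xFF), "UNKNOWN") != 0;
}

/*
 * Format the warning the way the patch intends: the raw status is
 * printed in full, the name is looked up from the low 8 bits only.
 */
static int format_fault(char *buf, size_t len,
			uint32_t fault_status, uint64_t address)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "GPU Fault 0x%08x (%s) at 0x%016llx",
			(unsigned int)fault_status,
			exception_name(fault_status & 0xFF),
			(unsigned long long)address);
}
```

With a made-up status such as 0x00000188 (upper bits set purely for illustration), looking up the unmasked value falls through to "UNKNOWN", while the masked lookup still reports SHAREABILITY_FAULT and the raw value stays visible in the message.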
