AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
To: xinhui pan <xinhui.pan@amd.com>, amd-gfx@lists.freedesktop.org
Cc: alexander.deucher@amd.com, christian.koenig@amd.com
Subject: Re: [PATCH] drm/amdgpu: Put drm_dev_enter/exit outside hot codepath
Date: Tue, 14 Sep 2021 23:02:51 -0400	[thread overview]
Message-ID: <e70ea465-93c1-842c-69b4-88f9d0aa5217@amd.com> (raw)
In-Reply-To: <20210915014256.20404-1-xinhui.pan@amd.com>


On 2021-09-14 9:42 p.m., xinhui pan wrote:
> We hit soft hang while doing memory pressure test on one numa system.
> After a qucik look, this is because kfd invalid/valid userptr memory
> frequently with process_info lock hold.
>
> perf top says below,
> 75.81%  [kernel]       [k] __srcu_read_unlock


Do you have any idea why most of CPU cycles would be spent in SRCU 
unlock ? It's
not waiting on anything within this function and does some simple 
arithmetic inside
as far as I see.

>   6.19%  [amdgpu]       [k] amdgpu_gmc_set_pte_pde
>   3.56%  [kernel]       [k] __srcu_read_lock
>   2.20%  [amdgpu]       [k] amdgpu_vm_cpu_update
>   2.20%  [kernel]       [k] __sg_page_iter_dma_next
>   2.15%  [drm]          [k] drm_dev_enter
>   1.70%  [drm]          [k] drm_prime_sg_to_dma_addr_array
>   1.18%  [kernel]       [k] __sg_alloc_table_from_pages
>   1.09%  [drm]          [k] drm_dev_exit
>
> So move drm_dev_enter/exit outside gmc code, instead let caller do it.


Not clear from explanation here how the soft hang with process_info lock 
being hold
is related to to SRCU lock of drm_dev_enter/exit.


> They are gart_unbind, gart_map, vm_cpu_update(already hold in its
> caller)


Where in the caller ?


> and gmc_init_pdb0(no need)


Why no need ? Those guards protect from accessing MMIO ranges after
device is hot removed and hence they don't belong to him anymore. The
function above is also called during device resume from S3 and it's possible
to hot unplug device during S3 so this might be called with extracted device

Is it possible to run libdrm amdgpu test hot plug test suite on this 
change (before and after)
to verify if this actually breaks hot unplug ? The suite is committed 
into latest libdrm but disabled
until latest fixes from amd-staging-drm-next reach upstream drm-next. So 
to enable it this
code 
https://gitlab.freedesktop.org/mesa/drm/-/blob/main/tests/amdgpu/hotunplug_tests.c#L65
needs to be commented out.

Andrey


>
> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 11 +++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c  |  7 -------
>   2 files changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> index 76efd5f8950f..d7e4f4660acf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> @@ -34,6 +34,7 @@
>   #include <asm/set_memory.h>
>   #endif
>   #include "amdgpu.h"
> +#include <drm/drm_drv.h>
>   
>   /*
>    * GART
> @@ -230,12 +231,16 @@ int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset,
>   	u64 page_base;
>   	/* Starting from VEGA10, system bit must be 0 to mean invalid. */
>   	uint64_t flags = 0;
> +	int idx;
>   
>   	if (!adev->gart.ready) {
>   		WARN(1, "trying to unbind memory from uninitialized GART !\n");
>   		return -EINVAL;
>   	}
>   
> +	if (!drm_dev_enter(&adev->ddev, &idx))
> +		return 0;
> +
>   	t = offset / AMDGPU_GPU_PAGE_SIZE;
>   	p = t / AMDGPU_GPU_PAGES_IN_CPU_PAGE;
>   	for (i = 0; i < pages; i++, p++) {
> @@ -254,6 +259,7 @@ int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset,
>   	for (i = 0; i < adev->num_vmhubs; i++)
>   		amdgpu_gmc_flush_gpu_tlb(adev, 0, i, 0);
>   
> +	drm_dev_exit(idx);
>   	return 0;
>   }
>   
> @@ -276,12 +282,16 @@ int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
>   {
>   	uint64_t page_base;
>   	unsigned i, j, t;
> +	int idx;
>   
>   	if (!adev->gart.ready) {
>   		WARN(1, "trying to bind memory to uninitialized GART !\n");
>   		return -EINVAL;
>   	}
>   
> +	if (!drm_dev_enter(&adev->ddev, &idx))
> +		return 0;
> +
>   	t = offset / AMDGPU_GPU_PAGE_SIZE;
>   
>   	for (i = 0; i < pages; i++) {
> @@ -291,6 +301,7 @@ int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
>   			page_base += AMDGPU_GPU_PAGE_SIZE;
>   		}
>   	}
> +	drm_dev_exit(idx);
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 54f059501a33..e973488250e8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -31,7 +31,6 @@
>   #include "amdgpu_ras.h"
>   #include "amdgpu_xgmi.h"
>   
> -#include <drm/drm_drv.h>
>   
>   /**
>    * amdgpu_gmc_pdb0_alloc - allocate vram for pdb0
> @@ -153,10 +152,6 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device *adev, void *cpu_pt_addr,
>   {
>   	void __iomem *ptr = (void *)cpu_pt_addr;
>   	uint64_t value;
> -	int idx;
> -
> -	if (!drm_dev_enter(&adev->ddev, &idx))
> -		return 0;
>   
>   	/*
>   	 * The following is for PTE only. GART does not have PDEs.
> @@ -165,8 +160,6 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device *adev, void *cpu_pt_addr,
>   	value |= flags;
>   	writeq(value, ptr + (gpu_page_idx * 8));
>   
> -	drm_dev_exit(idx);
> -
>   	return 0;
>   }
>   

  reply	other threads:[~2021-09-15  3:02 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-15  1:42 [PATCH] drm/amdgpu: Put drm_dev_enter/exit outside hot codepath xinhui pan
2021-09-15  3:02 ` Andrey Grodzovsky [this message]
     [not found]   ` <DM4PR12MB5165765C26B44904E030B1C587DB9@DM4PR12MB5165.namprd12.prod.outlook.com>
2021-09-15  4:13     ` 回复: " Andrey Grodzovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e70ea465-93c1-842c-69b4-88f9d0aa5217@amd.com \
    --to=andrey.grodzovsky@amd.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=xinhui.pan@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox