All of lore.kernel.org
 help / color / mirror / Atom feed
From: Felix Kuehling <felix.kuehling@amd.com>
To: Shashank Sharma <shashank.sharma@amd.com>, amd-gfx@lists.freedesktop.org
Cc: Alex Deucher <alexander.deucher@amd.com>,
	mukul.joshi@amd.com, contactshashanksharma@gmail.com,
	Christian Koenig <christian.koenig@amd.com>,
	arvind.yadav@amd.com
Subject: Re: [PATCH v2 09/12] drm/amdgpu: use doorbell manager for kfd process doorbells
Date: Fri, 21 Apr 2023 16:11:19 -0400	[thread overview]
Message-ID: <b8bf2cc9-6a72-5fce-e19f-cf69e37f382e@amd.com> (raw)
In-Reply-To: <20230412162537.1357-10-shashank.sharma@amd.com>

On 2023-04-12 12:25, Shashank Sharma wrote:
> This patch:
> - adds a doorbell object in kfd pdd structure.
> - allocates doorbells for a process while creating its pdd.
> - frees the doorbells with pdd destroy.
> - removes previous calls to allocate process doorbells as
>    its not required anymore.
>
> PS: This patch ensures that we don't break the existing KFD
>      functionality, but now KFD userspace library should also
>      create doorbell pages as AMDGPU GEM objects using libdrm
>      functions in userspace. The reference code for the same
>      is available with AMDGPU Usermode queue libdrm MR. Once
>      this is done, we will not need to create process doorbells
>      in kernel.

I like this approach of keeping existing functionality working, but 
having a transition path for user mode. If I understand it correctly, 
allocating doorbells as GEM objects would enable overcommitment of 
doorbells. That's a capability we haven't had in KFD up to now. Trying 
to launch too many processes that need doorbells would simlpy fail.

With overcommitment, idle processes could have their doorbells objects 
evicted, sot that active processes can work. Doorbell objects would need 
eviction fences to ensure that processes are not scheduled on the GPU 
while their doorbell allocation is in use by other processes. The 
doorbell offset in the queue properties would need to be updated when 
doorbells are validated and potentially moved to different physical pages.

These details would need to be worked out in kernel mode before user 
mode can transition to GEM for doorbell allocation.


>
> V2: - Do not use doorbell wrapper API, use amdgpu_bo_create_kernel
>        instead (Alex).
>      - Do not use custom doorbell structure, instead use separate
>        variables for bo and doorbell_bitmap (Alex)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 13 ----
>   .../drm/amd/amdkfd/kfd_device_queue_manager.c |  8 +--
>   drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c     | 65 ++++++++++---------
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  9 ++-
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 21 +++---
>   .../amd/amdkfd/kfd_process_queue_manager.c    | 22 +++----
>   6 files changed, 64 insertions(+), 74 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 6d291aa6386b..0e40756417e5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -327,12 +327,6 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
>   		goto err_bind_process;
>   	}
>   
> -	if (!pdd->doorbell_index &&
> -	    kfd_alloc_process_doorbells(dev, &pdd->doorbell_index) < 0) {
> -		err = -ENOMEM;
> -		goto err_alloc_doorbells;
> -	}
> -

You're moving this to kfd_create_process_device_data, which means 
processes will allocate and pin doorbells, even if they never create 
queues. We specifically moved this to first queue creation to allow more 
ROCm processes to be started, as long as they don't create queues. This 
is important on systems with many GPUs where not all processes need to 
use all GPU. It also helps with certain tools that need to initialized 
ROCr to gather information, but don't need to create queues. See this 
patch for reference:

commit 16f0013157bf8c95d10b9360491e3c920f85641e
Author: Felix Kuehling <Felix.Kuehling@amd.com>
Date:   Wed Aug 3 14:45:45 2022 -0400

     drm/amdkfd: Allocate doorbells only when needed
     
     Only allocate doorbells when the first queue is created on a GPU or the
     doorbells need to be mapped into CPU or GPU virtual address space. This
     avoids allocating doorbells unnecessarily and can allow more processes
     to use KFD on multi-GPU systems.
     
     Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
     Reviewed-by: Kent Russell <kent.Russell@amd.com>
     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Until we have an alternative way to allocate doorbells that supports 
overcommitment, late doorbell allocation is an important feature. This 
change will cause regressions for some ROCm users.

Regards,
   Felix



>   	/* Starting with GFX11, wptr BOs must be mapped to GART for MES to determine work
>   	 * on unmapped queues for usermode queue oversubscription (no aggregated doorbell)
>   	 */
> @@ -410,7 +404,6 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
>   	if (wptr_bo)
>   		amdgpu_amdkfd_free_gtt_mem(dev->adev, wptr_bo);
>   err_wptr_map_gart:
> -err_alloc_doorbells:
>   err_bind_process:
>   err_pdd:
>   	mutex_unlock(&p->mutex);
> @@ -2163,12 +2156,6 @@ static int criu_restore_devices(struct kfd_process *p,
>   			ret = PTR_ERR(pdd);
>   			goto exit;
>   		}
> -
> -		if (!pdd->doorbell_index &&
> -		    kfd_alloc_process_doorbells(pdd->dev, &pdd->doorbell_index) < 0) {
> -			ret = -ENOMEM;
> -			goto exit;
> -		}
>   	}
>   
>   	/*
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index ecb4c3abc629..855bf8ce3f16 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -371,7 +371,7 @@ static int allocate_doorbell(struct qcm_process_device *qpd,
>   			unsigned int found;
>   
>   			found = find_first_zero_bit(qpd->doorbell_bitmap,
> -						KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
> +						    KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
>   			if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
>   				pr_debug("No doorbells available");
>   				return -EBUSY;
> @@ -381,9 +381,9 @@ static int allocate_doorbell(struct qcm_process_device *qpd,
>   		}
>   	}
>   
> -	q->properties.doorbell_off =
> -		kfd_get_doorbell_dw_offset_in_bar(dev, qpd_to_pdd(qpd),
> -					  q->doorbell_id);
> +	q->properties.doorbell_off = amdgpu_doorbell_index_on_bar(dev->adev,
> +								  qpd->proc_doorbells,
> +								  q->doorbell_id);
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> index 82b4a56b0afc..718cfe9cb4f5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> @@ -227,46 +227,47 @@ uint64_t kfd_get_number_elems(struct kfd_dev *kfd)
>   
>   phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd)
>   {
> -	if (!pdd->doorbell_index) {
> -		int r = kfd_alloc_process_doorbells(pdd->dev,
> -						    &pdd->doorbell_index);
> -		if (r < 0)
> -			return 0;
> -	}
> +	struct amdgpu_device *adev = pdd->dev->adev;
> +	uint32_t first_db_index;
>   
> -	return pdd->dev->doorbell_base +
> -		pdd->doorbell_index * kfd_doorbell_process_slice(pdd->dev);
> +	first_db_index = amdgpu_doorbell_index_on_bar(adev, pdd->qpd.proc_doorbells, 0);
> +	return adev->doorbell.base + first_db_index * sizeof(uint32_t);
>   }
>   
> -int kfd_alloc_process_doorbells(struct kfd_dev *kfd, unsigned int *doorbell_index)
> +int kfd_alloc_process_doorbells(struct kfd_dev *kfd, struct kfd_process_device *pdd)
>   {
> -	int r = 0;
> -
> -	if (!kfd->shared_resources.enable_mes)
> -		r = ida_simple_get(&kfd->doorbell_ida, 1,
> -				   kfd->max_doorbell_slices, GFP_KERNEL);
> -	else
> -		r = amdgpu_mes_alloc_process_doorbells(
> -				(struct amdgpu_device *)kfd->adev,
> -				doorbell_index);
> +	int r;
> +	struct qcm_process_device *qpd = &pdd->qpd;
>   
> -	if (r > 0)
> -		*doorbell_index = r;
> +	/* Allocate bitmap for dynamic doorbell allocation */
> +	qpd->doorbell_bitmap = bitmap_zalloc(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
> +					     GFP_KERNEL);
> +	if (!qpd->doorbell_bitmap) {
> +		DRM_ERROR("Failed to allocate process doorbell bitmap\n");
> +		return -ENOMEM;
> +	}
>   
> -	if (r < 0)
> -		pr_err("Failed to allocate process doorbells\n");
> +	/* Allocate doorbells for this process */
> +	r = amdgpu_bo_create_kernel(kfd->adev,
> +				    kfd_doorbell_process_slice(kfd),
> +				    PAGE_SIZE,
> +				    AMDGPU_GEM_DOMAIN_DOORBELL,
> +				    &qpd->proc_doorbells,
> +				    NULL,
> +				    NULL);
> +	if (r) {
> +		DRM_ERROR("Failed to allocate process doorbells\n");
> +		bitmap_free(qpd->doorbell_bitmap);
> +		return r;
> +	}
>   
> -	return r;
> +	return 0;
>   }
>   
> -void kfd_free_process_doorbells(struct kfd_dev *kfd, unsigned int doorbell_index)
> +void kfd_free_process_doorbells(struct kfd_dev *kfd, struct kfd_process_device *pdd)
>   {
> -	if (doorbell_index) {
> -		if (!kfd->shared_resources.enable_mes)
> -			ida_simple_remove(&kfd->doorbell_ida, doorbell_index);
> -		else
> -			amdgpu_mes_free_process_doorbells(
> -					(struct amdgpu_device *)kfd->adev,
> -					doorbell_index);
> -	}
> +	struct qcm_process_device *qpd = &pdd->qpd;
> +
> +	bitmap_free(qpd->doorbell_bitmap);
> +	amdgpu_bo_free_kernel(&qpd->proc_doorbells, NULL, NULL);
>   }
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 0b199eb6dc88..dfff77379acb 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -661,7 +661,10 @@ struct qcm_process_device {
>   	uint64_t ib_base;
>   	void *ib_kaddr;
>   
> -	/* doorbell resources per process per device */
> +	/* doorbells for kfd process */
> +	struct amdgpu_bo *proc_doorbells;
> +
> +	/* bitmap for dynamic doorbell allocation from the bo */
>   	unsigned long *doorbell_bitmap;
>   };
>   
> @@ -1009,9 +1012,9 @@ unsigned int kfd_get_doorbell_dw_offset_in_bar(struct kfd_dev *kfd,
>   					unsigned int doorbell_id);
>   phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd);
>   int kfd_alloc_process_doorbells(struct kfd_dev *kfd,
> -				unsigned int *doorbell_index);
> +				 struct kfd_process_device *pdd);
>   void kfd_free_process_doorbells(struct kfd_dev *kfd,
> -				unsigned int doorbell_index);
> +				 struct kfd_process_device *pdd);
>   /* GTT Sub-Allocator */
>   
>   int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 51b1683ac5c1..217ff72a97b0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -1037,10 +1037,9 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
>   			free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
>   				get_order(KFD_CWSR_TBA_TMA_SIZE));
>   
> -		bitmap_free(pdd->qpd.doorbell_bitmap);
>   		idr_destroy(&pdd->alloc_idr);
>   
> -		kfd_free_process_doorbells(pdd->dev, pdd->doorbell_index);
> +		kfd_free_process_doorbells(pdd->dev, pdd);
>   
>   		if (pdd->dev->shared_resources.enable_mes)
>   			amdgpu_amdkfd_free_gtt_mem(pdd->dev->adev,
> @@ -1453,11 +1452,6 @@ static int init_doorbell_bitmap(struct qcm_process_device *qpd,
>   	if (!KFD_IS_SOC15(dev))
>   		return 0;
>   
> -	qpd->doorbell_bitmap = bitmap_zalloc(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
> -					     GFP_KERNEL);
> -	if (!qpd->doorbell_bitmap)
> -		return -ENOMEM;
> -
>   	/* Mask out doorbells reserved for SDMA, IH, and VCN on SOC15. */
>   	pr_debug("reserved doorbell 0x%03x - 0x%03x\n", range_start, range_end);
>   	pr_debug("reserved doorbell 0x%03x - 0x%03x\n",
> @@ -1499,9 +1493,15 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>   	if (!pdd)
>   		return NULL;
>   
> +	retval = kfd_alloc_process_doorbells(dev, pdd);
> +	if (retval) {
> +		pr_err("failed to allocate process doorbells\n");
> +		goto err_free_pdd;
> +	}
> +
>   	if (init_doorbell_bitmap(&pdd->qpd, dev)) {
>   		pr_err("Failed to init doorbell for process\n");
> -		goto err_free_pdd;
> +		goto err_free_db;
>   	}
>   
>   	pdd->dev = dev;
> @@ -1529,7 +1529,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>   						false);
>   		if (retval) {
>   			pr_err("failed to allocate process context bo\n");
> -			goto err_free_pdd;
> +			goto err_free_db;
>   		}
>   		memset(pdd->proc_ctx_cpu_ptr, 0, AMDGPU_MES_PROC_CTX_SIZE);
>   	}
> @@ -1541,6 +1541,9 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>   
>   	return pdd;
>   
> +err_free_db:
> +	kfd_free_process_doorbells(pdd->dev, pdd);
> +
>   err_free_pdd:
>   	kfree(pdd);
>   	return NULL;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 5137476ec18e..6c95586f08a6 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -344,17 +344,19 @@ int pqm_create_queue(struct process_queue_manager *pqm,
>   		goto err_create_queue;
>   	}
>   
> -	if (q && p_doorbell_offset_in_process)
> +	if (q && p_doorbell_offset_in_process) {
>   		/* Return the doorbell offset within the doorbell page
>   		 * to the caller so it can be passed up to user mode
>   		 * (in bytes).
> -		 * There are always 1024 doorbells per process, so in case
> -		 * of 8-byte doorbells, there are two doorbell pages per
> -		 * process.
> +		 * relative doorbell index = Absolute doorbell index -
> +		 * absolute index of first doorbell in the page.
>   		 */
> -		*p_doorbell_offset_in_process =
> -			(q->properties.doorbell_off * sizeof(uint32_t)) &
> -			(kfd_doorbell_process_slice(dev) - 1);
> +		uint32_t first_db_index = amdgpu_doorbell_index_on_bar(pdd->dev->adev,
> +									pdd->qpd.proc_doorbells, 0);
> +
> +		*p_doorbell_offset_in_process = (q->properties.doorbell_off
> +						- first_db_index) * sizeof(uint32_t);
> +	}
>   
>   	pr_debug("PQM After DQM create queue\n");
>   
> @@ -858,12 +860,6 @@ int kfd_criu_restore_queue(struct kfd_process *p,
>   		goto exit;
>   	}
>   
> -	if (!pdd->doorbell_index &&
> -	    kfd_alloc_process_doorbells(pdd->dev, &pdd->doorbell_index) < 0) {
> -		ret = -ENOMEM;
> -		goto exit;
> -	}
> -
>   	/* data stored in this order: mqd, ctl_stack */
>   	mqd = q_extra_data;
>   	ctl_stack = mqd + q_data->mqd_size;

  parent reply	other threads:[~2023-04-21 20:11 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 01/12] drm/amdgpu: create a new file for " Shashank Sharma
2023-04-13 10:48   ` Christian König
2023-04-13 10:51     ` Christian König
2023-04-12 16:25 ` [PATCH v2 02/12] drm/amdgpu: don't modify num_doorbells for mes Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 03/12] drm/amdgpu: add UAPI for allocating doorbell memory Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 04/12] drm/amdgpu: accommodate DOMAIN/PL_DOORBELL Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 05/12] drm/amdgpu: initialize ttm for doorbells Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 06/12] drm/amdgpu: create kernel doorbell pages Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 07/12] get absolute offset from doorbell index Shashank Sharma
2023-04-13 10:59   ` Christian König
2023-04-12 16:25 ` [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells Shashank Sharma
2023-04-13 11:02   ` Christian König
2023-04-13 13:08     ` Shashank Sharma
2023-04-21 19:58   ` Felix Kuehling
2023-04-22  6:39     ` Shashank Sharma
2023-04-24 19:56       ` Felix Kuehling
2023-04-25 19:59         ` Shashank Sharma
2023-06-09 19:33           ` Felix Kuehling
2023-04-12 16:25 ` [PATCH v2 09/12] drm/amdgpu: use doorbell manager for kfd process doorbells Shashank Sharma
2023-04-13 11:07   ` Christian König
2023-04-21 20:11   ` Felix Kuehling [this message]
2023-04-22  6:45     ` Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 10/12] drm/amdgpu: remove unused functions and variables Shashank Sharma
2023-06-09 19:34   ` Felix Kuehling
2023-04-12 16:25 ` [PATCH v2 11/12] drm/amdgpu: use doorbell mgr for MES kernel doorbells Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 12/12] drm/amdgpu: cleanup MES process level doorbells Shashank Sharma
2023-04-13 11:19   ` Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b8bf2cc9-6a72-5fce-e19f-cf69e37f382e@amd.com \
    --to=felix.kuehling@amd.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=arvind.yadav@amd.com \
    --cc=christian.koenig@amd.com \
    --cc=contactshashanksharma@gmail.com \
    --cc=mukul.joshi@amd.com \
    --cc=shashank.sharma@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.