Re: [PATCH 9/9] habanalabs: allow multiple processes to open FD

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Greg KH <gregkh@linuxfoundation.org>
To: Oded Gabbay <oded.gabbay@gmail.com>
Cc: linux-kernel@vger.kernel.org, oshpigelman@habana.ai, ttayar@habana.ai
Subject: Re: [PATCH 9/9] habanalabs: allow multiple processes to open FD
Date: Sun, 28 Jul 2019 13:44:35 +0200	[thread overview]
Message-ID: <20190728114435.GB12033@kroah.com> (raw)
In-Reply-To: <20190728112818.30397-10-oded.gabbay@gmail.com>

On Sun, Jul 28, 2019 at 02:28:18PM +0300, Oded Gabbay wrote:
> This patch removes the limitation of a single process that can open the
> device.
> 
> Now, there is no limitation on the number of processes that can open the
> device and have a valid FD.
> 
> However, only a single process can perform compute operations. This is
> enforced by allowing only a single process to have a compute context.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
> ---
>  drivers/misc/habanalabs/context.c          | 100 +++++++++++++++------
>  drivers/misc/habanalabs/device.c           |  18 ++--
>  drivers/misc/habanalabs/habanalabs.h       |   1 -
>  drivers/misc/habanalabs/habanalabs_drv.c   |   8 --
>  drivers/misc/habanalabs/habanalabs_ioctl.c |   7 +-
>  5 files changed, 85 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/misc/habanalabs/context.c b/drivers/misc/habanalabs/context.c
> index 57bbe59da9b6..f64220fc3a55 100644
> --- a/drivers/misc/habanalabs/context.c
> +++ b/drivers/misc/habanalabs/context.c
> @@ -56,7 +56,7 @@ void hl_ctx_do_release(struct kref *ref)
>  	kfree(ctx);
>  }
>  
> -int hl_ctx_create(struct hl_device *hdev, struct hl_fpriv *hpriv)
> +static int hl_ctx_create(struct hl_device *hdev, struct hl_fpriv *hpriv)
>  {
>  	struct hl_ctx_mgr *mgr = &hpriv->ctx_mgr;
>  	struct hl_ctx *ctx;
> @@ -89,9 +89,6 @@ int hl_ctx_create(struct hl_device *hdev, struct hl_fpriv *hpriv)
>  	/* TODO: remove for multiple contexts per process */
>  	hpriv->ctx = ctx;
>  
> -	/* TODO: remove the following line for multiple process support */
> -	hdev->compute_ctx = ctx;
> -
>  	return 0;
>  
>  remove_from_idr:
> @@ -206,13 +203,22 @@ bool hl_ctx_is_valid(struct hl_fpriv *hpriv, bool requires_compute_ctx)
>  	int rc;
>  
>  	/* First thing, to minimize latency impact, check if context exists.
> -	 * Also check if it matches the requirements. If so, exit immediately
> +	 * This is relevant for the "steady state", where a process context
> +	 * already exists, and we want to minimize the latency in command
> +	 * submissions. In that case, we want to see if we can quickly exit
> +	 * with a valid answer.
> +	 *
> +	 * If a context doesn't exists, we must grab the mutex. Otherwise,
> +	 * there can be nasty races in case of multi-threaded application.
> +	 *
> +	 * So, if the context exists and we don't need a compute context,
> +	 * that's fine. If it exists and the context we have is the compute
> +	 * context, that's also fine. Other then that, we can't check anything
> +	 * without the mutex.
>  	 */
> -	if (hpriv->ctx) {
> -		if ((requires_compute_ctx) && (hdev->compute_ctx != hpriv->ctx))
> -			return false;
> +	if ((hpriv->ctx) && ((!requires_compute_ctx) ||
> +					(hdev->compute_ctx == hpriv->ctx)))
>  		return true;
> -	}
>  
>  	mutex_lock(&hdev->lazy_ctx_creation_lock);
>  
> @@ -222,35 +228,73 @@ bool hl_ctx_is_valid(struct hl_fpriv *hpriv, bool requires_compute_ctx)
>  	 * creation of a context
>  	 */
>  	if (hpriv->ctx) {
> -		if ((requires_compute_ctx) && (hdev->compute_ctx != hpriv->ctx))
> +		if ((!requires_compute_ctx) ||
> +					(hdev->compute_ctx == hpriv->ctx))
> +			goto unlock_mutex;
> +
> +		if (hdev->compute_ctx) {
>  			valid = false;
> -		goto unlock_mutex;
> -	}
> +			goto unlock_mutex;
> +		}
>  
> -	/* If we already have a compute context, there is no point
> -	 * of creating one in case we are called from ioctl that needs
> -	 * a compute context
> -	 */
> -	if ((hdev->compute_ctx) && (requires_compute_ctx)) {
> +		/* If we reached here, it means we have a non-compute context,
> +		 * but there is no compute context on the device. Therefore,
> +		 * we can try to "upgrade" the existing context to a compute
> +		 * context
> +		 */
> +		dev_dbg_ratelimited(hdev->dev,
> +				"Non-compute context %d exists\n",
> +				hpriv->ctx->asid);
> +
> +	} else if ((hdev->compute_ctx) && (requires_compute_ctx)) {
> +
> +		/* If we already have a compute context in the device, there is
> +		 * no point of creating one in case we are called from ioctl
> +		 * that needs a compute context
> +		 */
>  		dev_err(hdev->dev,
>  			"Can't create new compute context as one already exists\n");
>  		valid = false;
>  		goto unlock_mutex;
> -	}
> +	} else {
> +		/* If we reached here it is because there isn't a context for
> +		 * the process AND there is no compute context or compute
> +		 * context wasn't required. In any case, must create a context
> +		 * for the process
> +		 */
>  
> -	rc = hl_ctx_create(hdev, hpriv);
> -	if (rc) {
> -		dev_err(hdev->dev, "Failed to create context %d\n", rc);
> -		valid = false;
> -		goto unlock_mutex;
> +		rc = hl_ctx_create(hdev, hpriv);
> +		if (rc) {
> +			dev_err(hdev->dev, "Failed to create context %d\n", rc);
> +			valid = false;
> +			goto unlock_mutex;
> +		}
> +
> +		dev_dbg_ratelimited(hdev->dev, "Created context %d\n",
> +					hpriv->ctx->asid);
>  	}
>  
> -	/* Device is IDLE at this point so it is legal to change PLLs.
> -	 * There is no need to check anything because if the PLL is
> -	 * already HIGH, the set function will return without doing
> -	 * anything
> +	/* If we reached here then either we have a new context, or we can
> +	 * upgrade a non-compute context to a compute context. Do the upgrade
> +	 * only if the caller required a compute context
>  	 */
> -	hl_device_set_frequency(hdev, PLL_HIGH);
> +	if (requires_compute_ctx) {
> +		WARN(hdev->compute_ctx,
> +			"Compute context exists but driver is setting a new one");

This will trigger syzbot and will reboot machines that have
'panic-on-warn' set (i.e. all cloud systems).  So be _VERY_ careful
about this.

If a user can trigger this, do not use WARN(), that's not what it is
for.

thanks,

greg k-h

next prev parent reply	other threads:[~2019-07-28 11:44 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-28 11:28 [PATCH 0/9] habanalabs: support open device by multiple processes Oded Gabbay
2019-07-28 11:28 ` [PATCH 1/9] habanalabs: add handle field to context structure Oded Gabbay
2019-07-28 11:28 ` [PATCH 2/9] habanalabs: verify context is valid in IOCTLs Oded Gabbay
2019-07-28 11:28 ` [PATCH 3/9] habanalabs: create context in lazy mode Oded Gabbay
2019-07-28 11:28 ` [PATCH 4/9] habanalabs: don't change frequency if user context is valid Oded Gabbay
2019-07-28 11:28 ` [PATCH 5/9] habanalabs: maintain a list of file private data objects Oded Gabbay
2019-07-28 11:28 ` [PATCH 6/9] habanalabs: define user context as compute context Oded Gabbay
2019-07-28 11:28 ` [PATCH 7/9] habanalabs: protect only pointer dereference in hard-reset Oded Gabbay
2019-07-28 11:28 ` [PATCH 8/9] habanalabs: kill user process after CS rollback Oded Gabbay
2019-07-28 11:28 ` [PATCH 9/9] habanalabs: allow multiple processes to open FD Oded Gabbay
2019-07-28 11:44   ` Greg KH [this message]
2019-07-28 11:56     ` Oded Gabbay
2019-07-28 12:04       ` Greg KH
2019-07-28 12:06         ` Oded Gabbay
2019-07-28 12:12           ` Greg KH
2019-07-28 12:18             ` Oded Gabbay

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190728114435.GB12033@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oded.gabbay@gmail.com \
    --cc=oshpigelman@habana.ai \
    --cc=ttayar@habana.ai \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.