From: Boris Brezillon <boris.brezillon@collabora.com>
To: dri-devel@lists.freedesktop.org
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	Steven Price <steven.price@arm.com>,
	Rob Herring <robh+dt@kernel.org>,
	Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>,
	Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH v3 09/15] drm/panfrost: Simplify the reset serialization logic
Date: Fri, 25 Jun 2021 18:22:35 +0200
Message-ID: <20210625182235.13061360@collabora.com>
In-Reply-To: <20210625133327.2598825-10-boris.brezillon@collabora.com>

On Fri, 25 Jun 2021 15:33:21 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:


> @@ -379,57 +370,73 @@ void panfrost_job_enable_interrupts(struct panfrost_device *pfdev)
>  	job_write(pfdev, JOB_INT_MASK, irq_mask);
>  }
>  
> -static bool panfrost_scheduler_stop(struct panfrost_queue_state *queue,
> -				    struct drm_sched_job *bad)
> +static void panfrost_reset(struct panfrost_device *pfdev,
> +			   struct drm_sched_job *bad)
>  {
> -	enum panfrost_queue_status old_status;
> -	bool stopped = false;
> +	unsigned int i;
> +	bool cookie;
>  
> -	mutex_lock(&queue->lock);
> -	old_status = atomic_xchg(&queue->status,
> -				 PANFROST_QUEUE_STATUS_STOPPED);
> -	if (old_status == PANFROST_QUEUE_STATUS_STOPPED)
> -		goto out;
> +	if (WARN_ON(!atomic_read(&pfdev->reset.pending)))
> +		return;
> +
> +	/* Stop the schedulers.
> +	 *
> +	 * FIXME: We temporarily get out of the dma_fence_signalling section
> +	 * because the cleanup path generate lockdep splats when taking locks
> +	 * to release job resources. We should rework the code to follow this
> +	 * pattern:
> +	 *
> +	 *	try_lock
> +	 *	if (locked)
> +	 *		release
> +	 *	else
> +	 *		schedule_work_to_release_later
> +	 */
> +	for (i = 0; i < NUM_JOB_SLOTS; i++)
> +		drm_sched_stop(&pfdev->js->queue[i].sched, bad);
> +
> +	cookie = dma_fence_begin_signalling();
>  
> -	WARN_ON(old_status != PANFROST_QUEUE_STATUS_ACTIVE);
> -	drm_sched_stop(&queue->sched, bad);
>  	if (bad)
>  		drm_sched_increase_karma(bad);
>  
> -	stopped = true;
> +	spin_lock(&pfdev->js->job_lock);
> +	for (i = 0; i < NUM_JOB_SLOTS; i++) {
> +		if (pfdev->jobs[i]) {
> +			pm_runtime_put_noidle(pfdev->dev);
> +			panfrost_devfreq_record_idle(&pfdev->pfdevfreq);
> +			pfdev->jobs[i] = NULL;
> +		}
> +	}
> +	spin_unlock(&pfdev->js->job_lock);
>  
> -	/*
> -	 * Set the timeout to max so the timer doesn't get started
> -	 * when we return from the timeout handler (restored in
> -	 * panfrost_scheduler_start()).
> +	panfrost_device_reset(pfdev);
> +
> +	/* GPU has been reset, we can cancel timeout/fault work that may have
> +	 * been queued in the meantime and clear the reset pending bit.
>  	 */
> -	queue->sched.timeout = MAX_SCHEDULE_TIMEOUT;
> +	atomic_set(&pfdev->reset.pending, 0);
> +	cancel_work_sync(&pfdev->reset.work);

This introduces a deadlock: panfrost_reset() might be called from the
reset handler, and cancel_work_sync() waits for the handler to return.
Unfortunately there's no cancel_work() variant, so I'll just remove the

	WARN_ON(!atomic_read(&pfdev->reset.pending))

and return directly when the pending bit is cleared.
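
To illustrate the idea, here is a minimal userspace sketch, not the
actual driver change. It assumes the cancel_work_sync() call is dropped
as well, so that a reset work item queued in the meantime simply runs,
sees the pending bit cleared and returns. struct reset_state, its
resets_done counter and do_reset() are hypothetical stand-ins for
pfdev->reset and panfrost_reset(); C11 atomics stand in for the kernel
atomic_t helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reset state kept in struct panfrost_device. */
struct reset_state {
	atomic_int pending;	/* non-zero while a reset has been requested */
	int resets_done;	/* illustration only: resets actually performed */
};

/*
 * Models the entry and exit of panfrost_reset(): return quietly when no
 * reset is pending instead of warning, and clear the pending bit once the
 * reset is done, without waiting for any other queued work item.
 */
static bool do_reset(struct reset_state *rs)
{
	/* A stale work item lands here after the bit was cleared. */
	if (!atomic_load(&rs->pending))
		return false;

	rs->resets_done++;		/* stands in for panfrost_device_reset() */
	atomic_store(&rs->pending, 0);	/* reset done, clear the pending bit */
	return true;
}
```

With pending set to 1, the first do_reset() call performs the reset and
a second call (the stale work item) returns without doing anything, so
nothing ever has to wait for the handler from inside the handler.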

> +	for (i = 0; i < NUM_JOB_SLOTS; i++)
> +		cancel_delayed_work(&pfdev->js->queue[i].sched.work_tdr);
>  
> -out:
> -	mutex_unlock(&queue->lock);
>  
> -	return stopped;
> -}
> +	/* Now resubmit jobs that were previously queued but didn't have a
> +	 * chance to finish.
> +	 * FIXME: We temporarily get out of the DMA fence signalling section
> +	 * while resubmitting jobs because the job submission logic will
> +	 * allocate memory with the GFP_KERNEL flag which can trigger memory
> +	 * reclaim and exposes a lock ordering issue.
> +	 */
> +	dma_fence_end_signalling(cookie);
> +	for (i = 0; i < NUM_JOB_SLOTS; i++)
> +		drm_sched_resubmit_jobs(&pfdev->js->queue[i].sched);
> +	cookie = dma_fence_begin_signalling();
>  
> -static void panfrost_scheduler_start(struct panfrost_queue_state *queue)
> -{
> -	enum panfrost_queue_status old_status;
> +	for (i = 0; i < NUM_JOB_SLOTS; i++)
> +		drm_sched_start(&pfdev->js->queue[i].sched, true);
>  
> -	mutex_lock(&queue->lock);
> -	old_status = atomic_xchg(&queue->status,
> -				 PANFROST_QUEUE_STATUS_STARTING);
> -	WARN_ON(old_status != PANFROST_QUEUE_STATUS_STOPPED);
> -
> -	/* Restore the original timeout before starting the scheduler. */
> -	queue->sched.timeout = msecs_to_jiffies(JOB_TIMEOUT_MS);
> -	drm_sched_resubmit_jobs(&queue->sched);
> -	drm_sched_start(&queue->sched, true);
> -	old_status = atomic_xchg(&queue->status,
> -				 PANFROST_QUEUE_STATUS_ACTIVE);
> -	if (old_status == PANFROST_QUEUE_STATUS_FAULT_PENDING)
> -		drm_sched_fault(&queue->sched);
> -
> -	mutex_unlock(&queue->lock);
> +	dma_fence_end_signalling(cookie);
>  }
>  
