Re: [PATCH v2 2/2] drm/panfrost: Queue jobs on the hardware

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Boris Brezillon <boris.brezillon@collabora.com>
To: Steven Price <steven.price@arm.com>
Cc: dri-devel@lists.freedesktop.org, Rob Herring <robh+dt@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>,
	Tomeu Vizoso <tomeu.vizoso@collabora.com>
Subject: Re: [PATCH v2 2/2] drm/panfrost: Queue jobs on the hardware
Date: Thu, 24 Jun 2021 11:56:32 +0200	[thread overview]
Message-ID: <20210624115632.6ac8c073@collabora.com> (raw)
In-Reply-To: <c797d32a-1d0f-3d67-d7cd-fc9f9522d741@arm.com>

On Thu, 24 Jun 2021 10:23:51 +0100
Steven Price <steven.price@arm.com> wrote:

> >  static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status)
> >  {
> > -	int j;
> > +	struct panfrost_job *done[NUM_JOB_SLOTS][2] = {};
> > +	struct panfrost_job *failed[NUM_JOB_SLOTS] = {};
> > +	u32 js_state, js_events = 0;
> > +	unsigned int i, j;
> >  
> > -	dev_dbg(pfdev->dev, "jobslot irq status=%x\n", status);
> > +	while (status) {
> > +		for (j = 0; j < NUM_JOB_SLOTS; j++) {
> > +			if (status & JOB_INT_MASK_DONE(j)) {
> > +				if (done[j][0]) {
> > +					done[j][1] = panfrost_dequeue_job(pfdev, j);
> > +					WARN_ON(!done[j][1]);
> > +				} else {
> > +					done[j][0] = panfrost_dequeue_job(pfdev, j);
> > +					WARN_ON(!done[j][0]);  
> 
> NIT: I'd be tempted to move this WARN_ON into panfrost_dequeue_job() as
> it's relevant for any call to the function.

Makes sense. I'll move those WARN_ON()s.

> 
> > +				}
> > +			}
> >  
> > -	for (j = 0; status; j++) {
> > -		u32 mask = MK_JS_MASK(j);
> > +			if (status & JOB_INT_MASK_ERR(j)) {
> > +				/* Cancel the next submission. Will be submitted
> > +				 * after we're done handling this failure if
> > +				 * there's no reset pending.
> > +				 */
> > +				job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP);
> > +				failed[j] = panfrost_dequeue_job(pfdev, j);
> > +			}
> > +		}
> >  
> > -		if (!(status & mask))
> > +		/* JS_STATE is sampled when JOB_INT_CLEAR is written.
> > +		 * For each BIT(slot) or BIT(slot + 16) bit written to
> > +		 * JOB_INT_CLEAR, the corresponding bits in JS_STATE
> > +		 * (BIT(slot) and BIT(slot + 16)) are updated, but this
> > +		 * is racy. If we only have one job done at the time we
> > +		 * read JOB_INT_RAWSTAT but the second job fails before we
> > +		 * clear the status, we end up with a status containing
> > +		 * only the DONE bit and consider both jobs as DONE since
> > +		 * JS_STATE reports both NEXT and CURRENT as inactive.
> > +		 * To prevent that, let's repeat this clear+read steps
> > +		 * until status is 0.
> > +		 */
> > +		job_write(pfdev, JOB_INT_CLEAR, status);
> > +		js_state = job_read(pfdev, JOB_INT_JS_STATE);  
> 
> This seems a bit dodgy. The spec says that JOB_INT_JS_STATE[1] is
> updated only for the job slots which have bits set in the JOB_INT_CLEAR.
> So there's potentially two problems:
> 
>  * The spec makes no gaurentee about the values of the bits for other
> slots. But we're not masking off those bits.
> 
>  * If we loop (e.g. because the other slot finishes while handling the
> first interrupt) then we may lose the state for the first slot.
> 
> I'm not sure what the actual hardware returns in the bits which are
> unrelated to the previous JOB_INT_CLEAR - kbase is careful only to
> consider the bits relating to the slot it's currently dealing with.

Hm, I see. How about something like that?

        struct panfrost_job *done[NUM_JOB_SLOTS][2] = {};
        struct panfrost_job *failed[NUM_JOB_SLOTS] = {};
        u32 js_state = 0, js_events = 0;
        unsigned int i, j;

        while (status) {
                u32 js_state_mask = 0;

                for (j = 0; j < NUM_JOB_SLOTS; j++) {
                        if (status & MK_JS_MASK(j))
                                js_state_mask |= MK_JS_MASK(j);

                        if (status & JOB_INT_MASK_DONE(j)) {
                                if (done[j][0]) {
                                        done[j][1] = panfrost_dequeue_job(pfdev, j);
                                        WARN_ON(!done[j][1]);
                                } else {
                                        done[j][0] = panfrost_dequeue_job(pfdev, j);
                                        WARN_ON(!done[j][0]);
                                }
                        }

                        if (status & JOB_INT_MASK_ERR(j)) {
                                /* Cancel the next submission. Will be submitted
                                 * after we're done handling this failure if
                                 * there's no reset pending.
                                 */
                                job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP);
                                failed[j] = panfrost_dequeue_job(pfdev, j);
                        }
                }

                /* JS_STATE is sampled when JOB_INT_CLEAR is written.
                 * For each BIT(slot) or BIT(slot + 16) bit written to
                 * JOB_INT_CLEAR, the corresponding bits in JS_STATE
                 * (BIT(slot) and BIT(slot + 16)) are updated, but this
                 * is racy. If we only have one job done at the time we
                 * read JOB_INT_RAWSTAT but the second job fails before we
                 * clear the status, we end up with a status containing
                 * only the DONE bit and consider both jobs as DONE since
                 * JS_STATE reports both NEXT and CURRENT as inactive.
                 * To prevent that, let's repeat this clear+read steps
                 * until status is 0.
                 */
                job_write(pfdev, JOB_INT_CLEAR, status);
                js_state &= ~js_state_mask;
                js_state |= job_read(pfdev, JOB_INT_JS_STATE) & js_state_mask;
                js_events |= status;
                status = job_read(pfdev, JOB_INT_RAWSTAT);
        }


> 
> [1] Actually JOB_IRQ_JS_STATE - I sometimes think we should rename to
> match the spec, I keep searching the docs for the wrong name ;)

I can add a patch doing that :-).

> >  
> > -		status &= ~mask;
> > +	for (j = 0; j < NUM_JOB_SLOTS; j++) {
> > +		if (!(js_events & MK_JS_MASK(j)))
> > +			continue;
> > +
> > +		if (!failed[j] || !pfdev->jobs[j][0])
> > +			continue;
> > +
> > +		if (pfdev->jobs[j][0]->jc == 0) {
> > +			/* The job was cancelled, signal the fence now */
> > +			struct panfrost_job *canceled = panfrost_dequeue_job(pfdev, j);
> > +
> > +			panfrost_mmu_as_put(pfdev, canceled->file_priv->mmu);
> > +			panfrost_devfreq_record_idle(&pfdev->pfdevfreq);
> > +
> > +			dma_fence_set_error(canceled->done_fence, -ECANCELED);
> > +			dma_fence_signal_locked(canceled->done_fence);
> > +			pm_runtime_put_autosuspend(pfdev->dev);
> > +		} else if (!atomic_read(&pfdev->reset.pending)) {
> > +			/* Resume the job we stopped if no reset is pending */
> > +			job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_START);  
> 
> If I'm following correctly this is resubmitting a soft-stopped job. But
> I haven't seen where the new JC pointer is written.

Not exactly. It's submitting a job that was queued but not started.
When we see a fault, we write NOP to COMMAND_NEXT so the next job is
not started when the interrupt is cleared. And here we're just
requeuing it.

next prev parent reply	other threads:[~2021-06-24  9:56 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-21 14:02 [PATCH v2 0/2] drm/panfrost: Queue jobs on the hardware Boris Brezillon
2021-06-21 14:02 ` [PATCH v2 1/2] drm/panfrost: Use a threaded IRQ for job interrupts Boris Brezillon
2021-06-21 16:09   ` Steven Price
2021-06-24  9:05     ` Steven Price
2021-06-21 14:02 ` [PATCH v2 2/2] drm/panfrost: Queue jobs on the hardware Boris Brezillon
2021-06-21 16:08   ` Steven Price
2021-06-21 18:20     ` Alyssa Rosenzweig
2021-06-22  6:42       ` Boris Brezillon
2021-06-22  6:50     ` Boris Brezillon
2021-06-24  9:23   ` Steven Price
2021-06-24  9:56     ` Boris Brezillon [this message]
2021-06-24 10:10       ` Steven Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210624115632.6ac8c073@collabora.com \
    --to=boris.brezillon@collabora.com \
    --cc=alyssa.rosenzweig@collabora.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=robh+dt@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=steven.price@arm.com \
    --cc=tomeu.vizoso@collabora.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.