intel-xe.lists.freedesktop.org archive mirror
From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
	Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Subject: Re: [PATCH v2 2/2] drm/xe/oa: Fix locking for stream->pollin
Date: Wed, 22 Jan 2025 19:04:21 -0800	[thread overview]
Message-ID: <85frlafcy2.wl-ashutosh.dixit@intel.com> (raw)
In-Reply-To: <Z5DHG7-R3pRmSmvx@intel.com>

On Wed, 22 Jan 2025 02:23:23 -0800, Rodrigo Vivi wrote:
>

Hi Rodrigo,

Thanks for your input; it helped me look at the code more closely and
clarify some things.

I have responded to your comments below, but in case you don't want to go
through all of that, the summary is that I would like to drop this patch
from consideration for now, because it needs further changes.

> On Tue, Jan 21, 2025 at 08:02:04PM -0800, Ashutosh Dixit wrote:
> > Previously locking was skipped for stream->pollin. This "mostly" worked
> > because pollin is u32/bool, even when stream->pollin is accessed
> > concurrently in multiple threads. However, with stream->pollin moving under
> > stream->oa_buffer.ptr_lock in this series, implement the correct way to
> > access stream->pollin, which is to access it under
> > stream->oa_buffer.ptr_lock.
> >
> > v2: Update commit message to explain the "why" of this change (Rodrigo)
> >     Document the change in scope for stream->oa_buffer.ptr_lock (Rodrigo)
>
> First of all thanks for the rework.
> But I believe I didn't have enough coffee today yet, because
> I'm still failing to understand why...
>
> Breaking your explanation to see if I can understand:
> 'mostly' - Why mostly? Did we face bugs?
>
> 'worked because pollin is u32/bool' - this sounds like 'works by luck'

No, there are no known bugs in the current kernel code. That is what
'mostly' is meant to convey: the requirements on this code are fairly lax,
and even if writes to the boolean stream->pollin from different threads
cross or stomp on each other, things still work out. So it is not 'works by
luck' or 'works most of the time'; the code works even in the absence of
this locking.
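To make the point concrete, here is a hypothetical sketch of the pattern
being discussed (illustrative only, not the actual driver code): two
contexts touching a word-sized flag without a shared lock.

```c
/* Writer side (e.g. the hrtimer callback), no lock held: a store
 * to a word-sized flag cannot tear, so the worst case is a spurious
 * or missed wakeup, which poll() semantics tolerate.
 */
stream->pollin = true;
wake_up(&stream->poll_wq);

/* Reader side (e.g. xe_oa_poll_locked()), no lock held */
if (stream->pollin)
	events |= EPOLLIN;
```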

>
> 'with stream->pollin moving under stream->oa_buffer.ptr_lock' - Why?
>
> I believe this is the main why that I had yesterday and that continues
> today. Why are we using the oa_buffer pointer lock to also protect
> a stream variable?
>
> Why don't you use the stream_lock? Or why don't you create a dedicated
> polling_lock?
>

So, stream->oa_buffer.ptr_lock _is_ the correct lock to use for
'stream->pollin', because 'stream->pollin' is intimately connected to the
same pointers that 'ptr_lock' protects. You can see this in Patch 1.

Also, there is basically one 'struct xe_oa_buffer' per stream (and
stream_lock is a mutex, which can't be used from interrupt context).

So rather than using a stream level lock, we can just move pollin into
stream->oa_buffer. So it will become stream->oa_buffer.pollin.
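As a rough sketch of what that would look like (hypothetical, not the
actual follow-up patch; unrelated fields are elided):

```c
/* Sketch: pollin moves from struct xe_oa_stream into struct
 * xe_oa_buffer, so ptr_lock (a spinlock, safe to take from the
 * hrtimer callback) naturally covers it along with the head/tail
 * pointers it already protects.
 */
struct xe_oa_buffer {
	/** @ptr_lock: protects head/tail pointers and @pollin */
	spinlock_t ptr_lock;
	/** @head: cached head to read from */
	u32 head;
	/** @tail: last verified tail */
	u32 tail;
	/** @pollin: data is available to read */
	bool pollin;
};
```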

> I'm sorry for not being clear yesterday, wasting your time and 1 cycle...

No, thank you for your comments; they made me think through all of this.

> > @@ -562,8 +563,10 @@ static ssize_t xe_oa_read(struct file *file, char __user *buf,
> >	 * Also in case of -EIO, we have already waited for data before returning
> >	 * -EIO, so need to wait again
> >	 */
> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> >	if (ret != -ENOSPC && ret != -EIO)
> >		stream->pollin = false;
> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);

Also, this is the part I don't like: this should be moved into
xe_oa_append_reports(), where stream->oa_buffer.ptr_lock is already
held. Otherwise we are dropping and re-grabbing the lock.
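Roughly, what I have in mind is something like this (a sketch only; the
helper name and the function signature are assumptions, not the actual
code):

```c
/* Sketch: clear pollin inside xe_oa_append_reports(), where
 * oa_buffer.ptr_lock is already held for the head/tail update,
 * instead of dropping and re-taking the lock in xe_oa_read().
 */
static ssize_t xe_oa_append_reports(struct xe_oa_stream *stream,
				    char __user *buf, size_t count,
				    size_t *offset)
{
	unsigned long flags;
	ssize_t ret;

	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
	ret = __xe_oa_copy_reports(stream, buf, count, offset); /* hypothetical */
	if (ret != -ENOSPC && ret != -EIO)
		stream->oa_buffer.pollin = false; /* all pending data consumed */
	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);

	return ret;
}
```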

So IMO the basic direction of this patch is correct, but I want to make all
these changes and resubmit them as part of a separate series. So please
ignore this patch for now. I will go ahead and merge Patch 1, since that is
fine and was already reviewed as a separate series:

https://patchwork.freedesktop.org/series/143575/

Thanks.
--
Ashutosh



> >
> >	/* Possible values for ret are 0, -EFAULT, -ENOSPC, -EIO, -EINVAL, ... */
> >	return offset ?: (ret ?: -EAGAIN);
> > @@ -573,6 +576,7 @@ static __poll_t xe_oa_poll_locked(struct xe_oa_stream *stream,
> >				  struct file *file, poll_table *wait)
> >  {
> >	__poll_t events = 0;
> > +	unsigned long flags;
> >
> >	poll_wait(file, &stream->poll_wq, wait);
> >
> > @@ -582,8 +586,10 @@ static __poll_t xe_oa_poll_locked(struct xe_oa_stream *stream,
> >	 * in use. We rely on hrtimer xe_oa_poll_check_timer_cb to notify us when there
> >	 * are samples to read
> >	 */
> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> >	if (stream->pollin)
> >		events |= EPOLLIN;
> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> >
> >	return events;
> >  }
> > diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
> > index 52e33c37d5ee8..5c4ea13f646fc 100644
> > --- a/drivers/gpu/drm/xe/xe_oa_types.h
> > +++ b/drivers/gpu/drm/xe/xe_oa_types.h
> > @@ -159,7 +159,7 @@ struct xe_oa_buffer {
> >	/** @vaddr: mapped vaddr of the OA buffer */
> >	u8 *vaddr;
> >
> > -	/** @ptr_lock: Lock protecting reads/writes to head/tail pointers */
> > +	/** @ptr_lock: Lock protecting reads/writes to head/tail pointers and stream->pollin */
> >	spinlock_t ptr_lock;
> >
> >	/** @head: Cached head to read from */
> > --
> > 2.47.1
> >

