linux-block.vger.kernel.org archive mirror
From: Luis Chamberlain <mcgrof@kernel.org>
To: Jan Kara <jack@suse.cz>
Cc: Bart Van Assche <bvanassche@acm.org>,
	Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org
Subject: Re: [PATCH] blktrace: Avoid sparse warnings when assigning q->blk_trace
Date: Fri, 29 May 2020 12:22:16 +0000
Message-ID: <20200529122216.GF11244@42.do-not-panic.com>
In-Reply-To: <20200529121114.GR14550@quack2.suse.cz>

On Fri, May 29, 2020 at 02:11:14PM +0200, Jan Kara wrote:
> On Fri 29-05-20 11:43:00, Luis Chamberlain wrote:
> > On Fri, May 29, 2020 at 11:04:48AM +0200, Jan Kara wrote:
> > > On Fri 29-05-20 08:00:56, Luis Chamberlain wrote:
> > > > On Thu, May 28, 2020 at 08:55:39PM +0200, Jan Kara wrote:
> > > > > On Thu 28-05-20 18:43:33, Luis Chamberlain wrote:
> > > > > > On Thu, May 28, 2020 at 08:31:52PM +0200, Jan Kara wrote:
> > > > > > > On Thu 28-05-20 07:44:38, Bart Van Assche wrote:
> > > > > > > > (+Luis)
> > > > > > > > 
> > > > > > > > On 2020-05-28 02:29, Jan Kara wrote:
> > > > > > > > > Mostly for historical reasons, q->blk_trace is assigned through xchg()
> > > > > > > > > and cmpxchg() atomic operations. Although this is correct, sparse
> > > > > > > > > complains about this because it violates rcu annotations. Furthermore
> > > > > > > > > there's no real need for atomic operations anymore since all changes to
> > > > > > > > > q->blk_trace happen under q->blk_trace_mutex. So let's just replace
> > > > > > > > > xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and
> > > > > > > > > rcu_assign_pointer(). This makes the code more efficient and sparse
> > > > > > > > > happy.
> > > > > > > > > 
> > > > > > > > > Reported-by: kbuild test robot <lkp@intel.com>
> > > > > > > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > > > > > > 
> > > > > > > > How about adding a reference to commit c780e86dd48e ("blktrace: Protect
> > > > > > > > q->blk_trace with RCU") in the description of this patch?
> > > > > > > 
> > > > > > > Yes, that's probably a good idea.
> > > > > > > 
> > > > > > > > > @@ -1669,10 +1672,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
> > > > > > > > >  
> > > > > > > > >  	blk_trace_setup_lba(bt, bdev);
> > > > > > > > >  
> > > > > > > > > -	ret = -EBUSY;
> > > > > > > > > -	if (cmpxchg(&q->blk_trace, NULL, bt))
> > > > > > > > > -		goto free_bt;
> > > > > > > > > -
> > > > > > > > > +	rcu_assign_pointer(q->blk_trace, bt);
> > > > > > > > >  	get_probe_ref();
> > > > > > > > >  	return 0;
> > > > > > > > 
> > > > > > > > This changes a conditional assignment of q->blk_trace into an
> > > > > > > > unconditional assignment. Shouldn't q->blk_trace only be assigned if
> > > > > > > > q->blk_trace == NULL?
> > > > > > > 
> > > > > > > Yes but both callers of blk_trace_setup_queue() actually check that
> > > > > > > q->blk_trace is NULL before calling blk_trace_setup_queue() and since we
> > > > > > > hold blk_trace_mutex all the time, the value of q->blk_trace cannot change.
> > > > > > > So the conditional assignment was just bogus.
> > > > > > 
> > > > > > If you run a blktrace against a different partition the check does have
> > > > > > an effect today. This is because the request_queue is shared between
> > > > > > partitions implicitly, even though they end up using a different struct
> > > > > > dentry. So the check is actually still needed, however my change adds
> > > > > > this check early as well so we don't do a memory allocation just to
> > > > > > throw it away.
> > > > > 
> > > > > I'm not sure we are speaking about the same check but I might be missing
> > > > > something. blk_trace_setup_queue() is only called from
> > > > > sysfs_blk_trace_attr_store(). That does:
> > > > > 
> > > > >         mutex_lock(&q->blk_trace_mutex);
> > > > > 
> > > > >         bt = rcu_dereference_protected(q->blk_trace,
> > > > >                                        lockdep_is_held(&q->blk_trace_mutex));
> > > > >         if (attr == &dev_attr_enable) {
> > > > >                 if (!!value == !!bt) {
> > > > >                         ret = 0;
> > > > >                         goto out_unlock_bdev;
> > > > >                 }
> > > > > 
> > > > > 		^^^ So if 'bt' is non-NULL, and we are enabling, we bail
> > > > > instead of calling blk_trace_setup_queue().
> > > > > 
> > > > > Similarly later:
> > > > > 
> > > > >         if (bt == NULL) {
> > > > >                 ret = blk_trace_setup_queue(q, bdev);
> > > > > 	...
> > > > > so we again call blk_trace_setup_queue() only if bt is NULL. So IMO the
> > > > > cmpxchg() in blk_trace_setup_queue() could never fail to set the value.
> > > > > Am I missing something?
> > > > 
> > > > I believe we are talking about the same check indeed. Consider the
> > > > situation not as a race, but instead consider the state machine of
> > > > the ioctl. The BLKTRACESETUP goes first, and when that is over we
> > > > have not run BLKTRACESTART. So, prior to BLKTRACESTART we can have
> > > > another BLKTRACESETUP run but against another partition.
> > > 
> > > So first note that BLKTRACESETUP goes through do_blk_trace_setup() while
> > > 'echo 1 >/sys/block/../trace/enable' goes through blk_trace_setup_queue().
> > > Although these operations achieve very similar things, they are completely
> > > separate code paths. I was speaking about the second case while you are now
> > > speaking about the first one.
> > > 
> > > WRT your BLKTRACESETUP example, the first BLKTRACESETUP will end up
> > > setting q->blk_trace to 'bt' so the second BLKTRACESETUP will see
> > > q->blk_trace is not NULL (my patch adds this check to do_blk_trace_setup()
> > > so we bail out earlier than during cmpxchg()) and fails. Again I don't see
> > > any problem here...
> > 
> > Ah, the patch I was CC'd on didn't contain this hunk! It only had the
> > change from cmpxchg() to the rcu_assign_pointer(), so I misunderstood
> > your intention, sorry!
> 
> Good that we are on the same page now :)

Yay!

> > In that case, I already proposed a patch to do that, and it also adds
> > a tiny bit of verbiage given we currently don't inform the user about
> > why this fails [0].
> 
> Honestly, I'm not sure pr_warn() you've added is that useful. We usually
> don't spam logs due to someone trying to use already used resource. But
> anyway, I can see other people are fine with that so I don't insist.

Well I would typically agree... however... 

It is not documented in any way, shape or form, not even in the
blktrace documentation, that the request_queue, and therefore blktrace,
is shared between partitions. Likewise for scsi-generic and, say, its
respective block device for TYPE_BLOCK.

If it is not obvious to some developers, it won't be obvious to users.
So *why* this fails today really is a mystery to users.

These limitations of the blktrace design are not well documented
at all.
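
To make the surprise concrete, here is a minimal userspace sketch of the
behaviour, not kernel code. Every name ending in _model is made up for
illustration, and the pthread mutex only stands in for q->blk_trace_mutex.
It assumes what was described above: all partitions of a disk point at one
queue, the queue has a single blk_trace slot, and setup does an explicit
NULL check under the mutex instead of cmpxchg().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Illustrative stand-in for struct blk_trace. */
struct blk_trace_model {
	int owner_partno;	/* partition that set up the trace */
};

/* Illustrative stand-in for struct request_queue: one trace slot. */
struct request_queue_model {
	pthread_mutex_t blk_trace_mutex;
	struct blk_trace_model *blk_trace;
};

/* Every partition of a disk refers to the same queue. */
struct partition_model {
	int partno;
	struct request_queue_model *q;
};

/*
 * Returns 0 on success, or -EBUSY if a trace is already installed on the
 * shared queue, no matter which partition installed it.
 */
static int model_trace_setup(struct partition_model *p,
			     struct blk_trace_model *bt)
{
	int ret = 0;

	assert(p && p->q && bt);

	pthread_mutex_lock(&p->q->blk_trace_mutex);
	if (p->q->blk_trace) {
		/* Plain check: the mutex already serializes all writers. */
		ret = -EBUSY;
	} else {
		bt->owner_partno = p->partno;
		p->q->blk_trace = bt;
	}
	pthread_mutex_unlock(&p->q->blk_trace_mutex);

	return ret;
}

/* Clears the queue's trace slot so a later setup can succeed. */
static void model_trace_teardown(struct partition_model *p)
{
	assert(p && p->q);

	pthread_mutex_lock(&p->q->blk_trace_mutex);
	p->q->blk_trace = NULL;
	pthread_mutex_unlock(&p->q->blk_trace_mutex);
}
```

In this model, a setup on partition 1 followed by a setup on partition 2
of the same disk fails with -EBUSY until the first trace is torn down,
and nothing in the error itself tells the user that another partition is
the reason.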

> > Let me know how you folks would like to proceed.
> 
> I guess I can rebase my patch on top of your series since that seems pretty
> much done.

I think so as well.

> I was aware of it but didn't realize there's a conflict...

Thanks to Bart for pointing it out!

  Luis
