linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Nilay Shroff <nilay@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>,
	linux-block@vger.kernel.org, dlemoal@kernel.org, axboe@kernel.dk,
	gjoyce@ibm.com
Subject: Re: [PATCH 2/2] block: avoid acquiring q->sysfs_lock while accessing sysfs attributes
Date: Sat, 8 Feb 2025 18:41:10 +0800	[thread overview]
Message-ID: <Z6c0xvOFGYrGqxCd@fedora> (raw)
In-Reply-To: <f933e87a-6014-434a-8258-d871c77ca14c@linux.ibm.com>

On Thu, Feb 06, 2025 at 07:24:02PM +0530, Nilay Shroff wrote:
> 
> 
> On 2/5/25 9:23 PM, Christoph Hellwig wrote:
> > On Wed, Feb 05, 2025 at 08:14:48PM +0530, Nilay Shroff wrote:
> >> The sysfs attributes are already protected with sysfs/kernfs internal
> >> locking. So acquiring q->sysfs_lock is not needed while accessing sysfs
> >> attribute files. So this change helps avoid holding q->sysfs_lock while
> >> accessing sysfs attribute files.
> > 
> > the sysfs/kernfs locking only protects against other accesses using
> > sysfs.  But that's not really the most interesting part here.  We
> > also want to make sure nothing changes underneath in a way that
> > could cause crashes (and maybe even torn information).
> > 
> > We'll really need to audit what is accessed in each method and figure
> > out what protects it.  Chances are that sysfs_lock provides that
> > protection in some case right now, and chances are also very high
> > that a lot of this is pretty broken.
> > 
> Yes that's possible and so I audited all sysfs attributes which are 
> currently protected using q->sysfs_lock and I found some interesting
> facts. Please find below:
> 
> 1. io_poll:
>    Write to this attribute is ignored. So, we don't need q->sysfs_lock.
> 
> 2. io_poll_delay:
>    Write to this attribute is NOP, so we don't need q->sysfs_lock.
> 
> 3. io_timeout:
>    Write to this attribute updates q->rq_timeout and read of this attribute
>    returns the value stored in q->rq_timeout Moreover, the q->rq_timeout is
>    set only once when we init the queue (under blk_mq_init_allocated_queue())
>    even before disk is added. So that means that we may not need to protect
>    it with q->sysfs_lock.
> 
> 4. nomerges:
>    Write to this attribute file updates two q->flags : QUEUE_FLAG_NOMERGES 
>    and QUEUE_FLAG_NOXMERGES. These flags are accessed during bio-merge which
>    anyways doesn't run with q->sysfs_lock held. Moreover, the q->flags are 
>    updated/accessed with bitops which are atomic. So, I believe, protecting
>    it with q->sysfs_lock is not necessary.
> 
> 5. nr_requests:
>    Write to this attribute updates the tag sets and this could potentially
>    race with __blk_mq_update_nr_hw_queues(). So I think we should really 
>    protect it with q->tag_set->tag_list_lock instead of q->sysfs_lock.
> 
> 6. read_ahead_kb:
>    Write to this attribute file updates disk->bdi->ra_pages. The disk->bdi->
>    ra_pages is also updated under queue_limits_commit_update() which runs 
>    holding q->limits_lock; so I think this attribute file should be protected
>    with q->limits_lock and protecting it with q->sysfs_lock is not necessary. 
>    Maybe we should move it under the same sets of attribute files which today
>    runs with q->limits_lock held.
> 
> 7. rq_affinity:
>    Write to this attribute file makes atomic updates to q->flags: QUEUE_FLAG_SAME_COMP
>    and QUEUE_FLAG_SAME_FORCE. These flags are also accessed from blk_mq_complete_need_ipi()
>    using test_bit macro. As read/write to q->flags uses bitops which are atomic, 
>    protecting it with q->stsys_lock is not necessary.
> 
> 8. scheduler:
>    Write to this attribute actually updates q->elevator and the elevator change/switch 
>    code expects that the q->sysfs_lock is held while we update the iosched to protect 
>    against the simultaneous __blk_mq_update_nr_hw_queues update. So yes, this field needs 
>    q->sysfs_lock protection.
> 
>    However if we're thinking of protecting sched change/update using q->tag_sets->
>    tag_list_lock (as discussed in another thread), then we may use q->tag_set->
>    tag_list_lock instead of q->sysfs_lock here while reading/writing to this attribute
>    file.

This is one misuse of tag_list_lock, which is supposed to cover host
wide change, and shouldn't be used for request queue level protection,
which is exactly provided by q->sysfs_lock.

Not mention it will cause ABBA deadlock over freeze lock, please see
blk_mq_update_nr_hw_queues(). And it can't be used for protecting
'nr_requests' too.

> 
> 9. wbt_lat_usec:
>    Write to this attribute file updates the various wbt limits and state. This may race 
>    with blk_mq_exit_sched() or blk_register_queue(). The wbt updates through the 
>    blk_mq_exit_sched() and blk_register_queue() is currently protected with q->sysfs_lock
>    and so yes, we need to protect this attribute with q->sysfs_lock.
> 
>    However, as mentioned above, if we're thinking of protecting elevator change/update
>    using q->sets->tag_list_lock then we may use q->tag_set->tag_list_lock intstead of
>    q->sysfs_lock while reading/writing to this attribute file.

Same ABBA deadlock with above.


Thanks,
Ming


  parent reply	other threads:[~2025-02-08 10:41 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-05 14:44 [PATCH 0/2] block: fix lock order and remove redundant locking Nilay Shroff
2025-02-05 14:44 ` [PATCH 1/2] block: fix lock ordering between the queue ->sysfs_lock and freeze-lock Nilay Shroff
2025-02-05 15:59   ` Christoph Hellwig
2025-02-06 13:22     ` Nilay Shroff
2025-02-06 14:15       ` Christoph Hellwig
2025-02-07 11:59       ` Ming Lei
2025-02-07 18:02         ` Nilay Shroff
2025-02-08  8:30           ` Ming Lei
2025-02-08 13:18             ` Nilay Shroff
2025-02-05 14:44 ` [PATCH 2/2] block: avoid acquiring q->sysfs_lock while accessing sysfs attributes Nilay Shroff
2025-02-05 15:53   ` Christoph Hellwig
2025-02-06 13:54     ` Nilay Shroff
2025-02-06 14:07       ` Christoph Hellwig
2025-02-07 11:03         ` Nilay Shroff
2025-02-08 10:41       ` Ming Lei [this message]
2025-02-08 12:56         ` Nilay Shroff
2025-02-09 11:41           ` Ming Lei
2025-02-09 13:41             ` Nilay Shroff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6c0xvOFGYrGqxCd@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=dlemoal@kernel.org \
    --cc=gjoyce@ibm.com \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=nilay@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).