Re: [PATCH 2/2] block: avoid acquiring q->sysfs_lock while accessing sysfs attributes

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ming Lei <ming.lei@redhat.com>
To: Nilay Shroff <nilay@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>,
	linux-block@vger.kernel.org, dlemoal@kernel.org, axboe@kernel.dk,
	gjoyce@ibm.com
Subject: Re: [PATCH 2/2] block: avoid acquiring q->sysfs_lock while accessing sysfs attributes
Date: Sat, 8 Feb 2025 18:41:10 +0800	[thread overview]
Message-ID: <Z6c0xvOFGYrGqxCd@fedora> (raw)
In-Reply-To: <f933e87a-6014-434a-8258-d871c77ca14c@linux.ibm.com>

On Thu, Feb 06, 2025 at 07:24:02PM +0530, Nilay Shroff wrote:
> 
> 
> On 2/5/25 9:23 PM, Christoph Hellwig wrote:
> > On Wed, Feb 05, 2025 at 08:14:48PM +0530, Nilay Shroff wrote:
> >> The sysfs attributes are already protected with sysfs/kernfs internal
> >> locking. So acquiring q->sysfs_lock is not needed while accessing sysfs
> >> attribute files. So this change helps avoid holding q->sysfs_lock while
> >> accessing sysfs attribute files.
> > 
> > the sysfs/kernfs locking only protects against other accesses using
> > sysfs.  But that's not really the most interesting part here.  We
> > also want to make sure nothing changes underneath in a way that
> > could cause crashes (and maybe even torn information).
> > 
> > We'll really need to audit what is accessed in each method and figure
> > out what protects it.  Chances are that sysfs_lock provides that
> > protection in some case right now, and chances are also very high
> > that a lot of this is pretty broken.
> > 
> Yes that's possible and so I audited all sysfs attributes which are 
> currently protected using q->sysfs_lock and I found some interesting
> facts. Please find below:
> 
> 1. io_poll:
>    Write to this attribute is ignored. So, we don't need q->sysfs_lock.
> 
> 2. io_poll_delay:
>    Write to this attribute is NOP, so we don't need q->sysfs_lock.
> 
> 3. io_timeout:
>    Write to this attribute updates q->rq_timeout and read of this attribute
>    returns the value stored in q->rq_timeout Moreover, the q->rq_timeout is
>    set only once when we init the queue (under blk_mq_init_allocated_queue())
>    even before disk is added. So that means that we may not need to protect
>    it with q->sysfs_lock.
> 
> 4. nomerges:
>    Write to this attribute file updates two q->flags : QUEUE_FLAG_NOMERGES 
>    and QUEUE_FLAG_NOXMERGES. These flags are accessed during bio-merge which
>    anyways doesn't run with q->sysfs_lock held. Moreover, the q->flags are 
>    updated/accessed with bitops which are atomic. So, I believe, protecting
>    it with q->sysfs_lock is not necessary.
> 
> 5. nr_requests:
>    Write to this attribute updates the tag sets and this could potentially
>    race with __blk_mq_update_nr_hw_queues(). So I think we should really 
>    protect it with q->tag_set->tag_list_lock instead of q->sysfs_lock.
> 
> 6. read_ahead_kb:
>    Write to this attribute file updates disk->bdi->ra_pages. The disk->bdi->
>    ra_pages is also updated under queue_limits_commit_update() which runs 
>    holding q->limits_lock; so I think this attribute file should be protected
>    with q->limits_lock and protecting it with q->sysfs_lock is not necessary. 
>    Maybe we should move it under the same sets of attribute files which today
>    runs with q->limits_lock held.
> 
> 7. rq_affinity:
>    Write to this attribute file makes atomic updates to q->flags: QUEUE_FLAG_SAME_COMP
>    and QUEUE_FLAG_SAME_FORCE. These flags are also accessed from blk_mq_complete_need_ipi()
>    using test_bit macro. As read/write to q->flags uses bitops which are atomic, 
>    protecting it with q->stsys_lock is not necessary.
> 
> 8. scheduler:
>    Write to this attribute actually updates q->elevator and the elevator change/switch 
>    code expects that the q->sysfs_lock is held while we update the iosched to protect 
>    against the simultaneous __blk_mq_update_nr_hw_queues update. So yes, this field needs 
>    q->sysfs_lock protection.
> 
>    However if we're thinking of protecting sched change/update using q->tag_sets->
>    tag_list_lock (as discussed in another thread), then we may use q->tag_set->
>    tag_list_lock instead of q->sysfs_lock here while reading/writing to this attribute
>    file.

This is one misuse of tag_list_lock, which is supposed to cover host
wide change, and shouldn't be used for request queue level protection,
which is exactly provided by q->sysfs_lock.

Not mention it will cause ABBA deadlock over freeze lock, please see
blk_mq_update_nr_hw_queues(). And it can't be used for protecting
'nr_requests' too.

> 
> 9. wbt_lat_usec:
>    Write to this attribute file updates the various wbt limits and state. This may race 
>    with blk_mq_exit_sched() or blk_register_queue(). The wbt updates through the 
>    blk_mq_exit_sched() and blk_register_queue() is currently protected with q->sysfs_lock
>    and so yes, we need to protect this attribute with q->sysfs_lock.
> 
>    However, as mentioned above, if we're thinking of protecting elevator change/update
>    using q->sets->tag_list_lock then we may use q->tag_set->tag_list_lock intstead of
>    q->sysfs_lock while reading/writing to this attribute file.

Same ABBA deadlock with above.


Thanks,
Ming

next prev parent reply	other threads:[~2025-02-08 10:41 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-05 14:44 [PATCH 0/2] block: fix lock order and remove redundant locking Nilay Shroff
2025-02-05 14:44 ` [PATCH 1/2] block: fix lock ordering between the queue ->sysfs_lock and freeze-lock Nilay Shroff
2025-02-05 15:59   ` Christoph Hellwig
2025-02-06 13:22     ` Nilay Shroff
2025-02-06 14:15       ` Christoph Hellwig
2025-02-07 11:59       ` Ming Lei
2025-02-07 18:02         ` Nilay Shroff
2025-02-08  8:30           ` Ming Lei
2025-02-08 13:18             ` Nilay Shroff
2025-02-05 14:44 ` [PATCH 2/2] block: avoid acquiring q->sysfs_lock while accessing sysfs attributes Nilay Shroff
2025-02-05 15:53   ` Christoph Hellwig
2025-02-06 13:54     ` Nilay Shroff
2025-02-06 14:07       ` Christoph Hellwig
2025-02-07 11:03         ` Nilay Shroff
2025-02-08 10:41       ` Ming Lei [this message]
2025-02-08 12:56         ` Nilay Shroff
2025-02-09 11:41           ` Ming Lei
2025-02-09 13:41             ` Nilay Shroff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6c0xvOFGYrGqxCd@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=dlemoal@kernel.org \
    --cc=gjoyce@ibm.com \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=nilay@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.