From: Nilay Shroff <nilay@linux.ibm.com>
To: linux-block@vger.kernel.org
Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org,
hare@suse.de, axboe@kernel.dk, gjoyce@ibm.com
Subject: [PATCHv3 3/7] block: remove q->sysfs_lock for attributes which don't need it
Date: Mon, 24 Feb 2025 19:00:54 +0530
Message-ID: <20250224133102.1240146-4-nilay@linux.ibm.com>
In-Reply-To: <20250224133102.1240146-1-nilay@linux.ibm.com>
There are a few sysfs attributes in the block layer which don't really
need q->sysfs_lock to be held while they are accessed. The reason is that
reading or writing the value behind such an attribute is either atomic or
can easily be protected using READ_ONCE()/WRITE_ONCE(). Moreover, sysfs
attributes are inherently protected by sysfs/kernfs internal locking.

This change segregates all existing sysfs attributes for which we can
avoid acquiring q->sysfs_lock. For read-only attributes, we remove
q->sysfs_lock from the show method. For read/write attributes, we
remove q->sysfs_lock from both the show and store methods.
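
A minimal user-space sketch of the READ_ONCE()/WRITE_ONCE() pattern
mentioned above, modelled on the io_timeout attribute. Everything named
demo_* is hypothetical, the two macros are simplified stand-ins for the
kernel ones (assuming a GCC/Clang-style __typeof__), and the
jiffies-to-milliseconds conversion done by the real show method is left
out:

```c
#include <assert.h>
#include <stdio.h>

/* Simplified user-space stand-ins for the kernel macros (assumption). */
#define READ_ONCE(x)      (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical, pared-down stand-in for struct request_queue. */
struct demo_queue {
	unsigned int rq_timeout;	/* timeout value, a plain word-sized field */
};

/* Store side: a single marked store, as in blk_queue_rq_timeout(). */
static void demo_set_timeout(struct demo_queue *q, unsigned int timeout)
{
	WRITE_ONCE(q->rq_timeout, timeout);
}

/* Show side: a single marked load, formatted the way a show method would. */
static int demo_show_timeout(struct demo_queue *q, char *page, size_t len)
{
	return snprintf(page, len, "%u\n", READ_ONCE(q->rq_timeout));
}
```

The marked accesses only keep the compiler from tearing, fusing or
repeating the load and store. They do not order the access against
anything else, which should be acceptable here because the value is not
correlated with any other field.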
We audited all block sysfs attributes and found the following
attributes which shouldn't require q->sysfs_lock protection:
1. io_poll:
Writes to this attribute are ignored, so we don't need q->sysfs_lock.
2. io_poll_delay:
Writes to this attribute are a NOP, so we don't need q->sysfs_lock.
3. io_timeout:
Write to this attribute updates q->rq_timeout and read of this
attribute returns the value stored in q->rq_timeout. Moreover,
q->rq_timeout is set only once when we init the queue (under
blk_mq_init_allocated_queue()), even before the disk is added. So
we don't need to protect it with q->sysfs_lock. As this attribute is
not directly correlated with anything else, simply using
READ_ONCE/WRITE_ONCE should be enough.
4. nomerges:
Write to this attribute file updates two q->flags:
QUEUE_FLAG_NOMERGES and QUEUE_FLAG_NOXMERGES. These flags are accessed
during bio-merge, which doesn't run with q->sysfs_lock held anyway.
Moreover, the q->flags are updated/accessed with bitops which are
atomic. So, protecting them with q->sysfs_lock is not necessary.
5. rq_affinity:
Write to this attribute file makes atomic updates to q->flags:
QUEUE_FLAG_SAME_COMP and QUEUE_FLAG_SAME_FORCE. These flags are
also accessed from blk_mq_complete_need_ipi() using the test_bit macro.
As read/write to q->flags uses bitops which are atomic, protecting
them with q->sysfs_lock is not necessary.
6. nr_zones:
Write to this attribute happens in the driver probe method (except
nvme) before the disk is added and outside of q->sysfs_lock or any
other lock. Moreover, nr_zones is defined as "unsigned int", so reading
this attribute, even while it's simultaneously being updated on another
CPU, should not return a torn value on any architecture supported by
Linux. So we can avoid using q->sysfs_lock or any other lock/
protection while reading this attribute.
7. discard_zeroes_data:
Reading of this attribute always returns 0, so we don't require
holding q->sysfs_lock.
8. write_same_max_bytes:
Reading of this attribute always returns 0, so we don't require
holding q->sysfs_lock.
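
The atomic-bitops argument used for nomerges and rq_affinity above can be
sketched in user space with C11 atomics. This is an illustration only:
the demo_* names and bit positions are hypothetical and do not match the
kernel's QUEUE_FLAG_* values, atomic_fetch_or()/atomic_fetch_and() stand
in for the kernel's set_bit()/clear_bit(), the queue freeze is omitted,
and the "nm == 2" branch is assumed since it falls outside the hunk
context shown in this patch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit positions; the kernel's QUEUE_FLAG_* values differ. */
enum { DEMO_FLAG_NOMERGES = 0, DEMO_FLAG_NOXMERGES = 1 };

/* Hypothetical stand-in for the flags word in struct request_queue. */
struct demo_queue_flags {
	atomic_ulong flags;
};

/* Atomically set one bit, leaving the others untouched. */
static void demo_flag_set(struct demo_queue_flags *q, unsigned int bit)
{
	atomic_fetch_or(&q->flags, 1UL << bit);
}

/* Atomically clear one bit, leaving the others untouched. */
static void demo_flag_clear(struct demo_queue_flags *q, unsigned int bit)
{
	atomic_fetch_and(&q->flags, ~(1UL << bit));
}

/* Atomically load the word and test one bit. */
static bool demo_flag_test(struct demo_queue_flags *q, unsigned int bit)
{
	return (atomic_load(&q->flags) >> bit) & 1UL;
}

/* Store path: clear both flags, then set at most one of them. */
static void demo_nomerges_store(struct demo_queue_flags *q, unsigned long nm)
{
	demo_flag_clear(q, DEMO_FLAG_NOMERGES);
	demo_flag_clear(q, DEMO_FLAG_NOXMERGES);
	if (nm == 2)
		demo_flag_set(q, DEMO_FLAG_NOMERGES);
	else if (nm)
		demo_flag_set(q, DEMO_FLAG_NOXMERGES);
}

/* Show path: same encoding as queue_nomerges_show() in this patch. */
static unsigned int demo_nomerges_show(struct demo_queue_flags *q)
{
	return ((unsigned int)demo_flag_test(q, DEMO_FLAG_NOMERGES) << 1) |
	       demo_flag_test(q, DEMO_FLAG_NOXMERGES);
}
```

Each individual bit update is atomic, so a concurrent reader never sees a
corrupted flags word. A reader racing with the store path may briefly
observe both flags cleared, which is the same window the bio-merge path
already tolerates since it never took q->sysfs_lock.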
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
block/blk-settings.c | 2 +-
block/blk-sysfs.c | 81 +++++++++++++++-----------------------------
2 files changed, 29 insertions(+), 54 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index c44dadc35e1e..c541bf22f543 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -21,7 +21,7 @@
void blk_queue_rq_timeout(struct request_queue *q, unsigned int timeout)
{
- q->rq_timeout = timeout;
+ WRITE_ONCE(q->rq_timeout, timeout);
}
EXPORT_SYMBOL_GPL(blk_queue_rq_timeout);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index fcfbe59f3a56..83f78d2e1cd3 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -172,12 +172,7 @@ QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_KB(max_hw_sectors)
#define QUEUE_SYSFS_SHOW_CONST(_name, _val) \
static ssize_t queue_##_name##_show(struct gendisk *disk, char *page) \
{ \
- ssize_t ret; \
- \
- mutex_lock(&disk->queue->sysfs_lock); \
- ret = sysfs_emit(page, "%d\n", _val); \
- mutex_unlock(&disk->queue->sysfs_lock); \
- return ret; \
+ return sysfs_emit(page, "%d\n", _val); \
}
/* deprecated fields */
@@ -266,17 +261,11 @@ QUEUE_SYSFS_FEATURE_SHOW(dax, BLK_FEAT_DAX);
static ssize_t queue_poll_show(struct gendisk *disk, char *page)
{
- ssize_t ret;
+ if (queue_is_mq(disk->queue))
+ return sysfs_emit(page, "%u\n", blk_mq_can_poll(disk->queue));
- mutex_lock(&disk->queue->sysfs_lock);
- if (queue_is_mq(disk->queue)) {
- ret = sysfs_emit(page, "%u\n", blk_mq_can_poll(disk->queue));
- } else {
- ret = sysfs_emit(page, "%u\n",
+ return sysfs_emit(page, "%u\n",
!!(disk->queue->limits.features & BLK_FEAT_POLL));
- }
- mutex_unlock(&disk->queue->sysfs_lock);
- return ret;
}
static ssize_t queue_zoned_show(struct gendisk *disk, char *page)
@@ -288,12 +277,7 @@ static ssize_t queue_zoned_show(struct gendisk *disk, char *page)
static ssize_t queue_nr_zones_show(struct gendisk *disk, char *page)
{
- ssize_t ret;
-
- mutex_lock(&disk->queue->sysfs_lock);
- ret = queue_var_show(disk_nr_zones(disk), page);
- mutex_unlock(&disk->queue->sysfs_lock);
- return ret;
+ return queue_var_show(disk_nr_zones(disk), page);
}
static ssize_t queue_iostats_passthrough_show(struct gendisk *disk, char *page)
@@ -320,13 +304,8 @@ static int queue_iostats_passthrough_store(struct gendisk *disk,
static ssize_t queue_nomerges_show(struct gendisk *disk, char *page)
{
- ssize_t ret;
-
- mutex_lock(&disk->queue->sysfs_lock);
- ret = queue_var_show((blk_queue_nomerges(disk->queue) << 1) |
+ return queue_var_show((blk_queue_nomerges(disk->queue) << 1) |
blk_queue_noxmerges(disk->queue), page);
- mutex_unlock(&disk->queue->sysfs_lock);
- return ret;
}
static ssize_t queue_nomerges_store(struct gendisk *disk, const char *page,
@@ -340,7 +319,6 @@ static ssize_t queue_nomerges_store(struct gendisk *disk, const char *page,
if (ret < 0)
return ret;
- mutex_lock(&q->sysfs_lock);
memflags = blk_mq_freeze_queue(q);
blk_queue_flag_clear(QUEUE_FLAG_NOMERGES, q);
blk_queue_flag_clear(QUEUE_FLAG_NOXMERGES, q);
@@ -349,22 +327,16 @@ static ssize_t queue_nomerges_store(struct gendisk *disk, const char *page,
else if (nm)
blk_queue_flag_set(QUEUE_FLAG_NOXMERGES, q);
blk_mq_unfreeze_queue(q, memflags);
- mutex_unlock(&q->sysfs_lock);
return ret;
}
static ssize_t queue_rq_affinity_show(struct gendisk *disk, char *page)
{
- ssize_t ret;
- bool set, force;
+ bool set = test_bit(QUEUE_FLAG_SAME_COMP, &disk->queue->queue_flags);
+ bool force = test_bit(QUEUE_FLAG_SAME_FORCE, &disk->queue->queue_flags);
- mutex_lock(&disk->queue->sysfs_lock);
- set = test_bit(QUEUE_FLAG_SAME_COMP, &disk->queue->queue_flags);
- force = test_bit(QUEUE_FLAG_SAME_FORCE, &disk->queue->queue_flags);
- ret = queue_var_show(set << force, page);
- mutex_unlock(&disk->queue->sysfs_lock);
- return ret;
+ return queue_var_show(set << force, page);
}
static ssize_t
@@ -380,7 +352,12 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count)
if (ret < 0)
return ret;
- mutex_lock(&q->sysfs_lock);
+ /*
+ * Here we update two queue flags each using atomic bitops, although
+ * updating two flags isn't atomic it should be harmless as those flags
+ * are accessed individually using atomic test_bit operation. So we
+ * don't grab any lock while updating these flags.
+ */
memflags = blk_mq_freeze_queue(q);
if (val == 2) {
blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
@@ -393,7 +370,6 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count)
blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
}
blk_mq_unfreeze_queue(q, memflags);
- mutex_unlock(&q->sysfs_lock);
#endif
return ret;
}
@@ -411,30 +387,23 @@ static ssize_t queue_poll_store(struct gendisk *disk, const char *page,
ssize_t ret = count;
struct request_queue *q = disk->queue;
- mutex_lock(&q->sysfs_lock);
memflags = blk_mq_freeze_queue(q);
if (!(q->limits.features & BLK_FEAT_POLL)) {
ret = -EINVAL;
goto out;
}
+
pr_info_ratelimited("writes to the poll attribute are ignored.\n");
pr_info_ratelimited("please use driver specific parameters instead.\n");
out:
blk_mq_unfreeze_queue(q, memflags);
- mutex_unlock(&q->sysfs_lock);
-
return ret;
}
static ssize_t queue_io_timeout_show(struct gendisk *disk, char *page)
{
- ssize_t ret;
-
- mutex_lock(&disk->queue->sysfs_lock);
- ret = sysfs_emit(page, "%u\n",
- jiffies_to_msecs(disk->queue->rq_timeout));
- mutex_unlock(&disk->queue->sysfs_lock);
- return ret;
+ return sysfs_emit(page, "%u\n",
+ jiffies_to_msecs(READ_ONCE(disk->queue->rq_timeout)));
}
static ssize_t queue_io_timeout_store(struct gendisk *disk, const char *page,
@@ -448,11 +417,9 @@ static ssize_t queue_io_timeout_store(struct gendisk *disk, const char *page,
if (err || val == 0)
return -EINVAL;
- mutex_lock(&q->sysfs_lock);
memflags = blk_mq_freeze_queue(q);
blk_queue_rq_timeout(q, msecs_to_jiffies(val));
blk_mq_unfreeze_queue(q, memflags);
- mutex_unlock(&q->sysfs_lock);
return count;
}
@@ -706,6 +673,10 @@ static struct attribute *queue_attrs[] = {
* attributes protected with q->sysfs_lock
*/
&queue_ra_entry.attr,
+
+ /*
+ * attributes which don't require locking
+ */
&queue_discard_zeroes_data_entry.attr,
&queue_write_same_max_entry.attr,
&queue_nr_zones_entry.attr,
@@ -723,11 +694,15 @@ static struct attribute *blk_mq_queue_attrs[] = {
*/
&queue_requests_entry.attr,
&elv_iosched_entry.attr,
- &queue_rq_affinity_entry.attr,
- &queue_io_timeout_entry.attr,
#ifdef CONFIG_BLK_WBT
&queue_wb_lat_entry.attr,
#endif
+ /*
+ * attributes which don't require locking
+ */
+ &queue_rq_affinity_entry.attr,
+ &queue_io_timeout_entry.attr,
+
NULL,
};
--
2.47.1
Thread overview: 21+ messages
2025-02-24 13:30 [PATCHv3 0/7] block: fix lock order and remove redundant locking Nilay Shroff
2025-02-24 13:30 ` [PATCHv3 1/7] block: acquire q->limits_lock while reading sysfs attributes Nilay Shroff
2025-02-25 7:38 ` Hannes Reinecke
2025-02-24 13:30 ` [PATCHv3 2/7] block: move q->sysfs_lock and queue-freeze under show/store method Nilay Shroff
2025-02-24 16:31 ` Christoph Hellwig
2025-02-25 7:41 ` Hannes Reinecke
2025-02-24 13:30 ` Nilay Shroff [this message]
2025-02-25 7:46 ` [PATCHv3 3/7] block: remove q->sysfs_lock for attributes which don't need it Hannes Reinecke
2025-02-24 13:30 ` [PATCHv3 4/7] block: Introduce a dedicated lock for protecting queue elevator updates Nilay Shroff
2025-02-24 16:33 ` Christoph Hellwig
2025-02-25 13:28 ` Nilay Shroff
2025-02-25 7:49 ` Hannes Reinecke
2025-02-24 13:30 ` [PATCHv3 5/7] block: protect nr_requests update using q->elevator_lock Nilay Shroff
2025-02-25 7:50 ` Hannes Reinecke
2025-02-24 13:30 ` [PATCHv3 6/7] block: protect wbt_lat_usec " Nilay Shroff
2025-02-25 7:53 ` Hannes Reinecke
2025-02-25 10:05 ` Nilay Shroff
2025-02-24 13:30 ` [PATCHv3 7/7] block: protect read_ahead_kb using q->limits_lock Nilay Shroff
2025-02-25 7:58 ` Hannes Reinecke
2025-02-25 10:18 ` Nilay Shroff
2025-02-25 11:43 ` Hannes Reinecke