linux-block.vger.kernel.org archive mirror
* [PATCHv4 0/5] block: restructure elevator switch path and fix a lockdep splat
From: Nilay Shroff @ 2025-11-10  8:14 UTC
  To: linux-block; +Cc: ming.lei, hch, axboe, yi.zhang, czhong, yukuai, gjoyce

Hi,

This patchset reorganizes the elevator switch path used during both
nr_hw_queues update and elv_iosched_store() operations to address a
recently reported lockdep splat [1].

The warning highlights a locking dependency of ->freeze_lock and
->elevator_lock on pcpu_alloc_mutex, triggered when the Kyber scheduler
dynamically allocates its private scheduling data. The fix is to ensure
that such allocations occur outside the locked sections, thus eliminating
the dependency chain.

While working on this, it also became evident that the nr_hw_queues
update code maintains two disjoint xarrays, one for elevator tags and
another for elevator type, both serving the same purpose. Unifying these
into a single elv_change_ctx structure improves clarity and
maintainability.

This series therefore consists of five patches.

The first, preparatory patch unifies the elevator tags and type xarrays.
It combines both xarrays into a single struct elv_change_ctx, simplifying
per-queue elevator state management.

The second patch starts grouping together all elevator-related
resources that share the same lifetime. As a first step, it moves the
elevator tags pointer from struct elv_change_ctx into the newly
introduced struct elevator_resources. A subsequent patch extends
struct elevator_resources to include other elevator-related data.

The third patch introduces the ->alloc_sched_data and ->free_sched_data
elevator ops, which can then be used to safely allocate and free
scheduler data.
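
To illustrate the shape of such optional hooks, here is a minimal
userspace sketch. Only the two callback names come from this series;
struct mock_queue, struct mock_elevator_ops, the callback signatures and
all mock_* helpers are assumptions made for illustration and are not the
kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request queue. */
struct mock_queue {
	int id;
};

/*
 * Hypothetical ops table. The callback names mirror the series; the
 * signatures are assumed. Both hooks are optional (may be NULL).
 */
struct mock_elevator_ops {
	void *(*alloc_sched_data)(struct mock_queue *q);
	void (*free_sched_data)(void *data);
};

/* Hypothetical private data of a scheduler that needs an allocation. */
struct mock_kyber_data {
	int queue_id;
};

static void *mock_kyber_alloc(struct mock_queue *q)
{
	struct mock_kyber_data *kd = calloc(1, sizeof(*kd));

	if (kd)
		kd->queue_id = q->id;
	return kd;
}

static void mock_kyber_free(void *data)
{
	free(data);
}

/*
 * Call the optional alloc hook. Returns 0 on success and -1 on
 * allocation failure. *out stays NULL when the scheduler has no hook.
 */
static int mock_alloc_sched_data(const struct mock_elevator_ops *ops,
				 struct mock_queue *q, void **out)
{
	assert(out != NULL);
	*out = NULL;
	if (!ops->alloc_sched_data)
		return 0;
	*out = ops->alloc_sched_data(q);
	return *out ? 0 : -1;
}

/* Call the optional free hook; a NULL data pointer is a no-op. */
static void mock_free_sched_data(const struct mock_elevator_ops *ops,
				 void *data)
{
	if (data && ops->free_sched_data)
		ops->free_sched_data(data);
}
```

A scheduler without private data would simply leave both pointers NULL,
and the wrappers then do nothing.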

The fourth patch builds upon the third and starts using the newly
introduced alloc/free sched data methods during elevator switch and
nr_hw_queues update. While doing so, it ensures that sched data
allocation and free happen before we acquire ->freeze_lock and
->elevator_lock, thus preventing their dependency on pcpu_alloc_mutex.
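
The ordering can be sketched in userspace as below. This is not the
kernel code: struct mock_lock only records whether a lock is held,
struct mock_queue and every mock_* function are hypothetical, and
freeing the old data after the locks are released is an assumption of
this sketch. The allocator asserts that neither lock is held, which
stands in for "the allocator may take pcpu_alloc_mutex".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lock; it only records whether it is held. */
struct mock_lock {
	bool held;
};

static void mock_lock_acquire(struct mock_lock *lock)
{
	assert(!lock->held);
	lock->held = true;
}

static void mock_lock_release(struct mock_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/* Hypothetical queue with the two locks named in the cover letter. */
struct mock_queue {
	struct mock_lock freeze_lock;
	struct mock_lock elevator_lock;
	void *sched_data;
};

/* Memory operations must never run with either queue lock held. */
static void *mock_alloc_unlocked(const struct mock_queue *q, size_t size)
{
	assert(!q->freeze_lock.held && !q->elevator_lock.held);
	return calloc(1, size);
}

static void mock_free_unlocked(const struct mock_queue *q, void *data)
{
	assert(!q->freeze_lock.held && !q->elevator_lock.held);
	free(data);
}

/* Returns 0 on success, -1 if the new scheduler data cannot be allocated. */
static int mock_elevator_switch(struct mock_queue *q, size_t size)
{
	/* Allocate first, before any lock is taken. */
	void *new_data = mock_alloc_unlocked(q, size);
	void *old_data;

	if (!new_data)
		return -1;

	/* The locked section only swaps pointers. */
	mock_lock_acquire(&q->freeze_lock);
	mock_lock_acquire(&q->elevator_lock);
	old_data = q->sched_data;
	q->sched_data = new_data;
	mock_lock_release(&q->elevator_lock);
	mock_lock_release(&q->freeze_lock);

	/* Release the old data only after both locks are dropped. */
	mock_free_unlocked(q, old_data);
	return 0;
}
```

Because the locked section never calls the allocator, no lock order
between the queue locks and the allocator's internal lock is recorded.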

The last patch of this series converts the Kyber scheduler to use the
new methods. It moves Kyber's scheduler data allocation and teardown
logic from ->init_sched and ->exit_sched into the new methods, ensuring
memory operations are performed outside locked sections.

Together, these changes simplify the elevator switch logic and prevent
the reported lockdep splat.

As always, feedback and suggestions are very welcome!

[1] https://lore.kernel.org/all/CAGVVp+VNW4M-5DZMNoADp6o2VKFhi7KxWpTDkcnVyjO0=-D5+A@mail.gmail.com/

Thanks,
--Nilay

changes from v3:
  - Split the third patch into two patches to separate the introduction
    of ->alloc_sched_data and ->free_sched_data methods from their users.
  - Free scheduler tags during sched resource allocation failures using
    blk_mq_free_sched_tags() instead of kfree() to avoid kmemleak
    (Ming Lei).
  - Delay the signature change of elevator_alloc() until the fourth
    patch, where we actually start allocating scheduler data during
    elevator switch and nr_hw_queue_update (Ming Lei).

Link to v3: https://lore.kernel.org/all/20251029103622.205607-1-nilay@linux.ibm.com/

changes from v2:
  - Introduce helper functions blk_mq_alloc_sched_res_batch() and
    blk_mq_free_sched_res_batch() to encapsulate scheduler resource
    (tags and data) allocation and freeing in batch mode. (Ming Lei)

  - Introduce helper functions blk_mq_alloc_sched_res() and
    blk_mq_free_sched_res() to encapsulate scheduler resource
    allocation and freeing. (Ming Lei)

Link to v2: https://lore.kernel.org/all/20251027173631.1081005-1-nilay@linux.ibm.com/

changes from v1:
  - Keep blk_mq_free_sched_ctx_batch() and blk_mq_alloc_sched_ctx_batch()
    together in the same file (Ming Lei)
  - Since the ctx pointer is stored in the xarray after it's dynamically
    allocated, if blk_mq_alloc_sched_ctx_batch() fails to allocate or
    insert a ctx pointer in the xarray then unwinding the allocation is
    not necessary. Instead, looping over the xarray to retrieve the
    inserted ctx pointers and freeing them should be sufficient. So
    invoke blk_mq_free_sched_ctx_batch() from the
    blk_mq_alloc_sched_ctx_batch() callsite on failure (Ming Lei)
  - As both elevator tags and elevator data share the same lifetime
    and allocation constraints, abstract both into a new structure
    (Ming Lei)
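
The cleanup-at-the-callsite idea from the v1 changes can be sketched in
userspace as follows. A fixed-size pointer table stands in for the
xarray, and struct mock_ctx, struct mock_ctx_table, the mock_* functions
and the fail_at fault-injection parameter are all hypothetical. The real
helpers have different signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MOCK_MAX_QUEUES 8

/* Hypothetical per-queue context. */
struct mock_ctx {
	int queue_id;
};

/* Fixed-size table standing in for the xarray that stores ctx pointers. */
struct mock_ctx_table {
	struct mock_ctx *slots[MOCK_MAX_QUEUES];
};

/* Walk the whole table and free whatever was stored; safe on a partial fill. */
static void mock_free_ctx_batch(struct mock_ctx_table *t)
{
	for (size_t i = 0; i < MOCK_MAX_QUEUES; i++) {
		free(t->slots[i]);
		t->slots[i] = NULL;
	}
}

/*
 * Allocate and store nr contexts. fail_at is a hypothetical fault
 * injection knob: the allocation at that index fails (-1 disables it).
 * On failure this returns -1 without unwinding; already stored entries
 * remain in the table for the caller to free.
 */
static int mock_alloc_ctx_batch(struct mock_ctx_table *t, int nr, int fail_at)
{
	if (nr < 0 || nr > MOCK_MAX_QUEUES)
		return -1;

	for (int i = 0; i < nr; i++) {
		struct mock_ctx *ctx =
			(i == fail_at) ? NULL : calloc(1, sizeof(*ctx));

		if (!ctx)
			return -1;
		ctx->queue_id = i;
		t->slots[i] = ctx;
	}
	return 0;
}
```

A caller would pair the two as "if (mock_alloc_ctx_batch(...) < 0)
mock_free_ctx_batch(...)", so the allocation path needs no error-unwind
labels of its own.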

Link to v1: https://lore.kernel.org/all/20251016053057.3457663-1-nilay@linux.ibm.com/

Nilay Shroff (5):
  block: unify elevator tags and type xarrays into struct elv_change_ctx
  block: move elevator tags into struct elevator_resources
  block: introduce alloc_sched_data and free_sched_data elevator methods
  block: use {alloc|free}_sched data methods
  block: define alloc_sched_data and free_sched_data methods for kyber

 block/blk-mq-sched.c  | 123 +++++++++++++++++++++++++++++++++---------
 block/blk-mq-sched.h  |  34 ++++++++++--
 block/blk-mq.c        |  50 +++++++++--------
 block/blk.h           |   7 ++-
 block/elevator.c      |  80 +++++++++++++--------------
 block/elevator.h      |  26 ++++++++-
 block/kyber-iosched.c |  30 ++++++++---
 7 files changed, 244 insertions(+), 106 deletions(-)

-- 
2.51.0



