Re: [PATCH] block: elevator: avoid to load iosched module from this disk

linux-block.vger.kernel.org archive mirror

From: "Richard W.M. Jones" <rjones@redhat.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <dlemoal@kernel.org>, Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Jeff Moyer <jmoyer@redhat.com>,
	Jiri Jaburek <jjaburek@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Bart Van Assche <bvanassche@acm.org>,
	Hannes Reinecke <hare@suse.de>,
	Chaitanya Kulkarni <kch@nvidia.com>
Subject: Re: [PATCH] block: elevator: avoid to load iosched module from this disk
Date: Sat, 7 Sep 2024 11:36:32 +0100
Message-ID: <20240907103632.GZ1450@redhat.com>
In-Reply-To: <Ztwl2RvR0DGbNuex@fedora>

On Sat, Sep 07, 2024 at 06:07:21PM +0800, Ming Lei wrote:
> On Sat, Sep 07, 2024 at 11:02:13AM +0100, Richard W.M. Jones wrote:
> > On Sat, Sep 07, 2024 at 05:48:44PM +0800, Ming Lei wrote:
> > > On Sat, Sep 07, 2024 at 06:04:59PM +0900, Damien Le Moal wrote:
> > > > On 9/7/24 16:58, Ming Lei wrote:
> > > > > On Sat, Sep 07, 2024 at 08:35:22AM +0100, Richard W.M. Jones wrote:
> > > > >> On Sat, Sep 07, 2024 at 09:43:31AM +0800, Ming Lei wrote:
> > > > >>> When switching io scheduler via sysfs, 'request_module' may be called
> > > > >>> if the specified scheduler doesn't exist.
> > > > >>>
> > > > > >>> This has a deadlock risk because the module may be stored on FS behind
> > > > >>> our disk since request queue is frozen before switching its elevator.
> > > > >>>
> > > > >>> Fix it by returning -EDEADLK in case that the disk is claimed, which
> > > > >>> can be thought as one signal that the disk is mounted.
> > > > >>>
> > > > > >>> Some distributions (Fedora) simulate the original kernel command line of
> > > > >>> 'elevator=foo' via 'echo foo > /sys/block/$DISK/queue/scheduler', and boot
> > > > >>> hang is triggered.
> > > > >>>
> > > > >>> Cc: Richard Jones <rjones@redhat.com>
> > > > >>> Cc: Jeff Moyer <jmoyer@redhat.com>
> > > > >>> Cc: Jiri Jaburek <jjaburek@redhat.com>
> > > > >>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > >>
> > > > >> I'd suggest also:
> > > > >>
> > > > >> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=219166
> > > > >> Reported-by: Richard W.M. Jones <rjones@redhat.com>
> > > > >> Reported-by: Jiri Jaburek <jjaburek@redhat.com>
> > > > >> Tested-by: Richard W.M. Jones <rjones@redhat.com>
> > > > >>
> > > > >> So I have tested this patch and it does fix the issue, at the possible
> > > > >> cost that now setting the scheduler can fail:
> > > > >>
> > > > >>   + for f in /sys/block/{h,s,ub,v}d*/queue/scheduler
> > > > >>   + echo noop
> > > > >>   /init: line 109: echo: write error: Resource deadlock avoided
> > > > >>
> > > > >> (I know I'm setting it to an impossible value here, but this could
> > > > >> also happen when setting it to a valid one.)
> > > > > 
> > > > > Actually in most of dist, io-schedulers are built-in, so request_module
> > > > > is just a nop, but meta IO must be started.
> > > > > 
> > > > >>
> > > > >> Since almost no one checks the result of 'echo foo > /sys/...'  that
> > > > >> would probably mean that sometimes a desired setting is silently not
> > > > >> set.
> > > > > 
> > > > > As I mentioned, io-schedulers are built-in for most of dist, so
> > > > > request_module isn't called in case of one valid io-sched.
> > > > > 
> > > > >>
> > > > >> Also I bisected this bug yesterday and found it was caused by (or,
> > > > >> more likely, exposed by):
> > > > >>
> > > > >>   commit af2814149883e2c1851866ea2afcd8eadc040f79
> > > > >>   Author: Christoph Hellwig <hch@lst.de>
> > > > >>   Date:   Mon Jun 17 08:04:38 2024 +0200
> > > > >>
> > > > >>     block: freeze the queue in queue_attr_store
> > > > >>     
> > > > >>     queue_attr_store updates attributes used to control generating I/O, and
> > > > >>     can cause malformed bios if changed with I/O in flight.  Freeze the queue
> > > > >>     in common code instead of adding it to almost every attribute.
> > > > >>
> > > > >> Reverting this commit on top of git head also fixes the problem.
> > > > >>
> > > > >> Why did this commit expose the problem?
> > > > > 
> > > > > That is really the 1st bad commit which moves queue freezing before
> > > > > calling request_module(), originally we won't freeze queue until
> > > > > we have to do it.
> > > > > 
> > > > > Another candidate fix is to revert it, or at least not do it
> > > > > for storing elevator attribute.
> > > > 
> > > > I do not think that reverting is acceptable. Rather, a proper fix would simply
> > > 
> > > Right, I remember that the freezing starts to cover update of
> > > max_sectors_kb.
> > > 
> > > > be to do the request_module() before freezing the queue.
> > > > Something like below should work (totally untested and that may be overkill).
> > > > 
> > > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > > > index 60116d13cb80..aef87f6b4a8a 100644
> > > > --- a/block/blk-sysfs.c
> > > > +++ b/block/blk-sysfs.c
> > > > @@ -23,6 +23,7 @@
> > > >  struct queue_sysfs_entry {
> > > >         struct attribute attr;
> > > >         ssize_t (*show)(struct gendisk *disk, char *page);
> > > > +       int (*pre_store)(struct gendisk *disk, const char *page, size_t count);
> > > 
> > > It seems over-kill to add one new callback, and another way is just to
> > > not freeze queue for storing elevator.
> > > 
> > > But if other attribute update needs to not freeze queue, 'pre_store'
> > > looks one reasonable solution.
> > > 
> > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > > index 60116d13cb80..c418edf66f0c 100644
> > > --- a/block/blk-sysfs.c
> > > +++ b/block/blk-sysfs.c
> > > @@ -666,15 +666,24 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
> > >  	struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj);
> > >  	struct request_queue *q = disk->queue;
> > >  	ssize_t res;
> > > +	bool need_freeze;
> > >  
> > >  	if (!entry->store)
> > >  		return -EIO;
> > >  
> > > -	blk_mq_freeze_queue(q);
> > > +	/*
> > > +	 * storing scheduler freezes queue in its way, especially
> > > +	 * loading scheduler module can't be done when queue is frozen
> > > +	 */
> > > +	need_freeze = (entry->store == elv_iosched_store);
> > > +
> > > +	if (need_freeze)
> > > +		blk_mq_freeze_queue(q);
> > >  	mutex_lock(&q->sysfs_lock);
> > >  	res = entry->store(disk, page, length);
> > >  	mutex_unlock(&q->sysfs_lock);
> > > -	blk_mq_unfreeze_queue(q);
> > > +	if (need_freeze)
> > > +		blk_mq_unfreeze_queue(q);
> > >  	return res;
> > >  }
> > >  
> > 
> > Unfortunately this doesn't fix the problem for me.  The test still
> > hangs occasionally in the same way as before.
> 
> 'need_freeze' needs to be flipped by:
> 
> 	need_freeze = (entry->store != elv_iosched_store);

I'm still running the test (takes 5,000 boot iterations before I can
be "sure"), but so far it seems flipping this test fixes the bug.
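
To see why the direction of the comparison matters, here is a minimal
userspace sketch. It is not kernel code: mock_queue, mock_attr_store and
the other mock_* names are hypothetical stand-ins for the request queue,
queue_attr_store(), elv_iosched_store() and request_module(). It only
records whether the simulated module load runs while the queue is frozen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a request queue; hypothetical, not a kernel type. */
struct mock_queue {
	int freeze_depth;		/* > 0 means the queue is "frozen" */
	bool loaded_while_frozen;	/* a module load ran under freeze */
};

typedef long (*store_fn)(struct mock_queue *q);

static void mock_freeze(struct mock_queue *q)
{
	q->freeze_depth++;
}

static void mock_unfreeze(struct mock_queue *q)
{
	assert(q->freeze_depth > 0);	/* freeze/unfreeze must be balanced */
	q->freeze_depth--;
}

/* Stands in for request_module(): notes whether it ran on a frozen queue. */
static void mock_request_module(struct mock_queue *q)
{
	if (q->freeze_depth > 0)
		q->loaded_while_frozen = true;
}

/* Stands in for elv_iosched_store(): may have to load a scheduler module. */
static long mock_elv_store(struct mock_queue *q)
{
	mock_request_module(q);
	return 0;
}

/* Stands in for any other attribute store; wants the queue frozen. */
static long mock_other_store(struct mock_queue *q)
{
	return q->freeze_depth > 0 ? 0 : -1;
}

/*
 * Stands in for queue_attr_store().  flipped == false reproduces the
 * "==" test from the quoted diff; flipped == true is the "!=" version.
 */
static long mock_attr_store(struct mock_queue *q, store_fn store, bool flipped)
{
	bool need_freeze = flipped ? (store != mock_elv_store)
				   : (store == mock_elv_store);
	long res;

	if (need_freeze)
		mock_freeze(q);
	res = store(q);
	if (need_freeze)
		mock_unfreeze(q);
	return res;
}
```

With flipped == false the scheduler store is the only one that runs
frozen, so the simulated module load still happens under freeze; with
flipped == true it is the only one that does not.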

This seems like the neatest (or shortest) fix so far, but doesn't it
"mix up layers" by checking elv_iosched_store?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
