public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held
       [not found]                 ` <ae6d294a-e9ec-a81d-6085-a9341ed8a470@deltatee.com>
@ 2022-05-26 11:53                   ` Jan Kara
  2022-05-31  6:11                     ` Christoph Hellwig
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Kara @ 2022-05-26 11:53 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Guoqing Jiang, Donald Buczek, Song Liu, Jan Kara, Jens Axboe,
	Paolo Valente, linux-raid, Tejun Heo, Christoph Hellwig,
	linux-block

[Added couple of CCs since this seems to be an issue in the generic block
layer]

On Wed 25-05-22 12:22:06, Logan Gunthorpe wrote:
> On 2022-05-25 03:04, Guoqing Jiang wrote:
> > I would prefer to focus on block tree or md tree. With latest block tree
> > (commit 44d8538d7e7dbee7246acda3b706c8134d15b9cb), I get below
> > similar issue as Donald reported, it happened with the cmd (which did
> > work with 5.12 kernel).
> > 
> > vm79:~/mdadm> sudo ./test --dev=loop --tests=05r1-add-internalbitmap
> 
> Ok, so this test passes for me, but my VM was not running with bfq. It
> also seems we have layers upon layers of different bugs to untangle.
> Perhaps you can try the tests with bfq disabled to make progress on the
> other regression I reported.
> 
> If I enable bfq and set the loop devices to the bfq scheduler, then I
> hit the same bug as you and Donald. It's clearly a NULL pointer
> de-reference in the bfq code, which seems to be triggered on the
> partition read after mdadm opens a block device (not sure if it's the md
> device or the loop device but I suspect the latter seeing it's not going
> through any md code).
> 
> Simplifying things down a bit, the null pointer dereference can be
> triggered by creating an md device with loop devices that have bfq
> scheduler set:
> 
>   mdadm --create --run /dev/md0 --level=1 -n2 /dev/loop0 /dev/loop1
> 
> The crash occurs in bfq_bio_bfqg() with blkg_to_bfqg() returning NULL.
> It's hard to trace where the NULL comes from in there -- the code is a
> bit complex.
> 
> I've found that the bfq bug exists in current md-next (42b805af102) but
> did not trigger in the base tag of v5.18-rc3. Bisecting revealed the bug
> was introduced by:
> 
>   4e54a2493e58 ("bfq: Get rid of __bio_blkcg() usage")
> 
> Reverting that commit and the next commit (075a53b7) on top of md-next
> was confirmed to fix the bug.
> 
> I've copied Jan, Jens and Paolo who can hopefully help with this. A
> cleaned up stack trace follows this email for their benefit.

So I've debugged this. The crash happens on the very first bio submitted to
the md0 device. The problem is that this bio gets remapped to loop0 - this
happens through bio_alloc_clone() -> __bio_clone() which ends up calling
bio_clone_blkg_association(). Now the resulting bio is inconsistent - it's
dst_bio->bi_bdev is pointing to loop0 while dst_bio->bi_blkg is pointing to
blkcg_gq associated with md0 request queue. And this breaks BFQ because
when this bio is inserted to loop0 request queue, BFQ looks at
bio->bi_blkg->q (it is a bit more complex than that but this is the gist
of the problem), expects its data there but BFQ is not initialized for md0
request_queue.

Now I think this is a bug in __bio_clone() but the inconsistency in the bio
is very much what we asked bio_clone_blkg_association() to do so maybe I'm
missing something and bios that are associated with one bdev but pointing
to blkg of another bdev are fine and controllers are supposed to handle
that (although I'm not sure how should they do that). So I'm asking here
before I just go and delete bio_clone_blkg_association() from
__bio_clone()...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held
  2022-05-26 11:53                   ` [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held Jan Kara
@ 2022-05-31  6:11                     ` Christoph Hellwig
  2022-05-31  7:43                       ` Jan Kara
  0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2022-05-31  6:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: Logan Gunthorpe, Guoqing Jiang, Donald Buczek, Song Liu,
	Jens Axboe, Paolo Valente, linux-raid, Tejun Heo,
	Christoph Hellwig, linux-block

On Thu, May 26, 2022 at 01:53:36PM +0200, Jan Kara wrote:
> So I've debugged this. The crash happens on the very first bio submitted to
> the md0 device. The problem is that this bio gets remapped to loop0 - this
> happens through bio_alloc_clone() -> __bio_clone() which ends up calling
> bio_clone_blkg_association(). Now the resulting bio is inconsistent - it's
> dst_bio->bi_bdev is pointing to loop0 while dst_bio->bi_blkg is pointing to
> blkcg_gq associated with md0 request queue. And this breaks BFQ because
> when this bio is inserted to loop0 request queue, BFQ looks at
> bio->bi_blkg->q (it is a bit more complex than that but this is the gist
> of the problem), expects its data there but BFQ is not initialized for md0
> request_queue.
> 
> Now I think this is a bug in __bio_clone() but the inconsistency in the bio
> is very much what we asked bio_clone_blkg_association() to do so maybe I'm
> missing something and bios that are associated with one bdev but pointing
> to blkg of another bdev are fine and controllers are supposed to handle
> that (although I'm not sure how should they do that). So I'm asking here
> before I just go and delete bio_clone_blkg_association() from
> __bio_clone()...

This behavior probably goes back to my commit here:

ommit d92c370a16cbe0276954c761b874bd024a7e4fac
Author: Christoph Hellwig <hch@lst.de>
Date:   Sat Jun 27 09:31:48 2020 +0200

    block: really clone the block cgroup in bio_clone_blkg_association

and it seems everyone else was fine with that behavior so far.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held
  2022-05-31  6:11                     ` Christoph Hellwig
@ 2022-05-31  7:43                       ` Jan Kara
  0 siblings, 0 replies; 3+ messages in thread
From: Jan Kara @ 2022-05-31  7:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Logan Gunthorpe, Guoqing Jiang, Donald Buczek, Song Liu,
	Jens Axboe, Paolo Valente, linux-raid, Tejun Heo, linux-block

On Mon 30-05-22 23:11:47, Christoph Hellwig wrote:
> On Thu, May 26, 2022 at 01:53:36PM +0200, Jan Kara wrote:
> > So I've debugged this. The crash happens on the very first bio submitted to
> > the md0 device. The problem is that this bio gets remapped to loop0 - this
> > happens through bio_alloc_clone() -> __bio_clone() which ends up calling
> > bio_clone_blkg_association(). Now the resulting bio is inconsistent - it's
> > dst_bio->bi_bdev is pointing to loop0 while dst_bio->bi_blkg is pointing to
> > blkcg_gq associated with md0 request queue. And this breaks BFQ because
> > when this bio is inserted to loop0 request queue, BFQ looks at
> > bio->bi_blkg->q (it is a bit more complex than that but this is the gist
> > of the problem), expects its data there but BFQ is not initialized for md0
> > request_queue.
> > 
> > Now I think this is a bug in __bio_clone() but the inconsistency in the bio
> > is very much what we asked bio_clone_blkg_association() to do so maybe I'm
> > missing something and bios that are associated with one bdev but pointing
> > to blkg of another bdev are fine and controllers are supposed to handle
> > that (although I'm not sure how should they do that). So I'm asking here
> > before I just go and delete bio_clone_blkg_association() from
> > __bio_clone()...
> 
> This behavior probably goes back to my commit here:
> 
> ommit d92c370a16cbe0276954c761b874bd024a7e4fac
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Sat Jun 27 09:31:48 2020 +0200
> 
>     block: really clone the block cgroup in bio_clone_blkg_association
> 
> and it seems everyone else was fine with that behavior so far.

Yes, thanks for the pointer. I agree nobody else was crashing due to this
but it will be causing interesting inconsistencies in accounting and
throttling in all the IO controllers - e.g. usually both original
and cloned bio will get accounted to md0 device and loop0 device settings
will be completely ignored. From my experience these things do not
get really tested too much until some customer tries to use them :). So I
think we have to effectively revert your above change. I'll send a patch.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-05-31  7:43 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <141b4110-767e-7670-21d5-6a5f636d1207@linux.dev>
     [not found] ` <CAPhsuW6U3g-Xikbw4mAJOH1-kN42rYHLiq_ocv==436azhm33g@mail.gmail.com>
     [not found]   ` <b4244eab-d9e2-20a0-ebce-1a96e8fadb91@deltatee.com>
     [not found]     ` <836b2a93-65be-8d6c-8610-18373b88f86d@molgen.mpg.de>
     [not found]       ` <5b0584a3-c128-cb53-7c8a-63744c60c667@linux.dev>
     [not found]         ` <4edc9468-d195-6937-f550-211bccbd6756@molgen.mpg.de>
     [not found]           ` <954f9c33-7801-b6d2-65e3-9e5237905886@linux.dev>
     [not found]             ` <82a08e9c-e3f4-4eb6-cb06-58b96c0f01a8@deltatee.com>
     [not found]               ` <775d6734-2b08-21a8-a093-f750d31ce6ce@linux.dev>
     [not found]                 ` <ae6d294a-e9ec-a81d-6085-a9341ed8a470@deltatee.com>
2022-05-26 11:53                   ` [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held Jan Kara
2022-05-31  6:11                     ` Christoph Hellwig
2022-05-31  7:43                       ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox