All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Patrick Plenefisch <simonpatp@gmail.com>
Cc: Mike Snitzer <snitzer@kernel.org>,
	Goffredo Baroncelli <kreijack@inwind.it>,
	linux-kernel@vger.kernel.org, Alasdair Kergon <agk@redhat.com>,
	Mikulas Patocka <mpatocka@redhat.com>, Chris Mason <clm@fb.com>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	regressions@lists.linux.dev, dm-devel@lists.linux.dev,
	linux-btrfs@vger.kernel.org, ming.lei@redhat.com
Subject: Re: LVM-on-LVM: error while submitting device barriers
Date: Sun, 10 Mar 2024 19:34:36 +0800	[thread overview]
Message-ID: <Ze2azGlb1WxVFv7Z@fedora> (raw)
In-Reply-To: <CAOCpoWeoQMh_-MxzxGBnK2Kf5EhvTLs=GrGwJ5XcfGVRTp73Eg@mail.gmail.com>

On Sat, Mar 09, 2024 at 03:39:02PM -0500, Patrick Plenefisch wrote:
> On Wed, Mar 6, 2024 at 11:00 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Tue, Mar 05, 2024 at 12:45:13PM -0500, Mike Snitzer wrote:
> > > On Thu, Feb 29 2024 at  5:05P -0500,
> > > Goffredo Baroncelli <kreijack@inwind.it> wrote:
> > >
> > > > On 29/02/2024 21.22, Patrick Plenefisch wrote:
> > > > > On Thu, Feb 29, 2024 at 2:56 PM Goffredo Baroncelli <kreijack@inwind.it> wrote:
> > > > > >
> > > > > > > Your understanding is correct. The only thing that comes to my mind to
> > > > > > > cause the problem is asymmetry of the SATA devices. I have one 8TB
> > > > > > > device, plus a 1.5TB, 3TB, and 3TB drives. Doing math on the actual
> > > > > > > extents, lowerVG/single spans (3TB+3TB), and
> > > > > > > lowerVG/lvmPool/lvm/brokenDisk spans (3TB+1.5TB). Both obviously have
> > > > > > > the other leg of raid1 on the 8TB drive, but my thought was that the
> > > > > > > jump across the 1.5+3TB drive gap was at least "interesting"
> > > > > >
> > > > > >
> > > > > > what about lowerVG/works ?
> > > > > >
> > > > >
> > > > > That one is only on two disks, it doesn't span any gaps
> > > >
> > > > Sorry, but re-reading the original email I found something that I missed before:
> > > >
> > > > > BTRFS error (device dm-75): bdev /dev/mapper/lvm-brokenDisk errs: wr
> > > > > 0, rd 0, flush 1, corrupt 0, gen 0
> > > > > BTRFS warning (device dm-75): chunk 13631488 missing 1 devices, max
> > > >                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > > tolerance is 0 for writable mount
> > > > > BTRFS: error (device dm-75) in write_all_supers:4379: errno=-5 IO
> > > > > failure (errors while submitting device barriers.)
> > > >
> > > > Looking at the code, it seems that if a FLUSH commands fails, btrfs
> > > > considers that the disk is missing. The it cannot mount RW the device.
> > > >
> > > > I would investigate with the LVM developers, if it properly passes
> > > > the flush/barrier command through all the layers, when we have an
> > > > lvm over lvm (raid1). The fact that the lvm is a raid1, is important because
> > > > a flush command to be honored has to be honored by all the
> > > > devices involved.
> >
> > Hello Patrick & Goffredo,
> >
> > I can trigger this kind of btrfs complaint by simulating one FLUSH failure.
> >
> > If you can reproduce this issue easily, please collect log by the
> > following bpftrace script, which may show where the flush failure is,
> > and maybe it can help to narrow down the issue in the whole stack.
> >
> >
> > #!/usr/bin/bpftrace
> >
> > #ifndef BPFTRACE_HAVE_BTF
> > #include <linux/blkdev.h>
> > #endif
> >
> > kprobe:submit_bio_noacct,
> > kprobe:submit_bio
> > / (((struct bio *)arg0)->bi_opf & (1 << __REQ_PREFLUSH)) != 0 /
> > {
> >         $bio = (struct bio *)arg0;
> >         @submit_stack[arg0] = kstack;
> >         @tracked[arg0] = 1;
> > }
> >
> > kprobe:bio_endio
> > /@tracked[arg0] != 0/
> > {
> >         $bio = (struct bio *)arg0;
> >
> >         if (($bio->bi_flags & (1 << BIO_CHAIN)) && $bio->__bi_remaining.counter > 1) {
> >                 return;
> >         }
> >
> >         if ($bio->bi_status != 0) {
> >                 printf("dev %s bio failed %d, submitter %s completion %s\n",
> >                         $bio->bi_bdev->bd_disk->disk_name,
> >                         $bio->bi_status, @submit_stack[arg0], kstack);
> >         }
> >         delete(@submit_stack[arg0]);
> >         delete(@tracked[arg0]);
> > }
> >
> > END {
> >         clear(@submit_stack);
> >         clear(@tracked);
> > }
> >
> 
> Attaching 4 probes...
> dev dm-77 bio failed 10, submitter
>        submit_bio_noacct+5
>        __send_duplicate_bios+358
>        __send_empty_flush+179
>        dm_submit_bio+857
>        __submit_bio+132
>        submit_bio_noacct_nocheck+345
>        write_all_supers+1718
>        btrfs_commit_transaction+2342
>        transaction_kthread+345
>        kthread+229
>        ret_from_fork+49
>        ret_from_fork_asm+27
> completion
>        bio_endio+5
>        dm_submit_bio+955
>        __submit_bio+132
>        submit_bio_noacct_nocheck+345
>        write_all_supers+1718
>        btrfs_commit_transaction+2342
>        transaction_kthread+345
>        kthread+229
>        ret_from_fork+49
>        ret_from_fork_asm+27
> 
> dev dm-86 bio failed 10, submitter
>        submit_bio_noacct+5
>        write_all_supers+1718
>        btrfs_commit_transaction+2342
>        transaction_kthread+345
>        kthread+229
>        ret_from_fork+49
>        ret_from_fork_asm+27
> completion
>        bio_endio+5
>        clone_endio+295
>        clone_endio+295
>        process_one_work+369
>        worker_thread+635
>        kthread+229
>        ret_from_fork+49
>        ret_from_fork_asm+27
> 
> 
> For context, dm-86 is /dev/lvm/brokenDisk and dm-77 is /dev/lowerVG/lvmPool

io_status is 10(BLK_STS_IOERR), which is produced in submission code path on
/dev/dm-77(/dev/lowerVG/lvmPool) first, so looks it is one device mapper issue.

The error should be from the following code only:

static void __map_bio(struct bio *clone)

	...
	if (r == DM_MAPIO_KILL)
		dm_io_dec_pending(io, BLK_STS_IOERR);
	else
		dm_io_dec_pending(io, BLK_STS_DM_REQUEUE);
    break;

Patrick, you mentioned lvmPool is raid1, can you explain how lvmPool is
built? It is dm-raid1 target or over plain raid1 device which is
build over /dev/lowerVG?

Mike, the logic in the following code doesn't change from v5.18-rc2 to
v5.19, but I still can't understand why STS_IOERR is set in
dm_io_complete() in case of BLK_STS_DM_REQUEUE && !__noflush_suspending(),
since DMF_NOFLUSH_SUSPENDING is only set in __dm_suspend() which
is supposed to not happen in Patrick's case.

dm_io_complete()
	...
	if (io->status == BLK_STS_DM_REQUEUE) {
	        unsigned long flags;
	        /*
	         * Target requested pushing back the I/O.
	         */
	        spin_lock_irqsave(&md->deferred_lock, flags);
	        if (__noflush_suspending(md) &&
	            !WARN_ON_ONCE(dm_is_zone_write(md, bio))) {
	                /* NOTE early return due to BLK_STS_DM_REQUEUE below */
	                bio_list_add_head(&md->deferred, bio);
	        } else {
	                /*
	                 * noflush suspend was interrupted or this is
	                 * a write to a zoned target.
	                 */
	                io->status = BLK_STS_IOERR;
	        }
	        spin_unlock_irqrestore(&md->deferred_lock, flags);
	}



thanks,
Ming


  reply	other threads:[~2024-03-10 11:34 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAOCpoWc_HQy4UJzTi9pqtJdO740Wx5Yd702O-mwXBE6RVBX1Eg@mail.gmail.com>
     [not found] ` <CAOCpoWf3TSQkUUo-qsj0LVEOm-kY0hXdmttLE82Ytc0hjpTSPw@mail.gmail.com>
2024-02-28 17:25   ` [REGRESSION] LVM-on-LVM: error while submitting device barriers Patrick Plenefisch
2024-02-28 19:19     ` Goffredo Baroncelli
2024-02-28 19:37       ` Patrick Plenefisch
2024-02-29 19:56         ` Goffredo Baroncelli
2024-02-29 20:22           ` Patrick Plenefisch
2024-02-29 22:05             ` Goffredo Baroncelli
2024-03-05 17:45               ` Mike Snitzer
2024-03-06 15:59                 ` Ming Lei
2024-03-09 20:39                   ` Patrick Plenefisch
2024-03-10 11:34                     ` Ming Lei [this message]
2024-03-10 15:27                       ` Mike Snitzer
2024-03-10 15:47                         ` Ming Lei
2024-03-10 18:11                         ` Patrick Plenefisch
2024-03-11 13:13                           ` Ming Lei
2024-03-12 22:54                             ` Patrick Plenefisch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ze2azGlb1WxVFv7Z@fedora \
    --to=ming.lei@redhat.com \
    --cc=agk@redhat.com \
    --cc=clm@fb.com \
    --cc=dm-devel@lists.linux.dev \
    --cc=dsterba@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=kreijack@inwind.it \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=regressions@lists.linux.dev \
    --cc=simonpatp@gmail.com \
    --cc=snitzer@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.