Linux Btrfs filesystem development
From: Goffredo Baroncelli <kreijack@inwind.it>
To: Patrick Plenefisch <simonpatp@gmail.com>
Cc: stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	Alasdair Kergon <agk@redhat.com>,
	Mike Snitzer <snitzer@kernel.org>,
	Mikulas Patocka <mpatocka@redhat.com>, Chris Mason <clm@fb.com>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	regressions@lists.linux.dev, dm-devel@lists.linux.dev,
	linux-btrfs@vger.kernel.org
Subject: Re: [REGRESSION] LVM-on-LVM: error while submitting device barriers
Date: Thu, 29 Feb 2024 20:56:23 +0100
Message-ID: <a1e30dab-dfde-418e-a0dd-3e294838e839@inwind.it>
In-Reply-To: <CAOCpoWexiuYLu0fpPr71+Uzxw_tw3q4HGF9tKgx5FM4xMx9fWA@mail.gmail.com>

On 28/02/2024 20.37, Patrick Plenefisch wrote:
> On Wed, Feb 28, 2024 at 2:19 PM Goffredo Baroncelli <kreijack@libero.it> wrote:
>>
>> On 28/02/2024 18.25, Patrick Plenefisch wrote:
>>> I'm unsure if this is just an LVM bug, or a BTRFS+LVM interaction bug,
>>> but LVM is definitely involved somehow.
>>> Upgrading from 5.10 to 6.1, I noticed one of my filesystems was
>>> read-only. In dmesg, I found:
>>>
>>> BTRFS error (device dm-75): bdev /dev/mapper/lvm-brokenDisk errs: wr
>>> 0, rd 0, flush 1, corrupt 0, gen 0
>>> BTRFS warning (device dm-75): chunk 13631488 missing 1 devices, max
>>> tolerance is 0 for writable mount
>>> BTRFS: error (device dm-75) in write_all_supers:4379: errno=-5 IO
>>> failure (errors while submitting device barriers.)
>>> BTRFS info (device dm-75: state E): forced readonly
>>> BTRFS warning (device dm-75: state E): Skipping commit of aborted transaction.
>>> BTRFS: error (device dm-75: state EA) in cleanup_transaction:1992:
>>> errno=-5 IO failure
>>>
>>> At first I suspected a btrfs error, but a scrub found no errors, and
>>> it continued to be read-write on 5.10 kernels.
>>>
>>> Here is my setup:
>>>
>>> /dev/lvm/brokenDisk is a lvm-on-lvm volume. I have /dev/sd{a,b,c,d}
>>> (of varying sizes) in a lower VG, which has three LVs, all raid1
>>> volumes. Two of the volumes are further used as PV's for an upper VGs.
>>> One of the upper VGs has no issues. The non-PV LV has no issue. The
>>> remaining one, /dev/lowerVG/lvmPool, hosting nested LVM, is used as a
>>> PV for VG "lvm", and has 3 volumes inside. Two of those volumes have
>>> no issues (and are btrfs), but the last one is /dev/lvm/brokenDisk.
>>> This volume is the only one that exhibits this behavior, so something
>>> is special.
>>>
>>> Or described as layers:
>>> /dev/sd{a,b,c,d} => PV => VG "lowerVG"
>>> /dev/lowerVG/single (RAID1 LV) => BTRFS, works fine
>>> /dev/lowerVG/works (RAID1 LV) => PV => VG "workingUpper"
>>> /dev/workingUpper/{a,b,c} => BTRFS, works fine
>>> /dev/lowerVG/lvmPool (RAID1 LV) => PV => VG "lvm"
>>> /dev/lvm/{a,b} => BTRFS, works fine
>>> /dev/lvm/brokenDisk => BTRFS, Exhibits errors
>>
>> I am a bit curious about the reasons of this setup.
> 
> The lowerVG is supposed to be a pool of storage for several VM's &
> containers. [workingUpper] is for one VM, and [lvm] is for another VM.
> However right now I'm still trying to organize the files directly
> because I don't have all the VM's fully setup yet
> 
>> However I understood that:
>>
>> /dev/sda -+                +-- single (RAID1) -> ok             +-> a   ok
>> /dev/sdb  |                |                                    |-> b   ok
>> /dev/sdc  +--> [lowerVG]>--+-- works (RAID1) -> [workingUpper] -+-> c   ok
>> /dev/sdd -+                |
>>                            |
>>                            +-- lvmPool (raid1) -> [lvm] -+-> a          -> ok
>>                                                          |-> b          -> ok
>>                                                          +-> brokenDisk -> fail
>>
>> [xxx] means VG, the others are LVs that may act also as PV in
>> an upper VG
> 
> Note that lvmPool is also RAID1, but yes
> 
>>
>> So, it seems that
>>
>> 1) lowerVG/lvmPool/lvm/a
>> 2) lowerVG/lvmPool/lvm/a
>> 3) lowerVG/lvmPool/lvm/brokenDisk
>>
>> are equivalent ... so I don't understand how 1) and 2) are fine but 3) is
>> problematic.
> 
> I assume you meant  lvm/b for 2?

Yes
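
To make the equivalence concrete, here is a minimal sketch that models the
layering described above and walks from an LV down to the lowest VG. The VG
and LV names come from this thread; the dictionary layout and the helper
names (VG_PVS, stack, below) are assumptions made only for illustration.

```python
# Hypothetical model of the layout discussed in this thread.
# VG -> PVs backing it (a PV is either a disk or an LV of a lower VG).
VG_PVS = {
    "lowerVG": ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"],
    "workingUpper": ["lowerVG/works"],
    "lvm": ["lowerVG/lvmPool"],
}


def stack(lv):
    """Return the chain of layers from an LV ("vg/lv") down to the lowest VG."""
    chain = [lv]
    vg = lv.split("/")[0]
    while True:
        chain.append("[" + vg + "]")
        pvs = VG_PVS[vg]
        # Stop once the PVs are plain disks rather than LVs.
        if all(pv.startswith("/dev/") for pv in pvs):
            return chain
        # In this layout every upper VG has exactly one PV.
        lv = pvs[0]
        chain.append(lv)
        vg = lv.split("/")[0]


def below(lv):
    """Layers underneath an LV, excluding the LV itself."""
    return stack(lv)[1:]


for name in ("lvm/a", "lvm/b", "lvm/brokenDisk"):
    print(name, "->", " -> ".join(below(name)))
```

All three volumes report the same chain, which is why the difference cannot
be in the layering itself. The model deliberately ignores which physical
extents (and therefore which disks) back each LV, and that is the part that
may differ.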

>>
>> Is my understanding of the LVM layouts correct ?
> 
> Your understanding is correct. The only thing that comes to my mind to
> cause the problem is asymmetry of the SATA devices. I have one 8TB
> device, plus a 1.5TB, 3TB, and 3TB drives. Doing math on the actual
> extents, lowerVG/single spans (3TB+3TB), and
> lowerVG/lvmPool/lvm/brokenDisk spans (3TB+1.5TB). Both obviously have
> the other leg of raid1 on the 8TB drive, but my thought was that the
> jump across the 1.5+3TB drive gap was at least "interesting"


What about lowerVG/works? Which drives does it span?

However, yes, I agree that the particular pair of disks involved may
explain the problem.

Could you show us the output of

$ sudo pvdisplay -m
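
If the output is long, a small helper can summarize which PVs hold segments
of each LV. The sketch below assumes the usual "PV Name" and
"Logical volume" lines of the pvdisplay -m report; the sample text and the
function name lv_to_pvs are hypothetical and are not taken from the system
discussed here. For RAID1 LVs I would expect the segments to appear under
sub-LV names rather than the top-level LV name, but treat that as an
assumption to check against the real output.

```python
# Hypothetical sample shaped like "pvdisplay -m" output (NOT real data).
SAMPLE = """\
  --- Physical volume ---
  PV Name               /dev/sdX
  VG Name               exampleVG
  --- Physical Segments ---
  Physical extent 0 to 99:
    Logical volume      /dev/exampleVG/lv0
    Logical extents     0 to 99
  Physical extent 100 to 199:
    FREE
  --- Physical volume ---
  PV Name               /dev/sdY
  VG Name               exampleVG
  --- Physical Segments ---
  Physical extent 0 to 99:
    Logical volume      /dev/exampleVG/lv0
    Logical extents     100 to 199
"""


def lv_to_pvs(text):
    """Map each LV path to the set of PVs holding at least one of its segments."""
    mapping = {}
    pv = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[:2] == ["PV", "Name"]:
            # Remember the PV whose segment list we are reading.
            pv = fields[2]
        elif len(fields) >= 3 and fields[:2] == ["Logical", "volume"] and pv:
            mapping.setdefault(fields[2], set()).add(pv)
    return mapping


for lv, pvs in sorted(lv_to_pvs(SAMPLE).items()):
    print(lv, "is on", ", ".join(sorted(pvs)))
```

To use it on a real system, save the command output to a file and pass the
file contents to lv_to_pvs() instead of SAMPLE.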

> 
>>
>>
>>>
>>> After some investigation, here is what I've found:
>>>
>>> 1. This regression was introduced in 5.19. 5.18 and earlier kernels I
>>> can keep this filesystem rw and everything works as expected, while
>>> 5.19.0 and later the filesystem is immediately ro on any write
>>> attempt. I couldn't build rc1, but I did confirm rc2 already has this
>>> regression.
>>> 2. Passing /dev/lvm/brokenDisk to a KVM VM as /dev/vdb with an
>>> unaffected kernel inside the vm exhibits the ro barrier problem on
>>> unaffected kernels.
>>
>> Is /dev/lvm/brokenDisk *always* problematic with affected ( >= 5.19 ) and
>> UNaffected ( < 5.19 ) kernel ?
> 
> Yes, I didn't test it in as much depth, but 5.15 and 6.1 in the VM
> (and 6.1 on the host) are identically problematic
> 
>>
>>> 3. Passing /dev/lowerVG/lvmPool to a KVM VM as /dev/vdb with an
>>> affected kernel inside the VM and using LVM inside the VM exhibits
>>> correct behavior (I can keep the filesystem rw, no barrier errors on
>>> host or guest)
>>
>> Is /dev/lowerVG/lvmPool problematic with only "affected" kernel ?
> 
> Uh, passing lvmPool directly to the VM is never problematic. I tested
> 5.10 and 6.1 in the VM (and 6.1 on the host), and neither setup throws
> barrier errors.
> 
>> [...]
>>
>> --
>> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
>> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
>>

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Thread overview: 15+ messages
     [not found] <CAOCpoWc_HQy4UJzTi9pqtJdO740Wx5Yd702O-mwXBE6RVBX1Eg@mail.gmail.com>
     [not found] ` <CAOCpoWf3TSQkUUo-qsj0LVEOm-kY0hXdmttLE82Ytc0hjpTSPw@mail.gmail.com>
2024-02-28 17:25   ` [REGRESSION] LVM-on-LVM: error while submitting device barriers Patrick Plenefisch
2024-02-28 19:19     ` Goffredo Baroncelli
2024-02-28 19:37       ` Patrick Plenefisch
2024-02-29 19:56         ` Goffredo Baroncelli [this message]
2024-02-29 20:22           ` Patrick Plenefisch
2024-02-29 22:05             ` Goffredo Baroncelli
2024-03-05 17:45               ` Mike Snitzer
2024-03-06 15:59                 ` Ming Lei
2024-03-09 20:39                   ` Patrick Plenefisch
2024-03-10 11:34                     ` Ming Lei
2024-03-10 15:27                       ` Mike Snitzer
2024-03-10 15:47                         ` Ming Lei
2024-03-10 18:11                         ` Patrick Plenefisch
2024-03-11 13:13                           ` Ming Lei
2024-03-12 22:54                             ` Patrick Plenefisch
