Re: parent transid verify failed

From: Eric Levy <contact@ericlevy.name>
To: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: parent transid verify failed
Date: Sat, 01 Jan 2022 02:33:53 -0500
Message-ID: <7cffc181c0b01a52dfd82128eb656ec2ec44b94d.camel@ericlevy.name>
In-Reply-To: <CAJCQCtRxkZ4NjQA9KrOvb_ybDE-sg_BzzMU=91fT_p8gMEKw6Q@mail.gmail.com>

On Fri, 2021-12-31 at 16:09 -0700, Chris Murphy wrote:
> Dec 29 21:01:09 hostname kernel: sd 4:0:0:1: Warning! Received an
> indication that the LUN assignments on this target have changed. The
> Linux SCSI layer does not automatical
> ...
> Dec 30 03:47:10 hostname kernel: sd 4:0:0:1: rejecting I/O to offline
> device
> Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev
> sdc, sector 523542288 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio
> class 0
> 
> 
> Can you tell us more about /dev/sdc a.k.a. scsi 4:0:0:1? Because this
> device seems to be in a bad state, and is rejecting writes.
> 
> 
> --
> Chris Murphy

The hardware is an appliance with an array of SATA magnetic disks
managed in a RAID6-like volume. The volume is formatted with a top-
level Btrfs file system, on which is installed an OS that manages
services and provides an administrative interface. 

All of these components are presently reported healthy through normal
diagnostics and logging. More intensive diagnostics would include a
S.M.A.R.T. extended test of each media device, and a scrubbing of the
file system. All available indications suggest these operations would
expose no problems.

Above these lower layers runs a logical-unit manager and an iSCSI
service, allowing provisioning of logical volumes. After a volume is
provisioned, it is given appropriate permissions (e.g. R/W), and mapped
to an iSCSI target. Adding it to an existing target allows it to be
detected by an initiator (the host using the volume) without any
administrative overhead on the remote side. However, the new mapping is
not broadcast, so the initiator must request a refreshed list of items
in the target.

The log message you have reproduced shows the iSCSI daemon detecting a
new volume in the target, as a consequence of my instruction from the
administrative tool to refresh the list. The volume labeled 4:0:0:1, or
/dev/sdc, already existed. The one that was added was 4:0:0:2, or
/dev/sdd.

In this case, the logical units, one pre-existing and one new, were the
volumes for the failing file system.

Since all of these operations were new to me, I tried a variety of
smaller operations at each step to finally achieve the end result.  I
have no indication that I caused any problems, but it is possible there
were side effects due to my own missteps, or quirks in the design of
the administrative system.

While I think a problem of this kind is unlikely, it seems more likely
than a problem at a lower level.

Possible reasons for the device becoming read-only include the
following:

   1. As a side effect of provisioning the new volume, write privileges
      were effectively removed for the existing one.
   2. The LV backend became confused and entered a protective state.
   3. The iSCSI initiator was unable to add a new volume without affecting
      the existing one.

However, the logs show that between the detection of the new volume and
the reversion to the RO state, several add and remove operations were
completed. Without knowing about the mechanics of Btrfs, I would
postulate that these operations depend on successful write access to
all devices in the pool, including any added or removed. It appears to
me as though the volumes were healthy and accessible for at least some
time, which can be calculated from the log timestamps, long enough for
me to issue commands.
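
The interval in question can be worked out from the two kernel messages
quoted above. What follows is a minimal sketch, assuming the usual
syslog timestamp layout. The log lines carry no year, so 2021 is an
assumption, and the helper names are only illustrative.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line: str, year: int = 2021) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog line.

    The year is not present in the line; 2021 is an assumed default.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")


def elapsed(first: str, second: str, year: int = 2021) -> timedelta:
    """Time between two syslog lines that fall in the same year."""
    return parse_syslog_time(second, year) - parse_syslog_time(first, year)


# The two messages quoted earlier in this thread, with wrapping removed.
lun_changed = ("Dec 29 21:01:09 hostname kernel: sd 4:0:0:1: Warning! "
               "Received an indication that the LUN assignments on this "
               "target have changed.")
went_offline = ("Dec 30 03:47:10 hostname kernel: sd 4:0:0:1: rejecting "
                "I/O to offline device")

print(elapsed(lun_changed, went_offline))  # 6:46:01
```

By that count the device accepted I/O for roughly six and three quarter
hours after the new logical unit was detected.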

Thus, my first thought was a file-system issue, not a device issue.

Is the present indication more strongly that a) the driver refuses to
mount the file system because its state shows as corrupt, or b) the
driver aborts the mount operation because it fails to write at the
block level?
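
One rough way to weigh a) against b) is to count which kind of message
dominates the kernel log. The sketch below is only a heuristic: the
marker strings are taken from the messages quoted in this thread and
from its subject line, not from any authoritative list, and the
function names are hypothetical.

```python
# Substrings seen in the quoted kernel messages (block/SCSI layer).
BLOCK_LEVEL_MARKERS = ("rejecting I/O to offline device", "I/O error")
# Substring taken from the subject of this thread (file-system layer).
FS_LEVEL_MARKERS = ("parent transid verify failed",)


def classify(line: str) -> str:
    """Label a log line as 'block', 'filesystem', or 'other'."""
    if any(marker in line for marker in BLOCK_LEVEL_MARKERS):
        return "block"
    if any(marker in line for marker in FS_LEVEL_MARKERS):
        return "filesystem"
    return "other"


def tally(lines) -> dict:
    """Count how many lines fall into each category."""
    counts = {"block": 0, "filesystem": 0, "other": 0}
    for line in lines:
        counts[classify(line)] += 1
    return counts


# The three messages quoted earlier, with wrapping removed.
quoted = [
    "Dec 29 21:01:09 hostname kernel: sd 4:0:0:1: Warning! Received an "
    "indication that the LUN assignments on this target have changed.",
    "Dec 30 03:47:10 hostname kernel: sd 4:0:0:1: rejecting I/O to "
    "offline device",
    "Dec 30 03:47:10 hostname kernel: blk_update_request: I/O error, dev "
    "sdc, sector 523542288 op 0x1:(WRITE) flags 0x104000 phys_seg 128 "
    "prio class 0",
]

print(tally(quoted))  # {'block': 2, 'filesystem': 0, 'other': 1}
```

Run over the full log rather than this excerpt, the same tally would
show whether block-level failures precede the file-system complaints.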

If the problem is device-level, then there is much to try, including
renewing the iSCSI login. I can also restart the daemon, reboot the
host, even restart the iSCSI backend service or appliance. 

Would any operations of this kind be helpful to try?
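
A read-only check could precede any of those restarts. The following is
a minimal sketch, assuming the standard Linux sysfs layout, in which
/sys/block/<dev>/ro holds the read-only flag and
/sys/block/<dev>/device/state holds the SCSI device state. The function
name and the sysfs_root parameter are illustrative only.

```python
from pathlib import Path


def device_status(name: str, sysfs_root: str = "/sys") -> dict:
    """Report the kernel's view of a block device such as 'sdc'.

    Values are None when the corresponding sysfs file cannot be read,
    for example when the device does not exist on this host.
    """
    base = Path(sysfs_root) / "block" / name

    def read(relative: str):
        try:
            return (base / relative).read_text().strip()
        except OSError:
            return None

    ro_flag = read("ro")
    return {
        # Typically "running" or "offline" for SCSI devices.
        "scsi_state": read("device/state"),
        # True when the block layer marks the device read-only.
        "read_only": None if ro_flag is None else ro_flag == "1",
    }


if __name__ == "__main__":
    for dev in ("sdc", "sdd"):
        print(dev, device_status(dev))
```

Nothing here writes to the device, so it should be safe to run while
the file system is in its current state.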
