Re: Failover for unattached USB device

From: Dmitry Katsubo <dma_k@mail.ru>
To: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Failover for unattached USB device
Date: Thu, 25 Oct 2018 11:47:13 +0200
Message-ID: <e3d54b3d15a57d8fcd34e7a5bb9139c1@mail.ru>
In-Reply-To: <CAJCQCtQqus2ytiBCe=TZsWNsBNDy-SR064r1bPsyC5hL3iaJKQ@mail.gmail.com>

On 2018-10-24 20:05, Chris Murphy wrote:
> I think about the best we can expect in the short term is that Btrfs
> goes read-only before the file system becomes corrupted in a way it
> can't recover with a normal mount. And I'm not certain it is in this
> state of development right now for all cases. And I say the same thing
> for other file systems as well.
> 
> Running Btrfs on USB devices is fine, so long as they're well behaved.
> I have such a setup with USB 3.0 devices. Perhaps I got a bit lucky,
> because there are a lot of known bugs with USB controllers, USB bridge
> chipsets, and USB hubs.
> 
> Having user definable switches for when to go read-only is, I think
> misleading to the user, and very likely will mislead the file system.
> The file system needs to go read-only when it gets confused, period.
> It doesn't matter what the error rate is.

In general I agree. I just wonder why it couldn't happen sooner. For
example, the log I originally attached shows that btrfs made 1867
attempts to read (perhaps the same) block from both devices in the
RAID1 volume, without success:

BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0
BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0

Attempts lasted for 29 minutes.
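
As a rough illustration only (user-space monitoring, not anything btrfs
itself provides), counter lines like the two above could be parsed from
saved kernel log text as sketched below. The function names and the
rd_limit threshold are hypothetical; the regular expression assumes the
exact message layout shown in this log and tolerates the line being
wrapped by a mail client.

```python
import re

# Matches per-device counter lines of the form seen in the log above:
# "BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1867, ..."
# \s+ between fields lets the pattern survive mail-client line wrapping.
ERRS_RE = re.compile(
    r"BTRFS error \(device (?P<fs>[^)]+)\):\s+bdev\s+(?P<bdev>\S+)\s+errs:"
    r"\s+wr\s+(?P<wr>\d+),\s+rd\s+(?P<rd>\d+),\s+flush\s+(?P<flush>\d+),"
    r"\s+corrupt\s+(?P<corrupt>\d+),\s+gen\s+(?P<gen>\d+)"
)

COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_errs(log_text):
    """Return {bdev: {counter: value}} with the last counters seen per device."""
    latest = {}
    for match in ERRS_RE.finditer(log_text):
        latest[match.group("bdev")] = {k: int(match.group(k)) for k in COUNTERS}
    return latest


def devices_over(latest, rd_limit):
    """List devices whose read-error counter reached rd_limit (hypothetical)."""
    return sorted(dev for dev, c in latest.items() if c["rd"] >= rd_limit)
```

A monitoring script could feed it the saved kernel log and alert once
devices_over() returns a non-empty list, long before 29 minutes pass.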

> The work around is really to do the hard work making the devices
> stable. Not asking Btrfs to paper over known unstable hardware.
> 
> In my case, I started out with rare disconnects and resets with
> directly attached drives. This was a couple years ago. It was a Btrfs
> raid1 setup, and the drives would not go missing at the same time, but
> both would just drop off from time to time. Btrfs would complain of
> dropped writes, I vaguely remember it going read only. But normal
> mounts worked, sometimes with scary errors but always finding a good
> copy on the other drive, and doing passive fixups. Scrub would always
> fix up the rest. I'm still using those same file systems on those
> devices, but now they go through a dyconn USB 3.0 hub with a decently
> good power supply. I originally thought the drop offs were power
> related, so I explicitly looked for a USB hub that could supply at
> least 2A, and this one is 12VDC @ 2500mA. A laptop drive will draw
> nearly 1A on spin up, but at that point P=AV. Laptop drives during
> read/write using 1.5 W to 2.5 W @ 5VDC.
> 
> 1.5-2.5 W = A * 5 V
> Therefore A = 0.3-0.5A
> 
> And for 4 drives at possibly 0.5 A (although my drives are all at the
> 1.6 W read/write), that's 2 A @ 5 V, which is easily maintained for
> the hub power supply (which by my calculation could do 6 A @ 5 V, not
> accounting for any resistance).
> 
> Anyway, as it turns out I don't think it was power related, as the
> Intel NUC in question probably had just enough amps per port. And what
> it really was, was incompatibility between the Intel controller and
> the bridge chipset in the USB-SATA cases, and the USB hub is similar to
> an ethernet hub, it actually reads the USB stream and rewrites it out.
> So hubs are actually pretty complicated little things, and having a
> good one matters.
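
The power budget arithmetic quoted above can be double-checked with a
few lines. This is only a sketch using the figures from the quoted text
(1.5 to 2.5 W per drive at 5 V, four drives, a 12 V / 2.5 A supply); it
assumes an ideal, lossless conversion from 12 V to 5 V, and the helper
name is made up for illustration.

```python
def amps(watts, volts):
    """Current drawn at a given power and voltage: I = P / V."""
    return watts / volts


# Per-drive current during read/write at 5 V (0.3 to 0.5 A in the quote).
drive_low = amps(1.5, 5.0)
drive_high = amps(2.5, 5.0)

# Four drives, each at the upper bound.
four_drives = 4 * drive_high

# Hub supply: 12 V at 2.5 A is 30 W, which is 6 A at 5 V if nothing is lost.
supply_watts = 12.0 * 2.5
supply_amps_at_5v = amps(supply_watts, 5.0)

# Remaining current budget at 5 V under these assumptions.
headroom = supply_amps_at_5v - four_drives
```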

Thanks for this information. My situation is similar to yours; the only
important difference is that my drives sit in a USB dock with
independent power and cooling, like this one:

https://www.ebay.com/itm/Mediasonic-ProBox-4-Bay-3-5-Hard-Drive-Enclosure-USB-3-0-eSATA-Sata-3-6-0Gbps/273161164246

so I don't think I need to worry about amps. This dock is connected
directly to a USB port on the motherboard.

However, there could indeed be bugs both on the dock side and in the
south bridge. Moreover, I could imagine that a USB reset happens due to
another USB device, like a wave that starts in one place and turns into
a tsunami for the whole USB subsystem.

> There are pending patches for something similar that you can find in
> the archives. I think the reason they haven't been merged yet is there
> haven't been enough comments and feedback (?). I think Anand Jain is
> the author of those patches so you might dig around in the archives.
> In a way you have an ideal setup for testing them out. Just make sure
> you have backups...

Thanks for the reference. Should I look for these patches here:

https://patchwork.kernel.org/project/linux-btrfs/list/?submitter=34632&order=-date

or were they only posted to this mailing list?

> 'btrfs check' without the --repair flag is safe and read only but
> takes a long time because it'll read all metadata. The fastest safe
> way is to mount it ro and read a directory recently being written to
> and see if there are any kernel errors. You could recursively copy
> files from a directory to /dev/null and then check kernel messages for
> any errors. So long as metadata is DUP, there is a good chance a bad
> copy of metadata can be automatically fixed up with a good copy. If
> there's only single copy of metadata, or both copies get corrupt, then
> it's difficult. Usually recovery of data is possible, but depending on
> what's damaged, repair might not be possible.

I think "btrfs check" would be too heavy. Monitoring kernel errors is
something I was thinking about as well.
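
The "copy everything to /dev/null" check suggested above could be
sketched roughly like this. The function name and chunk size are my own
choices, not an existing tool; it only forces the data to be read, so
the kernel messages still have to be inspected afterwards, and reads
served from the page cache will not touch the device at all.

```python
import os


def read_tree(root, chunk_size=1 << 20):
    """Read every regular file under root and discard the data.

    Returns (bytes_read, failures), where failures is a list of
    (path, error message) pairs for files that raised OSError.
    """
    total = 0
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, FIFOs, sockets and device nodes.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError as exc:
                # An I/O error here is a hint to go look at the kernel log.
                failures.append((path, str(exc)))
    return total, failures
```

Running it against a recently written directory on a read-only mount
and then checking the kernel log for new BTRFS errors is much cheaper
than a full "btrfs check".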

I didn't observe any errors while running "btrfs check" on this volume
after several such resets, because the volume is mostly used for
reading and the chance that a USB reset happens during a write is very
low.

-- 
With best regards,
Dmitry
