How does btrfs handle sudden shutdowns?

linux-btrfs.vger.kernel.org archive mirror

* How does btrfs handle sudden shutdowns?
From: Michael Kjörling @ 2012-11-06 12:33 UTC
  To: linux-btrfs

Can btrfs deal reasonably gracefully with sudden shutdowns? (I'm
mainly thinking of power outages which lead to logical structure
damage but not physical media damage.)

What would be the risk points, file-system-wise?

Can for example a rotating snapshot schedule mitigate some or all
issues relating to sudden shutdowns, if any? (_For example_, take a
snapshot every minute, keeping the last five; if the main file system
fails to mount, then could the most recent usable snapshot be used as
a fallback, or is it likely to be equally damaged or inconsistent?)
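
A minimal sketch of the rotation described above (take a snapshot, keep
the newest five), using the `btrfs subvolume snapshot` and
`btrfs subvolume delete` commands. The paths, the `snap-` naming scheme
and the helper names are assumptions made for illustration; the sketch
says nothing about whether the snapshots survive a crash that damages
the file system itself.

```python
import subprocess
from datetime import datetime, timezone

KEEP = 5  # number of snapshots to retain, as in the example above


def snapshot_name(now):
    """Timestamped name (hypothetical scheme); sorts chronologically as text."""
    return now.strftime("snap-%Y%m%dT%H%M%SZ")


def snapshots_to_prune(names, keep=KEEP):
    """Return the oldest snapshot names beyond the newest `keep`."""
    ordered = sorted(names)
    return ordered[:-keep] if keep > 0 else ordered


def rotate(source, snap_dir, existing, run=subprocess.run):
    """Take one read-only snapshot of `source`, then delete surplus ones.

    `existing` is the list of snapshot names already present in `snap_dir`.
    """
    name = snapshot_name(datetime.now(timezone.utc))
    run(["btrfs", "subvolume", "snapshot", "-r", source, f"{snap_dir}/{name}"],
        check=True)
    for old in snapshots_to_prune(list(existing) + [name]):
        run(["btrfs", "subvolume", "delete", f"{snap_dir}/{old}"], check=True)
    return name
```

Calling `rotate("/mnt/data", "/mnt/data/.snapshots", names)` once a
minute from a timer would give the schedule in the example; both paths
are placeholders.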

Obviously a UPS or other form of fallback power is preferable to no
UPS if power outages are a concern, so as to allow a controlled system
shutdown (or fail-over to a more long-term backup power supply) in the
event of a prolonged power outage, but I'm wondering about situations
where such don't exist or even fail.

-- 
Michael Kjörling • http://michael.kjorling.se • michael@kjorling.se
                “People who think they know everything really annoy
                those of us who know we don’t.” (Bjarne Stroustrup)


* Re: How does btrfs handle sudden shutdowns?
From: Hugo Mills @ 2012-11-06 12:48 UTC
  To: Michael Kjörling; +Cc: linux-btrfs

On Tue, Nov 06, 2012 at 12:33:08PM +0000, Michael Kjörling wrote:
> Can btrfs deal reasonably gracefully with sudden shutdowns? (I'm
> mainly thinking of power outages which lead to logical structure
> damage but not physical media damage.)

   In theory (i.e. by the design of the FS), you should be able to
pull the plug on btrfs at any point, and the FS will always be
consistent.

   This makes some assumptions: that writing a single page to the FS
is atomic, and that the hardware reports barriers to the OS reliably,
i.e. if the hardware says it has fully stored the data, then it
actually has.

   There are also some caveats: while the FS should always be
consistent, the latest transaction write may not have been completed,
so you could potentially lose up to 30 seconds of writes to the FS
from immediately before the crash.

   If the FS does corrupt over a power failure, and the hardware can
be demonstrated to be good, then we have a bug that needs to be
tracked down. (There have been a number of these over the development
of the FS so far, but they do get fixed).

> What would be the risk points, file-system-wise?
> 
> Can for example a rotating snapshot schedule mitigate some or all
> issues relating to sudden shutdowns, if any? (_For example_, take a
> snapshot every minute, keeping the last five; if the main file system
> fails to mount, then could the most recent usable snapshot be used as
> a fallback, or is it likely to be equally damaged or inconsistent?)

   No, snapshots give you no additional guarantees -- if the FS
corrupts and is unmountable, a snapshot is part of the same FS and
will also be unmountable.

> Obviously a UPS or other form of fallback power is preferable to no
> UPS if power outages are a concern, so as to allow a controlled system
> shutdown (or fail-over to a more long-term backup power supply) in the
> event of a prolonged power outage, but I'm wondering about situations
> where such don't exist or even fail.

   As I said above, the FS structures _should_ be completely reliable
in the face of power loss; that they haven't been in the past is
definitely a bug, and those bugs have been / are being fixed as
they're found. We've had very few transid match failures recently,
which used to be the main failure mode for these bugs. I don't know
whether that's because people aren't reporting them, or because
they're not happening nearly so often these days. I suspect the
latter.

   I guess the question for you is: are you after the _expected_
behaviour of the FS (should always be consistent on good hardware, but
you may lose up to 30 seconds of writes), or are you after mitigation
strategies in the face of FS bugs (keep off-site backups and be
prepared to use them)?

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
        --- emacs:  Eighty Megabytes And Constantly Swapping. ---        


* Re: How does btrfs handle sudden shutdowns?
From: Liu Bo @ 2012-11-06 12:54 UTC
  To: Michael Kjörling; +Cc: linux-btrfs

On Tue, Nov 06, 2012 at 12:33:08PM +0000, Michael Kjörling wrote:
> Can btrfs deal reasonably gracefully with sudden shutdowns? (I'm
> mainly thinking of power outages which lead to logical structure
> damage but not physical media damage.)
> 

AFAIK, yes, because btrfs is copy-on-write (COW) by design, which means
you can at least roll back to the latest stable state.

> What would be the risk points, file-system-wise?
> 

Data loss is possible if you're not writing with O_SYNC or doing fsync
after a write.
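
A minimal sketch of the fsync approach mentioned above, assuming a
Linux system: the function returns only after the file contents and the
containing directory entry have been flushed with fsync(). The function
name is made up for illustration, and the guarantee still depends on
the hardware honouring flushes.

```python
import os


def durable_write(path, data):
    """Write `data` (bytes) to `path`; return only after fsync() succeeds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write() may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush file data and metadata to stable storage
    finally:
        os.close(fd)

    # Also fsync the parent directory so a newly created entry is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Opening the file with `os.O_SYNC` added to the flags is the other
option named above; it makes every write() synchronous instead of
flushing once at the end. Note that this sketch overwrites in place and
is not an atomic replacement of the old contents.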

> Can for example a rotating snapshot schedule mitigate some or all
> issues relating to sudden shutdowns, if any? (_For example_, take a
> snapshot every minute, keeping the last five; if the main file system
> fails to mount, then could the most recent usable snapshot be used as
> a fallback, or is it likely to be equally damaged or inconsistent?)
> 

In your case, when we finish creating a snapshot, the whole FS is in a
stable state (both metadata and data are safely written to disk).

So yes, you can use the latest snapshot as a fallback or backup or something.

I'd note here that btrfs somewhat suffers from ENOSPC cases, where it may
recover by itself or leave you in a read-only state, but your data is safe at least.

thanks,
liubo

> Obviously a UPS or other form of fallback power is preferable to no
> UPS if power outages are a concern, so as to allow a controlled system
> shutdown (or fail-over to a more long-term backup power supply) in the
> event of a prolonged power outage, but I'm wondering about situations
> where such don't exist or even fail.

* Re: How does btrfs handle sudden shutdowns?
From: Michael Kjörling @ 2012-11-06 13:47 UTC
  To: Hugo Mills, linux-btrfs

On 6 Nov 2012 12:48 +0000, from hugo@carfax.org.uk (Hugo Mills):
>    There are also some caveats: while the FS should always be
> consistent, the latest transaction write may not have been completed,
> so you could potentially lose up to 30 seconds of writes to the FS
> from immediately before the crash.

I'd rather lose the most recent 30 seconds of writes but have a
consistent file system with as-consistent-as-can-be-expected data,
than end up with a corrupted file system.

On that note; can this value be tuned currently, is it hardcoded, or
is it stored in metadata somewhere but the tooling to tune it is not
yet available?

>    If the FS does corrupt over a power failure, and the hardware can
> be demonstrated to be good, then we have a bug that needs to be
> tracked down. (There have been a number of these over the development
> of the FS so far, but they do get fixed).

Is there a simple way to tell ahead of time whether the hardware meets
the assumptions made by the file system with regards to write barriers
etc.?

>    I guess the question for you is: are you after the _expected_
> behaviour of the FS (should always be consistent on good hardware, but
> you may lose up to 30 seconds of writes), or are you after mitigation
> strategies in the face of FS bugs (keep off-site backups and be
> prepared to use them)?

I already have full, daily on-site backups on an external drive that
is logically unmounted except for when backups are running, as well as
partial off-site backups to cloud storage - and of course, taking
advantage of btrfs's snapshotting support there is no real reason why
I couldn't increase the backup frequency while retaining data
consistency. Losing half a minute of writes is fairly inconsequential
for personal use as long as the file system remains consistent, and in
the face of disastrous corruption it is at least possible to do a full
restore to bare metal from rescue media and backup without losing too
much. Not trivial time-wise (that's currently 1.4 TB over USB 2.0),
but possible.

-- 
Michael Kjörling • http://michael.kjorling.se • michael@kjorling.se
                “People who think they know everything really annoy
                those of us who know we don’t.” (Bjarne Stroustrup)


* Re: How does btrfs handle sudden shutdowns?
From: Hugo Mills @ 2012-11-06 13:54 UTC
  To: Michael Kjörling; +Cc: linux-btrfs

On Tue, Nov 06, 2012 at 01:47:02PM +0000, Michael Kjörling wrote:
> On 6 Nov 2012 12:48 +0000, from hugo@carfax.org.uk (Hugo Mills):
> >    There are also some caveats: while the FS should always be
> > consistent, the latest transaction write may not have been completed,
> > so you could potentially lose up to 30 seconds of writes to the FS
> > from immediately before the crash.
> 
> I'd rather lose the most recent 30 seconds of writes but have a
> consistent file system with as-consistent-as-can-be-expected data,
> than end up with a corrupted file system.
> 
> On that note; can this value be tuned currently, is it hardcoded, or
> is it stored in metadata somewhere but the tooling to tune it is not
> yet available?

   As far as I understand, no, it's hard-coded.

> >    If the FS does corrupt over a power failure, and the hardware can
> > be demonstrated to be good, then we have a bug that needs to be
> > tracked down. (There have been a number of these over the development
> > of the FS so far, but they do get fixed).
> 
> Is there a simple way to tell ahead of time whether the hardware meets
> the assumptions made by the file system with regards to write barriers
> etc.?

   "Most" hardware does. I think there's a "barriers disabled" warning
in the kernel logs on mounting the FS, and some time ago there were
rumours of a tool to check for it (from Red Hat, but I don't know if
it ever saw the light of day). That's all for the case where the
hardware explicitly states that it doesn't support barriers.
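
A rough sketch of checking the kernel log for such a warning. The exact
wording of the message is not given here, so the filter only looks for
lines that mention both "btrfs" and "barrier"; that matching rule and
the function names are assumptions, and an empty result does not prove
that the hardware handles barriers correctly.

```python
import subprocess


def barrier_messages(log_lines):
    """Return the log lines that mention both btrfs and barriers."""
    hits = []
    for line in log_lines:
        lowered = line.lower()
        if "btrfs" in lowered and "barrier" in lowered:
            hits.append(line.rstrip("\n"))
    return hits


def kernel_log_lines():
    """Read the kernel ring buffer via dmesg (may require privileges)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True,
                            check=True)
    return result.stdout.splitlines()
```

Printing each line of `barrier_messages(kernel_log_lines())` shortly
after mounting would show any matching messages.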

   More concerning is the out-of-spec hardware which claims to support
barriers and utterly fails to do so. I don't think there's much you
can do to detect that case, other than force failures and try to catch
it out -- then return it to the manufacturer under whatever consumer
protection laws you have, on the grounds that it's not fit for purpose.

   I think the number of actual such hard disks that do this is fairly
small, but they are out there. I'm not aware of a blacklist/quirks
list for them.

> >    I guess the question for you is: are you after the _expected_
> > behaviour of the FS (should always be consistent on good hardware, but
> > you may lose up to 30 seconds of writes), or are you after mitigation
> > strategies in the face of FS bugs (keep off-site backups and be
> > prepared to use them)?
> 
> I already have full, daily on-site backups on an external drive that
> is logically unmounted except for when backups are running, as well as
> partial off-site backups to cloud storage - and of course, taking
> advantage of btrfs's snapshotting support there is no real reason why
> I couldn't increase the backup frequency while retaining data
> consistency. Losing half a minute of writes is fairly inconsequential
> for personal use as long as the file system remains consistent, and in
> the face of disastrous corruption it is at least possible to do a full
> restore to bare metal from rescue media and backup without losing too
> much. Not trivial time-wise (that's currently 1.4 TB over USB 2.0),
> but possible.

   OK, so I hope I've managed to answer your question satisfactorily.
Let us know if there are any outstanding queries you want cleared up. :)

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
    --- "I will not be pushed,  filed, stamped, indexed, briefed, ---    
               debriefed or numbered.  My life is my own."               


* Re: How does btrfs handle sudden shutdowns?
From: Alex @ 2012-11-09  1:04 UTC
  To: linux-btrfs

Michael Kjörling <michael <at> kjorling.se> writes:

> 
> Can btrfs deal reasonably gracefully with sudden shutdowns? (I'm
> mainly thinking of power outages which lead to logical structure
> damage but not physical media damage.)
> 

Really rather well! We've had a sequence of power cuts around here and I've
scrubbed each time, finding only one corruption overall, which was fixed by the
scrub with no data lost.
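
A small sketch of running such a scrub after an unclean shutdown,
using `btrfs scrub start -B` so that the command stays in the
foreground and prints its report when done. The mount point and helper
names are placeholders; the meaning of the exit status is not
interpreted here and should be checked against the btrfs-progs
documentation.

```python
import subprocess


def scrub_command(mount_point):
    """Build the command for a foreground scrub of one mounted filesystem."""
    return ["btrfs", "scrub", "start", "-B", mount_point]


def run_scrub(mount_point, run=subprocess.run):
    """Run the scrub and return (exit status, combined report text)."""
    result = run(scrub_command(mount_point), capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr
```

For example, `status, report = run_scrub("/mnt/data")` followed by
printing `report` gives the same summary as running the command by hand.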




