linux-lvm.redhat.com archive mirror
* [linux-lvm] Powerfailure and snapshot consistency
@ 2011-03-26  4:24 Stuart D. Gathman
  2011-03-26  4:42 ` Ron Johnson
  2011-03-26  7:49 ` Ray Morris
  0 siblings, 2 replies; 17+ messages in thread
From: Stuart D. Gathman @ 2011-03-26  4:24 UTC (permalink / raw)
  To: linux-lvm

At a production site, there was a power failure, and the UPS battery 
wasn't up to the job.  There was a backup running, but otherwise things 
were quiet.

The production system came up fine, but the test system, which was running 
from a snapshot, had a corrupted filesystem.  The PVs are md raid1.  One 
of the PVs had to resync when power was restored.

My theory about what happened to the test system is this: the origin and 
snapshot were on different PVs.  The origin was writing as part of the 
backup process, and some blocks that were supposed to be copied to the COW 
didn't get copied, but were written to the origin.  This would result in
an inconsistent image for the snapshot.

Assuming my theory is correct, the morals would be:

1) don't run production on a snapshot :-)

2) make sure important LVs do not span multiple PVs (except for LVM
    mirroring) - you could be unhappy in the event of a system crash.

While I can peruse lvdisplay --maps manually, is there a tool that 
highlights LVs that span multiple PVs?
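
(In the meantime, something like this quick awk hack is what I have in mind -
not a polished tool; it assumes lvs' default whitespace-separated output and
the /dev/name(start_extent) form of the devices column:)

    lvs --segments --noheadings -o vg_name,lv_name,devices |
    awk '{
        lv = $1 "/" $2
        n = split($3, devs, ",")              # striped segments list several devices
        for (i = 1; i <= n; i++) {
            sub(/\(.*/, "", devs[i])          # strip the "(start_extent)" suffix
            if (!((lv, devs[i]) in seen)) {   # count each distinct (LV, PV) pair once
                seen[lv, devs[i]] = 1
                count[lv]++
            }
        }
    }
    END { for (lv in count) if (count[lv] > 1) print lv, "spans", count[lv], "PVs" }'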

--
 	      Stuart D. Gathman <stuart@bmsi.com>
     Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-26  4:24 [linux-lvm] Powerfailure and snapshot consistency Stuart D. Gathman
@ 2011-03-26  4:42 ` Ron Johnson
  2011-03-26  4:52   ` Stuart D. Gathman
  2011-03-26  7:49 ` Ray Morris
  1 sibling, 1 reply; 17+ messages in thread
From: Ron Johnson @ 2011-03-26  4:42 UTC (permalink / raw)
  To: linux-lvm

On 03/25/2011 11:24 PM, Stuart D. Gathman wrote:
[snip]
>
> 2) make sure important LVs do not span multiple PVs (except for LVM
> mirroring) - you could be unhappy in the event of a system crash.
>

But isn't "volumes larger than physical devices" (one of) the raison 
d'etre of LVM?

-- 
"Neither the wisest constitution nor the wisest laws will secure
the liberty and happiness of a people whose manners are universally
corrupt."
Samuel Adams, essay in The Public Advertiser, 1749


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-26  4:42 ` Ron Johnson
@ 2011-03-26  4:52   ` Stuart D. Gathman
  2011-03-26  5:25     ` Ron Johnson
  0 siblings, 1 reply; 17+ messages in thread
From: Stuart D. Gathman @ 2011-03-26  4:52 UTC (permalink / raw)
  To: LVM general discussion and development

On Fri, 25 Mar 2011, Ron Johnson wrote:

> On 03/25/2011 11:24 PM, Stuart D. Gathman wrote:
> [snip]
>> 
>> 2) make sure important LVs do not span multiple PVs (except for LVM
>> mirroring) - you could be unhappy in the event of a system crash.
>> 
>
> But isn't "volumes larger than physical devices" (one of) the raison d'etre 
> of LVM?

Yes, but a power failure can then mess up the ordering of write completions
distributed between 2 or more PVs, which could defeat the assumptions made
by your file system journaling.

and

No, YMMV, but I generally have a number of smaller LVs (for virtual machines)
and it is nice to have a larger pool of PVs from which they are allocated.

--
 	      Stuart D. Gathman <stuart@bmsi.com>
     Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-26  4:52   ` Stuart D. Gathman
@ 2011-03-26  5:25     ` Ron Johnson
  2011-03-26  5:53       ` hansbkk
  2011-03-26 16:07       ` Stuart D. Gathman
  0 siblings, 2 replies; 17+ messages in thread
From: Ron Johnson @ 2011-03-26  5:25 UTC (permalink / raw)
  To: linux-lvm

On 03/25/2011 11:52 PM, Stuart D. Gathman wrote:
> On Fri, 25 Mar 2011, Ron Johnson wrote:
>
>> On 03/25/2011 11:24 PM, Stuart D. Gathman wrote:
>> [snip]
>>>
>>> 2) make sure important LVs do not span multiple PVs (except for LVM
>>> mirroring) - you could be unhappy in the event of a system crash.
>>>
>>
>> But isn't "volumes larger than physical devices" (one of) the raison
>> d'etre of LVM?
>
> Yes, but a power failure can then mess up the ordering of write completions
> distributed between 2 or more PVs, which could defeat the assumptions made
> by your file system journaling.
>

File a bug...  But against what?  LVM?  The FS?  The block layer?

> and
>
> No, YMMV, but I generally have a number of smaller LVs (for virtual
> machines)
> and it is nice to have a larger pool of PVs from which they are allocated.
>

I guess I'm just a DP Dinosaur who thinks that if a machine is beefy
enough to run a bunch of services then those services should run 
directly on the machine.  (Obviously I don't work for a web hosting 
company...)

-- 
"Neither the wisest constitution nor the wisest laws will secure
the liberty and happiness of a people whose manners are universally
corrupt."
Samuel Adams, essay in The Public Advertiser, 1749


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-26  5:25     ` Ron Johnson
@ 2011-03-26  5:53       ` hansbkk
  2011-03-26 16:07       ` Stuart D. Gathman
  1 sibling, 0 replies; 17+ messages in thread
From: hansbkk @ 2011-03-26  5:53 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Ron Johnson

On Sat, Mar 26, 2011 at 12:25 PM, Ron Johnson <ron.l.johnson@cox.net> wrote:
> On 03/25/2011 11:52 PM, Stuart D. Gathman wrote:
>>> On 03/25/2011 11:24 PM, Stuart D. Gathman wrote:
>>>> 2) make sure important LVs do not span multiple PVs (except for LVM
>>>> mirroring) - you could be unhappy in the event of a system crash.
> File a bug...  But against what?  LVM?  The FS?  The block layer?

IMO it's fair enough for low-level processes to assume continuity of
power. Having (and regularly testing) an appropriate UPS is a baseline
component of any system where reliability is important.

Perhaps a re-wording: You will more easily be able to recover your
data in the event of a system crash if your LVs don't span multiple
PVs.


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-26  4:24 [linux-lvm] Powerfailure and snapshot consistency Stuart D. Gathman
  2011-03-26  4:42 ` Ron Johnson
@ 2011-03-26  7:49 ` Ray Morris
  1 sibling, 0 replies; 17+ messages in thread
From: Ray Morris @ 2011-03-26  7:49 UTC (permalink / raw)
  To: linux-lvm

On Sat, 26 Mar 2011 00:24:56 -0400 (EDT)
"Stuart D. Gathman" <stuart@bmsi.com> wrote:

> Assuming my theory is correct, the morals would be:
...
> 2) make sure important LVs do not span multiple PVs (except for LVM
>     mirroring) - you could be unhappy in the event of a system crash.

   I don't believe that multiple PVs necessarily makes a
difference.  Even on a single PV, there is no guarantee that
writes make it to disk during a power failure, no guarantee
of the order in which they complete, and no guarantee that a
write is even correct while the voltage is dropping.  The
drive itself could reorder a write (NCQ), the raid card could
do so, etc.  One might say the moral of the story is "don't
run important production servers on spotty power systems
without an effective UPS".

  Someone mentioned journaling.  Journaling does not, of
course, guarantee data consistency; it only makes the metadata
more likely to be consistent.  So regardless, when you
undervolt a drive during a power failure, something is liable
to go wrong, aside from the more obvious effects of an
instant power cut while writing a block.  Moral -
don't yank the power if the data is important.
-- 
Ray Morris
support@bettercgi.com

Strongbox - The next generation in site security:
http://www.bettercgi.com/strongbox/

Throttlebox - Intelligent Bandwidth Control
http://www.bettercgi.com/throttlebox/

Strongbox / Throttlebox affiliate program:
http://www.bettercgi.com/affiliates/user/register.php


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-26  5:25     ` Ron Johnson
  2011-03-26  5:53       ` hansbkk
@ 2011-03-26 16:07       ` Stuart D. Gathman
  2011-03-26 20:30         ` Mike Snitzer
  2011-03-28 17:26         ` Les Mikesell
  1 sibling, 2 replies; 17+ messages in thread
From: Stuart D. Gathman @ 2011-03-26 16:07 UTC (permalink / raw)
  To: LVM general discussion and development

On Sat, 26 Mar 2011, Ron Johnson wrote:

>> Yes, but a power failure can then mess up the ordering of write completions
>> distributed between 2 or more PVs, which could defeat the assumptions made
>> by your file system journaling.
>
> File a bug...  But against what?  LVM?  The FS?  The block layer?

It is not a bug.  Some progress can be made with barriers (similar to fsync())
that block until all affected blocks are confirmed written on all devices
through all levels of the storage stack (e.g. written to all legs
of a raid1 device).  My database does an fsync after each journal batch,
and I think it reasonable to hope that this guarantees that the writes
from the journal batch complete before any subsequent writes.  I don't
depend on any other ordering.

In the case of a snapshot, I believe the COW and origin blocks are written
in parallel.  Snapshots are slow enough as it is.  :-)  So it is not
surprising that it loses consistency on a power failure.

>> No, YMMV, but I generally have a number of smaller LVs (for virtual
>> machines) and it is nice to have a larger pool of PVs from which they are
>> allocated.
>
> I guess I'm just a DP Dinosaur who thinks that if a machine is beefy
> enough to run a bunch of services then those services should run directly on 
> the machine.  (Obviously I don't work for a web hosting company...)

VMs give excellent isolation between guests in case you don't trust
the (morals and/or competence of) admins for the VMs.  I understand,
however, that OpenSolaris has comparable isolation without a full VM, and one
of these days I'll play with it.

--
 	      Stuart D. Gathman <stuart@bmsi.com>
     Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-26 16:07       ` Stuart D. Gathman
@ 2011-03-26 20:30         ` Mike Snitzer
  2011-03-27 21:55           ` Stuart D. Gathman
  2011-03-28 17:26         ` Les Mikesell
  1 sibling, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2011-03-26 20:30 UTC (permalink / raw)
  To: LVM general discussion and development

On Sat, Mar 26 2011 at 12:07pm -0400,
Stuart D. Gathman <stuart@bmsi.com> wrote:

> On Sat, 26 Mar 2011, Ron Johnson wrote:
> 
> >>Yes, but a power failure can then mess up the ordering of write completions
> >>distributed between 2 or more PVs, which could defeat the assumptions made
> >>by your file system journaling.
> >
> >File a bug...  But against what?  LVM?  The FS?  The block layer?
> 
> It is not a bug.  Some progress can be made with barriers (similar to fsync())
> that block until all affected blocks are confirmed written on all devices
> through all levels of the storage stack (e.g. written to all legs
> of a raid1 device).  My database does an fsync after each journal batch,
> and I think it reasonable to hope that this guarantees that the writes
> from the journal batch complete before any subsequent writes.  I don't
> depend on any other ordering.
> 
> In the case of a snapshot, I believe the COW and origin blocks are written
> in parallel.  Snapshots are slow enough as it is.  :-)  So it is not
> surprising that it loses  consistency on power failure.

The cow is completed before the origin is written.  In addition, the
snapshot volume offers full support for flush (barriers) to both the
origin and snapshot devices.

Your FUD about inconsistency due to the snapshot implementation needs
to be substantiated with something more than an incoherent guesswork
theory.

That said, anything is possible.  But if you want real help you need to
be specific about which kernel you're using.  What is your underlying
hardware (and caching mode)?  And what were you doing at the time
of the power failure (running some FS benchmark? or what?).


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-26 20:30         ` Mike Snitzer
@ 2011-03-27 21:55           ` Stuart D. Gathman
  2011-03-28 17:20             ` Phillip Susi
  0 siblings, 1 reply; 17+ messages in thread
From: Stuart D. Gathman @ 2011-03-27 21:55 UTC (permalink / raw)
  To: LVM general discussion and development

On Sat, 26 Mar 2011, Mike Snitzer wrote:

>> In the case of a snapshot, I believe the COW and origin blocks are written
>> in parallel.  Snapshots are slow enough as it is.  :-)  So it is not
>> surprising that it loses  consistency on power failure.
>
> The cow is completed before the origin is written.  In addition, the
> snapshot volume offers full support for flush (barriers) to both the
> origin and snapshot devices.
>
> Your FUD about inconsistency due to the snapshot implementation needs
> to be substantiated with something more than an incoherent guesswork
> theory.
>
> That said, anything is possible.  But if you want real help you need to
> be specific about which kernel you're using.  What is your underlying
> hardware (and caching mode)?  And what it was you were doing at the time
> of the power failure (running some FS benchmark? or what?).

Don't need help, just trying to understand the event.  So from what you
said, the problem probably stems from the RAID1 resync, which could
make one PV "older" than the other.

LVM is EL5.5 via CentOS: lvm2-2.02.56 (really old - yes, I know).  kernel-2.6.18
(with patches).
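
(For reference, the version details above come from the usual places on a
stock CentOS box:)

    lvm version                # LVM2 tool, library and driver versions
    uname -r                   # running kernel
    cat /etc/redhat-release    # distribution release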

As I already said, the production system (on origin) was running a scheduled
database backup, which does involve copying about 2 gigs of data to another
directory on the origin.  So the origin was likely writing heavily when
the power (and UPS) failed.

I'll mention again that these are linear LVs on top of md raid1.

--
 	      Stuart D. Gathman <stuart@bmsi.com>
     Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-27 21:55           ` Stuart D. Gathman
@ 2011-03-28 17:20             ` Phillip Susi
  2011-03-28 17:24               ` Stuart D. Gathman
  0 siblings, 1 reply; 17+ messages in thread
From: Phillip Susi @ 2011-03-28 17:20 UTC (permalink / raw)
  To: LVM general discussion and development

On 3/27/2011 5:55 PM, Stuart D. Gathman wrote:
> Don't need help, just trying to understand the event.  So from what you
> said, the problem probably stems from the RAID1 resync, which could
> make one PV "older" than the other.

The data is written to the primary mirror first, so this can't happen.
Proper use of barriers will prevent this whole scenario, so my guess is
that you are using ext3 ( which defaults to nobarrier ) or explicitly
mounting ext4 with nobarrier.


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-28 17:20             ` Phillip Susi
@ 2011-03-28 17:24               ` Stuart D. Gathman
  2011-03-28 17:37                 ` Phillip Susi
  2011-03-28 18:34                 ` Mike Snitzer
  0 siblings, 2 replies; 17+ messages in thread
From: Stuart D. Gathman @ 2011-03-28 17:24 UTC (permalink / raw)
  To: Phillip Susi; +Cc: LVM general discussion and development

On Mon, 28 Mar 2011, Phillip Susi wrote:

> On 3/27/2011 5:55 PM, Stuart D. Gathman wrote:
>> Don't need help, just trying to understand the event.  So from what you
>> said, the problem probably stems from the RAID1 resync, which could
>> make one PV "older" than the other.
>
> The data is written to the primary mirror first, so this can't happen.
> Proper use of barriers will prevent this whole scenario, so my guess is
> that you are using ext3 ( which defaults to nobarrier ) or explicitly
> mounting ext4 with nobarrier.

Ah, thank you!  Yes, I am using ext3 with EL5.5 defaults, and will now learn
about the barrier option.  Seems like a good thing to turn on when "replace
battery" comes up on the ups.

It sounds like dumb luck/divine mercy that the raid 1 PV on which the
production LV resides did not have a similar issue.

--
 	      Stuart D. Gathman <stuart@bmsi.com>
     Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-26 16:07       ` Stuart D. Gathman
  2011-03-26 20:30         ` Mike Snitzer
@ 2011-03-28 17:26         ` Les Mikesell
  2011-03-28 17:54           ` Stuart D. Gathman
  1 sibling, 1 reply; 17+ messages in thread
From: Les Mikesell @ 2011-03-28 17:26 UTC (permalink / raw)
  To: linux-lvm

On 3/26/2011 11:07 AM, Stuart D. Gathman wrote:
>
> It is not a bug. Some progress can be made with barriers (similar to
> fsync())
> that block until all affected blocks are confirmed written on all devices
> through all levels of the storage stack (e.g. written to all legs
> of a raid1 device). My database does an fsync after each journal batch,
> and I think it reasonable to hope that this guarantees that the writes
> from the journal batch complete before any subsequent writes. I don't
> depend on any other ordering.

Is there some non-destructive diagnostic that can tell you if a running 
machine can or can't manage write ordering correctly through all of its 
software and hardware layers?

-- 
   Les Mikesell
    lesmikesell@gmail.com


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-28 17:24               ` Stuart D. Gathman
@ 2011-03-28 17:37                 ` Phillip Susi
  2011-03-28 21:26                   ` Stuart D. Gathman
  2011-03-28 18:34                 ` Mike Snitzer
  1 sibling, 1 reply; 17+ messages in thread
From: Phillip Susi @ 2011-03-28 17:37 UTC (permalink / raw)
  To: Stuart D. Gathman; +Cc: LVM general discussion and development

On 3/28/2011 1:24 PM, Stuart D. Gathman wrote:
> Ah, thank you!  Yes, I am using ext3 with EL5.5 defaults, and will now
> learn
> about the barrier option.  Seems like a good thing to turn on when "replace
> battery" comes up on the ups.
> 
> It sounds like dumb luck/divine mercy that the raid 1 PV on which the
> production LV resides did not have a similar issue.

It actually should be on all the time, which is why it defaults to on in
ext4.  A UPS doesn't help if the kernel crashes or some hardware fails.
 I have no idea why incidents like this are not commonplace with ext3
since it is inherently unsafe without barriers, whether you are using
lvm or mdadm or not.


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-28 17:26         ` Les Mikesell
@ 2011-03-28 17:54           ` Stuart D. Gathman
  2011-03-28 19:43             ` Ron Johnson
  0 siblings, 1 reply; 17+ messages in thread
From: Stuart D. Gathman @ 2011-03-28 17:54 UTC (permalink / raw)
  To: LVM general discussion and development

On Mon, 28 Mar 2011, Les Mikesell wrote:

> On 3/26/2011 11:07 AM, Stuart D. Gathman wrote:
>> 
> Is there some non-destructive diagnostic that can tell you if a running 
> machine can or can't manage write ordering correctly through all of its 
> software and hardware layers?

http://blog.nirkabel.org/2008/12/07/ext3-write-barriers-and-write-caching/

The above mentions a test program that demonstrates the ext3 weakness.
You should be able to run it in a test VM, and destroy (virtual power off)
the VM.  When the barrier=1 mount option is turned on, the mapper is supposed
to log a warning if this is not supported at some level.  There appear to
be disks, however, with hardware write caching that do not correctly support
barriers.  You must turn off hardware write caching on these models.
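
Something along these lines, for example (command names from memory, the exact
kernel message wording varies by version, and /test and /dev/sda stand in for
the real mount point and drive):

    # remount with barriers and watch the kernel log; if some layer can't
    # honor them, JBD logs something like
    # "barrier-based sync failed on ... - disabling barriers"
    mount -o remount,barrier=1 /test
    dmesg | tail

    # query, and if necessary disable, the drive's own write cache
    hdparm -W  /dev/sda     # show current write-caching state
    hdparm -W0 /dev/sda     # turn write caching off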

You can probably test the software stack by running md + lvm in a VM and 
destroying the VM.  To test the whole thing, you could run a test VM
on an external SATA/USB/Firewire drive and power off the drive during
the test.  This *shouldn't* affect your production VMs (but I'd use separate
hardware just to be sure.)

What a headache.  I want to be working on my programming projects, not
becoming a super admin.

--
 	      Stuart D. Gathman <stuart@bmsi.com>
     Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-28 17:24               ` Stuart D. Gathman
  2011-03-28 17:37                 ` Phillip Susi
@ 2011-03-28 18:34                 ` Mike Snitzer
  1 sibling, 0 replies; 17+ messages in thread
From: Mike Snitzer @ 2011-03-28 18:34 UTC (permalink / raw)
  To: LVM general discussion and development

On Mon, Mar 28 2011 at  1:24pm -0400,
Stuart D. Gathman <stuart@bmsi.com> wrote:

> On Mon, 28 Mar 2011, Phillip Susi wrote:
> 
> >On 3/27/2011 5:55 PM, Stuart D. Gathman wrote:
> >>Don't need help, just trying to understand the event.  So from what you
> >>said, the problem probably stems from the RAID1 resync, which could
> >>make one PV "older" than the other.
> >
> >The data is written to the primary mirror first, so this can't happen.
> >Proper use of barriers will prevent this whole scenario, so my guess is
> >that you are using ext3 ( which defaults to nobarrier ) or explicitly
> >mounting ext4 with nobarrier.
> 
> Ah, thank you!  Yes, I am using ext3 with EL5.5 defaults, and will now learn
> about the barrier option.  Seems like a good thing to turn on when "replace
> battery" comes up on the ups.

RHEL5.x's DM doesn't support barriers (nor does MD afaik).


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-28 17:54           ` Stuart D. Gathman
@ 2011-03-28 19:43             ` Ron Johnson
  0 siblings, 0 replies; 17+ messages in thread
From: Ron Johnson @ 2011-03-28 19:43 UTC (permalink / raw)
  To: linux-lvm

On 03/28/2011 12:54 PM, Stuart D. Gathman wrote:
> On Mon, 28 Mar 2011, Les Mikesell wrote:
>
>> On 3/26/2011 11:07 AM, Stuart D. Gathman wrote:
>>>
>> Is there some non-destructive diagnostic that can tell you if a
>> running machine can or can't manage write ordering correctly through
>> all of its software and hardware layers?
>
> http://blog.nirkabel.org/2008/12/07/ext3-write-barriers-and-write-caching/
>
> The above mentions a test program that demonstrates the ext3 weakness.
> You should be able to run it in a test VM, and destroy (virtual power off)
> the VM. When the barrier=1 mount option is turned on, the mapper is
> supposed
> to log a warning if this is not supported at some level. There appear to
> be disks, however, with hardware write caching that do not correct support
> barriers. You must turn off hardware write caching on these models.
>

"Why did our reports sudden take 85 hours to run?"

-- 
"Neither the wisest constitution nor the wisest laws will secure
the liberty and happiness of a people whose manners are universally
corrupt."
Samuel Adams, essay in The Public Advertiser, 1749


* Re: [linux-lvm] Powerfailure and snapshot consistency
  2011-03-28 17:37                 ` Phillip Susi
@ 2011-03-28 21:26                   ` Stuart D. Gathman
  0 siblings, 0 replies; 17+ messages in thread
From: Stuart D. Gathman @ 2011-03-28 21:26 UTC (permalink / raw)
  To: Phillip Susi; +Cc: LVM general discussion and development

On Mon, 28 Mar 2011, Phillip Susi wrote:

> On 3/28/2011 1:24 PM, Stuart D. Gathman wrote:
>> Ah, thank you!  Yes, I am using ext3 with EL5.5 defaults, and will now
>> learn
>> about the barrier option.  Seems like a good thing to turn on when "replace
>> battery" comes up on the ups.
>>
>> It sounds like dumb luck/divine mercy that the raid 1 PV on which the
>> production LV resides did not have a similar issue.
>
> It actually should be on all the time, which is why it defaults to on in
> ext4.  A UPS doesn't help if the kernel crashes or some hardware fails.
> I have no idea why incidents like this are not common place with ext3
> since it is inherently unsafe without barriers, whether you are using
> lvm or mdadm or not.

EL5.x does not support write barriers in lvm or md.  That may be why ext3
defaults to not using them.  Fedora supports barriers in md and LVM, but I'm
not clear on the VM systems.  There are hardware RAID controllers with
battery-backed write cache that disable write caching on the drives; these
fail if the power outage is extended.

The size of a drive write cache is limited (8 MiB max), and a drive always
starts writing immediately - the write cache is there to avoid blocking
subsequent host writes.  It seems to me that an ideal hardware solution would
be a battery that powers the drive just long enough to flush the write cache -
just a few seconds at worst.  Of course, the system UPS is supposed to supply
that, and that must be why I've never seen a drive-level UPS offered.

We used to have Motorola servers that would provide a POWERFAIL signal when AC
failed, nearly a second before DC failed (big capacitors).  The OS could use
that to suspend disk writes, giving drives a chance to finish flushing.
If AC power was restored before the capacitors ran out, writes would
resume.  Knowing what I know now, I used this signal incorrectly:
on POWERFAIL, the database would flush its write cache.  Not smart if the
disk write cache was full.  (I'm not sure drives had write caches in those
days - drive tech was SCSI-2.)

--
 	      Stuart D. Gathman <stuart@bmsi.com>
     Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

