* Problem with disk
From: David Ronis @ 2006-05-03 20:01 UTC
To: linux-ide

I have a Toshiba Satellite M40 laptop that has a Fujitsu MHU2100AT ATA
disk drive. I've had two instances of major disk corruption and have
brought the laptop back to Toshiba twice. The first time they found a
problem in the power supply, but the second time they said it was fine.
I have Windows installed on another partition and have had no corruption
problems there since the first repair. I've also run SpinRite 6.0 on the
disk, and again it reports no problems.

Here are some symptoms of the problems under Linux: I restore from dump
backups after running mkfs on the Linux partition. During the restore I
get some complaints of read/write errors (fortunately all in nonessential
files). After the restore is completed, the system seems to be fine;
however, after powering down and rebooting, major disk problems are
found, and after running fsck I end up with significant data loss. I've
run with and without journaling turned on, but this doesn't seem to make
a difference.

I notice from hdparm that disk write caching is turned on. Any chance
that there is a problem with the cache not being flushed before powering
down?

I'm running Linux 2.6.15.5 on what is otherwise a Slackware 10.2 install.

David
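For reference, querying and toggling the on-drive write cache looks
roughly like this (the device name /dev/hda is an assumption; substitute
whatever your drive actually is):

	# show the drive's identify data; "Write cache" appears in the
	# enabled-features list when caching is on
	hdparm -I /dev/hda | grep -i 'write cache'
	# disable the on-drive write cache
	hdparm -W 0 /dev/hda
	# re-enable it
	hdparm -W 1 /dev/hda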
* Re: Problem with disk
From: Ric Wheeler @ 2006-05-03 20:08 UTC
To: David.Ronis; +Cc: linux-ide

David Ronis wrote:

> I notice from hdparm that disk write caching is turned on. Any chance
> that there is a problem with the cache not being flushed before powering
> down?
>
> I'm running Linux 2.6.15.5 on what is otherwise a Slackware 10.2
> install.
>
> David

While Linux has support for write barriers (which allow you to run safely
with the write cache enabled), it needs support from the drive. I would
suggest running with the write cache disabled unless you can verify
working barrier support.

The fact that your drive reports IO errors is also worrying - you might
just have a bad drive... You can look at drive health with tools like
smartctl.

Good luck,

ric
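A basic health check with smartmontools might look like this (the device
path is, again, illustrative):

	# dump the SMART attributes and the drive's error log
	smartctl -a /dev/hda
	# run a short (~2 minute) self-test, then read back the result
	smartctl -t short /dev/hda
	smartctl -l selftest /dev/hda

Reallocated, pending, or uncorrectable sector counts that climb over time
are the classic signs of a drive on its way out.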
* Re: Problem with disk
From: Mark Hahn @ 2006-05-05 23:49 UTC
To: David.Ronis; +Cc: linux-ide

>> I notice from hdparm that disk write caching is turned on. Any chance
>> that there is a problem with the cache not being flushed before
>> powering down?

pretty unlikely. linux normally offlines the drive before halting.

> I would suggest running with the write cache disabled unless you can
> verify working barrier support.

this is true, but extremely conservative/paranoid. it makes a lot of
sense if you're handling banking transactions, or if you really see a lot
of abrupt power-offs (yanking the battery). what are the chances of a
drive failing to write its dirty blocks when it is idle and halting?

don't get me wrong: write barriers are A Good Thing. it's just that Linux
survived very nicely for many years before such things were bothered
with.

> The fact that your drive reports IO errors is also worrying - you might
> just have a bad drive... You can look at drive health with tools like
> smartctl.

IO errors trump any concerns about write barriers - there's no need to
even think about barriers or cache settings if the disk is, for instance,
reporting media errors...
* Re: Problem with disk
From: Ric Wheeler @ 2006-05-06 0:51 UTC
To: Mark Hahn; +Cc: David.Ronis, linux-ide

Mark Hahn wrote:

>> I would suggest running with the write cache disabled unless you can
>> verify working barrier support.
>
> this is true, but extremely conservative/paranoid. it makes a lot of
> sense if you're handling banking transactions, or if you really see a
> lot of abrupt power-offs (yanking the battery). what are the chances of
> a drive failing to write its dirty blocks when it is idle and halting?

The write cache in modern drives is multiple megabytes - 8 or 16MB is not
uncommon. The chance that data sitting in the write cache is lost on a
power failure is actually quite high... I agree that most people should
not lose too much sleep over this.

> don't get me wrong: write barriers are A Good Thing. it's just that
> Linux survived very nicely for many years before such things were
> bothered with.
>
> IO errors trump any concerns about write barriers - there's no need to
> even think about barriers or cache settings if the disk is, for
> instance, reporting media errors...

Agreed again ;-)

ric
* Re: Problem with disk
From: Mark Hahn @ 2006-05-06 17:11 UTC
To: Ric Wheeler; +Cc: David.Ronis, linux-ide

>> this is true, but extremely conservative/paranoid. it makes a lot of
>> sense if you're handling banking transactions, or if you really see a
>> lot of abrupt power-offs (yanking the battery). what are the chances
>> of a drive failing to write its dirty blocks when it is idle and
>> halting?
>
> The write cache in modern drives is multiple megabytes - 8 or 16MB is
> not uncommon. The chance that data sitting in the write cache is lost
> on a power failure is actually quite high...

but we're not talking about power failures in the middle of peak
activity. afaict, drives also never dedicate their whole cache to
writeback - they keep plenty available for reads as well. it would also
be rather surprising if the firmware were completely oblivious to
limiting the age of writebacks; after all, always delaying writes until
you run out of cache capacity is _not_ a winning strategy (even ignoring
safety issues).

during a normal shutdown, can you think of some reason the drive would
have LOTS of outstanding writes? that's the real point. depending on
kernel version, linux should be issuing a cache-flush command and a
standby, then eventually calling the bios poweroff. it's very possible
that this is going wrong (rumors of disks that claim to implement, but
ignore, cache-flush; or perhaps ones that stupidly don't flush on
standby; or even a bios poweroff that happens so fast that the disk isn't
done flushing...) but turning off all writeback is overkill (especially
when there's some other obvious sign of distress...)
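Whether a drive even claims to implement the flush command can be read
out of its identify data (the exact wording varies with the hdparm
version, and the device path is an assumption):

	# FLUSH CACHE support shows up in the commands/features list,
	# typically as "Mandatory FLUSH_CACHE" and/or "FLUSH_CACHE_EXT"
	hdparm -I /dev/hda | grep -i flush

A drive that doesn't advertise it is an obvious candidate for running
with the write cache off.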
* Re: Problem with disk
From: Ric Wheeler @ 2006-05-06 18:17 UTC
To: Mark Hahn; +Cc: David.Ronis, linux-ide

Mark Hahn wrote:

> but we're not talking about power failures in the middle of peak
> activity. afaict, drives also never dedicate their whole cache to
> writeback - they keep plenty available for reads as well. it would also
> be rather surprising if the firmware were completely oblivious to
> limiting the age of writebacks; after all, always delaying writes until
> you run out of cache capacity is _not_ a winning strategy (even
> ignoring safety issues).

If you have drives/hardware to test on, you can easily verify (which we
do on a regular basis) that running with barriers through power-fail
testing gets you a solid recovery. Running with the write cache on and no
barriers gets you file system corruption. As I said before, the data you
wrote most recently (or that the file system wrote for you) is exactly
the data you stand to lose on a power loss.

> during a normal shutdown, can you think of some reason the drive would
> have LOTS of outstanding writes? that's the real point. depending on
> kernel version, linux should be issuing a cache-flush command and a
> standby, then eventually calling the bios poweroff. it's very possible
> that this is going wrong (rumors of disks that claim to implement, but
> ignore, cache-flush; or perhaps ones that stupidly don't flush on
> standby; or even a bios poweroff that happens so fast that the disk
> isn't done flushing...) but turning off all writeback is overkill
> (especially when there's some other obvious sign of distress...)

We don't test every make of drive, but the modern drives we do test do
honor the cache flush commands. It is important to note that drive
firmware is like any other bit of code - it can have bugs - so this
support does need to be reverified on each drive (and version of
firmware) before you can trust high-value data ;-)

If there is a hole in the sequence, dropping to standby could be the
source of issues...

ric
* Re: Problem with disk
From: Mark Hahn @ 2006-05-06 18:34 UTC
To: Ric Wheeler; +Cc: David.Ronis, linux-ide

> If you have drives/hardware to test on, you can easily verify (which we
> do on a regular basis) that running with barriers through power-fail
> testing gets you a solid recovery. Running with the write cache on and
> no barriers gets you file system corruption.

in short, "barriers work". never doubted!

> As I said before, the data you wrote most recently (or that the file
> system wrote for you) is exactly the data you stand to lose on a power
> loss.

obviously. so the question is whether the cache still holds dirty
writeback data when the power drops during a normal poweroff. I'd
consider it a bug in the laptop bios to let this happen, but that's not
going to make the affected user happy...

> If there is a hole in the sequence, dropping to standby could be the
> source of issues...

I guess it's a matter of how byzantine the bugs are that you want to
consider. for mass-produced devices, I'm reluctant to assume the disk
vendor has forgotten to _ever_ flush writeback data, for instance. and
don't forget that a bogus drive that entirely forgets writeback may also
not really turn off write caching when you tell it to!

I assume that the disk will indeed do writeback if left idle for a little
while. on machines where this is a real problem, I would start out by
waving relevant chickens like the following to give the best chance of
shutting down cleanly:

	sync
	blockdev --flushbufs
	hdparm -W 0
	sleep 2
	hdparm -y
	sleep 5
	halt -hp

rather than _always_ suffering the penalty of a disabled write cache,
especially on a single slow laptop drive...
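Packaged as a throwaway script, with the device argument each command
actually needs and a comment on what each step is for (the device path
and the sleep durations are guesses, not tuned values):

	#!/bin/sh
	# best-effort clean shutdown for a drive with write caching on
	sync                            # push dirty pages to the block layer
	blockdev --flushbufs /dev/hda   # flush the kernel's buffer cache
	hdparm -W 0 /dev/hda            # disable the write cache, which
	                                # should force a destage to platter
	sleep 2                         # give the drive a moment to settle
	hdparm -y /dev/hda              # STANDBY IMMEDIATE - spin it down
	sleep 5                         # let it finish before power drops
	halt -hp                        # put drives in standby, power off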
* Re: Problem with disk
From: Tejun Heo @ 2006-05-06 22:56 UTC
To: Mark Hahn; +Cc: Ric Wheeler, David.Ronis, linux-ide

Mark Hahn wrote:
[--snip--]
> I assume that the disk will indeed do writeback if left idle for a
> little while. on machines where this is a real problem, I would start
> out by waving relevant chickens like the following to give the best
> chance of shutting down cleanly:
> 	sync
> 	blockdev --flushbufs
> 	hdparm -W 0
> 	sleep 2
> 	hdparm -y
> 	sleep 5
> 	halt -hp
>
> rather than _always_ suffering the penalty of a disabled write cache,
> especially on a single slow laptop drive...

This is slightly OT, as this thread is about a normal power-down, but
disabling the writeback cache has its advantages. When you have a power
fluctuation (e.g. the power source fluctuates, or a new device is
hot-plugged and a crappy PSU can't hold the voltage), the hard disk can
briefly power down while the rest of the system keeps running. If the
disk was under active FS writes, this ends up in inconsistencies between
what the OS thinks the disk has and what the disk actually has.

Unfortunately, this can result in *massive* destruction of the
filesystem. I lost my RAID-1 array earlier this year this way. The FS
code systematically destroyed metadata of the filesystem and, on the
following reboot, fsck did the final blow, I think. I ended up with
100+GB of unorganized data and I had to recover data by grep + bvi.

This is an extreme case, but it shows turning off writeback has its
advantages. After the initial stress & panic attack subsided, I tried to
think about how to prevent such catastrophes, but there doesn't seem to
be a good way. There's no way to tell 1. whether the hard drive actually
lost the writeback cache content and 2. if so, how much it has lost. So,
unless the OS halts the system every time something seems weird with the
disk, turning off the writeback cache seems to be the only solution.

--
tejun
* Re: Problem with disk
From: Ric Wheeler @ 2006-05-07 13:21 UTC
To: Tejun Heo; +Cc: Mark Hahn, David.Ronis, linux-ide, neilb

Tejun Heo wrote:

> Unfortunately, this can result in *massive* destruction of the
> filesystem. I lost my RAID-1 array earlier this year this way. The FS
> code systematically destroyed metadata of the filesystem and, on the
> following reboot, fsck did the final blow, I think. I ended up with
> 100+GB of unorganized data and I had to recover data by grep + bvi.

Were you running with Neil's fixes that make MD devices properly handle
write barrier requests? Until fairly recently (I'm not sure when this was
fixed), MD devices more or less dropped barrier requests.

With properly working barriers, any journaling file system should get you
back to a consistent state after a power drop (although there are many
less common ways that drives can potentially drop data).

> This is an extreme case, but it shows turning off writeback has its
> advantages. [--snip--] So, unless the OS halts the system every time
> something seems weird with the disk, turning off the writeback cache
> seems to be the only solution.

Turning off the writeback cache is definitely the safe and conservative
way to go for mission-critical data unless you can be very certain that
your barriers are properly working on the drive & IO stack. We validate
the cache flush commands with a S-ATA analyzer (making sure that we see
them on sync/transaction commits) and that they take a reasonable amount
of time at the drive...

ric
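For the common journaling filesystems of this era, asking for barriers
is a mount option, and the kernel normally logs a complaint if the drive
or the stack underneath can't honor them (the device and mount point
below are placeholders):

	# ext3: barriers are requested with barrier=1 (off by default)
	mount -o barrier=1 /dev/hda3 /mnt/data
	# if barriers were silently disabled, it usually shows up here
	dmesg | grep -i barrier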
* Re: Problem with disk
From: Tejun Heo @ 2006-05-07 13:41 UTC
To: ric; +Cc: Mark Hahn, David.Ronis, linux-ide, neilb

Ric Wheeler wrote:
[--snip--]
> Were you running with Neil's fixes that make MD devices properly handle
> write barrier requests? Until fairly recently (I'm not sure when this
> was fixed), MD devices more or less dropped barrier requests.
>
> With properly working barriers, any journaling file system should get
> you back to a consistent state after a power drop (although there are
> many less common ways that drives can potentially drop data).

I'm not sure whether the barrier was working or not. Ummm.. Are you
saying that MD is capable of recovering from a data drop *during*
operation? ie. the system didn't go out, just the hard drives. Data is
lost no matter what MD does, and neither MD nor the filesystem has any
way to tell which bits made it to the media and which were lost, whether
barriers are working or not.

To handle such conditions, the device driver should tell the upper layer
that the PHY status has changed (or that something weird happened which
could lead to data loss) and the fs, in return, should perform journal
replay while still online. I'm pretty sure that isn't implemented in the
current kernel.

> Turning off the writeback cache is definitely the safe and conservative
> way to go for mission-critical data unless you can be very certain that
> your barriers are properly working on the drive & IO stack. We validate
> the cache flush commands with a S-ATA analyzer (making sure that we see
> them on sync/transaction commits) and that they take a reasonable
> amount of time at the drive...

One thing I'm curious about is how much performance benefit can be
obtained from write-back caching. With NCQ/TCQ, latency is much less of
an issue, and I don't think scheduling and/or buffering inside the drive
would result in a significant performance increase when so much is done
by the vm and block layer (aside from scheduling of currently queued
commands).

Some Linux elevators try pretty hard not to mix read and write requests,
as they mess up the statistics (a write-back cache absorbs write requests
very fast and then affects the following read requests). So, they
basically try to eliminate the effect of write-back caching.

Well, benchmark time, it seems. :)

--
tejun
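A crude first pass at such a benchmark might look like this (sizes,
paths, and the device name are arbitrary; conv=fsync needs a reasonably
recent GNU dd):

	# write cache off
	hdparm -W 0 /dev/hda
	time dd if=/dev/zero of=/mnt/test.img bs=1M count=512 conv=fsync
	# write cache on
	hdparm -W 1 /dev/hda
	time dd if=/dev/zero of=/mnt/test.img bs=1M count=512 conv=fsync

Small synchronous writes (fsync-heavy workloads like mail spools) are
where the difference tends to be most dramatic.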
* Re: Problem with disk
From: Ric Wheeler @ 2006-05-08 14:33 UTC
To: Tejun Heo; +Cc: Mark Hahn, David.Ronis, linux-ide, neilb

Tejun Heo wrote:
[--snip--]
> One thing I'm curious about is how much performance benefit can be
> obtained from write-back caching. With NCQ/TCQ, latency is much less of
> an issue, and I don't think scheduling and/or buffering inside the
> drive would result in a significant performance increase when so much
> is done by the vm and block layer (aside from scheduling of currently
> queued commands).
>
> Some Linux elevators try pretty hard not to mix read and write
> requests, as they mess up the statistics (a write-back cache absorbs
> write requests very fast and then affects the following read requests).
> So, they basically try to eliminate the effect of write-back caching.
>
> Well, benchmark time, it seems. :)

My own benchmarks showed a clear win for a write-intensive workload with
the write cache + barriers enabled using reiserfs. I think that NCQ/TCQ
wins mostly in the read case.

ric
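The reiserfs configuration Ric describes would presumably be set up along
these lines (device, mount point, and the choice to leave the cache on
are all assumptions; reiserfs takes barrier=flush rather than ext3's
barrier=1):

	# leave the on-drive write cache enabled...
	hdparm -W 1 /dev/hda
	# ...and request barriers from reiserfs at mount time
	mount -t reiserfs -o barrier=flush /dev/hda3 /mnt/bench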
* Re: Problem with disk
From: Tejun Heo @ 2006-05-10 22:21 UTC
To: Ric Wheeler; +Cc: Mark Hahn, David.Ronis, linux-ide, neilb

Ric Wheeler wrote:

> I think that MD will do the right thing if the IO terminates with an
> error condition. If the error is silent (and that can happen during a
> write), then it clearly cannot recover.

The condition I've described results in silent loss of data. Depending on
the type and implementation, the LLDD might be able to detect the
condition (PHY RDY status changed, for SATA), but the event happens after
the affected writes have completed successfully. For example,

1. fs issues writes for block #x, #y and then barrier #b.
2. #x gets written to the write-back cache and completed successfully.
3. a power glitch occurs while #y is in progress. The LLDD detects the
   condition, recovers the drive, and retries #y.
4. #y gets written to the write-back cache and completed successfully.
5. barrier #b gets executed and #y gets written to the media, but #x is
   lost and nobody knows about it.

I'm worried about the problem because, with libata, hotplug is becoming
available to the masses, and when average Joe hot-plugs a new drive into
his machine which has an $8 power supply (really, they sell 300W ATX
power supplies at 8000 KRW, which is about $8), this is going to happen.
I had a pretty decent power supply from a reputable maker, but I still
got hit by the problem.

Maybe the correct approach is to establish a warm-plug protocol: the
kernel provides a way to plug IOs, and a user helper program plugs all
IOs until the new device settles.

Thanks.

--
tejun
* Re: Problem with disk
From: Ric Wheeler @ 2006-05-13 19:31 UTC
To: Tejun Heo; +Cc: Mark Hahn, David.Ronis, linux-ide, neilb

Tejun Heo wrote:

> The condition I've described results in silent loss of data. Depending
> on the type and implementation, the LLDD might be able to detect the
> condition (PHY RDY status changed, for SATA), but the event happens
> after the affected writes have completed successfully. For example,
>
> 1. fs issues writes for block #x, #y and then barrier #b.
> 2. #x gets written to the write-back cache and completed successfully.
> 3. a power glitch occurs while #y is in progress. The LLDD detects the
>    condition, recovers the drive, and retries #y.
> 4. #y gets written to the write-back cache and completed successfully.
> 5. barrier #b gets executed and #y gets written to the media, but #x is
>    lost and nobody knows about it.

The promise that you get from a barrier is pretty simple: after a
successful one, all IOs that were submitted before it are on the platter,
if the barrier works. In your example, if you mean a power glitch as in a
power loss, #x will be lost (and probably lots of other write cache
state), but the application should expect that (or add extra barriers)...

> I'm worried about the problem because, with libata, hotplug is becoming
> available to the masses, and when average Joe hot-plugs a new drive
> into his machine which has an $8 power supply (really, they sell 300W
> ATX power supplies at 8000 KRW, which is about $8), this is going to
> happen. I had a pretty decent power supply from a reputable maker, but
> I still got hit by the problem.

Not sure that I understand exactly how a glitch (as opposed to a full
loss) would cause #x to get lost - the drive firmware should track the
fact that #x was in the write cache and not yet destaged to the platter.

> Maybe the correct approach is to establish a warm-plug protocol: the
> kernel provides a way to plug IOs, and a user helper program plugs all
> IOs until the new device settles.