public inbox for linux-kernel@vger.kernel.org
From: Con Kolivas <kernel@kolivas.org>
To: "Eric D. Mudama" <edmudama@mail.bounceswoosh.org>,
	linux-kernel@vger.kernel.org
Subject: Re: 2.6.0-test11 data loss
Date: Thu, 25 Dec 2003 16:17:30 +1100
Message-ID: <200312251617.30228.kernel@kolivas.org>
In-Reply-To: <20031225020738.GA24690@bounceswoosh.org>

On Thu, 25 Dec 2003 13:07, Eric D. Mudama wrote:
> On Thu, Dec 25 at  9:34, Con Kolivas wrote:
> >On Thu, 25 Dec 2003 09:22, Gergely Tamas wrote:
> >> I don't think this is a reiserfs bug. This was my first thought, and
> >> after first hitting this bug I moved all my partitions from reiserfs
> >> to jfs. But I've also had this problem with it... Now I'm back to
> >> 2.4.23, and everything works fine.
> >
> >Because of the numerous reboots and hangs I've had with experimental
> > patches, I've also seen this, but it's not ReiserFS's fault. The problem is
> > that most drives have write caching enabled and not all of them are safe
> > with it. If you disable it with hdparm (hdparm -W 0 /dev/hd*), you'll
> > find that this prevents files that are open during a hard reset or power
> > outage from being corrupted.
>
> Write cache off will not prevent a file from being corrupted, however,
> it should limit the corruption to a single disk operation.
>
> I don't see how the behavior you describe could be the drive's
> fault...
>
> The user stated that their system hard locked, then they went and
> rebooted it, and following the reboot they had corruption...  From
> this, there are a few possibilities:
>
> 1. The drive had been given the commands to write the data prior to the
> hang.
>
> If this was the case, the drive would happily keep writing the data
> it had been given and was caching in the background, even while you
> continued to send (or stopped sending) data for a new command over the
> interface.  An IDE interface lockup or system lockup will not prevent
> the drive from flushing the remainder of its write cache.  (Only
> possible exception might be faulty handling of a hard reset, but all
> drives today will flush their cache when they see the reset, prior to
> processing it.) Unless the user yanked power within a few hundred
> milliseconds of the write command, I think it is unlikely that cached
> data already in the drive wasn't flushed properly.
>
> 2. The drive was in the middle of a command writing important data
> during the hang.
>
> In this case, yes, the file you were writing would probably be
> corrupt on the media, but nothing more.  Drives detect power loss, and
> immediately disable write-gate and park the actuator.  If they don't
> get the actuator parked before they run out of back-EMF from the
> momentum of the platter(s), the head will stick to the media and
> you'll probably need a chisel to get that drive to spin again.
>
> 3. The drive hadn't yet been issued the commands for the data that was
> eventually corrupted.
>
> I find this to be the most likely case. It is a situation where the
> filesystem thinks objects were moved, but those updates were not
> correctly sent to the disk (due to the hang?), so it might think
> they're in the old location or something.  (I'm not a filesystem
> wizard, so if I'm way off-base, my apologies.)
>
> It seems to me that the problem occurred at a higher system level than
> the disk, and disabling the write cache on the drive (besides being a
> *HUGE* performance loser) will only make the window for failure
> smaller, not eliminate it entirely.
>
> Unless you are using *really* old hard drives, the write caching in
> today's drives is really quite good and definitely should be usable.
> Sure, it makes things less safe in power events, but system lockups
> shouldn't affect the drive's ability to flush its cache.  Note too
> that Gergely reported that the problem went away on his 2.4.23 system.
> I don't believe that to be a small data point.
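The three cases in the quoted analysis above can be illustrated with a toy model. This is a hypothetical sketch, not anything proposed in the thread: `ToyDrive`, `run()` and the event names are invented for illustration. It assumes the drive drains exactly one cached write before power disappears, and it does not model the single command in progress at the instant of power loss (the one operation Eric says may still be corrupted with the cache off).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical event names for the three quoted cases (not from any real tool).
EVENTS = ("host_hang", "power_loss", "host_hang_before_issue")


@dataclass
class ToyDrive:
    """Toy disk model: illustrative only, not real drive firmware behaviour."""
    write_cache: bool
    media: Dict[int, str] = field(default_factory=dict)  # block -> data on the platters
    cache: List[Tuple[int, str]] = field(default_factory=list)  # acked, not yet on media

    def write(self, block: int, data: str) -> None:
        if self.write_cache:
            self.cache.append((block, data))  # write-back: ack now, flush later
        else:
            self.media[block] = data  # write-through: on media before the ack

    def flush_one(self) -> None:
        if self.cache:
            block, data = self.cache.pop(0)  # drain in arrival order
            self.media[block] = data

    def flush_all(self) -> None:
        while self.cache:
            self.flush_one()

    def lose_power(self) -> None:
        self.cache.clear()  # volatile cache contents vanish


def run(write_cache: bool, event: str) -> int:
    """Count how many of four intended block updates end up on the media."""
    if event not in EVENTS:
        raise ValueError(f"unknown event: {event}")
    drive = ToyDrive(write_cache)
    intended = [(n, f"update{n}") for n in range(4)]
    # Case 3: the host hangs after issuing only half of the updates.
    issued = intended[:2] if event == "host_hang_before_issue" else intended
    for block, data in issued:
        drive.write(block, data)
    if event == "power_loss":
        drive.flush_one()  # assumption: one cached write drains before power is gone
        drive.lose_power()
    else:
        drive.flush_all()  # host hung but drive still powered: it drains its cache
    return sum(1 for block, data in intended if drive.media.get(block) == data)


if __name__ == "__main__":
    for event in EVENTS:
        for wc in (True, False):
            print(f"{event:24} write_cache={wc!s:5} -> {run(wc, event)}/4 on media")
```

Under these assumptions, a host hang loses nothing the drive was already given, with or without the write cache (case 1); power loss is where the cache setting matters; and updates the host never issued (case 3) are lost either way, which is why turning the cache off narrows the window rather than closing it.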

I never said it was the correct solution, just what worked for me, as I had 
exactly the same issue going 2.4->2.6. I can't even recall if write caching 
was actually on in 2.4, and my write performance under video capture has not 
shown any detriment. The filesystem gods should comment. Merry Christmas.

Con


Thread overview: 10+ messages
2003-12-24 21:59 2.6.0-test11 data loss Keith Lea
2003-12-24 22:22 ` Gergely Tamas
2003-12-24 22:34   ` Con Kolivas
2003-12-25  2:07     ` Eric D. Mudama
2003-12-25  5:17       ` Con Kolivas [this message]
2003-12-25  6:15       ` Hans Reiser
2003-12-25 16:46   ` Tomas Szepe
2003-12-25 23:45     ` Hmamouche, Youssef
2003-12-25  1:21 ` Felipe Alfaro Solana
2003-12-25  6:11 ` Hans Reiser
