From: Robert Hancock <hancockrwd@gmail.com>
To: Byron Stanoszek <bstanoszek@comtime.com>
Cc: Jeff Garzik <jgarzik@pobox.com>, linux-ide@vger.kernel.org
Subject: Re: Possible corruption over AHCI
Date: Sun, 06 Jan 2013 22:08:50 -0600
Message-ID: <50EA4A52.2030800@gmail.com>
In-Reply-To: <alpine.LNX.2.00.1301031441140.19998@winds.org>

On 01/03/2013 02:45 PM, Byron Stanoszek wrote:
> Hi Jeff, all,
>
> I'm having a data corruption issue while storing data to a specific type of
> Compact Flash card connected over AHCI. It seems that when two (or more)
> processes are writing to disk at the same time, and a sync() happens, every
> once in a while some data from one process's file writes will appear in
> place of data in the other file.
>
> Here are the specifics of my hardware:
>
> I'm using the built-in CF card slot on a Siemens 627C Industrial PC, which is
> connected to the motherboard via an AHCI chipset. The CF card is bootable. The
> BIOS is configured to use "RAID" mode ("Enhanced" or "AHCI" mode will not boot
> the CF card).
>
> AHCI chipset in use:
> 00:1f.2 0104: 8086:282a (rev 05)
> 00:1f.2 RAID bus controller: Intel Corporation 82801 Mobile SATA Controller [RAID mode] (rev 05)
>
> CF card with the problem:  SanDisk Ultra 8GB   (model SDCFH-008G)
> CF card that always works: SanDisk Extreme 8GB (model SDCFX-008G)
>
> Filesystem: ReiserFS
>
> Kernels tested to show symptoms: 3.0.14, 3.4.11, 3.7.1
>
> I can get the problem to reproduce almost 50% of the time by having a program
> drop a 50MB core dump in the background (over and over again) to the disk,
> while in the meantime I rsync over a 190MB gzipped file over to the disk from a
> remote PC. After that, I "sync", and then I clear the kernel's clean cache
> using "echo 1 > /proc/sys/vm/drop_caches".
>
> 50% of the time, rereading the gzipped file will show one or more 4K chunks of
> data from the core dump (or other process writing to disk) come out in random
> locations in the file, compared to what the file showed before clearing the
> cache. In other words, after the write and sync is complete, the cached file in
> Linux memory shows correct, but the copy stored on disk is wrong.
>
> I've reproduced the problem on several 627C PCs and Ultra cards now. If I use
> the same Ultra card on any other type of PC (using ata_piix or pata_jmicron
> drivers, since the Siemens PC is the only system I have with an AHCI chipset),
> it works fine. If I use an Extreme card instead on the Siemens PC, it works
> fine (even after 1000 transfers).
>
> I tried mounting and recreating the ReiserFS using the "notail" option, still
> same problem.
>
> I tried limiting the disk to use UDMA/33 or PIO4 mode, still same problem. (The
> Ultra disk normally comes up as UDMA/66, and the Extreme disk normally comes up
> as UDMA/100).
>
> I verified NCQ is not being used.
>
> Assuming this is a problem in the AHCI driver for the moment, what other
> options can I tweak to try to narrow down the problem? Are there any relevant
> AHCI features I can turn on/off by changing the source?
>
> I've attached the dmesg & lspci of the Siemens PC.
>
> Thanks and best regards,
>   -Byron

My first inclination is that this isn't very likely to be a problem in 
the AHCI driver. It's the most widely used storage driver on modern PCs 
so it seems unlikely that this sort of problem would show up there at 
this point.

I assume there's some kind of SATA to PATA bridge involved in the chain
(likely on the motherboard). It's possible that some combination of
timing differences between the cards, the controller operating mode and/or
the different host controller triggers a bug in either the CF card or
the bridge chip.
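
To help pin down which 4K chunks go bad, the check described in the quoted
report (read the file while it is still cached, sync, drop caches, reread)
could be scripted roughly as follows. This is only a sketch: the function
names block_digests and differing_blocks are hypothetical helpers, not part
of any existing tool, and the 4096-byte block size is an assumption taken
from the "4K chunks" in the report.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: matches the 4K chunks mentioned in the report


def block_digests(path, block_size=BLOCK_SIZE):
    """Return a list of SHA-256 hex digests, one per block of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digests.append(hashlib.sha256(block).hexdigest())
    return digests


def differing_blocks(before, after):
    """Return indexes of blocks that differ or exist in only one list."""
    length = max(len(before), len(after))
    return [
        i for i in range(length)
        if i >= len(before) or i >= len(after) or before[i] != after[i]
    ]
```

One possible use: call block_digests() on the gzipped file right after the
rsync finishes, run "sync" and "echo 1 > /proc/sys/vm/drop_caches" as root,
call block_digests() again, and pass both lists to differing_blocks().
Multiplying each returned index by the block size gives the byte offset of a
chunk whose on-disk contents no longer match what was cached.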
