From: Jeff Garzik <jeff@garzik.org>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Ric Wheeler <ric@emc.com>,
linux-scsi <linux-scsi@vger.kernel.org>,
linux-fsdevel@vger.kernel.org,
Linux-ide <linux-ide@vger.kernel.org>
Subject: Re: impact of 4k sector size on the IO & FS stack
Date: Mon, 12 Mar 2007 10:26:32 -0400
Message-ID: <45F56318.9030505@garzik.org>
In-Reply-To: <20070312122424.18ed86ce@lxorguk.ukuu.org.uk>
Alan Cox wrote:
>> First generation of 1K sector drives will continue to use the same
>> 512-byte ATA sector size you are familiar with. A single 512-byte write
>> will cause the drive to perform a read-modify-write cycle. This
>> configuration is physical 1K sector, logical 512b sector.
>
> The problem case is "read-modify-screwup"
>
> At that point we've trashed the block we were writing (a well studied
> recovery case), and we've blasted some previously sane, totally
> unrelated sector of data out of existence. That's why we need to know
> ideally if they are doing the write to a different physical block when
> they do this, so that we don't lose the old data. My guess is they won't
> as it'll be hard.
Strict ATA command set answer: you will have no idea what goes on under
the hood. The current 512-b interface stays /exactly/ the same, save
for a word or two in IDENTIFY DEVICE telling you the "secret" physical
sector size. If all your I/Os are aligned properly, then you need not
worry about RMW cycles, as they will not occur.
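As a rough illustration of how a host might read that "secret" size, the sketch below decodes IDENTIFY DEVICE word 106. It assumes the word-106 layout that was later standardized in ATA8-ACS: bit 15 clear and bit 14 set mean the word is valid, bit 13 means "multiple logical sectors per physical sector", and bits 3:0 hold log2 of that ratio. The drafts under discussion in this thread may differ, and `physical_sector_size` is a hypothetical helper, not a libata API.

```python
def physical_sector_size(word106: int, logical_size: int = 512) -> int:
    """Derive the physical sector size from IDENTIFY DEVICE word 106.

    Assumes the ATA8-ACS layout of word 106 (hypothetical helper).
    logical_size is the unchanged 512-byte interface sector by default.
    """
    # The word is only meaningful when bit 15 is clear and bit 14 is set.
    if (word106 & 0xC000) != 0x4000:
        return logical_size
    # Bit 13: the device has multiple logical sectors per physical sector.
    if not word106 & 0x2000:
        return logical_size
    # Bits 3:0: log2(logical sectors per physical sector).
    return logical_size << (word106 & 0xF)


if __name__ == "__main__":
    print(physical_sector_size(0x6001))  # -> 1024 (1K physical, 512-byte logical)
    print(physical_sector_size(0x6003))  # -> 4096 (4K physical, 512-byte logical)
```

Once the host knows the ratio, keeping every I/O aligned to it is all that is needed to avoid RMW cycles.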
Intuition answer: they will use their firmware-internal standard code
for scheduling reads and writes, and will only reallocate sectors as
needed by media failure or similar events.
The "M" (modify) step of the RMW cycle happens in disk RAM. So from the
disk's point of view, a single 512-b write would require reading a
single 1K hard sector, updating the contents in cache RAM, and then
writing a single 1K hard sector. The reading of the unknown half of the
sector can be scheduled well in advance, usually, since writeback
caching gives the drive plenty of time (relatively speaking) to optimize
things.
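The read-modify-write sequence above can be modelled with a toy drive. This is a minimal sketch, assuming 1K physical sectors behind a 512-byte interface, with physical sector 0 starting at LBA 0; the class name and its `power_fail` switch are hypothetical and say nothing about how real firmware is structured. The failure path mimics Alan's "read-modify-screwup": an interrupted write-back loses the whole physical sector, including the unrelated neighbouring 512-byte sector.

```python
LOGICAL = 512    # interface (logical) sector size in bytes
PHYSICAL = 1024  # assumed media (physical) sector size in bytes
RATIO = PHYSICAL // LOGICAL


class ToyRmwDisk:
    """Hypothetical 1K-physical / 512-byte-logical drive; not real firmware."""

    def __init__(self, n_physical: int) -> None:
        # Each entry holds one physical sector; None marks a lost sector.
        self.media = [bytes(PHYSICAL) for _ in range(n_physical)]
        self.media_reads = 0
        self.media_writes = 0

    def write_logical(self, lba: int, data: bytes, power_fail: bool = False) -> None:
        if len(data) != LOGICAL:
            raise ValueError("write must be exactly one logical sector")
        phys, slot = divmod(lba, RATIO)
        # R: fetch the whole physical sector into cache RAM.
        current = self.media[phys]
        if current is None:
            raise OSError(f"physical sector {phys} unreadable")
        buf = bytearray(current)
        self.media_reads += 1
        # M: patch our 512-byte slot in RAM; the other slot is left as read.
        buf[slot * LOGICAL:(slot + 1) * LOGICAL] = data
        # W: write the whole physical sector back to the media.
        if power_fail:
            # Interrupted write-back: the entire physical sector is lost,
            # taking the neighbour's never-written data with it.
            self.media[phys] = None
            return
        self.media[phys] = bytes(buf)
        self.media_writes += 1

    def read_logical(self, lba: int) -> bytes:
        phys, slot = divmod(lba, RATIO)
        sector = self.media[phys]
        if sector is None:
            raise OSError(f"physical sector {phys} unreadable")
        return sector[slot * LOGICAL:(slot + 1) * LOGICAL]
```

For example, after `write_logical(5, ..., power_fail=True)`, reading LBA 4 raises `OSError` even though LBA 4 was never written, because both LBAs share physical sector 2.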
Overall, it definitely adds a few more points of failure, but there is
little we can do about them.
In experiments on my own Fedora workstation, ~66% of IOs in Linux
start on an odd sector, and ~33% start on even-numbered sectors. For
a 1K-sector drive with 'odd' alignment (the configuration Microsoft will
likely want, since DOS-style partitions traditionally start at LBA 63),
that means the majority of disk transactions will avoid an RMW cycle at
their start, but a still-numerous minority will not. I did not measure
transfer lengths, so I cannot say how many transfers /ended/ on an odd
sector, which determines how many RMW cycles the tail of an average I/O
requires.
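Counting the tail case only needs the transfer length alongside the start LBA. The sketch below tallies head and tail RMW cycles for a list of (start LBA, length in 512-byte sectors) writes. It assumes `first_aligned_lba=1` models 'odd' alignment, i.e. physical sectors begin at odd LBAs; the function name and the sample requests are made up for illustration and are not measured data.

```python
from typing import Iterable, Tuple


def rmw_counts(ios: Iterable[Tuple[int, int]], ratio: int = 2,
               first_aligned_lba: int = 1) -> Tuple[int, int]:
    """Return (head_rmws, tail_rmws) for (start_lba, nsectors) writes.

    ratio: logical sectors per physical sector (2 for a 1K drive).
    first_aligned_lba: assumed LBA at which a physical sector begins;
    1 models 'odd' alignment (hypothetical parameterization).
    """
    head = tail = 0
    for start, nsectors in ios:
        if nsectors <= 0:
            raise ValueError("transfer length must be positive")
        end = start + nsectors  # exclusive
        head_partial = (start - first_aligned_lba) % ratio != 0
        tail_partial = (end - first_aligned_lba) % ratio != 0
        # A short write that starts and ends inside one physical sector
        # costs a single RMW; count it once, as a head RMW.
        same_sector = ((start - first_aligned_lba) // ratio
                       == (end - 1 - first_aligned_lba) // ratio)
        if head_partial:
            head += 1
        if tail_partial and not (head_partial and same_sector):
            tail += 1
    return head, tail


if __name__ == "__main__":
    # Hypothetical requests, not a real trace.
    sample = [(63, 8), (64, 8), (101, 1), (200, 3)]
    print(rmw_counts(sample))  # -> (2, 2)
```

Tracking the tail separately matters because a transfer that starts on a physical boundary can still end mid-sector, as the (101, 1) request does here.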
>> A future configuration will change the logical ATA interface away from
>> 512-byte sectors to 1K or 4K. Here, it is impossible to read a quantity
>> smaller than 1K or 4K, whatever the sector size is.
>
> That one I'm not worried about - other than "guess how Redmond decide to
> make partition tables work" that one is mostly easy (be fun to see how
> many controllers simply can't cope with the command formats)
Indeed...
Jeff