public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Michael Tokarev <mjt@tls.msk.ru>
To: Karel Zak <kzak@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Tejun Heo <tj@kernel.org>,
	"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Daniel Taylor <Daniel.Taylor@wdc.com>,
	Jeff Garzik <jeff@garzik.org>, Mark Lord <kernel@teksavvy.com>,
	tytso@mit.edu, "H. Peter Anvin" <hpa@zytor.com>,
	hirofumi@mail.parknet.co.jp,
	Andrew Morton <akpm@linux-foundation.org>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	irtiger@gmail.com, Matthew Wilcox <matthew@wil.cx>,
	aschnell@suse.de, knikanth@suse.de, jdelvare@suse.de,
	Jim Meyering <jim@meyering.net>, Neil Brown <neilb@suse.de>
Subject: Re: ATA 4 KiB sector issues.
Date: Tue, 09 Mar 2010 13:16:01 +0300	[thread overview]
Message-ID: <4B961FE1.5040105@msgid.tls.msk.ru> (raw)
In-Reply-To: <20100309100153.GD18077@nb.net.home>

Karel Zak wrote:
> On Tue, Mar 09, 2010 at 09:53:37AM +0300, Michael Tokarev wrote:
[]
>> Think of a raid5 array - with all the mentioned good stuff in place
>> fdisk should figure out to align partitions on the array stripe
>> boundary, and should do that automatically.  And this should be
> 
> Yes. For userspace there is not a difference between RAID and non-RAID
> device -- the topology support in kernel provides unified API to all
> devices. It means we needn't any extra support for RAIDs in
> fdisk/parted. The userspace tools follow topology data from kernel.
> 
> The good thing with 1MiB default alignment is that it is usable for
> usual stripe sizes (for sizes greater than 1MiB we use optimal I/O
> size).

No, it's not that simple.  For raid5 (and I especially mentioned raid5
above), raid4 and raid6, 1MiB is only good when the number of devices
is 2^N+1 (for raid[45]) or 2^N+2 (for raid6).  For raid5 that means
3, 5, 9, 17, .. disks.  In all other cases the alignment (which should
match stripe size) will not be power of two.  For example, for a 4-disk
raid5 array with 1MiB chunk size the partitions should be aligned at
3MiB boundaries.  For 6-disk raid5 with 256KiB chunk size it is
5x256=1280 Kib.  And so on.

Yes it has little to do with the $subject (4KiB sectors), but it is
closely related still.

>> most easy to debug/test, since the whole thing is controllable
>> by kernel.
> 
> I did almost all my tests with scsi_debug or MD RAID0 on scsi_debug.
> It works as expected.

Actually, for raid0, the alignment is questionable.  Should it be a
multiple of chunk size or whole stripe size?  I'm not sure, both ways
has bad and good sides..  But if it is the latter, the same issues
pops up again: do a 3-disk raid0 and you'll have to align to 3*2^N.

[]
>  Disk /dev/sdb: 2621 MB, 2621440000 bytes
>  255 heads, 63 sectors/track, 318 cylinders, total 5120000 sectors
>  Units = sectors of 1 * 512 = 512 bytes
>  Sector size (logical/physical): 512 bytes / 4096 bytes
>  I/O size (minimum/optimal): 4096 bytes / 32768 bytes

Good.

>  # mdadm --create /dev/md8 --level=5 --raid-devices=4 /dev/sdb{1,2,3,4}

That's 3-disk stripe size with default 64Kb chunk size, which makes
3x64=320KiB - the number to which everything should be aligned.

>  # fdisk -lcu /dev/md8
> 
>  Disk /dev/md8: 1572 MB, 1572667392 bytes
>  2 heads, 4 sectors/track, 383952 cylinders, total 3071616 sectors
>  Units = sectors of 1 * 512 = 512 bytes
>  Sector size (logical/physical): 512 bytes / 4096 bytes
>  I/O size (minimum/optimal): 65536 bytes / 65536 bytes

And here we go: fdisk does not see the right number: nothing
is dividable by 3.

[]
>  # cat /sys/block/md8/md8p{1,2}/alignment_offset
>  0
>  0

And that's where the issue is.  md does not {sup,re}port all
this stuff yet.

This is what I'm talking about.

Thanks!

/mjt

  reply	other threads:[~2010-03-09 10:16 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-08  3:48 ATA 4 KiB sector issues Tejun Heo
2010-03-08  5:38 ` Greg Freemyer
2010-03-08  7:00 ` James Bottomley
2010-03-08  7:53   ` H. Peter Anvin
2010-03-08 15:34     ` Martin K. Petersen
2010-03-09 22:36       ` Daniel Taylor
2010-03-09 22:46     ` Greg Freemyer
2010-03-10  0:05       ` Tejun Heo
2010-03-10  0:14         ` Daniel Taylor
2010-03-10  0:26           ` Tejun Heo
2010-03-10  0:36             ` H. Peter Anvin
2010-03-10  5:17           ` H. Peter Anvin
2010-03-10  7:09           ` Gabor Gombas
2010-03-10  0:32       ` H. Peter Anvin
2010-03-10 10:46         ` Johannes Stezenbach
2010-03-10 11:22           ` H. Peter Anvin
2010-03-08  7:56   ` H. Peter Anvin
2010-03-08 15:33   ` Martin K. Petersen
2010-03-08 15:38     ` Martin K. Petersen
2010-03-08 15:41       ` Martin K. Petersen
2010-03-08 18:50         ` H. Peter Anvin
2010-03-08 18:58           ` James Bottomley
2010-03-08 19:11             ` H. Peter Anvin
2010-03-08 20:02             ` Cláudio Martins
2010-03-08 21:07               ` Martin K. Petersen
2010-03-08 20:19           ` Martin K. Petersen
2010-03-08 21:16             ` H. Peter Anvin
2010-03-10  0:34           ` Tejun Heo
2010-03-10  7:53         ` Matthew Wilcox
2010-03-10 13:47           ` Jeff Garzik
2010-03-10 16:19             ` Damian Lukowski
2010-03-11 13:04               ` Theodore Tso
2010-03-11 13:57                 ` Nikanth Karthikesan
2010-03-11 14:28                   ` Theodore Tso
2010-03-11 14:39                     ` James Bottomley
2010-03-11 15:05                       ` Nikanth Karthikesan
2010-03-11 15:25                         ` tytso
2010-03-11 16:26                           ` Gene Heskett
2010-03-11 16:34                           ` Greg Freemyer
2010-03-12  1:09                             ` Tejun Heo
2010-03-11 14:48                     ` Mike Snitzer
2010-03-11 15:00                     ` Nikanth Karthikesan
2010-03-11 15:10                       ` Tejun Heo
2010-03-11 16:01                       ` Mike Snitzer
2010-03-11 18:26                         ` Christoph Hellwig
2010-03-11 16:33                   ` H. Peter Anvin
2010-03-08 15:18 ` Martin K. Petersen
2010-03-08 18:29   ` H. Peter Anvin
2010-03-08 20:01     ` Martin K. Petersen
2010-03-08 19:34   ` Mike Snitzer
2010-03-09  2:53     ` Tejun Heo
2010-03-09  3:20       ` Martin K. Petersen
2010-03-09  6:53     ` Michael Tokarev
2010-03-09 10:01       ` Karel Zak
2010-03-09 10:16         ` Michael Tokarev [this message]
2010-03-09 11:15           ` Dave Chinner
2010-03-09 11:38             ` Michael Tokarev
2010-03-09 12:20               ` Dave Chinner
2010-03-09 11:50           ` Karel Zak
2010-03-09 12:18           ` Karel Zak
2010-03-10  5:06             ` Martin K. Petersen
2010-03-10 20:50               ` Henrique de Moraes Holschuh
2010-03-10  4:57       ` Martin K. Petersen
2010-03-08 19:58   ` Karel Zak
2010-03-09  2:34     ` Tejun Heo
2010-03-09  2:42       ` Jeff Garzik
2010-03-09  2:49         ` Tejun Heo
2010-03-09  2:42       ` Tejun Heo
2010-03-09  3:11         ` Martin K. Petersen
2010-03-09  3:09       ` Martin K. Petersen
2010-03-09  3:38       ` Daniel Taylor
2010-03-09  4:54         ` Martin K. Petersen
2010-03-09  7:27     ` Jim Meyering
2010-03-09 23:56       ` Tejun Heo
2010-03-08 20:12   ` H. Peter Anvin
2010-03-09  2:22     ` Tejun Heo
2010-03-09  2:44   ` Tejun Heo
2010-03-09  3:18     ` Martin K. Petersen
2010-03-09 14:32       ` Mark Lord
2010-03-09  6:34 ` Mikael Abrahamsson
2010-03-09 10:06   ` Michal Soltys
2010-03-10  0:11     ` Tejun Heo
2010-03-14 21:09       ` Michal Soltys
2010-03-14 22:56         ` s ponnusa
2010-03-09 13:55 ` Mark Lord
2010-03-10  0:00   ` Tejun Heo
2010-03-10  6:08     ` Mark Lord
2010-03-09 23:46 ` Arnd Bergmann
2010-03-10  0:20   ` Tejun Heo
2010-03-10  9:14   ` Denys Vlasenko
2010-03-15  1:21     ` H. Peter Anvin
2010-03-15  2:26       ` Denys Vlasenko
2010-03-15  2:56         ` Greg Freemyer
2010-03-15  4:00         ` H. Peter Anvin
2010-03-15 12:30           ` Arnd Bergmann
2010-03-15  5:20         ` david
2010-03-15  9:56           ` Denys Vlasenko
2010-03-15 14:47             ` H. Peter Anvin
2010-03-16  2:30     ` Tejun Heo
2010-03-16  2:32       ` Tejun Heo
2010-03-16  6:14       ` James Bottomley
2010-03-16  6:22         ` Tejun Heo
2010-03-16 13:24           ` James Bottomley
2010-03-16 13:56             ` Tejun Heo
2010-03-16 14:21               ` James Bottomley
2010-03-16 14:25                 ` Arnd Bergmann
2010-03-16 14:50                 ` Tejun Heo
2010-03-16 15:02                   ` James Bottomley
2010-03-16 15:20                     ` Tejun Heo
2010-03-16 15:22                       ` Martin K. Petersen
2010-03-17  2:07                         ` Tejun Heo
2010-03-16 15:23                       ` James Bottomley
2010-03-16 15:37                         ` Tejun Heo
2010-03-16 20:42                           ` Ric Wheeler
2010-03-17  2:04                             ` Tejun Heo
2010-03-17  2:51                         ` Kevin Easton
2010-03-17  3:44                           ` Tejun Heo
2010-03-17  8:01                             ` jdow
2010-03-17 17:04                       ` Bill Davidsen
2010-03-16 14:38               ` Denys Vlasenko
2010-03-16 15:12                 ` Tejun Heo
2010-03-16 15:25                   ` Denys Vlasenko
2010-03-16 15:47                     ` Tejun Heo
2010-03-17  6:48             ` H. Peter Anvin
2010-03-16  6:27         ` Thomas Chou
  -- strict thread matches above, loose matches on Subject: below --
2010-03-12  3:10 H. Peter Anvin
2010-03-16 22:21 H. Peter Anvin
2010-03-17 15:08 ` Ric Wheeler
2010-03-17 17:13   ` H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B961FE1.5040105@msgid.tls.msk.ru \
    --to=mjt@tls.msk.ru \
    --cc=Daniel.Taylor@wdc.com \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=aschnell@suse.de \
    --cc=hirofumi@mail.parknet.co.jp \
    --cc=hpa@zytor.com \
    --cc=irtiger@gmail.com \
    --cc=jdelvare@suse.de \
    --cc=jeff@garzik.org \
    --cc=jim@meyering.net \
    --cc=kernel@teksavvy.com \
    --cc=knikanth@suse.de \
    --cc=kzak@redhat.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=matthew@wil.cx \
    --cc=neilb@suse.de \
    --cc=snitzer@redhat.com \
    --cc=tj@kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox