From: Karel Zak <kzak@redhat.com>
To: Michael Tokarev <mjt@tls.msk.ru>
Cc: Mike Snitzer <snitzer@redhat.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Tejun Heo <tj@kernel.org>,
	"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Daniel Taylor <Daniel.Taylor@wdc.com>,
	Jeff Garzik <jeff@garzik.org>, Mark Lord <kernel@teksavvy.com>,
	tytso@mit.edu, "H. Peter Anvin" <hpa@zytor.com>,
	hirofumi@mail.parknet.co.jp,
	Andrew Morton <akpm@linux-foundation.org>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	irtiger@gmail.com, Matthew Wilcox <matthew@wil.cx>,
	aschnell@suse.de, knikanth@suse.de, jdelvare@suse.de,
	Jim Meyering <jim@meyering.net>, Neil Brown <neilb@suse.de>
Subject: Re: ATA 4 KiB sector issues.
Date: Tue, 9 Mar 2010 11:01:53 +0100
Message-ID: <20100309100153.GD18077@nb.net.home>
In-Reply-To: <4B95F071.3070400@msgid.tls.msk.ru>

On Tue, Mar 09, 2010 at 09:53:37AM +0300, Michael Tokarev wrote:
> Mike Snitzer wrote:
> []
> > I've been keeping track of all the pieces in play, have coordinated
> > with kzak and jim, and have a summary that offers some amount of macro
> > detail (at the end I touch on parted and fdisk):
> > 
> > http://people.redhat.com/msnitzer/docs/io-limits.txt
> 
> What I don't see in this thread and in this document is - any mention
> of linux md layer.  I think it is the first candidate to test the whole
> thing, the easiest and most important one.  I mean the alignment and
> "recommended I/O size" and all this similar stuff.
> 
> Think of a raid5 array - with all the mentioned good stuff in place
> fdisk should figure out to align partitions on the array stripe
> boundary, and should do that automatically.  And this should be

Yes. For userspace there is no difference between RAID and non-RAID
devices: the topology support in the kernel provides a unified API for
all devices. This means we don't need any extra RAID support in
fdisk/parted. The userspace tools follow the topology data from the kernel.
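
To illustrate what "follow topology data from kernel" can look like, here
is a minimal Python sketch that reads the I/O topology attributes the
kernel exports in sysfs. The helper name read_topology and the sysfs_root
parameter are assumptions made for this example; fdisk and parted have
their own code for this and are not written this way. The same
function works unchanged for a plain disk and for an MD array, which is
the point made above.

```python
import os

# sysfs attributes (relative to /sys/block/<dev>) that describe I/O topology.
TOPOLOGY_ATTRS = {
    "logical_block_size": "queue/logical_block_size",
    "physical_block_size": "queue/physical_block_size",
    "minimum_io_size": "queue/minimum_io_size",
    "optimal_io_size": "queue/optimal_io_size",
    "alignment_offset": "alignment_offset",
}


def read_topology(dev, sysfs_root="/sys/block"):
    """Return a dict of topology values in bytes for a whole-disk device.

    `dev` is a name such as "sdb" or "md8". A value is None when the
    attribute is missing or unreadable (for example on an old kernel).
    `sysfs_root` is a hypothetical parameter that exists only so the
    function can be pointed at a test directory.
    """
    base = os.path.join(sysfs_root, dev)
    result = {}
    for name, rel in TOPOLOGY_ATTRS.items():
        try:
            with open(os.path.join(base, rel)) as f:
                result[name] = int(f.read().strip())
        except (OSError, ValueError):
            result[name] = None
    return result
```

Calling read_topology("md8") and read_topology("sdb") returns the same
set of keys for both devices; only the values differ.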

The good thing about the 1MiB default alignment is that it works for
the usual stripe sizes (for sizes greater than 1MiB we use the optimal
I/O size).
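
The rule in the paragraph above can be sketched in a few lines of Python.
This is a simplified illustration under stated assumptions, not the
actual fdisk code: the function names alignment_grain and align_up are
made up for the example, and a real tool also has to take
alignment_offset and the minimum I/O size into account.

```python
MIB = 1024 * 1024


def alignment_grain(optimal_io_size, default=MIB):
    """Pick the alignment granularity in bytes.

    Simplified rule: use the 1MiB default unless the device reports an
    optimal I/O size larger than that, in which case use the optimal
    I/O size.
    """
    if optimal_io_size > default:
        return optimal_io_size
    return default


def align_up(offset, grain):
    """Round a byte offset up to the next multiple of grain."""
    return -(-offset // grain) * grain


# 1MiB is a whole multiple of the usual power-of-two stripe sizes,
# so a 1MiB-aligned partition is also stripe-aligned for all of them.
for stripe in (32 * 1024, 64 * 1024, 128 * 1024, 512 * 1024):
    assert MIB % stripe == 0
```

With 512-byte logical sectors, 1MiB corresponds to sector 2048, which is
the start of the first partition in the listings below.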

> most easy to debug/test, since the whole thing is controllable
> by kernel.

I did almost all my tests with scsi_debug or MD RAID0 on scsi_debug.
It works as expected. (Note that kernel 2.6.31 has a problem with the
alignment_offset calculation on stacked devices, so use the latest
kernel, where the bug is already fixed.)

But I haven't tried using unpartitioned (whole) 4K disks for RAID,
because scsi_debug does not allow creating more devices (and I don't
have real hardware).

Some tests are available in util-linux-ng sources:
http://git.kernel.org/?p=utils/util-linux-ng/util-linux-ng.git;a=tree;f=tests/ts/fdisk

    Karel


 # modprobe scsi_debug dev_size_mb=2500 sector_size=512 physblk_exp=3

    [..create partitions...]

 # fdisk -lcu /dev/sdb 

 Disk /dev/sdb: 2621 MB, 2621440000 bytes
 255 heads, 63 sectors/track, 318 cylinders, total 5120000 sectors
 Units = sectors of 1 * 512 = 512 bytes
 Sector size (logical/physical): 512 bytes / 4096 bytes
 I/O size (minimum/optimal): 4096 bytes / 32768 bytes
 Disk identifier: 0xb585b0be

 Device Boot         Start         End      Blocks   Id  System
 /dev/sdb1            2048     1026047      512000   83  Linux
 /dev/sdb2         1026048     2050047      512000   83  Linux
 /dev/sdb3         2050048     3074047      512000   83  Linux
 /dev/sdb4         3074048     4098047      512000   83  Linux


 # mdadm --create /dev/md8 --level=5 --raid-devices=4 /dev/sdb{1,2,3,4}

     [...create partitions on the raid...]

 # fdisk -lcu /dev/md8

 Disk /dev/md8: 1572 MB, 1572667392 bytes
 2 heads, 4 sectors/track, 383952 cylinders, total 3071616 sectors
 Units = sectors of 1 * 512 = 512 bytes
 Sector size (logical/physical): 512 bytes / 4096 bytes
 I/O size (minimum/optimal): 65536 bytes / 65536 bytes
 Disk identifier: 0x1bb6fd8d

 Device Boot          Start         End      Blocks   Id  System
 /dev/md8p1            2048     1435647      716800   83  Linux
 /dev/md8p2         1435648     2869247      716800   83  Linux


 Check offsets (alignment):

 # cat /sys/block/sdb/sdb{1,2,3,4}/alignment_offset
 0
 0
 0
 0

 # cat /sys/block/md8/md8p{1,2}/alignment_offset
 0
 0
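
 The same result can be cross-checked by hand from the fdisk listings.
 The following Python sketch (the helper name misaligned_starts is an
 assumption for this example) converts each start sector to a byte
 offset and tests it against the physical sector size and the reported
 optimal I/O size. It assumes a device alignment_offset of 0, as in the
 output above; a non-zero offset would have to be subtracted first.

```python
def misaligned_starts(start_sectors, logical_sector_size, boundary):
    """Return the start sectors whose byte offset is not a multiple of boundary."""
    return [s for s in start_sectors
            if (s * logical_sector_size) % boundary != 0]


# Start sectors and I/O sizes copied from the fdisk listings above.
SDB_STARTS = [2048, 1026048, 2050048, 3074048]
MD8_STARTS = [2048, 1435648]

for name, starts, optimal_io in (("sdb", SDB_STARTS, 32768),
                                 ("md8", MD8_STARTS, 65536)):
    # Check against the 4096-byte physical sector and the optimal I/O size.
    for boundary in (4096, optimal_io):
        bad = misaligned_starts(starts, 512, boundary)
        print(name, "boundary", boundary, "misaligned:", bad)
```

 An old-style DOS partition starting at sector 63 would be reported as
 misaligned by the same check, since 63 * 512 is not a multiple of 4096.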

-- 
 Karel Zak  <kzak@redhat.com>
