linux-raid.vger.kernel.org archive mirror
* Questions about 4k sector drives
@ 2010-04-25 11:45 Florian Kusche
  2010-04-25 14:43 ` Phillip Susi
  0 siblings, 1 reply; 14+ messages in thread
From: Florian Kusche @ 2010-04-25 11:45 UTC (permalink / raw)
  To: linux-raid

Hello,

I have a few questions about Software RAID and 4k-sector drives. I already searched the web, but some questions remain.

1) (this one is not directly raid-related)
I understand that newer kernels can determine whether a drive uses 4k physical sectors even if it presents 512-byte logical sectors to the outside (via the sector_size_supported() patch by Matthew Wilcox).

Is this only information provided to userland tools, or will the Linux kernel change its behavior (e.g. use 4k blocks for such block devices)? (I guess it is only information for userland tools.)

2)
Will Software RAID work with physical 4k-sector drives...
- that simulate logical 512 byte sectors?
- that also have logical 4k-sectors?
(I'm pretty sure the answer to both questions is yes.)

3)
Will Software RAID work in mixed setups? i.e.: what combinations of the following drive types are possible?
- 512b physical / 512b logical
- 4k physical / 512b logical
- 4k physical / 4k logical
And: Will the resulting md block device have 512 byte blocks or 4k blocks?
(I would guess that you need to have the same logical sector size for all disks.)

I am aware of the performance problems due to read-modify-write cycles and potential misalignment. This has been discussed in plenty of articles on the web.

It would be great if someone could clear things up a little (and tell me if my guesses are correct).

Thanks,
Florian


* Re: Questions about 4k sector drives
  2010-04-25 11:45 Questions about 4k sector drives Florian Kusche
@ 2010-04-25 14:43 ` Phillip Susi
  2010-05-02 23:04   ` Bill Davidsen
  0 siblings, 1 reply; 14+ messages in thread
From: Phillip Susi @ 2010-04-25 14:43 UTC (permalink / raw)
  To: Florian Kusche; +Cc: linux-raid

On Sun, 2010-04-25 at 13:45 +0200, Florian Kusche wrote:
> Hello,
> 
> I have a few questions about Software RAID and 4k-sector drives. I
>  already searched the web, but some questions are left.
> 
> 1) (this one is not directly raid-related) I understand, that newer
>  kernels can determine whether a drive uses 4k physical sectors even if
>  it presents 512-byte logical sectors to the outside. (via the
>  sector_size_supported() patch by Matthew Wilcox)
> 
> Is this only an information given to userland tools, or will the linux
>  kernel change its behavior (e.g. use 4k-blocks for such block
>  devices)? (I guess it's only an information for userland tools.)

If the device reports it, then the kernel knows it and exports it to
user space.  Unfortunately, the new 4k drives from WD lie and report
a physical sector size of 512 bytes.
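
To illustrate what "exports it to user space" looks like in practice: on
kernels with the I/O topology support, the sizes show up as sysfs
attributes under /sys/block/<dev>/queue/.  The following is a minimal
Python sketch, assuming the attribute names logical_block_size and
physical_block_size; the helper names read_topology and is_512e and the
sysfs_root parameter are made up for this example.

```python
from pathlib import Path


def read_topology(device, sysfs_root="/sys/block"):
    """Return the block sizes (in bytes) the kernel exports for a device.

    sysfs_root is a hypothetical parameter so the function can be pointed
    at a test directory instead of the real /sys/block.
    """
    queue = Path(sysfs_root) / device / "queue"
    topology = {}
    for attr in ("logical_block_size", "physical_block_size"):
        path = queue / attr
        # Older kernels may not export the attribute at all; report None.
        topology[attr] = int(path.read_text().strip()) if path.exists() else None
    return topology


def is_512e(topology):
    """True if the drive reports 4k physical but 512-byte logical sectors."""
    return (topology["logical_block_size"] == 512
            and topology["physical_block_size"] == 4096)


# Example (on a Linux box with a disk named sda):
#   print(read_topology("sda"))
```

A drive that misreports itself, as described above, would simply show 512
for both values, so this check cannot detect it.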

> 2)
> Will Software RAID work with physical 4k-sector drives...
> - that simulate logical 512 byte sectors?
> - that also have logical 4k-sectors?
> (I'm pretty sure, the answer to both questions is yes.)

Yes.




* Re: Questions about 4k sector drives
  2010-04-25 14:43 ` Phillip Susi
@ 2010-05-02 23:04   ` Bill Davidsen
  2010-05-03  5:54     ` Luca Berra
                       ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Bill Davidsen @ 2010-05-02 23:04 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Florian Kusche, linux-raid

Phillip Susi wrote:
> On Sun, 2010-04-25 at 13:45 +0200, Florian Kusche wrote:
>   
>> Hello,
>>
>> I have a few questions about Software RAID and 4k-sector drives. I
>>  already searched the web, but some questions are left.
>>
>> 1) (this one is not directly raid-related) I understand, that newer
>>  kernels can determine whether a drive uses 4k physical sectors even if
>>  it presents 512-byte logical sectors to the outside. (via the
>>  sector_size_supported() patch by Matthew Wilcox)
>>
>> Is this only an information given to userland tools, or will the linux
>>  kernel change its behavior (e.g. use 4k-blocks for such block
>>  devices)? (I guess it's only an information for userland tools.)
>>     
>
> If the device reports it, then the kernel knows it and exports it to
> user space.  The new 4k drives from WD unfortunately, lie and report
> their physical sector size is 512 bytes.
>
>   
>> 2)
>> Will Software RAID work with physical 4k-sector drives...
>> - that simulate logical 512 byte sectors?
>> - that also have logical 4k-sectors?
>> (I'm pretty sure, the answer to both questions is yes.)
>>     
>
> Yes.
>   

Is there any reason not to just align partitions for all drives on 32kB 
sectors and expect that to work on 512b, 4kB, and SSD? Cautious testing 
here says it works fine, no anomalies, no exciting performance data, 
just works.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein



* Re: Questions about 4k sector drives
  2010-05-02 23:04   ` Bill Davidsen
@ 2010-05-03  5:54     ` Luca Berra
  2010-05-03 13:17     ` Phillip Susi
  2010-05-03 13:30     ` Greg Freemyer
  2 siblings, 0 replies; 14+ messages in thread
From: Luca Berra @ 2010-05-03  5:54 UTC (permalink / raw)
  To: linux-raid

On Sun, May 02, 2010 at 07:04:50PM -0400, Bill Davidsen wrote:
> Is there any reason not to just align partitions for all drives on 32kB 
> sectors and expect that to work on 512b 4kB and SSD? Cautious testing here 
> says it works fine, no anomalies, no exciting performance data, just works.

No reason at all. Consider that even Microsoft decided to align
partitions on a 1M boundary with W2008.

L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \


* Re: Questions about 4k sector drives
  2010-05-02 23:04   ` Bill Davidsen
  2010-05-03  5:54     ` Luca Berra
@ 2010-05-03 13:17     ` Phillip Susi
  2010-05-03 13:30     ` Greg Freemyer
  2 siblings, 0 replies; 14+ messages in thread
From: Phillip Susi @ 2010-05-03 13:17 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Florian Kusche, linux-raid

On 5/2/2010 7:04 PM, Bill Davidsen wrote:
> Is there any reason not to just align partitions for all drives on 32kB
> sectors and expect that to work on 512b 4kB and SSD? Cautious testing
> here says it works fine, no anomalies, no exciting performance data,
> just works.

SSDs usually have an erase block size of 512k, which is why Windows 7
aligns partitions to a 1 MB boundary and parted has followed suit.
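
The arithmetic behind these numbers can be checked quickly.  A small
Python sketch, assuming 512-byte logical sectors and the sizes mentioned
in this thread; the function name satisfied_boundaries is made up for the
example.  Start sector 63 is the traditional DOS/XP partition start,
64 is a 32 KiB start, and 2048 is a 1 MiB start.

```python
SECTOR_BYTES = 512  # logical sector size assumed for this example

# Boundaries discussed in the thread, in bytes.
BOUNDARIES = {
    "4 KiB physical sector": 4 * 1024,
    "32 KiB": 32 * 1024,
    "512 KiB erase block": 512 * 1024,
    "1 MiB": 1024 * 1024,
}


def satisfied_boundaries(start_sector, sector_bytes=SECTOR_BYTES):
    """List the boundaries that a partition starting at start_sector sits on."""
    start_bytes = start_sector * sector_bytes
    return [name for name, size in BOUNDARIES.items() if start_bytes % size == 0]


for start in (63, 64, 2048):
    print(start, satisfied_boundaries(start))
```

A 32 KiB start satisfies 4k sectors but not a 512 KiB erase block, while
a 1 MiB start satisfies every boundary listed, since each is a power of
two no larger than 1 MiB.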


* Re: Questions about 4k sector drives
  2010-05-02 23:04   ` Bill Davidsen
  2010-05-03  5:54     ` Luca Berra
  2010-05-03 13:17     ` Phillip Susi
@ 2010-05-03 13:30     ` Greg Freemyer
  2010-05-03 13:38       ` Phillip Susi
  2010-05-03 20:19       ` Martin K. Petersen
  2 siblings, 2 replies; 14+ messages in thread
From: Greg Freemyer @ 2010-05-03 13:30 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Phillip Susi, Florian Kusche, linux-raid, Martin K. Petersen

Adding Martin Petersen in cc since he knows as much about 4K sector
drives as anyone.

On Sun, May 2, 2010 at 7:04 PM, Bill Davidsen <davidsen@tmr.com> wrote:
<snip>
> Is there any reason not to just align partitions for all drives on 32kB
> sectors and expect that to work on 512b 4kB and SSD? Cautious testing here
> says it works fine, no anomalies, no exciting performance data, just works.

In theory 4K physical sector drives with XP alignment will eventually
ship and possibly have already.

The alignment may be controlled via a jumper, or could be set in the
factory.  It's up to the manufacturer so there is no way to predict.

These drives will need partitions/stripes etc. aligned to 31.5K, not 1MB.
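
The 31.5K figure is sector 63 times 512 bytes (32256 bytes).  The sketch
below works through why that start is aligned on such a drive and not on
a normal one.  It assumes the drive's shift is expressed as an alignment
offset in bytes (3584, i.e. 7 logical sectors, for a drive shifted by one
sector), which is how I understand the kernel's alignment_offset value;
the function name and parameters are made up for the example.

```python
def starts_on_physical_boundary(start_sector, alignment_offset=0,
                                logical=512, physical=4096):
    """Check whether a partition start falls on a physical sector boundary.

    alignment_offset is the byte offset of the first naturally aligned
    logical sector (assumed convention; 0 for an unshifted drive).
    """
    start_bytes = start_sector * logical
    return (start_bytes - alignment_offset) % physical == 0


XP_OFFSET = 7 * 512  # 3584 bytes, assumed value for a drive in XP mode

print(63 * 512 / 1024)                                # 31.5 (KiB)
print(starts_on_physical_boundary(63, XP_OFFSET))     # XP start, XP-mode drive
print(starts_on_physical_boundary(63))                # XP start, normal drive
print(starts_on_physical_boundary(2048, XP_OFFSET))   # 1 MiB start, XP-mode drive
```

Under these assumptions a 1 MiB start is misaligned on a drive in XP
mode, which is the case Greg is pointing at.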

I don't know if any of those drives exist yet, or if they ever will.  But
the kernel topology info specifically supports providing the above
info and aiui parted uses it to choose the best partition layout.

mdadm should as well, not just blindly say 1MB is the magic alignment
point.  (i.e. Linux can do better than Win2008/Win2003 which simply
disagree with each other on how to align.)

Greg


* Re: Questions about 4k sector drives
  2010-05-03 13:30     ` Greg Freemyer
@ 2010-05-03 13:38       ` Phillip Susi
  2010-05-03 20:27         ` Martin K. Petersen
  2010-05-03 20:19       ` Martin K. Petersen
  1 sibling, 1 reply; 14+ messages in thread
From: Phillip Susi @ 2010-05-03 13:38 UTC (permalink / raw)
  To: Greg Freemyer
  Cc: Bill Davidsen, Florian Kusche, linux-raid, Martin K. Petersen

On 5/3/2010 9:30 AM, Greg Freemyer wrote:
> In theory 4K physical sector drives with XP alignment will eventually
> ship and possibly have already.
> 
> The alignment may be controlled via a jumper, or could be set in the
> factory.  It's up to the manufacturer so there is no way to predict.
> 
> These drives will need partitions/stripes etc. aligned to 31.5K, not 1MB.

The WD drives have such a jumper, but it is not set by default and WD
highly recommends NOT using it since it will only produce optimal
results with XP.

> I don't know if any of those drives exist yet, or if they ever will.  But
> the kernel topology info specifically supports providing the above
> info and aiui parted uses it to choose the best partition layout.

AFAICS the kernel has a means of providing that information to user
space, and parted will use it if it is provided, but the kernel has no
means of obtaining that information from the drive, so it is always
left as unknown, so parted defaults to 1 MB alignment like Windows 7.


* Re: Questions about 4k sector drives
  2010-05-03 13:30     ` Greg Freemyer
  2010-05-03 13:38       ` Phillip Susi
@ 2010-05-03 20:19       ` Martin K. Petersen
  1 sibling, 0 replies; 14+ messages in thread
From: Martin K. Petersen @ 2010-05-03 20:19 UTC (permalink / raw)
  To: Greg Freemyer
  Cc: Bill Davidsen, Phillip Susi, Florian Kusche, linux-raid,
	Martin K. Petersen

>>>>> "Greg" == Greg Freemyer <greg.freemyer@gmail.com> writes:

Greg> In theory 4K physical sector drives with XP alignment will
Greg> eventually ship and possibly have already.

Well, that's a definite maybe :)

The 4K transition took much longer than anticipated and Vista and beyond
know how to query the drives for alignment.  So I'm guessing that we'll
only see 1-alignment via a jumper at this point.


Greg> I don't know if any of those drives exist yet, or if they ever will.

I have a bunch, but obviously they are mostly prototypes.


Greg> mdadm should as well, not just blindly say 1MB is the magic
Greg> alignment point.  (ie. linux can do better than Win2008/Win2003
Greg> which simply disagree with each other on how to align.)

We're going with 1MB as default because that's the new storage industry
consensus.  It's a less formalized number than - say - IDEMA sector
counts, but it appears to have reached critical mass among the vendors.

And obviously we'll compensate if the storage device reports a different
alignment via the relevant ATA or SCSI knobs.
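
One plausible reading of "compensate", sketched in Python: keep the 1 MB
default and shift the start by whatever alignment offset the device
reports.  This is an illustration under that assumption only, not code
taken from parted, mdadm, or the kernel; the function name and
parameters are made up.

```python
MIB = 1024 * 1024


def default_start_sector(alignment_offset=0, logical=512, default_bytes=MIB):
    """Return a first-partition start sector.

    Uses the 1 MiB default and adds the reported alignment offset
    (in bytes, 0 when the device reports none).
    """
    start_bytes = default_bytes + alignment_offset
    if start_bytes % logical:
        raise ValueError("alignment offset is not a whole number of logical sectors")
    return start_bytes // logical


print(default_start_sector())                        # no offset reported
print(default_start_sector(alignment_offset=3584))   # drive shifted by one sector
```

With no offset this gives sector 2048; with an assumed offset of 3584
bytes it gives sector 2055, which again lands on a 4k physical boundary
of the shifted drive.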

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: Questions about 4k sector drives
  2010-05-03 13:38       ` Phillip Susi
@ 2010-05-03 20:27         ` Martin K. Petersen
  2010-05-04 13:24           ` Phillip Susi
  0 siblings, 1 reply; 14+ messages in thread
From: Martin K. Petersen @ 2010-05-03 20:27 UTC (permalink / raw)
  To: Phillip Susi
  Cc: Greg Freemyer, Bill Davidsen, Florian Kusche, linux-raid,
	Martin K. Petersen

>>>>> "Phillip" == Phillip Susi <psusi@cfl.rr.com> writes:

>> I don't know if any of those drives exist yet, or if they ever will.
>> But the kernel topology info specifically supports providing the
>> above info and aiui parted uses it to choose the best partition
>> layout.

Phillip> AFAICS the kernel has a means of providing that information to
Phillip> user space, and parted will use it if it is provided, but the
Phillip> kernel has no means of obtaining that information from the
Phillip> drive, so it is always left as unknown, so parted defaults to 1
Phillip> MB alignment like Windows 7.

We have means of obtaining alignment and physical sector size
information from both SCSI and ATA drives.  But only if the drive
firmware provides the information, of course.

One currently shipping drive model on the market isn't reporting the
bigger physical block size.  But there are several other 4KB sector
products out there that are working just fine.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: Questions about 4k sector drives
  2010-05-03 20:27         ` Martin K. Petersen
@ 2010-05-04 13:24           ` Phillip Susi
  2010-05-04 15:29             ` Martin K. Petersen
  0 siblings, 1 reply; 14+ messages in thread
From: Phillip Susi @ 2010-05-04 13:24 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Greg Freemyer, Bill Davidsen, Florian Kusche, linux-raid

On 5/3/2010 4:27 PM, Martin K. Petersen wrote:
> We have means of obtaining alignment and physical sector size
> information from both SCSI and ATA drives.  But only if the drive
> firmware provides the information, of course.

How?  I don't see any such information in the output of hdparm -I for
instance.

> One currently shipping drive model on the market isn't reporting the
> bigger physical block size.  But there are several other 4KB sector
> products out there that are working just fine.

The WD drive indeed reports a 512 byte sector size, but I also have an
SSD with a 512kb erase block size and it seems like these knobs were
intended to cover that as well, but again, the values exported by the
kernel in /sys are 0 and I don't see a way for the drive to report this
information to the kernel.


* Re: Questions about 4k sector drives
  2010-05-04 13:24           ` Phillip Susi
@ 2010-05-04 15:29             ` Martin K. Petersen
  2010-05-05  7:44               ` John Robinson
  0 siblings, 1 reply; 14+ messages in thread
From: Martin K. Petersen @ 2010-05-04 15:29 UTC (permalink / raw)
  To: Phillip Susi
  Cc: Martin K. Petersen, Greg Freemyer, Bill Davidsen, Florian Kusche,
	linux-raid

>>>>> "Phillip" == Phillip Susi <psusi@cfl.rr.com> writes:

>> We have means of obtaining alignment and physical sector size
>> information from both SCSI and ATA drives.  But only if the drive
>> firmware provides the information, of course.

Phillip> How?  

http://oss.oracle.com/~mkp/docs/linux-advanced-storage.pdf


Phillip> I don't see any such information in the output of hdparm -I for
Phillip> instance.

You need hdparm-9.27 or later.


>> One currently shipping drive model on the market isn't reporting the
>> bigger physical block size.  But there are several other 4KB sector
>> products out there that are working just fine.

Phillip> The WD drive indeed reports a 512 byte sector size, but I also
Phillip> have an SSD with a 512kb erase block size and it seems like
Phillip> these knobs were intended to cover that as well, but again, the
Phillip> values exported by the kernel in /sys are 0 and I don't see a
Phillip> way for the drive to report this information to the kernel.

There are no means to report things like the erase block size.  A few
years ago there was a push in the industry to define a set of parameters
that would make sense for flash drives.  For a variety of reasons,
however, this effort never really took off.  There are some things in
the pipeline but it's mostly statistics and life expectancy stuff.

On well-designed drives the erase block size and other physical
characteristics do not matter because the firmware uses an approach akin
to a log-structured filesystem.

For low-end devices (where we could potentially benefit from knowing the
physical characteristics) the problem is that this information is often
considered part of the vendor's secret sauce.  Another common concern is
that exporting a set of metrics squarely puts the drive in the "poorly
designed" bucket and that's a marketing disaster.

It is a lengthy process to get stuff pushed through the standards
organizations.  Even if the industry had been successful in defining a
set of SSD characteristics it would have taken quite a while for things
to get ratified and show up in devices.  The expectation was that early
SSD designs exhibiting side effects from being flash-based would be
obsolete by then.

And as it turns out you can get an SSD with a sane firmware for $100 and
change these days...

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: Questions about 4k sector drives
  2010-05-04 15:29             ` Martin K. Petersen
@ 2010-05-05  7:44               ` John Robinson
  2010-05-05  7:47                 ` Mikael Abrahamsson
  0 siblings, 1 reply; 14+ messages in thread
From: John Robinson @ 2010-05-05  7:44 UTC (permalink / raw)
  To: linux-raid

Perhaps slightly o/t, but...

On 04/05/2010 16:29, Martin K. Petersen wrote:
[...]
> And as it turns out you can get an SSD with a sane firmware for $100 and
> change these days...

You can? Which one(s)? Would they be good for putting md bitmaps and 
filesystem journals on?

Cheers,

John.



* Re: Questions about 4k sector drives
  2010-05-05  7:44               ` John Robinson
@ 2010-05-05  7:47                 ` Mikael Abrahamsson
  2010-05-10 17:13                   ` Bill Davidsen
  0 siblings, 1 reply; 14+ messages in thread
From: Mikael Abrahamsson @ 2010-05-05  7:47 UTC (permalink / raw)
  To: John Robinson; +Cc: linux-raid

On Wed, 5 May 2010, John Robinson wrote:

> You can? Which one(s)? Would they be good for putting md bitmaps and 
> filesystem journals on?

Yes. The Intel X25-V 40G drive is the one I would recommend. I use it as a 
system drive in one box. It's not as fast (linear write speed) as the 
X25-M drives, but it's definitely a step up from the 5400rpm 2.5" drive I 
used in the system before :P

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Questions about 4k sector drives
  2010-05-05  7:47                 ` Mikael Abrahamsson
@ 2010-05-10 17:13                   ` Bill Davidsen
  0 siblings, 0 replies; 14+ messages in thread
From: Bill Davidsen @ 2010-05-10 17:13 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: John Robinson, linux-raid

Mikael Abrahamsson wrote:
> On Wed, 5 May 2010, John Robinson wrote:
>
>> You can? Which one(s)? Would they be good for putting md bitmaps and 
>> filesystem journals on?
>
> Yes. The Intel X25-V 40G drive is the one I would recommend, I use it 
> as a system drive in one box, it's not as fast (linear write speed) as 
> the X25-M drives, but it's definitely a step up from the 5400rpm 2.5" 
> drive I used in the system before :P
>
2nd that, nice drive, cheap, many uses for it. In addition to bitmap and 
journal (pick the right journal options and see a huge boost), putting 
swap out there makes hibernate one of those "wanna see it again" operations.

I really want to use it for write cache, but I guess putting a big 
journal there has a similar effect.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein

