* md device io request split
From: "Ramon Schönborn" @ 2011-11-22 9:36 UTC (permalink / raw)
To: linux-raid
Hi,
Could someone help me understand why md splits IO requests into 4k blocks?
iostat says:
Device:    rrqm/s   wrqm/s    r/s       w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
...
dm-71        4.00  5895.00  31.00   7538.00   0.14  52.54     14.25     94.69  16041   0.13  96.00
dm-96        2.00  5883.00  18.00   7670.00   0.07  52.95     14.13    104.84  13.69   0.12  96.00
md17         0.00     0.00  48.00  13234.00   0.19  51.70      8.00      0.00   0.00   0.00   0.00
md17 is a raid1 with members "dm-71" and "dm-96". IO was generated with something like "dd if=/dev/zero bs=100k of=/dev/md17".
According to "avgrq-sz", the average size of the requests is 8 times 512b, i.e. 4k.
I used kernel 3.0.7 and verified the results with a raid5 array and an older kernel (2.6.32) as well.
Why do I care about this at all?
The IO requests in my case come from a virtual machine, where the requests have already been merged in a virtual device. Afterwards the requests are split at the md level (on the VM host) and later merged again (at dm-71/dm-96). This seems like avoidable overhead, doesn't it?
regards,
Ramon Schönborn
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: md device io request split
From: NeilBrown @ 2011-11-23 2:31 UTC
To: Ramon Schönborn; +Cc: linux-raid
On Tue, 22 Nov 2011 10:36:34 +0100 "Ramon Schönborn" <RSchoenborn@gmx.net>
wrote:
> Hi,
>
> Could someone help me understand why md splits IO requests into 4k blocks?
> iostat says:
> Device:    rrqm/s   wrqm/s    r/s       w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> ...
> dm-71        4.00  5895.00  31.00   7538.00   0.14  52.54     14.25     94.69  16041   0.13  96.00
> dm-96        2.00  5883.00  18.00   7670.00   0.07  52.95     14.13    104.84  13.69   0.12  96.00
> md17         0.00     0.00  48.00  13234.00   0.19  51.70      8.00      0.00   0.00   0.00   0.00
>
> md17 is a raid1 with members "dm-71" and "dm-96". IO was generated with something like "dd if=/dev/zero bs=100k of=/dev/md17".
> According to "avgrq-sz", the average size of the requests is 8 times 512b, i.e. 4k.
> I used kernel 3.0.7 and verified the results with a raid5 array and an older kernel (2.6.32) as well.
> Why do I care about this at all?
> The IO requests in my case come from a virtual machine, where the requests have already been merged in a virtual device. Afterwards the requests are split at the md level (on the VM host) and later merged again (at dm-71/dm-96). This seems like avoidable overhead, doesn't it?
Reads to a RAID5 device should be as large as the chunk size.
Writes will always be 4K as they go through the stripe cache, which uses 4K
blocks.
These 4K requests should be combined into larger requests by the
elevator/scheduler at a lower level, so the device should see largish writes.
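(To make that concrete, here is a small stand-alone C model of the walk described above; it is not the raid5.c code, just an illustration of how one large write, using the 100 KiB size from the original test, turns into a series of 4 KiB pieces before the elevator merges them again:

/*
 * Toy user-space model (not kernel code) of the RAID5 write path described
 * above: a single large write is walked in 4 KiB (8-sector) steps, one piece
 * per stripe-cache entry, which is why the member devices see 4 KiB requests.
 */
#include <stdio.h>
#include <stdint.h>

#define SECTOR_SIZE    512ULL
#define STRIPE_SECTORS 8ULL             /* 4 KiB stripe-cache page / 512 B */

int main(void)
{
    /* one 100 KiB write starting at sector 2048, as "dd bs=100k" might issue */
    uint64_t bi_sector = 2048;
    uint64_t bi_bytes  = 100 * 1024;

    uint64_t first  = bi_sector & ~(STRIPE_SECTORS - 1);
    uint64_t last   = bi_sector + bi_bytes / SECTOR_SIZE;
    unsigned pieces = 0;

    for (uint64_t s = first; s < last; s += STRIPE_SECTORS) {
        /* the kernel attaches this 4 KiB slice to a stripe-cache entry here */
        pieces++;
    }
    printf("one %llu KiB write -> %u stripe-cache pieces of 4 KiB\n",
           (unsigned long long)(bi_bytes / 1024), pieces);
    return 0;
}

Compiled and run, it reports 25 pieces for the single 100 KiB write, which is the kind of splitting the avgrq-sz column reflects.)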
Writing to a RAID5 is always going to be costly due to the need to compute
and write parity, so it isn't clear to me that this is a place where
optimisation is appropriate.
RAID1 will only limit requests to 4K if the device beneath it is
non-contiguous - e.g. a striped array or LVM arrangement where consecutive
blocks might be on different devices.
Because of the way request splitting is managed in the block layer, RAID1 is
only allowed to send down a request that will be sure to fit on a single
device. As different devices in the RAID1 could have different alignments, it
would be very complex to track exactly how each request must be split at the
top of the stack so as to fit all the way down, and I think it is impossible
to do it in a race-free way.
So if this might be the case, RAID1 insists on only receiving 1-page requests
because it knows they are always allowed to be passed down.
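(Purely as an illustration of that rule, a toy user-space rendition with invented names; this is not the raid1.c code, and as far as I can tell the real driver expresses the same policy through the request-queue limits it sets up when a member device has a merge_bvec_fn:

/*
 * Hypothetical sketch (invented names, not drivers/md/raid1.c): if any mirror
 * leg may be non-contiguous, i.e. advertises a merge_bvec_fn the way dm does,
 * the array only accepts requests of at most one page, because a one-page
 * request is always safe to pass straight down.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u

struct mirror_leg {
    const char *name;
    bool has_merge_bvec_fn;     /* dm-mapped devices always set one */
};

static uint32_t raid1_max_request_bytes(const struct mirror_leg *legs, int n)
{
    for (int i = 0; i < n; i++)
        if (legs[i].has_merge_bvec_fn)
            return PAGE_SIZE;           /* clamp to one page */
    return UINT32_MAX;                  /* no extra limit from raid1 itself */
}

int main(void)
{
    struct mirror_leg legs[] = {
        { "dm-71", true },
        { "dm-96", true },
    };
    printf("max request: %u bytes\n",
           raid1_max_request_bytes(legs, 2));   /* prints 4096 */
    return 0;
}

With both legs being dm devices, as in the md17 example, the clamp is always in effect, which matches the 8-sector avgrq-sz seen on the array.)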
NeilBrown
* Re: md device io request split
From: "Ramon Schönborn" @ 2011-11-23 13:22 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
> RAID1 will only limit requests to 4K if the device beneath it is
> non-contiguous - e.g. a striped array or LVM arrangement where consecutive
> blocks might be on different devices.
How does it know whether a device is non-contiguous? Is there a way to have a dm device "marked" like that, or to force md to use bigger requests?
Let's assume a host with about 20 raid1 devices built from dm devices with the overhead described above - do you think that not splitting the requests could lead to a noticeable performance improvement?
Thanks for your help,
Ramon Schönborn
* Re: md device io request split
From: NeilBrown @ 2011-11-23 19:30 UTC
To: Ramon Schönborn; +Cc: linux-raid
On Wed, 23 Nov 2011 14:22:16 +0100 "Ramon Schönborn" <RSchoenborn@gmx.net>
wrote:
>
> > RAID1 will only limit requests to 4K if the device beneath it is
> > non-contiguous - e.g. a striped array or LVM arrangement where consecutive
> > blocks might be on different devices.
>
> How does it know whether a device is non-contiguous? Is there a way to have a dm device "marked" like that, or to force md to use bigger requests?
> Let's assume a host with about 20 raid1 devices built from dm devices with the overhead described above - do you think that not splitting the requests could lead to a noticeable performance improvement?
>
> Thanks for your help,
> Ramon Schönborn
If the device provides a "merge_bvec_fn", then it is assumed not to be
contiguous.
dm always sets this on its devices.
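(For context, the callback answers a narrow question: given where a bio currently ends, how many more bytes may it take before it would cross onto a different underlying device? A toy model with invented names and a made-up two-leg linear layout, not the dm code:

/* Toy model of the question a merge_bvec_fn answers (invented names; not
 * the dm code): how many more bytes can this bio take before it would
 * cross from one underlying device onto the next? */
#include <stdio.h>
#include <stdint.h>

#define SECTOR_SIZE  512ULL
#define LEG_SECTORS  (1ULL << 21)        /* pretend each leg is 1 GiB */

static uint64_t room_before_boundary(uint64_t start_sector, uint64_t cur_bytes)
{
    uint64_t next_boundary = (start_sector / LEG_SECTORS + 1) * LEG_SECTORS;
    return next_boundary * SECTOR_SIZE - (start_sector * SECTOR_SIZE + cur_bytes);
}

int main(void)
{
    /* a bio starting 4 KiB before the leg boundary may only grow by 4 KiB */
    printf("%llu bytes of room\n",
           (unsigned long long)room_before_boundary(LEG_SECTORS - 8, 0));
    return 0;
}

Because dm registers such a callback for every mapped device, md has to assume the worst and treats any dm member as potentially non-contiguous, whatever the actual table looks like.)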
I really have no idea what sort of overhead this creates. You would need to
test it.
I assume you are using dm simply as a partitioning tool with a single linear
mapping per device.
If this is the case, it should be safe for testing to remove the line
blk_queue_merge_bvec(md->queue, dm_merge_bvec);
from drivers/md/dm.c and see how that changes performance. If you have any dm
targets more complex than a single linear mapping, this will almost certainly
cause IO failure at some point, so this should only be used for testing.
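(For anyone trying that experiment, the change amounts to commenting out the quoted call in the queue-setup code of drivers/md/dm.c and rebuilding; the enclosing function varies a little between kernel versions, so locate the line rather than trusting any particular context:

	/* testing only: allow upper layers to send dm bios larger than one page */
	/* blk_queue_merge_bvec(md->queue, dm_merge_bvec); */

then repeat the dd/iostat comparison and watch whether avgrq-sz on the array and its members changes.)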
NeilBrown