* blk[front|back] does not hand over disk parameters
@ 2011-02-25 16:43 Adi Kriegisch
2011-02-28 10:06 ` Ian Campbell
0 siblings, 1 reply; 7+ messages in thread
From: Adi Kriegisch @ 2011-02-25 16:43 UTC (permalink / raw)
To: xen-devel
Dear all,
(Following the XenFAQ on how to report a bug[1], I first submitted this to
the xen-users list[2], then reported the bug in bugzilla[3], and am now
resending the text to this list. Please CC me in replies as I am not
subscribed to this list. Thanks!)
I investigated a serious performance drop between Dom0 and DomU with
LVM on top of RAID6 and blkback devices.
While I get around 130MB/s write performance in Dom0, I only get 30MB/s in
DomU. Inspecting this with dstat/iostat revealed a read rate of about
17-25MB/s while writing at around 40MB/s.
The reading only occurs on the disk devices assembled into the RAID6, not on
the md device itself, so it is related to RAID6 activity only.
The reason for this is recalculation of checksums due to an optimal_io_size
that is too small:
On Dom0:
blockdev --getiomin /dev/space/test
524288 (which is chunk size)
blockdev --getioopt /dev/space/test
3145728 (which is 6*chunk size)
On DomU:
blockdev --getiomin /dev/xvdb1
512
blockdev --getioopt /dev/xvdb1
0 (so the kernel will use 1MB by default, IIRC)
minimum_io_size -- if not set -- defaults to the hardware block size, which
seems to be set to 512 in xlvbd_init_blk_queue (blkfront.c). Btw: blockdev
--getbsz /dev/space/test gives 4096 on Dom0 while the DomU reports 512.
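For anyone who wants to compare the two sides without blockdev, the same
limits are exported under /sys/block/<disk>/queue/. The following is only a
minimal sketch: the function name read_io_limits and the sysfs_root parameter
are made up for illustration, and the attributes live on the whole disk (e.g.
xvdb or dm-N), not on a partition such as xvdb1.

```python
from pathlib import Path


def read_io_limits(disk, sysfs_root="/sys"):
    """Return the I/O topology limits the kernel exports for a whole disk.

    `disk` is a kernel block device name such as "xvdb" or "dm-3".
    `sysfs_root` is a hypothetical parameter so the function can be
    pointed at a fake tree for testing.
    """
    queue = Path(sysfs_root) / "block" / disk / "queue"
    names = (
        "logical_block_size",
        "physical_block_size",
        "minimum_io_size",
        "optimal_io_size",
    )
    limits = {}
    for name in names:
        attr = queue / name
        # Kernels without the topology attributes simply lack the file;
        # report those as None instead of failing.
        limits[name] = int(attr.read_text()) if attr.exists() else None
    return limits
```

Calling read_io_limits("xvdb") in the DomU and read_io_limits on the matching
dm device in Dom0 should show the same mismatch as the blockdev output above.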
I can mitigate the issue somewhat by using a much smaller chunk size, but this
is IMHO just working around the issue. Another workaround could be to use a
"power-of-two" number of data disks in the RAID and choose the chunk size so
that a full stripe sums up to 1MB. But this is just another hack...
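To make the arithmetic behind both the problem and the workaround explicit,
here is a small sketch. It is a simplified model, not what md actually
implements: it only checks whether a write covers whole stripes, assuming
that anything less forces the RAID to read data back in order to recompute
parity. The function names are invented for illustration, and the 1 MiB
request size is the default I mentioned above from memory.

```python
KIB = 1024
MIB = 1024 * KIB


def full_stripe_bytes(chunk_bytes, data_disks):
    """Size of one full stripe, counting data disks only (no parity)."""
    return chunk_bytes * data_disks


def is_partial_stripe_write(write_bytes, chunk_bytes, data_disks, offset=0):
    """True if the write does not start and end on full-stripe boundaries."""
    stripe = full_stripe_bytes(chunk_bytes, data_disks)
    return offset % stripe != 0 or write_bytes % stripe != 0


# Dom0 values from above: 512 KiB chunks, 6 data disks -> 3 MiB stripe.
print(full_stripe_bytes(512 * KIB, 6))                 # 3145728
# A 1 MiB write covers only part of that stripe.
print(is_partial_stripe_write(1 * MIB, 512 * KIB, 6))  # True
# Workaround: 4 data disks with 256 KiB chunks -> 1 MiB stripe.
print(is_partial_stripe_write(1 * MIB, 256 * KIB, 4))  # False
```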
If there is anything I can do, please let me know!
Thanks,
Adi Kriegisch
PS: I am using a stock Debian/Squeeze kernel on top of Debian's Xen 4.0.1-2.
[1] http://wiki.xensource.com/xenwiki/XenFaq
[2] http://lists.xensource.com/archives/html/xen-users/2011-02/msg00615.html
[3] http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1745
* Re: blk[front|back] does not hand over disk parameters
2011-02-25 16:43 blk[front|back] does not hand over disk parameters Adi Kriegisch
@ 2011-02-28 10:06 ` Ian Campbell
2011-02-28 10:55 ` Jan Beulich
0 siblings, 1 reply; 7+ messages in thread
From: Ian Campbell @ 2011-02-28 10:06 UTC (permalink / raw)
To: Adi Kriegisch; +Cc: xen-devel@lists.xensource.com
On Fri, 2011-02-25 at 16:43 +0000, Adi Kriegisch wrote:
> Dear all,
>
> (following the XenFAQ on how to report a bug[1], I submitted this to
> xen-user list[2] first, reported the bug in bugzilla[3] and now resend the
> text to this list. Please CC me in replies as I am not subscribed to this
> list, Thanks!)
>
> I investigated some serious performance drop between Dom0 and DomU with
> LVM on top of RAID6 and blkback devices.
> While I have around 130MB/s write performance in Dom0, I only get 30MB/s in
> DomU. Inspecting this with dstat/iostat revealed that I have a read rate of
> about 17-25MB/s while writing with around 40MB/s.
> The reading only occurs on the disk devices assembled to the RAID6 not the
> md device itself. So this is related to RAID6 activity only.
> The reason for this is recalculation of checksums due to a too small
> optimal_io_size:
> On Dom0:
> blockdev --getiomin /dev/space/test
> 524288 (which is chunk size)
> blockdev --getioopt /dev/space/test
> 3145728 (which is 6*chunk size)
>
> On DomU:
> blockdev --getiomin /dev/xvdb1
> 512
> blockdev --getioopt /dev/xvdb1
> 0 (so the kernel will use 1MB by default, IIRC)
>
> minimum_io_size -- if not set -- is hardware block size which seems to be
> set to 512 in xlvbd_init_blk_queue (blkfront.c). Btw: blockdev --getbsz
> /dev/space/test gives 4096 on Dom0 while DomU reports 512.
>
> I can somehow mitigate the issue by using a way smaller chunk size but this
> is IMHO just working around the issue. Another workaround could be to use a
> "power-of-two" number of data disks in the raid and choose the chunk size
> to sum up to 1MB. But this is just another hack...
>
> If there is anything I can do, please let me know!
This is not the sort of thing which changes dynamically across the
lifetime of a device, is it? In which case it seems like the sort of
information which the backend could communicate to the frontend via
xenbus at start of day. e.g. take a look at how the sector-size is
passed through xenbus.
It should be trivial to add this in a compatible manner since the
frontend can just do what it does today if the nodes are missing and the
backend wouldn't rely on the frontend doing anything useful with the
information anyway.
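The compatibility argument can be sketched as follows. This is only a toy
model in Python, not kernel code: a plain dict stands in for the backend's
xenstore directory, and the node names "minimum-io-size" and
"optimal-io-size" are hypothetical (only "sector-size" is taken from the
existing protocol). The defaults mirror the values reported for the DomU
above.

```python
def frontend_limits(backend_nodes):
    """Derive queue limits from the nodes a backend chose to publish.

    `backend_nodes` is a dict standing in for the backend's xenstore
    directory. "minimum-io-size" and "optimal-io-size" are hypothetical
    node names used only for this illustration.
    """
    sector_size = int(backend_nodes.get("sector-size", 512))
    # Missing nodes: behave exactly as the frontend does today.
    io_min = int(backend_nodes.get("minimum-io-size", sector_size))
    io_opt = int(backend_nodes.get("optimal-io-size", 0))
    return {"sector_size": sector_size, "io_min": io_min, "io_opt": io_opt}


# Old backend: only the existing node is present.
print(frontend_limits({"sector-size": "512"}))
# New backend: additionally publishes the topology hints.
print(frontend_limits({
    "sector-size": "512",
    "minimum-io-size": "524288",
    "optimal-io-size": "3145728",
}))
```

An old frontend never reads the extra nodes, and a new frontend paired with
an old backend falls back to the current defaults, so neither side depends
on the other being updated.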
Can you make a patch?
Ian.
>
> Thanks,
> Adi Kriegisch
>
> PS: I am using a stock Debian/Squeeze kernel on top of Debian's Xen 4.0.1-2.
>
> [1] http://wiki.xensource.com/xenwiki/XenFaq
> [2] http://lists.xensource.com/archives/html/xen-users/2011-02/msg00615.html
> [3] http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1745
* Re: blk[front|back] does not hand over disk parameters
2011-02-28 10:06 ` Ian Campbell
@ 2011-02-28 10:55 ` Jan Beulich
2011-02-28 11:13 ` Ian Campbell
2011-02-28 11:54 ` Adi Kriegisch
0 siblings, 2 replies; 7+ messages in thread
From: Jan Beulich @ 2011-02-28 10:55 UTC (permalink / raw)
To: Adi Kriegisch, Ian Campbell; +Cc: xen-devel@lists.xensource.com
>>> On 28.02.11 at 11:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> This is not the sort of thing which changes dynamically across the
> lifetime of a device, is it? In which case it seems like the sort of
> information which the backend could communicate to the frontend via
> xenbus at start of day. e.g. take a look at how the sector-size is
> passed through xenbus.
>
> It should be trivial to add this in a compatible manner since the
> frontend can just do what it does today if the nodes are missing and the
> backend wouldn't rely on the frontend doing anything useful with the
> information anyway.
Am I right in understanding that these numbers aren't used by
the block layer itself at all, but just get provided to userspace for
whatever optimization it can do? In that case, I can't really see
how passing through these values can really help general
performance (i.e. for apps not paying attention to these values).
Confused, Jan
* Re: blk[front|back] does not hand over disk parameters
2011-02-28 10:55 ` Jan Beulich
@ 2011-02-28 11:13 ` Ian Campbell
2011-02-28 12:54 ` Jan Beulich
2011-02-28 11:54 ` Adi Kriegisch
1 sibling, 1 reply; 7+ messages in thread
From: Ian Campbell @ 2011-02-28 11:13 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xensource.com, Adi Kriegisch
On Mon, 2011-02-28 at 10:55 +0000, Jan Beulich wrote:
> >>> On 28.02.11 at 11:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > This is not the sort of thing which changes dynamically across the
> > lifetime of a device, is it? In which case it seems like the sort of
> > information which the backend could communicate to the frontend via
> > xenbus at start of day. e.g. take a look at how the sector-size is
> > passed through xenbus.
> >
> > It should be trivial to add this in a compatible manner since the
> > frontend can just do what it does today if the nodes are missing and the
> > backend wouldn't rely on the frontend doing anything useful with the
> > information anyway.
>
> Am I right in understanding that these numbers aren't used by
> the block layer itself at all, but just get provided to userspace for
> whatever optimization it can do?
>
> Confused, Jan
I had inferred from Adi's bringing them up that the kernel would
actually use them in some way, but I don't actually know if that's the
case...
> In that case, I can't really see
> how passing through these values can really help general
> performance (i.e. for apps not paying attention to these values).
Even if their only utility is when userspace explicitly makes use of them,
they are just as useful in a Xen domU as they are in a non-Xen system,
so why would we not plumb them through?
Ian.
* Re: blk[front|back] does not hand over disk parameters
2011-02-28 10:55 ` Jan Beulich
2011-02-28 11:13 ` Ian Campbell
@ 2011-02-28 11:54 ` Adi Kriegisch
2011-02-28 12:51 ` Jan Beulich
1 sibling, 1 reply; 7+ messages in thread
From: Adi Kriegisch @ 2011-02-28 11:54 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xensource.com, Ian Campbell
On Mon, Feb 28, 2011 at 10:55:12AM +0000, Jan Beulich wrote:
> >>> On 28.02.11 at 11:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
[SNIP]
> > It should be trivial to add this in a compatible manner since the
> > frontend can just do what it does today if the nodes are missing and the
> > backend wouldn't rely on the frontend doing anything useful with the
> > information anyway.
>
> Am I right in understanding that these numbers aren't used by
> the block layer itself at all, but just get provided to userspace for
> whatever optimization it can do? In that case, I can't really see
> how passing through these values can really help general
> performance (i.e. for apps not paying attention to these values).
AFAIK these values are used by mkfs.* in userspace and by the I/O schedulers
in kernel space to optimize performance. There have been some discussions about
that on the kernel mailing lists[1], and there is an interesting document on
the topic available from Mike Snitzer[2].
Those values are important for 4K block size drives, for SSDs and -- as in my
case -- for RAID levels with checksums.
A quick test with a Samba server installed in Dom0 revealed that those
values do not need to be honoured by Samba to get full write speed. The I/O
scheduler seems to be the component that needs those values.
-- Adi
[1] http://marc.info/?l=linux-ide&m=124058535512850&w=4
[2] http://people.redhat.com/msnitzer/docs/io-limits.txt
* Re: blk[front|back] does not hand over disk parameters
2011-02-28 11:54 ` Adi Kriegisch
@ 2011-02-28 12:51 ` Jan Beulich
0 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2011-02-28 12:51 UTC (permalink / raw)
To: Adi Kriegisch; +Cc: xen-devel@lists.xensource.com, Ian Campbell
>>> On 28.02.11 at 12:54, Adi Kriegisch <adi@cg.tuwien.ac.at> wrote:
> On Mon, Feb 28, 2011 at 10:55:12AM +0000, Jan Beulich wrote:
>> >>> On 28.02.11 at 11:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> [SNIP]
>> > It should be trivial to add this in a compatible manner since the
>> > frontend can just do what it does today if the nodes are missing and the
>> > backend wouldn't rely on the frontend doing anything useful with the
>> > information anyway.
>>
>> Am I right in understanding that these numbers aren't used by
>> the block layer itself at all, but just get provided to userspace for
>> whatever optimization it can do? In that case, I can't really see
>> how passing through these values can really help general
>> performance (i.e. for apps not paying attention to these values).
> AFAIK these values are used by mkfs.* in userspace and by the I/O Schedulers
> in kernel space to optimize performance. There have been some discussions
I grepped for io_min and a couple of derived variables (like
alignment_offset) but couldn't spot any I/O-relevant readers
under block/.
> about that on the kernel mailing lists[1] and there is an interesting
> document about that available from Mike Snitzer[2].
I'll take a look at those.
Jan
* Re: blk[front|back] does not hand over disk parameters
2011-02-28 11:13 ` Ian Campbell
@ 2011-02-28 12:54 ` Jan Beulich
0 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2011-02-28 12:54 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel@lists.xensource.com, Adi Kriegisch
>>> On 28.02.11 at 12:13, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:
> On Mon, 2011-02-28 at 10:55 +0000, Jan Beulich wrote:
>> >>> On 28.02.11 at 11:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> > This is not the sort of thing which changes dynamically across the
>> > lifetime of a device, is it? In which case it seems like the sort of
>> > information which the backend could communicate to the frontend via
>> > xenbus at start of day. e.g. take a look at how the sector-size is
>> > passed through xenbus.
>> >
>> > It should be trivial to add this in a compatible manner since the
>> > frontend can just do what it does today if the nodes are missing and the
>> > backend wouldn't rely on the frontend doing anything useful with the
>> > information anyway.
>>
>> Am I right in understanding that these numbers aren't used by
>> the block layer itself at all, but just get provided to userspace for
>> whatever optimization it can do?
>>
>> Confused, Jan
>
> I had inferred from Adi's bringing them up that the kernel would
> actually use them in some way, but I don't actually know if that's the
> case...
>
>> In that case, I can't really see
>> how passing through these values can really help general
>> performance (i.e. for apps not paying attention to these values).
>
> Even if their only utility is when userspace explicitly makes use of them,
> they are just as useful in a Xen domU as they are in a non-Xen system,
> so why would we not plumb them through?
Plumbing through can be easily done, indeed, but the question
is if that gets us any performance improvement. If these are only
for allowing user mode optimizations, then maybe. If they're to
control kernel behavior, then the 11-pages-per-request limitation
of the blkif protocol would likely make this exercise pretty pointless
I would think.
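To put numbers on that limitation, here is a rough sketch of the arithmetic.
It assumes 4 KiB pages (an assumption, though the usual case on x86), and the
function names are invented for illustration; the 3 MiB stripe is the
optimal_io_size reported for Dom0 earlier in the thread.

```python
BLKIF_MAX_SEGMENTS_PER_REQUEST = 11  # segments (pages) per blkif request
PAGE_SIZE = 4096                     # assumption: 4 KiB pages


def max_request_bytes(segments=BLKIF_MAX_SEGMENTS_PER_REQUEST,
                      page_size=PAGE_SIZE):
    """Largest payload a single blkif request can carry."""
    return segments * page_size


def requests_per_stripe(stripe_bytes):
    """Ring requests needed to cover one stripe (integer ceiling division)."""
    per_request = max_request_bytes()
    return -(-stripe_bytes // per_request)


print(max_request_bytes())           # 45056 bytes, i.e. 44 KiB
print(requests_per_stripe(3145728))  # 70 requests for a 3 MiB stripe
```

So even a frontend that knew the optimal I/O size could not submit a full
stripe as one request; the backend would have to merge the pieces again.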
Jan