qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device
@ 2018-12-29  6:29 Ying Fang
  0 siblings, 0 replies; 10+ messages in thread
From: Ying Fang @ 2018-12-29  6:29 UTC (permalink / raw)
  To: mreitz, kwolf; +Cc: qemu-devel@nongnu.org, jianjay.zhou

[-- Attachment #1: Type: text/plain, Size: 1994 bytes --]

Hi.
Recently one of our customers complained about the I/O performance of the QEMU-emulated host cdrom device.
I did some investigation on it, but there are still some points I could not figure out, so I would like to ask for your help.

Here is the application scenario set up by our customer.
filename.iso        /dev/sr0           /dev/cdrom
remote client    -->  host(cdemu)      -->  Linux VM
(1) A remote client maps an ISO file to the x86 host machine over the network using TCP.
(2) The cdemu daemon then loads it as a local virtual cdrom disk drive.
(3) A VM is launched with the virtual cdrom disk drive configured.
The VM can use this virtual cdrom to install the OS contained in the ISO file.

The network bandwidth between the remote client and the host is 100 Mbps. We test I/O performance using: dd if=/dev/sr0 of=/dev/null bs=32K count=100000.
And we have:
(1) I/O perf of the host-side /dev/sr0 is 11 MB/s;
(2) I/O perf of /dev/cdrom inside the VM is 3.8 MB/s.

As we can see, I/O perf of the cdrom inside the VM is only about 34.5% of the host side (3.8 / 11).
FlameGraph was used to find the bottleneck of this operation, and it shows that too much time is spent calling *bdrv_is_inserted*.
Digging into the code, we figured out that the ioctl in *cdrom_is_inserted* takes too much time because it triggers io_schedule_timeout in the kernel.
In the emulated cdrom device's code path, each DMA I/O request involves several *bdrv_is_inserted* calls, which degrades I/O performance by about 31% in our test.
static bool cdrom_is_inserted(BlockDriverState *bs)
{
    BDRVRawState *s = bs->opaque;
    int ret;

    ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
    return ret == CDS_DISC_OK;
}
A flame graph SVG file (cdrom.svg) is attached to this email to show the code timing profile we measured.

So here are my questions.
(1) Why do we regularly check the presence of the cdrom disk in the code path? Can we do it asynchronously?
(2) Can we drop some of these checks in the code path to improve performance?
Thanks.

[-- Attachment #2: cdrom.svg --]
[-- Type: image/svg+xml, Size: 414142 bytes --]


* [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device
@ 2018-12-29  6:33 Ying Fang
  2019-01-07  1:23 ` Ying Fang
  2019-01-08 12:46 ` Kevin Wolf
  0 siblings, 2 replies; 10+ messages in thread
From: Ying Fang @ 2018-12-29  6:33 UTC (permalink / raw)
  To: mreitz, kwolf; +Cc: qemu-devel@nongnu.org, jianjay.zhou, dengkai1

[-- Attachment #1: Type: text/plain, Size: 1994 bytes --]

Hi.
Recently one of our customers complained about the I/O performance of the QEMU-emulated host cdrom device.
I did some investigation on it, but there are still some points I could not figure out, so I would like to ask for your help.

Here is the application scenario set up by our customer.
filename.iso        /dev/sr0           /dev/cdrom
remote client    -->  host(cdemu)      -->  Linux VM
(1) A remote client maps an ISO file to the x86 host machine over the network using TCP.
(2) The cdemu daemon then loads it as a local virtual cdrom disk drive.
(3) A VM is launched with the virtual cdrom disk drive configured.
The VM can use this virtual cdrom to install the OS contained in the ISO file.

The network bandwidth between the remote client and the host is 100 Mbps. We test I/O performance using: dd if=/dev/sr0 of=/dev/null bs=32K count=100000.
And we have:
(1) I/O perf of the host-side /dev/sr0 is 11 MB/s;
(2) I/O perf of /dev/cdrom inside the VM is 3.8 MB/s.

As we can see, I/O perf of the cdrom inside the VM is only about 34.5% of the host side (3.8 / 11).
FlameGraph was used to find the bottleneck of this operation, and it shows that too much time is spent calling *bdrv_is_inserted*.
Digging into the code, we figured out that the ioctl in *cdrom_is_inserted* takes too much time because it triggers io_schedule_timeout in the kernel.
In the emulated cdrom device's code path, each DMA I/O request involves several *bdrv_is_inserted* calls, which degrades I/O performance by about 31% in our test.
static bool cdrom_is_inserted(BlockDriverState *bs)
{
    BDRVRawState *s = bs->opaque;
    int ret;

    ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
    return ret == CDS_DISC_OK;
}
A flame graph SVG file (cdrom.svg) is attached to this email to show the code timing profile we measured.

So here are my questions.
(1) Why do we regularly check the presence of the cdrom disk in the code path? Can we do it asynchronously?
(2) Can we drop some of these checks in the code path to improve performance?
Thanks.

[-- Attachment #2: cdrom.svg --]
[-- Type: image/svg+xml, Size: 414142 bytes --]


* Re: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device
  2018-12-29  6:33 [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device Ying Fang
@ 2019-01-07  1:23 ` Ying Fang
  2019-01-08 12:46 ` Kevin Wolf
  1 sibling, 0 replies; 10+ messages in thread
From: Ying Fang @ 2019-01-07  1:23 UTC (permalink / raw)
  To: mreitz, kwolf; +Cc: qemu-devel@nongnu.org, jianjay.zhou, dengkai1

ping

On 2018/12/29 14:33, Ying Fang wrote:
> Hi.
> Recently one of our customer complained about the I/O performance of QEMU emulated host cdrom device.
> I did some investigation on it and there was still some point I could not figure out. So I had to ask for your help.
> 
> Here is the application scenario setup by our customer.
> filename.iso        /dev/sr0           /dev/cdrom
> remote client    -->  host(cdemu)      -->  Linux VM
> (1)A remote client maps an iso file to x86 host machine through network using tcp.
> (2)The cdemu daemon then load it as a local virtual cdrom disk drive.
> (3)A VM is launched with the virtual cdrom disk drive configured.
> The VM can make use of this virtual cdrom to install an OS in the iso file.
> 
> The network bandwith btw the remote client and host is 100Mbps, we test I/O perf using: dd if=/dev/sr0 of=/dev/null bs=32K count=100000.
> And we have
> (1) I/O perf of host side /dev/sr0 is 11MB/s;
> (2) I/O perf of /dev/cdrom inside VM is 3.8MB/s.
> 
> As we can see, I/O perf of cdrom inside the VM is about 34.5% compared with host side.
> FlameGraph is used to find out the bottleneck of this operation and we find out that too much time is occupied by calling *bdrv_is_inserted*.
> Then we dig into the code and figure out that the ioctl in *cdrom_is_inserted* takes too much time, because it triggers io_schdule_timeout in kernel.
> In the code path of emulated cdrom device, each DMA I/O request consists of several *bdrv_is_inserted*, which degrades the I/O performance by about 31% in our test.
> static bool cdrom_is_inserted(BlockDriverState *bs)
> {
>     BDRVRawState *s = bs->opaque;
>     int ret;
> 
>     ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
>     return ret == CDS_DISC_OK;
> }
> A flamegraph svg file (cdrom.svg) is attachieved in this email to show the code timing profile we've tested.
> 
> So here is my question.
> (1) Why do we regularly check the presence of a cdrom disk drive in the code path?  Can we do it asynchronously?
> (2) Can we drop some check point in the code path to improve the performance?
> Thanks.
> 


* Re: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device
  2018-12-29  6:33 [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device Ying Fang
  2019-01-07  1:23 ` Ying Fang
@ 2019-01-08 12:46 ` Kevin Wolf
  2019-01-09  3:20   ` Ying Fang
  1 sibling, 1 reply; 10+ messages in thread
From: Kevin Wolf @ 2019-01-08 12:46 UTC (permalink / raw)
  To: Ying Fang; +Cc: mreitz, qemu-devel@nongnu.org, jianjay.zhou, dengkai1

Am 29.12.2018 um 07:33 hat Ying Fang geschrieben:
> Hi.
> Recently one of our customer complained about the I/O performance of QEMU emulated host cdrom device.
> I did some investigation on it and there was still some point I could not figure out. So I had to ask for your help.
> 
> Here is the application scenario setup by our customer.
> filename.iso        /dev/sr0           /dev/cdrom
> remote client    -->  host(cdemu)      -->  Linux VM
> (1)A remote client maps an iso file to x86 host machine through network using tcp.
> (2)The cdemu daemon then load it as a local virtual cdrom disk drive.
> (3)A VM is launched with the virtual cdrom disk drive configured.
> The VM can make use of this virtual cdrom to install an OS in the iso file.
> 
> The network bandwith btw the remote client and host is 100Mbps, we test I/O perf using: dd if=/dev/sr0 of=/dev/null bs=32K count=100000.
> And we have
> (1) I/O perf of host side /dev/sr0 is 11MB/s;
> (2) I/O perf of /dev/cdrom inside VM is 3.8MB/s.
> 
> As we can see, I/O perf of cdrom inside the VM is about 34.5% compared with host side.
> FlameGraph is used to find out the bottleneck of this operation and we find out that too much time is occupied by calling *bdrv_is_inserted*.
> Then we dig into the code and figure out that the ioctl in *cdrom_is_inserted* takes too much time, because it triggers io_schdule_timeout in kernel.
> In the code path of emulated cdrom device, each DMA I/O request consists of several *bdrv_is_inserted*, which degrades the I/O performance by about 31% in our test.
> static bool cdrom_is_inserted(BlockDriverState *bs)
> {
>     BDRVRawState *s = bs->opaque;
>     int ret;
> 
>     ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
>     return ret == CDS_DISC_OK;
> }
> A flamegraph svg file (cdrom.svg) is attachieved in this email to show the code timing profile we've tested.
> 
> So here is my question.
> (1) Why do we regularly check the presence of a cdrom disk drive in the code path?  Can we do it asynchronously?
> (2) Can we drop some check point in the code path to improve the performance?
> Thanks.

I'm actually not sure why so many places check it. Just letting an I/O
request fail if the CD was removed would probably be easier.

To try out whether that would improve performance significantly, you
could try to use the host_device backend rather than the host_cdrom
backend. That one doesn't implement .bdrv_is_inserted, so the operation
will be cheap (just return true unconditionally).
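
For example (an untested sketch, not taken from the reporter's setup; option
spellings may differ between QEMU versions), something like this on a plain QEMU
command line would select host_device explicitly instead of letting /dev/sr0
probe as host_cdrom:

    qemu-system-x86_64 ... \
        -blockdev driver=host_device,node-name=cd0,filename=/dev/sr0,read-only=on \
        -device ide-cd,bus=ide.0,unit=1,drive=cd0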

You will also lose eject/lock passthrough when doing so, so this is not
the final solution, but if it proves to be a lot faster, we can check
where bdrv_is_inserted() calls are actually important (if anywhere) and
hopefully remove some even for the host_cdrom case.

Kevin


* Re: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device
  2019-01-08 12:46 ` Kevin Wolf
@ 2019-01-09  3:20   ` Ying Fang
  2019-01-15 20:15     ` John Snow
  0 siblings, 1 reply; 10+ messages in thread
From: Ying Fang @ 2019-01-09  3:20 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: mreitz, qemu-devel@nongnu.org, jianjay.zhou, dengkai1



On 2019/1/8 20:46, Kevin Wolf wrote:
> Am 29.12.2018 um 07:33 hat Ying Fang geschrieben:
>> Hi.
>> Recently one of our customer complained about the I/O performance of QEMU emulated host cdrom device.
>> I did some investigation on it and there was still some point I could not figure out. So I had to ask for your help.
>>
>> Here is the application scenario setup by our customer.
>> filename.iso        /dev/sr0           /dev/cdrom
>> remote client    -->  host(cdemu)      -->  Linux VM
>> (1)A remote client maps an iso file to x86 host machine through network using tcp.
>> (2)The cdemu daemon then load it as a local virtual cdrom disk drive.
>> (3)A VM is launched with the virtual cdrom disk drive configured.
>> The VM can make use of this virtual cdrom to install an OS in the iso file.
>>
>> The network bandwith btw the remote client and host is 100Mbps, we test I/O perf using: dd if=/dev/sr0 of=/dev/null bs=32K count=100000.
>> And we have
>> (1) I/O perf of host side /dev/sr0 is 11MB/s;
>> (2) I/O perf of /dev/cdrom inside VM is 3.8MB/s.
>>
>> As we can see, I/O perf of cdrom inside the VM is about 34.5% compared with host side.
>> FlameGraph is used to find out the bottleneck of this operation and we find out that too much time is occupied by calling *bdrv_is_inserted*.
>> Then we dig into the code and figure out that the ioctl in *cdrom_is_inserted* takes too much time, because it triggers io_schdule_timeout in kernel.
>> In the code path of emulated cdrom device, each DMA I/O request consists of several *bdrv_is_inserted*, which degrades the I/O performance by about 31% in our test.
>> static bool cdrom_is_inserted(BlockDriverState *bs)
>> {
>>     BDRVRawState *s = bs->opaque;
>>     int ret;
>>
>>     ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
>>     return ret == CDS_DISC_OK;
>> }
>> A flamegraph svg file (cdrom.svg) is attachieved in this email to show the code timing profile we've tested.
>>
>> So here is my question.
>> (1) Why do we regularly check the presence of a cdrom disk drive in the code path?  Can we do it asynchronously?
>> (2) Can we drop some check point in the code path to improve the performance?
>> Thanks.
> 
> I'm actually not sure why so many places check it. Just letting an I/O
> request fail if the CD was removed would probably be easier.
> 
You can see those check points in the flamegraph file attached earlier in this thread (rename it to cdrom.svg).
bdrv_is_inserted is called mainly from bdrv_check_byte_request and ide_atapi_cmd, and only the host_cdrom backend implements
it, using cdrom_is_inserted.
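
Roughly (a simplified sketch from memory, not a verbatim copy of block/io.c), the
request-validation helper rejects requests for absent media before any byte-range
checks, which is why the ioctl shows up on every guest read:

    static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
                                       size_t size)
    {
        if (size > BDRV_REQUEST_MAX_BYTES) {
            return -EIO;
        }
        if (!bdrv_is_inserted(bs)) {
            return -ENOMEDIUM;   /* on host_cdrom this is the slow ioctl path */
        }
        if (offset < 0) {
            return -EIO;
        }
        return 0;
    }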

> To try out whether that would improve performance significantly, you
> could try to use the host_device backend rather than the host_cdrom
> backend. That one doesn't implement .bdrv_is_inserted, so the operation
> will be cheap (just return true unconditionally).
> 
The remote client maps filename.iso to the host's virtual cdrom drive.
We use XML like
    <disk type='block' device='cdrom'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/sr0'/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
to attach the virtual cdrom drive to the guest, and the backend bdrv is host_cdrom.
Sorry, I do not know how to use the host_device backend here.

> You will also lose eject/lock passthrough when doing so, so this is not
> the final solution, but if it proves to be a lot faster, we can check
> where bdrv_is_inserted() calls are actually important (if anywhere) and
> hopefully remove some even for the host_cdrom case.
> 
You mean that if we do not check *cdrom_is_inserted*, we may lose eject/lock passthrough here?
I wonder why we check the presence of the cdrom device each time it is touched,
though I am not familiar with this feature.

> Kevin
> 
> .
> 


* Re: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device
  2019-01-09  3:20   ` Ying Fang
@ 2019-01-15 20:15     ` John Snow
  2019-01-16  2:48       ` Ying Fang
  0 siblings, 1 reply; 10+ messages in thread
From: John Snow @ 2019-01-15 20:15 UTC (permalink / raw)
  To: Ying Fang, Kevin Wolf
  Cc: jianjay.zhou, dengkai1, qemu-devel@nongnu.org, mreitz



On 1/8/19 10:20 PM, Ying Fang wrote:
> 
> 
> On 2019/1/8 20:46, Kevin Wolf wrote:
>> Am 29.12.2018 um 07:33 hat Ying Fang geschrieben:
>>> Hi.
>>> Recently one of our customer complained about the I/O performance of QEMU emulated host cdrom device.
>>> I did some investigation on it and there was still some point I could not figure out. So I had to ask for your help.
>>>
>>> Here is the application scenario setup by our customer.
>>> filename.iso        /dev/sr0           /dev/cdrom
>>> remote client    -->  host(cdemu)      -->  Linux VM
>>> (1)A remote client maps an iso file to x86 host machine through network using tcp.
>>> (2)The cdemu daemon then load it as a local virtual cdrom disk drive.
>>> (3)A VM is launched with the virtual cdrom disk drive configured.
>>> The VM can make use of this virtual cdrom to install an OS in the iso file.
>>>
>>> The network bandwith btw the remote client and host is 100Mbps, we test I/O perf using: dd if=/dev/sr0 of=/dev/null bs=32K count=100000.
>>> And we have
>>> (1) I/O perf of host side /dev/sr0 is 11MB/s;
>>> (2) I/O perf of /dev/cdrom inside VM is 3.8MB/s.
>>>
>>> As we can see, I/O perf of cdrom inside the VM is about 34.5% compared with host side.
>>> FlameGraph is used to find out the bottleneck of this operation and we find out that too much time is occupied by calling *bdrv_is_inserted*.
>>> Then we dig into the code and figure out that the ioctl in *cdrom_is_inserted* takes too much time, because it triggers io_schdule_timeout in kernel.
>>> In the code path of emulated cdrom device, each DMA I/O request consists of several *bdrv_is_inserted*, which degrades the I/O performance by about 31% in our test.
>>> static bool cdrom_is_inserted(BlockDriverState *bs)
>>> {
>>>     BDRVRawState *s = bs->opaque;
>>>     int ret;
>>>
>>>     ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
>>>     return ret == CDS_DISC_OK;
>>> }
>>> A flamegraph svg file (cdrom.svg) is attachieved in this email to show the code timing profile we've tested.
>>>
>>> So here is my question.
>>> (1) Why do we regularly check the presence of a cdrom disk drive in the code path?  Can we do it asynchronously?

The ATAPI emulator needs to know if it has storage present to carry out
the request or to signal an error, so it checks. It might be the case
that the VM operator forcibly dismounted network storage without
awareness from the guest. (This is basically emulation of the case when
a user mechanically forces a CDROM tray open.)

I wasn't aware this check was so slow.

Maybe we can just cache blk_is_inserted -- I don't remember if it's
possible to eject media without awareness from the device model, but if
this check winds up being slow in some configurations we can cache it.
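
Rough sketch of what I mean (untested; the cached fields are invented for
illustration and would need to be added to BDRVRawState, or the cache could live
at the BlockBackend level instead): remember the last probe result for a short
window so back-to-back requests don't each pay for the ioctl:

    static bool cdrom_is_inserted(BlockDriverState *bs)
    {
        BDRVRawState *s = bs->opaque;
        int64_t now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);

        if (now - s->media_check_ms < 1000) {   /* hypothetical cached timestamp */
            return s->media_present;            /* hypothetical cached result */
        }
        s->media_present =
            ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK;
        s->media_check_ms = now;
        return s->media_present;
    }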

This won't help the bdrv_check_byte_request calls, though, just the ones coming
from the device model; see below.

>>> (2) Can we drop some check point in the code path to improve the performance?
>>> Thanks.
>>
>> I'm actually not sure why so many places check it. Just letting an I/O
>> request fail if the CD was removed would probably be easier.
>>
> You can see those check points as showed in the attached flamegraph file in this thread (rename it to cdrom.svg).
> It is called mainly in bdrv_check_byte_request and ide_atapi_cmd, and only the host_cdrom backend implements
> it using cdrom_is_inserted.
> 

in ide_atapi_cmd, try replacing:

`!s->tray_open && blk_is_inserted(s->blk) && s->cdrom_changed)`

with

`!s->tray_open && s->cdrom_changed && blk_is_inserted(s->blk))`

which should hopefully short-circuit calls to blk_is_inserted if
s->cdrom_changed is false, but it doesn't do much to fix the subsequent
call:

`    if ((cmd->flags & CHECK_READY) &&
        (!media_present(s) || !blk_is_inserted(s->blk)))`

which is unfortunately going to check blk_is_inserted quite a lot
because the read commands are tagged CHECK_READY...


As alluded to above, too, I'm not sure what I can do about
bdrv_check_byte_request -- what happens if the io.c helpers don't
actually check this? Will they fail gracefully?

I guess there's one way to find out.

>> To try out whether that would improve performance significantly, you
>> could try to use the host_device backend rather than the host_cdrom
>> backend. That one doesn't implement .bdrv_is_inserted, so the operation
>> will be cheap (just return true unconditionally).
>>
> The remote client maps filename.iso to host virtual cdrom drive.
> We use xml like
>     <disk type='block' device='cdrom'>
>       <driver name='qemu' type='raw' cache='none' io='native'/>
>       <source dev='/dev/sr0'/>
>       <target dev='hdb' bus='ide'/>
>       <readonly/>
>       <boot order='2'/>
>       <address type='drive' controller='0' bus='0' target='0' unit='1'/>
>     </disk>
> to attach the virtual cdrom drive into guest and backend bdrv is host_cdrom.
> Sorry I do not know how to use the host_device backend here.
> 
>> You will also lose eject/lock passthrough when doing so, so this is not
>> the final solution, but if it proves to be a lot faster, we can check
>> where bdrv_is_inserted() calls are actually important (if anywhere) and
>> hopefully remove some even for the host_cdrom case.
>>
> You mean if we do not check *cdrom_is_inserted*, we may lose eject/lock here ?
> I wonder we used to check the presence of the cdrom device each time it is touched
> though I am not familiar with this feature.
> 
>> Kevin
>>
>> .
>>
> 
> 


* Re: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device
  2019-01-15 20:15     ` John Snow
@ 2019-01-16  2:48       ` Ying Fang
  2019-01-18 22:43         ` John Snow
  0 siblings, 1 reply; 10+ messages in thread
From: Ying Fang @ 2019-01-16  2:48 UTC (permalink / raw)
  To: John Snow, Kevin Wolf
  Cc: jianjay.zhou, dengkai1, qemu-devel@nongnu.org, mreitz



On 2019/1/16 4:15, John Snow wrote:
> 
> 
> On 1/8/19 10:20 PM, Ying Fang wrote:
>>
>>
>> On 2019/1/8 20:46, Kevin Wolf wrote:
>>> Am 29.12.2018 um 07:33 hat Ying Fang geschrieben:
>>>> Hi.
>>>> Recently one of our customer complained about the I/O performance of QEMU emulated host cdrom device.
>>>> I did some investigation on it and there was still some point I could not figure out. So I had to ask for your help.
>>>>
>>>> Here is the application scenario setup by our customer.
>>>> filename.iso        /dev/sr0           /dev/cdrom
>>>> remote client    -->  host(cdemu)      -->  Linux VM
>>>> (1)A remote client maps an iso file to x86 host machine through network using tcp.
>>>> (2)The cdemu daemon then load it as a local virtual cdrom disk drive.
>>>> (3)A VM is launched with the virtual cdrom disk drive configured.
>>>> The VM can make use of this virtual cdrom to install an OS in the iso file.
>>>>
>>>> The network bandwith btw the remote client and host is 100Mbps, we test I/O perf using: dd if=/dev/sr0 of=/dev/null bs=32K count=100000.
>>>> And we have
>>>> (1) I/O perf of host side /dev/sr0 is 11MB/s;
>>>> (2) I/O perf of /dev/cdrom inside VM is 3.8MB/s.
>>>>
>>>> As we can see, I/O perf of cdrom inside the VM is about 34.5% compared with host side.
>>>> FlameGraph is used to find out the bottleneck of this operation and we find out that too much time is occupied by calling *bdrv_is_inserted*.
>>>> Then we dig into the code and figure out that the ioctl in *cdrom_is_inserted* takes too much time, because it triggers io_schdule_timeout in kernel.
>>>> In the code path of emulated cdrom device, each DMA I/O request consists of several *bdrv_is_inserted*, which degrades the I/O performance by about 31% in our test.
>>>> static bool cdrom_is_inserted(BlockDriverState *bs)
>>>> {
>>>>     BDRVRawState *s = bs->opaque;
>>>>     int ret;
>>>>
>>>>     ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
>>>>     return ret == CDS_DISC_OK;
>>>> }
>>>> A flamegraph svg file (cdrom.svg) is attachieved in this email to show the code timing profile we've tested.
>>>>
>>>> So here is my question.
>>>> (1) Why do we regularly check the presence of a cdrom disk drive in the code path?  Can we do it asynchronously?
> 
> The ATAPI emulator needs to know if it has storage present to carry out
> the request or to signal an error, so it checks. It might be the case
> that the VM operator forcibly dismounted network storage without
> awareness from the guest. (This is basically emulation of the case when
> a user mechanically forces a CDROM tray open.)
> 
> I wasn't aware this check was so slow.
It is fast enough in most cases; however, it may be slow if the cdrom drive is a virtual drive mapped from a remote client.
This is shown in https://raw.githubusercontent.com/fangying/fangying.github.io/master/94E119FA-8BA1-41AB-A26A-FBDC635BCD2C.png
The reason is that ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) in cdrom_is_inserted takes too much time in this scenario.

> 
> Maybe we can just cache blk_is_inserted -- I don't remember if it's
> possible to eject media without awareness from the device model, but if
> this check winds up being slow in some configurations we can cache it.
Caching the media status is a good idea here: we could check a cached value instead and update it when the media status changes.
> 
> This won't help the bdrv_check_byte_request calls, though, just ones
> from the device model, see below
> 
>>>> (2) Can we drop some check point in the code path to improve the performance?
>>>> Thanks.
>>>
>>> I'm actually not sure why so many places check it. Just letting an I/O
>>> request fail if the CD was removed would probably be easier.
>>>
>> You can see those check points as showed in the attached flamegraph file in this thread (rename it to cdrom.svg).
>> It is called mainly in bdrv_check_byte_request and ide_atapi_cmd, and only the host_cdrom backend implements
>> it using cdrom_is_inserted.
>>
> 
> in ide_atapi_cmd, try replacing:
> 
> `!s->tray_open && blk_is_inserted(s->blk) && s->cdrom_changed)`
> 
> with
> 
> `!s->tray_open && s->cdrom_changed && blk_is_inserted(s->blk))`
> 
> which should hopefully short-circuit calls to blk_is_inserted if
> s->cdrom_changed is false, but it doesn't do much to fix the subsequent
> call:
> 
> `    if ((cmd->flags & CHECK_READY) &&
>         (!media_present(s) || !blk_is_inserted(s->blk)))`
> 
> which is unfortunately going to check blk_is_inserted quite a lot
> because the read commands are tagged CHECK_READY...
Yes you are right.
> 
> 
> As alluded to above, too, I'm not sure what I can do about
> bdrv_check_byte_request -- what happens if the io.c helpers don't
> actually check this? Will they fail gracefully?
> 
> I guess there's one way to find out.

I did some tests here. Since only the cdrom bdrv implements bdrv_is_inserted,
I skipped checking the presence of the cdrom device.

I tested I/O using dd if=/dev/sr0 of=/dev/null bs=32K count=1000
and manually disconnected the remote ISO client from the host to inject an error.

It failed gracefully and sent an I/O error event:
monitor_qapi_event_emit[423]|: {"timestamp": {"seconds": 1547466098, "microseconds": 740265}, "event": "BLOCK_IO_ERROR", "data":
{"device": "drive-ide0-0-1", "node-name": "#block369", "reason": "Input/output error", "operation": "read", "action": "report"}}

diff --git a/block.c b/block.c
index 45145c5..5371ce2 100644
--- a/block.c
+++ b/block.c
@@ -3479,6 +3479,11 @@ bool bdrv_is_inserted(BlockDriverState *bs)
     if (!drv) {
         return false;
     }
+
+    if (bdrv_is_read_only(bs)) {
+       return true;
+    }
+
     if (drv->bdrv_is_inserted) {
         return drv->bdrv_is_inserted(bs);
     }

Thanks.
> 
>>> To try out whether that would improve performance significantly, you
>>> could try to use the host_device backend rather than the host_cdrom
>>> backend. That one doesn't implement .bdrv_is_inserted, so the operation
>>> will be cheap (just return true unconditionally).
>>>
>> The remote client maps filename.iso to host virtual cdrom drive.
>> We use xml like
>>     <disk type='block' device='cdrom'>
>>       <driver name='qemu' type='raw' cache='none' io='native'/>
>>       <source dev='/dev/sr0'/>
>>       <target dev='hdb' bus='ide'/>
>>       <readonly/>
>>       <boot order='2'/>
>>       <address type='drive' controller='0' bus='0' target='0' unit='1'/>
>>     </disk>
>> to attach the virtual cdrom drive into guest and backend bdrv is host_cdrom.
>> Sorry I do not know how to use the host_device backend here.
>>
>>> You will also lose eject/lock passthrough when doing so, so this is not
>>> the final solution, but if it proves to be a lot faster, we can check
>>> where bdrv_is_inserted() calls are actually important (if anywhere) and
>>> hopefully remove some even for the host_cdrom case.
>>>
>> You mean if we do not check *cdrom_is_inserted*, we may lose eject/lock here ?
>> I wonder we used to check the presence of the cdrom device each time it is touched
>> though I am not familiar with this feature.
>>
>>> Kevin
>>>
>>> .
>>>
>>
>>
> 
> 
> 


* Re: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device
  2019-01-16  2:48       ` Ying Fang
@ 2019-01-18 22:43         ` John Snow
  2019-01-22  3:26           ` fangying
  0 siblings, 1 reply; 10+ messages in thread
From: John Snow @ 2019-01-18 22:43 UTC (permalink / raw)
  To: Ying Fang, Kevin Wolf
  Cc: jianjay.zhou, dengkai1, qemu-devel@nongnu.org, mreitz



On 1/15/19 9:48 PM, Ying Fang wrote:
> 
> 
> On 2019/1/16 4:15, John Snow wrote:
>>
>>
>> On 1/8/19 10:20 PM, Ying Fang wrote:
>>>
>>>
>>> On 2019/1/8 20:46, Kevin Wolf wrote:
>>>> Am 29.12.2018 um 07:33 hat Ying Fang geschrieben:
>>>>> Hi.
>>>>> Recently one of our customer complained about the I/O performance of QEMU emulated host cdrom device.
>>>>> I did some investigation on it and there was still some point I could not figure out. So I had to ask for your help.
>>>>>
>>>>> Here is the application scenario setup by our customer.
>>>>> filename.iso        /dev/sr0           /dev/cdrom
>>>>> remote client    -->  host(cdemu)      -->  Linux VM
>>>>> (1)A remote client maps an iso file to x86 host machine through network using tcp.
>>>>> (2)The cdemu daemon then load it as a local virtual cdrom disk drive.
>>>>> (3)A VM is launched with the virtual cdrom disk drive configured.
>>>>> The VM can make use of this virtual cdrom to install an OS in the iso file.
>>>>>
>>>>> The network bandwith btw the remote client and host is 100Mbps, we test I/O perf using: dd if=/dev/sr0 of=/dev/null bs=32K count=100000.
>>>>> And we have
>>>>> (1) I/O perf of host side /dev/sr0 is 11MB/s;
>>>>> (2) I/O perf of /dev/cdrom inside VM is 3.8MB/s.
>>>>>
>>>>> As we can see, I/O perf of cdrom inside the VM is about 34.5% compared with host side.
>>>>> FlameGraph is used to find out the bottleneck of this operation and we find out that too much time is occupied by calling *bdrv_is_inserted*.
>>>>> Then we dig into the code and figure out that the ioctl in *cdrom_is_inserted* takes too much time, because it triggers io_schdule_timeout in kernel.
>>>>> In the code path of emulated cdrom device, each DMA I/O request consists of several *bdrv_is_inserted*, which degrades the I/O performance by about 31% in our test.
>>>>> static bool cdrom_is_inserted(BlockDriverState *bs)
>>>>> {
>>>>>     BDRVRawState *s = bs->opaque;
>>>>>     int ret;
>>>>>
>>>>>     ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
>>>>>     return ret == CDS_DISC_OK;
>>>>> }
>>>>> A flamegraph svg file (cdrom.svg) is attachieved in this email to show the code timing profile we've tested.
>>>>>
>>>>> So here is my question.
>>>>> (1) Why do we regularly check the presence of a cdrom disk drive in the code path?  Can we do it asynchronously?
>>
>> The ATAPI emulator needs to know if it has storage present to carry out
>> the request or to signal an error, so it checks. It might be the case
>> that the VM operator forcibly dismounted network storage without
>> awareness from the guest. (This is basically emulation of the case when
>> a user mechanically forces a CDROM tray open.)
>>
>> I wasn't aware this check was so slow.
> It is fast enough in most case, however it may be slow if the cdrom drive is a virtual drive mapped from remote client.
> This is showed in https://raw.githubusercontent.com/fangying/fangying.github.io/master/94E119FA-8BA1-41AB-A26A-FBDC635BCD2C.png
> The reason is ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) in cdrom_is_inserted takes much time in this scenario.
> 
>>
>> Maybe we can just cache blk_is_inserted -- I don't remember if it's
>> possible to eject media without awareness from the device model, but if
>> this check winds up being slow in some configurations we can cache it.
> To cache media status is a good idea here, we check blk_is_inserted instead and modify it when media status is changed.
>>
>> This won't help the bdrv_check_byte_request calls, though, just ones
>> from the device model, see below
>>
>>>>> (2) Can we drop some check point in the code path to improve the performance?
>>>>> Thanks.
>>>>
>>>> I'm actually not sure why so many places check it. Just letting an I/O
>>>> request fail if the CD was removed would probably be easier.
>>>>
>>> You can see those check points as showed in the attached flamegraph file in this thread (rename it to cdrom.svg).
>>> It is called mainly in bdrv_check_byte_request and ide_atapi_cmd, and only the host_cdrom backend implements
>>> it using cdrom_is_inserted.
>>>
>>
>> in ide_atapi_cmd, try replacing:
>>
>> `!s->tray_open && blk_is_inserted(s->blk) && s->cdrom_changed)`
>>
>> with
>>
>> `!s->tray_open && s->cdrom_changed && blk_is_inserted(s->blk))`
>>
>> which should hopefully short-circuit calls to blk_is_inserted if
>> s->cdrom_changed is false, but it doesn't do much to fix the subsequent
>> call:
>>
>> `    if ((cmd->flags & CHECK_READY) &&
>>         (!media_present(s) || !blk_is_inserted(s->blk)))`
>>
>> which is unfortunately going to check blk_is_inserted quite a lot
>> because the read commands are tagged CHECK_READY...
> Yes you are right.
>>
>>
>> As alluded to above, too, I'm not sure what I can do about
>> bdrv_check_byte_request -- what happens if the io.c helpers don't
>> actually check this? Will they fail gracefully?
>>
>> I guess there's one way to find out.
> 
> I did some test here. As we can see only cdrom bdrv implements bdrv_is_inserted,
> then I skip check presence of cdrom device.
> 
> I test I/O using dd if=/dev/sr0 of=/dev/null bs=32K count=1000
> and manually disconnect the remote iso client from the host to inject error.
> 
> I found it failed gracefully and send an I/O error event.
> monitor_qapi_event_emit[423]|: {"timestamp": {"seconds": 1547466098, "microseconds": 740265}, "event": "BLOCK_IO_ERROR", "data":
> {"device": "drive-ide0-0-1", "node-name": "#block369", "reason": "Input/output error", "operation": "read", "action": "report"}}
> 
> diff --git a/block.c b/block.c
> index 45145c5..5371ce2 100644
> --- a/block.c
> +++ b/block.c
> @@ -3479,6 +3479,11 @@ bool bdrv_is_inserted(BlockDriverState *bs)
>      if (!drv) {
>          return false;
>      }
> +
> +    if (bdrv_is_read_only(bs)) {
> +       return true;
> +    }
> +
>      if (drv->bdrv_is_inserted) {
>          return drv->bdrv_is_inserted(bs);
>      }
> 
> Thanks.
>>
>>>> To try out whether that would improve performance significantly, you
>>>> could try to use the host_device backend rather than the host_cdrom
>>>> backend. That one doesn't implement .bdrv_is_inserted, so the operation
>>>> will be cheap (just return true unconditionally).
>>>>
>>> The remote client maps filename.iso to host virtual cdrom drive.
>>> We use xml like
>>>     <disk type='block' device='cdrom'>
>>>       <driver name='qemu' type='raw' cache='none' io='native'/>
>>>       <source dev='/dev/sr0'/>
>>>       <target dev='hdb' bus='ide'/>
>>>       <readonly/>
>>>       <boot order='2'/>
>>>       <address type='drive' controller='0' bus='0' target='0' unit='1'/>
>>>     </disk>
>>> to attach the virtual cdrom drive into guest and backend bdrv is host_cdrom.
>>> Sorry I do not know how to use the host_device backend here.
>>>
>>>> You will also lose eject/lock passthrough when doing so, so this is not
>>>> the final solution, but if it proves to be a lot faster, we can check
>>>> where bdrv_is_inserted() calls are actually important (if anywhere) and
>>>> hopefully remove some even for the host_cdrom case.
>>>>
>>> You mean if we do not check *cdrom_is_inserted*, we may lose eject/lock here ?
>>> I wonder we used to check the presence of the cdrom device each time it is touched
>>> though I am not familiar with this feature.
>>>
>>>> Kevin
>>>>
>>>> .
>>>>
>>>
>>>
>>
>>
>>
> 

Do you intend to send a patch along? I doubt I'll be able to prioritize
this but I will look over any patches that get sent. I am a little
skeptical of the "read_only" optimization, I might need Kevin's input on
that one, but a patch would be a good place for that discussion.

--js


* Re: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device
  2019-01-18 22:43         ` John Snow
@ 2019-01-22  3:26           ` fangying
  0 siblings, 0 replies; 10+ messages in thread
From: fangying @ 2019-01-22  3:26 UTC (permalink / raw)
  To: John Snow, Kevin Wolf; +Cc: Zhoujian (jay), dengkai (A), qemu-devel, mreitz






________________________________
Ying Fang
M: +86-15925605092
E: fangying1@huawei.com
2012 Laboratories - Euler Dept 5

From: John Snow
To: fangying <fangying1@huawei.com>; Kevin Wolf <kwolf@redhat.com>
Cc: Zhoujian (jay) <jianjay.zhou@huawei.com>; dengkai (A) <dengkai1@huawei.com>; qemu-devel <qemu-devel@nongnu.org>; mreitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device
Date: 2019-01-19 06:43:43



Do you intend to send a patch along? I doubt I'll be able to prioritize
this but I will look over any patches that get sent. I am a little
skeptical of the "read_only" optimization, I might need Kevin's input on
that one, but a patch would be a good place for that discussion.

--js


* Re: [Qemu-devel] [RFC] Questions on the I/O performance of emulated host cdrom device
@ 2019-01-22  5:55 fangying
  0 siblings, 0 replies; 10+ messages in thread
From: fangying @ 2019-01-22  5:55 UTC (permalink / raw)
  To: John Snow, Kevin Wolf; +Cc: Zhoujian (jay), dengkai (A), qemu-devel, mreitz

Hi,
I do not intend to send a patch right now; I used the 'read-only' code snippet to show that the I/O performance
can increase from about 30% to about 80% of the host side if we do not call cdrom_is_inserted here. Obviously it is not
acceptable to simply skip checking the cdrom drive status.

However, I cannot come up with a proper solution right now, but I think there are two possible approaches.
One is to find a way to check the drive status that is equivalent to the CDROM_DRIVE_STATUS ioctl but much faster. The other
is to let QEMU track the drive status via an event-triggered mechanism, as sketched below.
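
As a rough illustration of the second approach (an untested sketch; the field and
function names are invented), a periodic timer could refresh a cached flag so the
I/O hot path never issues the ioctl itself:

    static void cdrom_status_refresh(void *opaque)
    {
        BDRVRawState *s = opaque;

        s->media_present =              /* hypothetical cached flag */
            ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK;
        timer_mod(s->status_timer,      /* hypothetical QEMUTimer */
                  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 1000);
    }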

Regards.

