* [Qemu-devel] [PATCH] PCI: minor performance optimization
@ 2015-11-20 10:45 Cao jin
2015-11-20 10:45 ` Michael S. Tsirkin
0 siblings, 1 reply; 7+ messages in thread
From: Cao jin @ 2015-11-20 10:45 UTC (permalink / raw)
To: qemu-devel; +Cc: mst
1. Do param check in pci_add_capability2(), as it is a public API.
2. As spec says, each capability must be DWORD aligned, so an optimization can
be done via Loop Unrolling.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
hw/pci/pci.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 168b9cc..1e99603 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1924,13 +1924,15 @@ PCIDevice *pci_create_simple(PCIBus *bus, int devfn, const char *name)
static uint8_t pci_find_space(PCIDevice *pdev, uint8_t size)
{
int offset = PCI_CONFIG_HEADER_SIZE;
- int i;
- for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i) {
+ int i = PCI_CONFIG_HEADER_SIZE;;
+
+ for (; i < PCI_CONFIG_SPACE_SIZE; i = i + 4) {
if (pdev->used[i])
- offset = i + 1;
- else if (i - offset + 1 == size)
+ offset = i + 4;
+ else if (i - offset >= size)
return offset;
}
+
return 0;
}
@@ -2144,6 +2146,8 @@ int pci_add_capability2(PCIDevice *pdev, uint8_t cap_id,
uint8_t *config;
int i, overlapping_cap;
+ assert(size > 0);
+
if (!offset) {
offset = pci_find_space(pdev, size);
if (!offset) {
--
2.1.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH] PCI: minor performance optimization
2015-11-20 10:45 [Qemu-devel] [PATCH] PCI: minor performance optimization Cao jin
@ 2015-11-20 10:45 ` Michael S. Tsirkin
2015-11-20 11:04 ` Cao jin
0 siblings, 1 reply; 7+ messages in thread
From: Michael S. Tsirkin @ 2015-11-20 10:45 UTC (permalink / raw)
To: Cao jin; +Cc: qemu-devel
On Fri, Nov 20, 2015 at 06:45:01PM +0800, Cao jin wrote:
> 1. Do param check in pci_add_capability2(), as it is a public API.
Separate patch pls.
> 2. As spec says, each capability must be DWORD aligned, so an optimization can
> be done via Loop Unrolling.
Why do we want to optimize it?
>
> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---
> hw/pci/pci.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 168b9cc..1e99603 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1924,13 +1924,15 @@ PCIDevice *pci_create_simple(PCIBus *bus, int devfn, const char *name)
> static uint8_t pci_find_space(PCIDevice *pdev, uint8_t size)
> {
> int offset = PCI_CONFIG_HEADER_SIZE;
> - int i;
> - for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i) {
> + int i = PCI_CONFIG_HEADER_SIZE;;
> +
> + for (; i < PCI_CONFIG_SPACE_SIZE; i = i + 4) {
> if (pdev->used[i])
> - offset = i + 1;
> - else if (i - offset + 1 == size)
> + offset = i + 4;
> + else if (i - offset >= size)
> return offset;
> }
> +
> return 0;
> }
>
> @@ -2144,6 +2146,8 @@ int pci_add_capability2(PCIDevice *pdev, uint8_t cap_id,
> uint8_t *config;
> int i, overlapping_cap;
>
> + assert(size > 0);
> +
> if (!offset) {
> offset = pci_find_space(pdev, size);
> if (!offset) {
> --
> 2.1.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH] PCI: minor performance optimization
2015-11-20 10:45 ` Michael S. Tsirkin
@ 2015-11-20 11:04 ` Cao jin
2015-11-20 11:26 ` Michael S. Tsirkin
0 siblings, 1 reply; 7+ messages in thread
From: Cao jin @ 2015-11-20 11:04 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel
On 11/20/2015 06:45 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 20, 2015 at 06:45:01PM +0800, Cao jin wrote:
>> 1. Do param check in pci_add_capability2(), as it is a public API.
>
> Separate patch pls.
OK
>
>> 2. As spec says, each capability must be DWORD aligned, so an optimization can
>> be done via Loop Unrolling.
>
> Why do we want to optimize it?
>
For tiny performance improvement via less loop. take pcie express
capability(60 bytes at most) for example, it may loop 60 times, now we
just need 15 times, a quarter of before.
>>
>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>> ---
>> hw/pci/pci.c | 12 ++++++++----
>> 1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index 168b9cc..1e99603 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -1924,13 +1924,15 @@ PCIDevice *pci_create_simple(PCIBus *bus, int devfn, const char *name)
>> static uint8_t pci_find_space(PCIDevice *pdev, uint8_t size)
>> {
>> int offset = PCI_CONFIG_HEADER_SIZE;
>> - int i;
>> - for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i) {
>> + int i = PCI_CONFIG_HEADER_SIZE;;
>> +
>> + for (; i < PCI_CONFIG_SPACE_SIZE; i = i + 4) {
>> if (pdev->used[i])
>> - offset = i + 1;
>> - else if (i - offset + 1 == size)
>> + offset = i + 4;
>> + else if (i - offset >= size)
>> return offset;
>> }
>> +
>> return 0;
>> }
>>
>> @@ -2144,6 +2146,8 @@ int pci_add_capability2(PCIDevice *pdev, uint8_t cap_id,
>> uint8_t *config;
>> int i, overlapping_cap;
>>
>> + assert(size > 0);
>> +
>> if (!offset) {
>> offset = pci_find_space(pdev, size);
>> if (!offset) {
>> --
>> 2.1.0
> .
>
--
Yours Sincerely,
Cao Jin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH] PCI: minor performance optimization
2015-11-20 11:04 ` Cao jin
@ 2015-11-20 11:26 ` Michael S. Tsirkin
2015-11-20 11:58 ` Cao jin
0 siblings, 1 reply; 7+ messages in thread
From: Michael S. Tsirkin @ 2015-11-20 11:26 UTC (permalink / raw)
To: Cao jin; +Cc: qemu-devel
On Fri, Nov 20, 2015 at 07:04:07PM +0800, Cao jin wrote:
>
>
> On 11/20/2015 06:45 PM, Michael S. Tsirkin wrote:
> >On Fri, Nov 20, 2015 at 06:45:01PM +0800, Cao jin wrote:
> >>1. Do param check in pci_add_capability2(), as it is a public API.
> >
> >Separate patch pls.
>
> OK
>
> >
> >>2. As spec says, each capability must be DWORD aligned, so an optimization can
> >> be done via Loop Unrolling.
> >
> >Why do we want to optimize it?
> >
>
> For tiny performance improvement via less loop. take pcie express
> capability(60 bytes at most) for example, it may loop 60 times, now we just
> need 15 times, a quarter of before.
But who cares? This is not a data path operation.
> >>
> >>Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> >>---
> >> hw/pci/pci.c | 12 ++++++++----
> >> 1 file changed, 8 insertions(+), 4 deletions(-)
> >>
> >>diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>index 168b9cc..1e99603 100644
> >>--- a/hw/pci/pci.c
> >>+++ b/hw/pci/pci.c
> >>@@ -1924,13 +1924,15 @@ PCIDevice *pci_create_simple(PCIBus *bus, int devfn, const char *name)
> >> static uint8_t pci_find_space(PCIDevice *pdev, uint8_t size)
> >> {
> >> int offset = PCI_CONFIG_HEADER_SIZE;
> >>- int i;
> >>- for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i) {
> >>+ int i = PCI_CONFIG_HEADER_SIZE;;
> >>+
> >>+ for (; i < PCI_CONFIG_SPACE_SIZE; i = i + 4) {
> >> if (pdev->used[i])
> >>- offset = i + 1;
> >>- else if (i - offset + 1 == size)
> >>+ offset = i + 4;
> >>+ else if (i - offset >= size)
> >> return offset;
> >> }
> >>+
> >> return 0;
> >> }
> >>
> >>@@ -2144,6 +2146,8 @@ int pci_add_capability2(PCIDevice *pdev, uint8_t cap_id,
> >> uint8_t *config;
> >> int i, overlapping_cap;
> >>
> >>+ assert(size > 0);
> >>+
> >> if (!offset) {
> >> offset = pci_find_space(pdev, size);
> >> if (!offset) {
> >>--
> >>2.1.0
> >.
> >
>
> --
> Yours Sincerely,
>
> Cao Jin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH] PCI: minor performance optimization
2015-11-20 11:26 ` Michael S. Tsirkin
@ 2015-11-20 11:58 ` Cao jin
2015-11-20 13:30 ` Michael S. Tsirkin
0 siblings, 1 reply; 7+ messages in thread
From: Cao jin @ 2015-11-20 11:58 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel
On 11/20/2015 07:26 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 20, 2015 at 07:04:07PM +0800, Cao jin wrote:
>>
>>
>> On 11/20/2015 06:45 PM, Michael S. Tsirkin wrote:
>>> On Fri, Nov 20, 2015 at 06:45:01PM +0800, Cao jin wrote:
>>>
>>>> 2. As spec says, each capability must be DWORD aligned, so an optimization can
>>>> be done via Loop Unrolling.
>>>
>>> Why do we want to optimize it?
>>>
>>
>> For tiny performance improvement via less loop. take pcie express
>> capability(60 bytes at most) for example, it may loop 60 times, now we just
>> need 15 times, a quarter of before.
>
> But who cares? This is not a data path operation.
It is tiny thing I found when browsing code. When found there are
several places looks like this, I think maybe it does good to qemu to do
this and CCed to you because it don`t look like a simple trivial patch.
So, hey Michael, if you don`t like this kind of optimization, that`t ok,
forget it. But I think it make me little confused when determine which
kind of patch should be CCed to you.
>
>>>>
>>>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>>>> ---
>>>> hw/pci/pci.c | 12 ++++++++----
>>>> 1 file changed, 8 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>> index 168b9cc..1e99603 100644
>>>> --- a/hw/pci/pci.c
>>>> +++ b/hw/pci/pci.c
>>>> @@ -1924,13 +1924,15 @@ PCIDevice *pci_create_simple(PCIBus *bus, int devfn, const char *name)
>>>> static uint8_t pci_find_space(PCIDevice *pdev, uint8_t size)
>>>> {
>>>> int offset = PCI_CONFIG_HEADER_SIZE;
>>>> - int i;
>>>> - for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i) {
>>>> + int i = PCI_CONFIG_HEADER_SIZE;;
>>>> +
>>>> + for (; i < PCI_CONFIG_SPACE_SIZE; i = i + 4) {
>>>> if (pdev->used[i])
>>>> - offset = i + 1;
>>>> - else if (i - offset + 1 == size)
>>>> + offset = i + 4;
>>>> + else if (i - offset >= size)
>>>> return offset;
>>>> }
>>>> +
>>>> return 0;
>>>> }
>>>>
>>>> @@ -2144,6 +2146,8 @@ int pci_add_capability2(PCIDevice *pdev, uint8_t cap_id,
>>>> uint8_t *config;
>>>> int i, overlapping_cap;
>>>>
>>>> + assert(size > 0);
>>>> +
>>>> if (!offset) {
>>>> offset = pci_find_space(pdev, size);
>>>> if (!offset) {
>>>> --
>>>> 2.1.0
>>> .
>>>
>>
>> --
>> Yours Sincerely,
>>
>> Cao Jin
> .
>
--
Yours Sincerely,
Cao Jin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH] PCI: minor performance optimization
2015-11-20 11:58 ` Cao jin
@ 2015-11-20 13:30 ` Michael S. Tsirkin
2015-11-21 7:22 ` Cao jin
0 siblings, 1 reply; 7+ messages in thread
From: Michael S. Tsirkin @ 2015-11-20 13:30 UTC (permalink / raw)
To: Cao jin; +Cc: qemu-devel
On Fri, Nov 20, 2015 at 07:58:01PM +0800, Cao jin wrote:
>
>
> On 11/20/2015 07:26 PM, Michael S. Tsirkin wrote:
> >On Fri, Nov 20, 2015 at 07:04:07PM +0800, Cao jin wrote:
> >>
> >>
> >>On 11/20/2015 06:45 PM, Michael S. Tsirkin wrote:
> >>>On Fri, Nov 20, 2015 at 06:45:01PM +0800, Cao jin wrote:
> >>>
> >>>>2. As spec says, each capability must be DWORD aligned, so an optimization can
> >>>> be done via Loop Unrolling.
> >>>
> >>>Why do we want to optimize it?
> >>>
> >>
> >>For tiny performance improvement via less loop. take pcie express
> >>capability(60 bytes at most) for example, it may loop 60 times, now we just
> >>need 15 times, a quarter of before.
> >
> >But who cares? This is not a data path operation.
>
> It is tiny thing I found when browsing code. When found there are several
> places looks like this, I think maybe it does good to qemu to do this and
> CCed to you because it don`t look like a simple trivial patch.
>
> So, hey Michael, if you don`t like this kind of optimization, that`t ok,
> forget it. But I think it make me little confused when determine which kind
> of patch should be CCed to you.
Optimization patches should normally include performance numbers
if they are to be merged.
Try to come up with a benchmark and you will realize that the speed of
this function has no effect under even half way realistic conditions.
> >
> >>>>
> >>>>Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> >>>>---
> >>>> hw/pci/pci.c | 12 ++++++++----
> >>>> 1 file changed, 8 insertions(+), 4 deletions(-)
> >>>>
> >>>>diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>>>index 168b9cc..1e99603 100644
> >>>>--- a/hw/pci/pci.c
> >>>>+++ b/hw/pci/pci.c
> >>>>@@ -1924,13 +1924,15 @@ PCIDevice *pci_create_simple(PCIBus *bus, int devfn, const char *name)
> >>>> static uint8_t pci_find_space(PCIDevice *pdev, uint8_t size)
> >>>> {
> >>>> int offset = PCI_CONFIG_HEADER_SIZE;
> >>>>- int i;
> >>>>- for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i) {
> >>>>+ int i = PCI_CONFIG_HEADER_SIZE;;
> >>>>+
> >>>>+ for (; i < PCI_CONFIG_SPACE_SIZE; i = i + 4) {
> >>>> if (pdev->used[i])
> >>>>- offset = i + 1;
> >>>>- else if (i - offset + 1 == size)
> >>>>+ offset = i + 4;
> >>>>+ else if (i - offset >= size)
> >>>> return offset;
> >>>> }
> >>>>+
> >>>> return 0;
> >>>> }
> >>>>
> >>>>@@ -2144,6 +2146,8 @@ int pci_add_capability2(PCIDevice *pdev, uint8_t cap_id,
> >>>> uint8_t *config;
> >>>> int i, overlapping_cap;
> >>>>
> >>>>+ assert(size > 0);
> >>>>+
> >>>> if (!offset) {
> >>>> offset = pci_find_space(pdev, size);
> >>>> if (!offset) {
> >>>>--
> >>>>2.1.0
> >>>.
> >>>
> >>
> >>--
> >>Yours Sincerely,
> >>
> >>Cao Jin
> >.
> >
>
> --
> Yours Sincerely,
>
> Cao Jin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH] PCI: minor performance optimization
2015-11-20 13:30 ` Michael S. Tsirkin
@ 2015-11-21 7:22 ` Cao jin
0 siblings, 0 replies; 7+ messages in thread
From: Cao jin @ 2015-11-21 7:22 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel
On 11/20/2015 09:30 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 20, 2015 at 07:58:01PM +0800, Cao jin wrote:
>>
>>
>> On 11/20/2015 07:26 PM, Michael S. Tsirkin wrote:
>>> On Fri, Nov 20, 2015 at 07:04:07PM +0800, Cao jin wrote:
>>>>
>>>>
>>>> On 11/20/2015 06:45 PM, Michael S. Tsirkin wrote:
>>>>> On Fri, Nov 20, 2015 at 06:45:01PM +0800, Cao jin wrote:
>>>>>
>>>>>> 2. As spec says, each capability must be DWORD aligned, so an optimization can
>>>>>> be done via Loop Unrolling.
>>>>>
>>>>> Why do we want to optimize it?
>>>>>
>>>>
>>>> For tiny performance improvement via less loop. take pcie express
>>>> capability(60 bytes at most) for example, it may loop 60 times, now we just
>>>> need 15 times, a quarter of before.
>>>
>>> But who cares? This is not a data path operation.
>>
>> It is tiny thing I found when browsing code. When found there are several
>> places looks like this, I think maybe it does good to qemu to do this and
>> CCed to you because it don`t look like a simple trivial patch.
>>
>> So, hey Michael, if you don`t like this kind of optimization, that`t ok,
>> forget it. But I think it make me little confused when determine which kind
>> of patch should be CCed to you.
>
> Optimization patches should normally include performance numbers
> if they are to be merged.
> Try to come up with a benchmark and you will realize that the speed of
> this function has no effect under even half way realistic conditions.
>
Maybe you are right. OK, will send the param check patch to the qemu-trivial
>>>>> .
>>>>>
>>>>
>>>> --
>>>> Yours Sincerely,
>>>>
>>>> Cao Jin
>>> .
>>>
>>
>> --
>> Yours Sincerely,
>>
>> Cao Jin
> .
>
--
Yours Sincerely,
Cao Jin
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-11-21 7:21 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-20 10:45 [Qemu-devel] [PATCH] PCI: minor performance optimization Cao jin
2015-11-20 10:45 ` Michael S. Tsirkin
2015-11-20 11:04 ` Cao jin
2015-11-20 11:26 ` Michael S. Tsirkin
2015-11-20 11:58 ` Cao jin
2015-11-20 13:30 ` Michael S. Tsirkin
2015-11-21 7:22 ` Cao jin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).