* Rather slow time of Pin in Windows with GPL  PV driver
@ 2011-03-09  6:53 MaoXiaoyun
  2011-03-09  7:58 ` John Weekes
       [not found] ` <D271C3A4-9B27-4E08-A92A-D55A811736EC@bendigoit.com.au>
  0 siblings, 2 replies; 28+ messages in thread
From: MaoXiaoyun @ 2011-03-09  6:53 UTC (permalink / raw)
  To: james.harper; +Cc: xen devel


[-- Attachment #1.1: Type: text/plain, Size: 614 bytes --]


Hi James:

When running Windows 2003 as an HVM guest with the GPL PV drivers
installed, we found that ping response times are abnormally slow. We ran
some tests to compare it with a Linux guest.

Attached are the Linux and Win2003 ping results. On Linux, the time is
consistently below 1 ms and very stable, while on 2003 the ping time is
unstable, ranging from below 1 ms to as much as 15 ms.

Both HVM guests are on the same host. We also tested XP and 2008: ping
on XP behaves normally, but 2008 shows the same behavior as 2003.

What could be the cause?

Many thanks.

[-- Attachment #1.2: Type: text/html, Size: 991 bytes --]

[-- Attachment #2: 2003.JPG --]
[-- Type: image/jpeg, Size: 53071 bytes --]

[-- Attachment #3: linux.JPG --]
[-- Type: image/jpeg, Size: 67062 bytes --]

[-- Attachment #4: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* Re: Rather slow time of Pin in Windows with GPL PV driver
  2011-03-09  6:53 Rather slow time of Pin in Windows with GPL PV driver MaoXiaoyun
@ 2011-03-09  7:58 ` John Weekes
       [not found] ` <D271C3A4-9B27-4E08-A92A-D55A811736EC@bendigoit.com.au>
  1 sibling, 0 replies; 28+ messages in thread
From: John Weekes @ 2011-03-09  7:58 UTC (permalink / raw)
  To: xen-devel

On 3/8/2011 10:53 PM, MaoXiaoyun wrote:
> Attached are the Linux and Win2003 ping results. On Linux, the time is
> consistently below 1 ms and very stable, while on 2003 the ping time is
> unstable, ranging from below 1 ms to as much as 15 ms.
>
> Both HVM guests are on the same host. We also tested XP and 2008: ping
> on XP behaves normally, but 2008 shows the same behavior as 2003.

This may not completely account for it, but try enabling the multimedia
timer on your Windows installs by starting up any application that asks
to use it at a 1ms resolution, such as Windows Media Player. Seeing RTTs
that are either 1ms or 15ms, but never a number in between, is a
tell-tale sign that the MM timer is not currently in use, and as a
result you're getting less accurate, more granular readings.

-John


* RE: Rather slow time of Pin in Windows with GPL PV driver
       [not found] ` <D271C3A4-9B27-4E08-A92A-D55A811736EC@bendigoit.com.au>
@ 2011-03-09 10:20   ` MaoXiaoyun
       [not found]   ` <BLU157-w82233DE21FFA3AC07FCC3DAC90@phx.gbl>
  1 sibling, 0 replies; 28+ messages in thread
From: MaoXiaoyun @ 2011-03-09 10:20 UTC (permalink / raw)
  To: james.harper; +Cc: xen devel


[-- Attachment #1.1: Type: text/plain, Size: 1141 bytes --]


BTW: HVM Windows 2003/2008 with the RHEL PV drivers installed works fine too.
Thanks.

 
> Subject: Re: Rather slow time of Pin in Windows with GPL PV driver
> From: james.harper@bendigoit.com.au
> Date: Wed, 9 Mar 2011 17:56:47 +1100
> To: tinnycloud@hotmail.com
> 
> How many vcpus?
> 
> What is timer_mode set to in the config?
> 
> Sent from my iPhone

[-- Attachment #1.2: Type: text/html, Size: 1607 bytes --]


* RE: Rather slow time of Pin in Windows with GPL PV driver
       [not found]     ` <AEC6C66638C05B468B556EA548C1A77D01C55DCB@trantor>
@ 2011-03-09 11:15       ` MaoXiaoyun
  2011-03-09 11:28         ` James Harper
  0 siblings, 1 reply; 28+ messages in thread
From: MaoXiaoyun @ 2011-03-09 11:15 UTC (permalink / raw)
  To: james.harper; +Cc: xen devel


[-- Attachment #1.1: Type: text/plain, Size: 1161 bytes --]


I may try pinning later; my host has 4 x 4-core CPUs.

I just compared the GPL and RHEL PV driver code and noticed that most of
the net driver initialization is the same. The only difference is that
the GPL code calls
KeSetTargetProcessorDpc(&xi->rx_dpc, 0); //in xennet_rx.c line 953
but the RHEL code doesn't.

So I simply commented out that line, recompiled, and ran the test again.
The result looks good: all ping times are below 1 ms.

Could this be the cause? Is it harmful to comment out this line?
Many thanks.

 
> Subject: RE: Rather slow time of Pin in Windows with GPL PV driver
> Date: Wed, 9 Mar 2011 22:02:17 +1100
> From: james.harper@bendigoit.com.au
> To: tinnycloud@hotmail.com
> 
> > 
> > 8 vcpus
> > timer_mode is 1
> > 
> 
> Can you pin all 8 cores to that DomU?
> 
> How many total cores on the physical machine?
> 
> When I try I get mostly time=1ms and time<1ms with occasional higher
> single digit values. This time of night probably isn't such a good time
> to be ping testing for me because my system is running backups which is
> pretty intensive - it's only a dual core AMD running a few VM's.
> 
> James

[-- Attachment #1.2: Type: text/html, Size: 1609 bytes --]


* RE: Rather slow time of Pin in Windows with GPL PV driver
  2011-03-09 11:15       ` MaoXiaoyun
@ 2011-03-09 11:28         ` James Harper
  2011-03-09 11:39           ` MaoXiaoyun
                             ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: James Harper @ 2011-03-09 11:28 UTC (permalink / raw)
  To: MaoXiaoyun; +Cc: xen devel

> I may try pinning later; my host has 4 x 4-core CPUs.
>
> I just compared the GPL and RHEL PV driver code and noticed that most
> of the net driver initialization is the same. The only difference is
> that the GPL code calls
> KeSetTargetProcessorDpc(&xi->rx_dpc, 0); //in xennet_rx.c line 953
> but the RHEL code doesn't.
>
> So I simply commented out that line, recompiled, and ran the test
> again. The result looks good: all ping times are below 1 ms.
>
> Could this be the cause? Is it harmful to comment out this line?
> Many thanks.
>

At a guess I would say it should be harmful to performance, but all the
critical code is protected by spinlocks.

It could be a leftover from a previous version of GPLPV. In the current
version, the spinlock protected code is probably a little long winded
but is nothing compared to the passing down of packets to Windows that
is done in the DPC but outside the spinlock.

Can you do some general performance tests with this change?

Is the RHEL PV driver source publicly available?

James


* RE: Rather slow time of Pin in Windows with GPL PV driver
  2011-03-09 11:28         ` James Harper
@ 2011-03-09 11:39           ` MaoXiaoyun
  2011-03-10  3:17           ` MaoXiaoyun
  2011-03-10  4:47           ` MaoXiaoyun
  2 siblings, 0 replies; 28+ messages in thread
From: MaoXiaoyun @ 2011-03-09 11:39 UTC (permalink / raw)
  To: james.harper; +Cc: xen devel


[-- Attachment #1.1: Type: text/plain, Size: 1691 bytes --]


Sending the RHEL source code for convenience.
I will run some performance tests later.

In my understanding, KeSetTargetProcessorDpc(&xi->rx_dpc, num) means
the DPC is placed in the DPC queue of CPU num, right?

What is the difference between commenting out this line of code and
setting num to 0? I assume they mean the same thing, right?

Thanks.
 

[-- Attachment #1.2: Type: text/html, Size: 2214 bytes --]

[-- Attachment #2: src.rar --]
[-- Type: application/octet-stream, Size: 338288 bytes --]


* RE: Rather slow time of Pin in Windows with GPL PV driver
  2011-03-09 11:28         ` James Harper
  2011-03-09 11:39           ` MaoXiaoyun
@ 2011-03-10  3:17           ` MaoXiaoyun
  2011-03-10  4:47           ` MaoXiaoyun
  2 siblings, 0 replies; 28+ messages in thread
From: MaoXiaoyun @ 2011-03-10  3:17 UTC (permalink / raw)
  To: james.harper; +Cc: xen devel


[-- Attachment #1.1: Type: text/plain, Size: 3787 bytes --]


Hi James:

I did some more tests, adding logging to see how the interrupts are
distributed. In XenNet_RxBufferCheck I log the CPU number by adding the line
KdPrint((__DRIVER_NAME "    pcpu = %lu\n", pcpu)); (see the code below).

The results below show that when KeSetTargetProcessorDpc is commented out,
the interrupts are distributed over all VCPUs. Could this explain something?

Besides that, I ran some performance tests with netperf and observed no
performance difference.
 
===============Result================================
      1) KeSetTargetProcessorDpc(&xi->rx_dpc, 0) is commented out.
       XnetNet       pcpu = 1
       XnetNet       pcpu = 3
       XnetNet       pcpu = 2
       XnetNet       pcpu = 3
       XnetNet       pcpu = 7
       XnetNet       pcpu = 0
       XnetNet       pcpu = 5
       XnetNet       pcpu = 3
       XnetNet       pcpu = 0
       XnetNet       pcpu = 3
       XnetNet       pcpu = 7
       XnetNet       pcpu = 4
       XnetNet       pcpu = 5
       XnetNet       pcpu = 2
       XnetNet       pcpu = 4
       XnetNet       pcpu = 5
       XnetNet       pcpu = 6
       XnetNet       pcpu = 0
       XnetNet       pcpu = 6
      
       2) KeSetTargetProcessorDpc(&xi->rx_dpc, 0) is *NOT* commented out.
       XnetNet       pcpu = 0
       XnetNet       pcpu = 0
       XnetNet       pcpu = 0
       XnetNet       pcpu = 0
       XnetNet       pcpu = 0
       XnetNet       pcpu = 0
       XnetNet       pcpu = 0
       XnetNet       pcpu = 0
       XnetNet       pcpu = 0
       XnetNet       pcpu = 0
       XnetNet       pcpu = 0
==================================================
static VOID
XenNet_RxBufferCheck(PKDPC dpc, PVOID context, PVOID arg1, PVOID arg2)
{
  struct xennet_info *xi = context;
  RING_IDX cons, prod;
  LIST_ENTRY rx_packet_list;
  PLIST_ENTRY entry;
  PNDIS_PACKET packets[MAXIMUM_PACKETS_PER_INDICATE];
  ULONG packet_count = 0;
  struct netif_rx_response *rxrsp = NULL;
  struct netif_extra_info *ei;
  USHORT id;
  int more_to_do = FALSE;
  packet_info_t *pi = &xi->rxpi;
  //NDIS_STATUS status;
  shared_buffer_t *page_buf;
  PNDIS_BUFFER buffer;
  ULONG pcpu = KeGetCurrentProcessorNumber() & 0xff;

  UNREFERENCED_PARAMETER(dpc);
  UNREFERENCED_PARAMETER(arg1);
  UNREFERENCED_PARAMETER(arg2);

  //FUNCTION_ENTER();

  KdPrint((__DRIVER_NAME "    pcpu = %lu\n", pcpu));    

  if (!xi->connected)
  /* ... (remainder of the function not included in this message) ... */

 

[-- Attachment #1.2: Type: text/html, Size: 7415 bytes --]


* RE: Rather slow time of Pin in Windows with GPL PV driver
  2011-03-09 11:28         ` James Harper
  2011-03-09 11:39           ` MaoXiaoyun
  2011-03-10  3:17           ` MaoXiaoyun
@ 2011-03-10  4:47           ` MaoXiaoyun
  2011-03-10  6:27             ` James Harper
  2 siblings, 1 reply; 28+ messages in thread
From: MaoXiaoyun @ 2011-03-10  4:47 UTC (permalink / raw)
  To: james.harper; +Cc: xen devel


[-- Attachment #1.1: Type: text/plain, Size: 1785 bytes --]


It looks like KeSetTargetProcessorDpc(&xi->rx_dpc, 0) binds rx_dpc to
VCPU0 only, while the interrupts for xennet are in fact distributed
across all VCPUs.

By using IntFiltr from http://support.microsoft.com/kb/252867 to set the
interrupt affinity to VCPU0 only, with KeSetTargetProcessorDpc left in
place (not commented out), we also get quite stable ping times, below 1 ms.

So I think this is the problem: the KeSetTargetProcessorDpc call should
be removed.
 

[-- Attachment #1.2: Type: text/html, Size: 2387 bytes --]


* RE: Rather slow time of Pin in Windows with GPL PV driver
  2011-03-10  4:47           ` MaoXiaoyun
@ 2011-03-10  6:27             ` James Harper
  2011-03-10  9:27               ` Paul Durrant
  0 siblings, 1 reply; 28+ messages in thread
From: James Harper @ 2011-03-10  6:27 UTC (permalink / raw)
  To: MaoXiaoyun; +Cc: xen devel

>
> It looks like KeSetTargetProcessorDpc(&xi->rx_dpc, 0) binds rx_dpc to
> VCPU0 only, while the interrupts for xennet are in fact distributed
> across all VCPUs.
>
> By using IntFiltr from http://support.microsoft.com/kb/252867 to set the
> interrupt affinity to VCPU0 only, with KeSetTargetProcessorDpc left in
> place (not commented out), we also get quite stable ping times, below 1 ms.
>
> So I think this is the problem: the KeSetTargetProcessorDpc call should
> be removed.
>

Ah. So when the cpu for the irq is different to the cpu for the dpc, you
get the extra delay. That makes sense. It would also explain why XP
didn't seem to see the same problem as I think the IRQ is directed to
CPU0 there... I've been looking for the docs on what's different and
can't find anything.

If you can confirm that you have no problems with removing
KeSetTargetProcessorDpc I'll remove it, at least for >W2003 builds until
I find the docs about what NDIS expects to do on what CPU.

James


* RE: RE: Rather slow time of Pin in Windows with GPL PV driver
  2011-03-10  6:27             ` James Harper
@ 2011-03-10  9:27               ` Paul Durrant
  2011-03-10  9:30                 ` RE: Rather slow time of Pin in Windows with GPL PVdriver James Harper
  0 siblings, 1 reply; 28+ messages in thread
From: Paul Durrant @ 2011-03-10  9:27 UTC (permalink / raw)
  To: James Harper, MaoXiaoyun; +Cc: xen devel

You have to be careful here. Xen will only ever deliver the evtchn interrupt to VCPU0. I can't immediately see anything preventing an HVM domain trying to bind an evtchn to another VCPU but you can see from the code in hvm_assert_evtchn_irq() that the guest will only be kicked for events bound to VCPU0 (is_hvm_pv_evtchn_vcpu() will only be true for Linux PVonHVM domains). Thus if you bind your DPC to a CPU other than zero and don't set it to HighImportance then it will not be immediately scheduled since default DPC importance is MediumImportance.

  Paul


* RE: RE: Rather slow time of Pin in Windows with GPL PVdriver
  2011-03-10  9:27               ` Paul Durrant
@ 2011-03-10  9:30                 ` James Harper
  2011-03-10 10:34                   ` Paul Durrant
  0 siblings, 1 reply; 28+ messages in thread
From: James Harper @ 2011-03-10  9:30 UTC (permalink / raw)
  To: Paul Durrant, MaoXiaoyun; +Cc: xen devel

>
> You have to be careful here. Xen will only ever deliver the evtchn
> interrupt to VCPU0. I can't immediately see anything preventing an HVM
> domain trying to bind an evtchn to another VCPU but you can see from the
> code in hvm_assert_evtchn_irq() that the guest will only be kicked for
> events bound to VCPU0 (is_hvm_pv_evtchn_vcpu() will only be true for
> Linux PVonHVM domains). Thus if you bind your DPC to a CPU other than
> zero and don't set it to HighImportance then it will not be immediately
> scheduled since default DPC importance is MediumImportance.
>

Are you sure? That's not what I remember seeing. You always have to
query shared_info_area->vcpu_info[0] not
shared_info_area->vcpu_info[vcpu], but the actual VCPU the interrupt is
scheduled onto can be any.

James


* RE: RE: Rather slow time of Pin in Windows with GPL PVdriver
  2011-03-10  9:30                 ` RE: Rather slow time of Pin in Windows with GPL PVdriver James Harper
@ 2011-03-10 10:34                   ` Paul Durrant
  2011-03-10 10:41                     ` James Harper
  0 siblings, 1 reply; 28+ messages in thread
From: Paul Durrant @ 2011-03-10 10:34 UTC (permalink / raw)
  To: James Harper, MaoXiaoyun; +Cc: xen devel

Yeah, you're right. We have a patch in XenServer to just use the lowest numbered vCPU but in unstable it still pointlessly round robins. Thus, if you bind DPCs and don't set their importance up you will end up with them not being immediately scheduled quite a lot of the time.

  Paul


* RE: RE: Rather slow time of Pin in Windows with GPL PVdriver
  2011-03-10 10:34                   ` Paul Durrant
@ 2011-03-10 10:41                     ` James Harper
  2011-03-10 11:05                       ` Paul Durrant
  0 siblings, 1 reply; 28+ messages in thread
From: James Harper @ 2011-03-10 10:41 UTC (permalink / raw)
  To: Paul Durrant, MaoXiaoyun; +Cc: xen devel

>
> Yeah, you're right. We have a patch in XenServer to just use the lowest
> numbered vCPU but in unstable it still pointlessly round robins. Thus,
> if you bind DPCs and don't set their importance up you will end up with
> them not being immediately scheduled quite a lot of the time.
>

You say "pointlessly round robins"... why is the behaviour considered
pointless? (assuming you don't use bound DPCs)

I'm looking at my networking code, and if I could schedule DPCs on
processors on a round-robin basis (e.g. because the IRQs are submitted on
a round-robin basis), one CPU could grab the rx ring lock, pull the data
off the ring into local buffers, release the lock, then process the
local buffers (build packets, submit to NDIS, etc.). While the first CPU
is processing packets, another CPU can then start servicing the ring
too.

If Xen is changed to always send the IRQ to CPU zero then I'd have to
start round-robining DPCs myself if I wanted to do it that way...

Currently I'm suffering a bit from the small ring sizes not being able
to hold enough buffers to keep packets flowing quickly in all
situations.

James


* RE: RE: Rather slow time of Pin in Windows with GPL PVdriver
  2011-03-10 10:41                     ` James Harper
@ 2011-03-10 11:05                       ` Paul Durrant
  2011-03-10 18:22                         ` Pasi Kärkkäinen
  2011-03-11  5:10                         ` RE: Rather slow time of Pin in Windows with GPL PVdriver MaoXiaoyun
  0 siblings, 2 replies; 28+ messages in thread
From: Paul Durrant @ 2011-03-10 11:05 UTC (permalink / raw)
  To: James Harper, MaoXiaoyun; +Cc: xen devel

It's kind of pointless because you're always having to go to vCPU0's shared info for the event info, so you're just going to keep pinging this between caches all the time. Same holds true of data you access in your DPC if it's constantly moving around. Better IMO to keep locality by default and distribute DPCs accessing distinct data explicitly.

  Paul


* Re: RE: Rather slow time of Pin in Windows with GPL PVdriver
  2011-03-10 11:05                       ` Paul Durrant
@ 2011-03-10 18:22                         ` Pasi Kärkkäinen
  2011-03-11  9:53                           ` Paul Durrant
  2011-03-11  5:10                         ` RE: Rather slow time of Pin in Windows with GPL PVdriver MaoXiaoyun
  1 sibling, 1 reply; 28+ messages in thread
From: Pasi Kärkkäinen @ 2011-03-10 18:22 UTC (permalink / raw)
  To: Paul Durrant; +Cc: MaoXiaoyun, James Harper, xen devel

On Thu, Mar 10, 2011 at 11:05:56AM +0000, Paul Durrant wrote:
> It's kind of pointless because you're always having to go to vCPU0's shared info for the event info, so you're just going to keep pinging this between caches all the time. Same holds true of data you access in your DPC if it's constantly moving around. Better IMO to keep locality by default and distribute DPCs accessing distinct data explicitly.
> 

Should this patch be upstreamed then? 

-- Pasi


* RE: RE: Rather slow time of Pin in Windows with GPL PVdriver
  2011-03-10 11:05                       ` Paul Durrant
  2011-03-10 18:22                         ` Pasi Kärkkäinen
@ 2011-03-11  5:10                         ` MaoXiaoyun
  2011-03-12 23:15                           ` James Harper
  2011-03-14  0:45                           ` James Harper
  1 sibling, 2 replies; 28+ messages in thread
From: MaoXiaoyun @ 2011-03-11  5:10 UTC (permalink / raw)
  To: paul.durrant, james.harper; +Cc: xen devel


[-- Attachment #1.1: Type: text/plain, Size: 2389 bytes --]


Hi Paul:

Sorry, I don't fully follow your point.
One quick question: when you mention "pointless round robin", which piece of code are you referring to?

Thanks.
 
> From: Paul.Durrant@citrix.com
> To: james.harper@bendigoit.com.au; tinnycloud@hotmail.com
> CC: xen-devel@lists.xensource.com
> Date: Thu, 10 Mar 2011 11:05:56 +0000
> Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with GPL PVdriver
> 
> It's kind of pointless because you're always having to go to vCPU0's shared info for the event info. so you're just going to keep pinging this between caches all the time. Same holds true of data you access in your DPC if it's constantly moving around. Better IMO to keep locality by default and distribute DPCs accessing distinct data explicitly.
> 
> Paul
> 
> > -----Original Message-----
> > From: James Harper [mailto:james.harper@bendigoit.com.au]
> > Sent: 10 March 2011 10:41
> > To: Paul Durrant; MaoXiaoyun
> > Cc: xen devel
> > Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with
> > GPL PVdriver
> > 
> > >
> > > Yeah, you're right. We have a patch in XenServer to just use the
> > > lowest numbered vCPU but in unstable it still pointlessly round
> > > robins. Thus, if you bind DPCs and don't set their importance up
> > > you will end up with them not being immediately scheduled quite a
> > > lot of the time.
> > >
> > 
> > You say "pointlessly round robins"... why is the behaviour
> > considered pointless? (assuming you don't use bound DPCs)
> > 
> > I'm looking at my networking code and if I could schedule DPC's on
> > processors on a round-robin basis (eg because the IRQ's are
> > submitted on a round robin basis), one CPU could grab the rx ring
> > lock, pull the data off the ring into local buffers, release the
> > lock, then process the local buffers (build packets, submit to NDIS,
> > etc). While the first CPU is processing packets, another CPU can
> > then start servicing the ring too.
> > 
> > If Xen is changed to always send the IRQ to CPU zero then I'd have
> > to start round-robining DPC's myself if I wanted to do it that
> > way...
> > 
> > Currently I'm suffering a bit from the small ring sizes not being
> > able to hold enough buffers to keep packets flowing quickly in all
> > situations.
> > 
> > James
 		 	   		  

* RE: RE: Rather slow time of Pin in Windows with GPL PVdriver
  2011-03-10 18:22                         ` Pasi Kärkkäinen
@ 2011-03-11  9:53                           ` Paul Durrant
  2011-03-13 23:43                             ` RE: Rather slow time of Ping in Windows with GPLPVdriver James Harper
  0 siblings, 1 reply; 28+ messages in thread
From: Paul Durrant @ 2011-03-11  9:53 UTC (permalink / raw)
  To: Pasi Kärkkäinen; +Cc: MaoXiaoyun, James Harper, xen devel

I did post a patch ages ago. It was deemed a bit too hacky. I think it would probably be better to re-examine the way Windows PV drivers are handling interrupts. It would be much nicer if we could properly bind event channels across all our vCPUs; we may be able to leverage what Stefano did for Linux PV-on-HVM.

  Paul
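For context on what binding event channels across vCPUs involves: in Xen's original 2-level event-channel ABI, pending bits for all ports live in the shared `evtchn_pending[]` bitmap, while each vCPU's `vcpu_info` carries its own `evtchn_pending_sel` selector saying which words of that bitmap to scan. Xen flags the selector of the vCPU a port is bound to (`EVTCHNOP_bind_vcpu` changes the binding), so if every port stays on vCPU0 the scan always goes through vCPU0's `vcpu_info`. The sketch below is a simplified, single-threaded model. The struct names are hypothetical, the field names follow Xen's public headers, and real code uses atomic operations where this uses plain loads and stores.

```c
#include <assert.h>
#include <stdint.h>

#define EVTCHN_WORD_BITS 64u
#define EVTCHN_WORDS     64u   /* 64 x 64 = 4096 ports (2-level ABI, 64-bit guest) */

/* Hypothetical, simplified mirrors of the shared_info / vcpu_info fields */
struct shared_info_sketch {
    uint64_t evtchn_pending[EVTCHN_WORDS];
    uint64_t evtchn_mask[EVTCHN_WORDS];
};

struct vcpu_info_sketch {
    uint8_t  evtchn_upcall_pending;
    uint64_t evtchn_pending_sel;   /* one bit per word of evtchn_pending */
};

/* Collect the unmasked pending ports flagged in this vCPU's selector into
   out[], clearing each pending bit as it is taken. Masked ports stay
   pending. If out[] fills up, the unscanned words are re-flagged. */
unsigned scan_events(struct shared_info_sketch *s, struct vcpu_info_sketch *v,
                     unsigned *out, unsigned max_out)
{
    unsigned n = 0;
    v->evtchn_upcall_pending = 0;
    uint64_t sel = v->evtchn_pending_sel;   /* real code: atomic exchange with 0 */
    v->evtchn_pending_sel = 0;

    while (sel) {
        unsigned w = (unsigned)__builtin_ctzll(sel);
        assert(w < EVTCHN_WORDS);
        sel &= sel - 1;                     /* drop this word from the local selector */
        uint64_t bits = s->evtchn_pending[w] & ~s->evtchn_mask[w];
        while (bits) {
            if (n == max_out) {             /* no room: leave the rest for later */
                v->evtchn_pending_sel |= sel | ((uint64_t)1 << w);
                v->evtchn_upcall_pending = 1;
                return n;
            }
            unsigned b = (unsigned)__builtin_ctzll(bits);
            bits &= bits - 1;
            s->evtchn_pending[w] &= ~((uint64_t)1 << b);  /* real code: atomic clear */
            out[n++] = w * EVTCHN_WORD_BITS + b;
        }
    }
    return n;
}
```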

> -----Original Message-----
> From: Pasi Kärkkäinen [mailto:pasik@iki.fi]
> Sent: 10 March 2011 18:23
> To: Paul Durrant
> Cc: James Harper; MaoXiaoyun; xen devel
> Subject: Re: [Xen-devel] RE: Rather slow time of Pin in Windows with
> GPL PVdriver
> 
> On Thu, Mar 10, 2011 at 11:05:56AM +0000, Paul Durrant wrote:
> > It's kind of pointless because you're always having to go to
> > vCPU0's shared info for the event info. so you're just going to
> > keep pinging this between caches all the time. Same holds true of
> > data you access in your DPC if it's constantly moving around.
> > Better IMO to keep locality by default and distribute DPCs
> > accessing distinct data explicitly.
> >
> 
> Should this patch be upstreamed then?
> 
> -- Pasi
> 
> >   Paul
> >
> > > -----Original Message-----
> > > From: James Harper [mailto:james.harper@bendigoit.com.au]
> > > Sent: 10 March 2011 10:41
> > > To: Paul Durrant; MaoXiaoyun
> > > Cc: xen devel
> > > Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows
> with
> > > GPL PVdriver
> > >
> > > >
> > > > Yeah, you're right. We have a patch in XenServer to just use
> > > > the lowest numbered vCPU but in unstable it still pointlessly
> > > > round robins. Thus, if you bind DPCs and don't set their
> > > > importance up you will end up with them not being immediately
> > > > scheduled quite a lot of the time.
> > > >
> > >
> > > You say "pointlessly round robins"... why is the behaviour
> > > considered pointless? (assuming you don't use bound DPCs)
> > >
> > > I'm looking at my networking code and if I could schedule DPC's
> > > on processors on a round-robin basis (eg because the IRQ's are
> > > submitted on a round robin basis), one CPU could grab the rx
> > > ring lock, pull the data off the ring into local buffers, release
> > > the lock, then process the local buffers (build packets, submit
> > > to NDIS, etc). While the first CPU is processing packets, another
> > > CPU can then start servicing the ring too.
> > >
> > > If Xen is changed to always send the IRQ to CPU zero then I'd
> > > have to start round-robining DPC's myself if I wanted to do it
> > > that way...
> > >
> > > Currently I'm suffering a bit from the small ring sizes not
> > > being able to hold enough buffers to keep packets flowing
> > > quickly in all situations.
> > >
> > > James

* RE: RE: Rather slow time of Pin in Windows with GPL PVdriver
  2011-03-11  5:10                         ` RE: Rather slow time of Pin in Windows with GPL PVdriver MaoXiaoyun
@ 2011-03-12 23:15                           ` James Harper
  2011-03-14  0:45                           ` James Harper
  1 sibling, 0 replies; 28+ messages in thread
From: James Harper @ 2011-03-12 23:15 UTC (permalink / raw)
  To: MaoXiaoyun, paul.durrant; +Cc: xen devel

I've just pushed a bit of a rewrite of the rx path in gplpv. It's not
particularly well tested yet, but I can't get it to crash. It should
scale much better with SMP too. I'm using more lock-free data structures,
so the locks are held for much less time.
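The rx pattern described earlier in the thread (take the rx ring lock, copy whatever is on the ring into local buffers, drop the lock, then build packets and indicate them to NDIS) is a common way to keep lock hold times short. A minimal portable C sketch follows. The ring layout, `RING_SIZE`, and all function names are hypothetical, a C11 `atomic_flag` spinlock stands in for the driver's spin lock, and `rx_push` only simulates the backend filling the ring for testing. It is not how netback actually produces responses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256u              /* hypothetical; must be a power of two */

struct rx_ring {
    atomic_flag lock;               /* stand-in for the driver's rx spin lock */
    uint32_t prod, cons;            /* free-running producer/consumer indices */
    uint32_t slots[RING_SIZE];      /* stand-in for rx responses / buffer ids */
};

static void ring_lock(struct rx_ring *r)
{
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        ;                           /* spin until the flag was previously clear */
}

static void ring_unlock(struct rx_ring *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Test-only producer standing in for the backend; returns 0 if the ring is full. */
int rx_push(struct rx_ring *r, uint32_t value)
{
    int ok = 0;
    ring_lock(r);
    if (r->prod - r->cons < RING_SIZE) {
        r->slots[r->prod++ & (RING_SIZE - 1)] = value;
        ok = 1;
    }
    ring_unlock(r);
    return ok;
}

/* Copy up to max entries into a caller-owned local batch while holding the
   lock, then release it. The caller builds packets from batch[] with the
   lock dropped, so another CPU can service the ring in the meantime. */
size_t rx_drain(struct rx_ring *r, uint32_t *batch, size_t max)
{
    size_t n = 0;
    ring_lock(r);
    assert(r->prod - r->cons <= RING_SIZE);   /* ring never overfills */
    while (r->cons != r->prod && n < max)
        batch[n++] = r->slots[r->cons++ & (RING_SIZE - 1)];
    ring_unlock(r);
    return n;
}
```

Note that this alone does not keep packets in order if two CPUs drain concurrently; each CPU gets its own batch, which is the ordering problem raised later in the thread.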

James

> -----Original Message-----
> From: MaoXiaoyun [mailto:tinnycloud@hotmail.com]
> Sent: Friday, 11 March 2011 16:10
> To: paul.durrant@citrix.com; James Harper
> Cc: xen devel
> Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with
> GPL PVdriver
> 
> Hi Paul:
> 
>       Sorry I'm not fully follow your point.
>       One quick question is when you mention "pointless round robin",
>       which piece of code did you refer to?
> 
> thanks.
> 
> > From: Paul.Durrant@citrix.com
> > To: james.harper@bendigoit.com.au; tinnycloud@hotmail.com
> > CC: xen-devel@lists.xensource.com
> > Date: Thu, 10 Mar 2011 11:05:56 +0000
> > Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with
> > GPL PVdriver
> >
> > It's kind of pointless because you're always having to go to vCPU0's
> > shared info for the event info. so you're just going to keep pinging
> > this between caches all the time. Same holds true of data you access
> > in your DPC if it's constantly moving around. Better IMO to keep
> > locality by default and distribute DPCs accessing distinct data
> > explicitly.
> >
> > Paul
> >
> > > -----Original Message-----
> > > From: James Harper [mailto:james.harper@bendigoit.com.au]
> > > Sent: 10 March 2011 10:41
> > > To: Paul Durrant; MaoXiaoyun
> > > Cc: xen devel
> > > Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows
> > > with GPL PVdriver
> > >
> > > >
> > > > Yeah, you're right. We have a patch in XenServer to just use the
> > > > lowest numbered vCPU but in unstable it still pointlessly round
> > > > robins. Thus, if you bind DPCs and don't set their importance up
> > > > you will end up with them not being immediately scheduled quite
> > > > a lot of the time.
> > > >
> > >
> > > You say "pointlessly round robins"... why is the behaviour
> > > considered pointless? (assuming you don't use bound DPCs)
> > >
> > > I'm looking at my networking code and if I could schedule DPC's on
> > > processors on a round-robin basis (eg because the IRQ's are
> > > submitted on a round robin basis), one CPU could grab the rx ring
> > > lock, pull the data off the ring into local buffers, release the
> > > lock, then process the local buffers (build packets, submit to
> > > NDIS, etc). While the first CPU is processing packets, another CPU
> > > can then start servicing the ring too.
> > >
> > > If Xen is changed to always send the IRQ to CPU zero then I'd have
> > > to start round-robining DPC's myself if I wanted to do it that
> > > way...
> > >
> > > Currently I'm suffering a bit from the small ring sizes not being
> > > able to hold enough buffers to keep packets flowing quickly in all
> > > situations.
> > >
> > > James

* RE: RE: Rather slow time of Ping in Windows with GPLPVdriver
  2011-03-11  9:53                           ` Paul Durrant
@ 2011-03-13 23:43                             ` James Harper
  2011-03-14 10:22                               ` Paul Durrant
  0 siblings, 1 reply; 28+ messages in thread
From: James Harper @ 2011-03-13 23:43 UTC (permalink / raw)
  To: Paul Durrant, Pasi Kärkkäinen; +Cc: MaoXiaoyun, xen devel

> 
> I did post a patch ages ago. It was deemed a bit too hacky. I think it would
> probably be better to re-examine the way Windows PV drivers are handling
> interrupts. It would be much nicer if we could properly bind event channels
> across all our vCPUs; we may be able to leverage what Stefano did for Linux
> PV-on-HVM.
> 

What would also be nice is to have multiple interrupts attached to the platform PCI driver, bind events to a specific interrupt, and be able to control the affinity of each interrupt.

Another idea would be for each xenbus device to hotplug a new PCI device with its own interrupt. That only works for OSes that support PCI hotplug though...

MSI interrupts might be another way of conveying event channel information as part of the interrupt, but I don't know enough about how MSI works to know if that is possible. I believe you still need one IRQ per 'message ID', so you're back to my first wish item.

Under Windows, if we set the affinity of the platform PCI IRQ to CPU0, will that do the job (bind the IRQ to CPU0), or are there inefficiencies in doing that?

James

* RE: RE: Rather slow time of Pin in Windows with GPL PVdriver
  2011-03-11  5:10                         ` RE: Rather slow time of Pin in Windows with GPL PVdriver MaoXiaoyun
  2011-03-12 23:15                           ` James Harper
@ 2011-03-14  0:45                           ` James Harper
  2011-03-14  2:44                             ` MaoXiaoyun
  2011-03-14 10:32                             ` RE: Rather slow time of Pin " Paul Durrant
  1 sibling, 2 replies; 28+ messages in thread
From: James Harper @ 2011-03-14  0:45 UTC (permalink / raw)
  To: MaoXiaoyun, paul.durrant; +Cc: xen devel

> 
> I've just pushed a bit of a rewrite of the rx path in gplpv. It's not
> particularly well tested yet but I can't get it to crash. It should
> scale much better with SMP too. I'm using more lock free data
> structures so the lock's are held for much less time.
> 

Unfortunately performance still isn't good. What I've found is that NDIS
really does want you to process packets on only one CPU at a time (e.g.
CPU0); otherwise they are indicated to NDIS out of order, causing serious
performance problems (according to the docs).

In addition to KeSetTargetProcessorDpc(&xi->rx_dpc, 0), we also need to
call KeSetImportanceDpc(&xi->rx_dpc, HighImportance), as Paul stated,
which makes sure the DPC runs immediately even if it is triggered from
another CPU (I assume this has IPI overhead though). I think I could
detect more than one CPU and schedule rx and tx on different CPUs from
each other, but each always on the same CPU.
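The rx/tx placement idea above can be sketched as a small helper. The helper, the `dpc_targets` struct, `RxDpc`/`TxDpc`, `xi->tx_dpc` and `cpu_count` are hypothetical. `KeInitializeDpc`, `KeSetTargetProcessorDpc` and `KeSetImportanceDpc` are the WDK routines referred to in this thread, and they appear only in a comment because the snippet is compiled here as portable C.

```c
#include <assert.h>

/* Fixed target processors for the rx and tx DPCs (hypothetical helper). */
struct dpc_targets {
    unsigned rx_cpu;
    unsigned tx_cpu;
};

/* rx always runs on CPU0. With more than one CPU, tx is moved to CPU1 so the
   two paths do not share a processor, but each path always lands on the same
   CPU, so rx packets are still indicated in order. */
struct dpc_targets choose_dpc_targets(unsigned active_cpus)
{
    struct dpc_targets t = { 0u, 0u };
    assert(active_cpus > 0);
    if (active_cpus > 1)
        t.tx_cpu = 1u;
    return t;
}

/* In the driver (WDK, not compiled here), the result would be applied once
   when the DPCs are initialised; cpu_count is however the driver obtains it:

     struct dpc_targets t = choose_dpc_targets(cpu_count);
     KeInitializeDpc(&xi->rx_dpc, RxDpc, xi);
     KeSetTargetProcessorDpc(&xi->rx_dpc, (CCHAR)t.rx_cpu);
     KeSetImportanceDpc(&xi->rx_dpc, HighImportance);
     KeInitializeDpc(&xi->tx_dpc, TxDpc, xi);
     KeSetTargetProcessorDpc(&xi->tx_dpc, (CCHAR)t.tx_cpu);
     KeSetImportanceDpc(&xi->tx_dpc, HighImportance);
*/
```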

Windows does support RSS, which ensures per-connection in-order
processing of packets. From reading the "Receive-Side Scaling
Enhancements in Windows Server 2008" document, it appears that we would
need to hash various fields in the packet header to compute a CPU
number for that connection, then schedule the DPC onto that CPU. That
shouldn't be too hard, except that xennet.sys is an NDIS 5.1 driver, not
an NDIS 6.0 driver, and supporting NDIS 6.0 would mean maintaining two
trees, which I'm reluctant to do without a very good reason.
Other docs state that RSS is supported on Windows 2003 SP2, but I can't
find any specifics; I've asked the question on the ntdev list.
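The RSS steps described above (hash the header fields, turn the hash into a CPU number, queue the DPC there) can be sketched as follows. RSS specifies the Toeplitz hash over a secret key, with the low-order bits of the hash indexing an indirection table of CPU numbers. Which header fields are hashed (for example IPv4 addresses plus TCP ports) and the table contents are configured by the stack, so here the input is just a byte string and the table is a caller-supplied array. This is portable C, not NDIS code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toeplitz hash: for every set input bit, taken most-significant bit first,
   XOR in the 32-bit window of the key that starts at that bit position.
   key_len should be >= data_len + 4; key bits past the end are taken as 0. */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                       const uint8_t *data, size_t data_len)
{
    assert(key_len >= 4);
    uint32_t result = 0;
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    for (size_t i = 0; i < data_len; i++) {
        /* key byte whose bits are shifted into the window during this byte */
        uint8_t next = (i + 4 < key_len) ? key[i + 4] : 0;
        for (int b = 7; b >= 0; b--) {
            if (data[i] & (1u << b))
                result ^= window;
            window = (window << 1) | ((uint32_t)(next >> b) & 1u);
        }
    }
    return result;
}

/* Map a hash to a CPU via an indirection table whose size is a power of two,
   using the low-order bits of the hash as the index. */
unsigned rss_select_cpu(uint32_t hash, const uint8_t *table, size_t table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return table[hash & (table_size - 1)];
}
```

Because a given connection's header fields always hash to the same value, all of its packets land on the same CPU. This preserves per-connection ordering while still spreading different connections across CPUs.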

James

* RE: RE: Rather slow time of Pin in Windows with GPL PVdriver
  2011-03-14  0:45                           ` James Harper
@ 2011-03-14  2:44                             ` MaoXiaoyun
  2011-03-14  3:10                               ` RE: Rather slow time of Ping " James Harper
  2011-03-14 10:32                             ` RE: Rather slow time of Pin " Paul Durrant
  1 sibling, 1 reply; 28+ messages in thread
From: MaoXiaoyun @ 2011-03-14  2:44 UTC (permalink / raw)
  To: James Harper, paul.durrant; +Cc: xen devel


Do you mean that if we remove KeSetTargetProcessorDpc(&xi->rx_dpc, 0), the interrupts
will be processed on different VCPUs, but this will cause a serious performance issue?
Where can I find the related docs?

So we actually need KeSetImportanceDpc(&xi->rx_dpc, HighImportance) to solve
the ping problem. Performance may not be the best, but it should not decrease, right?

Many thanks.
 
> Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with GPL PVdriver
> Date: Mon, 14 Mar 2011 11:45:46 +1100
> From: james.harper@bendigoit.com.au
> To: tinnycloud@hotmail.com; paul.durrant@citrix.com
> CC: xen-devel@lists.xensource.com
> 
> > 
> > I've just pushed a bit of a rewrite of the rx path in gplpv. It's not
> > particularly well tested yet but I can't get it to crash. It should
> > scale much better with SMP too. I'm using more lock free data
> > structures so the lock's are held for much less time.
> > 
> 
> Unfortunately performance still isn't good. What I've found is that NDIS
> really does want you to only process packets on one CPU at one time (eg
> CPU0), otherwise they are indicated to NDIS out of order causing serious
> performance problems (according to the docs).
> 
> In addition to KeSetTargetProcessorDpc(&xi->rx_dpc, 0), we also need to
> do KeSetImportanceDpc(&xi->rx_dpc, HighImportance) - as Paul stated,
> which makes sure the DPC runs immediately even if it is triggered from
> another CPU (I assume this has IPI overhead though). I think I could
> detect >1 CPU's and schedule the rx and tx onto different CPU's to each
> other, but always the same CPU.
> 
> Windows does support RSS which ensures per-connection in-order
> processing of packets. From reading the "Receive-Side Scaling
> Enhancements in Windows Server 2008" document, it appears that we would
> need to hash various fields in the packet header and compute a CPU
> number for that connection, then schedule the DPC onto that CPU. It
> shouldn't be that hard except that xennet.sys is an NDIS5.1 driver, not
> an NDIS6.0 driver, and in order to support NDIS6.0 I would need to
> maintain two trees which I'm reluctant to do without a very good reason.
> Other docs state the RSS is supported for Windows 2003 SP2 but I can't
> find any specifics - I've asked the question on the ntdev list.
> 
> James
 		 	   		  

* RE: RE: Rather slow time of Ping in Windows with GPL PVdriver
  2011-03-14  2:44                             ` MaoXiaoyun
@ 2011-03-14  3:10                               ` James Harper
  2011-03-14  3:48                                 ` MaoXiaoyun
  0 siblings, 1 reply; 28+ messages in thread
From: James Harper @ 2011-03-14  3:10 UTC (permalink / raw)
  To: MaoXiaoyun, paul.durrant; +Cc: xen devel

> 
> Do you mean if we discard KeSetTargetProcessorDpc(&xi->rx_dpc, 0) , the
> interrupts will be processed across on different VCPUS, but will cause
> serious performance issue?
> Where could I find the releated docs?
> 
> So actually we need do KeSetImportanceDpc(&xi->rx_dpc, HighImportance)
> to solve ping problem.  Though performance is not the best, but it
> should not decrease, right?
> 

In my testing, without the KeSetTargetProcessorDpc, iperf would give
inconsistent results, which I assume is because packets were being
delivered to NDIS out of order.

KeSetImportanceDpc(HighImportance) should resolve the 15ms response time
you were seeing as the DPC will be immediately scheduled on the other
processor, rather than scheduled some time later.
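A rough worked estimate connects this explanation to the roughly 15 ms worst case in the original report. It rests on two assumptions, neither confirmed in this thread: that a DPC queued to another vCPU without high importance is not run until that vCPU's next clock interrupt, and that the guest uses the common Windows default clock rate of 64 Hz. The extra wait is then bounded by one clock period, and with packets arriving at uniformly random times it averages about half of one:

```latex
T_{\text{tick}} = \frac{1}{64\ \text{Hz}} = 15.625\ \text{ms},
\qquad 0 \le \Delta t_{\text{DPC}} < T_{\text{tick}},
\qquad \mathbb{E}[\Delta t_{\text{DPC}}] \approx \frac{T_{\text{tick}}}{2} \approx 7.8\ \text{ms}
```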

James

* RE: RE: Rather slow time of Ping in Windows with GPL PVdriver
  2011-03-14  3:10                               ` RE: Rather slow time of Ping " James Harper
@ 2011-03-14  3:48                                 ` MaoXiaoyun
  2011-03-14  3:50                                   ` James Harper
  2011-03-14 10:35                                   ` Paul Durrant
  0 siblings, 2 replies; 28+ messages in thread
From: MaoXiaoyun @ 2011-03-14  3:48 UTC (permalink / raw)
  To: james.harper, paul.durrant; +Cc: xen devel


Thanks James.

I will do some iperf tests too.

One more question:
Is Paul's statement that "Xen will only ever deliver the evtchn interrupt to VCPU0" right?
If so, how do we explain the log I printed before?
It looks like packets were handled on all VCPUs.

===============Result================================
      1) KeSetTargetProcessorDpc(&xi->rx_dpc, 0) is commented out.
       XnetNet       pcpu = 1
       XnetNet       pcpu = 3
       XnetNet       pcpu = 2
       XnetNet       pcpu = 3
       XnetNet       pcpu = 7
       XnetNet       pcpu = 0
       XnetNet       pcpu = 5
       XnetNet       pcpu = 3
       XnetNet       pcpu = 0
       XnetNet       pcpu = 3
       XnetNet       pcpu = 7
       XnetNet       pcpu = 4
       XnetNet       pcpu = 5
       XnetNet       pcpu = 2
       XnetNet       pcpu = 4
       XnetNet       pcpu = 5
       XnetNet       pcpu = 6
       XnetNet       pcpu = 0
       XnetNet       pcpu = 6
 
> Subject: RE: [Xen-devel] RE: Rather slow time of Ping in Windows with GPL PVdriver
> Date: Mon, 14 Mar 2011 14:10:46 +1100
> From: james.harper@bendigoit.com.au
> To: tinnycloud@hotmail.com; paul.durrant@citrix.com
> CC: xen-devel@lists.xensource.com
> 
> > 
> > Do you mean if we discard KeSetTargetProcessorDpc(&xi->rx_dpc, 0) , the
> > interrupts will be processed across on different VCPUS, but will cause
> > serious performance issue?
> > Where could I find the releated docs?
> > 
> > So actually we need do KeSetImportanceDpc(&xi->rx_dpc, HighImportance)
> > to solve ping problem. Though performance is not the best, but it
> > should not decrease, right?
> > 
> 
> In my testing, without the KeSetTargetProcessorDpc, iperf would give
> inconsistent results, which I assume is because packets were being
> delivered to NDIS out of order.
> 
> KeSetImportanceDpc(HighImportance) should resolve the 15ms response time
> you were seeing as the DPC will be immediately scheduled on the other
> processor, rather than scheduled some time later.
> 
> James
 		 	   		  

* RE: RE: Rather slow time of Ping in Windows with GPL PVdriver
  2011-03-14  3:48                                 ` MaoXiaoyun
@ 2011-03-14  3:50                                   ` James Harper
  2011-03-14 10:35                                   ` Paul Durrant
  1 sibling, 0 replies; 28+ messages in thread
From: James Harper @ 2011-03-14  3:50 UTC (permalink / raw)
  To: MaoXiaoyun, paul.durrant; +Cc: xen devel

> Thanks James.
> 
> I will do some iperf test either.
> 
> One more quesiton:
> Does "Xen will only ever deliver the evtchn interrupt to VCPU0"
> mentioned by Paul right?

I think he later mentioned that the feature he referred to wasn't in the
version we were using, only in the Citrix version.

James

* RE: RE: Rather slow time of Ping in Windows with GPLPVdriver
  2011-03-13 23:43                             ` RE: Rather slow time of Ping in Windows with GPLPVdriver James Harper
@ 2011-03-14 10:22                               ` Paul Durrant
  0 siblings, 0 replies; 28+ messages in thread
From: Paul Durrant @ 2011-03-14 10:22 UTC (permalink / raw)
  To: James Harper, Pasi Kärkkäinen; +Cc: MaoXiaoyun, xen devel

Nope, limiting the affinity mask before your IoConnectInterrupt(Ex) will work just fine, although you do risk Windows not giving you an interrupt if it decides for some reason that it's out of vectors on CPU0. Pretty small risk though, given that it's shareable :-)

  Paul
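Paul's suggestion amounts to narrowing the processor mask before the interrupt is connected. With `IoConnectInterrupt` that mask is the `ProcessorEnableMask` argument. The portable sketch below uses a hypothetical `affinity_mask_t` in place of `KAFFINITY`. Falling back to the original mask when CPU0 is not allowed is an illustrative choice and was not discussed in the thread.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t affinity_mask_t;   /* hypothetical stand-in for KAFFINITY */

/* Narrow the set of processors an interrupt may be delivered to down to
   CPU0 when CPU0 is in the allowed set; otherwise keep the original mask. */
affinity_mask_t restrict_to_cpu0(affinity_mask_t allowed)
{
    const affinity_mask_t cpu0 = (affinity_mask_t)1;
    assert(allowed != 0);
    return (allowed & cpu0) ? cpu0 : allowed;
}
```

As Paul notes, the remaining risk is Windows refusing the connection if it runs out of vectors on CPU0. Narrowing the mask does nothing to avoid that.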

> -----Original Message-----
> From: James Harper [mailto:james.harper@bendigoit.com.au]
> Sent: 13 March 2011 23:44
> To: Paul Durrant; Pasi Kärkkäinen
> Cc: MaoXiaoyun; xen devel
> Subject: RE: [Xen-devel] RE: Rather slow time of Ping in Windows
> with GPLPVdriver
> 
> >
> > I did post a patch ages ago. It was deemed a bit too hacky. I think
> > it would probably be better to re-examine the way Windows PV drivers
> > are handling interrupts. It would be much nicer if we could properly
> > bind event channels across all our vCPUs; we may be able to leverage
> > what Stefano did for Linux PV-on-HVM.
> >
> 
> What would also be nice is to have multiple interrupts attached to
> the platform pci driver, and bind events to a specific interrupt,
> and be able to control the affinity of each interrupt.
> 
> Another idea would be that each xenbus device hotplugs a new pci
> device with an interrupt. That only works for OS's that support
> hotplug pci though...
> 
> MSI interrupts might be another way of conveying event channel
> information as part of the interrupt, but I don't know enough about
> how MSI works to know if that is possible. I believe you still need
> one irq per 'message id' so your back to my first wish item.
> 
> Under Windows, if we set the affinity of the platform pci irq to
> cpu0 will that do the job (bind the irq to cpu0), or are there
> inefficiencies in doing that?
> 
> James

* RE: RE: Rather slow time of Pin in Windows with GPL PVdriver
  2011-03-14  0:45                           ` James Harper
  2011-03-14  2:44                             ` MaoXiaoyun
@ 2011-03-14 10:32                             ` Paul Durrant
  2011-03-14 11:20                               ` James Harper
  1 sibling, 1 reply; 28+ messages in thread
From: Paul Durrant @ 2011-03-14 10:32 UTC (permalink / raw)
  To: James Harper, MaoXiaoyun; +Cc: xen devel

NDIS 5.x on Vista+ has some serious issues: see http://www.osronline.com/showThread.cfm?link=124242

This probably doesn't explain an immediate performance issue though. RSS is supported on Windows 2k3 SP2 IIRC but you need to bind as NDIS 5.2. I don't think it's present in the 6.x -> 5.x wrapper in Vista+ though. You'd need to use NDIS 6.1+.

  Paul

> -----Original Message-----
> From: James Harper [mailto:james.harper@bendigoit.com.au]
> Sent: 14 March 2011 00:46
> To: MaoXiaoyun; Paul Durrant
> Cc: xen devel
> Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with
> GPL PVdriver
> 
> >
> > I've just pushed a bit of a rewrite of the rx path in gplpv. It's
> > not particularly well tested yet but I can't get it to crash. It
> > should scale much better with SMP too. I'm using more lock free data
> > structures so the lock's are held for much less time.
> >
> 
> Unfortunately performance still isn't good. What I've found is that
> NDIS really does want you to only process packets on one CPU at one
> time (eg CPU0), otherwise they are indicated to NDIS out of order
> causing serious performance problems (according to the docs).
> 
> In addition to KeSetTargetProcessorDpc(&xi->rx_dpc, 0), we also need
> to do KeSetImportanceDpc(&xi->rx_dpc, HighImportance) - as Paul
> stated, which makes sure the DPC runs immediately even if it is
> triggered from another CPU (I assume this has IPI overhead though).
> I think I could detect >1 CPU's and schedule the rx and tx onto
> different CPU's to each other, but always the same CPU.
> 
> Windows does support RSS which ensures per-connection in-order
> processing of packets. From reading the "Receive-Side Scaling
> Enhancements in Windows Server 2008" document, it appears that we
> would need to hash various fields in the packet header and compute a
> CPU number for that connection, then schedule the DPC onto that CPU.
> It shouldn't be that hard except that xennet.sys is an NDIS5.1
> driver, not an NDIS6.0 driver, and in order to support NDIS6.0 I
> would need to maintain two trees which I'm reluctant to do without a
> very good reason.
> Other docs state the RSS is supported for Windows 2003 SP2 but I
> can't find any specifics - I've asked the question on the ntdev
> list.
> 
> James

* RE: RE: Rather slow time of Ping in Windows with GPL PVdriver
  2011-03-14  3:48                                 ` MaoXiaoyun
  2011-03-14  3:50                                   ` James Harper
@ 2011-03-14 10:35                                   ` Paul Durrant
  1 sibling, 0 replies; 28+ messages in thread
From: Paul Durrant @ 2011-03-14 10:35 UTC (permalink / raw)
  To: MaoXiaoyun, james.harper@bendigoit.com.au; +Cc: xen devel


No, as James said, the interrupt targeting patch is only in Citrix XenServer.

  Paul

From: MaoXiaoyun [mailto:tinnycloud@hotmail.com]
Sent: 14 March 2011 03:49
To: james.harper@bendigoit.com.au; Paul Durrant
Cc: xen devel
Subject: RE: [Xen-devel] RE: Rather slow time of Ping in Windows with GPL PVdriver

Thanks James.

I will do some iperf test either.

One more quesiton:
Does "Xen will only ever deliver the evtchn interrupt to VCPU0" mentioned by Paul right?
If so, how to explain the log I printed before?
It looks like all VCPUS have got the packets.

===============Result================================
      1) KeSetTargetProcessorDpc(&xi->rx_dpc, 0) is commentted.
       XnetNet       pcpu = 1
       XnetNet       pcpu = 3
       XnetNet       pcpu = 2
       XnetNet       pcpu = 3
       XnetNet       pcpu = 7
       XnetNet       pcpu = 0
       XnetNet       pcpu = 5
       XnetNet       pcpu = 3
       XnetNet       pcpu = 0
       XnetNet       pcpu = 3
       XnetNet       pcpu = 7
       XnetNet       pcpu = 4
       XnetNet       pcpu = 5
       XnetNet       pcpu = 2
       XnetNet       pcpu = 4
       XnetNet       pcpu = 5
       XnetNet       pcpu = 6
       XnetNet       pcpu = 0
       XnetNet       pcpu = 6



> Subject: RE: [Xen-devel] RE: Rather slow time of Ping in Windows with GPL PVdriver
> Date: Mon, 14 Mar 2011 14:10:46 +1100
> From: james.harper@bendigoit.com.au
> To: tinnycloud@hotmail.com; paul.durrant@citrix.com
> CC: xen-devel@lists.xensource.com
>
> >
> > Do you mean if we discard KeSetTargetProcessorDpc(&xi->rx_dpc, 0) , the
> > interrupts will be processed across on different VCPUS, but will cause
> > serious performance issue?
> > Where could I find the releated docs?
> >
> > So actually we need do KeSetImportanceDpc(&xi->rx_dpc, HighImportance)
> > to solve ping problem. Though performance is not the best, but it
> > should not decrease, right?
> >
>
> In my testing, without the KeSetTargetProcessorDpc, iperf would give
> inconsistent results, which I assume is because packets were being
> delivered to NDIS out of order.
>
> KeSetImportanceDpc(HighImportance) should resolve the 15ms response time
> you were seeing as the DPC will be immediately scheduled on the other
> processor, rather than scheduled some time later.
>
> James

* RE: RE: Rather slow time of Pin in Windows with GPL PVdriver
  2011-03-14 10:32                             ` RE: Rather slow time of Pin " Paul Durrant
@ 2011-03-14 11:20                               ` James Harper
  0 siblings, 0 replies; 28+ messages in thread
From: James Harper @ 2011-03-14 11:20 UTC (permalink / raw)
  To: Paul Durrant, MaoXiaoyun; +Cc: xen devel

> 
> NDIS 5.x on Vista+ has some serious issues: see
> http://www.osronline.com/showThread.cfm?link=124242
> 

I was excited for a moment, as someone has reported a problem with GPLPV where the
rx path appears to hang (due to running out of resources) after some
days, but unfortunately it's with 2003, not 2008, and it looks like it's
packets that it's running out of, not buffers. D'oh.

> This probably doesn't explain an immediate performance issue though.
> RSS is supported on Windows 2k3 SP2 IIRC but you need to bind as NDIS
> 5.2. I don't think it's present in the 6.x -> 5.x wrapper in Vista+
> though. You'd need to use NDIS 6.1+.
> 

That kind of removes the attraction a bit. It sounds like 5.2 is a bit
of an orphan, as it isn't mentioned anywhere but the SNP KB pages (in
overview form), and most of the links from there have suffered some
major bitrot and either redirect to NDIS 6.2 or to 'page not found'.

James

Thread overview: 28+ messages
2011-03-09  6:53 Rather slow time of Pin in Windows with GPL PV driver MaoXiaoyun
2011-03-09  7:58 ` John Weekes
     [not found] ` <D271C3A4-9B27-4E08-A92A-D55A811736EC@bendigoit.com.au>
2011-03-09 10:20   ` MaoXiaoyun
     [not found]   ` <BLU157-w82233DE21FFA3AC07FCC3DAC90@phx.gbl>
     [not found]     ` <AEC6C66638C05B468B556EA548C1A77D01C55DCB@trantor>
2011-03-09 11:15       ` MaoXiaoyun
2011-03-09 11:28         ` James Harper
2011-03-09 11:39           ` MaoXiaoyun
2011-03-10  3:17           ` MaoXiaoyun
2011-03-10  4:47           ` MaoXiaoyun
2011-03-10  6:27             ` James Harper
2011-03-10  9:27               ` Paul Durrant
2011-03-10  9:30                 ` RE: Rather slow time of Pin in Windows with GPL PVdriver James Harper
2011-03-10 10:34                   ` Paul Durrant
2011-03-10 10:41                     ` James Harper
2011-03-10 11:05                       ` Paul Durrant
2011-03-10 18:22                         ` Pasi Kärkkäinen
2011-03-11  9:53                           ` Paul Durrant
2011-03-13 23:43                             ` RE: Rather slow time of Ping in Windows with GPLPVdriver James Harper
2011-03-14 10:22                               ` Paul Durrant
2011-03-11  5:10                         ` RE: Rather slow time of Pin in Windows with GPL PVdriver MaoXiaoyun
2011-03-12 23:15                           ` James Harper
2011-03-14  0:45                           ` James Harper
2011-03-14  2:44                             ` MaoXiaoyun
2011-03-14  3:10                               ` RE: Rather slow time of Ping " James Harper
2011-03-14  3:48                                 ` MaoXiaoyun
2011-03-14  3:50                                   ` James Harper
2011-03-14 10:35                                   ` Paul Durrant
2011-03-14 10:32                             ` RE: Rather slow time of Pin " Paul Durrant
2011-03-14 11:20                               ` James Harper
