* [PATCH net-next] virtio: Fix affinity for >32 VCPUs
  From: Ben Serebrin @ 2017-02-03  6:19 UTC
  To: netdev, mst, jasowang, davem, willemb, venkateshs, jmattson, serebrin

From: Benjamin Serebrin <serebrin@google.com>

If the number of virtio queue pairs is not equal to the
number of VCPUs, the virtio guest driver doesn't assign
any CPU affinity for the queue interrupts or the xps
aggregation interrupt.

Google Compute Engine currently provides 1 queue pair for
every VCPU, but limits that at a maximum of 32 queue pairs.

This code assigns interrupt affinity even when there are more than
32 VCPUs.

Tested:

(on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)

Without the fix we see all queues affinitized to all CPUs:

cd /proc/irq
for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list; done
0-63
[...]
0-63

and we see all TX queues' xps_cpus affinitized to no cores:

for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done
00000000,00000000
[...]
00000000,00000000

With the fix, we see each queue assigned to a single core,
and xps affinity set to 1 unique cpu per TX queue.

64 VCPU:

cd /proc/irq
for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list; done

0-63
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
17
17
18
18
19
19
20
20
21
21
22
22
23
23
24
24
25
25
26
26
27
27
28
28
29
29
30
30
31
31
0-63
0-63
0-63
0-63

cd /sys/class/net/eth0/queues
for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done

00000001,00000001
00000002,00000002
00000004,00000004
00000008,00000008
00000010,00000010
00000020,00000020
00000040,00000040
00000080,00000080
00000100,00000100
00000200,00000200
00000400,00000400
00000800,00000800
00001000,00001000
00002000,00002000
00004000,00004000
00008000,00008000
00010000,00010000
00020000,00020000
00040000,00040000
00080000,00080000
00100000,00100000
00200000,00200000
00400000,00400000
00800000,00800000
01000000,01000000
02000000,02000000
04000000,04000000
08000000,08000000
10000000,10000000
20000000,20000000
40000000,40000000
80000000,80000000

48 VCPU:

cd /proc/irq
for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list; done
0-47
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
17
17
18
18
19
19
20
20
21
21
22
22
23
23
24
24
25
25
26
26
27
27
28
28
29
29
30
30
31
31
0-47
0-47
0-47
0-47

cd /sys/class/net/eth0/queues
for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done

0001,00000001
0002,00000002
0004,00000004
0008,00000008
0010,00000010
0020,00000020
0040,00000040
0080,00000080
0100,00000100
0200,00000200
0400,00000400
0800,00000800
1000,00001000
2000,00002000
4000,00004000
8000,00008000
0000,00010000
0000,00020000
0000,00040000
0000,00080000
0000,00100000
0000,00200000
0000,00400000
0000,00800000
0000,01000000
0000,02000000
0000,04000000
0000,08000000
0000,10000000
0000,20000000
0000,40000000
0000,80000000

Signed-off-by: Ben Serebrin <serebrin@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jim Mattson <jmattson@google.com>
Acked-by: Venkatesh Srinivas <venkateshs@google.com>
Effort: kvm
---
 drivers/net/virtio_net.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 765c2d6358da..0dc3a102bfc4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1502,20 +1502,44 @@ static void virtnet_set_affinity(struct virtnet_info *vi)
 	 * queue pairs, we let the queue pairs to be private to one cpu by
 	 * setting the affinity hint to eliminate the contention.
 	 */
-	if (vi->curr_queue_pairs == 1 ||
-	    vi->max_queue_pairs != num_online_cpus()) {
+	if (vi->curr_queue_pairs == 1) {
 		virtnet_clean_affinity(vi, -1);
 		return;
 	}

+	/* If there are more cpus than queues, then assign the queues'
+	 * interrupts to the first cpus until we run out.
+	 */
 	i = 0;
 	for_each_online_cpu(cpu) {
+		if (i == vi->max_queue_pairs)
+			break;
 		virtqueue_set_affinity(vi->rq[i].vq, cpu);
 		virtqueue_set_affinity(vi->sq[i].vq, cpu);
-		netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
 		i++;
 	}

+	/* Stripe the XPS affinities across the online CPUs.
+	 * Hyperthread pairs are typically assigned such that Linux's
+	 * CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
+	 * hyperthread twins to share TX queues, in the case where there are
+	 * more cpus than queues.
+	 */
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		struct cpumask mask;
+		int skip = i;
+
+		cpumask_clear(&mask);
+		for_each_online_cpu(cpu) {
+			while (skip--)
+				cpu = cpumask_next(cpu, cpu_online_mask);
+			if (cpu < num_possible_cpus())
+				cpumask_set_cpu(cpu, &mask);
+			skip = vi->max_queue_pairs - 1;
+		}
+		netif_set_xps_queue(vi->dev, &mask, i);
+	}
+
 	vi->affinity_hint_set = true;
 }
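
As an aside for readers tracing the new striping loop: the queue-to-CPU
mapping it computes can be modeled in a few lines of standalone C. The
sketch below is illustrative only (plain 64-bit masks stand in for
struct cpumask, and n_cpus/n_queues are hypothetical inputs), but with
64 CPUs and 32 queues it prints exactly the xps_cpus values shown in the
"Tested" section above.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
	const int n_cpus = 64;    /* hypothetical: online vCPUs in the guest */
	const int n_queues = 32;  /* hypothetical: virtio-net queue pairs */
	int q, cpu;

	for (q = 0; q < n_queues; q++) {
		uint64_t mask = 0;

		/* CPU c is striped onto queue (c % n_queues), which is what
		 * the "skip = vi->max_queue_pairs - 1" stride achieves. */
		for (cpu = q; cpu < n_cpus; cpu += n_queues)
			mask |= 1ULL << cpu;

		/* Print in the xps_cpus style: high 32 bits, low 32 bits. */
		printf("tx-%d: %08" PRIx64 ",%08" PRIx64 "\n", q,
		       mask >> 32, mask & 0xffffffffULL);
	}
	return 0;
}

For example, queue 0 comes out as 00000001,00000001 (CPUs 0 and 32),
matching the first line of the 64-VCPU xps_cpus listing.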
* Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs
  From: Michael S. Tsirkin @ 2017-02-03 15:07 UTC
  To: Ben Serebrin
  Cc: netdev, jasowang, davem, willemb, venkateshs, jmattson

On Thu, Feb 02, 2017 at 10:19:05PM -0800, Ben Serebrin wrote:
> From: Benjamin Serebrin <serebrin@google.com>
>
> If the number of virtio queue pairs is not equal to the
> number of VCPUs, the virtio guest driver doesn't assign
> any CPU affinity for the queue interrupts or the xps
> aggregation interrupt.
>
> Google Compute Engine currently provides 1 queue pair for
> every VCPU, but limits that at a maximum of 32 queue pairs.
>
> This code assigns interrupt affinity even when there are more than
> 32 VCPUs.
>
> Tested:
>
> (on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)
>
> [...]
>
> Signed-off-by: Ben Serebrin <serebrin@google.com>
> Acked-by: Willem de Bruijn <willemb@google.com>
> Acked-by: Jim Mattson <jmattson@google.com>
> Acked-by: Venkatesh Srinivas <venkateshs@google.com>

I wonder why not just do it in userspace though.
It would be nice to mention this in the commit log.
Are we sure this distribution is best for all workloads?
While irqbalancer is hardly a perfect oracle it does
manage to balance the load somewhat, and IIUC kernel
affinities would break that. Thoughts?

> [...]
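
For context on the userspace alternative raised here: the interrupt
affinities that the patch programs can also be set from userspace (by
irqbalance or a boot script) through /proc/irq/<n>/smp_affinity_list.
A minimal C sketch of that path follows; the IRQ number and CPU are
hypothetical values, and no claim is made about which IRQ belongs to
which virtqueue on a given VM.

#include <stdio.h>

/* Pin one IRQ to one CPU by writing its smp_affinity_list file.
 * Requires root; irq and cpu are illustrative inputs only. */
static int set_irq_affinity(int irq, int cpu)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity_list", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", cpu);
	return fclose(f);
}

int main(void)
{
	return set_irq_affinity(24, 0) ? 1 : 0;
}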
* Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs
  From: Benjamin Serebrin @ 2017-02-03 18:22 UTC
  To: Michael S. Tsirkin
  Cc: netdev, jasowang, David Miller, Willem de Bruijn, Venkatesh Srinivas, James Mattson

Thanks, Michael, I'll put this text in the commit log:

XPS settings aren't write-able from userspace, so the only way I know
to fix XPS is in the driver.

The interrupt affinity settings could be set by irqbalancer, yes, but
it's not always enabled (by intention) in all guests. I just confirmed
that on an unpatched 64-VCPU VM with no irqbalancer, all virtio-net
interrupts end up on core 0, and on a patched 64-VCPU VM, interrupts
end up on the core whose number matches the queue number.

This patch is just extending the existing behavior in the interrupt
affinity assignment loop, and I think it stands on its own. It actually
sparked some internal discussion about further intelligence in XPS and
interrupt affinity, and we'll do some experiments and come back with
further patches.

On Fri, Feb 3, 2017 at 7:07 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Feb 02, 2017 at 10:19:05PM -0800, Ben Serebrin wrote:
>> [...]
>
> I wonder why not just do it in userspace though.
> It would be nice to mention this in the commit log.
> Are we sure this distribution is best for all workloads?
> While irqbalancer is hardly a perfect oracle it does
> manage to balance the load somewhat, and IIUC kernel
> affinities would break that. Thoughts?
>
>> [...]
* Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs
  From: Willem de Bruijn @ 2017-02-03 18:31 UTC
  To: Benjamin Serebrin
  Cc: Michael S. Tsirkin, Network Development, jasowang, David Miller, Willem de Bruijn, Venkatesh Srinivas, James Mattson

> XPS settings aren't write-able from userspace, so the only way I know
> to fix XPS is in the driver.

It is writable by privileged users.

>> I wonder why not just do it in userspace though.

Configuring interrupts and xps from userspace at boot is more robust,
as device driver defaults can change. But especially for customers who
are unaware of these settings, choosing sane defaults won't hurt.

>> It would be nice to mention this in the commit log.
>> Are we sure this distribution is best for all workloads?

Overall, this minimizes contention between cores, so it should be
broadly preferable.
* Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs
  From: Rick Jones @ 2017-02-03 18:34 UTC
  To: Willem de Bruijn, Benjamin Serebrin
  Cc: Michael S. Tsirkin, Network Development, jasowang, David Miller, Willem de Bruijn, Venkatesh Srinivas, James Mattson

On 02/03/2017 10:31 AM, Willem de Bruijn wrote:
> Configuring interrupts and xps from userspace at boot is more robust,
> as device driver defaults can change. But especially for customers who
> are unaware of these settings, choosing sane defaults won't hurt.

The devil is in finding the sane defaults. For example, the issues
we've seen with VMs sending traffic getting reordered when the driver
took it upon itself to enable xps.

rick jones
* Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs
  From: Willem de Bruijn @ 2017-02-03 20:25 UTC
  To: Rick Jones
  Cc: Benjamin Serebrin, Michael S. Tsirkin, Network Development, jasowang, David Miller, Willem de Bruijn, Venkatesh Srinivas, James Mattson

On Fri, Feb 3, 2017 at 1:34 PM, Rick Jones <rick.jones2@hpe.com> wrote:
> On 02/03/2017 10:31 AM, Willem de Bruijn wrote:
>>
>> Configuring interrupts and xps from userspace at boot is more robust,
>> as device driver defaults can change. But especially for customers who
>> are unaware of these settings, choosing sane defaults won't hurt.
>
> The devil is in finding the sane defaults.

Agreed, of course.

> For example, the issues we've seen with VMs sending traffic getting
> reordered when the driver took it upon itself to enable xps.

That particular reordering issue [1] was with xps on the host, where vm
thread migration causes flows to exit on different tx queues because
ooo_okay is not supported for flows from tap devices.

Within a guest, on the other hand, xps settings should not cause
reordering of packets from the same flow, as ooo_okay is active for tcp
flows.

[1] https://patchwork.ozlabs.org/patch/662913/
* Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs
  From: Rick Jones @ 2017-02-03 18:33 UTC
  To: Benjamin Serebrin, Michael S. Tsirkin
  Cc: netdev, jasowang, David Miller, Willem de Bruijn, Venkatesh Srinivas, James Mattson

On 02/03/2017 10:22 AM, Benjamin Serebrin wrote:
> Thanks, Michael, I'll put this text in the commit log:
>
> XPS settings aren't write-able from userspace, so the only way I know
> to fix XPS is in the driver.

??

root@np-cp1-c0-m1-mgmt:/home/stack# cat /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/net/hed0/queues/tx-0/xps_cpus
00000000,00000001
root@np-cp1-c0-m1-mgmt:/home/stack# echo 0 > /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/net/hed0/queues/tx-0/xps_cpus
root@np-cp1-c0-m1-mgmt:/home/stack# cat /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/net/hed0/queues/tx-0/xps_cpus
00000000,00000000
* Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs
  From: Jason Wang @ 2017-02-06  7:24 UTC
  To: Ben Serebrin, netdev, mst, davem, willemb, venkateshs, jmattson

On 2017-02-03 14:19, Ben Serebrin wrote:
> From: Benjamin Serebrin <serebrin@google.com>
>
> If the number of virtio queue pairs is not equal to the
> number of VCPUs, the virtio guest driver doesn't assign
> any CPU affinity for the queue interrupts or the xps
> aggregation interrupt.

So this in fact is not a affinity fixing for #cpus > 32 but adding
affinity for #cpus != #queue pairs.

> Google Compute Engine currently provides 1 queue pair for
> every VCPU, but limits that at a maximum of 32 queue pairs.
>
> This code assigns interrupt affinity even when there are more than
> 32 VCPUs.
>
> Tested:
>
> (on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)
>
> Without the fix we see all queues affinitized to all CPUs:

[...]

> +	/* If there are more cpus than queues, then assign the queues'
> +	 * interrupts to the first cpus until we run out.
> +	 */
>  	i = 0;
>  	for_each_online_cpu(cpu) {
> +		if (i == vi->max_queue_pairs)
> +			break;
>  		virtqueue_set_affinity(vi->rq[i].vq, cpu);
>  		virtqueue_set_affinity(vi->sq[i].vq, cpu);
> -		netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
>  		i++;
>  	}
>
> +	/* Stripe the XPS affinities across the online CPUs.
> +	 * Hyperthread pairs are typically assigned such that Linux's
> +	 * CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
> +	 * hyperthread twins to share TX queues, in the case where there are
> +	 * more cpus than queues.

Since we use combined queue pairs, why not use the same policy for RX?

Thanks

> +	 */
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
> +		struct cpumask mask;
> +		int skip = i;
> +
> +		cpumask_clear(&mask);
> +		for_each_online_cpu(cpu) {
> +			while (skip--)
> +				cpu = cpumask_next(cpu, cpu_online_mask);
> +			if (cpu < num_possible_cpus())
> +				cpumask_set_cpu(cpu, &mask);
> +			skip = vi->max_queue_pairs - 1;
> +		}
> +		netif_set_xps_queue(vi->dev, &mask, i);
> +	}
> +
>  	vi->affinity_hint_set = true;
>  }
* Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs
  From: Benjamin Serebrin @ 2017-02-06  7:28 UTC
  To: Jason Wang
  Cc: netdev, Michael S. Tsirkin, David Miller, Willem de Bruijn, Venkatesh Srinivas, James Mattson

On Sun, Feb 5, 2017 at 11:24 PM, Jason Wang <jasowang@redhat.com> wrote:
> On 2017-02-03 14:19, Ben Serebrin wrote:
>> From: Benjamin Serebrin <serebrin@google.com>
>>
>> If the number of virtio queue pairs is not equal to the
>> number of VCPUs, the virtio guest driver doesn't assign
>> any CPU affinity for the queue interrupts or the xps
>> aggregation interrupt.
>
> So this in fact is not a affinity fixing for #cpus > 32 but adding
> affinity for #cpus != #queue pairs.

Fair enough. I'll adjust the title line in the subsequent version.

>> Google Compute Engine currently provides 1 queue pair for
>> every VCPU, but limits that at a maximum of 32 queue pairs.
>>
>> This code assigns interrupt affinity even when there are more than
>> 32 VCPUs.
>>
>> Tested:
>>
>> (on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)
>>
>> Without the fix we see all queues affinitized to all CPUs:
>
> [...]
>
>> +	/* Stripe the XPS affinities across the online CPUs.
>> +	 * Hyperthread pairs are typically assigned such that Linux's
>> +	 * CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
>> +	 * hyperthread twins to share TX queues, in the case where there are
>> +	 * more cpus than queues.
>
> Since we use combined queue pairs, why not use the same policy for RX?

XPS is for transmit only.

> Thanks
>
>> [...]
* Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs
  From: Jason Wang @ 2017-02-06  7:45 UTC
  To: Benjamin Serebrin
  Cc: netdev, Michael S. Tsirkin, David Miller, Willem de Bruijn, Venkatesh Srinivas, James Mattson

On 2017-02-06 15:28, Benjamin Serebrin wrote:
> On Sun, Feb 5, 2017 at 11:24 PM, Jason Wang <jasowang@redhat.com> wrote:
>> On 2017-02-03 14:19, Ben Serebrin wrote:
>>> [...]
>>
>> So this in fact is not a affinity fixing for #cpus > 32 but adding
>> affinity for #cpus != #queue pairs.
>
> Fair enough. I'll adjust the title line in the subsequent version.
>
>>> [...]
>>
>> Since we use combined queue pairs, why not use the same policy for RX?
>
> XPS is for transmit only.

Yes, but I mean, e.g. consider you let hyperthread twins share TX
queues (XPS), why not share TX and RX queue interrupts (affinity)?

Thanks
* Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs
  From: Christian Borntraeger @ 2017-02-06 10:06 UTC
  To: Ben Serebrin, netdev, mst, jasowang, davem, willemb, venkateshs, jmattson, linux-s390, linux-arch@vger.kernel.org

On 02/03/2017 07:19 AM, Ben Serebrin wrote:
[...]
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1502,20 +1502,44 @@ static void virtnet_set_affinity(struct virtnet_info *vi)
>  	 * queue pairs, we let the queue pairs to be private to one cpu by
>  	 * setting the affinity hint to eliminate the contention.
>  	 */
> -	if (vi->curr_queue_pairs == 1 ||
> -	    vi->max_queue_pairs != num_online_cpus()) {
> +	if (vi->curr_queue_pairs == 1) {
>  		virtnet_clean_affinity(vi, -1);
>  		return;
>  	}
>
> +	/* If there are more cpus than queues, then assign the queues'
> +	 * interrupts to the first cpus until we run out.
> +	 */
>  	i = 0;
>  	for_each_online_cpu(cpu) {
> +		if (i == vi->max_queue_pairs)
> +			break;
>  		virtqueue_set_affinity(vi->rq[i].vq, cpu);
>  		virtqueue_set_affinity(vi->sq[i].vq, cpu);
> -		netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
>  		i++;
>  	}
>
> +	/* Stripe the XPS affinities across the online CPUs.
> +	 * Hyperthread pairs are typically assigned such that Linux's
> +	 * CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
> +	 * hyperthread twins to share TX queues, in the case where there are
> +	 * more cpus than queues.
> +	 */

This is not always true. E.g. on s390 the SMT threads are usually
paired even/odd, e.g.:

[cborntra@s38lp08 linux]$ lscpu -e
CPU NODE BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED POLARIZATION ADDRESS
0   0    0    0      0    0:0:0:0         yes    yes        horizontal   0
1   0    0    0      0    1:1:1:1         yes    yes        horizontal   1
2   0    0    0      1    2:2:2:2         yes    yes        horizontal   2
3   0    0    0      1    3:3:3:3         yes    yes        horizontal   3
4   0    0    0      2    4:4:4:4         yes    yes        horizontal   4
5   0    0    0      2    5:5:5:5         yes    yes        horizontal   5
6   0    0    0      3    6:6:6:6         yes    yes        horizontal   6

This does not matter yet for s390 (as virtio is usually done via the
ccw bus), but maybe we should consider a future patch to provide some
arch-specific striping hints. Or would it make sense to change the
s390 layout for SMT twins, because there is more code that expects all
threads 0 at the front and all threads 1 at the end?

> +	for (i = 0; i < vi->max_queue_pairs; i++) {
> +		struct cpumask mask;
> +		int skip = i;
> +
> +		cpumask_clear(&mask);
> +		for_each_online_cpu(cpu) {
> +			while (skip--)
> +				cpu = cpumask_next(cpu, cpu_online_mask);
> +			if (cpu < num_possible_cpus())
> +				cpumask_set_cpu(cpu, &mask);
> +			skip = vi->max_queue_pairs - 1;
> +		}
> +		netif_set_xps_queue(vi->dev, &mask, i);
> +	}
> +
>  	vi->affinity_hint_set = true;
>  }
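
To make the topology caveat concrete: rather than assuming the x86-style
"CPU X and X + (numcpus / 2) are twins" layout, a guest (or a future
arch-aware version of this code) can read the real SMT sibling pairs
from the standard sysfs topology files. A minimal userspace sketch,
assuming only that /sys/devices/system/cpu/cpu<N>/topology/thread_siblings_list
is present (cpu 0 is just an example input):

#include <stdio.h>

/* Print the kernel's view of cpu_id's hyperthread siblings. */
static int print_thread_siblings(int cpu_id)
{
	char path[128];
	char buf[64];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
		 cpu_id);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		printf("cpu%d siblings: %s", cpu_id, buf);
	fclose(f);
	return 0;
}

int main(void)
{
	return print_thread_siblings(0);
}

On the layout assumed in the patch comment this prints something like
"cpu0 siblings: 0,32", while on the s390 even/odd pairing above it would
print "cpu0 siblings: 0-1".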