[PATCH V3 0/3] basic busy polling support for vhost

* [PATCH V3 0/3] basic busy polling support for vhost_net
@ 2016-02-26  8:42 ` Jason Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2016-02-26  8:42 UTC
  To: kvm, mst, virtualization, netdev, linux-kernel; +Cc: yang.zhang.wz, RAPOPORT

This series adds basic busy polling to vhost_net. The idea is simple:
at the end of tx/rx processing, busy poll for newly added tx descriptors
and for data on the rx socket for a while. The maximum time (in us) that
may be spent busy polling is specified through an ioctl.
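
To make the idea concrete, here is a minimal userspace sketch of a
bounded busy poll. It is not code from this series: busy_clock_us() and
busy_poll() are hypothetical names, and an atomic flag stands in for "a
new tx descriptor or rx packet showed up". The clock shifts nanoseconds
right by 10, as busy_clock() does in patch 3/3, so one unit is 1024 ns
rather than exactly 1 us.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Approximate microseconds: ns >> 10 divides by 1024, not by 1000. */
static uint64_t busy_clock_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec) >> 10;
}

/*
 * Hypothetical helper: spin until *ready becomes true or the budget of
 * timeout_us (approximate microseconds) is used up.
 * Returns true if work showed up while polling.
 */
static bool busy_poll(atomic_bool *ready, uint32_t timeout_us)
{
	uint64_t endtime;

	assert(ready != NULL);
	endtime = busy_clock_us() + timeout_us;

	while (!atomic_load_explicit(ready, memory_order_acquire)) {
		if (busy_clock_us() > endtime)
			return false;	/* give up, fall back to notifications */
	}
	return true;
}
```

With a 50 us budget such a loop spins for roughly 51.2 us (50 * 1024 ns)
before giving up. The real loop in patch 3/3 has more exit conditions
(need_resched(), pending signals, queued vhost work) that this model
leaves out.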

Test A was done with:

- 50 us as busy loop timeout
- Netperf 2.6
- Two machines with back to back connected mlx4
- Guest with 8 vcpus and 1 queue

Results:
- TCP_RR improved noticeably (by up to 27%), and cpu utilization also
  improved in this case.
- No obvious differences in Guest RX throughput.
- Guest TX throughput was also improved.

TCP_RR:
size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
    1/     1/  +27%/    0%/  +27%/  +27%/  +25%
    1/    50/   +2%/   +1%/   +2%/   +2%/   -4%
    1/   100/   +2%/   +1%/   +3%/   +3%/  -14%
    1/   200/   +2%/   +2%/   +5%/   +5%/  -15%
   64/     1/  +20%/  -13%/  +20%/  +20%/  +20%
   64/    50/  +17%/  +14%/  +16%/  +16%/  -11%
   64/   100/  +14%/  +12%/  +14%/  +14%/  -35%
   64/   200/  +16%/  +15%/   +9%/   +9%/  -28%
  256/     1/  +19%/   -6%/  +19%/  +19%/  +18%
  256/    50/  +18%/  +15%/  +16%/  +16%/   +3%
  256/   100/  +11%/   +9%/  +12%/  +12%/   -1%
  256/   200/   +5%/   +8%/   +4%/   +4%/  +64%
  512/     1/  +20%/    0%/  +20%/  +20%/   -2%
  512/    50/  +12%/  +10%/  +12%/  +12%/   +8%
  512/   100/  +11%/   +7%/  +10%/  +10%/   -5%
  512/   200/   +3%/   +2%/   +3%/   +3%/   -5%
 1024/     1/  +19%/   -2%/  +19%/  +19%/  +18%
 1024/    50/  +13%/  +10%/  +12%/  +12%/    0%
 1024/   100/   +9%/   +8%/   +8%/   +8%/  -16%
 1024/   200/   +3%/   +4%/   +3%/   +3%/  -14%
Guest RX:
size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
   64/     1/  -12%/  -10%/   +2%/   +1%/  +42%
   64/     4/   -3%/   -5%/   +2%/   -1%/    0%
   64/     8/   -1%/   -5%/   -1%/   -2%/    0%
  512/     1/   +5%/  -13%/   +6%/   +9%/  +17%
  512/     4/   -3%/   -9%/   +6%/   +4%/  -14%
  512/     8/   -2%/   -7%/    0%/    0%/   -1%
 1024/     1/  +18%/  +31%/  -12%/  -11%/  -31%
 1024/     4/    0%/   -9%/   -1%/   -6%/   -7%
 1024/     8/   -3%/   -8%/   -2%/   -4%/    0%
 2048/     1/    0%/   -1%/    0%/   -4%/   +5%
 2048/     4/    0%/   +2%/    0%/    0%/    0%
 2048/     8/    0%/   -6%/    0%/   -3%/   -1%
 4096/     1/   -1%/   +2%/  -14%/   -5%/   +8%
 4096/     4/    0%/   +1%/    0%/   +1%/   -1%
 4096/     8/   -1%/   -1%/   -2%/   -2%/   -3%
16384/     1/    0%/    0%/   +4%/   +5%/    0%
16384/     4/    0%/   +5%/   +7%/   +9%/    0%
16384/     8/   +1%/   +1%/   +3%/   +3%/   +2%
65535/     1/    0%/  +12%/   -1%/   +2%/   -2%
65535/     4/    0%/    0%/   -2%/   -2%/   +2%
65535/     8/   -1%/   -1%/   -4%/   -4%/    0%
Guest TX:
size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
   64/     1/  -16%/  -21%/   -2%/  -12%/   +1%
   64/     4/   -6%/   -2%/   -1%/   +6%/   -7%
   64/     8/   +4%/   +4%/   -2%/   +1%/  +30%
  512/     1/  -32%/  -33%/  -11%/  +62%/ +314%
  512/     4/  +30%/  +20%/  -22%/  -17%/  -14%
  512/     8/  +24%/  +12%/  -21%/  -10%/   -6%
 1024/     1/   +1%/   -7%/   +2%/  +51%/  +75%
 1024/     4/  +10%/   +9%/  -11%/  -19%/  -10%
 1024/     8/  +13%/   +7%/  -11%/  -13%/  -12%
 2048/     1/  +17%/    0%/   +1%/  +35%/  +78%
 2048/     4/  +15%/  +14%/  -17%/  -24%/  -15%
 2048/     8/  +11%/   +9%/  -15%/  -20%/  -12%
 4096/     1/   +3%/   -7%/    0%/  +21%/  +48%
 4096/     4/   +3%/   +4%/   -9%/  -19%/  +41%
 4096/     8/  +15%/  +13%/  -33%/  -28%/  -15%
16384/     1/   +5%/   -8%/   -4%/  -10%/ +323%
16384/     4/  +13%/   +5%/  -15%/  -11%/ +147%
16384/     8/   +8%/   +6%/  -25%/  -27%/  -31%
65535/     1/   +8%/    0%/   +5%/    0%/  +45%
65535/     4/  +10%/   +1%/   +7%/   -8%/ +151%
65535/     8/   +5%/    0%/   +1%/  -16%/  -29%

Test B was done with:

- 50us as busy loop timeout
- Netperf 2.6
- Two machines with back to back connected ixgbe
- Two guests, each with 1 vcpu and 1 queue
- Two vhost threads pinned to the same host cpu to simulate cpu
  contention

Results:
- Even in this extreme case, TCP_RR still improves by up to 14%.
- For the guest tx stream, a minor improvement, with at most a 5%
  regression in the one byte case. For the guest rx stream, at most a 5%
  regression was seen.

Guest TX:
 size/     -+%/
    1/  -5.55%/
   64/  +1.11%/
  256/  +2.33%/
  512/  -0.03%/
 1024/  +1.14%/
 4096/  +0.00%/
16384/  +0.00%/

Guest RX:
 size/     -+%/
    1/  -5.11%/
   64/  -0.55%/
  256/  -2.35%/
  512/  -3.39%/
 1024/   +6.8%/
 4096/  -0.01%/
16384/  +0.00%/

TCP_RR:
 size/     -+%/
    1/  +9.79%/
   64/  +4.51%/
  256/  +6.47%/
  512/  -3.37%/
 1024/  +6.15%/
 4096/ +14.88%/
16384/  -2.23%/

Changes from V2:
- rename vhost_vq_more_avail() to vhost_vq_avail_empty(), and return
  false when __get_user() fails.
- do not bother with preemption/timers on the good path.
- use vhost_vring_state as the ioctl parameter instead of reinventing a
  new one.
- add the unit of the timeout (us) to the comments of the newly added
  ioctls

Changes from V1:
- remove the buggy vq_error() in vhost_vq_more_avail().
- leave vhost_enable_notify() untouched.

Changes from RFC V3:
- small tweak on the code to avoid multiple duplicate conditions in
  critical path when busy loop is not enabled.
- add the test result of multiple VMs

Changes from RFC V2:
- poll also at the end of rx handling
- factor out the polling logic and optimize the code a little bit
- add two ioctls to get and set the busy poll timeout
- test on ixgbe (which can give more stable and reproducible numbers)
  instead of mlx4.

Changes from RFC V1:
- add a comment for vhost_has_work() to explain why it could be
  lockless
- add param description for busyloop_timeout
- split out the busy polling logic into a new helper
- check and exit the loop when there's a pending signal
- disable preemption during busy looping to make sure local_clock() is
  used correctly.

Jason Wang (3):
  vhost: introduce vhost_has_work()
  vhost: introduce vhost_vq_avail_empty()
  vhost_net: basic polling support

 drivers/vhost/net.c        | 79 +++++++++++++++++++++++++++++++++++++++++++---
 drivers/vhost/vhost.c      | 35 ++++++++++++++++++++
 drivers/vhost/vhost.h      |  3 ++
 include/uapi/linux/vhost.h |  6 ++++
 4 files changed, 118 insertions(+), 5 deletions(-)

-- 
2.5.0

* [PATCH V3 1/3] vhost: introduce vhost_has_work()
  2016-02-26  8:42 ` Jason Wang
@ 2016-02-26  8:42   ` Jason Wang
  -1 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2016-02-26  8:42 UTC
  To: kvm, mst, virtualization, netdev, linux-kernel; +Cc: yang.zhang.wz, RAPOPORT

This patch introduces a helper that gives a hint as to whether any work
is queued in the work list. Busy polling code can use it to decide when
to exit the busy loop.
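
As an illustration of why a lockless hint is enough here, below is a
small userspace model. It is only a sketch under my own assumptions, not
the vhost code: struct work_queue, queue_work(), take_work() and
has_work() are hypothetical names, a pthread mutex stands in for the
work list lock, and the list is a simple LIFO.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct work {
	struct work *next;
};

struct work_queue {
	pthread_mutex_t lock;		/* taken by queue_work() and take_work() */
	struct work *_Atomic head;	/* NULL when nothing is queued */
};

/* Writers modify the list only while holding the lock. */
static void queue_work(struct work_queue *q, struct work *w)
{
	assert(q != NULL && w != NULL);
	pthread_mutex_lock(&q->lock);
	w->next = atomic_load_explicit(&q->head, memory_order_relaxed);
	atomic_store_explicit(&q->head, w, memory_order_release);
	pthread_mutex_unlock(&q->lock);
}

/* Remove and return one queued item, or NULL if the list is empty. */
static struct work *take_work(struct work_queue *q)
{
	struct work *w;

	pthread_mutex_lock(&q->lock);
	w = atomic_load_explicit(&q->head, memory_order_relaxed);
	if (w)
		atomic_store_explicit(&q->head, w->next, memory_order_relaxed);
	pthread_mutex_unlock(&q->lock);
	return w;
}

/* Lockless hint: the answer may be stale by the time the caller acts. */
static bool has_work(struct work_queue *q)
{
	return atomic_load_explicit(&q->head, memory_order_relaxed) != NULL;
}
```

In this model a stale answer is harmless: a spurious "true" only ends
the polling loop a little early, and a spurious "false" is corrected on
the next iteration of the loop.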

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 7 +++++++
 drivers/vhost/vhost.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index ad2146a..90ac092 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -245,6 +245,13 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 }
 EXPORT_SYMBOL_GPL(vhost_work_queue);
 
+/* A lockless hint for busy polling code to exit the loop */
+bool vhost_has_work(struct vhost_dev *dev)
+{
+	return !list_empty(&dev->work_list);
+}
+EXPORT_SYMBOL_GPL(vhost_has_work);
+
 void vhost_poll_queue(struct vhost_poll *poll)
 {
 	vhost_work_queue(poll->dev, &poll->work);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index d3f7674..43284ad 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -37,6 +37,7 @@ struct vhost_poll {
 
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
+bool vhost_has_work(struct vhost_dev *dev);
 
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 		     unsigned long mask, struct vhost_dev *dev);
-- 
2.5.0

* [PATCH V3 2/3] vhost: introduce vhost_vq_avail_empty()
  2016-02-26  8:42 ` Jason Wang
@ 2016-02-26  8:42   ` Jason Wang
  -1 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2016-02-26  8:42 UTC
  To: kvm, mst, virtualization, netdev, linux-kernel; +Cc: yang.zhang.wz, RAPOPORT

This patch introduces a helper which returns true if we're sure that
the available ring is empty for a specific vq. When we're not sure,
e.g. on a vq access failure, it returns false instead. Busy polling
code can use this to exit the busy loop.
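
The "true only when sure" rule can be modelled in userspace as follows.
This is a sketch, not the kernel helper: struct model_vq,
read_guest_idx() and model_vq_avail_empty() are hypothetical names, a
NULL pointer stands in for a faulting __get_user(), and the endianness
conversion done by vhost16_to_cpu() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the vq state used by the helper. */
struct model_vq {
	const uint16_t *guest_avail_idx; /* index published by the guest;
					  * NULL models an access fault */
	uint16_t cached_avail_idx;	 /* last index the host has seen */
};

/* Stand-in for __get_user(): 0 on success, -1 on fault. */
static int read_guest_idx(const struct model_vq *vq, uint16_t *out)
{
	if (!vq->guest_avail_idx)
		return -1;
	*out = *vq->guest_avail_idx;
	return 0;
}

/* Return true only if the ring is known to be empty. */
static bool model_vq_avail_empty(const struct model_vq *vq)
{
	uint16_t idx;

	assert(vq != NULL);
	if (read_guest_idx(vq, &idx))
		return false;	/* not sure, so do not claim "empty" */

	return idx == vq->cached_avail_idx;
}
```

Returning false on a failed read makes the polling loop stop, so the
failure is then handled by the normal descriptor fetch path instead of
being spun on. The 16-bit index wraps around, which is why the check is
an equality test and not a "greater than" comparison.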

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 14 ++++++++++++++
 drivers/vhost/vhost.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 90ac092..c4ff9f2 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1633,6 +1633,20 @@ void vhost_add_used_and_signal_n(struct vhost_dev *dev,
 }
 EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
 
+/* return true if we're sure that available ring is empty */
+bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	__virtio16 avail_idx;
+	int r;
+
+	r = __get_user(avail_idx, &vq->avail->idx);
+	if (r)
+		return false;
+
+	return vhost16_to_cpu(vq, avail_idx) == vq->avail_idx;
+}
+EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
+
 /* OK, now we need to know about added descriptors. */
 bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 43284ad..a7a43f0 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -159,6 +159,7 @@ void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
 			       struct vring_used_elem *heads, unsigned count);
 void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
 void vhost_disable_notify(struct vhost_dev *, struct vhost_virtqueue *);
+bool vhost_vq_avail_empty(struct vhost_dev *, struct vhost_virtqueue *);
 bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
-- 
2.5.0

* [PATCH V3 3/3] vhost_net: basic polling support
  2016-02-26  8:42 ` Jason Wang
@ 2016-02-26  8:42   ` Jason Wang
  -1 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2016-02-26  8:42 UTC
  To: kvm, mst, virtualization, netdev, linux-kernel; +Cc: yang.zhang.wz, RAPOPORT

This patch polls for newly added tx buffers or the socket receive
queue for a while at the end of tx/rx processing. The maximum time
spent polling is specified through a new kind of vring ioctl.
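
For reference, a userspace caller might drive the new ioctls roughly as
sketched below. The wrapper names set_busyloop_timeout() and
get_busyloop_timeout() are made up for this example, vhost_fd is assumed
to be an open and already configured vhost-net fd, and <linux/vhost.h>
is assumed to carry the definitions added by this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/vhost.h>
#include <sys/ioctl.h>

/* Hypothetical wrapper: set the busy loop timeout (in us) of one vring. */
static int set_busyloop_timeout(int vhost_fd, unsigned int vring_index,
				unsigned int timeout_us)
{
	struct vhost_vring_state s = {
		.index = vring_index,
		.num = timeout_us,
	};

	return ioctl(vhost_fd, VHOST_SET_VRING_BUSYLOOP_TIMEOUT, &s);
}

/* Hypothetical wrapper: read the timeout back; *timeout_us is only
 * written on success. The caller selects the vring through s.index. */
static int get_busyloop_timeout(int vhost_fd, unsigned int vring_index,
				unsigned int *timeout_us)
{
	struct vhost_vring_state s = { .index = vring_index };
	int r;

	assert(timeout_us != NULL);
	r = ioctl(vhost_fd, VHOST_GET_VRING_BUSYLOOP_TIMEOUT, &s);
	if (r == 0)
		*timeout_us = s.num;
	return r;
}
```

A timeout of 0 leaves busy polling disabled, which is also the value
vhost_vq_reset() sets. Since handle_tx() and handle_rx() each check
busyloop_timeout on the vq they use, the timeout would have to be set on
the vring(s) that should poll.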

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c        | 79 +++++++++++++++++++++++++++++++++++++++++++---
 drivers/vhost/vhost.c      | 14 ++++++++
 drivers/vhost/vhost.h      |  1 +
 include/uapi/linux/vhost.h |  6 ++++
 4 files changed, 95 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9eda69e..c91af93 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -287,6 +287,44 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 	rcu_read_unlock_bh();
 }
 
+static inline unsigned long busy_clock(void)
+{
+	return local_clock() >> 10;
+}
+
+static bool vhost_can_busy_poll(struct vhost_dev *dev,
+				unsigned long endtime)
+{
+	return likely(!need_resched()) &&
+	       likely(!time_after(busy_clock(), endtime)) &&
+	       likely(!signal_pending(current)) &&
+	       !vhost_has_work(dev) &&
+	       single_task_running();
+}
+
+static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
+				    struct vhost_virtqueue *vq,
+				    struct iovec iov[], unsigned int iov_size,
+				    unsigned int *out_num, unsigned int *in_num)
+{
+	unsigned long uninitialized_var(endtime);
+	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+				    out_num, in_num, NULL, NULL);
+
+	if (r == vq->num && vq->busyloop_timeout) {
+		preempt_disable();
+		endtime = busy_clock() + vq->busyloop_timeout;
+		while (vhost_can_busy_poll(vq->dev, endtime) &&
+		       vhost_vq_avail_empty(vq->dev, vq))
+			cpu_relax();
+		preempt_enable();
+		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+					out_num, in_num, NULL, NULL);
+	}
+
+	return r;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
@@ -331,10 +369,9 @@ static void handle_tx(struct vhost_net *net)
 			      % UIO_MAXIOV == nvq->done_idx))
 			break;
 
-		head = vhost_get_vq_desc(vq, vq->iov,
-					 ARRAY_SIZE(vq->iov),
-					 &out, &in,
-					 NULL, NULL);
+		head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
+						ARRAY_SIZE(vq->iov),
+						&out, &in);
 		/* On error, stop handling until the next kick. */
 		if (unlikely(head < 0))
 			break;
@@ -435,6 +472,38 @@ static int peek_head_len(struct sock *sk)
 	return len;
 }
 
+static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
+{
+	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
+	struct vhost_virtqueue *vq = &nvq->vq;
+	unsigned long uninitialized_var(endtime);
+	int len = peek_head_len(sk);
+
+	if (!len && vq->busyloop_timeout) {
+		/* Both tx vq and rx socket were polled here */
+		mutex_lock(&vq->mutex);
+		vhost_disable_notify(&net->dev, vq);
+
+		preempt_disable();
+		endtime = busy_clock() + vq->busyloop_timeout;
+
+		while (vhost_can_busy_poll(&net->dev, endtime) &&
+		       skb_queue_empty(&sk->sk_receive_queue) &&
+		       vhost_vq_avail_empty(&net->dev, vq))
+			cpu_relax();
+
+		preempt_enable();
+
+		if (vhost_enable_notify(&net->dev, vq))
+			vhost_poll_queue(&vq->poll);
+		mutex_unlock(&vq->mutex);
+
+		len = peek_head_len(sk);
+	}
+
+	return len;
+}
+
 /* This is a multi-buffer version of vhost_get_desc, that works if
  *	vq has read descriptors only.
  * @vq		- the relevant virtqueue
@@ -553,7 +622,7 @@ static void handle_rx(struct vhost_net *net)
 		vq->log : NULL;
 	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
 
-	while ((sock_len = peek_head_len(sock->sk))) {
+	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
 		sock_len += sock_hlen;
 		vhost_len = sock_len + vhost_hlen;
 		headcount = get_rx_bufs(vq, vq->heads, vhost_len,
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c4ff9f2..5abfce9 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -285,6 +285,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->memory = NULL;
 	vq->is_le = virtio_legacy_is_little_endian();
 	vhost_vq_reset_user_be(vq);
+	vq->busyloop_timeout = 0;
 }
 
 static int vhost_worker(void *data)
@@ -919,6 +920,19 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
 	case VHOST_GET_VRING_ENDIAN:
 		r = vhost_get_vring_endian(vq, idx, argp);
 		break;
+	case VHOST_SET_VRING_BUSYLOOP_TIMEOUT:
+		if (copy_from_user(&s, argp, sizeof(s))) {
+			r = -EFAULT;
+			break;
+		}
+		vq->busyloop_timeout = s.num;
+		break;
+	case VHOST_GET_VRING_BUSYLOOP_TIMEOUT:
+		s.index = idx;
+		s.num = vq->busyloop_timeout;
+		if (copy_to_user(argp, &s, sizeof(s)))
+			r = -EFAULT;
+		break;
 	default:
 		r = -ENOIOCTLCMD;
 	}
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index a7a43f0..9a02158 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -115,6 +115,7 @@ struct vhost_virtqueue {
 	/* Ring endianness requested by userspace for cross-endian support. */
 	bool user_be;
 #endif
+	u32 busyloop_timeout;
 };
 
 struct vhost_dev {
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index ab373191..61a8777 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -126,6 +126,12 @@ struct vhost_memory {
 #define VHOST_SET_VRING_CALL _IOW(VHOST_VIRTIO, 0x21, struct vhost_vring_file)
 /* Set eventfd to signal an error */
 #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
+/* Set busy loop timeout (in us) */
+#define VHOST_SET_VRING_BUSYLOOP_TIMEOUT _IOW(VHOST_VIRTIO, 0x23,	\
+					 struct vhost_vring_state)
+/* Get busy loop timeout (in us) */
+#define VHOST_GET_VRING_BUSYLOOP_TIMEOUT _IOW(VHOST_VIRTIO, 0x24,	\
+					 struct vhost_vring_state)
 
 /* VHOST_NET specific defines */
 
-- 
2.5.0

* Re: [PATCH V3 3/3] vhost_net: basic polling support
  2016-02-26  8:42   ` Jason Wang
@ 2016-02-28 14:09     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2016-02-28 14:09 UTC (permalink / raw)
  To: Jason Wang
  Cc: yang.zhang.wz, RAPOPORT, kvm, netdev, linux-kernel,
	virtualization

On Fri, Feb 26, 2016 at 04:42:44PM +0800, Jason Wang wrote:
> This patch tries to poll for new added tx buffer or socket receive
> queue for a while at the end of tx/rx processing. The maximum time
> spent on polling were specified through a new kind of vring ioctl.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Looks good overall, but I still see one problem.

> ---
>  drivers/vhost/net.c        | 79 +++++++++++++++++++++++++++++++++++++++++++---
>  drivers/vhost/vhost.c      | 14 ++++++++
>  drivers/vhost/vhost.h      |  1 +
>  include/uapi/linux/vhost.h |  6 ++++
>  4 files changed, 95 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 9eda69e..c91af93 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -287,6 +287,44 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>  	rcu_read_unlock_bh();
>  }
>  
> +static inline unsigned long busy_clock(void)
> +{
> +	return local_clock() >> 10;
> +}
> +
> +static bool vhost_can_busy_poll(struct vhost_dev *dev,
> +				unsigned long endtime)
> +{
> +	return likely(!need_resched()) &&
> +	       likely(!time_after(busy_clock(), endtime)) &&
> +	       likely(!signal_pending(current)) &&
> +	       !vhost_has_work(dev) &&
> +	       single_task_running();

So I find it quite unfortunate that this still uses single_task_running.
This means that for example a SCHED_IDLE task will prevent polling from
becoming active, and that seems like a bug, or at least
an undocumented feature :).

Unfortunately this logic affects the behaviour as observed
by userspace, so we can't merge it like this and tune
afterwards, since otherwise management tools will start
depending on this logic.
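
For context, the userspace side that management tools would drive is a
single vring ioctl. A minimal sketch, assuming the uapi definitions quoted
in this patch: the struct and macros below are local mirrors with
hypothetical "sketch" names (real code would include <linux/vhost.h>
instead), and busyloop_state() and set_busyloop_timeout() are illustrative
helpers, not part of any existing API.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>

/* Local mirror of struct vhost_vring_state (assumption: real code
 * would take this from <linux/vhost.h>). */
struct vring_state_sketch {
	unsigned int index;	/* which virtqueue the request targets */
	unsigned int num;	/* here: busy loop timeout, in us */
};

/* Local mirrors of the ioctl numbers added by this patch. */
#define SKETCH_VHOST_VIRTIO 0xAF
#define SKETCH_SET_VRING_BUSYLOOP_TIMEOUT \
	_IOW(SKETCH_VHOST_VIRTIO, 0x23, struct vring_state_sketch)

/* Hypothetical helper: build the ioctl argument for one virtqueue. */
static struct vring_state_sketch busyloop_state(unsigned int vq_index,
						unsigned int timeout_us)
{
	struct vring_state_sketch s = { .index = vq_index, .num = timeout_us };

	return s;
}

/* Hypothetical helper: returns 0 on success, -errno on failure. */
static int set_busyloop_timeout(int vhost_fd, unsigned int vq_index,
				unsigned int timeout_us)
{
	struct vring_state_sketch s = busyloop_state(vq_index, timeout_us);

	assert(timeout_us == s.num);
	if (ioctl(vhost_fd, SKETCH_SET_VRING_BUSYLOOP_TIMEOUT, &s) < 0)
		return -errno;
	return 0;
}
```

With an open vhost-net descriptor that has gone through the usual vhost
setup, set_busyloop_timeout(fd, 1, 50) would request the 50 us timeout
used in the cover letter's tests for virtqueue index 1. Passing 0 leaves
polling disabled, which matches the default set in vhost_vq_reset().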


> +}
> +
> +static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
> +				    struct vhost_virtqueue *vq,
> +				    struct iovec iov[], unsigned int iov_size,
> +				    unsigned int *out_num, unsigned int *in_num)
> +{
> +	unsigned long uninitialized_var(endtime);
> +	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> +				    out_num, in_num, NULL, NULL);
> +
> +	if (r == vq->num && vq->busyloop_timeout) {
> +		preempt_disable();
> +		endtime = busy_clock() + vq->busyloop_timeout;
> +		while (vhost_can_busy_poll(vq->dev, endtime) &&
> +		       vhost_vq_avail_empty(vq->dev, vq))
> +			cpu_relax();
> +		preempt_enable();
> +		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> +					out_num, in_num, NULL, NULL);
> +	}
> +
> +	return r;
> +}
> +
>  /* Expects to be always run from workqueue - which acts as
>   * read-size critical section for our kind of RCU. */
>  static void handle_tx(struct vhost_net *net)
> @@ -331,10 +369,9 @@ static void handle_tx(struct vhost_net *net)
>  			      % UIO_MAXIOV == nvq->done_idx))
>  			break;
>  
> -		head = vhost_get_vq_desc(vq, vq->iov,
> -					 ARRAY_SIZE(vq->iov),
> -					 &out, &in,
> -					 NULL, NULL);
> +		head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
> +						ARRAY_SIZE(vq->iov),
> +						&out, &in);
>  		/* On error, stop handling until the next kick. */
>  		if (unlikely(head < 0))
>  			break;
> @@ -435,6 +472,38 @@ static int peek_head_len(struct sock *sk)
>  	return len;
>  }
>  
> +static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
> +{
> +	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> +	struct vhost_virtqueue *vq = &nvq->vq;
> +	unsigned long uninitialized_var(endtime);
> +	int len = peek_head_len(sk);
> +
> +	if (!len && vq->busyloop_timeout) {
> +		/* Both tx vq and rx socket were polled here */
> +		mutex_lock(&vq->mutex);
> +		vhost_disable_notify(&net->dev, vq);
> +
> +		preempt_disable();
> +		endtime = busy_clock() + vq->busyloop_timeout;
> +
> +		while (vhost_can_busy_poll(&net->dev, endtime) &&
> +		       skb_queue_empty(&sk->sk_receive_queue) &&
> +		       vhost_vq_avail_empty(&net->dev, vq))
> +			cpu_relax();
> +
> +		preempt_enable();
> +
> +		if (vhost_enable_notify(&net->dev, vq))
> +			vhost_poll_queue(&vq->poll);
> +		mutex_unlock(&vq->mutex);
> +
> +		len = peek_head_len(sk);
> +	}
> +
> +	return len;
> +}
> +
>  /* This is a multi-buffer version of vhost_get_desc, that works if
>   *	vq has read descriptors only.
>   * @vq		- the relevant virtqueue
> @@ -553,7 +622,7 @@ static void handle_rx(struct vhost_net *net)
>  		vq->log : NULL;
>  	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
>  
> -	while ((sock_len = peek_head_len(sock->sk))) {
> +	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
>  		sock_len += sock_hlen;
>  		vhost_len = sock_len + vhost_hlen;
>  		headcount = get_rx_bufs(vq, vq->heads, vhost_len,
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index c4ff9f2..5abfce9 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -285,6 +285,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>  	vq->memory = NULL;
>  	vq->is_le = virtio_legacy_is_little_endian();
>  	vhost_vq_reset_user_be(vq);
> +	vq->busyloop_timeout = 0;
>  }
>  
>  static int vhost_worker(void *data)
> @@ -919,6 +920,19 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
>  	case VHOST_GET_VRING_ENDIAN:
>  		r = vhost_get_vring_endian(vq, idx, argp);
>  		break;
> +	case VHOST_SET_VRING_BUSYLOOP_TIMEOUT:
> +		if (copy_from_user(&s, argp, sizeof(s))) {
> +			r = -EFAULT;
> +			break;
> +		}
> +		vq->busyloop_timeout = s.num;
> +		break;
> +	case VHOST_GET_VRING_BUSYLOOP_TIMEOUT:
> +		s.index = idx;
> +		s.num = vq->busyloop_timeout;
> +		if (copy_to_user(argp, &s, sizeof(s)))
> +			r = -EFAULT;
> +		break;
>  	default:
>  		r = -ENOIOCTLCMD;
>  	}
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index a7a43f0..9a02158 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -115,6 +115,7 @@ struct vhost_virtqueue {
>  	/* Ring endianness requested by userspace for cross-endian support. */
>  	bool user_be;
>  #endif
> +	u32 busyloop_timeout;
>  };
>  
>  struct vhost_dev {
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index ab373191..61a8777 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -126,6 +126,12 @@ struct vhost_memory {
>  #define VHOST_SET_VRING_CALL _IOW(VHOST_VIRTIO, 0x21, struct vhost_vring_file)
>  /* Set eventfd to signal an error */
>  #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
> +/* Set busy loop timeout (in us) */
> +#define VHOST_SET_VRING_BUSYLOOP_TIMEOUT _IOW(VHOST_VIRTIO, 0x23,	\
> +					 struct vhost_vring_state)
> +/* Get busy loop timeout (in us) */
> +#define VHOST_GET_VRING_BUSYLOOP_TIMEOUT _IOW(VHOST_VIRTIO, 0x24,	\
> +					 struct vhost_vring_state)
>  
>  /* VHOST_NET specific defines */
>  
> -- 
> 2.5.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V3 3/3] vhost_net: basic polling support
  2016-02-28 14:09     ` Michael S. Tsirkin
@ 2016-02-29  5:15       ` Jason Wang
  -1 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2016-02-29  5:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yang.zhang.wz, RAPOPORT, kvm, netdev, linux-kernel,
	virtualization



On 02/28/2016 10:09 PM, Michael S. Tsirkin wrote:
> On Fri, Feb 26, 2016 at 04:42:44PM +0800, Jason Wang wrote:
>> > This patch tries to poll for new added tx buffer or socket receive
>> > queue for a while at the end of tx/rx processing. The maximum time
>> > spent on polling were specified through a new kind of vring ioctl.
>> > 
>> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> Looks good overall, but I still see one problem.
>
>> > ---
>> >  drivers/vhost/net.c        | 79 +++++++++++++++++++++++++++++++++++++++++++---
>> >  drivers/vhost/vhost.c      | 14 ++++++++
>> >  drivers/vhost/vhost.h      |  1 +
>> >  include/uapi/linux/vhost.h |  6 ++++
>> >  4 files changed, 95 insertions(+), 5 deletions(-)
>> > 
>> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> > index 9eda69e..c91af93 100644
>> > --- a/drivers/vhost/net.c
>> > +++ b/drivers/vhost/net.c
>> > @@ -287,6 +287,44 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>> >  	rcu_read_unlock_bh();
>> >  }
>> >  
>> > +static inline unsigned long busy_clock(void)
>> > +{
>> > +	return local_clock() >> 10;
>> > +}
>> > +
>> > +static bool vhost_can_busy_poll(struct vhost_dev *dev,
>> > +				unsigned long endtime)
>> > +{
>> > +	return likely(!need_resched()) &&
>> > +	       likely(!time_after(busy_clock(), endtime)) &&
>> > +	       likely(!signal_pending(current)) &&
>> > +	       !vhost_has_work(dev) &&
>> > +	       single_task_running();
> So I find it quite unfortunate that this still uses single_task_running.
> This means that for example a SCHED_IDLE task will prevent polling from
> becoming active, and that seems like a bug, or at least
> an undocumented feature :).

Yes, it may need more thought.

>
> Unfortunately this logic affects the behaviour as observed
> by userspace, so we can't merge it like this and tune
> afterwards, since otherwise management tools will start
> depending on this logic.
>
>

How about removing single_task_running() here first and optimizing on top?
We probably need something like this to handle overcommitment.
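
To make the proposal concrete, here is a rough userspace model of the
polling loop once the single_task_running() gate is dropped: spin until
work shows up or the deadline passes. It is only a sketch under stated
assumptions: busy_clock_sketch(), busy_poll_sketch() and the two
predicates are hypothetical names, CLOCK_MONOTONIC stands in for
local_clock(), and the kernel-only exit conditions (need_resched(),
signal_pending(), vhost_has_work()) are not modeled.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Userspace stand-in for busy_clock(): nanoseconds >> 10, i.e. a
 * counter in units of 1024 ns (approximately microseconds). */
static uint64_t busy_clock_sketch(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ull +
		(uint64_t)ts.tv_nsec) >> 10;
}

typedef bool (*work_ready_fn)(void *opaque);

/* Spin until work_ready() reports true or the timeout (in busy clock
 * units) expires. A timeout of 0 means "do not spin", as in the patch.
 * Returns true if work was found. */
static bool busy_poll_sketch(work_ready_fn work_ready, void *opaque,
			     uint32_t timeout)
{
	uint64_t endtime;

	assert(work_ready != NULL);
	if (!timeout)
		return work_ready(opaque);

	endtime = busy_clock_sketch() + timeout;
	while (busy_clock_sketch() <= endtime) {
		if (work_ready(opaque))
			return true;
	}
	/* One last look after the deadline, like the final re-check of
	 * the ring or socket in the patch. */
	return work_ready(opaque);
}

/* Example predicate: becomes ready once the counter reaches zero. */
static bool countdown_ready(void *opaque)
{
	int *left = opaque;

	if (*left > 0)
		(*left)--;
	return *left == 0;
}

/* Example predicate: never ready, so the loop runs until the deadline. */
static bool never_ready(void *opaque)
{
	(void)opaque;
	return false;
}
```

Note that the >> 10 conversion counts in units of 1024 ns, so a timeout
of 50 allows roughly 51.2 us of spinning rather than exactly 50 us.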

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V3 3/3] vhost_net: basic polling support
  2016-02-29  5:15       ` Jason Wang
@ 2016-02-29  9:03         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2016-02-29  9:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: yang.zhang.wz, RAPOPORT, kvm, netdev, linux-kernel,
	virtualization

On Mon, Feb 29, 2016 at 01:15:48PM +0800, Jason Wang wrote:
> 
> 
> On 02/28/2016 10:09 PM, Michael S. Tsirkin wrote:
> > On Fri, Feb 26, 2016 at 04:42:44PM +0800, Jason Wang wrote:
> >> > This patch tries to poll for new added tx buffer or socket receive
> >> > queue for a while at the end of tx/rx processing. The maximum time
> >> > spent on polling were specified through a new kind of vring ioctl.
> >> > 
> >> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > Looks good overall, but I still see one problem.
> >
> >> > ---
> >> >  drivers/vhost/net.c        | 79 +++++++++++++++++++++++++++++++++++++++++++---
> >> >  drivers/vhost/vhost.c      | 14 ++++++++
> >> >  drivers/vhost/vhost.h      |  1 +
> >> >  include/uapi/linux/vhost.h |  6 ++++
> >> >  4 files changed, 95 insertions(+), 5 deletions(-)
> >> > 
> >> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >> > index 9eda69e..c91af93 100644
> >> > --- a/drivers/vhost/net.c
> >> > +++ b/drivers/vhost/net.c
> >> > @@ -287,6 +287,44 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
> >> >  	rcu_read_unlock_bh();
> >> >  }
> >> >  
> >> > +static inline unsigned long busy_clock(void)
> >> > +{
> >> > +	return local_clock() >> 10;
> >> > +}
> >> > +
> >> > +static bool vhost_can_busy_poll(struct vhost_dev *dev,
> >> > +				unsigned long endtime)
> >> > +{
> >> > +	return likely(!need_resched()) &&
> >> > +	       likely(!time_after(busy_clock(), endtime)) &&
> >> > +	       likely(!signal_pending(current)) &&
> >> > +	       !vhost_has_work(dev) &&
> >> > +	       single_task_running();
> > So I find it quite unfortunate that this still uses single_task_running.
> > This means that for example a SCHED_IDLE task will prevent polling from
> > becoming active, and that seems like a bug, or at least
> > an undocumented feature :).
> 
> Yes, it may need more thoughts.
> 
> >
> > Unfortunately this logic affects the behaviour as observed
> > by userspace, so we can't merge it like this and tune
> > afterwards, since otherwise management tools will start
> > depending on this logic.
> >
> >
> 
> How about remove single_task_running() first here and optimize on top?
> We probably need something like this to handle overcommitment.

Sounds good.

-- 
MST

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V3 3/3] vhost_net: basic polling support
  2016-02-26  8:42   ` Jason Wang
@ 2016-02-28 21:56     ` Christian Borntraeger
  -1 siblings, 0 replies; 22+ messages in thread
From: Christian Borntraeger @ 2016-02-28 21:56 UTC (permalink / raw)
  To: Jason Wang, kvm, mst, virtualization, netdev, linux-kernel
  Cc: yang.zhang.wz, RAPOPORT

On 02/26/2016 09:42 AM, Jason Wang wrote:
> This patch tries to poll for new added tx buffer or socket receive
> queue for a while at the end of tx/rx processing. The maximum time
> spent on polling were specified through a new kind of vring ioctl.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/net.c        | 79 +++++++++++++++++++++++++++++++++++++++++++---
>  drivers/vhost/vhost.c      | 14 ++++++++
>  drivers/vhost/vhost.h      |  1 +
>  include/uapi/linux/vhost.h |  6 ++++
>  4 files changed, 95 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 9eda69e..c91af93 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -287,6 +287,44 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>  	rcu_read_unlock_bh();
>  }
> 
> +static inline unsigned long busy_clock(void)
> +{
> +	return local_clock() >> 10;
> +}
> +
> +static bool vhost_can_busy_poll(struct vhost_dev *dev,
> +				unsigned long endtime)
> +{
> +	return likely(!need_resched()) &&
> +	       likely(!time_after(busy_clock(), endtime)) &&
> +	       likely(!signal_pending(current)) &&
> +	       !vhost_has_work(dev) &&
> +	       single_task_running();
> +}
> +
> +static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
> +				    struct vhost_virtqueue *vq,
> +				    struct iovec iov[], unsigned int iov_size,
> +				    unsigned int *out_num, unsigned int *in_num)
> +{
> +	unsigned long uninitialized_var(endtime);
> +	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> +				    out_num, in_num, NULL, NULL);
> +
> +	if (r == vq->num && vq->busyloop_timeout) {
> +		preempt_disable();
> +		endtime = busy_clock() + vq->busyloop_timeout;
> +		while (vhost_can_busy_poll(vq->dev, endtime) &&
> +		       vhost_vq_avail_empty(vq->dev, vq))
> +			cpu_relax();


Can you use cpu_relax_lowlatency() (which should be the same as cpu_relax() for
almost everybody but s390)? Plain cpu_relax() (without low latency) might give up
the time slice when running under another hypervisor (like LPAR on s390), which
might not be what we want here.



[...] 
> +static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
> +{
> +	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> +	struct vhost_virtqueue *vq = &nvq->vq;
> +	unsigned long uninitialized_var(endtime);
> +	int len = peek_head_len(sk);
> +
> +	if (!len && vq->busyloop_timeout) {
> +		/* Both tx vq and rx socket were polled here */
> +		mutex_lock(&vq->mutex);
> +		vhost_disable_notify(&net->dev, vq);
> +
> +		preempt_disable();
> +		endtime = busy_clock() + vq->busyloop_timeout;
> +
> +		while (vhost_can_busy_poll(&net->dev, endtime) &&
> +		       skb_queue_empty(&sk->sk_receive_queue) &&
> +		       vhost_vq_avail_empty(&net->dev, vq))
> +			cpu_relax();

here as well.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V3 3/3] vhost_net: basic polling support
@ 2016-02-28 21:56     ` Christian Borntraeger
  0 siblings, 0 replies; 22+ messages in thread
From: Christian Borntraeger @ 2016-02-28 21:56 UTC (permalink / raw)
  To: Jason Wang, kvm, mst, virtualization, netdev, linux-kernel
  Cc: RAPOPORT, yang.zhang.wz

On 02/26/2016 09:42 AM, Jason Wang wrote:
> This patch tries to poll for new added tx buffer or socket receive
> queue for a while at the end of tx/rx processing. The maximum time
> spent on polling were specified through a new kind of vring ioctl.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/net.c        | 79 +++++++++++++++++++++++++++++++++++++++++++---
>  drivers/vhost/vhost.c      | 14 ++++++++
>  drivers/vhost/vhost.h      |  1 +
>  include/uapi/linux/vhost.h |  6 ++++
>  4 files changed, 95 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 9eda69e..c91af93 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -287,6 +287,44 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>  	rcu_read_unlock_bh();
>  }
> 
> +static inline unsigned long busy_clock(void)
> +{
> +	return local_clock() >> 10;
> +}
> +
> +static bool vhost_can_busy_poll(struct vhost_dev *dev,
> +				unsigned long endtime)
> +{
> +	return likely(!need_resched()) &&
> +	       likely(!time_after(busy_clock(), endtime)) &&
> +	       likely(!signal_pending(current)) &&
> +	       !vhost_has_work(dev) &&
> +	       single_task_running();
> +}
> +
> +static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
> +				    struct vhost_virtqueue *vq,
> +				    struct iovec iov[], unsigned int iov_size,
> +				    unsigned int *out_num, unsigned int *in_num)
> +{
> +	unsigned long uninitialized_var(endtime);
> +	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> +				    out_num, in_num, NULL, NULL);
> +
> +	if (r == vq->num && vq->busyloop_timeout) {
> +		preempt_disable();
> +		endtime = busy_clock() + vq->busyloop_timeout;
> +		while (vhost_can_busy_poll(vq->dev, endtime) &&
> +		       vhost_vq_avail_empty(vq->dev, vq))
> +			cpu_relax();


Can you use cpu_relax_lowlatency() here? It should be the same as cpu_relax() for
almost everybody but s390. Plain cpu_relax() (without low latency) might give up the
time slice when running under another hypervisor (like LPAR on s390), which might
not be what we want here.
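
To make the concern concrete, here is a rough user-space sketch of the same
bounded busy-wait pattern. It is not the patch's code: busy_clock() below is a
stand-in built on clock_gettime(), and clock_after(), relax_hint() and
busy_poll() are hypothetical names used only for illustration. The relax hint
is the step where cpu_relax() versus cpu_relax_lowlatency() would matter in
the kernel; the sketch assumes a hint that never yields the CPU.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for the patch's busy_clock(): monotonic nanoseconds shifted
 * right by 10, so one unit is 1024 ns (roughly a microsecond). */
static inline unsigned long busy_clock(void)
{
	struct timespec ts;
	uint64_t ns;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	ns = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
	return (unsigned long)(ns >> 10);
}

/* Wraparound-safe "a is after b", the same idea as the kernel's
 * time_after(): compare the signed difference instead of the raw values. */
static inline bool clock_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical relax hint (assumption): a CPU pause on x86, a plain
 * compiler barrier elsewhere. It never gives up the time slice. */
static inline void relax_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#else
	__asm__ __volatile__("" ::: "memory");
#endif
}

/* Spin until *flag becomes non-zero or `timeout` clock units have passed.
 * Returns true if the flag was observed set. */
static bool busy_poll(const volatile int *flag, unsigned long timeout)
{
	unsigned long endtime;

	assert(flag != NULL);
	endtime = busy_clock() + timeout;

	while (!clock_after(busy_clock(), endtime)) {
		if (*flag)
			return true;
		relax_hint();
	}
	return *flag != 0;
}
```

With a timeout of 50 units the loop spins for a little over 50 us when nothing
shows up. If the hint inside the loop yielded to the underlying hypervisor
instead, most of that window could be spent not polling at all.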



[...] 
> +static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
> +{
> +	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> +	struct vhost_virtqueue *vq = &nvq->vq;
> +	unsigned long uninitialized_var(endtime);
> +	int len = peek_head_len(sk);
> +
> +	if (!len && vq->busyloop_timeout) {
> +		/* Both tx vq and rx socket were polled here */
> +		mutex_lock(&vq->mutex);
> +		vhost_disable_notify(&net->dev, vq);
> +
> +		preempt_disable();
> +		endtime = busy_clock() + vq->busyloop_timeout;
> +
> +		while (vhost_can_busy_poll(&net->dev, endtime) &&
> +		       skb_queue_empty(&sk->sk_receive_queue) &&
> +		       vhost_vq_avail_empty(&net->dev, vq))
> +			cpu_relax();

here as well.


* Re: [PATCH V3 3/3] vhost_net: basic polling support
  2016-02-28 21:56     ` Christian Borntraeger
@ 2016-02-29  5:17       ` Jason Wang
  -1 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2016-02-29  5:17 UTC (permalink / raw)
  To: Christian Borntraeger, kvm, mst, virtualization, netdev,
	linux-kernel
  Cc: yang.zhang.wz, RAPOPORT



On 02/29/2016 05:56 AM, Christian Borntraeger wrote:
> On 02/26/2016 09:42 AM, Jason Wang wrote:
>> > This patch tries to poll for new added tx buffer or socket receive
>> > queue for a while at the end of tx/rx processing. The maximum time
>> > spent on polling is specified through a new kind of vring ioctl.
>> > 
>> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>> > ---
>> >  drivers/vhost/net.c        | 79 +++++++++++++++++++++++++++++++++++++++++++---
>> >  drivers/vhost/vhost.c      | 14 ++++++++
>> >  drivers/vhost/vhost.h      |  1 +
>> >  include/uapi/linux/vhost.h |  6 ++++
>> >  4 files changed, 95 insertions(+), 5 deletions(-)
>> > 
>> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> > index 9eda69e..c91af93 100644
>> > --- a/drivers/vhost/net.c
>> > +++ b/drivers/vhost/net.c
>> > @@ -287,6 +287,44 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>> >  	rcu_read_unlock_bh();
>> >  }
>> > 
>> > +static inline unsigned long busy_clock(void)
>> > +{
>> > +	return local_clock() >> 10;
>> > +}
>> > +
>> > +static bool vhost_can_busy_poll(struct vhost_dev *dev,
>> > +				unsigned long endtime)
>> > +{
>> > +	return likely(!need_resched()) &&
>> > +	       likely(!time_after(busy_clock(), endtime)) &&
>> > +	       likely(!signal_pending(current)) &&
>> > +	       !vhost_has_work(dev) &&
>> > +	       single_task_running();
>> > +}
>> > +
>> > +static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
>> > +				    struct vhost_virtqueue *vq,
>> > +				    struct iovec iov[], unsigned int iov_size,
>> > +				    unsigned int *out_num, unsigned int *in_num)
>> > +{
>> > +	unsigned long uninitialized_var(endtime);
>> > +	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>> > +				    out_num, in_num, NULL, NULL);
>> > +
>> > +	if (r == vq->num && vq->busyloop_timeout) {
>> > +		preempt_disable();
>> > +		endtime = busy_clock() + vq->busyloop_timeout;
>> > +		while (vhost_can_busy_poll(vq->dev, endtime) &&
>> > +		       vhost_vq_avail_empty(vq->dev, vq))
>> > +			cpu_relax();
> Can you use cpu_relax_lowlatency() here? It should be the same as cpu_relax() for
> almost everybody but s390. Plain cpu_relax() (without low latency) might give up the
> time slice when running under another hypervisor (like LPAR on s390), which might
> not be what we want here.

Ok, will do this in next version.


* Re: [PATCH V3 0/3] basic busy polling support for vhost_net
  2016-02-26  8:42 ` Jason Wang
@ 2016-02-26 16:45   ` David Miller
  -1 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2016-02-26 16:45 UTC (permalink / raw)
  To: jasowang
  Cc: yang.zhang.wz, RAPOPORT, kvm, mst, netdev, linux-kernel,
	virtualization

From: Jason Wang <jasowang@redhat.com>
Date: Fri, 26 Feb 2016 16:42:41 +0800

> This series tries to add basic busy polling for vhost net. The idea is
> simple: at the end of tx/rx processing, busy poll for newly added tx
> descriptors and the rx receive socket for a while. The maximum amount of
> time (in us) that can be spent on busy polling is specified via an ioctl.

I'm assuming this will go through Michael's tree.


* Re: [PATCH V3 0/3] basic busy polling support for vhost_net
  2016-02-26 16:45   ` David Miller
@ 2016-02-28  9:12     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2016-02-28  9:12 UTC (permalink / raw)
  To: David Miller
  Cc: yang.zhang.wz, RAPOPORT, kvm, netdev, linux-kernel,
	virtualization

On Fri, Feb 26, 2016 at 11:45:02AM -0500, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Fri, 26 Feb 2016 16:42:41 +0800
> 
> > This series tries to add basic busy polling for vhost net. The idea is
> > simple: at the end of tx/rx processing, busy poll for newly added tx
> > descriptors and the rx receive socket for a while. The maximum amount of
> > time (in us) that can be spent on busy polling is specified via an ioctl.
> 
> I'm assuming this will go through Michael's tree.

Definitely.


end of thread, other threads:[~2016-02-29  9:04 UTC | newest]

Thread overview: 22+ messages
2016-02-26  8:42 [PATCH V3 0/3] basic busy polling support for vhost_net Jason Wang
2016-02-26  8:42 ` Jason Wang
2016-02-26  8:42 ` [PATCH V3 1/3] vhost: introduce vhost_has_work() Jason Wang
2016-02-26  8:42   ` Jason Wang
2016-02-26  8:42 ` [PATCH V3 2/3] vhost: introduce vhost_vq_avail_empty() Jason Wang
2016-02-26  8:42   ` Jason Wang
2016-02-26  8:42 ` [PATCH V3 3/3] vhost_net: basic polling support Jason Wang
2016-02-26  8:42   ` Jason Wang
2016-02-28 14:09   ` Michael S. Tsirkin
2016-02-28 14:09     ` Michael S. Tsirkin
2016-02-29  5:15     ` Jason Wang
2016-02-29  5:15       ` Jason Wang
2016-02-29  9:03       ` Michael S. Tsirkin
2016-02-29  9:03         ` Michael S. Tsirkin
2016-02-28 21:56   ` Christian Borntraeger
2016-02-28 21:56     ` Christian Borntraeger
2016-02-29  5:17     ` Jason Wang
2016-02-29  5:17       ` Jason Wang
2016-02-26 16:45 ` [PATCH V3 0/3] basic busy polling support for vhost_net David Miller
2016-02-26 16:45   ` David Miller
2016-02-28  9:12   ` Michael S. Tsirkin
2016-02-28  9:12     ` Michael S. Tsirkin
