* [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
@ 2014-05-16 22:11 Xi Wang
2014-05-19 9:27 ` Jason Wang
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Xi Wang @ 2014-05-16 22:11 UTC (permalink / raw)
To: David S. Miller, netdev
Cc: Jason Wang, Michael S. Tsirkin, Maxim Krasnyansky, Neal Cardwell,
Eric Dumazet, Xi Wang
tun_do_read always adds the current thread to the wait queue, even if a
packet is ready to read. This is inefficient because both the sleeper and
the waker want to acquire the wait queue spin lock when the packet rate
is high.
We restructure the read function and use common kernel networking
routines to handle receive, sleep and wakeup. With this change, available
packets are checked before the reading thread is added to the wait queue.
Ran performance tests with the following configuration:
- my packet generator -> tap1 -> br0 -> tap0 -> my packet consumer
- sender pinned to one core and receiver pinned to another core
- sender sends small UDP packets (64 bytes total) as fast as it can
- Sandy Bridge cores
- throughput numbers are receiver-side goodput
The results are
baseline: 731k pkts/sec, cpu utilization at 1.50 cpus
changed: 783k pkts/sec, cpu utilization at 1.53 cpus
The performance difference is largely determined by packet rate and
inter-CPU communication cost. For example, if the sender and
receiver are pinned to different CPU sockets, the results are
baseline: 558k pkts/sec, cpu utilization at 1.71 cpus
changed: 690k pkts/sec, cpu utilization at 1.67 cpus
Co-authored-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xi Wang <xii@google.com>
---
Changelog since v1:
- Added back the error code. The NETREG_REGISTERED behavior is different but
should be compatible with the previous implementation
- Removed non-essential changes
drivers/net/tun.c | 54 ++++++++++++++++--------------------------------------
1 file changed, 16 insertions(+), 38 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ee328ba..98bad1f 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -498,12 +498,12 @@ static void tun_detach_all(struct net_device *dev)
for (i = 0; i < n; i++) {
tfile = rtnl_dereference(tun->tfiles[i]);
BUG_ON(!tfile);
- wake_up_all(&tfile->wq.wait);
+ tfile->socket.sk->sk_data_ready(tfile->socket.sk);
RCU_INIT_POINTER(tfile->tun, NULL);
--tun->numqueues;
}
list_for_each_entry(tfile, &tun->disabled, next) {
- wake_up_all(&tfile->wq.wait);
+ tfile->socket.sk->sk_data_ready(tfile->socket.sk);
RCU_INIT_POINTER(tfile->tun, NULL);
}
BUG_ON(tun->numqueues != 0);
@@ -807,8 +807,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
/* Notify and wake up reader process */
if (tfile->flags & TUN_FASYNC)
kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
- wake_up_interruptible_poll(&tfile->wq.wait, POLLIN |
- POLLRDNORM | POLLRDBAND);
+ tfile->socket.sk->sk_data_ready(tfile->socket.sk);
rcu_read_unlock();
return NETDEV_TX_OK;
@@ -965,7 +964,7 @@ static unsigned int tun_chr_poll(struct file *file, poll_table *wait)
tun_debug(KERN_INFO, tun, "tun_chr_poll\n");
- poll_wait(file, &tfile->wq.wait, wait);
+ poll_wait(file, sk_sleep(sk), wait);
if (!skb_queue_empty(&sk->sk_receive_queue))
mask |= POLLIN | POLLRDNORM;
@@ -1330,47 +1329,26 @@ done:
static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
const struct iovec *iv, ssize_t len, int noblock)
{
- DECLARE_WAITQUEUE(wait, current);
struct sk_buff *skb;
ssize_t ret = 0;
+ int peeked, err, off = 0;
tun_debug(KERN_INFO, tun, "tun_do_read\n");
- if (unlikely(!noblock))
- add_wait_queue(&tfile->wq.wait, &wait);
- while (len) {
- if (unlikely(!noblock))
- current->state = TASK_INTERRUPTIBLE;
+ if (!len)
+ return ret;
- /* Read frames from the queue */
- if (!(skb = skb_dequeue(&tfile->socket.sk->sk_receive_queue))) {
- if (noblock) {
- ret = -EAGAIN;
- break;
- }
- if (signal_pending(current)) {
- ret = -ERESTARTSYS;
- break;
- }
- if (tun->dev->reg_state != NETREG_REGISTERED) {
- ret = -EIO;
- break;
- }
-
- /* Nothing to read, let's sleep */
- schedule();
- continue;
- }
+ if (tun->dev->reg_state != NETREG_REGISTERED)
+ return -EIO;
+ /* Read frames from queue */
+ skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
+ &peeked, &off, &err);
+ if (skb) {
ret = tun_put_user(tun, tfile, skb, iv, len);
kfree_skb(skb);
- break;
- }
-
- if (unlikely(!noblock)) {
- current->state = TASK_RUNNING;
- remove_wait_queue(&tfile->wq.wait, &wait);
- }
+ } else
+ ret = err;
return ret;
}
@@ -2199,8 +2177,8 @@ static int tun_chr_open(struct inode *inode, struct file * file)
tfile->flags = 0;
tfile->ifindex = 0;
- rcu_assign_pointer(tfile->socket.wq, &tfile->wq);
init_waitqueue_head(&tfile->wq.wait);
+ RCU_INIT_POINTER(tfile->socket.wq, &tfile->wq);
tfile->socket.file = file;
tfile->socket.ops = &tun_socket_ops;
--
1.9.1.423.g4596e3a
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-16 22:11 [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency Xi Wang
@ 2014-05-19 9:27 ` Jason Wang
2014-05-19 14:09 ` Eric Dumazet
2014-05-19 16:06 ` Michael S. Tsirkin
2014-05-21 7:54 ` Michael S. Tsirkin
2014-05-21 19:51 ` David Miller
2 siblings, 2 replies; 18+ messages in thread
From: Jason Wang @ 2014-05-19 9:27 UTC (permalink / raw)
To: Xi Wang, David S. Miller, netdev
Cc: Michael S. Tsirkin, Maxim Krasnyansky, Neal Cardwell,
Eric Dumazet
On 05/17/2014 06:11 AM, Xi Wang wrote:
> tun_do_read always adds current thread to wait queue, even if a packet
> is ready to read. This is inefficient because both sleeper and waker
> want to acquire the wait queue spin lock when packet rate is high.
>
> We restructure the read function and use common kernel networking
> routines to handle receive, sleep and wakeup. With the change
> available packets are checked first before the reading thread is added
> to the wait queue.
>
> Ran performance tests with the following configuration:
>
> - my packet generator -> tap1 -> br0 -> tap0 -> my packet consumer
> - sender pinned to one core and receiver pinned to another core
> - sender send small UDP packets (64 bytes total) as fast as it can
> - sandy bridge cores
> - throughput are receiver side goodput numbers
>
> The results are
>
> baseline: 731k pkts/sec, cpu utilization at 1.50 cpus
> changed: 783k pkts/sec, cpu utilization at 1.53 cpus
>
> The performance difference is largely determined by packet rate and
> inter-cpu communication cost. For example, if the sender and
> receiver are pinned to different cpu sockets, the results are
>
> baseline: 558k pkts/sec, cpu utilization at 1.71 cpus
> changed: 690k pkts/sec, cpu utilization at 1.67 cpus
>
> Co-authored-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Xi Wang <xii@google.com>
> ---
>
> Changelog since v1:
> - Added back error code. NETREG_REGISTERED behavior is different but
> should be compatible with the previous implementation
> - Removed non essential changes
>
>
> drivers/net/tun.c | 54 ++++++++++++++++--------------------------------------
> 1 file changed, 16 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index ee328ba..98bad1f 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -498,12 +498,12 @@ static void tun_detach_all(struct net_device *dev)
> for (i = 0; i < n; i++) {
> tfile = rtnl_dereference(tun->tfiles[i]);
> BUG_ON(!tfile);
> - wake_up_all(&tfile->wq.wait);
> + tfile->socket.sk->sk_data_ready(tfile->socket.sk);
Looks like wake_up_all() works pretty well here. Is there any reason to
switch to sk_data_ready()?
wake_up_all() wakes up tasks in both TASK_INTERRUPTIBLE and
TASK_UNINTERRUPTIBLE state, while sock_def_readable() only wakes up
interruptible tasks. Is this a possible issue? Even if not, this looks
like a cleanup not related to the topic.
> RCU_INIT_POINTER(tfile->tun, NULL);
> --tun->numqueues;
> }
> list_for_each_entry(tfile, &tun->disabled, next) {
> - wake_up_all(&tfile->wq.wait);
> + tfile->socket.sk->sk_data_ready(tfile->socket.sk);
> RCU_INIT_POINTER(tfile->tun, NULL);
> }
> BUG_ON(tun->numqueues != 0);
> @@ -807,8 +807,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> /* Notify and wake up reader process */
> if (tfile->flags & TUN_FASYNC)
> kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
> - wake_up_interruptible_poll(&tfile->wq.wait, POLLIN |
> - POLLRDNORM | POLLRDBAND);
> + tfile->socket.sk->sk_data_ready(tfile->socket.sk);
>
> rcu_read_unlock();
> return NETDEV_TX_OK;
> @@ -965,7 +964,7 @@ static unsigned int tun_chr_poll(struct file *file, poll_table *wait)
>
> tun_debug(KERN_INFO, tun, "tun_chr_poll\n");
>
> - poll_wait(file, &tfile->wq.wait, wait);
> + poll_wait(file, sk_sleep(sk), wait);
Same here.
>
> if (!skb_queue_empty(&sk->sk_receive_queue))
> mask |= POLLIN | POLLRDNORM;
> @@ -1330,47 +1329,26 @@ done:
> static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
> const struct iovec *iv, ssize_t len, int noblock)
> {
> - DECLARE_WAITQUEUE(wait, current);
> struct sk_buff *skb;
> ssize_t ret = 0;
> + int peeked, err, off = 0;
>
> tun_debug(KERN_INFO, tun, "tun_do_read\n");
>
> - if (unlikely(!noblock))
> - add_wait_queue(&tfile->wq.wait, &wait);
> - while (len) {
> - if (unlikely(!noblock))
> - current->state = TASK_INTERRUPTIBLE;
> + if (!len)
> + return ret;
>
> - /* Read frames from the queue */
> - if (!(skb = skb_dequeue(&tfile->socket.sk->sk_receive_queue))) {
> - if (noblock) {
> - ret = -EAGAIN;
> - break;
> - }
> - if (signal_pending(current)) {
> - ret = -ERESTARTSYS;
> - break;
> - }
> - if (tun->dev->reg_state != NETREG_REGISTERED) {
> - ret = -EIO;
> - break;
> - }
> -
> - /* Nothing to read, let's sleep */
> - schedule();
> - continue;
> - }
> + if (tun->dev->reg_state != NETREG_REGISTERED)
> + return -EIO;
>
> + /* Read frames from queue */
> + skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
> + &peeked, &off, &err);
> + if (skb) {
There is still a small difference: previously we checked reg_state only
after we were sure there was nothing left in sk_receive_queue, but this
patch returns -EIO before trying to dequeue an skb.
> ret = tun_put_user(tun, tfile, skb, iv, len);
> kfree_skb(skb);
> - break;
> - }
> -
> - if (unlikely(!noblock)) {
> - current->state = TASK_RUNNING;
> - remove_wait_queue(&tfile->wq.wait, &wait);
> - }
> + } else
> + ret = err;
>
> return ret;
> }
> @@ -2199,8 +2177,8 @@ static int tun_chr_open(struct inode *inode, struct file * file)
> tfile->flags = 0;
> tfile->ifindex = 0;
>
> - rcu_assign_pointer(tfile->socket.wq, &tfile->wq);
> init_waitqueue_head(&tfile->wq.wait);
> + RCU_INIT_POINTER(tfile->socket.wq, &tfile->wq);
>
And here
> tfile->socket.file = file;
> tfile->socket.ops = &tun_socket_ops;
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-19 9:27 ` Jason Wang
@ 2014-05-19 14:09 ` Eric Dumazet
2014-05-20 4:44 ` Jason Wang
2014-05-19 16:06 ` Michael S. Tsirkin
1 sibling, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2014-05-19 14:09 UTC (permalink / raw)
To: Jason Wang
Cc: Xi Wang, David S. Miller, netdev, Michael S. Tsirkin,
Maxim Krasnyansky, Neal Cardwell, Eric Dumazet
On Mon, 2014-05-19 at 17:27 +0800, Jason Wang wrote:
> Still a little bit difference. We check the reg_state after we're sure
> there's nothing left in sk_receive_queue. But this patch returns -EIO
> before trying to dequeue skb.
Do you think it's a problem, other than a simple difference in behavior?
Is this -EIO thing a hard requirement, or a side effect?
Either we try to converge things, or we keep this driver a pile of
copy/pasted stuff from other locations, missing improvements we made
in the core networking stack, because of some supposed differences.
About sk_data_ready() and wake_up_all(): I think you missed that whole
part of the patch.
Check how sock_def_readable() does everything properly and efficiently,
including the async part.
static void sock_def_readable(struct sock *sk)
{
struct socket_wq *wq;
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
if (wq_has_sleeper(wq))
wake_up_interruptible_sync_poll(&wq->wait, POLLIN | POLLPRI |
POLLRDNORM | POLLRDBAND);
sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
rcu_read_unlock();
}
Using sk_data_ready() is the way to reach sock_def_readable(),
as it's not an EXPORT_SYMBOL.
If this driver were using the common interface, we would not have
these false sharing problems, which were solved a long time ago.
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-19 9:27 ` Jason Wang
2014-05-19 14:09 ` Eric Dumazet
@ 2014-05-19 16:06 ` Michael S. Tsirkin
2014-05-20 4:51 ` Jason Wang
1 sibling, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2014-05-19 16:06 UTC (permalink / raw)
To: Jason Wang
Cc: Xi Wang, David S. Miller, netdev, Maxim Krasnyansky,
Neal Cardwell, Eric Dumazet
On Mon, May 19, 2014 at 05:27:41PM +0800, Jason Wang wrote:
> > @@ -1330,47 +1329,26 @@ done:
> > static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
> > const struct iovec *iv, ssize_t len, int noblock)
> > {
> > - DECLARE_WAITQUEUE(wait, current);
> > struct sk_buff *skb;
> > ssize_t ret = 0;
> > + int peeked, err, off = 0;
> >
> > tun_debug(KERN_INFO, tun, "tun_do_read\n");
> >
> > - if (unlikely(!noblock))
> > - add_wait_queue(&tfile->wq.wait, &wait);
> > - while (len) {
> > - if (unlikely(!noblock))
> > - current->state = TASK_INTERRUPTIBLE;
> > + if (!len)
> > + return ret;
> >
> > - /* Read frames from the queue */
> > - if (!(skb = skb_dequeue(&tfile->socket.sk->sk_receive_queue))) {
> > - if (noblock) {
> > - ret = -EAGAIN;
> > - break;
> > - }
> > - if (signal_pending(current)) {
> > - ret = -ERESTARTSYS;
> > - break;
> > - }
> > - if (tun->dev->reg_state != NETREG_REGISTERED) {
> > - ret = -EIO;
> > - break;
> > - }
> > -
> > - /* Nothing to read, let's sleep */
> > - schedule();
> > - continue;
> > - }
> > + if (tun->dev->reg_state != NETREG_REGISTERED)
> > + return -EIO;
> >
> > + /* Read frames from queue */
> > + skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
> > + &peeked, &off, &err);
> > + if (skb) {
>
> Still a little bit difference. We check the reg_state after we're sure
> there's nothing left in sk_receive_queue. But this patch returns -EIO
> before trying to dequeue skb.
Yes, but what's the concern here? What would userspace do
to notice the change in behaviour?
tun_detach() calls tun_queue_purge() before unregister_netdevice(),
so apparently there's never anything in the queue when
the state isn't registered.
Did I miss something?
--
MST
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-19 14:09 ` Eric Dumazet
@ 2014-05-20 4:44 ` Jason Wang
2014-05-20 4:52 ` Eric Dumazet
2014-05-20 5:11 ` Eric Dumazet
0 siblings, 2 replies; 18+ messages in thread
From: Jason Wang @ 2014-05-20 4:44 UTC (permalink / raw)
To: Eric Dumazet
Cc: Xi Wang, David S. Miller, netdev, Michael S. Tsirkin,
Maxim Krasnyansky, Neal Cardwell, Eric Dumazet
On 05/19/2014 10:09 PM, Eric Dumazet wrote:
> On Mon, 2014-05-19 at 17:27 +0800, Jason Wang wrote:
>
>> Still a little bit difference. We check the reg_state after we're sure
>> there's nothing left in sk_receive_queue. But this patch returns -EIO
>> before trying to dequeue skb.
> Do you think its a problem, other than a simple difference of behavior ?
I'm just pointing out possible issues.
>
> Is this -EIO thing a hard requirement, or a side effect ?
Probably not, but it's really not hard to keep this.
>
> Either we try to converge things, or we keep this driver a pile of
> copy/pasted stuff from other locations, missing improvements we did
> in core networking stack, because of some supposed differences.
Agreed, but we'd better separate the unrelated changes into other patches
to simplify review and speed up convergence.
>
> About the sk_data_ready() and wake_up_all(), you missed the whole part
> of the patch I think.
>
> Check how sock_def_readable() does everything properly and efficiently,
> including the async part.
But this change (sk_data_ready()) is not related to switching to
__skb_recv_datagram().
The async part may not work since tun has its own implementation of fasync.
>
> static void sock_def_readable(struct sock *sk)
> {
> struct socket_wq *wq;
>
> rcu_read_lock();
> wq = rcu_dereference(sk->sk_wq);
> if (wq_has_sleeper(wq))
> wake_up_interruptible_sync_poll(&wq->wait, POLLIN | POLLPRI |
> POLLRDNORM | POLLRDBAND);
> sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
> rcu_read_unlock();
> }
>
> Using sk_data_ready() is the way to reach sock_def_readable()
> as its not an EXPORT_SYMBOL.
>
> If this driver was using the common interface, we would not have
> these false sharing problems, that were solved a long time ago.
Yes, but better in a separate patch.
We can probably rework tun_fasync() to use sock_fasync(), but using
sk_data_ready() in tun_detach_all() is still questionable. We may not
want to send EIO to userspace while the tun device is being destroyed.
>
>
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-19 16:06 ` Michael S. Tsirkin
@ 2014-05-20 4:51 ` Jason Wang
2014-05-20 6:22 ` Michael S. Tsirkin
0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2014-05-20 4:51 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Xi Wang, David S. Miller, netdev, Maxim Krasnyansky,
Neal Cardwell, Eric Dumazet
On 05/20/2014 12:06 AM, Michael S. Tsirkin wrote:
> On Mon, May 19, 2014 at 05:27:41PM +0800, Jason Wang wrote:
>>> @@ -1330,47 +1329,26 @@ done:
>>> static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
>>> const struct iovec *iv, ssize_t len, int noblock)
>>> {
>>> - DECLARE_WAITQUEUE(wait, current);
>>> struct sk_buff *skb;
>>> ssize_t ret = 0;
>>> + int peeked, err, off = 0;
>>>
>>> tun_debug(KERN_INFO, tun, "tun_do_read\n");
>>>
>>> - if (unlikely(!noblock))
>>> - add_wait_queue(&tfile->wq.wait, &wait);
>>> - while (len) {
>>> - if (unlikely(!noblock))
>>> - current->state = TASK_INTERRUPTIBLE;
>>> + if (!len)
>>> + return ret;
>>>
>>> - /* Read frames from the queue */
>>> - if (!(skb = skb_dequeue(&tfile->socket.sk->sk_receive_queue))) {
>>> - if (noblock) {
>>> - ret = -EAGAIN;
>>> - break;
>>> - }
>>> - if (signal_pending(current)) {
>>> - ret = -ERESTARTSYS;
>>> - break;
>>> - }
>>> - if (tun->dev->reg_state != NETREG_REGISTERED) {
>>> - ret = -EIO;
>>> - break;
>>> - }
>>> -
>>> - /* Nothing to read, let's sleep */
>>> - schedule();
>>> - continue;
>>> - }
>>> + if (tun->dev->reg_state != NETREG_REGISTERED)
>>> + return -EIO;
>>>
>>> + /* Read frames from queue */
>>> + skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
>>> + &peeked, &off, &err);
>>> + if (skb) {
>> Still a little bit difference. We check the reg_state after we're sure
>> there's nothing left in sk_receive_queue. But this patch returns -EIO
>> before trying to dequeue skb.
>
> Yes but what's the concern here? What does userspace do
> to notice the change in behaviour?
> tun_detach calls tun_queue_purge before unregister_netdevice
> so apparently there's never anything in queue when
> state isn't registered.
> Did I miss something?
>
From rollback_registered_many(), ndo_uninit() is called after reg_state
has been changed to NETREG_UNREGISTERING, so userspace could read
something in that small window. Not sure it's a problem, but it is a
minor difference.
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-20 4:44 ` Jason Wang
@ 2014-05-20 4:52 ` Eric Dumazet
2014-05-20 6:35 ` Michael S. Tsirkin
2014-05-20 5:11 ` Eric Dumazet
1 sibling, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2014-05-20 4:52 UTC (permalink / raw)
To: Jason Wang
Cc: Xi Wang, David S. Miller, netdev, Michael S. Tsirkin,
Maxim Krasnyansky, Neal Cardwell, Eric Dumazet
On Tue, 2014-05-20 at 12:44 +0800, Jason Wang wrote:
> But this changes (sk_data_ready()) has nothing related to switching to
> use __skb_recv_datagram()
I think I will give up at this point; this is taking too much time to
gain only 15% of CPU.
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-20 4:44 ` Jason Wang
2014-05-20 4:52 ` Eric Dumazet
@ 2014-05-20 5:11 ` Eric Dumazet
2014-05-20 6:03 ` Jason Wang
1 sibling, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2014-05-20 5:11 UTC (permalink / raw)
To: Jason Wang
Cc: Xi Wang, David S. Miller, netdev, Michael S. Tsirkin,
Maxim Krasnyansky, Neal Cardwell, Eric Dumazet
On Tue, 2014-05-20 at 12:44 +0800, Jason Wang wrote:
> On 05/19/2014 10:09 PM, Eric Dumazet wrote:
> >
> > About the sk_data_ready() and wake_up_all(), you missed the whole part
> > of the patch I think.
> >
> > Check how sock_def_readable() does everything properly and efficiently,
> > including the async part.
>
> But this changes (sk_data_ready()) has nothing related to switching to
> use __skb_recv_datagram()
>
This is totally related.
I think you have not yet understood this patch.
Compare wake_up_all() and sk_data_ready() speeds; you'll be surprised.
You should ask yourself: why do we use wq_has_sleeper() in the
networking stack?
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-20 5:11 ` Eric Dumazet
@ 2014-05-20 6:03 ` Jason Wang
2014-05-20 6:34 ` Michael S. Tsirkin
2014-05-20 13:59 ` Eric Dumazet
0 siblings, 2 replies; 18+ messages in thread
From: Jason Wang @ 2014-05-20 6:03 UTC (permalink / raw)
To: Eric Dumazet
Cc: Xi Wang, David S. Miller, netdev, Michael S. Tsirkin,
Maxim Krasnyansky, Neal Cardwell, Eric Dumazet
On 05/20/2014 01:11 PM, Eric Dumazet wrote:
> On Tue, 2014-05-20 at 12:44 +0800, Jason Wang wrote:
>> On 05/19/2014 10:09 PM, Eric Dumazet wrote:
>>> About the sk_data_ready() and wake_up_all(), you missed the whole part
>>> of the patch I think.
>>>
>>> Check how sock_def_readable() does everything properly and efficiently,
>>> including the async part.
>> But this changes (sk_data_ready()) has nothing related to switching to
>> use __skb_recv_datagram()
>>
> This is totally related.
>
> I think you did not yet understood this patch
Sorry for being unclear, but I think you misunderstood my meaning.
>
> Compare wake_up_all() and sk_data_ready() speeds, you'll be surprised.
>
> You should ask to yourself : Why do we use wq_has_sleeper() in
> networking stacks ?
See my first reply: I have no objection to using sk_data_ready() in
tun_net_xmit(). My only concern is using sk_data_ready() in
tun_detach_all():
- It is only called while the tun device is being destroyed, so I believe
we don't care about performance in this case.
- sk_data_ready() is usually called when there is something new to be
processed, which is not the case in tun_detach_all().
- Not sure it's a problem, but sock_def_readable() will not wake up
uninterruptible tasks during tun destruction.
- If we make sock_fasync() work for tun in the future, it may send
SIGIO to user processes during tun destruction, which is not expected.
Thanks
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-20 4:51 ` Jason Wang
@ 2014-05-20 6:22 ` Michael S. Tsirkin
2014-05-20 6:40 ` Jason Wang
0 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2014-05-20 6:22 UTC (permalink / raw)
To: Jason Wang
Cc: Xi Wang, David S. Miller, netdev, Maxim Krasnyansky,
Neal Cardwell, Eric Dumazet
On Tue, May 20, 2014 at 12:51:40PM +0800, Jason Wang wrote:
> On 05/20/2014 12:06 AM, Michael S. Tsirkin wrote:
> > On Mon, May 19, 2014 at 05:27:41PM +0800, Jason Wang wrote:
> >>> @@ -1330,47 +1329,26 @@ done:
> >>> static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
> >>> const struct iovec *iv, ssize_t len, int noblock)
> >>> {
> >>> - DECLARE_WAITQUEUE(wait, current);
> >>> struct sk_buff *skb;
> >>> ssize_t ret = 0;
> >>> + int peeked, err, off = 0;
> >>>
> >>> tun_debug(KERN_INFO, tun, "tun_do_read\n");
> >>>
> >>> - if (unlikely(!noblock))
> >>> - add_wait_queue(&tfile->wq.wait, &wait);
> >>> - while (len) {
> >>> - if (unlikely(!noblock))
> >>> - current->state = TASK_INTERRUPTIBLE;
> >>> + if (!len)
> >>> + return ret;
> >>>
> >>> - /* Read frames from the queue */
> >>> - if (!(skb = skb_dequeue(&tfile->socket.sk->sk_receive_queue))) {
> >>> - if (noblock) {
> >>> - ret = -EAGAIN;
> >>> - break;
> >>> - }
> >>> - if (signal_pending(current)) {
> >>> - ret = -ERESTARTSYS;
> >>> - break;
> >>> - }
> >>> - if (tun->dev->reg_state != NETREG_REGISTERED) {
> >>> - ret = -EIO;
> >>> - break;
> >>> - }
> >>> -
> >>> - /* Nothing to read, let's sleep */
> >>> - schedule();
> >>> - continue;
> >>> - }
> >>> + if (tun->dev->reg_state != NETREG_REGISTERED)
> >>> + return -EIO;
> >>>
> >>> + /* Read frames from queue */
> >>> + skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
> >>> + &peeked, &off, &err);
> >>> + if (skb) {
> >> Still a little bit difference. We check the reg_state after we're sure
> >> there's nothing left in sk_receive_queue. But this patch returns -EIO
> >> before trying to dequeue skb.
> >
> > Yes but what's the concern here? What does userspace do
> > to notice the change in behaviour?
> > tun_detach calls tun_queue_purge before unregister_netdevice
> > so apparently there's never anything in queue when
> > state isn't registered.
> > Did I miss something?
> >
>
> From rollback_registered_many(), ndo_uninit() was called after reg_state
> was changed to NETREG_UNREGISTERING. So userspace could read something
> in the small window. Not sure it was a problem, but it does have minor
> difference.
OK, so userspace reads the fd when it's destroyed.
Previously it could get EIO; now it can get EIO.
Previously packets could get dropped; now packets could get dropped.
Sure, the timing changed, but can the change break some userspace? How?
--
MST
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-20 6:03 ` Jason Wang
@ 2014-05-20 6:34 ` Michael S. Tsirkin
2014-05-20 6:55 ` Jason Wang
2014-05-20 13:59 ` Eric Dumazet
1 sibling, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2014-05-20 6:34 UTC (permalink / raw)
To: Jason Wang
Cc: Eric Dumazet, Xi Wang, David S. Miller, netdev, Maxim Krasnyansky,
Neal Cardwell, Eric Dumazet
On Tue, May 20, 2014 at 02:03:34PM +0800, Jason Wang wrote:
> On 05/20/2014 01:11 PM, Eric Dumazet wrote:
> > On Tue, 2014-05-20 at 12:44 +0800, Jason Wang wrote:
> >> On 05/19/2014 10:09 PM, Eric Dumazet wrote:
> >>> About the sk_data_ready() and wake_up_all(), you missed the whole part
> >>> of the patch I think.
> >>>
> >>> Check how sock_def_readable() does everything properly and efficiently,
> >>> including the async part.
> >> But this changes (sk_data_ready()) has nothing related to switching to
> >> use __skb_recv_datagram()
> >>
> > This is totally related.
> >
> > I think you did not yet understood this patch
>
> Sorry for being unclear, but I think you misunderstand my meaning.
> >
> > Compare wake_up_all() and sk_data_ready() speeds, you'll be surprised.
> >
> > You should ask to yourself : Why do we use wq_has_sleeper() in
> > networking stacks ?
>
> See my first reply, I don't have objection that uses sk_data_ready() in
> tun_net_xmit(). My only concern is using sk_data_ready() in
> tun_detach_all():
>
> - It was only called during tun destroying, so I believe we don't care
> about the performance in this condition.
> - sk_data_ready() was usually called when there's something new to be
> processed which is not case in tun_detach_all()
OK, so what does userspace do to notice the change in behaviour?
I'm not asking you to write a test, but can you show us in
pseudo-code?
> - Not sure it was a problem but sock_def_readable() will not wake up
> uninterruptible task during tun destroying.
But TASK_UNINTERRUPTIBLE here would really mean some in-kernel caller
hooking into this function? Is there a way to create this from
userspace? If not, we don't care.
> - If we make sock_fasync() works for tun in the future, it may send
> SIGIO to user process during tun destroying which is not expected.
>
> Thanks
I don't get this last comment. The patch does not touch the fasync paths
at all. How can it break them?
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-20 4:52 ` Eric Dumazet
@ 2014-05-20 6:35 ` Michael S. Tsirkin
0 siblings, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2014-05-20 6:35 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jason Wang, Xi Wang, David S. Miller, netdev, Maxim Krasnyansky,
Neal Cardwell, Eric Dumazet
On Mon, May 19, 2014 at 09:52:39PM -0700, Eric Dumazet wrote:
> On Tue, 2014-05-20 at 12:44 +0800, Jason Wang wrote:
>
> > But this changes (sk_data_ready()) has nothing related to switching to
> > use __skb_recv_datagram()
>
> I think I will give up at this point, this is becoming too much time to
> gain only 15% of cpu.
>
Ouch, I did like the patch: less code and more performance.
I hope we'll still be able to save it; please be patient.
Jason is just being cautious, trying to make sure we don't
break existing userspace.
--
MST
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-20 6:22 ` Michael S. Tsirkin
@ 2014-05-20 6:40 ` Jason Wang
0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2014-05-20 6:40 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Xi Wang, David S. Miller, netdev, Maxim Krasnyansky,
Neal Cardwell, Eric Dumazet
On 05/20/2014 02:22 PM, Michael S. Tsirkin wrote:
> On Tue, May 20, 2014 at 12:51:40PM +0800, Jason Wang wrote:
>> On 05/20/2014 12:06 AM, Michael S. Tsirkin wrote:
>>> On Mon, May 19, 2014 at 05:27:41PM +0800, Jason Wang wrote:
>>>>> @@ -1330,47 +1329,26 @@ done:
>>>>> static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
>>>>> const struct iovec *iv, ssize_t len, int noblock)
>>>>> {
>>>>> - DECLARE_WAITQUEUE(wait, current);
>>>>> struct sk_buff *skb;
>>>>> ssize_t ret = 0;
>>>>> + int peeked, err, off = 0;
>>>>>
>>>>> tun_debug(KERN_INFO, tun, "tun_do_read\n");
>>>>>
>>>>> - if (unlikely(!noblock))
>>>>> - add_wait_queue(&tfile->wq.wait, &wait);
>>>>> - while (len) {
>>>>> - if (unlikely(!noblock))
>>>>> - current->state = TASK_INTERRUPTIBLE;
>>>>> + if (!len)
>>>>> + return ret;
>>>>>
>>>>> - /* Read frames from the queue */
>>>>> - if (!(skb = skb_dequeue(&tfile->socket.sk->sk_receive_queue))) {
>>>>> - if (noblock) {
>>>>> - ret = -EAGAIN;
>>>>> - break;
>>>>> - }
>>>>> - if (signal_pending(current)) {
>>>>> - ret = -ERESTARTSYS;
>>>>> - break;
>>>>> - }
>>>>> - if (tun->dev->reg_state != NETREG_REGISTERED) {
>>>>> - ret = -EIO;
>>>>> - break;
>>>>> - }
>>>>> -
>>>>> - /* Nothing to read, let's sleep */
>>>>> - schedule();
>>>>> - continue;
>>>>> - }
>>>>> + if (tun->dev->reg_state != NETREG_REGISTERED)
>>>>> + return -EIO;
>>>>>
>>>>> + /* Read frames from queue */
>>>>> + skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
>>>>> + &peeked, &off, &err);
>>>>> + if (skb) {
>>>> Still a little bit difference. We check the reg_state after we're sure
>>>> there's nothing left in sk_receive_queue. But this patch returns -EIO
>>>> before trying to dequeue skb.
>>> Yes but what's the concern here? What does userspace do
>>> to notice the change in behaviour?
>>> tun_detach calls tun_queue_purge before unregister_netdevice
>>> so apparently there's never anything in queue when
>>> state isn't registered.
>>> Did I miss something?
>>>
>> From rollback_registered_many(), ndo_uninit() was called after reg_state
>> was changed to NETREG_UNREGISTERING. So userspace could read something
>> in the small window. Not sure it was a problem, but it does have minor
>> difference.
>
> OK so userspace reads the fd when it's destroyed.
> Previously it could get EIO, now it can get EIO.
> Previously packets could get dropped, now packets could get dropped.
> Sure timing changed but can the change break some userspace? How?
>
Probably not even noticeable by userspace. Though it would not be hard to
keep the old behaviour, I'm ok with the changes here.
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-20 6:34 ` Michael S. Tsirkin
@ 2014-05-20 6:55 ` Jason Wang
0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2014-05-20 6:55 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Eric Dumazet, Xi Wang, David S. Miller, netdev, Maxim Krasnyansky,
Neal Cardwell, Eric Dumazet
On 05/20/2014 02:34 PM, Michael S. Tsirkin wrote:
> On Tue, May 20, 2014 at 02:03:34PM +0800, Jason Wang wrote:
>> On 05/20/2014 01:11 PM, Eric Dumazet wrote:
>>> On Tue, 2014-05-20 at 12:44 +0800, Jason Wang wrote:
>>>> On 05/19/2014 10:09 PM, Eric Dumazet wrote:
>>>>> About the sk_data_ready() and wake_up_all(), you missed the whole part
>>>>> of the patch I think.
>>>>>
>>>>> Check how sock_def_readable() does everything properly and efficiently,
>>>>> including the async part.
>>>> But this changes (sk_data_ready()) has nothing related to switching to
>>>> use __skb_recv_datagram()
>>>>
>>> This is totally related.
>>>
>>> I think you did not yet understood this patch
>> Sorry for being unclear, but I think you misunderstand my meaning.
>>> Compare wake_up_all() and sk_data_ready() speeds, you'll be surprised.
>>>
>>> You should ask to yourself : Why do we use wq_has_sleeper() in
>>> networking stacks ?
>> See my first reply, I don't have objection that uses sk_data_ready() in
>> tun_net_xmit(). My only concern is using sk_data_ready() in
>> tun_detach_all():
>>
>> - It was only called during tun destroying, so I believe we don't care
>> about the performance in this condition.
>> - sk_data_ready() was usually called when there's something new to be
>> processed which is not case in tun_detach_all()
> OK so what does userspace do to notice change in behaviour?
> I don't ask that you write a test but can you show us in
> pseudo-code?
>
>
Not currently but possible in the future.
>> - Not sure it was a problem but sock_def_readable() will not wake up
>> uninterruptible task during tun destroying.
> But task_uninterruptible here would really mean some in-kernel caller
> hooking into this function?
Yes.
> is there a way to create this from
> userspace? If not we don't care.
I think not.
>> - If we make sock_fasync() works for tun in the future, it may send
>> SIGIO to user process during tun destroying which is not expected.
>>
>> Thanks
> I don't get this last comment. The patch does not touch fasync paths
> at all. How can it break them?
sock_def_readable() will call sk_wake_async(), which may send SIGIO to
userspace if SOCK_FASYNC is set on the socket. Currently tun has its
own implementation of fasync, so this cannot happen. But if we switch
to sock_fasync() for tun in the future, this issue may appear.
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-20 6:03 ` Jason Wang
2014-05-20 6:34 ` Michael S. Tsirkin
@ 2014-05-20 13:59 ` Eric Dumazet
2014-05-21 4:45 ` Jason Wang
1 sibling, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2014-05-20 13:59 UTC (permalink / raw)
To: Jason Wang
Cc: Xi Wang, David S. Miller, netdev, Michael S. Tsirkin,
Maxim Krasnyansky, Neal Cardwell, Eric Dumazet
On Tue, 2014-05-20 at 14:03 +0800, Jason Wang wrote:
> On 05/20/2014 01:11 PM, Eric Dumazet wrote:
> > On Tue, 2014-05-20 at 12:44 +0800, Jason Wang wrote:
> >> On 05/19/2014 10:09 PM, Eric Dumazet wrote:
> >>> About the sk_data_ready() and wake_up_all(), you missed the whole part
> >>> of the patch I think.
> >>>
> >>> Check how sock_def_readable() does everything properly and efficiently,
> >>> including the async part.
> >> But this changes (sk_data_ready()) has nothing related to switching to
> >> use __skb_recv_datagram()
> >>
> > This is totally related.
> >
> > I think you did not yet understood this patch
>
> Sorry for being unclear, but I think you misunderstand my meaning.
> >
> > Compare wake_up_all() and sk_data_ready() speeds, you'll be surprised.
> >
> > You should ask to yourself : Why do we use wq_has_sleeper() in
> > networking stacks ?
>
> See my first reply, I don't have objection that uses sk_data_ready() in
> tun_net_xmit(). My only concern is using sk_data_ready() in
> tun_detach_all():
>
> - It was only called during tun destroying, so I believe we don't care
> about the performance in this condition.
It's there for symmetry, and so far our tests just work.
Have you run into any problems?
> - sk_data_ready() was usually called when there's something new to be
> processed which is not case in tun_detach_all()
sk_data_ready() will wake up waiters exactly like wake_up_all().
We do not use wake_up_all() in net/ipv4 & net/ipv6; have you seen any
bug because of this?
wake_up_all() is a lazy call, used when an author was not careful enough
to use a better way.
Your resistance suggests that you think the _existing_ code might be racy.
Care to elaborate instead?
> - Not sure it was a problem but sock_def_readable() will not wake up
> uninterruptible task during tun destroying.
That's irrelevant. We are supposed to unblock threads that are waiting on
the tun device, not threads doing uninterruptible stuff somewhere else
in the kernel.
Eventually they will reach the tun device and detect that the device is
gone/dismantled.
> - If we make sock_fasync() works for tun in the future, it may send
> SIGIO to user process during tun destroying which is not expected.
SOCK_FASYNC is not set on the tun socket, so sk_wake_async() does
nothing in this case, just as for 99.9999% of TCP sockets; nobody ever
noticed this code path was almost dead.
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-20 13:59 ` Eric Dumazet
@ 2014-05-21 4:45 ` Jason Wang
0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2014-05-21 4:45 UTC (permalink / raw)
To: Eric Dumazet
Cc: Xi Wang, David S. Miller, netdev, Michael S. Tsirkin,
Maxim Krasnyansky, Neal Cardwell, Eric Dumazet
On 05/20/2014 09:59 PM, Eric Dumazet wrote:
> On Tue, 2014-05-20 at 14:03 +0800, Jason Wang wrote:
>> On 05/20/2014 01:11 PM, Eric Dumazet wrote:
>>> On Tue, 2014-05-20 at 12:44 +0800, Jason Wang wrote:
>>>> On 05/19/2014 10:09 PM, Eric Dumazet wrote:
>>>>> About the sk_data_ready() and wake_up_all(), you missed the whole part
>>>>> of the patch I think.
>>>>>
>>>>> Check how sock_def_readable() does everything properly and efficiently,
>>>>> including the async part.
>>>> But this changes (sk_data_ready()) has nothing related to switching to
>>>> use __skb_recv_datagram()
>>>>
>>> This is totally related.
>>>
>>> I think you did not yet understood this patch
>> Sorry for being unclear, but I think you misunderstand my meaning.
>>> Compare wake_up_all() and sk_data_ready() speeds, you'll be surprised.
>>>
>>> You should ask to yourself : Why do we use wq_has_sleeper() in
>>> networking stacks ?
>> See my first reply, I don't have objection that uses sk_data_ready() in
>> tun_net_xmit(). My only concern is using sk_data_ready() in
>> tun_detach_all():
>>
>> - It was only called during tun destroying, so I believe we don't care
>> about the performance in this condition.
> Its there for symmetry, and so far our tests just work.
>
> Have you run into any problems ?
Nope, I tested this patch and it works well.
>> - sk_data_ready() was usually called when there's something new to be
>> processed which is not case in tun_detach_all()
> sk_data_ready() will wakeup waiters exactly like wake_up_all()
>
> We do not use wake_up_all() in net/ipv4 & net/ipv6, have you seen any
> bug because of this ?
>
> wake_up_all() is a lazy call, when an author cannot be careful enough to
> use a better way.
I haven't. I thought there must be some reason the author used
wake_up_all() here, but I'm now convinced that it's safe to make the change.
>
> Your resistance shows that you think the _existing_ code might be racy.
>
> Care to elaborate instead ?
I'm asking because I want to make sure nothing breaks, and I think some
of the changes are unrelated.
>> - Not sure it was a problem but sock_def_readable() will not wake up
>> uninterruptible task during tun destroying.
> Thats irrelevant. We are supposed to unblock threads that are waiting on
> the tun device, not threads doing uninterruptible stuff somewhere else
> in the kernel.
>
> Eventually they will later reach tun device and will detect device is
> gone/dismantled.
Ok.
>
>> - If we make sock_fasync() works for tun in the future, it may send
>> SIGIO to user process during tun destroying which is not expected.
> SOCK_FASYNC is not set on the tun socket.
>
> sk_wake_async() does nothing in this case. As for 99.9999 % of TCP
> sockets and nobody ever noticed this code path was almost dead.
>
>
I see, and thanks for your time. I don't have any concerns with this
patch any more.
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-16 22:11 [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency Xi Wang
2014-05-19 9:27 ` Jason Wang
@ 2014-05-21 7:54 ` Michael S. Tsirkin
2014-05-21 19:51 ` David Miller
2 siblings, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2014-05-21 7:54 UTC (permalink / raw)
To: Xi Wang
Cc: David S. Miller, netdev, Jason Wang, Maxim Krasnyansky,
Neal Cardwell, Eric Dumazet
On Fri, May 16, 2014 at 03:11:48PM -0700, Xi Wang wrote:
> tun_do_read always adds current thread to wait queue, even if a packet
> is ready to read. This is inefficient because both sleeper and waker
> want to acquire the wait queue spin lock when packet rate is high.
>
> We restructure the read function and use common kernel networking
> routines to handle receive, sleep and wakeup. With the change
> available packets are checked first before the reading thread is added
> to the wait queue.
>
> Ran performance tests with the following configuration:
>
> - my packet generator -> tap1 -> br0 -> tap0 -> my packet consumer
> - sender pinned to one core and receiver pinned to another core
> - sender send small UDP packets (64 bytes total) as fast as it can
> - sandy bridge cores
> - throughput are receiver side goodput numbers
>
> The results are
>
> baseline: 731k pkts/sec, cpu utilization at 1.50 cpus
> changed: 783k pkts/sec, cpu utilization at 1.53 cpus
>
> The performance difference is largely determined by packet rate and
> inter-cpu communication cost. For example, if the sender and
> receiver are pinned to different cpu sockets, the results are
>
> baseline: 558k pkts/sec, cpu utilization at 1.71 cpus
> changed: 690k pkts/sec, cpu utilization at 1.67 cpus
>
> Co-authored-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Xi Wang <xii@google.com>
Less code and more performance; what's not to like?
I think userspace concerns/questions have been addressed in
this thread.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Thanks for your patience!
> ---
>
> Changelog since v1:
> - Added back error code. NETREG_REGISTERED behavior is different but
> should be compatible with the previous implementation
> - Removed non essential changes
>
>
> drivers/net/tun.c | 54 ++++++++++++++++--------------------------------------
> 1 file changed, 16 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index ee328ba..98bad1f 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -498,12 +498,12 @@ static void tun_detach_all(struct net_device *dev)
> for (i = 0; i < n; i++) {
> tfile = rtnl_dereference(tun->tfiles[i]);
> BUG_ON(!tfile);
> - wake_up_all(&tfile->wq.wait);
> + tfile->socket.sk->sk_data_ready(tfile->socket.sk);
> RCU_INIT_POINTER(tfile->tun, NULL);
> --tun->numqueues;
> }
> list_for_each_entry(tfile, &tun->disabled, next) {
> - wake_up_all(&tfile->wq.wait);
> + tfile->socket.sk->sk_data_ready(tfile->socket.sk);
> RCU_INIT_POINTER(tfile->tun, NULL);
> }
> BUG_ON(tun->numqueues != 0);
> @@ -807,8 +807,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> /* Notify and wake up reader process */
> if (tfile->flags & TUN_FASYNC)
> kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
> - wake_up_interruptible_poll(&tfile->wq.wait, POLLIN |
> - POLLRDNORM | POLLRDBAND);
> + tfile->socket.sk->sk_data_ready(tfile->socket.sk);
>
> rcu_read_unlock();
> return NETDEV_TX_OK;
> @@ -965,7 +964,7 @@ static unsigned int tun_chr_poll(struct file *file, poll_table *wait)
>
> tun_debug(KERN_INFO, tun, "tun_chr_poll\n");
>
> - poll_wait(file, &tfile->wq.wait, wait);
> + poll_wait(file, sk_sleep(sk), wait);
>
> if (!skb_queue_empty(&sk->sk_receive_queue))
> mask |= POLLIN | POLLRDNORM;
> @@ -1330,47 +1329,26 @@ done:
> static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
> const struct iovec *iv, ssize_t len, int noblock)
> {
> - DECLARE_WAITQUEUE(wait, current);
> struct sk_buff *skb;
> ssize_t ret = 0;
> + int peeked, err, off = 0;
>
> tun_debug(KERN_INFO, tun, "tun_do_read\n");
>
> - if (unlikely(!noblock))
> - add_wait_queue(&tfile->wq.wait, &wait);
> - while (len) {
> - if (unlikely(!noblock))
> - current->state = TASK_INTERRUPTIBLE;
> + if (!len)
> + return ret;
>
> - /* Read frames from the queue */
> - if (!(skb = skb_dequeue(&tfile->socket.sk->sk_receive_queue))) {
> - if (noblock) {
> - ret = -EAGAIN;
> - break;
> - }
> - if (signal_pending(current)) {
> - ret = -ERESTARTSYS;
> - break;
> - }
> - if (tun->dev->reg_state != NETREG_REGISTERED) {
> - ret = -EIO;
> - break;
> - }
> -
> - /* Nothing to read, let's sleep */
> - schedule();
> - continue;
> - }
> + if (tun->dev->reg_state != NETREG_REGISTERED)
> + return -EIO;
>
> + /* Read frames from queue */
> + skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
> + &peeked, &off, &err);
> + if (skb) {
> ret = tun_put_user(tun, tfile, skb, iv, len);
> kfree_skb(skb);
> - break;
> - }
> -
> - if (unlikely(!noblock)) {
> - current->state = TASK_RUNNING;
> - remove_wait_queue(&tfile->wq.wait, &wait);
> - }
> + } else
> + ret = err;
>
> return ret;
> }
> @@ -2199,8 +2177,8 @@ static int tun_chr_open(struct inode *inode, struct file * file)
> tfile->flags = 0;
> tfile->ifindex = 0;
>
> - rcu_assign_pointer(tfile->socket.wq, &tfile->wq);
> init_waitqueue_head(&tfile->wq.wait);
> + RCU_INIT_POINTER(tfile->socket.wq, &tfile->wq);
>
> tfile->socket.file = file;
> tfile->socket.ops = &tun_socket_ops;
> --
> 1.9.1.423.g4596e3a
* Re: [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency
2014-05-16 22:11 [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency Xi Wang
2014-05-19 9:27 ` Jason Wang
2014-05-21 7:54 ` Michael S. Tsirkin
@ 2014-05-21 19:51 ` David Miller
2 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2014-05-21 19:51 UTC (permalink / raw)
To: xii; +Cc: netdev, jasowang, mst, maxk, ncardwell, edumazet
From: Xi Wang <xii@google.com>
Date: Fri, 16 May 2014 15:11:48 -0700
> tun_do_read always adds current thread to wait queue, even if a packet
> is ready to read. This is inefficient because both sleeper and waker
> want to acquire the wait queue spin lock when packet rate is high.
>
> We restructure the read function and use common kernel networking
> routines to handle receive, sleep and wakeup. With the change
> available packets are checked first before the reading thread is added
> to the wait queue.
>
> Ran performance tests with the following configuration:
>
> - my packet generator -> tap1 -> br0 -> tap0 -> my packet consumer
> - sender pinned to one core and receiver pinned to another core
> - sender send small UDP packets (64 bytes total) as fast as it can
> - sandy bridge cores
> - throughput are receiver side goodput numbers
>
> The results are
>
> baseline: 731k pkts/sec, cpu utilization at 1.50 cpus
> changed: 783k pkts/sec, cpu utilization at 1.53 cpus
>
> The performance difference is largely determined by packet rate and
> inter-cpu communication cost. For example, if the sender and
> receiver are pinned to different cpu sockets, the results are
>
> baseline: 558k pkts/sec, cpu utilization at 1.71 cpus
> changed: 690k pkts/sec, cpu utilization at 1.67 cpus
>
> Co-authored-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Xi Wang <xii@google.com>
Applied to net-next, thanks.
Thread overview: 18+ messages (newest: 2014-05-21 19:51 UTC)
2014-05-16 22:11 [PATCH v2] net-tun: restructure tun_do_read for better sleep/wakeup efficiency Xi Wang
2014-05-19 9:27 ` Jason Wang
2014-05-19 14:09 ` Eric Dumazet
2014-05-20 4:44 ` Jason Wang
2014-05-20 4:52 ` Eric Dumazet
2014-05-20 6:35 ` Michael S. Tsirkin
2014-05-20 5:11 ` Eric Dumazet
2014-05-20 6:03 ` Jason Wang
2014-05-20 6:34 ` Michael S. Tsirkin
2014-05-20 6:55 ` Jason Wang
2014-05-20 13:59 ` Eric Dumazet
2014-05-21 4:45 ` Jason Wang
2014-05-19 16:06 ` Michael S. Tsirkin
2014-05-20 4:51 ` Jason Wang
2014-05-20 6:22 ` Michael S. Tsirkin
2014-05-20 6:40 ` Jason Wang
2014-05-21 7:54 ` Michael S. Tsirkin
2014-05-21 19:51 ` David Miller