Netdev List
* Re: [V2 PATCH 9/9] vhost: zerocopy: poll vq in zerocopy callback
From: Shirley Ma @ 2012-05-16 15:10 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, mst, netdev, linux-kernel, ebiederm, davem
In-Reply-To: <4FB317C8.90002@redhat.com>

On Wed, 2012-05-16 at 10:58 +0800, Jason Wang wrote:
> >>   drivers/vhost/vhost.c |    1 +
> >>   1 files changed, 1 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >> index 947f00d..7b75fdf 100644
> >> --- a/drivers/vhost/vhost.c
> >> +++ b/drivers/vhost/vhost.c
> >> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void *arg)
> >>          struct vhost_ubuf_ref *ubufs = ubuf->arg;
> >>          struct vhost_virtqueue *vq = ubufs->vq;
> >>
> >> +       vhost_poll_queue(&vq->poll);
> >>          /* set len = 1 to mark this desc buffers done DMA */
> >>          vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
> >>          kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
> > Doing so, we might have redundant vhost_poll_queue(). Do you know in
> > which scenario there might be missing of adding and signaling during
> > zerocopy?
> 
> Yes, as we only do signaling and adding during tx work, if there's no
> tx 
> work when the skb were sent, we may lose the opportunity to let guest 
> know about the completion. It's easy to be reproduced with netperf
> test. 

The reason the host signals the guest is to free guest tx buffers. If
there is no tx work, then it's not necessary to signal the guest unless
the guest runs out of memory. The pending buffers will be released when
the virtio_net device is gone.

What's the behavior of the netperf test when you hit this situation?

Thanks
Shirley

^ permalink raw reply

* Re: [V2 PATCH 9/9] vhost: zerocopy: poll vq in zerocopy callback
From: Michael S. Tsirkin @ 2012-05-16 15:14 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem
In-Reply-To: <1337181027.10741.13.camel@oc3660625478.ibm.com>

On Wed, May 16, 2012 at 08:10:27AM -0700, Shirley Ma wrote:
> On Wed, 2012-05-16 at 10:58 +0800, Jason Wang wrote:
> > >>   drivers/vhost/vhost.c |    1 +
> > >>   1 files changed, 1 insertions(+), 0 deletions(-)
> > >>
> > >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > >> index 947f00d..7b75fdf 100644
> > >> --- a/drivers/vhost/vhost.c
> > >> +++ b/drivers/vhost/vhost.c
> > >> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void *arg)
> > >>          struct vhost_ubuf_ref *ubufs = ubuf->arg;
> > >>          struct vhost_virtqueue *vq = ubufs->vq;
> > >>
> > >> +       vhost_poll_queue(&vq->poll);
> > >>          /* set len = 1 to mark this desc buffers done DMA */
> > >>          vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
> > >>          kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
> > > Doing so, we might have redundant vhost_poll_queue(). Do you know in
> > > which scenario there might be missing of adding and signaling during
> > > zerocopy?
> > 
> > Yes, as we only do signaling and adding during tx work, if there's no
> > tx 
> > work when the skb were sent, we may lose the opportunity to let guest 
> > know about the completion. It's easy to be reproduced with netperf
> > test. 
> 
> The reason which host signals guest is to free guest tx buffers, if
> there is no tx work, then it's not necessary to signal the guest unless
> guest runs out of memory. The pending buffers will be released
> virtio_net device gone.
> 
> What's the behavior of netperf test when you hit this situation?
> 
> Thanks
> Shirley

IIRC guest networking seems to be lost.


-- 
MST

^ permalink raw reply

* Re: [PATCH] drop_monitor: convert to modular building
From: Neil Horman @ 2012-05-16 15:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, David S. Miller
In-Reply-To: <1337179681.8512.1212.camel@edumazet-glaptop>

On Wed, May 16, 2012 at 04:48:01PM +0200, Eric Dumazet wrote:
> On Wed, 2012-05-16 at 10:27 -0400, Neil Horman wrote:
> > When I first wrote drop monitor I wrote it to just build monolithically.  There
> > is no reason it can't be built modularly as well, so lets give it that
> > flexibiity.
> 
> > +	for_each_present_cpu(cpu) {
> > +		data = &per_cpu(dm_cpu_data, cpu);
> > +		del_timer(&data->send_timer);
> > +		cancel_work_sync(&data->dm_alert_work);
> > +		/*
> > +		 * At this point, we should have exclusive access
> > +		 * to this struct and can free the skb inside it
> > +		 */
> > +		kfree_skb(data->skb);
> > +	}
> > +
> 
> I dont think for_each_present_cpu(cpu) is right 
> 
> (I realize drop_monitor already uses this, but its a bug)
> 
> To use it, you must have a notifier to react to cpu HOTPLUG events.
> 
> -> for_each_possible_cpu() is more correct.
> 
Ok, I can do that.
> Also, please dont add new printk(KERN_WARNING ...), use pr_warn(...)
> instead
> 
Ack, I'll add a patch to this series to convert the existing printks to their
corresponding pr_* macros.
Neil

> 
> 
> 

^ permalink raw reply

* Re: [PATCH RFC] tun: experimental zero copy tx support
From: Shirley Ma @ 2012-05-16 15:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David S. Miller, Stephen Hemminger, Joe Perches, Jason Wang,
	netdev, linux-kernel, Ian.Campbell, kvm
In-Reply-To: <20120514183929.GB17086@redhat.com>

On Mon, 2012-05-14 at 21:39 +0300, Michael S. Tsirkin wrote:
> > Hello Mike,
> > 
> > Have you tested this patch? I think the difference between macvtap
> and
> > tap is tap forwarding the packet to bridge. The zerocopy is disabled
> in
> > this case.
> > 
> > Shirley
> 
> Testing in progress, but the patchset I pointed to enables
> zerocopy with bridge. 

Hello Mike,

Did you mean this patch, or another patchset for enabling bridge zerocopy?

I remember we disabled zerocopy for forwarded skbs since the user space
program might hold the buffers too long or forever.

In the tap/bridge case, when will the tx buffers be released?

Thanks
Shirley


^ permalink raw reply

* Re: [PATCH] drop_monitor: convert to modular building
From: Neil Horman @ 2012-05-16 15:19 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, David S. Miller
In-Reply-To: <1337180505.2568.10.camel@bwh-desktop.uk.solarflarecom.com>

On Wed, May 16, 2012 at 04:01:45PM +0100, Ben Hutchings wrote:
> On Wed, 2012-05-16 at 10:27 -0400, Neil Horman wrote:
> > When I first wrote drop monitor I wrote it to just build monolithically.  There
> > is no reason it can't be built modularly as well, so lets give it that
> > flexibiity.
> 
> Yes, please.
> 
> [...]
> > --- a/net/core/drop_monitor.c
> > +++ b/net/core/drop_monitor.c
> > @@ -22,6 +22,7 @@
> >  #include <linux/timer.h>
> >  #include <linux/bitops.h>
> >  #include <linux/slab.h>
> > +#include <linux/module.h>
> >  #include <net/genetlink.h>
> >  #include <net/netevent.h>
> >  
> > @@ -223,9 +224,15 @@ static int set_all_monitor_traces(int state)
> >  
> >  	switch (state) {
> >  	case TRACE_ON:
> > +		if (!try_module_get(THIS_MODULE)) {
> > +			rc = -EINVAL;
> 
> Minor issue, but this isn't the right error code - there is nothing
> invalid about the request, it just came at the wrong time.  Perhaps
> ENODEV or ECANCELED?
> 
Yeah, ok, ENODEV seems reasonable.

> > +			break;
> > +		}
> > +
> >  		rc |= register_trace_kfree_skb(trace_kfree_skb_hit, NULL);
> >  		rc |= register_trace_napi_poll(trace_napi_poll_hit, NULL);
> >  		break;
> > +
> >  	case TRACE_OFF:
> >  		rc |= unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL);
> >  		rc |= unregister_trace_napi_poll(trace_napi_poll_hit, NULL);
> > @@ -241,6 +248,9 @@ static int set_all_monitor_traces(int state)
> >  				kfree_rcu(new_stat, rcu);
> >  			}
> >  		}
> > +
> > +		module_put(THIS_MODULE);
> > +
> >  		break;
> >  	default:
> >  		rc = 1;
> > @@ -383,4 +393,38 @@ out:
> >  	return rc;
> >  }
> >  
> > -late_initcall(init_net_drop_monitor);
> > +static void exit_net_drop_monitor(void)
> > +{
> > +	struct per_cpu_dm_data *data;
> > +	int cpu;
> > +
> > +	if (unregister_netdevice_notifier(&dropmon_net_notifier))
> > +		printk(KERN_WARNING "Unable to unregiser dropmon notifer\n");
> 
> Currently this will only fail if you didn't actually register the
> notifier, which would be a bug.  If there is ever any other reason this
> could fail, continuing with the notifier still registered would be
> disastrous.  Therefore I think this should be:
> 
> 	rc = unregister_netdevice_notifier(&dropmon_net_notifier);
> 	BUG_ON(rc);
> 
Ok, seems reasonable.

> > +	/*
> > +	 * Because of the module_get/put we do in the trace state change path
> > +	 * we are guarnateed not to have any current users when we get here
> > +	 * all we need to do is make sure that we don't have any running timers
> > +	 * or pending schedule calls
> > +	 */
> 
> Surely you need to call set_all_monitor_traces(TRACE_OFF) first...
> 
Nope. If you look at the code higher up in the patch, I use try_module_get and
module_put to prevent the module unload code from getting here while anyone is
actually using the protocol.  Once we are in the module remove routine here, we
are guaranteed that there are no users of this protocol, and that the
tracepoints are all unregistered.

> > +	for_each_present_cpu(cpu) {
> > +		data = &per_cpu(dm_cpu_data, cpu);
> > +		del_timer(&data->send_timer);
> > +		cancel_work_sync(&data->dm_alert_work);
> > +		/*
> > +		 * At this point, we should have exclusive access
> > +		 * to this struct and can free the skb inside it
> > +		 */
> > +		kfree_skb(data->skb);
> > +	}
> > +
> > +	if (genl_unregister_family(&net_drop_monitor_family))
> > +		printk(KERN_WARNING "Unable to unregister drop monitor socket family\n");
> 
> Same issue as with unregister_netdevice_notifier().
> 
ack, I'll update that to a BUG_ON
Neil

^ permalink raw reply

* Re: [PATCH 0/3] net: mac80211: Neaten debugging
From: Joe Perches @ 2012-05-16 15:22 UTC (permalink / raw)
  To: Johannes Berg
  Cc: David Miller, linville, linux-wireless, netdev, linux-kernel
In-Reply-To: <1337155171.4367.2.camel@jlt3.sipsolutions.net>

On Wed, 2012-05-16 at 09:59 +0200, Johannes Berg wrote:
> I wonder if it makes sense to leave these under
> #ifdef though? Why #ifdef something if it's going to be invisible most
> of the time anyway?

I don't understand your point.
#ifdef removal is a good thing.

^ permalink raw reply

* Re: [PATCH 0/3] net: mac80211: Neaten debugging
From: Johannes Berg @ 2012-05-16 15:30 UTC (permalink / raw)
  To: Joe Perches; +Cc: David Miller, linville, linux-wireless, netdev, linux-kernel
In-Reply-To: <1337181771.4818.0.camel@joe2Laptop>

On Wed, 2012-05-16 at 08:22 -0700, Joe Perches wrote:
> On Wed, 2012-05-16 at 09:59 +0200, Johannes Berg wrote:
> > I wonder if it makes sense to leave these under
> > #ifdef though? Why #ifdef something if it's going to be invisible most
> > of the time anyway?
> 
> I don't understand your point.
> #ifdef removal is a good thing.

Yeah, but you left a lot of them under ifdef, and I'm wondering why you
didn't remove them, or if you should, or ...

johannes

^ permalink raw reply

* Re: [PATCH] drop_monitor: convert to modular building
From: Ben Hutchings @ 2012-05-16 15:34 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, David S. Miller
In-Reply-To: <20120516151932.GB30195@hmsreliant.think-freely.org>

On Wed, 2012-05-16 at 11:19 -0400, Neil Horman wrote:
> On Wed, May 16, 2012 at 04:01:45PM +0100, Ben Hutchings wrote:
> > On Wed, 2012-05-16 at 10:27 -0400, Neil Horman wrote:
> > > When I first wrote drop monitor I wrote it to just build monolithically.  There
> > > is no reason it can't be built modularly as well, so lets give it that
> > > flexibiity.
[...]
> > > +	/*
> > > +	 * Because of the module_get/put we do in the trace state change path
> > > +	 * we are guarnateed not to have any current users when we get here
> > > +	 * all we need to do is make sure that we don't have any running timers
> > > +	 * or pending schedule calls
> > > +	 */
> > 
> > Surely you need to call set_all_monitor_traces(TRACE_OFF) first...
> > 
> Nope, If you'll note the code higher up in the patch, I use try_module_get and
> module_put to prevent the module unload code from getting here while anyone is
> actually using the protocol.  Once we are in the module remove routine here, we
> are guaranateed that there are no users of this protocol, and that the
> tracepoints are all unregistered.
[...]

Yes, of course.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH 0/3] net: mac80211: Neaten debugging
From: Joe Perches @ 2012-05-16 15:43 UTC (permalink / raw)
  To: Johannes Berg
  Cc: David Miller, linville, linux-wireless, netdev, linux-kernel
In-Reply-To: <1337182220.18519.5.camel@jlt3.sipsolutions.net>

On Wed, 2012-05-16 at 17:30 +0200, Johannes Berg wrote:
> On Wed, 2012-05-16 at 08:22 -0700, Joe Perches wrote:
> > On Wed, 2012-05-16 at 09:59 +0200, Johannes Berg wrote:
> > > I wonder if it makes sense to leave these under
> > > #ifdef though? Why #ifdef something if it's going to be invisible most
> > > of the time anyway?
> > 
> > I don't understand your point.
> > #ifdef removal is a good thing.
> 
> Yeah, but you left a lot of them under ifdef, and I'm wondering why you
> didn't remove them, or if you should, or ...

Those mostly sit under other, different
#ifdef CONFIG_SOME_OTHER_CONTROL elements.

There are, I think, 3 that I left because they do
not use printk/pr_<level> but use wiphy_<level> or
netdev_<level>.

^ permalink raw reply

* Re: [PATCH 0/3] net: mac80211: Neaten debugging
From: Johannes Berg @ 2012-05-16 15:56 UTC (permalink / raw)
  To: Joe Perches; +Cc: David Miller, linville, linux-wireless, netdev, linux-kernel
In-Reply-To: <1337183004.4818.2.camel@joe2Laptop>

On Wed, 2012-05-16 at 08:43 -0700, Joe Perches wrote:
> On Wed, 2012-05-16 at 17:30 +0200, Johannes Berg wrote:
> > On Wed, 2012-05-16 at 08:22 -0700, Joe Perches wrote:
> > > On Wed, 2012-05-16 at 09:59 +0200, Johannes Berg wrote:
> > > > I wonder if it makes sense to leave these under
> > > > #ifdef though? Why #ifdef something if it's going to be invisible most
> > > > of the time anyway?
> > > 
> > > I don't understand your point.
> > > #ifdef removal is a good thing.
> > 
> > Yeah, but you left a lot of them under ifdef, and I'm wondering why you
> > didn't remove them, or if you should, or ...
> 
> Those mostly use other different
> #ifdef CONFIG_SOME_OTHER_CONTROL elements.
> 
> There are I think 3 that I left because they do
> not use printk/pr_level but use wiphy_<level> or
> netdev_<level>.

Hmm, ok. I guess I need to just look at them and decide what should be
what. Thanks for the preparation work though :)

johannes

^ permalink raw reply

* [PATCH net-next] net: sock_flag() cleanup
From: Eric Dumazet @ 2012-05-16 15:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

- sock_flag() accepts a const pointer

- sock_flag() returns a boolean

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h |    2 +-
 net/core/sock.c    |   14 +++++++-------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index e613704..036f506 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -650,7 +650,7 @@ static inline void sock_reset_flag(struct sock *sk, enum sock_flags flag)
 	__clear_bit(flag, &sk->sk_flags);
 }
 
-static inline int sock_flag(struct sock *sk, enum sock_flags flag)
+static inline bool sock_flag(const struct sock *sk, enum sock_flags flag)
 {
 	return test_bit(flag, &sk->sk_flags);
 }
diff --git a/net/core/sock.c b/net/core/sock.c
index 26ed27f..9d144ee 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -849,7 +849,7 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		break;
 
 	case SO_BROADCAST:
-		v.val = !!sock_flag(sk, SOCK_BROADCAST);
+		v.val = sock_flag(sk, SOCK_BROADCAST);
 		break;
 
 	case SO_SNDBUF:
@@ -865,7 +865,7 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		break;
 
 	case SO_KEEPALIVE:
-		v.val = !!sock_flag(sk, SOCK_KEEPOPEN);
+		v.val = sock_flag(sk, SOCK_KEEPOPEN);
 		break;
 
 	case SO_TYPE:
@@ -887,7 +887,7 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		break;
 
 	case SO_OOBINLINE:
-		v.val = !!sock_flag(sk, SOCK_URGINLINE);
+		v.val = sock_flag(sk, SOCK_URGINLINE);
 		break;
 
 	case SO_NO_CHECK:
@@ -900,7 +900,7 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 
 	case SO_LINGER:
 		lv		= sizeof(v.ling);
-		v.ling.l_onoff	= !!sock_flag(sk, SOCK_LINGER);
+		v.ling.l_onoff	= sock_flag(sk, SOCK_LINGER);
 		v.ling.l_linger	= sk->sk_lingertime / HZ;
 		break;
 
@@ -1012,11 +1012,11 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		break;
 
 	case SO_RXQ_OVFL:
-		v.val = !!sock_flag(sk, SOCK_RXQ_OVFL);
+		v.val = sock_flag(sk, SOCK_RXQ_OVFL);
 		break;
 
 	case SO_WIFI_STATUS:
-		v.val = !!sock_flag(sk, SOCK_WIFI_STATUS);
+		v.val = sock_flag(sk, SOCK_WIFI_STATUS);
 		break;
 
 	case SO_PEEK_OFF:
@@ -1026,7 +1026,7 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_peek_off;
 		break;
 	case SO_NOFCS:
-		v.val = !!sock_flag(sk, SOCK_NOFCS);
+		v.val = sock_flag(sk, SOCK_NOFCS);
 		break;
 	default:
 		return -ENOPROTOOPT;
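
As an illustration of why the `!!` can be dropped once the helper returns
bool, here is a minimal userspace sketch. The names flag_test_int() and
flag_test_bool() and the bit number used are hypothetical stand-ins, not
kernel functions; the only thing relied on is the C rule that converting any
nonzero value to bool yields 1.

```c
#include <stdbool.h>

/*
 * Hypothetical int-returning helper: hands back the raw masked bit,
 * so the result can be any nonzero value when the flag is set.
 */
static int flag_test_int(unsigned long flags, int bit)
{
	return (int)(flags & (1UL << bit));
}

/*
 * Hypothetical bool-returning helper: the conversion to bool
 * normalizes any nonzero value to 1, which is what a caller-side
 * "!!" would otherwise have to do.
 */
static bool flag_test_bool(unsigned long flags, int bit)
{
	return flags & (1UL << bit);
}
```

With the bool variant, assigning the result to an int field gives 0 or 1
without any extra normalization at the call site.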

^ permalink raw reply related

* Re: [PATCH RFC] tun: experimental zero copy tx support
From: Michael S. Tsirkin @ 2012-05-16 16:51 UTC (permalink / raw)
  To: Shirley Ma
  Cc: David S. Miller, Stephen Hemminger, Joe Perches, Jason Wang,
	netdev, linux-kernel, Ian.Campbell, kvm
In-Reply-To: <1337181415.10741.18.camel@oc3660625478.ibm.com>

On Wed, May 16, 2012 at 08:16:55AM -0700, Shirley Ma wrote:
> On Mon, 2012-05-14 at 21:39 +0300, Michael S. Tsirkin wrote:
> > > Hello Mike,
> > > 
> > > Have you tested this patch? I think the difference between macvtap
> > and
> > > tap is tap forwarding the packet to bridge. The zerocopy is disabled
> > in
> > > this case.
> > > 
> > > Shirley
> > 
> > Testing in progress, but the patchset I pointed to enables
> > zerocopy with bridge. 
> 
> Hello Mike,
> 
> You meant this patch or another patchset for enabling bridge zerocopy?
> 
> I remembered we disabled forward skb zerocopy since the user space
> program might hold the buffers too long or forever.
> 
> In tap/bridge case, when the tx buffers will be released?
> 
> Thanks
> Shirley

It still fails some tests for me but maybe I'll post the whole
patchset so you can see how it works.

^ permalink raw reply

* Re: [PATCH] ptp_pch: Add missing #include <linux/slab.h>
From: Richard Cochran @ 2012-05-16 17:19 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Takahiro Shimizu, David S. Miller, netdev, linux-kernel
In-Reply-To: <1337169017-15835-1-git-send-email-geert@linux-m68k.org>

On Wed, May 16, 2012 at 01:50:17PM +0200, Geert Uytterhoeven wrote:
> drivers/ptp/ptp_pch.c: In function 'pch_remove':
> drivers/ptp/ptp_pch.c:576:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]
> drivers/ptp/ptp_pch.c: In function 'pch_probe':
> drivers/ptp/ptp_pch.c:587:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration]
> 
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

Acked-by: Richard Cochran <richardcochran@gmail.com>

Thanks,
Richard

> ---
> Not even compile-tested, as it fails on parisc only.
> http://kisskb.ellerman.id.au/kisskb/buildresult/6312813/
> 
>  drivers/ptp/ptp_pch.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/ptp/ptp_pch.c b/drivers/ptp/ptp_pch.c
> index 375eb04..6fff680 100644
> --- a/drivers/ptp/ptp_pch.c
> +++ b/drivers/ptp/ptp_pch.c
> @@ -30,6 +30,7 @@
>  #include <linux/module.h>
>  #include <linux/pci.h>
>  #include <linux/ptp_clock_kernel.h>
> +#include <linux/slab.h>
>  
>  #define STATION_ADDR_LEN	20
>  #define PCI_DEVICE_ID_PCH_1588	0x8819
> -- 
> 1.7.0.4
> 

^ permalink raw reply

* Re: [V2 PATCH 9/9] vhost: zerocopy: poll vq in zerocopy callback
From: Shirley Ma @ 2012-05-16 17:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem
In-Reply-To: <20120516151444.GC9934@redhat.com>

On Wed, 2012-05-16 at 18:14 +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 08:10:27AM -0700, Shirley Ma wrote:
> > On Wed, 2012-05-16 at 10:58 +0800, Jason Wang wrote:
> > > >>   drivers/vhost/vhost.c |    1 +
> > > >>   1 files changed, 1 insertions(+), 0 deletions(-)
> > > >>
> > > >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > >> index 947f00d..7b75fdf 100644
> > > >> --- a/drivers/vhost/vhost.c
> > > >> +++ b/drivers/vhost/vhost.c
> > > >> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void *arg)
> > > >>          struct vhost_ubuf_ref *ubufs = ubuf->arg;
> > > >>          struct vhost_virtqueue *vq = ubufs->vq;
> > > >>
> > > >> +       vhost_poll_queue(&vq->poll);
> > > >>          /* set len = 1 to mark this desc buffers done DMA */
> > > >>          vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
> > > >>          kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
> > > > Doing so, we might have redundant vhost_poll_queue(). Do you
> know in
> > > > which scenario there might be missing of adding and signaling
> during
> > > > zerocopy?
> > > 
> > > Yes, as we only do signaling and adding during tx work, if there's
> no
> > > tx 
> > > work when the skb were sent, we may lose the opportunity to let
> guest 
> > > know about the completion. It's easy to be reproduced with netperf
> > > test. 
> > 
> > The reason which host signals guest is to free guest tx buffers, if
> > there is no tx work, then it's not necessary to signal the guest
> unless
> > guest runs out of memory. The pending buffers will be released
> > virtio_net device gone.
> > 
> > What's the behavior of netperf test when you hit this situation?
> > 
> > Thanks
> > Shirley
> 
> IIRC guest networking seems to be lost. 

It seems a vhost_enable_notify() call is missing somewhere else?

Thanks
Shirley

^ permalink raw reply

* Re: [V2 PATCH 9/9] vhost: zerocopy: poll vq in zerocopy callback
From: Michael S. Tsirkin @ 2012-05-16 18:36 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem
In-Reply-To: <1337189525.10741.24.camel@oc3660625478.ibm.com>

On Wed, May 16, 2012 at 10:32:05AM -0700, Shirley Ma wrote:
> On Wed, 2012-05-16 at 18:14 +0300, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 08:10:27AM -0700, Shirley Ma wrote:
> > > On Wed, 2012-05-16 at 10:58 +0800, Jason Wang wrote:
> > > > >>   drivers/vhost/vhost.c |    1 +
> > > > >>   1 files changed, 1 insertions(+), 0 deletions(-)
> > > > >>
> > > > >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > >> index 947f00d..7b75fdf 100644
> > > > >> --- a/drivers/vhost/vhost.c
> > > > >> +++ b/drivers/vhost/vhost.c
> > > > >> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void *arg)
> > > > >>          struct vhost_ubuf_ref *ubufs = ubuf->arg;
> > > > >>          struct vhost_virtqueue *vq = ubufs->vq;
> > > > >>
> > > > >> +       vhost_poll_queue(&vq->poll);
> > > > >>          /* set len = 1 to mark this desc buffers done DMA */
> > > > >>          vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
> > > > >>          kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
> > > > > Doing so, we might have redundant vhost_poll_queue(). Do you
> > know in
> > > > > which scenario there might be missing of adding and signaling
> > during
> > > > > zerocopy?
> > > > 
> > > > Yes, as we only do signaling and adding during tx work, if there's
> > no
> > > > tx 
> > > > work when the skb were sent, we may lose the opportunity to let
> > guest 
> > > > know about the completion. It's easy to be reproduced with netperf
> > > > test. 
> > > 
> > > The reason which host signals guest is to free guest tx buffers, if
> > > there is no tx work, then it's not necessary to signal the guest
> > unless
> > > guest runs out of memory. The pending buffers will be released
> > > virtio_net device gone.
> > > 
> > > What's the behavior of netperf test when you hit this situation?
> > > 
> > > Thanks
> > > Shirley
> > 
> > IIRC guest networking seems to be lost. 
> 
> It seems vhost_enable_notify is missing in somewhere else?
> 
> Thanks
> Shirley

Dunno. I see virtio sending packets but they do not get
to tun on the host. Debugging.

^ permalink raw reply

* Re: [PATCH 0/4] netfilter fixes for 3.4-rc7
From: Jozsef Kadlecsik @ 2012-05-16 18:41 UTC (permalink / raw)
  To: David Miller; +Cc: Pablo Neira Ayuso, netfilter-devel, netdev
In-Reply-To: <20120514.185607.1967456974676336550.davem@davemloft.net>

Hi Dave,

On Mon, 14 May 2012, David Miller wrote:

> From: pablo@netfilter.org
> Date: Mon, 14 May 2012 13:46:59 +0200
> 
> > * One fix for possible timeout overflow for ipset, from Jozsef
> >   Kadlecsik.
> > 
> > * One fix to ensure that hash size is correct, again for ipset
> >   from Jozsef Kadlecsik.
> > 
> > * Removal of redundant include in xt_CT from Eldad Zack.
> > 
> > * Fix for wrong usage of MODULE_ALIAS_NFCT_HELPER in nf_ct_h323
> >   helper from myself.
> 
> I don't consider any of these appropriate this late in the -RC
> series.
> 
> They don't fix major regressions seen by many users.

Could at least the patch with the subject

   netfilter: ipset: fix hash size checking in kernel

   The hash size must fit both into u32 (jhash) and the max value of
   size_t. The missing checking could lead to kernel crash, bug reported
   by Seblu.

be submitted into 3.4-rc7? Any ipset package other than the most recent one,
compiled with gcc-4.7 or above, can trigger the bug.
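
To make the constraint concrete, here is a rough userspace sketch of the kind
of check described above. It is not the actual patch: table_alloc_size(),
header_size and bucket_size are hypothetical names chosen for illustration.
The idea is that the bucket count 2^bits must be representable in 32 bits for
the hash, and the total allocation must not overflow size_t.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the number of bytes needed for a table
 * with 2^bits buckets, or 0 if the request cannot be satisfied safely.
 */
static size_t table_alloc_size(unsigned int bits, size_t header_size,
			       size_t bucket_size)
{
	size_t nbuckets;

	/* The bucket count must fit into a 32-bit hash value. */
	if (bits > 31)
		return 0;
	nbuckets = (size_t)1 << bits;

	/* header_size + nbuckets * bucket_size must not overflow size_t. */
	if (bucket_size != 0 &&
	    (SIZE_MAX - header_size) / bucket_size < nbuckets)
		return 0;

	return header_size + nbuckets * bucket_size;
}
```

A caller would treat a return value of 0 as "reject the request" rather than
passing the size on to the allocator.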

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply

* Re: [PATCH] ptp_pch: Add missing #include <linux/slab.h>
From: David Miller @ 2012-05-16 18:58 UTC (permalink / raw)
  To: geert; +Cc: tshimizu818, richardcochran, netdev, linux-kernel
In-Reply-To: <1337169017-15835-1-git-send-email-geert@linux-m68k.org>

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Wed, 16 May 2012 13:50:17 +0200

> drivers/ptp/ptp_pch.c: In function 'pch_remove':
> drivers/ptp/ptp_pch.c:576:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]
> drivers/ptp/ptp_pch.c: In function 'pch_probe':
> drivers/ptp/ptp_pch.c:587:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration]
> 
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

Applied.

^ permalink raw reply

* Re: [V2 PATCH 9/9] vhost: zerocopy: poll vq in zerocopy callback
From: Shirley Ma @ 2012-05-16 19:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem
In-Reply-To: <20120516183629.GJ10769@redhat.com>

On Wed, 2012-05-16 at 21:36 +0300, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 10:32:05AM -0700, Shirley Ma wrote:
> > On Wed, 2012-05-16 at 18:14 +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 16, 2012 at 08:10:27AM -0700, Shirley Ma wrote:
> > > > On Wed, 2012-05-16 at 10:58 +0800, Jason Wang wrote:
> > > > > >>   drivers/vhost/vhost.c |    1 +
> > > > > >>   1 files changed, 1 insertions(+), 0 deletions(-)
> > > > > >>
> > > > > >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > >> index 947f00d..7b75fdf 100644
> > > > > >> --- a/drivers/vhost/vhost.c
> > > > > >> +++ b/drivers/vhost/vhost.c
> > > > > >> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void
> *arg)
> > > > > >>          struct vhost_ubuf_ref *ubufs = ubuf->arg;
> > > > > >>          struct vhost_virtqueue *vq = ubufs->vq;
> > > > > >>
> > > > > >> +       vhost_poll_queue(&vq->poll);
> > > > > >>          /* set len = 1 to mark this desc buffers done DMA
> */
> > > > > >>          vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
> > > > > >>          kref_put(&ubufs->kref,
> vhost_zerocopy_done_signal);
> > > > > > Doing so, we might have redundant vhost_poll_queue(). Do you
> > > know in
> > > > > > which scenario there might be missing of adding and
> signaling
> > > during
> > > > > > zerocopy?
> > > > > 
> > > > > Yes, as we only do signaling and adding during tx work, if
> there's
> > > no
> > > > > tx 
> > > > > work when the skb were sent, we may lose the opportunity to
> let
> > > guest 
> > > > > know about the completion. It's easy to be reproduced with
> netperf
> > > > > test. 
> > > > 
> > > > The reason which host signals guest is to free guest tx buffers,
> if
> > > > there is no tx work, then it's not necessary to signal the guest
> > > unless
> > > > guest runs out of memory. The pending buffers will be released
> > > > virtio_net device gone.
> > > > 
> > > > What's the behavior of netperf test when you hit this situation?
> > > > 
> > > > Thanks
> > > > Shirley
> > > 
> > > IIRC guest networking seems to be lost. 
> > 
> > It seems vhost_enable_notify is missing in somewhere else?
> > 
> > Thanks
> > Shirley
> 
> Donnu. I see virtio sending packets but they do not get
> to tun on host. debugging. 

I looked at the code; an if (zerocopy) check is needed around the code below:

+	if (zerocopy) {
                        num_pends = likely(vq->upend_idx >= vq->done_idx) ?
                                    (vq->upend_idx - vq->done_idx) :
                                    (vq->upend_idx + UIO_MAXIOV - vq->done_idx);
                        if (unlikely(num_pends > VHOST_MAX_PEND)) {
                                tx_poll_start(net, sock);
				vhost_poll_queue
                                set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
                                break;
                        }
+	}
                        if (unlikely(vhost_enable_notify(&net->dev, vq))) {
                                vhost_disable_notify(&net->dev, vq);
                                continue;
                        }
                        break;


Second, is it possible that the problem comes from tx_poll_start()? In
some situations, vhost_poll_wakeup() is not being called?
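
For reference, the num_pends computation above is a plain ring-distance
calculation. A minimal userspace sketch follows; RING_SIZE and ring_pending()
are hypothetical names used only for illustration, with RING_SIZE standing in
for UIO_MAXIOV.

```c
/* Hypothetical ring size for illustration; it stands in for UIO_MAXIOV. */
#define RING_SIZE 1024u

/*
 * Number of slots between done_idx (oldest outstanding entry) and
 * upend_idx (next entry to be used) in a ring whose indices wrap
 * at RING_SIZE.
 */
static unsigned int ring_pending(unsigned int upend_idx, unsigned int done_idx)
{
	if (upend_idx >= done_idx)
		return upend_idx - done_idx;
	/* upend_idx has wrapped past the end of the ring. */
	return upend_idx + RING_SIZE - done_idx;
}
```

For example, with upend_idx = 3 and done_idx = 1020 the wrapped branch is
taken and 7 entries are counted as pending.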

Thanks
Shirley

^ permalink raw reply

* Re: [PATCH] bonding: only send arp monitor packets if no other traffic
From: Jay Vosburgh @ 2012-05-16 19:08 UTC (permalink / raw)
  To: Chris Friesen; +Cc: netdev
In-Reply-To: <4FB3F67C.6000401@genband.com>

Chris Friesen <chris.friesen@genband.com> wrote:

>In order to minimize network traffic, when using load balancing modes
>only send out arp monitor packets if it's been more than delta_in_ticks
>jiffies since we either received or transmitted packets.  The rationale
>behind this is that if there is a lot of other traffic going on we don't
>need the arp monitor packets to determine whether or not the link is
>working.
>
>This makes the most difference if you have a lot of hosts all arping
>the same target at high frequency.

	This logic would not work for the active-backup case (it would
break arp_validate, for one thing), but might be ok for the loadbalance
(balance-xor, balance-rr) case.

	This might adversely affect cases where the switch ports are not
configured into a port channel; in that case, the ARP broadcasts would
be sent to all slaves, but with this patch, will no longer be.  That's
technically not a correct configuration, but seems to be in use
nevertheless.

	I didn't think that the ARP monitor was particularly popular for
the loadbalance case, since it is not as reliable.  It depends upon the
switch to ensure that some traffic comes in to each slave, and low
traffic periods can result in false detection of link failures.  Even
with the ARPs being sent out, if those are not evenly balanced by the
switch, false failure detections can occur.

	-J


>Signed-off-by: Chris Friesen <chris.friesen@genand.com>
>---
> drivers/net/bonding/bond_main.c |    8 ++++++--
> 1 files changed, 6 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index bc13b3d..4c8459a 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2885,8 +2885,12 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
> 		 * do - all replies will be rx'ed on same link causing slaves
> 		 * to be unstable during low/no traffic periods
> 		 */
>-		if (IS_UP(slave->dev))
>-			bond_arp_send_all(bond, slave);
>+		if (IS_UP(slave->dev)) {
>+			if (time_after_eq(jiffies, dev_trans_start(slave->dev) + delta_in_ticks) ||
>+			    time_after_eq(jiffies, slave->dev->last_rx + delta_in_ticks)) {
>+				bond_arp_send_all(bond, slave);
>+			}
>+		}
> 	}
>
> 	if (do_failover) {
>
>-- 
>
>Chris Friesen
>Software Designer
>3500 Carling Avenue
>Ottawa, Ontario K2H 8E9
>www.genband.com

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH 0/4] netfilter fixes for 3.4-rc7
From: David Miller @ 2012-05-16 19:18 UTC (permalink / raw)
  To: kadlec; +Cc: pablo, netfilter-devel, netdev
In-Reply-To: <alpine.DEB.2.00.1205162037110.2742@blackhole.kfki.hu>

From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Date: Wed, 16 May 2012 20:41:51 +0200 (CEST)

> Could at least the patch with the subject
> 
>    netfilter: ipset: fix hash size checking in kernel
> 
>    The hash size must fit both into u32 (jhash) and the max value of
>    size_t. The missing checking could lead to kernel crash, bug reported
>    by Seblu.
> 
> be submitted into 3.4-rc7? Any non most-recent ipset package compiled with 
> gcc-4.7 or above can trigger the bug.

And only root can trigger it if he gives bogus parameters, right?

If that's the case, the exposure is to privileged users committing an
operator error, so I don't see it as so important.
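
For readers following the quoted fix description, here is a minimal
userspace sketch of the kind of check it describes: the bucket count
must fit in 32 bits and the total allocation size must not overflow
size_t. The names demo_bucket, demo_table and demo_table_size() are
hypothetical stand-ins, not the ipset code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real table header and bucket types. */
struct demo_bucket { void *slots; uint8_t size, pos; };
struct demo_table  { uint8_t bits; struct demo_bucket bucket[]; };

/*
 * Return the allocation size in bytes for a table with 2^bits buckets,
 * or 0 if the bucket count does not fit in 32 bits or the byte count
 * would overflow size_t.
 */
static size_t demo_table_size(uint8_t bits)
{
	size_t nbuckets;

	if (bits > 31)		/* 2^bits must stay within a u32 */
		return 0;
	nbuckets = (size_t)1 << bits;

	/* Reject sizes where nbuckets * sizeof(bucket) + header overflows. */
	if ((SIZE_MAX - sizeof(struct demo_table)) / sizeof(struct demo_bucket)
	    < nbuckets)
		return 0;

	return nbuckets * sizeof(struct demo_bucket) + sizeof(struct demo_table);
}
```

A caller would treat a return value of 0 as "reject the request" before
attempting any allocation.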

^ permalink raw reply

* [PATCH] bonding: only send arp monitor packets if no other traffic
From: Chris Friesen @ 2012-05-16 18:48 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev

In order to minimize network traffic, when using load balancing modes
only send out arp monitor packets if it's been more than delta_in_ticks
jiffies since we either received or transmitted packets.  The rationale
behind this is that if there is a lot of other traffic going on we don't
need the arp monitor packets to determine whether or not the link is
working.

This makes the most difference if you have a lot of hosts all arping
the same target at high frequency.

Signed-off-by: Chris Friesen <chris.friesen@genand.com>
---
 drivers/net/bonding/bond_main.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index bc13b3d..4c8459a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2885,8 +2885,12 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
 		 * do - all replies will be rx'ed on same link causing slaves
 		 * to be unstable during low/no traffic periods
 		 */
-		if (IS_UP(slave->dev))
-			bond_arp_send_all(bond, slave);
+		if (IS_UP(slave->dev)) {
+			if (time_after_eq(jiffies, dev_trans_start(slave->dev) + delta_in_ticks) ||
+			    time_after_eq(jiffies, slave->dev->last_rx + delta_in_ticks)) {
+				bond_arp_send_all(bond, slave);
+			}
+		}
 	}
 
 	if (do_failover) {

-- 

Chris Friesen
Software Designer
3500 Carling Avenue
Ottawa, Ontario K2H 8E9
www.genband.com
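
The condition added by the patch above can be modelled in userspace as
a rough sketch. ticks_after_eq() mirrors the usual wraparound-safe
comparison behind time_after_eq(); should_send_arp() and its parameters
are hypothetical names for illustration. Note that, as written in the
patch, the ARP is sent when either the last transmit or the last
receive is at least delta ticks old.

```c
#include <stdbool.h>

/*
 * Wraparound-safe "a is at or after b" for a free-running tick counter.
 * Relies on the unsigned difference converting to a signed value in the
 * usual two's-complement way, as the kernel's jiffies helpers do.
 */
static bool ticks_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* True when the link has been idle in at least one direction for delta ticks. */
static bool should_send_arp(unsigned long now, unsigned long last_tx,
			    unsigned long last_rx, unsigned long delta)
{
	return ticks_after_eq(now, last_tx + delta) ||
	       ticks_after_eq(now, last_rx + delta);
}
```

With recent traffic in both directions the function returns false, so
the monitor packet is skipped; if either direction goes quiet for delta
ticks, it returns true again.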

^ permalink raw reply related

* Re: [PATCH net-next v2 0/2] 6lowpan: code updates
From: David Miller @ 2012-05-16 19:28 UTC (permalink / raw)
  To: alex.bluesman.smirnov; +Cc: netdev
In-Reply-To: <1337153248-5779-1-git-send-email-alex.bluesman.smirnov@gmail.com>

From: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Date: Wed, 16 May 2012 11:27:26 +0400

>   6lowpan: rework data fetching from skb

This is not what we told you to do.

We told you that IF you were going to emit a warning message
for the pskb_may_pull() failure condition, you should use
WARN_ON_ONCE() so that it doesn't potentially flood the
logs.

But you must always, in every case, handle the error in some
reasonable way, not just when WARN_ON_ONCE() does that initial
one-and-only trigger.
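
As a rough userspace illustration of the point above: the warning is
rate-limited to one message, but the error path runs on every failure.
demo_warn_once(), struct demo_buf and demo_fetch() are hypothetical
stand-ins, not the 6lowpan code or the kernel macros.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Log only on the first hit, but return the condition every time so the
 * caller can always branch on it.
 */
static bool demo_warn_once(bool cond, bool *warned, const char *msg)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "warning: %s\n", msg);
	}
	return cond;
}

/* Hypothetical buffer type standing in for an skb. */
struct demo_buf {
	const uint8_t *data;
	size_t len;
};

/* Copy n bytes out of buf; fail on every short buffer, warn only once. */
static int demo_fetch(struct demo_buf *buf, void *dst, size_t n)
{
	static bool warned;

	if (demo_warn_once(buf->len < n, &warned, "short buffer"))
		return -1;

	memcpy(dst, buf->data, n);
	buf->data += n;
	buf->len -= n;
	return 0;
}
```

The second and later short reads are silent in the log, yet each one
still returns -1 and leaves the buffer untouched.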

^ permalink raw reply

* Re: [PATCH v2 0/8] mISDN: Fixes and enhancements for the data channels
From: David Miller @ 2012-05-16 19:28 UTC (permalink / raw)
  To: kkeil; +Cc: netdev
In-Reply-To: <1337161868-19399-1-git-send-email-kkeil@linux-pingi.de>

From: Karsten Keil <kkeil@linux-pingi.de>
Date: Wed, 16 May 2012 11:51:00 +0200

> v2: All fixes from Dave Miller's review
>    - comment format
>    - use bool type for boolean parameter
>    - use (var == CONSTANT)
> 
> This series improves the stability of streaming raw voice data when the
> system is under high load. With the fixes in place you can send and
> receive multiple facsimiles (using SPANDSP) in parallel
> while compiling a kernel without getting a transfer aborted.
> 
> for net-next

All applied to net-next.

^ permalink raw reply

* Re: [PATCH net-next] net: sock_flag() cleanup
From: David Miller @ 2012-05-16 19:30 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1337183827.8512.1215.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 May 2012 17:57:07 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> - sock_flag() accepts a const pointer
> 
> - sock_flag() returns a boolean
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] fq_codel: should use qdisc backlog as threshold
From: David Miller @ 2012-05-16 19:30 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, nichols, dave.taht, van, codel, bloat
In-Reply-To: <1337179149.8512.1208.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 May 2012 16:39:09 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> codel_should_drop() logic allows a packet not to be dropped if the queue
> size is under max packet size.
> 
> In fq_codel, we have two possible backlogs : The qdisc global one, and
> the flow local one.
> 
> The meaningful one for codel_should_drop() should be the global backlog,
> not the per flow one, so that thin flows can have a non zero drop/mark
> probability.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.
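
To make the quoted reasoning concrete, here is a simplified sketch of
the backlog gate only. It leaves out the part of CoDel that tracks how
long the delay has stayed above target, and demo_drop_allowed() and its
parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified gate: never drop while the delay is under target or while
 * the backlog being examined holds at most one maximum-size packet.
 */
static bool demo_drop_allowed(uint32_t sojourn_us, uint32_t target_us,
			      uint32_t backlog_bytes, uint32_t maxpacket_bytes)
{
	if (sojourn_us < target_us || backlog_bytes <= maxpacket_bytes)
		return false;
	return true;
}
```

A thin flow with a single 1500-byte packet queued never passes the gate
when its own backlog is used, whatever its delay. Passing the qdisc-wide
backlog (say 30000 bytes) instead lets the same flow be dropped or
marked once its delay exceeds the target.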

^ permalink raw reply

