Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC net-next 0/2] net: Use net_<level>_ratelimit
From: Johannes Berg @ 2012-05-15 18:03 UTC (permalink / raw)
  To: Joe Perches
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netfilter-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-decnet-user-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-sctp-u79uwXL29TY76Z2rM5mHXA,
	coreteam-Cap9r6Oaw4JrovVCs/uTlw,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA, David Miller
In-Reply-To: <1337104765.7050.24.camel@joe2Laptop>

On Tue, 2012-05-15 at 10:59 -0700, Joe Perches wrote:
> On Tue, 2012-05-15 at 13:45 -0400, David Miller wrote:
> > From: Joe Perches <joe-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
> > Date: Mon, 14 May 2012 00:56:24 -0700
> > 
> > > net_ratelimit() like __ratelimit() is too easy to misuse.
> > > 
> > > Add simplifying macros similar to pr_<level>_ratelimited
> > > that combines the test of net_ratelimit and logging.
> > > 
> > > Joe Perches (2):
> > >   net: Add net_ratelimited_function and net_<level>_ratelimited macros
> > >   net: Convert net_ratelimit uses to net_<level>_ratelimited
> > 
> > These look fine to me so I've applied them to net-next and am
> > sanity checking the build right now.
> 
> OK, but fyi, there's a possible issue with !CONFIG_DEBUG
> builds because these patches converted some uses of
> 	if (net_ratelimit())
> 		printk(KERN_DEBUG ...
> to
> 	net_dbg_ratelimited()
> 
> These messages are no longer emitted when DEBUG isn't defined
> and not using dynamic_debug.  I'm not sure that's a real
> problem, but it's a difference.
> 
> I could produce a net_printk_ratelimited that would keep
> the original behavior if necessary.
> 
> 	net_printk_ratelimited(KERN_DEBUG etc...)

Oops. Yes, please do that, mac80211 doesn't have DEBUG yet

johannes

^ permalink raw reply

* Re: [PATCH RFC] tun: experimental zero copy tx support
From: Shirley Ma @ 2012-05-15 18:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eric Dumazet, Stephen Hemminger, David S. Miller, Joe Perches,
	Jason Wang, netdev, linux-kernel, Ian.Campbell, kvm
In-Reply-To: <20120514193134.GA18366@redhat.com>

On Mon, 2012-05-14 at 22:31 +0300, Michael S. Tsirkin wrote:
> On Mon, May 14, 2012 at 09:21:47PM +0200, Eric Dumazet wrote:
> > On Mon, 2012-05-14 at 22:14 +0300, Michael S. Tsirkin wrote:
> > 
> > > They seem to be in net-next, or did I miss something?
> > 
> > Yes, you re-introduce in this patch some bugs already fixed in
> macvtap
> 
> Could explain why I see some problems in testing :)
> Maybe common code should go into net/core?
> I couldn't decide whether the increase in kernel
> size is worth it.

Which problem you hit during testing?

The logic of zerocopy_sg_from_iovec should work well. Jason's fix made
the code more clear.

Thanks
Shirley

^ permalink raw reply

* Re: [RFC net-next 0/2] net: Use net_<level>_ratelimit
From: Joe Perches @ 2012-05-15 17:59 UTC (permalink / raw)
  To: David Miller
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	coreteam-Cap9r6Oaw4JrovVCs/uTlw, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-decnet-user-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-sctp-u79uwXL29TY76Z2rM5mHXA,
	netfilter-u79uwXL29TY76Z2rM5mHXA,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20120515.134531.530903973750646107.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

On Tue, 2012-05-15 at 13:45 -0400, David Miller wrote:
> From: Joe Perches <joe-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
> Date: Mon, 14 May 2012 00:56:24 -0700
> 
> > net_ratelimit() like __ratelimit() is too easy to misuse.
> > 
> > Add simplifying macros similar to pr_<level>_ratelimited
> > that combines the test of net_ratelimit and logging.
> > 
> > Joe Perches (2):
> >   net: Add net_ratelimited_function and net_<level>_ratelimited macros
> >   net: Convert net_ratelimit uses to net_<level>_ratelimited
> 
> These look fine to me so I've applied them to net-next and am
> sanity checking the build right now.

OK, but fyi, there's a possible issue with !CONFIG_DEBUG
builds because these patches converted some uses of
	if (net_ratelimit())
		printk(KERN_DEBUG ...
to
	net_dbg_ratelimited()

These messages are no longer emitted when DEBUG isn't defined
and not using dynamic_debug.  I'm not sure that's a real
problem, but it's a difference.

I could produce a net_printk_ratelimited that would keep
the original behavior if necessary.

	net_printk_ratelimited(KERN_DEBUG etc...)

cheers, Joe

^ permalink raw reply

* Re: [RFC net-next 0/2] net: Use net_<level>_ratelimit
From: David Miller @ 2012-05-15 17:45 UTC (permalink / raw)
  To: joe-6d6DIl74uiNBDgjK7y7TUQ
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	coreteam-Cap9r6Oaw4JrovVCs/uTlw, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-decnet-user-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-sctp-u79uwXL29TY76Z2rM5mHXA,
	netfilter-u79uwXL29TY76Z2rM5mHXA,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <cover.1336981915.git.joe-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>

From: Joe Perches <joe-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
Date: Mon, 14 May 2012 00:56:24 -0700

> net_ratelimit() like __ratelimit() is too easy to misuse.
> 
> Add simplifying macros similar to pr_<level>_ratelimited
> that combines the test of net_ratelimit and logging.
> 
> Joe Perches (2):
>   net: Add net_ratelimited_function and net_<level>_ratelimited macros
>   net: Convert net_ratelimit uses to net_<level>_ratelimited

These look fine to me so I've applied them to net-next and am
sanity checking the build right now.

Thanks Joe.

^ permalink raw reply

* Re: [PATCH] dummy: documentation is stale
From: David Miller @ 2012-05-15 17:44 UTC (permalink / raw)
  To: alan; +Cc: netdev
In-Reply-To: <20120514135724.23177.91808.stgit@bluebook>

From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Mon, 14 May 2012 14:57:31 +0100

> From: Alan Cox <alan@linux.intel.com>
> 
> dummy0/1/2 names are always used and there are options to set multiple
> dummy devices. Remove the obsolete text
> 
> Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=42865
> Signed-off-by: Alan Cox <alan@linux.intel.com>

Applied, thanks Alan.

^ permalink raw reply

* Re: [V2 PATCH 4/9] macvtap: zerocopy: set SKBTX_DEV_ZEROCOPY only when skb is built successfully
From: Shirley Ma @ 2012-05-15 17:44 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, mst, netdev, linux-kernel, ebiederm, davem
In-Reply-To: <20120502034206.11782.91643.stgit@amd-6168-8-1.englab.nay.redhat.com>

On Wed, 2012-05-02 at 11:42 +0800, Jason Wang wrote:
> Current the SKBTX_DEV_ZEROCOPY is set unconditionally after
> zerocopy_sg_from_iovec(), this would lead NULL pointer when macvtap
> fails to build zerocopy skb because destructor_arg was not
> initialized. Solve this by set this flag after the skb were built
> successfully.

I thought this flag was needed for zerocopy skb free even in err case.
I've checked it back again, it's OK to move the flag after the skb
successfully built. Or we can fix it by modify skb free with
destructor_arg NULL check as below:
...
skb_release_data() {
...
                if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
                        struct ubuf_info *uarg;

                        uarg = skb_shinfo(skb)->destructor_arg;
                        if (uarg && uarg->callback)
                                uarg->callback(uarg);
                }

...
}
Thanks
Shirley

^ permalink raw reply

* Re: [PATCH] pch_gbe: fix transmit races
From: David Miller @ 2012-05-15 17:43 UTC (permalink / raw)
  To: eric.dumazet; +Cc: andy.cress, netdev
In-Reply-To: <1337023566.8512.622.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 14 May 2012 21:26:06 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> Andy reported pch_gbe triggered "NETDEV WATCHDOG" errors.
> 
> May 11 11:06:09 kontron kernel: WARNING: at net/sched/sch_generic.c:261
> dev_watchdog+0x1ec/0x200() (Not tainted)
> May 11 11:06:09 kontron kernel: Hardware name: N/A
> May 11 11:06:09 kontron kernel: NETDEV WATCHDOG: eth0 (pch_gbe):
> transmit queue 0 timed out
> 
> It seems pch_gbe has a racy tx path (races with TX completion path)
> 
> Remove tx_queue_lock lock since it has no purpose, we must use tx_lock
> instead.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Andy Cress <andy.cress@us.kontron.com>
> Tested-by: Andy Cress <andy.cress@us.kontron.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] cdc_ether: add Novatel USB551L device IDs for FLAG_WWAN
From: David Miller @ 2012-05-15 17:42 UTC (permalink / raw)
  To: oliver-GvhC2dPhHPQdnm+yROfE0A
  Cc: dcbw-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, stable-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <201205071623.56396.oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>

From: Oliver Neukum <oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>
Date: Mon, 7 May 2012 16:23:56 +0200

> Am Montag, 7. Mai 2012, 16:24:51 schrieb Dan Williams:
>> Needs to be tagged with FLAG_WWAN, which since it has generic
>> descriptors, won't happen if we don't override the generic
>> driver info.
>> 
>> Cc: Oliver Neukum <oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>
>> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Signed-off-by: Dan Williams <dcbw-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Acked-by: Oliver Neukum <oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 3/3] usbnet: fix skb traversing races during unlink(v1)
From: David Miller @ 2012-05-15 17:42 UTC (permalink / raw)
  To: tom.leiming-Re5JQEeQqe8AvxtiuMwx3w
  Cc: gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w, oneukum-l3A5Bk7waGM,
	stable-DgEjT+Ai2ygdnm+yROfE0A
In-Reply-To: <20120503090450.41704527@tom-ThinkPad-T410>

From: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Thu, 3 May 2012 09:04:50 +0800

>>From a87ff961f0a5d50223bd084dfac4fe5ce84f3913 Mon Sep 17 00:00:00 2001
> From: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Date: Thu, 26 Apr 2012 11:33:46 +0800
> Subject: [PATCH] usbnet: fix skb traversing races during unlink(v2)
> 
> Commit 4231d47e6fe69f061f96c98c30eaf9fb4c14b96d(net/usbnet: avoid
> recursive locking in usbnet_stop()) fixes the recursive locking
> problem by releasing the skb queue lock before unlink, but may
> cause skb traversing races:
> 	- after URB is unlinked and the queue lock is released,
> 	the refered skb and skb->next may be moved to done queue,
> 	even be released
> 	- in skb_queue_walk_safe, the next skb is still obtained
> 	by next pointer of the last skb
> 	- so maybe trigger oops or other problems
> 
> This patch extends the usage of entry->state to describe 'start_unlink'
> state, so always holding the queue(rx/tx) lock to change the state if
> the referd skb is in rx or tx queue because we need to know if the
> refered urb has been started unlinking in unlink_urbs.
> 
> The other part of this patch is based on Huajun's patch:
> always traverse from head of the tx/rx queue to get skb which is
> to be unlinked but not been started unlinking.
> 
> Signed-off-by: Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next V1 1/8] net/mlx4: Address build warnings on set but not used variables
From: David Miller @ 2012-05-15 17:38 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, netdev, yevgenyp
In-Reply-To: <1337066690-2248-2-git-send-email-ogerlitz@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Tue, 15 May 2012 10:24:43 +0300

> -		err = mlx4_cmd_imm(dev, mac, &out_param, RES_MAC,
> -				   RES_OP_RESERVE_AND_MAP, MLX4_CMD_FREE_RES,
> -				   MLX4_CMD_TIME_CLASS_A, MLX4_CMD_WRAPPED);
> +		(void) mlx4_cmd_imm(dev, mac, &out_param, RES_MAC,
> +					RES_OP_RESERVE_AND_MAP, MLX4_CMD_FREE_RES,
> +					MLX4_CMD_TIME_CLASS_A, MLX4_CMD_WRAPPED);

Please stop wasting my time.  This is not the correct way to indent
functions that have arguments on multiple lines.

I'll say it one more time:

	The first character must line up with the column right after
	the openning parenthesis on the first line.

This means you DO NOT use only TAB characters and indent the thing
into the solar system like you have above.

Instead you use TAB and SPACE characters, as needed, to line it up
properly, like so:

	func(arg1, arg2,
	     arg3, arg4);

^ permalink raw reply

* Re: [V2 PATCH 3/9] macvtap: zerocopy: put page when fail to get all requested user pages
From: Shirley Ma @ 2012-05-15 17:33 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, mst, netdev, linux-kernel, ebiederm, davem
In-Reply-To: <20120502034157.11782.66606.stgit@amd-6168-8-1.englab.nay.redhat.com>

On Wed, 2012-05-02 at 11:41 +0800, Jason Wang wrote:
> When get_user_pages_fast() fails to get all requested pages, we could
> not use
> kfree_skb() to free it as it has not been put in the skb fragments. So
> we need
> to call put_page() instead.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/macvtap.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 7cb2684..9ab182a 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -531,9 +531,11 @@ static int zerocopy_sg_from_iovec(struct sk_buff
> *skb, const struct iovec *from,
>                 size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >>
> PAGE_SHIFT;
>                 num_pages = get_user_pages_fast(base, size, 0,
> &page[i]);
>                 if ((num_pages != size) ||
> -                   (num_pages > MAX_SKB_FRAGS -
> skb_shinfo(skb)->nr_frags))
> -                       /* put_page is in skb free */
> +                   (num_pages > MAX_SKB_FRAGS -
> skb_shinfo(skb)->nr_frags)) {
> +                       for (i = 0; i < num_pages; i++)
> +                               put_page(page[i]);
>                         return -EFAULT;
> +               }
>                 truesize = size * PAGE_SIZE;
>                 skb->data_len += len;
>                 skb->len += len; 

Good catch. I don't know why I thought put_page would be called in
skb_free for these pages which hadn't been added to skb frags before. :(

thanks
Shirley

^ permalink raw reply

* Re: [V2 PATCH 2/9] macvtap: zerocopy: fix truesize underestimation
From: Shirley Ma @ 2012-05-15 17:26 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, mst, netdev, linux-kernel, ebiederm, davem
In-Reply-To: <20120502034144.11782.88947.stgit@amd-6168-8-1.englab.nay.redhat.com>

On Wed, 2012-05-02 at 11:41 +0800, Jason Wang wrote:
> As the skb fragment were pinned/built from user pages, we should
> account the page instead of length for truesize.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/macvtap.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index bd4a70d..7cb2684 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -519,6 +519,7 @@ static int zerocopy_sg_from_iovec(struct sk_buff
> *skb, const struct iovec *from,
>                 struct page *page[MAX_SKB_FRAGS];
>                 int num_pages;
>                 unsigned long base;
> +               unsigned long truesize;
> 
>                 len = from->iov_len - offset;
>                 if (!len) {
> @@ -533,10 +534,11 @@ static int zerocopy_sg_from_iovec(struct sk_buff
> *skb, const struct iovec *from,
>                     (num_pages > MAX_SKB_FRAGS -
> skb_shinfo(skb)->nr_frags))
>                         /* put_page is in skb free */
>                         return -EFAULT;
> +               truesize = size * PAGE_SIZE;

Here should be truesize = size * PAGE_SIZE - offset, right?

>                 skb->data_len += len;
>                 skb->len += len;
> -               skb->truesize += len;
> -               atomic_add(len, &skb->sk->sk_wmem_alloc);
> +               skb->truesize += truesize;
> +               atomic_add(truesize, &skb->sk->sk_wmem_alloc);
>                 while (len) {
>                         int off = base & ~PAGE_MASK;
>                         int size = min_t(int, len, PAGE_SIZE - off);
> 
> 

^ permalink raw reply

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Eric Dumazet @ 2012-05-15 17:23 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Kieran Mansley, netdev
In-Reply-To: <1337101280.8512.1108.camel@edumazet-glaptop>

On Tue, 2012-05-15 at 19:01 +0200, Eric Dumazet wrote:

> 
> napi_get_frags() could probably updated in net-next to use the first
> frag as skb->head to save 512 bytes per skb.

By the way GRO_MAX_HEAD definition is way too big, napi_get_frags()
allocates fat skbs (1280 bytes of overhead instead of 768 bytes)

This should be enough :

#define GRO_MAX_HEAD 128

^ permalink raw reply

* Re: [V2 PATCH 1/9] macvtap: zerocopy: fix offset calculation when building skb
From: Shirley Ma @ 2012-05-15 17:17 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, mst, netdev, linux-kernel, ebiederm, davem
In-Reply-To: <20120502034130.11782.25906.stgit@amd-6168-8-1.englab.nay.redhat.com>

On Wed, 2012-05-02 at 11:41 +0800, Jason Wang wrote:
> This patch fixes the offset calculation when building skb:
> 
> - offset1 were used as skb data offset not vector offset
> - reset offset to zero only when we advance to next vector

I tested the original code in all scenario, it worked well.

However this patch makes the code more clear.

Thanks
Shirley

^ permalink raw reply

* Re: [PATCH] xfrm_algo: drop an unnecessary inclusion
From: David Miller @ 2012-05-15 17:14 UTC (permalink / raw)
  To: JBeulich; +Cc: netdev, linux-kernel
In-Reply-To: <4FB2618C0200007800083C99@nat28.tlf.novell.com>

From: "Jan Beulich" <JBeulich@suse.com>
Date: Tue, 15 May 2012 13:00:44 +0100

> For several releases, this has not been needed anymore, as no helper
> functions declared in net/ah.h get implemented by xfrm_algo.c anymore.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Applied.

^ permalink raw reply

* Re: [PATCH, v2] xfrm: make xfrm_algo.c a module
From: David Miller @ 2012-05-15 17:14 UTC (permalink / raw)
  To: JBeulich; +Cc: netdev, linux-kernel
In-Reply-To: <4FB260D80200007800083C91@nat28.tlf.novell.com>

From: "Jan Beulich" <JBeulich@suse.com>
Date: Tue, 15 May 2012 12:57:44 +0100

> By making this a standalone config option (auto-selected as needed),
> selecting CRYPTO from here rather than from XFRM (which is boolean)
> allows the core crypto code to become a module again even when XFRM=y.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 0/2] ethtool changes
From: David Miller @ 2012-05-15 17:14 UTC (permalink / raw)
  To: manish.chopra
  Cc: bhutchings, netdev, Dept_NX_Linux_NIC_Driver, anirban.chakraborty
In-Reply-To: <1337080419-31786-1-git-send-email-manish.chopra@qlogic.com>

From: Manish Chopra <manish.chopra@qlogic.com>
Date: Tue, 15 May 2012 07:13:37 -0400

> Please apply it to net-next.

All applied, thanks.

^ permalink raw reply

* Re: [PATCH v3.4-rc7 regression] net/bond: return int from recv_probe()
From: David Miller @ 2012-05-15 17:08 UTC (permalink / raw)
  To: arnd; +Cc: geert, jbohac, linux-kernel, netdev
In-Reply-To: <201205151056.53621.arnd@arndb.de>

Also there were illegal sequences in your email headers (the netdev
address has a trailing ".") so vger rejected the posting.

^ permalink raw reply

* Re: [PATCH v3.4-rc7 regression] net/bond: return int from recv_probe()
From: David Miller @ 2012-05-15 17:07 UTC (permalink / raw)
  To: arnd; +Cc: geert, jbohac, linux-kernel, netdev
In-Reply-To: <201205151056.53621.arnd@arndb.de>

From: Arnd Bergmann <arnd@arndb.de>
Date: Tue, 15 May 2012 10:56:53 +0000

> 13a8e0c8c "bonding: don't increase rx_dropped after processing LACPDUs"
> changed the prototype of the bonding recv_probe handler, but only in
> some places, as identified by this warning:
> 
> drivers/net/bonding/bond_main.c: In function 'bond_handle_frame':
> drivers/net/bonding/bond_main.c:1463:13: error: assignment from incompatible pointer type [-Werror]
> drivers/net/bonding/bond_main.c: In function 'bond_open':
> drivers/net/bonding/bond_main.c:3441:21: error: assignment from incompatible pointer type [-Werror]
> drivers/net/bonding/bond_main.c:3448:20: error: assignment from incompatible pointer type [-Werror]
> 
> To fix this, we can change the remaining prototypes to return an
> integer as well, and always return RX_HANDLER_ANOTHER from rlb_arp_recv.

This has been fixed in Linus's tree for almost 2 days.

^ permalink raw reply

* Re: [PATCH v3 6/6] net: sh_eth: use NAPI
From: David Miller @ 2012-05-15 17:05 UTC (permalink / raw)
  To: yoshihiro.shimoda.uh; +Cc: netdev, linux-sh
In-Reply-To: <4FB225F1.20407@renesas.com>

From: "Shimoda, Yoshihiro" <yoshihiro.shimoda.uh@renesas.com>
Date: Tue, 15 May 2012 18:46:25 +0900

> 2012/05/15 14:07, David Miller wrote:
>> From: "Shimoda, Yoshihiro" <yoshihiro.shimoda.uh@renesas.com>
>> Date: Tue, 15 May 2012 13:47:44 +0900
>> 
>>> 2012/05/15 7:50, David Miller wrote:
>>>> You need strict synchronization between your TX queueing and TX
>>>> liberation flows.  So that queue stop and wake are only performed
>>>> at the correct moment.
>>>
>>> I will add netif_queue_stopped() in the sh_eth_poll().
>> 
>> That doesn't fix the bug.  What if someone transmits a packet and
>> fills the TX queue between the netif_queue_stopped() test and the
>> call to netif_wake_queue()?
>> 
>> Adding another test doesn't create the necessary synchronization.
>> 
> 
> Thank you for the reply again.
> I will modify the code as the following. Is it correct?
> 
> 	if (txfree_num) {
> 		netif_tx_lock(ndev);
> 		if (netif_queue_stopped(ndev))
> 			netif_wake_queue(ndev);
> 		netif_tx_unlock(ndev);
> 	}

Yes, and then you don't need that private lock in the start_xmit()
method at all, since that method runs with the tx_lock held.

^ permalink raw reply

* ibmveth bug?
From: Nishanth Aravamudan @ 2012-05-15 17:01 UTC (permalink / raw)
  To: santil; +Cc: anton, benh, paulus, netdev, linux-kernel

Hi Santiago,

Are you still working on ibmveth?

I've found a very sporadic bug with ibmveth in some testing. PAPR
requires that:

"Validate the Buffer Descriptor of the receive queue buffer (I/O
addresses for entire buffer length starting at the spec- ified I/O
address are translated by the RTCE table, length is a multiple of 16
bytes, and alignment is on a 16 byte boundary) else H_Parameter."

but from what I can tell ibmveth.c is not enforcing this last condition:

	adapter->rx_queue.queue_addr =
		kmalloc(adapter->rx_queue.queue_len, GFP_KERNEL);

	...

	adapter->rx_queue.queue_dma = dma_map_single(dev,
		adapter->rx_queue.queue_addr, adapter->rx_queue.queue_len,
		DMA_BIDIRECTIONAL);

	...

	rxq_desc.fields.address = adapter->rx_queue.queue_dma;

	...

	lpar_rc = ibmveth_register_logical_lan(adapter, rxq_desc,
		mac_address);
	netdev_err(netdev, "buffer TCE:0x%llx filter TCE:0x%llx rxq "
	 	"desc:0x%llx MAC:0x%llx\n", adapter->buffer_list_dma,
	 	adapter->filter_list_dma, rxq_desc.desc, mac_address);

And I got on one install attempt:

[ 39.978430] ibmveth 30000004: eth0: h_register_logical_lan failed with -4
[ 39.978449] ibmveth 30000004: eth0: buffer TCE:0x1000 filter TCE:0x10000 rxq desc:0x80006010000200a8 MAC:0x56754de8e904

rxq desc, as you can see is not 16byte aligned. kmalloc() only
guarantees 8-byte alignment (as does gcc, I think). Initially, I thought
we could just overallocate the queue_addr and ALIGN() down, but then we
would need to save the original kmalloc pointer in a new struct member
per rx_queue.

So a couple of questions:

1) Is my analysis accurate? :)

2) How gross would it be to save an extra pointer for every rx_queue?

3) Based upon 2), is it better to just go ahead and create our own
kmem_cache (which gets an alignment specified)?

For 3), I started coding this, but couldn't find a clean place to
allocate the kmem_cache itself, as the size of each object depends on
the run-time characteristics (afaict), but needs to be specified at
cache creation time. Any insight you could provide would be great!

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

^ permalink raw reply

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Eric Dumazet @ 2012-05-15 17:01 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Kieran Mansley, netdev
In-Reply-To: <1337100454.2544.25.camel@bwh-desktop.uk.solarflarecom.com>

On Tue, 2012-05-15 at 17:47 +0100, Ben Hutchings wrote:
> On Tue, 2012-05-15 at 18:34 +0200, Eric Dumazet wrote:
> > On Tue, 2012-05-15 at 17:29 +0100, Kieran Mansley wrote:
> > > On Tue, 2012-05-15 at 16:56 +0200, Eric Dumazet wrote:
> > > > 
> > > > Please try latest kernels, this is probably 'fixed'
> > > 
> > > I've just tried with 3.4.0-rc7 and the problem is still reproducible.
> > > It's perhaps harder to reproduce than on 3.3.6 but still there.
> > > 
> > > > What network driver are you using ? 
> > > 
> > > The receiver is using the sfc driver that is included in the kernel
> > > build, together with an SFC 9020 NIC. 
> > > 
> > > Kieran
> > > 
> > 
> > MTU ?
> 
> 1500
> 
> > What is typical skb->truesize of skb given to stack in RX path ?
> >
> > If drivers use PAGE_SIZE fragments, then you are more likely to hit
> > limit.
> 
> We're passing page fragments into GRO.


Yes, I can see drivers/net/ethernet/sfc/rx.c is even lying about
truesize. Thats explain why you trigger the backlogdrop even on 2.6
kernels.

skb->len = rx_buf->len;
skb->data_len = rx_buf->len;
skb->truesize += rx_buf->len; // instead of real frag size

So skb->truesize are rather small.

napi_get_frags() could probably updated in net-next to use the first
frag as skb->head to save 512 bytes per skb.

You could try setting tcp_adv_win_scale to -2

^ permalink raw reply

* Re: [V2 PATCH 9/9] vhost: zerocopy: poll vq in zerocopy callback
From: Shirley Ma @ 2012-05-15 16:50 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, mst, netdev, linux-kernel, ebiederm, davem
In-Reply-To: <20120502034254.11782.27314.stgit@amd-6168-8-1.englab.nay.redhat.com>

On Wed, 2012-05-02 at 11:42 +0800, Jason Wang wrote:
> We add used and signal guest in worker thread but did not poll the
> virtqueue
> during the zero copy callback. This may lead the missing of adding and
> signalling during zerocopy. Solve this by polling the virtqueue and
> let it
> wakeup the worker during callback.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/vhost.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 947f00d..7b75fdf 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void *arg)
>         struct vhost_ubuf_ref *ubufs = ubuf->arg;
>         struct vhost_virtqueue *vq = ubufs->vq;
> 
> +       vhost_poll_queue(&vq->poll);
>         /* set len = 1 to mark this desc buffers done DMA */
>         vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
>         kref_put(&ubufs->kref, vhost_zerocopy_done_signal);

Doing so, we might have redundant vhost_poll_queue(). Do you know in
which scenario there might be missing of adding and signaling during
zerocopy?

Thanks
Shirley

^ permalink raw reply

* Re: TCPBacklogDrops during aggressive bursts of traffic
From: Ben Hutchings @ 2012-05-15 16:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Kieran Mansley, netdev
In-Reply-To: <1337099641.8512.1102.camel@edumazet-glaptop>

On Tue, 2012-05-15 at 18:34 +0200, Eric Dumazet wrote:
> On Tue, 2012-05-15 at 17:29 +0100, Kieran Mansley wrote:
> > On Tue, 2012-05-15 at 16:56 +0200, Eric Dumazet wrote:
> > > 
> > > Please try latest kernels, this is probably 'fixed'
> > 
> > I've just tried with 3.4.0-rc7 and the problem is still reproducible.
> > It's perhaps harder to reproduce than on 3.3.6 but still there.
> > 
> > > What network driver are you using ? 
> > 
> > The receiver is using the sfc driver that is included in the kernel
> > build, together with an SFC 9020 NIC. 
> > 
> > Kieran
> > 
> 
> MTU ?

1500

> What is typical skb->truesize of skb given to stack in RX path ?
>
> If drivers use PAGE_SIZE fragments, then you are more likely to hit
> limit.

We're passing page fragments into GRO.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH net-next 0/2] extend sch_mqprio to distribute traffic not only by ETS TC
From: John Fastabend @ 2012-05-15 16:44 UTC (permalink / raw)
  To: Amir Vadai
  Cc: David S. Miller, netdev, Oren Duer, Liran Liss, Jamal Hadi Salim,
	Diego Crupnicoff, Or Gerlitz
In-Reply-To: <4FB15BEC.8040000@mellanox.com>

On 5/14/2012 12:24 PM, Amir Vadai wrote:
>>>> On 5/6/2012 12:05 AM, Amir Vadai wrote:
>>>>> This series comes to revive the discussion initiated on the thread "net:
>>>>> support tx_ring per UP in HW based QoS mechanism" (see
>>>>> http://marc.info/?t=133165957200004&r=1&w=2) with the major issue to be address
>>>>> is - how should sk_prio<=>    TC be done, for both, tagged and untagged traffic.
>>>>> Following is a staged description addressing the background, problem
>>>>> description, current situation, suggestion for the change and implementation of
>>>>> it.

[...]

> John Hi,
> 
> After some internal discussions, it was agreed to line up with your
> approach, to leave mqprio an abstract skb->priority <=> queue set
> mapping and to ignore egress_map if mqprio is enabled.
> 

OK sounds good.

> It would be very nice, if the term 'tc' in kernel code would be
> replaced to queue set, since it is very misleading.
> 

Go ahead and write up a patch. Just be careful not to break existing
user visible API. I agree it is confusing.

> There still might be some small issues with skb_tx_hash for tagged
> traffic, which I will work on tomorrow, and hopefully will send a new
> patch set with the solution.
> 

What are the issues? Lets see a patch.

Thanks,
John

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox