Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH RESEND 1/2] wlcore: fix copy-paste bug: assign from src struct not dest
From: Giel van Schijndel @ 2015-01-07 19:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Giel van Schijndel, Kalle Valo, John W. Linville, Eliad Peller,
	Arik Nemtsov, open list:TI WILINK WIRELES...,
	open list:NETWORKING DRIVERS
In-Reply-To: <1420394427-19509-1-git-send-email-me@mortis.eu>

Signed-off-by: Giel van Schijndel <me@mortis.eu>
Reported-at: http://www.viva64.com/en/b/0299/
---
 drivers/net/wireless/ti/wlcore/acx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ti/wlcore/acx.c b/drivers/net/wireless/ti/wlcore/acx.c
index b924cea..f28fa3b 100644
--- a/drivers/net/wireless/ti/wlcore/acx.c
+++ b/drivers/net/wireless/ti/wlcore/acx.c
@@ -1725,7 +1725,7 @@ int wl12xx_acx_config_hangover(struct wl1271 *wl)
 	acx->decrease_delta = conf->decrease_delta;
 	acx->quiet_time = conf->quiet_time;
 	acx->increase_time = conf->increase_time;
-	acx->window_size = acx->window_size;
+	acx->window_size = conf->window_size;
 
 	ret = wl1271_cmd_configure(wl, ACX_CONFIG_HANGOVER, acx,
 				   sizeof(*acx));
-- 
2.1.4

^ permalink raw reply related

* [PATCH RESEND 2/2] wlcore: align member-assigns in a structure-copy block
From: Giel van Schijndel @ 2015-01-07 19:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Giel van Schijndel, Kalle Valo, Eliad Peller, John W. Linville,
	Arik Nemtsov, open list:TI WILINK WIRELES...,
	open list:NETWORKING DRIVERS
In-Reply-To: <1420659525-22975-1-git-send-email-me@mortis.eu>

This highlights the differences (e.g. the bug fixed in the previous
commit).

Signed-off-by: Giel van Schijndel <me@mortis.eu>
---
 drivers/net/wireless/ti/wlcore/acx.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/ti/wlcore/acx.c b/drivers/net/wireless/ti/wlcore/acx.c
index f28fa3b..93a2fa8 100644
--- a/drivers/net/wireless/ti/wlcore/acx.c
+++ b/drivers/net/wireless/ti/wlcore/acx.c
@@ -1715,17 +1715,17 @@ int wl12xx_acx_config_hangover(struct wl1271 *wl)
 		goto out;
 	}
 
-	acx->recover_time = cpu_to_le32(conf->recover_time);
-	acx->hangover_period = conf->hangover_period;
-	acx->dynamic_mode = conf->dynamic_mode;
-	acx->early_termination_mode = conf->early_termination_mode;
-	acx->max_period = conf->max_period;
-	acx->min_period = conf->min_period;
-	acx->increase_delta = conf->increase_delta;
-	acx->decrease_delta = conf->decrease_delta;
-	acx->quiet_time = conf->quiet_time;
-	acx->increase_time = conf->increase_time;
-	acx->window_size = conf->window_size;
+	acx->recover_time               = cpu_to_le32(conf->recover_time);
+	acx->hangover_period            = conf->hangover_period;
+	acx->dynamic_mode               = conf->dynamic_mode;
+	acx->early_termination_mode     = conf->early_termination_mode;
+	acx->max_period                 = conf->max_period;
+	acx->min_period                 = conf->min_period;
+	acx->increase_delta             = conf->increase_delta;
+	acx->decrease_delta             = conf->decrease_delta;
+	acx->quiet_time                 = conf->quiet_time;
+	acx->increase_time              = conf->increase_time;
+	acx->window_size                = conf->window_size;
 
 	ret = wl1271_cmd_configure(wl, ACX_CONFIG_HANGOVER, acx,
 				   sizeof(*acx));
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH iproute2 3/3] ip netns: Delete all netns
From: Brian Haley @ 2015-01-07 19:40 UTC (permalink / raw)
  To: Vadim Kochan; +Cc: netdev
In-Reply-To: <20150107181112.GA24241@angus-think.lan>

On 01/07/2015 01:11 PM, Vadim Kochan wrote:
> On Wed, Jan 07, 2015 at 07:36:40PM +0200, Vadim Kochan wrote:
>> On Wed, Jan 07, 2015 at 10:44:24AM -0500, Brian Haley wrote:
>>> On 01/07/2015 06:04 AM, Vadim Kochan wrote:
>>>> From: Vadim Kochan <vadim4j@gmail.com>
>>>>
>>>> Allow delete all namespace names by:
>>>>
>>>>     $ ip netns del all
>>>
>>> So I can still create a namespace called 'all', but can't exec in it or delete
>>> it independently with this change.  Perhaps you need to block that as well?
>>> Unless there's some other patch I'm missing?
>>>
>>> -Brian
>> Hm, I did not take it into account ...
>> I will look if I can find another way ...
>>
>> Thanks,
> 
> what about this ?
> 
>     $ ip netns exec / ip link
>     $ ip netns del /
> 
> so it make a sense to be as root directory of bound ns names in /var/run/netns/ ?
> what do you think ?

I think using / is confusing.  And something like -a[ll] as an option doesn't
seem right either.

Or you just trap the name "all" in the add case and don't allow it.

Just my opinion.

-Brian

^ permalink raw reply

* Fw: iproute2: segfault with ip link show dev
From: Stephen Hemminger @ 2015-01-07 19:42 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 1996 bytes --]

Looks like one VF info changes broke old code

Begin forwarded message:

Date: Wed, 7 Jan 2015 04:06:53 -0800
From: William Dauchy <william@gandi.net>
To: "stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: iproute2: segfault with ip link show dev


Hi,

I was using iproute2 3.15.
My network card in using igb driver with VF enable, e.g igb.max_vfs=2
After upgrading to 3.16, I have now a segfault while doing a
usual `ip link show dev eth1`.
Disabling VFS make the segafult disappear.

Here is the gdb trace even if it does not contain much info.
The segfault occurs when at the VF step.

(gdb) break print_linkinfo
Breakpoint 1 at 0x40782d
(gdb) set args link show dev eth1
(gdb) r
Starting program: /sbin/ip link show dev eth1

Breakpoint 1, 0x000000000040782d in print_linkinfo ()
(gdb) n
Single stepping until exit from function print_linkinfo,
which has no line number information.
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP
mode DEFAULT group default qlen 10000
    link/ether 00:26:6c:ff:b5:c1 brd ff:ff:ff:ff:ff:ff

Program received signal SIGSEGV, Segmentation fault.
0x00000000004070eb in print_vfinfo ()
(gdb) bt
#0  0x00000000004070eb in print_vfinfo ()
#1  0x0000000000407f9f in print_linkinfo ()
#2  0x000000000041f266 in iplink_get ()
#3  0x0000000000409c69 in ipaddr_list_flush_or_save ()
#4  0x000000000040a113 in ipaddr_list_link ()
#5  0x00000000004203f6 in do_iplink ()
#6  0x0000000000405a07 in do_cmd ()
#7  0x000000000040621e in main ()

The expected output is for example:

# ip link show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 10000
    link/ether 00:26:6c:ff:b3:8d brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto


I'm using my own kernel build, a stable v3.14.x

Regards,
-- 
William

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v5 0/7]: ixgbevf: Allow querying VFs RSS indirection table and key
From: Greg Rose @ 2015-01-07 19:46 UTC (permalink / raw)
  To: Vlad Zolotarov; +Cc: netdev, Gleb Natapov, Avi Kivity, jeffrey.t.kirsher
In-Reply-To: <1420658802-18061-1-git-send-email-vladz@cloudius-systems.com>

On Wed, Jan 7, 2015 at 11:26 AM, Vlad Zolotarov
<vladz@cloudius-systems.com> wrote:
> Add the ethtool ops to VF driver to allow querying the RSS indirection table
> and RSS Random Key.
>
> On some devices VFs share the RSS Redirection Table and Hash Key with a PF and letting
> the VF query this information may introduce some security risks. Therefore we disable this
> feature by default for such devices (e.g. 82599) and allow it for those where there isn't any
> possible risk (e.g. on x550). The new netdev op is going to allow a system administrator to
> change the default behaviour with "ip link set" command.
>
>  - netdev: Add a new netdev op to allow/block VF from querying RSS Indirection Table and
>    RSS Hash Key.
>  - PF driver: Add new VF-PF channel commands.
>  - VF driver: Utilize these new commands and add the corresponding
>               ethtool callbacks.
>
> New in v5:
>    - Added a new netdev op to allow/block VF from querying RSS Indirection Table and
>      RSS Hash Key.
>    - Let VF query the RSS info only if VF is allowed to.
>
> New in v4:
>    - Forgot to run checkpatch on v3 and there were a few styling things to fix. ;)
>
> New in v3:
>    - Added a missing support for x550 devices.
>    - Mask the indirection table values according to PSRTYPE[n].RQPL.
>    - Minimized the number of added VF-PF commands.
>
> New in v2:
>    - Added a detailed description to patches 4 and 5.
>
> New in v1 (compared to RFC):
>    - Use "if-else" statement instead of a "switch-case" for a single option case.
>      More specifically: in cases where the newly added API version is the only one
>      allowed. We may consider using a "switch-case" back again when the list of
>      allowed API versions in these specific places grows up.
>
> Vlad Zolotarov (7):
>   if_link: Add an additional parameter to ifla_vf_info for RSS querying
>   ixgbe: Add a new netdev op to allow/prevent a VF from querying an RSS
>     info
>   ixgbe: Add a RETA query command to VF-PF channel API
>   ixgbevf: Add a RETA query code
>   ixgbe: Add GET_RSS_KEY command to VF-PF channel commands set
>   ixgbevf: Add RSS Key query code
>   ixgbevf: Add the appropriate ethtool ops to query RSS indirection
>     table and key
>
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h          |   1 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |   7 ++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h      |  10 ++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c    | 119 +++++++++++++++++++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h    |   2 +
>  drivers/net/ethernet/intel/ixgbevf/ethtool.c      |  42 +++++++
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   4 +-
>  drivers/net/ethernet/intel/ixgbevf/mbx.h          |  10 ++
>  drivers/net/ethernet/intel/ixgbevf/vf.c           | 132 ++++++++++++++++++++++
>  drivers/net/ethernet/intel/ixgbevf/vf.h           |   2 +
>  include/linux/if_link.h                           |   1 +
>  include/linux/netdevice.h                         |   8 ++
>  include/uapi/linux/if_link.h                      |   8 ++
>  net/core/rtnetlink.c                              |  33 +++++-
>  14 files changed, 372 insertions(+), 7 deletions(-)

The series looks good to me with the addition of the ability to set
policy via the new netdev op.

Thanks Vlad!

Acked-By: Greg Rose <gregory.v.rose@intel.com>

- Greg

>
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [patch iproute2 2/2] tc: add support for BPF based actions
From: Jiri Pirko @ 2015-01-07 19:52 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, David Miller, Jamal Hadi Salim, Stephen Hemminger
In-Reply-To: <CAHA+R7Pk31fjLm-ksUF-8hQ3+tkqQ-Z820Bayj2n6DTFjNcCsQ@mail.gmail.com>

Wed, Jan 07, 2015 at 07:50:47PM CET, cwang@twopensource.com wrote:
>On Wed, Jan 7, 2015 at 8:47 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> +       fprintf(stderr, "Usage: ... bpf ...\n");
>> +       fprintf(stderr, "\n");
>> +       fprintf(stderr, " [inline]:     run bytecode BPF_BYTECODE\n");
>> +       fprintf(stderr, " [from file]:  run bytecode-file FILE\n");
>> +       fprintf(stderr, "\n");
>> +       fprintf(stderr, "Where BPF_BYTECODE := \'s,c t f k,c t f k,c t f k,...\'\n");
>> +       fprintf(stderr, "      c,t,f,k and s are decimals; s denotes number of 4-tuples\n");
>> +       fprintf(stderr, "Where FILE points to a file containing the BPF_BYTECODE string\n");
>> +       fprintf(stderr, "\nACTION_SPEC := ... look at individual actions\n");
>> +       fprintf(stderr, "NOTE: CLASSID is parsed as hexadecimal input.\n");
>
>Can we just use BPF transparently for gact?

Why to squash it there? I think it is much clearer to do this
separatelly.

>It is never user-friendly to
>use this kind of bytecode even though I know there is a tool to "compile"
>BPF.

Please see cls_bpf. It's already in-tree for some time. act_bpf just
completes this.

^ permalink raw reply

* Re: [patch iproute2 2/2] tc: add support for BPF based actions
From: Cong Wang @ 2015-01-07 19:58 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, David Miller, Jamal Hadi Salim, Stephen Hemminger
In-Reply-To: <20150107195233.GA1898@nanopsycho.orion>

On Wed, Jan 7, 2015 at 11:52 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Wed, Jan 07, 2015 at 07:50:47PM CET, cwang@twopensource.com wrote:
>>On Wed, Jan 7, 2015 at 8:47 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> +       fprintf(stderr, "Usage: ... bpf ...\n");
>>> +       fprintf(stderr, "\n");
>>> +       fprintf(stderr, " [inline]:     run bytecode BPF_BYTECODE\n");
>>> +       fprintf(stderr, " [from file]:  run bytecode-file FILE\n");
>>> +       fprintf(stderr, "\n");
>>> +       fprintf(stderr, "Where BPF_BYTECODE := \'s,c t f k,c t f k,c t f k,...\'\n");
>>> +       fprintf(stderr, "      c,t,f,k and s are decimals; s denotes number of 4-tuples\n");
>>> +       fprintf(stderr, "Where FILE points to a file containing the BPF_BYTECODE string\n");
>>> +       fprintf(stderr, "\nACTION_SPEC := ... look at individual actions\n");
>>> +       fprintf(stderr, "NOTE: CLASSID is parsed as hexadecimal input.\n");
>>
>>Can we just use BPF transparently for gact?
>
> Why to squash it there? I think it is much clearer to do this
> separatelly.
>

Because they are both intended to drop/pass/pipe packets,
we don't have to make a separated one just because one is
using BPF one isn't.

>>It is never user-friendly to
>>use this kind of bytecode even though I know there is a tool to "compile"
>>BPF.
>
> Please see cls_bpf. It's already in-tree for some time. act_bpf just
> completes this.

Yeah, that is what I hate too.

^ permalink raw reply

* Re: [PATCH net-next v5 0/7]: ixgbevf: Allow querying VFs RSS indirection table and key
From: Jeff Kirsher @ 2015-01-07 20:03 UTC (permalink / raw)
  To: Vlad Zolotarov; +Cc: netdev, gleb, avi
In-Reply-To: <1420658802-18061-1-git-send-email-vladz@cloudius-systems.com>

[-- Attachment #1: Type: text/plain, Size: 3347 bytes --]

On Wed, 2015-01-07 at 21:26 +0200, Vlad Zolotarov wrote:
> Add the ethtool ops to VF driver to allow querying the RSS indirection
> table
> and RSS Random Key.
> 
> On some devices VFs share the RSS Redirection Table and Hash Key with
> a PF and letting
> the VF query this information may introduce some security risks.
> Therefore we disable this
> feature by default for such devices (e.g. 82599) and allow it for
> those where there isn't any
> possible risk (e.g. on x550). The new netdev op is going to allow a
> system administrator to
> change the default behaviour with "ip link set" command.
> 
>  - netdev: Add a new netdev op to allow/block VF from querying RSS
> Indirection Table and
>    RSS Hash Key.
>  - PF driver: Add new VF-PF channel commands.
>  - VF driver: Utilize these new commands and add the corresponding
>               ethtool callbacks.
> 
> New in v5:
>    - Added a new netdev op to allow/block VF from querying RSS
> Indirection Table and
>      RSS Hash Key.
>    - Let VF query the RSS info only if VF is allowed to.
> 
> New in v4:
>    - Forgot to run checkpatch on v3 and there were a few styling
> things to fix. ;)
> 
> New in v3:
>    - Added a missing support for x550 devices.
>    - Mask the indirection table values according to PSRTYPE[n].RQPL.
>    - Minimized the number of added VF-PF commands.
> 
> New in v2:
>    - Added a detailed description to patches 4 and 5.
> 
> New in v1 (compared to RFC):
>    - Use "if-else" statement instead of a "switch-case" for a single
> option case.
>      More specifically: in cases where the newly added API version is
> the only one
>      allowed. We may consider using a "switch-case" back again when
> the list of
>      allowed API versions in these specific places grows up.
> 
> Vlad Zolotarov (7):
>   if_link: Add an additional parameter to ifla_vf_info for RSS
> querying
>   ixgbe: Add a new netdev op to allow/prevent a VF from querying an
> RSS
>     info
>   ixgbe: Add a RETA query command to VF-PF channel API
>   ixgbevf: Add a RETA query code
>   ixgbe: Add GET_RSS_KEY command to VF-PF channel commands set
>   ixgbevf: Add RSS Key query code
>   ixgbevf: Add the appropriate ethtool ops to query RSS indirection
>     table and key
> 
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h          |   1 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |   7 ++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h      |  10 ++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c    | 119
> +++++++++++++++++++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h    |   2 +
>  drivers/net/ethernet/intel/ixgbevf/ethtool.c      |  42 +++++++
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   4 +-
>  drivers/net/ethernet/intel/ixgbevf/mbx.h          |  10 ++
>  drivers/net/ethernet/intel/ixgbevf/vf.c           | 132
> ++++++++++++++++++++++
>  drivers/net/ethernet/intel/ixgbevf/vf.h           |   2 +
>  include/linux/if_link.h                           |   1 +
>  include/linux/netdevice.h                         |   8 ++
>  include/uapi/linux/if_link.h                      |   8 ++
>  net/core/rtnetlink.c                              |  33 +++++-
>  14 files changed, 372 insertions(+), 7 deletions(-)

Thanks Vlad, I will add your patches to my queue.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH iproute2 3/3] ip netns: Delete all netns
From: Vadim Kochan @ 2015-01-07 19:55 UTC (permalink / raw)
  To: Brian Haley; +Cc: Vadim Kochan, netdev
In-Reply-To: <54AD8BC3.2070609@hp.com>

On Wed, Jan 07, 2015 at 02:40:51PM -0500, Brian Haley wrote:
> On 01/07/2015 01:11 PM, Vadim Kochan wrote:
> > On Wed, Jan 07, 2015 at 07:36:40PM +0200, Vadim Kochan wrote:
> >> On Wed, Jan 07, 2015 at 10:44:24AM -0500, Brian Haley wrote:
> >>> On 01/07/2015 06:04 AM, Vadim Kochan wrote:
> >>>> From: Vadim Kochan <vadim4j@gmail.com>
> >>>>
> >>>> Allow delete all namespace names by:
> >>>>
> >>>>     $ ip netns del all
> >>>
> >>> So I can still create a namespace called 'all', but can't exec in it or delete
> >>> it independently with this change.  Perhaps you need to block that as well?
> >>> Unless there's some other patch I'm missing?
> >>>
> >>> -Brian
> >> Hm, I did not take it into account ...
> >> I will look if I can find another way ...
> >>
> >> Thanks,
> > 
> > what about this ?
> > 
> >     $ ip netns exec / ip link
> >     $ ip netns del /
> > 
> > so it make a sense to be as root directory of bound ns names in /var/run/netns/ ?
> > what do you think ?
> 
> I think using / is confusing.  And something like -a[ll] as an option doesn't
> seem right either.
> 
> Or you just trap the name "all" in the add case and don't allow it.
> 
> Just my opinion.
> 
> -Brian

So I think that do not allow to add netns "all" can be a solution, I'd
like to hear from other people if it might be OK.

Thanks,

^ permalink raw reply

* RE: [net v2 2/3] i40e: Fix Rx checksum error counter
From: Singhai, Anjali @ 2015-01-07 20:14 UTC (permalink / raw)
  To: Tom Herbert, Kirsher, Jeffrey T
  Cc: David Miller, Linux Netdev List, nhorman@redhat.com,
	sassmann@redhat.com, jogreene@redhat.com, Rose, Gregory V
In-Reply-To: <CA+mtBx-VgUCCxnkayoqR6AwYXu_mfzbtGZzH+oeT3wrOznzWGg@mail.gmail.com>

On Tue, 6 Jan 2015 21:43:57 -0800
Tom Herbert <therbert@google.com> wrote:
> > @@ -1337,15 +1335,19 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
> >                                           skb->protocol == htons(ETH_P_8021AD))
> >                                           ? VLAN_HLEN : 0;
> >
> > -               rx_udp_csum = udp_csum(skb);
> > -               iph = ip_hdr(skb);
> > -               csum = csum_tcpudp_magic(
> > -                               iph->saddr, iph->daddr,
> > -                               (skb->len - skb_transport_offset(skb)),
> > -                               IPPROTO_UDP, rx_udp_csum);
> > +               if ((ip_hdr(skb)->protocol == IPPROTO_UDP) &&
> > +                   (udp_hdr(skb)->check != 0)) {
> > +                       rx_udp_csum = udp_csum(skb);
> 
> Doesn't this compute the whole checksum of the packet making the fact 
> that device verified inner checksum pretty much irrelevant? It would 
> probably be just as well to return CHECKSUM_NONE and let the stack 
> deal with it and remove all this complexity.

This is only calculating outer UDP csum, inner csums are offloaded and so is the outer IP csum. Overall this is less work than asking the stack to do all of those by marking it as CHECKSUM_UNNECESSARY. We do have a patch in line to use csum_level but I believe even with that we would be asking the stack to do more work than necessary if we indicate that only inner checksums are offloaded. With our HW we are able to offload 3 out of 4 csums.

^ permalink raw reply

* Re: TCP connection issues against Amazon S3
From: Erik Grinaker @ 2015-01-07 20:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <1420646284.5947.19.camel@edumazet-glaptop2.roam.corp.google.com>

On 07 Jan 2015, at 15:58, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-01-07 at 13:31 +0000, Erik Grinaker wrote:
>> On 06 Jan 2015, at 22:00, Yuchung Cheng <ycheng@google.com> wrote:
>>> On Tue, Jan 6, 2015 at 1:04 PM, Erik Grinaker <erik@bengler.no> wrote:
>>>> 
>>>>> On 06 Jan 2015, at 20:26, Erik Grinaker <erik@bengler.no> wrote:
>>>> This still doesn’t explain why it works with older kernels, but not newer ones. I’m thinking it’s
>>> probably some minor change, which gets amplified by the lack of SACKs
>>> on the loadbalancer. Anyway, I’ll bring it up with Amazon.
>>> can you post traces with the older kernels?
>> 
>> Here is a dump using 3.11.10 against a non-SACK-enabled loadbalancer:
>> 
>> http://abstrakt.bengler.no/tcp-issues-s3-nosack-3.11.10.pcap.bz2
>> 
>> The transfer shows lots of DUPACKs and retransmits, but this does not
>> seem to have as bad an effect as it did with the failing transfer we
>> saw on newer kernels:
>> 
>> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
>> 
>> One big difference, which Rick touched on earlier, is that the newer
>> kernels keep sending TCP window updates as it’s going through the
>> retransmits. The older kernel does not do this.
> 
> The new kernel is the receiver : It does no retransmits.
> 
> Increasing window in ACK packets should not prevent sender into
> retransmitting missing packets.
> 
> Sender is not a linux host and is very buggy IMO : If receiver
> advertises a too big window, sender decides to not retransmit in some
> cases.

I agree. I have contacted Amazon about this, but am not too hopeful for a quick fix; they have been promising SACK-support on their loadbalancers since 2006, for example.

That said, since this change breaks a service as popular as S3, it might be worth reconsidering.

> You can play with /proc/sys/net/ipv4/tcp_rmem and adopt very low values
> to work around the sender bug.
> 
> ( Or use SO_RCVBUF in receiver application)

Thanks, setting SO_RCVBUF seems like a reasonable workaround.--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: [PATCH net] ipv6: Prevent ipv6_find_hdr() from returning ENOENT for valid non-first fragments
From: Rahul Sharma @ 2015-01-07 20:48 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Pablo Neira Ayuso, netdev, linux-kernel, netfilter-devel
In-Reply-To: <1420627396.26870.36.camel@stressinduktion.org>

Hi Hannes,

On Wed, Jan 7, 2015 at 4:13 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi,
>
> On Mi, 2015-01-07 at 11:11 +0530, Rahul Sharma wrote:
>> On Wed, Jan 7, 2015 at 4:17 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> > On Wed, Jan 07, 2015 at 03:03:20AM +0530, Rahul Sharma wrote:
>> >> ipv6_find_hdr() currently assumes that the next-header field in the
>> >> fragment header of the non-first fragment is the "protocol number of
>> >> the last header" (here last header excludes any extension header
>> >> protocol numbers ) which is incorrect as per RFC2460. The next-header
>> >> value is the first header of the fragmentable part of the original
>> >> packet (which can be extension header as well).
>> >> This can create reassembly problems. For example: Fragmented
>> >> authenticated OSPFv3 packets (where AH header is inserted before the
>> >> protocol header). For the second fragment, the next header value in
>> >> the fragment header will be NEXTHDR_AUTH which is correct but
>> >> ipv6_find_hdr will return ENOENT since AH is an extension header
>> >> resulting in second fragment getting dropped. This check for the
>> >> presence of non-extension header needs to be removed.
>> >>
>> >> Signed-off-by: Rahul Sharma <rsharma@arista.com>
>> >> ---
>> >> --- linux-3.18.1/net/ipv6/exthdrs_core.c.orig   2015-01-06
>> >> 10:25:36.411419863 -0800
>> >> +++ linux-3.18.1/net/ipv6/exthdrs_core.c        2015-01-06
>> >> 10:51:45.819364986 -0800
>> >> @@ -171,10 +171,11 @@ EXPORT_SYMBOL_GPL(ipv6_find_tlv);
>> >>   * If the first fragment doesn't contain the final protocol header or
>> >>   * NEXTHDR_NONE it is considered invalid.
>> >>   *
>> >> - * Note that non-1st fragment is special case that "the protocol number
>> >> - * of last header" is "next header" field in Fragment header. In this case,
>> >> - * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
>> >> - * isn't NULL.
>> >> + * Note that non-1st fragment is special case that "the protocol number of the
>> >> + * first header of the fragmentable part of the original packet" is
>> >> + * "next header" field in the Fragment header. In this case, *offset is
>> >> + * meaningless and fragment offset is stored in *fragoff if fragoff isn't
>> >> + * NULL.
>> >>   *
>> >>   * if flags is not NULL and it's a fragment, then the frag flag
>> >>   * IP6_FH_F_FRAG will be set. If it's an AH header, the
>> >> @@ -250,9 +251,7 @@ int ipv6_find_hdr(const struct sk_buff *
>> >>
>> >>                         _frag_off = ntohs(*fp) & ~0x7;
>> >>                         if (_frag_off) {
>> >> -                               if (target < 0 &&
>> >> -                                   ((!ipv6_ext_hdr(hp->nexthdr)) ||
>> >
>> > This check assumes that the following headers cannot show up in the
>> > fragmented part of the IPv6 packet:
>> >
>> >  12 bool ipv6_ext_hdr(u8 nexthdr)
>> >  13 {
>> >  14         /*
>> >  15          * find out if nexthdr is an extension header or a protocol
>> >  16          */
>> >  17         return   (nexthdr == NEXTHDR_HOP)       ||
>> >  18                  (nexthdr == NEXTHDR_ROUTING)   ||
>> >  19                  (nexthdr == NEXTHDR_FRAGMENT)  ||
>> >  20                  (nexthdr == NEXTHDR_AUTH)      ||
>> >  21                  (nexthdr == NEXTHDR_NONE)      ||
>> >  22                  (nexthdr == NEXTHDR_DEST);
>> >
>> >> -                                    hp->nexthdr == NEXTHDR_NONE)) {
>> >> +                               if (target < 0) {
>> >>                                         if (fragoff)
>> >>                                                 *fragoff = _frag_off;
>> >>                                         return hp->nexthdr;
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> I think this is incorrect. Authentication header shows up in the
>> fragmentable part of the original IPv6 packet. So, for the non-first
>> fragments the next-header field value can be NEXTHDR_AUTH.
>
> Pablo's mail got me thinking again.
>
> In general, IPv6 extension headers can appear in any order and stacks
> must be process them. Fragmentation adds a limitation, that some
> extension headers do not make sense and don't have any effect if they
> appear after a fragmentation header (HbH and ROUTING).
>
> Looking at the rest of the function we don't check for HBHHDR or RTHDR
> following a fragmentation header either if we process the first fragment
> (core stack only processes HBH if directly following the ipv6 header
> anyway).
>
> So, in my opinion, it is safe to completely remove this check and it
> would align if the rest of the extension processing logic. The callers
> all seem fine with that.
>
> Pablo, what do you think?
>
> Anyway, the patch does not apply cleanly, the patch header is mangled.
> Could you check and send again?
>
> Thanks,
> Hannes
>
>
I am not sure if replying on the thread with a patch is a good idea
(or should I send a new email). Anyway, let me know if this is works.

Signed-off-by: Rahul Sharma <rsharma@arista.com>
---
 net/ipv6/exthdrs_core.c |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/exthdrs_core.c b/net/ipv6/exthdrs_core.c
index 8af3eb5..5949f87 100644
--- a/net/ipv6/exthdrs_core.c
+++ b/net/ipv6/exthdrs_core.c
@@ -171,10 +171,11 @@ EXPORT_SYMBOL_GPL(ipv6_find_tlv);
  * If the first fragment doesn't contain the final protocol header or
  * NEXTHDR_NONE it is considered invalid.
  *
- * Note that non-1st fragment is special case that "the protocol number
- * of last header" is "next header" field in Fragment header. In this case,
- * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
- * isn't NULL.
+ * Note that non-1st fragment is special case that "the protocol number of the
+ * first header of the fragmentable part of the original packet" is
+ * "next header" field in the Fragment header. In this case, *offset is
+ * meaningless and fragment offset is stored in *fragoff if fragoff isn't
+ * NULL.
  *
  * if flags is not NULL and it's a fragment, then the frag flag
  * IP6_FH_F_FRAG will be set. If it's an AH header, the
@@ -250,9 +251,7 @@ int ipv6_find_hdr(const struct sk_buff *skb,
unsigned int *offset,

                        _frag_off = ntohs(*fp) & ~0x7;
                        if (_frag_off) {
-                               if (target < 0 &&
-                                   ((!ipv6_ext_hdr(hp->nexthdr)) ||
-                                    hp->nexthdr == NEXTHDR_NONE)) {
+                               if (target < 0) {
                                        if (fragoff)
                                                *fragoff = _frag_off;
                                        return hp->nexthdr;
-- 
1.7.4.4

^ permalink raw reply related

* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Jesse Gross @ 2015-01-07 20:52 UTC (permalink / raw)
  To: Fan Du
  Cc: Du, Fan, Thomas Graf, davem@davemloft.net, Michael S. Tsirkin,
	Jason Wang, netdev@vger.kernel.org, fw@strlen.de,
	dev@openvswitch.org, pshelar@nicira.com
In-Reply-To: <54ACCAFD.4070203@gmail.com>

On Tue, Jan 6, 2015 at 9:58 PM, Fan Du <fengyuleidian0615@gmail.com> wrote:
> 于 2015年01月07日 03:11, Jesse Gross 写道:
>>>>
>>>> One of the reasons for only doing path MTU discovery
>>>> >>for L3 is that it operates seamlessly as part of normal operation -
>>>> >>there is no need to forge addresses or potentially generate ICMP when
>>>> >>on an L2 network. However, this ignores the IP handling that is going
>>>> >>on (note that in OVS it is possible for L3 to be implemented as a set
>>>> >>of flows coming from a controller).
>>>> >>
>>>> >>It also should not be VXLAN specific or duplicate VXLAN encapsulation
>>>> >>code. As this is happening before encapsulation, the generated ICMP
>>>> >>does not need to be encapsulated either if it is created in the right
>>>> >>location.
>>>
>>> >
>>> >Yes, I agree. GRE share the same issue from the code flow.
>>> >Pushing back ICMP msg back without encapsulation without circulating
>>> > down
>>> >to physical device is possible. The "right location" as far as I know
>>> >could only be in ovs_vport_send. In addition this probably requires
>>> > wrapper
>>> >route looking up operation for GRE/VXLAN, after get the under layer
>>> > device
>>> >MTU
>>> >from the routing information, then calculate reduced MTU becomes
>>> > feasible.
>>
>> As I said, it needs to be integrated into L3 processing. In OVS this
>> would mean adding some primitives to the kernel and then exposing the
>> functionality upwards into userspace/controller.
>
>
> I'm bit of confused with "L3 processing" you mentioned here... SORRY
> Apparently I'm not seeing the whole picture as you pointed out. Could you
> please
> elaborate "L3 processing" a bit more? docs/codes/or other useful links.
> Appreciated.

L3 processing is anywhere that routing takes place - i.e. where you
would decrement the TTL and change the MAC addresses. Part of routing
is dealing with differing MTUs, so that needs to be integrated into
the same logic.

> My understanding is:
> controller sets the forwarding rules into kernel datapath, any flow not
> matching
> with the rules are threw to controller by upcall. Once the rule decision is
> made
> by controller, then, this flow packet is pushed down to datapath to be
> forwarded
> again according to the new rule.
>
> So I'm not sure whether pushing the over-MTU-sized packet or pushing the
> forged ICMP
> without encapsulation to controller is required by current ovs
> implementation. By doing
> so, such over-MTU-sized packet is treated as a event for the controller to
> be take
> care of.

If flows are implementing routing (again, they are doing things like
decrementing the TTL) then it is necessary for them to also handle
this situation using some potentially new primitives (like a size
check). Otherwise you end up with issues like the ones that I
mentioned above like needing to forge addresses because you don't know
what the correct ones are. If the flows aren't doing things to
implement routing, then you really have a flat L2 network and you
shouldn't be doing this type of behavior at all as I described in the
original plan.

^ permalink raw reply

* Re: [PATCH] vhost/net: length miscalculation
From: Sergei Shtylyov @ 2015-01-07 20:58 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Alex Williamson, Greg Kurz, kvm, virtualization, netdev
In-Reply-To: <1420620847-24477-1-git-send-email-mst@redhat.com>

Hello.

On 01/07/2015 11:55 AM, Michael S. Tsirkin wrote:

> commit 8b38694a2dc8b18374310df50174f1e4376d6824
>      vhost/net: virtio 1.0 byte swap
> had this chunk:
> -       heads[headcount - 1].len += datalen;
> +       heads[headcount - 1].len = cpu_to_vhost32(vq, len - datalen);

> This adds datalen with the wrong sign, causing guest panics.

> Fixes: 8b38694a2dc8b18374310df50174f1e4376d6824

    The format of this tag assumes 12-digit SHA1 hash and the commit 
description enclosed in parens and double quotes. See 
Documentation/SubmittingPatches.

> Reported-by: Alex Williamson <alex.williamson@redhat.com>
> Suggested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

WBR, Sergei

^ permalink raw reply

* Webmail Help-desk support
From: Marianne Gordon @ 2015-01-07 21:00 UTC (permalink / raw)
  To: info

Your email has exceeded the storage limit created. You will no longer
be able to send or receive messages.
To activate,  Chick here and fill in the necessary information
;http://onlineseverupdatehanceclear.jigsy.com/

Consideration must be activated in the day to regenerate new surfaces.
Help-desk support.

^ permalink raw reply

* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
From: Alexei Starovoitov @ 2015-01-07 21:17 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, netdev@vger.kernel.org,
	David S. Miller, Andy Gospodarek

On Tue, Jan 6, 2015 at 9:37 PM, John Fastabend <john.fastabend@gmail.com> wrote:
>> - above plus put_header_graph() which will allow to
>>    rearrange some fixed sized headers ?
>
> OK but I'm not sure where/if these devices exist. Maybe your
> thinking of a software dataplane case? Would get_headers return
> some headers/fields but not include them in the graph and then
> expect the user to build a graph with them if the user needs
> them. Are there restrictions on how the graph can be built
> out? I guess I'm working with the assumption that the device
> returns a complete parse graph for all combinations it supports.

ahh. I thought that get_hdr_graph() will return one
that is currently configured and put_hdr_graph()
will try to configure new sequence of headers.
I think returning all possible combinations is not practical,
since number of such combinations can be very large for
some hw.
Also it seems that 4/11 patch and rocker_header_nodes[]
in particular describing one graph instead of
all possible?

>> - above plus put_header() ?
>>    I'm having a hard time envisioning how that would
>>    look like.
>
> This case makes more sense to me. The user supplies the definition
> of the headers and the graph showing how they are related and the
> driver can program the parser to work correctly.

yes, assuming that put_hdr_graph() programs one
sequence of jumping through hdrs...
but I think it's also fine if you do one put_hdrs_and_graph()
function as you described.

> To be honest though I would really be happy getting the 1st option
> working.

agree.
as long as we don't screw up get*() semantics that
prevent clean put*() logic :)
To illustrate my point:
if hw parser can parse 2 vlans and there is
no way to configure it to do zero, one or three, it's perfectly
fine for put_hdr_graph() to fail when it tries to configure
something different.
But if hw can be configured to do 1 vlan or 2 vlans, it
would be overkill to pass both graphs in get().
Just pass one that is currently active and let put() try things?
I think get_hdrs() on its own is good enough indication
to the user what hw is capable of and hdr_graph is
just a jump table between them. If hw can parse vxlan
without vxlan extensions it will be clearly seen in get_hdrs,
so no point trying to do put_hdrs() with some new
definition of vxlan unless parser is fully programmable.
that's where I was going with my category 2 where
only put_hdr_graph() exists... imo it will fit trident
and alta models ?
Personally I believe that we should design this API
with as much as possible real hw in mind.
rocker can support different models of hw...

^ permalink raw reply

* Re: TCP connection issues against Amazon S3
From: Eric Dumazet @ 2015-01-07 21:33 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <A58F8257-97F8-4662-98FC-7AC57F35A3A4@bengler.no>

On Wed, 2015-01-07 at 20:37 +0000, Erik Grinaker wrote:

> I agree. I have contacted Amazon about this, but am not too hopeful
> for a quick fix; they have been promising SACK-support on their
> loadbalancers since 2006, for example.
> 
> That said, since this change breaks a service as popular as S3, it
> might be worth reconsidering.
> 

Which change are you talking about ? Have you done a bisection to
clearly identify the patch exposing this sender bug ?

We are not going to stick TCP stack to 20th century and buggy peers or
middleboxes, sorry.

^ permalink raw reply

* Re: TCP connection issues against Amazon S3
From: Yuchung Cheng @ 2015-01-07 21:33 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <A58F8257-97F8-4662-98FC-7AC57F35A3A4@bengler.no>

On Wed, Jan 7, 2015 at 12:37 PM, Erik Grinaker <erik@bengler.no> wrote:
> On 07 Jan 2015, at 15:58, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Wed, 2015-01-07 at 13:31 +0000, Erik Grinaker wrote:
>>> On 06 Jan 2015, at 22:00, Yuchung Cheng <ycheng@google.com> wrote:
>>>> On Tue, Jan 6, 2015 at 1:04 PM, Erik Grinaker <erik@bengler.no> wrote:
>>>>>
>>>>>> On 06 Jan 2015, at 20:26, Erik Grinaker <erik@bengler.no> wrote:
>>>>> This still doesn’t explain why it works with older kernels, but not newer ones. I’m thinking it’s
>>>> probably some minor change, which gets amplified by the lack of SACKs
>>>> on the loadbalancer. Anyway, I’ll bring it up with Amazon.
>>>> can you post traces with the older kernels?
>>>
>>> Here is a dump using 3.11.10 against a non-SACK-enabled loadbalancer:
>>>
>>> http://abstrakt.bengler.no/tcp-issues-s3-nosack-3.11.10.pcap.bz2
>>>
>>> The transfer shows lots of DUPACKs and retransmits, but this does not
>>> seem to have as bad an effect as it did with the failing transfer we
>>> saw on newer kernels:
>>>
>>> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
>>>
>>> One big difference, which Rick touched on earlier, is that the newer
>>> kernels keep sending TCP window updates as it’s going through the
>>> retransmits. The older kernel does not do this.
>>
>> The new kernel is the receiver : It does no retransmits.
>>
>> Increasing window in ACK packets should not prevent sender into
>> retransmitting missing packets.
>>
>> Sender is not a linux host and is very buggy IMO : If receiver
>> advertises a too big window, sender decides to not retransmit in some
>> cases.
>
> I agree. I have contacted Amazon about this, but am not too hopeful for a quick fix; they have been promising SACK-support on their loadbalancers since 2006, for example.
>
> That said, since this change breaks a service as popular as S3, it might be worth reconsidering.
With the newer kernel and bigger receive window, the sender skips (the
already slow NewReno) fast recovery and falls back to (exp backoff)
timeout recovery. Reducing rwin to accommodate the sender's bug seems
backward to me.


>
>> You can play with /proc/sys/net/ipv4/tcp_rmem and adopt very low values
>> to work around the sender bug.
>>
>> ( Or use SO_RCVBUF in receiver application)
>
> Thanks, setting SO_RCVBUF seems like a reasonable workaround.

^ permalink raw reply

* Re: [net v2 2/3] i40e: Fix Rx checksum error counter
From: Tom Herbert @ 2015-01-07 21:34 UTC (permalink / raw)
  To: Singhai, Anjali
  Cc: Kirsher, Jeffrey T, David Miller, Linux Netdev List,
	nhorman@redhat.com, sassmann@redhat.com, jogreene@redhat.com,
	Rose, Gregory V
In-Reply-To: <C40BE8378EF49C44B9184714DBC8EF2992D9668D@ORSMSX110.amr.corp.intel.com>

On Wed, Jan 7, 2015 at 12:14 PM, Singhai, Anjali
<anjali.singhai@intel.com> wrote:
> On Tue, 6 Jan 2015 21:43:57 -0800
> Tom Herbert <therbert@google.com> wrote:
>> > @@ -1337,15 +1335,19 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
>> >                                           skb->protocol == htons(ETH_P_8021AD))
>> >                                           ? VLAN_HLEN : 0;
>> >
>> > -               rx_udp_csum = udp_csum(skb);
>> > -               iph = ip_hdr(skb);
>> > -               csum = csum_tcpudp_magic(
>> > -                               iph->saddr, iph->daddr,
>> > -                               (skb->len - skb_transport_offset(skb)),
>> > -                               IPPROTO_UDP, rx_udp_csum);
>> > +               if ((ip_hdr(skb)->protocol == IPPROTO_UDP) &&
>> > +                   (udp_hdr(skb)->check != 0)) {
>> > +                       rx_udp_csum = udp_csum(skb);
>>
>> Doesn't this compute the whole checksum of the packet making the fact
>> that device verified inner checksum pretty much irrelevant? It would
>> probably be just as well to return CHECKSUM_NONE and let the stack
>> deal with it and remove all this complexity.
>
> This is only calculating outer UDP csum, inner csums are offloaded and so is the outer IP csum. Overall this is less work than asking the stack to do all of those by marking it as CHECKSUM_UNNECESSARY. We do have a patch in line to use csum_level but I believe even with that we would be asking the stack to do more work than necessary if we indicate that only inner checksums are offloaded. With our HW we are able to offload 3 out of 4 csums.

The stack will only compute the checksum over the packet zero or one
times. The fact that the driver is doing it instead of the stack makes
little difference, once the decision is made to checksum the packet
the performance gains of offloading are lost. If you defer to the
stack then this complex code is removed and there is the possibility
that the checksum doesn't even need to be calculated at all (like UDP
packet is bad or no listener for port).

What does "we are able to offload 3 out of 4 csums" mean?

Thanks,
Tom

^ permalink raw reply

* Re: [PATCH net-next] Driver: Vmxnet3: Reinitialize vmxnet3 backend on wakeup from hibernate
From: Sergei Shtylyov @ 2015-01-07 21:57 UTC (permalink / raw)
  To: Shrikrishna Khare, sbhatewara, pv-drivers, netdev, linux-kernel
  Cc: Srividya Murali
In-Reply-To: <1420653393-27657-1-git-send-email-skhare@vmware.com>

Hello.

On 01/07/2015 08:56 PM, Shrikrishna Khare wrote:

> Failing to reinitialize on wakeup results in loss of network connectivity for
> vmxnet3 interface.

> Signed-off-by: Srividya Murali <smurali@vmware.com>
> Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
> Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
> ---
>   drivers/net/vmxnet3/vmxnet3_drv.c |   50 +++++++++++++++++++++---------------
>   drivers/net/vmxnet3/vmxnet3_int.h |    4 +-
>   2 files changed, 31 insertions(+), 23 deletions(-)

> diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
> index 7af1f5c..124cb9f 100644
> --- a/drivers/net/vmxnet3/vmxnet3_drv.c
> +++ b/drivers/net/vmxnet3/vmxnet3_drv.c
> @@ -3290,51 +3290,59 @@ skip_arp:
>   static int
>   vmxnet3_resume(struct device *device)
>   {
> -	int err, i = 0;
> +	int err;
>   	unsigned long flags;
>   	struct pci_dev *pdev = to_pci_dev(device);
>   	struct net_device *netdev = pci_get_drvdata(pdev);
>   	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
> -	struct Vmxnet3_PMConf *pmConf;
>
>   	if (!netif_running(netdev))
>   		return 0;
>
> -	/* Destroy wake-up filters. */
> -	pmConf = adapter->pm_conf;
> -	memset(pmConf, 0, sizeof(*pmConf));
> -
> -	adapter->shared->devRead.pmConfDesc.confVer = cpu_to_le32(1);
> -	adapter->shared->devRead.pmConfDesc.confLen = cpu_to_le32(sizeof(
> -								  *pmConf));
> -	adapter->shared->devRead.pmConfDesc.confPA =
> -		cpu_to_le64(adapter->pm_conf_pa);
> -
> -	netif_device_attach(netdev);
>   	pci_set_power_state(pdev, PCI_D0);
>   	pci_restore_state(pdev);
>   	err = pci_enable_device_mem(pdev);
>   	if (err != 0)
> -		return err;
> +		goto err;

    Why?

>
>   	pci_enable_wake(pdev, PCI_D0, 0);
>
> +	vmxnet3_alloc_intr_resources(adapter);
> +
> +	/* During hibernate and suspend, device has to be reinitialized as the
> +	 * device state need not be preserved.
> +	 */
> +
> +	/* Need not check adapter state as other reset tasks cannot run during
> +	 * device resume.
> +	 */
>   	spin_lock_irqsave(&adapter->cmd_lock, flags);
>   	VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
> -			       VMXNET3_CMD_UPDATE_PMCFG);
> +			       VMXNET3_CMD_QUIESCE_DEV);
>   	spin_unlock_irqrestore(&adapter->cmd_lock, flags);
> -	vmxnet3_alloc_intr_resources(adapter);
> -	vmxnet3_request_irqs(adapter);
> -	for (i = 0; i < adapter->num_rx_queues; i++)
> -		napi_enable(&adapter->rx_queue[i].napi);
> -	vmxnet3_enable_all_intrs(adapter);
> +	vmxnet3_tq_cleanup_all(adapter);
> +	vmxnet3_rq_cleanup_all(adapter);
>
> -	return 0;
> +	vmxnet3_reset_dev(adapter);
> +	err = vmxnet3_activate_dev(adapter);
> +	if (err) {
> +		netdev_err(adapter->netdev,

    Not just 'netdev'?

> +			   "%s: failed to re-activate on resume, error: %d",
> +		    netdev->name, err);

    Should have been aligned with the line right above. And isn't netdev->name 
printed by netdev_err() already?

> +		vmxnet3_force_close(adapter);
> +		goto err;

    Why not just *return*?

> +	}
> +	netif_device_attach(netdev);
> +
> +err:
> +	return err;
>   }
[...]

WBR, Sergei

^ permalink raw reply

* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
From: John Fastabend @ 2015-01-07 22:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, netdev@vger.kernel.org,
	David S. Miller, Andy Gospodarek
In-Reply-To: <CAADnVQLGLy5w8fcUKdVG6g6EpMJ=n=NMn=pq+mdYSqAOJdtk=w@mail.gmail.com>

On 01/07/2015 01:17 PM, Alexei Starovoitov wrote:
> On Tue, Jan 6, 2015 at 9:37 PM, John Fastabend <john.fastabend@gmail.com> wrote:
>>> - above plus put_header_graph() which will allow to
>>>     rearrange some fixed sized headers ?
>>
>> OK but I'm not sure where/if these devices exist. Maybe your
>> thinking of a software dataplane case? Would get_headers return
>> some headers/fields but not include them in the graph and then
>> expect the user to build a graph with them if the user needs
>> them. Are there restrictions on how the graph can be built
>> out? I guess I'm working with the assumption that the device
>> returns a complete parse graph for all combinations it supports.
>
> ahh. I thought that get_hdr_graph() will return one
> that is currently configured and put_hdr_graph()
> will try to configure new sequence of headers.
> I think returning all possible combinations is not practical,
> since number of such combinations can be very large for
> some hw.

Agree here I think it should return the currently configured
and active hdr graph. Just to be clear I had assumed that any driver
that supported put_header_graph would also support a put_headers
call. basically your case 3 below.

> Also it seems that 4/11 patch and rocker_header_nodes[]
> in particular describing one graph instead of
> all possible?

It returns the one and only graph rocker supports now or at least
the graph of supported headers as I read the rocker code. As
rocker becomes more flexible I would expect this to grow including
tunnels, stacked headers, tcp, etc.

>
>>> - above plus put_header() ?
>>>     I'm having a hard time envisioning how that would
>>>     look like.
>>
>> This case makes more sense to me. The user supplies the definition
>> of the headers and the graph showing how they are related and the
>> driver can program the parser to work correctly.
>
> yes, assuming that put_hdr_graph() programs one
> sequence of jumping through hdrs...
> but I think it's also fine if you do one put_hdrs_and_graph()
> function as you described.
>
>> To be honest though I would really be happy getting the 1st option
>> working.
>
> agree.
> as long as we don't screw up get*() semantics that
> prevent clean put*() logic :)
> To illustrate my point:
> if hw parser can parse 2 vlans and there is
> no way to configure it to do zero, one or three, it's perfectly
> fine for put_hdr_graph() to fail when it tries to configure
> something different.
> But if hw can be configured to do 1 vlan or 2 vlans, it
> would be overkill to pass both graphs in get().
> Just pass one that is currently active and let put() try things?

This is what I intended. I think it is good enough.

> I think get_hdrs() on its own is good enough indication
> to the user what hw is capable of and hdr_graph is
> just a jump table between them. If hw can parse vxlan
> without vxlan extensions it will be clearly seen in get_hdrs,
> so no point trying to do put_hdrs() with some new
> definition of vxlan unless parser is fully programmable.
> that's where I was going with my category 2 where
> only put_hdr_graph() exists... imo it will fit trident
> and alta models ?
> Personally I believe that we should design this API
> with as much as possible real hw in mind.
> rocker can support different models of hw...
>

Yep. Which is why at some point I would like to program up
a couple other "worlds" for rocker that have different pipelines.
This would allow experimenting with more then the current static
model rocker uses.


-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [linux-nics] [PATCH] e1000: fix time comparison
From: Jeff Kirsher @ 2015-01-07 22:02 UTC (permalink / raw)
  To: Asaf Vertz
  Cc: linux.nics, e1000-devel, linux-kernel, matthew.vick, netdev,
	carolyn.wyborny
In-Reply-To: <20150107124145.GA15954@ubuntu>

[-- Attachment #1: Type: text/plain, Size: 432 bytes --]

On Wed, 2015-01-07 at 14:41 +0200, Asaf Vertz wrote:
> To be future-proof and for better readability the time comparisons are
> modified to use time_after_eq() instead of plain, error-prone math.
> 
> Signed-off-by: Asaf Vertz <asaf.vertz@tandemg.com>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_ethtool.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)

Thanks Asaf, I will add your patch to my queue.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: REGRESSION in nfnetlink on 3.18.x (bisected)
From: Andre Tomt @ 2015-01-07 22:03 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, netdev
In-Reply-To: <5498A7DF.6040903@tomt.net>

On 23. des. 2014 00:23, Andre Tomt wrote:
> On 22. des. 2014 12:56, Pablo Neira Ayuso wrote:
>> Could you give a test to this patch? Thanks.
>>
>
> Initial testing looks good with this patch applied on top of 3.18.1
> I will give it a spin on some more systems tomorrow.

No news is good news :-)

~10 3.18.x systems in various roles have had this fix for two weeks with 
no issues.

^ permalink raw reply

* iproute2 tc: Print qdisc, stats, class, filter info to string rather than FILE
From: Vadim Kochan @ 2015-01-07 21:53 UTC (permalink / raw)
  To: netdev; +Cc: vadim4j

Hi,

I'd like to make possible to generate plot info about class hierarchy
with qdisc, filters, stats info, but I see that all this staffs are
printed to 'FILE *fp' which makes this hard to implement and seems to me
that it should be changed to be printed to 'char *s' (memory safety
should be considered by asprintf or similar), so this is a big change
but all this is possible and I ready to do it, but before do it I'd like
to ask if this could be acceptable as idea.

BUT maybe someone can advise me easier solution ?

Regards,

^ permalink raw reply

* Re: iproute2 tc: Print qdisc, stats, class, filter info to string rather than FILE
From: Florian Westphal @ 2015-01-07 22:13 UTC (permalink / raw)
  To: Vadim Kochan; +Cc: netdev
In-Reply-To: <20150107215331.GA19197@angus-think.lan>

Vadim Kochan <vadim4j@gmail.com> wrote:
> I'd like to make possible to generate plot info about class hierarchy
> with qdisc, filters, stats info, but I see that all this staffs are
> printed to 'FILE *fp' which makes this hard to implement and seems to me
> that it should be changed to be printed to 'char *s' (memory safety
> should be considered by asprintf or similar), so this is a big change
> but all this is possible and I ready to do it, but before do it I'd like
> to ask if this could be acceptable as idea.
> 
> BUT maybe someone can advise me easier solution ?

Would fmemopen/open_memstream(3) do what you want?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox