Netdev List


* Re: [PATCH net-next 1/2] ethtool: Support for configurable RSS hash function
From: Ben Hutchings @ 2014-11-05 21:51 UTC (permalink / raw)
  To: Amir Vadai
  Cc: David S. Miller, netdev, Or Gerlitz, Eyal Perry, Yevgeny Petrilin
In-Reply-To: <1415188769-19593-2-git-send-email-amirv@mellanox.com>

On Wed, 2014-11-05 at 13:59 +0200, Amir Vadai wrote:
> From: Eyal Perry <eyalpe@mellanox.com>
> 
> This patch adds an RSS hash functions string-set, and two
> ethtool options to set/get the current RSS hash function. The user-kernel API
> is the new hfunc mask field in the ethtool_rxfh struct. A bit set
> in hfunc corresponds to an index in the string-set.
> 
> Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
> Signed-off-by: Amir Vadai <amirv@mellanox.com>
> ---
>  include/linux/ethtool.h      | 28 ++++++++++++++++++++++++
>  include/uapi/linux/ethtool.h |  6 ++++-
>  net/core/ethtool.c           | 52 ++++++++++++++++++++++++++++++--------------
>  3 files changed, 69 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index c1a2d60..61003b1 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -59,6 +59,29 @@ enum ethtool_phys_id_state {
>  	ETHTOOL_ID_OFF
>  };
>  
> +enum {
> +	RSS_HASH_TOP_BIT, /* Configurable RSS hash function - Toeplitz */
> +	RSS_HASH_XOR_BIT, /* Configurable RSS hash function - Xor */
> +
> +	/*
> +	 * Add your fresh new hash function bits above and remember to update
> +	 * rss_hash_func_strings[] below
> +	 */
> +	RSS_HASH_FUNCS_COUNT
> +};
> +
> +#define __RSS_HASH_BIT(bit)	((u32)1 << (bit))
> +#define __RSS_HASH(name)	 __RSS_HASH_BIT(RSS_HASH_##name##_BIT)
> +
> +#define RSS_HASH_TOP		__RSS_HASH(TOP)
> +#define RSS_HASH_XOR		__RSS_HASH(XOR)

I think #define RSS_HASH_UNKNOWN 0 might also be useful.

And I think all of these names should get an ETH_ prefix.
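
For illustration, a minimal sketch of how the two naming suggestions above could fit together. The ETH_RSS_HASH_* spellings, the ETH_RSS_HASH_MASK() macro and the eth_rss_hash_name() helper are assumptions made up for this example, not names taken from the patch or from any kernel header; uint32_t stands in for the kernel's u32 so the fragment builds in userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions; hypothetical ETH_-prefixed spellings of the patch's enum. */
enum {
	ETH_RSS_HASH_TOP_BIT,	/* Toeplitz */
	ETH_RSS_HASH_XOR_BIT,	/* XOR */
	ETH_RSS_HASH_FUNCS_COUNT
};

#define ETH_RSS_HASH_MASK(bit)	((uint32_t)1 << (bit))
#define ETH_RSS_HASH(name)	ETH_RSS_HASH_MASK(ETH_RSS_HASH_##name##_BIT)

#define ETH_RSS_HASH_TOP	ETH_RSS_HASH(TOP)
#define ETH_RSS_HASH_XOR	ETH_RSS_HASH(XOR)
#define ETH_RSS_HASH_UNKNOWN	0	/* no bit set: function not reported */

static const char *const eth_rss_hash_names[ETH_RSS_HASH_FUNCS_COUNT] = {
	[ETH_RSS_HASH_TOP_BIT] = "toeplitz",
	[ETH_RSS_HASH_XOR_BIT] = "xor",
};

/* Map a mask with exactly one known bit set to its name. */
static const char *eth_rss_hash_name(uint32_t hfunc)
{
	unsigned int bit;

	for (bit = 0; bit < ETH_RSS_HASH_FUNCS_COUNT; bit++) {
		if (hfunc == ETH_RSS_HASH_MASK(bit)) {
			/* The string table must cover every enum entry. */
			assert(eth_rss_hash_names[bit] != NULL);
			return eth_rss_hash_names[bit];
		}
	}
	/* ETH_RSS_HASH_UNKNOWN and multi-bit masks both end up here. */
	return "unknown";
}
```

With a zero value reserved for "unknown", a driver that cannot report its hash function has something well-defined to return, and callers can tell that case apart from any single-bit mask.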

> +static const char
> +rss_hash_func_strings[RSS_HASH_FUNCS_COUNT][ETH_GSTRING_LEN] = {
> +	[RSS_HASH_TOP_BIT] =     "toeplitz",
> +	[RSS_HASH_XOR_BIT] =     "xor",
> +};

This belongs in net/core/ethtool.c.

>  struct net_device;
>  
>  /* Some generic methods drivers may use in their ethtool_ops */
> @@ -158,6 +181,9 @@ static inline u32 ethtool_rxfh_indir_default(u32 index, u32 n_rx_rings)
>   *	Returns zero if not supported for this specific device.
>   * @get_rxfh_indir_size: Get the size of the RX flow hash indirection table.
>   *	Returns zero if not supported for this specific device.
> + * @get_rxfh_func: Get the hardware RX flow hash function.
> + * @set_rxfh_func: Set the hardware RX flow hash function. Returns a negative
> + *	error code or zero.
>   * @get_rxfh: Get the contents of the RX flow hash indirection table and hash
>   *	key.
>   *	Will only be called if one or both of @get_rxfh_indir_size and
> @@ -241,6 +267,8 @@ struct ethtool_ops {
>  	int	(*reset)(struct net_device *, u32 *);
>  	u32	(*get_rxfh_key_size)(struct net_device *);
>  	u32	(*get_rxfh_indir_size)(struct net_device *);
> +	u32	(*get_rxfh_func)(struct net_device *);
> +	int	(*set_rxfh_func)(struct net_device *, u32);

Why not another parameter to get_rxfh/set_rxfh?  I know it's a pain to
update all the implementations, but changing algorithm potentially
changes the supported indirection table and key lengths.  They have to
be validated together.
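
A rough userspace sketch of the "validate together" point, assuming a toy device where the key length depends on the hash function. Everything here (struct toy_dev, the TOY_HFUNC_* values, the 40-byte Toeplitz key and the zero-length XOR key) is invented for the example; it only shows why a single set_rxfh-style call that receives the function along with the key can reject an inconsistent combination before touching any state.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hash-function values for a toy device model. */
#define TOY_HFUNC_NO_CHANGE	0
#define TOY_HFUNC_TOP		1
#define TOY_HFUNC_XOR		2

#define TOY_MAX_KEY		40

struct toy_dev {
	uint8_t hfunc;
	uint8_t key[TOY_MAX_KEY];
};

/* Assumed for this model only: Toeplitz takes a 40-byte key, XOR none. */
static uint32_t toy_key_size(uint8_t hfunc)
{
	return hfunc == TOY_HFUNC_TOP ? TOY_MAX_KEY : 0;
}

/* set_rxfh-like operation with the hash function as an extra parameter. */
static int toy_set_rxfh(struct toy_dev *dev, const uint8_t *key,
			uint32_t key_len, uint8_t hfunc)
{
	uint8_t new_hfunc;

	assert(dev != NULL);
	new_hfunc = (hfunc == TOY_HFUNC_NO_CHANGE) ? dev->hfunc : hfunc;

	if (new_hfunc != TOY_HFUNC_TOP && new_hfunc != TOY_HFUNC_XOR)
		return -EOPNOTSUPP;

	/* The key is checked against the function being requested,
	 * not against the one currently programmed.
	 */
	if (key && key_len != toy_key_size(new_hfunc))
		return -EINVAL;

	/* Everything checked out: only now touch device state. */
	dev->hfunc = new_hfunc;
	if (key)
		memcpy(dev->key, key, key_len);
	return 0;
}
```

With two separate operations, the function could already have been switched by the time the key turns out to be the wrong length for it.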

>  	int	(*get_rxfh)(struct net_device *, u32 *indir, u8 *key);
>  	int	(*set_rxfh)(struct net_device *, const u32 *indir,
>  			    const u8 *key);
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index eb2095b..eb91da4 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -534,6 +534,7 @@ struct ethtool_pauseparam {
>   * @ETH_SS_NTUPLE_FILTERS: Previously used with %ETHTOOL_GRXNTUPLE;
>   *	now deprecated
>   * @ETH_SS_FEATURES: Device feature names
> + * @ETH_SS_RSS_HASH_FUNCS: RSS hash function names
>   */
>  enum ethtool_stringset {
>  	ETH_SS_TEST		= 0,
> @@ -541,6 +542,7 @@ enum ethtool_stringset {
>  	ETH_SS_PRIV_FLAGS,
>  	ETH_SS_NTUPLE_FILTERS,
>  	ETH_SS_FEATURES,
> +	ETH_SS_RSS_HASH_FUNCS,
>  };
>  
>  /**
> @@ -900,7 +902,9 @@ struct ethtool_rxfh {
>  	__u32	rss_context;
>  	__u32   indir_size;
>  	__u32   key_size;
> -	__u32	rsvd[2];
> +	__u8	hfunc;

Missing kernel-doc.  This needs to be very clear about what the valid
values are.

> +	__u8	rsvd8[3];
> +	__u32	rsvd32;
>  	__u32   rss_config[0];
>  };
>  #define ETH_RXFH_INDIR_NO_CHANGE	0xffffffff
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index 06dfb29..4791c17 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -185,6 +185,9 @@ static int __ethtool_get_sset_count(struct net_device *dev, int sset)
>  	if (sset == ETH_SS_FEATURES)
>  		return ARRAY_SIZE(netdev_features_strings);
>  
> +	if (sset == ETH_SS_RSS_HASH_FUNCS)
> +		return ARRAY_SIZE(rss_hash_func_strings);
> +
>  	if (ops->get_sset_count && ops->get_strings)
>  		return ops->get_sset_count(dev, sset);
>  	else
[...]
> @@ -760,32 +769,43 @@ static noinline_for_stack int ethtool_set_rxfh(struct net_device *dev,
>  	const struct ethtool_ops *ops = dev->ethtool_ops;
>  	struct ethtool_rxnfc rx_rings;
>  	struct ethtool_rxfh rxfh;
> -	u32 dev_indir_size = 0, dev_key_size = 0, i;
> +	u32 dev_indir_size = 0, dev_key_size = 0, dev_hfunc = 0, i;
>  	u32 *indir = NULL, indir_bytes = 0;
>  	u8 *hkey = NULL;
>  	u8 *rss_config;
>  	u32 rss_cfg_offset = offsetof(struct ethtool_rxfh, rss_config[0]);
>  
> -	if (!(ops->get_rxfh_indir_size || ops->get_rxfh_key_size) ||
> -	    !ops->get_rxnfc || !ops->set_rxfh)
> +	if (!(ops->get_rxfh_indir_size || ops->get_rxfh_key_size ||
> +	      ops->get_rxfh_func) || !ops->get_rxnfc || !ops->set_rxfh)
>  		return -EOPNOTSUPP;
>  
> +	if (ops->get_rxfh_func)
> +		dev_hfunc = ops->get_rxfh_func(dev);
>  	if (ops->get_rxfh_indir_size)
>  		dev_indir_size = ops->get_rxfh_indir_size(dev);
>  	if (ops->get_rxfh_key_size)
>  		dev_key_size = dev->ethtool_ops->get_rxfh_key_size(dev);
> -	if ((dev_key_size + dev_indir_size) == 0)
> +	if ((dev_key_size + dev_indir_size + dev_hfunc) == 0)
>  		return -EOPNOTSUPP;
>  
>  	if (copy_from_user(&rxfh, useraddr, sizeof(rxfh)))
>  		return -EFAULT;
>  
>  	/* Check that reserved fields are 0 for now */
> -	if (rxfh.rss_context || rxfh.rsvd[0] || rxfh.rsvd[1])
> +	if (rxfh.rss_context || rxfh.rsvd8[0] || rxfh.rsvd8[1] ||
> +	    rxfh.rsvd8[2] || rxfh.rsvd32)
>  		return -EINVAL;
>  
> +	if (rxfh.hfunc != dev_hfunc) {
> +		if (!ops->set_rxfh_func)
> +			return -EOPNOTSUPP;
> +		ret = ops->set_rxfh_func(dev, rxfh.hfunc);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	/* If either indir or hash key is valid, proceed further.
> -	 * It is not valid to request that both be unchanged.
> +	 * Must request at least one change: indir size, hash key or function.
>  	 */
>  	if ((rxfh.indir_size &&
>  	     rxfh.indir_size != ETH_RXFH_INDIR_NO_CHANGE &&
> @@ -793,7 +813,7 @@ static noinline_for_stack int ethtool_set_rxfh(struct net_device *dev,
>  	    (rxfh.key_size && (rxfh.key_size != dev_key_size)) ||
>  	    (rxfh.indir_size == ETH_RXFH_INDIR_NO_CHANGE &&
>  	     rxfh.key_size == 0))
> -		return -EINVAL;
> +		return rxfh.hfunc ? 0 : -EINVAL;

Shouldn't the condition be rxfh.hfunc != dev_hfunc ?
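
One possible reading of that question, as a standalone sketch: keep the size checks separate from the "nothing to change" check, and treat a request as empty only when the hash function also equals the device's current one. struct toy_req, TOY_INDIR_NO_CHANGE and toy_check_rxfh_request() are hypothetical names for this example and deliberately simplified relative to ethtool_set_rxfh().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INDIR_NO_CHANGE	0xffffffffu

/* Simplified stand-in for the fields of the request that matter here. */
struct toy_req {
	uint32_t indir_size;
	uint32_t key_size;
	uint8_t  hfunc;
};

static int toy_check_rxfh_request(const struct toy_req *req,
				  uint32_t dev_indir_size,
				  uint32_t dev_key_size,
				  uint8_t dev_hfunc)
{
	assert(req != NULL);

	/* Sizes, when supplied, must match what the device reports. */
	if (req->indir_size &&
	    req->indir_size != TOY_INDIR_NO_CHANGE &&
	    req->indir_size != dev_indir_size)
		return -EINVAL;
	if (req->key_size && req->key_size != dev_key_size)
		return -EINVAL;

	/* An empty request: table untouched, no key, same function.
	 * An indir_size of 0 is not "empty" in this model; it is
	 * treated as a request in its own right.
	 */
	if (req->indir_size == TOY_INDIR_NO_CHANGE &&
	    req->key_size == 0 &&
	    req->hfunc == dev_hfunc)
		return -EINVAL;

	return 0;
}
```

In this form a size mismatch is always an error, regardless of hfunc, and an hfunc-only request succeeds only when it actually changes the function.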

Ben.

>  	if (rxfh.indir_size != ETH_RXFH_INDIR_NO_CHANGE)
>  		indir_bytes = dev_indir_size * sizeof(indir[0]);

-- 
Ben Hutchings
The program is absolutely right; therefore, the computer must be wrong.

* Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter
From: David Miller @ 2014-11-05 21:57 UTC (permalink / raw)
  To: viro; +Cc: herbert, netdev, linux-kernel, bcrl
In-Reply-To: <20141105210745.GT7996@ZenIV.linux.org.uk>

From: Al Viro <viro@ZenIV.linux.org.uk>
Date: Wed, 5 Nov 2014 21:07:45 +0000

> Ping me when you put it there, OK?  I'll rebase the rest of old stuff on
> top of it (similar helpers, mostly).

I just pushed it into net-next, thanks Al.

* Re: mlx4+vxlan offload breaks gre tunnels
From: Tom Herbert @ 2014-11-05 21:59 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Florian Westphal, Linux Netdev List, Jesse Gross, Amir Vadai
In-Reply-To: <545A4DB7.5010603@mellanox.com>

On Wed, Nov 5, 2014 at 8:17 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> On 11/5/2014 5:04 PM, Florian Westphal wrote:
>>
>> tl;dr: all tcp packets sent via gre tunnel have broken tcp csum if vxlan
>> offload is enabled with mlx4 driver.
>>
>> Given following config on tx-side:
>> dev=enp3s0
>> ip addr add dev $dev 192.168.23.1/24
>> ip link set $dev up
>> ip link add mygre type gretap remote 192.168.23.2 local 192.168.23.1
>> ip addr add dev mygre 192.168.42.1/24
>> ip link set gre0 up
>> ip link set mygre up
>>
>> and
>>
>> options mlx4_core log_num_mgm_entry_size=-1 debug_level=1 port_type_array=2,2
>>
>> in
>> /etc/modprobe.d/mlx4.conf
>>
>> all tcp packets sent to destinations over the gre tunnel have bogus tcp
>> checksums (and are tossed on rx side when stack validates tcp checksum).
>>
>> net-next head is commit 30349bdbc4da5ecf0efa25556e3caff9c9b8c5f7 .
>>
>> What makes things work for me:
>> either
>>
>> options mlx4_core 1 debug_level=1 port_type_array=2,2
>>
>> (ie. no MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
>>
>> or not setting NETIF_F_IP_CSUM in enc_features:
>>
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> @@ -2579,10 +2579,12 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
>>                  dev->priv_flags |= IFF_UNICAST_FLT;
>>            if (mdev->dev->caps.tunnel_offload_mode == MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
>> -               dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
>> +               dev->hw_enc_features |= NETIF_F_RXCSUM |
>>                                          NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
>>
>> I am not sure if it's the right fix, but to my eyes this basically looks like
>> mlx4 is telling stack that it can handle tcp checksum offload within
>> tunnels, and that doesn't seem to be the case for all types (e.g. gre).
>>
>> Could someone who understands the enc_features specifics better confirm
>> that the above patch is correct (or provide a better/proper fix)?
>
>
> Yep, I can see now the problem. It comes into play with ConnectX3-pro NICs
> that support VXLAN offloads (but not with ConnectX3 NIC which don't) when
> you enable the offloads support on the CX3-pro.
>
> The problem originates from the fact that we can't advertise something like
> "the HW can offload the inner checksum of UDP/VXLAN encapsulated packets (but
> not of GRE)", e.g. in a manner similar to what exists in the GSO space, where
> you have NETIF_F_GSO_YYY for each YYY in {UDP, SIT, GRE, etc.} tunneling scheme.
>
> I think the best effort we can do now is
>
> 1. come up with something such as the below patch for 3.18 which is
> back-ward portable for -stable kernels, it will only arm the hw offloads if
> the OS tells us there's VXLAN in action
>
> 2. come up with proper kernel APIs to let NICs advertise for which encap
> schemes they can actually offload the inner checksum, Tom... your work which
> now runs over netdev.
>
Possibly #3: add ndo_gso_check to detect nested tunneling. In this
case it would see that gso_type has both SKB_GSO_GRE and
SKB_GSO_UDP_TUNNEL set.
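
The check described here could look roughly like the following userspace model. The TOY_SKB_GSO_* values are stand-ins chosen for the example (the kernel defines the real SKB_GSO_* bits itself), and struct toy_skb and toy_gso_check() are hypothetical; this is a sketch of the idea, not the ndo_gso_check interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in flag values for illustration only. */
#define TOY_SKB_GSO_GRE		(1u << 0)
#define TOY_SKB_GSO_UDP_TUNNEL	(1u << 1)

/* Only the field this check looks at. */
struct toy_skb {
	uint32_t gso_type;
};

/* Return true if the device may offload this packet, false if the
 * stack should fall back to software because tunnels are nested.
 */
static bool toy_gso_check(const struct toy_skb *skb)
{
	const uint32_t nested = TOY_SKB_GSO_GRE | TOY_SKB_GSO_UDP_TUNNEL;

	assert(skb != NULL);
	/* Both tunnel types set at once means one is inside the other. */
	return (skb->gso_type & nested) != nested;
}
```

Note that this only catches the nested case; a packet carrying just the GRE flag still passes, so it complements rather than replaces the other two options.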

> Tom/Jesse- thoughts? are you +1-ing the below approach?
>
> Or.
>
> tested to work with the  following which is a bit different, tell me if it
> works for you
>
> # node A - with mlx4_en address 192.168.31.18
> ip tunnel add gre1 mode gre local 192.168.31.18 remote 192.168.31.17 ttl 255
> ifconfig gre1 10.10.10.18/24 up
> ifconfig gre1 mtu 1450
>
> # node B - with mlx4_en address 192.168.31.17
> ip tunnel add gre1 mode gre local 192.168.31.17 remote 192.168.31.18 ttl 255
> ifconfig gre1 10.10.10.17/24 up
> ifconfig gre1 mtu 1450
>
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 0efbae9..7753833 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2292,6 +2292,12 @@ static void mlx4_en_add_vxlan_offloads(struct work_struct *work)
>  out:
>         if (ret)
>                 en_err(priv, "failed setting L2 tunnel configuration ret %d\n", ret);
> +
> +       /* set offloads */
> +       priv->dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
> +                                     NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
> +       priv->dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
> +       priv->dev->features    |= NETIF_F_GSO_UDP_TUNNEL;
>  }
>
>  static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
> @@ -2299,6 +2305,10 @@ static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
>         int ret;
>         struct mlx4_en_priv *priv = container_of(work, struct mlx4_en_priv,
>                                                  vxlan_del_task);
> +       /* unset offloads */
> +       priv->dev->hw_enc_features = 0;
> +       priv->dev->hw_features &= ~NETIF_F_GSO_UDP_TUNNEL;
> +       priv->dev->features    &= ~NETIF_F_GSO_UDP_TUNNEL;
>
>         ret = mlx4_SET_PORT_VXLAN(priv->mdev->dev, priv->port,
>                                   VXLAN_STEER_BY_OUTER_MAC, 0);
> @@ -2578,13 +2588,6 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
>         if (mdev->dev->caps.steering_mode != MLX4_STEERING_MODE_A0)
>                 dev->priv_flags |= IFF_UNICAST_FLT;
>
> -       if (mdev->dev->caps.tunnel_offload_mode == MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
> -               dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
> -                                       NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
> -               dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
> -               dev->features    |= NETIF_F_GSO_UDP_TUNNEL;
> -       }
> -
>         mdev->pndev[port] = dev;
>
>         netif_carrier_off(dev);
>

* Re: [PATCH v2 net] tcp: zero retrans_stamp if all retrans were acked
From: David Miller @ 2014-11-05 22:00 UTC (permalink / raw)
  To: ncardwell; +Cc: mleitner, netdev, ycheng, edumazet
In-Reply-To: <CADVnQymMioK0USNpaOF2Kb4+zMTDrjJ7r=5Bu51LbirVQPnzyw@mail.gmail.com>

From: Neal Cardwell <ncardwell@google.com>
Date: Tue, 4 Nov 2014 15:10:31 -0500

> On Tue, Nov 4, 2014 at 2:15 PM, Marcelo Ricardo Leitner
> <mleitner@redhat.com> wrote:
> ...
>> Therefore, now we clear retrans_stamp as soon as all data during the
>> loss window is fully acked.
>>
>> Reported-by: Ueki Kohei
>> Cc: Neal Cardwell <ncardwell@google.com>
>> Cc: Yuchung Cheng <ycheng@google.com>
>> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>> ---
>>
>> Notes:
>>     v1->v2: fixed compilation issue noticed by Neal
>>
>>  net/ipv4/tcp_input.c | 60 +++++++++++++++++++++++++++-------------------------
>>  1 file changed, 31 insertions(+), 29 deletions(-)
> 
> Acked-by: Neal Cardwell <ncardwell@google.com>
> Tested-by: Neal Cardwell <ncardwell@google.com>
> 
> Code looks fine, and it passes Yuchung's packetdrill test case for this.
> 
> Thanks for finding and fixing this, Marcelo.

Applied, thanks everyone.

* Re: [Patch net-next] ipv6: move INET6_MATCH() to include/net/inet6_hashtables.h
From: David Miller @ 2014-11-05 22:00 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev
In-Reply-To: <1415127587-29030-1-git-send-email-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue,  4 Nov 2014 10:59:47 -0800

> It is only used in net/ipv6/inet6_hashtables.c.
> 
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied, thanks.

* Re: [PATCH net v4] ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs
From: Cong Wang @ 2014-11-05 22:11 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, lw1a2.jing, netdev, Eric Dumazet,
	Hannes Frederic Sowa, David L Stevens
In-Reply-To: <1415215658-10054-1-git-send-email-dborkman@redhat.com>

On Wed, Nov 5, 2014 at 11:27 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> -static struct sk_buff *mld_newpack(struct inet6_dev *idev, int size)
> +static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)

For net-next, you probably want to get rid of the 'mtu' parameter,
since all callers use dev->mtu. :)

* Re: [PATCH] bridge: include in6.h in if_bridge.h for struct in6_addr
From: David Miller @ 2014-11-05 22:13 UTC (permalink / raw)
  To: gregory.0xf0-Re5JQEeQqe8AvxtiuMwx3w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	carlos-H+wXaHxf7aLQT0dZR+AlfA, eblake-H+wXaHxf7aLQT0dZR+AlfA,
	galak-sgV2jX0FEOL9JmXXK+q4OQ, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	xiyou.wangcong-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <1415128881-30183-1-git-send-email-gregory.0xf0-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

From: Gregory Fong <gregory.0xf0-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Tue,  4 Nov 2014 11:21:21 -0800

> if_bridge.h uses struct in6_addr ip6, but wasn't including the in6.h
> header.  Thomas Backlund originally sent a patch to do this, but this
> revealed a redefinition issue: https://lkml.org/lkml/2013/1/13/116
> 
> The redefinition issue should have been fixed by the following Linux
> commits:
> ee262ad827f89e2dc7851ec2986953b5b125c6bc inet: defines IPPROTO_* needed for module alias generation
> cfd280c91253cc28e4919e349fa7a813b63e71e8 net: sync some IP headers with glibc
> 
> and the following glibc commit:
> 6c82a2f8d7c8e21e39237225c819f182ae438db3 Coordinate IPv6 definitions for Linux and glibc
> 
> so actually include the header now.
> 
> Reported-by: Colin Guthrie <colin-odJJhXpcy38dnm+yROfE0A@public.gmane.org>
> Reported-by: Christiaan Welvaart <cjw-CllfUmslCRwdbCeoMzGj59i2O/JbrIOy@public.gmane.org>
> Reported-by: Thomas Backlund <tmb-odJJhXpcy38dnm+yROfE0A@public.gmane.org>
> Cc: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: Cong Wang <xiyou.wangcong-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> Signed-off-by: Gregory Fong <gregory.0xf0-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Applied, thank you.

* Re: Convert net_msg_warn, NETDEBUG, & LIMIT_NETDEBUG?
From: David Miller @ 2014-11-05 22:16 UTC (permalink / raw)
  To: joe; +Cc: netdev
In-Reply-To: <1415134000.23168.7.camel@perches.com>

From: Joe Perches <joe@perches.com>
Date: Tue, 04 Nov 2014 12:46:40 -0800

> net_msg_warn is a sysctl used to control the printk
> of a bundle of mostly ipv4/ipv6 logging messages.
> 
> Does anyone use it?
  ...
> Should those KERN_DEBUG uses be converted to
> net_dbg_ratelimited so that these uses could be
> controlled via dynamic_debug instead of the
> net_msg_warn sysctl?
  ...

I will respond to this by saying that, generally speaking, I'd
rather we move away from local debugging facility and controls
and towards the generic ones.

Definitely kill the allocation failure debug stuff, that's obviously
superfluous and bogus.

It may be the case that we're stuck with the net_msg_warn sysctl
for the stuff that isn't obviously extraneous and removable like
the allocation failure cases.

Why don't we work on these one at a time and see what's left after
all the adjustments/deletions?

Thanks.

* Re: [PATCH net-next 05/13] ethtool,net/mlx4_en: Add 100M, 20G, 56G speeds ethtool reporting support
From: Ben Hutchings @ 2014-11-05 22:38 UTC (permalink / raw)
  To: Amir Vadai
  Cc: David S. Miller, netdev, Yevgeny Petrilin, Or Gerlitz,
	Saeed Mahameed
In-Reply-To: <1414402667-8841-6-git-send-email-amirv@mellanox.com>

On Mon, 2014-10-27 at 11:37 +0200, Amir Vadai wrote:
> From: Saeed Mahameed <saeedm@mellanox.com>
> 
> Added 100M, 20G and 56G ethtool speed reporting support.
> Update mlx4_en_test_speed self test with the new speeds.
> 
> Defined new link speeds in include/uapi/linux/ethtool.h:
> +#define SPEED_20000	20000
> +#define SPEED_40000	40000
> +#define SPEED_56000	56000
> 
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Amir Vadai <amirv@mellanox.com>
[...]
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -1213,6 +1213,10 @@ enum ethtool_sfeatures_retval_bits {
>  #define SUPPORTED_40000baseCR4_Full	(1 << 24)
>  #define SUPPORTED_40000baseSR4_Full	(1 << 25)
>  #define SUPPORTED_40000baseLR4_Full	(1 << 26)
> +#define SUPPORTED_56000baseKR4_Full	(1 << 27)
> +#define SUPPORTED_56000baseCR4_Full	(1 << 28)
> +#define SUPPORTED_56000baseSR4_Full	(1 << 29)
> +#define SUPPORTED_56000baseLR4_Full	(1 << 30)
>  
>  #define ADVERTISED_10baseT_Half		(1 << 0)
>  #define ADVERTISED_10baseT_Full		(1 << 1)
> @@ -1241,6 +1245,10 @@ enum ethtool_sfeatures_retval_bits {
>  #define ADVERTISED_40000baseCR4_Full	(1 << 24)
>  #define ADVERTISED_40000baseSR4_Full	(1 << 25)
>  #define ADVERTISED_40000baseLR4_Full	(1 << 26)
> +#define ADVERTISED_56000baseKR4_Full	(1 << 27)
> +#define ADVERTISED_56000baseCR4_Full	(1 << 28)
> +#define ADVERTISED_56000baseSR4_Full	(1 << 29)
> +#define ADVERTISED_56000baseLR4_Full	(1 << 30)

Can these modes be auto-negotiated?  If not then they don't need
advertised/supported bits.
 
>  /* The following are all involved in forcing a particular link
>   * mode for the device for setting things.  When getting the
> @@ -1248,12 +1256,16 @@ enum ethtool_sfeatures_retval_bits {
>   * it was forced up into this mode or autonegotiated.
>   */
>  
> -/* The forced speed, 10Mb, 100Mb, gigabit, 2.5Gb, 10GbE. */
> +/* The forced speed, 10Mb, 100Mb, gigabit, [2.5|10|20|40|56]GbE. */
>  #define SPEED_10		10
>  #define SPEED_100		100
>  #define SPEED_1000		1000
>  #define SPEED_2500		2500
>  #define SPEED_10000		10000
> +#define SPEED_20000		20000
> +#define SPEED_40000		40000
> +#define SPEED_56000		56000

We shouldn't add new SPEED macros.  The speed is just a number of Mbit/s
and we don't need to enumerate the possible values.
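
As a small illustration of treating the speed as a plain number, a userspace sketch that reports any Mbit/s value without a per-speed macro or table. toy_format_speed() and TOY_SPEED_UNKNOWN are hypothetical names for this example; the only assumption carried over from the header is that "unknown" is encoded as -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* -1 in the header, seen as all-ones once stored in an unsigned field. */
#define TOY_SPEED_UNKNOWN	((uint32_t)-1)

/* Format a link speed given in Mbit/s. No list of "known" speeds is
 * consulted, so 20000 or 56000 need no new definitions.
 * Returns the number of characters snprintf() would have written.
 */
static int toy_format_speed(uint32_t mbps, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (mbps == TOY_SPEED_UNKNOWN)
		return snprintf(buf, len, "Unknown!");
	return snprintf(buf, len, "%uMb/s", (unsigned int)mbps);
}
```

A driver would then simply report 20000 or 56000 through the existing speed field, and tools display whatever number they get.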

Ben.

>  #define SPEED_UNKNOWN		-1
>  
>  /* Duplex, half or full. */

-- 
Ben Hutchings
The program is absolutely right; therefore, the computer must be wrong.

* [PATCH net-next] net; ipv[46] - Remove 2 unnecessary NETDEBUG OOM messages
From: Joe Perches @ 2014-11-05 22:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20141105.171623.1142237070388305283.davem@davemloft.net>

These messages aren't useful as there's a generic dump_stack()
on OOM.

Neaten the comment and if test above the OOM by separating the
assign in if into an allocation then if test.

Signed-off-by: Joe Perches <joe@perches.com>
---
 net/ipv4/ip_output.c  |  8 +++-----
 net/ipv6/ip6_output.c | 10 ++++------
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index bc6471d..4a929ad 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -662,12 +662,10 @@ slow_path:
 		if (len < left)	{
 			len &= ~7;
 		}
-		/*
-		 *	Allocate buffer.
-		 */
 
-		if ((skb2 = alloc_skb(len+hlen+ll_rs, GFP_ATOMIC)) == NULL) {
-			NETDEBUG(KERN_INFO "IP: frag: no memory for new fragment!\n");
+		/* Allocate buffer */
+		skb2 = alloc_skb(len + hlen + ll_rs, GFP_ATOMIC);
+		if (!skb2) {
 			err = -ENOMEM;
 			goto fail;
 		}
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 8e950c2..916d2a1 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -747,13 +747,11 @@ slow_path:
 		if (len < left)	{
 			len &= ~7;
 		}
-		/*
-		 *	Allocate buffer.
-		 */
 
-		if ((frag = alloc_skb(len + hlen + sizeof(struct frag_hdr) +
-				      hroom + troom, GFP_ATOMIC)) == NULL) {
-			NETDEBUG(KERN_INFO "IPv6: frag: no memory for new fragment!\n");
+		/* Allocate buffer */
+		frag = alloc_skb(len + hlen + sizeof(struct frag_hdr) +
+				 hroom + troom, GFP_ATOMIC);
+		if (!frag) {
 			IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
 				      IPSTATS_MIB_FRAGFAILS);
 			err = -ENOMEM;

* Re: [PATCH net v4] ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs
From: Daniel Borkmann @ 2014-11-05 22:46 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, lw1a2.jing, netdev, Eric Dumazet,
	Hannes Frederic Sowa, David L Stevens
In-Reply-To: <CAHA+R7NKPvhut_K+xgtyJqd=a8Y=dfU80e8C7Jo1qgTdg=2ubg@mail.gmail.com>

On 11/05/2014 11:11 PM, Cong Wang wrote:
> On Wed, Nov 5, 2014 at 11:27 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> -static struct sk_buff *mld_newpack(struct inet6_dev *idev, int size)
>> +static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
>
> For net-next, you probably want to get rid of the 'mtu' parameter,
> since all callers use dev->mtu. :)

Yeah, feel free. ;) Probably for the longer term it might make sense to
look into ways to refactor and unify some of the more generic portions of
the IGMP/MLD code.

* Re: [GIT net-next] Open vSwitch
From: Pravin Shelar @ 2014-11-05 22:52 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20141105.151047.1621156460688575485.davem@davemloft.net>

On Wed, Nov 5, 2014 at 12:10 PM, David Miller <davem@davemloft.net> wrote:
>
> Please do not submit your patches such that the email Date: field is
> the commit's date.  You're not posting these on Nov. 4th, yet that
> is the Date: field on all of the individual patch emails.
>
> I want them to be the date at the time you post the patch to the mailing
> list.
>
> Otherwise the ordering in patchwork is not chronological wrt. the list's
> postings and this makes my work more difficult than it needs to be.
>
Sorry about the Date field. NTP stopped working on my machine, that's
why the date got messed up.

* Re: [PATCH 00/20] kselftest install target feature
From: Kees Cook @ 2014-11-05 23:23 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Greg KH, Andrew Morton, Michal Marek, David S. Miller, Phong Tran,
	David Herrmann, Hugh Dickins, pranith kumar, Eric W. Biederman,
	Serge E. Hallyn, linux-kbuild, LKML, Linux API,
	Network Development
In-Reply-To: <5459650E.6070201@osg.samsung.com>

On Tue, Nov 4, 2014 at 3:45 PM, Shuah Khan <shuahkh@osg.samsung.com> wrote:
> On 11/04/2014 12:22 PM, Kees Cook wrote:
>> On Tue, Nov 4, 2014 at 9:10 AM, Shuah Khan <shuahkh@osg.samsung.com> wrote:
>>> This patch series adds a new kselftest_install make target
>>> to enable selftest install. When make kselftest_install is
>>> run, selftests are installed on the system. A new install
>>> target is added to selftests Makefile which will install
>>> targets for the tests that are specified in INSTALL_TARGETS.
>>> During install, a script is generated to run tests that are
>>> installed. This script will be installed in the selftest install
>>> directory. Individual test Makefiles are changed to add to the
>>> script. This will allow new tests to add install and run test
>>> commands to the generated kselftest script.
>>
>> I'm all for making the self tests more available, but I don't think
>> this is the right approach. My primary objection is that it creates a
>> second way to run tests, and that means any changes and additions need
>> to be updated in two places. I'd much rather just maintain the single
>> "make" targets instead. Having "make" available on the target device
>> doesn't seem too bad to me. Is there a reason that doesn't work for
>> your situation?
>
> Kees,
>
> My primary objective is to provide a way to install selftests for a
> specific kernel release. This will allow developers to run tests for
> a specific release and look for regressions. Adding an install target
> will also help support local execution of tests in virtualized
> environments. In some cases such as qemu, it is not practical to
> expect the target to have support for "make". Once tests are installed
> to be run outside the git environment, we need a master script that
> can run the tests. Hence the need for a master script that can run
> tests.
>
> We have the ability to run all tests via make kselftest target or
> run a specific test using the individual test's run_tests target.
> Both of above are necessary to support running tests from the tree.
> Embedding run_tests logic in the makefiles doesn't work very well
> in the long run.
>
> We also need a way to run them outside tree. I agree with you that
> the way I added the script generation, duplicates the code in individual
> run_tests targets and that changes/updates need to be made in both
> places.
>
> Would you be ok with the approach if I fixed the duplicating
> problem? I can address the duplication concern easily.

Yeah, getting rid of duplication would be much preferred. Thanks!

-Kees

>
>>
>> I would, however, like to see some better standardization of the test
>> "framework" that we've got in there already. (For example, some
>> failures fail the "make", some don't, there are various reporting
>> methods for success/failure depending on the test, etc.)
>
> This is being addressed and I have the framework in linux-kselftest
> git next branch at the moment. I do think the above work is part of
> addressing the larger framework issues such as being able to run tests
> on a target system that might not have "make" support and makes it
> easier to use.
>
> thanks,
> -- Shuah
>
>
> --
> Shuah Khan
> Sr. Linux Kernel Developer
> Samsung Research America (Silicon Valley)
> shuahkh@osg.samsung.com | (970) 217-8978



-- 
Kees Cook
Chrome OS Security

* Re: [PATCH net v4] ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs
From: Hannes Frederic Sowa @ 2014-11-05 23:32 UTC (permalink / raw)
  To: Daniel Borkmann, davem; +Cc: lw1a2.jing, netdev, Eric Dumazet, David L Stevens
In-Reply-To: <1415215658-10054-1-git-send-email-dborkman@redhat.com>

On Wed, Nov 5, 2014, at 20:27, Daniel Borkmann wrote:
> It has been reported that generating an MLD listener report on
> devices with large MTUs (e.g. 9000) and a high number of IPv6
> addresses can trigger a skb_over_panic():
> 
> skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
> head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
> dev:port1
>  ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:100!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: ixgbe(O)
> CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
> [...]
> Call Trace:
>  <IRQ>
>  [<ffffffff80578226>] ? skb_put+0x3a/0x3b
>  [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
>  [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
>  [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
>  [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
>  [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
>  [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
>  [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
>  [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
>  [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
>  [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
> 
> mld_newpack() skb allocations are usually requested with dev->mtu
> in size, since commit 72e09ad107e7 ("ipv6: avoid high order allocations")
> we have changed the limit in order to be less likely to fail.
> 
> However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
> macros, which determine if we may end up doing an skb_put() for
> adding another record. To avoid possible fragmentation, we check
> the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
> assumption as the actual max allocation size can be much smaller.
> 
> The IGMP case doesn't have this issue as commit 57e1ab6eaddc
> ("igmp: refine skb allocations") stores the allocation size in
> the cb[].
> 
> Set a reserved_tailroom to make it fit into the MTU and use
> skb_availroom() helper instead. This also allows to get rid of
> igmp_skb_size().
> 
> Reported-by: Wei Liu <lw1a2.jing@gmail.com>
> Fixes: 72e09ad107e7 ("ipv6: avoid high order allocations")
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: David L Stevens <david.stevens@oracle.com>

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks and sorry for the back and forth, Daniel!
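
For anyone reading the thread later, the accounting problem described in
the quoted commit message can be sketched in userspace C. This is only a
toy model, not the kernel code: struct toy_skb, avail_by_mtu() and
avail_by_room() are hypothetical names, and offsets are assumed to be
relative to skb->head. The first helper mirrors the "dev->mtu - skb->len"
assumption, the second is written in the spirit of skb_availroom().

```c
#include <assert.h>

/* Toy stand-in for the few sk_buff fields that matter here (hypothetical;
 * offsets are relative to the start of the buffer). */
struct toy_skb {
	unsigned int len;               /* bytes of data already queued */
	unsigned int tail;              /* offset of the end of the data */
	unsigned int end;               /* offset of the end of the buffer */
	unsigned int reserved_tailroom; /* bytes that must stay unused */
};

/* Old MLD-style check: trusts the device MTU, not the real allocation. */
static unsigned int avail_by_mtu(const struct toy_skb *skb, unsigned int mtu)
{
	return mtu > skb->len ? mtu - skb->len : 0;
}

/* Check based on the real buffer: tailroom minus the reserved part. */
static unsigned int avail_by_room(const struct toy_skb *skb)
{
	unsigned int room;

	assert(skb->end >= skb->tail);
	room = skb->end - skb->tail;
	return room > skb->reserved_tailroom ?
		room - skb->reserved_tailroom : 0;
}
```

Reading the oops line above as "state after a 20 byte skb_put()", the skb
before the put would have had len 3756, tail 0xebc (3772) and end 0xec0
(3776). With an MTU of 9000, avail_by_mtu() reports 5244 bytes while
avail_by_room() reports only 4, which is why the put runs past the end of
the buffer.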

^ permalink raw reply

* [PATCH net-next] net: esp: Convert NETDEBUG to pr_info
From: Joe Perches @ 2014-11-05 23:36 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, David S. Miller
  Cc: Patrick McHardy, Stephen Hemminger, netdev, LKML

Commit 64ce207306de ("[NET]: Make NETDEBUG pure printk wrappers")
originally had these NETDEBUG printks as always emitting.

Commit a2a316fd068c ("[NET]: Replace CONFIG_NET_DEBUG with sysctl")
added a net_msg_warn sysctl to these NETDEBUG uses.

Convert these NETDEBUG uses to normal pr_info calls.

This changes the output prefix from "ESP: " to include
"IPSec: " for the ipv4 case and "IPv6: " for the ipv6 case.

These output lines are now like the other messages in the files.

Other miscellanea:

Neaten the arithmetic spacing to be consistent with other
arithmetic spacing in the files.

Signed-off-by: Joe Perches <joe@perches.com>
---
 net/ipv4/esp4.c | 10 +++++-----
 net/ipv6/esp6.c | 10 +++++-----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index d2bf02e..60173d4 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -603,12 +603,12 @@ static int esp_init_authenc(struct xfrm_state *x)
 		BUG_ON(!aalg_desc);

 		err = -EINVAL;
-		if (aalg_desc->uinfo.auth.icv_fullbits/8 !=
+		if (aalg_desc->uinfo.auth.icv_fullbits / 8 !=
 		    crypto_aead_authsize(aead)) {
-			NETDEBUG(KERN_INFO "ESP: %s digestsize %u != %hu\n",
-				 x->aalg->alg_name,
-				 crypto_aead_authsize(aead),
-				 aalg_desc->uinfo.auth.icv_fullbits/8);
+			pr_info("ESP: %s digestsize %u != %hu\n",
+				x->aalg->alg_name,
+				crypto_aead_authsize(aead),
+				aalg_desc->uinfo.auth.icv_fullbits / 8);
 			goto free_key;
 		}

diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 83fc3a3..d21d7b2 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -544,12 +544,12 @@ static int esp_init_authenc(struct xfrm_state *x)
 		BUG_ON(!aalg_desc);

 		err = -EINVAL;
-		if (aalg_desc->uinfo.auth.icv_fullbits/8 !=
+		if (aalg_desc->uinfo.auth.icv_fullbits / 8 !=
 		    crypto_aead_authsize(aead)) {
-			NETDEBUG(KERN_INFO "ESP: %s digestsize %u != %hu\n",
-				 x->aalg->alg_name,
-				 crypto_aead_authsize(aead),
-				 aalg_desc->uinfo.auth.icv_fullbits/8);
+			pr_info("ESP: %s digestsize %u != %hu\n",
+				x->aalg->alg_name,
+				crypto_aead_authsize(aead),
+				aalg_desc->uinfo.auth.icv_fullbits / 8);
 			goto free_key;
 		}

^ permalink raw reply related

* [PATCH net-next] sock.h: Remove unused NETDEBUG macro
From: Joe Perches @ 2014-11-05 23:42 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, LKML
In-Reply-To: <1415230568.6634.36.camel@perches.com>

It's unused now, just delete it.

Signed-off-by: Joe Perches <joe@perches.com>
---

Assuming the 2 NETDEBUG conversion deletion patches are applied...

 include/net/sock.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7db3db1..6767d75 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2280,9 +2280,6 @@ bool sk_net_capable(const struct sock *sk, int cap);
  *	Enable debug/info messages
  */
 extern int net_msg_warn;
-#define NETDEBUG(fmt, args...) \
-	do { if (net_msg_warn) printk(fmt,##args); } while (0)
-
 #define LIMIT_NETDEBUG(fmt, args...) \
 	do { if (net_msg_warn && net_ratelimit()) printk(fmt,##args); } while(0)

^ permalink raw reply related

* [PATCH net-next] fou: Fix typo in returning flags in netlink
From: Tom Herbert @ 2014-11-06  0:49 UTC (permalink / raw)
  To: davem, netdev

When filling netlink info, dport is being returned as flags. Fix
these instances to return the correct value.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv4/ip_gre.c | 2 +-
 net/ipv4/ipip.c   | 2 +-
 net/ipv6/sit.c    | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 12055fd..ac84912 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -789,7 +789,7 @@ static int ipgre_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_u16(skb, IFLA_GRE_ENCAP_DPORT,
 			t->encap.dport) ||
 	    nla_put_u16(skb, IFLA_GRE_ENCAP_FLAGS,
-			t->encap.dport))
+			t->encap.flags))
 		goto nla_put_failure;
 
 	return 0;
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 37096d6..40403114 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -465,7 +465,7 @@ static int ipip_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_DPORT,
 			tunnel->encap.dport) ||
 	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_FLAGS,
-			tunnel->encap.dport))
+			tunnel->encap.flags))
 		goto nla_put_failure;
 
 	return 0;
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 58e5b47..45ad924 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1714,7 +1714,7 @@ static int ipip6_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_DPORT,
 			tunnel->encap.dport) ||
 	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_FLAGS,
-			tunnel->encap.dport))
+			tunnel->encap.flags))
 		goto nla_put_failure;
 
 	return 0;
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH net-next] net: gro: add a per device gro flush timer
From: Eric Dumazet @ 2014-11-06  0:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Or Gerlitz, Willem de Bruijn

From: Eric Dumazet <edumazet@google.com>

Tuning coalescing parameters on NIC can be really hard.

Servers can handle both bulk and RPC like traffic, with conflicting
goals : bulk flows want as big GRO packets as possible, RPC want minimal
latencies.

To reach big GRO packets on 10Gbe NIC, one can use :

ethtool -C eth0 rx-usecs 4 rx-frames 44

But this penalizes RPC sessions, increasing latencies by up to
50% in some cases, as NICs generally do not force an interrupt when
a packet with the TCP Push flag is received.

Some NICs do not have an absolute timer, only a timer rearmed for every
incoming packet.

This patch uses a different strategy: let the GRO stack decide what to do,
based on the traffic pattern.

Packets with the Push flag won't be delayed.
Packets without the Push flag might be held in the GRO engine, if we keep
receiving data.

This new mechanism is off by default, and is enabled by setting
/sys/class/net/eth0/gro_flush_timeout to a value in nanoseconds.
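
As a rough illustration of the decision taken when a NAPI poll completes,
here is a userspace sketch. It only restates the napi_complete() hunk in
the diff below as a pure function; the enum and the function name are
made up for the example, and the Push flag handling (which happens
elsewhere in the GRO path) is not modelled.

```c
#include <stdbool.h>

/* Hypothetical names, used only for this illustration. */
enum gro_action {
	GRO_NOTHING,   /* nothing is held in the GRO list */
	GRO_FLUSH_NOW, /* hand the held packets to the stack right away */
	GRO_ARM_TIMER, /* keep the packets and arm the per-NAPI hrtimer */
};

static enum gro_action gro_on_complete(bool gro_list_pending,
				       unsigned long rx_count,
				       unsigned long flush_timeout_ns)
{
	if (!gro_list_pending)
		return GRO_NOTHING;

	/* Only wait if this poll saw traffic and a timeout is configured. */
	if (rx_count && flush_timeout_ns)
		return GRO_ARM_TIMER;

	return GRO_FLUSH_NOW;
}
```

With the default timeout of 0 the function never returns GRO_ARM_TIMER,
which matches the "off by default" behaviour: held packets are flushed at
the end of every poll, as before.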

Tested:
 Ran 200 netperf TCP_STREAM from A to B (10Gbe link, 8 RX queues)

Without this feature, we send back about 305,000 ACK per second.

GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)

Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce number of ACK packets. (811/19.2 = 42)

Receiver performs fewer calls to upper stacks and fewer wakeups.
This also reduces cpu usage on the sender, as it receives fewer ACK
packets.

Note that reducing the number of wakeups increases cpu efficiency, but can
decrease QPS, as applications won't have the chance to warm up cpu caches
doing a partial read of RPC requests/answers if they fit in one skb.

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50

B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout

lpaa6:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50
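
The two ratios quoted above can be re-derived from the sar lines. The
helper below is only a sketch: gro_ratio() is a hypothetical name, and it
assumes that the first two numeric columns are rxpck/s and txpck/s, that
nearly all transmitted packets on B are ACKs, and that roughly one ACK is
sent per GRO packet delivered.

```c
#include <assert.h>

/* Approximate segments per GRO packet as received packets per ACK sent. */
static double gro_ratio(double rxpck_per_s, double txpck_per_s)
{
	assert(txpck_per_s > 0.0);
	return rxpck_per_s / txpck_per_s;
}
```

Calling gro_ratio(811269.80, 305732.30) gives a value just above 2.65,
and gro_ratio(811577.30, 19230.80) gives about 42.2, in line with the
rounded figures in the text.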

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h |   12 +++------
 net/core/dev.c            |   44 ++++++++++++++++++++++++++++++++++--
 net/core/net-sysfs.c      |   18 ++++++++++++++
 3 files changed, 64 insertions(+), 10 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4767f546d7c0..8474fcfadc7c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -314,6 +314,8 @@ struct napi_struct {
 	struct net_device	*dev;
 	struct sk_buff		*gro_list;
 	struct sk_buff		*skb;
+	unsigned long		napi_rx_count;
+	struct hrtimer		timer;
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	unsigned int		napi_id;
@@ -485,14 +487,7 @@ void napi_hash_del(struct napi_struct *napi);
  * Stop NAPI from being scheduled on this context.
  * Waits till any outstanding processing completes.
  */
-static inline void napi_disable(struct napi_struct *n)
-{
-	might_sleep();
-	set_bit(NAPI_STATE_DISABLE, &n->state);
-	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
-		msleep(1);
-	clear_bit(NAPI_STATE_DISABLE, &n->state);
-}
+void napi_disable(struct napi_struct *n);
 
 /**
  *	napi_enable - enable NAPI scheduling
@@ -1603,6 +1598,7 @@ struct net_device {
 
 #endif
 
+	unsigned long		gro_flush_timeout;
 	rx_handler_func_t __rcu	*rx_handler;
 	void __rcu		*rx_handler_data;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 40be481268de..c88651bd8ada 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -133,6 +133,7 @@
 #include <linux/vmalloc.h>
 #include <linux/if_macvlan.h>
 #include <linux/errqueue.h>
+#include <linux/hrtimer.h>
 
 #include "net-sysfs.h"
 
@@ -4000,6 +4001,8 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	if (skb_is_gso(skb) || skb_has_frag_list(skb) || skb->csum_bad)
 		goto normal;
 
+	napi->napi_rx_count++;
+
 	gro_list_prepare(napi, skb);
 
 	rcu_read_lock();
@@ -4411,7 +4414,6 @@ EXPORT_SYMBOL(__napi_schedule_irqoff);
 void __napi_complete(struct napi_struct *n)
 {
 	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
-	BUG_ON(n->gro_list);
 
 	list_del_init(&n->poll_list);
 	smp_mb__before_atomic();
@@ -4430,8 +4432,19 @@ void napi_complete(struct napi_struct *n)
 	if (unlikely(test_bit(NAPI_STATE_NPSVC, &n->state)))
 		return;
 
-	napi_gro_flush(n, false);
+	if (n->gro_list) {
+		unsigned long timeout = 0;
+
+		if (n->napi_rx_count)
+			timeout = n->dev->gro_flush_timeout;
 
+		if (timeout)
+			hrtimer_start(&n->timer, ns_to_ktime(timeout),
+				      HRTIMER_MODE_REL_PINNED);
+		else
+			napi_gro_flush(n, false);
+	}
+	n->napi_rx_count = 0;
 	if (likely(list_empty(&n->poll_list))) {
 		WARN_ON_ONCE(!test_and_clear_bit(NAPI_STATE_SCHED, &n->state));
 	} else {
@@ -4495,10 +4508,23 @@ void napi_hash_del(struct napi_struct *napi)
 }
 EXPORT_SYMBOL_GPL(napi_hash_del);
 
+static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
+{
+	struct napi_struct *napi;
+
+	napi = container_of(timer, struct napi_struct, timer);
+	if (napi->gro_list)
+		napi_schedule(napi);
+
+	return HRTIMER_NORESTART;
+}
+
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
 	INIT_LIST_HEAD(&napi->poll_list);
+	hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
+	napi->timer.function = napi_watchdog;
 	napi->gro_count = 0;
 	napi->gro_list = NULL;
 	napi->skb = NULL;
@@ -4517,6 +4543,20 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 }
 EXPORT_SYMBOL(netif_napi_add);
 
+void napi_disable(struct napi_struct *n)
+{
+	might_sleep();
+	set_bit(NAPI_STATE_DISABLE, &n->state);
+
+	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
+		msleep(1);
+
+	hrtimer_cancel(&n->timer);
+
+	clear_bit(NAPI_STATE_DISABLE, &n->state);
+}
+EXPORT_SYMBOL(napi_disable);
+
 void netif_napi_del(struct napi_struct *napi)
 {
 	list_del_init(&napi->dev_list);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 9dd06699b09c..1a24602cd54e 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -325,6 +325,23 @@ static ssize_t tx_queue_len_store(struct device *dev,
 }
 NETDEVICE_SHOW_RW(tx_queue_len, fmt_ulong);
 
+static int change_gro_flush_timeout(struct net_device *dev, unsigned long val)
+{
+	dev->gro_flush_timeout = val;
+	return 0;
+}
+
+static ssize_t gro_flush_timeout_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t len)
+{
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	return netdev_store(dev, attr, buf, len, change_gro_flush_timeout);
+}
+NETDEVICE_SHOW_RW(gro_flush_timeout, fmt_ulong);
+
 static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
 			     const char *buf, size_t len)
 {
@@ -422,6 +439,7 @@ static struct attribute *net_class_attrs[] = {
 	&dev_attr_mtu.attr,
 	&dev_attr_flags.attr,
 	&dev_attr_tx_queue_len.attr,
+	&dev_attr_gro_flush_timeout.attr,
 	&dev_attr_phys_port_id.attr,
 	NULL,
 };

^ permalink raw reply related

* Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics
From: Joe Stringer @ 2014-11-06  1:06 UTC (permalink / raw)
  To: David Miller
  Cc: gerlitz.or, therbert, netdev, sathya.perla, jeffrey.t.kirsher,
	linux.nics, amirv, shahed.shaikh, Dept-GELinuxNICDev,
	linux-kernel
In-Reply-To: <20141105.163825.1433973842938441546.davem@davemloft.net>

On Wed, Nov 05, 2014 at 04:38:25PM -0500, David Miller wrote:
> From: Or Gerlitz <gerlitz.or@gmail.com>
> Date: Wed, 5 Nov 2014 23:32:44 +0200
> 
> > but fact is that the proposed patch series has the --same-- helper for
> > four drivers, so why not start with that limited helper which would
> > be picked up by these drivers and we'll take it from there.
> 
> I'm in favor of the helper, duplication is error prone.
> 
> And in fact, any differences a driver ends up needing might be
> integratable into the helper.

My impression was that the changes are more likely to be
hardware-specific (like the i40e changes) rather than software-specific,
like changes that might be integrated into the helper.

That said, I can rework for one helper. The way I see it would be the
same code as these patches, as "vxlan_gso_check(struct sk_buff *)" in
drivers/net/vxlan.c which would be called from each driver. Is that what
you had in mind?

^ permalink raw reply

* [PATCH 0/3 3.18] Fix more problems with rtlwifi
From: Larry Finger @ 2014-11-06  1:10 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, Larry Finger, netdev

This set of patches fixes some additional problems found for rtlwifi,
rtl8192se, and rtl8192ee.

It is certainly possible that rtlwifi is getting too large. For that reason,
my changes for 3.19 will be restricted to identifying common routines, and
moving such code from the individual drivers into driver rtlwifi.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>


Larry Finger (3):
  rtlwifi: Fix setting of tx descriptor for new trx flow
  rtlwifi: Fix errors in descriptor manipulation
  rtlwifi: rtl8192se: Fix connection problems

 drivers/net/wireless/rtlwifi/pci.c           |  16 ++--
 drivers/net/wireless/rtlwifi/rtl8192se/hw.c  | 129 +++++++++++++--------------
 drivers/net/wireless/rtlwifi/rtl8192se/phy.c |   8 +-
 drivers/net/wireless/rtlwifi/rtl8192se/sw.c  |   4 +
 drivers/net/wireless/rtlwifi/rtl8192se/trx.c |  23 +++++
 drivers/net/wireless/rtlwifi/rtl8192se/trx.h |   4 +
 6 files changed, 110 insertions(+), 74 deletions(-)

-- 
2.1.2

^ permalink raw reply

* [PATCH 1/3 3.18] rtlwifi: Fix setting of tx descriptor for new trx flow
From: Larry Finger @ 2014-11-06  1:10 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, Larry Finger, netdev
In-Reply-To: <1415236254-12274-1-git-send-email-Larry.Finger@lwfinger.net>

Device RTL8192EE uses a new form of trx flow. This fix sets up the descriptors
correctly.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
 drivers/net/wireless/rtlwifi/pci.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/pci.c b/drivers/net/wireless/rtlwifi/pci.c
index 25daa87..116f746 100644
--- a/drivers/net/wireless/rtlwifi/pci.c
+++ b/drivers/net/wireless/rtlwifi/pci.c
@@ -1127,9 +1127,14 @@ static void _rtl_pci_prepare_bcn_tasklet(struct ieee80211_hw *hw)
 
 	__skb_queue_tail(&ring->queue, pskb);
 
-	rtlpriv->cfg->ops->set_desc(hw, (u8 *)pdesc, true, HW_DESC_OWN,
-				    &temp_one);
-
+	if (rtlpriv->use_new_trx_flow) {
+		temp_one = 4;
+		rtlpriv->cfg->ops->set_desc(hw, (u8 *)pbuffer_desc, true,
+					    HW_DESC_OWN, (u8 *)&temp_one);
+	} else {
+		rtlpriv->cfg->ops->set_desc(hw, (u8 *)pdesc, true, HW_DESC_OWN,
+					    &temp_one);
+	}
 	return;
 }
 
-- 
2.1.2

^ permalink raw reply related

* [PATCH 2/3 3.18] rtlwifi: Fix errors in descriptor manipulation
From: Larry Finger @ 2014-11-06  1:10 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, Larry Finger, netdev
In-Reply-To: <1415236254-12274-1-git-send-email-Larry.Finger@lwfinger.net>

There are typos in the handling of the descriptor pointers where the wrong
descriptor is referenced. There is also an error in which the pointer is
incremented twice.
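
To illustrate why the double increment matters, here is a small userspace
sketch. It assumes (the surrounding loop is not shown in the hunk below)
that the reset loop consumes one descriptor per queued skb and reads the
descriptor at ring->idx each time; desc_index_for() is a hypothetical
helper, not a driver function.

```c
#include <assert.h>

/* Ring index used for the k-th queued skb (k counted from 0) when the
 * loop bumps the index 'bumps' times per skb. */
static unsigned int desc_index_for(unsigned int start, unsigned int k,
				   unsigned int entries, unsigned int bumps)
{
	unsigned int idx = start;
	unsigned int i, b;

	assert(entries > 0);
	for (i = 0; i < k; i++)
		for (b = 0; b < bumps; b++)
			idx = (idx + 1) % entries;
	return idx;
}
```

With an 8 entry ring starting at 0, the fourth skb (k = 3) uses
descriptor 3 when the index is bumped once per skb, but descriptor 6 when
it is bumped twice, so the wrong descriptor would be consulted when
unmapping the buffer.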

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
 drivers/net/wireless/rtlwifi/pci.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/pci.c b/drivers/net/wireless/rtlwifi/pci.c
index 116f746..6d2b628 100644
--- a/drivers/net/wireless/rtlwifi/pci.c
+++ b/drivers/net/wireless/rtlwifi/pci.c
@@ -1375,9 +1375,9 @@ static void _rtl_pci_free_tx_ring(struct ieee80211_hw *hw,
 	ring->desc = NULL;
 	if (rtlpriv->use_new_trx_flow) {
 		pci_free_consistent(rtlpci->pdev,
-				    sizeof(*ring->desc) * ring->entries,
+				    sizeof(*ring->buffer_desc) * ring->entries,
 				    ring->buffer_desc, ring->buffer_desc_dma);
-		ring->desc = NULL;
+		ring->buffer_desc = NULL;
 	}
 }
 
@@ -1548,7 +1548,6 @@ int rtl_pci_reset_trx_ring(struct ieee80211_hw *hw)
 							 true,
 							 HW_DESC_TXBUFF_ADDR),
 						 skb->len, PCI_DMA_TODEVICE);
-				ring->idx = (ring->idx + 1) % ring->entries;
 				kfree_skb(skb);
 				ring->idx = (ring->idx + 1) % ring->entries;
 			}
-- 
2.1.2

^ permalink raw reply related

* [PATCH 3/3 3.18] rtlwifi: rtl8192se: Fix connection problems
From: Larry Finger @ 2014-11-06  1:10 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, Larry Finger, netdev
In-Reply-To: <1415236254-12274-1-git-send-email-Larry.Finger@lwfinger.net>

Changes in the vendor driver were added to rtlwifi, but some updates
to rtl8192se were missed.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
 drivers/net/wireless/rtlwifi/rtl8192se/hw.c  | 129 +++++++++++++--------------
 drivers/net/wireless/rtlwifi/rtl8192se/phy.c |   8 +-
 drivers/net/wireless/rtlwifi/rtl8192se/sw.c  |   4 +
 drivers/net/wireless/rtlwifi/rtl8192se/trx.c |  23 +++++
 drivers/net/wireless/rtlwifi/rtl8192se/trx.h |   4 +
 5 files changed, 100 insertions(+), 68 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/rtl8192se/hw.c b/drivers/net/wireless/rtlwifi/rtl8192se/hw.c
index 00e0670..4626203 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192se/hw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192se/hw.c
@@ -1170,27 +1170,32 @@ static int _rtl92se_set_media_status(struct ieee80211_hw *hw,
 {
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
 	u8 bt_msr = rtl_read_byte(rtlpriv, MSR);
+	enum led_ctl_mode ledaction = LED_CTL_NO_LINK;
 	u32 temp;
+	u8 mode = MSR_NOLINK;
+
 	bt_msr &= ~MSR_LINK_MASK;
 
 	switch (type) {
 	case NL80211_IFTYPE_UNSPECIFIED:
-		bt_msr |= (MSR_LINK_NONE << MSR_LINK_SHIFT);
+		mode = MSR_NOLINK;
 		RT_TRACE(rtlpriv, COMP_INIT, DBG_TRACE,
 			 "Set Network type to NO LINK!\n");
 		break;
 	case NL80211_IFTYPE_ADHOC:
-		bt_msr |= (MSR_LINK_ADHOC << MSR_LINK_SHIFT);
+		mode = MSR_ADHOC;
 		RT_TRACE(rtlpriv, COMP_INIT, DBG_TRACE,
 			 "Set Network type to Ad Hoc!\n");
 		break;
 	case NL80211_IFTYPE_STATION:
-		bt_msr |= (MSR_LINK_MANAGED << MSR_LINK_SHIFT);
+		mode = MSR_INFRA;
+		ledaction = LED_CTL_LINK;
 		RT_TRACE(rtlpriv, COMP_INIT, DBG_TRACE,
 			 "Set Network type to STA!\n");
 		break;
 	case NL80211_IFTYPE_AP:
-		bt_msr |= (MSR_LINK_MASTER << MSR_LINK_SHIFT);
+		mode = MSR_AP;
+		ledaction = LED_CTL_LINK;
 		RT_TRACE(rtlpriv, COMP_INIT, DBG_TRACE,
 			 "Set Network type to AP!\n");
 		break;
@@ -1201,7 +1206,17 @@ static int _rtl92se_set_media_status(struct ieee80211_hw *hw,
 
 	}
 
-	rtl_write_byte(rtlpriv, (MSR), bt_msr);
+	/* MSR_INFRA == Link in infrastructure network;
+	 * MSR_ADHOC == Link in ad hoc network;
+	 * Therefore, check link state is necessary.
+	 *
+	 * MSR_AP == AP mode; link state is not cared here.
+	 */
+	if (mode != MSR_AP && rtlpriv->mac80211.link_state < MAC80211_LINKED) {
+		mode = MSR_NOLINK;
+		ledaction = LED_CTL_NO_LINK;
+	}
+	rtl_write_byte(rtlpriv, (MSR), bt_msr | mode);
 
 	temp = rtl_read_dword(rtlpriv, TCR);
 	rtl_write_dword(rtlpriv, TCR, temp & (~BIT(8)));
@@ -1262,6 +1277,7 @@ void rtl92se_enable_interrupt(struct ieee80211_hw *hw)
 	rtl_write_dword(rtlpriv, INTA_MASK, rtlpci->irq_mask[0]);
 	/* Support Bit 32-37(Assign as Bit 0-5) interrupt setting now */
 	rtl_write_dword(rtlpriv, INTA_MASK + 4, rtlpci->irq_mask[1] & 0x3F);
+	rtlpci->irq_enabled = true;
 }
 
 void rtl92se_disable_interrupt(struct ieee80211_hw *hw)
@@ -1276,8 +1292,7 @@ void rtl92se_disable_interrupt(struct ieee80211_hw *hw)
 	rtlpci = rtl_pcidev(rtl_pcipriv(hw));
 	rtl_write_dword(rtlpriv, INTA_MASK, 0);
 	rtl_write_dword(rtlpriv, INTA_MASK + 4, 0);
-
-	synchronize_irq(rtlpci->pdev->irq);
+	rtlpci->irq_enabled = false;
 }
 
 static u8 _rtl92s_set_sysclk(struct ieee80211_hw *hw, u8 data)
@@ -2035,9 +2050,9 @@ static void rtl92se_update_hal_rate_table(struct ieee80211_hw *hw,
 	u32 ratr_value;
 	u8 ratr_index = 0;
 	u8 nmode = mac->ht_enable;
-	u8 mimo_ps = IEEE80211_SMPS_OFF;
 	u16 shortgi_rate = 0;
 	u32 tmp_ratr_value = 0;
+	u32 ratr_mask;
 	u8 curtxbw_40mhz = mac->bw_40;
 	u8 curshortgi_40mhz = (sta->ht_cap.cap & IEEE80211_HT_CAP_SGI_40) ?
 				1 : 0;
@@ -2063,26 +2078,21 @@ static void rtl92se_update_hal_rate_table(struct ieee80211_hw *hw,
 	case WIRELESS_MODE_N_24G:
 	case WIRELESS_MODE_N_5G:
 		nmode = 1;
-		if (mimo_ps == IEEE80211_SMPS_STATIC) {
-			ratr_value &= 0x0007F005;
-		} else {
-			u32 ratr_mask;
 
-			if (get_rf_type(rtlphy) == RF_1T2R ||
-			    get_rf_type(rtlphy) == RF_1T1R) {
-				if (curtxbw_40mhz)
-					ratr_mask = 0x000ff015;
-				else
-					ratr_mask = 0x000ff005;
-			} else {
-				if (curtxbw_40mhz)
-					ratr_mask = 0x0f0ff015;
-				else
-					ratr_mask = 0x0f0ff005;
-			}
-
-			ratr_value &= ratr_mask;
+		if (get_rf_type(rtlphy) == RF_1T2R ||
+		    get_rf_type(rtlphy) == RF_1T1R) {
+			if (curtxbw_40mhz)
+				ratr_mask = 0x000ff015;
+			else
+				ratr_mask = 0x000ff005;
+		} else {
+			if (curtxbw_40mhz)
+				ratr_mask = 0x0f0ff015;
+			else
+				ratr_mask = 0x0f0ff005;
 		}
+
+		ratr_value &= ratr_mask;
 		break;
 	default:
 		if (rtlphy->rf_type == RF_1T2R)
@@ -2137,7 +2147,8 @@ static void rtl92se_update_hal_rate_mask(struct ieee80211_hw *hw,
 	struct rtl_sta_info *sta_entry = NULL;
 	u32 ratr_bitmap;
 	u8 ratr_index = 0;
-	u8 curtxbw_40mhz = (sta->bandwidth >= IEEE80211_STA_RX_BW_40) ? 1 : 0;
+	u8 curtxbw_40mhz = (sta->ht_cap.cap & IEEE80211_HT_CAP_SUP_WIDTH_20_40)
+				? 1 : 0;
 	u8 curshortgi_40mhz = (sta->ht_cap.cap & IEEE80211_HT_CAP_SGI_40) ?
 				1 : 0;
 	u8 curshortgi_20mhz = (sta->ht_cap.cap & IEEE80211_HT_CAP_SGI_20) ?
@@ -2148,9 +2159,7 @@ static void rtl92se_update_hal_rate_mask(struct ieee80211_hw *hw,
 	u8 shortgi_rate = 0;
 	u32 mask = 0;
 	u32 band = 0;
-	bool bmulticast = false;
 	u8 macid = 0;
-	u8 mimo_ps = IEEE80211_SMPS_OFF;
 
 	sta_entry = (struct rtl_sta_info *) sta->drv_priv;
 	wirelessmode = sta_entry->wireless_mode;
@@ -2198,41 +2207,32 @@ static void rtl92se_update_hal_rate_mask(struct ieee80211_hw *hw,
 		band |= (WIRELESS_11N | WIRELESS_11G | WIRELESS_11B);
 		ratr_index = RATR_INX_WIRELESS_NGB;
 
-		if (mimo_ps == IEEE80211_SMPS_STATIC) {
-			if (rssi_level == 1)
-				ratr_bitmap &= 0x00070000;
-			else if (rssi_level == 2)
-				ratr_bitmap &= 0x0007f000;
-			else
-				ratr_bitmap &= 0x0007f005;
+		if (rtlphy->rf_type == RF_1T2R ||
+			rtlphy->rf_type == RF_1T1R) {
+			if (rssi_level == 1) {
+					ratr_bitmap &= 0x000f0000;
+			} else if (rssi_level == 3) {
+				ratr_bitmap &= 0x000fc000;
+			} else if (rssi_level == 5) {
+					ratr_bitmap &= 0x000ff000;
+			} else {
+				if (curtxbw_40mhz)
+					ratr_bitmap &= 0x000ff015;
+				else
+					ratr_bitmap &= 0x000ff005;
+			}
 		} else {
-			if (rtlphy->rf_type == RF_1T2R ||
-				rtlphy->rf_type == RF_1T1R) {
-				if (rssi_level == 1) {
-						ratr_bitmap &= 0x000f0000;
-				} else if (rssi_level == 3) {
-					ratr_bitmap &= 0x000fc000;
-				} else if (rssi_level == 5) {
-						ratr_bitmap &= 0x000ff000;
-				} else {
-					if (curtxbw_40mhz)
-						ratr_bitmap &= 0x000ff015;
-					else
-						ratr_bitmap &= 0x000ff005;
-				}
+			if (rssi_level == 1) {
+				ratr_bitmap &= 0x0f8f0000;
+			} else if (rssi_level == 3) {
+				ratr_bitmap &= 0x0f8fc000;
+			} else if (rssi_level == 5) {
+				ratr_bitmap &= 0x0f8ff000;
 			} else {
-				if (rssi_level == 1) {
-					ratr_bitmap &= 0x0f8f0000;
-				} else if (rssi_level == 3) {
-					ratr_bitmap &= 0x0f8fc000;
-				} else if (rssi_level == 5) {
-					ratr_bitmap &= 0x0f8ff000;
-				} else {
-					if (curtxbw_40mhz)
-						ratr_bitmap &= 0x0f8ff015;
-					else
-						ratr_bitmap &= 0x0f8ff005;
-				}
+				if (curtxbw_40mhz)
+					ratr_bitmap &= 0x0f8ff015;
+				else
+					ratr_bitmap &= 0x0f8ff005;
 			}
 		}
 
@@ -2275,15 +2275,12 @@ static void rtl92se_update_hal_rate_mask(struct ieee80211_hw *hw,
 		rtl_write_byte(rtlpriv, SG_RATE, shortgi_rate);
 	}
 
-	mask |= (bmulticast ? 1 : 0) << 9 | (macid & 0x1f) << 4 | (band & 0xf);
+	mask |= (macid & 0x1f) << 4 | (band & 0xf);
 
 	RT_TRACE(rtlpriv, COMP_RATR, DBG_TRACE, "mask = %x, bitmap = %x\n",
 		 mask, ratr_bitmap);
 	rtl_write_dword(rtlpriv, 0x2c4, ratr_bitmap);
 	rtl_write_dword(rtlpriv, WFM5, (FW_RA_UPDATE_MASK | (mask << 8)));
-
-	if (macid != 0)
-		sta_entry->ratr_index = ratr_index;
 }
 
 void rtl92se_update_hal_rate_tbl(struct ieee80211_hw *hw,
diff --git a/drivers/net/wireless/rtlwifi/rtl8192se/phy.c b/drivers/net/wireless/rtlwifi/rtl8192se/phy.c
index 77c5b5f..e382cef 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192se/phy.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192se/phy.c
@@ -399,6 +399,11 @@ static bool _rtl92s_phy_sw_chnl_step_by_step(struct ieee80211_hw *hw,
 		case 2:
 			currentcmd = &postcommoncmd[*step];
 			break;
+		default:
+			RT_TRACE(rtlpriv, COMP_ERR, DBG_LOUD,
+				 "Invalid 'stage' = %d, Check it!\n",
+				 *stage);
+			return true;
 		}
 
 		if (currentcmd->cmdid == CMDID_END) {
@@ -602,7 +607,7 @@ bool rtl92s_phy_set_rf_power_state(struct ieee80211_hw *hw,
 		}
 	case ERFSLEEP:
 			if (ppsc->rfpwr_state == ERFOFF)
-				return false;
+				break;
 
 			for (queue_id = 0, i = 0;
 			     queue_id < RTL_PCI_MAX_TX_QUEUE_COUNT;) {
@@ -1064,7 +1069,6 @@ bool rtl92s_phy_bb_config(struct ieee80211_hw *hw)
 	/* Check BB/RF confiuration setting. */
 	/* We only need to configure RF which is turned on. */
 	path1 = (u8)(rtl92s_phy_query_bb_reg(hw, RFPGA0_TXINFO, 0xf));
-	mdelay(10);
 	path2 = (u8)(rtl92s_phy_query_bb_reg(hw, ROFDM0_TRXPATHENABLE, 0xf));
 	pathmap = path1 | path2;
 
diff --git a/drivers/net/wireless/rtlwifi/rtl8192se/sw.c b/drivers/net/wireless/rtlwifi/rtl8192se/sw.c
index aadba29..3c4238e 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192se/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192se/sw.c
@@ -269,6 +269,7 @@ static struct rtl_hal_ops rtl8192se_hal_ops = {
 	.led_control = rtl92se_led_control,
 	.set_desc = rtl92se_set_desc,
 	.get_desc = rtl92se_get_desc,
+	.is_tx_desc_closed = rtl92se_is_tx_desc_closed,
 	.tx_polling = rtl92se_tx_polling,
 	.enable_hw_sec = rtl92se_enable_hw_security_config,
 	.set_key = rtl92se_set_key,
@@ -278,6 +279,7 @@ static struct rtl_hal_ops rtl8192se_hal_ops = {
 	.get_rfreg = rtl92s_phy_query_rf_reg,
 	.set_rfreg = rtl92s_phy_set_rf_reg,
 	.get_btc_status = rtl_btc_status_false,
+	.rx_command_packet = rtl92se_rx_command_packet,
 };
 
 static struct rtl_mod_params rtl92se_mod_params = {
@@ -306,6 +308,8 @@ static struct rtl_hal_cfg rtl92se_hal_cfg = {
 	.maps[MAC_RCR_ACRC32] = RCR_ACRC32,
 	.maps[MAC_RCR_ACF] = RCR_ACF,
 	.maps[MAC_RCR_AAP] = RCR_AAP,
+	.maps[MAC_HIMR] = INTA_MASK,
+	.maps[MAC_HIMRE] = INTA_MASK + 4,
 
 	.maps[EFUSE_TEST] = REG_EFUSE_TEST,
 	.maps[EFUSE_CTRL] = REG_EFUSE_CTRL,
diff --git a/drivers/net/wireless/rtlwifi/rtl8192se/trx.c b/drivers/net/wireless/rtlwifi/rtl8192se/trx.c
index 672fd3b..2014b18 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192se/trx.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192se/trx.c
@@ -652,8 +652,31 @@ u32 rtl92se_get_desc(u8 *desc, bool istx, u8 desc_name)
 	return ret;
 }
 
+bool rtl92se_is_tx_desc_closed(struct ieee80211_hw *hw, u8 hw_queue, u16 index)
+{
+	struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
+	struct rtl8192_tx_ring *ring = &rtlpci->tx_ring[hw_queue];
+	u8 *entry = (u8 *)(&ring->desc[ring->idx]);
+	u8 own = (u8)rtl92se_get_desc(entry, true, HW_DESC_OWN);
+
+	/* beacon packet will only use the first
+	 * descriptor by default, and the own bit may not
+	 * be cleared by the hardware
+	 */
+	if (own)
+		return false;
+	return true;
+}
+
 void rtl92se_tx_polling(struct ieee80211_hw *hw, u8 hw_queue)
 {
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
 	rtl_write_word(rtlpriv, TP_POLL, BIT(0) << (hw_queue));
 }
+
+u32 rtl92se_rx_command_packet(struct ieee80211_hw *hw,
+			      struct rtl_stats status,
+			      struct sk_buff *skb)
+{
+	return 0;
+}
diff --git a/drivers/net/wireless/rtlwifi/rtl8192se/trx.h b/drivers/net/wireless/rtlwifi/rtl8192se/trx.h
index 5a13f17..bd9f4bf 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192se/trx.h
+++ b/drivers/net/wireless/rtlwifi/rtl8192se/trx.h
@@ -43,6 +43,10 @@ bool rtl92se_rx_query_desc(struct ieee80211_hw *hw, struct rtl_stats *stats,
 void rtl92se_set_desc(struct ieee80211_hw *hw, u8 *pdesc, bool istx,
 		      u8 desc_name, u8 *val);
 u32 rtl92se_get_desc(u8 *pdesc, bool istx, u8 desc_name);
+bool rtl92se_is_tx_desc_closed(struct ieee80211_hw *hw, u8 hw_queue, u16 index);
 void rtl92se_tx_polling(struct ieee80211_hw *hw, u8 hw_queue);
+u32 rtl92se_rx_command_packet(struct ieee80211_hw *hw,
+			      struct rtl_stats status,
+			      struct sk_buff *skb);
 
 #endif
-- 
2.1.2

^ permalink raw reply related

* Re: [PATCH net-next] net: gro: add a per device gro flush timer
From: Rick Jones @ 2014-11-06  1:38 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, Or Gerlitz, Willem de Bruijn
In-Reply-To: <1415235320.13896.51.camel@edumazet-glaptop2.roam.corp.google.com>

On 11/05/2014 04:55 PM, Eric Dumazet wrote:
> Tested:
>   Ran 200 netperf TCP_STREAM from A to B (10Gbe link, 8 RX queues)
>
> Without this feature, we send back about 305,000 ACK per second.
>
> GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)
>
> Setting a timer of 2000 nsec is enough to increase GRO packet sizes
> and reduce number of ACK packets. (811/19.2 = 42)
>
> Receiver performs fewer calls to upper stacks and fewer wakeups.
> This also reduces cpu usage on the sender, as it receives fewer ACK
> packets.
>
> Note that reducing the number of wakeups increases cpu efficiency, but can
> decrease QPS, as applications won't have the chance to warm up cpu caches
> doing a partial read of RPC requests/answers if they fit in one skb.

Speaking of QPS, what happens to 200 TCP_RR tests when the feature is 
enabled?

rick jones

^ permalink raw reply

* Re: [PATCH V3 1/3] can: add can_is_canfd_skb() API
From: Dong Aisheng @ 2014-11-06  1:52 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Eric Dumazet, linux-can, mkl, wg, varkabhadram, netdev,
	linux-arm-kernel
In-Reply-To: <545A5F55.7050307@hartkopp.net>

On Wed, Nov 05, 2014 at 06:33:09PM +0100, Oliver Hartkopp wrote:
> On 05.11.2014 17:22, Eric Dumazet wrote:
> >On Wed, 2014-11-05 at 21:16 +0800, Dong Aisheng wrote:
> 
> >
> >This looks a bit strange to assume that skb->len == magical_value is CAN
> >FD. A comment would be nice.
> >
> 
> Yes. Due to exactly two types of struct can(fd)_frame which can be
> contained in a skb the skbs are distinguished by the length which
> can be either CAN_MTU or CANFD_MTU.
> 
> >>+static inline int can_is_canfd_skb(struct sk_buff *skb)
> >
> >static inline bool can_is_canfd_skb(const struct sk_buff *skb)
> >
> 
> ok.
> 

Got it.

> >>+{
> 
> What about:
> 
> 	/* the CAN specific type of skb is identified by its data length */
> 

Looks good to me.
I will send an updated version with these changes.

> >>+	return skb->len == CANFD_MTU;
> >>+}
> >>+
> >>  /* get data length from can_dlc with sanitized can_dlc */
> >>  u8 can_dlc2len(u8 can_dlc);
> 
> Regards,
> Oliver
>

Regards
Dong Aisheng
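
For reference, the helper with the changes agreed above (bool return
type, const argument, explanatory comment) could look roughly like the
userspace sketch below. It is not the updated patch: struct toy_can_skb
and the TOY_ prefixed names are stand-ins so that the example compiles
outside the kernel, and the sizes are those of struct can_frame and
struct canfd_frame.

```c
#include <stdbool.h>

#define TOY_CAN_MTU	16u	/* sizeof(struct can_frame) */
#define TOY_CANFD_MTU	72u	/* sizeof(struct canfd_frame) */

/* Stand-in for struct sk_buff; only the length matters here. */
struct toy_can_skb {
	unsigned int len;
};

static inline bool toy_is_canfd_skb(const struct toy_can_skb *skb)
{
	/* the CAN specific type of skb is identified by its data length */
	return skb->len == TOY_CANFD_MTU;
}
```

A classic CAN skb (len equal to TOY_CAN_MTU) yields false and a CAN FD
skb yields true; any other length is simply treated as "not CAN FD".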

^ permalink raw reply
