Netdev List


* Re: [PATCH net-next] net: Convert SEQ_START_TOKEN/seq_printf to seq_puts
From: David Miller @ 2014-11-06  3:05 UTC (permalink / raw)
  To: joe; +Cc: netdev
In-Reply-To: <1415144223.1508.1.camel@perches.com>

From: Joe Perches <joe@perches.com>
Date: Tue, 04 Nov 2014 15:37:03 -0800

> Using a single fixed string is smaller code size than using
> a format and many string arguments.
> 
> Reduces overall code size a little.
> 
> $ size net/ipv4/igmp.o* net/ipv6/mcast.o* net/ipv6/ip6_flowlabel.o*
>    text	   data	    bss	    dec	    hex	filename
>   34269	   7012	  14824	  56105	   db29	net/ipv4/igmp.o.new
>   34315	   7012	  14824	  56151	   db57	net/ipv4/igmp.o.old
>   30078	   7869	  13200	  51147	   c7cb	net/ipv6/mcast.o.new
>   30105	   7869	  13200	  51174	   c7e6	net/ipv6/mcast.o.old
>   11434	   3748	   8580	  23762	   5cd2	net/ipv6/ip6_flowlabel.o.new
>   11491	   3748	   8580	  23819	   5d0b	net/ipv6/ip6_flowlabel.o.old
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Ok, I'm fine with this, applied.

Thanks Joe.
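
A userspace sketch of the size argument in the quoted changelog, for
illustration only: snprintf() and memcpy() stand in for seq_printf() and
seq_puts(), and the header text and function names are made up rather than
taken from the patch. Both variants produce the same bytes, but the second
passes a single string and needs no format parsing.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical header line; the real /proc headers differ. */
#define HDR_FIXED "Idx\tDevice: Count Querier\n"

/* Variant 1: a format string plus several string arguments. */
static int header_with_format(char *buf, size_t len)
{
	return snprintf(buf, len, "%s\t%s: %s %s\n",
			"Idx", "Device", "Count", "Querier");
}

/* Variant 2: one fixed string, copied as-is. */
static int header_fixed(char *buf, size_t len)
{
	size_t n = strlen(HDR_FIXED);

	if (n >= len)
		return -1;
	memcpy(buf, HDR_FIXED, n + 1);	/* include the trailing NUL */
	return (int)n;
}
```

The caller of the first variant has to set up five arguments; the caller of
the second sets up one, which is where the small text-size saving would come
from.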

^ permalink raw reply

* Re: [PATCH] rtlwifi: Add more checks for get_btc_status callback
From: Mike Galbraith @ 2014-11-06  3:03 UTC (permalink / raw)
  To: Larry Finger
  Cc: Murilo Opsfelder Araujo, linux-kernel, linux-wireless, netdev,
	Chaoming Li, John W. Linville, Thadeu Cascardo, troy_tan
In-Reply-To: <545A6894.7040506@lwfinger.net>

On Wed, 2014-11-05 at 12:12 -0600, Larry Finger wrote:

> Yes, I am aware that rtl8192se is failing, and now that I am back from vacation, 
> I am working on the problem. If you want to use the driver with kernel 3.18, 
> clone the repo at http://github.com/lwfinger/rtlwifi_new.git and build and 
> install either the master or kernel_version branches. Both work.

Nah, no hurry.  My lappy is about to go on 4 weeks vacation, and has a
bulging suitcase full of kernels to wear :)

-Mike

^ permalink raw reply

* Re: [PATCH net-next] fast_hash: avoid indirect function calls
From: David Miller @ 2014-11-06  3:03 UTC (permalink / raw)
  To: hannes; +Cc: netdev, kernel, dborkman, tgraf
In-Reply-To: <8214a3fdc8b7f97bb782c8722e9f1e65037553fe.1415142006.git.hannes@stressinduktion.org>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed,  5 Nov 2014 00:23:04 +0100

> By default the arch_fast_hash hashing function pointers are initialized
> to jhash(2). If during boot-up a CPU with SSE4.2 is detected they get
> updated to the CRC32 ones. This dispatching scheme incurs a function
> pointer lookup and indirect call for every hashing operation.
> 
> rhashtable as a user of arch_fast_hash e.g. stores pointers to hashing
> functions in its structure, too, causing two indirect branches per
> hashing operation.
> 
> Using alternative_call we can get away with one of those indirect branches.
> 
> Acked-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Thomas Graf <tgraf@suug.ch>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Applied, thanks Hannes.

> Would it make sense to start suppressing the generation of local
> functions for static inline functions whose address is taken?
> 
> E.g. we could use extern inline in a few cases (dst_output is often used
> as a function pointer but marked static inline).  We could mark it as
> extern inline and copy&paste the code to a .c file to prevent multiple
> copies of machine code for this function. But because of the copy&paste I
> did not in this case.

I'd say that perhaps dst_output() can be handled in the "traditional"
way, by not inlining it ever.

If we have indirect function invocations and non-direct inlines, maybe
in the end it's better to have it in a single hot cache location, no?
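
The two levels of indirection described in the quoted changelog can be
sketched in plain C. This is only an illustration: the hash is a stand-in
(FNV-1a, not jhash or CRC32), struct table is a hypothetical user in the
style of rhashtable, and the "direct" variant merely models what the patch
achieves with alternative_call.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash (FNV-1a); the kernel code uses jhash or CRC32 here. */
static uint32_t sw_hash(const void *data, size_t len, uint32_t seed)
{
	const unsigned char *p = data;
	uint32_t h = 2166136261u ^ seed;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

typedef uint32_t (*hash_fn)(const void *, size_t, uint32_t);

/* Arch-wide pointer, chosen once at boot in the old scheme. */
static hash_fn arch_hash_ptr = sw_hash;

/* Old scheme: every call goes through the arch-wide pointer. */
static uint32_t arch_hash_indirect(const void *d, size_t n, uint32_t seed)
{
	return arch_hash_ptr(d, n, seed);
}

/* New scheme (modelled): the inner call is direct. */
static uint32_t arch_hash_direct(const void *d, size_t n, uint32_t seed)
{
	return sw_hash(d, n, seed);
}

/* A user that stores its own hash function pointer. */
struct table {
	hash_fn hashfn;
};

static uint32_t table_hash(const struct table *t, const void *d, size_t n)
{
	return t->hashfn(d, n, 0);	/* this indirect call remains */
}
```

With hashfn set to arch_hash_indirect a lookup takes two indirect branches;
with arch_hash_direct it takes one, and the result is the same.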

^ permalink raw reply

* Re: [PATCH net-next v1 00/12] amd-xgbe: AMD XGBE driver updates 2014-11-04
From: David Miller @ 2014-11-06  3:00 UTC (permalink / raw)
  To: thomas.lendacky; +Cc: netdev
In-Reply-To: <20141104220620.24738.10070.stgit@tlendack-t1.amdoffice.net>

From: Tom Lendacky <thomas.lendacky@amd.com>
Date: Tue, 4 Nov 2014 16:06:20 -0600

> The following series of patches includes functional updates to the
> driver as well as some trivial changes for function renaming and
> spelling fixes.
> 
> - Move channel and ring structure allocation into the device open path
> - Rename the pre_xmit function to dev_xmit
> - Explicitly use the u32 data type for the device descriptors
> - Use page allocation for the receive buffers
> - Add support for split header/payload receive
> - Add support for per DMA channel interrupts
> - Add support for receive side scaling (RSS)
> - Add support for ethtool receive side scaling commands
> - Fix the spelling of descriptors
> - After a PCS reset, sync the PCS and PHY modes
> - Add dependency on HAS_IOMEM to both the amd-xgbe and amd-xgbe-phy
>   drivers
> 
> This patch series is based on net-next.

Series applied, this series looked really nice.

Thanks.

^ permalink raw reply

* Re: [PATCH net 3/5] fm10k: Implement ndo_gso_check()
From: Alexander Duyck @ 2014-11-06  2:54 UTC (permalink / raw)
  To: Joe Stringer, netdev
  Cc: sathya.perla, jeffrey.t.kirsher, linux.nics, amirv, shahed.shaikh,
	Dept-GELinuxNICDev, therbert, linux-kernel
In-Reply-To: <1415138202-1197-4-git-send-email-joestringer@nicira.com>

On 11/04/2014 01:56 PM, Joe Stringer wrote:
> ndo_gso_check() was recently introduced to allow NICs to report the
> offloading support that they have on a per-skb basis. Add an
> implementation for this driver which checks for something that looks
> like VXLAN.
>
> Implementation shamelessly stolen from Tom Herbert:
> http://thread.gmane.org/gmane.linux.network/332428/focus=333111
>
> Signed-off-by: Joe Stringer <joestringer@nicira.com>
> ---
> Should this driver report support for GSO on packets with tunnel headers
> up to 64B like the i40e driver does?
> ---
>  drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> index 8811364..b9ef622 100644
> --- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> @@ -1350,6 +1350,17 @@ static void fm10k_dfwd_del_station(struct net_device *dev, void *priv)
>  	}
>  }
>  
> +static bool fm10k_gso_check(struct sk_buff *skb, struct net_device *dev)
> +{
> +	if ((skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) &&
> +	    (skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
> +	     skb->inner_protocol != htons(ETH_P_TEB) ||
> +	     skb_inner_mac_header(skb) - skb_transport_header(skb) != 16))
> +		return false;
> +
> +	return true;
> +}
> +
>  static const struct net_device_ops fm10k_netdev_ops = {
>  	.ndo_open		= fm10k_open,
>  	.ndo_stop		= fm10k_close,
> @@ -1372,6 +1383,7 @@ static const struct net_device_ops fm10k_netdev_ops = {
>  	.ndo_do_ioctl		= fm10k_ioctl,
>  	.ndo_dfwd_add_station	= fm10k_dfwd_add_station,
>  	.ndo_dfwd_del_station	= fm10k_dfwd_del_station,
> +	.ndo_gso_check		= fm10k_gso_check,
>  };
>  
>  #define DEFAULT_DEBUG_LEVEL_SHIFT 3

I'm thinking this check is far too simplistic.  If you look, the fm10k
driver already has fm10k_tx_encap_offload() in the TSO function for
verifying if it can support offloading tunnels or not.  I would
recommend starting there or possibly even just adapting that function to
suit your purpose.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH 02/13] net_sched: introduce qdisc_peek() helper function
From: Cong Wang @ 2014-11-06  2:50 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Stephen Hemminger, Linux Kernel Network Developers
In-Reply-To: <20141105034929.GA19857@gondor.apana.org.au>

On Tue, Nov 4, 2014 at 7:49 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Tue, Nov 4, 2014 at 10:45 AM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>>> On Tue,  4 Nov 2014 09:56:25 -0800
>>> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>
>>>> +static inline void qdisc_warn_nonwc(void *func, struct Qdisc *qdisc)
>>>> +{
>>>> +     if (!(qdisc->flags & TCQ_F_WARN_NONWC)) {
>>>> +             pr_warn("%pf: %s qdisc %X: is non-work-conserving?\n",
>>>> +                     func, qdisc->ops->id, qdisc->handle >> 16);
>>>> +             qdisc->flags |= TCQ_F_WARN_NONWC;
>>>> +     }
>>>> +}
>>>> +
>>>
>>> Inlining this and creating N copies of the same message is not a step forward.
>>
>> Hmm, I think gcc merges identical string literals when building the Linux
>> kernel? But I have never verified this.
>
> In general you should try to avoid inlining code that's not in
> the fast path as that leads to binary code size bloat.  As errors
> shouldn't be in the fast path this function should not be inlined.

Makes sense.

Thanks!
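
A minimal userspace sketch of the out-of-line alternative, following the
advice to avoid inlining code that is not in the fast path. Everything here
is hypothetical (struct fake_qdisc, F_WARN_NONWC, the return value); it only
mirrors the warn-once logic of the quoted helper. In a real tree the header
would carry just the declaration and the body would live in one .c file.

```c
#include <stdio.h>

#define F_WARN_NONWC 0x1u	/* stand-in for TCQ_F_WARN_NONWC */

struct fake_qdisc {		/* hypothetical, not struct Qdisc */
	const char *id;
	unsigned int handle;
	unsigned int flags;
};

/*
 * Cold path, a single out-of-line copy.
 * Returns 1 if the warning was printed, 0 if it had been printed before.
 */
int fake_qdisc_warn_nonwc(const char *caller, struct fake_qdisc *q)
{
	if (q->flags & F_WARN_NONWC)
		return 0;

	fprintf(stderr, "%s: %s qdisc %X: is non-work-conserving?\n",
		caller, q->id, q->handle >> 16);
	q->flags |= F_WARN_NONWC;
	return 1;
}
```

Callers then pay for one call instruction on the error path instead of
carrying a copy of the test, the format string setup and the flag update.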

^ permalink raw reply

* Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics
From: Tom Herbert @ 2014-11-06  2:44 UTC (permalink / raw)
  To: David Miller
  Cc: Joe Stringer, Or Gerlitz, Linux Netdev List, Sathya Perla,
	Jeff Kirsher, linux.nics, Amir Vadai, shahed.shaikh,
	dept-gelinuxnicdev, LKML
In-Reply-To: <20141105.211558.969082848816106943.davem@davemloft.net>

On Wed, Nov 5, 2014 at 6:15 PM, David Miller <davem@davemloft.net> wrote:
> From: Joe Stringer <joestringer@nicira.com>
> Date: Wed, 5 Nov 2014 17:06:46 -0800
>
>> My impression was that the changes are more likely to be
>> hardware-specific (like the i40e changes) rather than software-specific,
>> like changes that might be integrated into the helper.
>
> I think there is more commonality amongst hardware capabilities,
> and this is why I want the helper to play itself out.
>
>> That said, I can rework for one helper. The way I see it would be the
>> same code as these patches, as "vxlan_gso_check(struct sk_buff *)" in
>> drivers/net/vxlan.c which would be called from each driver. Is that what
>> you had in mind?
>
> Yes.

Note that this code is not VXLAN specific, it will also accept NVGRE
and GRE/UDP with keyid and TEB. I imagine all these cases should be
indistinguishable to the hardware so they probably just work (which
would be cool!). It might be better to name and locate the helper
function to reflect that.
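
To illustrate why the check is not VXLAN specific: the "!= 16" test in the
patches only measures the gap between the outer transport header and the
inner MAC header, and 16 bytes is an 8-byte UDP header plus one 8-byte
encapsulation header. The sketch below uses hypothetical stand-in types
(not struct sk_buff) and made-up names. Any 8-byte header after UDP passes,
for example a GRE header carrying a key, assuming 4 bytes of base header
plus 4 bytes of key.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for the on-wire headers; field names are illustrative. */
struct udp_hdr   { uint16_t sport, dport, len, csum; };	/* 8 bytes */
struct vxlan_hdr { uint32_t flags; uint32_t vni; };	/* 8 bytes */

/* Hypothetical, simplified view of the offsets the check compares. */
struct pkt_offsets {
	size_t transport_header;	/* start of the outer UDP header */
	size_t inner_mac_header;	/* start of the inner Ethernet frame */
};

/* True when exactly UDP plus one 8-byte header precede the inner frame. */
static bool tunnel_hdr_len_ok(const struct pkt_offsets *p)
{
	return p->inner_mac_header - p->transport_header ==
	       sizeof(struct udp_hdr) + sizeof(struct vxlan_hdr);
}
```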

^ permalink raw reply

* Re: [PATCH net-next] net: gro: add a per device gro flush timer
From: Eric Dumazet @ 2014-11-06  2:39 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, netdev, Or Gerlitz, Willem de Bruijn
In-Reply-To: <1415240055.13896.57.camel@edumazet-glaptop2.roam.corp.google.com>

On Wed, 2014-11-05 at 18:14 -0800, Eric Dumazet wrote:
> On Wed, 2014-11-05 at 17:38 -0800, Rick Jones wrote:
> 
> > Speaking of QPS, what happens to 200 TCP_RR tests when the feature is 
> > enabled?

The possible reduction of QPS happens when you have a single flow like
TCP_RR  -- -r 40000,40000

(Because we have one single TCP packet with 40000 bytes of payload, the
application is woken up once, when the Push flag is received)

So CPU efficiency is way better, but the application has to copy 40000 bytes
in one go _after_ the Push flag, instead of being able to copy part of the
data _before_ receiving the Push flag.

lpaa5:~# echo 0 >/sys/class/net/eth0/gro_flush_timeout
lpaa6:~# echo 0 >/sys/class/net/eth0/gro_flush_timeout
lpaa5:~# ./netperf -H lpaa6 -t TCP_RR -l 20 -Cc -- -r 40000,40000
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa6.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

16384  87380  40000   40000  20.00   9023.86  2.02   1.70   107.513  90.561 
16384  87380 

lpaa5:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
lpaa6:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
lpaa5:~# ./netperf -H lpaa6 -t TCP_RR -l 20 -Cc -- -r 40000,40000
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa6.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

16384  87380  40000   40000  20.00   8651.26  0.66   1.02   36.502  56.710 
16384  87380 

^ permalink raw reply

* Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics
From: David Miller @ 2014-11-06  2:15 UTC (permalink / raw)
  To: joestringer
  Cc: gerlitz.or, therbert, netdev, sathya.perla, jeffrey.t.kirsher,
	linux.nics, amirv, shahed.shaikh, Dept-GELinuxNICDev,
	linux-kernel
In-Reply-To: <20141106010501.GA18339@gmail.com>

From: Joe Stringer <joestringer@nicira.com>
Date: Wed, 5 Nov 2014 17:06:46 -0800

> My impression was that the changes are more likely to be
> hardware-specific (like the i40e changes) rather than software-specific,
> like changes that might be integrated into the helper.

I think there is more commonality amongst hardware capabilities,
and this is why I want the helper to play itself out.

> That said, I can rework for one helper. The way I see it would be the
> same code as these patches, as "vxlan_gso_check(struct sk_buff *)" in
> drivers/net/vxlan.c which would be called from each driver. Is that what
> you had in mind?

Yes.

^ permalink raw reply

* Re: [PATCH net-next] net: gro: add a per device gro flush timer
From: Eric Dumazet @ 2014-11-06  2:14 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, netdev, Or Gerlitz, Willem de Bruijn
In-Reply-To: <545AD11B.5050603@hp.com>

On Wed, 2014-11-05 at 17:38 -0800, Rick Jones wrote:

> Speaking of QPS, what happens to 200 TCP_RR tests when the feature is 
> enabled?

Nothing at all (but the usual noise I guess)

200 TCP_RR send packets with 1 byte of payload and Push flag,
so no packet ever sits in napi->gro_list

lpaa5:~# echo 0 >/sys/class/net/eth0/gro_flush_timeout
lpaa6:~# echo 0 >/sys/class/net/eth0/gro_flush_timeout
lpaa5:~# time ./super_netperf 200 -H lpaa6 -t TCP_RR -l 20
3.13827e+06

real	0m32.170s
user	0m32.885s
sys	7m38.868s

lpaa5:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
lpaa6:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
lpaa5:~# time ./super_netperf 200 -H lpaa6 -t TCP_RR -l 20
3.19013e+06

real	0m37.152s
user	0m33.477s
sys	7m30.586s

Now let's try TCP_RR with -- -r 4000,4000 ;)

Reducing ACK packets allows us to better use the 10GbE bandwidth for
payload, so QPS actually increases.

lpaa5:~# echo 0 >/sys/class/net/eth0/gro_flush_timeout
lpaa6:~# echo 0 >/sys/class/net/eth0/gro_flush_timeout
lpaa5:~# time ./super_netperf 200 -H lpaa6 -t TCP_RR -l 20 -- -r 4000,4000
379645

real	0m32.201s
user	0m4.390s
sys	0m59.501s

lpaa5:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
lpaa6:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
lpaa5:~# time ./super_netperf 200 -H lpaa6 -t TCP_RR -l 20 -- -r 4000,4000
400610

real	0m37.159s
user	0m4.501s
sys	0m59.665s

^ permalink raw reply

* Re: M_CAN message RAM initialization AppNote  - was: Re: [PATCH V3 3/3] can: m_can: workaround for transmit data less than 4 bytes
From: Dong Aisheng @ 2014-11-06  1:57 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Marc Kleine-Budde, linux-can, wg, varkabhadram, netdev,
	linux-arm-kernel
In-Reply-To: <545A692E.40002@hartkopp.net>

On Wed, Nov 05, 2014 at 07:15:10PM +0100, Oliver Hartkopp wrote:
> Hi all,
> 
> just to close this application note relevant point ...
> 
> I got an answer from Florian Hartwich (Mr. CAN) from Bosch regarding
> the bit error detection found by Dong Aisheng.
> 
> The relevant interrupts IR.BEU or IR.BEC monitor the message RAM:
> 
> Bit 21 BEU: Bit Error Uncorrected
> Message RAM bit error detected, uncorrected. Controlled by input
> signal m_can_aeim_berr[1] generated by an optional external parity /
> ECC logic attached to the Message RAM. An uncorrected Message RAM
> bit error sets CCCR.INIT to ‘1’. This is done to avoid transmission
> of corrupted data.
> 
> 0= No bit error detected when reading from Message RAM
> 1= Bit error detected, uncorrected (e.g. parity logic)
> 
> Bit 20 BEC: Bit Error Corrected
> Message RAM bit error detected and corrected. Controlled by input
> signal m_can_aeim_berr[0] generated by an optional external parity /
> ECC logic attached to the Message RAM.
> 
> 0= No bit error detected when reading from Message RAM
> 1= Bit error detected and corrected (e.g. ECC)
> 
> ---
> 
> The Message RAM is usually equipped with a parity or ECC functionality.
> But RAM cells suffer a hardware reset and can therefore hold
> arbitrary content at startup - including parity and/or ECC bits.
> 
> So when you write only the CAN ID and the first four bytes the last
> four bytes remain untouched. Then the M_CAN starts to read in 32bit
> words from the start of the Tx Message element. So it is very likely
> to trigger the message RAM error when reading the uninitialized
> 32bit word from the last four bytes.
> 
> Finally it turns out that an initial writing (with any kind of data)
> to the entire message RAM is mandatory to create valid parity/ECC
> checksums.
> 
> That's it.
> 

Thanks for sharing this information.
Does it mean this issue is related to the nature of Message RAM and is
supposed to exist on all M_CAN IP versions?

> Regards,
> Oliver
> 

Regards
Dong Aisheng
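
The conclusion quoted above (write the entire message RAM once so that the
parity/ECC bits become valid) could look roughly like the sketch below. The
helper name and signature are hypothetical, and a plain array pointer stands
in for the MMIO region; a driver would use its own register accessor and the
message RAM size of the actual SoC.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper; a driver would go through its MMIO accessor. */
static void msg_ram_init(volatile uint32_t *ram, size_t bytes)
{
	size_t i;

	/* Touch every 32-bit word once so parity/ECC bits become valid. */
	for (i = 0; i < bytes / sizeof(uint32_t); i++)
		ram[i] = 0;
}
```

The value written does not matter according to the quoted explanation; zero
is used here only for simplicity.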

^ permalink raw reply

* Re: [PATCH V3 1/3] can: add can_is_canfd_skb() API
From: Dong Aisheng @ 2014-11-06  1:52 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Eric Dumazet, linux-can, mkl, wg, varkabhadram, netdev,
	linux-arm-kernel
In-Reply-To: <545A5F55.7050307@hartkopp.net>

On Wed, Nov 05, 2014 at 06:33:09PM +0100, Oliver Hartkopp wrote:
> On 05.11.2014 17:22, Eric Dumazet wrote:
> >On Wed, 2014-11-05 at 21:16 +0800, Dong Aisheng wrote:
> 
> >
> >This looks a bit strange to assume that skb->len == magical_value is CAN
> >FD. A comment would be nice.
> >
> 
> Yes. Due to exactly two types of struct can(fd)_frame which can be
> contained in a skb the skbs are distinguished by the length which
> can be either CAN_MTU or CANFD_MTU.
> 
> >>+static inline int can_is_canfd_skb(struct sk_buff *skb)
> >
> >static inline bool can_is_canfd_skb(const struct sk_buff *skb)
> >
> 
> ok.
> 

Got it.

> >>+{
> 
> What about:
> 
> 	/* the CAN specific type of skb is identified by its data length */
> 

Looks good to me.
I will send an updated version with these changes.

> >>+	return skb->len == CANFD_MTU;
> >>+}
> >>+
> >>  /* get data length from can_dlc with sanitized can_dlc */
> >>  u8 can_dlc2len(u8 can_dlc);
> 
> Regards,
> Oliver
>

Regards
Dong Aisheng
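
For reference, the helper with both review comments folded in (bool return
type, const argument, and the suggested comment) might read as follows. This
is a standalone sketch: struct fake_skb is a hypothetical stand-in for
struct sk_buff, and the MTU values are written out as the sizes of
struct can_frame (16) and struct canfd_frame (72).

```c
#include <stdbool.h>
#include <stddef.h>

#define CAN_MTU   16	/* sizeof(struct can_frame) */
#define CANFD_MTU 72	/* sizeof(struct canfd_frame) */

struct fake_skb {	/* hypothetical stand-in for struct sk_buff */
	size_t len;
};

static inline bool can_is_canfd_skb(const struct fake_skb *skb)
{
	/* the CAN specific type of skb is identified by its data length */
	return skb->len == CANFD_MTU;
}
```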

^ permalink raw reply

* Re: [PATCH net-next] net: gro: add a per device gro flush timer
From: Rick Jones @ 2014-11-06  1:38 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, Or Gerlitz, Willem de Bruijn
In-Reply-To: <1415235320.13896.51.camel@edumazet-glaptop2.roam.corp.google.com>

On 11/05/2014 04:55 PM, Eric Dumazet wrote:
> Tested:
>   Ran 200 netperf TCP_STREAM from A to B (10Gbe link, 8 RX queues)
>
> Without this feature, we send back about 305,000 ACK per second.
>
> GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)
>
> Setting a timer of 2000 nsec is enough to increase GRO packet sizes
> and reduce number of ACK packets. (811/19.2 = 42)
>
> Receiver performs fewer calls to upper stacks, and fewer wakeups.
> This also reduces cpu usage on the sender, as it receives fewer ACK
> packets.
>
> Note that reducing the number of wakeups increases cpu efficiency, but can
> decrease QPS, as applications won't have the chance to warm up cpu caches
> doing a partial read of RPC requests/answers if they fit in one skb.

Speaking of QPS, what happens to 200 TCP_RR tests when the feature is 
enabled?

rick jones
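
The aggregation ratios in the quoted test notes can be reproduced with a few
lines of arithmetic. Assumptions, none of which are stated in the mail: 811
is read as thousands of received segments per second, 19.2 as thousands of
ACKs per second with the timer set, and one ACK is counted per GRO packet.
The 811 figure is close to the full-size frame rate of a 10GbE link at an
MTU of 1500, which the second helper computes.

```c
/* Segments per GRO packet, assuming one ACK is sent per GRO packet. */
static double gro_ratio(double segs_per_sec, double acks_per_sec)
{
	return segs_per_sec / acks_per_sec;
}

/* Full-size frames per second on a link, counting per-frame wire overhead. */
static double line_rate_pps(double bits_per_sec, unsigned int mtu)
{
	/* 14 Ethernet header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap */
	unsigned int wire_bytes = mtu + 14 + 4 + 8 + 12;

	return bits_per_sec / (8.0 * wire_bytes);
}
```

gro_ratio(811, 305) gives roughly 2.66 and gro_ratio(811, 19.2) roughly 42,
in line with the figures quoted in the changelog.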

^ permalink raw reply

* [PATCH 3/3 3.18] rtlwifi: rtl8192se: Fix connection problems
From: Larry Finger @ 2014-11-06  1:10 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, Larry Finger, netdev
In-Reply-To: <1415236254-12274-1-git-send-email-Larry.Finger@lwfinger.net>

Changes in the vendor driver were added to rtlwifi, but some updates
to rtl8192se were missed.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
 drivers/net/wireless/rtlwifi/rtl8192se/hw.c  | 129 +++++++++++++--------------
 drivers/net/wireless/rtlwifi/rtl8192se/phy.c |   8 +-
 drivers/net/wireless/rtlwifi/rtl8192se/sw.c  |   4 +
 drivers/net/wireless/rtlwifi/rtl8192se/trx.c |  23 +++++
 drivers/net/wireless/rtlwifi/rtl8192se/trx.h |   4 +
 5 files changed, 100 insertions(+), 68 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/rtl8192se/hw.c b/drivers/net/wireless/rtlwifi/rtl8192se/hw.c
index 00e0670..4626203 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192se/hw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192se/hw.c
@@ -1170,27 +1170,32 @@ static int _rtl92se_set_media_status(struct ieee80211_hw *hw,
 {
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
 	u8 bt_msr = rtl_read_byte(rtlpriv, MSR);
+	enum led_ctl_mode ledaction = LED_CTL_NO_LINK;
 	u32 temp;
+	u8 mode = MSR_NOLINK;
+
 	bt_msr &= ~MSR_LINK_MASK;
 
 	switch (type) {
 	case NL80211_IFTYPE_UNSPECIFIED:
-		bt_msr |= (MSR_LINK_NONE << MSR_LINK_SHIFT);
+		mode = MSR_NOLINK;
 		RT_TRACE(rtlpriv, COMP_INIT, DBG_TRACE,
 			 "Set Network type to NO LINK!\n");
 		break;
 	case NL80211_IFTYPE_ADHOC:
-		bt_msr |= (MSR_LINK_ADHOC << MSR_LINK_SHIFT);
+		mode = MSR_ADHOC;
 		RT_TRACE(rtlpriv, COMP_INIT, DBG_TRACE,
 			 "Set Network type to Ad Hoc!\n");
 		break;
 	case NL80211_IFTYPE_STATION:
-		bt_msr |= (MSR_LINK_MANAGED << MSR_LINK_SHIFT);
+		mode = MSR_INFRA;
+		ledaction = LED_CTL_LINK;
 		RT_TRACE(rtlpriv, COMP_INIT, DBG_TRACE,
 			 "Set Network type to STA!\n");
 		break;
 	case NL80211_IFTYPE_AP:
-		bt_msr |= (MSR_LINK_MASTER << MSR_LINK_SHIFT);
+		mode = MSR_AP;
+		ledaction = LED_CTL_LINK;
 		RT_TRACE(rtlpriv, COMP_INIT, DBG_TRACE,
 			 "Set Network type to AP!\n");
 		break;
@@ -1201,7 +1206,17 @@ static int _rtl92se_set_media_status(struct ieee80211_hw *hw,
 
 	}
 
-	rtl_write_byte(rtlpriv, (MSR), bt_msr);
+	/* MSR_INFRA == Link in infrastructure network;
+	 * MSR_ADHOC == Link in ad hoc network;
+	 * Therefore, check link state is necessary.
+	 *
+	 * MSR_AP == AP mode; link state is not cared here.
+	 */
+	if (mode != MSR_AP && rtlpriv->mac80211.link_state < MAC80211_LINKED) {
+		mode = MSR_NOLINK;
+		ledaction = LED_CTL_NO_LINK;
+	}
+	rtl_write_byte(rtlpriv, (MSR), bt_msr | mode);
 
 	temp = rtl_read_dword(rtlpriv, TCR);
 	rtl_write_dword(rtlpriv, TCR, temp & (~BIT(8)));
@@ -1262,6 +1277,7 @@ void rtl92se_enable_interrupt(struct ieee80211_hw *hw)
 	rtl_write_dword(rtlpriv, INTA_MASK, rtlpci->irq_mask[0]);
 	/* Support Bit 32-37(Assign as Bit 0-5) interrupt setting now */
 	rtl_write_dword(rtlpriv, INTA_MASK + 4, rtlpci->irq_mask[1] & 0x3F);
+	rtlpci->irq_enabled = true;
 }
 
 void rtl92se_disable_interrupt(struct ieee80211_hw *hw)
@@ -1276,8 +1292,7 @@ void rtl92se_disable_interrupt(struct ieee80211_hw *hw)
 	rtlpci = rtl_pcidev(rtl_pcipriv(hw));
 	rtl_write_dword(rtlpriv, INTA_MASK, 0);
 	rtl_write_dword(rtlpriv, INTA_MASK + 4, 0);
-
-	synchronize_irq(rtlpci->pdev->irq);
+	rtlpci->irq_enabled = false;
 }
 
 static u8 _rtl92s_set_sysclk(struct ieee80211_hw *hw, u8 data)
@@ -2035,9 +2050,9 @@ static void rtl92se_update_hal_rate_table(struct ieee80211_hw *hw,
 	u32 ratr_value;
 	u8 ratr_index = 0;
 	u8 nmode = mac->ht_enable;
-	u8 mimo_ps = IEEE80211_SMPS_OFF;
 	u16 shortgi_rate = 0;
 	u32 tmp_ratr_value = 0;
+	u32 ratr_mask;
 	u8 curtxbw_40mhz = mac->bw_40;
 	u8 curshortgi_40mhz = (sta->ht_cap.cap & IEEE80211_HT_CAP_SGI_40) ?
 				1 : 0;
@@ -2063,26 +2078,21 @@ static void rtl92se_update_hal_rate_table(struct ieee80211_hw *hw,
 	case WIRELESS_MODE_N_24G:
 	case WIRELESS_MODE_N_5G:
 		nmode = 1;
-		if (mimo_ps == IEEE80211_SMPS_STATIC) {
-			ratr_value &= 0x0007F005;
-		} else {
-			u32 ratr_mask;
 
-			if (get_rf_type(rtlphy) == RF_1T2R ||
-			    get_rf_type(rtlphy) == RF_1T1R) {
-				if (curtxbw_40mhz)
-					ratr_mask = 0x000ff015;
-				else
-					ratr_mask = 0x000ff005;
-			} else {
-				if (curtxbw_40mhz)
-					ratr_mask = 0x0f0ff015;
-				else
-					ratr_mask = 0x0f0ff005;
-			}
-
-			ratr_value &= ratr_mask;
+		if (get_rf_type(rtlphy) == RF_1T2R ||
+		    get_rf_type(rtlphy) == RF_1T1R) {
+			if (curtxbw_40mhz)
+				ratr_mask = 0x000ff015;
+			else
+				ratr_mask = 0x000ff005;
+		} else {
+			if (curtxbw_40mhz)
+				ratr_mask = 0x0f0ff015;
+			else
+				ratr_mask = 0x0f0ff005;
 		}
+
+		ratr_value &= ratr_mask;
 		break;
 	default:
 		if (rtlphy->rf_type == RF_1T2R)
@@ -2137,7 +2147,8 @@ static void rtl92se_update_hal_rate_mask(struct ieee80211_hw *hw,
 	struct rtl_sta_info *sta_entry = NULL;
 	u32 ratr_bitmap;
 	u8 ratr_index = 0;
-	u8 curtxbw_40mhz = (sta->bandwidth >= IEEE80211_STA_RX_BW_40) ? 1 : 0;
+	u8 curtxbw_40mhz = (sta->ht_cap.cap & IEEE80211_HT_CAP_SUP_WIDTH_20_40)
+				? 1 : 0;
 	u8 curshortgi_40mhz = (sta->ht_cap.cap & IEEE80211_HT_CAP_SGI_40) ?
 				1 : 0;
 	u8 curshortgi_20mhz = (sta->ht_cap.cap & IEEE80211_HT_CAP_SGI_20) ?
@@ -2148,9 +2159,7 @@ static void rtl92se_update_hal_rate_mask(struct ieee80211_hw *hw,
 	u8 shortgi_rate = 0;
 	u32 mask = 0;
 	u32 band = 0;
-	bool bmulticast = false;
 	u8 macid = 0;
-	u8 mimo_ps = IEEE80211_SMPS_OFF;
 
 	sta_entry = (struct rtl_sta_info *) sta->drv_priv;
 	wirelessmode = sta_entry->wireless_mode;
@@ -2198,41 +2207,32 @@ static void rtl92se_update_hal_rate_mask(struct ieee80211_hw *hw,
 		band |= (WIRELESS_11N | WIRELESS_11G | WIRELESS_11B);
 		ratr_index = RATR_INX_WIRELESS_NGB;
 
-		if (mimo_ps == IEEE80211_SMPS_STATIC) {
-			if (rssi_level == 1)
-				ratr_bitmap &= 0x00070000;
-			else if (rssi_level == 2)
-				ratr_bitmap &= 0x0007f000;
-			else
-				ratr_bitmap &= 0x0007f005;
+		if (rtlphy->rf_type == RF_1T2R ||
+			rtlphy->rf_type == RF_1T1R) {
+			if (rssi_level == 1) {
+					ratr_bitmap &= 0x000f0000;
+			} else if (rssi_level == 3) {
+				ratr_bitmap &= 0x000fc000;
+			} else if (rssi_level == 5) {
+					ratr_bitmap &= 0x000ff000;
+			} else {
+				if (curtxbw_40mhz)
+					ratr_bitmap &= 0x000ff015;
+				else
+					ratr_bitmap &= 0x000ff005;
+			}
 		} else {
-			if (rtlphy->rf_type == RF_1T2R ||
-				rtlphy->rf_type == RF_1T1R) {
-				if (rssi_level == 1) {
-						ratr_bitmap &= 0x000f0000;
-				} else if (rssi_level == 3) {
-					ratr_bitmap &= 0x000fc000;
-				} else if (rssi_level == 5) {
-						ratr_bitmap &= 0x000ff000;
-				} else {
-					if (curtxbw_40mhz)
-						ratr_bitmap &= 0x000ff015;
-					else
-						ratr_bitmap &= 0x000ff005;
-				}
+			if (rssi_level == 1) {
+				ratr_bitmap &= 0x0f8f0000;
+			} else if (rssi_level == 3) {
+				ratr_bitmap &= 0x0f8fc000;
+			} else if (rssi_level == 5) {
+				ratr_bitmap &= 0x0f8ff000;
 			} else {
-				if (rssi_level == 1) {
-					ratr_bitmap &= 0x0f8f0000;
-				} else if (rssi_level == 3) {
-					ratr_bitmap &= 0x0f8fc000;
-				} else if (rssi_level == 5) {
-					ratr_bitmap &= 0x0f8ff000;
-				} else {
-					if (curtxbw_40mhz)
-						ratr_bitmap &= 0x0f8ff015;
-					else
-						ratr_bitmap &= 0x0f8ff005;
-				}
+				if (curtxbw_40mhz)
+					ratr_bitmap &= 0x0f8ff015;
+				else
+					ratr_bitmap &= 0x0f8ff005;
 			}
 		}
 
@@ -2275,15 +2275,12 @@ static void rtl92se_update_hal_rate_mask(struct ieee80211_hw *hw,
 		rtl_write_byte(rtlpriv, SG_RATE, shortgi_rate);
 	}
 
-	mask |= (bmulticast ? 1 : 0) << 9 | (macid & 0x1f) << 4 | (band & 0xf);
+	mask |= (macid & 0x1f) << 4 | (band & 0xf);
 
 	RT_TRACE(rtlpriv, COMP_RATR, DBG_TRACE, "mask = %x, bitmap = %x\n",
 		 mask, ratr_bitmap);
 	rtl_write_dword(rtlpriv, 0x2c4, ratr_bitmap);
 	rtl_write_dword(rtlpriv, WFM5, (FW_RA_UPDATE_MASK | (mask << 8)));
-
-	if (macid != 0)
-		sta_entry->ratr_index = ratr_index;
 }
 
 void rtl92se_update_hal_rate_tbl(struct ieee80211_hw *hw,
diff --git a/drivers/net/wireless/rtlwifi/rtl8192se/phy.c b/drivers/net/wireless/rtlwifi/rtl8192se/phy.c
index 77c5b5f..e382cef 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192se/phy.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192se/phy.c
@@ -399,6 +399,11 @@ static bool _rtl92s_phy_sw_chnl_step_by_step(struct ieee80211_hw *hw,
 		case 2:
 			currentcmd = &postcommoncmd[*step];
 			break;
+		default:
+			RT_TRACE(rtlpriv, COMP_ERR, DBG_LOUD,
+				 "Invalid 'stage' = %d, Check it!\n",
+				 *stage);
+			return true;
 		}
 
 		if (currentcmd->cmdid == CMDID_END) {
@@ -602,7 +607,7 @@ bool rtl92s_phy_set_rf_power_state(struct ieee80211_hw *hw,
 		}
 	case ERFSLEEP:
 			if (ppsc->rfpwr_state == ERFOFF)
-				return false;
+				break;
 
 			for (queue_id = 0, i = 0;
 			     queue_id < RTL_PCI_MAX_TX_QUEUE_COUNT;) {
@@ -1064,7 +1069,6 @@ bool rtl92s_phy_bb_config(struct ieee80211_hw *hw)
 	/* Check BB/RF confiuration setting. */
 	/* We only need to configure RF which is turned on. */
 	path1 = (u8)(rtl92s_phy_query_bb_reg(hw, RFPGA0_TXINFO, 0xf));
-	mdelay(10);
 	path2 = (u8)(rtl92s_phy_query_bb_reg(hw, ROFDM0_TRXPATHENABLE, 0xf));
 	pathmap = path1 | path2;
 
diff --git a/drivers/net/wireless/rtlwifi/rtl8192se/sw.c b/drivers/net/wireless/rtlwifi/rtl8192se/sw.c
index aadba29..3c4238e 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192se/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192se/sw.c
@@ -269,6 +269,7 @@ static struct rtl_hal_ops rtl8192se_hal_ops = {
 	.led_control = rtl92se_led_control,
 	.set_desc = rtl92se_set_desc,
 	.get_desc = rtl92se_get_desc,
+	.is_tx_desc_closed = rtl92se_is_tx_desc_closed,
 	.tx_polling = rtl92se_tx_polling,
 	.enable_hw_sec = rtl92se_enable_hw_security_config,
 	.set_key = rtl92se_set_key,
@@ -278,6 +279,7 @@ static struct rtl_hal_ops rtl8192se_hal_ops = {
 	.get_rfreg = rtl92s_phy_query_rf_reg,
 	.set_rfreg = rtl92s_phy_set_rf_reg,
 	.get_btc_status = rtl_btc_status_false,
+	.rx_command_packet = rtl92se_rx_command_packet,
 };
 
 static struct rtl_mod_params rtl92se_mod_params = {
@@ -306,6 +308,8 @@ static struct rtl_hal_cfg rtl92se_hal_cfg = {
 	.maps[MAC_RCR_ACRC32] = RCR_ACRC32,
 	.maps[MAC_RCR_ACF] = RCR_ACF,
 	.maps[MAC_RCR_AAP] = RCR_AAP,
+	.maps[MAC_HIMR] = INTA_MASK,
+	.maps[MAC_HIMRE] = INTA_MASK + 4,
 
 	.maps[EFUSE_TEST] = REG_EFUSE_TEST,
 	.maps[EFUSE_CTRL] = REG_EFUSE_CTRL,
diff --git a/drivers/net/wireless/rtlwifi/rtl8192se/trx.c b/drivers/net/wireless/rtlwifi/rtl8192se/trx.c
index 672fd3b..2014b18 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192se/trx.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192se/trx.c
@@ -652,8 +652,31 @@ u32 rtl92se_get_desc(u8 *desc, bool istx, u8 desc_name)
 	return ret;
 }
 
+bool rtl92se_is_tx_desc_closed(struct ieee80211_hw *hw, u8 hw_queue, u16 index)
+{
+	struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
+	struct rtl8192_tx_ring *ring = &rtlpci->tx_ring[hw_queue];
+	u8 *entry = (u8 *)(&ring->desc[ring->idx]);
+	u8 own = (u8)rtl92se_get_desc(entry, true, HW_DESC_OWN);
+
+	/* beacon packet will only use the first
+	 * descriptor by default, and the own bit may not
+	 * be cleared by the hardware
+	 */
+	if (own)
+		return false;
+	return true;
+}
+
 void rtl92se_tx_polling(struct ieee80211_hw *hw, u8 hw_queue)
 {
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
 	rtl_write_word(rtlpriv, TP_POLL, BIT(0) << (hw_queue));
 }
+
+u32 rtl92se_rx_command_packet(struct ieee80211_hw *hw,
+			      struct rtl_stats status,
+			      struct sk_buff *skb)
+{
+	return 0;
+}
diff --git a/drivers/net/wireless/rtlwifi/rtl8192se/trx.h b/drivers/net/wireless/rtlwifi/rtl8192se/trx.h
index 5a13f17..bd9f4bf 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192se/trx.h
+++ b/drivers/net/wireless/rtlwifi/rtl8192se/trx.h
@@ -43,6 +43,10 @@ bool rtl92se_rx_query_desc(struct ieee80211_hw *hw, struct rtl_stats *stats,
 void rtl92se_set_desc(struct ieee80211_hw *hw, u8 *pdesc, bool istx,
 		      u8 desc_name, u8 *val);
 u32 rtl92se_get_desc(u8 *pdesc, bool istx, u8 desc_name);
+bool rtl92se_is_tx_desc_closed(struct ieee80211_hw *hw, u8 hw_queue, u16 index);
 void rtl92se_tx_polling(struct ieee80211_hw *hw, u8 hw_queue);
+u32 rtl92se_rx_command_packet(struct ieee80211_hw *hw,
+			      struct rtl_stats status,
+			      struct sk_buff *skb);
 
 #endif
-- 
2.1.2

^ permalink raw reply related

* [PATCH 2/3 3.18] rtlwifi: Fix errors in descriptor manipulation
From: Larry Finger @ 2014-11-06  1:10 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, Larry Finger, netdev
In-Reply-To: <1415236254-12274-1-git-send-email-Larry.Finger@lwfinger.net>

There are typos in the handling of the descriptor pointers where the wrong
descriptor is referenced. There is also an error in which the pointer is
incremented twice.
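
As an illustration of the double increment only, here is a minimal
user-space sketch. It is not driver code: toy_ring, advance() and
slots_visited() are hypothetical names, and the ring is reduced to the
idx/entries pair that appears in the diff below. It assumes the loop
body is meant to advance the index exactly once per freed entry.

```c
/* Hypothetical stand-in for the idx/entries fields of a tx ring. */
struct toy_ring {
	unsigned int idx;
	unsigned int entries;
};

/* Advance the ring index by one slot, wrapping at the end. */
static void advance(struct toy_ring *ring)
{
	ring->idx = (ring->idx + 1) % ring->entries;
}

/*
 * Walk the ring 'steps' times, advancing 'per_step' slots after each
 * visit, and count how many distinct slots were visited.
 */
static unsigned int slots_visited(unsigned int entries, unsigned int steps,
				  unsigned int per_step)
{
	struct toy_ring ring = { .idx = 0, .entries = entries };
	unsigned char seen[64] = { 0 };
	unsigned int i, j, count = 0;

	/* The toy bitmap only covers small rings. */
	if (entries == 0 || entries > sizeof(seen))
		return 0;

	for (i = 0; i < steps; i++) {
		if (!seen[ring.idx]) {
			seen[ring.idx] = 1;
			count++;
		}
		for (j = 0; j < per_step; j++)
			advance(&ring);
	}
	return count;
}
```

With an 8-entry ring and 8 iterations, advancing once per iteration
visits all 8 slots, while advancing twice visits only the even ones.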

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
 drivers/net/wireless/rtlwifi/pci.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/pci.c b/drivers/net/wireless/rtlwifi/pci.c
index 116f746..6d2b628 100644
--- a/drivers/net/wireless/rtlwifi/pci.c
+++ b/drivers/net/wireless/rtlwifi/pci.c
@@ -1375,9 +1375,9 @@ static void _rtl_pci_free_tx_ring(struct ieee80211_hw *hw,
 	ring->desc = NULL;
 	if (rtlpriv->use_new_trx_flow) {
 		pci_free_consistent(rtlpci->pdev,
-				    sizeof(*ring->desc) * ring->entries,
+				    sizeof(*ring->buffer_desc) * ring->entries,
 				    ring->buffer_desc, ring->buffer_desc_dma);
-		ring->desc = NULL;
+		ring->buffer_desc = NULL;
 	}
 }
 
@@ -1548,7 +1548,6 @@ int rtl_pci_reset_trx_ring(struct ieee80211_hw *hw)
 							 true,
 							 HW_DESC_TXBUFF_ADDR),
 						 skb->len, PCI_DMA_TODEVICE);
-				ring->idx = (ring->idx + 1) % ring->entries;
 				kfree_skb(skb);
 				ring->idx = (ring->idx + 1) % ring->entries;
 			}
-- 
2.1.2

^ permalink raw reply related

* [PATCH 1/3 3.18] rtlwifi: Fix setting of tx descriptor for new trx flow
From: Larry Finger @ 2014-11-06  1:10 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, Larry Finger, netdev
In-Reply-To: <1415236254-12274-1-git-send-email-Larry.Finger@lwfinger.net>

Device RTL8192EE uses a new form of trx flow. This fix sets up the descriptors
correctly.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
 drivers/net/wireless/rtlwifi/pci.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/pci.c b/drivers/net/wireless/rtlwifi/pci.c
index 25daa87..116f746 100644
--- a/drivers/net/wireless/rtlwifi/pci.c
+++ b/drivers/net/wireless/rtlwifi/pci.c
@@ -1127,9 +1127,14 @@ static void _rtl_pci_prepare_bcn_tasklet(struct ieee80211_hw *hw)
 
 	__skb_queue_tail(&ring->queue, pskb);
 
-	rtlpriv->cfg->ops->set_desc(hw, (u8 *)pdesc, true, HW_DESC_OWN,
-				    &temp_one);
-
+	if (rtlpriv->use_new_trx_flow) {
+		temp_one = 4;
+		rtlpriv->cfg->ops->set_desc(hw, (u8 *)pbuffer_desc, true,
+					    HW_DESC_OWN, (u8 *)&temp_one);
+	} else {
+		rtlpriv->cfg->ops->set_desc(hw, (u8 *)pdesc, true, HW_DESC_OWN,
+					    &temp_one);
+	}
 	return;
 }
 
-- 
2.1.2

^ permalink raw reply related

* [PATCH 0/3 3.18] Fix more problems with rtlwifi
From: Larry Finger @ 2014-11-06  1:10 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, Larry Finger, netdev

This set of patches fixes some additional problems found in rtlwifi,
rtl8192se, and rtl8192ee.

It is certainly possible that rtlwifi is getting too large. For that reason,
my changes for 3.19 will be restricted to identifying common routines, and
moving such code from the individual drivers into driver rtlwifi.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>


Larry Finger (3):
  rtlwifi: Fix setting of tx descriptor for new trx flow
  rtlwifi: Fix errors in descriptor manipulation
  rtlwifi: rtl8192se: Fix connection problems

 drivers/net/wireless/rtlwifi/pci.c           |  16 ++--
 drivers/net/wireless/rtlwifi/rtl8192se/hw.c  | 129 +++++++++++++--------------
 drivers/net/wireless/rtlwifi/rtl8192se/phy.c |   8 +-
 drivers/net/wireless/rtlwifi/rtl8192se/sw.c  |   4 +
 drivers/net/wireless/rtlwifi/rtl8192se/trx.c |  23 +++++
 drivers/net/wireless/rtlwifi/rtl8192se/trx.h |   4 +
 6 files changed, 110 insertions(+), 74 deletions(-)

-- 
2.1.2

^ permalink raw reply

* Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics
From: Joe Stringer @ 2014-11-06  1:06 UTC (permalink / raw)
  To: David Miller
  Cc: gerlitz.or, therbert, netdev, sathya.perla, jeffrey.t.kirsher,
	linux.nics, amirv, shahed.shaikh, Dept-GELinuxNICDev,
	linux-kernel
In-Reply-To: <20141105.163825.1433973842938441546.davem@davemloft.net>

On Wed, Nov 05, 2014 at 04:38:25PM -0500, David Miller wrote:
> From: Or Gerlitz <gerlitz.or@gmail.com>
> Date: Wed, 5 Nov 2014 23:32:44 +0200
> 
> > but fact is that the proposed patch series has the --same-- helper for
> > four drivers, so why not start with that limited helper which would
> > be picked up by these drivers and we'll take it from there.
> 
> I'm in favor of the helper, duplication is error prone.
> 
> And in fact, any differences a driver ends up needing might be
> integratable into the helper.

My impression was that the changes are more likely to be
hardware-specific (like the i40e changes) rather than software-specific,
like changes that might be integrated into the helper.

That said, I can rework for one helper. The way I see it would be the
same code as these patches, as "vxlan_gso_check(struct sk_buff *)" in
drivers/net/vxlan.c which would be called from each driver. Is that what
you had in mind?

^ permalink raw reply

* [PATCH net-next] net: gro: add a per device gro flush timer
From: Eric Dumazet @ 2014-11-06  0:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Or Gerlitz, Willem de Bruijn

From: Eric Dumazet <edumazet@google.com>

Tuning coalescing parameters on NIC can be really hard.

Servers can handle both bulk and RPC like traffic, with conflicting
goals : bulk flows want as big GRO packets as possible, RPC want minimal
latencies.

To reach big GRO packets on 10Gbe NIC, one can use :

ethtool -C eth0 rx-usecs 4 rx-frames 44

But this penalizes rpc sessions, with an increase of latencies, up to
50% in some cases, as NICs generally do not force an interrupt when
a packet with TCP Push flag is received.

Some NICs do not have an absolute timer, only a timer rearmed for every
incoming packet.

This patch uses a different strategy: let the GRO stack decide what to do,
based on traffic pattern.

Packets with Push flag won't be delayed.
Packets without Push flag might be held in GRO engine, if we keep
receiving data.

This new mechanism is off by default, and is enabled by setting
/sys/class/net/eth0/gro_flush_timeout to a value in nanoseconds.
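
The decision described above can be sketched in user space as follows.
This is a simplified model of the napi_complete() hunk in this patch,
not kernel code: toy_napi, toy_action and toy_napi_complete() are
hypothetical names, and the Push-flag handling done by the GRO layer
itself is not modeled.

```c
#include <stdbool.h>

/* What the model decides to do when a poll round completes. */
enum toy_action {
	TOY_NOTHING,	/* no packets are held, nothing to do */
	TOY_FLUSH,	/* flush the held packets right away */
	TOY_ARM_TIMER,	/* keep the packets and arm the flush timer */
};

struct toy_napi {
	bool has_gro_list;		/* models n->gro_list != NULL */
	unsigned long rx_count;		/* models n->napi_rx_count */
	unsigned long flush_timeout_ns;	/* models dev->gro_flush_timeout */
};

/*
 * Packets are held only if some are pending, new data arrived during
 * this round, and a non-zero timeout is configured. Otherwise pending
 * packets are flushed immediately, as before the patch.
 */
static enum toy_action toy_napi_complete(struct toy_napi *n)
{
	enum toy_action action = TOY_NOTHING;

	if (n->has_gro_list) {
		unsigned long timeout = 0;

		if (n->rx_count)
			timeout = n->flush_timeout_ns;

		action = timeout ? TOY_ARM_TIMER : TOY_FLUSH;
	}
	n->rx_count = 0;
	return action;
}
```

With the default timeout of 0 the model always flushes, which matches
the "off by default" behavior.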

Tested:
 Ran 200 netperf TCP_STREAM from A to B (10Gbe link, 8 RX queues)

Without this feature, we send back about 305,000 ACKs per second.

GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)

Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce number of ACK packets. (811/19.2 = 42)

Receiver performs fewer calls to upper stacks and fewer wakeups.
This also reduces cpu usage on the sender, as it receives fewer ACK
packets.

Note that reducing the number of wakeups increases cpu efficiency, but can
decrease QPS, as applications won't have the chance to warm up cpu caches
doing a partial read of RPC requests/answers if they fit in one skb.

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50

B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout

lpaa6:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50
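
The ratios quoted above can be reproduced from the two sar lines. This
is a small arithmetic sketch, assuming the usual sysstat column order
in which the first two numeric columns are rxpck/s and txpck/s;
segs_per_ack() is a hypothetical helper name.

```c
/*
 * Received packets per transmitted packet. On a receiver of bulk TCP
 * streams the transmitted packets are almost all ACKs, so this
 * approximates the number of segments covered by each ACK.
 */
static double segs_per_ack(double rx_pps, double tx_pps)
{
	return rx_pps / tx_pps;
}
```

segs_per_ack(811269.80, 305732.30) is about 2.65 and
segs_per_ack(811577.30, 19230.80) is about 42.2, in line with the
figures given in the text.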

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h |   12 +++------
 net/core/dev.c            |   44 ++++++++++++++++++++++++++++++++++--
 net/core/net-sysfs.c      |   18 ++++++++++++++
 3 files changed, 64 insertions(+), 10 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4767f546d7c0..8474fcfadc7c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -314,6 +314,8 @@ struct napi_struct {
 	struct net_device	*dev;
 	struct sk_buff		*gro_list;
 	struct sk_buff		*skb;
+	unsigned long		napi_rx_count;
+	struct hrtimer		timer;
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	unsigned int		napi_id;
@@ -485,14 +487,7 @@ void napi_hash_del(struct napi_struct *napi);
  * Stop NAPI from being scheduled on this context.
  * Waits till any outstanding processing completes.
  */
-static inline void napi_disable(struct napi_struct *n)
-{
-	might_sleep();
-	set_bit(NAPI_STATE_DISABLE, &n->state);
-	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
-		msleep(1);
-	clear_bit(NAPI_STATE_DISABLE, &n->state);
-}
+void napi_disable(struct napi_struct *n);
 
 /**
  *	napi_enable - enable NAPI scheduling
@@ -1603,6 +1598,7 @@ struct net_device {
 
 #endif
 
+	unsigned long		gro_flush_timeout;
 	rx_handler_func_t __rcu	*rx_handler;
 	void __rcu		*rx_handler_data;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 40be481268de..c88651bd8ada 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -133,6 +133,7 @@
 #include <linux/vmalloc.h>
 #include <linux/if_macvlan.h>
 #include <linux/errqueue.h>
+#include <linux/hrtimer.h>
 
 #include "net-sysfs.h"
 
@@ -4000,6 +4001,8 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	if (skb_is_gso(skb) || skb_has_frag_list(skb) || skb->csum_bad)
 		goto normal;
 
+	napi->napi_rx_count++;
+
 	gro_list_prepare(napi, skb);
 
 	rcu_read_lock();
@@ -4411,7 +4414,6 @@ EXPORT_SYMBOL(__napi_schedule_irqoff);
 void __napi_complete(struct napi_struct *n)
 {
 	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
-	BUG_ON(n->gro_list);
 
 	list_del_init(&n->poll_list);
 	smp_mb__before_atomic();
@@ -4430,8 +4432,19 @@ void napi_complete(struct napi_struct *n)
 	if (unlikely(test_bit(NAPI_STATE_NPSVC, &n->state)))
 		return;
 
-	napi_gro_flush(n, false);
+	if (n->gro_list) {
+		unsigned long timeout = 0;
+
+		if (n->napi_rx_count)
+			timeout = n->dev->gro_flush_timeout;
 
+		if (timeout)
+			hrtimer_start(&n->timer, ns_to_ktime(timeout),
+				      HRTIMER_MODE_REL_PINNED);
+		else
+			napi_gro_flush(n, false);
+	}
+	n->napi_rx_count = 0;
 	if (likely(list_empty(&n->poll_list))) {
 		WARN_ON_ONCE(!test_and_clear_bit(NAPI_STATE_SCHED, &n->state));
 	} else {
@@ -4495,10 +4508,23 @@ void napi_hash_del(struct napi_struct *napi)
 }
 EXPORT_SYMBOL_GPL(napi_hash_del);
 
+static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
+{
+	struct napi_struct *napi;
+
+	napi = container_of(timer, struct napi_struct, timer);
+	if (napi->gro_list)
+		napi_schedule(napi);
+
+	return HRTIMER_NORESTART;
+}
+
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
 	INIT_LIST_HEAD(&napi->poll_list);
+	hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
+	napi->timer.function = napi_watchdog;
 	napi->gro_count = 0;
 	napi->gro_list = NULL;
 	napi->skb = NULL;
@@ -4517,6 +4543,20 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 }
 EXPORT_SYMBOL(netif_napi_add);
 
+void napi_disable(struct napi_struct *n)
+{
+	might_sleep();
+	set_bit(NAPI_STATE_DISABLE, &n->state);
+
+	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
+		msleep(1);
+
+	hrtimer_cancel(&n->timer);
+
+	clear_bit(NAPI_STATE_DISABLE, &n->state);
+}
+EXPORT_SYMBOL(napi_disable);
+
 void netif_napi_del(struct napi_struct *napi)
 {
 	list_del_init(&napi->dev_list);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 9dd06699b09c..1a24602cd54e 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -325,6 +325,23 @@ static ssize_t tx_queue_len_store(struct device *dev,
 }
 NETDEVICE_SHOW_RW(tx_queue_len, fmt_ulong);
 
+static int change_gro_flush_timeout(struct net_device *dev, unsigned long val)
+{
+	dev->gro_flush_timeout = val;
+	return 0;
+}
+
+static ssize_t gro_flush_timeout_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t len)
+{
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	return netdev_store(dev, attr, buf, len, change_gro_flush_timeout);
+}
+NETDEVICE_SHOW_RW(gro_flush_timeout, fmt_ulong);
+
 static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
 			     const char *buf, size_t len)
 {
@@ -422,6 +439,7 @@ static struct attribute *net_class_attrs[] = {
 	&dev_attr_mtu.attr,
 	&dev_attr_flags.attr,
 	&dev_attr_tx_queue_len.attr,
+	&dev_attr_gro_flush_timeout.attr,
 	&dev_attr_phys_port_id.attr,
 	NULL,
 };

^ permalink raw reply related

* [PATCH net-next] fou: Fix typo in returning flags in netlink
From: Tom Herbert @ 2014-11-06  0:49 UTC (permalink / raw)
  To: davem, netdev

When filling netlink info, dport is being returned as flags. Fix
instances to return correct value.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv4/ip_gre.c | 2 +-
 net/ipv4/ipip.c   | 2 +-
 net/ipv6/sit.c    | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 12055fd..ac84912 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -789,7 +789,7 @@ static int ipgre_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_u16(skb, IFLA_GRE_ENCAP_DPORT,
 			t->encap.dport) ||
 	    nla_put_u16(skb, IFLA_GRE_ENCAP_FLAGS,
-			t->encap.dport))
+			t->encap.flags))
 		goto nla_put_failure;
 
 	return 0;
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 37096d6..40403114 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -465,7 +465,7 @@ static int ipip_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_DPORT,
 			tunnel->encap.dport) ||
 	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_FLAGS,
-			tunnel->encap.dport))
+			tunnel->encap.flags))
 		goto nla_put_failure;
 
 	return 0;
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 58e5b47..45ad924 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1714,7 +1714,7 @@ static int ipip6_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_DPORT,
 			tunnel->encap.dport) ||
 	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_FLAGS,
-			tunnel->encap.dport))
+			tunnel->encap.flags))
 		goto nla_put_failure;
 
 	return 0;
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH net-next] sock.h: Remove unused NETDEBUG macro
From: Joe Perches @ 2014-11-05 23:42 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, LKML
In-Reply-To: <1415230568.6634.36.camel@perches.com>

It's unused now, just delete it.

Signed-off-by: Joe Perches <joe@perches.com>
---

Assuming the 2 NETDEBUG conversion deletion patches are applied...

 include/net/sock.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7db3db1..6767d75 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2280,9 +2280,6 @@ bool sk_net_capable(const struct sock *sk, int cap);
  *	Enable debug/info messages
  */
 extern int net_msg_warn;
-#define NETDEBUG(fmt, args...) \
-	do { if (net_msg_warn) printk(fmt,##args); } while (0)
-
 #define LIMIT_NETDEBUG(fmt, args...) \
 	do { if (net_msg_warn && net_ratelimit()) printk(fmt,##args); } while(0)

^ permalink raw reply related

* [PATCH net-next] net: esp: Convert NETDEBUG to pr_info
From: Joe Perches @ 2014-11-05 23:36 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, David S. Miller
  Cc: Patrick McHardy, Stephen Hemminger, netdev, LKML

Commit 64ce207306de ("[NET]: Make NETDEBUG pure printk wrappers")
originally had these NETDEBUG printks as always emitting.

Commit a2a316fd068c ("[NET]: Replace CONFIG_NET_DEBUG with sysctl")
added a net_msg_warn sysctl to these NETDEBUG uses.

Convert these NETDEBUG uses to normal pr_info calls.

This changes the output prefix from "ESP: " to include
"IPSec: " for the ipv4 case and "IPv6: " for the ipv6 case.

These output lines are now like the other messages in the files.
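
For readers unfamiliar with how the prefix gets there, here is a
user-space sketch of the pr_fmt() convention. It is an illustration
only: the "IPSec: " prefix is taken from the description above,
toy_pr_info() and format_digest_mismatch() are hypothetical names, and
snprintf() stands in for the kernel log.

```c
#include <stdio.h>

/* Assumed per-file prefix; each kernel source file defines its own. */
#define pr_fmt(fmt) "IPSec: " fmt

/* Stand-in for pr_info(): formats into buf instead of the kernel log. */
#define toy_pr_info(buf, size, fmt, ...) \
	snprintf(buf, size, pr_fmt(fmt), __VA_ARGS__)

/* Build the converted message with the prefix prepended. */
static int format_digest_mismatch(char *buf, size_t size, const char *alg,
				  unsigned int authsize,
				  unsigned short icv_bytes)
{
	return toy_pr_info(buf, size, "ESP: %s digestsize %u != %hu\n",
			   alg, authsize, icv_bytes);
}
```

Because the prefix is pasted onto the format string at compile time,
the message text itself does not need to change for the new prefix to
appear.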

Other miscellanea:

Neaten the arithmetic spacing to be consistent with other
arithmetic spacing in the files.

Signed-off-by: Joe Perches <joe@perches.com>
---
 net/ipv4/esp4.c | 10 +++++-----
 net/ipv6/esp6.c | 10 +++++-----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index d2bf02e..60173d4 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -603,12 +603,12 @@ static int esp_init_authenc(struct xfrm_state *x)
 		BUG_ON(!aalg_desc);

 		err = -EINVAL;
-		if (aalg_desc->uinfo.auth.icv_fullbits/8 !=
+		if (aalg_desc->uinfo.auth.icv_fullbits / 8 !=
 		    crypto_aead_authsize(aead)) {
-			NETDEBUG(KERN_INFO "ESP: %s digestsize %u != %hu\n",
-				 x->aalg->alg_name,
-				 crypto_aead_authsize(aead),
-				 aalg_desc->uinfo.auth.icv_fullbits/8);
+			pr_info("ESP: %s digestsize %u != %hu\n",
+				x->aalg->alg_name,
+				crypto_aead_authsize(aead),
+				aalg_desc->uinfo.auth.icv_fullbits / 8);
 			goto free_key;
 		}

diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 83fc3a3..d21d7b2 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -544,12 +544,12 @@ static int esp_init_authenc(struct xfrm_state *x)
 		BUG_ON(!aalg_desc);

 		err = -EINVAL;
-		if (aalg_desc->uinfo.auth.icv_fullbits/8 !=
+		if (aalg_desc->uinfo.auth.icv_fullbits / 8 !=
 		    crypto_aead_authsize(aead)) {
-			NETDEBUG(KERN_INFO "ESP: %s digestsize %u != %hu\n",
-				 x->aalg->alg_name,
-				 crypto_aead_authsize(aead),
-				 aalg_desc->uinfo.auth.icv_fullbits/8);
+			pr_info("ESP: %s digestsize %u != %hu\n",
+				x->aalg->alg_name,
+				crypto_aead_authsize(aead),
+				aalg_desc->uinfo.auth.icv_fullbits / 8);
 			goto free_key;
 		}

^ permalink raw reply related

* Re: [PATCH net v4] ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs
From: Hannes Frederic Sowa @ 2014-11-05 23:32 UTC (permalink / raw)
  To: Daniel Borkmann, davem; +Cc: lw1a2.jing, netdev, Eric Dumazet, David L Stevens
In-Reply-To: <1415215658-10054-1-git-send-email-dborkman@redhat.com>

On Wed, Nov 5, 2014, at 20:27, Daniel Borkmann wrote:
> It has been reported that generating an MLD listener report on
> devices with large MTUs (e.g. 9000) and a high number of IPv6
> addresses can trigger a skb_over_panic():
> 
> skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
> head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
> dev:port1
>  ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:100!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: ixgbe(O)
> CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
> [...]
> Call Trace:
>  <IRQ>
>  [<ffffffff80578226>] ? skb_put+0x3a/0x3b
>  [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
>  [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
>  [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
>  [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
>  [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
>  [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
>  [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
>  [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
>  [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
>  [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
> 
> mld_newpack() skb allocations are usually requested with dev->mtu
> in size, since commit 72e09ad107e7 ("ipv6: avoid high order allocations")
> we have changed the limit in order to be less likely to fail.
> 
> However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
> macros, which determine if we may end up doing an skb_put() for
> adding another record. To avoid possible fragmentation, we check
> the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
> assumption as the actual max allocation size can be much smaller.
> 
> The IGMP case doesn't have this issue as commit 57e1ab6eaddc
> ("igmp: refine skb allocations") stores the allocation size in
> the cb[].
> 
> Set a reserved_tailroom to make it fit into the MTU and use
> skb_availroom() helper instead. This also allows to get rid of
> igmp_skb_size().
> 
> Reported-by: Wei Liu <lw1a2.jing@gmail.com>
> Fixes: 72e09ad107e7 ("ipv6: avoid high order allocations")
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: David L Stevens <david.stevens@oracle.com>

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks and sorry for the back and forth, Daniel!
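
For reference, the mismatch in the quoted report can be worked through
with a small user-space sketch. This is not the patch's code: toy_skb
and the helpers are hypothetical names, the MTU of 9000 is the example
value from the report, and the numbers assume that the len and tail
printed by skb_over_panic() already include the 20 bytes being put.

```c
struct toy_skb {
	unsigned int len;	/* bytes of data currently in the buffer */
	unsigned int tail;	/* offset of the end of data from head */
	unsigned int end;	/* offset of the end of buffer from head */
};

/* Room as an MTU-based check sees it: device MTU minus current length. */
static unsigned int room_by_mtu(const struct toy_skb *skb, unsigned int mtu)
{
	return mtu > skb->len ? mtu - skb->len : 0;
}

/* Room that is really left in the allocated buffer. */
static unsigned int room_in_buffer(const struct toy_skb *skb)
{
	return skb->end > skb->tail ? skb->end - skb->tail : 0;
}

/* Would appending 'put' bytes run past the end of the buffer? */
static int put_overruns(const struct toy_skb *skb, unsigned int put)
{
	return put > room_in_buffer(skb);
}
```

Under those assumptions the state before the failing put is len 3756,
tail 3772 (0xed0 - 20) and end 3776 (0xec0). The MTU-based check sees
5244 bytes of room, while the buffer has only 4 left, so a 20 byte put
overruns it.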

^ permalink raw reply

* Re: [PATCH 00/20] kselftest install target feature
From: Kees Cook @ 2014-11-05 23:23 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Greg KH, Andrew Morton, Michal Marek, David S. Miller, Phong Tran,
	David Herrmann, Hugh Dickins, pranith kumar, Eric W. Biederman,
	Serge E. Hallyn, linux-kbuild, LKML, Linux API,
	Network Development
In-Reply-To: <5459650E.6070201@osg.samsung.com>

On Tue, Nov 4, 2014 at 3:45 PM, Shuah Khan <shuahkh@osg.samsung.com> wrote:
> On 11/04/2014 12:22 PM, Kees Cook wrote:
>> On Tue, Nov 4, 2014 at 9:10 AM, Shuah Khan <shuahkh@osg.samsung.com> wrote:
>>> This patch series adds a new kselftest_install make target
>>> to enable selftest install. When make kselftest_install is
>>> run, selftests are installed on the system. A new install
>>> target is added to selftests Makefile which will install
>>> targets for the tests that are specified in INSTALL_TARGETS.
>>> During install, a script is generated to run tests that are
>>> installed. This script will be installed in the selftest install
>>> directory. Individual test Makefiles are changed to add to the
>>> script. This will allow new tests to add install and run test
>>> commands to the generated kselftest script.
>>
>> I'm all for making the self tests more available, but I don't think
>> this is the right approach. My primary objection is that it creates a
>> second way to run tests, and that means any changes and additions need
>> to be updated in two places. I'd much rather just maintain the single
>> "make" targets instead. Having "make" available on the target device
>> doesn't seem too bad to me. Is there a reason that doesn't work for
>> your situation?
>
> Kees,
>
> My primary objective is to provide a way to install selftests for a
> specific kernel release. This will allow developers to run tests for
> a specific release and look for regressions. Adding an install target
> will also help support local execution of tests in virtualized
> environments. In some cases such as qemu, it is not practical to
> expect the target to have support for "make". Once tests are installed
> to be run outside the git environment, we need a master script that
> can run the tests.
>
> We have the ability to run all tests via make kselftest target or
> run a specific test using the individual test's run_tests target.
> Both of above are necessary to support running tests from the tree.
> Embedding run_tests logic in the makefiles doesn't work very well
> in the long run.
>
> We also need a way to run them outside tree. I agree with you that
> the way I added the script generation, duplicates the code in individual
> run_tests targets and that changes/updates need to be made in both
> places.
>
> Would you be ok with the approach if I fixed the duplicating
> problem? I can address the duplication concern easily.

Yeah, getting rid of duplication would be much preferred. Thanks!

-Kees

>
>>
>> I would, however, like to see some better standardization of the test
>> "framework" that we've got in there already. (For example, some
>> failures fail the "make", some don't, there are various reporting
>> methods for success/failure depending on the test, etc.)
>
> This is being addressed and I have the framework in linux-kselftest
> git next branch at the moment. I do think the above work is part of
> addressing the larger framework issues such as being able to run tests
> on a target system that might not have "make" support and makes it
> easier to use.
>
> thanks,
> -- Shuah
>
>
> --
> Shuah Khan
> Sr. Linux Kernel Developer
> Samsung Research America (Silicon Valley)
> shuahkh@osg.samsung.com | (970) 217-8978



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply

* Re: [GIT net-next] Open vSwitch
From: Pravin Shelar @ 2014-11-05 22:52 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20141105.151047.1621156460688575485.davem@davemloft.net>

On Wed, Nov 5, 2014 at 12:10 PM, David Miller <davem@davemloft.net> wrote:
>
> Please do not submit your patches such that the email Date: field is
> the commit's date.  You're not posting these on Nov. 4th, yet that
> is the Date: field on all of the individual patch emails.
>
> I want them to be the date at the time you post the patch to the mailing
> list.
>
> Otherwise the ordering in patchwork is not chronological wrt. the list's
> postings and this makes my work more difficult than it needs to be.
>
Sorry about the Date field. NTP stopped working on my machine, which is
why the date got messed up.

^ permalink raw reply
