Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [patch net-next 1/2 v3] tc: add BPF based action
From: Jiri Pirko @ 2015-01-14 15:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Network Development, David S. Miller,
	Jamal Hadi Salim, Hannes Frederic Sowa
In-Reply-To: <CAMEtUux+HOOegzyi82fYT_JMDX_+d0dZCQ=Zc5GWC-awQmJu1A@mail.gmail.com>

Wed, Jan 14, 2015 at 04:39:34PM CET, ast@plumgrid.com wrote:
>On Wed, Jan 14, 2015 at 5:28 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
>>
>> I'm still wondering about the drop semantics ... wouldn't it be more
>> intuitive to use 0 for drops in this context?
>
>good point.
>I think it must be 0 to match behavior of socket filters, etc.
>If program tries to access beyond packet size or does divide
>by zero if will be terminated and will return 0.
>So zero should be the safest action from caller point of view.


Will do. Thanks!

^ permalink raw reply

* Re: [PATCH net 0/3]tg3: synchronize_irq() should be called without taking locks
From: Peter Hurley @ 2015-01-14 15:57 UTC (permalink / raw)
  To: Prashant Sreedharan, davem; +Cc: netdev, mchan
In-Reply-To: <1421217039-11689-1-git-send-email-prashant@broadcom.com>

On 01/14/2015 01:30 AM, Prashant Sreedharan wrote:
> Prashant Sreedharan (3):
>   tg3_timer() should grab tp->lock before checking for tp->irq_sync
>   tg3_reset_task() needs to use rtnl_lock to synchronize
>   Release tp->lock before invoking synchronize_irq()

Thanks!

For series:

Reported-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Peter Hurley <peter@hurleysoftware.com>

But maybe one of these patches should reference that this fixes
BUG: sleeping function... so that others can quickly find this
fix (if they're bisecting or whatever). For the same reason, it
might be useful for this series to be just one patch.

Regards,
Peter Hurley

^ permalink raw reply

* Re: [3.19-rc3] tg3: BUG: sleeping function called from invalid context
From: Peter Hurley @ 2015-01-14 16:06 UTC (permalink / raw)
  To: Prashant Sreedharan; +Cc: Michael Chan, netdev, Linux kernel
In-Reply-To: <1421116255.16485.14.camel@prashant>

On 01/12/2015 09:30 PM, Prashant Sreedharan wrote:
> On Mon, 2015-01-12 at 19:59 -0500, Peter Hurley wrote:
>> On 3.19-rc3, I'm seeing this might_sleep() warning [1] from the tg3_open()
>> call stack. Let me know if I need to bisect this.
>>
>> Regards,
>> Peter Hurley
>>
>> [1]
>>
>> [   17.203009] BUG: sleeping function called from invalid context at /home/peter/src/kernels/mainline/kernel/irq/manage.c:104
>> [   17.203067] in_atomic(): 1, irqs_disabled(): 0, pid: 1106, name: ip
>> [   17.203092] 2 locks held by ip/1106:
>> [   17.205255]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816adf1f>] rtnetlink_rcv+0x1f/0x40
>> [   17.207445]  #1:  (&(&tp->lock)->rlock){+.....}, at: [<ffffffffa01073e6>] tg3_start+0xc06/0x11f0 [tg3]
>> [   17.209725] CPU: 2 PID: 1106 Comm: ip Not tainted 3.19.0-rc3+wip-xeon+lockdep #rc3+wip
>> [   17.211900] Hardware name: Dell Inc. Precision WorkStation T5400  /0RW203, BIOS A11 04/30/2012
>> [   17.214086]  0000000000000068 ffff8802ac823498 ffffffff817af7e8 0000000000000005
>> [   17.216265]  ffffffff81a9be78 ffff8802ac8234a8 ffffffff810998a5 ffff8802ac8234d8
>> [   17.218446]  ffffffff8109991a ffff8802ac8234c8 ffff8802af0aae00 ffffffffa00ed000
>> [   17.220636] Call Trace:
>> [   17.222743]  [<ffffffff817af7e8>] dump_stack+0x4f/0x7b
>> [   17.224808]  [<ffffffff810998a5>] ___might_sleep+0x105/0x140
>> [   17.226842]  [<ffffffff8109991a>] __might_sleep+0x3a/0xa0
>> [   17.228869]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
>> [   17.230939]  [<ffffffff810d7d78>] synchronize_irq+0x38/0xa0
>> [   17.232967]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
>> [   17.234991]  [<ffffffffa010105f>] tg3_chip_reset+0x13f/0x9c0 [tg3]
>> [   17.236988]  [<ffffffffa01020ae>] tg3_reset_hw+0x7e/0x2d20 [tg3]
>> [   17.238996]  [<ffffffff813bfaff>] ? __udelay+0x2f/0x40
>> [   17.241007]  [<ffffffffa00ef2f7>] ? _tw32_flush+0x47/0x80 [tg3]
>> [   17.243066]  [<ffffffffa0104dac>] tg3_init_hw+0x5c/0x70 [tg3]
>> [   17.245438]  [<ffffffffa010740b>] tg3_start+0xc2b/0x11f0 [tg3]
>> [   17.247444]  [<ffffffffa0107ad7>] ? tg3_open+0x107/0x2e0 [tg3]
>> [   17.249556]  [<ffffffff810c338d>] ? trace_hardirqs_on+0xd/0x10
>> [   17.251581]  [<ffffffff8107806f>] ? __local_bh_enable_ip+0x6f/0x100
>> [   17.253710]  [<ffffffffa0107af8>] tg3_open+0x128/0x2e0 [tg3]
>> [   17.255758]  [<ffffffff816ba3f5>] ? netpoll_poll_disable+0x5/0xa0
>> [   17.257932]  [<ffffffff816a14af>] __dev_open+0xbf/0x140
>> [   17.260091]  [<ffffffff816a17c1>] __dev_change_flags+0xa1/0x160
>> [   17.262222]  [<ffffffff816a18a9>] dev_change_flags+0x29/0x60
>> [   17.264360]  [<ffffffff816b0e02>] do_setlink+0x2f2/0xa30
>> [   17.266431]  [<ffffffff816b1b7f>] rtnl_newlink+0x51f/0x750
>> [   17.268485]  [<ffffffff816b1749>] ? rtnl_newlink+0xe9/0x750
>> [   17.270483]  [<ffffffff811869c2>] ? free_pages_prepare+0x1d2/0x270
>> [   17.272507]  [<ffffffff810c32bd>] ? trace_hardirqs_on_caller+0x11d/0x1e0
>> [   17.274531]  [<ffffffff813dd1b2>] ? nla_parse+0x32/0x120
>> [   17.276531]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
>> [   17.278514]  [<ffffffff816adfd5>] rtnetlink_rcv_msg+0x95/0x250
>> [   17.280485]  [<ffffffff8109f699>] ? preempt_count_sub+0x49/0x50
>> [   17.282448]  [<ffffffff817b4a02>] ? mutex_lock_nested+0x382/0x530
>> [   17.284402]  [<ffffffff816adf1f>] ? rtnetlink_rcv+0x1f/0x40
>> [   17.286290]  [<ffffffff816adf1f>] ? rtnetlink_rcv+0x1f/0x40
>> [   17.288142]  [<ffffffff816adf40>] ? rtnetlink_rcv+0x40/0x40
>> [   17.290031]  [<ffffffff816cedc1>] netlink_rcv_skb+0xc1/0xe0
>> [   17.291836]  [<ffffffff816adf2e>] rtnetlink_rcv+0x2e/0x40
>> [   17.293615]  [<ffffffff816ce473>] netlink_unicast+0xf3/0x1d0
>> [   17.295420]  [<ffffffff816ce863>] netlink_sendmsg+0x313/0x690
>> [   17.297132]  [<ffffffff811ada4f>] ? might_fault+0x5f/0xb0
>> [   17.298799]  [<ffffffff8168253c>] do_sock_sendmsg+0x8c/0x100
>> [   17.300493]  [<ffffffff81681e3e>] ? copy_msghdr_from_user+0x15e/0x1f0
>> [   17.302173]  [<ffffffff81682aeb>] ___sys_sendmsg+0x30b/0x320
>> [   17.303798]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
>> [   17.305431]  [<ffffffff810bdee0>] ? cpuacct_account_field+0x80/0xb0
>> [   17.307085]  [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
>> [   17.308744]  [<ffffffff810a4f35>] ? sched_clock_local+0x25/0x90
>> [   17.310375]  [<ffffffff810a5dc1>] ? vtime_account_user+0x91/0xa0
>> [   17.311948]  [<ffffffff810a5198>] ? sched_clock_cpu+0xb8/0xe0
>> [   17.313509]  [<ffffffff810bf8be>] ? put_lock_stats.isra.26+0xe/0x30
>> [   17.315069]  [<ffffffff810c007e>] ? lock_release_holdtime.part.27+0x12e/0x1b0
>> [   17.316618]  [<ffffffff810a5dc1>] ? vtime_account_user+0x91/0xa0
>> [   17.318162]  [<ffffffff8109f5d1>] ? get_parent_ip+0x11/0x50
>> [   17.319703]  [<ffffffff8109f699>] ? preempt_count_sub+0x49/0x50
>> [   17.321235]  [<ffffffff811807e5>] ? context_tracking_user_exit+0x55/0x130
>> [   17.322732]  [<ffffffff811807e5>] ? context_tracking_user_exit+0x55/0x130
>> [   17.324197]  [<ffffffff816834f2>] __sys_sendmsg+0x42/0x80
>> [   17.325634]  [<ffffffff81683542>] SyS_sendmsg+0x12/0x20
>> [   17.327048]  [<ffffffff817ba12d>] system_call_fastpath+0x16/0x1b
> 
> Please bisect, there hasn't been tg3 code changes in this path that
> might cause this.

What triggers this is the new debugging code added to catch nested
sleeps; specifically e22b886 ("sched/wait: Add might_sleep() checks").

Regards,
Peter Hurley

^ permalink raw reply

* Re: [PATCH iproute2 0/3] ip netns: Run over all netns
From: Vadim Kochan @ 2015-01-14 16:13 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Vadim Kochan, netdev
In-Reply-To: <20150113172837.793f0969@urahara>

On Tue, Jan 13, 2015 at 05:28:37PM -0800, Stephen Hemminger wrote:
> On Wed,  7 Jan 2015 13:04:19 +0200
> Vadim Kochan <vadim4j@gmail.com> wrote:
> 
> > From: Vadim Kochan <vadim4j@gmail.com>
> > 
> > Allow 'ip netns del' and 'ip netns exec' run over each network namespace names.
> > 
> > 'ip netns exec' executes command forcely on eacn nsname.
> > 
> > Vadim Kochan (3):
> >   lib: Exec func on each netns
> >   ip netns: Allow exec on each netns
> >   ip netns: Delete all netns
> > 
> >  include/namespace.h |  6 ++++
> >  include/utils.h     |  5 +++
> >  ip/ipnetns.c        | 96 ++++++++++++++++++++++++++++++++---------------------
> >  lib/namespace.c     | 22 ++++++++++++
> >  lib/utils.c         | 28 ++++++++++++++++
> >  man/man8/ip-netns.8 | 26 ++++++++++++---
> >  6 files changed, 141 insertions(+), 42 deletions(-)
> > 
> 
> It is a useful concept but as others have pointed out the idea of reserving
> a keyword 'all' at this point is likely to cause somebody to break.
> So sorry.
OK, then what about additional option like:

    $ ip -all netns exec ...

?

Thanks,

^ permalink raw reply

* Re: [PATCH v4 20/20] kbuild: add a new kselftest_install make target to install selftests
From: Shuah Khan @ 2015-01-14 16:32 UTC (permalink / raw)
  To: mmarek, masami.hiramatsu.pt
  Cc: gregkh, akpm, rostedt, mingo, davem, keescook, tranmanphong, mpe,
	cov, dh.herrmann, hughd, bobby.prani, serge.hallyn, ebiederm,
	tim.bird, josh, koct9i, linux-kbuild, linux-kernel, linux-api,
	netdev
In-Reply-To: <2c5a28faaa79d9c2415854a08817ada509fcb943.1420571615.git.shuahkh@osg.samsung.com>

On 01/06/2015 12:43 PM, Shuah Khan wrote:
> Add a new make target to install to install kernel selftests.
> This new target will build and install selftests. kselftest
> target now depends on kselftest_install and runs the generated
> kselftest script to reduce duplicate work and for common look
> and feel when running tests.
> 
> make kselftest_target:
> -- exports kselftest INSTALL_KSFT_PATH
>    default $(INSTALL_MOD_PATH)/lib/kselftest/$(KERNELRELEASE)
> -- exports INSTALL_KSFT_PATH
> -- runs selftests make install target
> 
> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
> ---
>  Makefile | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)

Hi Marek,

Could you please Ack this patch, if this version looks good,
so I can take this through ksefltest tree.

thanks,
-- Shuah


-- 
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America (Silicon Valley)
shuahkh@osg.samsung.com | (970) 217-8978

^ permalink raw reply

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
From: Eric Dumazet @ 2015-01-14 16:34 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: arnd, robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
	linux-arm-kernel, xuwei5, zhangfei.gao, netdev, devicetree, linux
In-Reply-To: <1421217254-12008-4-git-send-email-dingtianhong@huawei.com>

On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
> 
> v13: Fix the problem of alignment parameters for function and checkpatch warming.
> 
> v12: According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
>      for hip04 ethernet.
> 
> v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
>      is not supported for this patch, but I think it could work for hip04,
>      will support it later after some tests for performance better.
> 
>      Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
>      it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
> 
>      - Before:
>      $ ping 192.168.1.1 ...
>      === 192.168.1.1 ping statistics ===
>      24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>      rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
> 
>      $ iperf -c 192.168.1.1 ...
>      [ ID] Interval       Transfer     Bandwidth
>      [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec
> 
>      - After:
>      $ ping 192.168.1.1 ...
>      === 192.168.1.1 ping statistics ===
>      24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>      rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
> 
>      $ iperf -c 192.168.1.1 ...
>      [ ID] Interval       Transfer     Bandwidth
>      [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec
> 
> v10: According David Miller and Arnd Bergmann's suggestion, add some modification
>      for v9 version
>      - drop the workqueue
>      - batch cleanup based on tx_coalesce_frames/usecs for better throughput
>      - use a reasonable default tx timeout (200us, could be shorted
>        based on measurements) with a range timer
>      - fix napi poll function return value
>      - use a lockless queue for cleanup
> 
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
>  drivers/net/ethernet/hisilicon/Makefile    |   2 +-
>  drivers/net/ethernet/hisilicon/hip04_eth.c | 969 +++++++++++++++++++++++++++++
>  2 files changed, 970 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
> 
> diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
> index 40115a7..6c14540 100644
> --- a/drivers/net/ethernet/hisilicon/Makefile
> +++ b/drivers/net/ethernet/hisilicon/Makefile
> @@ -3,4 +3,4 @@
>  #
>  
>  obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
> -obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
> +obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
> new file mode 100644
> index 0000000..525214e
> --- /dev/null
> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
> @@ -0,0 +1,969 @@
> +
> +/* Copyright (c) 2014 Linaro Ltd.
> + * Copyright (c) 2014 Hisilicon Limited.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/etherdevice.h>
> +#include <linux/platform_device.h>
> +#include <linux/interrupt.h>
> +#include <linux/ktime.h>
> +#include <linux/of_address.h>
> +#include <linux/phy.h>
> +#include <linux/of_mdio.h>
> +#include <linux/of_net.h>
> +#include <linux/mfd/syscon.h>
> +#include <linux/regmap.h>
> +
> +#define PPE_CFG_RX_ADDR			0x100
> +#define PPE_CFG_POOL_GRP		0x300
> +#define PPE_CFG_RX_BUF_SIZE		0x400
> +#define PPE_CFG_RX_FIFO_SIZE		0x500
> +#define PPE_CURR_BUF_CNT		0xa200
> +
> +#define GE_DUPLEX_TYPE			0x08
> +#define GE_MAX_FRM_SIZE_REG		0x3c
> +#define GE_PORT_MODE			0x40
> +#define GE_PORT_EN			0x44
> +#define GE_SHORT_RUNTS_THR_REG		0x50
> +#define GE_TX_LOCAL_PAGE_REG		0x5c
> +#define GE_TRANSMIT_CONTROL_REG		0x60
> +#define GE_CF_CRC_STRIP_REG		0x1b0
> +#define GE_MODE_CHANGE_REG		0x1b4
> +#define GE_RECV_CONTROL_REG		0x1e0
> +#define GE_STATION_MAC_ADDRESS		0x210
> +#define PPE_CFG_CPU_ADD_ADDR		0x580
> +#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
> +#define PPE_CFG_BUS_CTRL_REG		0x424
> +#define PPE_CFG_RX_CTRL_REG		0x428
> +#define PPE_CFG_RX_PKT_MODE_REG		0x438
> +#define PPE_CFG_QOS_VMID_GEN		0x500
> +#define PPE_CFG_RX_PKT_INT		0x538
> +#define PPE_INTEN			0x600
> +#define PPE_INTSTS			0x608
> +#define PPE_RINT			0x604
> +#define PPE_CFG_STS_MODE		0x700
> +#define PPE_HIS_RX_PKT_CNT		0x804
> +
> +/* REG_INTERRUPT */
> +#define RCV_INT				BIT(10)
> +#define RCV_NOBUF			BIT(8)
> +#define RCV_DROP			BIT(7)
> +#define TX_DROP				BIT(6)
> +#define DEF_INT_ERR			(RCV_NOBUF | RCV_DROP | TX_DROP)
> +#define DEF_INT_MASK			(RCV_INT | DEF_INT_ERR)
> +
> +/* TX descriptor config */
> +#define TX_FREE_MEM			BIT(0)
> +#define TX_READ_ALLOC_L3		BIT(1)
> +#define TX_FINISH_CACHE_INV		BIT(2)
> +#define TX_CLEAR_WB			BIT(4)
> +#define TX_L3_CHECKSUM			BIT(5)
> +#define TX_LOOP_BACK			BIT(11)
> +
> +/* RX error */
> +#define RX_PKT_DROP			BIT(0)
> +#define RX_L2_ERR			BIT(1)
> +#define RX_PKT_ERR			(RX_PKT_DROP | RX_L2_ERR)
> +
> +#define SGMII_SPEED_1000		0x08
> +#define SGMII_SPEED_100			0x07
> +#define SGMII_SPEED_10			0x06
> +#define MII_SPEED_100			0x01
> +#define MII_SPEED_10			0x00
> +
> +#define GE_DUPLEX_FULL			BIT(0)
> +#define GE_DUPLEX_HALF			0x00
> +#define GE_MODE_CHANGE_EN		BIT(0)
> +
> +#define GE_TX_AUTO_NEG			BIT(5)
> +#define GE_TX_ADD_CRC			BIT(6)
> +#define GE_TX_SHORT_PAD_THROUGH		BIT(7)
> +
> +#define GE_RX_STRIP_CRC			BIT(0)
> +#define GE_RX_STRIP_PAD			BIT(3)
> +#define GE_RX_PAD_EN			BIT(4)
> +
> +#define GE_AUTO_NEG_CTL			BIT(0)
> +
> +#define GE_RX_INT_THRESHOLD		BIT(6)
> +#define GE_RX_TIMEOUT			0x04
> +
> +#define GE_RX_PORT_EN			BIT(1)
> +#define GE_TX_PORT_EN			BIT(2)
> +
> +#define PPE_CFG_STS_RX_PKT_CNT_RC	BIT(12)
> +
> +#define PPE_CFG_RX_PKT_ALIGN		BIT(18)
> +#define PPE_CFG_QOS_VMID_MODE		BIT(14)
> +#define PPE_CFG_QOS_VMID_GRP_SHIFT	8
> +
> +#define PPE_CFG_RX_FIFO_FSFU		BIT(11)
> +#define PPE_CFG_RX_DEPTH_SHIFT		16
> +#define PPE_CFG_RX_START_SHIFT		0
> +#define PPE_CFG_RX_CTRL_ALIGN_SHIFT	11
> +
> +#define PPE_CFG_BUS_LOCAL_REL		BIT(14)
> +#define PPE_CFG_BUS_BIG_ENDIEN		BIT(0)
> +
> +#define RX_DESC_NUM			128
> +#define TX_DESC_NUM			256
> +#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
> +#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
> +
> +#define GMAC_PPE_RX_PKT_MAX_LEN		379
> +#define GMAC_MAX_PKT_LEN		1516
> +#define GMAC_MIN_PKT_LEN		31
> +#define RX_BUF_SIZE			1600
> +#define RESET_TIMEOUT			1000
> +#define TX_TIMEOUT			(6 * HZ)
> +
> +#define DRV_NAME			"hip04-ether"
> +#define DRV_VERSION			"v1.0"
> +
> +#define HIP04_MAX_TX_COALESCE_USECS	200
> +#define HIP04_MIN_TX_COALESCE_USECS	100
> +#define HIP04_MAX_TX_COALESCE_FRAMES	200
> +#define HIP04_MIN_TX_COALESCE_FRAMES	100
> +
> +struct tx_desc {
> +	u32 send_addr;

	__be32 send_adddr; ?

> +	u32 send_size;

	__be32

> +	u32 next_addr;
	__be32

> +	u32 cfg;
	__be32

> +	u32 wb_addr;
	__be32 wb_addr ?

> +} __aligned(64);
> +
> +struct rx_desc {
> +	u16 reserved_16;
> +	u16 pkt_len;
> +	u32 reserve1[3];
> +	u32 pkt_err;
> +	u32 reserve2[4];
> +};
> +
> +struct hip04_priv {
> +	void __iomem *base;
> +	int phy_mode;
> +	int chan;
> +	unsigned int port;
> +	unsigned int speed;
> +	unsigned int duplex;
> +	unsigned int reg_inten;
> +
> +	struct napi_struct napi;
> +	struct net_device *ndev;
> +
> +	struct tx_desc *tx_desc;
> +	dma_addr_t tx_desc_dma;
> +	struct sk_buff *tx_skb[TX_DESC_NUM];
> +	dma_addr_t tx_phys[TX_DESC_NUM];

This is not an efficient way to store skb/phys, as for each skb, info
will be store in 2 separate cache lines.

It would be better to use a 

struct hip04_tx_desc {
   struct sk_buff   *skb;
   dma_addr_t       phys;
} 

> +	unsigned int tx_head;
> +
> +	int tx_coalesce_frames;
> +	int tx_coalesce_usecs;
> +	struct hrtimer tx_coalesce_timer;
> +
> +	unsigned char *rx_buf[RX_DESC_NUM];
> +	dma_addr_t rx_phys[RX_DESC_NUM];

Same thing here : Use a struct to get better data locality.

> +	unsigned int rx_head;
> +	unsigned int rx_buf_size;
> +
> +	struct device_node *phy_node;
> +	struct phy_device *phy;
> +	struct regmap *map;
> +	struct work_struct tx_timeout_task;
> +
> +	/* written only by tx cleanup */
> +	unsigned int tx_tail ____cacheline_aligned_in_smp;
> +};
> +
> +static inline unsigned int tx_count(unsigned int head, unsigned int tail)
> +{
> +	return (head - tail) % (TX_DESC_NUM - 1);
> +}
> +
> +static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	u32 val;
> +
> +	priv->speed = speed;
> +	priv->duplex = duplex;
> +
> +	switch (priv->phy_mode) {
> +	case PHY_INTERFACE_MODE_SGMII:
> +		if (speed == SPEED_1000)
> +			val = SGMII_SPEED_1000;
> +		else if (speed == SPEED_100)
> +			val = SGMII_SPEED_100;
> +		else
> +			val = SGMII_SPEED_10;
> +		break;
> +	case PHY_INTERFACE_MODE_MII:
> +		if (speed == SPEED_100)
> +			val = MII_SPEED_100;
> +		else
> +			val = MII_SPEED_10;
> +		break;
> +	default:
> +		netdev_warn(ndev, "not supported mode\n");
> +		val = MII_SPEED_10;
> +		break;
> +	}
> +	writel_relaxed(val, priv->base + GE_PORT_MODE);
> +
> +	val = duplex ? GE_DUPLEX_FULL : GE_DUPLEX_HALF;
> +	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
> +
> +	val = GE_MODE_CHANGE_EN;
> +	writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG);
> +}
> +
> +static void hip04_reset_ppe(struct hip04_priv *priv)
> +{
> +	u32 val, tmp, timeout = 0;
> +
> +	do {
> +		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
> +		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
> +		if (timeout++ > RESET_TIMEOUT)
> +			break;
> +	} while (val & 0xfff);
> +}
> +
> +static void hip04_config_fifo(struct hip04_priv *priv)
> +{
> +	u32 val;
> +
> +	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
> +	val |= PPE_CFG_STS_RX_PKT_CNT_RC;
> +	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
> +
> +	val = BIT(priv->port);
> +	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
> +
> +	val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT;
> +	val |= PPE_CFG_QOS_VMID_MODE;
> +	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
> +
> +	val = RX_BUF_SIZE;
> +	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
> +
> +	val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT;
> +	val |= PPE_CFG_RX_FIFO_FSFU;
> +	val |= priv->chan << PPE_CFG_RX_START_SHIFT;
> +	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
> +
> +	val = NET_IP_ALIGN << PPE_CFG_RX_CTRL_ALIGN_SHIFT;
> +	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
> +
> +	val = PPE_CFG_RX_PKT_ALIGN;
> +	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
> +
> +	val = PPE_CFG_BUS_LOCAL_REL | PPE_CFG_BUS_BIG_ENDIEN;
> +	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
> +
> +	val = GMAC_PPE_RX_PKT_MAX_LEN;
> +	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
> +
> +	val = GMAC_MAX_PKT_LEN;
> +	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
> +
> +	val = GMAC_MIN_PKT_LEN;
> +	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
> +
> +	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
> +	val |= GE_TX_AUTO_NEG | GE_TX_ADD_CRC | GE_TX_SHORT_PAD_THROUGH;
> +	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
> +
> +	val = GE_RX_STRIP_CRC;
> +	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
> +
> +	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
> +	val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN;
> +	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
> +
> +	val = GE_AUTO_NEG_CTL;
> +	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
> +}
> +
> +static void hip04_mac_enable(struct net_device *ndev)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	u32 val;
> +
> +	/* enable tx & rx */
> +	val = readl_relaxed(priv->base + GE_PORT_EN);
> +	val |= GE_RX_PORT_EN | GE_TX_PORT_EN;
> +	writel_relaxed(val, priv->base + GE_PORT_EN);
> +
> +	/* clear rx int */
> +	val = RCV_INT;
> +	writel_relaxed(val, priv->base + PPE_RINT);
> +
> +	/* config recv int */
> +	val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT;
> +	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
> +
> +	/* enable interrupt */
> +	priv->reg_inten = DEF_INT_MASK;
> +	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +}
> +
> +static void hip04_mac_disable(struct net_device *ndev)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	u32 val;
> +
> +	/* disable int */
> +	priv->reg_inten &= ~(DEF_INT_MASK);
> +	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +
> +	/* disable tx & rx */
> +	val = readl_relaxed(priv->base + GE_PORT_EN);
> +	val &= ~(GE_RX_PORT_EN | GE_TX_PORT_EN);
> +	writel_relaxed(val, priv->base + GE_PORT_EN);
> +}
> +
> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
> +{
> +	writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR);
> +}
> +
> +static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
> +{
> +	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
> +}
> +
> +static u32 hip04_recv_cnt(struct hip04_priv *priv)
> +{
> +	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
> +}
> +
> +static void hip04_update_mac_address(struct net_device *ndev)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +
> +	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
> +		       priv->base + GE_STATION_MAC_ADDRESS);
> +	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
> +			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
> +		       priv->base + GE_STATION_MAC_ADDRESS + 4);
> +}
> +
> +static int hip04_set_mac_address(struct net_device *ndev, void *addr)
> +{
> +	eth_mac_addr(ndev, addr);
> +	hip04_update_mac_address(ndev);
> +	return 0;
> +}
> +
> +static int hip04_tx_reclaim(struct net_device *ndev, bool force)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	unsigned tx_tail = priv->tx_tail;
> +	struct tx_desc *desc;
> +	unsigned int bytes_compl = 0, pkts_compl = 0;
> +	unsigned int count;
> +
> +	smp_rmb();
> +	count = tx_count(ACCESS_ONCE(priv->tx_head), tx_tail);
> +	if (count == 0)
> +		goto out;
> +
> +	while (count) {
> +		desc = &priv->tx_desc[tx_tail];
> +		if (desc->send_addr != 0) {
> +			if (force)
> +				desc->send_addr = 0;
> +			else
> +				break;
> +		}
> +
> +		if (priv->tx_phys[tx_tail]) {
> +			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
> +					 priv->tx_skb[tx_tail]->len,
> +					 DMA_TO_DEVICE);
> +			priv->tx_phys[tx_tail] = 0;
> +		}
> +		pkts_compl++;
> +		bytes_compl += priv->tx_skb[tx_tail]->len;
> +		dev_kfree_skb(priv->tx_skb[tx_tail]);
> +		priv->tx_skb[tx_tail] = NULL;
> +		tx_tail = TX_NEXT(tx_tail);
> +		count--;
> +	}
> +
> +	priv->tx_tail = tx_tail;
> +	smp_wmb(); /* Ensure tx_tail visible to xmit */
> +
> +out:
> +	if (pkts_compl || bytes_compl)

Testing bytes_compl should be enough : There is no way pkt_compl could
be 0 if bytes_compl is not 0.

> +		netdev_completed_queue(ndev, pkts_compl, bytes_compl);
> +
> +	if (unlikely(netif_queue_stopped(ndev)) && (count < (TX_DESC_NUM - 1)))
> +		netif_wake_queue(ndev);
> +
> +	return count;
> +}
> +
> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	struct net_device_stats *stats = &ndev->stats;
> +	unsigned int tx_head = priv->tx_head, count;
> +	struct tx_desc *desc = &priv->tx_desc[tx_head];
> +	dma_addr_t phys;
> +
> +	smp_rmb();
> +	count = tx_count(tx_head, ACCESS_ONCE(priv->tx_tail));
> +	if (count == (TX_DESC_NUM - 1)) {
> +		netif_stop_queue(ndev);
> +		return NETDEV_TX_BUSY;
> +	}
> +
> +	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
> +	if (dma_mapping_error(&ndev->dev, phys)) {
> +		dev_kfree_skb(skb);
> +		return NETDEV_TX_OK;
> +	}
> +
> +	priv->tx_skb[tx_head] = skb;
> +	priv->tx_phys[tx_head] = phys;
> +	desc->send_addr = cpu_to_be32(phys);
> +	desc->send_size = cpu_to_be32(skb->len);
> +	desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
> +	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
> +	desc->wb_addr = cpu_to_be32(phys);
> +	skb_tx_timestamp(skb);
> +
> +	hip04_set_xmit_desc(priv, phys);
> +	priv->tx_head = TX_NEXT(tx_head);
> +	count++;

Starting from this point, skb might already have been freed by TX
completion.

Its racy to access skb->len 

> +	netdev_sent_queue(ndev, skb->len);
> +
> +	stats->tx_bytes += skb->len;
> +	stats->tx_packets++;
> +
> +	/* Ensure tx_head update visible to tx reclaim */
> +	smp_wmb();
> +
> +	/* queue is getting full, better start cleaning up now */
> +	if (count >= priv->tx_coalesce_frames) {
> +		if (napi_schedule_prep(&priv->napi)) {
> +			/* disable rx interrupt and timer */
> +			priv->reg_inten &= ~(RCV_INT);
> +			writel_relaxed(DEF_INT_MASK & ~RCV_INT,
> +				       priv->base + PPE_INTEN);
> +			hrtimer_cancel(&priv->tx_coalesce_timer);
> +			__napi_schedule(&priv->napi);
> +		}
> +	} else if (!hrtimer_is_queued(&priv->tx_coalesce_timer)) {
> +		/* cleanup not pending yet, start a new timer */
> +		hrtimer_start_expires(&priv->tx_coalesce_timer,
> +				      HRTIMER_MODE_REL);
> +	}
> +
> +	return NETDEV_TX_OK;
> +}
> +
> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> +{
> +	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
> +	struct net_device *ndev = priv->ndev;
> +	struct net_device_stats *stats = &ndev->stats;
> +	unsigned int cnt = hip04_recv_cnt(priv);
> +	struct rx_desc *desc;
> +	struct sk_buff *skb;
> +	unsigned char *buf;
> +	bool last = false;
> +	dma_addr_t phys;
> +	int rx = 0;
> +	int tx_remaining;
> +	u16 len;
> +	u32 err;
> +
> +	while (cnt && !last) {
> +		buf = priv->rx_buf[priv->rx_head];
> +		skb = build_skb(buf, priv->rx_buf_size);
> +		if (unlikely(!skb))
> +			net_dbg_ratelimited("build_skb failed\n");

Well, is skb is NULL, you're crashing later...
You really have to address a memory allocation error much better than
that !

> +
> +		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
> +				 RX_BUF_SIZE, DMA_FROM_DEVICE);
> +		priv->rx_phys[priv->rx_head] = 0;
> +
> +		desc = (struct rx_desc *)skb->data;
> +		len = be16_to_cpu(desc->pkt_len);
> +		err = be32_to_cpu(desc->pkt_err);
> +
> +		if (0 == len) {
> +			dev_kfree_skb_any(skb);
> +			last = true;
> +		} else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {
> +			dev_kfree_skb_any(skb);
> +			stats->rx_dropped++;
> +			stats->rx_errors++;
> +		} else {
> +			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
> +			skb_put(skb, len);
> +			skb->protocol = eth_type_trans(skb, ndev);
> +			napi_gro_receive(&priv->napi, skb);
> +			stats->rx_packets++;
> +			stats->rx_bytes += len;
> +			rx++;
> +		}
> +
> +		buf = netdev_alloc_frag(priv->rx_buf_size);
> +		if (!buf)
> +			goto done;

Same problem here : In case of memory allocation error, your driver is
totally screwed.

> +		phys = dma_map_single(&ndev->dev, buf,
> +				      RX_BUF_SIZE, DMA_FROM_DEVICE);
> +		if (dma_mapping_error(&ndev->dev, phys))
> +			goto done;

Same problem here : You really have to recover properly.

> +		priv->rx_buf[priv->rx_head] = buf;
> +		priv->rx_phys[priv->rx_head] = phys;
> +		hip04_set_recv_desc(priv, phys);
> +
> +		priv->rx_head = RX_NEXT(priv->rx_head);
> +		if (rx >= budget)
> +			goto done;
> +
> +		if (--cnt == 0)
> +			cnt = hip04_recv_cnt(priv);
> +	}
> +
> +	if (!(priv->reg_inten & RCV_INT)) {
> +		/* enable rx interrupt */
> +		priv->reg_inten |= RCV_INT;
> +		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +	}
> +	napi_complete(napi);
> +done:
> +	/* clean up tx descriptors and start a new timer if necessary */
> +	tx_remaining = hip04_tx_reclaim(ndev, false);
> +	if (rx < budget && tx_remaining)
> +		hrtimer_start_expires(&priv->tx_coalesce_timer, HRTIMER_MODE_REL);
> +
> +	return rx;
> +}
> +

^ permalink raw reply

* [patch-net-next v2 2/3] net: ethernet: cpsw: split out IRQ handler
From: Felipe Balbi @ 2015-01-14 16:58 UTC (permalink / raw)
  To: davem
  Cc: Tony Lindgren, Linux OMAP Mailing List, mugunthanvnm, netdev,
	Felipe Balbi
In-Reply-To: <1421254729-10602-1-git-send-email-balbi@ti.com>

Now we can introduce dedicated IRQ handlers
for each of the IRQ events. This helps with
cleaning up a little bit of the clutter in
cpsw_interrupt() while also making sure that
TX IRQs will try to handle TX buffers while
RX IRQs will try to handle RX buffers.

Signed-off-by: Felipe Balbi <balbi@ti.com>
---
 drivers/net/ethernet/ti/cpsw.c | 41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 8e1af51e4b76..c6c483f3e49f 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -754,18 +754,36 @@ requeue:
 		dev_kfree_skb_any(new_skb);
 }
 
-static irqreturn_t cpsw_interrupt(int irq, void *dev_id)
+static irqreturn_t cpsw_dummy_interrupt(int irq, void *dev_id)
 {
 	struct cpsw_priv *priv = dev_id;
 	int value = irq - priv->irqs_table[0];
 
-	/* NOTICE: Ending IRQ here. The trick with the 'value' variable above
-	 * is to make sure we will always write the correct value to the EOI
-	 * register. Namely 0 for RX_THRESH Interrupt, 1 for RX Interrupt, 2
-	 * for TX Interrupt and 3 for MISC Interrupt.
-	 */
 	cpdma_ctlr_eoi(priv->dma, value);
 
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id)
+{
+	struct cpsw_priv *priv = dev_id;
+
+	cpdma_ctlr_eoi(priv->dma, CPDMA_EOI_TX);
+	cpdma_chan_process(priv->txch, 128);
+
+	priv = cpsw_get_slave_priv(priv, 1);
+	if (priv)
+		cpdma_chan_process(priv->txch, 128);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id)
+{
+	struct cpsw_priv *priv = dev_id;
+
+	cpdma_ctlr_eoi(priv->dma, CPDMA_EOI_RX);
+
 	cpsw_intr_disable(priv);
 	if (priv->irq_enabled == true) {
 		cpsw_disable_irq(priv);
@@ -1617,7 +1635,8 @@ static void cpsw_ndo_poll_controller(struct net_device *ndev)
 
 	cpsw_intr_disable(priv);
 	cpdma_ctlr_int_ctrl(priv->dma, false);
-	cpsw_interrupt(ndev->irq, priv);
+	cpsw_rx_interrupt(priv->irq[1], priv);
+	cpsw_tx_interrupt(priv->irq[2], priv);
 	cpdma_ctlr_int_ctrl(priv->dma, true);
 	cpsw_intr_enable(priv);
 }
@@ -2351,7 +2370,7 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_ale_ret;
 
 	priv->irqs_table[0] = irq;
-	ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+	ret = devm_request_irq(&pdev->dev, irq, cpsw_dummy_interrupt,
 			       0, dev_name(&pdev->dev), priv);
 	if (ret < 0) {
 		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
@@ -2363,7 +2382,7 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_ale_ret;
 
 	priv->irqs_table[1] = irq;
-	ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+	ret = devm_request_irq(&pdev->dev, irq, cpsw_rx_interrupt,
 			       0, dev_name(&pdev->dev), priv);
 	if (ret < 0) {
 		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
@@ -2375,7 +2394,7 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_ale_ret;
 
 	priv->irqs_table[2] = irq;
-	ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+	ret = devm_request_irq(&pdev->dev, irq, cpsw_tx_interrupt,
 			       0, dev_name(&pdev->dev), priv);
 	if (ret < 0) {
 		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
@@ -2387,7 +2406,7 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_ale_ret;
 
 	priv->irqs_table[3] = irq;
-	ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+	ret = devm_request_irq(&pdev->dev, irq, cpsw_dummy_interrupt,
 			       0, dev_name(&pdev->dev), priv);
 	if (ret < 0) {
 		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
-- 
2.2.0


^ permalink raw reply related

* [patch-net-next v2 1/3] net: ethernet: cpsw: unroll IRQ request loop
From: Felipe Balbi @ 2015-01-14 16:58 UTC (permalink / raw)
  To: davem
  Cc: Tony Lindgren, Linux OMAP Mailing List, mugunthanvnm, netdev,
	Felipe Balbi

This patch is in preparation for a nicer IRQ
handling scheme where we use different IRQ
handlers for each IRQ line (as it should be).

Later, we will also drop IRQs offset 0 and 3
because they are always disabled in this driver.

Signed-off-by: Felipe Balbi <balbi@ti.com>
---
 drivers/net/ethernet/ti/cpsw.c | 62 ++++++++++++++++++++++++++++++++----------
 1 file changed, 47 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index e61ee8351272..8e1af51e4b76 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2156,7 +2156,8 @@ static int cpsw_probe(struct platform_device *pdev)
 	void __iomem			*ss_regs;
 	struct resource			*res, *ss_res;
 	u32 slave_offset, sliver_offset, slave_size;
-	int ret = 0, i, k = 0;
+	int ret = 0, i;
+	int irq;
 
 	ndev = alloc_etherdev(sizeof(struct cpsw_priv));
 	if (!ndev) {
@@ -2345,24 +2346,55 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_ale_ret;
 	}
 
-	while ((res = platform_get_resource(priv->pdev, IORESOURCE_IRQ, k))) {
-		if (k >= ARRAY_SIZE(priv->irqs_table)) {
-			ret = -EINVAL;
-			goto clean_ale_ret;
-		}
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0)
+		goto clean_ale_ret;
 
-		ret = devm_request_irq(&pdev->dev, res->start, cpsw_interrupt,
-				       0, dev_name(&pdev->dev), priv);
-		if (ret < 0) {
-			dev_err(priv->dev, "error attaching irq (%d)\n", ret);
-			goto clean_ale_ret;
-		}
+	priv->irqs_table[0] = irq;
+	ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+			       0, dev_name(&pdev->dev), priv);
+	if (ret < 0) {
+		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
+		goto clean_ale_ret;
+	}
 
-		priv->irqs_table[k] = res->start;
-		k++;
+	irq = platform_get_irq(pdev, 1);
+	if (irq < 0)
+		goto clean_ale_ret;
+
+	priv->irqs_table[1] = irq;
+	ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+			       0, dev_name(&pdev->dev), priv);
+	if (ret < 0) {
+		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
+		goto clean_ale_ret;
+	}
+
+	irq = platform_get_irq(pdev, 2);
+	if (irq < 0)
+		goto clean_ale_ret;
+
+	priv->irqs_table[2] = irq;
+	ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+			       0, dev_name(&pdev->dev), priv);
+	if (ret < 0) {
+		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
+		goto clean_ale_ret;
+	}
+
+	irq = platform_get_irq(pdev, 3);
+	if (irq < 0)
+		goto clean_ale_ret;
+
+	priv->irqs_table[3] = irq;
+	ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+			       0, dev_name(&pdev->dev), priv);
+	if (ret < 0) {
+		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
+		goto clean_ale_ret;
 	}
 
-	priv->num_irqs = k;
+	priv->num_irqs = 4;
 
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
 
-- 
2.2.0

^ permalink raw reply related

* [patch-net-next v2 3/3] net: ethernet: cpsw: don't requests IRQs we don't use
From: Felipe Balbi @ 2015-01-14 16:58 UTC (permalink / raw)
  To: davem
  Cc: Tony Lindgren, Linux OMAP Mailing List, mugunthanvnm, netdev,
	Felipe Balbi
In-Reply-To: <1421254729-10602-1-git-send-email-balbi@ti.com>

CPSW never uses RX_THRESHOLD or MISC interrupts. In
fact, they are always kept masked in their appropriate
IRQ Enable register.

Instead of allocating an IRQ that never fires, it's best
to remove that code altogether and let future patches
implement it if anybody needs those.

Signed-off-by: Felipe Balbi <balbi@ti.com>
---
 drivers/net/ethernet/ti/cpsw.c | 55 ++++++++++++------------------------------
 1 file changed, 15 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index c6c483f3e49f..ba09ff3c1695 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -754,16 +754,6 @@ requeue:
 		dev_kfree_skb_any(new_skb);
 }
 
-static irqreturn_t cpsw_dummy_interrupt(int irq, void *dev_id)
-{
-	struct cpsw_priv *priv = dev_id;
-	int value = irq - priv->irqs_table[0];
-
-	cpdma_ctlr_eoi(priv->dma, value);
-
-	return IRQ_HANDLED;
-}
-
 static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id)
 {
 	struct cpsw_priv *priv = dev_id;
@@ -1635,8 +1625,8 @@ static void cpsw_ndo_poll_controller(struct net_device *ndev)
 
 	cpsw_intr_disable(priv);
 	cpdma_ctlr_int_ctrl(priv->dma, false);
-	cpsw_rx_interrupt(priv->irq[1], priv);
-	cpsw_tx_interrupt(priv->irq[2], priv);
+	cpsw_rx_interrupt(priv->irq[0], priv);
+	cpsw_tx_interrupt(priv->irq[1], priv);
 	cpdma_ctlr_int_ctrl(priv->dma, true);
 	cpsw_intr_enable(priv);
 }
@@ -2358,30 +2348,27 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_dma_ret;
 	}
 
-	ndev->irq = platform_get_irq(pdev, 0);
+	ndev->irq = platform_get_irq(pdev, 1);
 	if (ndev->irq < 0) {
 		dev_err(priv->dev, "error getting irq resource\n");
 		ret = -ENOENT;
 		goto clean_ale_ret;
 	}
 
-	irq = platform_get_irq(pdev, 0);
-	if (irq < 0)
-		goto clean_ale_ret;
-
-	priv->irqs_table[0] = irq;
-	ret = devm_request_irq(&pdev->dev, irq, cpsw_dummy_interrupt,
-			       0, dev_name(&pdev->dev), priv);
-	if (ret < 0) {
-		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
-		goto clean_ale_ret;
-	}
+	/* Grab RX and TX IRQs. Note that we also have RX_THRESHOLD and
+	 * MISC IRQs which are always kept disabled with this driver so
+	 * we will not request them.
+	 *
+	 * If anyone wants to implement support for those, make sure to
+	 * first request and append them to irqs_table array.
+	 */
 
+	/* RX IRQ */
 	irq = platform_get_irq(pdev, 1);
 	if (irq < 0)
 		goto clean_ale_ret;
 
-	priv->irqs_table[1] = irq;
+	priv->irqs_table[0] = irq;
 	ret = devm_request_irq(&pdev->dev, irq, cpsw_rx_interrupt,
 			       0, dev_name(&pdev->dev), priv);
 	if (ret < 0) {
@@ -2389,31 +2376,19 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_ale_ret;
 	}
 
+	/* TX IRQ */
 	irq = platform_get_irq(pdev, 2);
 	if (irq < 0)
 		goto clean_ale_ret;
 
-	priv->irqs_table[2] = irq;
+	priv->irqs_table[1] = irq;
 	ret = devm_request_irq(&pdev->dev, irq, cpsw_tx_interrupt,
 			       0, dev_name(&pdev->dev), priv);
 	if (ret < 0) {
 		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
 		goto clean_ale_ret;
 	}
-
-	irq = platform_get_irq(pdev, 3);
-	if (irq < 0)
-		goto clean_ale_ret;
-
-	priv->irqs_table[3] = irq;
-	ret = devm_request_irq(&pdev->dev, irq, cpsw_dummy_interrupt,
-			       0, dev_name(&pdev->dev), priv);
-	if (ret < 0) {
-		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
-		goto clean_ale_ret;
-	}
-
-	priv->num_irqs = 4;
+	priv->num_irqs = 2;
 
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
 
-- 
2.2.0

^ permalink raw reply related

* [patch net] team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin
From: Jiri Pirko @ 2015-01-14 17:15 UTC (permalink / raw)
  To: netdev; +Cc: davem, jbenc

This patch is fixing a race condition that may cause setting
count_pending to -1, which results in unwanted big bulk of arp messages
(in case of "notify peers").

Consider following scenario:

count_pending == 2
   CPU0                                           CPU1
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to 1)
					  schedule_delayed_work
 team_notify_peers
   atomic_add (adding 1 to count_pending)
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to 1)
					  schedule_delayed_work
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to 0)
   schedule_delayed_work
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to -1)

Fix this race by using atomic_dec_if_positive - that will prevent
count_pending running under 0.

Fixes: fc423ff00df3a1955441 ("team: add peer notification")
Fixes: 492b200efdd20b8fcfd  ("team: add support for sending multicast rejoins")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 drivers/net/team/team.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 43bcfff..4b2bfc5 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -622,6 +622,7 @@ static int team_change_mode(struct team *team, const char *kind)
 static void team_notify_peers_work(struct work_struct *work)
 {
 	struct team *team;
+	int val;
 
 	team = container_of(work, struct team, notify_peers.dw.work);
 
@@ -629,9 +630,14 @@ static void team_notify_peers_work(struct work_struct *work)
 		schedule_delayed_work(&team->notify_peers.dw, 0);
 		return;
 	}
+	val = atomic_dec_if_positive(&team->notify_peers.count_pending);
+	if (val < 0) {
+		rtnl_unlock();
+		return;
+	}
 	call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, team->dev);
 	rtnl_unlock();
-	if (!atomic_dec_and_test(&team->notify_peers.count_pending))
+	if (val)
 		schedule_delayed_work(&team->notify_peers.dw,
 				      msecs_to_jiffies(team->notify_peers.interval));
 }
@@ -662,6 +668,7 @@ static void team_notify_peers_fini(struct team *team)
 static void team_mcast_rejoin_work(struct work_struct *work)
 {
 	struct team *team;
+	int val;
 
 	team = container_of(work, struct team, mcast_rejoin.dw.work);
 
@@ -669,9 +676,14 @@ static void team_mcast_rejoin_work(struct work_struct *work)
 		schedule_delayed_work(&team->mcast_rejoin.dw, 0);
 		return;
 	}
+	val = atomic_dec_if_positive(&team->mcast_rejoin.count_pending);
+	if (val < 0) {
+		rtnl_unlock();
+		return;
+	}
 	call_netdevice_notifiers(NETDEV_RESEND_IGMP, team->dev);
 	rtnl_unlock();
-	if (!atomic_dec_and_test(&team->mcast_rejoin.count_pending))
+	if (val)
 		schedule_delayed_work(&team->mcast_rejoin.dw,
 				      msecs_to_jiffies(team->mcast_rejoin.interval));
 }
-- 
1.9.3

^ permalink raw reply related

* Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"
From: Eric Dumazet @ 2015-01-14 17:20 UTC (permalink / raw)
  To: Thomas Jarosch
  Cc: 'Linux Netdev List', Eric Dumazet, Jeff Kirsher,
	e1000-devel
In-Reply-To: <1719052.SGOfRAJhfQ@storm>

On Wed, 2015-01-14 at 16:32 +0100, Thomas Jarosch wrote:
> Hello,
> 
> after updating a good bunch of production level machines
> from kernel 3.4.101 to kernel 3.14.25, a few of them started
> to show serious trouble when there was a lot of network traffic.
> 
> ---------------------------------------------------------------
> Jan 14 11:14:57 intrartc kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
> Jan 14 11:14:57 intrartc kernel:  TDH                  <3b>
> Jan 14 11:14:57 intrartc kernel:  TDT                  <76>
> Jan 14 11:14:57 intrartc kernel:  next_to_use          <76>
> Jan 14 11:14:57 intrartc kernel:  next_to_clean        <31>
> Jan 14 11:14:57 intrartc kernel: buffer_info[next_to_clean]:
> Jan 14 11:14:57 intrartc kernel:  time_stamp           <ffff328c>
> Jan 14 11:14:57 intrartc kernel:  next_to_watch        <3b>
> Jan 14 11:14:57 intrartc kernel:  jiffies              <ffff33b9>
> Jan 14 11:14:57 intrartc kernel:  next_to_watch.status <0>
> Jan 14 11:14:57 intrartc kernel: MAC Status             <40080083>
> Jan 14 11:14:57 intrartc kernel: PHY Status             <796d>
> Jan 14 11:14:57 intrartc kernel: PHY 1000BASE-T Status  <3800>
> Jan 14 11:14:57 intrartc kernel: PHY Extended Status    <3000>
> Jan 14 11:14:57 intrartc kernel: PCI Status             <10>
> Jan 14 11:14:59 intrartc kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
> ..
> ---------------------------------------------------------------
> 
> All of those troubled machines use an Intel DH61CR board and
> are driven by the e1000e driver. Kernels 3.7.0 to 3.19-rc4 are affected.
> 
> The problem vanishes when you disable TSO. This is the
> recommended "solution" on serverfault and others.
> http://ehc.ac/p/e1000/bugs/378/
> http://serverfault.com/questions/616485/e1000e-reset-adapter-unexpectedly-detected-hardware-unit-hang
> 
> I have a test setup that can trigger the problem within seconds
> and bisected it down to this commit (hi Eric!):
> ---------------------------------------------------------------
> commit 69b08f62e17439ee3d436faf0b9a7ca6fffb78db
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Wed Sep 26 06:46:57 2012 +0000
> 
>     net: use bigger pages in __netdev_alloc_frag
> 
>     We currently use percpu order-0 pages in __netdev_alloc_frag
>     to deliver fragments used by __netdev_alloc_skb()
> 
>     Depending on NIC driver and arch being 32 or 64 bit, it allows a page to
>     be split in several fragments (between 1 and 8), assuming PAGE_SIZE=4096
> 
>     Switching to bigger pages (32768 bytes for PAGE_SIZE=4096 case) allows :
> 
>     - Better filling of space (the ending hole overhead is less an issue)
> 
>     - Less calls to page allocator or accesses to page->_count
> 
>     - Could allow struct skb_shared_info futures changes without major
>     performance impact.
> 
>     This patch implements a transparent fallback to smaller
>     pages in case of memory pressure.
> 
>     It also uses a standard "struct page_frag" instead of a custom one.
> 
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Cc: Alexander Duyck <alexander.h.duyck@intel.com>
>     Cc: Benjamin LaHaise <bcrl@kvack.org>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> ---------------------------------------------------------------
> 
> Reverting the commit f.e. in kernel 3.7.0  solves the issue.
> I've done some more tests:
> 
>     3.18.0 32bit + PAE: broken
>     3.6.0 32bit + PAE: works
>     3.7.0 32bit + PAE: broken
>     3.7.0 32bit + PAE + revert 69b08f62e17439ee3d436faf0b9a7ca6fffb78db -> works
> 
>     3.7.0 32bit (without PAE) -> broken
>     3.7.0 32bit + "GFP_COMP" flag removed in __netdev_alloc_frag(): broken
>     3.7.0 32bit + "GFP_COMP" flag replaced with
>                               "GFP_DMA" in __netdev_alloc_frag(): works!
>     3.7.0 32bit + "GFP_COMP" flag + "GFP_DMA" flag: broken
>     3.19-rc4 32bit: broken
> 
> 
> The problem is triggered only when the traffic is forwarded to another client.
> (this client is behind NAT). Generating traffic directly
> on the system did not trigger the issue.
> 
> To me it looks like Eric's change uncovered a memory allocation
> issue in the e1000e driver: It probably uses a memory address
> unsuitable for DMA or so. This is just a guess though.
> 
> Funny fact: I have another Intel DH61CR board that does not show the problem.
> I've borrowed (...) the mainboard from one affected box for my bisect test setup.
> 
> Please CC: comments. Thanks.

I would try to use lower data per txd. I am not sure 24KB is really
supported.

( check commit d821a4c4d11ad160925dab2bb009b8444beff484 for details)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index e14fd85f64eb..8d973f7edfbd 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3897,7 +3897,7 @@ void e1000e_reset(struct e1000_adapter *adapter)
 	 * limit of 24KB due to receive synchronization limitations.
 	 */
 	adapter->tx_fifo_limit = min_t(u32, ((er32(PBA) >> 16) << 10) - 96,
-				       24 << 10);
+				       8 << 10);
 
 	/* Disable Adaptive Interrupt Moderation if 2 full packets cannot
 	 * fit in receive buffer.

^ permalink raw reply related

* [PATCH v3 02/16] virtio/9p: verify device has config space
From: Michael S. Tsirkin @ 2015-01-14 17:27 UTC (permalink / raw)
  To: linux-kernel, virtualization
  Cc: Eric Van Hensbergen, netdev, v9fs-developer, Ron Minnich,
	David S. Miller
In-Reply-To: <1421256142-11512-1-git-send-email-mst@redhat.com>

Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/9p needs config space access so make it
fail gracefully if not there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/9p/trans_virtio.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index daa749c..d8e376a 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -524,6 +524,12 @@ static int p9_virtio_probe(struct virtio_device *vdev)
 	int err;
 	struct virtio_chan *chan;
 
+	if (!vdev->config->get) {
+		dev_err(&vdev->dev, "%s failure: config access disabled\n",
+			__func__);
+		return -EINVAL;
+	}
+
 	chan = kmalloc(sizeof(struct virtio_chan), GFP_KERNEL);
 	if (!chan) {
 		pr_err("Failed to allocate virtio 9P channel\n");
-- 
MST

^ permalink raw reply related

* [PATCH v3 05/16] virtio/net: verify device has config space
From: Michael S. Tsirkin @ 2015-01-14 17:27 UTC (permalink / raw)
  To: linux-kernel, virtualization
  Cc: Rusty Russell, cornelia.huck, virtualization, netdev
In-Reply-To: <1421256142-11512-1-git-send-email-mst@redhat.com>

Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/net needs config space access so make it
fail gracefully if not there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5ca9771..9bc1072 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1713,6 +1713,12 @@ static int virtnet_probe(struct virtio_device *vdev)
 	struct virtnet_info *vi;
 	u16 max_queue_pairs;

+	if (!vdev->config->get) {
+		dev_err(&vdev->dev, "%s failure: config access disabled\n",
+			__func__);
+		return -EINVAL;
+	}
+
 	if (!virtnet_validate_features(vdev))
 		return -EINVAL;

-- 
MST

^ permalink raw reply related

* [PATCH for 3.19 0/3] rtlwifi: Various updates/fixes
From: Larry Finger @ 2015-01-14 17:37 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, Larry Finger, netdev

The first of these patches removes a logging statement that is no longer
needed. The other two fix a number of bugs in rtl8192ee that have been
found since the original inclusion of this driver.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>


Larry Finger (1):
  rtlwifi: Remove logging statement that is no longer needed

Troy Tan (2):
  rtlwifi: Fix handling of new style descriptors
  rtlwifi: rtl8192ee: Implement new handling of FIFO descriptor buffer

 drivers/net/wireless/rtlwifi/pci.c           |  36 ++++--
 drivers/net/wireless/rtlwifi/rtl8192ee/hw.c  | 167 +++++++++++++++++++++----
 drivers/net/wireless/rtlwifi/rtl8192ee/reg.h |   2 +
 drivers/net/wireless/rtlwifi/rtl8192ee/sw.c  |   3 +-
 drivers/net/wireless/rtlwifi/rtl8192ee/trx.c | 175 +++++++++++----------------
 drivers/net/wireless/rtlwifi/rtl8192ee/trx.h |   4 +-
 drivers/net/wireless/rtlwifi/wifi.h          |   1 +
 7 files changed, 242 insertions(+), 146 deletions(-)

-- 
2.1.2

^ permalink raw reply

* [PATCH for 3.19 1/3] rtlwifi: Remove logging statement that is no longer needed
From: Larry Finger @ 2015-01-14 17:37 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, Larry Finger, netdev
In-Reply-To: <1421257036-5382-1-git-send-email-Larry.Finger@lwfinger.net>

In commit e9538cf4f907 ("rtlwifi: Fix error when accessing unmapped memory
in skb"), a printk was included to indicate that the condition had been
reached. There is now enough evidence from other users that the fix is
working. That logging statement can now be removed.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
 drivers/net/wireless/rtlwifi/pci.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/pci.c b/drivers/net/wireless/rtlwifi/pci.c
index c70efb9..e25faac 100644
--- a/drivers/net/wireless/rtlwifi/pci.c
+++ b/drivers/net/wireless/rtlwifi/pci.c
@@ -816,11 +816,8 @@ static void _rtl_pci_rx_interrupt(struct ieee80211_hw *hw)
 
 		/* get a new skb - if fail, old one will be reused */
 		new_skb = dev_alloc_skb(rtlpci->rxbuffersize);
-		if (unlikely(!new_skb)) {
-			pr_err("Allocation of new skb failed in %s\n",
-			       __func__);
+		if (unlikely(!new_skb))
 			goto no_new;
-		}
 		if (rtlpriv->use_new_trx_flow) {
 			buffer_desc =
 			  &rtlpci->rx_ring[rxring_idx].buffer_desc
-- 
2.1.2

^ permalink raw reply related

* [PATCH for 3.19 2/3] rtlwifi: Fix handling of new style descriptors
From: Larry Finger @ 2015-01-14 17:37 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, Troy Tan, netdev, Larry Finger
In-Reply-To: <1421257036-5382-1-git-send-email-Larry.Finger@lwfinger.net>

From: Troy Tan <troy_tan@realsil.com.cn>

The hardware and firmware for the RTL8192EE utilize a FIFO list of
descriptors. There were some problems with the initial implementation.
The worst of these failed to detect that the FIFO was becoming full,
which led to the device needing to be power cycled. As this condition
is not relevant to most of the devices supported by rtlwifi, a callback
routine was added to detect this situation. This patch implements the
necessary changes in the pci handler.

Signed-off-by: Troy Tan <troy_tan@realsil.com.cn>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
 drivers/net/wireless/rtlwifi/pci.c  | 31 +++++++++++++++++++++++--------
 drivers/net/wireless/rtlwifi/wifi.h |  1 +
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/pci.c b/drivers/net/wireless/rtlwifi/pci.c
index e25faac..a62170e 100644
--- a/drivers/net/wireless/rtlwifi/pci.c
+++ b/drivers/net/wireless/rtlwifi/pci.c
@@ -578,6 +578,13 @@ static void _rtl_pci_tx_isr(struct ieee80211_hw *hw, int prio)
 		else
 			entry = (u8 *)(&ring->desc[ring->idx]);
 
+		if (rtlpriv->cfg->ops->get_available_desc &&
+		    rtlpriv->cfg->ops->get_available_desc(hw, prio) <= 1) {
+			RT_TRACE(rtlpriv, (COMP_INTR | COMP_SEND), DBG_DMESG,
+				 "no available desc!\n");
+			return;
+		}
+
 		if (!rtlpriv->cfg->ops->is_tx_desc_closed(hw, prio, ring->idx))
 			return;
 		ring->idx = (ring->idx + 1) % ring->entries;
@@ -641,10 +648,9 @@ static void _rtl_pci_tx_isr(struct ieee80211_hw *hw, int prio)
 
 		ieee80211_tx_status_irqsafe(hw, skb);
 
-		if ((ring->entries - skb_queue_len(&ring->queue))
-				== 2) {
+		if ((ring->entries - skb_queue_len(&ring->queue)) <= 4) {
 
-			RT_TRACE(rtlpriv, COMP_ERR, DBG_LOUD,
+			RT_TRACE(rtlpriv, COMP_ERR, DBG_DMESG,
 				 "more desc left, wake skb_queue@%d, ring->idx = %d, skb_queue_len = 0x%x\n",
 				 prio, ring->idx,
 				 skb_queue_len(&ring->queue));
@@ -793,7 +799,7 @@ static void _rtl_pci_rx_interrupt(struct ieee80211_hw *hw)
 			rx_remained_cnt =
 				rtlpriv->cfg->ops->rx_desc_buff_remained_cnt(hw,
 								      hw_queue);
-			if (rx_remained_cnt < 1)
+			if (rx_remained_cnt == 0)
 				return;
 
 		} else {	/* rx descriptor */
@@ -845,18 +851,18 @@ static void _rtl_pci_rx_interrupt(struct ieee80211_hw *hw)
 			else
 				skb_reserve(skb, stats.rx_drvinfo_size +
 					    stats.rx_bufshift);
-
 		} else {
 			RT_TRACE(rtlpriv, COMP_ERR, DBG_WARNING,
 				 "skb->end - skb->tail = %d, len is %d\n",
 				 skb->end - skb->tail, len);
-			break;
+			dev_kfree_skb_any(skb);
+			goto new_trx_end;
 		}
 		/* handle command packet here */
 		if (rtlpriv->cfg->ops->rx_command_packet &&
 		    rtlpriv->cfg->ops->rx_command_packet(hw, stats, skb)) {
 				dev_kfree_skb_any(skb);
-				goto end;
+				goto new_trx_end;
 		}
 
 		/*
@@ -906,6 +912,7 @@ static void _rtl_pci_rx_interrupt(struct ieee80211_hw *hw)
 		} else {
 			dev_kfree_skb_any(skb);
 		}
+new_trx_end:
 		if (rtlpriv->use_new_trx_flow) {
 			rtlpci->rx_ring[hw_queue].next_rx_rp += 1;
 			rtlpci->rx_ring[hw_queue].next_rx_rp %=
@@ -921,7 +928,6 @@ static void _rtl_pci_rx_interrupt(struct ieee80211_hw *hw)
 			rtlpriv->enter_ps = false;
 			schedule_work(&rtlpriv->works.lps_change_work);
 		}
-end:
 		skb = new_skb;
 no_new:
 		if (rtlpriv->use_new_trx_flow) {
@@ -1685,6 +1691,15 @@ static int rtl_pci_tx(struct ieee80211_hw *hw,
 		}
 	}
 
+	if (rtlpriv->cfg->ops->get_available_desc &&
+	    rtlpriv->cfg->ops->get_available_desc(hw, hw_queue) == 0) {
+			RT_TRACE(rtlpriv, COMP_ERR, DBG_WARNING,
+				 "get_available_desc fail\n");
+			spin_unlock_irqrestore(&rtlpriv->locks.irq_th_lock,
+					       flags);
+			return skb->len;
+	}
+
 	if (ieee80211_is_data_qos(fc)) {
 		tid = rtl_get_tid(skb);
 		if (sta) {
diff --git a/drivers/net/wireless/rtlwifi/wifi.h b/drivers/net/wireless/rtlwifi/wifi.h
index 3b3453a..413c2ab 100644
--- a/drivers/net/wireless/rtlwifi/wifi.h
+++ b/drivers/net/wireless/rtlwifi/wifi.h
@@ -2172,6 +2172,7 @@ struct rtl_hal_ops {
 	void (*add_wowlan_pattern)(struct ieee80211_hw *hw,
 				   struct rtl_wow_pattern *rtl_pattern,
 				   u8 index);
+	u16 (*get_available_desc)(struct ieee80211_hw *hw, u8 q_idx);
 };
 
 struct rtl_intf_ops {
-- 
2.1.2

^ permalink raw reply related

* [PATCH for 3.19 3/3] rtlwifi: rtl8192ee: Fix several bugs
From: Larry Finger @ 2015-01-14 17:37 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, Troy Tan, netdev, Larry Finger
In-Reply-To: <1421257036-5382-1-git-send-email-Larry.Finger@lwfinger.net>

From: Troy Tan <troy_tan@realsil.com.cn>

The following bugs are fixed in this driver:
1. Problems parsing C2H CMD
2. An ad-hoc connection can cause a TX freeze.
3. There are additional conditions that cause a TX freeze.
4. The previous code failed to handle situations where an RX
   descriptor was unavailable.

Signed-off-by: Troy Tan <troy_tan@realsil.com.cn>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
 drivers/net/wireless/rtlwifi/rtl8192ee/hw.c  | 167 +++++++++++++++++++++----
 drivers/net/wireless/rtlwifi/rtl8192ee/reg.h |   2 +
 drivers/net/wireless/rtlwifi/rtl8192ee/sw.c  |   3 +-
 drivers/net/wireless/rtlwifi/rtl8192ee/trx.c | 175 +++++++++++----------------
 drivers/net/wireless/rtlwifi/rtl8192ee/trx.h |   4 +-
 5 files changed, 217 insertions(+), 134 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/rtl8192ee/hw.c b/drivers/net/wireless/rtlwifi/rtl8192ee/hw.c
index 47beb49..215b970 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192ee/hw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192ee/hw.c
@@ -85,29 +85,6 @@ static void _rtl92ee_enable_bcn_sub_func(struct ieee80211_hw *hw)
 	_rtl92ee_set_bcn_ctrl_reg(hw, 0, BIT(1));
 }
 
-static void _rtl92ee_return_beacon_queue_skb(struct ieee80211_hw *hw)
-{
-	struct rtl_priv *rtlpriv = rtl_priv(hw);
-	struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
-	struct rtl8192_tx_ring *ring = &rtlpci->tx_ring[BEACON_QUEUE];
-	unsigned long flags;
-
-	spin_lock_irqsave(&rtlpriv->locks.irq_th_lock, flags);
-	while (skb_queue_len(&ring->queue)) {
-		struct rtl_tx_buffer_desc *entry =
-						&ring->buffer_desc[ring->idx];
-		struct sk_buff *skb = __skb_dequeue(&ring->queue);
-
-		pci_unmap_single(rtlpci->pdev,
-				 rtlpriv->cfg->ops->get_desc(
-				 (u8 *)entry, true, HW_DESC_TXBUFF_ADDR),
-				 skb->len, PCI_DMA_TODEVICE);
-		kfree_skb(skb);
-		ring->idx = (ring->idx + 1) % ring->entries;
-	}
-	spin_unlock_irqrestore(&rtlpriv->locks.irq_th_lock, flags);
-}
-
 static void _rtl92ee_disable_bcn_sub_func(struct ieee80211_hw *hw)
 {
 	_rtl92ee_set_bcn_ctrl_reg(hw, BIT(1), 0);
@@ -403,9 +380,6 @@ static void _rtl92ee_download_rsvd_page(struct ieee80211_hw *hw)
 		rtl_write_byte(rtlpriv, REG_DWBCN0_CTRL + 2,
 			       bcnvalid_reg | BIT(0));
 
-		/* Return Beacon TCB */
-		_rtl92ee_return_beacon_queue_skb(hw);
-
 		/* download rsvd page */
 		rtl92ee_set_fw_rsvdpagepkt(hw, false);
 
@@ -1163,6 +1137,140 @@ void rtl92ee_enable_hw_security_config(struct ieee80211_hw *hw)
 	rtlpriv->cfg->ops->set_hw_reg(hw, HW_VAR_WPA_CONFIG, &sec_reg_value);
 }
 
+static bool _rtl8192ee_check_pcie_dma_hang(struct rtl_priv *rtlpriv)
+{
+	u8 tmp;
+
+	/* write reg 0x350 Bit[26]=1. Enable debug port. */
+	tmp = rtl_read_byte(rtlpriv, REG_BACKDOOR_DBI_DATA + 3);
+	if (!(tmp & BIT(2))) {
+		rtl_write_byte(rtlpriv, REG_BACKDOOR_DBI_DATA + 3,
+			       (tmp | BIT(2)));
+		mdelay(100); /* Suggested by DD Justin_tsai. */
+	}
+
+	/* read reg 0x350 Bit[25] if 1 : RX hang
+	 * read reg 0x350 Bit[24] if 1 : TX hang
+	 */
+	tmp = rtl_read_byte(rtlpriv, REG_BACKDOOR_DBI_DATA + 3);
+	if ((tmp & BIT(0)) || (tmp & BIT(1))) {
+		RT_TRACE(rtlpriv, COMP_INIT, DBG_LOUD,
+			 "CheckPcieDMAHang8192EE(): true!!\n");
+		return true;
+	}
+	return false;
+}
+
+static void _rtl8192ee_reset_pcie_interface_dma(struct rtl_priv *rtlpriv,
+						bool mac_power_on)
+{
+	u8 tmp;
+	bool release_mac_rx_pause;
+	u8 backup_pcie_dma_pause;
+
+	RT_TRACE(rtlpriv, COMP_INIT, DBG_LOUD,
+		 "ResetPcieInterfaceDMA8192EE()\n");
+
+	/* Revise Note: Follow the document "PCIe RX DMA Hang Reset Flow_v03"
+	 * released by SD1 Alan.
+	 * 2013.05.07, by tynli.
+	 */
+
+	/* 1. disable register write lock
+	 *	write 0x1C bit[1:0] = 2'h0
+	 *	write 0xCC bit[2] = 1'b1
+	 */
+	tmp = rtl_read_byte(rtlpriv, REG_RSV_CTRL);
+	tmp &= ~(BIT(1) | BIT(0));
+	rtl_write_byte(rtlpriv, REG_RSV_CTRL, tmp);
+	tmp = rtl_read_byte(rtlpriv, REG_PMC_DBG_CTRL2);
+	tmp |= BIT(2);
+	rtl_write_byte(rtlpriv, REG_PMC_DBG_CTRL2, tmp);
+
+	/* 2. Check and pause TRX DMA
+	 *	write 0x284 bit[18] = 1'b1
+	 *	write 0x301 = 0xFF
+	 */
+	tmp = rtl_read_byte(rtlpriv, REG_RXDMA_CONTROL);
+	if (tmp & BIT(2)) {
+		/* Already pause before the function for another purpose. */
+		release_mac_rx_pause = false;
+	} else {
+		rtl_write_byte(rtlpriv, REG_RXDMA_CONTROL, (tmp | BIT(2)));
+		release_mac_rx_pause = true;
+	}
+
+	backup_pcie_dma_pause = rtl_read_byte(rtlpriv, REG_PCIE_CTRL_REG + 1);
+	if (backup_pcie_dma_pause != 0xFF)
+		rtl_write_byte(rtlpriv, REG_PCIE_CTRL_REG + 1, 0xFF);
+
+	if (mac_power_on) {
+		/* 3. reset TRX function
+		 *	write 0x100 = 0x00
+		 */
+		rtl_write_byte(rtlpriv, REG_CR, 0);
+	}
+
+	/* 4. Reset PCIe DMA
+	 *	write 0x003 bit[0] = 0
+	 */
+	tmp = rtl_read_byte(rtlpriv, REG_SYS_FUNC_EN + 1);
+	tmp &= ~(BIT(0));
+	rtl_write_byte(rtlpriv, REG_SYS_FUNC_EN + 1, tmp);
+
+	/* 5. Enable PCIe DMA
+	 *	write 0x003 bit[0] = 1
+	 */
+	tmp = rtl_read_byte(rtlpriv, REG_SYS_FUNC_EN + 1);
+	tmp |= BIT(0);
+	rtl_write_byte(rtlpriv, REG_SYS_FUNC_EN + 1, tmp);
+
+	if (mac_power_on) {
+		/* 6. enable TRX function
+		 *	write 0x100 = 0xFF
+		 */
+		rtl_write_byte(rtlpriv, REG_CR, 0xFF);
+
+		/* We should init LLT & RQPN and
+		 * prepare Tx/Rx descrptor address later
+		 * because MAC function is reset.
+		 */
+	}
+
+	/* 7. Restore PCIe autoload down bit
+	 *	write 0xF8 bit[17] = 1'b1
+	 */
+	tmp = rtl_read_byte(rtlpriv, REG_MAC_PHY_CTRL_NORMAL + 2);
+	tmp |= BIT(1);
+	rtl_write_byte(rtlpriv, REG_MAC_PHY_CTRL_NORMAL + 2, tmp);
+
+	/* In MAC power on state, BB and RF maybe in ON state,
+	 * if we release TRx DMA here
+	 * it will cause packets to be started to Tx/Rx,
+	 * so we release Tx/Rx DMA later.
+	 */
+	if (!mac_power_on) {
+		/* 8. release TRX DMA
+		 *	write 0x284 bit[18] = 1'b0
+		 *	write 0x301 = 0x00
+		 */
+		if (release_mac_rx_pause) {
+			tmp = rtl_read_byte(rtlpriv, REG_RXDMA_CONTROL);
+			rtl_write_byte(rtlpriv, REG_RXDMA_CONTROL,
+				       (tmp & (~BIT(2))));
+		}
+		rtl_write_byte(rtlpriv, REG_PCIE_CTRL_REG + 1,
+			       backup_pcie_dma_pause);
+	}
+
+	/* 9. lock system register
+	 *	write 0xCC bit[2] = 1'b0
+	 */
+	tmp = rtl_read_byte(rtlpriv, REG_PMC_DBG_CTRL2);
+	tmp &= ~(BIT(2));
+	rtl_write_byte(rtlpriv, REG_PMC_DBG_CTRL2, tmp);
+}
+
 int rtl92ee_hw_init(struct ieee80211_hw *hw)
 {
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
@@ -1188,6 +1296,13 @@ int rtl92ee_hw_init(struct ieee80211_hw *hw)
 		rtlhal->fw_ps_state = FW_PS_STATE_ALL_ON_92E;
 	}
 
+	if (_rtl8192ee_check_pcie_dma_hang(rtlpriv)) {
+		RT_TRACE(rtlpriv, COMP_INIT, DBG_DMESG, "92ee dma hang!\n");
+		_rtl8192ee_reset_pcie_interface_dma(rtlpriv,
+						    rtlhal->mac_func_enable);
+		rtlhal->mac_func_enable = false;
+	}
+
 	rtstatus = _rtl92ee_init_mac(hw);
 
 	rtl_write_byte(rtlpriv, 0x577, 0x03);
diff --git a/drivers/net/wireless/rtlwifi/rtl8192ee/reg.h b/drivers/net/wireless/rtlwifi/rtl8192ee/reg.h
index 3f2a959..696ae188 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192ee/reg.h
+++ b/drivers/net/wireless/rtlwifi/rtl8192ee/reg.h
@@ -77,9 +77,11 @@
 #define REG_HIMRE				0x00B8
 #define REG_HISRE				0x00BC
 
+#define REG_PMC_DBG_CTRL2			0x00CC
 #define REG_EFUSE_ACCESS			0x00CF
 #define REG_HPON_FSM				0x00EC
 #define REG_SYS_CFG1				0x00F0
+#define REG_MAC_PHY_CTRL_NORMAL		0x00F8
 #define REG_SYS_CFG2				0x00FC
 
 #define REG_CR					0x0100
diff --git a/drivers/net/wireless/rtlwifi/rtl8192ee/sw.c b/drivers/net/wireless/rtlwifi/rtl8192ee/sw.c
index f30c916..100d6fc 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192ee/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192ee/sw.c
@@ -114,8 +114,6 @@ int rtl92ee_init_sw_vars(struct ieee80211_hw *hw)
 				  RCR_AMF			|
 				  RCR_ACF			|
 				  RCR_ADF			|
-				  RCR_AICV			|
-				  RCR_ACRC32			|
 				  RCR_AB			|
 				  RCR_AM			|
 				  RCR_APM			|
@@ -241,6 +239,7 @@ static struct rtl_hal_ops rtl8192ee_hal_ops = {
 	.set_desc = rtl92ee_set_desc,
 	.get_desc = rtl92ee_get_desc,
 	.is_tx_desc_closed = rtl92ee_is_tx_desc_closed,
+	.get_available_desc = rtl92ee_get_available_desc,
 	.tx_polling = rtl92ee_tx_polling,
 	.enable_hw_sec = rtl92ee_enable_hw_security_config,
 	.init_sw_leds = rtl92ee_init_sw_leds,
diff --git a/drivers/net/wireless/rtlwifi/rtl8192ee/trx.c b/drivers/net/wireless/rtlwifi/rtl8192ee/trx.c
index 51806ac..ee1df82 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192ee/trx.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192ee/trx.c
@@ -354,6 +354,11 @@ bool rtl92ee_rx_query_desc(struct ieee80211_hw *hw,
 	struct ieee80211_hdr *hdr;
 	u32 phystatus = GET_RX_DESC_PHYST(pdesc);
 
+	if (GET_RX_STATUS_DESC_RPT_SEL(pdesc) == 0)
+		status->packet_report_type = NORMAL_RX;
+	else
+		status->packet_report_type = C2H_PACKET;
+
 	status->length = (u16)GET_RX_DESC_PKT_LEN(pdesc);
 	status->rx_drvinfo_size = (u8)GET_RX_DESC_DRV_INFO_SIZE(pdesc) *
 				  RX_DRV_INFO_SIZE_UNIT;
@@ -472,44 +477,22 @@ u16 rtl92ee_rx_desc_buff_remained_cnt(struct ieee80211_hw *hw, u8 queue_index)
 {
 	struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
-	u16 read_point = 0, write_point = 0, remind_cnt = 0;
-	u32 tmp_4byte = 0;
-	static u16 last_read_point;
-	static bool start_rx;
-
-	tmp_4byte = rtl_read_dword(rtlpriv, REG_RXQ_TXBD_IDX);
-	read_point = (u16)((tmp_4byte>>16) & 0x7ff);
-	write_point = (u16)(tmp_4byte & 0x7ff);
-
-	if (write_point != rtlpci->rx_ring[queue_index].next_rx_rp) {
-		RT_TRACE(rtlpriv, COMP_RXDESC, DBG_DMESG,
-			 "!!!write point is 0x%x, reg 0x3B4 value is 0x%x\n",
-			  write_point, tmp_4byte);
-		tmp_4byte = rtl_read_dword(rtlpriv, REG_RXQ_TXBD_IDX);
-		read_point = (u16)((tmp_4byte>>16) & 0x7ff);
-		write_point = (u16)(tmp_4byte & 0x7ff);
-	}
-
-	if (read_point > 0)
-		start_rx = true;
-	if (!start_rx)
-		return 0;
+	u16 desc_idx_hw = 0, desc_idx_host = 0, remind_cnt = 0;
+	u32 tmp_4byte = rtl_read_dword(rtlpriv, REG_RXQ_TXBD_IDX);
+	u32 rw_mask = 0x1ff;
 
-	if ((last_read_point > (RX_DESC_NUM_92E / 2)) &&
-	    (read_point <= (RX_DESC_NUM_92E / 2))) {
-		remind_cnt = RX_DESC_NUM_92E - write_point;
-	} else {
-		remind_cnt = (read_point >= write_point) ?
-			     (read_point - write_point) :
-			     (RX_DESC_NUM_92E - write_point + read_point);
-	}
+	desc_idx_hw = (u16)((tmp_4byte>>16) & rw_mask);
+	desc_idx_host = (u16)(tmp_4byte & rw_mask);
 
-	if (remind_cnt == 0)
+	/* may be no data, donot rx */
+	if (desc_idx_hw == desc_idx_host)
 		return 0;
 
-	rtlpci->rx_ring[queue_index].next_rx_rp = write_point;
+	remind_cnt = (desc_idx_hw > desc_idx_host) ?
+		     (desc_idx_hw - desc_idx_host) :
+		     (RX_DESC_NUM_92E - (desc_idx_host - desc_idx_hw));
 
-	last_read_point = read_point;
+	rtlpci->rx_ring[queue_index].next_rx_rp = desc_idx_host;
 	return remind_cnt;
 }
 
@@ -551,7 +534,8 @@ static u16 get_desc_addr_fr_q_idx(u16 queue_index)
 	return desc_address;
 }
 
-void rtl92ee_get_available_desc(struct ieee80211_hw *hw, u8 q_idx)
+/*free  desc that can be used */
+u16 rtl92ee_get_available_desc(struct ieee80211_hw *hw, u8 q_idx)
 {
 	struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
@@ -561,15 +545,25 @@ void rtl92ee_get_available_desc(struct ieee80211_hw *hw, u8 q_idx)
 
 	tmp_4byte = rtl_read_dword(rtlpriv,
 				   get_desc_addr_fr_q_idx(q_idx));
-	current_tx_read_point = (u16)((tmp_4byte >> 16) & 0x0fff);
-	current_tx_write_point = (u16)((tmp_4byte) & 0x0fff);
+	current_tx_read_point = (u16)((tmp_4byte >> 16) & 0x01ff);
+	current_tx_write_point = (u16)((tmp_4byte) & 0x01ff);
+
+	if (current_tx_read_point == current_tx_write_point)
+		point_diff = TX_DESC_NUM_92E - 1;
+	else if (current_tx_read_point < current_tx_write_point)
+		point_diff = TX_DESC_NUM_92E - (current_tx_write_point -
+			     current_tx_read_point) - 1;
+	else
+		point_diff = current_tx_read_point - current_tx_write_point - 1;
 
-	point_diff = ((current_tx_read_point > current_tx_write_point) ?
-		      (current_tx_read_point - current_tx_write_point) :
-		      (TX_DESC_NUM_92E - current_tx_write_point +
-		       current_tx_read_point));
+	if (0 == point_diff) {
+		RT_TRACE(rtlpriv, COMP_SEND, DBG_DMESG,
+			 "CR:%d,CW:%d\n",
+			 current_tx_read_point, current_tx_write_point);
+	}
 
 	rtlpci->tx_ring[q_idx].avl_desc = point_diff;
+	return point_diff;
 }
 
 void rtl92ee_pre_fill_tx_bd_desc(struct ieee80211_hw *hw,
@@ -706,7 +700,7 @@ void rtl92ee_tx_fill_desc(struct ieee80211_hw *hw,
 	mapping = pci_map_single(rtlpci->pdev, skb->data, skb->len,
 				 PCI_DMA_TODEVICE);
 	if (pci_dma_mapping_error(rtlpci->pdev, mapping)) {
-		RT_TRACE(rtlpriv, COMP_SEND, DBG_TRACE,
+		RT_TRACE(rtlpriv, COMP_SEND, DBG_DMESG,
 			 "DMA mapping error");
 		return;
 	}
@@ -870,7 +864,7 @@ void rtl92ee_tx_fill_cmddesc(struct ieee80211_hw *hw,
 	u8 txdesc_len = 40;
 
 	if (pci_dma_mapping_error(rtlpci->pdev, mapping)) {
-		RT_TRACE(rtlpriv, COMP_SEND, DBG_TRACE,
+		RT_TRACE(rtlpriv, COMP_SEND, DBG_DMESG,
 			 "DMA mapping error");
 		return;
 	}
@@ -918,8 +912,6 @@ void rtl92ee_set_desc(struct ieee80211_hw *hw, u8 *pdesc, bool istx,
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
 	u16 cur_tx_rp = 0;
 	u16 cur_tx_wp = 0;
-	static u16 last_txw_point;
-	static bool over_run;
 	u32 tmp = 0;
 	u8 q_idx = *val;
 
@@ -932,6 +924,7 @@ void rtl92ee_set_desc(struct ieee80211_hw *hw, u8 *pdesc, bool istx,
 			struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
 			struct rtl8192_tx_ring *ring = &rtlpci->tx_ring[q_idx];
 			u16 max_tx_desc = ring->entries;
+			u16 point_diff = 0;
 
 			if (q_idx == BEACON_QUEUE) {
 				ring->cur_tx_wp = 0;
@@ -940,44 +933,32 @@ void rtl92ee_set_desc(struct ieee80211_hw *hw, u8 *pdesc, bool istx,
 				return;
 			}
 
+			tmp = rtl_read_dword(rtlpriv,
+					     get_desc_addr_fr_q_idx(q_idx));
+			cur_tx_rp = (u16)((tmp >> 16) & 0x0fff);
+			cur_tx_wp = (u16)(tmp & 0x0fff);
+
+			ring->cur_tx_wp = cur_tx_wp;
+			ring->cur_tx_rp = cur_tx_rp;
+			point_diff = ((cur_tx_rp > cur_tx_wp) ?
+					      (cur_tx_rp - cur_tx_wp) :
+					      (TX_DESC_NUM_92E - 1 -
+					       cur_tx_wp + cur_tx_rp));
+
+			ring->avl_desc = point_diff;
+
 			ring->cur_tx_wp = ((ring->cur_tx_wp + 1) % max_tx_desc);
 
-			if (over_run) {
-				ring->cur_tx_wp = 0;
-				over_run = false;
-			}
-			if (ring->avl_desc > 1) {
+			if (ring->avl_desc >= 1) {
 				ring->avl_desc--;
-
 				rtl_write_word(rtlpriv,
 					       get_desc_addr_fr_q_idx(q_idx),
 					       ring->cur_tx_wp);
-
-				if (q_idx == 1)
-					last_txw_point = cur_tx_wp;
-			}
-
-			if (ring->avl_desc < (max_tx_desc - 15)) {
-				u16 point_diff = 0;
-
-				tmp =
-				  rtl_read_dword(rtlpriv,
-						 get_desc_addr_fr_q_idx(q_idx));
-				cur_tx_rp = (u16)((tmp >> 16) & 0x0fff);
-				cur_tx_wp = (u16)(tmp & 0x0fff);
-
-				ring->cur_tx_wp = cur_tx_wp;
-				ring->cur_tx_rp = cur_tx_rp;
-				point_diff = ((cur_tx_rp > cur_tx_wp) ?
-					      (cur_tx_rp - cur_tx_wp) :
-					      (TX_DESC_NUM_92E - 1 -
-					       cur_tx_wp + cur_tx_rp));
-
-				ring->avl_desc = point_diff;
+			} else {
+				pr_err("critical error! ring->avl_desc == 0\n");
 			}
 		}
-		break;
-		}
+		break; }
 	} else {
 		switch (desc_name) {
 		case HW_DESC_RX_PREPARE:
@@ -1043,38 +1024,29 @@ bool rtl92ee_is_tx_desc_closed(struct ieee80211_hw *hw, u8 hw_queue, u16 index)
 {
 	struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
-	u16 read_point, write_point, available_desc_num;
+	u16 read_point, write_point;
 	bool ret = false;
-	static u8 stop_report_cnt;
 	struct rtl8192_tx_ring *ring = &rtlpci->tx_ring[hw_queue];
+	u16 cur_tx_rp, cur_tx_wp;
+	u32 tmpu32 = 0;
+
+	tmpu32 = rtl_read_dword(rtlpriv,
+				get_desc_addr_fr_q_idx(hw_queue));
+	cur_tx_rp = (u16)((tmpu32 >> 16) & 0x01ff);
+	cur_tx_wp = (u16)(tmpu32 & 0x01ff);
+
+	ring->cur_tx_wp = cur_tx_wp;
+	ring->cur_tx_rp = cur_tx_rp;
+	ring->avl_desc = ((cur_tx_rp > cur_tx_wp) ? (cur_tx_rp - cur_tx_wp) :
+			  (TX_DESC_NUM_92E - cur_tx_wp + cur_tx_rp));
 
-	/*checking Read/Write Point each interrupt wastes CPU */
-	if (stop_report_cnt > 15 || !rtlpriv->link_info.busytraffic) {
-		u16 point_diff = 0;
-		u16 cur_tx_rp, cur_tx_wp;
-		u32 tmpu32 = 0;
-
-		tmpu32 =
-		  rtl_read_dword(rtlpriv,
-				 get_desc_addr_fr_q_idx(hw_queue));
-		cur_tx_rp = (u16)((tmpu32 >> 16) & 0x0fff);
-		cur_tx_wp = (u16)(tmpu32 & 0x0fff);
-
-		ring->cur_tx_wp = cur_tx_wp;
-		ring->cur_tx_rp = cur_tx_rp;
-		point_diff = ((cur_tx_rp > cur_tx_wp) ?
-			      (cur_tx_rp - cur_tx_wp) :
-			      (TX_DESC_NUM_92E - cur_tx_wp + cur_tx_rp));
-
-		ring->avl_desc = point_diff;
-	}
 
 	read_point = ring->cur_tx_rp;
 	write_point = ring->cur_tx_wp;
-	available_desc_num = ring->avl_desc;
+
 
 	if (write_point > read_point) {
-		if (index < write_point && index >= read_point)
+		if (index <= write_point && index >= read_point)
 			ret = false;
 		else
 			ret = true;
@@ -1095,13 +1067,6 @@ bool rtl92ee_is_tx_desc_closed(struct ieee80211_hw *hw, u8 hw_queue, u16 index)
 	    rtlpriv->psc.rfoff_reason > RF_CHANGE_BY_PS)
 		ret = true;
 
-	if (hw_queue < BEACON_QUEUE) {
-		if (!ret)
-			stop_report_cnt++;
-		else
-			stop_report_cnt = 0;
-	}
-
 	return ret;
 }
 
diff --git a/drivers/net/wireless/rtlwifi/rtl8192ee/trx.h b/drivers/net/wireless/rtlwifi/rtl8192ee/trx.h
index 45fd9db..8f78ac9 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192ee/trx.h
+++ b/drivers/net/wireless/rtlwifi/rtl8192ee/trx.h
@@ -542,6 +542,8 @@
 	LE_BITS_TO_4BYTE(__pdesc+8, 12, 4)
 #define GET_RX_DESC_RX_IS_QOS(__pdesc)			\
 	LE_BITS_TO_4BYTE(__pdesc+8, 16, 1)
+#define GET_RX_STATUS_DESC_RPT_SEL(__pdesc)		\
+	LE_BITS_TO_4BYTE(__pdesc+8, 28, 1)
 
 #define GET_RX_DESC_RXMCS(__pdesc)			\
 	LE_BITS_TO_4BYTE(__pdesc+12, 0, 7)
@@ -829,7 +831,7 @@ void rtl92ee_rx_check_dma_ok(struct ieee80211_hw *hw, u8 *header_desc,
 			     u8 queue_index);
 u16	rtl92ee_rx_desc_buff_remained_cnt(struct ieee80211_hw *hw,
 					  u8 queue_index);
-void rtl92ee_get_available_desc(struct ieee80211_hw *hw, u8 queue_index);
+u16 rtl92ee_get_available_desc(struct ieee80211_hw *hw, u8 queue_index);
 void rtl92ee_pre_fill_tx_bd_desc(struct ieee80211_hw *hw,
 				 u8 *tx_bd_desc, u8 *desc, u8 queue_index,
 				 struct sk_buff *skb, dma_addr_t addr);
-- 
2.1.2

^ permalink raw reply related

* [patch net-next v4 1/2] tc: add BPF based action
From: Jiri Pirko @ 2015-01-14 17:43 UTC (permalink / raw)
  To: netdev; +Cc: davem, jhs, dborkman, ast, hannes

This action provides a possibility to exec custom BPF code.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
v3->v4:
 - fixed Kconfig typo spotted out by Daniel
 - added some desc to Kconfig suggested by Daniel
 - fixed code flow in tcf_bpf to avoid gotos suggested by Daniel
 - drop retval changed from -1 to 0 as suggested by Daniel and agreed by Alexei
 - added a little comment to tcf_bpf drop code as suggested by Daniel
v2->v3:
 - s/bpf_len/bpf_num_ops/ per DaveM's suggestion
v1->v2:
 - fixed error path in _init
 - added cleanup function to kill filter prog
---
 include/net/tc_act/tc_bpf.h        |  25 +++++
 include/uapi/linux/tc_act/Kbuild   |   1 +
 include/uapi/linux/tc_act/tc_bpf.h |  31 ++++++
 net/sched/Kconfig                  |  12 +++
 net/sched/Makefile                 |   1 +
 net/sched/act_bpf.c                | 205 +++++++++++++++++++++++++++++++++++++
 6 files changed, 275 insertions(+)
 create mode 100644 include/net/tc_act/tc_bpf.h
 create mode 100644 include/uapi/linux/tc_act/tc_bpf.h
 create mode 100644 net/sched/act_bpf.c

diff --git a/include/net/tc_act/tc_bpf.h b/include/net/tc_act/tc_bpf.h
new file mode 100644
index 0000000..86a070f
--- /dev/null
+++ b/include/net/tc_act/tc_bpf.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2015 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __NET_TC_BPF_H
+#define __NET_TC_BPF_H
+
+#include <linux/filter.h>
+#include <net/act_api.h>
+
+struct tcf_bpf {
+	struct tcf_common	common;
+	struct bpf_prog		*filter;
+	struct sock_filter	*bpf_ops;
+	u16			bpf_num_ops;
+};
+#define to_bpf(a) \
+	container_of(a->priv, struct tcf_bpf, common)
+
+#endif /* __NET_TC_BPF_H */
diff --git a/include/uapi/linux/tc_act/Kbuild b/include/uapi/linux/tc_act/Kbuild
index b057da2..19d5219 100644
--- a/include/uapi/linux/tc_act/Kbuild
+++ b/include/uapi/linux/tc_act/Kbuild
@@ -8,3 +8,4 @@ header-y += tc_nat.h
 header-y += tc_pedit.h
 header-y += tc_skbedit.h
 header-y += tc_vlan.h
+header-y += tc_bpf.h
diff --git a/include/uapi/linux/tc_act/tc_bpf.h b/include/uapi/linux/tc_act/tc_bpf.h
new file mode 100644
index 0000000..5288bd77
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_bpf.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (c) 2015 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_TC_BPF_H
+#define __LINUX_TC_BPF_H
+
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_BPF 13
+
+struct tc_act_bpf {
+	tc_gen;
+};
+
+enum {
+	TCA_ACT_BPF_UNSPEC,
+	TCA_ACT_BPF_TM,
+	TCA_ACT_BPF_PARMS,
+	TCA_ACT_BPF_OPS_LEN,
+	TCA_ACT_BPF_OPS,
+	__TCA_ACT_BPF_MAX,
+};
+#define TCA_ACT_BPF_MAX (__TCA_ACT_BPF_MAX - 1)
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index c54c9d9..29a0f95 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -698,6 +698,18 @@ config NET_ACT_VLAN
 	  To compile this code as a module, choose M here: the
 	  module will be called act_vlan.
 
+config NET_ACT_BPF
+        tristate "BPF based action"
+        depends on NET_CLS_ACT
+        ---help---
+	  Say Y here to execute BPF code on packets. The BPF code will decide
+	  if the packet should be dropped of not.
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_bpf.
+
 config NET_CLS_IND
 	bool "Incoming device classification"
 	depends on NET_CLS_U32 || NET_CLS_FW
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 679f24a..7ca2b4e 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_NET_ACT_SIMP)	+= act_simple.o
 obj-$(CONFIG_NET_ACT_SKBEDIT)	+= act_skbedit.o
 obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
+obj-$(CONFIG_NET_ACT_BPF)	+= act_bpf.o
 obj-$(CONFIG_NET_SCH_FIFO)	+= sch_fifo.o
 obj-$(CONFIG_NET_SCH_CBQ)	+= sch_cbq.o
 obj-$(CONFIG_NET_SCH_HTB)	+= sch_htb.o
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
new file mode 100644
index 0000000..1bd257e
--- /dev/null
+++ b/net/sched/act_bpf.c
@@ -0,0 +1,205 @@
+/*
+ * Copyright (c) 2015 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/filter.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+
+#include <linux/tc_act/tc_bpf.h>
+#include <net/tc_act/tc_bpf.h>
+
+#define BPF_TAB_MASK     15
+
+static int tcf_bpf(struct sk_buff *skb, const struct tc_action *a,
+		   struct tcf_result *res)
+{
+	struct tcf_bpf *b = a->priv;
+	int action;
+	int filter_res;
+
+	spin_lock(&b->tcf_lock);
+	b->tcf_tm.lastuse = jiffies;
+	bstats_update(&b->tcf_bstats, skb);
+	action = b->tcf_action;
+
+	filter_res = BPF_PROG_RUN(b->filter, skb);
+	if (filter_res == 0) {
+		/* Return code 0 from the BPF program
+		 * is being interpreted as a drop here.
+		 */
+		action = TC_ACT_SHOT;
+		b->tcf_qstats.drops++;
+	}
+
+	spin_unlock(&b->tcf_lock);
+	return action;
+}
+
+static int tcf_bpf_dump(struct sk_buff *skb, struct tc_action *a,
+			int bind, int ref)
+{
+	unsigned char *tp = skb_tail_pointer(skb);
+	struct tcf_bpf *b = a->priv;
+	struct tc_act_bpf opt = {
+		.index    = b->tcf_index,
+		.refcnt   = b->tcf_refcnt - ref,
+		.bindcnt  = b->tcf_bindcnt - bind,
+		.action   = b->tcf_action,
+	};
+	struct tcf_t t;
+	struct nlattr *nla;
+
+	if (nla_put(skb, TCA_ACT_BPF_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	if (nla_put_u16(skb, TCA_ACT_BPF_OPS_LEN, b->bpf_num_ops))
+		goto nla_put_failure;
+
+	nla = nla_reserve(skb, TCA_ACT_BPF_OPS, b->bpf_num_ops *
+			  sizeof(struct sock_filter));
+	if (!nla)
+		goto nla_put_failure;
+
+	memcpy(nla_data(nla), b->bpf_ops, nla_len(nla));
+
+	t.install = jiffies_to_clock_t(jiffies - b->tcf_tm.install);
+	t.lastuse = jiffies_to_clock_t(jiffies - b->tcf_tm.lastuse);
+	t.expires = jiffies_to_clock_t(b->tcf_tm.expires);
+	if (nla_put(skb, TCA_ACT_BPF_TM, sizeof(t), &t))
+		goto nla_put_failure;
+	return skb->len;
+
+nla_put_failure:
+	nlmsg_trim(skb, tp);
+	return -1;
+}
+
+static const struct nla_policy act_bpf_policy[TCA_ACT_BPF_MAX + 1] = {
+	[TCA_ACT_BPF_PARMS]	= { .len = sizeof(struct tc_act_bpf) },
+	[TCA_ACT_BPF_OPS_LEN]	= { .type = NLA_U16 },
+	[TCA_ACT_BPF_OPS]	= { .type = NLA_BINARY,
+				    .len = sizeof(struct sock_filter) * BPF_MAXINSNS },
+};
+
+static int tcf_bpf_init(struct net *net, struct nlattr *nla,
+			struct nlattr *est, struct tc_action *a,
+			int ovr, int bind)
+{
+	struct nlattr *tb[TCA_ACT_BPF_MAX + 1];
+	struct tc_act_bpf *parm;
+	struct tcf_bpf *b;
+	u16 bpf_size, bpf_num_ops;
+	struct sock_filter *bpf_ops;
+	struct sock_fprog_kern tmp;
+	struct bpf_prog *fp;
+	int ret;
+
+	if (!nla)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, TCA_ACT_BPF_MAX, nla, act_bpf_policy);
+	if (ret < 0)
+		return ret;
+
+	if (!tb[TCA_ACT_BPF_PARMS] ||
+	    !tb[TCA_ACT_BPF_OPS_LEN] || !tb[TCA_ACT_BPF_OPS])
+		return -EINVAL;
+	parm = nla_data(tb[TCA_ACT_BPF_PARMS]);
+
+	bpf_num_ops = nla_get_u16(tb[TCA_ACT_BPF_OPS_LEN]);
+	if (bpf_num_ops	> BPF_MAXINSNS || bpf_num_ops == 0)
+		return -EINVAL;
+
+	bpf_size = bpf_num_ops * sizeof(*bpf_ops);
+	bpf_ops = kzalloc(bpf_size, GFP_KERNEL);
+	if (!bpf_ops)
+		return -ENOMEM;
+
+	memcpy(bpf_ops, nla_data(tb[TCA_ACT_BPF_OPS]), bpf_size);
+
+	tmp.len = bpf_num_ops;
+	tmp.filter = bpf_ops;
+
+	ret = bpf_prog_create(&fp, &tmp);
+	if (ret)
+		goto free_bpf_ops;
+
+	if (!tcf_hash_check(parm->index, a, bind)) {
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*b), bind);
+		if (ret)
+			goto destroy_fp;
+
+		ret = ACT_P_CREATED;
+	} else {
+		if (bind)
+			goto destroy_fp;
+		tcf_hash_release(a, bind);
+		if (!ovr) {
+			ret = -EEXIST;
+			goto destroy_fp;
+		}
+	}
+
+	b = to_bpf(a);
+	spin_lock_bh(&b->tcf_lock);
+	b->tcf_action = parm->action;
+	b->bpf_num_ops = bpf_num_ops;
+	b->bpf_ops = bpf_ops;
+	b->filter = fp;
+	spin_unlock_bh(&b->tcf_lock);
+
+	if (ret == ACT_P_CREATED)
+		tcf_hash_insert(a);
+	return ret;
+
+destroy_fp:
+	bpf_prog_destroy(fp);
+free_bpf_ops:
+	kfree(bpf_ops);
+	return ret;
+}
+
+static void tcf_bpf_cleanup(struct tc_action *a, int bind)
+{
+	struct tcf_bpf *b = a->priv;
+
+	bpf_prog_destroy(b->filter);
+}
+
+static struct tc_action_ops act_bpf_ops = {
+	.kind =		"bpf",
+	.type =		TCA_ACT_BPF,
+	.owner =	THIS_MODULE,
+	.act =		tcf_bpf,
+	.dump =		tcf_bpf_dump,
+	.cleanup =	tcf_bpf_cleanup,
+	.init =		tcf_bpf_init,
+};
+
+static int __init bpf_init_module(void)
+{
+	return tcf_register_action(&act_bpf_ops, BPF_TAB_MASK);
+}
+
+static void __exit bpf_cleanup_module(void)
+{
+	tcf_unregister_action(&act_bpf_ops);
+}
+
+module_init(bpf_init_module);
+module_exit(bpf_cleanup_module);
+
+MODULE_AUTHOR("Jiri Pirko <jiri@resnulli.us>");
+MODULE_DESCRIPTION("TC BPF based action");
+MODULE_LICENSE("GPL v2");
-- 
1.9.3

^ permalink raw reply related

* [patch net-next v4 2/2] tc: cls_bpf: rename bpf_len to bpf_num_ops
From: Jiri Pirko @ 2015-01-14 17:43 UTC (permalink / raw)
  To: netdev; +Cc: davem, jhs, dborkman, ast, hannes
In-Reply-To: <1421257404-25452-1-git-send-email-jiri@resnulli.us>

It was suggested by DaveM to change the name as "len" might indicate
unit bytes.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
---
v3->v4:
 - no change
patch added in v3
---
 net/sched/cls_bpf.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 84c8219..1029923 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -37,7 +37,7 @@ struct cls_bpf_prog {
 	struct tcf_result res;
 	struct list_head link;
 	u32 handle;
-	u16 bpf_len;
+	u16 bpf_num_ops;
 	struct tcf_proto *tp;
 	struct rcu_head rcu;
 };
@@ -160,7 +160,7 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
 	struct tcf_exts exts;
 	struct sock_fprog_kern tmp;
 	struct bpf_prog *fp;
-	u16 bpf_size, bpf_len;
+	u16 bpf_size, bpf_num_ops;
 	u32 classid;
 	int ret;
 
@@ -173,13 +173,13 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
 		return ret;
 
 	classid = nla_get_u32(tb[TCA_BPF_CLASSID]);
-	bpf_len = nla_get_u16(tb[TCA_BPF_OPS_LEN]);
-	if (bpf_len > BPF_MAXINSNS || bpf_len == 0) {
+	bpf_num_ops = nla_get_u16(tb[TCA_BPF_OPS_LEN]);
+	if (bpf_num_ops > BPF_MAXINSNS || bpf_num_ops == 0) {
 		ret = -EINVAL;
 		goto errout;
 	}
 
-	bpf_size = bpf_len * sizeof(*bpf_ops);
+	bpf_size = bpf_num_ops * sizeof(*bpf_ops);
 	bpf_ops = kzalloc(bpf_size, GFP_KERNEL);
 	if (bpf_ops == NULL) {
 		ret = -ENOMEM;
@@ -188,14 +188,14 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
 
 	memcpy(bpf_ops, nla_data(tb[TCA_BPF_OPS]), bpf_size);
 
-	tmp.len = bpf_len;
+	tmp.len = bpf_num_ops;
 	tmp.filter = bpf_ops;
 
 	ret = bpf_prog_create(&fp, &tmp);
 	if (ret)
 		goto errout_free;
 
-	prog->bpf_len = bpf_len;
+	prog->bpf_num_ops = bpf_num_ops;
 	prog->bpf_ops = bpf_ops;
 	prog->filter = fp;
 	prog->res.classid = classid;
@@ -303,10 +303,10 @@ static int cls_bpf_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 
 	if (nla_put_u32(skb, TCA_BPF_CLASSID, prog->res.classid))
 		goto nla_put_failure;
-	if (nla_put_u16(skb, TCA_BPF_OPS_LEN, prog->bpf_len))
+	if (nla_put_u16(skb, TCA_BPF_OPS_LEN, prog->bpf_num_ops))
 		goto nla_put_failure;
 
-	nla = nla_reserve(skb, TCA_BPF_OPS, prog->bpf_len *
+	nla = nla_reserve(skb, TCA_BPF_OPS, prog->bpf_num_ops *
 			  sizeof(struct sock_filter));
 	if (nla == NULL)
 		goto nla_put_failure;
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 0/2] net: DSA fixes for bridge and ip-autoconf
From: Florian Fainelli @ 2015-01-14 17:52 UTC (permalink / raw)
  To: netdev; +Cc: Florian Fainelli, bridge, kaber, davem, buytenh

Hi David,

These two patches address some real world use cases of the DSA master and slave
network devices.

You have already seen patch 1 previously and you rejected it since my
explanations were not good enough to provide a justification as to why it is
useful, hopefully this time my explanation is better.

Patch 2 solves a different, yet very real problem as well at the bridge layer
when using DSA network devices.

Thanks!

Florian Fainelli (2):
  net: ipv4: handle DSA enabled master network devices
  net: bridge: reject DSA-enabled master netdevices as bridge members

 net/bridge/br_if.c  | 10 ++++++++--
 net/ipv4/ipconfig.c | 10 +++++++---
 2 files changed, 15 insertions(+), 5 deletions(-)

-- 
2.1.0

^ permalink raw reply

* [PATCH net-next 1/2] net: ipv4: handle DSA enabled master network devices
From: Florian Fainelli @ 2015-01-14 17:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, stephen, kaber, bridge, buytenh, Florian Fainelli
In-Reply-To: <1421257932-11073-1-git-send-email-f.fainelli@gmail.com>

The logic to configure a network interface for kernel IP
auto-configuration is very simplistic, and does not handle the case
where a device is stacked onto another such as with DSA. This causes the
kernel not to open and configure the master network device in a DSA
switch tree, and therefore slave network devices using this master
network devices as conduit device cannot be open.

This restriction comes from a check in net/dsa/slave.c, which is
basically checking the master netdev flags for IFF_UP and returns
-ENETDOWN if it is not the case.

Automatically bringing-up DSA master network devices allows DSA slave
network devices to be used as valid interfaces for e.g: NFS root booting
by allowing kernel IP autoconfiguration to succeed on these interfaces.

On the reverse path, make sure we do not attempt to close a DSA-enabled
device as this would implicitely prevent the slave DSA network device
from operating.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 net/ipv4/ipconfig.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 7fa18bc7e47f..d10073d2be0f 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -209,9 +209,9 @@ static int __init ic_open_devs(void)
 	last = &ic_first_dev;
 	rtnl_lock();
 
-	/* bring loopback device up first */
+	/* bring loopback an DSA master network devices up first */
 	for_each_netdev(&init_net, dev) {
-		if (!(dev->flags & IFF_LOOPBACK))
+		if (!(dev->flags & IFF_LOOPBACK) && !netdev_uses_dsa(dev))
 			continue;
 		if (dev_change_flags(dev, dev->flags | IFF_UP) < 0)
 			pr_err("IP-Config: Failed to open %s\n", dev->name);
@@ -228,6 +228,7 @@ static int __init ic_open_devs(void)
 			if (!(dev->flags & IFF_NOARP))
 				able |= IC_RARP;
 			able &= ic_proto_enabled;
+
 			if (ic_proto_enabled && !able)
 				continue;
 			oflags = dev->flags;
@@ -306,7 +307,10 @@ static void __init ic_close_devs(void)
 	while ((d = next)) {
 		next = d->next;
 		dev = d->dev;
-		if (dev != ic_dev) {
+		/* Only bring down unused devices and not DSA enabled master
+		 * devices
+		 */
+		if (dev != ic_dev && !netdev_uses_dsa(dev)) {
 			DBG(("IP-Config: Downing %s\n", dev->name));
 			dev_change_flags(dev, d->flags);
 		}
-- 
2.1.0

^ permalink raw reply related

* [PATCH net-next 2/2] net: bridge: reject DSA-enabled master netdevices as bridge members
From: Florian Fainelli @ 2015-01-14 17:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, stephen, kaber, bridge, buytenh, Florian Fainelli
In-Reply-To: <1421257932-11073-1-git-send-email-f.fainelli@gmail.com>

DSA-enabled master network devices with a switch tagging protocol should
strip the protocol specific format before handing the frame over to
higher layer.

When adding such a DSA master network device as a bridge member, we go
through the following code path when receiving a frame:

__netif_receive_skb_core
	-> first ptype check against ptype_all is not returning any
	   handler for this skb

	-> check and invoke rx_handler:
		-> deliver frame to the bridge layer: br_handle_frame

DSA registers a ptype handler with the fake ETH_XDSA ethertype, which is
called *after* the bridge-layer rx_handler has run. br_handle_frame()
tries to parse the frame it received from the DSA master network device,
and will not be able to match any of its conditions and jumps straight
at the end of the end of br_handle_frame() and returns
RX_HANDLER_CONSUMED there.

Since we returned RX_HANDLER_CONSUMED, __netif_receive_skb_core() stops
RX processing for this frame and returns NET_RX_SUCCESS, so we never get
a chance to call our switch tag packet processing logic and deliver
frames to the DSA slave network devices, and so we do not get any
functional bridge members at all.

Instead of cluttering the bridge receive path with DSA-specific checks,
and rely on assumptions about how __netif_receive_skb_core() is
processing frames, we simply deny adding the DSA master network device
(conduit interface) as a bridge member, leaving only the slave DSA
network devices to be bridge members, since those will work correctly in
all circumstances.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 net/bridge/br_if.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 81e49fb73169..b087d278c679 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -436,10 +436,16 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 	int err = 0;
 	bool changed_addr;

-	/* Don't allow bridging non-ethernet like devices */
+	/* Don't allow bridging non-ethernet like devices, or DSA-enabled
+	 * master network devices since the bridge layer rx_handler prevents
+	 * the DSA fake ethertype handler to be invoked, so we do not strip off
+	 * the DSA switch tag protocol header and the bridge layer just return
+	 * RX_HANDLER_CONSUMED, stopping RX processing for these frames.
+	 */
 	if ((dev->flags & IFF_LOOPBACK) ||
 	    dev->type != ARPHRD_ETHER || dev->addr_len != ETH_ALEN ||
-	    !is_valid_ether_addr(dev->dev_addr))
+	    !is_valid_ether_addr(dev->dev_addr) ||
+	    netdev_uses_dsa(dev))
 		return -EINVAL;

 	/* No bridging of bridges */
-- 
2.1.0

^ permalink raw reply related

* [PATCH v2 net] be2net: Allow GRE to work concurrently while a VxLAN tunnel is configured
From: Sriharsha Basavapatna @ 2015-01-15 10:38 UTC (permalink / raw)
  To: netdev

Other tunnels like GRE break while VxLAN offloads are enabled in Skyhawk-R. To
avoid this, we should restrict offload features on a per-packet basis in such
conditions.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@emulex.com>
---
v2 changes: fixed minor nits pointed out by Sergei Shtylyov
---
 drivers/net/ethernet/emulex/benet/be_main.c |   41 +++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 41a0a54..d48806b 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4383,8 +4383,9 @@ static int be_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
  * distinguish various types of transports (VxLAN, GRE, NVGRE ..). So, offload
  * is expected to work across all types of IP tunnels once exported. Skyhawk
  * supports offloads for either VxLAN or NVGRE, exclusively. So we export VxLAN
- * offloads in hw_enc_features only when a VxLAN port is added. Note this only
- * ensures that other tunnels work fine while VxLAN offloads are not enabled.
+ * offloads in hw_enc_features only when a VxLAN port is added. If other (non
+ * VxLAN) tunnels are configured while VxLAN offloads are enabled, offloads for
+ * those other tunnels are unexported on the fly through ndo_features_check().
  *
  * Skyhawk supports VxLAN offloads only for one UDP dport. So, if the stack
  * adds more than one port, disable offloads and don't re-enable them again
@@ -4463,7 +4464,41 @@ static netdev_features_t be_features_check(struct sk_buff *skb,
 					   struct net_device *dev,
 					   netdev_features_t features)
 {
-	return vxlan_features_check(skb, features);
+	struct be_adapter *adapter = netdev_priv(dev);
+	u8 l4_hdr = 0;
+
+	/* The code below restricts offload features for some tunneled packets.
+	 * Offload features for normal (non tunnel) packets are unchanged.
+	 */
+	if (!skb->encapsulation ||
+	    !(adapter->flags & BE_FLAGS_VXLAN_OFFLOADS))
+		return features;
+
+	/* It's an encapsulated packet and VxLAN offloads are enabled. We
+	 * should disable tunnel offload features if it's not a VxLAN packet,
+	 * as tunnel offloads have been enabled only for VxLAN. This is done to
+	 * allow other tunneled traffic like GRE work fine while VxLAN
+	 * offloads are configured in Skyhawk-R.
+	 */
+	switch (vlan_get_protocol(skb)) {
+	case htons(ETH_P_IP):
+		l4_hdr = ip_hdr(skb)->protocol;
+		break;
+	case htons(ETH_P_IPV6):
+		l4_hdr = ipv6_hdr(skb)->nexthdr;
+		break;
+	default:
+		return features;
+	}
+
+	if (l4_hdr != IPPROTO_UDP ||
+	    skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
+	    skb->inner_protocol != htons(ETH_P_TEB) ||
+	    skb_inner_mac_header(skb) - skb_transport_header(skb) !=
+	    sizeof(struct udphdr) + sizeof(struct vxlanhdr))
+		return features & ~(NETIF_F_ALL_CSUM | NETIF_F_GSO_MASK);
+
+	return features;
 }
 #endif
 
-- 
1.7.9.5

^ permalink raw reply related

* Re: [net-next PATCH v1 00/11] A flow API
From: Thomas Graf @ 2015-01-14 19:02 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jamal Hadi Salim, sfeldma, jiri, simon.horman, netdev, davem,
	andy, Shrijeet Mukherjee
In-Reply-To: <54B01DA2.9090104@gmail.com>

On 01/09/15 at 10:27am, John Fastabend wrote:
> Yes we are providing an interface for userspace to interrogate the
> hardware and program it. My take on this is even if you embed this
> into another netlink family OVS, NFT, TCA you end up with the same
> operations w.r.t. table support (a) query hardware for
> resources/constraints/etc and (b) an API to add/del rules in those
> tables. It seems the intersection of these features with existing
> netlink families is fairly small so I opted to create a new family.
> The underlying hardware offload mechanisms in flow_table.c here could
> be used by in-kernel consumers as well as user space. For some
> consumers 'tc' perhaps this makes good sense for others 'OVS'
> it does not IMO.

+1

> [...]
>
> But in many cases my goal is to unify them in userspace
> where it is easier to make policy decisions. For OVS, NFT it
> seems to me that user space libraries can handle the unification
> of hardware/software dataplanes. Further I think it is the correct
> place to unify the dataplanes. I don't want to encode complex
> policies into the kernel. Even if you embed the netlink UAPI into
> another netlink family the semantics look the same.

I think we want the kernel to remain in control but it does not
necessarily have to hold the offload decision logic for all users.
I think this is compareable to routing daemons. We do not want to
talk BPF or OSPF in the kernel and we don't need to know about all
of the selection logic behind it but we want to be in charge of
keeping track of the actual datapath routes.

Also, I think this is still an option to emebed the proposed
attribtues in existing Netlink families even if we shoot for a new
family for now. So far the attributes seem to be defined in a way
that would allow them to be embedded into other existing Netlink
families.

> Maybe I need to be enlightened but I thought for a bit about some grand
> unification of ovs, bridge, tc, netlink, et. al. but that seems like
> an entirely different scope of project. (side note: filters/actions
> are no longer locked by qdisc and could stand on their own) My thoughts
> on this are not yet organized.

I think everybody had this in the back of their mind at some point.
Be it based on BPF, NFT or TC. I don't think it's undoable but it
takes a lot of effort as each is based on a slightly different set
of assumptions with corresponding focus derived from that.

^ permalink raw reply

* Re: [PATCH 3/6] net: davinci_emac: Free clock after checking the frequency
From: Tony Lindgren @ 2015-01-14 19:10 UTC (permalink / raw)
  To: David Miller; +Cc: thomas.lendacky, netdev, linux-omap, b.hutchman, balbi
In-Reply-To: <20150113.160522.558845761776458001.davem@davemloft.net>

* David Miller <davem@davemloft.net> [150113 13:08]:
> From: Tony Lindgren <tony@atomide.com>
> Date: Tue, 13 Jan 2015 11:54:16 -0800
> 
> > * Tom Lendacky <thomas.lendacky@amd.com> [150113 11:51]:
> >> On 01/13/2015 01:29 PM, Tony Lindgren wrote:
> >> >We only use clk_get() to get the frequency, the rest is done by
> >> >the runtime PM calls. Let's free the clock too.
> >> >
> >> >Cc: Brian Hutchinson <b.hutchman@gmail.com>
> >> >Cc: Felipe Balbi <balbi@ti.com>
> >> >Signed-off-by: Tony Lindgren <tony@atomide.com>
> >> >---
> >> >  drivers/net/ethernet/ti/davinci_emac.c | 1 +
> >> >  1 file changed, 1 insertion(+)
> >> >
> >> >diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
> >> >index deb43b3..e9efc74 100644
> >> >--- a/drivers/net/ethernet/ti/davinci_emac.c
> >> >+++ b/drivers/net/ethernet/ti/davinci_emac.c
> >> >@@ -1881,6 +1881,7 @@ static int davinci_emac_probe(struct platform_device *pdev)
> >> >  		return -EBUSY;
> >> >  	}
> >> >  	emac_bus_frequency = clk_get_rate(emac_clk);
> >> >+	clk_put(emac_clk);
> >> 
> >> The devm_clk_get call is used to get the clock so either a devm_clk_put
> >> needs to be used here or just let the devm_ call do its thing and
> >> automatically do the put when the module is unloaded.
> > 
> > Thanks good catch, updated patch below.
> 
> Please, once all the feedback has been addressed, repost the entire
> series.

Sure, will repost on Thursday in case there will be more comments.

Regards,

Tony

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox