Netdev List
 help / color / mirror / Atom feed
* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
From: Kees Cook @ 2017-08-31  4:01 UTC (permalink / raw)
  To: Mike Galbraith, David S. Miller
  Cc: LKML, Ingo Molnar, Reshetova, Elena, Network Development
In-Reply-To: <1504149176.23109.9.camel@gmx.de>

On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith <efault@gmx.de> wrote:
> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
>
>> Interesting! Can you try with 633547973ffc3 ("net: convert
>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
>> running haveged will help me trigger this on my system...
>
> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.

Wonderful! Thank you so much for helping track this down.

So, it seems that sk_buff.users will need some more special attention
before we can convert it to refcount.

x86-refcount will saturate with refcount_dec_and_test() if the result
is negative. But that would mean at least starting at 0. FULL should
have WARNed in this case, so I remain slightly confused why it was
missed by FULL.

Ingo, I'm not sure the best path for this. It seems we need to revert
230cd1279d001 and 633547973ffc3 and then we can restore
ARCH_HAS_REFCOUNT.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply

* Re: [PATCH net-next] net/ncsi: Define {add,kill}_vid callbacks for !CONFIG_NET_NCSI
From: Florian Fainelli @ 2017-08-31  3:59 UTC (permalink / raw)
  To: Samuel Mendoza-Jonas, David S . Miller, netdev, linux-kernel,
	OpenBMC Maillist
  Cc: Joel Stanley, Benjamin Herrenschmidt, Gavin Shan
In-Reply-To: <20170831033846.23538-1-sam@mendozajonas.com>

On August 30, 2017 8:38:46 PM PDT, Samuel Mendoza-Jonas <sam@mendozajonas.com> wrote:
>Patch "net/ncsi: Configure VLAN tag filter" defined two new callback
>functions in include/net/ncsi.h, but neglected the !CONFIG_NET_NCSI
>case. This can cause a build error if these are referenced elsewhere
>without NCSI enabled, for example in ftgmac100:
>
>>>> ERROR: "ncsi_vlan_rx_kill_vid"
>[drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
>>>> ERROR: "ncsi_vlan_rx_add_vid"
>[drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
>
>Add definitions for !CONFIG_NET_NCSI to bring it into line with the
>rest
>of ncsi.h
>
>Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
>---
> include/net/ncsi.h | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
>diff --git a/include/net/ncsi.h b/include/net/ncsi.h
>index 1f96af46df49..2b13b6b91a4d 100644
>--- a/include/net/ncsi.h
>+++ b/include/net/ncsi.h
>@@ -36,6 +36,14 @@ int ncsi_start_dev(struct ncsi_dev *nd);
> void ncsi_stop_dev(struct ncsi_dev *nd);
> void ncsi_unregister_dev(struct ncsi_dev *nd);
> #else /* !CONFIG_NET_NCSI */
>+int ncsi_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16
>vid)
>+{
>+	return -ENOTTY;

Returning -EOPNOTSUPP would probably be more correct here.

>+}
>+int ncsi_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16
>vid)
>+{
>+	return -ENOTTY;

Likewise.

>+}
>static inline struct ncsi_dev *ncsi_register_dev(struct net_device
>*dev,
> 					void (*notifier)(struct ncsi_dev *nd))
> {


-- 
Florian

^ permalink raw reply

* linux-next: manual merge of the tip tree with the net-next tree
From: Stephen Rothwell @ 2017-08-31  3:47 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	David Miller, Networking
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List, Intiyaz Basha,
	Ying Huang

Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  drivers/net/ethernet/cavium/liquidio/lio_main.c

between commit:

  d1d97ee6e3a8 ("liquidio: moved liquidio_napi_drv_callback to lio_core.c")

from the net-next tree and commit:

  966a967116e6 ("smp: Avoid using two cache lines for struct call_single_data")

from the tip tree.

I fixed it up (I added the blow merge fix patch) and can carry the fix
as necessary. This is now fixed as far as linux-next is concerned, but
any non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Thu, 31 Aug 2017 13:42:50 +1000
Subject: [PATCH] liquidio: fix for merge with "smp: Avoid using two cache
 lines for struct call_single_data"

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 drivers/net/ethernet/cavium/liquidio/lio_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 0e7896cdb295..23f6b60030c5 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -613,7 +613,7 @@ static void liquidio_napi_drv_callback(void *arg)
 	    droq->cpu_id == this_cpu) {
 		napi_schedule_irqoff(&droq->napi);
 	} else {
-		struct call_single_data *csd = &droq->csd;
+		call_single_data_t *csd = &droq->csd;
 
 		csd->func = napi_schedule_wrapper;
 		csd->info = &droq->napi;
-- 
2.13.2

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply related

* [PATCH net-next] net/ncsi: Define {add,kill}_vid callbacks for !CONFIG_NET_NCSI
From: Samuel Mendoza-Jonas @ 2017-08-31  3:38 UTC (permalink / raw)
  To: David S . Miller, netdev, linux-kernel, OpenBMC Maillist
  Cc: Samuel Mendoza-Jonas, Joel Stanley, Benjamin Herrenschmidt,
	Gavin Shan

Patch "net/ncsi: Configure VLAN tag filter" defined two new callback
functions in include/net/ncsi.h, but neglected the !CONFIG_NET_NCSI
case. This can cause a build error if these are referenced elsewhere
without NCSI enabled, for example in ftgmac100:

>>> ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
>>> ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!

Add definitions for !CONFIG_NET_NCSI to bring it into line with the rest
of ncsi.h

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
---
 include/net/ncsi.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/net/ncsi.h b/include/net/ncsi.h
index 1f96af46df49..2b13b6b91a4d 100644
--- a/include/net/ncsi.h
+++ b/include/net/ncsi.h
@@ -36,6 +36,14 @@ int ncsi_start_dev(struct ncsi_dev *nd);
 void ncsi_stop_dev(struct ncsi_dev *nd);
 void ncsi_unregister_dev(struct ncsi_dev *nd);
 #else /* !CONFIG_NET_NCSI */
+int ncsi_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16 vid)
+{
+	return -ENOTTY;
+}
+int ncsi_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16 vid)
+{
+	return -ENOTTY;
+}
 static inline struct ncsi_dev *ncsi_register_dev(struct net_device *dev,
 					void (*notifier)(struct ncsi_dev *nd))
 {
-- 
2.14.1

^ permalink raw reply related

* Re: [net-next PATCHv6 1/2] dt-bindings: net: Add DT bindings for Socionext Netsec
From: Jassi Brar @ 2017-08-31  3:13 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Devicetree List,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	David S . Miller, Mark Rutland, arnd-r2nGTMty4D4@public.gmane.org,
	patches-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org, Jassi Brar,
	Rob Herring, Andy Green
In-Reply-To: <20170830151942.GB22289-g2DYL2Zd6BY@public.gmane.org>

On Wed, Aug 30, 2017 at 8:49 PM, Andrew Lunn <andrew-g2DYL2Zd6BY@public.gmane.org> wrote:
> On Wed, Aug 30, 2017 at 03:55:52PM +0530, Jassi Brar wrote:
>> This patch adds documentation for Device-Tree bindings for the
>> Socionext NetSec Controller driver.
>>
>> Signed-off-by: Jassi Brar <jaswinder.singh-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>> ---
>>  .../devicetree/bindings/net/socionext-netsec.txt   | 46 ++++++++++++++++++++++
>>  1 file changed, 46 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/net/socionext-netsec.txt
>>
>> diff --git a/Documentation/devicetree/bindings/net/socionext-netsec.txt b/Documentation/devicetree/bindings/net/socionext-netsec.txt
>> new file mode 100644
>> index 0000000..12d596c
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/net/socionext-netsec.txt
>> @@ -0,0 +1,46 @@
>> +* Socionext NetSec Ethernet Controller IP
>> +
>> +Required properties:
>> +- compatible: Should be "socionext,netsecv5"
>> +- reg: Address and length of the register sets, the first is the main
>> +     registers, then the rdlar and tdlar regions for the SoC
>> +- interrupts: Should contain ethernet controller interrupt
>> +- clocks: phandle to any clocks to be switched by runtime_pm
>> +- phy-mode: See ethernet.txt file in the same directory
>
>> +- max-speed: See ethernet.txt file in the same directory
>> +- max-frame-size: See ethernet.txt file in the same directory, if 9000 or
>> +     above jumbo frames are enabled
>> +- local-mac-address: See ethernet.txt file in the same directory
>
> These three are required, not optimal?
>
optional :)

>> +- phy-handle: phandle to select child phy
>> +
>> +Optional properties:
>> +- use-jumbo: Boolean property to suggest if jumbo packets should be used or not
>> +
>> +For the child phy
>> +
>> +- compatible "ethernet-phy-ieee802.3-c22" is needed
>
> This is normally considered optional. Why require it?
>
Yes, will do.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH][next] bpf: test_maps: fix typo "conenct" -> "connext"
From: Joe Perches @ 2017-08-31  2:33 UTC (permalink / raw)
  To: Colin King, Alexei Starovoitov, Daniel Borkmann, Shuah Khan,
	netdev, linux-kselftest
  Cc: linux-kernel
In-Reply-To: <20170830114707.28493-1-colin.king@canonical.com>

On Wed, 2017-08-30 at 12:47 +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>

connext->connect typo in patch subject

> Trivial fix to typo in printf error message
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  tools/testing/selftests/bpf/test_maps.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> index 7059bb315a10..8fdaf2e21c8a 100644
> --- a/tools/testing/selftests/bpf/test_maps.c
> +++ b/tools/testing/selftests/bpf/test_maps.c
> @@ -525,7 +525,7 @@ static void test_sockmap(int tasks, void *data)
>  		addr.sin_port = htons(ports[i - 2]);
>  		err = connect(sfd[i], (struct sockaddr *)&addr, sizeof(addr));
>  		if (err) {
> -			printf("failed to conenct\n");
> +			printf("failed to connect\n");
>  			goto out;
>  		}
>  	}

^ permalink raw reply

* Re: [PATCH net] Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
From: David Miller @ 2017-08-31  1:47 UTC (permalink / raw)
  To: f.fainelli
  Cc: netdev, geert+renesas, ddaney.cavm, slash.tmp, marc_gonzales,
	andrew
In-Reply-To: <1504140569-2063-1-git-send-email-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed, 30 Aug 2017 17:49:29 -0700

> This reverts commit 7ad813f208533cebfcc32d3d7474dc1677d1b09a ("net: phy:
> Correctly process PHY_HALTED in phy_stop_machine()") because it is
> creating the possibility for a NULL pointer dereference.
> 
> David Daney provide the following call trace and diagram of events:
> 
> When ndo_stop() is called we call:
> 
>  phy_disconnect()
>     +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
>     +---> phy_stop_machine()
>     |      +---> phy_state_machine()
>     |              +----> queue_delayed_work(): Work queued.
>     +--->phy_detach() implies: phydev->attached_dev = NULL;
> 
> Now at a later time the queued work does:
> 
>  phy_state_machine()
>     +---->netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:
> 
>  CPU 12 Unable to handle kernel paging request at virtual address
> 0000000000000048, epc == ffffffff80de37ec, ra == ffffffff80c7c
> Oops[#1]:
> CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
> Workqueue: events_power_efficient phy_state_machine
> task: 80000004021ed100 task.stack: 8000000409d70000
> $ 0   : 0000000000000000 ffffffff84720060 0000000000000048 0000000000000004
> $ 4   : 0000000000000000 0000000000000001 0000000000000004 0000000000000000
> $ 8   : 0000000000000000 0000000000000000 00000000ffff98f3 0000000000000000
> $12   : 8000000409d73fe0 0000000000009c00 ffffffff846547c8 000000000000af3b
> $16   : 80000004096bab68 80000004096babd0 0000000000000000 80000004096ba800
> $20   : 0000000000000000 0000000000000000 ffffffff81090000 0000000000000008
> $24   : 0000000000000061 ffffffff808637b0
> $28   : 8000000409d70000 8000000409d73cf0 80000000271bd300 ffffffff80c7804c
> Hi    : 000000000000002a
> Lo    : 000000000000003f
> epc   : ffffffff80de37ec netif_carrier_off+0xc/0x58
> ra    : ffffffff80c7804c phy_state_machine+0x48c/0x4f8
> Status: 14009ce3        KX SX UX KERNEL EXL IE
> Cause : 00800008 (ExcCode 02)
> BadVA : 0000000000000048
> PrId  : 000d9501 (Cavium Octeon III)
> Modules linked in:
> Process kworker/12:1 (pid: 1502, threadinfo=8000000409d70000,
> task=80000004021ed100, tls=0000000000000000)
> Stack : 8000000409a54000 80000004096bab68 80000000271bd300 80000000271c1e00
>         0000000000000000 ffffffff808a1708 8000000409a54000 80000000271bd300
>         80000000271bd320 8000000409a54030 ffffffff80ff0f00 0000000000000001
>         ffffffff81090000 ffffffff808a1ac0 8000000402182080 ffffffff84650000
>         8000000402182080 ffffffff84650000 ffffffff80ff0000 8000000409a54000
>         ffffffff808a1970 0000000000000000 80000004099e8000 8000000402099240
>         0000000000000000 ffffffff808a8598 0000000000000000 8000000408eeeb00
>         8000000409a54000 00000000810a1d00 0000000000000000 8000000409d73de8
>         8000000409d73de8 0000000000000088 000000000c009c00 8000000409d73e08
>         8000000409d73e08 8000000402182080 ffffffff808a84d0 8000000402182080
>         ...
> Call Trace:
> [<ffffffff80de37ec>] netif_carrier_off+0xc/0x58
> [<ffffffff80c7804c>] phy_state_machine+0x48c/0x4f8
> [<ffffffff808a1708>] process_one_work+0x158/0x368
> [<ffffffff808a1ac0>] worker_thread+0x150/0x4c0
> [<ffffffff808a8598>] kthread+0xc8/0xe0
> [<ffffffff808617f0>] ret_from_kernel_thread+0x14/0x1c
> 
> The original motivation for this change originated from Marc Gonzales
> indicating that his network driver did not have its adjust_link callback
> executing with phydev->link = 0 while he was expecting it.
> 
> PHYLIB has never made any such guarantees ever because phy_stop() merely just
> tells the workqueue to move into PHY_HALTED state which will happen
> asynchronously.
> 
> Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
> Reported-by: David Daney <ddaney.cavm@gmail.com>
> Fixes: 7ad813f20853 ("net: phy: Correctly process PHY_HALTED in phy_stop_machine()")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied and queued up for -stable, thanks Florian.

^ permalink raw reply

* Re: DSA mv88e6xxx RX frame errors and TCP/IP RX failure
From: Andrew Lunn @ 2017-08-31  1:46 UTC (permalink / raw)
  To: Tim Harvey
  Cc: netdev, Vivien Didelot, linux-kernel@vger.kernel.org, Fugang Duan
In-Reply-To: <CAJ+vNU3Qto9iNdCUExb4NWaa3PA1UWe1kbY1b=sBLPgw_xiz7g@mail.gmail.com>

> >                        /* Report late collisions as a frame error. */
> >                         if (status & (BD_ENET_RX_NO | BD_ENET_RX_CL))
> >                                 ndev->stats.rx_frame_errors++;
> >
> > I don't see anywhere else frame errors are counted, but it would be
> > good to prove we are looking in the right place.
> >
> 
> Andrew,
> 
> (adding IMX FEC driver maintainer to CC)
> 
> Yes, that's one of them being hit. It looks like ifconfig reports
> 'frame' as the accumulation of a few stats so here are some more
> specifics from /sys/class/net/eth0/statistics:
> 
> root@xenial:/sys/devices/soc0/soc/2100000.aips-bus/2188000.ethernet/net/eth0/statistics#
> for i in `ls rx_*`; do echo $i:$(cat $i); done
> rx_bytes:103229
> rx_compressed:0
> rx_crc_errors:22
> rx_dropped:0
> rx_errors:22
> rx_fifo_errors:0
> rx_frame_errors:22
> rx_length_errors:22
> rx_missed_errors:0
> rx_nohandler:0
> rx_over_errors:0
> rx_packets:1174
> root@xenial:/sys/devices/soc0/soc/2100000.aips-bus/2188000.ethernet/net/eth0/statistics#
> ifconfig eth0
> eth0      Link encap:Ethernet  HWaddr 00:D0:12:41:F3:E7
>           inet6 addr: fe80::2d0:12ff:fe41:f3e7/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1207 errors:22 dropped:0 overruns:0 frame:66
>           TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:106009 (103.5 KiB)  TX bytes:4604 (4.4 KiB)
> 
> Instrumenting fec driver I see the following getting hit:
> 
> status & BD_ENET_RX_LG /* rx_length_errors: Frame too long */
> status & BD_ENET_RX_CR  /* rx_crc_errors: CRC Error */
> status & BD_ENET_RX_CL /* rx_frame_errors: Collision? */
> 
> Is this a frame size issue where the MV88E6176 is sending frames down
> that exceed the MTU because of headers added?

I did fix an issue recently with that. See 

commit fbbeefdd21049fcf9437c809da3828b210577f36
Author: Andrew Lunn <andrew@lunn.ch>
Date:   Sun Jul 30 19:36:05 2017 +0200

    net: fec: Allow reception of frames bigger than 1522 bytes
    
    The FEC Receive Control Register has a 14 bit field indicating the
    longest frame that may be received. It is being set to 1522. Frames
    longer than this are discarded, but counted as being in error.
    
    When using DSA, frames from the switch has an additional header,
    either 4 or 8 bytes if a Marvell switch is used. Thus a full MTU frame
    of 1522 bytes received by the switch on a port becomes 1530 bytes when
    passed to the host via the FEC interface.
    
    Change the maximum receive size to 2048 - 64, where 64 is the maximum
    rx_alignment applied on the receive buffer for AVB capable FEC
    cores. Use this value also for the maximum receive buffer size. The
    driver is already allocating a receive SKB of 2048 bytes, so this
    change should not have any significant effects.
    
    Tested on imx51, imx6, vf610.
    
    Signed-off-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>


However, this is was of an all/nothing problem. All frames with the
full MTU were getting dropped, where as i think you are only seeing a
few dropped?

Anyway, try cherry picking that patch and see if it helps.

	Andrew

^ permalink raw reply

* [PATCH net-next v4 1/2] inet_diag: allow protocols to provide additional data
From: Ivan Delalande @ 2017-08-31  1:33 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, netdev, Ivan Delalande
In-Reply-To: <20170831013312.29142-1-colona@arista.com>

Extend inet_diag_handler to allow individual protocols to report
additional data on INET_DIAG_INFO through idiag_get_aux. The size
can be dynamic and is computed by idiag_get_aux_size.

Signed-off-by: Ivan Delalande <colona@arista.com>
---
 include/linux/inet_diag.h |  7 +++++++
 net/ipv4/inet_diag.c      | 22 ++++++++++++++++++----
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
index 65da430e260f..ee251c585854 100644
--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -25,6 +25,13 @@ struct inet_diag_handler {
 					  struct inet_diag_msg *r,
 					  void *info);
 
+	int		(*idiag_get_aux)(struct sock *sk,
+					 bool net_admin,
+					 struct sk_buff *skb);
+
+	size_t		(*idiag_get_aux_size)(struct sock *sk,
+					      bool net_admin);
+
 	int		(*destroy)(struct sk_buff *in_skb,
 				   const struct inet_diag_req_v2 *req);
 
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 67325d5832d7..cb7012d1720f 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -93,8 +93,17 @@ void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(inet_diag_msg_common_fill);
 
-static size_t inet_sk_attr_size(void)
+static size_t inet_sk_attr_size(struct sock *sk,
+				const struct inet_diag_req_v2 *req,
+				bool net_admin)
 {
+	const struct inet_diag_handler *handler;
+	size_t aux = 0;
+
+	handler = inet_diag_table[req->sdiag_protocol];
+	if (handler && handler->idiag_get_aux_size)
+		aux = handler->idiag_get_aux_size(sk, net_admin);
+
 	return	  nla_total_size(sizeof(struct tcp_info))
 		+ nla_total_size(1) /* INET_DIAG_SHUTDOWN */
 		+ nla_total_size(1) /* INET_DIAG_TOS */
@@ -105,6 +114,7 @@ static size_t inet_sk_attr_size(void)
 		+ nla_total_size(SK_MEMINFO_VARS * sizeof(u32))
 		+ nla_total_size(TCP_CA_NAME_MAX)
 		+ nla_total_size(sizeof(struct tcpvegas_info))
+		+ nla_total_size(aux)
 		+ 64;
 }
 
@@ -260,6 +270,10 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 
 	handler->idiag_get_info(sk, r, info);
 
+	if (ext & (1 << (INET_DIAG_INFO - 1)) && handler->idiag_get_aux)
+		if (handler->idiag_get_aux(sk, net_admin, skb) < 0)
+			goto errout;
+
 	if (sk->sk_state < TCP_TIME_WAIT) {
 		union tcp_cc_info info;
 		size_t sz = 0;
@@ -449,6 +463,7 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo,
 			    const struct nlmsghdr *nlh,
 			    const struct inet_diag_req_v2 *req)
 {
+	bool net_admin = netlink_net_capable(in_skb, CAP_NET_ADMIN);
 	struct net *net = sock_net(in_skb->sk);
 	struct sk_buff *rep;
 	struct sock *sk;
@@ -458,7 +473,7 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo,
 	if (IS_ERR(sk))
 		return PTR_ERR(sk);
 
-	rep = nlmsg_new(inet_sk_attr_size(), GFP_KERNEL);
+	rep = nlmsg_new(inet_sk_attr_size(sk, req, net_admin), GFP_KERNEL);
 	if (!rep) {
 		err = -ENOMEM;
 		goto out;
@@ -467,8 +482,7 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo,
 	err = sk_diag_fill(sk, rep, req,
 			   sk_user_ns(NETLINK_CB(in_skb).sk),
 			   NETLINK_CB(in_skb).portid,
-			   nlh->nlmsg_seq, 0, nlh,
-			   netlink_net_capable(in_skb, CAP_NET_ADMIN));
+			   nlh->nlmsg_seq, 0, nlh, net_admin);
 	if (err < 0) {
 		WARN_ON(err == -EMSGSIZE);
 		nlmsg_free(rep);
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next v4 0/2] report TCP MD5 signing keys and addresses
From: Ivan Delalande @ 2017-08-31  1:33 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, netdev, Ivan Delalande

Allow userspace to retrieve MD5 signature keys and addresses configured
on TCP sockets through inet_diag.

Thanks to Eric Dumazet and Stephen Hemminger for their useful
explanations and feedback.

v4: - add new struct tcp_diag_md5sig to report the data instead of
      tcp_md5sig to avoid wasting 112 bytes on every tcpm_addr,
    - memset tcpm_addr on IPv4 addresses to avoid leaks,
    - style fix in inet_diag_dump_one_icsk.

v3: - rename inet_diag_*md5sig in tcp_diag.c to tcp_diag_* for
      consistency,
    - don't lock the socket in tcp_diag_put_md5sig,
    - add checks on md5sig_count in tcp_diag_put_md5sig to not create
      the netlink attribute if the list is empty, and to avoid overflows
      or memory leaks if the list has changed in the meantime.

v2: - move changes to tcp_diag.c and extend inet_diag_handler to allow
      protocols to provide additional data on INET_DIAG_INFO,
    - lock socket before calling tcp_diag_put_md5sig.


I also have a patch for iproute2/ss to test this change, making it print
this new attribute. I'm planning to polish and send it if this series
gets applied.


Ivan Delalande (2):
  inet_diag: allow protocols to provide additional data
  tcp_diag: report TCP MD5 signing keys and addresses

 include/linux/inet_diag.h      |   7 +++
 include/uapi/linux/inet_diag.h |   1 +
 include/uapi/linux/tcp.h       |   9 ++++
 net/ipv4/inet_diag.c           |  22 +++++++--
 net/ipv4/tcp_diag.c            | 110 ++++++++++++++++++++++++++++++++++++++---
 5 files changed, 139 insertions(+), 10 deletions(-)

-- 
2.14.1

^ permalink raw reply

* [PATCH net-next v4 2/2] tcp_diag: report TCP MD5 signing keys and addresses
From: Ivan Delalande @ 2017-08-31  1:33 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, netdev, Ivan Delalande
In-Reply-To: <20170831013312.29142-1-colona@arista.com>

Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to
processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is
not possible to retrieve these from the kernel once they have been
configured on sockets.

Signed-off-by: Ivan Delalande <colona@arista.com>
---
 include/uapi/linux/inet_diag.h |   1 +
 include/uapi/linux/tcp.h       |   9 ++++
 net/ipv4/tcp_diag.c            | 110 ++++++++++++++++++++++++++++++++++++++---
 3 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index 678496897a68..f52ff62bfabe 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -143,6 +143,7 @@ enum {
 	INET_DIAG_MARK,
 	INET_DIAG_BBRINFO,
 	INET_DIAG_CLASS_ID,
+	INET_DIAG_MD5SIG,
 	__INET_DIAG_MAX,
 };
 
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 030e594bab45..15c25eccab2b 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -256,4 +256,13 @@ struct tcp_md5sig {
 	__u8	tcpm_key[TCP_MD5SIG_MAXKEYLEN];		/* key (binary) */
 };
 
+/* INET_DIAG_MD5SIG */
+struct tcp_diag_md5sig {
+	__u8	tcpm_family;
+	__u8	tcpm_prefixlen;
+	__u16	tcpm_keylen;
+	__be32	tcpm_addr[4];
+	__u8	tcpm_key[TCP_MD5SIG_MAXKEYLEN];
+};
+
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index a748c74aa8b7..65d0c34a76ee 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -16,6 +16,7 @@
 
 #include <linux/tcp.h>
 
+#include <net/netlink.h>
 #include <net/tcp.h>
 
 static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
@@ -36,6 +37,101 @@ static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
 		tcp_get_info(sk, info);
 }
 
+#ifdef CONFIG_TCP_MD5SIG
+static void tcp_diag_md5sig_fill(struct tcp_diag_md5sig *info,
+				 const struct tcp_md5sig_key *key)
+{
+	info->tcpm_family = key->family;
+	info->tcpm_prefixlen = key->prefixlen;
+	info->tcpm_keylen = key->keylen;
+	memcpy(info->tcpm_key, key->key, key->keylen);
+
+	if (key->family == AF_INET) {
+		memset(info->tcpm_addr, 0, sizeof(info->tcpm_addr));
+		info->tcpm_addr[0] = key->addr.a4.s_addr;
+	}
+	#if IS_ENABLED(CONFIG_IPV6)
+	else if (key->family == AF_INET6) {
+		memcpy(&info->tcpm_addr, &key->addr.a6,
+		       sizeof(info->tcpm_addr));
+	}
+	#endif
+}
+
+static int tcp_diag_put_md5sig(struct sk_buff *skb,
+			       const struct tcp_md5sig_info *md5sig)
+{
+	const struct tcp_md5sig_key *key;
+	struct nlattr *attr;
+	struct tcp_diag_md5sig *info;
+	int md5sig_count = 0;
+
+	hlist_for_each_entry_rcu(key, &md5sig->head, node)
+		md5sig_count++;
+	if (md5sig_count == 0)
+		return 0;
+
+	attr = nla_reserve(skb, INET_DIAG_MD5SIG,
+			   md5sig_count * sizeof(struct tcp_diag_md5sig));
+	if (!attr)
+		return -EMSGSIZE;
+
+	info = nla_data(attr);
+	hlist_for_each_entry_rcu(key, &md5sig->head, node) {
+		tcp_diag_md5sig_fill(info++, key);
+		if (--md5sig_count == 0)
+			break;
+	}
+	if (md5sig_count > 0)
+		memset(info, 0, md5sig_count * sizeof(struct tcp_diag_md5sig));
+
+	return 0;
+}
+#endif
+
+static int tcp_diag_get_aux(struct sock *sk, bool net_admin,
+			    struct sk_buff *skb)
+{
+#ifdef CONFIG_TCP_MD5SIG
+	if (net_admin) {
+		struct tcp_md5sig_info *md5sig;
+		int err = 0;
+
+		rcu_read_lock();
+		md5sig = rcu_dereference(tcp_sk(sk)->md5sig_info);
+		if (md5sig)
+			err = tcp_diag_put_md5sig(skb, md5sig);
+		rcu_read_unlock();
+		if (err < 0)
+			return err;
+	}
+#endif
+
+	return 0;
+}
+
+static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
+{
+	size_t size = 0;
+
+#ifdef CONFIG_TCP_MD5SIG
+	if (sk_fullsock(sk)) {
+		const struct tcp_md5sig_info *md5sig;
+		const struct tcp_md5sig_key *key;
+
+		rcu_read_lock();
+		md5sig = rcu_dereference(tcp_sk(sk)->md5sig_info);
+		if (md5sig) {
+			hlist_for_each_entry_rcu(key, &md5sig->head, node)
+				size += sizeof(struct tcp_diag_md5sig);
+		}
+		rcu_read_unlock();
+	}
+#endif
+
+	return size;
+}
+
 static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 			  const struct inet_diag_req_v2 *r, struct nlattr *bc)
 {
@@ -68,13 +164,15 @@ static int tcp_diag_destroy(struct sk_buff *in_skb,
 #endif
 
 static const struct inet_diag_handler tcp_diag_handler = {
-	.dump		 = tcp_diag_dump,
-	.dump_one	 = tcp_diag_dump_one,
-	.idiag_get_info	 = tcp_diag_get_info,
-	.idiag_type	 = IPPROTO_TCP,
-	.idiag_info_size = sizeof(struct tcp_info),
+	.dump			= tcp_diag_dump,
+	.dump_one		= tcp_diag_dump_one,
+	.idiag_get_info		= tcp_diag_get_info,
+	.idiag_get_aux		= tcp_diag_get_aux,
+	.idiag_get_aux_size	= tcp_diag_get_aux_size,
+	.idiag_type		= IPPROTO_TCP,
+	.idiag_info_size	= sizeof(struct tcp_info),
 #ifdef CONFIG_INET_DIAG_DESTROY
-	.destroy	 = tcp_diag_destroy,
+	.destroy		= tcp_diag_destroy,
 #endif
 };
 
-- 
2.14.1

^ permalink raw reply related

* RE: [patch net-next v2 0/3] net/sched: Improve getting objects by indexes
From: Chris Mi @ 2017-08-31  1:05 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us,
	mawilcox@microsoft.com
In-Reply-To: <20170830.143915.721461254974312457.davem@davemloft.net>



> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, August 31, 2017 5:39 AM
> To: Chris Mi <chrism@mellanox.com>
> Cc: netdev@vger.kernel.org; jhs@mojatatu.com;
> xiyou.wangcong@gmail.com; jiri@resnulli.us; mawilcox@microsoft.com
> Subject: Re: [patch net-next v2 0/3] net/sched: Improve getting objects by
> indexes
> 
> From: Chris Mi <chrism@mellanox.com>
> Date: Wed, 30 Aug 2017 02:31:56 -0400
> 
> > Using current TC code, it is very slow to insert a lot of rules.
> >
> > In order to improve the rules update rate in TC, we introduced the
> > following two changes:
> >         1) changed cls_flower to use IDR to manage the filters.
> >         2) changed all act_xxx modules to use IDR instead of
> >            a small hash table
> >
> > But IDR has a limitation that it uses int. TC handle uses u32.
> > To make sure there is no regression, we add several new IDR APIs to
> > support unsigned long.
> >
> > v2
> > ==
> >
> > Addressed Hannes's comment:
> > express idr_alloc in terms of idr_alloc_ext and most of the other
> > functions
> 
> Series applied, thanks.

Thank you, David,

-Chris

^ permalink raw reply

* RE: [patch net-next v2 3/3] net/sched: Change act_api and act_xxx modules to use IDR
From: Chris Mi @ 2017-08-31  1:05 UTC (permalink / raw)
  To: Jamal Hadi Salim, netdev@vger.kernel.org
  Cc: xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	mawilcox@microsoft.com
In-Reply-To: <5b54f1c5-688a-5a7d-25bd-05dc9cbecebd@mojatatu.com>



> -----Original Message-----
> From: Jamal Hadi Salim [mailto:jhs@mojatatu.com]
> Sent: Wednesday, August 30, 2017 8:11 PM
> To: Chris Mi <chrism@mellanox.com>; netdev@vger.kernel.org
> Cc: xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net;
> mawilcox@microsoft.com
> Subject: Re: [patch net-next v2 3/3] net/sched: Change act_api and act_xxx
> modules to use IDR
> 
> On 17-08-30 02:31 AM, Chris Mi wrote:
> > Typically, each TC filter has its own action. All the actions of the
> > same type are saved in its hash table. But the hash buckets are too
> > small that it degrades to a list. And the performance is greatly
> > affected. For example, it takes about 0m11.914s to insert 64K rules.
> > If we convert the hash table to IDR, it only takes about 0m1.500s.
> > The improvement is huge.
> >
> > But please note that the test result is based on previous patch that
> > cls_flower uses IDR.
> >
> > Signed-off-by: Chris Mi <chrism@mellanox.com>
> > Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> 
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> 
> Also already acked this before but you left it out in this version. If you make
> changes to the patch then you will need a new ACK.
Sorry about that, Jamal. I think I need to make a note of the review comment
In case I forget it.
> 
> Dont forget to update selftests please.
Sure, we will work on that.

Thanks,
Chris
> 
> cheers,
> jamal

^ permalink raw reply

* [PATCH net] Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
From: Florian Fainelli @ 2017-08-31  0:49 UTC (permalink / raw)
  To: netdev
  Cc: Geert Uytterhoeven, David Daney, slash.tmp, marc_gonzales, davem,
	andrew, Florian Fainelli

This reverts commit 7ad813f208533cebfcc32d3d7474dc1677d1b09a ("net: phy:
Correctly process PHY_HALTED in phy_stop_machine()") because it is
creating the possibility for a NULL pointer dereference.

David Daney provide the following call trace and diagram of events:

When ndo_stop() is called we call:

 phy_disconnect()
    +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
    +---> phy_stop_machine()
    |      +---> phy_state_machine()
    |              +----> queue_delayed_work(): Work queued.
    +--->phy_detach() implies: phydev->attached_dev = NULL;

Now at a later time the queued work does:

 phy_state_machine()
    +---->netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:

 CPU 12 Unable to handle kernel paging request at virtual address
0000000000000048, epc == ffffffff80de37ec, ra == ffffffff80c7c
Oops[#1]:
CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
Workqueue: events_power_efficient phy_state_machine
task: 80000004021ed100 task.stack: 8000000409d70000
$ 0   : 0000000000000000 ffffffff84720060 0000000000000048 0000000000000004
$ 4   : 0000000000000000 0000000000000001 0000000000000004 0000000000000000
$ 8   : 0000000000000000 0000000000000000 00000000ffff98f3 0000000000000000
$12   : 8000000409d73fe0 0000000000009c00 ffffffff846547c8 000000000000af3b
$16   : 80000004096bab68 80000004096babd0 0000000000000000 80000004096ba800
$20   : 0000000000000000 0000000000000000 ffffffff81090000 0000000000000008
$24   : 0000000000000061 ffffffff808637b0
$28   : 8000000409d70000 8000000409d73cf0 80000000271bd300 ffffffff80c7804c
Hi    : 000000000000002a
Lo    : 000000000000003f
epc   : ffffffff80de37ec netif_carrier_off+0xc/0x58
ra    : ffffffff80c7804c phy_state_machine+0x48c/0x4f8
Status: 14009ce3        KX SX UX KERNEL EXL IE
Cause : 00800008 (ExcCode 02)
BadVA : 0000000000000048
PrId  : 000d9501 (Cavium Octeon III)
Modules linked in:
Process kworker/12:1 (pid: 1502, threadinfo=8000000409d70000,
task=80000004021ed100, tls=0000000000000000)
Stack : 8000000409a54000 80000004096bab68 80000000271bd300 80000000271c1e00
        0000000000000000 ffffffff808a1708 8000000409a54000 80000000271bd300
        80000000271bd320 8000000409a54030 ffffffff80ff0f00 0000000000000001
        ffffffff81090000 ffffffff808a1ac0 8000000402182080 ffffffff84650000
        8000000402182080 ffffffff84650000 ffffffff80ff0000 8000000409a54000
        ffffffff808a1970 0000000000000000 80000004099e8000 8000000402099240
        0000000000000000 ffffffff808a8598 0000000000000000 8000000408eeeb00
        8000000409a54000 00000000810a1d00 0000000000000000 8000000409d73de8
        8000000409d73de8 0000000000000088 000000000c009c00 8000000409d73e08
        8000000409d73e08 8000000402182080 ffffffff808a84d0 8000000402182080
        ...
Call Trace:
[<ffffffff80de37ec>] netif_carrier_off+0xc/0x58
[<ffffffff80c7804c>] phy_state_machine+0x48c/0x4f8
[<ffffffff808a1708>] process_one_work+0x158/0x368
[<ffffffff808a1ac0>] worker_thread+0x150/0x4c0
[<ffffffff808a8598>] kthread+0xc8/0xe0
[<ffffffff808617f0>] ret_from_kernel_thread+0x14/0x1c

The original motivation for this change originated from Marc Gonzales
indicating that his network driver did not have its adjust_link callback
executing with phydev->link = 0 while he was expecting it.

PHYLIB has never made any such guarantees ever because phy_stop() merely just
tells the workqueue to move into PHY_HALTED state which will happen
asynchronously.

Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reported-by: David Daney <ddaney.cavm@gmail.com>
Fixes: 7ad813f20853 ("net: phy: Correctly process PHY_HALTED in phy_stop_machine()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/phy.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 5068c582d502..d0626bf5c540 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -749,9 +749,6 @@ void phy_stop_machine(struct phy_device *phydev)
 	if (phydev->state > PHY_UP && phydev->state != PHY_HALTED)
 		phydev->state = PHY_UP;
 	mutex_unlock(&phydev->lock);
-
-	/* Now we can run the state machine synchronously */
-	phy_state_machine(&phydev->state_queue.work);
 }
 
 /**
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()
From: Florian Fainelli @ 2017-08-31  0:52 UTC (permalink / raw)
  To: David Daney, David Miller
  Cc: netdev, andrew, slash.tmp, marc_gonzalez, rmk+kernel,
	Geert Uytterhoeven
In-Reply-To: <57dfe1c5-1816-cf94-7676-293a9dcd343c@gmail.com>

On 08/30/2017 05:13 PM, David Daney wrote:
> On 07/31/2017 05:28 PM, David Miller wrote:
>> From: Florian Fainelli <f.fainelli@gmail.com>
>> Date: Fri, 28 Jul 2017 11:58:36 -0700
>>
>>> Marc reported that he was not getting the PHY library adjust_link()
>>> callback function to run when calling phy_stop() + phy_disconnect()
>>> which does not indeed happen because we set the state machine to
>>> PHY_HALTED but we don't get to run it to process this state past that
>>> point.
>>>
>>> Fix this with a synchronous call to phy_state_machine() in order to have
>>> the state machine actually act on PHY_HALTED, set the PHY device's link
>>> down, turn the network device's carrier off and finally call the
>>> adjust_link() function.
>>>
>>> Reported-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
>>> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
>>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>>> ---
>>> Changes in v2:
>>>
>>> - reword subject and commit message based on changes
>>> - dropped flush_scheduled_work() since it is redundant
>>
>> Applied and queued up for -stable, thanks.
>>
> 
> 
> This is broken.  Please revert.
> 
> Upstream commit 7ad813f20853 and in the stable branches as well.
> 
> When ndo_stop() is called we call:
> 
> 
>  phy_disconnect()
>     +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
>     +---> phy_stop_machine()
>     |      +---> phy_stop_machine()
>     |              +----> queue_delayed_work(): Work queued.
>     +--->phy_detach() implies: phydev->attached_dev = NULL;
> 
> Now at a later time the queued work does:
> 
>  phy_state_machine()
>     +---->netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:

How about the following instead of a revert (which I have queued locally
as well along with your correct call graph). This still would not fix
Geert's problem where with this change, we do actually call back into
adjust_link after a phy_stop() which may be problematic for him so I
think the revert is just easier and Marc, we'll figure out something for
nb8800?

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d0626bf5c540..78168e19bd5d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1234,7 +1234,7 @@ void phy_state_machine(struct work_struct *work)
         * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
         * between states from phy_mac_interrupt()
         */
-       if (phydev->irq == PHY_POLL)
+       if (phydev->irq == PHY_POLL && phydev->state != PHY_HALTED)
                queue_delayed_work(system_power_efficient_wq,
&phydev->state_queue,
                                   PHY_STATE_TIME * HZ);
 }
> 
> 
>  CPU 12 Unable to handle kernel paging request at virtual address
> 0000000000000048, epc == ffffffff80de37ec, ra == ffffffff80c7c
> Oops[#1]:
> CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
> Workqueue: events_power_efficient phy_state_machine
> task: 80000004021ed100 task.stack: 8000000409d70000
> $ 0   : 0000000000000000 ffffffff84720060 0000000000000048 0000000000000004
> $ 4   : 0000000000000000 0000000000000001 0000000000000004 0000000000000000
> $ 8   : 0000000000000000 0000000000000000 00000000ffff98f3 0000000000000000
> $12   : 8000000409d73fe0 0000000000009c00 ffffffff846547c8 000000000000af3b
> $16   : 80000004096bab68 80000004096babd0 0000000000000000 80000004096ba800
> $20   : 0000000000000000 0000000000000000 ffffffff81090000 0000000000000008
> $24   : 0000000000000061 ffffffff808637b0
> $28   : 8000000409d70000 8000000409d73cf0 80000000271bd300 ffffffff80c7804c
> Hi    : 000000000000002a
> Lo    : 000000000000003f
> epc   : ffffffff80de37ec netif_carrier_off+0xc/0x58
> ra    : ffffffff80c7804c phy_state_machine+0x48c/0x4f8
> Status: 14009ce3        KX SX UX KERNEL EXL IE
> Cause : 00800008 (ExcCode 02)
> BadVA : 0000000000000048
> PrId  : 000d9501 (Cavium Octeon III)
> Modules linked in:
> Process kworker/12:1 (pid: 1502, threadinfo=8000000409d70000,
> task=80000004021ed100, tls=0000000000000000)
> Stack : 8000000409a54000 80000004096bab68 80000000271bd300 80000000271c1e00
>         0000000000000000 ffffffff808a1708 8000000409a54000 80000000271bd300
>         80000000271bd320 8000000409a54030 ffffffff80ff0f00 0000000000000001
>         ffffffff81090000 ffffffff808a1ac0 8000000402182080 ffffffff84650000
>         8000000402182080 ffffffff84650000 ffffffff80ff0000 8000000409a54000
>         ffffffff808a1970 0000000000000000 80000004099e8000 8000000402099240
>         0000000000000000 ffffffff808a8598 0000000000000000 8000000408eeeb00
>         8000000409a54000 00000000810a1d00 0000000000000000 8000000409d73de8
>         8000000409d73de8 0000000000000088 000000000c009c00 8000000409d73e08
>         8000000409d73e08 8000000402182080 ffffffff808a84d0 8000000402182080
>         ...
> Call Trace:
> [<ffffffff80de37ec>] netif_carrier_off+0xc/0x58
> [<ffffffff80c7804c>] phy_state_machine+0x48c/0x4f8
> [<ffffffff808a1708>] process_one_work+0x158/0x368
> [<ffffffff808a1ac0>] worker_thread+0x150/0x4c0
> [<ffffffff808a8598>] kthread+0xc8/0xe0
> [<ffffffff808617f0>] ret_from_kernel_thread+0x14/0x1c


-- 
Florian

^ permalink raw reply related

* Re: DSA mv88e6xxx RX frame errors and TCP/IP RX failure
From: Ilia Mirkin @ 2017-08-31  0:38 UTC (permalink / raw)
  To: Tim Harvey
  Cc: Andrew Lunn, netdev, Vivien Didelot, linux-kernel@vger.kernel.org,
	Fugang Duan
In-Reply-To: <CAJ+vNU3Qto9iNdCUExb4NWaa3PA1UWe1kbY1b=sBLPgw_xiz7g@mail.gmail.com>

On Wed, Aug 30, 2017 at 8:22 PM, Tim Harvey <tharvey@gateworks.com> wrote:
> On Wed, Aug 30, 2017 at 3:06 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>> On Wed, Aug 30, 2017 at 12:53:56PM -0700, Tim Harvey wrote:
>>> Greetings,
>>>
>>> I'm seeing RX frame errors when using the mv88e6xxx DSA driver on
>>> 4.13-rc7. The board I'm using is a GW5904 [1] which has an IMX6 FEC
>>> MAC (eth0) connected via RGMII to a MV88E6176 with its downstream
>>> P0/P1/P2/P3 to front panel RJ45's (lan1-lan4).
>>
>> Hi Tim
>>
>> Can you confirm the counter is this one:
>>
>>                        /* Report late collisions as a frame error. */
>>                         if (status & (BD_ENET_RX_NO | BD_ENET_RX_CL))
>>                                 ndev->stats.rx_frame_errors++;
>>
>> I don't see anywhere else frame errors are counted, but it would be
>> good to prove we are looking in the right place.
>>
>
> Andrew,
>
> (adding IMX FEC driver maintainer to CC)
>
> Yes, that's one of them being hit. It looks like ifconfig reports
> 'frame' as the accumulation of a few stats so here are some more
> specifics from /sys/class/net/eth0/statistics:
>
> root@xenial:/sys/devices/soc0/soc/2100000.aips-bus/2188000.ethernet/net/eth0/statistics#
> for i in `ls rx_*`; do echo $i:$(cat $i); done
> rx_bytes:103229
> rx_compressed:0
> rx_crc_errors:22
> rx_dropped:0
> rx_errors:22
> rx_fifo_errors:0
> rx_frame_errors:22
> rx_length_errors:22
> rx_missed_errors:0
> rx_nohandler:0
> rx_over_errors:0
> rx_packets:1174
> root@xenial:/sys/devices/soc0/soc/2100000.aips-bus/2188000.ethernet/net/eth0/statistics#
> ifconfig eth0
> eth0      Link encap:Ethernet  HWaddr 00:D0:12:41:F3:E7
>           inet6 addr: fe80::2d0:12ff:fe41:f3e7/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1207 errors:22 dropped:0 overruns:0 frame:66
>           TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:106009 (103.5 KiB)  TX bytes:4604 (4.4 KiB)
>
> Instrumenting fec driver I see the following getting hit:
>
> status & BD_ENET_RX_LG /* rx_length_errors: Frame too long */
> status & BD_ENET_RX_CR  /* rx_crc_errors: CRC Error */
> status & BD_ENET_RX_CL /* rx_frame_errors: Collision? */
>
> Is this a frame size issue where the MV88E6176 is sending frames down
> that exceed the MTU because of headers added?

Not sure if this is relevant to you, but
https://github.com/laanwj/linux-freedreno-a2xx/commit/076b6542fa27499072ec6c3a7941c8b3c79ba1fd
was necessary to fix some MTU issues on a i.MX51. Not sure if it's
upstream yet or not.

Cheers,

  -ilia

^ permalink raw reply

* Re: multi-queue over IFF_NO_QUEUE "virtual" devices
From: Florian Fainelli @ 2017-08-31  0:30 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jiri Pirko, Jamal Hadi Salim,
	andrew, David Miller, Vivien Didelot
In-Reply-To: <CAM_iQpV4y2sd=5UJ5vLzXwM4UpTNA2GY6mf5AfBLJ_UpGGWc_A@mail.gmail.com>

On 08/30/2017 04:37 PM, Cong Wang wrote:
> On Tue, Aug 29, 2017 at 8:49 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>> Le 08/07/17 à 15:26, Florian Fainelli a écrit :
>>> Hi,
>>>
>>> Most DSA supported Broadcom switches have multiple queues per ports
>>> (usually 8) and each of these queues can be configured with different
>>> pause, drop, hysteresis thresholds and so on in order to make use of the
>>> switch's internal buffering scheme and have some queues achieve some
>>> kind of lossless behavior (e.g: LAN to LAN traffic for Q7 has a higher
>>> priority than LAN to WAN for Q0).
>>>
>>> This is obviously very workload specific, so I'd want maximum
>>> programmability as much as possible.
>>>
>>> This brings me to a few questions:
>>>
>>> 1) If we have the DSA slave network devices currently flagged with
>>> IFF_NO_QUEUE becoming multi-queue (on TX) aware such that an application
>>> can control exactly which switch egress queue is used on a per-flow
>>> basis, would that be a problem (this is the dynamic selection of the TX
>>> queue)?
>>
>> So I have this part figured out, with a bunch of changes network devices
>> created by DSA are now multiqueue aware and the Broadcom tag layer is
>> capable of extracting the queue index, passing it in the tag where
>> expected and having the switch forward to the appropriate switch port
>> and queue within that port. It also sets the queue mapping in the SKB
>> for later consumption by the master network device driver: bcmsysport.c
>> because of 2).
>>
>>>
>>> 2) The conduit interface (CPU) port network interface has a congestion
>>> control scheme which requires each of its TX queues (32 or 16) to be
>>> statically mapped to each of the underlying switch port queues because
>>> the congestion/ HW needs to inspect the queue depths of the switch to
>>> accept/reject a packet at the CPU's TX ring level. Do we have a good way
>>> with tc to map a virtual/stacked device's queue(s) on-top of its
>>> physical/underlying device's queues (this is the static queue mapping
>>> necessary for congestion to work)?
>>
>> That part I have not figured out yet, with some static mapping I can
>> obtain the results that I want and was even considering the possibility
>> of doing something like this:
>>
>> - register a network device notifier with bcmsysport.c (master network
>> device) for this setup
>> - expose a helper function allowing me to obtain a given DSA network
>> device port index
>> - whenever DSA creates network devices reconfigure the ring and queue
>> mapping of the TX queues managed by bcmsysport.c with the DSA network
>> device port index that has just been registered and just do a 1-1
>> mapping of the 8 queues
>>
>> You would end-up with something like:
>>
>> gphy (port 0) queues 0-7 mapped to systemport queues 0-7
>> rgmii_1 (port 1) queues 0-7 mapped to systemport queues 8-15
>> rgmii_2 (port 2) queues 0-7 mapped to systemport queues 16 through 23
>> moca (port 7) queues 0-7 mapped to systemport queues 24-31
>>
>> This should be working because bcmsysport's TX queues are not under
>> direct control by the user, they are used via DSA created network
>> devices which indicate the queue they want to use. When the DSA
>> interfaces are brought down, their respective systemport queues now
>> become unused. This also works because the number of physical ports of
>> the switch times the number of queues is matching the number of TX
>> queues from systemport (like if someone designed it with that exact
>> purpose in mind ;)).
>>
>> The only problem with that approach of course is that it embeds a policy
>> within the systemport driver.
>>
>> Ideally I would really like to configure this via tc by setting up a
>> mapping between queues of one network devices to queues of another
>> network device, is that a possible thing, Jamal, Cong, Jiri, do you know?
> 
> I am not sure if I understand the mapping you are talking about here.
> 
> TC layer rarely deals with hardware queues directly (except probably mq),
> so this question probably don't belong to TC.
> 
> OTOH, TC can modify skb->hash, so you can redirect packets to a specific
> queue, but this doesn't sound like what you are you looking for.

I am actually building on TC being able to influence the value of
skb->queue_mapping, but that is just for the stacked devices, not the
underlying conduit device that does the actual transmission.

> 
> Maybe Jiri has more thoughts here since he works on TC offloading things.
> 

Patches with explanations and context (hopefully clearer) here:

http://patchwork.ozlabs.org/project/netdev/list/?series=728

Thanks!
-- 
Florian

^ permalink raw reply

* [RFC net-next 8/8] net: systemport: Establish DSA network device queue mapping
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli
In-Reply-To: <1504138732-65383-1-git-send-email-f.fainelli@gmail.com>

Establish a queue mapping between the DSA slave network device queues
created, and the transmit queue that SYSTEMPORT manages.

We need to configure the SYSTEMPORT transmit queue with the switch port
number and switch port queue number in order for the switch and
SYSTEMPORT hardware to utilize the out of band congestion notification.
This hardware mechanism works by looking at the switch port egress queue
and determines whether there is enough buffers for this queue, with that
class of service for a successful transmission and if not, backpressures
the SYSTEMPORT queue that is being used.

For this to work, we register a network device notifier that listens for
the registration of DSA network devices, and when that happens, extracts
the number of queues for these devices and their associated port number,
remembers that in the driver private structure and linearly maps those
queues to TX rings/queues that we manage.

This scheme works because DSA slave network deviecs always transmit
through SYSTEMPORT so when DSA slave network devices are
destroyed/brought down, the corresponding SYSTEMPORT queues are no
longer used. Also, by design of the DSA framework, the master network
device (SYSTEMPORT) is registered first.

For faster lookups we use an array of up to DSA_MAX_PORTS * number
of queues per port, and then map pointers to bcm_sysport_tx_ring such
that our ndo_select_queue() implementation can just index into that
array to locate the corresponding ring index.

Here is an example mapping with this code:

P0,Q0 -> Q0
..
P0,Q7 -> Q7
P1,Q0 -> Q8
..
P1,Q7 -> Q15
P2,Q0 -> Q16
..
P2,Q7 -> Q23
P7,Q0 -> Q24
..
P7,Q7 -> Q31

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 100 +++++++++++++++++++++++++++--
 drivers/net/ethernet/broadcom/bcmsysport.h |  11 +++-
 2 files changed, 106 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 931751e4f369..eed4c3f672d7 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1387,7 +1387,14 @@ static int bcm_sysport_init_tx_ring(struct bcm_sysport_priv *priv,
 	tdma_writel(priv, 0, TDMA_DESC_RING_COUNT(index));
 	tdma_writel(priv, 1, TDMA_DESC_RING_INTR_CONTROL(index));
 	tdma_writel(priv, 0, TDMA_DESC_RING_PROD_CONS_INDEX(index));
-	tdma_writel(priv, RING_IGNORE_STATUS, TDMA_DESC_RING_MAPPING(index));
+
+	/* Configure QID and port mapping */
+	reg = tdma_readl(priv, TDMA_DESC_RING_MAPPING(index));
+	reg &= ~(RING_QID_MASK | RING_PORT_ID_MASK << RING_PORT_ID_SHIFT);
+	reg |= ring->switch_queue & RING_QID_MASK;
+	reg |= ring->switch_port << RING_PORT_ID_SHIFT;
+	reg |= RING_IGNORE_STATUS;
+	tdma_writel(priv, reg, TDMA_DESC_RING_MAPPING(index));
 	tdma_writel(priv, 0, TDMA_DESC_RING_PCP_DEI_VID(index));
 
 	/* Program the number of descriptors as MAX_THRESHOLD and half of
@@ -1405,8 +1412,9 @@ static int bcm_sysport_init_tx_ring(struct bcm_sysport_priv *priv,
 	napi_enable(&ring->napi);
 
 	netif_dbg(priv, hw, priv->netdev,
-		  "TDMA cfg, size=%d, desc_cpu=%p\n",
-		  ring->size, ring->desc_cpu);
+		  "TDMA cfg, size=%d, desc_cpu=%p switch q=%d,port=%d\n",
+		  ring->size, ring->desc_cpu, ring->switch_queue,
+		  ring->switch_port);
 
 	return 0;
 }
@@ -1987,6 +1995,78 @@ static int bcm_sysport_stop(struct net_device *dev)
 	.set_link_ksettings     = phy_ethtool_set_link_ksettings,
 };
 
+static u16 bcm_sysport_select_queue(struct net_device *dev, struct sk_buff *skb,
+				    void *accel_priv,
+				    select_queue_fallback_t fallback)
+{
+	struct bcm_sysport_priv *priv = netdev_priv(dev);
+	u16 queue = skb_get_queue_mapping(skb);
+	struct bcm_sysport_tx_ring *tx_ring;
+	unsigned int q, port;
+
+	if (!netdev_uses_dsa(dev))
+		return fallback(dev, skb);
+
+	/* DSA tagging layer will have configured the correct queue */
+	q = queue & 0xff;
+	port = queue >> priv->per_port_num_tx_queues;
+	tx_ring = priv->ring_map[q + port * priv->per_port_num_tx_queues];
+
+	return tx_ring->index;
+}
+
+static int bcm_sysport_map_queues(struct bcm_sysport_priv *priv,
+				  struct net_device *slave_dev)
+{
+	struct net_device *dev = priv->netdev;
+	struct bcm_sysport_tx_ring *ring;
+	unsigned int num_tx_queues;
+	unsigned int q, start, port;
+
+	port = dsa_slave_dev_port_num(slave_dev);
+	num_tx_queues = slave_dev->num_tx_queues;
+
+	if (priv->per_port_num_tx_queues &&
+	    priv->per_port_num_tx_queues != num_tx_queues)
+		netdev_warn(slave_dev, "asymetric number of per-port queues\n");
+
+	priv->per_port_num_tx_queues = num_tx_queues;
+
+	start = find_first_zero_bit(&priv->queue_bitmap, dev->num_tx_queues);
+	for (q = 0; q < num_tx_queues; q++) {
+		ring = &priv->tx_rings[q + start];
+
+		/* Just remember the mapping actual programming done
+		 * during bcm_sysport_init_tx_ring
+		 */
+		ring->switch_queue = q;
+		ring->switch_port = port;
+		priv->ring_map[q + port * num_tx_queues] = ring;
+
+		/* Set all queues as being used now */
+		set_bit(q + start, &priv->queue_bitmap);
+	}
+
+	return NOTIFY_OK;
+}
+
+static int bcm_sysport_notifier_event(struct notifier_block *nb,
+				      unsigned long event, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct bcm_sysport_priv *priv;
+
+	priv = container_of(nb, struct bcm_sysport_priv, queue_nb);
+
+	if (!dsa_slave_dev_check(dev))
+		return NOTIFY_DONE;
+
+	if (event == NETDEV_REGISTER)
+		return bcm_sysport_map_queues(priv, dev);
+
+	return NOTIFY_DONE;
+}
+
 static const struct net_device_ops bcm_sysport_netdev_ops = {
 	.ndo_start_xmit		= bcm_sysport_xmit,
 	.ndo_tx_timeout		= bcm_sysport_tx_timeout,
@@ -1999,6 +2079,7 @@ static int bcm_sysport_stop(struct net_device *dev)
 	.ndo_poll_controller	= bcm_sysport_poll_controller,
 #endif
 	.ndo_get_stats64	= bcm_sysport_get_stats64,
+	.ndo_select_queue	= bcm_sysport_select_queue,
 };
 
 #define REV_FMT	"v%2x.%02x"
@@ -2148,10 +2229,17 @@ static int bcm_sysport_probe(struct platform_device *pdev)
 
 	u64_stats_init(&priv->syncp);
 
+	priv->queue_nb.notifier_call = bcm_sysport_notifier_event;
+	ret = register_netdevice_notifier(&priv->queue_nb);
+	if (ret) {
+		dev_err(&pdev->dev, "failed to register notifier\n");
+		goto err_deregister_fixed_link;
+	}
+
 	ret = register_netdev(dev);
 	if (ret) {
 		dev_err(&pdev->dev, "failed to register net_device\n");
-		goto err_deregister_fixed_link;
+		goto err_deregister_nb;
 	}
 
 	priv->rev = topctrl_readl(priv, REV_CNTL) & REV_MASK;
@@ -2164,6 +2252,8 @@ static int bcm_sysport_probe(struct platform_device *pdev)
 
 	return 0;
 
+err_deregister_nb:
+	unregister_netdevice_notifier(&priv->queue_nb);
 err_deregister_fixed_link:
 	if (of_phy_is_fixed_link(dn))
 		of_phy_deregister_fixed_link(dn);
@@ -2175,6 +2265,7 @@ static int bcm_sysport_probe(struct platform_device *pdev)
 static int bcm_sysport_remove(struct platform_device *pdev)
 {
 	struct net_device *dev = dev_get_drvdata(&pdev->dev);
+	struct bcm_sysport_priv *priv = netdev_priv(dev);
 	struct device_node *dn = pdev->dev.of_node;
 
 	/* Not much to do, ndo_close has been called
@@ -2183,6 +2274,7 @@ static int bcm_sysport_remove(struct platform_device *pdev)
 	unregister_netdev(dev);
 	if (of_phy_is_fixed_link(dn))
 		of_phy_deregister_fixed_link(dn);
+	unregister_netdevice_notifier(&priv->queue_nb);
 	free_netdev(dev);
 	dev_set_drvdata(&pdev->dev, NULL);
 
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.h b/drivers/net/ethernet/broadcom/bcmsysport.h
index 80b4ffff63b7..8913cb0cace6 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.h
+++ b/drivers/net/ethernet/broadcom/bcmsysport.h
@@ -404,7 +404,7 @@ struct bcm_rsb {
 #define  RING_CONS_INDEX_MASK		0xffff
 
 #define RING_MAPPING			0x14
-#define  RING_QID_MASK			0x3
+#define  RING_QID_MASK			0x7
 #define  RING_PORT_ID_SHIFT		3
 #define  RING_PORT_ID_MASK		0x7
 #define  RING_IGNORE_STATUS		(1 << 6)
@@ -711,6 +711,8 @@ struct bcm_sysport_tx_ring {
 	struct bcm_sysport_priv *priv;	/* private context backpointer */
 	unsigned long	packets;	/* packets statistics */
 	unsigned long	bytes;		/* bytes statistics */
+	unsigned int	switch_queue;	/* switch port queue number */
+	unsigned int	switch_port;	/* switch port queue number */
 };
 
 /* Driver private structure */
@@ -764,5 +766,12 @@ struct bcm_sysport_priv {
 
 	/* For atomic update generic 64bit value on 32bit Machine */
 	struct u64_stats_sync	syncp;
+
+	/* netdev notifier to map switch port queues */
+	struct notifier_block	queue_nb;
+	unsigned int		per_port_num_tx_queues;
+	unsigned long		queue_bitmap;
+	struct bcm_sysport_tx_ring *ring_map[DSA_MAX_PORTS * 8];
+
 };
 #endif /* __BCM_SYSPORT_H */
-- 
1.9.1

^ permalink raw reply related

* [RFC net-next 7/8] net: dsa: tag_brcm: Indicate to master netdevice port + queue
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli
In-Reply-To: <1504138732-65383-1-git-send-email-f.fainelli@gmail.com>

We need to tell the DSA master network device doing the actual
transmission what the desired switch port and queue number is for it to
resolve that to the internal transmit queue it is mapped to.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 net/dsa/tag_brcm.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index dbb016434ace..19a617b09b3c 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -86,6 +86,11 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, struct net_device *dev
 		brcm_tag[2] = BRCM_IG_DSTMAP2_MASK;
 	brcm_tag[3] = (1 << p->dp->index) & BRCM_IG_DSTMAP1_MASK;
 
+	/* Now tell the master network device about the desired output queue
+	 * as well
+	 */
+	skb_set_queue_mapping(skb, p->dp->index << dev->num_tx_queues | queue);
+
 	return skb;
 }
 
-- 
1.9.1

^ permalink raw reply related

* [RFC net-next 6/8] net: dsa: Expose dsa_slave_dev_check and dsa_slave_dev_port_num
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli
In-Reply-To: <1504138732-65383-1-git-send-email-f.fainelli@gmail.com>

Expose two helper functions:
* one to verify if a net_device is a DSA slave network device
* one to obtain the physical port number associated with a DSA slave
  network device

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 include/net/dsa.h | 15 +++++++++++++++
 net/dsa/slave.c   | 12 +++++++++---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index b10e8da3f8d7..649bd06f9fe4 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -459,6 +459,21 @@ static inline bool netdev_uses_dsa(struct net_device *dev)
 	return false;
 }
 
+#if IS_ENABLED(CONFIG_NET_DSA)
+bool dsa_slave_dev_check(struct net_device *dev);
+unsigned int dsa_slave_dev_port_num(struct net_device *dev);
+#else
+static inline bool dsa_slave_dev_check(struct net_device *dev)
+{
+	return false;
+}
+
+static inline dsa_slave_dev_port_num(struct net_device *dev)
+{
+	return DSA_MAX_PORTS;
+}
+#endif
+
 struct dsa_switch *dsa_switch_alloc(struct device *dev, size_t n);
 void dsa_unregister_switch(struct dsa_switch *ds);
 int dsa_register_switch(struct dsa_switch *ds);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index bfd7173a3c6a..302ae3326e3a 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -25,8 +25,6 @@
 
 #include "dsa_priv.h"
 
-static bool dsa_slave_dev_check(struct net_device *dev);
-
 /* slave mii_bus handling ***************************************************/
 static int dsa_slave_phy_read(struct mii_bus *bus, int addr, int reg)
 {
@@ -1346,10 +1344,18 @@ void dsa_slave_destroy(struct net_device *slave_dev)
 	free_netdev(slave_dev);
 }
 
-static bool dsa_slave_dev_check(struct net_device *dev)
+bool dsa_slave_dev_check(struct net_device *dev)
 {
 	return dev->netdev_ops == &dsa_slave_netdev_ops;
 }
+EXPORT_SYMBOL_GPL(dsa_slave_dev_check);
+
+unsigned int dsa_slave_dev_port_num(struct net_device *dev)
+{
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	return p->dp->index;
+}
+EXPORT_SYMBOL_GPL(dsa_slave_dev_port_num);
 
 static int dsa_slave_changeupper(struct net_device *dev,
 				 struct netdev_notifier_changeupper_info *info)
-- 
1.9.1

^ permalink raw reply related

* [RFC net-next 5/8] net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli
In-Reply-To: <1504138732-65383-1-git-send-email-f.fainelli@gmail.com>

BCM7278 has only 128 entries while BCM7445 has the full 256 entries set,
fix that.

Fixes: 7318166cacad ("net: dsa: bcm_sf2: Add support for ethtool::rxnfc")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/bcm_sf2.c     | 4 ++++
 drivers/net/dsa/bcm_sf2.h     | 1 +
 drivers/net/dsa/bcm_sf2_cfp.c | 8 ++++----
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index fc9f9f171e55..af299fe343cc 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1043,6 +1043,7 @@ struct bcm_sf2_of_data {
 	u32 type;
 	const u16 *reg_offsets;
 	unsigned int core_reg_align;
+	unsigned int num_cfp_rules;
 };
 
 /* Register offsets for the SWITCH_REG_* block */
@@ -1066,6 +1067,7 @@ struct bcm_sf2_of_data {
 	.type		= BCM7445_DEVICE_ID,
 	.core_reg_align	= 0,
 	.reg_offsets	= bcm_sf2_7445_reg_offsets,
+	.num_cfp_rules	= 256,
 };
 
 static const u16 bcm_sf2_7278_reg_offsets[] = {
@@ -1088,6 +1090,7 @@ struct bcm_sf2_of_data {
 	.type		= BCM7278_DEVICE_ID,
 	.core_reg_align	= 1,
 	.reg_offsets	= bcm_sf2_7278_reg_offsets,
+	.num_cfp_rules	= 128,
 };
 
 static const struct of_device_id bcm_sf2_of_match[] = {
@@ -1144,6 +1147,7 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev)
 	priv->type = data->type;
 	priv->reg_offsets = data->reg_offsets;
 	priv->core_reg_align = data->core_reg_align;
+	priv->num_cfp_rules = data->num_cfp_rules;
 
 	/* Auto-detection using standard registers will not work, so
 	 * provide an indication of what kind of device we are for
diff --git a/drivers/net/dsa/bcm_sf2.h b/drivers/net/dsa/bcm_sf2.h
index d9c96b281fc0..02c499f9c56b 100644
--- a/drivers/net/dsa/bcm_sf2.h
+++ b/drivers/net/dsa/bcm_sf2.h
@@ -72,6 +72,7 @@ struct bcm_sf2_priv {
 	u32 				type;
 	const u16			*reg_offsets;
 	unsigned int			core_reg_align;
+	unsigned int			num_cfp_rules;
 
 	/* spinlock protecting access to the indirect registers */
 	spinlock_t			indir_lock;
diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
index 2fb32d67065f..8a1da7e67707 100644
--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -98,7 +98,7 @@ static inline void bcm_sf2_cfp_rule_addr_set(struct bcm_sf2_priv *priv,
 {
 	u32 reg;
 
-	WARN_ON(addr >= CFP_NUM_RULES);
+	WARN_ON(addr >= priv->num_cfp_rules);
 
 	reg = core_readl(priv, CORE_CFP_ACC);
 	reg &= ~(XCESS_ADDR_MASK << XCESS_ADDR_SHIFT);
@@ -109,7 +109,7 @@ static inline void bcm_sf2_cfp_rule_addr_set(struct bcm_sf2_priv *priv,
 static inline unsigned int bcm_sf2_cfp_rule_size(struct bcm_sf2_priv *priv)
 {
 	/* Entry #0 is reserved */
-	return CFP_NUM_RULES - 1;
+	return priv->num_cfp_rules - 1;
 }
 
 static int bcm_sf2_cfp_rule_set(struct dsa_switch *ds, int port,
@@ -523,7 +523,7 @@ static int bcm_sf2_cfp_rule_get_all(struct bcm_sf2_priv *priv,
 		if (!(reg & OP_STR_DONE))
 			break;
 
-	} while (index < CFP_NUM_RULES);
+	} while (index < priv->num_cfp_rules);
 
 	/* Put the TCAM size here */
 	nfc->data = bcm_sf2_cfp_rule_size(priv);
@@ -544,7 +544,7 @@ int bcm_sf2_get_rxnfc(struct dsa_switch *ds, int port,
 	case ETHTOOL_GRXCLSRLCNT:
 		/* Subtract the default, unusable rule */
 		nfc->rule_cnt = bitmap_weight(priv->cfp.used,
-					      CFP_NUM_RULES) - 1;
+					      priv->num_cfp_rules) - 1;
 		/* We support specifying rule locations */
 		nfc->data |= RX_CLS_LOC_SPECIAL;
 		break;
-- 
1.9.1

^ permalink raw reply related

* [RFC net-next 4/8] net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli
In-Reply-To: <1504138732-65383-1-git-send-email-f.fainelli@gmail.com>

Even though TC2QOS mapping is for switch egress queues, we need to
configure it correclty in order for the Broadcom tag ingress (CPU ->
switch) queue selection to work correctly since there is a 1:1 mapping
between switch egress queues and ingress queues.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/bcm_sf2.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 3f1ad9d5d7c5..fc9f9f171e55 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -103,6 +103,7 @@ static void bcm_sf2_brcm_hdr_setup(struct bcm_sf2_priv *priv, int port)
 static void bcm_sf2_imp_setup(struct dsa_switch *ds, int port)
 {
 	struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
+	unsigned int i;
 	u32 reg, offset;
 
 	if (priv->type == BCM7445_DEVICE_ID)
@@ -129,6 +130,14 @@ static void bcm_sf2_imp_setup(struct dsa_switch *ds, int port)
 	reg |= MII_DUMB_FWDG_EN;
 	core_writel(priv, reg, CORE_SWITCH_CTRL);
 
+	/* Configure Traffic Class to QoS mapping, allow each priority to map
+	 * to a different queue number
+	 */
+	reg = core_readl(priv, CORE_PORT_TC2_QOS_MAP_PORT(port));
+	for (i = 0; i < 8; i++)
+		reg |= i << (PRT_TO_QID_SHIFT * i);
+	core_writel(priv, reg, CORE_PORT_TC2_QOS_MAP_PORT(port));
+
 	bcm_sf2_brcm_hdr_setup(priv, port);
 
 	/* Force link status for IMP port */
-- 
1.9.1

^ permalink raw reply related

* [RFC net-next 3/8] net: dsa: bcm_sf2: Advertise number of egress queues
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli
In-Reply-To: <1504138732-65383-1-git-send-email-f.fainelli@gmail.com>

The switch supports 8 egress queues per port, so indicate that such that
net/dsa/slave.c::dsa_slave_create can allocate the right number of TX
queues.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/bcm_sf2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 8492c9d64004..3f1ad9d5d7c5 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1147,6 +1147,9 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev)
 	ds = dev->ds;
 	ds->ops = &bcm_sf2_ops;
 
+	/* Advertise the 8 egress queues */
+	ds->num_tx_queues = 8;
+
 	dev_set_drvdata(&pdev->dev, priv);
 
 	spin_lock_init(&priv->indir_lock);
-- 
1.9.1

^ permalink raw reply related

* [RFC net-next 2/8] net: dsa: tag_brcm: Set output queue from skb queue mapping
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli
In-Reply-To: <1504138732-65383-1-git-send-email-f.fainelli@gmail.com>

We originally used skb->priority but that was not quite correct as this
bitfield needs to contain the egress switch queue we intend to send this
SKB to.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 net/dsa/tag_brcm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index de74c3f77818..dbb016434ace 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -62,6 +62,7 @@
 static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
+	u16 queue = skb_get_queue_mapping(skb);
 	u8 *brcm_tag;
 
 	if (skb_cow_head(skb, BRCM_TAG_LEN) < 0)
@@ -78,7 +79,7 @@ static struct sk_buff *brcm_tag_xmit(struct sk_buff *skb, struct net_device *dev
 	 * deprecated
 	 */
 	brcm_tag[0] = (1 << BRCM_OPCODE_SHIFT) |
-			((skb->priority << BRCM_IG_TC_SHIFT) & BRCM_IG_TC_MASK);
+		       ((queue & BRCM_IG_TC_MASK) << BRCM_IG_TC_SHIFT);
 	brcm_tag[1] = 0;
 	brcm_tag[2] = 0;
 	if (p->dp->index == 8)
-- 
1.9.1

^ permalink raw reply related

* [RFC net-next 1/8] net: dsa: Allow switch drivers to indicate number of RX/TX queues
From: Florian Fainelli @ 2017-08-31  0:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, jhs, davem, xiyou.wangcong, andrew, vivien.didelot,
	Florian Fainelli
In-Reply-To: <1504138732-65383-1-git-send-email-f.fainelli@gmail.com>

Let switch drivers indicate how many RX and TX queues they support. Some
switches, such as Broadcom Starfighter 2 are resigned with 8 egress
queues. Future changes will allow us to leverage the queue mapping and
direct the transmission towards a particular queue.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 include/net/dsa.h |  4 ++++
 net/dsa/slave.c   | 10 ++++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 398ca8d70ccd..b10e8da3f8d7 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -243,6 +243,10 @@ struct dsa_switch {
 	/* devlink used to represent this switch device */
 	struct devlink		*devlink;
 
+	/* Number of switch port queues */
+	unsigned int		num_rx_queues;
+	unsigned int		num_tx_queues;
+
 	/* Dynamically allocated ports, keep last */
 	size_t num_ports;
 	struct dsa_port ports[];
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 78e78a6e6833..bfd7173a3c6a 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1259,8 +1259,14 @@ int dsa_slave_create(struct dsa_port *port, const char *name)
 	cpu_dp = ds->dst->cpu_dp;
 	master = cpu_dp->netdev;
 
-	slave_dev = alloc_netdev(sizeof(struct dsa_slave_priv), name,
-				 NET_NAME_UNKNOWN, ether_setup);
+	if (!ds->num_rx_queues)
+		ds->num_rx_queues = 1;
+	if (!ds->num_tx_queues)
+		ds->num_tx_queues = 1;
+
+	slave_dev = alloc_netdev_mqs(sizeof(struct dsa_slave_priv), name,
+				     NET_NAME_UNKNOWN, ether_setup,
+				     ds->num_tx_queues, ds->num_rx_queues);
 	if (slave_dev == NULL)
 		return -ENOMEM;
 
-- 
1.9.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox