* Re: [PATCH 07/17] net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
From: Kevin Darbyshire-Bryant @ 2018-06-15 20:49 UTC (permalink / raw)
To: David Woodhouse
Cc: Eric Dumazet, Elena Reshetova, netdev@vger.kernel.org,
Krzysztof Mazur, 3chas3@gmail.com, Mathias Kresin
In-Reply-To: <00520334446ffa4671513bb42ebeeecfab4107e7.camel@infradead.org>
> On 15 Jun 2018, at 21:00, David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Fri, 2018-06-15 at 14:44 +0100, David Woodhouse wrote:
>>
>>> Or simply use a new field in ATM_SKB(skb) to remember a stable
>>> truesize used in both sides (add/sub)
>>
>> Right, that was my second suggestion ("copy the accounted value...").
>>
>> It's a bit of a hack, and I think that actually *using* sock_wfree()
>> instead of what's currently in atm_pop_raw() would be the better
>> solution. Does anyone remember why we didn't do that in the first
>> place?
>
> That does end up being quite hairy. I don't think it's worth doing.
>
> This should probably suffice to fix it...
>
> Kevin this is going to conflict with the ifx_atm_alloc_skb() hack in
> the tree you're working on, but that needs to be killed with fire
> anyway. It's utterly pointless as discussed.
I had already done so as part of the last pastebin debug info round :-)
As regards your patch… MAGIC! Works an absolute treat. Will get that submitted along with the ‘nuke ifx_atm_alloc_skb’ patch to OpenWrt tomorrow. For now, maybe my brain will let me sleep :-)
Thank you soooooo much for your help & patience.
Tested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
>
>
> From 3368eaeb0a2f09138894dde0f26f879e5228005a Mon Sep 17 00:00:00 2001
> From: David Woodhouse <dwmw2@infradead.org>
> Date: Fri, 15 Jun 2018 20:49:20 +0100
> Subject: [PATCH] atm: Preserve value of skb->truesize when accounting to vcc
>
> There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
> for certain skbs. Ideally it would cover ATM too. It doesn't. Just
> stashing the accounted value and using it in atm_pop_raw() is probably
> the easiest way to cope.
>
> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
> ---
> include/linux/atmdev.h | 15 +++++++++++++++
> net/atm/br2684.c | 3 +--
> net/atm/clip.c | 3 +--
> net/atm/common.c | 3 +--
> net/atm/lec.c | 3 +--
> net/atm/mpc.c | 3 +--
> net/atm/pppoatm.c | 3 +--
> net/atm/raw.c | 4 ++--
> 8 files changed, 23 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/atmdev.h b/include/linux/atmdev.h
> index 0c27515d2cf6..8124815eb121 100644
> --- a/include/linux/atmdev.h
> +++ b/include/linux/atmdev.h
> @@ -214,6 +214,7 @@ struct atmphy_ops {
> struct atm_skb_data {
> struct atm_vcc *vcc; /* ATM VCC */
> unsigned long atm_options; /* ATM layer options */
> + unsigned int acct_truesize; /* truesize accounted to vcc */
> };
>
> #define VCC_HTABLE_SIZE 32
> @@ -241,6 +242,20 @@ void vcc_insert_socket(struct sock *sk);
>
> void atm_dev_release_vccs(struct atm_dev *dev);
>
> +static inline void atm_account_tx(struct atm_vcc *vcc, struct sk_buff *skb)
> +{
> + /*
> + * Because ATM skbs may not belong to a sock (and we don't
> + * necessarily want to), skb->truesize may be adjusted,
> + * escaping the hack in pskb_expand_head() which avoids
> + * doing so for some cases. So stash the value of truesize
> + * at the time we accounted it, and atm_pop_raw() can use
> + * that value later, in case it changes.
> + */
> + refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
> + ATM_SKB(skb)->acct_truesize = skb->truesize;
> + ATM_SKB(skb)->atm_options = vcc->atm_options;
> +}
>
> static inline void atm_force_charge(struct atm_vcc *vcc,int truesize)
> {
> diff --git a/net/atm/br2684.c b/net/atm/br2684.c
> index 4e111196f902..bc21f8e8daf2 100644
> --- a/net/atm/br2684.c
> +++ b/net/atm/br2684.c
> @@ -252,8 +252,7 @@ static int br2684_xmit_vcc(struct sk_buff *skb, struct net_device *dev,
>
> ATM_SKB(skb)->vcc = atmvcc = brvcc->atmvcc;
> pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, atmvcc, atmvcc->dev);
> - refcount_add(skb->truesize, &sk_atm(atmvcc)->sk_wmem_alloc);
> - ATM_SKB(skb)->atm_options = atmvcc->atm_options;
> + atm_account_tx(atmvcc, skb);
> dev->stats.tx_packets++;
> dev->stats.tx_bytes += skb->len;
>
> diff --git a/net/atm/clip.c b/net/atm/clip.c
> index 65f706e4344c..60920a42f640 100644
> --- a/net/atm/clip.c
> +++ b/net/atm/clip.c
> @@ -381,8 +381,7 @@ static netdev_tx_t clip_start_xmit(struct sk_buff *skb,
> memcpy(here, llc_oui, sizeof(llc_oui));
> ((__be16 *) here)[3] = skb->protocol;
> }
> - refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
> - ATM_SKB(skb)->atm_options = vcc->atm_options;
> + atm_account_tx(vcc, skb);
> entry->vccs->last_use = jiffies;
> pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, vcc, vcc->dev);
> old = xchg(&entry->vccs->xoff, 1); /* assume XOFF ... */
> diff --git a/net/atm/common.c b/net/atm/common.c
> index 8a4f99114cd2..9e812c782a37 100644
> --- a/net/atm/common.c
> +++ b/net/atm/common.c
> @@ -630,10 +630,9 @@ int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t size)
> goto out;
> }
> pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
> - refcount_add(skb->truesize, &sk->sk_wmem_alloc);
> + atm_account_tx(vcc, skb);
>
> skb->dev = NULL; /* for paths shared with net_device interfaces */
> - ATM_SKB(skb)->atm_options = vcc->atm_options;
> if (!copy_from_iter_full(skb_put(skb, size), size, &m->msg_iter)) {
> kfree_skb(skb);
> error = -EFAULT;
> diff --git a/net/atm/lec.c b/net/atm/lec.c
> index a3d93a1bb133..d7cc165e24e0 100644
> --- a/net/atm/lec.c
> +++ b/net/atm/lec.c
> @@ -179,9 +179,8 @@ lec_send(struct atm_vcc *vcc, struct sk_buff *skb)
> struct net_device *dev = skb->dev;
>
> ATM_SKB(skb)->vcc = vcc;
> - ATM_SKB(skb)->atm_options = vcc->atm_options;
> + atm_account_tx(vcc, skb);
>
> - refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
> if (vcc->send(vcc, skb) < 0) {
> dev->stats.tx_dropped++;
> return;
> diff --git a/net/atm/mpc.c b/net/atm/mpc.c
> index 5677147209e8..db9a1838687c 100644
> --- a/net/atm/mpc.c
> +++ b/net/atm/mpc.c
> @@ -555,8 +555,7 @@ static int send_via_shortcut(struct sk_buff *skb, struct mpoa_client *mpc)
> sizeof(struct llc_snap_hdr));
> }
>
> - refcount_add(skb->truesize, &sk_atm(entry->shortcut)->sk_wmem_alloc);
> - ATM_SKB(skb)->atm_options = entry->shortcut->atm_options;
> + atm_account_tx(entry->shortcut, skb);
> entry->shortcut->send(entry->shortcut, skb);
> entry->packets_fwded++;
> mpc->in_ops->put(entry);
> diff --git a/net/atm/pppoatm.c b/net/atm/pppoatm.c
> index 21d9d341a619..af8c4b38b746 100644
> --- a/net/atm/pppoatm.c
> +++ b/net/atm/pppoatm.c
> @@ -350,8 +350,7 @@ static int pppoatm_send(struct ppp_channel *chan, struct sk_buff *skb)
> return 1;
> }
>
> - refcount_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
> - ATM_SKB(skb)->atm_options = ATM_SKB(skb)->vcc->atm_options;
> + atm_account_tx(vcc, skb);
> pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n",
> skb, ATM_SKB(skb)->vcc, ATM_SKB(skb)->vcc->dev);
> ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
> diff --git a/net/atm/raw.c b/net/atm/raw.c
> index ee10e8d46185..b3ba44aab0ee 100644
> --- a/net/atm/raw.c
> +++ b/net/atm/raw.c
> @@ -35,8 +35,8 @@ static void atm_pop_raw(struct atm_vcc *vcc, struct sk_buff *skb)
> struct sock *sk = sk_atm(vcc);
>
> pr_debug("(%d) %d -= %d\n",
> - vcc->vci, sk_wmem_alloc_get(sk), skb->truesize);
> - WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
> + vcc->vci, sk_wmem_alloc_get(sk), ATM_SKB(skb)->acct_truesize);
> + WARN_ON(refcount_sub_and_test(ATM_SKB(skb)->acct_truesize, &sk->sk_wmem_alloc));
> dev_kfree_skb_any(skb);
> sk->sk_write_space(sk);
> }
> --
> 2.17.0
>
> --
> dwmw2
Tested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Cheers,
Kevin D-B
012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A
^ permalink raw reply
* Re: [PATCH 0/2] leds: drop led_trigger_rename_static()
From: Jacek Anaszewski @ 2018-06-15 20:49 UTC (permalink / raw)
To: Uwe Kleine-König, Pavel Machek, Wolfgang Grandegger,
Marc Kleine-Budde, David S. Miller, netdev
Cc: linux-leds, linux-can, kernel
In-Reply-To: <20180518085333.26187-1-u.kleine-koenig@pengutronix.de>
Hi,
I need an ack from the CAN or net maintainers to merge this
set via the LED tree. It has been awaiting feedback for a few weeks
now and is blocking another set that depends on it.
On 05/18/2018 10:53 AM, Uwe Kleine-König wrote:
> Hello,
>
> initially I prepared a patch to fix the broken things in
> led_trigger_rename_static(), but given that there is only the
> can-trigger driver that makes use of this function and the netdev
> trigger implements a superset of the can-trigger, removing both the
> can-trigger and the broken function seems sensible.
>
> Best regards
> Uwe
>
> Uwe Kleine-König (2):
> can: drop led trigger support
> leds: remove unused function led_trigger_rename_static()
>
> drivers/leds/led-triggers.c | 13 ---
> drivers/net/can/Kconfig | 11 --
> drivers/net/can/Makefile | 2 -
> drivers/net/can/at91_can.c | 10 --
> drivers/net/can/c_can/c_can.c | 11 --
> drivers/net/can/dev.c | 5 -
> drivers/net/can/flexcan.c | 8 --
> drivers/net/can/ifi_canfd/ifi_canfd.c | 9 --
> drivers/net/can/led.c | 143 --------------------------
> drivers/net/can/m_can/m_can.c | 9 --
> drivers/net/can/rcar/rcar_can.c | 8 --
> drivers/net/can/rcar/rcar_canfd.c | 7 --
> drivers/net/can/rx-offload.c | 2 -
> drivers/net/can/sja1000/sja1000.c | 15 +--
> drivers/net/can/spi/hi311x.c | 8 --
> drivers/net/can/spi/mcp251x.c | 10 --
> drivers/net/can/sun4i_can.c | 7 --
> drivers/net/can/ti_hecc.c | 9 --
> drivers/net/can/usb/mcba_usb.c | 7 --
> drivers/net/can/usb/usb_8dev.c | 11 --
> drivers/net/can/xilinx_can.c | 9 --
> include/linux/can/dev.h | 10 --
> include/linux/can/led.h | 54 ----------
> include/linux/leds.h | 18 ----
> 24 files changed, 1 insertion(+), 395 deletions(-)
> delete mode 100644 drivers/net/can/led.c
> delete mode 100644 include/linux/can/led.h
>
--
Best regards,
Jacek Anaszewski
^ permalink raw reply
* Re: WARNING: kmalloc bug in memdup_user (3)
From: Daniel Borkmann @ 2018-06-15 20:24 UTC (permalink / raw)
To: syzbot, ast, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <0000000000005ff424056c5476bf@google.com>
On 05/16/2018 05:35 PM, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: c5c7d7f3c451 Merge branch 'bpf-sock-hashmap'
> git tree: bpf-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1626ae37800000
> kernel config: https://syzkaller.appspot.com/x/.config?x=10c4dc62055b68f5
> dashboard link: https://syzkaller.appspot.com/bug?extid=0f92a17b0706231d0a09
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro: https://syzkaller.appspot.com/x/repro.syz?x=126a5197800000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1598c477800000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0f92a17b0706231d0a09@syzkaller.appspotmail.com
#syz fix: bpf: fix sock hashmap kmalloc warning
^ permalink raw reply
* ethtool 4.17 released
From: John W. Linville @ 2018-06-15 20:00 UTC (permalink / raw)
To: netdev
ethtool version 4.17 has been released.
Home page: https://www.kernel.org/pub/software/network/ethtool/
Download link:
https://www.kernel.org/pub/software/network/ethtool/ethtool-4.17.tar.xz
Release notes:
* Fix: In ethtool.8, remove superfluous and incorrect \c.
* Fix: fix uninitialized return value
* Fix: fix RING_VF assignment
* Fix: remove unused global variable
* Fix: several fixes in do_gregs()
* Fix: correctly free hkey when get_stringset() fails
* Fix: remove unreachable code
* Fix: fix stack clash in do_get_phy_tunable and do_set_phy_tunable
* Feature: Add register dump support for MICROCHIP LAN78xx
John
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
^ permalink raw reply
* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
From: Tom Herbert @ 2018-06-15 20:12 UTC (permalink / raw)
To: David Miller
Cc: Pablo Neira Ayuso, netfilter-devel,
Linux Kernel Network Developers, Steffen Klassert
In-Reply-To: <20180614.165834.338565136334574983.davem@davemloft.net>
On Thu, Jun 14, 2018 at 4:58 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Thu, 14 Jun 2018 13:52:03 -0700
>
>> IIRC, there was a similar proposal a while back that wanted to bundle
>> packets of the same flow together (without doing GRO) so that they
>> could be processed by various functions by looking at just one
>> representative packet in the group. The concept had some promise, but
>> in the end it created quite a bit of complexity since at some point
>> the packet bundle needed to be undone to go back to processing the
>> individual packets.
>
> You're probably talking about Edward Cree's SKB list stuff, and as
> per his presentation at netconf 2 weeks ago he plans to revitalize
> it given how Spectre et al. gives cause to reevaluate all bulking
> techniques.
The use case for that will be an interesting question. GSO/GRO solves
the problem for TCP and this extends to nearly all cases where TCP is
in an encapsulated packet. Super efficient forwarding can be done in
XDP/BPF (without needing overhead of GSO/GRO). That pretty much leaves
UDP as non-encapsulation end protocol, which I guess these days pretty
much means QUIC :-) I am still interested to see if we can implement
GSO/GRO for QUIC (via a generic GSO/GRO BPF function so we don't
hardcode any QUIC protocol or other application protocols in kernel).
Tom
^ permalink raw reply
* Re: [PATCH 07/17] net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
From: David Woodhouse @ 2018-06-15 20:00 UTC (permalink / raw)
To: Eric Dumazet, Elena Reshetova, netdev
Cc: Krzysztof Mazur, Kevin Darbyshire-Bryant, 3chas3, Mathias Kresin
In-Reply-To: <1529070283.27158.46.camel@infradead.org>
On Fri, 2018-06-15 at 14:44 +0100, David Woodhouse wrote:
>
> > Or simply use a new field in ATM_SKB(skb) to remember a stable
> > truesize used in both sides (add/sub)
>
> Right, that was my second suggestion ("copy the accounted value...").
>
> It's a bit of a hack, and I think that actually *using* sock_wfree()
> instead of what's currently in atm_pop_raw() would be the better
> solution. Does anyone remember why we didn't do that in the first
> place?
That does end up being quite hairy. I don't think it's worth doing.
This should probably suffice to fix it...
Kevin this is going to conflict with the ifx_atm_alloc_skb() hack in
the tree you're working on, but that needs to be killed with fire
anyway. It's utterly pointless as discussed.
From 3368eaeb0a2f09138894dde0f26f879e5228005a Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw2@infradead.org>
Date: Fri, 15 Jun 2018 20:49:20 +0100
Subject: [PATCH] atm: Preserve value of skb->truesize when accounting to vcc
There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
for certain skbs. Ideally it would cover ATM too. It doesn't. Just
stashing the accounted value and using it in atm_pop_raw() is probably
the easiest way to cope.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
---
include/linux/atmdev.h | 15 +++++++++++++++
net/atm/br2684.c | 3 +--
net/atm/clip.c | 3 +--
net/atm/common.c | 3 +--
net/atm/lec.c | 3 +--
net/atm/mpc.c | 3 +--
net/atm/pppoatm.c | 3 +--
net/atm/raw.c | 4 ++--
8 files changed, 23 insertions(+), 14 deletions(-)
diff --git a/include/linux/atmdev.h b/include/linux/atmdev.h
index 0c27515d2cf6..8124815eb121 100644
--- a/include/linux/atmdev.h
+++ b/include/linux/atmdev.h
@@ -214,6 +214,7 @@ struct atmphy_ops {
struct atm_skb_data {
struct atm_vcc *vcc; /* ATM VCC */
unsigned long atm_options; /* ATM layer options */
+ unsigned int acct_truesize; /* truesize accounted to vcc */
};
#define VCC_HTABLE_SIZE 32
@@ -241,6 +242,20 @@ void vcc_insert_socket(struct sock *sk);
void atm_dev_release_vccs(struct atm_dev *dev);
+static inline void atm_account_tx(struct atm_vcc *vcc, struct sk_buff *skb)
+{
+ /*
+ * Because ATM skbs may not belong to a sock (and we don't
+ * necessarily want to), skb->truesize may be adjusted,
+ * escaping the hack in pskb_expand_head() which avoids
+ * doing so for some cases. So stash the value of truesize
+ * at the time we accounted it, and atm_pop_raw() can use
+ * that value later, in case it changes.
+ */
+ refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+ ATM_SKB(skb)->acct_truesize = skb->truesize;
+ ATM_SKB(skb)->atm_options = vcc->atm_options;
+}
static inline void atm_force_charge(struct atm_vcc *vcc,int truesize)
{
diff --git a/net/atm/br2684.c b/net/atm/br2684.c
index 4e111196f902..bc21f8e8daf2 100644
--- a/net/atm/br2684.c
+++ b/net/atm/br2684.c
@@ -252,8 +252,7 @@ static int br2684_xmit_vcc(struct sk_buff *skb, struct net_device *dev,
ATM_SKB(skb)->vcc = atmvcc = brvcc->atmvcc;
pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, atmvcc, atmvcc->dev);
- refcount_add(skb->truesize, &sk_atm(atmvcc)->sk_wmem_alloc);
- ATM_SKB(skb)->atm_options = atmvcc->atm_options;
+ atm_account_tx(atmvcc, skb);
dev->stats.tx_packets++;
dev->stats.tx_bytes += skb->len;
diff --git a/net/atm/clip.c b/net/atm/clip.c
index 65f706e4344c..60920a42f640 100644
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -381,8 +381,7 @@ static netdev_tx_t clip_start_xmit(struct sk_buff *skb,
memcpy(here, llc_oui, sizeof(llc_oui));
((__be16 *) here)[3] = skb->protocol;
}
- refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
- ATM_SKB(skb)->atm_options = vcc->atm_options;
+ atm_account_tx(vcc, skb);
entry->vccs->last_use = jiffies;
pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, vcc, vcc->dev);
old = xchg(&entry->vccs->xoff, 1); /* assume XOFF ... */
diff --git a/net/atm/common.c b/net/atm/common.c
index 8a4f99114cd2..9e812c782a37 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -630,10 +630,9 @@ int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t size)
goto out;
}
pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
- refcount_add(skb->truesize, &sk->sk_wmem_alloc);
+ atm_account_tx(vcc, skb);
skb->dev = NULL; /* for paths shared with net_device interfaces */
- ATM_SKB(skb)->atm_options = vcc->atm_options;
if (!copy_from_iter_full(skb_put(skb, size), size, &m->msg_iter)) {
kfree_skb(skb);
error = -EFAULT;
diff --git a/net/atm/lec.c b/net/atm/lec.c
index a3d93a1bb133..d7cc165e24e0 100644
--- a/net/atm/lec.c
+++ b/net/atm/lec.c
@@ -179,9 +179,8 @@ lec_send(struct atm_vcc *vcc, struct sk_buff *skb)
struct net_device *dev = skb->dev;
ATM_SKB(skb)->vcc = vcc;
- ATM_SKB(skb)->atm_options = vcc->atm_options;
+ atm_account_tx(vcc, skb);
- refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
if (vcc->send(vcc, skb) < 0) {
dev->stats.tx_dropped++;
return;
diff --git a/net/atm/mpc.c b/net/atm/mpc.c
index 5677147209e8..db9a1838687c 100644
--- a/net/atm/mpc.c
+++ b/net/atm/mpc.c
@@ -555,8 +555,7 @@ static int send_via_shortcut(struct sk_buff *skb, struct mpoa_client *mpc)
sizeof(struct llc_snap_hdr));
}
- refcount_add(skb->truesize, &sk_atm(entry->shortcut)->sk_wmem_alloc);
- ATM_SKB(skb)->atm_options = entry->shortcut->atm_options;
+ atm_account_tx(entry->shortcut, skb);
entry->shortcut->send(entry->shortcut, skb);
entry->packets_fwded++;
mpc->in_ops->put(entry);
diff --git a/net/atm/pppoatm.c b/net/atm/pppoatm.c
index 21d9d341a619..af8c4b38b746 100644
--- a/net/atm/pppoatm.c
+++ b/net/atm/pppoatm.c
@@ -350,8 +350,7 @@ static int pppoatm_send(struct ppp_channel *chan, struct sk_buff *skb)
return 1;
}
- refcount_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
- ATM_SKB(skb)->atm_options = ATM_SKB(skb)->vcc->atm_options;
+ atm_account_tx(vcc, skb);
pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n",
skb, ATM_SKB(skb)->vcc, ATM_SKB(skb)->vcc->dev);
ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
diff --git a/net/atm/raw.c b/net/atm/raw.c
index ee10e8d46185..b3ba44aab0ee 100644
--- a/net/atm/raw.c
+++ b/net/atm/raw.c
@@ -35,8 +35,8 @@ static void atm_pop_raw(struct atm_vcc *vcc, struct sk_buff *skb)
struct sock *sk = sk_atm(vcc);
pr_debug("(%d) %d -= %d\n",
- vcc->vci, sk_wmem_alloc_get(sk), skb->truesize);
- WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
+ vcc->vci, sk_wmem_alloc_get(sk), ATM_SKB(skb)->acct_truesize);
+ WARN_ON(refcount_sub_and_test(ATM_SKB(skb)->acct_truesize, &sk->sk_wmem_alloc));
dev_kfree_skb_any(skb);
sk->sk_write_space(sk);
}
--
2.17.0
--
dwmw2
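
To illustrate the accounting imbalance that the patch above addresses, here is a minimal userspace sketch. It is not kernel code: the toy_* types and functions are hypothetical stand-ins for struct sock, struct sk_buff, atm_account_tx() and atm_pop_raw(), and the starting counter value of 1 and the sizes 768 and 256 are assumptions chosen only for the demonstration.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct toy_sock {
	unsigned int wmem_alloc;	/* models sk->sk_wmem_alloc */
};

struct toy_skb {
	unsigned int truesize;		/* may change after accounting */
	unsigned int acct_truesize;	/* value actually charged to the sock */
};

/* Charge the skb to the socket and remember how much was charged. */
static void toy_account_tx(struct toy_sock *sk, struct toy_skb *skb)
{
	sk->wmem_alloc += skb->truesize;
	skb->acct_truesize = skb->truesize;
}

/* Pre-patch behaviour: uncharge whatever truesize happens to be now. */
static void toy_pop_old(struct toy_sock *sk, const struct toy_skb *skb)
{
	sk->wmem_alloc -= skb->truesize;
}

/* Patched behaviour: uncharge exactly what was charged earlier. */
static void toy_pop_new(struct toy_sock *sk, const struct toy_skb *skb)
{
	assert(sk->wmem_alloc >= skb->acct_truesize);
	sk->wmem_alloc -= skb->acct_truesize;
}

/*
 * Charge an skb, let its truesize grow (as a head reallocation might),
 * then uncharge it. Returns the counter value left on the socket.
 */
static unsigned int toy_residual(int use_stashed)
{
	struct toy_sock sk = { .wmem_alloc = 1 };	/* assumed baseline */
	struct toy_skb skb = { .truesize = 768, .acct_truesize = 0 };

	toy_account_tx(&sk, &skb);
	skb.truesize += 256;	/* truesize changes after accounting */

	if (use_stashed)
		toy_pop_new(&sk, &skb);
	else
		toy_pop_old(&sk, &skb);

	return sk.wmem_alloc;
}
```

In this sketch toy_residual(1) brings the counter back to its baseline, while toy_residual(0) subtracts more than was added and the unsigned counter wraps. That mismatch between the add and sub sides is what stashing acct_truesize is meant to avoid.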
^ permalink raw reply related
* Re: [PATCH bpf 0/2] Two bpf fixes
From: Alexei Starovoitov @ 2018-06-15 19:31 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: ast, netdev
In-Reply-To: <20180615003048.3219-1-daniel@iogearbox.net>
On Fri, Jun 15, 2018 at 02:30:46AM +0200, Daniel Borkmann wrote:
> First one is a panic I ran into while testing the second
> one where we got several syzkaller reports. Series here
> fixes both.
>
> Thanks!
Applied, thanks.
The second patch looks dubious to me though.
Nothing in the kernel tree checks the return value of set_memory_ro()
and my understanding is that it can fail only when part of a huge page
is being marked and pages have to be split. In bpf case I don't think
it's ever the case, so the patch is silencing purely theoretical
syzbot splat that can happen with artificial error injection.
I bet we're still going to see this splat in set_memory_rw.
imo the better fix would have been to drop WARN_ON from both.
^ permalink raw reply
* RE: [PATCH 2/3] net: phy: vitesse: Add support for VSC73xx
From: Woojung.Huh @ 2018-06-15 19:24 UTC (permalink / raw)
To: f.fainelli, linus.walleij, andrew, vivien.didelot, UNGLinuxDriver
Cc: netdev, openwrt-devel, lede-dev, juhosg
In-Reply-To: <87156f76-449c-1ec3-e7fa-776c2fddc992@gmail.com>
Hi Florian,
> On 06/14/2018 05:35 AM, Linus Walleij wrote:
> > The VSC7385, VSC7388, VSC7395 and VSC7398 are integrated
> > switch/router chips for 5+1 or 8-port switches/routers. When
> > managed directly by Linux using DSA we need to do a special
> > set-up "dance" on the PHY. Unfortunately these sequences
> > switch the PHY to undocumented pages named 2a30 and 52b6
> > and do undocumented things. The reference manual also describes
> > this only as opaque sequences. This is a best
> > effort to integrate it anyways.
> >
> > Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
>
> Probably as good as it can get given the information you have access to.
> Maybe the guys at Microchip could help, adding them.
Microchip completed the acquisition of Microsemi last month. It will take some time to get access to the right data.
Hope we can help soon.
Thanks.
Woojung
^ permalink raw reply
* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Eric Dumazet @ 2018-06-15 19:14 UTC (permalink / raw)
To: Mathieu Malaterre, David S. Miller; +Cc: Eric Dumazet, linux-kernel, netdev
In-Reply-To: <20180615185645.8921-1-malat@debian.org>
On 06/15/2018 11:56 AM, Mathieu Malaterre wrote:
> This reverts commit 88078d98d1bb085d72af8437707279e203524fa5.
>
> It causes regressions for people using chips driven by the sungem
> driver. Suspicion is that the skb->csum value isn't being adjusted
> properly.
>
> Symptoms as seen on G4+sungem are:
>
> [ 34.023281] eth0: hw csum failure
> [ 34.023438] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #2
> [ 34.023618] Call Trace:
> [ 34.023707] [dffedbd0] [c069ddac] __skb_checksum_complete+0xf0/0x108 (unreliable)
> [ 34.023948] [dffedbf0] [c0777a70] tcp_v4_rcv+0x604/0xe00
> [ 34.024118] [dffedc70] [c0731624] ip_local_deliver_finish+0xa8/0x3c4
> [ 34.024315] [dffedcb0] [c0732430] ip_local_deliver+0xf0/0x154
> [ 34.024493] [dffedcf0] [c07328dc] ip_rcv+0x448/0x774
> [ 34.024653] [dffedd50] [c06aeae0] __netif_receive_skb_core+0x5e8/0x1184
> [ 34.024857] [dffedde0] [c06bba20] napi_gro_receive+0x160/0x22c
> [ 34.025044] [dffede10] [e14b2590] gem_poll+0x7fc/0x1ac0 [sungem]
> [ 34.025228] [dffedee0] [c06bacf0] net_rx_action+0x34c/0x618
> [ 34.025402] [dffedf60] [c07fd27c] __do_softirq+0x16c/0x5f0
> [ 34.025575] [dffedfd0] [c0064c7c] irq_exit+0x110/0x1a8
> [ 34.025738] [dffedff0] [c0016170] call_do_irq+0x24/0x3c
> [ 34.025903] [c0cf7e80] [c0009a84] do_IRQ+0x98/0x1a0
> [ 34.026055] [c0cf7eb0] [c001b474] ret_from_except+0x0/0x14
> [ 34.026225] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
> LR = arch_cpu_idle+0x30/0x78
> [ 34.026510] [c0cf7f70] [c0cf6000] 0xc0cf6000 (unreliable)
> [ 34.026682] [c0cf7f80] [c00a3868] do_idle+0xc4/0x158
> [ 34.026835] [c0cf7fb0] [c00a3ab0] cpu_startup_entry+0x20/0x28
> [ 34.027013] [c0cf7fc0] [c0998820] start_kernel+0x47c/0x490
> [ 34.027181] [c0cf7ff0] [00003444] 0x3444
>
> See commit 7ce5a27f2ef8 ("Revert "net: Handle CHECKSUM_COMPLETE more
> adequately in pskb_trim_rcsum()."") for previous reference.
This fix seems to hide a bug in csum functions on this architecture.
Or a bug on this NIC when receiving a small packet (less than 60 bytes).
Maybe the padding bytes are not included in NIC provided csum, and not 0.
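
The last hypothesis above can be sketched in userspace. This is only an illustration under stated assumptions: the csum16* helpers are hypothetical names, they work on 16-bit big-endian words rather than using the kernel's __wsum helpers, and the trim offset is assumed to be even so that no byte rotation is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t csum16_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Ones' complement sum of buf, read as big-endian 16-bit words.
 * A trailing odd byte is treated as if followed by a zero byte.
 */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;
	return csum16_fold(sum);
}

/* a - b in ones' complement arithmetic: add the complement of b. */
static uint16_t csum16_sub(uint16_t a, uint16_t b)
{
	return csum16_fold((uint32_t)a + (uint16_t)~b);
}

/*
 * Adjust a checksum that is believed to cover buf[0..len) so that it
 * covers only buf[0..new_len). new_len is assumed to be even.
 */
static uint16_t csum16_trim(uint16_t hw_csum, const uint8_t *buf,
			    size_t len, size_t new_len)
{
	assert(new_len % 2 == 0 && new_len <= len);
	return csum16_sub(hw_csum, csum16(buf + new_len, len - new_len));
}
```

If the hardware sum really covers the padding, csum16_trim() yields the same value as summing the trimmed payload directly. If the hardware sum excludes the padding and the padding bytes are not zero, the subtraction removes bytes that were never added, and the adjusted value no longer matches the payload, which would show up as a checksum failure further up the stack.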
^ permalink raw reply
* [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Mathieu Malaterre @ 2018-06-15 18:56 UTC (permalink / raw)
To: David S. Miller; +Cc: Mathieu Malaterre, Eric Dumazet, linux-kernel, netdev
This reverts commit 88078d98d1bb085d72af8437707279e203524fa5.
It causes regressions for people using chips driven by the sungem
driver. Suspicion is that the skb->csum value isn't being adjusted
properly.
Symptoms as seen on G4+sungem are:
[ 34.023281] eth0: hw csum failure
[ 34.023438] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #2
[ 34.023618] Call Trace:
[ 34.023707] [dffedbd0] [c069ddac] __skb_checksum_complete+0xf0/0x108 (unreliable)
[ 34.023948] [dffedbf0] [c0777a70] tcp_v4_rcv+0x604/0xe00
[ 34.024118] [dffedc70] [c0731624] ip_local_deliver_finish+0xa8/0x3c4
[ 34.024315] [dffedcb0] [c0732430] ip_local_deliver+0xf0/0x154
[ 34.024493] [dffedcf0] [c07328dc] ip_rcv+0x448/0x774
[ 34.024653] [dffedd50] [c06aeae0] __netif_receive_skb_core+0x5e8/0x1184
[ 34.024857] [dffedde0] [c06bba20] napi_gro_receive+0x160/0x22c
[ 34.025044] [dffede10] [e14b2590] gem_poll+0x7fc/0x1ac0 [sungem]
[ 34.025228] [dffedee0] [c06bacf0] net_rx_action+0x34c/0x618
[ 34.025402] [dffedf60] [c07fd27c] __do_softirq+0x16c/0x5f0
[ 34.025575] [dffedfd0] [c0064c7c] irq_exit+0x110/0x1a8
[ 34.025738] [dffedff0] [c0016170] call_do_irq+0x24/0x3c
[ 34.025903] [c0cf7e80] [c0009a84] do_IRQ+0x98/0x1a0
[ 34.026055] [c0cf7eb0] [c001b474] ret_from_except+0x0/0x14
[ 34.026225] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
LR = arch_cpu_idle+0x30/0x78
[ 34.026510] [c0cf7f70] [c0cf6000] 0xc0cf6000 (unreliable)
[ 34.026682] [c0cf7f80] [c00a3868] do_idle+0xc4/0x158
[ 34.026835] [c0cf7fb0] [c00a3ab0] cpu_startup_entry+0x20/0x28
[ 34.027013] [c0cf7fc0] [c0998820] start_kernel+0x47c/0x490
[ 34.027181] [c0cf7ff0] [00003444] 0x3444
See commit 7ce5a27f2ef8 ("Revert "net: Handle CHECKSUM_COMPLETE more
adequately in pskb_trim_rcsum()."") for previous reference.
Link: https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-June/174444.html
Reported-by: Meelis Roos <mroos@linux.ee>
Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Cc: Eric Dumazet <edumazet@google.com>
---
include/linux/skbuff.h | 5 +++--
net/core/skbuff.c | 14 --------------
2 files changed, 3 insertions(+), 16 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c86885954994..cbc753a3e41c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3134,7 +3134,6 @@ static inline void *skb_push_rcsum(struct sk_buff *skb, unsigned int len)
return skb->data;
}
-int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len);
/**
* pskb_trim_rcsum - trim received skb and update checksum
* @skb: buffer to trim
@@ -3148,7 +3147,9 @@ static inline int pskb_trim_rcsum(struct sk_buff *skb, unsigned int len)
{
if (likely(len >= skb->len))
return 0;
- return pskb_trim_rcsum_slow(skb, len);
+ if (skb->ip_summed == CHECKSUM_COMPLETE)
+ skb->ip_summed = CHECKSUM_NONE;
+ return __pskb_trim(skb, len);
}
static inline int __skb_trim_rcsum(struct sk_buff *skb, unsigned int len)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c642304f178c..360293d1baf3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1840,20 +1840,6 @@ int ___pskb_trim(struct sk_buff *skb, unsigned int len)
}
EXPORT_SYMBOL(___pskb_trim);
-/* Note : use pskb_trim_rcsum() instead of calling this directly
- */
-int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
-{
- if (skb->ip_summed == CHECKSUM_COMPLETE) {
- int delta = skb->len - len;
-
- skb->csum = csum_sub(skb->csum,
- skb_checksum(skb, len, delta, 0));
- }
- return __pskb_trim(skb, len);
-}
-EXPORT_SYMBOL(pskb_trim_rcsum_slow);
-
/**
* __pskb_pull_tail - advance tail of skb header
* @skb: buffer to reallocate
--
2.11.0
^ permalink raw reply related
* KASAN: use-after-free Write in free_htab_elem
From: syzbot @ 2018-06-15 18:40 UTC (permalink / raw)
To: ast, daniel, linux-kernel, netdev, syzkaller-bugs
Hello,
syzbot found the following crash on:
HEAD commit: f0dc7f9c6dd9 Merge git://git.kernel.org/pub/scm/linux/kern..
git tree: bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=11dad428400000
kernel config: https://syzkaller.appspot.com/x/.config?x=fa9c20c48788d1c1
dashboard link: https://syzkaller.appspot.com/bug?extid=ce67d3e4fa77eedee964
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+ce67d3e4fa77eedee964@syzkaller.appspotmail.com
==================================================================
BUG: KASAN: use-after-free in atomic_dec include/asm-generic/atomic-instrumented.h:114 [inline]
BUG: KASAN: use-after-free in free_htab_elem+0x23/0x40 kernel/bpf/sockmap.c:224
Write of size 4 at addr ffff8801b3dce648 by task syz-executor1/8114
CPU: 0 PID: 8114 Comm: syz-executor1 Not tainted 4.17.0+ #39
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
check_memory_region_inline mm/kasan/kasan.c:260 [inline]
check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278
atomic_dec include/asm-generic/atomic-instrumented.h:114 [inline]
free_htab_elem+0x23/0x40 kernel/bpf/sockmap.c:224
bpf_tcp_close+0x8c1/0xf80 kernel/bpf/sockmap.c:273
inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427
inet6_release+0x50/0x70 net/ipv6/af_inet6.c:459
__sock_release+0xd7/0x260 net/socket.c:603
sock_close+0x19/0x20 net/socket.c:1186
__fput+0x353/0x890 fs/file_table.c:209
____fput+0x15/0x20 fs/file_table.c:243
task_work_run+0x1e4/0x290 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x1aee/0x2730 kernel/exit.c:865
do_group_exit+0x16f/0x430 kernel/exit.c:968
get_signal+0x886/0x1960 kernel/signal.c:2468
do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
exit_to_usermode_loop+0x2cf/0x360 arch/x86/entry/common.c:162
prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:293
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455b29
Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f3bbf323ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 000000000072bf78 RCX: 0000000000455b29
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bf78
RBP: 000000000072bf78 R08: 0000000000000000 R09: 000000000072bf50
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd7addc2cf R14: 00007f3bbf3249c0 R15: 0000000000000001
Allocated by task 8104:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
kmalloc include/linux/slab.h:513 [inline]
kzalloc include/linux/slab.h:706 [inline]
sock_hash_alloc+0x20d/0x6a0 kernel/bpf/sockmap.c:2003
find_and_alloc_map kernel/bpf/syscall.c:129 [inline]
map_create+0x393/0x1010 kernel/bpf/syscall.c:453
__do_sys_bpf kernel/bpf/syscall.c:2351 [inline]
__se_sys_bpf kernel/bpf/syscall.c:2328 [inline]
__x64_sys_bpf+0x303/0x510 kernel/bpf/syscall.c:2328
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 2131:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kfree+0xd9/0x260 mm/slab.c:3813
sock_hash_free+0x51c/0x6e0 kernel/bpf/sockmap.c:2098
bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
kthread+0x345/0x410 kernel/kthread.c:240
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
The buggy address belongs to the object at ffff8801b3dce540
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 264 bytes inside of
512-byte region [ffff8801b3dce540, ffff8801b3dce740)
The buggy address belongs to the page:
page:ffffea0006cf7380 count:1 mapcount:0 mapping:ffff8801da800940 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea0006caccc8 ffffea0006f57b08 ffff8801da800940
raw: 0000000000000000 ffff8801b3dce040 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8801b3dce500: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff8801b3dce580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801b3dce600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8801b3dce680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801b3dce700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
^ permalink raw reply
* Re: [iproute2 1/1] rdma: sync some IP headers with glibc
From: Stephen Hemminger @ 2018-06-15 18:19 UTC (permalink / raw)
To: Hoang Le; +Cc: jon.maloy, maloy, ying.xue, netdev, tipc-discussion
In-Reply-To: <1528862996-7045-1-git-send-email-hoang.h.le@dektech.com.au>
On Wed, 13 Jun 2018 11:09:56 +0700
Hoang Le <hoang.h.le@dektech.com.au> wrote:
> In the commit 9a362cc71a45, new userspace header:
> (i.e rdma/rdma_user_cm.h -> linux/in6.h)
> is included before the kernel space header:
> (i.e utils.h -> resolv.h -> netinet/in.h).
>
> This leaves some IP headers out of sync, and the compiler fails
> with errors about redefinition of some IP structs.
>
> In this commit, just reorder these includes to keep them in sync.
>
> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Sure, applied.
^ permalink raw reply
* Re: [PATCH iproute2 v3] ipaddress: strengthen check on 'label' input
From: Stephen Hemminger @ 2018-06-15 18:19 UTC (permalink / raw)
To: Patrick Talbert; +Cc: netdev
In-Reply-To: <1528984017-19490-1-git-send-email-ptalbert@redhat.com>
On Thu, 14 Jun 2018 15:46:57 +0200
Patrick Talbert <ptalbert@redhat.com> wrote:
> As mentioned in the ip-address man page, an address label must
> be equal to the device name or prefixed by the device name
> followed by a colon. Currently the only check on this input is
> to see if the device name appears at the beginning of the label
> string.
>
> This commit adds an additional check to ensure label == dev or
> continues with a colon.
>
> Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
> Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Sure, applied.
^ permalink raw reply
* Re: [PATCH v2 net-next 4/6] net: ethernet: ti: cpsw: add CBS Qdisc offload
From: Ivan Khoronzhuk @ 2018-06-15 18:15 UTC (permalink / raw)
To: Ilias Apalodimas
Cc: grygorii.strashko, davem, corbet, akpm, netdev, linux-doc,
linux-kernel, linux-omap, vinicius.gomes, henrik,
jesus.sanchez-palencia, p-varis, spatton, francois.ozog, yogeshs,
nsekhar, andrew
In-Reply-To: <20180614080902.GA8377@apalos>
On Thu, Jun 14, 2018 at 11:09:02AM +0300, Ilias Apalodimas wrote:
[...]
>> + "Speed was changed, CBS sahper speeds are changed!");
>typo here, should be shaper
Corrected in v3
--
Regards,
Ivan Khoronzhuk
^ permalink raw reply
* [PATCH v3 net-next 6/6] Documentation: networking: cpsw: add MQPRIO & CBS offload examples
From: Ivan Khoronzhuk @ 2018-06-15 18:13 UTC (permalink / raw)
To: grygorii.strashko, davem
Cc: corbet, akpm, netdev, linux-doc, linux-kernel, linux-omap,
vinicius.gomes, henrik, jesus.sanchez-palencia, ilias.apalodimas,
p-varis, spatton, francois.ozog, yogeshs, nsekhar, andrew,
Ivan Khoronzhuk
In-Reply-To: <20180615181310.10437-1-ivan.khoronzhuk@linaro.org>
This document describes MQPRIO and CBS Qdisc offload configuration
for cpsw driver based on examples. It potentially can be used in
audio video bridging (AVB) and time sensitive networking (TSN).
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
Documentation/networking/ti-cpsw.txt | 540 +++++++++++++++++++++++++++
1 file changed, 540 insertions(+)
create mode 100644 Documentation/networking/ti-cpsw.txt
diff --git a/Documentation/networking/ti-cpsw.txt b/Documentation/networking/ti-cpsw.txt
new file mode 100644
index 000000000000..1e840346cae1
--- /dev/null
+++ b/Documentation/networking/ti-cpsw.txt
@@ -0,0 +1,540 @@
+* Texas Instruments CPSW ethernet driver
+
+Multiqueue & CBS & MQPRIO
+=====================================================================
+=====================================================================
+
+The cpsw has 3 CBS shapers for each external port. This document
+describes MQPRIO and CBS Qdisc offload configuration for cpsw driver
+based on examples. It potentially can be used in audio video bridging
+(AVB) and time sensitive networking (TSN).
+
+The following examples were tested on AM572x EVM and BBB boards.
+
+Test setup
+==========
+
+Two examples are considered, with an AM572x EVM running the cpsw
+driver in dual_emac mode.
+
+Several prerequisites:
+- TX queues must be rated starting from txq0, which has highest priority
+- Traffic classes are used starting from 0, which has highest priority
+- CBS shapers should be used with rated queues
+- The bandwidth for CBS shapers has to be set a little higher than the
+ potential incoming rate; thus, the rate of all incoming tx queues
+ has to be a little lower
+- Real rates can differ, due to discreteness
+- Mapping skb-priority to txq is not enough; an skb-priority to l2
+ prio map also has to be created with the ip or vconfig tool
+- Any l2/socket prio (0 - 7) for classes can be used, but for
+ simplicity default values are used: 3 and 2
+- Only 2 classes were tested, A and B, but more can work: at most 4
+ are allowed, but a rate can be set for only 3 of them.
+
+Test setup for examples
+=======================
+ +-------------------------------+
+ |--+ |
+ | | Workstation0 |
+ |E | MAC 18:03:73:66:87:42 |
++-----------------------------+ +--|t | |
+| | 1 | E | | |h |./tsn_listener -d \ |
+| Target board: | 0 | t |--+ |0 | 18:03:73:66:87:42 -i eth0 \|
+| AM572x EVM | 0 | h | | | -s 1500 |
+| | 0 | 0 | |--+ |
+| Only 2 classes: |Mb +---| +-------------------------------+
+| class A, class B | |
+| | +---| +-------------------------------+
+| | 1 | E | |--+ |
+| | 0 | t | | | Workstation1 |
+| | 0 | h |--+ |E | MAC 20:cf:30:85:7d:fd |
+| |Mb | 1 | +--|t | |
++-----------------------------+ |h |./tsn_listener -d \ |
+ |0 | 20:cf:30:85:7d:fd -i eth0 \|
+ | | -s 1500 |
+ |--+ |
+ +-------------------------------+
+
+*********************************************************************
+*********************************************************************
+*********************************************************************
+Example 1: One port tx AVB configuration scheme for target board
+----------------------------------------------------------------------
+(prints and scheme for AM572x evm, applicable for single port boards)
+
+tc - traffic class
+txq - transmit queue
+p - priority
+f - fifo (cpsw fifo)
+S - shaper configured
+
++------------------------------------------------------------------+ u
+| +---------------+ +---------------+ +------+ +------+ | s
+| | | | | | | | | | e
+| | App 1 | | App 2 | | Apps | | Apps | | r
+| | Class A | | Class B | | Rest | | Rest | |
+| | Eth0 | | Eth0 | | Eth0 | | Eth1 | | s
+| | VLAN100 | | VLAN100 | | | | | | | | p
+| | 40 Mb/s | | 20 Mb/s | | | | | | | | a
+| | SO_PRIORITY=3 | | SO_PRIORITY=2 | | | | | | | | c
+| | | | | | | | | | | | | | e
+| +---|-----------+ +---|-----------+ +---|--+ +---|--+ |
++-----|------------------|------------------|--------|-------------+
+ +-+ +------------+ | |
+ | | +-----------------+ +--+
+ | | | |
++---|-------|-------------|-----------------------|----------------+
+| +----+ +----+ +----+ +----+ +----+ |
+| | p3 | | p2 | | p1 | | p0 | | p0 | | k
+| \ / \ / \ / \ / \ / | e
+| \ / \ / \ / \ / \ / | r
+| \/ \/ \/ \/ \/ | n
+| | | | | | e
+| | | +-----+ | | l
+| | | | | |
+| +----+ +----+ +----+ +----+ | s
+| |tc0 | |tc1 | |tc2 | |tc0 | | p
+| \ / \ / \ / \ / | a
+| \ / \ / \ / \ / | c
+| \/ \/ \/ \/ | e
+| | | +-----+ | |
+| | | | | | |
+| | | | | | |
+| | | | | | |
+| +----+ +----+ +----+ +----+ +----+ |
+| |txq0| |txq1| |txq2| |txq3| |txq4| |
+| \ / \ / \ / \ / \ / |
+| \ / \ / \ / \ / \ / |
+| \/ \/ \/ \/ \/ |
+| +-|------|------|------|--+ +--|--------------+ |
+| | | | | | | Eth0.100 | | Eth1 | |
++---|------|------|------|------------------------|----------------+
+ | | | | |
+ p p p p |
+ 3 2 0-1, 4-7 <- L2 priority |
+ | | | | |
+ | | | | |
++---|------|------|------|------------------------|----------------+
+| | | | | |----------+ |
+| +----+ +----+ +----+ +----+ +----+ |
+| |dma7| |dma6| |dma5| |dma4| |dma3| |
+| \ / \ / \ / \ / \ / | c
+| \S / \S / \ / \ / \ / | p
+| \/ \/ \/ \/ \/ | s
+| | | | +----- | | w
+| | | | | | |
+| | | | | | | d
+| +----+ +----+ +----+p p+----+ | r
+| | | | | | |o o| | | i
+| | f3 | | f2 | | f0 |r r| f0 | | v
+| |tc0 | |tc1 | |tc2 |t t|tc0 | | e
+| \CBS / \CBS / \CBS /1 2\CBS / | r
+| \S / \S / \ / \ / |
+| \/ \/ \/ \/ |
++------------------------------------------------------------------+
+========================================Eth==========================>
+
+1)
+// Add 4 tx queues, for interface Eth0, and 1 tx queue for Eth1
+$ ethtool -L eth0 rx 1 tx 5
+rx unmodified, ignoring
+
+2)
+// Check if num of queues is set correctly:
+$ ethtool -l eth0
+Channel parameters for eth0:
+Pre-set maximums:
+RX: 8
+TX: 8
+Other: 0
+Combined: 0
+Current hardware settings:
+RX: 1
+TX: 5
+Other: 0
+Combined: 0
+
+3)
+// TX queues must be rated starting from 0, so set bws for tx0 and tx1
+// Set rates 40 and 20 Mb/s respectively.
+// Pay attention, real speed can differ a bit due to discreteness.
+// Leave last 2 tx queues not rated.
+$ echo 40 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
+$ echo 20 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
+
+4)
+// Check maximum rate of tx (cpdma) queues:
+$ cat /sys/class/net/eth0/queues/tx-*/tx_maxrate
+40
+20
+0
+0
+0
+
+5)
+// Map skb->priority to traffic class:
+// 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
+// Map traffic class to transmit queue:
+// tc0 -> txq0, tc1 -> txq1, tc2 -> (txq2, txq3)
+$ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
+map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 1
+
+5a)
+// As the two interfaces share the same set of tx queues, assign all
+// traffic going to interface Eth1 to a separate queue so that it does
+// not mix with traffic from interface Eth0: use a separate txq to send
+// packets to Eth1, i.e. all prio -> tc0 and tc0 -> txq4.
+// Here hw 0, so eth1 keeps its default configuration in hw.
+$ tc qdisc replace dev eth1 handle 100: parent root mqprio num_tc 1 \
+map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@4 hw 0
+
+6)
+// Check classes settings
+$ tc -g class show dev eth0
++---(100:ffe2) mqprio
+| +---(100:3) mqprio
+| +---(100:4) mqprio
+|
++---(100:ffe1) mqprio
+| +---(100:2) mqprio
+|
++---(100:ffe0) mqprio
+ +---(100:1) mqprio
+
+$ tc -g class show dev eth1
++---(100:ffe0) mqprio
+ +---(100:5) mqprio
+
+7)
+// Set rate for class A - 41 Mbit (tc0, txq0) using CBS Qdisc
+// Set it +1 Mb for reserve (important!)
+// here only idle slope is important, other args are ignored
+// Pay attention, real speed can differ a bit due to discreteness
+$ tc qdisc add dev eth0 parent 100:1 cbs locredit -1438 \
+hicredit 62 sendslope -959000 idleslope 41000 offload 1
+net eth0: set FIFO3 bw = 50
+
+8)
+// Set rate for class B - 21 Mbit (tc1, txq1) using CBS Qdisc:
+// Set it +1 Mb for reserve (important!)
+$ tc qdisc add dev eth0 parent 100:2 cbs locredit -1468 \
+hicredit 65 sendslope -979000 idleslope 21000 offload 1
+net eth0: set FIFO2 bw = 30
+
+9)
+// Create vlan 100 to map sk->priority to vlan qos
+$ ip link add link eth0 name eth0.100 type vlan id 100
+8021q: 802.1Q VLAN Support v1.8
+8021q: adding VLAN 0 to HW filter on device eth0
+8021q: adding VLAN 0 to HW filter on device eth1
+net eth0: Adding vlanid 100 to vlan filter
+
+10)
+// Map skb->priority to L2 prio, 1 to 1
+$ ip link set eth0.100 type vlan \
+egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
+
+11)
+// Check egress map for vlan 100
+$ cat /proc/net/vlan/eth0.100
+[...]
+INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
+EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
+
+12)
+// Run your appropriate tools with socket option "SO_PRIORITY"
+// to 3 for class A and/or to 2 for class B
+// (taken from https://www.spinics.net/lists/netdev/msg460869.html)
+./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p3 -s 1500&
+./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p2 -s 1500&
+
+13)
+// run your listener on workstation (should be in same vlan)
+// (taken from https://www.spinics.net/lists/netdev/msg460869.html)
+./tsn_listener -d 18:03:73:66:87:42 -i enp5s0 -s 1500
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39000 kbps
+
+14)
+// Restore default configuration if needed
+$ ip link del eth0.100
+$ tc qdisc del dev eth1 root
+$ tc qdisc del dev eth0 root
+net eth0: Prev FIFO2 is shaped
+net eth0: set FIFO3 bw = 0
+net eth0: set FIFO2 bw = 0
+$ ethtool -L eth0 rx 1 tx 1
+
+*********************************************************************
+*********************************************************************
+*********************************************************************
+Example 2: Two port tx AVB configuration scheme for target board
+----------------------------------------------------------------------
+(prints and scheme for AM572x evm, for dual emac boards only)
+
++------------------------------------------------------------------+ u
+| +----------+ +----------+ +------+ +----------+ +----------+ | s
+| | | | | | | | | | | | e
+| | App 1 | | App 2 | | Apps | | App 3 | | App 4 | | r
+| | Class A | | Class B | | Rest | | Class B | | Class A | |
+| | Eth0 | | Eth0 | | | | | Eth1 | | Eth1 | | s
+| | VLAN100 | | VLAN100 | | | | | VLAN100 | | VLAN100 | | p
+| | 40 Mb/s | | 20 Mb/s | | | | | 10 Mb/s | | 30 Mb/s | | a
+| | SO_PRI=3 | | SO_PRI=2 | | | | | SO_PRI=3 | | SO_PRI=2 | | c
+| | | | | | | | | | | | | | | | | e
+| +---|------+ +---|------+ +---|--+ +---|------+ +---|------+ |
++-----|-------------|-------------|---------|-------------|--------+
+ +-+ +-------+ | +----------+ +----+
+ | | +-------+------+ | |
+ | | | | | |
++---|-------|-------------|--------------|-------------|-------|---+
+| +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ |
+| | p3 | | p2 | | p1 | | p0 | | p0 | | p1 | | p2 | | p3 | | k
+| \ / \ / \ / \ / \ / \ / \ / \ / | e
+| \ / \ / \ / \ / \ / \ / \ / \ / | r
+| \/ \/ \/ \/ \/ \/ \/ \/ | n
+| | | | | | | | e
+| | | +----+ +----+ | | | l
+| | | | | | | |
+| +----+ +----+ +----+ +----+ +----+ +----+ | s
+| |tc0 | |tc1 | |tc2 | |tc2 | |tc1 | |tc0 | | p
+| \ / \ / \ / \ / \ / \ / | a
+| \ / \ / \ / \ / \ / \ / | c
+| \/ \/ \/ \/ \/ \/ | e
+| | | +-----+ +-----+ | | |
+| | | | | | | | | |
+| | | | | | | | | |
+| | | | | E E | | | | |
+| +----+ +----+ +----+ +----+ t t +----+ +----+ +----+ +----+ |
+| |txq0| |txq1| |txq4| |txq5| h h |txq6| |txq7| |txq3| |txq2| |
+| \ / \ / \ / \ / 0 1 \ / \ / \ / \ / |
+| \ / \ / \ / \ / . . \ / \ / \ / \ / |
+| \/ \/ \/ \/ 1 1 \/ \/ \/ \/ |
+| +-|------|------|------|--+ 0 0 +-|------|------|------|--+ |
+| | | | | | | 0 0 | | | | | | |
++---|------|------|------|---------------|------|------|------|----+
+ | | | | | | | |
+ p p p p p p p p
+ 3 2 0-1, 4-7 <-L2 pri-> 0-1, 4-7 2 3
+ | | | | | | | |
+ | | | | | | | |
++---|------|------|------|---------------|------|------|------|----+
+| | | | | | | | | |
+| +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ |
+| |dma7| |dma6| |dma3| |dma2| |dma1| |dma0| |dma4| |dma5| |
+| \ / \ / \ / \ / \ / \ / \ / \ / | c
+| \S / \S / \ / \ / \ / \ / \S / \S / | p
+| \/ \/ \/ \/ \/ \/ \/ \/ | s
+| | | | +----- | | | | | w
+| | | | | +----+ | | | |
+| | | | | | | | | | d
+| +----+ +----+ +----+p p+----+ +----+ +----+ | r
+| | | | | | |o o| | | | | | | i
+| | f3 | | f2 | | f0 |r CPSW r| f3 | | f2 | | f0 | | v
+| |tc0 | |tc1 | |tc2 |t t|tc0 | |tc1 | |tc2 | | e
+| \CBS / \CBS / \CBS /1 2\CBS / \CBS / \CBS / | r
+| \S / \S / \ / \S / \S / \ / |
+| \/ \/ \/ \/ \/ \/ |
++------------------------------------------------------------------+
+========================================Eth==========================>
+
+1)
+// Add 8 tx queues, for interface Eth0, but they are common, so are accessed
+// by two interfaces Eth0 and Eth1.
+$ ethtool -L eth1 rx 1 tx 8
+rx unmodified, ignoring
+
+2)
+// Check if num of queues is set correctly:
+$ ethtool -l eth0
+Channel parameters for eth0:
+Pre-set maximums:
+RX: 8
+TX: 8
+Other: 0
+Combined: 0
+Current hardware settings:
+RX: 1
+TX: 8
+Other: 0
+Combined: 0
+
+3)
+// TX queues must be rated starting from 0, so set bws for tx0 and tx1 for Eth0
+// and for tx2 and tx3 for Eth1. That is, rates 40 and 20 Mb/s respectively
+// for Eth0 and 30 and 10 Mb/s for Eth1.
+// Real speed can differ a bit due to discreteness
+// Leave last 4 tx queues as not rated
+$ echo 40 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
+$ echo 20 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
+$ echo 30 > /sys/class/net/eth1/queues/tx-2/tx_maxrate
+$ echo 10 > /sys/class/net/eth1/queues/tx-3/tx_maxrate
+
+4)
+// Check maximum rate of tx (cpdma) queues:
+$ cat /sys/class/net/eth0/queues/tx-*/tx_maxrate
+40
+20
+30
+10
+0
+0
+0
+0
+
+5)
+// Map skb->priority to traffic class for Eth0:
+// 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
+// Map traffic class to transmit queue:
+// tc0 -> txq0, tc1 -> txq1, tc2 -> (txq4, txq5)
+$ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
+map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@4 hw 1
+
+6)
+// Check classes settings
+$ tc -g class show dev eth0
++---(100:ffe2) mqprio
+| +---(100:5) mqprio
+| +---(100:6) mqprio
+|
++---(100:ffe1) mqprio
+| +---(100:2) mqprio
+|
++---(100:ffe0) mqprio
+ +---(100:1) mqprio
+
+7)
+// Set rate for class A - 41 Mbit (tc0, txq0) using CBS Qdisc for Eth0
+// here only idle slope is important, others ignored
+// Real speed can differ a bit due to discreteness
+$ tc qdisc add dev eth0 parent 100:1 cbs locredit -1470 \
+hicredit 62 sendslope -959000 idleslope 41000 offload 1
+net eth0: set FIFO3 bw = 50
+
+8)
+// Set rate for class B - 21 Mbit (tc1, txq1) using CBS Qdisc for Eth0
+$ tc qdisc add dev eth0 parent 100:2 cbs locredit -1470 \
+hicredit 65 sendslope -979000 idleslope 21000 offload 1
+net eth0: set FIFO2 bw = 30
+
+9)
+// Create vlan 100 to map sk->priority to vlan qos for Eth0
+$ ip link add link eth0 name eth0.100 type vlan id 100
+net eth0: Adding vlanid 100 to vlan filter
+
+10)
+// Map skb->priority to L2 prio for Eth0.100, one to one
+$ ip link set eth0.100 type vlan \
+egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
+
+11)
+// Check egress map for vlan 100
+$ cat /proc/net/vlan/eth0.100
+[...]
+INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
+EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
+
+12)
+// Map skb->priority to traffic class for Eth1:
+// 3pri -> tc0, 2pri -> tc1, (0,1,4-7)pri -> tc2
+// Map traffic class to transmit queue:
+// tc0 -> txq2, tc1 -> txq3, tc2 -> (txq6, txq7)
+$ tc qdisc replace dev eth1 handle 100: parent root mqprio num_tc 3 \
+map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@2 1@3 2@6 hw 1
+
+13)
+// Check classes settings
+$ tc -g class show dev eth1
++---(100:ffe2) mqprio
+| +---(100:7) mqprio
+| +---(100:8) mqprio
+|
++---(100:ffe1) mqprio
+| +---(100:4) mqprio
+|
++---(100:ffe0) mqprio
+ +---(100:3) mqprio
+
+14)
+// Set rate for class A - 31 Mbit (tc0, txq2) using CBS Qdisc for Eth1
+// here only idle slope is important, others ignored
+// Set it +1 Mb for reserve (important!)
+$ tc qdisc add dev eth1 parent 100:3 cbs locredit -1453 \
+hicredit 47 sendslope -969000 idleslope 31000 offload 1
+net eth1: set FIFO3 bw = 31
+
+15)
+// Set rate for class B - 11 Mbit (tc1, txq3) using CBS Qdisc for Eth1
+// Set it +1 Mb for reserve (important!)
+$ tc qdisc add dev eth1 parent 100:4 cbs locredit -1483 \
+hicredit 34 sendslope -989000 idleslope 11000 offload 1
+net eth1: set FIFO2 bw = 11
+
+16)
+// Create vlan 100 to map sk->priority to vlan qos for Eth1
+$ ip link add link eth1 name eth1.100 type vlan id 100
+net eth1: Adding vlanid 100 to vlan filter
+
+17)
+// Map skb->priority to L2 prio for Eth1.100, one to one
+$ ip link set eth1.100 type vlan \
+egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
+
+18)
+// Check egress map for vlan 100
+$ cat /proc/net/vlan/eth1.100
+[...]
+INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
+EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
+
+19)
+// Run appropriate tools with socket option "SO_PRIORITY" to 3
+// for class A and to 2 for class B. For both interfaces
+./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p2 -s 1500&
+./tsn_talker -d 18:03:73:66:87:42 -i eth0.100 -p3 -s 1500&
+./tsn_talker -d 20:cf:30:85:7d:fd -i eth1.100 -p2 -s 1500&
+./tsn_talker -d 20:cf:30:85:7d:fd -i eth1.100 -p3 -s 1500&
+
+20)
+// run your listener on workstation (should be in same vlan)
+// (taken from https://www.spinics.net/lists/netdev/msg460869.html)
+./tsn_listener -d 18:03:73:66:87:42 -i enp5s0 -s 1500
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39012 kbps
+Receiving data rate: 39000 kbps
+
+21)
+// Restore default configuration if needed
+$ ip link del eth1.100
+$ ip link del eth0.100
+$ tc qdisc del dev eth1 root
+net eth1: Prev FIFO2 is shaped
+net eth1: set FIFO3 bw = 0
+net eth1: set FIFO2 bw = 0
+$ tc qdisc del dev eth0 root
+net eth0: Prev FIFO2 is shaped
+net eth0: set FIFO3 bw = 0
+net eth0: set FIFO2 bw = 0
+$ ethtool -L eth0 rx 1 tx 1
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 5/6] net: ethernet: ti: cpsw: restore shaper configuration while down/up
From: Ivan Khoronzhuk @ 2018-06-15 18:13 UTC (permalink / raw)
To: grygorii.strashko, davem
Cc: corbet, akpm, netdev, linux-doc, linux-kernel, linux-omap,
vinicius.gomes, henrik, jesus.sanchez-palencia, ilias.apalodimas,
p-varis, spatton, francois.ozog, yogeshs, nsekhar, andrew,
Ivan Khoronzhuk
In-Reply-To: <20180615181310.10437-1-ivan.khoronzhuk@linaro.org>
The shaper configuration needs to be restored after the interface goes
down and up again, because the corresponding configuration is still
kept in the kernel settings. Only the shaper context is restored, so
the vlan configuration should be restored by the user if needed,
especially for devices with one port where vlan frames are sent via ALE.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
drivers/net/ethernet/ti/cpsw.c | 47 ++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
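For readers who want to check the MQPRIO part of the restore by hand, here is
a standalone sketch of the priority map arithmetic used in
cpsw_mqprio_resume() in this patch: each of the 8 priorities gets a 4-bit
field holding fifo = CPSW_FIFO_SHAPERS_NUM - tc. The function name
build_tx_prio_map() and the plain array argument are assumptions made for
illustration; the driver reads the tc from netdev_get_prio_tc_map().

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors CPSW_FIFO_SHAPERS_NUM (CPSW_TC_NUM - 1) from the driver. */
#define FIFO_SHAPERS_NUM 3

/*
 * Hypothetical standalone version of the map computation:
 * prio_tc[i] is the traffic class assigned to priority i.
 */
static uint32_t build_tx_prio_map(const int prio_tc[8])
{
	uint32_t map = 0;
	int i;

	for (i = 0; i < 8; i++) {
		int tc = prio_tc[i];

		assert(tc >= 0 && tc <= FIFO_SHAPERS_NUM);

		/* tc0 goes to the highest FIFO, 4 bits per priority. */
		map |= (uint32_t)(FIFO_SHAPERS_NUM - tc) << (4 * i);
	}

	return map;
}
```

With the first eight entries of the mqprio map used in the examples
(2 2 1 0 2 2 2 2), this gives 0x11113211: priority 3 lands in FIFO3,
priority 2 in FIFO2 and everything else in FIFO1.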
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 19573627a9bb..e745b4a519dd 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1807,6 +1807,51 @@ static int cpsw_set_cbs(struct net_device *ndev,
return ret;
}
+static void cpsw_cbs_resume(struct cpsw_slave *slave, struct cpsw_priv *priv)
+{
+ int fifo, bw;
+
+ for (fifo = CPSW_FIFO_SHAPERS_NUM; fifo > 0; fifo--) {
+ bw = priv->fifo_bw[fifo];
+ if (!bw)
+ continue;
+
+ cpsw_set_fifo_rlimit(priv, fifo, bw);
+ }
+}
+
+static void cpsw_mqprio_resume(struct cpsw_slave *slave, struct cpsw_priv *priv)
+{
+ struct cpsw_common *cpsw = priv->cpsw;
+ u32 tx_prio_map = 0;
+ int i, tc, fifo;
+ u32 tx_prio_rg;
+
+ if (!priv->mqprio_hw)
+ return;
+
+ for (i = 0; i < 8; i++) {
+ tc = netdev_get_prio_tc_map(priv->ndev, i);
+ fifo = CPSW_FIFO_SHAPERS_NUM - tc;
+ tx_prio_map |= fifo << (4 * i);
+ }
+
+ tx_prio_rg = cpsw->version == CPSW_VERSION_1 ?
+ CPSW1_TX_PRI_MAP : CPSW2_TX_PRI_MAP;
+
+ slave_write(slave, tx_prio_map, tx_prio_rg);
+}
+
+/* restore resources after port reset */
+static void cpsw_restore(struct cpsw_priv *priv)
+{
+ /* restore MQPRIO offload */
+ for_each_slave(priv, cpsw_mqprio_resume, priv);
+
+ /* restore CBS offload */
+ for_each_slave(priv, cpsw_cbs_resume, priv);
+}
+
static int cpsw_ndo_open(struct net_device *ndev)
{
struct cpsw_priv *priv = netdev_priv(ndev);
@@ -1886,6 +1931,8 @@ static int cpsw_ndo_open(struct net_device *ndev)
}
+ cpsw_restore(priv);
+
/* Enable Interrupt pacing if configured */
if (cpsw->coal_intvl != 0) {
struct ethtool_coalesce coal;
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 4/6] net: ethernet: ti: cpsw: add CBS Qdisc offload
From: Ivan Khoronzhuk @ 2018-06-15 18:13 UTC (permalink / raw)
To: grygorii.strashko, davem
Cc: corbet, akpm, netdev, linux-doc, linux-kernel, linux-omap,
vinicius.gomes, henrik, jesus.sanchez-palencia, ilias.apalodimas,
p-varis, spatton, francois.ozog, yogeshs, nsekhar, andrew,
Ivan Khoronzhuk
In-Reply-To: <20180615181310.10437-1-ivan.khoronzhuk@linaro.org>
The cpsw has up to 4 FIFOs per port, and the upper 3 FIFOs can feed a
rate-limited queue with shaping. To set and enable shaping for those
3 FIFO queues, a network device with a CBS qdisc attached is needed.
The CBS configuration is added for dual-emac/single port mode only,
but can potentially be used in switch mode as well, based on
switchdev for instance.
Although the FIFO shapers can work without the cpdma level shapers,
the basic usage must combine them with the cpdma level shapers as
described in the TRM; those are set as maximum rates for interface
queues via sysfs.
One possible configuration with txq shapers and CBS shapers:
Configured with echo RATE >
/sys/class/net/eth0/queues/tx-0/tx_maxrate
/---------------------------------------------------
/
/ cpdma level shapers
+----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+
| c7 | | c6 | | c5 | | c4 | | c3 | | c2 | | c1 | | c0 |
\ / \ / \ / \ / \ / \ / \ / \ /
\ / \ / \ / \ / \ / \ / \ / \ /
\/ \/ \/ \/ \/ \/ \/ \/
+---------|------|------|------|-------------------------------------+
| +----+ | | +---+ |
| | +----+ | | |
| v v v v |
| +----+ +----+ +----+ +----+ p p+----+ +----+ +----+ +----+ |
| | | | | | | | | o o| | | | | | | | |
| | f3 | | f2 | | f1 | | f0 | r CPSW r| f3 | | f2 | | f1 | | f0 | |
| | | | | | | | | t t| | | | | | | | |
| \ / \ / \ / \ / 0 1\ / \ / \ / \ / |
| \ X \ / \ / \ / \ / \ / \ / \ / |
| \/ \ \/ \/ \/ \/ \/ \/ \/ |
+-------\------------------------------------------------------------+
\
\ FIFO shaper, set with CBS offload added in this patch,
\ FIFO0 cannot be rate limited
------------------------------------------------------
The CBS shaper configuration is meant to be used with the root MQPRIO
Qdisc offload, which allows adding sk_prio->tc->txq maps that direct
traffic to the appropriate tx queue and map L2 priority to a FIFO shaper.
The CBS shaper is intended to be used for AVB, where L2 priority
(pcp field) is used to differentiate classes of traffic. So a vlan
additionally needs to be created with an appropriate egress
sk_prio->l2 prio map.
If CBS has several tx queues assigned to it, the sum of their
bandwidths must not exceed the bandwidth set for CBS. It is recommended
that the CBS bandwidth be a little higher.
The CBS shaper is configured through the CBS qdisc offload interface
using the tc tool from the iproute2 package.
For instance:
$ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 1
$ tc -g class show dev eth0
+---(100:ffe2) mqprio
| +---(100:3) mqprio
| +---(100:4) mqprio
|
+---(100:ffe1) mqprio
| +---(100:2) mqprio
|
+---(100:ffe0) mqprio
+---(100:1) mqprio
$ tc qdisc add dev eth0 parent 100:1 cbs locredit -1440 \
hicredit 60 sendslope -960000 idleslope 40000 offload 1
$ tc qdisc add dev eth0 parent 100:2 cbs locredit -1470 \
hicredit 62 sendslope -980000 idleslope 20000 offload 1
The above commands set CBS shapers for tc0 and tc1, for which txq0 and
txq1 are used. Note that the bandwidth actually set can differ a bit
due to the discreteness of the configuration parameters.
Here parameters like locredit, hicredit and sendslope are ignored
internally and are supposed to be set under the assumption of a
maximum frame size of 1500.
It is assumed that the interface speed does not change on reconnection,
which is not always true, so inform the user if the interface speed
has changed, as it can affect the dependent shaper configuration.
For more examples see Documentation.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
drivers/net/ethernet/ti/cpsw.c | 221 +++++++++++++++++++++++++++++++++
1 file changed, 221 insertions(+)
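As a cross-check of the tc arguments in the commit message above, here is a
small sketch that derives sendslope and locredit from idleslope using the
formulas given in the tc-cbs(8) man page. The 1 Gbit/s port rate is an
assumption inferred from the example values (idleslope - sendslope =
1000000), and the struct and function names are hypothetical. hicredit is
left out because it depends on the interference from other classes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the derived CBS arguments. */
struct cbs_params {
	int32_t idleslope;	/* kbit/s */
	int32_t sendslope;	/* kbit/s, negative */
	int32_t locredit;	/* bytes, negative */
};

/*
 * sendslope = idleslope - port_rate
 * locredit  = max_frame * sendslope / port_rate
 * Rates are in kbit/s, max_frame in bytes.
 */
static struct cbs_params cbs_calc(int32_t idleslope, int32_t port_rate,
				  int32_t max_frame)
{
	struct cbs_params p;

	assert(port_rate > 0 && idleslope > 0 && idleslope <= port_rate);

	p.idleslope = idleslope;
	p.sendslope = idleslope - port_rate;
	/* 64-bit intermediate so the product cannot overflow. */
	p.locredit = (int32_t)((int64_t)max_frame * p.sendslope / port_rate);

	return p;
}
```

For idleslope 40000 and 20000 with a 1500 byte frame this yields
sendslope -960000 / locredit -1440 and sendslope -980000 / locredit -1470,
matching the two tc qdisc add commands above. Since the driver only uses
idleslope, the other values matter mainly for keeping the software
configuration consistent.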
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index edd14def98df..19573627a9bb 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -46,6 +46,8 @@
#include "cpts.h"
#include "davinci_cpdma.h"
+#include <net/pkt_sched.h>
+
#define CPSW_DEBUG (NETIF_MSG_HW | NETIF_MSG_WOL | \
NETIF_MSG_DRV | NETIF_MSG_LINK | \
NETIF_MSG_IFUP | NETIF_MSG_INTR | \
@@ -154,8 +156,12 @@ do { \
#define IRQ_NUM 2
#define CPSW_MAX_QUEUES 8
#define CPSW_CPDMA_DESCS_POOL_SIZE_DEFAULT 256
+#define CPSW_FIFO_QUEUE_TYPE_SHIFT 16
+#define CPSW_FIFO_SHAPE_EN_SHIFT 16
+#define CPSW_FIFO_RATE_EN_SHIFT 20
#define CPSW_TC_NUM 4
#define CPSW_FIFO_SHAPERS_NUM (CPSW_TC_NUM - 1)
+#define CPSW_PCT_MASK 0x7f
#define CPSW_RX_VLAN_ENCAP_HDR_PRIO_SHIFT 29
#define CPSW_RX_VLAN_ENCAP_HDR_PRIO_MSK GENMASK(2, 0)
@@ -457,6 +463,8 @@ struct cpsw_priv {
bool rx_pause;
bool tx_pause;
bool mqprio_hw;
+ int fifo_bw[CPSW_TC_NUM];
+ int shp_cfg_speed;
u32 emac_port;
struct cpsw_common *cpsw;
};
@@ -1081,6 +1089,38 @@ static void cpsw_set_slave_mac(struct cpsw_slave *slave,
slave_write(slave, mac_lo(priv->mac_addr), SA_LO);
}
+static bool cpsw_shp_is_off(struct cpsw_priv *priv)
+{
+ struct cpsw_common *cpsw = priv->cpsw;
+ struct cpsw_slave *slave;
+ u32 shift, mask, val;
+
+ val = readl_relaxed(&cpsw->regs->ptype);
+
+ slave = &cpsw->slaves[cpsw_slave_index(cpsw, priv)];
+ shift = CPSW_FIFO_SHAPE_EN_SHIFT + 3 * slave->slave_num;
+ mask = 7 << shift;
+ val = val & mask;
+
+ return !val;
+}
+
+static void cpsw_fifo_shp_on(struct cpsw_priv *priv, int fifo, int on)
+{
+ struct cpsw_common *cpsw = priv->cpsw;
+ struct cpsw_slave *slave;
+ u32 shift, mask, val;
+
+ val = readl_relaxed(&cpsw->regs->ptype);
+
+ slave = &cpsw->slaves[cpsw_slave_index(cpsw, priv)];
+ shift = CPSW_FIFO_SHAPE_EN_SHIFT + 3 * slave->slave_num;
+ mask = (1 << --fifo) << shift;
+ val = on ? val | mask : val & ~mask;
+
+ writel_relaxed(val, &cpsw->regs->ptype);
+}
+
static void _cpsw_adjust_link(struct cpsw_slave *slave,
struct cpsw_priv *priv, bool *link)
{
@@ -1120,6 +1160,12 @@ static void _cpsw_adjust_link(struct cpsw_slave *slave,
mac_control |= BIT(4);
*link = true;
+
+ if (priv->shp_cfg_speed &&
+ priv->shp_cfg_speed != slave->phy->speed &&
+ !cpsw_shp_is_off(priv))
+ dev_warn(priv->dev,
+ "Speed was changed, CBS shaper speeds are changed!");
} else {
mac_control = 0;
/* disable forwarding */
@@ -1589,6 +1635,178 @@ static int cpsw_tc_to_fifo(int tc, int num_tc)
return CPSW_FIFO_SHAPERS_NUM - tc;
}
+static int cpsw_set_fifo_bw(struct cpsw_priv *priv, int fifo, int bw)
+{
+ struct cpsw_common *cpsw = priv->cpsw;
+ u32 val = 0, send_pct, shift;
+ struct cpsw_slave *slave;
+ int pct = 0, i;
+
+ if (bw > priv->shp_cfg_speed * 1000)
+ goto err;
+
+ /* shaping has to stay enabled for highest fifos linearly
+ * and fifo bw must be no more than the interface can allow
+ */
+ slave = &cpsw->slaves[cpsw_slave_index(cpsw, priv)];
+ send_pct = slave_read(slave, SEND_PERCENT);
+ for (i = CPSW_FIFO_SHAPERS_NUM; i > 0; i--) {
+ if (!bw) {
+ if (i >= fifo || !priv->fifo_bw[i])
+ continue;
+
+ dev_warn(priv->dev, "Prev FIFO%d is shaped", i);
+ continue;
+ }
+
+ if (!priv->fifo_bw[i] && i > fifo) {
+ dev_err(priv->dev, "Upper FIFO%d is not shaped", i);
+ return -EINVAL;
+ }
+
+ shift = (i - 1) * 8;
+ if (i == fifo) {
+ send_pct &= ~(CPSW_PCT_MASK << shift);
+ val = DIV_ROUND_UP(bw, priv->shp_cfg_speed * 10);
+ if (!val)
+ val = 1;
+
+ send_pct |= val << shift;
+ pct += val;
+ continue;
+ }
+
+ if (priv->fifo_bw[i])
+ pct += (send_pct >> shift) & CPSW_PCT_MASK;
+ }
+
+ if (pct >= 100)
+ goto err;
+
+ slave_write(slave, send_pct, SEND_PERCENT);
+ priv->fifo_bw[fifo] = bw;
+
+ dev_warn(priv->dev, "set FIFO%d bw = %d\n", fifo,
+ DIV_ROUND_CLOSEST(val * priv->shp_cfg_speed, 100));
+
+ return 0;
+err:
+ dev_err(priv->dev, "Bandwidth doesn't fit in tc configuration");
+ return -EINVAL;
+}
+
+static int cpsw_set_fifo_rlimit(struct cpsw_priv *priv, int fifo, int bw)
+{
+ struct cpsw_common *cpsw = priv->cpsw;
+ struct cpsw_slave *slave;
+ u32 tx_in_ctl_rg, val;
+ int ret;
+
+ ret = cpsw_set_fifo_bw(priv, fifo, bw);
+ if (ret)
+ return ret;
+
+ slave = &cpsw->slaves[cpsw_slave_index(cpsw, priv)];
+ tx_in_ctl_rg = cpsw->version == CPSW_VERSION_1 ?
+ CPSW1_TX_IN_CTL : CPSW2_TX_IN_CTL;
+
+ if (!bw)
+ cpsw_fifo_shp_on(priv, fifo, bw);
+
+ val = slave_read(slave, tx_in_ctl_rg);
+ if (cpsw_shp_is_off(priv)) {
+ /* disable FIFOs rate limited queues */
+ val &= ~(0xf << CPSW_FIFO_RATE_EN_SHIFT);
+
+ /* set type of FIFO queues to normal priority mode */
+ val &= ~(3 << CPSW_FIFO_QUEUE_TYPE_SHIFT);
+
+ /* set type of FIFO queues to be rate limited */
+ if (bw)
+ val |= 2 << CPSW_FIFO_QUEUE_TYPE_SHIFT;
+ else
+ priv->shp_cfg_speed = 0;
+ }
+
+ /* toggle a FIFO rate limited queue */
+ if (bw)
+ val |= BIT(fifo + CPSW_FIFO_RATE_EN_SHIFT);
+ else
+ val &= ~BIT(fifo + CPSW_FIFO_RATE_EN_SHIFT);
+ slave_write(slave, val, tx_in_ctl_rg);
+
+ /* FIFO transmit shape enable */
+ cpsw_fifo_shp_on(priv, fifo, bw);
+ return 0;
+}
+
+/* Defaults:
+ * class A - prio 3
+ * class B - prio 2
+ * shaping for class A should be set first
+ */
+static int cpsw_set_cbs(struct net_device *ndev,
+ struct tc_cbs_qopt_offload *qopt)
+{
+ struct cpsw_priv *priv = netdev_priv(ndev);
+ struct cpsw_common *cpsw = priv->cpsw;
+ struct cpsw_slave *slave;
+ int prev_speed = 0;
+ int tc, ret, fifo;
+ u32 bw = 0;
+
+ tc = netdev_txq_to_tc(priv->ndev, qopt->queue);
+
+ /* enable channels in backward order, as highest FIFOs must be rate
+ * limited first and for compliance with CPDMA rate limited channels
+ * that are also used in backward order. FIFO0 cannot be rate limited.
+ */
+ fifo = cpsw_tc_to_fifo(tc, ndev->num_tc);
+ if (!fifo) {
+ dev_err(priv->dev, "Last tc%d can't be rate limited", tc);
+ return -EINVAL;
+ }
+
+ /* do nothing, it's disabled anyway */
+ if (!qopt->enable && !priv->fifo_bw[fifo])
+ return 0;
+
+ /* shapers can be set if link speed is known */
+ slave = &cpsw->slaves[cpsw_slave_index(cpsw, priv)];
+ if (slave->phy && slave->phy->link) {
+ if (priv->shp_cfg_speed &&
+ priv->shp_cfg_speed != slave->phy->speed)
+ prev_speed = priv->shp_cfg_speed;
+
+ priv->shp_cfg_speed = slave->phy->speed;
+ }
+
+ if (!priv->shp_cfg_speed) {
+ dev_err(priv->dev, "Link speed is not known");
+ return -EINVAL;
+ }
+
+ ret = pm_runtime_get_sync(cpsw->dev);
+ if (ret < 0) {
+ pm_runtime_put_noidle(cpsw->dev);
+ return ret;
+ }
+
+ bw = qopt->enable ? qopt->idleslope : 0;
+ ret = cpsw_set_fifo_rlimit(priv, fifo, bw);
+ if (ret) {
+ priv->shp_cfg_speed = prev_speed;
+ prev_speed = 0;
+ }
+
+ if (bw && prev_speed)
+ dev_warn(priv->dev,
+ "Speed was changed, CBS shaper speeds are changed!");
+
+ pm_runtime_put_sync(cpsw->dev);
+ return ret;
+}
+
static int cpsw_ndo_open(struct net_device *ndev)
{
struct cpsw_priv *priv = netdev_priv(ndev);
@@ -2263,6 +2481,9 @@ static int cpsw_ndo_setup_tc(struct net_device *ndev, enum tc_setup_type type,
void *type_data)
{
switch (type) {
+ case TC_SETUP_QDISC_CBS:
+ return cpsw_set_cbs(ndev, type_data);
+
case TC_SETUP_QDISC_MQPRIO:
return cpsw_set_mqprio(ndev, type_data);
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 3/6] net: ethernet: ti: cpsw: add MQPRIO Qdisc offload
From: Ivan Khoronzhuk @ 2018-06-15 18:13 UTC (permalink / raw)
To: grygorii.strashko, davem
Cc: corbet, akpm, netdev, linux-doc, linux-kernel, linux-omap,
vinicius.gomes, henrik, jesus.sanchez-palencia, ilias.apalodimas,
p-varis, spatton, francois.ozog, yogeshs, nsekhar, andrew,
Ivan Khoronzhuk
In-Reply-To: <20180615181310.10437-1-ivan.khoronzhuk@linaro.org>
It's possible to offload the vlan to tc priority mapping, assuming
sk_prio == L2 prio.
Example:
$ ethtool -L eth0 rx 1 tx 4
$ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 1
$ tc -g class show dev eth0
+---(100:ffe2) mqprio
| +---(100:3) mqprio
| +---(100:4) mqprio
|
+---(100:ffe1) mqprio
| +---(100:2) mqprio
|
+---(100:ffe0) mqprio
+---(100:1) mqprio
Here, 100:1 is txq0, 100:2 is txq1, 100:3 is txq2, 100:4 is txq3
txq0 belongs to tc0, txq1 to tc1, txq2 and txq3 to tc2
The offload part only maps L2 prio to traffic classes, not to transmit
queues, so to direct traffic to a traffic class, a vlan has to be
created with an appropriate egress map.
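The priority map written to the hardware can be sketched as follows. The code mirrors cpsw_tc_to_fifo() and the prio loop of cpsw_set_mqprio() from this patch in plain user-space C; build_tx_prio_map() and FIFO_SHAPERS_NUM are names made up for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_SHAPERS_NUM 3	/* mirrors CPSW_FIFO_SHAPERS_NUM (CPSW_TC_NUM - 1) */

/* Same mapping as cpsw_tc_to_fifo(): the last tc goes to FIFO0, the
 * remaining ones are assigned from the highest FIFO downwards.
 */
static int tc_to_fifo(int tc, int num_tc)
{
	if (tc == num_tc - 1)
		return 0;
	return FIFO_SHAPERS_NUM - tc;
}

/* Build the register value: one 4-bit FIFO number per L2 priority. */
static uint32_t build_tx_prio_map(const uint8_t prio_tc_map[8], int num_tc)
{
	uint32_t map = 0;
	int i;

	assert(num_tc > 0 && num_tc <= FIFO_SHAPERS_NUM + 1);
	for (i = 0; i < 8; i++)
		map |= (uint32_t)tc_to_fifo(prio_tc_map[i], num_tc) << (4 * i);
	return map;
}
```

For the example map "2 2 1 0 2 2 2 2" with num_tc 3, prio 3 lands in FIFO3, prio 2 in FIFO2 and every other priority in FIFO0, which gives a register value of 0x3200.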
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
drivers/net/ethernet/ti/cpsw.c | 82 ++++++++++++++++++++++++++++++++++
1 file changed, 82 insertions(+)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 406537d74ec1..edd14def98df 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -39,6 +39,7 @@
#include <linux/sys_soc.h>
#include <linux/pinctrl/consumer.h>
+#include <net/pkt_cls.h>
#include "cpsw.h"
#include "cpsw_ale.h"
@@ -153,6 +154,8 @@ do { \
#define IRQ_NUM 2
#define CPSW_MAX_QUEUES 8
#define CPSW_CPDMA_DESCS_POOL_SIZE_DEFAULT 256
+#define CPSW_TC_NUM 4
+#define CPSW_FIFO_SHAPERS_NUM (CPSW_TC_NUM - 1)
#define CPSW_RX_VLAN_ENCAP_HDR_PRIO_SHIFT 29
#define CPSW_RX_VLAN_ENCAP_HDR_PRIO_MSK GENMASK(2, 0)
@@ -453,6 +456,7 @@ struct cpsw_priv {
u8 mac_addr[ETH_ALEN];
bool rx_pause;
bool tx_pause;
+ bool mqprio_hw;
u32 emac_port;
struct cpsw_common *cpsw;
};
@@ -1577,6 +1581,14 @@ static void cpsw_slave_stop(struct cpsw_slave *slave, struct cpsw_common *cpsw)
soft_reset_slave(slave);
}
+static int cpsw_tc_to_fifo(int tc, int num_tc)
+{
+ if (tc == num_tc - 1)
+ return 0;
+
+ return CPSW_FIFO_SHAPERS_NUM - tc;
+}
+
static int cpsw_ndo_open(struct net_device *ndev)
{
struct cpsw_priv *priv = netdev_priv(ndev);
@@ -2190,6 +2202,75 @@ static int cpsw_ndo_set_tx_maxrate(struct net_device *ndev, int queue, u32 rate)
return ret;
}
+static int cpsw_set_mqprio(struct net_device *ndev, void *type_data)
+{
+ struct tc_mqprio_qopt_offload *mqprio = type_data;
+ struct cpsw_priv *priv = netdev_priv(ndev);
+ struct cpsw_common *cpsw = priv->cpsw;
+ int fifo, num_tc, count, offset;
+ struct cpsw_slave *slave;
+ u32 tx_prio_map = 0;
+ int i, tc, ret;
+
+ num_tc = mqprio->qopt.num_tc;
+ if (num_tc > CPSW_TC_NUM)
+ return -EINVAL;
+
+ if (mqprio->mode != TC_MQPRIO_MODE_DCB)
+ return -EINVAL;
+
+ ret = pm_runtime_get_sync(cpsw->dev);
+ if (ret < 0) {
+ pm_runtime_put_noidle(cpsw->dev);
+ return ret;
+ }
+
+ if (num_tc) {
+ for (i = 0; i < 8; i++) {
+ tc = mqprio->qopt.prio_tc_map[i];
+ fifo = cpsw_tc_to_fifo(tc, num_tc);
+ tx_prio_map |= fifo << (4 * i);
+ }
+
+ netdev_set_num_tc(ndev, num_tc);
+ for (i = 0; i < num_tc; i++) {
+ count = mqprio->qopt.count[i];
+ offset = mqprio->qopt.offset[i];
+ netdev_set_tc_queue(ndev, i, count, offset);
+ }
+ }
+
+ if (!mqprio->qopt.hw) {
+ /* restore default configuration */
+ netdev_reset_tc(ndev);
+ tx_prio_map = TX_PRIORITY_MAPPING;
+ }
+
+ priv->mqprio_hw = mqprio->qopt.hw;
+
+ offset = cpsw->version == CPSW_VERSION_1 ?
+ CPSW1_TX_PRI_MAP : CPSW2_TX_PRI_MAP;
+
+ slave = &cpsw->slaves[cpsw_slave_index(cpsw, priv)];
+ slave_write(slave, tx_prio_map, offset);
+
+ pm_runtime_put_sync(cpsw->dev);
+
+ return 0;
+}
+
+static int cpsw_ndo_setup_tc(struct net_device *ndev, enum tc_setup_type type,
+ void *type_data)
+{
+ switch (type) {
+ case TC_SETUP_QDISC_MQPRIO:
+ return cpsw_set_mqprio(ndev, type_data);
+
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
static const struct net_device_ops cpsw_netdev_ops = {
.ndo_open = cpsw_ndo_open,
.ndo_stop = cpsw_ndo_stop,
@@ -2205,6 +2286,7 @@ static const struct net_device_ops cpsw_netdev_ops = {
#endif
.ndo_vlan_rx_add_vid = cpsw_ndo_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = cpsw_ndo_vlan_rx_kill_vid,
+ .ndo_setup_tc = cpsw_ndo_setup_tc,
};
static int cpsw_get_regs_len(struct net_device *ndev)
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 2/6] net: ethernet: ti: cpdma: fit rated channels in backward order
From: Ivan Khoronzhuk @ 2018-06-15 18:13 UTC (permalink / raw)
To: grygorii.strashko, davem
Cc: corbet, akpm, netdev, linux-doc, linux-kernel, linux-omap,
vinicius.gomes, henrik, jesus.sanchez-palencia, ilias.apalodimas,
p-varis, spatton, francois.ozog, yogeshs, nsekhar, andrew,
Ivan Khoronzhuk
In-Reply-To: <20180615181310.10437-1-ivan.khoronzhuk@linaro.org>
According to the TRM, tx rate-limited channels should be in 7..0 order,
so correct it.
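The ordering rule can be sketched in isolation. This is a simplified stand-in for the reworked cpdma_chan_fit_rate() loop, not the driver code: it assumes all eight tx channels exist and that channel i owns bit i of the rate mask, and fit_rate_mask() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TX_CHANNELS 8

/* rate[i] == 0 means channel i is not rate limited.
 * Returns 0 and fills *rmask on success, -1 if a channel above a rate
 * limited one is itself not rate limited.
 */
static int fit_rate_mask(const uint32_t rate[MAX_TX_CHANNELS], uint32_t *rmask)
{
	uint32_t mask = 0;
	int seen_rated = 0;
	int i;

	assert(rate && rmask);
	for (i = 0; i < MAX_TX_CHANNELS; i++) {
		if (rate[i]) {
			seen_rated = 1;
			mask |= 1u << i;
			continue;
		}
		/* an unrated channel above a rated one breaks the 7..0 order */
		if (seen_rated)
			return -1;
	}
	*rmask = mask;
	return 0;
}
```

Rate limiting channels 7 and 6 is accepted and yields mask 0xC0, whereas rate limiting channels 7 and 5 while leaving 6 unlimited is rejected.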
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
drivers/net/ethernet/ti/davinci_cpdma.c | 31 ++++++++++++-------------
1 file changed, 15 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index cdbddf16dd29..19bb63902997 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -406,37 +406,36 @@ static int cpdma_chan_fit_rate(struct cpdma_chan *ch, u32 rate,
struct cpdma_chan *chan;
u32 old_rate = ch->rate;
u32 new_rmask = 0;
- int rlim = 1;
+ int rlim = 0;
int i;
- *prio_mode = 0;
for (i = tx_chan_num(0); i < tx_chan_num(CPDMA_MAX_CHANNELS); i++) {
chan = ctlr->channels[i];
- if (!chan) {
- rlim = 0;
+ if (!chan)
continue;
- }
if (chan == ch)
chan->rate = rate;
if (chan->rate) {
- if (rlim) {
- new_rmask |= chan->mask;
- } else {
- ch->rate = old_rate;
- dev_err(ctlr->dev, "Prev channel of %dch is not rate limited\n",
- chan->chan_num);
- return -EINVAL;
- }
- } else {
- *prio_mode = 1;
- rlim = 0;
+ rlim = 1;
+ new_rmask |= chan->mask;
+ continue;
}
+
+ if (rlim)
+ goto err;
}
*rmask = new_rmask;
+ *prio_mode = rlim;
return 0;
+
+err:
+ ch->rate = old_rate;
+ dev_err(ctlr->dev, "Upper cpdma ch%d is not rate limited\n",
+ chan->chan_num);
+ return -EINVAL;
}
static u32 cpdma_chan_set_factors(struct cpdma_ctlr *ctlr,
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 1/6] net: ethernet: ti: cpsw: use cpdma channels in backward order for txq
From: Ivan Khoronzhuk @ 2018-06-15 18:13 UTC (permalink / raw)
To: grygorii.strashko, davem
Cc: corbet, akpm, netdev, linux-doc, linux-kernel, linux-omap,
vinicius.gomes, henrik, jesus.sanchez-palencia, ilias.apalodimas,
p-varis, spatton, francois.ozog, yogeshs, nsekhar, andrew,
Ivan Khoronzhuk
In-Reply-To: <20180615181310.10437-1-ivan.khoronzhuk@linaro.org>
Cpdma channel priority runs from the highest channel number to the
lowest. The driver has a limited number of descriptors that are shared
between the cpdma channels. The number of queues can be tuned with
ethtool, which avoids spending descriptors on unneeded cpdma channels.
In AVB, usually only 2 rate limited tx queues are enough. Rate
limitation can be used only for high priority queues. Thus, to use
only 2 queues, all 8 have to be created, which is wasteful.
So, in order to allow using only the needed number of rate limited
tx queues, save resources, and be able to set rate limitation for
them, assign tx cpdma channels to queues in backward order.
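A small user-space sketch of the reversed mapping: tx queue q is backed by cpdma channel 7 - q, and the pending-channel bitmap is walked from bit 7 downwards so that the loop index is the tx queue number, as in the cpsw_tx_mq_poll() change in this patch. Both function names are hypothetical.

```c
#include <assert.h>

#define TX_CHANNELS 8

/* tx queue q is backed by cpdma channel 7 - q */
static int txq_to_cpdma_chan(int txq)
{
	assert(txq >= 0 && txq < TX_CHANNELS);
	return TX_CHANNELS - 1 - txq;
}

/* Walk an 8-bit channel state map from bit 7 down to bit 0 and store
 * the tx queue index of every set bit in out[]. Returns the number of
 * queues found.
 */
static int pending_txqs(unsigned int ch_map, int out[TX_CHANNELS])
{
	int ch, n = 0;

	for (ch = 0; ch_map & 0xff; ch_map <<= 1, ch++) {
		if (!(ch_map & 0x80))
			continue;
		out[n++] = ch;
	}
	return n;
}
```

A state map of 0xA0 (channels 7 and 5 pending) therefore reports tx queues 0 and 2, in that order.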
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
drivers/net/ethernet/ti/cpsw.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 534596ce00d3..406537d74ec1 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -967,8 +967,8 @@ static int cpsw_tx_mq_poll(struct napi_struct *napi_tx, int budget)
/* process every unprocessed channel */
ch_map = cpdma_ctrl_txchs_state(cpsw->dma);
- for (ch = 0, num_tx = 0; ch_map; ch_map >>= 1, ch++) {
- if (!(ch_map & 0x01))
+ for (ch = 0, num_tx = 0; ch_map & 0xff; ch_map <<= 1, ch++) {
+ if (!(ch_map & 0x80))
continue;
txv = &cpsw->txv[ch];
@@ -2431,7 +2431,7 @@ static int cpsw_update_channels_res(struct cpsw_priv *priv, int ch_num, int rx)
void (*handler)(void *, int, int);
struct netdev_queue *queue;
struct cpsw_vector *vec;
- int ret, *ch;
+ int ret, *ch, vch;
if (rx) {
ch = &cpsw->rx_ch_num;
@@ -2444,7 +2444,8 @@ static int cpsw_update_channels_res(struct cpsw_priv *priv, int ch_num, int rx)
}
while (*ch < ch_num) {
- vec[*ch].ch = cpdma_chan_create(cpsw->dma, *ch, handler, rx);
+ vch = rx ? *ch : 7 - *ch;
+ vec[*ch].ch = cpdma_chan_create(cpsw->dma, vch, handler, rx);
queue = netdev_get_tx_queue(priv->ndev, *ch);
queue->tx_maxrate = 0;
@@ -2980,7 +2981,7 @@ static int cpsw_probe(struct platform_device *pdev)
u32 slave_offset, sliver_offset, slave_size;
const struct soc_device_attribute *soc;
struct cpsw_common *cpsw;
- int ret = 0, i;
+ int ret = 0, i, ch;
int irq;
cpsw = devm_kzalloc(&pdev->dev, sizeof(struct cpsw_common), GFP_KERNEL);
@@ -3155,7 +3156,8 @@ static int cpsw_probe(struct platform_device *pdev)
if (soc)
cpsw->quirk_irq = 1;
- cpsw->txv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0);
+ ch = cpsw->quirk_irq ? 0 : 7;
+ cpsw->txv[0].ch = cpdma_chan_create(cpsw->dma, ch, cpsw_tx_handler, 0);
if (IS_ERR(cpsw->txv[0].ch)) {
dev_err(priv->dev, "error initializing tx dma channel\n");
ret = PTR_ERR(cpsw->txv[0].ch);
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 0/6] net: ethernet: ti: cpsw: add MQPRIO and CBS Qdisc offload
From: Ivan Khoronzhuk @ 2018-06-15 18:13 UTC (permalink / raw)
To: grygorii.strashko, davem
Cc: corbet, akpm, netdev, linux-doc, linux-kernel, linux-omap,
vinicius.gomes, henrik, jesus.sanchez-palencia, ilias.apalodimas,
p-varis, spatton, francois.ozog, yogeshs, nsekhar, andrew,
Ivan Khoronzhuk
This series adds MQPRIO and CBS Qdisc offload for the TI cpsw driver.
It can potentially be used in audio video bridging (AVB) and time
sensitive networking (TSN).
The patchset was tested on AM572x EVM and BBB boards. The last patch in
this series adds a detailed description of the configuration with
examples. For consistency, the tools from the patchset "TSN: Add qdisc
based config interface for CBS" were used in the role of talker and
listener; they can be seen here:
https://www.spinics.net/lists/netdev/msg460869.html
Based on net-next/master
v3..v2:
- corrected a typo in the word "shaper", no functional changes
v2..v1:
- renamed cpsw.txt to ti-cpsw.txt
- renamed cpsw_set_tc() to cpsw_set_mqprio()
Ivan Khoronzhuk (6):
net: ethernet: ti: cpsw: use cpdma channels in backward order for txq
net: ethernet: ti: cpdma: fit rated channels in backward order
net: ethernet: ti: cpsw: add MQPRIO Qdisc offload
net: ethernet: ti: cpsw: add CBS Qdisc offload
net: ethernet: ti: cpsw: restore shaper configuration while down/up
Documentation: networking: cpsw: add MQPRIO & CBS offload examples
Documentation/networking/ti-cpsw.txt | 540 ++++++++++++++++++++++++
drivers/net/ethernet/ti/cpsw.c | 364 +++++++++++++++-
drivers/net/ethernet/ti/davinci_cpdma.c | 31 +-
3 files changed, 913 insertions(+), 22 deletions(-)
create mode 100644 Documentation/networking/ti-cpsw.txt
--
2.17.1
^ permalink raw reply
* [PATCH] socket.7: Clarify that poll() and select() are affected by SO_RCVLOWAT
From: Anmol Sarma @ 2018-06-15 17:38 UTC (permalink / raw)
To: mtk.manpages; +Cc: netdev, davem, linux-man, Anmol Sarma
Signed-off-by: Anmol Sarma <me@anmolsarma.in>
---
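For illustration only, a minimal sketch of how an application might combine SO_RCVLOWAT with poll(): the helper sets the low-water mark and then waits for the socket to be reported readable. The function name and return convention are made up for this sketch; whether poll() honours the mark depends on the kernel version and socket type, as the man page text changed below describes.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: set SO_RCVLOWAT on fd, then poll for readability.
 * Returns 1 if the socket is reported readable, 0 on timeout, -1 on error.
 */
static int wait_readable_lowat(int fd, int lowat, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	assert(lowat > 0);
	if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat)) < 0)
		return -1;

	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return -1;
	return ret > 0 && (pfd.revents & POLLIN);
}
```

A caller that needs at least a full record before waking up would pass the record size as lowat and treat a return value of 0 as "not enough data yet".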
man7/socket.7 | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/man7/socket.7 b/man7/socket.7
index b1aebfae2..65532bc49 100644
--- a/man7/socket.7
+++ b/man7/socket.7
@@ -777,17 +777,20 @@ fails with the error
.B SO_RCVLOWAT
is changeable
only since Linux 2.4.
-The
+In versions of Linux prior to 2.6.28, the
.BR select (2)
and
.BR poll (2)
-system calls currently do not respect the
+system calls did not respect the
.B SO_RCVLOWAT
-setting on Linux,
-and mark a socket readable when even a single byte of data is available.
-A subsequent read from the socket will block until
+setting. Since Linux 2.6.28,
+.BR select (2)
+and
+.BR poll (2)
+will not mark a socket readable until
.B SO_RCVLOWAT
bytes are available.
+.\" Commit c7004482e8dcb7c3c72666395cfa98a216a4fb70
.\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=111049368106984&w=2
.\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
.TP
--
2.17.1
^ permalink raw reply related
* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Siwei Liu @ 2018-06-15 17:06 UTC (permalink / raw)
To: Cornelia Huck
Cc: Alexander Duyck, virtio-dev, Jiri Pirko, Michael S. Tsirkin,
Jakub Kicinski, Samudrala, Sridhar, konrad.wilk, qemu-devel,
virtualization, Venu Busireddy, Netdev, boris.ostrovsky,
aaron.f.brown, Joao Martins
In-Reply-To: <20180615134815.6613620e.cohuck@redhat.com>
On Fri, Jun 15, 2018 at 4:48 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> On Thu, 14 Jun 2018 18:57:11 -0700
> Siwei Liu <loseweigh@gmail.com> wrote:
>
>> Thank you for sharing your thoughts, Cornelia. With questions below, I
>> think you raised really good points, some of which I don't have answer
>> yet and would also like to explore here.
>>
>> First off, I don't want to push the discussion to the extreme at this
>> point, or sell anything about having QEMU manage everything
>> automatically. Don't get me wrong, it's not there yet. Let's not
>> assume we are tied to a specific or concrete solution. I think the key
>> for our discussion might be to define or refine the boundary between
>> VM and guest, e.g. what each layer is expected to control and manage
>> exactly.
>>
>> In my view, there might be possibly 3 different options to represent
>> the failover device concept to QEMU and libvirt (or any upper layer
>> software):
>>
>> a. Separate device: in this model, virtio and passthrough remain
>> separate devices just as today. QEMU exposes the standby feature bit
>> for virtio, and publishes status/events around the negotiation process
>> of this feature bit for libvirt to react upon. Since libvirt has the
>> pairing relationship itself, maybe through MAC address or something
>> else, it can control the presence of the primary by hot plugging or
>> unplugging the passthrough device, although it has to work tightly
>> with virtio's feature negotiation process, not just for migration but
>> also for various corner scenarios (driver/feature ok, device reset,
>> reboot, legacy guest etc.) along virtio's feature negotiation.
>
> Yes, that one has obvious tie-ins to virtio's modus operandi.
>
>>
>> b. Coupled device: in this model, virtio and passthrough devices are
>> weakly coupled using some group ID, i.e. QEMU matches the passthrough
>> device for a standby virtio instance by comparing the group ID value
>> present behind each device's bridge. Libvirt provides QEMU the group
>> ID for both types of devices, and only deals with hot plug for
>> migration, by checking some migration status exposed (e.g. the feature
>> negotiation status on the virtio device) by QEMU. QEMU manages the
>> visibility of the primary in the guest along virtio's feature
>> negotiation process.
>
> I'm a bit confused here. What, exactly, ties the two devices together?
The group UUID. Since the QEMU VFIO device has no insight into the MAC
address (which it doesn't have to), the association between VFIO
passthrough and standby must be specified for QEMU to understand the
relationship in this model. Note that the standby feature no longer
needs to be exposed under this model.
> If libvirt already has the knowledge that it should manage the two as a
> couple, why do we need the group id (or something else for other
> architectures)? (Maybe I'm simply missing something because I'm not
> that familiar with pci.)
The idea is to have QEMU control the visibility and enumeration order
of the passthrough VFIO device for the failover scenario. Hotplug can
be one way to achieve it, and perhaps there are other ways too. The
group ID is not just for QEMU to couple devices; it's also helpful to
the guest, as grouping by MAC address is just not safe.
>
>>
>> c. Fully combined device: in this model, virtio and passthrough devices
>> are viewed as a single VM interface altogether. QEMU not only controls
>> the visibility of the primary in the guest, but can also manage the
>> exposure of the passthrough for migratability. It may be that
>> libvirt supplies the group ID to QEMU. Or libvirt does not even have
>> to provide a group ID for grouping the two devices, if just one single
>> combined device is exposed by QEMU. In either case, QEMU manages all
>> aspects of such an internal construct, including virtio feature
>> negotiation, presence of the primary, and live migration.
>
> Same question as above.
>
>>
>> It looks to me like you prefer to go with (a), while I'm actually
>> okay with either (b) or (c). Do I understand your point correctly?
>
> I'm not yet preferring anything, as I'm still trying to understand how
> this works :) I hope we can arrive at a model that covers the use case
> and that is also flexible enough to be extended to other platforms.
>
>>
>> The reason I feel that (a) might not be ideal, just as Michael
>> alluded to (quoting below), is that the management stack really
>> doesn't need to care about the detailed process of feature negotiation
>> (if we view the guest presence of the primary as part of feature
>> negotiation at an extended level, not just virtio). All it needs to
>> do is hand the required devices to QEMU, and that's all. Why do
>> we need to add various hooks and events for whatever happens
>> internally within the guest?
>>
>> ''
>> Primary device is added with a special "primary-failover" flag.
>> A virtual machine is then initialized with just a standby virtio
>> device. Primary is not yet added.
>>
>> Later QEMU detects that guest driver device set DRIVER_OK.
>> It then exposes the primary device to the guest, and triggers
>> a device addition event (hot-plug event) for it.
>>
>> If QEMU detects guest driver removal, it initiates a hot-unplug sequence
>> to remove the primary driver. In particular, if QEMU detects guest
>> re-initialization (e.g. by detecting guest reset) it immediately removes
>> the primary device.
>> ''
>>
>> and,
>>
>> ''
>> management just wants to give the primary to guest and later take it back,
>> it really does not care about the details of the process,
>> so I don't see what does pushing it up the stack buy you.
>>
>> So I don't think it *needs* to be done in libvirt. It probably can if you
>> add a bunch of hooks so it knows whenever vm reboots, driver binds and
>> unbinds from device, and can check that backup flag was set.
>> If you are pushing for a setup like that please get a buy-in
>> from libvirt maintainers or better write a patch.
>> ''
>
> This actually seems to mean the opposite to me: We need to know what
> the guest is doing and when, as it directly drives what we need to do
> with the devices. If we switch to a visibility vs a hotplug model (see
> the other mail), we might be able to handle that part within qemu.
In model (b), I think hotplug essentially becomes one of the
mechanisms for QEMU to control the visibility. Libvirt can still
manage the hotplug of individual devices during live migration, or
hot add/remove devices in normal situations. However, the visibility
of the VFIO device is under the control of QEMU, and it's possible
that a hot add/remove request does not involve any actual hot plug
activity in the guest at all.
In model (c), the hotplug semantics of the combined device would be
different: the devices would end up being plugged in or out together.
To make this work, we either have to build a brand new bond-like QEMU
device consisting of virtio and VFIO internally, or need to have some
abstraction in place for libvirt to manipulate the combined device
(and prohibit libvirt from operating on the individual internal
devices directly). Note that with this model the group ID doesn't even
need to be exposed to libvirt: just imagine libvirt supplying all the
options required to configure two regular virtio-net and VFIO devices
for a single device object, and QEMU dealing with the device's
visibility and enumeration, such as when to hot plug the VFIO device
into or out of the guest.
It might be complicated to implement (c), though.
Regards,
-Siwei
> However, I don't see how you get around needing libvirt to actually set
> this up in the first place and to handle migration per se.
^ permalink raw reply
* Re: [PATCH bpf 2/2] bpf: reject any prog that failed read-only lock
From: Martin KaFai Lau @ 2018-06-15 16:51 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: ast, netdev
In-Reply-To: <20180615003048.3219-3-daniel@iogearbox.net>
On Fri, Jun 15, 2018 at 02:30:48AM +0200, Daniel Borkmann wrote:
> We currently lock any JITed image as read-only via bpf_jit_binary_lock_ro()
> as well as the BPF image as read-only through bpf_prog_lock_ro(). In
> the case any of these would fail we throw a WARN_ON_ONCE() in order to
> yell loudly to the log. Perhaps, to some extent, this may be comparable
> to an allocation where __GFP_NOWARN is explicitly not set.
>
> Added via 65869a47f348 ("bpf: improve read-only handling"), this behavior
> is slightly different compared to any of the other in-kernel set_memory_ro()
> users who do not check the return code of set_memory_ro() and friends /at
> all/ (e.g. in the case of module_enable_ro() / module_disable_ro()). Given
> in BPF this is mandatory hardening step, we want to know whether there
> are any issues that would leave both BPF data writable. So it happens
> that syzkaller enabled fault injection and it triggered memory allocation
> failure deep inside x86's change_page_attr_set_clr() which was triggered
> from set_memory_ro().
>
> Now, there are two options: i) leaving everything as is, and ii) reworking
> the image locking code in order to have a final checkpoint out of the
> central bpf_prog_select_runtime() which probes whether any of the calls
> during prog setup weren't successful, and then bailing out with an error.
> Option ii) is a better approach since this additional paranoia avoids
> altogether leaving any potential W+X pages from BPF side in the system.
> Therefore, let's be strict about it, and reject programs on such an
> unlikely occasion. While testing I noticed also that one bpf_prog_lock_ro()
> call was missing on the outer dummy prog in case of calls, e.g. in the
> destructor we call bpf_prog_free_deferred() on the main prog where we
> try to bpf_prog_unlock_free() the program, and since we go via
> bpf_prog_select_runtime() do that as well.
>
> Reported-by: syzbot+3b889862e65a98317058@syzkaller.appspotmail.com
> Reported-by: syzbot+9e762b52dd17e616a7a5@syzkaller.appspotmail.com
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
^ permalink raw reply