* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Samudrala, Sridhar @ 2018-05-23 16:16 UTC (permalink / raw)
To: Jiri Pirko
Cc: Michael S. Tsirkin, stephen, davem, netdev, virtualization,
virtio-dev, jesse.brandeburg, alexander.h.duyck, kubakici,
jasowang, loseweigh, aaron.f.brown, anjali.singhai
In-Reply-To: <20180523062748.GA3155@nanopsycho>
On 5/22/2018 11:27 PM, Jiri Pirko wrote:
> Tue, May 22, 2018 at 10:54:29PM CEST, sridhar.samudrala@intel.com wrote:
>>
>> On 5/22/2018 9:12 AM, Jiri Pirko wrote:
>>> Fixing the subj, sorry about that.
>>>
>>> Tue, May 22, 2018 at 05:46:21PM CEST, mst@redhat.com wrote:
>>>> On Tue, May 22, 2018 at 05:36:14PM +0200, Jiri Pirko wrote:
>>>>> Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
>>>>>> On 5/22/2018 2:08 AM, Jiri Pirko wrote:
>>>>>>> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>>>>>>>> Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>>>>>>>>> Use the registration/notification framework supported by the generic
>>>>>>>>> failover infrastructure.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>>>>> In previous patchset versions, the common code did
>>>>>>>> netdev_rx_handler_register() and netdev_upper_dev_link() etc
>>>>>>>> (netvsc_vf_join()). Now, this is still done in netvsc. Why?
>>>>>>>>
>>>>>>>> This should be part of the common "failover" code.
>>>>>> Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
>>>>>> netvsc and only commonize the notifier and the main event handler routine.
>>>>>> Another complication is that netvsc does part of registration in a delayed workqueue.
>>>>> :( This kind of degrades the whole efford of having single solution
>>>>> in "failover" module. I think that common parts, as
>>>>> netdev_rx_handler_register() and others certainly is should be inside
>>>>> the common module. This is not a good time to minimize changes. Let's do
>>>>> the thing properly and fix the netvsc mess now.
>>>>>
>>>>>
>>>>>> It should be possible to move some of the code from net_failover.c to generic
>>>>>> failover.c in future if Stephen is ok with it.
>>>>>>
>>>>>>
>>>>>>> Also note that in the current patchset you use IFF_FAILOVER flag for
>>>>>>> master, yet for the slave you use IFF_SLAVE. That is wrong.
>>>>>>> IFF_FAILOVER_SLAVE should be used.
>>>>>> Not sure which code you are referring to. I only set IFF_FAILOVER_SLAVE
>>>>>> in patch 3.
>>>>> The existing netvsc driver.
>>>> We really can't change netvsc's flags now, even if it's interface is
>>>> messy, it's being used in the field. We can add a flag that makes netvsc
>>>> behave differently, and if this flag also allows enhanced functionality
>>>> userspace will gradually switch.
>>> Okay, although in this case, it really does not make much sense, so be
>>> it. Leave the netvsc set the ->priv flag to IFF_SLAVE as it is doing
>>> now. (This once-wrong-forever-wrong policy is flustrating me).
>>>
>>> But since this patchset introduces private flag IFF_FAILOVER and
>>> IFF_FAILOVER_SLAVE, and we set IFF_FAILOVER to the netvsc netdev
>>> instance, we should also set IFF_FAILOVER_SLAVE to the enslaved VF
>>> netdevice to get at least some consistency between virtio_net and
>>> netvsc.
>> OK. I can make this change to set/unset IFF_FAILOVER_SLAVE in the netvsc
>> register/unregister routines so that it is consistent with virtio_net.
>>
>> Based on your discussion with mst, i think we can even remove IFF_SLAVE
>> setting on netvsc as it should not impact userspace. If Stephen is OK
>> we can make this change too.
>>
>> Do you see any other items that need to be resolved for this series to go in
>> this merge window?
> As I wrote previously, the common code including rx_handler registration
> and setting of flags and master link should be done in a common code,
> moved away from netvsc code.
>
This requires re-introducing the 2 additional ops pre_register and pre_unregister
that i removed in the last couple of revisions to minimize netvsc changes and the
indirect calls that Stephen expressed some concern.
But, as these calls don't happen in hot path, i guess it should not be a big
issue and the right way to go.
Will submit a v12 with these updates.
^ permalink raw reply
* Re: [PATCH 1/1] tools/lib/libbpf.c: fix string format to allow build on arm32
From: Sirio Balmelli @ 2018-05-23 16:16 UTC (permalink / raw)
To: Martin KaFai Lau; +Cc: Daniel Borkmann, netdev
In-Reply-To: <20180523154928.uh6h5keueumbhs6g@kafai-mbp>
On Wed, May 23, 2018 at 08:49:47AM -0700, Martin KaFai Lau wrote:
> On Wed, May 23, 2018 at 12:41:14PM +0200, Daniel Borkmann wrote:
> > [ +Martin ]
> >
> > On 05/21/2018 08:59 AM, Sirio Balmelli wrote:
> > > On arm32, 'cd tools/testing/selftests/bpf && make' fails with:
> > >
> > > libbpf.c:80:10: error: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has type ‘int64_t {aka long long int}’ [-Werror=format=]
> > > (func)("libbpf: " fmt, ##__VA_ARGS__); \
> > > ^
> > > libbpf.c:83:30: note: in expansion of macro ‘__pr’
> > > #define pr_warning(fmt, ...) __pr(__pr_warning, fmt, ##__VA_ARGS__)
> > > ^~~~
> > > libbpf.c:1072:3: note: in expansion of macro ‘pr_warning’
> > > pr_warning("map:%s value_type:%s has BTF type_size:%ld != value_size:%u\n",
> > >
> > > To fix, include 'inttypes.h' and change format directive to 'PRIi64'.
> > >
> > > Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
> >
> > Thanks for the fix! One minor comment below.
> >
> > > ---
> > > tools/lib/bpf/libbpf.c | 6 ++++--
> > > 1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index 3dbe217bf23e..e2cc8f10c188 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -48,6 +48,8 @@
> > > #include "bpf.h"
> > > #include "btf.h"
> > >
> > > +#include <inttypes.h> /* PRIi64 */
> > > +
> > > #ifndef EM_BPF
> > > #define EM_BPF 247
> > > #endif
> > > @@ -1042,7 +1044,7 @@ static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
> > > }
> > >
> > > if (def->key_size != key_size) {
> > > - pr_warning("map:%s key_type:%s has BTF type_size:%ld != key_size:%u\n",
> > > + pr_warning("map:%s key_type:%s has BTF type_size:%"PRIi64" != key_size:%u\n",
> > > map->name, name, key_size, def->key_size);
> > > return -EINVAL;
> > > }
> > > @@ -1069,7 +1071,7 @@ static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
> > > }
> > >
> > > if (def->value_size != value_size) {
> > > - pr_warning("map:%s value_type:%s has BTF type_size:%ld != value_size:%u\n",
> > > + pr_warning("map:%s value_type:%s has BTF type_size:%"PRIi64" != value_size:%u\n",
> > > map->name, name, value_size, def->value_size);
> >
> > I don't think we need the PRIi64 in here. Could you just change it to 'type_size:%u'
> > and then cast the 'key_size' resp. 'value_size' to unsigned int? We know at this
> > point that the type size value is positive anyway, so we only want to show the size
> > mismatch so simple cast should suffice. Thanks!
> Good point. 'unsigned int' is enough because the t->size (returned
> by btf_type_size()) is a u32.
>
> >
> > > return -EINVAL;
> > > }
> > >
> >
Thank you very much; respinning patch.
^ permalink raw reply
* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
From: John Fastabend @ 2018-05-23 16:03 UTC (permalink / raw)
To: Huy Nguyen, Jiri Pirko, Jakub Kicinski
Cc: Saeed Mahameed, David S. Miller, netdev, Or Gerlitz
In-Reply-To: <b2f9130c-5f03-0058-cfc4-f6bc0fc44faa@mellanox.com>
On 05/23/2018 08:37 AM, Huy Nguyen wrote:
>
>
> On 5/23/2018 8:52 AM, John Fastabend wrote:
>> It would be nice though if the API gave us some hint on max/min/stride
>> of allowed values. Could the get API return these along with current
>> value? Presumably the allowed max size could change with devlink buffer
>> changes in how the global buffer is divided up as well.
> Acked. I will add Max. Let's skip min/stride since it is too hardware specific.
At minimum then we need to document for driver writers what to do
with a value that falls between strides. Round-up or round-down.
.John
^ permalink raw reply
* Re: [PATCH v2 bpf-next 1/5] bpf: Hooks for sys_sendmsg
From: Martin KaFai Lau @ 2018-05-23 16:02 UTC (permalink / raw)
To: Andrey Ignatov; +Cc: netdev, davem, ast, daniel, kernel-team
In-Reply-To: <ba05de3cd3b6af6d3300d9c5623976f4aec161b0.1527031931.git.rdna@fb.com>
On Tue, May 22, 2018 at 04:40:02PM -0700, Andrey Ignatov wrote:
> In addition to already existing BPF hooks for sys_bind and sys_connect,
> the patch provides new hooks for sys_sendmsg.
>
> It leverages existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR`
> that provides access to socket itlself (properties like family, type,
> protocol) and user-passed `struct sockaddr *` so that BPF program can
> override destination IP and port for system calls such as sendto(2) or
> sendmsg(2) and/or assign source IP to the socket.
>
> The hooks are implemented as two new attach types:
> `BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and
> UDPv6 correspondingly.
>
> UDPv4 and UDPv6 separate attach types for same reason as sys_bind and
> sys_connect hooks, i.e. to prevent reading from / writing to e.g.
> user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound.
>
> The difference with already existing hooks is sys_sendmsg are
> implemented only for unconnected UDP.
>
> For TCP it doesn't make sense to change user-provided `struct sockaddr *`
> at sendto(2)/sendmsg(2) time since socket either was already connected
> and has source/destination set or wasn't connected and call to
> sendto(2)/sendmsg(2) would lead to ENOTCONN anyway.
>
> Connected UDP is already handled by sys_connect hooks that can override
> source/destination at connect time and use fast-path later, i.e. these
> hooks don't affect UDP fast-path.
>
> Rewriting source IP is implemented differently than that in sys_connect
> hooks. When sys_sendmsg is used with unconnected UDP it doesn't work to
> just bind socket to desired local IP address since source IP can be set
> on per-packet basis by using ancillary data (cmsg(3)). So no matter if
> socket is bound or not, source IP has to be rewritten on every call to
> sys_sendmsg.
>
> To do so two new fields are added to UAPI `struct bpf_sock_addr`;
> * `msg_src_ip4` to set source IPv4 for UDPv4;
> * `msg_src_ip6` to set source IPv6 for UDPv6.
>
> Signed-off-by: Andrey Ignatov <rdna@fb.com>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
^ permalink raw reply
* Re: [PATCH] ppp: remove the PPPIOCDETACH ioctl
From: David Miller @ 2018-05-23 15:56 UTC (permalink / raw)
To: g.nault
Cc: ebiggers3, linux-ppp, paulus, netdev, linux-fsdevel, linux-kernel,
syzkaller-bugs, ebiggers
In-Reply-To: <20180523135708.GB1569@alphalink.fr>
From: Guillaume Nault <g.nault@alphalink.fr>
Date: Wed, 23 May 2018 15:57:08 +0200
> I'd rather add
> + if (cmd == PPPIOCDETACH) {
> + err = -EINVAL;
> + goto out;
> + }
>
> Making PPPIOCDETACH unknown to ppp_generic means that the ioctl would
> be handled by the underlying channel when pf->kind == CHANNEL (see the
> chan->ops->ioctl() call further down). That shouldn't be a problem per
> se, but even though PPPIOCDETACH is unsupported, I feel that it should
> remain a ppp_generic thing. I don't really want its value to be reused
> for other purposes in the future or have different behaviour depending
> on the underlying channel.
>
> Also PPPIOCDETACH can already fail with -EINVAL. Therefore, if ever
> there really were programs out there using this call, they'd already
> have to handle this case. Unconditionally returning -EINVAL would
> further minimise possibilities for breakage.
I agree.
^ permalink raw reply
* Re: [PATCH v2] packet: track ring entry use using a shadow ring to prevent RX ring overrun
From: Willem de Bruijn @ 2018-05-23 15:53 UTC (permalink / raw)
To: Jon Rosen (jrosen)
Cc: David S. Miller, Willem de Bruijn, Eric Dumazet, Kees Cook,
David Windsor, Rosen, Rami, Reshetova, Elena, Mike Maloney,
Benjamin Poirier, Thomas Gleixner, Greg Kroah-Hartman,
open list:NETWORKING [GENERAL], open list
In-Reply-To: <3d1ac567c6d24265909aeec748a743e0@XCH-RTP-016.cisco.com>
On Wed, May 23, 2018 at 11:29 AM, Jon Rosen (jrosen) <jrosen@cisco.com> wrote:
>
>
>> -----Original Message-----
>> From: Willem de Bruijn [mailto:willemdebruijn.kernel@gmail.com]
>> Sent: Wednesday, May 23, 2018 9:37 AM
>> To: Jon Rosen (jrosen) <jrosen@cisco.com>
>> Cc: David S. Miller <davem@davemloft.net>; Willem de Bruijn <willemb@google.com>; Eric Dumazet
>> <edumazet@google.com>; Kees Cook <keescook@chromium.org>; David Windsor <dwindsor@gmail.com>; Rosen,
>> Rami <rami.rosen@intel.com>; Reshetova, Elena <elena.reshetova@intel.com>; Mike Maloney
>> <maloney@google.com>; Benjamin Poirier <bpoirier@suse.com>; Thomas Gleixner <tglx@linutronix.de>; Greg
>> Kroah-Hartman <gregkh@linuxfoundation.org>; open list:NETWORKING [GENERAL] <netdev@vger.kernel.org>;
>> open list <linux-kernel@vger.kernel.org>
>> Subject: Re: [PATCH v2] packet: track ring entry use using a shadow ring to prevent RX ring overrun
>>
>> On Wed, May 23, 2018 at 7:54 AM, Jon Rosen (jrosen) <jrosen@cisco.com> wrote:
>> >> > For the ring, there is no requirement to allocate exactly the amount
>> >> > specified by the user request. Safer than relying on shared memory
>> >> > and simpler than the extra allocation in this patch would be to allocate
>> >> > extra shadow memory at the end of the ring (and not mmap that).
>> >> >
>> >> > That still leaves an extra cold cacheline vs using tp_padding.
>> >>
>> >> Given my lack of experience and knowledge in writing kernel code
>> >> it was easier for me to allocate the shadow ring as a separate
>> >> structure. Of course it's not about me and my skills so if it's
>> >> more appropriate to allocate at the tail of the existing ring
>> >> then certainly I can look at doing that.
>> >
>> > The memory for the ring is not one contiguous block, it's an array of
>> > blocks of pages (or 'order' sized blocks of pages). I don't think
>> > increasing the size of each of the blocks to provided storage would be
>> > such a good idea as it will risk spilling over into the next order and
>> > wasting lots of memory. I suspect it's also more complex than a single
>> > shadow ring to do both the allocation and the access.
>> >
>> > It could be tacked onto the end of the pg_vec[] used to store the
>> > pointers to the blocks. The challenge with that is that a pg_vec[] is
>> > created for each of RX and TX rings so either it would have to
>> > allocate unnecessary storage for TX or the caller will have to say if
>> > extra space should be allocated or not. E.g.:
>> >
>> > static struct pgv *alloc_pg_vec(struct tpacket_req *req, int order, int scratch, void **scratch_p)
>> >
>> > I'm not sure avoiding the extra allocation and moving it to the
>> > pg_vec[] for the RX ring is going to get the simplification you were
>> > hoping for. Is there another way of storing the shadow ring which
>> > I should consider?
>>
>> I did indeed mean attaching extra pages to pg_vec[]. It should be
>> simpler than a separate structure, but I may be wrong.
>
> I don't think it would be too bad, it may actually turn out to be
> convenient to implement.
>
>>
>> Either way, I still would prefer to avoid the shadow buffer completely.
>> It incurs complexity and cycle cost on all users because of only the
>> rare (non-existent?) consumer that overwrites the padding bytes.
>
> I prefer that as well. I'm just not sure there is a bulletproof
> solution without the shadow state. I also wish it were only a
> theoretical issue but unfortunately it is actually something our
> customers have seen.
>>
>> Perhaps we can use padding yet avoid deadlock by writing a
>> timed value. The simplest would be jiffies >> N. Then only a
>> process that writes this exact value would be subject to drops and
>> then still only for a limited period.
>>
>> Instead of depending on wall clock time, like jiffies, another option
>> would be to keep a percpu array of values. Each cpu has a zero
>> entry if it is not writing, nonzero if it is. If a writer encounters a
>> number in padding that is > num_cpus, then the state is garbage
>> from userspace. If <= num_cpus, it is adhered to only until that cpu
>> clears its entry, which is guaranteed to happen eventually.
>>
>> Just a quick thought. This might not fly at all upon closer scrutiny.
>
> I'm not sure I understand the suggestion, but I'll think on it
> some more.
The idea is to use tp_padding to signal state between kernel threads.
But in such a way that worst case is not a deadlock, just a higher drop
rate.
First, write a specific value and ignore any values other than zero and
this. This fixes the bug, except for the rare users that write to padding.
Second, invalidate the specific value at some point to ensure forward
progress.
Simplest is wall clock time, so jiffies based. But this is imprecise.
Alternative that scales with #cpus instead of #slots is to a percpu
array. A kernel thread that reads a non-zero padding is reading
either garbage or a cpuid. If a cpuid, it can double check that the
given cpu is busy writing by reading the entry from the percpu array.
>
> Some other options maybe worth considering (in no specific order):
> - test the application to see if it will consume entries if tp_status
> is set to anything other than TP_STATUS_USER, only use shadow if
> it doesn't strictly honor the TP_STATUS_USER bit.
>
> - skip shadow if we see new TP_STATUS_USER_TO_KERNEL is used
>
> - use tp_len == -1 to indicate inuse
>
>
>
^ permalink raw reply
* Re: pull-request: mac80211 2018-05-23
From: David Miller @ 2018-05-23 15:51 UTC (permalink / raw)
To: johannes; +Cc: netdev, linux-wireless
In-Reply-To: <20180523094758.8892-1-johannes@sipsolutions.net>
From: Johannes Berg <johannes@sipsolutions.net>
Date: Wed, 23 May 2018 11:47:57 +0200
> Just another handful of fixes as we wind down towards the
> merge window.
>
> Please pull and let me know if there's any problem.
Pulled, thanks Johannes.
Please don't tell me you will soon queue up a patch that
limits wiphy names to 32 bytes :-)
^ permalink raw reply
* Re: [PATCH 1/1] tools/lib/libbpf.c: fix string format to allow build on arm32
From: Martin KaFai Lau @ 2018-05-23 15:49 UTC (permalink / raw)
To: Daniel Borkmann, Sirio Balmelli; +Cc: netdev
In-Reply-To: <3007a75a-1f1f-64b8-ecd1-e0087fa47403@iogearbox.net>
On Wed, May 23, 2018 at 12:41:14PM +0200, Daniel Borkmann wrote:
> [ +Martin ]
>
> On 05/21/2018 08:59 AM, Sirio Balmelli wrote:
> > On arm32, 'cd tools/testing/selftests/bpf && make' fails with:
> >
> > libbpf.c:80:10: error: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has type ‘int64_t {aka long long int}’ [-Werror=format=]
> > (func)("libbpf: " fmt, ##__VA_ARGS__); \
> > ^
> > libbpf.c:83:30: note: in expansion of macro ‘__pr’
> > #define pr_warning(fmt, ...) __pr(__pr_warning, fmt, ##__VA_ARGS__)
> > ^~~~
> > libbpf.c:1072:3: note: in expansion of macro ‘pr_warning’
> > pr_warning("map:%s value_type:%s has BTF type_size:%ld != value_size:%u\n",
> >
> > To fix, include 'inttypes.h' and change format directive to 'PRIi64'.
> >
> > Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
>
> Thanks for the fix! One minor comment below.
>
> > ---
> > tools/lib/bpf/libbpf.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 3dbe217bf23e..e2cc8f10c188 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -48,6 +48,8 @@
> > #include "bpf.h"
> > #include "btf.h"
> >
> > +#include <inttypes.h> /* PRIi64 */
> > +
> > #ifndef EM_BPF
> > #define EM_BPF 247
> > #endif
> > @@ -1042,7 +1044,7 @@ static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
> > }
> >
> > if (def->key_size != key_size) {
> > - pr_warning("map:%s key_type:%s has BTF type_size:%ld != key_size:%u\n",
> > + pr_warning("map:%s key_type:%s has BTF type_size:%"PRIi64" != key_size:%u\n",
> > map->name, name, key_size, def->key_size);
> > return -EINVAL;
> > }
> > @@ -1069,7 +1071,7 @@ static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
> > }
> >
> > if (def->value_size != value_size) {
> > - pr_warning("map:%s value_type:%s has BTF type_size:%ld != value_size:%u\n",
> > + pr_warning("map:%s value_type:%s has BTF type_size:%"PRIi64" != value_size:%u\n",
> > map->name, name, value_size, def->value_size);
>
> I don't think we need the PRIi64 in here. Could you just change it to 'type_size:%u'
> and then cast the 'key_size' resp. 'value_size' to unsigned int? We know at this
> point that the type size value is positive anyway, so we only want to show the size
> mismatch so simple cast should suffice. Thanks!
Good point. 'unsigned int' is enough because the t->size (returned
by btf_type_size()) is a u32.
>
> > return -EINVAL;
> > }
> >
>
^ permalink raw reply
* [PATCH V4 8/8] dt-bindings: stm32: add compatible for syscon
From: Christophe Roullier @ 2018-05-23 15:47 UTC (permalink / raw)
To: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro
Cc: devicetree, linux-arm-kernel, netdev, christophe.roullier, andrew
In-Reply-To: <1527090479-5263-1-git-send-email-christophe.roullier@st.com>
This patch describes syscon DT bindings.
Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
---
Documentation/devicetree/bindings/arm/stm32.txt | 10 ----------
.../devicetree/bindings/arm/stm32/stm32-syscon.txt | 14 ++++++++++++++
Documentation/devicetree/bindings/arm/stm32/stm32.txt | 10 ++++++++++
3 files changed, 24 insertions(+), 10 deletions(-)
delete mode 100644 Documentation/devicetree/bindings/arm/stm32.txt
create mode 100644 Documentation/devicetree/bindings/arm/stm32/stm32-syscon.txt
create mode 100644 Documentation/devicetree/bindings/arm/stm32/stm32.txt
diff --git a/Documentation/devicetree/bindings/arm/stm32.txt b/Documentation/devicetree/bindings/arm/stm32.txt
deleted file mode 100644
index 6808ed9..0000000
--- a/Documentation/devicetree/bindings/arm/stm32.txt
+++ /dev/null
@@ -1,10 +0,0 @@
-STMicroelectronics STM32 Platforms Device Tree Bindings
-
-Each device tree must specify which STM32 SoC it uses,
-using one of the following compatible strings:
-
- st,stm32f429
- st,stm32f469
- st,stm32f746
- st,stm32h743
- st,stm32mp157
diff --git a/Documentation/devicetree/bindings/arm/stm32/stm32-syscon.txt b/Documentation/devicetree/bindings/arm/stm32/stm32-syscon.txt
new file mode 100644
index 0000000..99980ae
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/stm32/stm32-syscon.txt
@@ -0,0 +1,14 @@
+STMicroelectronics STM32 Platforms System Controller
+
+Properties:
+ - compatible : should contain two values. First value must be :
+ - " st,stm32mp157-syscfg " - for stm32mp157 based SoCs,
+ second value must be always "syscon".
+ - reg : offset and length of the register set.
+
+ Example:
+ syscfg: syscon@50020000 {
+ compatible = "st,stm32mp157-syscfg", "syscon";
+ reg = <0x50020000 0x400>;
+ };
+
diff --git a/Documentation/devicetree/bindings/arm/stm32/stm32.txt b/Documentation/devicetree/bindings/arm/stm32/stm32.txt
new file mode 100644
index 0000000..6808ed9
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/stm32/stm32.txt
@@ -0,0 +1,10 @@
+STMicroelectronics STM32 Platforms Device Tree Bindings
+
+Each device tree must specify which STM32 SoC it uses,
+using one of the following compatible strings:
+
+ st,stm32f429
+ st,stm32f469
+ st,stm32f746
+ st,stm32h743
+ st,stm32mp157
--
1.9.1
^ permalink raw reply related
* [PATCH V4 6/8] net: stmmac: add dwmac-4.20a compatible
From: Christophe Roullier @ 2018-05-23 15:47 UTC (permalink / raw)
To: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro
Cc: devicetree, linux-arm-kernel, netdev, christophe.roullier, andrew
In-Reply-To: <1527090479-5263-1-git-send-email-christophe.roullier@st.com>
Manage dwmac-4.20a version from synopsys
Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index ebd3e5f..6d141f3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -472,7 +472,8 @@ struct plat_stmmacenet_data *
}
if (of_device_is_compatible(np, "snps,dwmac-4.00") ||
- of_device_is_compatible(np, "snps,dwmac-4.10a")) {
+ of_device_is_compatible(np, "snps,dwmac-4.10a") ||
+ of_device_is_compatible(np, "snps,dwmac-4.20a")) {
plat->has_gmac4 = 1;
plat->has_gmac = 0;
plat->pmt = 1;
--
1.9.1
^ permalink raw reply related
* [PATCH V4 2/8] dt-bindings: stm32-dwmac: add support of MPU families
From: Christophe Roullier @ 2018-05-23 15:47 UTC (permalink / raw)
To: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro
Cc: devicetree, linux-arm-kernel, netdev, christophe.roullier, andrew
In-Reply-To: <1527090479-5263-1-git-send-email-christophe.roullier@st.com>
Add description for Ethernet MPU families fields
Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
Reviewed-by: Rob Herring <robh@kernel.org>
---
Documentation/devicetree/bindings/net/stm32-dwmac.txt | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/Documentation/devicetree/bindings/net/stm32-dwmac.txt b/Documentation/devicetree/bindings/net/stm32-dwmac.txt
index 489dbcb..1341012 100644
--- a/Documentation/devicetree/bindings/net/stm32-dwmac.txt
+++ b/Documentation/devicetree/bindings/net/stm32-dwmac.txt
@@ -6,14 +6,28 @@ Please see stmmac.txt for the other unchanged properties.
The device node has following properties.
Required properties:
-- compatible: Should be "st,stm32-dwmac" to select glue, and
+- compatible: For MCU family should be "st,stm32-dwmac" to select glue, and
"snps,dwmac-3.50a" to select IP version.
+ For MPU family should be "st,stm32mp1-dwmac" to select
+ glue, and "snps,dwmac-4.20a" to select IP version.
- clocks: Must contain a phandle for each entry in clock-names.
- clock-names: Should be "stmmaceth" for the host clock.
Should be "mac-clk-tx" for the MAC TX clock.
Should be "mac-clk-rx" for the MAC RX clock.
+ For MPU family need to add also "ethstp" for power mode clock and,
+ "syscfg-clk" for SYSCFG clock.
+- interrupt-names: Should contain a list of interrupt names corresponding to
+ the interrupts in the interrupts property, if available.
+ Should be "macirq" for the main MAC IRQ
+ Should be "eth_wake_irq" for the IT which wake up system
- st,syscon : Should be phandle/offset pair. The phandle to the syscon node which
- encompases the glue register, and the offset of the control register.
+ encompases the glue register, and the offset of the control register.
+
+Optional properties:
+- clock-names: For MPU family "mac-clk-ck" for PHY without quartz
+- st,int-phyclk (boolean) : valid only where PHY do not have quartz and need to be clock
+ by RCC
+
Example:
ethernet@40028000 {
--
1.9.1
^ permalink raw reply related
* [PATCH V4 3/8] ARM: dts: stm32: add ethernet pins to stm32mp157c
From: Christophe Roullier @ 2018-05-23 15:47 UTC (permalink / raw)
To: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro
Cc: devicetree, linux-arm-kernel, netdev, christophe.roullier, andrew
In-Reply-To: <1527090479-5263-1-git-send-email-christophe.roullier@st.com>
Add ethernet pins on stm32mp157c.
Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
---
arch/arm/boot/dts/stm32mp157-pinctrl.dtsi | 46 +++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/arch/arm/boot/dts/stm32mp157-pinctrl.dtsi b/arch/arm/boot/dts/stm32mp157-pinctrl.dtsi
index 6f044100..cf83eb244 100644
--- a/arch/arm/boot/dts/stm32mp157-pinctrl.dtsi
+++ b/arch/arm/boot/dts/stm32mp157-pinctrl.dtsi
@@ -158,6 +158,52 @@
bias-disable;
};
};
+
+ ethernet0_rgmii_pins_a: rgmii-0 {
+ pins1 {
+ pinmux = <STM32_PINMUX('G', 5, AF11)>, /* ETH_RGMII_CLK125 */
+ <STM32_PINMUX('G', 4, AF11)>, /* ETH_RGMII_GTX_CLK */
+ <STM32_PINMUX('G', 13, AF11)>, /* ETH_RGMII_TXD0 */
+ <STM32_PINMUX('G', 14, AF11)>, /* ETH_RGMII_TXD1 */
+ <STM32_PINMUX('C', 2, AF11)>, /* ETH_RGMII_TXD2 */
+ <STM32_PINMUX('E', 2, AF11)>, /* ETH_RGMII_TXD3 */
+ <STM32_PINMUX('B', 11, AF11)>, /* ETH_RGMII_TX_CTL */
+ <STM32_PINMUX('A', 2, AF11)>, /* ETH_MDIO */
+ <STM32_PINMUX('C', 1, AF11)>; /* ETH_MDC */
+ bias-disable;
+ drive-push-pull;
+ slew-rate = <3>;
+ };
+ pins2 {
+ pinmux = <STM32_PINMUX('C', 4, AF11)>, /* ETH_RGMII_RXD0 */
+ <STM32_PINMUX('C', 5, AF11)>, /* ETH_RGMII_RXD1 */
+ <STM32_PINMUX('B', 0, AF11)>, /* ETH_RGMII_RXD2 */
+ <STM32_PINMUX('B', 1, AF11)>, /* ETH_RGMII_RXD3 */
+ <STM32_PINMUX('A', 1, AF11)>, /* ETH_RGMII_RX_CLK */
+ <STM32_PINMUX('A', 7, AF11)>; /* ETH_RGMII_RX_CTL */
+ bias-disable;
+ };
+ };
+
+ ethernet0_rgmii_pins_sleep_a: rgmii-sleep-0 {
+ pins1 {
+ pinmux = <STM32_PINMUX('G', 5, ANALOG)>, /* ETH_RGMII_CLK125 */
+ <STM32_PINMUX('G', 4, ANALOG)>, /* ETH_RGMII_GTX_CLK */
+ <STM32_PINMUX('G', 13, ANALOG)>, /* ETH_RGMII_TXD0 */
+ <STM32_PINMUX('G', 14, ANALOG)>, /* ETH_RGMII_TXD1 */
+ <STM32_PINMUX('C', 2, ANALOG)>, /* ETH_RGMII_TXD2 */
+ <STM32_PINMUX('E', 2, ANALOG)>, /* ETH_RGMII_TXD3 */
+ <STM32_PINMUX('B', 11, ANALOG)>, /* ETH_RGMII_TX_CTL */
+ <STM32_PINMUX('A', 2, ANALOG)>, /* ETH_MDIO */
+ <STM32_PINMUX('C', 1, ANALOG)>, /* ETH_MDC */
+ <STM32_PINMUX('C', 4, ANALOG)>, /* ETH_RGMII_RXD0 */
+ <STM32_PINMUX('C', 5, ANALOG)>, /* ETH_RGMII_RXD1 */
+ <STM32_PINMUX('B', 0, ANALOG)>, /* ETH_RGMII_RXD2 */
+ <STM32_PINMUX('B', 1, ANALOG)>, /* ETH_RGMII_RXD3 */
+ <STM32_PINMUX('A', 1, ANALOG)>, /* ETH_RGMII_RX_CLK */
+ <STM32_PINMUX('A', 7, ANALOG)>; /* ETH_RGMII_RX_CTL */
+ };
+ };
};
pinctrl_z: pin-controller-z {
--
1.9.1
^ permalink raw reply related
* [PATCH V4 5/8] ARM: dts: stm32: Add ethernet dwmac on stm32mp1
From: Christophe Roullier @ 2018-05-23 15:47 UTC (permalink / raw)
To: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro
Cc: devicetree, linux-arm-kernel, netdev, christophe.roullier, andrew
In-Reply-To: <1527090479-5263-1-git-send-email-christophe.roullier@st.com>
Add Ethernet support (Synopsys MAC IP 4.20a) on stm32mp1 SOC.
Enable feature supported by the stmmac driver, such as TSO.
Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
---
arch/arm/boot/dts/stm32mp157c.dtsi | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/arm/boot/dts/stm32mp157c.dtsi b/arch/arm/boot/dts/stm32mp157c.dtsi
index 3db03a2..ea7b6cb 100644
--- a/arch/arm/boot/dts/stm32mp157c.dtsi
+++ b/arch/arm/boot/dts/stm32mp157c.dtsi
@@ -179,5 +179,35 @@
clocks = <&rcc USART1_K>;
status = "disabled";
};
+
+ stmmac_axi_config_0: stmmac-axi-config {
+ snps,wr_osr_lmt = <0x7>;
+ snps,rd_osr_lmt = <0x7>;
+ snps,blen = <0 0 0 0 16 8 4>;
+ };
+
+ ethernet0: ethernet@5800a000 {
+ compatible = "st,stm32mp1-dwmac", "snps,dwmac-4.20a";
+ reg = <0x5800a000 0x2000>;
+ reg-names = "stmmaceth";
+ interrupts-extended = <&intc GIC_SPI 61 IRQ_TYPE_NONE>;
+ interrupt-names = "macirq";
+ clock-names = "stmmaceth",
+ "mac-clk-tx",
+ "mac-clk-rx",
+ "ethstp",
+ "syscfg-clk";
+ clocks = <&rcc ETHMAC>,
+ <&rcc ETHTX>,
+ <&rcc ETHRX>,
+ <&rcc ETHSTP>,
+ <&rcc SYSCFG>;
+ st,syscon = <&syscfg 0x4>;
+ snps,mixed-burst;
+ snps,pbl = <2>;
+ snps,axi-config = <&stmmac_axi_config_0>;
+ snps,tso;
+ status = "disabled";
+ };
};
};
--
1.9.1
^ permalink raw reply related
* [PATCH V4 0/8] net: ethernet: stmmac: add support for stm32mp1
From: Christophe Roullier @ 2018-05-23 15:47 UTC (permalink / raw)
To: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro
Cc: devicetree, linux-arm-kernel, netdev, christophe.roullier, andrew
Patches to have Ethernet support on stm32mp1
Changelog:
Remark from Rob Herring
Move Documentation/devicetree/bindings/arm/stm32.txt in
Documentation/devicetree/bindings/arm/stm32/stm32.txt and create
Documentation/devicetree/bindings/arm/stm32/stm32-syscon.txt
Replace also in arch/arm/boot/dts/stm32mp157c.dtsi, syscfg: system-config@50020000
with syscfg: syscon@50020000syscfg: system-config@50020000
Christophe Roullier (8):
net: ethernet: stmmac: add adaptation for stm32mp157c.
dt-bindings: stm32-dwmac: add support of MPU families
ARM: dts: stm32: add ethernet pins to stm32mp157c
ARM: dts: stm32: Add syscfg on stm32mp1
ARM: dts: stm32: Add ethernet dwmac on stm32mp1
net: stmmac: add dwmac-4.20a compatible
ARM: dts: stm32: add support of ethernet on stm32mp157c-ev1
dt-bindings: stm32: add compatible for syscon
Documentation/devicetree/bindings/arm/stm32.txt | 10 -
.../devicetree/bindings/arm/stm32/stm32-syscon.txt | 14 ++
.../devicetree/bindings/arm/stm32/stm32.txt | 10 +
.../devicetree/bindings/net/stm32-dwmac.txt | 18 +-
arch/arm/boot/dts/stm32mp157-pinctrl.dtsi | 46 ++++
arch/arm/boot/dts/stm32mp157c-ev1.dts | 20 ++
arch/arm/boot/dts/stm32mp157c.dtsi | 35 +++
drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c | 270 +++++++++++++++++++--
.../net/ethernet/stmicro/stmmac/stmmac_platform.c | 3 +-
9 files changed, 398 insertions(+), 28 deletions(-)
delete mode 100644 Documentation/devicetree/bindings/arm/stm32.txt
create mode 100644 Documentation/devicetree/bindings/arm/stm32/stm32-syscon.txt
create mode 100644 Documentation/devicetree/bindings/arm/stm32/stm32.txt
--
1.9.1
^ permalink raw reply
* [PATCH V4 7/8] ARM: dts: stm32: add support of ethernet on stm32mp157c-ev1
From: Christophe Roullier @ 2018-05-23 15:47 UTC (permalink / raw)
To: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro
Cc: devicetree, linux-arm-kernel, netdev, christophe.roullier, andrew
In-Reply-To: <1527090479-5263-1-git-send-email-christophe.roullier@st.com>
MAC is connected to a PHY in RGMII mode.
Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
---
arch/arm/boot/dts/stm32mp157c-ev1.dts | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/arch/arm/boot/dts/stm32mp157c-ev1.dts b/arch/arm/boot/dts/stm32mp157c-ev1.dts
index 57e6dbc..a7fee5c 100644
--- a/arch/arm/boot/dts/stm32mp157c-ev1.dts
+++ b/arch/arm/boot/dts/stm32mp157c-ev1.dts
@@ -17,5 +17,25 @@
aliases {
serial0 = &uart4;
+ ethernet0 = ðernet0;
+ };
+};
+
+ðernet0 {
+ status = "okay";
+ pinctrl-0 = <ðernet0_rgmii_pins_a>;
+ pinctrl-1 = <ðernet0_rgmii_pins_sleep_a>;
+ pinctrl-names = "default", "sleep";
+ phy-mode = "rgmii";
+ max-speed = <1000>;
+ phy-handle = <&phy0>;
+
+ mdio0 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "snps,dwmac-mdio";
+ phy0: ethernet-phy@0 {
+ reg = <0>;
+ };
};
};
--
1.9.1
^ permalink raw reply related
* [PATCH V4 1/8] net: ethernet: stmmac: add adaptation for stm32mp157c.
From: Christophe Roullier @ 2018-05-23 15:47 UTC (permalink / raw)
To: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro
Cc: devicetree, linux-arm-kernel, netdev, christophe.roullier, andrew
In-Reply-To: <1527090479-5263-1-git-send-email-christophe.roullier@st.com>
Glue codes to support stm32mp157c device and stay
compatible with stm32 mcu family
Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
---
drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c | 270 ++++++++++++++++++++--
1 file changed, 255 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c
index 9e6db16..f51e327 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c
@@ -16,49 +16,183 @@
#include <linux/of_net.h>
#include <linux/phy.h>
#include <linux/platform_device.h>
+#include <linux/pm_wakeirq.h>
#include <linux/regmap.h>
#include <linux/slab.h>
#include <linux/stmmac.h>
#include "stmmac_platform.h"
-#define MII_PHY_SEL_MASK BIT(23)
+#define SYSCFG_MCU_ETH_MASK BIT(23)
+#define SYSCFG_MP1_ETH_MASK GENMASK(23, 16)
+
+#define SYSCFG_PMCR_ETH_CLK_SEL BIT(16)
+#define SYSCFG_PMCR_ETH_REF_CLK_SEL BIT(17)
+#define SYSCFG_PMCR_ETH_SEL_MII BIT(20)
+#define SYSCFG_PMCR_ETH_SEL_RGMII BIT(21)
+#define SYSCFG_PMCR_ETH_SEL_RMII BIT(23)
+#define SYSCFG_PMCR_ETH_SEL_GMII 0
+#define SYSCFG_MCU_ETH_SEL_MII 0
+#define SYSCFG_MCU_ETH_SEL_RMII 1
struct stm32_dwmac {
struct clk *clk_tx;
struct clk *clk_rx;
+ struct clk *clk_eth_ck;
+ struct clk *clk_ethstp;
+ struct clk *syscfg_clk;
+ bool int_phyclk; /* Clock from RCC to drive PHY */
u32 mode_reg; /* MAC glue-logic mode register */
struct regmap *regmap;
u32 speed;
+ const struct stm32_ops *ops;
+ struct device *dev;
+};
+
+struct stm32_ops {
+ int (*set_mode)(struct plat_stmmacenet_data *plat_dat);
+ int (*clk_prepare)(struct stm32_dwmac *dwmac, bool prepare);
+ int (*suspend)(struct stm32_dwmac *dwmac);
+ void (*resume)(struct stm32_dwmac *dwmac);
+ int (*parse_data)(struct stm32_dwmac *dwmac,
+ struct device *dev);
+ u32 syscfg_eth_mask;
};
static int stm32_dwmac_init(struct plat_stmmacenet_data *plat_dat)
{
struct stm32_dwmac *dwmac = plat_dat->bsp_priv;
- u32 reg = dwmac->mode_reg;
- u32 val;
int ret;
- val = (plat_dat->interface == PHY_INTERFACE_MODE_MII) ? 0 : 1;
- ret = regmap_update_bits(dwmac->regmap, reg, MII_PHY_SEL_MASK, val);
- if (ret)
- return ret;
+ if (dwmac->ops->set_mode) {
+ ret = dwmac->ops->set_mode(plat_dat);
+ if (ret)
+ return ret;
+ }
ret = clk_prepare_enable(dwmac->clk_tx);
if (ret)
return ret;
- ret = clk_prepare_enable(dwmac->clk_rx);
- if (ret)
- clk_disable_unprepare(dwmac->clk_tx);
+ if (!dwmac->dev->power.is_suspended) {
+ ret = clk_prepare_enable(dwmac->clk_rx);
+ if (ret) {
+ clk_disable_unprepare(dwmac->clk_tx);
+ return ret;
+ }
+ }
+
+ if (dwmac->ops->clk_prepare) {
+ ret = dwmac->ops->clk_prepare(dwmac, true);
+ if (ret) {
+ clk_disable_unprepare(dwmac->clk_rx);
+ clk_disable_unprepare(dwmac->clk_tx);
+ }
+ }
return ret;
}
+static int stm32mp1_clk_prepare(struct stm32_dwmac *dwmac, bool prepare)
+{
+ int ret = 0;
+
+ if (prepare) {
+ ret = clk_prepare_enable(dwmac->syscfg_clk);
+ if (ret)
+ return ret;
+
+ if (dwmac->int_phyclk) {
+ ret = clk_prepare_enable(dwmac->clk_eth_ck);
+ if (ret) {
+ clk_disable_unprepare(dwmac->syscfg_clk);
+ return ret;
+ }
+ }
+ } else {
+ clk_disable_unprepare(dwmac->syscfg_clk);
+ if (dwmac->int_phyclk)
+ clk_disable_unprepare(dwmac->clk_eth_ck);
+ }
+ return ret;
+}
+
+static int stm32mp1_set_mode(struct plat_stmmacenet_data *plat_dat)
+{
+ struct stm32_dwmac *dwmac = plat_dat->bsp_priv;
+ u32 reg = dwmac->mode_reg;
+ int val;
+
+ switch (plat_dat->interface) {
+ case PHY_INTERFACE_MODE_MII:
+ val = SYSCFG_PMCR_ETH_SEL_MII;
+ pr_debug("SYSCFG init : PHY_INTERFACE_MODE_MII\n");
+ break;
+ case PHY_INTERFACE_MODE_GMII:
+ val = SYSCFG_PMCR_ETH_SEL_GMII;
+ if (dwmac->int_phyclk)
+ val |= SYSCFG_PMCR_ETH_CLK_SEL;
+ pr_debug("SYSCFG init : PHY_INTERFACE_MODE_GMII\n");
+ break;
+ case PHY_INTERFACE_MODE_RMII:
+ val = SYSCFG_PMCR_ETH_SEL_RMII;
+ if (dwmac->int_phyclk)
+ val |= SYSCFG_PMCR_ETH_REF_CLK_SEL;
+ pr_debug("SYSCFG init : PHY_INTERFACE_MODE_RMII\n");
+ break;
+ case PHY_INTERFACE_MODE_RGMII:
+ case PHY_INTERFACE_MODE_RGMII_ID:
+ case PHY_INTERFACE_MODE_RGMII_RXID:
+ case PHY_INTERFACE_MODE_RGMII_TXID:
+ val = SYSCFG_PMCR_ETH_SEL_RGMII;
+ if (dwmac->int_phyclk)
+ val |= SYSCFG_PMCR_ETH_CLK_SEL;
+ pr_debug("SYSCFG init : PHY_INTERFACE_MODE_RGMII\n");
+ break;
+ default:
+ pr_debug("SYSCFG init : Do not manage %d interface\n",
+ plat_dat->interface);
+ /* Do not manage others interfaces */
+ return -EINVAL;
+ }
+
+ return regmap_update_bits(dwmac->regmap, reg,
+ dwmac->ops->syscfg_eth_mask, val);
+}
+
+static int stm32mcu_set_mode(struct plat_stmmacenet_data *plat_dat)
+{
+ struct stm32_dwmac *dwmac = plat_dat->bsp_priv;
+ u32 reg = dwmac->mode_reg;
+ int val;
+
+ switch (plat_dat->interface) {
+ case PHY_INTERFACE_MODE_MII:
+ val = SYSCFG_MCU_ETH_SEL_MII;
+ pr_debug("SYSCFG init : PHY_INTERFACE_MODE_MII\n");
+ break;
+ case PHY_INTERFACE_MODE_RMII:
+ val = SYSCFG_MCU_ETH_SEL_RMII;
+ pr_debug("SYSCFG init : PHY_INTERFACE_MODE_RMII\n");
+ break;
+ default:
+ pr_debug("SYSCFG init : Do not manage %d interface\n",
+ plat_dat->interface);
+ /* Do not manage others interfaces */
+ return -EINVAL;
+ }
+
+ return regmap_update_bits(dwmac->regmap, reg,
+ dwmac->ops->syscfg_eth_mask, val);
+}
+
static void stm32_dwmac_clk_disable(struct stm32_dwmac *dwmac)
{
clk_disable_unprepare(dwmac->clk_tx);
clk_disable_unprepare(dwmac->clk_rx);
+
+ if (dwmac->ops->clk_prepare)
+ dwmac->ops->clk_prepare(dwmac, false);
}
static int stm32_dwmac_parse_data(struct stm32_dwmac *dwmac,
@@ -70,15 +204,22 @@ static int stm32_dwmac_parse_data(struct stm32_dwmac *dwmac,
/* Get TX/RX clocks */
dwmac->clk_tx = devm_clk_get(dev, "mac-clk-tx");
if (IS_ERR(dwmac->clk_tx)) {
- dev_err(dev, "No tx clock provided...\n");
+ dev_err(dev, "No ETH Tx clock provided...\n");
return PTR_ERR(dwmac->clk_tx);
}
+
dwmac->clk_rx = devm_clk_get(dev, "mac-clk-rx");
if (IS_ERR(dwmac->clk_rx)) {
- dev_err(dev, "No rx clock provided...\n");
+ dev_err(dev, "No ETH Rx clock provided...\n");
return PTR_ERR(dwmac->clk_rx);
}
+ if (dwmac->ops->parse_data) {
+ err = dwmac->ops->parse_data(dwmac, dev);
+ if (err)
+ return err;
+ }
+
/* Get mode register */
dwmac->regmap = syscon_regmap_lookup_by_phandle(np, "st,syscon");
if (IS_ERR(dwmac->regmap))
@@ -91,11 +232,46 @@ static int stm32_dwmac_parse_data(struct stm32_dwmac *dwmac,
return err;
}
+static int stm32mp1_parse_data(struct stm32_dwmac *dwmac,
+ struct device *dev)
+{
+ struct device_node *np = dev->of_node;
+
+ dwmac->int_phyclk = of_property_read_bool(np, "st,int-phyclk");
+
+ /* Check if internal clk from RCC selected */
+ if (dwmac->int_phyclk) {
+ /* Get ETH_CLK clocks */
+ dwmac->clk_eth_ck = devm_clk_get(dev, "eth-ck");
+ if (IS_ERR(dwmac->clk_eth_ck)) {
+ dev_err(dev, "No ETH CK clock provided...\n");
+ return PTR_ERR(dwmac->clk_eth_ck);
+ }
+ }
+
+ /* Clock used for low power mode */
+ dwmac->clk_ethstp = devm_clk_get(dev, "ethstp");
+ if (IS_ERR(dwmac->clk_ethstp)) {
+ dev_err(dev, "No ETH peripheral clock provided for CStop mode ...\n");
+ return PTR_ERR(dwmac->clk_ethstp);
+ }
+
+ /* Clock for sysconfig */
+ dwmac->syscfg_clk = devm_clk_get(dev, "syscfg-clk");
+ if (IS_ERR(dwmac->syscfg_clk)) {
+ dev_err(dev, "No syscfg clock provided...\n");
+ return PTR_ERR(dwmac->syscfg_clk);
+ }
+
+ return 0;
+}
+
static int stm32_dwmac_probe(struct platform_device *pdev)
{
struct plat_stmmacenet_data *plat_dat;
struct stmmac_resources stmmac_res;
struct stm32_dwmac *dwmac;
+ const struct stm32_ops *data;
int ret;
ret = stmmac_get_platform_resources(pdev, &stmmac_res);
@@ -112,6 +288,16 @@ static int stm32_dwmac_probe(struct platform_device *pdev)
goto err_remove_config_dt;
}
+ data = of_device_get_match_data(&pdev->dev);
+ if (!data) {
+ dev_err(&pdev->dev, "no of match data provided\n");
+ ret = -EINVAL;
+ goto err_remove_config_dt;
+ }
+
+ dwmac->ops = data;
+ dwmac->dev = &pdev->dev;
+
ret = stm32_dwmac_parse_data(dwmac, &pdev->dev);
if (ret) {
dev_err(&pdev->dev, "Unable to parse OF data\n");
@@ -149,15 +335,48 @@ static int stm32_dwmac_remove(struct platform_device *pdev)
return ret;
}
+static int stm32mp1_suspend(struct stm32_dwmac *dwmac)
+{
+ int ret = 0;
+
+ ret = clk_prepare_enable(dwmac->clk_ethstp);
+ if (ret)
+ return ret;
+
+ clk_disable_unprepare(dwmac->clk_tx);
+ clk_disable_unprepare(dwmac->syscfg_clk);
+ if (dwmac->int_phyclk)
+ clk_disable_unprepare(dwmac->clk_eth_ck);
+
+ return ret;
+}
+
+static void stm32mp1_resume(struct stm32_dwmac *dwmac)
+{
+ clk_disable_unprepare(dwmac->clk_ethstp);
+}
+
+static int stm32mcu_suspend(struct stm32_dwmac *dwmac)
+{
+ clk_disable_unprepare(dwmac->clk_tx);
+ clk_disable_unprepare(dwmac->clk_rx);
+
+ return 0;
+}
+
#ifdef CONFIG_PM_SLEEP
static int stm32_dwmac_suspend(struct device *dev)
{
struct net_device *ndev = dev_get_drvdata(dev);
struct stmmac_priv *priv = netdev_priv(ndev);
+ struct stm32_dwmac *dwmac = priv->plat->bsp_priv;
+
int ret;
ret = stmmac_suspend(dev);
- stm32_dwmac_clk_disable(priv->plat->bsp_priv);
+
+ if (dwmac->ops->suspend)
+ ret = dwmac->ops->suspend(dwmac);
return ret;
}
@@ -166,8 +385,12 @@ static int stm32_dwmac_resume(struct device *dev)
{
struct net_device *ndev = dev_get_drvdata(dev);
struct stmmac_priv *priv = netdev_priv(ndev);
+ struct stm32_dwmac *dwmac = priv->plat->bsp_priv;
int ret;
+ if (dwmac->ops->resume)
+ dwmac->ops->resume(dwmac);
+
ret = stm32_dwmac_init(priv->plat);
if (ret)
return ret;
@@ -181,8 +404,24 @@ static int stm32_dwmac_resume(struct device *dev)
static SIMPLE_DEV_PM_OPS(stm32_dwmac_pm_ops,
stm32_dwmac_suspend, stm32_dwmac_resume);
+static struct stm32_ops stm32mcu_dwmac_data = {
+ .set_mode = stm32mcu_set_mode,
+ .suspend = stm32mcu_suspend,
+ .syscfg_eth_mask = SYSCFG_MCU_ETH_MASK
+};
+
+static struct stm32_ops stm32mp1_dwmac_data = {
+ .set_mode = stm32mp1_set_mode,
+ .clk_prepare = stm32mp1_clk_prepare,
+ .suspend = stm32mp1_suspend,
+ .resume = stm32mp1_resume,
+ .parse_data = stm32mp1_parse_data,
+ .syscfg_eth_mask = SYSCFG_MP1_ETH_MASK
+};
+
static const struct of_device_id stm32_dwmac_match[] = {
- { .compatible = "st,stm32-dwmac"},
+ { .compatible = "st,stm32-dwmac", .data = &stm32mcu_dwmac_data},
+ { .compatible = "st,stm32mp1-dwmac", .data = &stm32mp1_dwmac_data},
{ }
};
MODULE_DEVICE_TABLE(of, stm32_dwmac_match);
@@ -199,5 +438,6 @@ static SIMPLE_DEV_PM_OPS(stm32_dwmac_pm_ops,
module_platform_driver(stm32_dwmac_driver);
MODULE_AUTHOR("Alexandre Torgue <alexandre.torgue@gmail.com>");
-MODULE_DESCRIPTION("STMicroelectronics MCU DWMAC Specific Glue layer");
+MODULE_AUTHOR("Christophe Roullier <christophe.roullier@st.com>");
+MODULE_DESCRIPTION("STMicroelectronics STM32 DWMAC Specific Glue layer");
MODULE_LICENSE("GPL v2");
--
1.9.1
^ permalink raw reply related
* [PATCH V4 4/8] ARM: dts: stm32: Add syscfg on stm32mp1
From: Christophe Roullier @ 2018-05-23 15:47 UTC (permalink / raw)
To: mark.rutland, mcoquelin.stm32, alexandre.torgue, peppe.cavallaro
Cc: devicetree, linux-arm-kernel, netdev, christophe.roullier, andrew
In-Reply-To: <1527090479-5263-1-git-send-email-christophe.roullier@st.com>
System configuration controller is mainly used to manage
the compensation cell and other IOs and system related
settings.
Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
---
arch/arm/boot/dts/stm32mp157c.dtsi | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/arm/boot/dts/stm32mp157c.dtsi b/arch/arm/boot/dts/stm32mp157c.dtsi
index bc3eddc..3db03a2 100644
--- a/arch/arm/boot/dts/stm32mp157c.dtsi
+++ b/arch/arm/boot/dts/stm32mp157c.dtsi
@@ -167,6 +167,11 @@
#reset-cells = <1>;
};
+ syscfg: syscon@50020000 {
+ compatible = "st,stm32mp157-syscfg", "syscon";
+ reg = <0x50020000 0x400>;
+ };
+
usart1: serial@5c000000 {
compatible = "st,stm32h7-uart";
reg = <0x5c000000 0x400>;
--
1.9.1
^ permalink raw reply related
* Re: [RFC PATCH 0/6] net: ethernet: ti: cpsw: add MQPRIO and CBS Qdisc offload
From: Grygorii Strashko @ 2018-05-23 15:43 UTC (permalink / raw)
To: Ivan Khoronzhuk, davem
Cc: corbet, akpm, netdev, linux-doc, linux-kernel, linux-omap,
vinicius.gomes, henrik, jesus.sanchez-palencia, Sekhar Nori,
Yogesh Siraswar, Schuyler Patton
In-Reply-To: <20180518211510.13341-1-ivan.khoronzhuk@linaro.org>
Hi Ivan,
On 05/18/2018 04:15 PM, Ivan Khoronzhuk wrote:
> This series adds MQPRIO and CBS Qdisc offload for TI cpsw driver.
> It potentially can be used in audio video bridging (AVB) and time
> sensitive networking (TSN).
>
> Patchset was tested on AM572x EVM and BBB boards. Last patch from this
> series adds detailed description of configuration with examples. For
> consistency reasons, in role of talker and listener, tools from
> patchset "TSN: Add qdisc based config interface for CBS" were used and
> can be seen here: https://www.spinics.net/lists/netdev/msg460869.html
>
> Based on net-next/master
Thanks a lot, it is great work.
In general I have no comments as of now, but I prefer to wait a bit (few
weeks) for more comments and possible test reports.
If no comments, pls feel free to repost as final series.
>
> Ivan Khoronzhuk (6):
> net: ethernet: ti: cpsw: use cpdma channels in backward order for txq
> net: ethernet: ti: cpdma: fit rated channels in backward order
> net: ethernet: ti: cpsw: add MQPRIO Qdisc offload
> net: ethernet: ti: cpsw: add CBS Qdisc offload
> net: ethernet: ti: cpsw: restore shaper configuration while down/up
> Documentation: networking: cpsw: add MQPRIO & CBS offload examples
>
> Documentation/networking/cpsw.txt | 540 ++++++++++++++++++++++++
> drivers/net/ethernet/ti/cpsw.c | 364 +++++++++++++++-
> drivers/net/ethernet/ti/davinci_cpdma.c | 31 +-
> 3 files changed, 913 insertions(+), 22 deletions(-)
> create mode 100644 Documentation/networking/cpsw.txt
>
--
regards,
-grygorii
^ permalink raw reply
* Re: WARNING in ip_recv_error
From: Willem de Bruijn @ 2018-05-23 15:40 UTC (permalink / raw)
To: David Miller
Cc: Eric Dumazet, DaeLyong Jeong, Alexey Kuznetsov, Hideaki YOSHIFUJI,
Network Development, LKML, Byoungyoung Lee, Kyungtae Kim,
bammanag, Willem de Bruijn
In-Reply-To: <CAF=yD-JObh3u+41V609=zph0XfcgOxpmqm3hL5WPKWFrSnnV3Q@mail.gmail.com>
On Sun, May 20, 2018 at 7:13 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Fri, May 18, 2018 at 2:59 PM, Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>> On Fri, May 18, 2018 at 2:46 PM, Willem de Bruijn
>> <willemdebruijn.kernel@gmail.com> wrote:
>>> On Fri, May 18, 2018 at 2:44 PM, Willem de Bruijn
>>> <willemdebruijn.kernel@gmail.com> wrote:
>>>> On Fri, May 18, 2018 at 1:09 PM, Willem de Bruijn
>>>> <willemdebruijn.kernel@gmail.com> wrote:
>>>>> On Fri, May 18, 2018 at 11:44 AM, David Miller <davem@davemloft.net> wrote:
>>>>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>>>>> Date: Fri, 18 May 2018 08:30:43 -0700
>>>>>>
>>>>>>> We probably need to revert Willem patch (7ce875e5ecb8562fd44040f69bda96c999e38bbc)
>>>>>>
>>>>>> Is it really valid to reach ip_recv_err with an ipv6 socket?
>>>>>
>>>>> I guess the issue is that setsockopt IPV6_ADDRFORM is not an
>>>>> atomic operation, so that the socket is neither fully ipv4 nor fully
>>>>> ipv6 by the time it reaches ip_recv_error.
>>>>>
>>>>> sk->sk_socket->ops = &inet_dgram_ops;
>>>>> < HERE >
>>>>> sk->sk_family = PF_INET;
>>>>>
>>>>> Even calling inet_recv_error to demux would not necessarily help.
>>>>>
>>>>> Safest would be to look up by skb->protocol, similar to what
>>>>> ipv6_recv_error does to handle v4-mapped-v6.
>>>>>
>>>>> Or to make that function safe with PF_INET and swap the order
>>>>> of the above two operations.
>>>>>
>>>>> All sound needlessly complicated for this rare socket option, but
>>>>> I don't have a better idea yet. Dropping on the floor is not nice,
>>>>> either.
>>>>
>>>> Ensuring that ip_recv_error correctly handles packets from either
>>>> socket and removing the warning should indeed be good.
>>>>
>>>> It is robust against v4-mapped packets from an AF_INET6 socket,
>>>> but see caveat on reconnect below.
>>>>
>>>> The code between ipv6_recv_error for v4-mapped addresses and
>>>> ip_recv_error is essentially the same, the main difference being
>>>> whether to return network headers as sockaddr_in with SOL_IP
>>>> or sockaddr_in6 with SOL_IPV6.
>>>>
>>>> There are very few other locations in the stack that explicitly test
>>>> sk_family in this way and thus would be vulnerable to races with
>>>> IPV6_ADDRFORM.
>>>>
>>>> I'm not sure whether it is possible for a udpv6 socket to queue a
>>>> real ipv6 packet on the error queue, disconnect, connect to an
>>>> ipv4 address, call IPV6_ADDRFORM and then call ip_recv_error
>>>> on a true ipv6 packet. That would return buggy data, e.g., in
>>>> msg_name.
>>>
>>> In do_ipv6_setsockopt IPV6_ADDRFORM we can test that the
>>> error queue is empty, and then take its lock for the duration of the
>>> operation.
>>
>> Actually, no reason to hold the lock. This setsockopt holds the socket
>> lock, which connect would need, too. So testing that the queue
>> is empty after testing that it is connected to a v4 address is
>> sufficient to ensure that no ipv6 packets are queued for reception.
>>
>> diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
>> index 4d780c7f0130..a975d6311341 100644
>> --- a/net/ipv6/ipv6_sockglue.c
>> +++ b/net/ipv6/ipv6_sockglue.c
>> @@ -199,6 +199,11 @@ static int do_ipv6_setsockopt(struct sock *sk,
>> int level, int optname,
>>
>> if (ipv6_only_sock(sk) ||
>> !ipv6_addr_v4mapped(&sk->sk_v6_daddr)) {
>> retv = -EADDRNOTAVAIL;
>> break;
>> }
>>
>> + if (!skb_queue_empty(&sk->sk_error_queue)) {
>> + retv = -EBUSY;
>> + break;
>> + }
>> +
>> fl6_free_socklist(sk);
>> __ipv6_sock_mc_close(sk);
>>
>> After this it should be safe to remove the warning in ip_recv_error.
>
> Hmm.. nope.
>
> This ensures that the socket cannot produce any new true v6 packets.
> But it does not guarantee that they are not already in the system, e.g.
> queued in tc, and will find their way to the error queue later.
>
> We'll have to just be able to handle ipv6 packets in ip_recv_error.
> Since IPV6_ADDRFORM is used to pass to legacy v4-only
> processes and those likely are only confused by SOL_IPV6
> error messages, it is probably best to just drop them and perhaps
> WARN_ONCE.
Even more fun, this is not limited to the error queue.
I can queue a v6 packet for reception on a socket, connect to a v4
address, call IPV6_ADDRFORM and then a regular recvfrom will
return a partial v6 address as AF_INET.
We definitely do not want to have to add a check
if (skb->protocol == htons(ETH_P_IPV6)) {
kfree_skb(skb);
goto try_again;
}
to the normal recvmsg path.
An alternative may be to tighten the check on when to allow
IPV6_ADDRFORM. Not only return EBUSY if a packet is pending,
but also if any sk_{rmem, omem, wmem}_alloc is non-zero. Only,
these tightened constraints could break a legacy application.
Either way, this race is somewhat tangential to the one that
RaceFuzzer found. The sk changes that IPV6_ADDRFORM makes
to sk_prot, sk_socket->ops and sk_family are not atomic and will
not be. They need not be, because no other code assumes this
consistency.
So I'll start by removing the warning as Eric suggested.
^ permalink raw reply
* Re: [PATCH net-next v3 0/7] Add support for QCA8334 switch
From: Florian Fainelli @ 2018-05-23 15:39 UTC (permalink / raw)
To: Michal Vokáč, netdev
Cc: linux-kernel, devicetree, vivien.didelot, andrew, mark.rutland,
robh+dt, davem, michal.vokac
In-Reply-To: <1527056424-14528-1-git-send-email-michal.vokac@ysoft.com>
On 05/22/2018 11:20 PM, Michal Vokáč wrote:
> This series basically adds support for a QCA8334 ethernet switch to the
> qca8k driver. It is a four-port variant of the already supported seven
> port QCA8337. Register map is the same for the whole familly and all chips
> have the same device ID.
>
> Major part of this series enhances the CPU port setting. Currently the CPU
> port is not set to any sensible defaults compatible with the xGMII
> interface. This series forces the CPU port to its maximum bandwidth and
> also allows to adjust the new defaults using fixed-link device tree
> sub-node.
>
> Alongside these changes I fixed two checkpatch warnings regarding SPDX and
> redundant parentheses.
Looks great, thanks Michal! Do you have any features or things you are
working on that would be added later to the driver?
>
> Changes in v3:
> - Rebased on latest net-next/master.
> - Corrected fixed-link documentation.
>
> Michal Vokáč (7):
> net: dsa: qca8k: Add QCA8334 binding documentation
> net: dsa: qca8k: Add support for QCA8334 switch
> net: dsa: qca8k: Enable RXMAC when bringing up a port
> net: dsa: qca8k: Force CPU port to its highest bandwidth
> net: dsa: qca8k: Allow overwriting CPU port setting
> net: dsa: qca8k: Replace GPL boilerplate by SPDX
> net: dsa: qca8k: Remove redundant parentheses
>
> .../devicetree/bindings/net/dsa/qca8k.txt | 23 +++++++-
> drivers/net/dsa/qca8k.c | 64 ++++++++++++++++++----
> drivers/net/dsa/qca8k.h | 7 ++-
> 3 files changed, 79 insertions(+), 15 deletions(-)
>
--
Florian
^ permalink raw reply
* Re: [PATCH net-next v3 2/7] net: dsa: qca8k: Add support for QCA8334 switch
From: Florian Fainelli @ 2018-05-23 15:38 UTC (permalink / raw)
To: Michal Vokáč, netdev
Cc: linux-kernel, devicetree, vivien.didelot, andrew, mark.rutland,
robh+dt, davem, michal.vokac
In-Reply-To: <1527056424-14528-3-git-send-email-michal.vokac@ysoft.com>
On 05/22/2018 11:20 PM, Michal Vokáč wrote:
> Add support for the four-port variant of the Qualcomm QCA833x switch.
>
> Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
From: Huy Nguyen @ 2018-05-23 15:37 UTC (permalink / raw)
To: John Fastabend, Jiri Pirko, Jakub Kicinski
Cc: Saeed Mahameed, David S. Miller, netdev, Or Gerlitz
In-Reply-To: <e361efd5-293d-0712-7ddf-5ad2a838d013@gmail.com>
On 5/23/2018 8:52 AM, John Fastabend wrote:
> It would be nice though if the API gave us some hint on max/min/stride
> of allowed values. Could the get API return these along with current
> value? Presumably the allowed max size could change with devlink buffer
> changes in how the global buffer is divided up as well.
Acked. I will add Max. Let's skip min/stride since it is too hardware
specific.
^ permalink raw reply
* Re: [PATCH net-next v3 1/7] net: dsa: qca8k: Add QCA8334 binding documentation
From: Florian Fainelli @ 2018-05-23 15:37 UTC (permalink / raw)
To: Michal Vokáč, netdev
Cc: linux-kernel, devicetree, vivien.didelot, andrew, mark.rutland,
robh+dt, davem, michal.vokac
In-Reply-To: <1527056424-14528-2-git-send-email-michal.vokac@ysoft.com>
On 05/22/2018 11:20 PM, Michal Vokáč wrote:
> Add support for the four-port variant of the Qualcomm QCA833x switch.
>
> The CPU port default link settings can be reconfigured using
> a fixed-link sub-node.
>
> Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
> Reviewed-by: Rob Herring <robh@kernel.org>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* RE: [PATCH v2] packet: track ring entry use using a shadow ring to prevent RX ring overrun
From: Jon Rosen (jrosen) @ 2018-05-23 15:29 UTC (permalink / raw)
To: Willem de Bruijn
Cc: David S. Miller, Willem de Bruijn, Eric Dumazet, Kees Cook,
David Windsor, Rosen, Rami, Reshetova, Elena, Mike Maloney,
Benjamin Poirier, Thomas Gleixner, Greg Kroah-Hartman,
open list:NETWORKING [GENERAL], open list
In-Reply-To: <CAF=yD-LpGOTx_jrh10G59M4Y-vpdtx_q5SG1Hjc7452y9KUt4w@mail.gmail.com>
> -----Original Message-----
> From: Willem de Bruijn [mailto:willemdebruijn.kernel@gmail.com]
> Sent: Wednesday, May 23, 2018 9:37 AM
> To: Jon Rosen (jrosen) <jrosen@cisco.com>
> Cc: David S. Miller <davem@davemloft.net>; Willem de Bruijn <willemb@google.com>; Eric Dumazet
> <edumazet@google.com>; Kees Cook <keescook@chromium.org>; David Windsor <dwindsor@gmail.com>; Rosen,
> Rami <rami.rosen@intel.com>; Reshetova, Elena <elena.reshetova@intel.com>; Mike Maloney
> <maloney@google.com>; Benjamin Poirier <bpoirier@suse.com>; Thomas Gleixner <tglx@linutronix.de>; Greg
> Kroah-Hartman <gregkh@linuxfoundation.org>; open list:NETWORKING [GENERAL] <netdev@vger.kernel.org>;
> open list <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH v2] packet: track ring entry use using a shadow ring to prevent RX ring overrun
>
> On Wed, May 23, 2018 at 7:54 AM, Jon Rosen (jrosen) <jrosen@cisco.com> wrote:
> >> > For the ring, there is no requirement to allocate exactly the amount
> >> > specified by the user request. Safer than relying on shared memory
> >> > and simpler than the extra allocation in this patch would be to allocate
> >> > extra shadow memory at the end of the ring (and not mmap that).
> >> >
> >> > That still leaves an extra cold cacheline vs using tp_padding.
> >>
> >> Given my lack of experience and knowledge in writing kernel code
> >> it was easier for me to allocate the shadow ring as a separate
> >> structure. Of course it's not about me and my skills so if it's
> >> more appropriate to allocate at the tail of the existing ring
> >> then certainly I can look at doing that.
> >
> > The memory for the ring is not one contiguous block, it's an array of
> > blocks of pages (or 'order' sized blocks of pages). I don't think
> > increasing the size of each of the blocks to provided storage would be
> > such a good idea as it will risk spilling over into the next order and
> > wasting lots of memory. I suspect it's also more complex than a single
> > shadow ring to do both the allocation and the access.
> >
> > It could be tacked onto the end of the pg_vec[] used to store the
> > pointers to the blocks. The challenge with that is that a pg_vec[] is
> > created for each of RX and TX rings so either it would have to
> > allocate unnecessary storage for TX or the caller will have to say if
> > extra space should be allocated or not. E.g.:
> >
> > static struct pgv *alloc_pg_vec(struct tpacket_req *req, int order, int scratch, void **scratch_p)
> >
> > I'm not sure avoiding the extra allocation and moving it to the
> > pg_vec[] for the RX ring is going to get the simplification you were
> > hoping for. Is there another way of storing the shadow ring which
> > I should consider?
>
> I did indeed mean attaching extra pages to pg_vec[]. It should be
> simpler than a separate structure, but I may be wrong.
I don't think it would be too bad, it may actually turn out to be
convenient to implement.
>
> Either way, I still would prefer to avoid the shadow buffer completely.
> It incurs complexity and cycle cost on all users because of only the
> rare (non-existent?) consumer that overwrites the padding bytes.
I prefer that as well. I'm just not sure there is a bulletproof
solution without the shadow state. I also wish it were only a
theoretical issue but unfortunately it is actually something our
customers have seen.
>
> Perhaps we can use padding yet avoid deadlock by writing a
> timed value. The simplest would be jiffies >> N. Then only a
> process that writes this exact value would be subject to drops and
> then still only for a limited period.
>
> Instead of depending on wall clock time, like jiffies, another option
> would be to keep a percpu array of values. Each cpu has a zero
> entry if it is not writing, nonzero if it is. If a writer encounters a
> number in padding that is > num_cpus, then the state is garbage
> from userspace. If <= num_cpus, it is adhered to only until that cpu
> clears its entry, which is guaranteed to happen eventually.
>
> Just a quick thought. This might not fly at all upon closer scrutiny.
I'm not sure I understand the suggestion, but I'll think on it
some more.
Some other options maybe worth considering (in no specific order):
- test the application to see if it will consume entries if tp_status
is set to anything other than TP_STATUS_USER, only use shadow if
it doesn't strictly honor the TP_STATUS_USER bit.
- skip shadow if we see new TP_STATUS_USER_TO_KERNEL is used
- use tp_len == -1 to indicate inuse
^ permalink raw reply
* Re: [PATCH net-next] sfc: stop the TX queue before pushing new buffers
From: Edward Cree @ 2018-05-23 15:28 UTC (permalink / raw)
To: Martin Habets, linux-net-drivers, davem; +Cc: netdev, jarod
In-Reply-To: <152706844446.27257.6312747433904122379.stgit@mh-desktop.uk.solarflarecom.com>
On 23/05/18 10:41, Martin Habets wrote:
> efx_enqueue_skb() can push new buffers for the xmit_more functionality.
> We must stops the TX queue before this or else the TX queue does not get
> restarted and we get a netdev watchdog.
>
> In the error handling we may now need to unwind more than 1 packet, and
> we may need to push the new buffers onto the partner queue.
>
> Fixes: e9117e5099ea ("sfc: Firmware-Assisted TSO version 2")
> Reported-by: Jarod Wilson <jarod@redhat.com>
> Signed-off-by: Martin Habets <mhabets@solarflare.com>
> ---
>
> Dave, could you please also queue up this patch for stable?
>
> drivers/net/ethernet/sfc/tx.c | 31 +++++++++++++++++++++++--------
> 1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
> index cece961f2e82..17e0697f42d5 100644
> --- a/drivers/net/ethernet/sfc/tx.c
> +++ b/drivers/net/ethernet/sfc/tx.c
> @@ -435,17 +435,18 @@ static int efx_tx_map_data(struct efx_tx_queue *tx_queue, struct sk_buff *skb,
> } while (1);
> }
>
> -/* Remove buffers put into a tx_queue. None of the buffers must have
> - * an skb attached.
> +/* Remove buffers put into a tx_queue for the current packet.
> + * None of the buffers must have an skb attached.
> */
> -static void efx_enqueue_unwind(struct efx_tx_queue *tx_queue)
> +static void efx_enqueue_unwind(struct efx_tx_queue *tx_queue,
> + unsigned int insert_count)
> {
> struct efx_tx_buffer *buffer;
> unsigned int bytes_compl = 0;
> unsigned int pkts_compl = 0;
>
> /* Work backwards until we hit the original insert pointer value */
> - while (tx_queue->insert_count != tx_queue->write_count) {
> + while (tx_queue->insert_count != insert_count) {
> --tx_queue->insert_count;
> buffer = __efx_tx_queue_get_insert_buffer(tx_queue);
> efx_dequeue_buffer(tx_queue, buffer, &pkts_compl, &bytes_compl);
> @@ -504,6 +505,8 @@ static int efx_tx_tso_fallback(struct efx_tx_queue *tx_queue,
> */
> netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
> {
> + unsigned int old_insert_count = tx_queue->insert_count;
> + bool xmit_more = skb->xmit_more;
> bool data_mapped = false;
> unsigned int segments;
> unsigned int skb_len;
> @@ -553,8 +556,10 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
> /* Update BQL */
> netdev_tx_sent_queue(tx_queue->core_txq, skb_len);
>
> + efx_tx_maybe_stop_queue(tx_queue);
> +
> /* Pass off to hardware */
> - if (!skb->xmit_more || netif_xmit_stopped(tx_queue->core_txq)) {
> + if (!xmit_more || netif_xmit_stopped(tx_queue->core_txq)) {
> struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);
>
> /* There could be packets left on the partner queue if those
> @@ -577,14 +582,24 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
> tx_queue->tx_packets++;
> }
>
> - efx_tx_maybe_stop_queue(tx_queue);
> -
> return NETDEV_TX_OK;
>
>
> err:
> - efx_enqueue_unwind(tx_queue);
> + efx_enqueue_unwind(tx_queue, old_insert_count);
> dev_kfree_skb_any(skb);
> +
> + /* If we're not expecting another transmit and we had something to push
> + * on this queue or a partner queue then we need to push here to get the
> + * previous packets out.
> + */
> + if (!xmit_more) {
> + struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);
> +
> + if (txq2->xmit_more_available)
> + efx_nic_push_buffers(txq2);
> + }
This only pushes the partner queue, it needs to also push this queue if
tx_queue->xmit_more_available (as your comment just above mentions).
Apart from this, LGTM.
-Ed
> +
> return NETDEV_TX_OK;
> }
>
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox