* Re: Soft lockup in inet_put_port on 4.6
From: Josef Bacik @ 2016-12-12 18:05 UTC (permalink / raw)
To: Eric Dumazet
Cc: Hannes Frederic Sowa, Tom Herbert,
Linux Kernel Network Developers
In-Reply-To: <1481343298.4930.208.camel@edumazet-glaptop3.roam.corp.google.com>
On Fri, Dec 9, 2016 at 11:14 PM, Eric Dumazet <eric.dumazet@gmail.com>
wrote:
> On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote:
>
>>
>> Hmm... Is your ephemeral port range includes the port your load
>> balancing app is using ?
>
> I suspect that you might have processes doing bind( port = 0) that are
> trapped into the bind_conflict() scan ?
>
> With 100,000 + timewaits there, this possibly hurts.
>
> Can you try the following loop breaker ?
It doesn't appear that the app is doing bind(port = 0) during normal
operation. I tested this patch and it made no difference. I'm going
to test simply restarting the app without changing to the SO_REUSEPORT
option. Thanks,
Josef
^ permalink raw reply
* Re: Soft lockup in tc_classify
From: Shahar Klein @ 2016-12-12 16:04 UTC (permalink / raw)
To: Daniel Borkmann, netdev
Cc: shahark, Roi Dayan, David Miller, Cong Wang, Jiri Pirko,
John Fastabend, Or Gerlitz, Hadar Hen Zion
In-Reply-To: <584EA60B.80803@iogearbox.net>
On 12/12/2016 3:28 PM, Daniel Borkmann wrote:
> Hi Shahar,
>
> On 12/12/2016 10:43 AM, Shahar Klein wrote:
>> Hi All,
>>
>> sorry for the spam, the first time was sent with html part and was
>> rejected.
>>
>> We observed an issue where a classifier instance next member is
>> pointing back to itself, causing a CPU soft lockup.
>> We found it by running traffic on many udp connections and then adding
>> a new flower rule using tc.
>>
>> We added a quick workaround to verify it:
>>
>> In tc_classify:
>>
>> for (; tp; tp = rcu_dereference_bh(tp->next)) {
>> int err;
>> + if (tp == tp->next)
>> + RCU_INIT_POINTER(tp->next, NULL);
>>
>>
>> We also had a print here showing tp->next is pointing to tp. With this
>> workaround we are not hitting the issue anymore.
>> We are not sure we fully understand the mechanism here - with the rtnl
>> and rcu locks.
>> We'll appreciate your help solving this issue.
>
> Note that there's still the RCU fix missing for the deletion race that
> Cong will still send out, but you say that the only thing you do is to
> add a single rule, but no other operation in involved during that test?
>
> Do you have a script and kernel .config for reproducing this?
I'm using a user space socket
app(https://github.com/shahar-klein/noodle)on a vm to push udp packets
from ~2000 different udp src ports ramping up at ~100 per second towards
another vm on the same Hypervisor. Once the traffic starts I'm pushing
ingress flower tc udp rules(even_udp_src_port->mirred, odd->drop) on the
relevant representor in the Hypervisor.
>
> Thanks,
> Daniel
^ permalink raw reply
* Re: [PATCH] sh_eth: add wake-on-lan support via magic packet
From: Sergei Shtylyov @ 2016-12-12 17:35 UTC (permalink / raw)
To: Niklas Söderlund; +Cc: Simon Horman, netdev, linux-renesas-soc
In-Reply-To: <20161212154955.GA17342@bigcity.dyn.berto.se>
On 12/12/2016 06:49 PM, Niklas Söderlund wrote:
> Thanks for your feedback.
Not at all, it's my duty now. :-)
I should probably have warned you not to post the new version to netdev --
DaveM has closed his net-next.git tree (ahead of the usual time, which would
have been 4.9 release), so you posting would only upset him...
[...]
>>>> You only enable the WOL support fo the R-Car gen2 chips but never say that
>>>> explicitly, neither in the subject nor here.
>>>>
>>>>> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
>>>>> ---
>>>>> drivers/net/ethernet/renesas/sh_eth.c | 120 +++++++++++++++++++++++++++++++---
>>>>> drivers/net/ethernet/renesas/sh_eth.h | 4 ++
>>>>> 2 files changed, 116 insertions(+), 8 deletions(-)
>>>>
>>>>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
>>>>> index 05b0dc5..3974046 100644
>>>>> --- a/drivers/net/ethernet/renesas/sh_eth.c
>>>>> +++ b/drivers/net/ethernet/renesas/sh_eth.c
[...]
>>>> + /* Handle MagicPacket interrupt */
>>>> + if (sh_eth_read(ndev, ECSR) & ECSR_MPD)
>>
>> What if it wasn't enabled ATM?
>
> Sorry I don't understand this comment.
I'm trying to handle only the enabled interrupts but this hasn't been
consistently done yet (only for EESR, not ECSR), so nevermind. :-)
[...]
>>>>> @@ -3150,15 +3193,71 @@ static int sh_eth_drv_remove(struct platform_device *pdev)
[...]
>>> This is how it's done in
>>> other parts of the driver when disabling interrupts.
>>
>> Not in all parts of the driver that disable EESIPR interrupts... I must
>> confess that I never liked that 'mdp->irq_enabled' flag and still suspect we
>> can get things done without it... I need to look at this code again, sigh...
Well, we can't most probably but I have a patch almost ready that turns
the boolean flag into u32 field holding the EESIPR value to be written next.
Would that help you?
>>> This is also why I only check for MagicPacket interrupts if irq_enabled
>>> is false.
>>
>> I would have preferred that this was done with the other EMAC interrupts,
>> in sh_eth_error().
>
> I removed the check for Magic Packet in sh_eth_interrupt() and running
> without setting mdp->irq_enabled = false. sh_eth_error() will then clear
> any ECI interrupt so no need to add Magic Packet detection to it since
> all that is needed on Magic Packet is to clear the interrupt which
> already is done. This works and I can do multiple suspend/resume cycles,
> will be in v2 thanks for the suggestion.
OK, let's see what you have when I have some more time. We have a lot of
time for ironing things out till net-next is opened again -- which will happen
after -rc1)...
[...]
>>>>> +
>>>>> + /* Enable MagicPacket */
>>>>> + sh_eth_modify(ndev, ECMR, 0, ECMR_PMDE);
>>>>> +
>>>>> + /* Increased clock usage so device won't be suspended */
>>>>> + clk_enable(mdp->clk);
>>>>
>>>> Hum, intermixiggn runtime PM with clock API doesn't look good...
>>>
>>> I agree it looks weird but I need a way to increment the usage count for
>>> the clock otherwise the PM code will disable the module clock and WoL
>>> will not work.
>>
>> How will it do it if you don't call sh_eth_close() in this case?
>>
>>> Note that this call will not enable the clock just
>>> increase the usage count so it won't be disabled when the PM code
>>> decrease it after the sh_eth suspend function is run.
>>
>> You mean that the PM code calls RPM or clk API on its own? That's strange...
>
> Yes it calls clk API.
Hum, will have to look into it as well...
[...]
MBR, Sergei
^ permalink raw reply
* Re: Designing a safe RX-zero-copy Memory Model for Networking
From: Jesper Dangaard Brouer @ 2016-12-12 17:13 UTC (permalink / raw)
To: John Fastabend
Cc: Mike Rapoport, netdev@vger.kernel.org, linux-mm, Willem de Bruijn,
Björn Töpel, Karlsson, Magnus, Alexander Duyck,
Mel Gorman, Tom Herbert, Brenden Blanco, Tariq Toukan,
Saeed Mahameed, Jesse Brandeburg, Kalman Meth, Vladislav Yasevich,
brouer
In-Reply-To: <584EB8DF.8000308@gmail.com>
On Mon, 12 Dec 2016 06:49:03 -0800
John Fastabend <john.fastabend@gmail.com> wrote:
> On 16-12-12 06:14 AM, Mike Rapoport wrote:
> > On Mon, Dec 12, 2016 at 10:40:42AM +0100, Jesper Dangaard Brouer wrote:
> >>
> >> On Mon, 12 Dec 2016 10:38:13 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> >>
> >>> Hello Jesper,
> >>>
> >>> On Mon, Dec 05, 2016 at 03:31:32PM +0100, Jesper Dangaard Brouer wrote:
> >>>> Hi all,
> >>>>
> >>>> This is my design for how to safely handle RX zero-copy in the network
> >>>> stack, by using page_pool[1] and modifying NIC drivers. Safely means
> >>>> not leaking kernel info in pages mapped to userspace and resilience
> >>>> so a malicious userspace app cannot crash the kernel.
> >>>>
> >>>> Design target
> >>>> =============
> >>>>
> >>>> Allow the NIC to function as a normal Linux NIC and be shared in a
> >>>> safe manor, between the kernel network stack and an accelerated
> >>>> userspace application using RX zero-copy delivery.
> >>>>
> >>>> Target is to provide the basis for building RX zero-copy solutions in
> >>>> a memory safe manor. An efficient communication channel for userspace
> >>>> delivery is out of scope for this document, but OOM considerations are
> >>>> discussed below (`Userspace delivery and OOM`_).
> >>>
> >>> Sorry, if this reply is a bit off-topic.
> >>
> >> It is very much on topic IMHO :-)
> >>
> >>> I'm working on implementation of RX zero-copy for virtio and I've dedicated
> >>> some thought about making guest memory available for physical NIC DMAs.
> >>> I believe this is quite related to your page_pool proposal, at least from
> >>> the NIC driver perspective, so I'd like to share some thoughts here.
> >>
> >> Seems quite related. I'm very interested in cooperating with you! I'm
> >> not very familiar with virtio, and how packets/pages gets channeled
> >> into virtio.
> >
> > They are copied :-)
> > Presuming we are dealing only with vhost backend, the received skb
> > eventually gets converted to IOVs, which in turn are copied to the guest
> > memory. The IOVs point to the guest memory that is allocated by virtio-net
> > running in the guest.
> >
>
> Great I'm also doing something similar.
>
> My plan was to embed the zero copy as an AF_PACKET mode and then push
> a AF_PACKET backend into vhost. I'll post a patch later this week.
>
> >>> The idea is to dedicate one (or more) of the NIC's queues to a VM, e.g.
> >>> using macvtap, and then propagate guest RX memory allocations to the NIC
> >>> using something like new .ndo_set_rx_buffers method.
> >>
> >> I believe the page_pool API/design aligns with this idea/use-case.
> >>
> >>> What is your view about interface between the page_pool and the NIC
> >>> drivers?
> >>
> >> In my Prove-of-Concept implementation, the NIC driver (mlx5) register
> >> a page_pool per RX queue. This is done for two reasons (1) performance
> >> and (2) for supporting use-cases where only one single RX-ring queue is
> >> (re)configured to support RX-zero-copy. There are some associated
> >> extra cost of enabling this mode, thus it makes sense to only enable it
> >> when needed.
> >>
> >> I've not decided how this gets enabled, maybe some new driver NDO. It
> >> could also happen when a XDP program gets loaded, which request this
> >> feature.
> >>
> >> The macvtap solution is nice and we should support it, but it requires
> >> VM to have their MAC-addr registered on the physical switch. This
> >> design is about adding flexibility. Registering an XDP eBPF filter
> >> provides the maximum flexibility for matching the destination VM.
> >
> > I'm not very familiar with XDP eBPF, and it's difficult for me to estimate
> > what needs to be done in BPF program to do proper conversion of skb to the
> > virtio descriptors.
>
> I don't think XDP has much to do with this code and they should be done
> separately. XDP runs eBPF code on received packets after the DMA engine
> has already placed the packet in memory so its too late in the process.
It does not have to be connected to XDP. My idea should support RX
zero-copy into normal sockets, without XDP.
My idea was to pre-VMA map the RX ring, when zero-copy is requested,
thus it is not too late in the process. When frame travel the normal
network stack, then require the SKB-read-only-page mode (skb-frags).
If the SKB reach a socket that support zero-copy, then we can do RX
zero-copy on normal sockets.
> The other piece here is enabling XDP in vhost but that is again separate
> IMO.
>
> Notice that ixgbe supports pushing packets into a macvlan via 'tc'
> traffic steering commands so even though macvlan gets an L2 address it
> doesn't mean it can't use other criteria to steer traffic to it.
This sounds interesting. As this allow much more flexibility macvlan
matching, which I like, but still depending on HW support.
> > We were not considered using XDP yet, so we've decided to limit the initial
> > implementation to macvtap because we can ensure correspondence between a
> > NIC queue and virtual NIC, which is not the case with more generic tap
> > device. It could be that use of XDP will allow for a generic solution for
> > virtio case as well.
>
> Interesting this was one of the original ideas behind the macvlan
> offload mode. iirc Vlad also was interested in this.
>
> I'm guessing this was used because of the ability to push macvlan onto
> its own queue?
>
> >
> >>
> >>> Have you considered using "push" model for setting the NIC's RX memory?
> >>
> >> I don't understand what you mean by a "push" model?
> >
> > Currently, memory allocation in NIC drivers boils down to alloc_page with
> > some wrapping code. I see two possible ways to make NIC use of some
> > preallocated pages: either NIC driver will call an API (probably different
> > from alloc_page) to obtain that memory, or there will be NDO API that
> > allows to set the NIC's RX buffers. I named the later case "push".
>
> I prefer the ndo op. This matches up well with AF_PACKET model where we
> have "slots" and offload is just a transparent "push" of these "slots"
> to the driver. Below we have a snippet of our proposed API,
Hmmm. If you can rely on hardware setup to give you steering and
dedicated access to the RX rings. In those cases, I guess, the "push"
model could be a more direct API approach.
I was shooting for a model that worked without hardware support. And
then transparently benefit from HW support by configuring a HW filter
into a specific RX queue and attaching/using to that queue.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: [PATCH net-next 1/3] net:dsa:mv88e6xxx: use hashtable to store multicast entries
From: Vivien Didelot @ 2016-12-12 17:11 UTC (permalink / raw)
To: Florian Fainelli, Volodymyr Bendiuga
Cc: Volodymyr Bendiuga, andrew, netdev, Volodymyr Bendiuga
In-Reply-To: <48ff1136-dd8f-7704-a512-c23b27989bf8@gmail.com>
Hi all,
Florian Fainelli <f.fainelli@gmail.com> writes:
> Seeing such a change makes me wonder if we should not try to push some
> of this hashtable abstraction (provided that we agree we want it) at a
> higher layer, like net/dsa/slave.c?
That is the major reason why I am reluctant to cache stuffs in drivers.
In most cases, we want the DSA drivers to be "stupid", as much stateless
as possible, simply implementing the supported DSA switch operations.
The DSA core then handles the generic logic of how switch fabrics should
behave, and thus all DSA drivers are consistent and benefit from this.
Thanks,
Vivien
^ permalink raw reply
* Re: [PATCH v2] audit: use proper refcount locking on audit_sock
From: Paul Moore @ 2016-12-12 17:10 UTC (permalink / raw)
To: Richard Guy Briggs
Cc: netdev, linux-kernel, edumazet, linux-audit, xiyou.wangcong,
dvyukov
In-Reply-To: <5714bd7468cfec225407a6c367e658478d590495.1481534171.git.rgb@redhat.com>
On Mon, Dec 12, 2016 at 5:03 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> Resetting audit_sock appears to be racy.
>
> audit_sock was being copied and dereferenced without using a refcount on
> the source sock.
>
> Bump the refcount on the underlying sock when we store a refrence in
> audit_sock and release it when we reset audit_sock. audit_sock
> modification needs the audit_cmd_mutex.
>
> See: https://lkml.org/lkml/2016/11/26/232
>
> Thanks to Eric Dumazet <edumazet@google.com> and Cong Wang
> <xiyou.wangcong@gmail.com> on ideas how to fix it.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
> There has been a lot of change in the audit code that is about to go
> upstream to address audit queue issues. This patch is based on the
> source tree: git://git.infradead.org/users/pcmoore/audit#next
> ---
> kernel/audit.c | 34 ++++++++++++++++++++++++++++------
> 1 files changed, 28 insertions(+), 6 deletions(-)
This is coming in pretty late for the v4.10 merge window, much later
than I would usually take things, but this is arguably important, and
(at first glance) relatively low risk - what testing have you done on
this?
> diff --git a/kernel/audit.c b/kernel/audit.c
> index f20eee0..439f7f3 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -452,7 +452,9 @@ static void auditd_reset(void)
> struct sk_buff *skb;
>
> /* break the connection */
> + sock_put(audit_sock);
> audit_pid = 0;
> + audit_nlk_portid = 0;
> audit_sock = NULL;
>
> /* flush all of the retry queue to the hold queue */
> @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff *skb)
> if (rc >= 0) {
> consume_skb(skb);
> rc = 0;
> + } else {
> + if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
> + mutex_lock(&audit_cmd_mutex);
> + auditd_reset();
> + mutex_unlock(&audit_cmd_mutex);
> + }
> }
>
> return rc;
> @@ -579,7 +587,9 @@ static int kauditd_thread(void *dummy)
>
> auditd = 0;
> if (AUDITD_BAD(rc, reschedule)) {
> + mutex_lock(&audit_cmd_mutex);
> auditd_reset();
> + mutex_unlock(&audit_cmd_mutex);
> reschedule = 0;
> }
> } else
> @@ -594,7 +604,9 @@ static int kauditd_thread(void *dummy)
> auditd = 0;
> if (AUDITD_BAD(rc, reschedule)) {
> kauditd_hold_skb(skb);
> + mutex_lock(&audit_cmd_mutex);
> auditd_reset();
> + mutex_unlock(&audit_cmd_mutex);
> reschedule = 0;
> } else
> /* temporary problem (we hope), queue
> @@ -623,7 +635,9 @@ quick_loop:
> if (rc) {
> auditd = 0;
> if (AUDITD_BAD(rc, reschedule)) {
> + mutex_lock(&audit_cmd_mutex);
> auditd_reset();
> + mutex_unlock(&audit_cmd_mutex);
> reschedule = 0;
> }
>
> @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> return -EACCES;
> }
> if (audit_pid && new_pid &&
> - audit_replace(requesting_pid) != -ECONNREFUSED) {
> + (audit_replace(requesting_pid) & (-ECONNREFUSED|-EPERM|-ENOMEM))) {
> audit_log_config_change("audit_pid", new_pid, audit_pid, 0);
> return -EEXIST;
> }
> if (audit_enabled != AUDIT_OFF)
> audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
> - audit_pid = new_pid;
> - audit_nlk_portid = NETLINK_CB(skb).portid;
> - audit_sock = skb->sk;
> - if (!new_pid)
> + if (new_pid) {
> + if (audit_sock)
> + sock_put(audit_sock);
> + audit_pid = new_pid;
> + audit_nlk_portid = NETLINK_CB(skb).portid;
> + sock_hold(skb->sk);
> + audit_sock = skb->sk;
> + } else {
> auditd_reset();
> + }
> wake_up_interruptible(&kauditd_wait);
> }
> if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
> @@ -1283,8 +1302,11 @@ static void __net_exit audit_net_exit(struct net *net)
> {
> struct audit_net *aunet = net_generic(net, audit_net_id);
> struct sock *sock = aunet->nlsk;
> - if (sock == audit_sock)
> + if (sock == audit_sock) {
> + mutex_lock(&audit_cmd_mutex);
> auditd_reset();
> + mutex_unlock(&audit_cmd_mutex);
> + }
>
> RCU_INIT_POINTER(aunet->nlsk, NULL);
> synchronize_net();
> --
> 1.7.1
>
> --
> Linux-audit mailing list
> Linux-audit@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit
--
paul moore
www.paul-moore.com
^ permalink raw reply
* Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: Jason Gunthorpe @ 2016-12-12 17:07 UTC (permalink / raw)
To: Selvin Xavier
Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CA+sbYW1ZEa5fndGkvN8OXr-orcUx4jaL73Di8zBJQX_uCdK=Ww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Sat, Dec 10, 2016 at 11:06:58AM +0530, Selvin Xavier wrote:
> On Fri, Dec 9, 2016 at 12:17 PM, Selvin Xavier
> <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org> wrote:
> > I am preparing a git repository with these changes as per Jason's
> > comment and will share the details later today.
>
> Please use bnxt_re branch in this git repository.
>
> https://github.com/Broadcom/linux-rdma-nxt.git
Why are you using __packed in bnxt_re_uverbs_abi.h ? that doesn't seem
necessary. It is a good idea to make sure all those structures are a
multiple of 64 bits (add explicit reserved fields), and make sure you
test 32 bit verbs as well.
Why are you using debugfs just to export counters? Isn't the core code
counter framework good enough?
Please try and avoid writing functions as defines (eg rdev_to_dev,
to_bnxt_re, SQE_PG, RCFW_CMDQ_COOKIE, PTR_PG etc)
There is something wrong with the tabs and spaces (see
https://github.com/Broadcom/linux-rdma-nxt/blob/03e23b087f7e86ea28656273994e065827210ce5/drivers/infiniband/hw/bnxtre/bnxt_re_hsi.h)
FWIW, I really dislike the column alignment style, it is so hard to
maintain..
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net-next 0/2] Add ethtool set regs support
From: Florian Fainelli @ 2016-12-12 17:00 UTC (permalink / raw)
To: Andrew Lunn, Saeed Mahameed
Cc: Saeed Mahameed, David S. Miller, Linux Netdev List,
John W . Linville
In-Reply-To: <20161211152229.GC29761@lunn.ch>
On 12/11/2016 07:22 AM, Andrew Lunn wrote:
> On Sun, Dec 11, 2016 at 02:18:00PM +0200, Saeed Mahameed wrote:
>> On Wed, Dec 7, 2016 at 4:41 AM, Andrew Lunn <andrew@lunn.ch> wrote:
>>> On Wed, Dec 07, 2016 at 12:33:08AM +0200, Saeed Mahameed wrote:
>>>> Hi Dave,
>>>>
>>>> This series adds the support for setting device registers from user
>>>> space ethtool.
>>>
>>> Is this not the start of allowing binary only drivers in user space?
>>>
>>
>> It is not, we want to do same as set_eeprom already do,
>> Just set some HW registers, for analysis/debug/tweak/configure HW
>> dependent register for the NIC netdev sake.
>
> Mellanox has a good reputation of open drivers. However, this API
> sounds like it would be the first step towards user space
> drivers. This is an API which can peek and poke registers, so it
> probably could be mis-used to put part of a driver in user
> space. Hence we should avoid this sort of API to start with.
I don't necessarily share your concerns here on the proprietary vs. open
source driver, because this interface is limited to the register space,
not the data path, there is only a handful of things you can do here,
but getting a NIC to work, is not probably one of them.
My concern is more with the support/debugging aspect, if someone starts
tweaking a gazillion of registers through that interface, and I have no
way to tell, how am I going to support that? Of course, the first step
is: have you tried to turn it on and off again, and see if this is
reproducible, but what if I was asked/told to tweak this or that value
first, etc... it can be hard to collect the exact state in which a NIC
is at the time of the problem.
NB: on the proprietary driver side, you can already mmap() the PCI
device's space and write an entire user-space based driver (DPDK) and
bypass the kernel for most things, ethtool -D is not much worse here
since it just offers a subset (and a small one) of that.
--
Florian
^ permalink raw reply
* Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: Jonathan Toppins @ 2016-12-12 16:54 UTC (permalink / raw)
To: Selvin Xavier, dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1481266096-23331-1-git-send-email-selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
On 12/09/2016 01:47 AM, Selvin Xavier wrote:
> This series introduces the RoCE driver for the Broadcom
> NetXtreme-E 10/25/40/50 gigabit RoCE HCAs.
> This driver is dependent on the bnxt_en NIC driver and is
> based on the bnxt_re branch in Doug's repository. bnxt_en changes
> required for this patch series is already available in this branch.
>
> I am preparing a git repository with these changes as per Jason's
> comment and will share the details later today.
>
> v1-> v2:
> * The license text in each file updated to reflect Dual license.
> * Makefile and Kconfig changes are pushed to the last patch
> * Moved bnxt_re_uverbs_abi.h to include/uapi/rdma folder
> * Remove duplicate structure definitions from bnxt_re_hsi.h as
> it is available in the corresponding bnxt_en header file (bnxt_hsi.h)
> * Removed some unused code reported during code review.
> * Fixed few sparse warnings
>
I get the following sparse errors (filtered for only bnxt_re ones),
please let me know if they are false positives:
$ make C=2 drivers/net/ethernet/broadcom/bnxt/bnxt_en.ko
drivers/infiniband/hw/bnxtre/bnxt_re.ko
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
CHK include/generated/utsrelease.h
CHECK arch/x86/purgatory/purgatory.c
[...]
CHECK arch/x86/purgatory/sha256.c
CHECK arch/x86/purgatory/string.c
[...]
CHK include/generated/bounds.h
CHK include/generated/timeconst.h
CHK include/generated/asm-offsets.h
CALL scripts/checksyscalls.sh
CHECK scripts/mod/empty.c
CHECK drivers/net/ethernet/broadcom/bnxt/bnxt.c
CHECK drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
CHECK drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
CHECK drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
CHECK drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
MODPOST 2 modules
CHECK drivers/infiniband/hw/bnxtre/bnxt_re_main.c
CHECK drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c
[...]
CHECK drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.c
CHECK drivers/infiniband/hw/bnxtre/bnxt_qplib_res.c
drivers/infiniband/hw/bnxtre/bnxt_qplib_res.c:729:6: warning: symbol
'bnxt_qplib_cleanup_pkey_tbl' was not declared. Should it be static?
CHECK drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c
CHECK drivers/infiniband/hw/bnxtre/bnxt_qplib_sp.c
CHECK drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c
drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1015:22: warning: context
imbalance in 'bnxt_qplib_lock_cqs' - wrong count at exit
drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1030:28: warning: context
imbalance in 'bnxt_qplib_unlock_cqs' - unexpected unlock
MODPOST 2 modules
-Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: fib_frontend: Add network specific broadcasts, when it takes a sense
From: Hannes Frederic Sowa @ 2016-12-12 16:52 UTC (permalink / raw)
To: Brandon Philips, Jiri Benc; +Cc: netdev, Tom Denham, Aaron Levy, Brad Ison
In-Reply-To: <CAD2oYtPPsvHgitejXTMAYa8qVvkibdPruEUcv8hhhiCfgfOvvw@mail.gmail.com>
On 12.12.2016 15:44, Brandon Philips wrote:
> On Mon, Dec 12, 2016 at 9:30 AM, Jiri Benc <jbenc@redhat.com> wrote:
>> On Fri, 9 Dec 2016 15:41:52 -0800, Brandon Philips wrote:
>>> The issue we have: when creating the VXLAN interface and assigning it
>>> an address we see a broadcast route being added by the Kernel. For
>>> example if we have 10.4.0.0/16 a broadcast route to 10.4.0.0 is
>>> created. This route is unwanted because we assign 10.4.0.0 to one of
>>> our VXLAN interfaces.
>>
>> Are you saying you're trying to assign the IP address 10.4.0.0/16 as a
>> unicast address to an interface? Then you'll run into way more problems
>> than the one you're describing. You can't have host part of the IP
>> address consisting of all zeros (or all ones). Just don't do it. Choose
>> a valid IP address instead.
>
> Yes, this is what we are doing; it is because of an upstream, to us,
> address assignment so I will figure it out upstream.
>
> Regardless, it is hard to find an RFC that says "simply don't do this
> because _____". The closest I could find was RFC 922 after sending
> this which says:
>
> "There is probably no reason for such addresses to appear anywhere but
> as the source address of an ICMP Information".
Alternatively you can renumber the network to use /32 and add the
unicast routes for your /16 yourself.
Bye,
Hannes
^ permalink raw reply
* RE: Re: Synopsys Ethernet QoS
From: Stephen Warren @ 2016-12-12 16:46 UTC (permalink / raw)
To: Niklas Cassel, Joao Pinto, Florian Fainelli, Andy Shevchenko
Cc: David Miller, Giuseppe CAVALLARO, larper@axis.com,
rabinv@axis.com, netdev, CARLOS.PALMINHA@synopsys.com,
Jie.Deng1@synopsys.com, pavel@ucw.cz, Thierry Reding
In-Reply-To: <1d445ec1-deb8-6e36-39c4-6813c446095f@axis.com>
Niklas Cassel wrote at Monday, December 12, 2016 9:25 AM:
...
> However, I've noticed that NVIDIA has extended the DWC EQoS DT binding,
> I don't how easy it would be for them to switch to stmmac's DT binding.
> (Adding Stephen Warren to CC.)
I don't believe there's any issue switching drivers, so long as the new one
works on our HW. However, we can't switch DT binding since that's an ABI.
So, if we switch drivers, the "new" driver needs to support all existing
DT bindings.
FWIW, I'd recommend that we don't name the "new" driver stmmac or anything
like that. Rather, name it after the IP block so that any new user of the same
IP block will find the name they expect in the source tree, and just use it.
Renaming the "new" driver to dwc_eqos or similar might be one way to achieve
that.
--
nvpublic
^ permalink raw reply
* Re: [PATCH net-next 1/3] net:dsa:mv88e6xxx: use hashtable to store multicast entries
From: Florian Fainelli @ 2016-12-12 16:37 UTC (permalink / raw)
To: Volodymyr Bendiuga, Vivien Didelot
Cc: Volodymyr Bendiuga, andrew, netdev, Volodymyr Bendiuga
In-Reply-To: <CAMr9Lbp5eCg1oyWGN+uiDEcF0VZuKUi87FH6JYTGj6pL82R+Mw@mail.gmail.com>
On 12/12/2016 07:22 AM, Volodymyr Bendiuga wrote:
> Hi,
>
> I apologise for incorrectly formatted patch, I will fix and resend it.
> The problem with the ATU right now is that it is too slow when inserting
> entries.
> When the OS boots up, it might insert some multicast entries into the
> atu (if
> they are preconfigured by user). I run a test with 10 mc entries being
> configured for
> each port (13 ports), and it took 15 seconds, which made system quite
> slow on responding to
> other commands, as it has been inserting mc entries. The implementation
> with hashtable
> made insert command for 13 ports and 10 entries per port about 700 msec
> long.
Just wondering how do you achieve such speed up? What part of using a
hashtable allows you not to write down all 10 MC entries across the
ports? I would assume that the number of MDIO (is that the bus you are
using here?) operations would be identical.
Seeing such a change makes me wonder if we should not try to push some
of this hashtable abstraction (provided that we agree we want it) at a
higher layer, like net/dsa/slave.c?
Thanks!
--
Florian
^ permalink raw reply
* Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device
From: Florian Fainelli @ 2016-12-12 16:33 UTC (permalink / raw)
To: maowenan, David Laight, netdev@vger.kernel.org,
dingtianhong@huawei.com, weiyongjun (A)
In-Reply-To: <7902ba31-de5d-27cf-f631-4a88a78b246f@huawei.com>
On 12/12/2016 12:49 AM, maowenan wrote:
>
>
> On 2016/12/5 16:47, maowenan wrote:
>>
>>
>> On 2016/12/2 17:45, David Laight wrote:
>>> From: Mao Wenan
>>>> Sent: 30 November 2016 10:23
>>>> The nic in my board use the phy dev from marvell, and the system will
>>>> load the marvell phy driver automatically, but when I remove the phy
>>>> drivers, the system immediately panic:
>>>> Call trace:
>>>> [ 2582.834493] [<ffff800000715384>] phy_state_machine+0x3c/0x438 [
>>>> 2582.851754] [<ffff8000000db3b8>] process_one_work+0x150/0x428 [
>>>> 2582.868188] [<ffff8000000db7d4>] worker_thread+0x144/0x4b0 [
>>>> 2582.883882] [<ffff8000000e1d0c>] kthread+0xfc/0x110
>>>>
>>>> there should be proper reference counting in place to avoid that.
>>>> I found that phy_attach_direct() forgets to add phy device driver
>>>> reference count, and phy_detach() forgets to subtract reference count.
>>>> This patch is to fix this bug, after that panic is disappeared when remove
>>>> marvell.ko
>>>>
>>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>>>> ---
>>>> drivers/net/phy/phy_device.c | 7 +++++++
>>>> 1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>> index 1a4bf8a..a7ec7c2 100644
>>>> --- a/drivers/net/phy/phy_device.c
>>>> +++ b/drivers/net/phy/phy_device.c
>>>> @@ -866,6 +866,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>>> return -EIO;
>>>> }
>>>>
>>>> + if (!try_module_get(d->driver->owner)) {
>>>> + dev_err(&dev->dev, "failed to get the device driver module\n");
>>>> + return -EIO;
>>>> + }
>>>
>>> If this is the phy code, what stops the phy driver being unloaded
>>> before the try_module_get() obtains a reference.
>>> If it isn't the phy driver then there ought to be a reference count obtained
>>> when the phy driver is located (by whatever decides which phy driver to use).
>>> Even if that code later releases its reference (it probably shouldn't on success)
>>> then you can't fail to get an extra reference here.
>>
>> [Mao Wenan]Yes, this is phy code, in function phy_attach_direct(), drivers/net/phy/phy_device.c.
>> when one NIC driver to do probe behavior, it will attach one matched phy driver. phy_attach_direct()
>> is to obtain phy driver reference and bind phy driver, if try_module_get() execute on success, the reference
>> count is added; if failed, the driver can't be attached to this NIC, and it can't added the phy driver
>> reference count. So before try_module_get obtains a reference, phy driver can't can't be bound to this NIC.
>> when the phy driver is attached to NIC, the reference count is added, if someone remove phy driver directly,
>> it will be failed because reference count is not equal to 0.
>>
>> An example of call trace when there is NIC driver to attch one phy driver:
>> hns_nic_dev_probe->hns_nic_try_get_ae->hns_nic_init_phy->of_phy_connect->phy_connect_direct->phy_attach_direct
>>
>> Consider the steps of phy driver(marvell.ko) added and removed, and NIC driver(hns_enet_drv.ko) added and removed:
>> 1)insmod marvell ref=0
>> 2)insmod hns_enet_drv ref=1
>> 3)rmmod marvell (should not on success, ref=1)
>> 4)rmmod hns_enet_drv ref=0
>> 5)rmmod marvell (should on success, because ref=0)
>>
>> if we don't add the reference count in phy_attach_direct(the second step ref=0), so the third step rmmod marvell will
>> be panic, because there is one user remain use marvell driver and phy_stat_machine use the NULL drv pointer.
>>
>>>
>>>> +
>>>> get_device(d);
>>>>
>>>> /* Assume that if there is no driver, that it doesn't
>>>> @@ -921,6 +926,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>>>
>>>> error:
>>>> put_device(d);
>>>> + module_put(d->driver->owner);
>>>
>>> Are those two in the wrong order ?
>>>
>>>> module_put(bus->owner);
>>>> return err;
>>>> }
>>>> @@ -998,6 +1004,7 @@ void phy_detach(struct phy_device *phydev)
>>>> bus = phydev->mdio.bus;
>>>>
>>>> put_device(&phydev->mdio.dev);
>>>> + module_put(phydev->mdio.dev.driver->owner);
>>>> module_put(bus->owner);
>>>
>>> Where is this code called from?
>>> You can't call it from the phy driver because the driver can be unloaded
>>> as soon as the last reference is removed.
>>> At that point the code memory is freed.
>>
>> [Mao Wenan] it is called by NIC when it is removed, which aims to disconnect one bound phy driver. If this phy driver
>> is not used for this NIC, reference count should be subtracted, and phy driver can be removed if there is no user.
>> hns_nic_dev_remove->phy_disconnect->phy_detach
>>
>>
>>
>>>
>>>> }
>>>> EXPORT_SYMBOL(phy_detach);
>>>> --
>>>> 2.7.0
>>>>
>>>
>>>
>>> .
>>>
>
> @Florian Fainelli, what's your comments about this patch?
I am trying to reproduce what you are seeing, but at first glance is
looks like an appropriate solution to me. Do you mind giving me a couple
more days?
Thanks!
--
Florian
^ permalink raw reply
* Re: Misalignment, MIPS, and ip_hdr(skb)->version
From: Måns Rullgård @ 2016-12-12 16:31 UTC (permalink / raw)
To: David Laight
Cc: linux-mips@linux-mips.org, Netdev, LKML, David Miller,
WireGuard mailing list, Felix Fietkau
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB023BF78@AcuExch.aculab.com>
David Laight <David.Laight@ACULAB.COM> writes:
> From: Måns Rullgård
>> Sent: 10 December 2016 13:25
> ...
>> I solved this problem in an Ethernet driver by copying the initial part
>> of the packet to an aligned skb and appending the remainder using
>> skb_add_rx_frag(). The kernel network stack only cares about the
>> headers, so the alignment of the packet payload doesn't matter.
>
> That rather depends on where the packet payload ends up.
> It is likely that it will be copied to userspace (or maybe
> into some aligned kernel buffer).
> In which case you get an expensive misaligned copy.
There's not much to be done about that. The only other option is to
bypass DMA entirely, and that's sure to be even worse.
> What do the hardware engineers think the ethernet interface will
> be used for!
Ticking boxes in marketing material.
--
Måns Rullgård
^ permalink raw reply
* Re: [PATCH 0/2] net: ethernet: hisilicon: set dev->dev.parent before PHY connect
From: Florian Fainelli @ 2016-12-12 16:29 UTC (permalink / raw)
To: Dongpo Li, yisen.zhuang, salil.mehta, davem
Cc: xuejiancheng, benjamin.chenhao, caizhiyong, netdev, linux-kernel
In-Reply-To: <1481544223-207906-1-git-send-email-lidongpo@hisilicon.com>
On 12/12/2016 04:03 AM, Dongpo Li wrote:
> This patch series builds atop:
> ec988ad78ed6d184a7f4ca6b8e962b0e8f1de461 ("phy: Don't increment MDIO bus
> refcount unless it's a different owner")
>
> I have checked all the hisilicon ethernet driver and found only two drivers
> need to be fixed to make sure set dev->dev.parent before PHY connect.
Thanks for doing this, these two drivers did not show up on my list
because I did not see them calling SET_NETDEV_DEV() so late.
--
Florian
^ permalink raw reply
* Re: [PATCH 2/2] net: ethernet: hip04: Call SET_NETDEV_DEV()
From: Florian Fainelli @ 2016-12-12 16:28 UTC (permalink / raw)
To: Dongpo Li, yisen.zhuang, salil.mehta, davem
Cc: xuejiancheng, benjamin.chenhao, caizhiyong, netdev, linux-kernel
In-Reply-To: <1481544223-207906-3-git-send-email-lidongpo@hisilicon.com>
On 12/12/2016 04:03 AM, Dongpo Li wrote:
> The hip04 driver calls into PHYLIB which now checks for
> net_device->dev.parent, so make sure we do set it before calling into
> any MDIO/PHYLIB related function.
>
> Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a different owner")
> Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [PATCH 1/2] net: ethernet: hisi_femac: Call SET_NETDEV_DEV()
From: Florian Fainelli @ 2016-12-12 16:28 UTC (permalink / raw)
To: Dongpo Li, yisen.zhuang, salil.mehta, davem
Cc: xuejiancheng, benjamin.chenhao, caizhiyong, netdev, linux-kernel
In-Reply-To: <1481544223-207906-2-git-send-email-lidongpo@hisilicon.com>
On 12/12/2016 04:03 AM, Dongpo Li wrote:
> The hisi_femac driver calls into PHYLIB which now checks for
> net_device->dev.parent, so make sure we do set it before calling into
> any MDIO/PHYLIB related function.
>
> Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a different owner")
> Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: Re: Synopsys Ethernet QoS
From: Niklas Cassel @ 2016-12-12 16:25 UTC (permalink / raw)
To: Joao Pinto, Florian Fainelli, Andy Shevchenko
Cc: David Miller, Giuseppe CAVALLARO, larper, rabinv, netdev,
CARLOS.PALMINHA, Jie.Deng1, Stephen Warren, pavel
In-Reply-To: <f5357ea2-a295-ab08-046e-f8b8f6ca4344@synopsys.com>
On 12/12/2016 11:19 AM, Joao Pinto wrote:
> Hi,
>
> Às 1:44 AM de 12/10/2016, Florian Fainelli escreveu:
>> Le 12/09/16 à 16:16, Andy Shevchenko a écrit :
>>> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>
>>>> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe)
>>>> did
>>>> actually pioneer the upstreaming effort, but it is good to see people
>>>> from Synopsys willing to fix that in the future.
>>> Wait, you would like to tell that we have more than 2 drivers for the
>>> same (okay, same vendor) IP?!
>>> It's better to unify them earlier, than have n+ copies.
>> Unfortunately that is the case, see this email:
>>
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg142796.html
>>
>> dwc_eth_qos and stmmac have some overlap. There seems to be work
>> underway to unify these two to begin with.
>>
>>> P.S. Though, I don't see how sxgbe got in the list. First glance on
>>> the code doesn't show similarities.
>> Well samsung/sxgbe looks potentially similar to amd/xgbe, but that's
>> just my cursory look at the code, it may very well be something entirely
>> different. The descriptor formats just look suspiciously similar.
>>
> Thank you for your inputs! Renaming seems to be a hotspot. I agree that maybe
> instead of renaming (breaking retro-compatibility as David and Florian
> mentioned), the best is to move stmmac to synopsys/ after merging *qos* and
> removing it. As Florian mentioned, git is capable of detecting folder restructured.
>
> @Rabin Vincent: Hi Rabin. Since Axis is more familiar with the synopsys/*qos*
> driver would it be possible for you to make an initial analysis of what has to
> be merged into Stmmac? This way the development would speed-up.
I can answer that question.
I've sent out 12 patches to the stmmac driver
(all patches are included in the current net-next tree),
with these patches the stmmac driver works properly on Axis hardware
(we use Synopsys GMAC 4.10a synthesized with multiple TX queues).
stmmac's DT binding has also been extended with properties that
existed in DWC EQoS's DT binding, such as no-pbl-x8, txpbl, rxpbl.
Since we have no problem updating the DTB together with the kernel,
we will simply move to using the start using the stmmac driver,
with stmmac's DT binding.
However, I've noticed that NVIDIA has extended the DWC EQoS DT binding,
I don't how easy it would be for them to switch to stmmac's DT binding.
(Adding Stephen Warren to CC.)
The reset sequence that Lars Persson was worried about is not an issue
with the stmmac driver.
There are some performance problems with the stmmac driver though:
When running iperf3 with 3 streams:
iperf3 -c 192.168.0.90 -P 3 -t 30
iperf3 -c 192.168.0.90 -P 3 -t 30 -R
I get really bad fairness between the streams.
This appears to be an issue with how TX IRQ coalescing is implemented in stmmac.
Disabling TX IRQ coalescing in the stmmac driver makes the problem go away.
We have a local patch that implements TX IRQ coalescing in the dwceqos driver,
and we don't see the same problem.
Also netperf TCP_RR and UDP_RR gives really bad results compared to the
dwceqos driver (without IRQ coalescing).
2000 transactions/sec vs 9000 transactions/sec.
Turning TX IRQ coalescing off and RX interrupt watchdog off in stmmac
gives the same performance. I guess it's a trade off, low CPU usage
vs low latency, so I don't know how important TCP_RR/UDP_RR really is.
The best thing would be to get a good working TX IRQ coalesce
implementation with HR timers in stmmac.
Perhaps it should also be investigated if the RX interrupt watchdog
timeout should have a lower default value.
>
> Thanks to all.
>
> Joao
^ permalink raw reply
* Re: [PATCH net-next 1/3] net:dsa:mv88e6xxx: use hashtable to store multicast entries
From: Vivien Didelot @ 2016-12-12 16:22 UTC (permalink / raw)
To: Volodymyr Bendiuga, andrew, f.fainelli, netdev,
volodymyr.bendiuga
Cc: Volodymyr Bendiuga
In-Reply-To: <1481549958-1265-1-git-send-email-volodymyr.bendiuga@gmail.com>
Hi Volodymyr,
Volodymyr Bendiuga <volodymyr.bendiuga@gmail.com> writes:
> +struct pvec_tbl_entry {
> + struct hlist_node entry;
> + u32 key_crc32; /* key */
> + u16 pvec;
> + struct pvec_tbl_key {
> + u8 addr[ETH_ALEN];
> + u16 fid;
> + } key;
> +};
> +
> struct mv88e6xxx_atu_entry {
> u16 fid;
> u8 state;
Also, if we were to cache some values in mv88e6xxx, I'd use the existing
structures which match the hardware definition. Easier to understand.
mv88e6xxx_atu_entry already represents an ATU entry with its port
vector, FID, MAC address and more. Do you think it can be used here?
Thanks,
Vivien
^ permalink raw reply
* RE: Misalignment, MIPS, and ip_hdr(skb)->version
From: David Laight @ 2016-12-12 16:19 UTC (permalink / raw)
To: 'Måns Rullgård', Felix Fietkau
Cc: Jason A. Donenfeld, David Miller, Netdev, WireGuard mailing list,
LKML, linux-mips@linux-mips.org
In-Reply-To: <yw1x37hvykzk.fsf@unicorn.mansr.com>
From: Måns Rullgård
> Sent: 10 December 2016 13:25
...
> I solved this problem in an Ethernet driver by copying the initial part
> of the packet to an aligned skb and appending the remainder using
> skb_add_rx_frag(). The kernel network stack only cares about the
> headers, so the alignment of the packet payload doesn't matter.
That rather depends on where the packet payload ends up.
It is likely that it will be copied to userspace (or maybe
into some aligned kernel buffer).
In which case you get an expensive misaligned copy.
Encapsulation protocols not using headers that are multiples
of 4 bytes is as stupid as ethernet hardware that can't place
the mac address on a 4n+2 boundary.
The latter is particularly stupid when it happens on embedded
silicon with a processor that can only do aligned memory accesses.
What do the hardware engineers think the ethernet interface will
be used for!
David
^ permalink raw reply
* [PATCHv2 5/5] sh_eth: enable wake-on-lan for sh7763
From: Niklas Söderlund @ 2016-12-12 16:09 UTC (permalink / raw)
To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
Cc: Geert Uytterhoeven, Niklas Söderlund
In-Reply-To: <20161212160931.6478-1-niklas.soderlund+renesas@ragnatech.se>
This is based on public datasheet for sh7763 which shows it have the
same behavior and registers for WoL as other versions of sh_eth.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
drivers/net/ethernet/renesas/sh_eth.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 927de77..6e80457 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -851,6 +851,7 @@ static struct sh_eth_cpu_data sh7763_data = {
.no_ade = 1,
.tsu = 1,
.irq_flags = IRQF_SHARED,
+ .magic = 1,
};
static struct sh_eth_cpu_data sh7619_data = {
--
2.10.2
^ permalink raw reply related
* [PATCHv2 3/5] sh_eth: enable wake-on-lan for r8a7740/armadillo
From: Niklas Söderlund @ 2016-12-12 16:09 UTC (permalink / raw)
To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
Cc: Geert Uytterhoeven, Niklas Söderlund
In-Reply-To: <20161212160931.6478-1-niklas.soderlund+renesas@ragnatech.se>
Geert Uytterhoeven reported WoL worked on his Armadillo board.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
drivers/net/ethernet/renesas/sh_eth.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 348ed22..fafde6f 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -577,6 +577,7 @@ static struct sh_eth_cpu_data r8a7740_data = {
.tsu = 1,
.select_mii = 1,
.shift_rd0 = 1,
+ .magic = 1,
};
/* There is CPU dependent code */
--
2.10.2
^ permalink raw reply related
* [PATCHv2 0/5] sh_eth: add wake-on-lan support via magic packet
From: Niklas Söderlund @ 2016-12-12 16:09 UTC (permalink / raw)
To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
Cc: Geert Uytterhoeven, Niklas Söderlund
Hi,
This series adds support for Wake-on-Lan using Magic Packet for a few
models of the sh_eth driver. Patch 1/5 adds generic support to control
and support WoL while patches 2/5 - 5/5 enable different models.
Changes since v1.
- Spit generic WoL functionality and device enablement to different
patches.
- Enable more devices then Gen2 after feedback from Geert and
datasheets.
- Do not set mdp->irq_enabled = false and remove specific MagicPacket
interrupt clearing, instead let sh_eth_error() clear the interrupt as
for other EMAC interrupts, thanks Sergei for the suggestion.
- Use the original return logic in sh_eth_resume().
- Moved sh_eth_private variable *clk to top of data structure to avoid
possible gaps due to alignment restrictions.
- Make wol_enabled in sh_eth_private part of the already existing
bitfield instead of a bool.
- Do not initiate mdp->wol_enabled to 0, the struct is kzalloc'ed so
it's already set to 0.
Niklas Söderlund (5):
sh_eth: add generic wake-on-lan support via magic packet
sh_eth: enable wake-on-lan for Gen2 devices
sh_eth: enable wake-on-lan for r8a7740/armadillo
sh_eth: enable wake-on-lan for sh7743
sh_eth: enable wake-on-lan for sh7763
drivers/net/ethernet/renesas/sh_eth.c | 121 +++++++++++++++++++++++++++++++---
drivers/net/ethernet/renesas/sh_eth.h | 3 +
2 files changed, 114 insertions(+), 10 deletions(-)
--
2.10.2
^ permalink raw reply
* [PATCHv2 4/5] sh_eth: enable wake-on-lan for sh7743
From: Niklas Söderlund @ 2016-12-12 16:09 UTC (permalink / raw)
To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
Cc: Geert Uytterhoeven, Niklas Söderlund
In-Reply-To: <20161212160931.6478-1-niklas.soderlund+renesas@ragnatech.se>
This is based on public datasheet for sh7743 which shows it have the
same behavior and registers for WoL as other versions of sh_eth.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
drivers/net/ethernet/renesas/sh_eth.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index fafde6f..927de77 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -822,6 +822,7 @@ static struct sh_eth_cpu_data sh7734_data = {
.tsu = 1,
.hw_crc = 1,
.select_mii = 1,
+ .magic = 1,
};
/* SH7763 */
--
2.10.2
^ permalink raw reply related
* [PATCHv2 2/5] sh_eth: enable wake-on-lan for Gen2 devices
From: Niklas Söderlund @ 2016-12-12 16:09 UTC (permalink / raw)
To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
Cc: Geert Uytterhoeven, Niklas Söderlund
In-Reply-To: <20161212160931.6478-1-niklas.soderlund+renesas@ragnatech.se>
Tested on Gen2 r8a7791/Koelsch.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
drivers/net/ethernet/renesas/sh_eth.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 87640b9..348ed22 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -624,8 +624,9 @@ static struct sh_eth_cpu_data r8a779x_data = {
.register_type = SH_ETH_REG_FAST_RCAR,
- .ecsr_value = ECSR_PSRTO | ECSR_LCHNG | ECSR_ICD,
- .ecsipr_value = ECSIPR_PSRTOIP | ECSIPR_LCHNGIP | ECSIPR_ICDIP,
+ .ecsr_value = ECSR_PSRTO | ECSR_LCHNG | ECSR_ICD | ECSR_MPD,
+ .ecsipr_value = ECSIPR_PSRTOIP | ECSIPR_LCHNGIP | ECSIPR_ICDIP |
+ ECSIPR_MPDIP,
.eesipr_value = 0x01ff009f,
.tx_check = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
@@ -641,6 +642,7 @@ static struct sh_eth_cpu_data r8a779x_data = {
.tpauser = 1,
.hw_swap = 1,
.rmiimode = 1,
+ .magic = 1,
};
#endif /* CONFIG_OF */
--
2.10.2
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox