* Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: Jonathan Toppins @ 2016-12-12 16:54 UTC (permalink / raw)
To: Selvin Xavier, dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1481266096-23331-1-git-send-email-selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
On 12/09/2016 01:47 AM, Selvin Xavier wrote:
> This series introduces the RoCE driver for the Broadcom
> NetXtreme-E 10/25/40/50 gigabit RoCE HCAs.
> This driver is dependent on the bnxt_en NIC driver and is
> based on the bnxt_re branch in Doug's repository. bnxt_en changes
> required for this patch series is already available in this branch.
>
> I am preparing a git repository with these changes as per Jason's
> comment and will share the details later today.
>
> v1-> v2:
> * The license text in each file updated to reflect Dual license.
> * Makefile and Kconfig changes are pushed to the last patch
> * Moved bnxt_re_uverbs_abi.h to include/uapi/rdma folder
> * Remove duplicate structure definitions from bnxt_re_hsi.h as
> it is available in the corresponding bnxt_en header file (bnxt_hsi.h)
> * Removed some unused code reported during code review.
> * Fixed few sparse warnings
>
I get the following sparse errors (filtered for only bnxt_re ones),
please let me know if they are false positives:
$ make C=2 drivers/net/ethernet/broadcom/bnxt/bnxt_en.ko
drivers/infiniband/hw/bnxtre/bnxt_re.ko
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
CHK include/generated/utsrelease.h
CHECK arch/x86/purgatory/purgatory.c
[...]
CHECK arch/x86/purgatory/sha256.c
CHECK arch/x86/purgatory/string.c
[...]
CHK include/generated/bounds.h
CHK include/generated/timeconst.h
CHK include/generated/asm-offsets.h
CALL scripts/checksyscalls.sh
CHECK scripts/mod/empty.c
CHECK drivers/net/ethernet/broadcom/bnxt/bnxt.c
CHECK drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
CHECK drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
CHECK drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
CHECK drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
MODPOST 2 modules
CHECK drivers/infiniband/hw/bnxtre/bnxt_re_main.c
CHECK drivers/infiniband/hw/bnxtre/bnxt_re_ib_verbs.c
[...]
CHECK drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.c
CHECK drivers/infiniband/hw/bnxtre/bnxt_qplib_res.c
drivers/infiniband/hw/bnxtre/bnxt_qplib_res.c:729:6: warning: symbol
'bnxt_qplib_cleanup_pkey_tbl' was not declared. Should it be static?
CHECK drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c
CHECK drivers/infiniband/hw/bnxtre/bnxt_qplib_sp.c
CHECK drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c
drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1015:22: warning: context
imbalance in 'bnxt_qplib_lock_cqs' - wrong count at exit
drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1030:28: warning: context
imbalance in 'bnxt_qplib_unlock_cqs' - unexpected unlock
MODPOST 2 modules
-Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: fib_frontend: Add network specific broadcasts, when it takes a sense
From: Hannes Frederic Sowa @ 2016-12-12 16:52 UTC (permalink / raw)
To: Brandon Philips, Jiri Benc; +Cc: netdev, Tom Denham, Aaron Levy, Brad Ison
In-Reply-To: <CAD2oYtPPsvHgitejXTMAYa8qVvkibdPruEUcv8hhhiCfgfOvvw@mail.gmail.com>
On 12.12.2016 15:44, Brandon Philips wrote:
> On Mon, Dec 12, 2016 at 9:30 AM, Jiri Benc <jbenc@redhat.com> wrote:
>> On Fri, 9 Dec 2016 15:41:52 -0800, Brandon Philips wrote:
>>> The issue we have: when creating the VXLAN interface and assigning it
>>> an address we see a broadcast route being added by the Kernel. For
>>> example if we have 10.4.0.0/16 a broadcast route to 10.4.0.0 is
>>> created. This route is unwanted because we assign 10.4.0.0 to one of
>>> our VXLAN interfaces.
>>
>> Are you saying you're trying to assign the IP address 10.4.0.0/16 as a
>> unicast address to an interface? Then you'll run into way more problems
>> than the one you're describing. You can't have host part of the IP
>> address consisting of all zeros (or all ones). Just don't do it. Choose
>> a valid IP address instead.
>
> Yes, this is what we are doing; it is because of an upstream, to us,
> address assignment so I will figure it out upstream.
>
> Regardless, it is hard to find an RFC that says "simply don't do this
> because _____". The closest I could find was RFC 922 after sending
> this which says:
>
> "There is probably no reason for such addresses to appear anywhere but
> as the source address of an ICMP Information".
Alternatively you can renumber the network to use /32 and add the
unicast routes for your /16 yourself.
Bye,
Hannes
^ permalink raw reply
* RE: Re: Synopsys Ethernet QoS
From: Stephen Warren @ 2016-12-12 16:46 UTC (permalink / raw)
To: Niklas Cassel, Joao Pinto, Florian Fainelli, Andy Shevchenko
Cc: David Miller, Giuseppe CAVALLARO, larper@axis.com,
rabinv@axis.com, netdev, CARLOS.PALMINHA@synopsys.com,
Jie.Deng1@synopsys.com, pavel@ucw.cz, Thierry Reding
In-Reply-To: <1d445ec1-deb8-6e36-39c4-6813c446095f@axis.com>
Niklas Cassel wrote at Monday, December 12, 2016 9:25 AM:
...
> However, I've noticed that NVIDIA has extended the DWC EQoS DT binding,
> I don't how easy it would be for them to switch to stmmac's DT binding.
> (Adding Stephen Warren to CC.)
I don't believe there's any issue switching drivers, so long as the new one
works on our HW. However, we can't switch DT binding since that's an ABI.
So, if we switch drivers, the "new" driver needs to support all existing
DT bindings.
FWIW, I'd recommend that we don't name the "new" driver stmmac or anything
like that. Rather, name it after the IP block so that any new user of the same
IP block will find the name they expect in the source tree, and just use it.
Renaming the "new" driver to dwc_eqos or similar might be one way to achieve
that.
--
nvpublic
^ permalink raw reply
* Re: [PATCH net-next 1/3] net:dsa:mv88e6xxx: use hashtable to store multicast entries
From: Florian Fainelli @ 2016-12-12 16:37 UTC (permalink / raw)
To: Volodymyr Bendiuga, Vivien Didelot
Cc: Volodymyr Bendiuga, andrew, netdev, Volodymyr Bendiuga
In-Reply-To: <CAMr9Lbp5eCg1oyWGN+uiDEcF0VZuKUi87FH6JYTGj6pL82R+Mw@mail.gmail.com>
On 12/12/2016 07:22 AM, Volodymyr Bendiuga wrote:
> Hi,
>
> I apologise for incorrectly formatted patch, I will fix and resend it.
> The problem with the ATU right now is that it is too slow when inserting
> entries.
> When the OS boots up, it might insert some multicast entries into the
> atu (if
> they are preconfigured by user). I run a test with 10 mc entries being
> configured for
> each port (13 ports), and it took 15 seconds, which made system quite
> slow on responding to
> other commands, as it has been inserting mc entries. The implementation
> with hashtable
> made insert command for 13 ports and 10 entries per port about 700 msec
> long.
Just wondering how do you achieve such speed up? What part of using a
hashtable allows you not to write down all 10 MC entries across the
ports? I would assume that the number of MDIO (is that the bus you are
using here?) operations would be identical.
Seeing such a change makes me wonder if we should not try to push some
of this hashtable abstraction (provided that we agree we want it) at a
higher layer, like net/dsa/slave.c?
Thanks!
--
Florian
^ permalink raw reply
* Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device
From: Florian Fainelli @ 2016-12-12 16:33 UTC (permalink / raw)
To: maowenan, David Laight, netdev@vger.kernel.org,
dingtianhong@huawei.com, weiyongjun (A)
In-Reply-To: <7902ba31-de5d-27cf-f631-4a88a78b246f@huawei.com>
On 12/12/2016 12:49 AM, maowenan wrote:
>
>
> On 2016/12/5 16:47, maowenan wrote:
>>
>>
>> On 2016/12/2 17:45, David Laight wrote:
>>> From: Mao Wenan
>>>> Sent: 30 November 2016 10:23
>>>> The nic in my board use the phy dev from marvell, and the system will
>>>> load the marvell phy driver automatically, but when I remove the phy
>>>> drivers, the system immediately panic:
>>>> Call trace:
>>>> [ 2582.834493] [<ffff800000715384>] phy_state_machine+0x3c/0x438 [
>>>> 2582.851754] [<ffff8000000db3b8>] process_one_work+0x150/0x428 [
>>>> 2582.868188] [<ffff8000000db7d4>] worker_thread+0x144/0x4b0 [
>>>> 2582.883882] [<ffff8000000e1d0c>] kthread+0xfc/0x110
>>>>
>>>> there should be proper reference counting in place to avoid that.
>>>> I found that phy_attach_direct() forgets to add phy device driver
>>>> reference count, and phy_detach() forgets to subtract reference count.
>>>> This patch is to fix this bug, after that panic is disappeared when remove
>>>> marvell.ko
>>>>
>>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>>>> ---
>>>> drivers/net/phy/phy_device.c | 7 +++++++
>>>> 1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>> index 1a4bf8a..a7ec7c2 100644
>>>> --- a/drivers/net/phy/phy_device.c
>>>> +++ b/drivers/net/phy/phy_device.c
>>>> @@ -866,6 +866,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>>> return -EIO;
>>>> }
>>>>
>>>> + if (!try_module_get(d->driver->owner)) {
>>>> + dev_err(&dev->dev, "failed to get the device driver module\n");
>>>> + return -EIO;
>>>> + }
>>>
>>> If this is the phy code, what stops the phy driver being unloaded
>>> before the try_module_get() obtains a reference.
>>> If it isn't the phy driver then there ought to be a reference count obtained
>>> when the phy driver is located (by whatever decides which phy driver to use).
>>> Even if that code later releases its reference (it probably shouldn't on success)
>>> then you can't fail to get an extra reference here.
>>
>> [Mao Wenan]Yes, this is phy code, in function phy_attach_direct(), drivers/net/phy/phy_device.c.
>> when one NIC driver to do probe behavior, it will attach one matched phy driver. phy_attach_direct()
>> is to obtain phy driver reference and bind phy driver, if try_module_get() execute on success, the reference
>> count is added; if failed, the driver can't be attached to this NIC, and it can't added the phy driver
>> reference count. So before try_module_get obtains a reference, phy driver can't can't be bound to this NIC.
>> when the phy driver is attached to NIC, the reference count is added, if someone remove phy driver directly,
>> it will be failed because reference count is not equal to 0.
>>
>> An example of call trace when there is NIC driver to attch one phy driver:
>> hns_nic_dev_probe->hns_nic_try_get_ae->hns_nic_init_phy->of_phy_connect->phy_connect_direct->phy_attach_direct
>>
>> Consider the steps of phy driver(marvell.ko) added and removed, and NIC driver(hns_enet_drv.ko) added and removed:
>> 1)insmod marvell ref=0
>> 2)insmod hns_enet_drv ref=1
>> 3)rmmod marvell (should not on success, ref=1)
>> 4)rmmod hns_enet_drv ref=0
>> 5)rmmod marvell (should on success, because ref=0)
>>
>> if we don't add the reference count in phy_attach_direct(the second step ref=0), so the third step rmmod marvell will
>> be panic, because there is one user remain use marvell driver and phy_stat_machine use the NULL drv pointer.
>>
>>>
>>>> +
>>>> get_device(d);
>>>>
>>>> /* Assume that if there is no driver, that it doesn't
>>>> @@ -921,6 +926,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>>>
>>>> error:
>>>> put_device(d);
>>>> + module_put(d->driver->owner);
>>>
>>> Are those two in the wrong order ?
>>>
>>>> module_put(bus->owner);
>>>> return err;
>>>> }
>>>> @@ -998,6 +1004,7 @@ void phy_detach(struct phy_device *phydev)
>>>> bus = phydev->mdio.bus;
>>>>
>>>> put_device(&phydev->mdio.dev);
>>>> + module_put(phydev->mdio.dev.driver->owner);
>>>> module_put(bus->owner);
>>>
>>> Where is this code called from?
>>> You can't call it from the phy driver because the driver can be unloaded
>>> as soon as the last reference is removed.
>>> At that point the code memory is freed.
>>
>> [Mao Wenan] it is called by NIC when it is removed, which aims to disconnect one bound phy driver. If this phy driver
>> is not used for this NIC, reference count should be subtracted, and phy driver can be removed if there is no user.
>> hns_nic_dev_remove->phy_disconnect->phy_detach
>>
>>
>>
>>>
>>>> }
>>>> EXPORT_SYMBOL(phy_detach);
>>>> --
>>>> 2.7.0
>>>>
>>>
>>>
>>> .
>>>
>
> @Florian Fainelli, what's your comments about this patch?
I am trying to reproduce what you are seeing, but at first glance is
looks like an appropriate solution to me. Do you mind giving me a couple
more days?
Thanks!
--
Florian
^ permalink raw reply
* Re: Misalignment, MIPS, and ip_hdr(skb)->version
From: Måns Rullgård @ 2016-12-12 16:31 UTC (permalink / raw)
To: David Laight
Cc: linux-mips@linux-mips.org, Netdev, LKML, David Miller,
WireGuard mailing list, Felix Fietkau
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB023BF78@AcuExch.aculab.com>
David Laight <David.Laight@ACULAB.COM> writes:
> From: Måns Rullgård
>> Sent: 10 December 2016 13:25
> ...
>> I solved this problem in an Ethernet driver by copying the initial part
>> of the packet to an aligned skb and appending the remainder using
>> skb_add_rx_frag(). The kernel network stack only cares about the
>> headers, so the alignment of the packet payload doesn't matter.
>
> That rather depends on where the packet payload ends up.
> It is likely that it will be copied to userspace (or maybe
> into some aligned kernel buffer).
> In which case you get an expensive misaligned copy.
There's not much to be done about that. The only other option is to
bypass DMA entirely, and that's sure to be even worse.
> What do the hardware engineers think the ethernet interface will
> be used for!
Ticking boxes in marketing material.
--
Måns Rullgård
^ permalink raw reply
* Re: [PATCH 0/2] net: ethernet: hisilicon: set dev->dev.parent before PHY connect
From: Florian Fainelli @ 2016-12-12 16:29 UTC (permalink / raw)
To: Dongpo Li, yisen.zhuang, salil.mehta, davem
Cc: xuejiancheng, benjamin.chenhao, caizhiyong, netdev, linux-kernel
In-Reply-To: <1481544223-207906-1-git-send-email-lidongpo@hisilicon.com>
On 12/12/2016 04:03 AM, Dongpo Li wrote:
> This patch series builds atop:
> ec988ad78ed6d184a7f4ca6b8e962b0e8f1de461 ("phy: Don't increment MDIO bus
> refcount unless it's a different owner")
>
> I have checked all the hisilicon ethernet driver and found only two drivers
> need to be fixed to make sure set dev->dev.parent before PHY connect.
Thanks for doing this, these two drivers did not show up on my list
because I did not see them calling SET_NETDEV_DEV() so late.
--
Florian
^ permalink raw reply
* Re: [PATCH 2/2] net: ethernet: hip04: Call SET_NETDEV_DEV()
From: Florian Fainelli @ 2016-12-12 16:28 UTC (permalink / raw)
To: Dongpo Li, yisen.zhuang, salil.mehta, davem
Cc: xuejiancheng, benjamin.chenhao, caizhiyong, netdev, linux-kernel
In-Reply-To: <1481544223-207906-3-git-send-email-lidongpo@hisilicon.com>
On 12/12/2016 04:03 AM, Dongpo Li wrote:
> The hip04 driver calls into PHYLIB which now checks for
> net_device->dev.parent, so make sure we do set it before calling into
> any MDIO/PHYLIB related function.
>
> Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a different owner")
> Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [PATCH 1/2] net: ethernet: hisi_femac: Call SET_NETDEV_DEV()
From: Florian Fainelli @ 2016-12-12 16:28 UTC (permalink / raw)
To: Dongpo Li, yisen.zhuang, salil.mehta, davem
Cc: xuejiancheng, benjamin.chenhao, caizhiyong, netdev, linux-kernel
In-Reply-To: <1481544223-207906-2-git-send-email-lidongpo@hisilicon.com>
On 12/12/2016 04:03 AM, Dongpo Li wrote:
> The hisi_femac driver calls into PHYLIB which now checks for
> net_device->dev.parent, so make sure we do set it before calling into
> any MDIO/PHYLIB related function.
>
> Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a different owner")
> Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: Re: Synopsys Ethernet QoS
From: Niklas Cassel @ 2016-12-12 16:25 UTC (permalink / raw)
To: Joao Pinto, Florian Fainelli, Andy Shevchenko
Cc: David Miller, Giuseppe CAVALLARO, larper, rabinv, netdev,
CARLOS.PALMINHA, Jie.Deng1, Stephen Warren, pavel
In-Reply-To: <f5357ea2-a295-ab08-046e-f8b8f6ca4344@synopsys.com>
On 12/12/2016 11:19 AM, Joao Pinto wrote:
> Hi,
>
> Às 1:44 AM de 12/10/2016, Florian Fainelli escreveu:
>> Le 12/09/16 à 16:16, Andy Shevchenko a écrit :
>>> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>
>>>> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe)
>>>> did
>>>> actually pioneer the upstreaming effort, but it is good to see people
>>>> from Synopsys willing to fix that in the future.
>>> Wait, you would like to tell that we have more than 2 drivers for the
>>> same (okay, same vendor) IP?!
>>> It's better to unify them earlier, than have n+ copies.
>> Unfortunately that is the case, see this email:
>>
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg142796.html
>>
>> dwc_eth_qos and stmmac have some overlap. There seems to be work
>> underway to unify these two to begin with.
>>
>>> P.S. Though, I don't see how sxgbe got in the list. First glance on
>>> the code doesn't show similarities.
>> Well samsung/sxgbe looks potentially similar to amd/xgbe, but that's
>> just my cursory look at the code, it may very well be something entirely
>> different. The descriptor formats just look suspiciously similar.
>>
> Thank you for your inputs! Renaming seems to be a hotspot. I agree that maybe
> instead of renaming (breaking retro-compatibility as David and Florian
> mentioned), the best is to move stmmac to synopsys/ after merging *qos* and
> removing it. As Florian mentioned, git is capable of detecting folder restructured.
>
> @Rabin Vincent: Hi Rabin. Since Axis is more familiar with the synopsys/*qos*
> driver would it be possible for you to make an initial analysis of what has to
> be merged into Stmmac? This way the development would speed-up.
I can answer that question.
I've sent out 12 patches to the stmmac driver
(all patches are included in the current net-next tree),
with these patches the stmmac driver works properly on Axis hardware
(we use Synopsys GMAC 4.10a synthesized with multiple TX queues).
stmmac's DT binding has also been extended with properties that
existed in DWC EQoS's DT binding, such as no-pbl-x8, txpbl, rxpbl.
Since we have no problem updating the DTB together with the kernel,
we will simply move to using the start using the stmmac driver,
with stmmac's DT binding.
However, I've noticed that NVIDIA has extended the DWC EQoS DT binding,
I don't how easy it would be for them to switch to stmmac's DT binding.
(Adding Stephen Warren to CC.)
The reset sequence that Lars Persson was worried about is not an issue
with the stmmac driver.
There are some performance problems with the stmmac driver though:
When running iperf3 with 3 streams:
iperf3 -c 192.168.0.90 -P 3 -t 30
iperf3 -c 192.168.0.90 -P 3 -t 30 -R
I get really bad fairness between the streams.
This appears to be an issue with how TX IRQ coalescing is implemented in stmmac.
Disabling TX IRQ coalescing in the stmmac driver makes the problem go away.
We have a local patch that implements TX IRQ coalescing in the dwceqos driver,
and we don't see the same problem.
Also netperf TCP_RR and UDP_RR gives really bad results compared to the
dwceqos driver (without IRQ coalescing).
2000 transactions/sec vs 9000 transactions/sec.
Turning TX IRQ coalescing off and RX interrupt watchdog off in stmmac
gives the same performance. I guess it's a trade off, low CPU usage
vs low latency, so I don't know how important TCP_RR/UDP_RR really is.
The best thing would be to get a good working TX IRQ coalesce
implementation with HR timers in stmmac.
Perhaps it should also be investigated if the RX interrupt watchdog
timeout should have a lower default value.
>
> Thanks to all.
>
> Joao
^ permalink raw reply
* Re: [PATCH net-next 1/3] net:dsa:mv88e6xxx: use hashtable to store multicast entries
From: Vivien Didelot @ 2016-12-12 16:22 UTC (permalink / raw)
To: Volodymyr Bendiuga, andrew, f.fainelli, netdev,
volodymyr.bendiuga
Cc: Volodymyr Bendiuga
In-Reply-To: <1481549958-1265-1-git-send-email-volodymyr.bendiuga@gmail.com>
Hi Volodymyr,
Volodymyr Bendiuga <volodymyr.bendiuga@gmail.com> writes:
> +struct pvec_tbl_entry {
> + struct hlist_node entry;
> + u32 key_crc32; /* key */
> + u16 pvec;
> + struct pvec_tbl_key {
> + u8 addr[ETH_ALEN];
> + u16 fid;
> + } key;
> +};
> +
> struct mv88e6xxx_atu_entry {
> u16 fid;
> u8 state;
Also, if we were to cache some values in mv88e6xxx, I'd use the existing
structures which match the hardware definition. Easier to understand.
mv88e6xxx_atu_entry already represents an ATU entry with its port
vector, FID, MAC address and more. Do you think it can be used here?
Thanks,
Vivien
^ permalink raw reply
* RE: Misalignment, MIPS, and ip_hdr(skb)->version
From: David Laight @ 2016-12-12 16:19 UTC (permalink / raw)
To: 'Måns Rullgård', Felix Fietkau
Cc: Jason A. Donenfeld, David Miller, Netdev, WireGuard mailing list,
LKML, linux-mips@linux-mips.org
In-Reply-To: <yw1x37hvykzk.fsf@unicorn.mansr.com>
From: Måns Rullgård
> Sent: 10 December 2016 13:25
...
> I solved this problem in an Ethernet driver by copying the initial part
> of the packet to an aligned skb and appending the remainder using
> skb_add_rx_frag(). The kernel network stack only cares about the
> headers, so the alignment of the packet payload doesn't matter.
That rather depends on where the packet payload ends up.
It is likely that it will be copied to userspace (or maybe
into some aligned kernel buffer).
In which case you get an expensive misaligned copy.
Encapsulation protocols not using headers that are multiples
of 4 bytes is as stupid as ethernet hardware that can't place
the mac address on a 4n+2 boundary.
The latter is particularly stupid when it happens on embedded
silicon with a processor that can only do aligned memory accesses.
What do the hardware engineers think the ethernet interface will
be used for!
David
^ permalink raw reply
* [PATCHv2 5/5] sh_eth: enable wake-on-lan for sh7763
From: Niklas Söderlund @ 2016-12-12 16:09 UTC (permalink / raw)
To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
Cc: Geert Uytterhoeven, Niklas Söderlund
In-Reply-To: <20161212160931.6478-1-niklas.soderlund+renesas@ragnatech.se>
This is based on public datasheet for sh7763 which shows it have the
same behavior and registers for WoL as other versions of sh_eth.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
drivers/net/ethernet/renesas/sh_eth.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 927de77..6e80457 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -851,6 +851,7 @@ static struct sh_eth_cpu_data sh7763_data = {
.no_ade = 1,
.tsu = 1,
.irq_flags = IRQF_SHARED,
+ .magic = 1,
};
static struct sh_eth_cpu_data sh7619_data = {
--
2.10.2
^ permalink raw reply related
* [PATCHv2 3/5] sh_eth: enable wake-on-lan for r8a7740/armadillo
From: Niklas Söderlund @ 2016-12-12 16:09 UTC (permalink / raw)
To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
Cc: Geert Uytterhoeven, Niklas Söderlund
In-Reply-To: <20161212160931.6478-1-niklas.soderlund+renesas@ragnatech.se>
Geert Uytterhoeven reported WoL worked on his Armadillo board.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
drivers/net/ethernet/renesas/sh_eth.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 348ed22..fafde6f 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -577,6 +577,7 @@ static struct sh_eth_cpu_data r8a7740_data = {
.tsu = 1,
.select_mii = 1,
.shift_rd0 = 1,
+ .magic = 1,
};
/* There is CPU dependent code */
--
2.10.2
^ permalink raw reply related
* [PATCHv2 0/5] sh_eth: add wake-on-lan support via magic packet
From: Niklas Söderlund @ 2016-12-12 16:09 UTC (permalink / raw)
To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
Cc: Geert Uytterhoeven, Niklas Söderlund
Hi,
This series adds support for Wake-on-Lan using Magic Packet for a few
models of the sh_eth driver. Patch 1/5 adds generic support to control
and support WoL while patches 2/5 - 5/5 enable different models.
Changes since v1.
- Spit generic WoL functionality and device enablement to different
patches.
- Enable more devices then Gen2 after feedback from Geert and
datasheets.
- Do not set mdp->irq_enabled = false and remove specific MagicPacket
interrupt clearing, instead let sh_eth_error() clear the interrupt as
for other EMAC interrupts, thanks Sergei for the suggestion.
- Use the original return logic in sh_eth_resume().
- Moved sh_eth_private variable *clk to top of data structure to avoid
possible gaps due to alignment restrictions.
- Make wol_enabled in sh_eth_private part of the already existing
bitfield instead of a bool.
- Do not initiate mdp->wol_enabled to 0, the struct is kzalloc'ed so
it's already set to 0.
Niklas Söderlund (5):
sh_eth: add generic wake-on-lan support via magic packet
sh_eth: enable wake-on-lan for Gen2 devices
sh_eth: enable wake-on-lan for r8a7740/armadillo
sh_eth: enable wake-on-lan for sh7743
sh_eth: enable wake-on-lan for sh7763
drivers/net/ethernet/renesas/sh_eth.c | 121 +++++++++++++++++++++++++++++++---
drivers/net/ethernet/renesas/sh_eth.h | 3 +
2 files changed, 114 insertions(+), 10 deletions(-)
--
2.10.2
^ permalink raw reply
* [PATCHv2 4/5] sh_eth: enable wake-on-lan for sh7743
From: Niklas Söderlund @ 2016-12-12 16:09 UTC (permalink / raw)
To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
Cc: Geert Uytterhoeven, Niklas Söderlund
In-Reply-To: <20161212160931.6478-1-niklas.soderlund+renesas@ragnatech.se>
This is based on public datasheet for sh7743 which shows it have the
same behavior and registers for WoL as other versions of sh_eth.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
drivers/net/ethernet/renesas/sh_eth.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index fafde6f..927de77 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -822,6 +822,7 @@ static struct sh_eth_cpu_data sh7734_data = {
.tsu = 1,
.hw_crc = 1,
.select_mii = 1,
+ .magic = 1,
};
/* SH7763 */
--
2.10.2
^ permalink raw reply related
* [PATCHv2 2/5] sh_eth: enable wake-on-lan for Gen2 devices
From: Niklas Söderlund @ 2016-12-12 16:09 UTC (permalink / raw)
To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
Cc: Geert Uytterhoeven, Niklas Söderlund
In-Reply-To: <20161212160931.6478-1-niklas.soderlund+renesas@ragnatech.se>
Tested on Gen2 r8a7791/Koelsch.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
drivers/net/ethernet/renesas/sh_eth.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 87640b9..348ed22 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -624,8 +624,9 @@ static struct sh_eth_cpu_data r8a779x_data = {
.register_type = SH_ETH_REG_FAST_RCAR,
- .ecsr_value = ECSR_PSRTO | ECSR_LCHNG | ECSR_ICD,
- .ecsipr_value = ECSIPR_PSRTOIP | ECSIPR_LCHNGIP | ECSIPR_ICDIP,
+ .ecsr_value = ECSR_PSRTO | ECSR_LCHNG | ECSR_ICD | ECSR_MPD,
+ .ecsipr_value = ECSIPR_PSRTOIP | ECSIPR_LCHNGIP | ECSIPR_ICDIP |
+ ECSIPR_MPDIP,
.eesipr_value = 0x01ff009f,
.tx_check = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
@@ -641,6 +642,7 @@ static struct sh_eth_cpu_data r8a779x_data = {
.tpauser = 1,
.hw_swap = 1,
.rmiimode = 1,
+ .magic = 1,
};
#endif /* CONFIG_OF */
--
2.10.2
^ permalink raw reply related
* [PATCHv2 1/5] sh_eth: add generic wake-on-lan support via magic packet
From: Niklas Söderlund @ 2016-12-12 16:09 UTC (permalink / raw)
To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
Cc: Geert Uytterhoeven, Niklas Söderlund
In-Reply-To: <20161212160931.6478-1-niklas.soderlund+renesas@ragnatech.se>
Add generic functionality to support Wake-on-Lan using MagicPacket which
are supported by at least a few versions of sh_eth. Only add
functionality for WoL, no specific sh_eth version are marked to support
WoL yet.
WoL is enabled in the suspend callback by setting MagicPacket detection
and disabling all interrupts expect MagicPacket. In the resume path the
driver needs to reset the hardware to rearm the WoL logic, this prevents
the driver from simply restoring the registers and to take advantage of
that sh_eth was not suspended to reduce resume time. To reset the
hardware the driver close and reopens the device just like it would do
in a normal suspend/resume scenario without WoL enabled, but it both
close and open the device in the resume callback since the device needs
to be open for WoL to work.
One quirk needed for WoL is that the module clock needs to be prevented
from being switched off by Runtime PM. To keep the clock alive the
suspend callback need to call clk_enable() directly to increase the
usage count of the clock. Then when Runtime PM decreases the clock usage
count it won't reach 0 and be switched off.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
drivers/net/ethernet/renesas/sh_eth.c | 112 +++++++++++++++++++++++++++++++---
drivers/net/ethernet/renesas/sh_eth.h | 3 +
2 files changed, 107 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 05b0dc5..87640b9 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -2199,6 +2199,33 @@ static int sh_eth_set_ringparam(struct net_device *ndev,
return 0;
}
+static void sh_eth_get_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
+{
+ struct sh_eth_private *mdp = netdev_priv(ndev);
+
+ wol->supported = 0;
+ wol->wolopts = 0;
+
+ if (mdp->cd->magic && mdp->clk) {
+ wol->supported = WAKE_MAGIC;
+ wol->wolopts = mdp->wol_enabled ? WAKE_MAGIC : 0;
+ }
+}
+
+static int sh_eth_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
+{
+ struct sh_eth_private *mdp = netdev_priv(ndev);
+
+ if (!mdp->cd->magic || !mdp->clk || wol->wolopts & ~WAKE_MAGIC)
+ return -EOPNOTSUPP;
+
+ mdp->wol_enabled = !!(wol->wolopts & WAKE_MAGIC);
+
+ device_set_wakeup_enable(&mdp->pdev->dev, mdp->wol_enabled);
+
+ return 0;
+}
+
static const struct ethtool_ops sh_eth_ethtool_ops = {
.get_regs_len = sh_eth_get_regs_len,
.get_regs = sh_eth_get_regs,
@@ -2213,6 +2240,8 @@ static const struct ethtool_ops sh_eth_ethtool_ops = {
.set_ringparam = sh_eth_set_ringparam,
.get_link_ksettings = sh_eth_get_link_ksettings,
.set_link_ksettings = sh_eth_set_link_ksettings,
+ .get_wol = sh_eth_get_wol,
+ .set_wol = sh_eth_set_wol,
};
/* network device open function */
@@ -3017,6 +3046,11 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
goto out_release;
}
+ /* Get clock, if not found that's OK but Wake-On-Lan is unavailable */
+ mdp->clk = devm_clk_get(&pdev->dev, NULL);
+ if (IS_ERR(mdp->clk))
+ mdp->clk = NULL;
+
ndev->base_addr = res->start;
spin_lock_init(&mdp->lock);
@@ -3111,6 +3145,9 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
if (ret)
goto out_napi_del;
+ if (mdp->cd->magic && mdp->clk)
+ device_set_wakeup_capable(&pdev->dev, 1);
+
/* print device information */
netdev_info(ndev, "Base address at 0x%x, %pM, IRQ %d.\n",
(u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
@@ -3150,15 +3187,67 @@ static int sh_eth_drv_remove(struct platform_device *pdev)
#ifdef CONFIG_PM
#ifdef CONFIG_PM_SLEEP
+static int sh_eth_wol_setup(struct net_device *ndev)
+{
+ struct sh_eth_private *mdp = netdev_priv(ndev);
+
+ /* Only allow ECI interrupts */
+ synchronize_irq(ndev->irq);
+ napi_disable(&mdp->napi);
+ sh_eth_write(ndev, DMAC_M_ECI, EESIPR);
+
+ /* Enable MagicPacket */
+ sh_eth_modify(ndev, ECMR, 0, ECMR_PMDE);
+
+ /* Increased clock usage so device won't be suspended */
+ clk_enable(mdp->clk);
+
+ return enable_irq_wake(ndev->irq);
+}
+
+static int sh_eth_wol_restore(struct net_device *ndev)
+{
+ struct sh_eth_private *mdp = netdev_priv(ndev);
+ int ret;
+
+ napi_enable(&mdp->napi);
+
+ /* Disable MagicPacket */
+ sh_eth_modify(ndev, ECMR, ECMR_PMDE, 0);
+
+ /* The device need to be reset to restore MagicPacket logic
+ * for next wakeup. If we close and open the device it will
+ * both be reset and all registers restored. This is what
+ * happens during suspend and resume without WoL enabled.
+ */
+ ret = sh_eth_close(ndev);
+ if (ret < 0)
+ return ret;
+ ret = sh_eth_open(ndev);
+ if (ret < 0)
+ return ret;
+
+ /* Restore clock usage count */
+ clk_disable(mdp->clk);
+
+ return disable_irq_wake(ndev->irq);
+}
+
static int sh_eth_suspend(struct device *dev)
{
struct net_device *ndev = dev_get_drvdata(dev);
+ struct sh_eth_private *mdp = netdev_priv(ndev);
int ret = 0;
- if (netif_running(ndev)) {
- netif_device_detach(ndev);
+ if (!netif_running(ndev))
+ return 0;
+
+ netif_device_detach(ndev);
+
+ if (mdp->wol_enabled)
+ ret = sh_eth_wol_setup(ndev);
+ else
ret = sh_eth_close(ndev);
- }
return ret;
}
@@ -3166,14 +3255,21 @@ static int sh_eth_suspend(struct device *dev)
static int sh_eth_resume(struct device *dev)
{
struct net_device *ndev = dev_get_drvdata(dev);
+ struct sh_eth_private *mdp = netdev_priv(ndev);
int ret = 0;
- if (netif_running(ndev)) {
+ if (!netif_running(ndev))
+ return 0;
+
+ if (mdp->wol_enabled)
+ ret = sh_eth_wol_restore(ndev);
+ else
ret = sh_eth_open(ndev);
- if (ret < 0)
- return ret;
- netif_device_attach(ndev);
- }
+
+ if (ret < 0)
+ return ret;
+
+ netif_device_attach(ndev);
return ret;
}
diff --git a/drivers/net/ethernet/renesas/sh_eth.h b/drivers/net/ethernet/renesas/sh_eth.h
index d050f37..4ceed00 100644
--- a/drivers/net/ethernet/renesas/sh_eth.h
+++ b/drivers/net/ethernet/renesas/sh_eth.h
@@ -493,6 +493,7 @@ struct sh_eth_cpu_data {
unsigned shift_rd0:1; /* shift Rx descriptor word 0 right by 16 */
unsigned rmiimode:1; /* EtherC has RMIIMODE register */
unsigned rtrate:1; /* EtherC has RTRATE register */
+ unsigned magic:1; /* EtherC has ECMR.PMDE and ECSR.MPD */
};
struct sh_eth_private {
@@ -501,6 +502,7 @@ struct sh_eth_private {
const u16 *reg_offset;
void __iomem *addr;
void __iomem *tsu_addr;
+ struct clk *clk;
u32 num_rx_ring;
u32 num_tx_ring;
dma_addr_t rx_desc_dma;
@@ -529,6 +531,7 @@ struct sh_eth_private {
unsigned no_ether_link:1;
unsigned ether_link_active_low:1;
unsigned is_opened:1;
+ unsigned wol_enabled:1;
};
static inline void sh_eth_soft_swap(char *src, int len)
--
2.10.2
^ permalink raw reply related
* Re: [PATCH net-next 1/3] net:dsa:mv88e6xxx: use hashtable to store multicast entries
From: Andrew Lunn @ 2016-12-12 16:07 UTC (permalink / raw)
To: Volodymyr Bendiuga
Cc: Vivien Didelot, Volodymyr Bendiuga, f.fainelli, netdev,
Volodymyr Bendiuga
In-Reply-To: <CAMr9Lbp5eCg1oyWGN+uiDEcF0VZuKUi87FH6JYTGj6pL82R+Mw@mail.gmail.com>
On Mon, Dec 12, 2016 at 04:22:13PM +0100, Volodymyr Bendiuga wrote:
> Hi,
>
> I apologise for incorrectly formatted patch, I will fix and resend it.
Please don't resend with just the white space fixed. The memory model
is very important, using the fdb_prepare() call to allocate memory,
etc.
Andrew
^ permalink raw reply
* Re: [PATCH] sh_eth: add wake-on-lan support via magic packet
From: Niklas Söderlund @ 2016-12-12 15:49 UTC (permalink / raw)
To: Sergei Shtylyov; +Cc: Simon Horman, netdev, linux-renesas-soc
In-Reply-To: <b0bc6386-45ba-2205-6cd4-dce9a1529124@cogentembedded.com>
Hi Sergei,
Thanks for your feedback.
On 2016-12-11 00:25:41 +0300, Sergei Shtylyov wrote:
> Hello!
>
> On 12/08/2016 05:56 PM, Niklas Söderlund wrote:
>
> > > You only enable the WOL support fo the R-Car gen2 chips but never say that
> > > explicitly, neither in the subject nor here.
> > >
> > > > Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
> > > > ---
> > > > drivers/net/ethernet/renesas/sh_eth.c | 120 +++++++++++++++++++++++++++++++---
> > > > drivers/net/ethernet/renesas/sh_eth.h | 4 ++
> > > > 2 files changed, 116 insertions(+), 8 deletions(-)
> > >
> > > > diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
> > > > index 05b0dc5..3974046 100644
> > > > --- a/drivers/net/ethernet/renesas/sh_eth.c
> > > > +++ b/drivers/net/ethernet/renesas/sh_eth.c
> [...]
> > > > @@ -1657,6 +1658,10 @@ static irqreturn_t sh_eth_interrupt(int irq, void *netdev)
> > > > goto out;
> > > >
> > > > if (!likely(mdp->irq_enabled)) {
> > >
> > > Oops, I guess unlikely(!mdp->irq_enabled) was meant here...
> >
> > I can correct this in a separate patch if you wish.
>
> I'll look into this myself, I think.
OK.
>
> > > + /* Handle MagicPacket interrupt */
> > > + if (sh_eth_read(ndev, ECSR) & ECSR_MPD)
>
> What if it wasn't enabled ATM?
Sorry I don't understand this comment.
>
> [...]
> > > > @@ -3111,6 +3150,10 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
> > > > if (ret)
> > > > goto out_napi_del;
> > > >
> > > > + mdp->wol_enabled = false;
> > >
> > > No need, the '*mdp' was kzalloc'ed.
> >
> > OK, i prefer to explicitly set for easier reading of the code. But if
> > you wish I will remove this in v2.
>
> Yes, remove it please.
Will remove for v2.
>
> > > > @@ -3150,15 +3193,71 @@ static int sh_eth_drv_remove(struct platform_device *pdev)
> > > >
> > > > #ifdef CONFIG_PM
> > > > #ifdef CONFIG_PM_SLEEP
> > > > +static int sh_eth_wol_setup(struct net_device *ndev)
> > > > +{
> > > > + struct sh_eth_private *mdp = netdev_priv(ndev);
> > > > +
> > > > + /* Only allow ECI interrupts */
> > > > + mdp->irq_enabled = false;
> > >
> > > Why 'false' if you enable IRQs below?
> >
> > I mask all interrupts except MagicPacket (ECSIPR_MPDIP) interrupts form
> > the ECI (DMAC_M_ECI) and by setting irq_enabled to false the interrupt
> > handler will only ack any residue interrupt.
>
> I don't see where it ack's anything, it just clears EESIPR and returns in
> this case.
You are right, I must have misread when looking at this.
>
> > This is how it's done in
> > other parts of the driver when disabling interrupts.
>
> Not in all parts of the driver that disable EESIPR interrupts... I must
> confess that I never liked that 'mdp->irq_enabled' flag and still suspect we
> can get things done without it... I need to look at this code again, sigh...
>
> > This is also why I only check for MagicPacket interrupts if irq_enabled
> > is false.
>
> I would have preferred that this was done with the other EMAC interrupts,
> in sh_eth_error().
I removed the check for Magic Packet in sh_eth_interrupt() and running
without setting mdp->irq_enabled = false. sh_eth_error() will then clear
any ECI interrupt so no need to add Magic Packet detection to it since
all that is needed on Magic Packet is to clear the interrupt which
already is done. This works and I can do multiple suspend/resume cycles,
will be in v2 thanks for the suggestion.
>
> > > > + synchronize_irq(ndev->irq);
> > > > + napi_disable(&mdp->napi);
> > > > + sh_eth_write(ndev, DMAC_M_ECI, EESIPR);
> > > > +
> > > > + /* Enable ECI MagicPacket interrupt */
> > > > + sh_eth_write(ndev, ECSIPR_MPDIP, ECSIPR);
>
> I'd prefer if it was always enabled via 'ecsipr_value'.
Will do so in v2.
>
> > > > +
> > > > + /* Enable MagicPacket */
> > > > + sh_eth_modify(ndev, ECMR, 0, ECMR_PMDE);
> > > > +
> > > > + /* Increased clock usage so device won't be suspended */
> > > > + clk_enable(mdp->clk);
> > >
> > > Hum, intermixiggn runtime PM with clock API doesn't look good...
> >
> > I agree it looks weird but I need a way to increment the usage count for
> > the clock otherwise the PM code will disable the module clock and WoL
> > will not work.
>
> How will it do it if you don't call sh_eth_close() in this case?
>
> > Note that this call will not enable the clock just
> > increase the usage count so it won't be disabled when the PM code
> > decrease it after the sh_eth suspend function is run.
>
> You mean that the PM code calls RPM or clk API on its own? That's strange...
Yes it calls clk API.
>
> > If you know of a different way of ensuring that the clock is not turned
> > off I be happy to look at it. I did some investigation into this and
> > calling clk_enable() directly is for example what happens in the
> > enable_irq_wake() call path to ensure the clock for the irq_chip is not
> > turned off if it is a wakeup source, se for example
> > gpio_rcar_irq_set_wake() in drivers/gpio/gpio-rcar.c.
>
> Thanks, will look into it...
>
> [...]
>
> MBR, Sergei
>
--
Regards,
Niklas Söderlund
^ permalink raw reply
* Re: [patch net-next] irda: w83977af_ir: cleanup an indent issue
From: Joe Perches @ 2016-12-12 15:22 UTC (permalink / raw)
To: Dan Carpenter, Samuel Ortiz; +Cc: netdev, kernel-janitors
In-Reply-To: <20161212112134.GA10035@elgon.mountain>
On Mon, 2016-12-12 at 14:21 +0300, Dan Carpenter wrote:
> In commit 99d8d2159d7c ("irda: w83977af_ir: Neaten logging"), we
> accidentally added an extra tab to these lines.
Thanks Dan.
Oops, It's kinda funny I did the whole
series to fix that misindentation originally
and added it back when rebasing.
^ permalink raw reply
* Re: Designing a safe RX-zero-copy Memory Model for Networking
From: Jesper Dangaard Brouer @ 2016-12-12 15:10 UTC (permalink / raw)
To: Mike Rapoport
Cc: netdev@vger.kernel.org, linux-mm, John Fastabend,
Willem de Bruijn, Björn Töpel, Karlsson, Magnus,
Alexander Duyck, Mel Gorman, Tom Herbert, Brenden Blanco,
Tariq Toukan, Saeed Mahameed, Jesse Brandeburg, Kalman Meth,
brouer
In-Reply-To: <20161212141433.GB19987@rapoport-lnx>
On Mon, 12 Dec 2016 16:14:33 +0200
Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> On Mon, Dec 12, 2016 at 10:40:42AM +0100, Jesper Dangaard Brouer wrote:
> >
> > On Mon, 12 Dec 2016 10:38:13 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> >
> > > Hello Jesper,
> > >
> > > On Mon, Dec 05, 2016 at 03:31:32PM +0100, Jesper Dangaard Brouer wrote:
> > > > Hi all,
> > > >
> > > > This is my design for how to safely handle RX zero-copy in the network
> > > > stack, by using page_pool[1] and modifying NIC drivers. Safely means
> > > > not leaking kernel info in pages mapped to userspace and resilience
> > > > so a malicious userspace app cannot crash the kernel.
> > > >
> > > > Design target
> > > > =============
> > > >
> > > > Allow the NIC to function as a normal Linux NIC and be shared in a
> > > > safe manor, between the kernel network stack and an accelerated
> > > > userspace application using RX zero-copy delivery.
> > > >
> > > > Target is to provide the basis for building RX zero-copy solutions in
> > > > a memory safe manor. An efficient communication channel for userspace
> > > > delivery is out of scope for this document, but OOM considerations are
> > > > discussed below (`Userspace delivery and OOM`_).
> > >
> > > Sorry, if this reply is a bit off-topic.
> >
> > It is very much on topic IMHO :-)
> >
> > > I'm working on implementation of RX zero-copy for virtio and I've dedicated
> > > some thought about making guest memory available for physical NIC DMAs.
> > > I believe this is quite related to your page_pool proposal, at least from
> > > the NIC driver perspective, so I'd like to share some thoughts here.
> >
> > Seems quite related. I'm very interested in cooperating with you! I'm
> > not very familiar with virtio, and how packets/pages gets channeled
> > into virtio.
>
> They are copied :-)
> Presuming we are dealing only with vhost backend, the received skb
> eventually gets converted to IOVs, which in turn are copied to the guest
> memory. The IOVs point to the guest memory that is allocated by virtio-net
> running in the guest.
Thanks for explaining that. It seems like a lot of overhead. I have to
wrap my head around this... so, the hardware NIC is receiving the
packet/page, in the RX ring, and after converting it to IOVs, it is
conceptually transmitted into the guest, and then the guest-side have a
RX-function to handle this packet. Correctly understood?
> > > The idea is to dedicate one (or more) of the NIC's queues to a VM, e.g.
> > > using macvtap, and then propagate guest RX memory allocations to the NIC
> > > using something like new .ndo_set_rx_buffers method.
> >
> > I believe the page_pool API/design aligns with this idea/use-case.
> >
> > > What is your view about interface between the page_pool and the NIC
> > > drivers?
> >
> > In my Prove-of-Concept implementation, the NIC driver (mlx5) register
> > a page_pool per RX queue. This is done for two reasons (1) performance
> > and (2) for supporting use-cases where only one single RX-ring queue is
> > (re)configured to support RX-zero-copy. There are some associated
> > extra cost of enabling this mode, thus it makes sense to only enable it
> > when needed.
> >
> > I've not decided how this gets enabled, maybe some new driver NDO. It
> > could also happen when a XDP program gets loaded, which request this
> > feature.
> >
> > The macvtap solution is nice and we should support it, but it requires
> > VM to have their MAC-addr registered on the physical switch. This
> > design is about adding flexibility. Registering an XDP eBPF filter
> > provides the maximum flexibility for matching the destination VM.
>
> I'm not very familiar with XDP eBPF, and it's difficult for me to estimate
> what needs to be done in BPF program to do proper conversion of skb to the
> virtio descriptors.
XDP is a step _before_ the SKB is allocated. The XDP eBPF program can
modify the packet-page data, but I don't think it is needed for your
use-case. View XDP (primarily) as an early (demux) filter.
XDP is missing a feature your need, which is TX packet into another
net_device (I actually imagine a port mapping table, that point to a
net_device). This require a new "TX-raw" NDO that takes a page (+
offset and length).
I imagine, the virtio driver (virtio_net or a new driver?) getting
extended with this new "TX-raw" NDO, that takes "raw" packet-pages.
Whether zero-copy is possible is determined by checking if page
originates from a page_pool that have enabled zero-copy (and likely
matching against a "protection domain" id number).
> We were not considered using XDP yet, so we've decided to limit the initial
> implementation to macvtap because we can ensure correspondence between a
> NIC queue and virtual NIC, which is not the case with more generic tap
> device. It could be that use of XDP will allow for a generic solution for
> virtio case as well.
You don't need an XDP filter, if you can make the HW do the early demux
binding into a queue. The check for if memory is zero-copy enabled
would be the same.
> >
> > > Have you considered using "push" model for setting the NIC's RX memory?
> >
> > I don't understand what you mean by a "push" model?
>
> Currently, memory allocation in NIC drivers boils down to alloc_page with
> some wrapping code. I see two possible ways to make NIC use of some
> preallocated pages: either NIC driver will call an API (probably different
> from alloc_page) to obtain that memory, or there will be NDO API that
> allows to set the NIC's RX buffers. I named the later case "push".
As you might have guessed, I'm not into the "push" model, because this
means I cannot share the queue with the normal network stack. Which I
believe is possible as outlined (in email and [2]) and can be done with
out HW filter features (like macvlan).
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
[1] https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/index.html
[2] https://prototype-kernel.readthedocs.io/en/latest/vm/page_pool/design/memory_model_nic.html
^ permalink raw reply
* Re: fib_frontend: Add network specific broadcasts, when it takes a sense
From: David Miller @ 2016-12-12 15:03 UTC (permalink / raw)
To: jbenc; +Cc: brandon.philips, netdev, tom, aaron.levy, bison
In-Reply-To: <20161212153035.4a751ccc@griffin>
From: Jiri Benc <jbenc@redhat.com>
Date: Mon, 12 Dec 2016 15:30:35 +0100
> On Fri, 9 Dec 2016 15:41:52 -0800, Brandon Philips wrote:
>> The issue we have: when creating the VXLAN interface and assigning it
>> an address we see a broadcast route being added by the Kernel. For
>> example if we have 10.4.0.0/16 a broadcast route to 10.4.0.0 is
>> created. This route is unwanted because we assign 10.4.0.0 to one of
>> our VXLAN interfaces.
>
> Are you saying you're trying to assign the IP address 10.4.0.0/16 as a
> unicast address to an interface? Then you'll run into way more problems
> than the one you're describing. You can't have host part of the IP
> address consisting of all zeros (or all ones). Just don't do it. Choose
> a valid IP address instead.
Agreed.
^ permalink raw reply
* Re: [PATCH net-next 1/3] net:dsa:mv88e6xxx: use hashtable to store multicast entries
From: Vivien Didelot @ 2016-12-12 15:03 UTC (permalink / raw)
To: Volodymyr Bendiuga, andrew, f.fainelli, netdev,
volodymyr.bendiuga
Cc: Volodymyr Bendiuga
In-Reply-To: <1481549958-1265-1-git-send-email-volodymyr.bendiuga@gmail.com>
Hi Volodymyr,
Volodymyr Bendiuga <volodymyr.bendiuga@gmail.com> writes:
> Hashtable will make it extremely faster when inserting fdb entries
> into the forwarding database.
This is hard to follow. As Andrew correctly mentioned, when you have two
or more patches, please format them with --cover-letter, describe them
in patch 0 and send them all together, so that we get a single thread.
You are correct about the ATU get operations being slow. However, we
intentionally keep the driver stateless at the moment to keep it simple.
Unless speed is an issue to fix here, I am quite reluctant to add
caching to this driver yet. What is the issue you are having with the
ATU being slow?
Thanks,
Vivien
^ permalink raw reply
* Re: [PATCH v2] net: macb: Added PCI wrapper for Platform Driver.
From: kbuild test robot @ 2016-12-12 15:00 UTC (permalink / raw)
To: Bartosz Folta
Cc: kbuild-all, Nicolas Ferre, David S. Miller, Niklas Cassel,
Alexandre Torgue, Satanand Burla, Raghu Vatsavayi, Simon Horman,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
Bartosz Folta, Rafal Ozieblo
In-Reply-To: <SN1PR0701MB1951E07687DBD871E4F1BF31CC980@SN1PR0701MB1951.namprd07.prod.outlook.com>
[-- Attachment #1: Type: text/plain, Size: 5003 bytes --]
Hi Bartosz,
[auto build test ERROR on net-next/master]
[also build test ERROR on v4.9 next-20161209]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Bartosz-Folta/net-macb-Added-PCI-wrapper-for-Platform-Driver/20161212-220228
config: blackfin-allyesconfig (attached as .config)
compiler: bfin-uclinux-gcc (GCC) 6.2.0
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=blackfin
All error/warnings (new ones prefixed by >>):
drivers/net/ethernet/cadence/macb_pci.c: In function 'macb_probe':
drivers/net/ethernet/cadence/macb_pci.c:78:19: error: implicit declaration of function 'clk_register_fixed_rate' [-Werror=implicit-function-declaration]
plat_data.pclk = clk_register_fixed_rate(&pdev->dev, "pclk", NULL, 0,
^~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/cadence/macb_pci.c:78:17: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
plat_data.pclk = clk_register_fixed_rate(&pdev->dev, "pclk", NULL, 0,
^
drivers/net/ethernet/cadence/macb_pci.c:85:17: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
plat_data.hclk = clk_register_fixed_rate(&pdev->dev, "hclk", NULL, 0,
^
drivers/net/ethernet/cadence/macb_pci.c:116:2: error: implicit declaration of function 'clk_unregister' [-Werror=implicit-function-declaration]
clk_unregister(plat_data.hclk);
^~~~~~~~~~~~~~
drivers/net/ethernet/cadence/macb_pci.c: At top level:
>> drivers/net/ethernet/cadence/macb_pci.c:149:1: warning: data definition has no type or storage class
module_pci_driver(macb_pci_driver);
^~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/cadence/macb_pci.c:149:1: error: type defaults to 'int' in declaration of 'module_pci_driver' [-Werror=implicit-int]
>> drivers/net/ethernet/cadence/macb_pci.c:149:1: warning: parameter names (without types) in function declaration
drivers/net/ethernet/cadence/macb_pci.c:142:26: warning: 'macb_pci_driver' defined but not used [-Wunused-variable]
static struct pci_driver macb_pci_driver = {
^~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +149 drivers/net/ethernet/cadence/macb_pci.c
79 GEM_PCLK_RATE);
80 if (IS_ERR(plat_data.pclk)) {
81 err = PTR_ERR(plat_data.pclk);
82 goto err_pclk_register;
83 }
84
> 85 plat_data.hclk = clk_register_fixed_rate(&pdev->dev, "hclk", NULL, 0,
86 GEM_HCLK_RATE);
87 if (IS_ERR(plat_data.hclk)) {
88 err = PTR_ERR(plat_data.hclk);
89 goto err_hclk_register;
90 }
91
92 /* set up platform device info */
93 memset(&plat_info, 0, sizeof(plat_info));
94 plat_info.parent = &pdev->dev;
95 plat_info.fwnode = pdev->dev.fwnode;
96 plat_info.name = PLAT_DRIVER_NAME;
97 plat_info.id = pdev->devfn;
98 plat_info.res = res;
99 plat_info.num_res = ARRAY_SIZE(res);
100 plat_info.data = &plat_data;
101 plat_info.size_data = sizeof(plat_data);
102 plat_info.dma_mask = DMA_BIT_MASK(32);
103
104 /* register platform device */
105 plat_dev = platform_device_register_full(&plat_info);
106 if (IS_ERR(plat_dev)) {
107 err = PTR_ERR(plat_dev);
108 goto err_plat_dev_register;
109 }
110
111 pci_set_drvdata(pdev, plat_dev);
112
113 return 0;
114
115 err_plat_dev_register:
> 116 clk_unregister(plat_data.hclk);
117
118 err_hclk_register:
119 clk_unregister(plat_data.pclk);
120
121 err_pclk_register:
122 pci_disable_device(pdev);
123 return err;
124 }
125
126 void macb_remove(struct pci_dev *pdev)
127 {
128 struct platform_device *plat_dev = pci_get_drvdata(pdev);
129 struct macb_platform_data *plat_data = dev_get_platdata(&plat_dev->dev);
130
131 platform_device_unregister(plat_dev);
132 pci_disable_device(pdev);
133 clk_unregister(plat_data->pclk);
134 clk_unregister(plat_data->hclk);
135 }
136
137 static struct pci_device_id dev_id_table[] = {
138 { PCI_DEVICE(CDNS_VENDOR_ID, CDNS_DEVICE_ID), },
139 { 0, }
140 };
141
142 static struct pci_driver macb_pci_driver = {
143 .name = PCI_DRIVER_NAME,
144 .id_table = dev_id_table,
145 .probe = macb_probe,
146 .remove = macb_remove,
147 };
148
> 149 module_pci_driver(macb_pci_driver);
150 MODULE_DEVICE_TABLE(pci, dev_id_table);
151 MODULE_LICENSE("GPL");
152 MODULE_DESCRIPTION("Cadence NIC PCI wrapper");
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 42460 bytes --]
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox