* [PATCH net] dcbnl : Fix lock initialization
From: Anish Bhatt @ 2014-11-06 18:09 UTC (permalink / raw)
To: netdev
Cc: davem, john.r.fastabend, ying.xue, jeffrey.t.kirsher, ebiederm,
Anish Bhatt
dcb_lock was being used uninitialized in dcbnl and is in fact missing
initialization code. Fixed.
Signed-off-by: Anish Bhatt <anish@chelsio.com>
---
net/dcb/dcbnl.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index ca11d28..7bc44e1 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1914,6 +1914,8 @@ static int __init dcbnl_init(void)
{
INIT_LIST_HEAD(&dcb_app_list);
+ spin_lock_init(&dcb_lock);
+
rtnl_register(PF_UNSPEC, RTM_GETDCB, dcb_doit, NULL, NULL);
rtnl_register(PF_UNSPEC, RTM_SETDCB, dcb_doit, NULL, NULL);
--
2.1.3
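A minimal userspace sketch of the problem this patch addresses, assuming a toy lock built on C11 `atomic_flag` in place of the kernel spinlock. All `toy_*` names, `app_lock`, `app_count`, `subsystem_init()` and `add_app()` are hypothetical; only the ordering (initialize once in the init routine, before any user takes the lock) mirrors the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kernel spinlock. */
struct toy_lock {
	atomic_flag held;
	bool initialized;	/* lets the sketch catch use-before-init */
};

static void toy_lock_init(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
	l->initialized = true;
}

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(l->initialized);	/* the bug class fixed by the patch */
	while (atomic_flag_test_and_set(&l->held))
		;		/* spin until the flag was previously clear */
}

static void toy_lock_release(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

static struct toy_lock app_lock;	/* plays the role of dcb_lock */
static int app_count;

/* Plays the role of dcbnl_init(): set up the lock before anyone uses it. */
static void subsystem_init(void)
{
	toy_lock_init(&app_lock);
}

/* A lock user: bumps a counter under the lock and returns the new value. */
static int add_app(void)
{
	int n;

	toy_lock_acquire(&app_lock);
	n = ++app_count;
	toy_lock_release(&app_lock);
	return n;
}
```

Calling `add_app()` before `subsystem_init()` trips the assertion in this sketch. In the kernel, a lock with static storage can alternatively be defined with `DEFINE_SPINLOCK()`, which initializes it at definition time.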
* Re: [PATCH 00/13] net_sched: misc cleanups and improvements
From: Cong Wang @ 2014-11-06 18:05 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Linux Kernel Network Developers, Jamal Hadi Salim
In-Reply-To: <1415152068.1458.2.camel@edumazet-glaptop2.roam.corp.google.com>
On Tue, Nov 4, 2014 at 5:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-11-04 at 17:25 -0800, Cong Wang wrote:
>
>> Seriously, think about why it should when it's just cleanups; be practical.
>
> I seriously ask you to not do cleanups then.
Apparently you didn't say this when the following commits got accepted:
commit 436f7c206860729d543a457aca5887e52039a5f4
Author: Fabian Frederick <fabf@skynet.be>
Date: Tue Nov 4 20:52:14 2014 +0100
igmp: remove camel case definitions
use standard uppercase for definitions
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit c18450a52a10a5c4cea3dc426c40447a7152290f
Author: Fabian Frederick <fabf@skynet.be>
Date: Tue Nov 4 20:48:41 2014 +0100
udp: remove else after return
commit aa1f731e52807077e9e13a86c0cad12d442c8fd4
Author: Fabian Frederick <fabf@skynet.be>
Date: Tue Nov 4 20:44:04 2014 +0100
inet: frags: remove inline on static in c file
remove __inline__ / inline and let compiler decide what to do
with static functions
Inspired-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
>
> Some people are working on adding real stuff here, this code changing every
> month is slowing them a lot.
>
Who works on what? Does he/she at least announce it on netdev?
(If you meant John, I already waited for his RCU work in the last
merge window; I assumed his work was almost done and therefore sent
this patchset.)
Since when has it been a rule that we should yield to something not merged,
not even announced? If so, why not add it to netdev-FAQ?
* Fw: [Bug 87701] New: hard cpu lockup during pppd initialization of vpn
From: Stephen Hemminger @ 2014-11-04 18:09 UTC (permalink / raw)
To: netdev
Begin forwarded message:
Date: Tue, 4 Nov 2014 08:35:39 -0800
From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
To: "stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: [Bug 87701] New: hard cpu lockup during pppd initialization of vpn
https://bugzilla.kernel.org/show_bug.cgi?id=87701
Bug ID: 87701
Summary: hard cpu lockup during pppd initialization of vpn
Product: Networking
Version: 2.5
Kernel Version: 3.18.0-rc3
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: blocking
Priority: P1
Component: Other
Assignee: shemminger@linux-foundation.org
Reporter: richcoe2@gmail.com
Regression: No
I did not experience this issue in 3.16 or before.
I did not try 3.17.
I moved from kernel-3.15 to 3.16, and then to kernel 3.18.
When I start forticlientsslvpn on 3.18, the system locks up hard. No mouse and
no keyboard.
forticlient starts pppd to enable a vpn connection.
Since this is a laptop, I don't get a kernel traceback or OOPS message.
I'm enabling kdump to see if I can get a reliable traceback.
I was first on 3.18.0-rc2, and moved to 3.18.0-rc3 today, and still have the
issue.
--
You are receiving this mail because:
You are the assignee for the bug.
* Re: [PATCH net-next 1/7] bpf: add 'flags' attribute to BPF_MAP_UPDATE_ELEM command
From: Alexei Starovoitov @ 2014-11-06 17:39 UTC (permalink / raw)
To: Daniel Borkmann
Cc: David S. Miller, Ingo Molnar, Andy Lutomirski,
Hannes Frederic Sowa, Eric Dumazet, Linux API,
Network Development, LKML
In-Reply-To: <545A3ACC.3080101@redhat.com>
On Wed, Nov 5, 2014 at 6:57 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 11/05/2014 12:04 AM, Alexei Starovoitov wrote:
>>
>> On Tue, Nov 4, 2014 at 1:25 AM, Daniel Borkmann <dborkman@redhat.com>
>> wrote:
>>>
>>> On 11/04/2014 03:54 AM, Alexei Starovoitov wrote:
>>>>
>>>>
>>>> the current meaning of BPF_MAP_UPDATE_ELEM syscall command is:
>>>> either update existing map element or create a new one.
>>>> Initially the plan was to add a new command to handle the case of
>>>> 'create new element if it didn't exist', but 'flags' style looks
>>>> cleaner and overall diff is much smaller (more code reused), so add
>>>> 'flags'
>>>> attribute to BPF_MAP_UPDATE_ELEM command with the following meaning:
>>>> enum {
>>>> BPF_MAP_UPDATE_OR_CREATE = 0, /* add new element or update existing
>>>> */
>>>> BPF_MAP_CREATE_ONLY, /* add new element if it didn't exist
>>>> */
>>>> BPF_MAP_UPDATE_ONLY /* update existing element */
>>>> };
>>>
>>>
>>> From your commit message/code I currently don't see an explanation why
>>> it cannot be done in typical ``flags style'' as various syscalls do,
>>> i.e. BPF_MAP_UPDATE_OR_CREATE rather represented as ...
>>>
>>> BPF_MAP_CREATE | BPF_MAP_UPDATE
>>>
>>> Do you expect more than 64 different flags to be passed from user space
>>> for BPF_MAP?
>>
>>
>> several reasons:
>> - preserve flags==0 as default behavior
>> - avoid holes and extra checks for invalid combinations, so
>> if (flags > BPF_MAP_UPDATE_ONLY) goto err, is enough.
>> - it looks much neater when user space uses
>> BPF_MAP_UPDATE_OR_CREATE instead of ORing bits.
>>
>> Note this choice doesn't prevent adding bit-like flags
>> in the future. Today I cannot think of any new flags
>> for the update() command, but if somebody comes up with
>> a new selector that can apply to all three combinations,
>> we can add it as 3rd bit that can be ORed.
>
>
> Hm, mixing enums together with bitfield-like flags seems
> kind of hacky ... :/ Or, do you mean to say you're adding
> a 2nd flag field, i.e. splitting the 64bits into a 32bit
> ``cmd enum'' and 32bit ``flag section''?
Something like this,
or splitting the 64 bits into 2 and 62. We'll see.
The first two bits encode this 'type' of update and the rest,
whatever else.
> Hm, my concern is that we start to add many *_OR_* enum
> elements once we find that a flag might be useful in
> combination with many other flags ... even though we
> can only think of 3 flags /right now/.
Agree. Adding many *_OR_* would look bad, that's
why I said that future additions can be bits. Like in
paragraph above.
Also, we don't have 3 flags now. In this patch I'm
showing 3 types and you're suggesting to treat
them as 2 flags. To me that's incorrect, since 'no flags'
becomes an invalid combination, which is logically incorrect.
Therefore I cannot see them as 'flags'. This is a 'type'
or 'style' of update() command.
I think it actually matches how open() defines things
in a similar situation:
#define O_RDONLY 00000000
#define O_WRONLY 00000001
#define O_RDWR 00000002
We used to think of them as flags, but they're not
bit flags, though the rest of open() flags are bit-like.
If we apply your argument to open() then open()
should have defined O_RD as 1 and O_WR as 2
and force everyone to mix and match them, but
then zero would be invalid. So I still think that
what I have is a cleaner API :)
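The scheme argued for above can be sketched in plain C. This is only an illustration under stated assumptions: the enum values are the ones from the patch description, while the 2/62 bit split, `TOY_TYPE_BITS`, `toy_check_flags()` and `toy_update_allowed()` are hypothetical names that follow the "2 and 62" idea mentioned in this reply, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Update types as listed in the patch description quoted above. */
enum {
	BPF_MAP_UPDATE_OR_CREATE = 0,	/* add new element or update existing */
	BPF_MAP_CREATE_ONLY,		/* add new element if it didn't exist */
	BPF_MAP_UPDATE_ONLY		/* update existing element */
};

/* Hypothetical layout (assumption): low 2 bits = type, upper 62 = future bit flags. */
#define TOY_TYPE_BITS	2
#define TOY_TYPE_MASK	((UINT64_C(1) << TOY_TYPE_BITS) - 1)

/* Returns the update type, or -1 if 'flags' is not valid today. */
static int toy_check_flags(uint64_t flags)
{
	uint64_t type = flags & TOY_TYPE_MASK;

	if (flags >> TOY_TYPE_BITS)
		return -1;	/* no bit flags are defined yet */
	if (type > BPF_MAP_UPDATE_ONLY)
		return -1;	/* 3 is not a defined type */
	return (int)type;
}

/* Would an update of this type succeed, given whether the key already exists? */
static bool toy_update_allowed(int type, bool exists)
{
	switch (type) {
	case BPF_MAP_UPDATE_OR_CREATE:
		return true;
	case BPF_MAP_CREATE_ONLY:
		return !exists;
	case BPF_MAP_UPDATE_ONLY:
		return exists;
	default:
		return false;
	}
}
```

With this encoding `flags == 0` keeps the default update-or-create behavior, and a single range check rejects undefined types, which are the first two reasons given earlier in the thread.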
* Re: [PATCH 3/4] macvtap: Use iovec iterators
From: Al Viro @ 2014-11-06 17:33 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, netdev, linux-kernel, bcrl, YOSHIFUJI Hideaki
In-Reply-To: <E1XmIQW-0007mb-J4@gondolin.me.apana.org.au>
On Thu, Nov 06, 2014 at 04:28:20PM +0800, Herbert Xu wrote:
> + if (copy_to_iter(&vnet_hdr, sizeof(vnet_hdr), iter))
> return -EFAULT;
Again, wrong calling conventions. It returns how much it has copied.
> + ret = copy_to_iter(&veth, sizeof(veth), iter);
> + if (ret || !iov_iter_count(iter))
> goto done;
Ditto.
* Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter
From: Al Viro @ 2014-11-06 17:30 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, netdev, linux-kernel, bcrl, YOSHIFUJI Hideaki
In-Reply-To: <E1XmIQU-0007m1-2e@gondolin.me.apana.org.au>
On Thu, Nov 06, 2014 at 04:28:18PM +0800, Herbert Xu wrote:
> + if (copy_to_iter(skb->data + offset, copy, to))
> + goto fault;
Sorry, no - copy_to_iter() returns the number of bytes copied, not 0 or -EFAULT.
> + vaddr = kmap(page);
> + err = copy_to_iter(vaddr + frag->page_offset +
> + offset - start, copy, to);
> + kunmap(page);
> + if (err)
> + goto fault;
And that one should be
copied = copy_page_to_iter(page, frag->page_offset +
offset - start, copy, to);
if (copied != copy)
goto fault;
Don't bother with kmap(), vaddr and all that shite. The primitive is
copy_page_to_iter(page, offset_in_page, nbytes, iter)
it does all needed kmap itself and it's smart enough to use kmap_atomic
when it can get away with that. Similar for copy_page_from_iter().
Both of those (as well as copy_{to,from}_iter()) advance iov_iter and return
the number of bytes actually copied. So the check for EFAULT is "it has copied
less than you've asked it to copy *and* you haven't run out of that iov_iter".
The second part is guaranteed to be true in this case - your code makes sure
that 'copy' is no more than the space left in iterator.
In general, this check would be spelled
if (copied != copy && iov_iter_count(to))
goto fault;
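The calling convention described above can be modelled in userspace. The sketch below is an assumption-laden toy, not kernel code: `struct toy_iter`, `toy_copy_to_iter()` and `toy_copy_checked()` are hypothetical stand-ins, with a single flat buffer and a `fault_after` field that simulates a fault partway through the copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for struct iov_iter: one flat buffer. */
struct toy_iter {
	char *buf;
	size_t count;		/* space left in the iterator */
	size_t fault_after;	/* simulate a fault after this many more bytes */
};

/* Copies up to nbytes, advances the iterator, returns bytes actually copied. */
static size_t toy_copy_to_iter(const void *src, size_t nbytes, struct toy_iter *it)
{
	size_t n = nbytes;

	if (n > it->count)
		n = it->count;		/* iterator ran out of space */
	if (n > it->fault_after)
		n = it->fault_after;	/* simulated fault cuts the copy short */
	memcpy(it->buf, src, n);
	it->buf += n;
	it->count -= n;
	it->fault_after -= n;
	return n;
}

/* The general check: short copy *and* space still left means a fault. */
static int toy_copy_checked(const void *src, size_t nbytes, struct toy_iter *it)
{
	size_t copied = toy_copy_to_iter(src, nbytes, it);

	if (copied != nbytes && it->count)
		return -EFAULT;
	return 0;
}
```

A successful 4-byte copy returns 4 here, so testing the return value for non-zero, as the quoted patch does, would treat success as a fault. A short copy that simply exhausts the iterator is not reported as `-EFAULT`.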
* Re: am335x: cpsw: phy ignores max-speed setting
From: Eric Dumazet @ 2014-11-06 17:28 UTC (permalink / raw)
To: Dave Taht
Cc: Yegor Yefremov, netdev, N, Mugunthan V, mpa, lsorense,
Daniel Mack
In-Reply-To: <CAA93jw5=LDirktyC+rvpLi-kywUSosj6QV8-na5p3-f=PxKcWQ@mail.gmail.com>
On Thu, 2014-11-06 at 08:51 -0800, Dave Taht wrote:
> ooh! ooh! I have a BQL enablement patch for the cpsw that I have no
> means of testing against multiple phys. Could
> you give the attached very small patch a shot along the way?
>
> The results I get on the beaglebone vs netperf-wrapper are pretty
> spectacular - huge increase in throughput, big reduction in
> latency.
Please send this patch inline, so that we can comment, and start a new
thread.
@@ -1375,9 +1380,11 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
skb_tx_timestamp(skb);
-
+ len = max(skb->len, CPSW_MIN_PACKET_SIZE);
+ netdev_sent_queue(ndev,len);
ret = cpsw_tx_packet_submit(ndev, priv, skb);
if (unlikely(ret != 0)) {
+ netdev_completed_queue(ndev,1,len);
<<you cannot do that, it's racy with other netdev_completed_queue() calls from TX completion>>
cpsw_err(priv, tx_err, "desc submit failed\n");
goto fail;
}
You need to call netdev_sent_queue(ndev, len); at the correct place,
because we can not 'undo' it.
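One way to read this advice, sketched as a single-threaded userspace model: do everything that can fail first, and account the bytes as sent only once submission can no longer fail, so the transmit path never has to call the completion hook itself. The `toy_*` names and the descriptor-reservation step are assumptions for illustration; where the right place is in cpsw is not settled in this thread, and the model does not reproduce the race itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy byte accounting in the spirit of BQL (hypothetical, not the kernel's). */
struct toy_bql {
	unsigned int queued;	/* bytes reported as sent */
	unsigned int completed;	/* bytes reported as completed */
};

struct toy_ring {
	int free_desc;		/* free TX descriptors */
};

static void toy_sent_queue(struct toy_bql *q, unsigned int bytes)
{
	q->queued += bytes;
}

static void toy_completed_queue(struct toy_bql *q, unsigned int bytes)
{
	q->completed += bytes;
}

static unsigned int toy_inflight(const struct toy_bql *q)
{
	return q->queued - q->completed;
}

/* Transmit path: returns true if the packet was handed to the hardware. */
static bool toy_xmit(struct toy_bql *q, struct toy_ring *r, unsigned int len)
{
	if (r->free_desc == 0)
		return false;	/* failed early: nothing accounted, nothing to undo */
	r->free_desc--;		/* from here on, submission cannot fail */
	toy_sent_queue(q, len);
	return true;
}

/* Completion path: the only caller of toy_completed_queue(). */
static void toy_tx_done(struct toy_bql *q, struct toy_ring *r, unsigned int len)
{
	r->free_desc++;
	toy_completed_queue(q, len);
}
```

In this model a failed submit leaves the in-flight byte count untouched, and completions are reported from exactly one place.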
* Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter
From: David Miller @ 2014-11-06 17:25 UTC (permalink / raw)
To: herbert; +Cc: viro, netdev, linux-kernel, bcrl
In-Reply-To: <20141106082338.GA29800@gondor.apana.org.au>
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 6 Nov 2014 16:23:38 +0800
> On Wed, Nov 05, 2014 at 03:24:10PM -0500, David Miller wrote:
>>
>> Herbert, please provide a cover letter for this series, and the most recent
>> version of patch #2 gets various rejects when I try to apply it to net-next.
>
> Sure, I'll regenerate them. However, while doing so I noticed that
> a number of my patches on tun/macvtap that you have previously set
> as accepted are missing from net-next. Could this be why you got
> the rejects?
Those were bug fixes, so they went into plain 'net'. They will show up the
next time I do a merge, and I will deal with the conflicts, if any.
* Re: [patch net-next 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes
From: Florian Fainelli @ 2014-11-06 16:59 UTC (permalink / raw)
To: Jiri Pirko, netdev
Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
john.r.fastabend, edumazet, jhs, sfeldma, roopa, linville,
jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl
In-Reply-To: <1415265610-9338-8-git-send-email-jiri@resnulli.us>
On 11/06/2014 01:20 AM, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> To notify switch driver of change in STP state of bridge port, add new
> .ndo op and provide swdev wrapper func to call ndo op. Use it in bridge
> code then.
>
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
[snip]
> #endif /* _LINUX_SWITCHDEV_H_ */
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index 86c239b..13fecf1 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -17,6 +17,7 @@
> #include <net/net_namespace.h>
> #include <net/sock.h>
> #include <uapi/linux/if_bridge.h>
> +#include <net/switchdev.h>
>
> #include "br_private.h"
> #include "br_private_stp.h"
> @@ -304,6 +305,7 @@ static int br_set_port_state(struct net_bridge_port *p, u8 state)
>
> br_set_state(p, state);
> br_log_state(p);
> + netdev_sw_port_stp_update(p->dev, p->state);
Is there a reason netdev_sw_port_stp_update() is not folded in
br_set_state()? Are we missing calls to br_set_state() in some locations?
--
Florian
* Re: am335x: cpsw: phy ignores max-speed setting
From: Florian Fainelli @ 2014-11-06 16:58 UTC (permalink / raw)
To: Yegor Yefremov, netdev; +Cc: N, Mugunthan V, mpa, lsorense, Daniel Mack
In-Reply-To: <CAGm1_ktWK5ai85PZJTkq8Q1mAFH6JZ5XM1mDOHO3K_N2iGNLWg@mail.gmail.com>
On 11/06/2014 08:25 AM, Yegor Yefremov wrote:
> I'm trying to override max-speed setting for both CPSW connected
> PHYs. This is my DTS section for configuring CPSW:
>
> &mac {
> pinctrl-names = "default", "sleep";
> pinctrl-0 = <&cpsw_default>;
> pinctrl-1 = <&cpsw_sleep>;
> dual_emac = <1>;
>
> status = "okay";
> };
>
> &davinci_mdio {
> pinctrl-names = "default", "sleep";
> pinctrl-0 = <&davinci_mdio_default>;
> pinctrl-1 = <&davinci_mdio_sleep>;
>
> status = "okay";
> };
>
> &cpsw_emac0 {
> phy_id = <&davinci_mdio>, <0>;
> phy-mode = "rgmii-id";
> dual_emac_res_vlan = <1>;
> max-speed = <100>;
> };
>
> &cpsw_emac1 {
> phy_id = <&davinci_mdio>, <1>;
> phy-mode = "rgmii-id";
> dual_emac_res_vlan = <2>;
> max-speed = <100>;
> };
>
> But in drivers/net/phy/phy_device.c->of_set_phy_supported() routine I
> don't get through node check, i.e. node == NULL. Any idea why?
Yes, because the 'max-speed' property is placed at the Ethernet MAC node
level, not the PHY node as of_set_phy_supported() expects it.
This driver does not appear to use the standard Ethernet PHY device tree
node, so I am not sure what your options are here.
>
> static void of_set_phy_supported(struct phy_device *phydev)
> {
> struct device_node *node = phydev->dev.of_node;
> u32 max_speed;
>
> if (!IS_ENABLED(CONFIG_OF_MDIO))
> return;
>
> if (!node)
> return;
>
> if (!of_property_read_u32(node, "max-speed", &max_speed)) {
> /* The default values for phydev->supported are provided by the PHY
> * driver "features" member, we want to reset to sane defaults first
> * before supporting higher speeds.
> */
> phydev->supported &= PHY_DEFAULT_FEATURES;
>
> switch (max_speed) {
> default:
> return;
>
> case SPEED_1000:
> phydev->supported |= PHY_1000BT_FEATURES;
> case SPEED_100:
> phydev->supported |= PHY_100BT_FEATURES;
> case SPEED_10:
> phydev->supported |= PHY_10BT_FEATURES;
> }
> }
> }
>
> Yegor
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
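The control flow of the quoted of_set_phy_supported() can be modelled in userspace as a rough sketch. The `TOY_*` feature bits and `toy_apply_max_speed()` are hypothetical; a NULL pointer stands in for the missing PHY node or property. It shows the two behaviours discussed here: without a node nothing is changed, and with a valid speed the switch falls through so that lower speeds stay enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits; the real PHY_*_FEATURES masks differ. */
#define TOY_10BT	0x001u
#define TOY_100BT	0x002u
#define TOY_1000BT	0x004u
#define TOY_SPEED_MASK	(TOY_10BT | TOY_100BT | TOY_1000BT)

/*
 * max_speed == NULL models "no PHY node / no max-speed property":
 * the driver-provided features are returned unchanged.
 */
static unsigned int toy_apply_max_speed(unsigned int features,
					const unsigned int *max_speed)
{
	if (!max_speed)
		return features;

	/* Reset to the non-speed defaults before adding speeds back. */
	features &= ~TOY_SPEED_MASK;

	switch (*max_speed) {
	default:
		return features;	/* unknown speed: no speed bits added */
	case 1000:
		features |= TOY_1000BT;
		/* fall through */
	case 100:
		features |= TOY_100BT;
		/* fall through */
	case 10:
		features |= TOY_10BT;
	}
	return features;
}
```

With a max-speed of 100, the 1000 Mbit bit is dropped while 100 and 10 remain. When the node is NULL, as reported above, the limit is silently ignored.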
* Re: am335x: cpsw: phy ignores max-speed setting
From: Dave Taht @ 2014-11-06 16:51 UTC (permalink / raw)
To: Yegor Yefremov; +Cc: netdev, N, Mugunthan V, mpa, lsorense, Daniel Mack
In-Reply-To: <CAGm1_ktWK5ai85PZJTkq8Q1mAFH6JZ5XM1mDOHO3K_N2iGNLWg@mail.gmail.com>
ooh! ooh! I have a BQL enablement patch for the cpsw that I have no
means of testing against multiple phys. Could
you give the attached very small patch a shot along the way?
The results I get on the beaglebone vs netperf-wrapper are pretty
spectacular - huge increase in throughput, big reduction in
latency.
http://snapon.lab.bufferbloat.net/~d/beagle_bql/bql_makes_a_difference.png
http://snapon.lab.bufferbloat.net/~d/beagle_bql/beaglebonewithbql.png
On Thu, Nov 6, 2014 at 8:25 AM, Yegor Yefremov
<yegorslists@googlemail.com> wrote:
> I'm trying to override max-speed setting for both CPSW connected
> PHYs. This is my DTS section for configuring CPSW:
>
> &mac {
> pinctrl-names = "default", "sleep";
> pinctrl-0 = <&cpsw_default>;
> pinctrl-1 = <&cpsw_sleep>;
> dual_emac = <1>;
>
> status = "okay";
> };
>
> &davinci_mdio {
> pinctrl-names = "default", "sleep";
> pinctrl-0 = <&davinci_mdio_default>;
> pinctrl-1 = <&davinci_mdio_sleep>;
>
> status = "okay";
> };
>
> &cpsw_emac0 {
> phy_id = <&davinci_mdio>, <0>;
> phy-mode = "rgmii-id";
> dual_emac_res_vlan = <1>;
> max-speed = <100>;
> };
>
> &cpsw_emac1 {
> phy_id = <&davinci_mdio>, <1>;
> phy-mode = "rgmii-id";
> dual_emac_res_vlan = <2>;
> max-speed = <100>;
> };
>
> But in drivers/net/phy/phy_device.c->of_set_phy_supported() routine I
> don't get through node check, i.e. node == NULL. Any idea why?
>
> static void of_set_phy_supported(struct phy_device *phydev)
> {
> struct device_node *node = phydev->dev.of_node;
> u32 max_speed;
>
> if (!IS_ENABLED(CONFIG_OF_MDIO))
> return;
>
> if (!node)
> return;
>
> if (!of_property_read_u32(node, "max-speed", &max_speed)) {
> /* The default values for phydev->supported are provided by the PHY
> * driver "features" member, we want to reset to sane defaults first
> * before supporting higher speeds.
> */
> phydev->supported &= PHY_DEFAULT_FEATURES;
>
> switch (max_speed) {
> default:
> return;
>
> case SPEED_1000:
> phydev->supported |= PHY_1000BT_FEATURES;
> case SPEED_100:
> phydev->supported |= PHY_100BT_FEATURES;
> case SPEED_10:
> phydev->supported |= PHY_10BT_FEATURES;
> }
> }
> }
>
> Yegor
--
Dave Täht
http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks
[-- Attachment #2: 0001-Add-BQL-support-to-the-TI-cpsw-driver.patch --]
[-- Type: text/x-patch, Size: 2369 bytes --]
From 7eccb26dc8f6d09660b22fcbd868572d050df26f Mon Sep 17 00:00:00 2001
From: Dave Taht <dave.taht@bufferbloat.net>
Date: Thu, 6 Nov 2014 08:45:30 -0800
Subject: [PATCH] Add BQL support to the TI cpsw driver
Tested on the beaglebone black.
I get a huge improvement in both throughput and latency.
Latency goes from 60ms worst case with pfifo_fast, and 12ms worst case with
sch_fq to 2.5ms with BQL enabled.
Throughput improved also.
---
drivers/net/ethernet/ti/cpsw.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index d879448..5934fbc 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -118,7 +118,7 @@ do { \
#define CPDMA_TXCP 0x40
#define CPDMA_RXCP 0x60
-#define CPSW_POLL_WEIGHT 64
+#define CPSW_POLL_WEIGHT 16
#define CPSW_MIN_PACKET_SIZE 60
#define CPSW_MAX_PACKET_SIZE (1500 + 14 + 4 + 4)
@@ -693,6 +693,7 @@ static void cpsw_tx_handler(void *token, int len, int status)
cpts_tx_timestamp(priv->cpts, skb);
ndev->stats.tx_packets++;
ndev->stats.tx_bytes += len;
+ netdev_completed_queue(ndev,1,len);
dev_kfree_skb_any(skb);
}
@@ -1307,6 +1308,8 @@ static int cpsw_ndo_open(struct net_device *ndev)
cpsw_set_coalesce(ndev, &coal);
}
+ netdev_reset_queue(ndev);
+ dev_info(priv->dev, "BQL enabled\n");
napi_enable(&priv->napi);
cpdma_ctlr_start(priv->dma);
cpsw_intr_enable(priv);
@@ -1341,6 +1344,7 @@ static int cpsw_ndo_stop(struct net_device *ndev)
netif_stop_queue(priv->ndev);
napi_disable(&priv->napi);
netif_carrier_off(priv->ndev);
+ netdev_reset_queue(priv->ndev);
if (cpsw_common_res_usage_state(priv) <= 1) {
cpts_unregister(priv->cpts);
@@ -1361,6 +1365,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
{
struct cpsw_priv *priv = netdev_priv(ndev);
int ret;
+ int len;
ndev->trans_start = jiffies;
@@ -1375,9 +1380,11 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
skb_tx_timestamp(skb);
-
+ len = max(skb->len, CPSW_MIN_PACKET_SIZE);
+ netdev_sent_queue(ndev,len);
ret = cpsw_tx_packet_submit(ndev, priv, skb);
if (unlikely(ret != 0)) {
+ netdev_completed_queue(ndev,1,len);
cpsw_err(priv, tx_err, "desc submit failed\n");
goto fail;
}
--
1.9.1
* Re: [PATCHv2 net-next 2/2] sunvnet: Use one Tx queue per vnet_port
From: Sowmini Varadhan @ 2014-11-06 16:46 UTC (permalink / raw)
To: Ben Hutchings; +Cc: davem, netdev
In-Reply-To: <1415291564.3398.66.camel@decadent.org.uk>
On (11/06/14 16:32), Ben Hutchings wrote:
>
> OK, then the indentation of the following line is wrong.
You are right, sorry about that. I'll fix that shortly.
--Sowmini
* Re: [PATCH net-next] net: gro: add a per device gro flush timer
From: Rick Jones @ 2014-11-06 16:42 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Or Gerlitz, Willem de Bruijn
In-Reply-To: <1415241576.13896.62.camel@edumazet-glaptop2.roam.corp.google.com>
On 11/05/2014 06:39 PM, Eric Dumazet wrote:
> On Wed, 2014-11-05 at 18:14 -0800, Eric Dumazet wrote:
>> On Wed, 2014-11-05 at 17:38 -0800, Rick Jones wrote:
>>
>>> Speaking of QPS, what happens to 200 TCP_RR tests when the feature is
>>> enabled?
>
> The possible reduction of QPS happens when you have a single flow like
> TCP_RR -- -r 40000,40000
>
> (Because we have one single TCP packet with 40000 bytes of payload,
> application is woken up once when Push flag is received)
>
> So cpu efficiency is way better, but application has to copy 40000 bytes
> in one go _after_ Push flag, instead of being able to copy part of the
> data _before_ receiving the Push flag.
Thanks. That isn't too unlike what I've seen happen in the past with
say an 8K request size and switching back and forth between a 1500 and
9000 byte MTU.
happy benchmarking,
rick
* Re: [PATCHv2 net-next 2/2] sunvnet: Use one Tx queue per vnet_port
From: Ben Hutchings @ 2014-11-06 16:32 UTC (permalink / raw)
To: Sowmini Varadhan; +Cc: davem, netdev
In-Reply-To: <20141106162822.GL15665@oracle.com>
On Thu, 2014-11-06 at 11:28 -0500, Sowmini Varadhan wrote:
> On (11/06/14 16:19), Ben Hutchings wrote:
> > > + txq = netdev_get_tx_queue(port->vp->dev, port->q_index);
> > > + __netif_tx_lock(txq, smp_processor_id());
> > > + if (likely(netif_tx_queue_stopped(txq))) {
> > > + struct vio_dring_state *dr;
> > > +
> > > + dr = &port->vio.drings[VIO_DRIVER_TX_RING];
> >
> > You seem to have dropped the condition for the netif_tx_wake_queue(),
> > which I would guess based on the old code should be:
> >
> > if (vnet_tx_dring_avail(dr) >= VNET_TX_WAKEUP_THRESH(dr))
> >
> > > + netif_tx_wake_queue(txq);
>
> yes, this was deliberate.
>
> As I indicated in the comments:
>
> /* Got back a STOPPED LDC message on port. If the queue is stopped,
> * wake it up so that we'll send out another START message at the
> * next TX.
> */
>
> We only call maybe_tx_wakeup() if the peer has sent us a STOPPED
> ack (meaning that the peer is no longer reading the descriptor rings).
> So if our tx queue is full and stopped, we need to poke the peer
> on the next TX with a start message. (otherwise we'd never wake up!)
OK, then the indentation of the following line is wrong.
Ben.
--
Ben Hutchings
Beware of programmers who carry screwdrivers. - Leonard Brandwein
* Re: mlx4+vxlan offload breaks gre tunnels
From: Or Gerlitz @ 2014-11-06 16:30 UTC (permalink / raw)
To: Florian Westphal; +Cc: netdev, Tom Herbert, Jesse Gross, amirv
In-Reply-To: <20141105165351.GA23131@breakpoint.cc>
On 11/5/2014 6:53 PM, Florian Westphal wrote:
> Right, the patch below works in my setup as well (until link-add-vxlan,
> that is;) )
Good, let me look into that a little further to see what the best approach
is here. Thanks for the report.
Or.
* [PATCH V2 net-next 2/2] net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE
From: Or Gerlitz @ 2014-11-06 16:28 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli,
Ido Shamay, Jerry Chu, Or Gerlitz
In-Reply-To: <1415291324-22069-1-git-send-email-ogerlitz@mellanox.com>
From: Shani Michaeli <shanim@mellanox.com>
When processing received traffic, pass CHECKSUM_COMPLETE status to the
stack, with calculated checksum for non TCP/UDP packets (such
as GRE or ICMP).
Although the stack expects checksum which doesn't include the pseudo
header, the HW adds it. To address that, we are subtracting the pseudo
header checksum from the checksum value provided by the HW.
In the IPv6 case, we also compute/add the IP header checksum which
is not added by the HW for such packets.
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 5 +
drivers/net/ethernet/mellanox/mlx4/en_port.c | 2 +
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 127 +++++++++++++++++++++--
drivers/net/ethernet/mellanox/mlx4/main.c | 9 ++
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 5 +-
include/linux/mlx4/device.h | 1 +
7 files changed, 142 insertions(+), 9 deletions(-)
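The pseudo-header subtraction described in the commit message relies on ones' complement arithmetic, where subtracting a value is the same as adding its complement. A minimal userspace sketch follows, assuming 16-bit folded sums for simplicity (the kernel helpers operate on 32-bit `__wsum` values); the `toy_*` names and the sample words used with it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold carries back into the low 16 bits (end-around carry). */
static uint16_t toy_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement addition of two 16-bit partial sums. */
static uint16_t toy_csum_add(uint16_t a, uint16_t b)
{
	return toy_fold((uint32_t)a + b);
}

/* Subtraction is addition of the complement. */
static uint16_t toy_csum_sub(uint16_t a, uint16_t b)
{
	return toy_csum_add(a, (uint16_t)~b);
}

/* Ones' complement sum over a short array of 16-bit words. */
static uint16_t toy_csum_words(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += w[i];
	return toy_fold(sum);
}
```

If the hardware value is the sum over pseudo header plus payload, `toy_csum_sub(hw, pseudo)` recovers the payload sum that the stack expects. Note that 0x0000 and 0xffff both represent zero in this arithmetic, so comparisons of results need care in general.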
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 8ea4d5b..6c64323 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -115,7 +115,7 @@ static const char main_strings[][ETH_GSTRING_LEN] = {
"tso_packets",
"xmit_more",
"queue_stopped", "wake_queue", "tx_timeout", "rx_alloc_failed",
- "rx_csum_good", "rx_csum_none", "tx_chksum_offload",
+ "rx_csum_good", "rx_csum_none", "rx_csum_complete", "tx_chksum_offload",
/* packet statistics */
"broadcast", "rx_prio_0", "rx_prio_1", "rx_prio_2", "rx_prio_3",
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0efbae9..d1eb25d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1893,6 +1893,7 @@ static void mlx4_en_clear_stats(struct net_device *dev)
priv->rx_ring[i]->packets = 0;
priv->rx_ring[i]->csum_ok = 0;
priv->rx_ring[i]->csum_none = 0;
+ priv->rx_ring[i]->csum_complete = 0;
}
}
@@ -2503,6 +2504,10 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
/* Query for default mac and max mtu */
priv->max_mtu = mdev->dev->caps.eth_mtu_cap[priv->port];
+ if (mdev->dev->caps.rx_checksum_flags_port[priv->port] &
+ MLX4_RX_CSUM_MODE_VAL_NON_TCP_UDP)
+ priv->flags |= MLX4_EN_FLAG_RX_CSUM_NON_TCP_UDP;
+
/* Set default MAC */
dev->addr_len = ETH_ALEN;
mlx4_en_u64_to_mac(dev->dev_addr, mdev->dev->caps.def_mac[priv->port]);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c
index 134b12e..6cb8007 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
@@ -155,11 +155,13 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
stats->rx_bytes = 0;
priv->port_stats.rx_chksum_good = 0;
priv->port_stats.rx_chksum_none = 0;
+ priv->port_stats.rx_chksum_complete = 0;
for (i = 0; i < priv->rx_ring_num; i++) {
stats->rx_packets += priv->rx_ring[i]->packets;
stats->rx_bytes += priv->rx_ring[i]->bytes;
priv->port_stats.rx_chksum_good += priv->rx_ring[i]->csum_ok;
priv->port_stats.rx_chksum_none += priv->rx_ring[i]->csum_none;
+ priv->port_stats.rx_chksum_complete += priv->rx_ring[i]->csum_complete;
}
stats->tx_packets = 0;
stats->tx_bytes = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index c83127e..25065c2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -42,6 +42,10 @@
#include <linux/vmalloc.h>
#include <linux/irq.h>
+#if IS_ENABLED(CONFIG_IPV6)
+#include <net/ip6_checksum.h>
+#endif
+
#include "mlx4_en.h"
static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
@@ -643,6 +647,86 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
}
}
+/* When hardware doesn't strip the vlan, we need to calculate the checksum
+ * over it and add it to the hardware's checksum calculation
+ */
+static inline __wsum get_fixed_vlan_csum(__wsum hw_checksum,
+ struct vlan_hdr *vlanh)
+{
+ return csum_add(hw_checksum, *(__wsum *)vlanh);
+}
+
+/* Although the stack expects checksum which doesn't include the pseudo
+ * header, the HW adds it. To address that, we are subtracting the pseudo
+ * header checksum from the checksum value provided by the HW.
+ */
+static void get_fixed_ipv4_csum(__wsum hw_checksum, struct sk_buff *skb,
+ struct iphdr *iph)
+{
+ __u16 length_for_csum = 0;
+ __wsum csum_pseudo_header = 0;
+
+ length_for_csum = (be16_to_cpu(iph->tot_len) - (iph->ihl << 2));
+ csum_pseudo_header = csum_tcpudp_nofold(iph->saddr, iph->daddr,
+ length_for_csum, iph->protocol, 0);
+ skb->csum = csum_sub(hw_checksum, csum_pseudo_header);
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+/* In IPv6 packets, besides subtracting the pseudo header checksum,
+ * we also compute/add the IP header checksum which
+ * is not added by the HW.
+ */
+static int get_fixed_ipv6_csum(__wsum hw_checksum, struct sk_buff *skb,
+ struct ipv6hdr *ipv6h)
+{
+ __wsum csum_pseudo_hdr = 0;
+
+ if (ipv6h->nexthdr == IPPROTO_FRAGMENT || ipv6h->nexthdr == IPPROTO_HOPOPTS)
+ return -1;
+ hw_checksum = csum_add(hw_checksum, (__force __wsum)(ipv6h->nexthdr << 8));
+
+ csum_pseudo_hdr = csum_partial(&ipv6h->saddr,
+ sizeof(ipv6h->saddr) + sizeof(ipv6h->daddr), 0);
+ csum_pseudo_hdr = csum_add(csum_pseudo_hdr, (__force __wsum)ipv6h->payload_len);
+ csum_pseudo_hdr = csum_add(csum_pseudo_hdr, (__force __wsum)ntohs(ipv6h->nexthdr));
+
+ skb->csum = csum_sub(hw_checksum, csum_pseudo_hdr);
+ skb->csum = csum_add(skb->csum, csum_partial(ipv6h, sizeof(struct ipv6hdr), 0));
+ return 0;
+}
+#endif
+static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
+ int hwtstamp_rx_filter)
+{
+ __wsum hw_checksum = 0;
+
+ void *hdr = (u8 *)va + sizeof(struct ethhdr);
+
+ hw_checksum = csum_unfold((__force __sum16)cqe->checksum);
+
+ if (((struct ethhdr *)va)->h_proto == htons(ETH_P_8021Q) &&
+ hwtstamp_rx_filter != HWTSTAMP_FILTER_NONE) {
+ /* next protocol non IPv4 or IPv6 */
+ if (((struct vlan_hdr *)hdr)->h_vlan_encapsulated_proto
+ != htons(ETH_P_IP) &&
+ ((struct vlan_hdr *)hdr)->h_vlan_encapsulated_proto
+ != htons(ETH_P_IPV6))
+ return -1;
+ hw_checksum = get_fixed_vlan_csum(hw_checksum, hdr);
+ hdr += sizeof(struct vlan_hdr);
+ }
+
+ if (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPV4))
+ get_fixed_ipv4_csum(hw_checksum, skb, hdr);
+#if IS_ENABLED(CONFIG_IPV6)
+ else if (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPV6))
+ if (get_fixed_ipv6_csum(hw_checksum, skb, hdr))
+ return -1;
+#endif
+ return 0;
+}
+
int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
{
struct mlx4_en_priv *priv = netdev_priv(dev);
@@ -660,6 +744,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
int factor = priv->cqe_factor;
u64 timestamp;
bool l2_tunnel;
+ void *va = 0;
if (!priv->port_up)
return 0;
@@ -702,8 +787,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
* and not performing the selftest or flb disabled
*/
if (priv->flags & MLX4_EN_FLAG_RX_FILTER_NEEDED) {
- struct ethhdr *ethh;
dma_addr_t dma;
+ struct ethhdr *ethh;
/* Get pointer to first fragment since we haven't
* skb yet and cast it to ethhdr struct
*/
@@ -744,13 +829,26 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
(cqe->vlan_my_qpn & cpu_to_be32(MLX4_CQE_L2_TUNNEL));
if (likely(dev->features & NETIF_F_RXCSUM)) {
- if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
- (cqe->checksum == cpu_to_be16(0xffff))) {
- ring->csum_ok++;
- ip_summed = CHECKSUM_UNNECESSARY;
+ if (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_TCP |
+ MLX4_CQE_STATUS_UDP)) {
+ if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
+ cqe->checksum == cpu_to_be16(0xffff)) {
+ ip_summed = CHECKSUM_UNNECESSARY;
+ ring->csum_ok++;
+ } else {
+ ip_summed = CHECKSUM_NONE;
+ ring->csum_none++;
+ }
} else {
- ip_summed = CHECKSUM_NONE;
- ring->csum_none++;
+ if (priv->flags & MLX4_EN_FLAG_RX_CSUM_NON_TCP_UDP &&
+ (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPV4 |
+ MLX4_CQE_STATUS_IPV6))) {
+ ip_summed = CHECKSUM_COMPLETE;
+ ring->csum_complete++;
+ } else {
+ ip_summed = CHECKSUM_NONE;
+ ring->csum_none++;
+ }
}
} else {
ip_summed = CHECKSUM_NONE;
@@ -776,6 +874,14 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
if (!nr)
goto next;
+ if (ip_summed == CHECKSUM_COMPLETE) {
+ va = skb_frag_address(skb_shinfo(gro_skb)->frags);
+ if (check_csum(cqe, gro_skb, va, ring->hwtstamp_rx_filter)) {
+ ip_summed = CHECKSUM_NONE;
+ ring->csum_none++;
+ }
+ }
+
skb_shinfo(gro_skb)->nr_frags = nr;
gro_skb->len = length;
gro_skb->data_len = length;
@@ -822,6 +928,13 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
goto next;
}
+ if (ip_summed == CHECKSUM_COMPLETE) {
+ if (check_csum(cqe, skb, skb->data, ring->hwtstamp_rx_filter)) {
+ ip_summed = CHECKSUM_NONE;
+ ring->csum_none++;
+ }
+ }
+
skb->ip_summed = ip_summed;
skb->protocol = eth_type_trans(skb, dev);
skb_record_rx_queue(skb, cq->ring);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 9f82196..2f6ba42 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1629,6 +1629,7 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
struct mlx4_init_hca_param init_hca;
u64 icm_size;
int err;
+ struct mlx4_config_dev_params params;
if (!mlx4_is_slave(dev)) {
err = mlx4_QUERY_FW(dev);
@@ -1762,6 +1763,14 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
goto unmap_bf;
}
+ /* Query CONFIG_DEV parameters */
+ err = mlx4_config_dev_retrieval(dev, &params);
+ if (err && err != -ENOTSUPP) {
+ mlx4_err(dev, "Failed to query CONFIG_DEV parameters\n");
+ } else if (!err) {
+ dev->caps.rx_checksum_flags_port[1] = params.rx_csum_flags_port_1;
+ dev->caps.rx_checksum_flags_port[2] = params.rx_csum_flags_port_2;
+ }
priv->eq_table.inta_pin = adapter.inta_pin;
memcpy(dev->board_id, adapter.board_id, sizeof dev->board_id);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index ef83d12..de45674 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -326,6 +326,7 @@ struct mlx4_en_rx_ring {
#endif
unsigned long csum_ok;
unsigned long csum_none;
+ unsigned long csum_complete;
int hwtstamp_rx_filter;
cpumask_var_t affinity_mask;
};
@@ -449,6 +450,7 @@ struct mlx4_en_port_stats {
unsigned long rx_alloc_failed;
unsigned long rx_chksum_good;
unsigned long rx_chksum_none;
+ unsigned long rx_chksum_complete;
unsigned long tx_chksum_offload;
#define NUM_PORT_STATS 9
};
@@ -507,7 +509,8 @@ enum {
MLX4_EN_FLAG_ENABLE_HW_LOOPBACK = (1 << 2),
/* whether we need to drop packets that hardware loopback-ed */
MLX4_EN_FLAG_RX_FILTER_NEEDED = (1 << 3),
- MLX4_EN_FLAG_FORCE_PROMISC = (1 << 4)
+ MLX4_EN_FLAG_FORCE_PROMISC = (1 << 4),
+ MLX4_EN_FLAG_RX_CSUM_NON_TCP_UDP = (1 << 5),
};
#define MLX4_EN_MAC_HASH_SIZE (1 << BITS_PER_BYTE)
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 5cc5eac..3d9bff0 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -497,6 +497,7 @@ struct mlx4_caps {
u16 hca_core_clock;
u64 phys_port_id[MLX4_MAX_PORTS + 1];
int tunnel_offload_mode;
+ u8 rx_checksum_flags_port[MLX4_MAX_PORTS + 1];
};
struct mlx4_buf_list {
--
1.7.1
* [PATCH V2 net-next 1/2] net/mlx4_en: Extend usage of napi_gro_frags
From: Or Gerlitz @ 2014-11-06 16:28 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli,
Ido Shamay, Or Gerlitz
In-Reply-To: <1415291324-22069-1-git-send-email-ogerlitz@mellanox.com>
From: Shani Michaeli <shanim@mellanox.com>
We can call napi_gro_frags for all the received traffic regardless
of the checksum status. Specifically, received packets whose status
is CHECKSUM_NONE (and soon to be added CHECKSUM_COMPLETE)
are eligible for napi_gro_frags as well.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 108 ++++++++++++++--------------
1 files changed, 54 insertions(+), 54 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index b173a0c..c83127e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -747,60 +747,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
(cqe->checksum == cpu_to_be16(0xffff))) {
ring->csum_ok++;
- /* This packet is eligible for GRO if it is:
- * - DIX Ethernet (type interpretation)
- * - TCP/IP (v4)
- * - without IP options
- * - not an IP fragment
- * - no LLS polling in progress
- */
- if (!mlx4_en_cq_busy_polling(cq) &&
- (dev->features & NETIF_F_GRO)) {
- struct sk_buff *gro_skb = napi_get_frags(&cq->napi);
- if (!gro_skb)
- goto next;
-
- nr = mlx4_en_complete_rx_desc(priv,
- rx_desc, frags, gro_skb,
- length);
- if (!nr)
- goto next;
-
- skb_shinfo(gro_skb)->nr_frags = nr;
- gro_skb->len = length;
- gro_skb->data_len = length;
- gro_skb->ip_summed = CHECKSUM_UNNECESSARY;
-
- if (l2_tunnel)
- gro_skb->csum_level = 1;
- if ((cqe->vlan_my_qpn &
- cpu_to_be32(MLX4_CQE_VLAN_PRESENT_MASK)) &&
- (dev->features & NETIF_F_HW_VLAN_CTAG_RX)) {
- u16 vid = be16_to_cpu(cqe->sl_vid);
-
- __vlan_hwaccel_put_tag(gro_skb, htons(ETH_P_8021Q), vid);
- }
-
- if (dev->features & NETIF_F_RXHASH)
- skb_set_hash(gro_skb,
- be32_to_cpu(cqe->immed_rss_invalid),
- PKT_HASH_TYPE_L3);
-
- skb_record_rx_queue(gro_skb, cq->ring);
- skb_mark_napi_id(gro_skb, &cq->napi);
-
- if (ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL) {
- timestamp = mlx4_en_get_cqe_ts(cqe);
- mlx4_en_fill_hwtstamps(mdev,
- skb_hwtstamps(gro_skb),
- timestamp);
- }
-
- napi_gro_frags(&cq->napi);
- goto next;
- }
-
- /* GRO not possible, complete processing here */
ip_summed = CHECKSUM_UNNECESSARY;
} else {
ip_summed = CHECKSUM_NONE;
@@ -811,6 +757,60 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
ring->csum_none++;
}
+ /* This packet is eligible for GRO if it is:
+ * - DIX Ethernet (type interpretation)
+ * - TCP/IP (v4)
+ * - without IP options
+ * - not an IP fragment
+ * - no LLS polling in progress
+ */
+ if (!mlx4_en_cq_busy_polling(cq) &&
+ (dev->features & NETIF_F_GRO)) {
+ struct sk_buff *gro_skb = napi_get_frags(&cq->napi);
+ if (!gro_skb)
+ goto next;
+
+ nr = mlx4_en_complete_rx_desc(priv,
+ rx_desc, frags, gro_skb,
+ length);
+ if (!nr)
+ goto next;
+
+ skb_shinfo(gro_skb)->nr_frags = nr;
+ gro_skb->len = length;
+ gro_skb->data_len = length;
+ gro_skb->ip_summed = ip_summed;
+
+ if (l2_tunnel && ip_summed == CHECKSUM_UNNECESSARY)
+ gro_skb->encapsulation = 1;
+ if ((cqe->vlan_my_qpn &
+ cpu_to_be32(MLX4_CQE_VLAN_PRESENT_MASK)) &&
+ (dev->features & NETIF_F_HW_VLAN_CTAG_RX)) {
+ u16 vid = be16_to_cpu(cqe->sl_vid);
+
+ __vlan_hwaccel_put_tag(gro_skb, htons(ETH_P_8021Q), vid);
+ }
+
+ if (dev->features & NETIF_F_RXHASH)
+ skb_set_hash(gro_skb,
+ be32_to_cpu(cqe->immed_rss_invalid),
+ PKT_HASH_TYPE_L3);
+
+ skb_record_rx_queue(gro_skb, cq->ring);
+ skb_mark_napi_id(gro_skb, &cq->napi);
+
+ if (ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL) {
+ timestamp = mlx4_en_get_cqe_ts(cqe);
+ mlx4_en_fill_hwtstamps(mdev,
+ skb_hwtstamps(gro_skb),
+ timestamp);
+ }
+
+ napi_gro_frags(&cq->napi);
+ goto next;
+ }
+
+ /* GRO not possible, complete processing here */
skb = mlx4_en_rx_skb(priv, rx_desc, frags, length);
if (!skb) {
priv->stats.rx_dropped++;
--
1.7.1
* [PATCH V2 net-next 0/2] Add CHECKSUM_COMPLETE support
From: Or Gerlitz @ 2014-11-06 16:28 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli,
Ido Shamay, Or Gerlitz
Hi Dave,
These patches from Shani, Matan and myself add support for
CHECKSUM_COMPLETE reporting on non TCP/UDP packets such as
GRE and ICMP. I'd like to deeply thank Jerry Chu for his
innovation and support in that effort.
Based on the feedback from Eric and Ido Shamay, in V2 we dropped
the patch which removed the calls to napi_gro_frags() and added
a patch which makes the RX code go through that path
regardless of the checksum status.
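For readers less familiar with the arithmetic behind CHECKSUM_COMPLETE, here is a minimal userspace sketch of the ones'-complement sum that the fix-up helpers in patch 2/2 rely on. It is not the driver code: oc_sum(), oc_add() and oc_sub() are names made up for this illustration, standing in for what the kernel does with csum_partial(), csum_add() and csum_sub() on __wsum values. The point it shows is that a partial sum (e.g. a pseudo header) can be subtracted from a sum over a larger span, leaving the sum of the remainder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones'-complement addition of two 16-bit sums. */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
	return fold16((uint32_t)a + b);
}

/* Ones'-complement subtraction: add the bitwise complement. */
static uint16_t oc_sub(uint16_t a, uint16_t b)
{
	return oc_add(a, (uint16_t)~b);
}

/* Sum a buffer as big-endian 16-bit words; an odd trailing byte is zero padded. */
static uint16_t oc_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return fold16(sum);
}
```

With an 8-byte buffer split at an even offset, oc_sub(oc_sum(whole), oc_sum(head)) gives the same value as oc_sum(tail), as long as the tail sum is not zero (ones' complement has two representations of zero, 0x0000 and 0xffff).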
Or.
changes from V0/V1:
- added a patch to extend usage of napi_gro_frags()
- applied feedback from David Laight to optimally use checksum helpers
Shani Michaeli (2):
net/mlx4_en: Extend usage of napi_gro_frags
net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 5 +
drivers/net/ethernet/mellanox/mlx4/en_port.c | 2 +
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 229 +++++++++++++++++------
drivers/net/ethernet/mellanox/mlx4/main.c | 9 +
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 5 +-
include/linux/mlx4/device.h | 1 +
7 files changed, 193 insertions(+), 60 deletions(-)
* Re: [PATCHv2 net-next 2/2] sunvnet: Use one Tx queue per vnet_port
From: Sowmini Varadhan @ 2014-11-06 16:28 UTC (permalink / raw)
To: Ben Hutchings; +Cc: davem, netdev
In-Reply-To: <1415290773.3398.61.camel@decadent.org.uk>
On (11/06/14 16:19), Ben Hutchings wrote:
> > + txq = netdev_get_tx_queue(port->vp->dev, port->q_index);
> > + __netif_tx_lock(txq, smp_processor_id());
> > + if (likely(netif_tx_queue_stopped(txq))) {
> > + struct vio_dring_state *dr;
> > +
> > + dr = &port->vio.drings[VIO_DRIVER_TX_RING];
>
> You seem to have dropped the condition for the netif_tx_wake_queue(),
> which I would guess based on the old code should be:
>
> if (vnet_tx_dring_avail(dr) >= VNET_TX_WAKEUP_THRESH(dr))
>
> > + netif_tx_wake_queue(txq);
yes, this was deliberate.
As I indicated in the comments:
/* Got back a STOPPED LDC message on port. If the queue is stopped,
* wake it up so that we'll send out another START message at the
* next TX.
*/
We only call maybe_tx_wakeup() if the peer has sent us a STOPPED
ack (meaning that the peer is no longer reading the descriptor rings).
So if our tx queue is full and stopped, we need to poke the peer
on the next TX with a start message. (otherwise we'd never wake up!)
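To make the difference between the two wakeup policies concrete, here is a small userspace model. It is only a sketch: struct txq_model and both helpers are hypothetical stand-ins, not sunvnet code, and the threshold value is arbitrary. It models a queue that is stopped with no free descriptors while the peer has stopped reading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one TX queue and its descriptor ring. */
struct txq_model {
	bool stopped;        /* the netdev TX queue is stopped */
	unsigned int avail;  /* free TX descriptors in the ring */
	unsigned int thresh; /* wakeup threshold */
};

/* Threshold policy: wake only once enough descriptors are free.
 * Returns true if the queue is running after the call.
 */
static bool wake_if_above_thresh(struct txq_model *q)
{
	assert(q != NULL);
	if (q->stopped && q->avail >= q->thresh)
		q->stopped = false;
	return !q->stopped;
}

/* Policy described above: a STOPPED ack always wakes a stopped queue,
 * so that the next TX can send a START message to the peer.
 */
static bool wake_on_stopped_ack(struct txq_model *q)
{
	assert(q != NULL);
	if (q->stopped)
		q->stopped = false;
	return !q->stopped;
}
```

In this model a full ring (avail == 0) never satisfies the threshold test, so the first policy leaves the queue stopped, while the second restarts it.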
--Sowmini
* am335x: cpsw: phy ignores max-speed setting
From: Yegor Yefremov @ 2014-11-06 16:25 UTC (permalink / raw)
To: netdev; +Cc: N, Mugunthan V, mpa, lsorense, Daniel Mack
I'm trying to override the max-speed setting for both CPSW-connected
PHYs. This is my DTS section for configuring CPSW:
&mac {
pinctrl-names = "default", "sleep";
pinctrl-0 = <&cpsw_default>;
pinctrl-1 = <&cpsw_sleep>;
dual_emac = <1>;
status = "okay";
};
&davinci_mdio {
pinctrl-names = "default", "sleep";
pinctrl-0 = <&davinci_mdio_default>;
pinctrl-1 = <&davinci_mdio_sleep>;
status = "okay";
};
&cpsw_emac0 {
phy_id = <&davinci_mdio>, <0>;
phy-mode = "rgmii-id";
dual_emac_res_vlan = <1>;
max-speed = <100>;
};
&cpsw_emac1 {
phy_id = <&davinci_mdio>, <1>;
phy-mode = "rgmii-id";
dual_emac_res_vlan = <2>;
max-speed = <100>;
};
But in the of_set_phy_supported() routine in drivers/net/phy/phy_device.c I
don't get past the node check, i.e. node == NULL. Any idea why?
static void of_set_phy_supported(struct phy_device *phydev)
{
struct device_node *node = phydev->dev.of_node;
u32 max_speed;
if (!IS_ENABLED(CONFIG_OF_MDIO))
return;
if (!node)
return;
if (!of_property_read_u32(node, "max-speed", &max_speed)) {
/* The default values for phydev->supported are provided by the PHY
* driver "features" member, we want to reset to sane defaults first
* before supporting higher speeds.
*/
*/
phydev->supported &= PHY_DEFAULT_FEATURES;
switch (max_speed) {
default:
return;
case SPEED_1000:
phydev->supported |= PHY_1000BT_FEATURES;
case SPEED_100:
phydev->supported |= PHY_100BT_FEATURES;
case SPEED_10:
phydev->supported |= PHY_10BT_FEATURES;
}
}
}
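For reference, this is what the switch above would do once the node is found. It is a standalone sketch of the fall-through cascade only: the FEAT_* bit values are placeholders, not the kernel's feature constants, and cap_features() is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder feature bits for illustration only. */
#define FEAT_10BT   (1u << 0)
#define FEAT_100BT  (1u << 1)
#define FEAT_1000BT (1u << 2)

/* Mirror of the switch in of_set_phy_supported(): each case falls
 * through to the slower speeds, so a cap of 100 enables 100 and 10.
 */
static uint32_t cap_features(uint32_t base, unsigned int max_speed)
{
	switch (max_speed) {
	default:
		/* Unknown speed: keep only the base features. */
		return base;
	case 1000:
		base |= FEAT_1000BT;
		/* fall through */
	case 100:
		base |= FEAT_100BT;
		/* fall through */
	case 10:
		base |= FEAT_10BT;
	}
	return base;
}
```

So with max-speed = <100> the expected result is the 100 and 10 Mbit bits without the 1000 Mbit one.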
Yegor
* Re: [PATCHv2 net-next 2/2] sunvnet: Use one Tx queue per vnet_port
From: Ben Hutchings @ 2014-11-06 16:19 UTC (permalink / raw)
To: Sowmini Varadhan; +Cc: davem, netdev
In-Reply-To: <20141030164609.GG650@oracle.com>
On Thu, 2014-10-30 at 12:46 -0400, Sowmini Varadhan wrote:
> Use multple Tx netdev queues for sunvnet by supporting a one-to-one
> mapping between vnet_port and Tx queue. Provide a ndo_select_queue
> indirection (vnet_select_queue()) which selects the queue based
> on the peer that would be selected in vnet_start_xmit()
>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
[...]
> +static void maybe_tx_wakeup(struct vnet_port *port)
> {
> - struct net_device *dev = vp->dev;
> + struct netdev_queue *txq;
>
> - netif_tx_lock(dev);
> - if (likely(netif_queue_stopped(dev))) {
> - struct vnet_port *port;
> - int wake = 1;
> -
> - rcu_read_lock();
> - list_for_each_entry_rcu(port, &vp->port_list, list) {
> - struct vio_dring_state *dr;
> -
> - dr = &port->vio.drings[VIO_DRIVER_TX_RING];
> - if (vnet_tx_dring_avail(dr) <
> - VNET_TX_WAKEUP_THRESH(dr)) {
> - wake = 0;
> - break;
> - }
> - }
> - rcu_read_unlock();
> - if (wake)
> - netif_wake_queue(dev);
> + txq = netdev_get_tx_queue(port->vp->dev, port->q_index);
> + __netif_tx_lock(txq, smp_processor_id());
> + if (likely(netif_tx_queue_stopped(txq))) {
> + struct vio_dring_state *dr;
> +
> + dr = &port->vio.drings[VIO_DRIVER_TX_RING];
You seem to have dropped the condition for the netif_tx_wake_queue(),
which I would guess based on the old code should be:
if (vnet_tx_dring_avail(dr) >= VNET_TX_WAKEUP_THRESH(dr))
> + netif_tx_wake_queue(txq);
> }
> - netif_tx_unlock(dev);
> + __netif_tx_unlock(txq);
> }
[...]
--
Ben Hutchings
Beware of programmers who carry screwdrivers. - Leonard Brandwein
* Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics
From: Jesse Gross @ 2014-11-06 16:15 UTC (permalink / raw)
To: Tom Herbert
Cc: Sathya Perla, David Miller, Joe Stringer, Or Gerlitz,
Linux Netdev List, Jeff Kirsher, Amir Vadai, shahed.shaikh,
dept-gelinuxnicdev, LKML
In-Reply-To: <CA+mtBx87o1Lo2zpMyUGC_P+7+DmXnZyA2NF0-GYCEWiRh6VJQg@mail.gmail.com>
On Thu, Nov 6, 2014 at 8:06 AM, Tom Herbert <therbert@google.com> wrote:
> On Wed, Nov 5, 2014 at 10:16 PM, Sathya Perla <Sathya.Perla@emulex.com> wrote:
>>> -----Original Message-----
>>> From: Tom Herbert [mailto:therbert@google.com]
>>>
>>> On Wed, Nov 5, 2014 at 6:15 PM, David Miller <davem@davemloft.net>
>>> wrote:
>>> > From: Joe Stringer <joestringer@nicira.com>
>>> > Date: Wed, 5 Nov 2014 17:06:46 -0800
>>> >
>>> >> My impression was that the changes are more likely to be
>>> >> hardware-specific (like the i40e changes) rather than software-specific,
>>> >> like changes that might be integrated into the helper.
>>> >
>>> > I think there is more commonality amongst hardware capabilities,
>>> > and this is why I want the helper to play itself out.
>>> >
>>> >> That said, I can rework for one helper. The way I see it would be the
>>> >> same code as these patches, as "vxlan_gso_check(struct sk_buff *)" in
>>> >> drivers/net/vxlan.c which would be called from each driver. Is that what
>>> >> you had in mind?
>>> >
>>> > Yes.
>>>
>>> Note that this code is not VXLAN specific, it will also accept NVGRE
>>> and GRE/UDP with keyid and TEB. I imagine all these cases should be
>>> indistinguishable to the hardware so they probably just work (which
>>> would be cool!). It might be better to name and locate the helper
>>> function to reflect that.
>>
>> Tom, I'm confused as to how the value of (skb_inner_mac_header(skb) - skb_transport_header(skb))
>> would be the same for VxLAN and NVGRE encapsulated packets. Wouldn't this value be 16 for VxLAN
>> and 8 for NVGRE?
>>
> The inner headers are reset in iptunnel_handle_offloads. This is
> called in the xmit encapsulation functions (GRE, fou, VXLAN, etc.)
> before adding in encapsulation headers (skb_push), so the
> mac_inner_header will point to the encapsulation payload, i.e. the
> encapsulated packet. This should not change after being set, although
> inner network and inner transport can. The headers are only set on the
> first encapsulation, so with nested tunnels the inner headers point to
> the innermost encapsulated packet. Since VXLAN and NVGRE have same
> size of encapsulation (8 UDP + 8 header), skb_inner_mac_header(skb)
> - skb_transport_header(skb) should always be 16.
Tom, NVGRE is not encapsulated in UDP and it is not 16 bytes.
http://tools.ietf.org/html/draft-sridharan-virtualization-nvgre-06
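A quick sketch of the sizes in question, as plain arithmetic. The assumptions here are mine: NVGRE is taken as a GRE header carrying only the key field (no checksum or sequence number), and the outer transport header is taken to start at the UDP header for VXLAN and at the GRE header for NVGRE. The enum and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

enum {
	UDP_HDR_LEN   = 8, /* UDP header */
	VXLAN_HDR_LEN = 8, /* VXLAN header */
	GRE_BASE_LEN  = 4, /* GRE flags/version + protocol type */
	GRE_KEY_LEN   = 4, /* GRE key field, used by NVGRE for VSID + FlowID */
};

/* Bytes between the outer transport header and the inner Ethernet header. */
static size_t vxlan_encap_len(void)
{
	return UDP_HDR_LEN + VXLAN_HDR_LEN;
}

static size_t nvgre_encap_len(void)
{
	return GRE_BASE_LEN + GRE_KEY_LEN;
}
```

Under those assumptions the offset is 16 bytes for VXLAN and 8 bytes for NVGRE, which matches the numbers Sathya gave.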
* Re: [PATCHv2 net-next 0/3] RDMA/cxgb4,cxgb4vf,cxgb4i,csiostor: Cleanup macros
From: Hariprasad S @ 2014-11-06 16:15 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-rdma, linux-scsi, roland, JBottomley, hch, swise,
leedom, praveenm, anish, nirranjan, kumaras
In-Reply-To: <20141105.145443.897956918105740779.davem@davemloft.net>
On Wed, Nov 05, 2014 at 14:54:43 -0500, David Miller wrote:
> From: Hariprasad Shenai <hariprasad@chelsio.com>
> Date: Tue, 4 Nov 2014 08:20:54 +0530
>
> > It's not really the "hardware" which generates these hardware constant symbolic
> > macros/register defines of course, it's scripts developed by the hardware team.
> > Various patches have ended up changing the style of the symbolic macros/register
> > defines and some of them used the macros/register defines that matches the
> > output of the script from the hardware team.
>
> We've told you that we don't care what format your internal whatever uses
> for these macros.
>
> We have standards, tastes, and desires and reasons for naming macros
> in a certain way in upstream kernel code.
>
> I consider it flat out unacceptable to use macros with one letter
> prefixes like "S_". You simply should not do this.
>
Okay. We’ll clean up all of the macros to match the files' original style. We
do need to change the sense of the *_MASK macros since they don’t match how we
use them as field tokens. Also the *_SHIFT, *_MASK and *_GET names are
sucking up space and making lines wrap unnecessarily, creating readability
problems. Can we change these to *_S, *_M and *_G? E.g.:
-#define INGPADBOUNDARY_MASK 0x00000070U
-#define INGPADBOUNDARY_SHIFT 4
-#define INGPADBOUNDARY(x) ((x) << INGPADBOUNDARY_SHIFT)
-#define INGPADBOUNDARY_GET(x) (((x) & INGPADBOUNDARY_MASK) \
- >> INGPADBOUNDARY_SHIFT)
+#define INGPADBOUNDARY_M 0x00000007U
+#define INGPADBOUNDARY_S 4
+#define INGPADBOUNDARY(x) ((x) << INGPADBOUNDARY_S)
+#define INGPADBOUNDARY_G(x) (((x) >> INGPADBOUNDARY_S) \
+ & INGPADBOUNDARY_M)
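To show how the proposed naming would be used, here is a standalone sketch. The four macros are copied from the proposal above; ingpadboundary_set() is a hypothetical helper added only for the example and is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Proposed naming: _S = shift, _M = unshifted mask, _G = getter. */
#define INGPADBOUNDARY_S    4
#define INGPADBOUNDARY_M    0x00000007U
#define INGPADBOUNDARY(x)   ((x) << INGPADBOUNDARY_S)
#define INGPADBOUNDARY_G(x) (((x) >> INGPADBOUNDARY_S) & INGPADBOUNDARY_M)

/* Hypothetical helper: replace the field inside a register image. */
static uint32_t ingpadboundary_set(uint32_t reg, uint32_t val)
{
	assert(val <= INGPADBOUNDARY_M);
	/* The unshifted mask doubles as a field token: shift it into
	 * place to clear the old value, then OR in the new one.
	 */
	reg &= ~INGPADBOUNDARY(INGPADBOUNDARY_M);
	return reg | INGPADBOUNDARY(val);
}
```

Because the mask is unshifted, INGPADBOUNDARY(INGPADBOUNDARY_M) yields the in-place mask 0x70, the same value as the old INGPADBOUNDARY_MASK.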
Thanks,
Hari
* [PATCH net-next] rtnetlink: add babel protocol recognition
From: Stephen Hemminger @ 2014-11-06 16:10 UTC (permalink / raw)
To: David Miller, Dave Taht; +Cc: netdev
From: Dave Taht <dave.taht@bufferbloat.net>
Babel uses rt_proto 42. Add to userspace visible header file.
Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
include/uapi/linux/rtnetlink.h | 1 +
1 file changed, 1 insertion(+)
--- a/include/uapi/linux/rtnetlink.h 2014-11-06 08:07:16.041165946 -0800
+++ b/include/uapi/linux/rtnetlink.h 2014-11-06 08:07:16.029165894 -0800
@@ -235,6 +235,7 @@ enum {
#define RTPROT_NTK 15 /* Netsukuku */
#define RTPROT_DHCP 16 /* DHCP client */
#define RTPROT_MROUTED 17 /* Multicast daemon */
+#define RTPROT_BABEL 42 /* Babel daemon */
/* rtm_scope
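As an illustration of what the new constant enables in userspace, here is a sketch of a protocol-number-to-name lookup. The values are the ones visible in the hunk above; rtprot_name() and the name strings are made up for this example and are not taken from any existing tool.

```c
#include <assert.h>
#include <stddef.h>

/* Values as shown in the rtnetlink.h hunk above. */
#define RTPROT_NTK     15 /* Netsukuku */
#define RTPROT_DHCP    16 /* DHCP client */
#define RTPROT_MROUTED 17 /* Multicast daemon */
#define RTPROT_BABEL   42 /* Babel daemon */

/* Hypothetical lookup: returns a short name, or NULL if unknown. */
static const char *rtprot_name(unsigned char proto)
{
	switch (proto) {
	case RTPROT_NTK:
		return "ntk";
	case RTPROT_DHCP:
		return "dhcp";
	case RTPROT_MROUTED:
		return "mrouted";
	case RTPROT_BABEL:
		return "babel";
	default:
		return NULL;
	}
}
```

A route installed with rtm_protocol set to 42 can then be reported by name instead of as a bare number.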
* Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics
From: Tom Herbert @ 2014-11-06 16:06 UTC (permalink / raw)
To: Sathya Perla
Cc: David Miller, Joe Stringer, Or Gerlitz, Linux Netdev List,
Jeff Kirsher, Amir Vadai, shahed.shaikh, dept-gelinuxnicdev, LKML
In-Reply-To: <CF9D1877D81D214CB0CA0669EFAE020C68CC79EF@CMEXMB1.ad.emulex.com>
The inner headers are reset in iptunnel_handle_offloads. This is called
in the xmit encapsulation functions (GRE, fou, VXLAN, etc.) before
adding in encapsulation headers, so the inner headers will point to the
encapsulation payload, i.e. the encapsulated packet. The headers are
only set on the first encapsulation, so with nested tunnels the inner
headers point to the innermost encapsulated packet. Since VXLAN and
NVGRE have the same size of encapsulation (8 UDP + 8 header),
skb_inner_mac_header(skb) - skb_transport_header(skb) should always be 16.
On Wed, Nov 5, 2014 at 10:16 PM, Sathya Perla <Sathya.Perla@emulex.com> wrote:
>> -----Original Message-----
>> From: Tom Herbert [mailto:therbert@google.com]
>>
>> On Wed, Nov 5, 2014 at 6:15 PM, David Miller <davem@davemloft.net>
>> wrote:
>> > From: Joe Stringer <joestringer@nicira.com>
>> > Date: Wed, 5 Nov 2014 17:06:46 -0800
>> >
>> >> My impression was that the changes are more likely to be
>> >> hardware-specific (like the i40e changes) rather than software-specific,
>> >> like changes that might be integrated into the helper.
>> >
>> > I think there is more commonality amongst hardware capabilities,
>> > and this is why I want the helper to play itself out.
>> >
>> >> That said, I can rework for one helper. The way I see it would be the
>> >> same code as these patches, as "vxlan_gso_check(struct sk_buff *)" in
>> >> drivers/net/vxlan.c which would be called from each driver. Is that what
>> >> you had in mind?
>> >
>> > Yes.
>>
>> Note that this code is not VXLAN specific, it will also accept NVGRE
>> and GRE/UDP with keyid and TEB. I imagine all these cases should be
>> indistinguishable to the hardware so they probably just work (which
>> would be cool!). It might be better to name and locate the helper
>> function to reflect that.
>
> Tom, I'm confused as to how the value of (skb_inner_mac_header(skb) - skb_transport_header(skb))
> would be the same for VxLAN and NVGRE encapsulated packets. Wouldn't this value be 16 for VxLAN
> and 8 for NVGRE?
>
The inner headers are reset in iptunnel_handle_offloads. This is
called in the xmit encapsulation functions (GRE, fou, VXLAN, etc.)
before adding in encapsulation headers (skb_push), so the
mac_inner_header will point to the encapsulation payload, i.e. the
encapsulated packet. This should not change after being set, although
inner network and inner transport can. The headers are only set on the
first encapsulation, so with nested tunnels the inner headers point to
the innermost encapsulated packet. Since VXLAN and NVGRE have same
size of encapsulation (8 UDP + 8 header), skb_inner_mac_header(skb)
- skb_transport_header(skb) should always be 16.
Tom
> thks,
> -Sathya