Netdev List
 help / color / mirror / Atom feed
* Re: [RFC iproute2 0/8] RDMA tool
From: Dennis Dalessandro @ 2017-05-04 19:26 UTC (permalink / raw)
  To: Leon Romanovsky, Bart Van Assche
  Cc: jiri@mellanox.com, linux-rdma@vger.kernel.org,
	ram.amrani@cavium.com, sagi@grimberg.me, ogerlitz@mellanox.com,
	hch@lst.de, netdev@vger.kernel.org,
	jgunthorpe@obsidianresearch.com, stephen@networkplumber.org,
	dledford@redhat.com, ariela@mellanox.com
In-Reply-To: <20170504184531.GE22833@mtr-leonro.local>

On 05/04/2017 02:45 PM, Leon Romanovsky wrote:
> On Thu, May 04, 2017 at 06:30:27PM +0000, Bart Van Assche wrote:
>> On Thu, 2017-05-04 at 21:25 +0300, Leon Romanovsky wrote:
>>> On Thu, May 04, 2017 at 06:10:54PM +0000, Bart Van Assche wrote:
>>>> On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote:
>>>>> Following our discussion both in mailing list [1] and at the LPC 2016 [2],
>>>>> we would like to propose this RDMA tool to be part of iproute2 package
>>>>> and finally improve this situation.
>>>>
>>>> Hello Leon,
>>>>
>>>> Although I really appreciate your work: can you clarify why you would like to
>>>> add *RDMA* functionality to an *IP routing* tool? I haven't found any motivation
>>>> for adding RDMA functionality to iproute2 in [1].
>>>
>>> We are planning to reuse the same infrastructure provided by iproute2,
>>> like netlink parsing, access to distributions, same CLI and same standards.
>>>
>>> Right now, RDMA is already tightened to netdev: iWARP, RoCE, IPoIB, HFI-VNIC.
>>> Many drivers (mlx, qed, i40, cxgb) are sharing code between net and
>>> RDMA.
>>>
>>> I do expect that iproute2 will be installed on every machine with any
>>> type of connection, including IB and OPA.
>>>
>>> So I think that it is enough to be part of that suite and don't invent
>>> our own for one specific tool.
>>
>> Hello Leon,
>>
>> Sorry but to me that sounds like a weak argument for including RDMA functionality
>> in iproute2. There is already a library for communication over netlink sockets,
>> namely libnl. Is there functionality that is in iproute2 but not in libnl and
>> that is needed for the new tool? If so, have you considered to create a new
>> library for that functionality?
>
> It is not hard to create new tool, the hardest part is to ensure that it is
> part of the distributions. Did you count how many months we are trying to
> add rdma-core to debian?

I do agree that it is a strange pairing and am not really a fan. However 
at the end of the day it's just a name for a repo/package. If the 
iproute folks are fine to include rdma in their repo/package, great we 
can leverage their code for CLI and other common stuff.

Now if the interface was something like "ip -FlagForRdma ..." I would 
object to that, but the interface is "rdma ... " so from users 
perspective it's different tools. They don't need to care that it was 
sourced from a common git repo.

Just as an aside this already works a bit with OPA:

  $ ./rdma link
1/1: hfi1_0/1: ifname NONE cap_mask 0x00410022 lid 0x1 lid_mask_count 0 
link_layer InfiniBand
          phys_state 5: LinkUp rate 100 Gb/sec (4X EDR) sm_lid 0x1 sm_sl 
0 state 4: ACTIVE

Leon I'll get you more feedback and testing, I've just been really 
bogged down this week, sorry.

-Denny

^ permalink raw reply

* Re: [PATCH RESEND 4.4-only] netlink: Allow direct reclaim for fallback allocation
From: Greg Kroah-Hartman @ 2017-05-04 19:38 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: stable, David S. Miller, Eric Dumazet, netdev, linux-kernel
In-Reply-To: <1493801059-2828-1-git-send-email-ross.lagerwall@citrix.com>

On Wed, May 03, 2017 at 09:44:19AM +0100, Ross Lagerwall wrote:
> The backport of d35c99ff77ec ("netlink: do not enter direct reclaim from
> netlink_dump()") to the 4.4 branch (first in 4.4.32) mistakenly removed
> direct claim from the initial large allocation _and_ the fallback
> allocation which means that allocations can spuriously fail.
> Fix the issue by adding back the direct reclaim flag to the fallback
> allocation.
> 
> Fixes: 6d123f1d396b ("netlink: do not enter direct reclaim from netlink_dump()")
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
> 
> Note that this is only for the 4.4 branch as the regression is only in
> this branch. Consequently, there is no corresponding upstream commit.
> 
> I'm resending this to the linux-stable list since I now understand the
> netdev maintainer only handles backports for the last couple of versions
> of Linux.
> 

Many thanks for this fix, now queued up.

greg k-h

^ permalink raw reply

* Re: [Patch net v2] ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf
From: David Ahern @ 2017-05-04 19:41 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: andreyknvl
In-Reply-To: <1493919363-19989-1-git-send-email-xiyou.wangcong@gmail.com>

On 5/4/17 11:36 AM, Cong Wang wrote:
> For each netns (except init_net), we initialize its null entry
> in 3 places:
> 
> 1) The template itself, as we use kmemdup()
> 2) Code around dst_init_metrics() in ip6_route_net_init()
> 3) ip6_route_dev_notify(), which is supposed to initialize it after
>    loopback registers
> 
> Unfortunately the last one still happens in a wrong order because
> we expect to initialize net->ipv6.ip6_null_entry->rt6i_idev to
> net->loopback_dev's idev, so we have to do that after we add
> idev to it. However, this notifier has priority == 0 same as
> ipv6_dev_notf, and ipv6_dev_notf is registered after
> ip6_route_dev_notifier so it is called actually after
> ip6_route_dev_notifier.
> 
> Fix it by picking a smaller priority for ip6_route_dev_notifier.
> Also, we have to release the refcnt accordingly when unregistering
> loopback_dev because device exit functions are called before subsys
> exit functions.
> 
> Cc: David Ahern <dsahern@gmail.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---

Commit message needs a tie in to the problem that Andrey reported. It
solves the same problem for namespaces other than init_net.

Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply

* Re: [RFC iproute2 0/8] RDMA tool
From: Leon Romanovsky @ 2017-05-04 19:42 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: Bart Van Assche, jiri-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ram.amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org,
	sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	hch-jcswGhMUV9g@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	ariela-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
In-Reply-To: <2dee6cde-0406-b101-0fe6-c1f6de7c1b1a-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 3598 bytes --]

On Thu, May 04, 2017 at 03:26:13PM -0400, Dennis Dalessandro wrote:
> On 05/04/2017 02:45 PM, Leon Romanovsky wrote:
> > On Thu, May 04, 2017 at 06:30:27PM +0000, Bart Van Assche wrote:
> > > On Thu, 2017-05-04 at 21:25 +0300, Leon Romanovsky wrote:
> > > > On Thu, May 04, 2017 at 06:10:54PM +0000, Bart Van Assche wrote:
> > > > > On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote:
> > > > > > Following our discussion both in mailing list [1] and at the LPC 2016 [2],
> > > > > > we would like to propose this RDMA tool to be part of iproute2 package
> > > > > > and finally improve this situation.
> > > > >
> > > > > Hello Leon,
> > > > >
> > > > > Although I really appreciate your work: can you clarify why you would like to
> > > > > add *RDMA* functionality to an *IP routing* tool? I haven't found any motivation
> > > > > for adding RDMA functionality to iproute2 in [1].
> > > >
> > > > We are planning to reuse the same infrastructure provided by iproute2,
> > > > like netlink parsing, access to distributions, same CLI and same standards.
> > > >
> > > > Right now, RDMA is already tightened to netdev: iWARP, RoCE, IPoIB, HFI-VNIC.
> > > > Many drivers (mlx, qed, i40, cxgb) are sharing code between net and
> > > > RDMA.
> > > >
> > > > I do expect that iproute2 will be installed on every machine with any
> > > > type of connection, including IB and OPA.
> > > >
> > > > So I think that it is enough to be part of that suite and don't invent
> > > > our own for one specific tool.
> > >
> > > Hello Leon,
> > >
> > > Sorry but to me that sounds like a weak argument for including RDMA functionality
> > > in iproute2. There is already a library for communication over netlink sockets,
> > > namely libnl. Is there functionality that is in iproute2 but not in libnl and
> > > that is needed for the new tool? If so, have you considered to create a new
> > > library for that functionality?
> >
> > It is not hard to create new tool, the hardest part is to ensure that it is
> > part of the distributions. Did you count how many months we are trying to
> > add rdma-core to debian?
>
> I do agree that it is a strange pairing and am not really a fan. However at
> the end of the day it's just a name for a repo/package. If the iproute folks
> are fine to include rdma in their repo/package, great we can leverage their
> code for CLI and other common stuff.
>
> Now if the interface was something like "ip -FlagForRdma ..." I would object
> to that, but the interface is "rdma ... " so from users perspective it's
> different tools. They don't need to care that it was sourced from a common
> git repo.
>
> Just as an aside this already works a bit with OPA:
>
>  $ ./rdma link
> 1/1: hfi1_0/1: ifname NONE cap_mask 0x00410022 lid 0x1 lid_mask_count 0
> link_layer InfiniBand
>          phys_state 5: LinkUp rate 100 Gb/sec (4X EDR) sm_lid 0x1 sm_sl 0
> state 4: ACTIVE
>
> Leon I'll get you more feedback and testing, I've just been really bogged
> down this week, sorry.

Thanks Denny,

Before you are starting to test it, can you please provide your feedback
on my initial questions? Usability and need of sysfs.

----
This is initial phase to understand if user experience for this tool fits
RDMA and netdev communities exepectations. Also I would like to get feedback
if it is really worth to provide legacy sysfs for old kernels, or maybe I should
implement netlink from the beginning and abandon sysfs completely.
-----

P.S. I believe this will give you wrong output, because it parses IB port cap_mask.
$./rdma link show hfi1_0/1 cap_mask

Thanks

>
> -Denny
>
>
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Patch "netlink: Allow direct reclaim for fallback allocation" has been added to the 4.4-stable tree
From: gregkh @ 2017-05-04 19:43 UTC (permalink / raw)
  To: ross.lagerwall, davem, edumazet, gregkh, linux-kernel, netdev,
	stable
  Cc: stable, stable-commits
In-Reply-To: <1493801059-2828-1-git-send-email-ross.lagerwall@citrix.com>


This is a note to let you know that I've just added the patch titled

    netlink: Allow direct reclaim for fallback allocation

to the 4.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     netlink-allow-direct-reclaim-for-fallback-allocation.patch
and it can be found in the queue-4.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


>From ross.lagerwall@citrix.com  Thu May  4 12:37:51 2017
From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Wed, 3 May 2017 09:44:19 +0100
Subject: netlink: Allow direct reclaim for fallback allocation
To: <stable@vger.kernel.org>
Cc: Ross Lagerwall <ross.lagerwall@citrix.com>, "David S. Miller" <davem@davemloft.net>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Eric Dumazet <edumazet@google.com>, <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Message-ID: <1493801059-2828-1-git-send-email-ross.lagerwall@citrix.com>

From: Ross Lagerwall <ross.lagerwall@citrix.com>

The backport of d35c99ff77ec ("netlink: do not enter direct reclaim from
netlink_dump()") to the 4.4 branch (first in 4.4.32) mistakenly removed
direct claim from the initial large allocation _and_ the fallback
allocation which means that allocations can spuriously fail.
Fix the issue by adding back the direct reclaim flag to the fallback
allocation.

Fixes: 6d123f1d396b ("netlink: do not enter direct reclaim from netlink_dump()")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

Note that this is only for the 4.4 branch as the regression is only in
this branch. Consequently, there is no corresponding upstream commit.

I'm resending this to the linux-stable list since I now understand the
netdev maintainer only handles backports for the last couple of versions
of Linux.

 net/netlink/af_netlink.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2107,7 +2107,7 @@ static int netlink_dump(struct sock *sk)
 	if (!skb) {
 		alloc_size = alloc_min_size;
 		skb = netlink_alloc_skb(sk, alloc_size, nlk->portid,
-					(GFP_KERNEL & ~__GFP_DIRECT_RECLAIM));
+					GFP_KERNEL);
 	}
 	if (!skb)
 		goto errout_skb;


Patches currently in stable-queue which might be from ross.lagerwall@citrix.com are

queue-4.4/netlink-allow-direct-reclaim-for-fallback-allocation.patch

^ permalink raw reply

* [PATCH] net: ipv4: add code comment for clarification
From: Gustavo A. R. Silva @ 2017-05-04 19:44 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, linux-kernel, Gustavo A. R. Silva, Joe Perches
In-Reply-To: <20170504142432.Horde.g9y26Ryxbtg1EIl_cnsdbbw@gator4166.hostgator.com>

Add code comment to make it clear that the position of the arguments
req->id.idiag_dport and req->id.idiag_sport is a locked in behavior
and it should not be changed.

Addresses-Coverity-ID: 1357474
Cc: David Miller <davem@davemloft.net>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
---
 net/ipv4/inet_diag.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 3828b3a..841800b 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -389,6 +389,12 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb,
 				  nlmsg_flags, unlh, net_admin);
 }
 
+/*
+ * Ignore the position of the arguments req->id.idiag_dport and
+ * req->id.idiag_sport in both calls to inet_lookup() and inet6_lookup()
+ * functions, once this is a locked in behavior exposed to user space.
+ * Changing this will break things for people.
+ */
 struct sock *inet_diag_find_one_icsk(struct net *net,
 				     struct inet_hashinfo *hashinfo,
 				     const struct inet_diag_req_v2 *req)
-- 
2.5.0

^ permalink raw reply related

* Re: [PATCH] iproute2: hide devices starting with period by default
From: David Ahern @ 2017-05-04 19:47 UTC (permalink / raw)
  To: Florian Fainelli, nicolas.dichtel; +Cc: David Miller, stephen, netdev
In-Reply-To: <26442035-25fd-ff9a-2804-ed21ec6198cf@gmail.com>

On 5/4/17 1:10 PM, Florian Fainelli wrote:
> On 05/04/2017 09:37 AM, David Ahern wrote:
>> On 5/4/17 9:15 AM, Nicolas Dichtel wrote:
>>> Le 24/02/2017 à 16:52, David Ahern a écrit :
>>>> On 2/23/17 8:12 PM, David Miller wrote:
>>>>> This really need to be a fundamental facility, so that it transparently
>>>>> works for NetworkManager, router daemons, everything.  Not just iproute2
>>>>> and "ls".
>>>>
>>>> I'll rebase my patch and send out as RFC.
>>>>
>>> David, did you finally send those patches?
>>>
>>
>> No, but for a few reasons.
>>
>> It is easy to hide devices in a dump:
>>
>> https://github.com/dsahern/linux/commit/48a80a00eac284e58bae04af10a5a932dd7aee00
>>
>>
>> But I think those devices should also not exist in sysfs or procfs which
>> overlaps what I would like to see for lightweight netdevices:
>>
>> https://github.com/dsahern/linux/commit/70574be699cf252e77f71e3df11192438689f976
> 
> Interesting that does indeed solve the same problems as the L2 only
> patch set intended. I am not exactly sure if hiding the devices from
> procfs/sysfs would be appropriate in my case (dumb L2 only switch that
> only does 802.1q for instance), but why not.
> 
> 
>>
>>
>> and to be complete, hidden devices should not be allowed to have a
>> network address or transmit packets which is the L2 only intent from
>> Florian:
>>     https://www.spinics.net/lists/netdev/msg340808.html
>>
> 
> Do you plan on submitting the LWT patch set at some point?

Definitely. Maybe I can find some time this weekend.

^ permalink raw reply

* [PATCH] aquantia: Fix "ethtool -S" crash when adapter down.
From: Pavel Belous @ 2017-05-04 20:10 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Lino Sanfilippo, Simon Edelhaus

From: Pavel Belous <pavel.belous@aquantia.com>

This patch fixes the crash that happens when driver tries to collect statistics
from already released "aq_vec" object.
If adapter is in "down" state we still allow user to see statistics from HW.

V2: fixed braces around "aq_vec_free".

Fixes: 97bde5c4f909 ("net: ethernet: aquantia: Support for NIC-specific code")
Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index cdb0299..9ee1c50 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -755,7 +755,7 @@ void aq_nic_get_stats(struct aq_nic_s *self, u64 *data)
 	count = 0U;
 
 	for (i = 0U, aq_vec = self->aq_vec[0];
-		self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
+		aq_vec && self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
 		data += count;
 		aq_vec_get_sw_stats(aq_vec, data, &count);
 	}
@@ -959,8 +959,10 @@ void aq_nic_free_hot_resources(struct aq_nic_s *self)
 		goto err_exit;
 
 	for (i = AQ_DIMOF(self->aq_vec); i--;) {
-		if (self->aq_vec[i])
+		if (self->aq_vec[i]) {
 			aq_vec_free(self->aq_vec[i]);
+			self->aq_vec[i] = NULL;
+		}
 	}
 
 err_exit:;
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH] aquantia: Fix "ethtool -S" crash when adapter down.
From: Joe Perches @ 2017-05-04 20:30 UTC (permalink / raw)
  To: Pavel Belous, David S . Miller
  Cc: netdev, David Arcari, Lino Sanfilippo, Simon Edelhaus
In-Reply-To: <9ea5a2ba21b4f8b22a63acf39135a41e2f1da23c.1493928470.git.pavel.belous@aquantia.com>

On Thu, 2017-05-04 at 23:10 +0300, Pavel Belous wrote:
> diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
[]
> @@ -959,8 +959,10 @@ void aq_nic_free_hot_resources(struct aq_nic_s *self)
>  		goto err_exit;
>  
>  	for (i = AQ_DIMOF(self->aq_vec); i--;) {
> -		if (self->aq_vec[i])
> +		if (self->aq_vec[i]) {
>  			aq_vec_free(self->aq_vec[i]);
> +			self->aq_vec[i] = NULL;
> +		}
>  	}
>  
>  err_exit:;

unrelated style trivia:

err_exit:;

the error label then :; is pretty odd.

Also casting the return value to void is really odd.
A simple return instead of goto would be more common.

drivers/net/ethernet/aquantia/atlantic/aq_ring.c:311:err_exit:;
drivers/net/ethernet/aquantia/atlantic/aq_ring.c:326:err_exit:;
drivers/net/ethernet/aquantia/atlantic/aq_main.c:161:err_exit:;
drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c:277:err_exit:;
drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c:313:err_exit:;
drivers/net/ethernet/aquantia/atlantic/aq_nic.c:304:err_exit:;
drivers/net/ethernet/aquantia/atlantic/aq_nic.c:763:err_exit:;
drivers/net/ethernet/aquantia/atlantic/aq_nic.c:951:err_exit:;
drivers/net/ethernet/aquantia/atlantic/aq_nic.c:966:err_exit:;
drivers/net/ethernet/aquantia/atlantic/aq_vec.c:281:err_exit:;
drivers/net/ethernet/aquantia/atlantic/aq_vec.c:302:err_exit:;
drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c:251:err_exit:;
drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c:270:err_exit:;

^ permalink raw reply

* [PATCH 0/6] cxgb4: Fine-tuning for some function implementations
From: SF Markus Elfring @ 2017-05-04 20:30 UTC (permalink / raw)
  To: netdev, Casey Leedom, Ganesh Goudar; +Cc: LKML, kernel-janitors

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Thu, 4 May 2017 22:23:45 +0200

A few update suggestions were taken into account
from static source code analysis.

Markus Elfring (6):
  Use seq_putc() in mboxlog_show()
  Combine substrings for 24 messages
  Adjust five checks for null pointers
  Replace seven seq_puts() calls by seq_putc()
  Use seq_puts() in cim_qcfg_show()
  Combine substrings for two messages

 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  42 +++----
 .../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c    | 125 ++++++++++++---------
 2 files changed, 86 insertions(+), 81 deletions(-)

-- 
2.12.2

^ permalink raw reply

* [PATCH 1/6] cxgb4vf: Use seq_putc() in mboxlog_show()
From: SF Markus Elfring @ 2017-05-04 20:31 UTC (permalink / raw)
  To: netdev, Casey Leedom, Ganesh Goudar; +Cc: LKML, kernel-janitors
In-Reply-To: <a3216042-2461-b397-dfa1-1de178d58513@users.sourceforge.net>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Thu, 4 May 2017 20:02:04 +0200

A single character (line break) should be put into a sequence.
Thus use the corresponding function "seq_putc".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index ac7a150c54e9..4ac9316f3081 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -1818,7 +1818,7 @@ static int mboxlog_show(struct seq_file *seq, void *v)
 
 		seq_printf(seq, "  %08x %08x", hi, lo);
 	}
-	seq_puts(seq, "\n");
+	seq_putc(seq, '\n');
 	return 0;
 }
 
-- 
2.12.2

^ permalink raw reply related

* [PATCH 2/6] cxgb4vf: Combine substrings for 24 messages
From: SF Markus Elfring @ 2017-05-04 20:32 UTC (permalink / raw)
  To: netdev, Casey Leedom, Ganesh Goudar; +Cc: LKML, kernel-janitors
In-Reply-To: <a3216042-2461-b397-dfa1-1de178d58513@users.sourceforge.net>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Thu, 4 May 2017 21:00:20 +0200

The script "checkpatch.pl" pointed information out like the following.

WARNING: quoted string split across lines

Thus fix the affected source code places.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 .../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c    | 113 ++++++++++++---------
 1 file changed, 64 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index 4ac9316f3081..9c2690aeb32b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -226,17 +226,20 @@ void t4vf_os_portmod_changed(struct adapter *adapter, int pidx)
 		dev_info(adapter->pdev_dev, "%s: %s port module inserted\n",
 			 dev->name, mod_str[pi->mod_type]);
 	else if (pi->mod_type == FW_PORT_MOD_TYPE_NOTSUPPORTED)
-		dev_info(adapter->pdev_dev, "%s: unsupported optical port "
-			 "module inserted\n", dev->name);
+		dev_info(adapter->pdev_dev,
+			 "%s: unsupported optical port module inserted\n",
+			 dev->name);
 	else if (pi->mod_type == FW_PORT_MOD_TYPE_UNKNOWN)
-		dev_info(adapter->pdev_dev, "%s: unknown port module inserted,"
-			 "forcing TWINAX\n", dev->name);
+		dev_info(adapter->pdev_dev,
+			 "%s: unknown port module inserted,forcing TWINAX\n",
+			 dev->name);
 	else if (pi->mod_type == FW_PORT_MOD_TYPE_ERROR)
 		dev_info(adapter->pdev_dev, "%s: transceiver module error\n",
 			 dev->name);
 	else
-		dev_info(adapter->pdev_dev, "%s: unknown module type %d "
-			 "inserted\n", dev->name, pi->mod_type);
+		dev_info(adapter->pdev_dev,
+			 "%s: unknown module type %d inserted\n",
+			 dev->name, pi->mod_type);
 }
 
 /*
@@ -2357,8 +2360,9 @@ static void size_nports_qsets(struct adapter *adapter)
 	 */
 	adapter->params.nports = vfres->nvi;
 	if (adapter->params.nports > MAX_NPORTS) {
-		dev_warn(adapter->pdev_dev, "only using %d of %d maximum"
-			 " allowed virtual interfaces\n", MAX_NPORTS,
+		dev_warn(adapter->pdev_dev,
+			 "only using %d of %d maximum allowed virtual interfaces\n",
+			 MAX_NPORTS,
 			 adapter->params.nports);
 		adapter->params.nports = MAX_NPORTS;
 	}
@@ -2370,9 +2374,9 @@ static void size_nports_qsets(struct adapter *adapter)
 	 */
 	pmask_nports = hweight32(adapter->params.vfres.pmask);
 	if (pmask_nports < adapter->params.nports) {
-		dev_warn(adapter->pdev_dev, "only using %d of %d provisioned"
-			 " virtual interfaces; limited by Port Access Rights"
-			 " mask %#x\n", pmask_nports, adapter->params.nports,
+		dev_warn(adapter->pdev_dev,
+			 "only using %d of %d provisioned virtual interfaces; limited by Port Access Rights mask %#x\n",
+			 pmask_nports, adapter->params.nports,
 			 adapter->params.vfres.pmask);
 		adapter->params.nports = pmask_nports;
 	}
@@ -2403,8 +2407,8 @@ static void size_nports_qsets(struct adapter *adapter)
 	adapter->sge.max_ethqsets = ethqsets;
 
 	if (adapter->sge.max_ethqsets < adapter->params.nports) {
-		dev_warn(adapter->pdev_dev, "only using %d of %d available"
-			 " virtual interfaces (too few Queue Sets)\n",
+		dev_warn(adapter->pdev_dev,
+			 "only using %d of %d available virtual interfaces (too few Queue Sets)\n",
 			 adapter->sge.max_ethqsets, adapter->params.nports);
 		adapter->params.nports = adapter->sge.max_ethqsets;
 	}
@@ -2448,38 +2452,44 @@ static int adap_init0(struct adapter *adapter)
 	 */
 	err = t4vf_get_dev_params(adapter);
 	if (err) {
-		dev_err(adapter->pdev_dev, "unable to retrieve adapter"
-			" device parameters: err=%d\n", err);
+		dev_err(adapter->pdev_dev,
+			"unable to retrieve adapter device parameters: err=%d\n",
+			err);
 		return err;
 	}
 	err = t4vf_get_vpd_params(adapter);
 	if (err) {
-		dev_err(adapter->pdev_dev, "unable to retrieve adapter"
-			" VPD parameters: err=%d\n", err);
+		dev_err(adapter->pdev_dev,
+			"unable to retrieve adapter VPD parameters: err=%d\n",
+			err);
 		return err;
 	}
 	err = t4vf_get_sge_params(adapter);
 	if (err) {
-		dev_err(adapter->pdev_dev, "unable to retrieve adapter"
-			" SGE parameters: err=%d\n", err);
+		dev_err(adapter->pdev_dev,
+			"unable to retrieve adapter SGE parameters: err=%d\n",
+			err);
 		return err;
 	}
 	err = t4vf_get_rss_glb_config(adapter);
 	if (err) {
-		dev_err(adapter->pdev_dev, "unable to retrieve adapter"
-			" RSS parameters: err=%d\n", err);
+		dev_err(adapter->pdev_dev,
+			"unable to retrieve adapter RSS parameters: err=%d\n",
+			err);
 		return err;
 	}
 	if (adapter->params.rss.mode !=
 	    FW_RSS_GLB_CONFIG_CMD_MODE_BASICVIRTUAL) {
-		dev_err(adapter->pdev_dev, "unable to operate with global RSS"
-			" mode %d\n", adapter->params.rss.mode);
+		dev_err(adapter->pdev_dev,
+			"unable to operate with global RSS mode %d\n",
+			adapter->params.rss.mode);
 		return -EINVAL;
 	}
 	err = t4vf_sge_init(adapter);
 	if (err) {
-		dev_err(adapter->pdev_dev, "unable to use adapter parameters:"
-			" err=%d\n", err);
+		dev_err(adapter->pdev_dev,
+			"unable to use adapter parameters: err=%d\n",
+			err);
 		return err;
 	}
 
@@ -2522,20 +2532,21 @@ static int adap_init0(struct adapter *adapter)
 	 */
 	err = t4vf_get_vfres(adapter);
 	if (err) {
-		dev_err(adapter->pdev_dev, "unable to get virtual interface"
-			" resources: err=%d\n", err);
+		dev_err(adapter->pdev_dev,
+			"unable to get virtual interface resources: err=%d\n",
+			err);
 		return err;
 	}
 
 	/* Check for various parameter sanity issues */
 	if (adapter->params.vfres.pmask == 0) {
-		dev_err(adapter->pdev_dev, "no port access configured\n"
-			"usable!\n");
+		dev_err(adapter->pdev_dev,
+			"no port access configured/usable!\n");
 		return -EINVAL;
 	}
 	if (adapter->params.vfres.nvi == 0) {
-		dev_err(adapter->pdev_dev, "no virtual interfaces configured/"
-			"usable!\n");
+		dev_err(adapter->pdev_dev,
+			"no virtual interfaces configured/usable!\n");
 		return -EINVAL;
 	}
 
@@ -2726,8 +2737,9 @@ static int enable_msix(struct adapter *adapter)
 
 	nqsets = want - MSIX_EXTRAS;
 	if (nqsets < s->max_ethqsets) {
-		dev_warn(adapter->pdev_dev, "only enough MSI-X vectors"
-			 " for %d Queue Sets\n", nqsets);
+		dev_warn(adapter->pdev_dev,
+			 "only enough MSI-X vectors for %d Queue Sets\n",
+			 nqsets);
 		s->max_ethqsets = nqsets;
 		if (nqsets < s->ethqsets)
 			reduce_ethqs(adapter, nqsets);
@@ -2804,8 +2816,8 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 	if (err == 0) {
 		err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
 		if (err) {
-			dev_err(&pdev->dev, "unable to obtain 64-bit DMA for"
-				" coherent allocations\n");
+			dev_err(&pdev->dev,
+				"unable to obtain 64-bit DMA for coherent allocations\n");
 			goto err_release_regions;
 		}
 		pci_using_dac = 1;
@@ -2866,8 +2878,9 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 	 */
 	err = t4vf_prep_adapter(adapter);
 	if (err) {
-		dev_err(adapter->pdev_dev, "device didn't become ready:"
-			" err=%d\n", err);
+		dev_err(adapter->pdev_dev,
+			"device didn't become ready: err=%d\n",
+			err);
 		goto err_unmap_bar0;
 	}
 
@@ -2914,8 +2927,9 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 		pmask &= ~(1 << port_id);
 		viid = t4vf_alloc_vi(adapter, port_id);
 		if (viid < 0) {
-			dev_err(&pdev->dev, "cannot allocate VI for port %d:"
-				" err=%d\n", port_id, viid);
+			dev_err(&pdev->dev,
+				"cannot allocate VI for port %d: err=%d\n",
+				port_id, viid);
 			err = viid;
 			goto err_free_dev;
 		}
@@ -2978,8 +2992,8 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 		err = t4vf_get_vf_mac_acl(adapter, pf, &naddr, mac);
 		if (err) {
 			dev_err(&pdev->dev,
-				"unable to determine MAC ACL address, "
-				"continuing anyway.. (status %d)\n", err);
+				"unable to determine MAC ACL address, continuing anyway.. (status %d)\n",
+				err);
 		} else if (naddr && adapter->params.vfres.nvi == 1) {
 			struct sockaddr addr;
 
@@ -3006,8 +3020,7 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 	else {
 		if (msi == MSI_MSIX) {
 			dev_info(adapter->pdev_dev,
-				 "Unable to use MSI-X Interrupts; falling "
-				 "back to MSI Interrupts\n");
+				 "Unable to use MSI-X Interrupts; falling back to MSI Interrupts\n");
 
 			/* We're going to need a Forwarded Interrupt Queue so
 			 * that may cut into how many Queue Sets we can
@@ -3018,8 +3031,9 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 		}
 		err = pci_enable_msi(pdev);
 		if (err) {
-			dev_err(&pdev->dev, "Unable to allocate MSI Interrupts;"
-				" err=%d\n", err);
+			dev_err(&pdev->dev,
+				"Unable to allocate MSI Interrupts; err=%d\n",
+				err);
 			goto err_free_dev;
 		}
 		adapter->flags |= USING_MSI;
@@ -3047,8 +3061,9 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 
 		err = register_netdev(netdev);
 		if (err) {
-			dev_warn(&pdev->dev, "cannot register net device %s,"
-				 " skipping\n", netdev->name);
+			dev_warn(&pdev->dev,
+				 "cannot register net device %s, skipping\n",
+				 netdev->name);
 			continue;
 		}
 
@@ -3067,8 +3082,8 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 			debugfs_create_dir(pci_name(pdev),
 					   cxgb4vf_debugfs_root);
 		if (IS_ERR_OR_NULL(adapter->debugfs_root))
-			dev_warn(&pdev->dev, "could not create debugfs"
-				 " directory");
+			dev_warn(&pdev->dev,
+				 "could not create debugfs directory");
 		else
 			setup_debugfs(adapter);
 	}
-- 
2.12.2

^ permalink raw reply related

* [PATCH 3/6] cxgb4vf: Adjust five checks for null pointers
From: SF Markus Elfring @ 2017-05-04 20:33 UTC (permalink / raw)
  To: netdev, Casey Leedom, Ganesh Goudar; +Cc: LKML, kernel-janitors
In-Reply-To: <a3216042-2461-b397-dfa1-1de178d58513@users.sourceforge.net>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Thu, 4 May 2017 21:20:25 +0200
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The script “checkpatch.pl” pointed information out like the following.

Comparison to NULL could be written !…

Thus fix the affected source code places.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index 9c2690aeb32b..682e844c5a7d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -491,7 +491,7 @@ static int fwevtq_handler(struct sge_rspq *rspq, const __be64 *rsp,
 			break;
 		}
 		tq = s->egr_map[eq_idx];
-		if (unlikely(tq == NULL)) {
+		if (unlikely(!tq)) {
 			dev_err(adapter->pdev_dev,
 				"Egress Update QID %d TXQ=NULL\n", qid);
 			break;
@@ -2939,7 +2939,7 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 		 */
 		netdev = alloc_etherdev_mq(sizeof(struct port_info),
 					   MAX_PORT_QSETS);
-		if (netdev == NULL) {
+		if (!netdev) {
 			t4vf_free_vi(adapter, viid);
 			err = -ENOMEM;
 			goto err_free_dev;
@@ -3053,7 +3053,7 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 	for_each_port(adapter, pidx) {
 		struct port_info *pi = netdev_priv(adapter->port[pidx]);
 		netdev = adapter->port[pidx];
-		if (netdev == NULL)
+		if (!netdev)
 			continue;
 
 		netif_set_real_num_tx_queues(netdev, pi->nqsets);
@@ -3120,7 +3120,7 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 err_free_dev:
 	for_each_port(adapter, pidx) {
 		netdev = adapter->port[pidx];
-		if (netdev == NULL)
+		if (!netdev)
 			continue;
 		pi = netdev_priv(netdev);
 		t4vf_free_vi(adapter, pi->viid);
@@ -3197,7 +3197,7 @@ static void cxgb4vf_pci_remove(struct pci_dev *pdev)
 			struct net_device *netdev = adapter->port[pidx];
 			struct port_info *pi;
 
-			if (netdev == NULL)
+			if (!netdev)
 				continue;
 
 			pi = netdev_priv(netdev);
-- 
2.12.2

^ permalink raw reply related

* [PATCH 4/6] cxgb4: Replace seven seq_puts() calls by seq_putc()
From: SF Markus Elfring @ 2017-05-04 20:34 UTC (permalink / raw)
  To: netdev, Casey Leedom, Ganesh Goudar; +Cc: LKML, kernel-janitors
In-Reply-To: <a3216042-2461-b397-dfa1-1de178d58513@users.sourceforge.net>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Thu, 4 May 2017 21:40:54 +0200

Seven single characters (line breaks) should be put into a sequence.
Thus use the corresponding function "seq_putc".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 24 +++++++---------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 1fa34b009891..2bc40d89f874 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -278,7 +278,7 @@ static int cim_ma_la_show(struct seq_file *seq, void *v, int idx)
 	const u32 *p = v;
 
 	if (v == SEQ_START_TOKEN) {
-		seq_puts(seq, "\n");
+		seq_putc(seq, '\n');
 	} else if (idx < CIM_MALA_SIZE) {
 		seq_printf(seq, "%02x%08x%08x%08x%08x\n",
 			   p[4], p[3], p[2], p[1], p[0]);
@@ -1196,7 +1196,7 @@ static int mboxlog_show(struct seq_file *seq, void *v)
 
 		seq_printf(seq, "  %08x %08x", hi, lo);
 	}
-	seq_puts(seq, "\n");
+	seq_putc(seq, '\n');
 	return 0;
 }
 
@@ -2112,9 +2112,7 @@ static int rss_config_show(struct seq_file *seq, void *v)
 							HASHTOEPLITZ_F));
 	seq_printf(seq, "  Udp4En:        %3s\n", yesno(rssconf & UDPENABLE_F));
 	seq_printf(seq, "  Disable:       %3s\n", yesno(rssconf & DISABLE_F));
-
-	seq_puts(seq, "\n");
-
+	seq_putc(seq, '\n');
 	rssconf = t4_read_reg(adapter, TP_RSS_CONFIG_TNL_A);
 	seq_printf(seq, "TP_RSS_CONFIG_TNL: %#x\n", rssconf);
 	seq_printf(seq, "  MaskSize:      %3d\n", MASKSIZE_G(rssconf));
@@ -2126,25 +2124,19 @@ static int rss_config_show(struct seq_file *seq, void *v)
 			   yesno(rssconf & HASHETH_F));
 	}
 	seq_printf(seq, "  UseWireCh:     %3s\n", yesno(rssconf & USEWIRECH_F));
-
-	seq_puts(seq, "\n");
-
+	seq_putc(seq, '\n');
 	rssconf = t4_read_reg(adapter, TP_RSS_CONFIG_OFD_A);
 	seq_printf(seq, "TP_RSS_CONFIG_OFD: %#x\n", rssconf);
 	seq_printf(seq, "  MaskSize:      %3d\n", MASKSIZE_G(rssconf));
 	seq_printf(seq, "  RRCplMapEn:    %3s\n", yesno(rssconf &
 							RRCPLMAPEN_F));
 	seq_printf(seq, "  RRCplQueWidth: %3d\n", RRCPLQUEWIDTH_G(rssconf));
-
-	seq_puts(seq, "\n");
-
+	seq_putc(seq, '\n');
 	rssconf = t4_read_reg(adapter, TP_RSS_CONFIG_SYN_A);
 	seq_printf(seq, "TP_RSS_CONFIG_SYN: %#x\n", rssconf);
 	seq_printf(seq, "  MaskSize:      %3d\n", MASKSIZE_G(rssconf));
 	seq_printf(seq, "  UseWireCh:     %3s\n", yesno(rssconf & USEWIRECH_F));
-
-	seq_puts(seq, "\n");
-
+	seq_putc(seq, '\n');
 	rssconf = t4_read_reg(adapter, TP_RSS_CONFIG_VRT_A);
 	seq_printf(seq, "TP_RSS_CONFIG_VRT: %#x\n", rssconf);
 	if (CHELSIO_CHIP_VERSION(adapter->params.chip) > CHELSIO_T5) {
@@ -2170,9 +2162,7 @@ static int rss_config_show(struct seq_file *seq, void *v)
 	seq_printf(seq, "  VfWrEn:        %3s\n", yesno(rssconf & VFWREN_F));
 	seq_printf(seq, "  KeyWrEn:       %3s\n", yesno(rssconf & KEYWREN_F));
 	seq_printf(seq, "  KeyWrAddr:     %3d\n", KEYWRADDR_G(rssconf));
-
-	seq_puts(seq, "\n");
-
+	seq_putc(seq, '\n');
 	rssconf = t4_read_reg(adapter, TP_RSS_CONFIG_CNG_A);
 	seq_printf(seq, "TP_RSS_CONFIG_CNG: %#x\n", rssconf);
 	seq_printf(seq, "  ChnCount3:     %3s\n", yesno(rssconf & CHNCOUNT3_F));
-- 
2.12.2

^ permalink raw reply related

* [PATCH 5/6] cxgb4: Use seq_puts() in cim_qcfg_show()
From: SF Markus Elfring @ 2017-05-04 20:35 UTC (permalink / raw)
  To: netdev, Casey Leedom, Ganesh Goudar; +Cc: LKML, kernel-janitors
In-Reply-To: <a3216042-2461-b397-dfa1-1de178d58513@users.sourceforge.net>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Thu, 4 May 2017 21:52:32 +0200

A string which did not contain a data format specification should be put
into a sequence. Thus use the corresponding function "seq_puts".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 2bc40d89f874..32add8dfc253 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -357,9 +357,8 @@ static int cim_qcfg_show(struct seq_file *seq, void *v)
 		return i;
 
 	t4_read_cimq_cfg(adap, base, size, thres);
-
-	seq_printf(seq,
-		   "  Queue  Base  Size Thres  RdPtr WrPtr  SOP  EOP Avail\n");
+	seq_puts(seq,
+		 "  Queue  Base  Size Thres  RdPtr WrPtr  SOP  EOP Avail\n");
 	for (i = 0; i < CIM_NUM_IBQ; i++, p += 4)
 		seq_printf(seq, "%7s %5x %5u %5u %6x  %4x %4u %4u %5u\n",
 			   qname[i], base[i], size[i], thres[i],
-- 
2.12.2

^ permalink raw reply related

* [PATCH 6/6] cxgb4: Combine substrings for two messages
From: SF Markus Elfring @ 2017-05-04 20:36 UTC (permalink / raw)
  To: netdev, Casey Leedom, Ganesh Goudar; +Cc: LKML, kernel-janitors
In-Reply-To: <a3216042-2461-b397-dfa1-1de178d58513@users.sourceforge.net>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Thu, 4 May 2017 22:16:57 +0200

The script "checkpatch.pl" pointed information out like the following.

WARNING: quoted string split across lines

Thus fix two source code places.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 32add8dfc253..f9384d8b680d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -1353,8 +1353,9 @@ static int mps_trc_show(struct seq_file *seq, void *v)
 	if (tp.port < 8) {
 		i = adap->chan_map[tp.port & 3];
 		if (i >= MAX_NPORTS) {
-			dev_err(adap->pdev_dev, "tracer %u is assigned "
-				"to non-existing port\n", trcidx);
+			dev_err(adap->pdev_dev,
+				"tracer %u is assigned to non-existing port\n",
+				trcidx);
 			return -EINVAL;
 		}
 		seq_printf(seq, "tracer is capturing %s %s, ",
@@ -1798,11 +1799,11 @@ static int mps_tcam_show(struct seq_file *seq, void *v)
 				      FW_LDST_CMD_IDX_V(idx));
 			ret = t4_wr_mbox(adap, adap->mbox, &ldst_cmd,
 					 sizeof(ldst_cmd), &ldst_cmd);
-			if (ret)
-				dev_warn(adap->pdev_dev, "Can't read MPS "
-					 "replication map for idx %d: %d\n",
+			if (ret) {
+				dev_warn(adap->pdev_dev,
+					 "Can't read MPS replication map for idx %d: %d\n",
 					 idx, -ret);
-			else {
+			} else {
 				mps_rplc = ldst_cmd.u.mps.rplc;
 				rplc[0] = ntohl(mps_rplc.rplc31_0);
 				rplc[1] = ntohl(mps_rplc.rplc63_32);
-- 
2.12.2

^ permalink raw reply related

* Re: [PATCH] aquantia: Fix "ethtool -S" crash when adapter down.
From: David Arcari @ 2017-05-04 20:39 UTC (permalink / raw)
  To: Pavel Belous, David S . Miller; +Cc: netdev, Lino Sanfilippo, Simon Edelhaus
In-Reply-To: <9ea5a2ba21b4f8b22a63acf39135a41e2f1da23c.1493928470.git.pavel.belous@aquantia.com>

On 05/04/2017 04:10 PM, Pavel Belous wrote:
> From: Pavel Belous <pavel.belous@aquantia.com>
> 
> This patch fixes the crash that happens when driver tries to collect statistics
> from already released "aq_vec" object.
> If adapter is in "down" state we still allow user to see statistics from HW.
> 
> V2: fixed braces around "aq_vec_free".
> 
> Fixes: 97bde5c4f909 ("net: ethernet: aquantia: Support for NIC-specific code")
> Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
> ---
>  drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
> index cdb0299..9ee1c50 100644
> --- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
> @@ -755,7 +755,7 @@ void aq_nic_get_stats(struct aq_nic_s *self, u64 *data)
>  	count = 0U;
>  
>  	for (i = 0U, aq_vec = self->aq_vec[0];
> -		self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
> +		aq_vec && self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
>  		data += count;
>  		aq_vec_get_sw_stats(aq_vec, data, &count);
>  	}
> @@ -959,8 +959,10 @@ void aq_nic_free_hot_resources(struct aq_nic_s *self)
>  		goto err_exit;
>  
>  	for (i = AQ_DIMOF(self->aq_vec); i--;) {
> -		if (self->aq_vec[i])
> +		if (self->aq_vec[i]) {
>  			aq_vec_free(self->aq_vec[i]);
> +			self->aq_vec[i] = NULL;
> +		}
>  	}
>  
>  err_exit:;
> 

Resolves the ethtool crash.

Tested-by: David Arcari <darcari@redhat.com>

^ permalink raw reply

* Re: [RFC] iproute: Add support for extended ack to rtnl_talk
From: Phil Sutter @ 2017-05-04 20:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, dsahern, daniel, netdev
In-Reply-To: <20170504094356.66590a9a@xeon-e3>

Hi,

On Thu, May 04, 2017 at 09:43:56AM -0700, Stephen Hemminger wrote:
> On Thu, 04 May 2017 10:41:03 -0400 (EDT)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: David Ahern <dsahern@gmail.com>
> > Date: Thu, 4 May 2017 08:27:35 -0600
> > 
> > > On 5/4/17 3:36 AM, Daniel Borkmann wrote:  
> > >> What is the clear benefit/rationale of outsourcing this to
> > >> libmnl? I always was the impression we should strive for as little
> > >> dependencies as possible?  
> > > 
> > > +1  
> > 
> > Agreed, all else being equal iproute2 should be as self contained
> > as possible since it is such a fundamental tool.
> 
> Sorry, the old netlink code is more difficult to understand than libmnl.
> Having dependency on a library is not a problem. There already is
> an alternative implementation of ip commands in busybox for those
> people trying to work in small environments.

I second that. If you can't afford the extra ~24KB of libmnl on your
system, you much rather can't afford the 20 times bigger ip binary,
either.

Regarding conversion to libmnl, which I investigated and started working
on once: My gut feeling back then was that it's not quite worth the
effor since iproute2 requires an intermediate layer of functions anyway.
Another detail which I didn't like that much was libmnl's idiom of
creating netlink messages on base of just a plain buffer and using
mnl_nlmsg_put_header() et al. to populate it with data. I'm probably a
bit biased since I did the conversion to c99-style initializers for the
various struct req data types, but I didn't like the added run-time
overhead to achieve just the same.

So in summary, given that very little change happens to iproute2's
internal libnetlink, I don't see much urge to make it use libmnl as
backend. In my opinion it just adds another potential source of errors.

Eventually this should be a maintainer level decision, though. :)

Cheers, Phil

^ permalink raw reply

* Re: [RFC iproute2 0/8] RDMA tool
From: Doug Ledford @ 2017-05-04 20:45 UTC (permalink / raw)
  To: Dennis Dalessandro, Leon Romanovsky, Bart Van Assche
  Cc: jiri-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ram.amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org,
	sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	hch-jcswGhMUV9g@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org,
	ariela-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
In-Reply-To: <2dee6cde-0406-b101-0fe6-c1f6de7c1b1a-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

On Thu, 2017-05-04 at 15:26 -0400, Dennis Dalessandro wrote:
> On 05/04/2017 02:45 PM, Leon Romanovsky wrote:
> > 
> > On Thu, May 04, 2017 at 06:30:27PM +0000, Bart Van Assche wrote:
> > > 
> > > On Thu, 2017-05-04 at 21:25 +0300, Leon Romanovsky wrote:
> > > > 
> > > > On Thu, May 04, 2017 at 06:10:54PM +0000, Bart Van Assche
> > > > wrote:
> > > > > 
> > > > > On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote:
> > > > > > 
> > > > > > Following our discussion both in mailing list [1] and at
> > > > > > the LPC 2016 [2],
> > > > > > we would like to propose this RDMA tool to be part of
> > > > > > iproute2 package
> > > > > > and finally improve this situation.
> > > > > 
> > > > > Hello Leon,
> > > > > 
> > > > > Although I really appreciate your work: can you clarify why
> > > > > you would like to
> > > > > add *RDMA* functionality to an *IP routing* tool? I haven't
> > > > > found any motivation
> > > > > for adding RDMA functionality to iproute2 in [1].
> > > > 
> > > > We are planning to reuse the same infrastructure provided by
> > > > iproute2,
> > > > like netlink parsing, access to distributions, same CLI and
> > > > same standards.
> > > > 
> > > > Right now, RDMA is already tightened to netdev: iWARP, RoCE,
> > > > IPoIB, HFI-VNIC.
> > > > Many drivers (mlx, qed, i40, cxgb) are sharing code between net
> > > > and
> > > > RDMA.
> > > > 
> > > > I do expect that iproute2 will be installed on every machine
> > > > with any
> > > > type of connection, including IB and OPA.
> > > > 
> > > > So I think that it is enough to be part of that suite and don't
> > > > invent
> > > > our own for one specific tool.
> > > 
> > > Hello Leon,
> > > 
> > > Sorry but to me that sounds like a weak argument for including
> > > RDMA functionality
> > > in iproute2. There is already a library for communication over
> > > netlink sockets,
> > > namely libnl. Is there functionality that is in iproute2 but not
> > > in libnl and
> > > that is needed for the new tool? If so, have you considered to
> > > create a new
> > > library for that functionality?
> > 
> > It is not hard to create new tool, the hardest part is to ensure
> > that it is
> > part of the distributions. Did you count how many months we are
> > trying to
> > add rdma-core to debian?
> 
> I do agree that it is a strange pairing and am not really a fan.
> However 
> at the end of the day it's just a name for a repo/package. If the 
> iproute folks are fine to include rdma in their repo/package, great
> we 
> can leverage their code for CLI and other common stuff.

If you look into the iproute2 package, it becomes clear that the name
iproute2 is historical and not really accurate any more.  It contains
things like the bridge control software, tc for controlling send
queues, and many things network related but not routing related.  The
rdma tool is a perfectly fine fit in the sense that it is an additional
network management tool IMO.

For reference, here's the list of stuff already in iproute on my Fedora
24 box:

/usr/sbin/arpd
/usr/sbin/bridge
/usr/sbin/cbq
/usr/sbin/ctstat
/usr/sbin/genl
/usr/sbin/ifcfg
/usr/sbin/ifstat
/usr/sbin/ip
/usr/sbin/lnstat
/usr/sbin/nstat
/usr/sbin/routef
/usr/sbin/routel
/usr/sbin/rtacct
/usr/sbin/rtmon
/usr/sbin/rtpr
/usr/sbin/rtstat
/usr/sbin/ss
/usr/sbin/tc
/usr/sbin/tipc

And in fact, if you check, tipc is almost similar to RDMA ;-)  So, I
suggest people not get hung up on the name iproute2, the fit is fine
when you look deeper into the nature of the package.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG KeyID: B826A3330E572FDD
   
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
From: Casey Leedom @ 2017-05-04 21:01 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Raj, Ashok, Bjorn Helgaas, Michael Werner, Ganesh GR, Arjun V.,
	Asit K Mallick, Patrick J Cramer, Suravee Suthikulpanit, Bob Shaw,
	h, Ding Tianhong, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight,
	Jeff Kirsher, Netdev
In-Reply-To: <CAKgT0UdJ17DGAphuP5Y9co6ky-GBFdq2s-1nqctY_Tz_iz5__g@mail.gmail.com>

| From: Alexander Duyck <alexander.duyck@gmail.com>
| Sent: Wednesday, May 3, 2017 9:02 AM
| ...
| It sounds like we are more or less in agreement. My only concern is
| really what we default this to. On x86 I would say we could probably
| default this to disabled for existing platforms since my understanding
| is that relaxed ordering doesn't provide much benefit on what is out
| there right now when performing DMA through the root complex. As far
| as peer-to-peer I would say we should probably look at enabling the
| ability to have Relaxed Ordering enabled for some channels but not
| others. In those cases the hardware needs to be smart enough to allow
| for you to indicate you want it disabled by default for most of your
| DMA channels, and then enabled for the select channels that are
| handling the peer-to-peer traffic.

  Yes, I think that we are mostly in agreement.  I had just wanted to make
sure that whatever scheme was developed would allow for simultaneously
supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
Ordering for others within the same system.  I.e. not simply
enabling/disabling/etc.  based solely on System Platform Architecture.

  By the way, I've started our QA folks off looking at what things look like
in Linux Virtual Machines under different Hypervisors to see what
information they may provide to the VM in the way of what Root Complex Port
is being used, etc.  So far they've got Windows HyperV done and there
there's no PCIe Fabric exposed in any way: just the attached device.  I'll
have to see what pci_find_pcie_root_port() returns in that environment.
Maybe NULL?

  With your reservations (which I also share), I think that it probably
makes sense to have a per-architecture definition of the "Can I Use Relaxed
Ordering With TLPs Directed At This End Point" predicate, with the default
being "No" for any architecture which doesn't implement the predicate.  And
if the specified (struct pci_dev *) End Node is NULL, it ought to return
False for that as well.  I can't see any reason to pass in the Source End
Node but I may be missing something.

  At this point, this is pretty far outside my level of expertise.  I'm
happy to give it a go, but I'd be even happier if someone with a lot more
experience in the PCIe Infrastructure were to want to carry the ball
forward.  I'm not super familiar with the Linux Kernel "Rules Of
Engagement", so let me know what my next step should be.  Thanks.

Casey

^ permalink raw reply

* Re: FEC on i.MX 7 transmit queue timeout
From: Stefan Agner @ 2017-05-04 21:36 UTC (permalink / raw)
  To: Andy Duan; +Cc: festevam, netdev, netdev-owner
In-Reply-To: <AM4PR0401MB2260F011313883C522EA7684FFEA0@AM4PR0401MB2260.eurprd04.prod.outlook.com>

On 2017-05-03 20:08, Andy Duan wrote:
> From: Stefan Agner <stefan@agner.ch> Sent: Thursday, May 04, 2017 9:22 AM
>>To: Andy Duan <fugang.duan@nxp.com>
>>Cc: fugang.duan@freescale.com; festevam@gmail.com;
>>netdev@vger.kernel.org; netdev-owner@vger.kernel.org
>>Subject: Re: FEC on i.MX 7 transmit queue timeout
>>
>>Hi Andy,
>>
>>On 2017-04-20 19:48, Andy Duan wrote:
>>> On 2017年04月20日 07:15, Stefan Agner wrote:
>>>> I tested again with imx6sx-fec compatible string. I could reproduce
>>>> it on a Colibri with i.MX 7Dual. But not always: It really depends
>>>> whether queue 2 is counting up or not. Just after boot, I check
>>>> /proc/interrupts twice, if queue 2 is counting it will happen!
>>>>
>>>> But if only queue 0 is mostly in use, then it seems to work just fine.
>>> If your case is only running best effort like tcp/udp, you can re-set
>>> the "fsl,num-tx-queues" and "fsl,num-rx-queues" to 1 in board dts file.
>>> Other two queues are for AVB audio/video queues, they have high
>>> priority than queue 0. If running iperf tcp test on the three queues,
>>> then the tcp segment may be out-of-order that cause net watchdog
>>timeout.
>>>>
>>>> I also tried i.MX 7Dual SabreSD here, and the same thing. I had to
>>>> reboot 3 times, then queue 2 was counting:
>>>>   57:          8     GIC-0 150 Level     30be0000.ethernet
>>>>   58:      20137     GIC-0 151 Level     30be0000.ethernet
>>>>   59:       9269     GIC-0 152 Level     30be0000.ethernet
>>>>
>>>> It took me about 40 minutes on Sabre until it happened, and I had to
>>>> force it using iperf, but then I got the ring dumps:
>>> My board had ran more than 47 hours with nfs rootfs in 4.11.0-rc6, but
>>> not running iperf.
>>> I am testing with iperf.
>>
>>Any update on this issue?
>>
>>When using iperf (server) on the board with Linux 4.11 the issue appears
>>within a few iperf iterations on a Sabre (TO 1.2, Board Rev C, if that matters)...
>>
> I don’t know whether you received my last mail. (maybe failed due to I
> received some rejection mails)

I think I did not... The last email I received was Fri, 21 Apr 2017
02:48:23 UTC.

 
> If your case is only running best effort like tcp/udp, you can re-set
> the "fsl,num-tx-queues" and "fsl,num-rx-queues" to 1 in board dts
> file.

I did test that, and it seems to work fine with those properties set to
1.

> Other two queues are for AVB audio/video queues, they have high
> priority than queue 0. If running iperf tcp test on the three queues,
> then the tcp segment may be out-of-order that cause net watchdog
> timeout.

Okay. A single event would be understandable, but it seems to enter some
kind of loop after that (continuously printing "fec 30be0000.ethernet
eth0: TX ring dump ...").

In a quick test I commented out the fec_dump call, with that it seems to
print only once and continues working afterwards (although, speed starts
to decrease, so something is not good at that point).

> In fsl kernel tree, there have one patch that only select the queue0
> for best effort like tcp/udp. Pls test again in your board, if no
> problem I will upstream the patch.

That sounds like a reasonable fix.

IP, no matter whether TCP/UDP, is the most common use case, so IMHO this
should "just work" by default.

^ permalink raw reply

* [Patch net] ipv4: restore rt->fi for reference counting
From: Cong Wang @ 2017-05-04 21:54 UTC (permalink / raw)
  To: netdev; +Cc: andreyknvl, edumazet, Cong Wang

IPv4 dst could use fi->fib_metrics to store metrics but fib_info
itself is refcnt'ed, so without taking a refcnt fi and
fi->fib_metrics could be freed while dst metrics still points to
it. This triggers use-after-free as reported by Andrey twice.

This patch reverts commit 2860583fe840 ("ipv4: Kill rt->fi") to
restore this reference counting. It is a quick fix for -net and
-stable, for -net-next, as Eric suggested, we can consider doing
reference counting for metrics itself instead of relying on fib_info.

IPv6 is very different, it copies or steals the metrics from mx6_config
in fib6_commit_metrics() so probably doesn't need a refcnt.

Decnet has already done the refcnt'ing, see dn_fib_semantic_match().

Fixes: 2860583fe840 ("ipv4: Kill rt->fi")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/net/route.h |  1 +
 net/ipv4/route.c    | 18 +++++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/net/route.h b/include/net/route.h
index 2cc0e14..4335eb7 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -69,6 +69,7 @@ struct rtable {
 
 	struct list_head	rt_uncached;
 	struct uncached_list	*rt_uncached_list;
+	struct fib_info		*fi; /* for refcnt to shared metrics */
 };
 
 static inline bool rt_is_input_route(const struct rtable *rt)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 655d9ee..f647310 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1387,6 +1387,11 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
 {
 	struct rtable *rt = (struct rtable *) dst;
 
+	if (rt->fi) {
+		fib_info_put(rt->fi);
+		rt->fi = NULL;
+	}
+
 	if (!list_empty(&rt->rt_uncached)) {
 		struct uncached_list *ul = rt->rt_uncached_list;
 
@@ -1424,6 +1429,16 @@ static bool rt_cache_valid(const struct rtable *rt)
 		!rt_is_expired(rt);
 }
 
+static void rt_init_metrics(struct rtable *rt, struct fib_info *fi)
+{
+	if (fi->fib_metrics != (u32 *)dst_default_metrics) {
+		fib_info_hold(fi);
+		rt->fi = fi;
+	}
+
+	dst_init_metrics(&rt->dst, fi->fib_metrics, true);
+}
+
 static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 			   const struct fib_result *res,
 			   struct fib_nh_exception *fnhe,
@@ -1438,7 +1453,7 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 			rt->rt_gateway = nh->nh_gw;
 			rt->rt_uses_gateway = 1;
 		}
-		dst_init_metrics(&rt->dst, fi->fib_metrics, true);
+		rt_init_metrics(rt, fi);
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		rt->dst.tclassid = nh->nh_tclassid;
 #endif
@@ -1490,6 +1505,7 @@ struct rtable *rt_dst_alloc(struct net_device *dev,
 		rt->rt_gateway = 0;
 		rt->rt_uses_gateway = 0;
 		rt->rt_table_id = 0;
+		rt->fi = NULL;
 		INIT_LIST_HEAD(&rt->rt_uncached);
 
 		rt->dst.output = ip_output;
-- 
2.5.5

^ permalink raw reply related

* [PATCH iproute2] vxlan: Add support for modifying vxlan device attributes
From: Girish Moodalbail @ 2017-05-04 21:46 UTC (permalink / raw)
  To: stephen; +Cc: netdev

Ability to change vxlan device attributes was added to kernel through
commit 8bcdc4f3a20b ("vxlan: add changelink support"), however one
cannot do the same through ip(8) command.  Changing the allowed vxlan
device attributes using 'ip link set dev <vxlan_name> type vxlan
<allowed_attributes>' currently fails with 'operation not supported'
error.  This failure is due to the incorrect rtnetlink message
construction for the 'ip link set' operation.

The vxlan_parse_opt() callback function is called for parsing options
for both 'ip link add' and 'ip link set'. For the 'add' case, we pass
down default values for those attributes that were not provided as CLI
options. However, for the 'set' case we should be only passing down the
explicitly provided attributes and not any other (default) attributes.

Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
---
 ip/iplink_vxlan.c | 73 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 59 insertions(+), 14 deletions(-)

diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
index b4ebb13..c8959aa 100644
--- a/ip/iplink_vxlan.c
+++ b/ip/iplink_vxlan.c
@@ -72,16 +72,25 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	struct in6_addr daddr6 = IN6ADDR_ANY_INIT;
 	unsigned int link = 0;
 	__u8 tos = 0;
+	bool tos_set = false;
 	__u8 ttl = 0;
+	bool ttl_set = false;
 	__u32 label = 0;
+	bool label_set = false;
 	__u8 learning = 1;
+	bool learning_set = false;
 	__u8 proxy = 0;
+	bool proxy_set = false;
 	__u8 rsc = 0;
+	bool rsc_set = false;
 	__u8 l2miss = 0;
+	bool l2miss_set = false;
 	__u8 l3miss = 0;
+	bool l3miss_set = false;
 	__u8 noage = 0;
 	__u32 age = 0;
 	__u32 maxaddr = 0;
+	bool maxaddr_set = false;
 	__u16 dstport = 0;
 	__u8 udpcsum = 0;
 	bool udpcsum_set = false;
@@ -90,12 +99,17 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u8 udp6zerocsumrx = 0;
 	bool udp6zerocsumrx_set = false;
 	__u8 remcsumtx = 0;
+	bool remcsumtx_set = false;
 	__u8 remcsumrx = 0;
+	bool remcsumrx_set = false;
 	__u8 metadata = 0;
+	bool metadata_set = false;
 	__u8 gbp = 0;
 	__u8 gpe = 0;
 	int dst_port_set = 0;
 	struct ifla_vxlan_port_range range = { 0, 0 };
+	bool set_op = (n->nlmsg_type == RTM_NEWLINK &&
+		       !(n->nlmsg_flags & NLM_F_CREATE));
 
 	while (argc > 0) {
 		if (!matches(*argv, "id") ||
@@ -152,6 +166,7 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 					invarg("TTL must be <= 255", *argv);
 				ttl = uval;
 			}
+			ttl_set = true;
 		} else if (!matches(*argv, "tos") ||
 			   !matches(*argv, "dsfield")) {
 			__u32 uval;
@@ -163,6 +178,7 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 				tos = uval;
 			} else
 				tos = 1;
+			tos_set = true;
 		} else if (!matches(*argv, "label") ||
 			   !matches(*argv, "flowlabel")) {
 			__u32 uval;
@@ -172,6 +188,7 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 			    (uval & ~LABEL_MAX_MASK))
 				invarg("invalid flowlabel", *argv);
 			label = htonl(uval);
+			label_set = true;
 		} else if (!matches(*argv, "ageing")) {
 			NEXT_ARG();
 			if (strcmp(*argv, "none") == 0)
@@ -184,6 +201,7 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 				maxaddr = 0;
 			else if (get_u32(&maxaddr, *argv, 0))
 				invarg("max addresses", *argv);
+			maxaddr_set = true;
 		} else if (!matches(*argv, "port") ||
 			   !matches(*argv, "srcport")) {
 			NEXT_ARG();
@@ -199,24 +217,34 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 			dst_port_set = 1;
 		} else if (!matches(*argv, "nolearning")) {
 			learning = 0;
+			learning_set = true;
 		} else if (!matches(*argv, "learning")) {
 			learning = 1;
+			learning_set = true;
 		} else if (!matches(*argv, "noproxy")) {
 			proxy = 0;
+			proxy_set = true;
 		} else if (!matches(*argv, "proxy")) {
 			proxy = 1;
+			proxy_set = true;
 		} else if (!matches(*argv, "norsc")) {
 			rsc = 0;
+			rsc_set = true;
 		} else if (!matches(*argv, "rsc")) {
 			rsc = 1;
+			rsc_set = true;
 		} else if (!matches(*argv, "nol2miss")) {
 			l2miss = 0;
+			l2miss_set = true;
 		} else if (!matches(*argv, "l2miss")) {
 			l2miss = 1;
+			l2miss_set = true;
 		} else if (!matches(*argv, "nol3miss")) {
 			l3miss = 0;
+			l3miss_set = true;
 		} else if (!matches(*argv, "l3miss")) {
 			l3miss = 1;
+			l3miss_set = true;
 		} else if (!matches(*argv, "udpcsum")) {
 			udpcsum = 1;
 			udpcsum_set = true;
@@ -237,17 +265,24 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 			udp6zerocsumrx_set = true;
 		} else if (!matches(*argv, "remcsumtx")) {
 			remcsumtx = 1;
+			remcsumtx_set = true;
 		} else if (!matches(*argv, "noremcsumtx")) {
 			remcsumtx = 0;
+			remcsumtx_set = true;
 		} else if (!matches(*argv, "remcsumrx")) {
 			remcsumrx = 1;
+			remcsumrx_set = true;
 		} else if (!matches(*argv, "noremcsumrx")) {
 			remcsumrx = 0;
+			remcsumrx_set = true;
 		} else if (!matches(*argv, "external")) {
 			metadata = 1;
 			learning = 0;
+			metadata_set = true;
+			learning_set = true;
 		} else if (!matches(*argv, "noexternal")) {
 			metadata = 0;
+			metadata_set = true;
 		} else if (!matches(*argv, "gbp")) {
 			gbp = 1;
 		} else if (!matches(*argv, "gpe")) {
@@ -287,7 +322,7 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 
 	if (!dst_port_set && gpe) {
 		dstport = 4790;
-	} else if (!dst_port_set) {
+	} else if (!dst_port_set && !set_op) {
 		fprintf(stderr, "vxlan: destination port not specified\n"
 			"Will use Linux kernel default (non-standard value)\n");
 		fprintf(stderr,
@@ -316,18 +351,28 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 
 	if (link)
 		addattr32(n, 1024, IFLA_VXLAN_LINK, link);
-	addattr32(n, 1024, IFLA_VXLAN_LABEL, label);
-	addattr8(n, 1024, IFLA_VXLAN_TTL, ttl);
-	addattr8(n, 1024, IFLA_VXLAN_TOS, tos);
-	addattr8(n, 1024, IFLA_VXLAN_LEARNING, learning);
-	addattr8(n, 1024, IFLA_VXLAN_PROXY, proxy);
-	addattr8(n, 1024, IFLA_VXLAN_RSC, rsc);
-	addattr8(n, 1024, IFLA_VXLAN_L2MISS, l2miss);
-	addattr8(n, 1024, IFLA_VXLAN_L3MISS, l3miss);
-	addattr8(n, 1024, IFLA_VXLAN_REMCSUM_TX, remcsumtx);
-	addattr8(n, 1024, IFLA_VXLAN_REMCSUM_RX, remcsumrx);
-	addattr8(n, 1024, IFLA_VXLAN_COLLECT_METADATA, metadata);
-
+	if (label_set)
+		addattr32(n, 1024, IFLA_VXLAN_LABEL, label);
+	if (ttl_set)
+		addattr8(n, 1024, IFLA_VXLAN_TTL, ttl);
+	if (tos_set)
+		addattr8(n, 1024, IFLA_VXLAN_TOS, tos);
+	if (!set_op || learning_set)
+		addattr8(n, 1024, IFLA_VXLAN_LEARNING, learning);
+	if (proxy_set)
+		addattr8(n, 1024, IFLA_VXLAN_PROXY, proxy);
+	if (rsc_set)
+		addattr8(n, 1024, IFLA_VXLAN_RSC, rsc);
+	if (l2miss_set)
+		addattr8(n, 1024, IFLA_VXLAN_L2MISS, l2miss);
+	if (l3miss_set)
+		addattr8(n, 1024, IFLA_VXLAN_L3MISS, l3miss);
+	if (remcsumtx_set)
+		addattr8(n, 1024, IFLA_VXLAN_REMCSUM_TX, remcsumtx);
+	if (remcsumrx_set)
+		addattr8(n, 1024, IFLA_VXLAN_REMCSUM_RX, remcsumrx);
+	if (metadata_set)
+		addattr8(n, 1024, IFLA_VXLAN_COLLECT_METADATA, metadata);
 	if (udpcsum_set)
 		addattr8(n, 1024, IFLA_VXLAN_UDP_CSUM, udpcsum);
 	if (udp6zerocsumtx_set)
@@ -338,7 +383,7 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 		addattr32(n, 1024, IFLA_VXLAN_AGEING, 0);
 	else if (age)
 		addattr32(n, 1024, IFLA_VXLAN_AGEING, age);
-	if (maxaddr)
+	if (maxaddr_set)
 		addattr32(n, 1024, IFLA_VXLAN_LIMIT, maxaddr);
 	if (range.low || range.high)
 		addattr_l(n, 1024, IFLA_VXLAN_PORT_RANGE,
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net] tcp: randomize timestamps on syncookies
From: Eric Dumazet @ 2017-05-04 22:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Florian Westphal, Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

Whole point of randomization was to hide server uptime, but an attacker
can simply start a syn flood and TCP generates 'old style' timestamps,
directly revealing server jiffies value.

Also, TSval sent by the server to a particular remote address vary depending
on syncookies being sent or not, potentially triggering PAWS drops for
innocent clients.

Lets implement proper randomization, including for SYNcookies.

Also we do not need to export sysctl_tcp_timestamps, it is not used from
a module.

Fixes: 95a22caee396c ("tcp: randomize tcp timestamp offsets for each connection")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Yuchung Cheng <ycheng@google.com>
---
 include/net/secure_seq.h |   10 ++++++----
 include/net/tcp.h        |    3 ++-
 net/core/secure_seq.c    |   31 +++++++++++++++++++------------
 net/ipv4/tcp_input.c     |    8 +++-----
 net/ipv4/tcp_ipv4.c      |   31 +++++++++++++++++++------------
 net/ipv6/tcp_ipv6.c      |   32 +++++++++++++++++++-------------
 6 files changed, 68 insertions(+), 47 deletions(-)

diff --git a/include/net/secure_seq.h b/include/net/secure_seq.h
index fe236b3429f0d8caeb1adc367b5b4a20591c848b..b94006f6fbdde0d78fe33b9c2d86159e291c30cf 100644
--- a/include/net/secure_seq.h
+++ b/include/net/secure_seq.h
@@ -6,10 +6,12 @@
 u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
 u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport);
-u32 secure_tcp_seq_and_tsoff(__be32 saddr, __be32 daddr,
-			     __be16 sport, __be16 dport, u32 *tsoff);
-u32 secure_tcpv6_seq_and_tsoff(const __be32 *saddr, const __be32 *daddr,
-			       __be16 sport, __be16 dport, u32 *tsoff);
+u32 secure_tcp_seq(__be32 saddr, __be32 daddr,
+		   __be16 sport, __be16 dport);
+u32 secure_tcp_ts_off(__be32 saddr, __be32 daddr);
+u32 secure_tcpv6_seq(const __be32 *saddr, const __be32 *daddr,
+		     __be16 sport, __be16 dport);
+u32 secure_tcpv6_ts_off(const __be32 *saddr, const __be32 *daddr);
 u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
 				__be16 sport, __be16 dport);
 u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 270e5cc43c99e7030e95af218095cf9f283950bc..1be353fc5cb1c313e8fec350d8a0a9ccc8f770ee 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1822,7 +1822,8 @@ struct tcp_request_sock_ops {
 #endif
 	struct dst_entry *(*route_req)(const struct sock *sk, struct flowi *fl,
 				       const struct request_sock *req);
-	__u32 (*init_seq_tsoff)(const struct sk_buff *skb, u32 *tsoff);
+	u32 (*init_seq)(const struct sk_buff *skb);
+	u32 (*init_ts_off)(const struct sk_buff *skb);
 	int (*send_synack)(const struct sock *sk, struct dst_entry *dst,
 			   struct flowi *fl, struct request_sock *req,
 			   struct tcp_fastopen_cookie *foc,
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 6bd2f8fb0476baabf507557fc0d06b6787511c70..ae35cce3a40d70387bee815798933aa43a0e6d84 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -24,9 +24,13 @@ static siphash_key_t ts_secret __read_mostly;
 
 static __always_inline void net_secret_init(void)
 {
-	net_get_random_once(&ts_secret, sizeof(ts_secret));
 	net_get_random_once(&net_secret, sizeof(net_secret));
 }
+
+static __always_inline void ts_secret_init(void)
+{
+	net_get_random_once(&ts_secret, sizeof(ts_secret));
+}
 #endif
 
 #ifdef CONFIG_INET
@@ -47,7 +51,7 @@ static u32 seq_scale(u32 seq)
 #endif
 
 #if IS_ENABLED(CONFIG_IPV6)
-static u32 secure_tcpv6_ts_off(const __be32 *saddr, const __be32 *daddr)
+u32 secure_tcpv6_ts_off(const __be32 *saddr, const __be32 *daddr)
 {
 	const struct {
 		struct in6_addr saddr;
@@ -60,12 +64,14 @@ static u32 secure_tcpv6_ts_off(const __be32 *saddr, const __be32 *daddr)
 	if (sysctl_tcp_timestamps != 1)
 		return 0;
 
+	ts_secret_init();
 	return siphash(&combined, offsetofend(typeof(combined), daddr),
 		       &ts_secret);
 }
+EXPORT_SYMBOL(secure_tcpv6_ts_off);
 
-u32 secure_tcpv6_seq_and_tsoff(const __be32 *saddr, const __be32 *daddr,
-			       __be16 sport, __be16 dport, u32 *tsoff)
+u32 secure_tcpv6_seq(const __be32 *saddr, const __be32 *daddr,
+		     __be16 sport, __be16 dport)
 {
 	const struct {
 		struct in6_addr saddr;
@@ -78,14 +84,14 @@ u32 secure_tcpv6_seq_and_tsoff(const __be32 *saddr, const __be32 *daddr,
 		.sport = sport,
 		.dport = dport
 	};
-	u64 hash;
+	u32 hash;
+
 	net_secret_init();
 	hash = siphash(&combined, offsetofend(typeof(combined), dport),
 		       &net_secret);
-	*tsoff = secure_tcpv6_ts_off(saddr, daddr);
 	return seq_scale(hash);
 }
-EXPORT_SYMBOL(secure_tcpv6_seq_and_tsoff);
+EXPORT_SYMBOL(secure_tcpv6_seq);
 
 u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport)
@@ -107,11 +113,12 @@ EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 #endif
 
 #ifdef CONFIG_INET
-static u32 secure_tcp_ts_off(__be32 saddr, __be32 daddr)
+u32 secure_tcp_ts_off(__be32 saddr, __be32 daddr)
 {
 	if (sysctl_tcp_timestamps != 1)
 		return 0;
 
+	ts_secret_init();
 	return siphash_2u32((__force u32)saddr, (__force u32)daddr,
 			    &ts_secret);
 }
@@ -121,15 +128,15 @@ static u32 secure_tcp_ts_off(__be32 saddr, __be32 daddr)
  * it would be easy enough to have the former function use siphash_4u32, passing
  * the arguments as separate u32.
  */
-u32 secure_tcp_seq_and_tsoff(__be32 saddr, __be32 daddr,
-			     __be16 sport, __be16 dport, u32 *tsoff)
+u32 secure_tcp_seq(__be32 saddr, __be32 daddr,
+		   __be16 sport, __be16 dport)
 {
-	u64 hash;
+	u32 hash;
+
 	net_secret_init();
 	hash = siphash_3u32((__force u32)saddr, (__force u32)daddr,
 			    (__force u32)sport << 16 | (__force u32)dport,
 			    &net_secret);
-	*tsoff = secure_tcp_ts_off(saddr, daddr);
 	return seq_scale(hash);
 }
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9739962bfb3fd2d39cb13f643def223f4f17fcb6..5a3ad09e2786fb41ad12681d09938c645b69866d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -85,7 +85,6 @@ int sysctl_tcp_dsack __read_mostly = 1;
 int sysctl_tcp_app_win __read_mostly = 31;
 int sysctl_tcp_adv_win_scale __read_mostly = 1;
 EXPORT_SYMBOL(sysctl_tcp_adv_win_scale);
-EXPORT_SYMBOL(sysctl_tcp_timestamps);
 
 /* rfc5961 challenge ack rate limiting */
 int sysctl_tcp_challenge_ack_limit = 1000;
@@ -6347,8 +6346,8 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	if (security_inet_conn_request(sk, skb, req))
 		goto drop_and_free;
 
-	if (isn && tmp_opt.tstamp_ok)
-		af_ops->init_seq_tsoff(skb, &tcp_rsk(req)->ts_off);
+	if (tmp_opt.tstamp_ok)
+		tcp_rsk(req)->ts_off = af_ops->init_ts_off(skb);
 
 	if (!want_cookie && !isn) {
 		/* Kill the following clause, if you dislike this way. */
@@ -6368,7 +6367,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 			goto drop_and_release;
 		}
 
-		isn = af_ops->init_seq_tsoff(skb, &tcp_rsk(req)->ts_off);
+		isn = af_ops->init_seq(skb);
 	}
 	if (!dst) {
 		dst = af_ops->route_req(sk, &fl, req);
@@ -6380,7 +6379,6 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 
 	if (want_cookie) {
 		isn = cookie_init_sequence(af_ops, sk, skb, &req->mss);
-		tcp_rsk(req)->ts_off = 0;
 		req->cookie_ts = tmp_opt.tstamp_ok;
 		if (!tmp_opt.tstamp_ok)
 			inet_rsk(req)->ecn_ok = 0;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index cbbafe546c0f5c5f43531eaf24f5b460264785c6..e7d4e47f83ae2074b6ca602bc50a39d5defb85c5 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -94,12 +94,18 @@ static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key,
 struct inet_hashinfo tcp_hashinfo;
 EXPORT_SYMBOL(tcp_hashinfo);
 
-static u32 tcp_v4_init_seq_and_tsoff(const struct sk_buff *skb, u32 *tsoff)
+static u32 tcp_v4_init_seq(const struct sk_buff *skb)
 {
-	return secure_tcp_seq_and_tsoff(ip_hdr(skb)->daddr,
-					ip_hdr(skb)->saddr,
-					tcp_hdr(skb)->dest,
-					tcp_hdr(skb)->source, tsoff);
+	return secure_tcp_seq(ip_hdr(skb)->daddr,
+			      ip_hdr(skb)->saddr,
+			      tcp_hdr(skb)->dest,
+			      tcp_hdr(skb)->source);
+}
+
+static u32 tcp_v4_init_ts_off(const struct sk_buff *skb)
+{
+	return secure_tcp_ts_off(ip_hdr(skb)->daddr,
+				 ip_hdr(skb)->saddr);
 }
 
 int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
@@ -232,13 +238,13 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	rt = NULL;
 
 	if (likely(!tp->repair)) {
-		seq = secure_tcp_seq_and_tsoff(inet->inet_saddr,
-					       inet->inet_daddr,
-					       inet->inet_sport,
-					       usin->sin_port,
-					       &tp->tsoffset);
 		if (!tp->write_seq)
-			tp->write_seq = seq;
+			tp->write_seq = secure_tcp_seq(inet->inet_saddr,
+						       inet->inet_daddr,
+						       inet->inet_sport,
+						       usin->sin_port);
+		tp->tsoffset = secure_tcp_ts_off(inet->inet_saddr,
+						 inet->inet_daddr);
 	}
 
 	inet->inet_id = tp->write_seq ^ jiffies;
@@ -1239,7 +1245,8 @@ static const struct tcp_request_sock_ops tcp_request_sock_ipv4_ops = {
 	.cookie_init_seq =	cookie_v4_init_sequence,
 #endif
 	.route_req	=	tcp_v4_route_req,
-	.init_seq_tsoff	=	tcp_v4_init_seq_and_tsoff,
+	.init_seq	=	tcp_v4_init_seq,
+	.init_ts_off	=	tcp_v4_init_ts_off,
 	.send_synack	=	tcp_v4_send_synack,
 };
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 8e42e8f54b705ed8780890c7434feeff1055599a..aeb9497b5bb754f2277dec4a4dec02bf25bdbbe5 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -101,12 +101,18 @@ static void inet6_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 	}
 }
 
-static u32 tcp_v6_init_seq_and_tsoff(const struct sk_buff *skb, u32 *tsoff)
+static u32 tcp_v6_init_seq(const struct sk_buff *skb)
 {
-	return secure_tcpv6_seq_and_tsoff(ipv6_hdr(skb)->daddr.s6_addr32,
-					  ipv6_hdr(skb)->saddr.s6_addr32,
-					  tcp_hdr(skb)->dest,
-					  tcp_hdr(skb)->source, tsoff);
+	return secure_tcpv6_seq(ipv6_hdr(skb)->daddr.s6_addr32,
+				ipv6_hdr(skb)->saddr.s6_addr32,
+				tcp_hdr(skb)->dest,
+				tcp_hdr(skb)->source);
+}
+
+static u32 tcp_v6_init_ts_off(const struct sk_buff *skb)
+{
+	return secure_tcpv6_ts_off(ipv6_hdr(skb)->daddr.s6_addr32,
+				   ipv6_hdr(skb)->saddr.s6_addr32);
 }
 
 static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
@@ -122,7 +128,6 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	struct flowi6 fl6;
 	struct dst_entry *dst;
 	int addr_type;
-	u32 seq;
 	int err;
 	struct inet_timewait_death_row *tcp_death_row = &sock_net(sk)->ipv4.tcp_death_row;
 
@@ -282,13 +287,13 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	sk_set_txhash(sk);
 
 	if (likely(!tp->repair)) {
-		seq = secure_tcpv6_seq_and_tsoff(np->saddr.s6_addr32,
-						 sk->sk_v6_daddr.s6_addr32,
-						 inet->inet_sport,
-						 inet->inet_dport,
-						 &tp->tsoffset);
 		if (!tp->write_seq)
-			tp->write_seq = seq;
+			tp->write_seq = secure_tcpv6_seq(np->saddr.s6_addr32,
+							 sk->sk_v6_daddr.s6_addr32,
+							 inet->inet_sport,
+							 inet->inet_dport);
+		tp->tsoffset = secure_tcpv6_ts_off(np->saddr.s6_addr32,
+						   sk->sk_v6_daddr.s6_addr32);
 	}
 
 	if (tcp_fastopen_defer_connect(sk, &err))
@@ -749,7 +754,8 @@ static const struct tcp_request_sock_ops tcp_request_sock_ipv6_ops = {
 	.cookie_init_seq =	cookie_v6_init_sequence,
 #endif
 	.route_req	=	tcp_v6_route_req,
-	.init_seq_tsoff	=	tcp_v6_init_seq_and_tsoff,
+	.init_seq	=	tcp_v6_init_seq,
+	.init_ts_off	=	tcp_v6_init_ts_off,
 	.send_synack	=	tcp_v6_send_synack,
 };
 

^ permalink raw reply related

* Re: [PATCH net-next 9/9] ipvlan: introduce individual MAC addresses
From: Mahesh Bandewar (महेश बंडेवार) @ 2017-05-04 22:09 UTC (permalink / raw)
  To: Chiappero, Marco
  Cc: Dan Williams, netdev@vger.kernel.org, David S . Miller,
	Kirsher, Jeffrey T, Duyck, Alexander H, Grandhi, Sainath
In-Reply-To: <695CDAEB5A6CE2498DE413F2A9AA82960915EF08@IRSMSX101.ger.corp.intel.com>

On Thu, May 4, 2017 at 2:37 AM, Chiappero, Marco
<marco.chiappero@intel.com> wrote:
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>> On Behalf Of Dan Williams
>> Sent: Tuesday, May 2, 2017 5:09 PM
>> To: Chiappero, Marco <marco.chiappero@intel.com>; netdev@vger.kernel.org
>> Cc: David S . Miller <davem@davemloft.net>; Kirsher, Jeffrey T
>> <jeffrey.t.kirsher@intel.com>; Duyck, Alexander H
>> <alexander.h.duyck@intel.com>; Grandhi, Sainath
>> <sainath.grandhi@intel.com>; Mahesh Bandewar <maheshb@google.com>
>> Subject: Re: [PATCH net-next 9/9] ipvlan: introduce individual MAC addresses
>>
>> On Tue, 2017-05-02 at 15:08 +0000, Chiappero, Marco wrote:
>> > > -----Original Message-----
>> > > From: Dan Williams [mailto:dcbw@redhat.com] On Thu, 2017-04-27 at
>> > > 11:20 -0500, Dan Williams wrote:
>> > > > On Thu, 2017-04-27 at 15:51 +0100, Marco Chiappero wrote:
>> > > > > Currently all the slave devices belonging to the same port
>> > > > > inherit their MAC address from its master device. This patch
>> > > > > removes this limitation and allows every slave device to obtain
>> > > > > a unique MAC address, by default randomly generated at creation
>> > > > > time.
>> > > > >
>> > > > > Moreover it is now possible to correctly modify the MAC address
>> > > > > at any time, fixing an existing bug as MAC address changes on
>> > > > > the master were not reflected on the slaves. It also avoids
>> > > > > multiple interfaces sharing the same IPv6 link-local address.
>> > > >
>> > > > How is this different than macvlan now?
>> >
>> > The same way it was before. The purpose of the patch is to make
>> > possible to change the MAC address on slaves, not to change the
>> > external behavior of ipvlan: ipvlan will still behave as ipvlan,
>> > macvlan will still behave as macvlan.
>>
>> Ok, it was completely unclear from the commit message that the "internal" MAC
>> addresses of the ipvlan interfaces were not reflected "on the wire", but that this
>> was essentially (as you say below) MAC NAT.
>
> Sorry about that, I'll fix it in V2.
>
>> I think everyone agrees that being able to change the MAC is useful and was a
>> bug.
>>
>> What I'm still not clear on is, if IPv6 is already solved, why is it useful to have
>> assign ipvlan interface a unique MAC address?  Is it only to make interface
>> lookups via MAC easier?
>
> The main motivation is that some higher level management software expect interfaces on the same L2 broadcast domain to obviously have different MAC addresses, either out-of-the-box or via address change - or both.  Instead with ipvlan:
> - there is no real/formal segmentation between slaves
segmentation between slaves is at L3. If you want segmentation at L2
then use Macvlan

> - slaves share the same L2 address
Yes, that's by design!

> This looks conceptually wrong.
Why is it hard to view a device that does mux/demux using L3 while
keeping L2 same. Namespaces within physical box have different L3 and
have different routing needs too while all this enclosed inside one
host exposing one external L2. If some mgmt software doesn't like it
then either IPvlan is not the best fit for that solution or that
software needs an update.

>Yes, ipvlan works at L3 (which is an implementation detail anyway), but slaves are Ethernet interfaces and should behave as much as possible as such regardless, with an individual MAC address assigned.
>
> Additionally there is another related bug as it's currently possible to work around this limitation, although breaking the whole thing, by:
> 1) changing the MAC address on master from X to Y
> 2) creating a slave, receiving address Y
> 3) restoring the original MAC address X on master
>
> So, either we fix this by forcing slaves to stay in sync with master,
I have already mentioned that this is the only acceptable fix out of
this patchset but not by way of rewriting headers. The simplest fix is
to use the master->dev_addr in dev_hard_header()

> or correctly support independent MAC addresses, which would be IMO preferable for the above reasons.
>
> Best Regards,
> Marco
> --------------------------------------------------------------
> Intel Research and Development Ireland Limited
> Registered in Ireland
> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
> Registered Number: 308263
>
>
> This e-mail and any attachments may contain confidential material for the sole
> use of the intended recipient(s). Any review or distribution by others is
> strictly prohibited. If you are not the intended recipient, please contact the
> sender and delete all copies.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox