Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] virtio_net: enable tx after resuming from suspend
From: ake @ 2018-10-12  4:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel
In-Reply-To: <a156fd7e-e36e-e4e9-c240-3117ecc5b1b4@redhat.com>



On 2018年10月11日 22:06, Jason Wang wrote:
> 
> 
> On 2018年10月11日 18:22, ake wrote:
>>
>> On 2018年10月11日 18:44, Jason Wang wrote:
>>>
>>> On 2018年10月11日 15:51, Ake Koomsin wrote:
>>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>>> disabled the virtio tx before going to suspend to avoid a use after
>>>> free.
>>>> However, after resuming, it causes the virtio_net device to lose its
>>>> network connectivity.
>>>>
>>>> To solve the issue, we need to enable tx after resuming.
>>>>
>>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>>> reset")
>>>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>>>> ---
>>>>    drivers/net/virtio_net.c | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index dab504ec5e50..3453d80f5f81 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>>> virtio_device *vdev)
>>>>        }
>>>>          netif_device_attach(vi->dev);
>>>> +    netif_start_queue(vi->dev);
>>> I believe this is duplicated with netif_tx_wake_all_queues() in
>>> netif_device_attach() above?
>> Thank you for your review.
>>
>> If both netif_tx_wake_all_queues() and netif_start_queue() result in
>> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
>> conditions in netif_device_attach() is not satisfied?
> 
> Yes, maybe. One case I can see now is when the device is down, in this
> case netif_device_attach() won't try to wakeup the queue.
> 
>>   Without
>> netif_start_queue(), the virtio_net device does not resume properly
>> after waking up.
> 
> How do you trigger the issue? Just do suspend/resume?

Yes, simply suspend and resume.

Here is how I trigger the issue:

1) Start the Virtual Machine Manager GUI program.
2) Create a guest Linux OS. Make sure that the guest OS kernel is
   >= 4.12. Make sure that it uses virtio_net as its network device.
   In addition, make sure that the video adapter is VGA. Otherwise,
   waking up with the virtual power button does not work.
3) After installing the guest OS, log in, and test the network
   connectivity by ping the host machine.
4) Suspend. After this, the screen is blank.
5) Resume by hitting the virtual power button. The login screen
   appears again.
6) Log in again. The guest loses its network connection.

In my test:
Guest: Ubuntu 16.04/18.04 with kernel 4.15.0-36-generic
Host: Ubuntu 16.04 with kernel 4.15.0-36-generic/4.4.0-137-generic

>>
>> Is it better to report this as a bug first?
> 
> Nope, you're very welcome to post patch directly.
> 
>> If I am to do more
>> investigation, what areas should I look into?
> 
> As you've figured out, you can start with why netif_tx_wake_all_queues()
> were not executed?
> 
> (Btw, does the issue disappear if you move netif_tx_disable() under the
> check of netif_running() in virtnet_freeze_down()?)

The issue disappears if I move netif_tx_disable() under the check of
netif_running() in virtnet_freeze_down(). Moving netif_tx_disable()
is probably better as its logic is consistent with
netif_device_attach() implementation. If you are OK with this idea,
I will submit another patch.

> Thanks
> 
>>
>> Best Regards
>> Ake Koomsin
>>
> 

Best Regards

^ permalink raw reply

* [PATCH iproute2 net-next] ipneigh: support for NTF_EXT_LEARNED flag on neigh entries
From: Roopa Prabhu @ 2018-10-11 20:45 UTC (permalink / raw)
  To: dsahern; +Cc: netdev

From: Roopa Prabhu <roopa@cumulusnetworks.com>

Adds new option extern_learn to set NTF_EXT_LEARNED flag
on neigh entries.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 ip/ipneigh.c            | 7 ++++++-
 man/man8/ip-neighbour.8 | 9 ++++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index 165546e..042d01f 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -48,7 +48,7 @@ static void usage(void)
 {
 	fprintf(stderr, "Usage: ip neigh { add | del | change | replace }\n"
 			"                { ADDR [ lladdr LLADDR ] [ nud STATE ] | proxy ADDR } [ dev DEV ]\n");
-	fprintf(stderr, "                                 [ router ]\n\n");
+	fprintf(stderr, "                                 [ router ] [ extern_learn ]\n\n");
 	fprintf(stderr, "       ip neigh { show | flush } [ proxy ] [ to PREFIX ] [ dev DEV ] [ nud STATE ]\n");
 	fprintf(stderr, "                                 [ vrf NAME ]\n\n");
 	fprintf(stderr, "STATE := { permanent | noarp | stale | reachable | none |\n"
@@ -142,6 +142,8 @@ static int ipneigh_modify(int cmd, int flags, int argc, char **argv)
 			req.ndm.ndm_flags |= NTF_PROXY;
 		} else if (strcmp(*argv, "router") == 0) {
 			req.ndm.ndm_flags |= NTF_ROUTER;
+		} else if (matches(*argv, "extern_learn") == 0) {
+			req.ndm.ndm_flags |= NTF_EXT_LEARNED;
 		} else if (strcmp(*argv, "dev") == 0) {
 			NEXT_ARG();
 			dev = *argv;
@@ -354,6 +356,9 @@ int print_neigh(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	if (r->ndm_flags & NTF_PROXY)
 		print_null(PRINT_ANY, "proxy", " %s", "proxy");
 
+	if (r->ndm_flags & NTF_EXT_LEARNED)
+		print_null(PRINT_ANY, "extern_learn", " %s ", "extern_learn");
+
 	if (show_stats) {
 		if (tb[NDA_CACHEINFO])
 			print_cacheinfo(RTA_DATA(tb[NDA_CACHEINFO]));
diff --git a/man/man8/ip-neighbour.8 b/man/man8/ip-neighbour.8
index db286d1..4a672bb 100644
--- a/man/man8/ip-neighbour.8
+++ b/man/man8/ip-neighbour.8
@@ -24,7 +24,8 @@ ip-neighbour \- neighbour/arp tables management.
 .IR ADDR " } [ "
 .B  dev
 .IR DEV " ] [ "
-.BR router " ] "
+.BR router " ] [ "
+.BR extern_learn " ]"
 
 .ti -8
 .BR "ip neigh" " { " show " | " flush " } [ " proxy " ] [ " to
@@ -85,6 +86,12 @@ indicates whether we are proxying for this neigbour entry
 indicates whether neigbour is a router
 
 .TP
+.BI extern_learn
+this neigh entry was learned externally. This option can be used to
+indicate to the kernel that this is a controller learnt dynamic entry.
+Kernel will not gc such an entry.
+
+.TP
 .BI lladdr " LLADDRESS"
 the link layer address of the neighbour.
 .I LLADDRESS
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH net-next] rxrpc: Remove set but not used variable 'ioc'
From: David Howells @ 2018-10-11 20:38 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: dhowells, YueHaibing, davem, linux-afs, netdev, kernel-janitors
In-Reply-To: <20181011083807.tvhfguimyw7glrtp@mwanda>

Dan Carpenter <dan.carpenter@oracle.com> wrote:

> I told him to do that because it wasn't a bugfix...  Probably just Fixes
> is the right thing to use though.

But the correct fix *is* a bug fix.

David

^ permalink raw reply

* [PATCH net-next 2/2] net: phy: simplify handling of PHY_RESUMING in state machine
From: Heiner Kallweit @ 2018-10-11 20:37 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <a3f84d67-c7f1-82d5-1956-bd348ab2b5c0@gmail.com>

Simplify code for handling state PHY_RESUMING, no functional change
intended.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy.c | 43 ++++++++++++++-----------------------------
 1 file changed, 14 insertions(+), 29 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 696955d38..d03bdbbd1 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1059,41 +1059,26 @@ void phy_state_machine(struct work_struct *work)
 	case PHY_RESUMING:
 		if (AUTONEG_ENABLE == phydev->autoneg) {
 			err = phy_aneg_done(phydev);
-			if (err < 0)
+			if (err < 0) {
 				break;
-
-			/* err > 0 if AN is done.
-			 * Otherwise, it's 0, and we're  still waiting for AN
-			 */
-			if (err > 0) {
-				err = phy_read_status(phydev);
-				if (err)
-					break;
-
-				if (phydev->link) {
-					phydev->state = PHY_RUNNING;
-					phy_link_up(phydev);
-				} else	{
-					phydev->state = PHY_NOLINK;
-					phy_link_down(phydev, false);
-				}
-			} else {
+			} else if (!err) {
 				phydev->state = PHY_AN;
 				phydev->link_timeout = PHY_AN_TIMEOUT;
-			}
-		} else {
-			err = phy_read_status(phydev);
-			if (err)
 				break;
-
-			if (phydev->link) {
-				phydev->state = PHY_RUNNING;
-				phy_link_up(phydev);
-			} else	{
-				phydev->state = PHY_NOLINK;
-				phy_link_down(phydev, false);
 			}
 		}
+
+		err = phy_read_status(phydev);
+		if (err)
+			break;
+
+		if (phydev->link) {
+			phydev->state = PHY_RUNNING;
+			phy_link_up(phydev);
+		} else	{
+			phydev->state = PHY_NOLINK;
+			phy_link_down(phydev, false);
+		}
 		break;
 	}
 
-- 
2.19.1

^ permalink raw reply related

* [PATCH net-next 1/2] net: phy: improve handling of PHY_RUNNING in state machine
From: Heiner Kallweit @ 2018-10-11 20:36 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <a3f84d67-c7f1-82d5-1956-bd348ab2b5c0@gmail.com>

Handling of state PHY_RUNNING seems to be more complex than it needs
to be. If not polling, then we don't have to do anything, we'll
receive an interrupt and go to state PHY_CHANGELINK once the link
goes down. If polling and link is down, we don't have to go the
extra mile over PHY_CHANGELINK and call phy_read_status() again
but can set status PHY_NOLINK directly.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy.c | 29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 704428211..696955d38 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -941,7 +941,6 @@ void phy_state_machine(struct work_struct *work)
 	bool needs_aneg = false, do_suspend = false;
 	enum phy_state old_state;
 	int err = 0;
-	int old_link;
 
 	mutex_lock(&phydev->lock);
 
@@ -1025,26 +1024,16 @@ void phy_state_machine(struct work_struct *work)
 		}
 		break;
 	case PHY_RUNNING:
-		/* Only register a CHANGE if we are polling and link changed
-		 * since latest checking.
-		 */
-		if (phy_polling_mode(phydev)) {
-			old_link = phydev->link;
-			err = phy_read_status(phydev);
-			if (err)
-				break;
+		if (!phy_polling_mode(phydev))
+			break;
 
-			if (old_link != phydev->link)
-				phydev->state = PHY_CHANGELINK;
-		}
-		/*
-		 * Failsafe: check that nobody set phydev->link=0 between two
-		 * poll cycles, otherwise we won't leave RUNNING state as long
-		 * as link remains down.
-		 */
-		if (!phydev->link && phydev->state == PHY_RUNNING) {
-			phydev->state = PHY_CHANGELINK;
-			phydev_err(phydev, "no link in PHY_RUNNING\n");
+		err = phy_read_status(phydev);
+		if (err)
+			break;
+
+		if (!phydev->link) {
+			phydev->state = PHY_NOLINK;
+			phy_link_down(phydev, true);
 		}
 		break;
 	case PHY_CHANGELINK:
-- 
2.19.1

^ permalink raw reply related

* [PATCH net-next 0/2] net: phy: improve and simplify state machine
From: Heiner Kallweit @ 2018-10-11 20:35 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org

Improve / simplify handling of states PHY_RUNNING and PHY_RESUMING in
phylib state machine.

Heiner Kallweit (2):
  net: phy: improve handling of PHY_RUNNING in state machine
  net: phy: simplify handling of PHY_RESUMING in state machine

 drivers/net/phy/phy.c | 72 ++++++++++++++-----------------------------
 1 file changed, 23 insertions(+), 49 deletions(-)

-- 
2.19.1

^ permalink raw reply

* Fwd: [PATCH stable 4.4 V2 0/6] fix SegmentSmack in stable branch (CVE-2018-5390)
From: salil GK @ 2018-10-12  3:53 UTC (permalink / raw)
  To: maowenan, netdev, dwmw2, eric.dumazet, davem, stable,
	linux-kernel, gregkh
In-Reply-To: <CAPACB-yF16qtiYn9e41_iFNRUzHB349NEiAGubUvW-kiUwpRMQ@mail.gmail.com>

I was looking for fix for CVE-2018-5390 and CVE-2018-5390) in 4.18.x.
Will these fix be available in 4.18 train ?

PS: Sorry for sending again - I got a rejection message as my previous
message contains html tags !

Thanks
~S

On Oct 11, 2018 7:38 PM, "Greg KH" <gregkh@linux-foundation.org> wrote:

On Wed, Sep 26, 2018 at 10:21:21PM +0200, Greg KH wrote:
> On Tue, Sep 25, 2018 at 10:10:15PM +0800, maowenan wrote:
> > Hi Greg:
> >
> > can you review this patch set?
>
> It is still in the queue, don't worry.  It will take some more time to
> properly review and test it.
>
> Ideally you could get someone else to test this and provide a
> "tested-by:" tag for it?

All now queued up, let's see what breaks :)

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH stable 4.4 V2 0/6] fix SegmentSmack in stable branch (CVE-2018-5390)
From: maowenan @ 2018-10-12  3:39 UTC (permalink / raw)
  To: salil GK, gregkh
  Cc: netdev, dwmw2, eric.dumazet, davem, stable, linux-kernel,
	maowenan
In-Reply-To: <CAPACB-yF16qtiYn9e41_iFNRUzHB349NEiAGubUvW-kiUwpRMQ@mail.gmail.com>



On 2018/10/12 10:28, salil GK wrote:
> I was looking for fix for CVE-2018-5390 and CVE-2018-5390) in 4.18.x. Will these fix be available in 4.18 train ?

The fixes of CVE-2018-5390 have already existed in stable 4.18. These fixes only available with < 4.9 that don't using RB tree.

58152ec tcp: add tcp_ooo_try_coalesce() helper
8541b21 tcp: call tcp_drop() from tcp_data_queue_ofo()
3d4bf93 tcp: detect malicious patterns in tcp_collapse_ofo_queue()
f4a3313 tcp: avoid collapses in tcp_prune_queue() if possible
72cd43b tcp: free batches of packets in tcp_prune_ofo_queue()


> 
> Thanks
> ~S
> 
> On Oct 11, 2018 7:38 PM, "Greg KH" <gregkh@linux-foundation.org <mailto:gregkh@linux-foundation.org>> wrote:
> 
>     On Wed, Sep 26, 2018 at 10:21:21PM +0200, Greg KH wrote:
>     > On Tue, Sep 25, 2018 at 10:10:15PM +0800, maowenan wrote:
>     > > Hi Greg:
>     > >
>     > > can you review this patch set?
>     >
>     > It is still in the queue, don't worry.  It will take some more time to
>     > properly review and test it.
>     >
>     > Ideally you could get someone else to test this and provide a
>     > "tested-by:" tag for it?
> 
>     All now queued up, let's see what breaks :)
> 
>     thanks,
> 
>     greg k-h
> 
> 

^ permalink raw reply

* [net  1/1] tipc: initialize broadcast link stale counter correctly
From: Jon Maloy @ 2018-10-11 20:02 UTC (permalink / raw)
  To: davem, netdev
  Cc: gordan.mihaljevic, tung.q.nguyen, hoang.h.le, jon.maloy,
	canh.d.luu, ying.xue, tipc-discussion

In the commit referred to below we added link tolerance as an additional
criteria for declaring broadcast transmission "stale" and resetting the
unicast links to the affected node.

Unfortunately, this 'improvement' introduced two bugs, which each and
one alone cause only limited problems, but combined lead to seemingly
stochastic unicast link resets, depending on the amount of broadcast
traffic transmitted.

The first issue, a missing initialization of the 'tolerance' field of
the receiver broadcast link, was recently fixed by commit 047491ea334a
("tipc: set link tolerance correctly in broadcast link").

Ths second issue, where we omit to reset the 'stale_cnt' field of
the same link after a 'stale' period is over, leads to this counter
accumulating over time, and in the absence of the 'tolerance' criteria
leads to the above described symptoms. This commit adds the missing
initialization.

Fixes: a4dc70d46cf1 ("tipc: extend link reset criteria for stale packet
retransmission")

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/link.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index f6552e4..201c3b5 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1041,6 +1041,7 @@ static int tipc_link_retrans(struct tipc_link *l, struct tipc_link *r,
 	if (r->last_retransm != buf_seqno(skb)) {
 		r->last_retransm = buf_seqno(skb);
 		r->stale_limit = jiffies + msecs_to_jiffies(r->tolerance);
+		r->stale_cnt = 0;
 	} else if (++r->stale_cnt > 99 && time_after(jiffies, r->stale_limit)) {
 		link_retransmit_failure(l, skb);
 		if (link_is_bc_sndlink(l))
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps
From: Jamal Hadi Salim @ 2018-10-11 19:54 UTC (permalink / raw)
  To: David Ahern, Sowmini Varadhan, Stephen Hemminger
  Cc: David Ahern, netdev, davem
In-Reply-To: <199349c3-fd14-1401-86d1-814776372917@gmail.com>

On 2018-10-11 2:44 p.m., David Ahern wrote:
> On 10/11/18 12:05 PM, Jamal Hadi Salim wrote:
>> On 2018-10-11 1:04 p.m., David Ahern wrote:

> 
> I meant the general API of users passing filter arguments as attributes
> to the dump (or values in the header) -- KIND, MASTER, device index,
> etc. This is an existing API and existing capability.
> 

which i referred to as use-case driven. It is not unreasonable
to optimize for the most common - but every time somebody
comes up with something new you need to patch the kernel.


> I disagree with your overall premise of bpf the end-all hammer. It is a
> tool but not the only tool. For starters, you are proposing building the
> message, run the filter on it, and potentially back the message up to
> drop the recently added piece because the filter does not want it
> included. That is still wasting a lot of cpu cycles to build and drop. I
> am thinking about scale to 1 million routes -- I do not need the dump
> loop building a message for 1 million entries only to drop 99% of them.
> That is crazy.
> 

My earlier suggestion was for somewhere before the skb is formed.
In the vicinity of xxx_fill_info()
The "create skb and drop" kind works already today with some
acrobatics needed for some cases with cbpf.

Is it unfeasible to add an ebpf hook at that point and ask a user
supplied code "is this ok to send?" - this is no different
than doing a "get by key" operation where key/filter is any arbitrary
construction of fields rtm understands (including the ones you
provided like table index) that are passed in the user program.
Classical "select" mechanism in database tables

> The way the kernel manages route tables says I should pass in the table
> id as it is a major differentiator on what is returned. From there
> lookup the specific table (I need to fix this part per my response to
> Andrew), and then only walk it. The existing semantics, capabilities
> that exist for other dump commands is the most efficient for some of
> these high level, big hammer filters.
> 

Sure.

> What you want gets into the tiniest of details and yes the imagination
> can go wild with combinations of filter options. So maybe this scanning
> of post-built messages is reasonable *after* the high level sorting is done.
> 

That doesnt require any change.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps
From: David Miller @ 2018-10-11 19:43 UTC (permalink / raw)
  To: sowmini.varadhan; +Cc: dsahern, jhs, stephen, dsahern, netdev
In-Reply-To: <20181011193248.GD17841@oracle.com>

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Thu, 11 Oct 2018 15:32:48 -0400

> Without getting into Ahern's patchset, which he obviously feels 
> quite passionately about..
> 
> On (10/11/18 12:28), David Miller wrote:
>> 
>> Once you've composed the message, the whole point of filtering is lost.
> 
> it would be nice to apply the filter *before* constructing the skb, 
> but afaict most things in BPF today only operate on sk_buffs. How should
> we use *BPF on something other than an sk_buff?

Personally I'm not going to spend cycles on that.

What's important to me in the short term is that David's patch set is
an appropriate way to add filtering, using existing facilities and
mechanisms that already exist for that purpose.

If people want to explore a possible eBPF mechanism for the future,
with an emphasis on "future", feel free to explore to your heart's
content.

But that doesn't exist in any form whatsoever, so that's not what we
should be talking about here.

^ permalink raw reply

* [PATCH net-next v2] vxlan: support NTF_USE refresh of fdb entries
From: Roopa Prabhu @ 2018-10-11 19:35 UTC (permalink / raw)
  To: davem; +Cc: netdev

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This makes use of NTF_USE in vxlan driver consistent
with bridge driver.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
v2: fix patch prefix

 drivers/net/vxlan.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index fb0cdbb..018406c 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -697,6 +697,7 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
 			    __be16 port, __be32 src_vni, __be32 vni,
 			    __u32 ifindex, __u8 ndm_flags)
 {
+	__u8 fdb_flags = (ndm_flags & ~NTF_USE);
 	struct vxlan_rdst *rd = NULL;
 	struct vxlan_fdb *f;
 	int notify = 0;
@@ -714,8 +715,8 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
 			f->updated = jiffies;
 			notify = 1;
 		}
-		if (f->flags != ndm_flags) {
-			f->flags = ndm_flags;
+		if (f->flags != fdb_flags) {
+			f->flags = fdb_flags;
 			f->updated = jiffies;
 			notify = 1;
 		}
@@ -737,6 +738,9 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
 				return rc;
 			notify |= rc;
 		}
+
+		if (ndm_flags & NTF_USE)
+			f->used = jiffies;
 	} else {
 		if (!(flags & NLM_F_CREATE))
 			return -ENOENT;
@@ -748,7 +752,7 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
 
 		netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
 		rc = vxlan_fdb_create(vxlan, mac, ip, state, port, src_vni,
-				      vni, ifindex, ndm_flags, &f);
+				      vni, ifindex, fdb_flags, &f);
 		if (rc < 0)
 			return rc;
 		notify = 1;
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH cumulus-4.18.y] vxlan: support NTF_USE refresh of fdb entries
From: Roopa Prabhu @ 2018-10-11 19:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1539286408-25597-1-git-send-email-roopa@cumulusnetworks.com>

On Thu, Oct 11, 2018 at 12:33 PM Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> This makes use of NTF_USE in vxlan driver consistent
> with bridge driver.
>
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> ---

pls ignore. wrong patch prefix :)

^ permalink raw reply

* [PATCH cumulus-4.18.y] vxlan: support NTF_USE refresh of fdb entries
From: Roopa Prabhu @ 2018-10-11 19:33 UTC (permalink / raw)
  To: davem; +Cc: netdev

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This makes use of NTF_USE in vxlan driver consistent
with bridge driver.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 drivers/net/vxlan.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index fb0cdbb..018406c 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -697,6 +697,7 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
 			    __be16 port, __be32 src_vni, __be32 vni,
 			    __u32 ifindex, __u8 ndm_flags)
 {
+	__u8 fdb_flags = (ndm_flags & ~NTF_USE);
 	struct vxlan_rdst *rd = NULL;
 	struct vxlan_fdb *f;
 	int notify = 0;
@@ -714,8 +715,8 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
 			f->updated = jiffies;
 			notify = 1;
 		}
-		if (f->flags != ndm_flags) {
-			f->flags = ndm_flags;
+		if (f->flags != fdb_flags) {
+			f->flags = fdb_flags;
 			f->updated = jiffies;
 			notify = 1;
 		}
@@ -737,6 +738,9 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
 				return rc;
 			notify |= rc;
 		}
+
+		if (ndm_flags & NTF_USE)
+			f->used = jiffies;
 	} else {
 		if (!(flags & NLM_F_CREATE))
 			return -ENOENT;
@@ -748,7 +752,7 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
 
 		netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
 		rc = vxlan_fdb_create(vxlan, mac, ip, state, port, src_vni,
-				      vni, ifindex, ndm_flags, &f);
+				      vni, ifindex, fdb_flags, &f);
 		if (rc < 0)
 			return rc;
 		notify = 1;
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps
From: Sowmini Varadhan @ 2018-10-11 19:32 UTC (permalink / raw)
  To: David Miller; +Cc: dsahern, jhs, stephen, dsahern, netdev
In-Reply-To: <20181011.122847.1015772734131510783.davem@davemloft.net>


Without getting into Ahern's patchset, which he obviously feels 
quite passionately about..

On (10/11/18 12:28), David Miller wrote:
> 
> Once you've composed the message, the whole point of filtering is lost.

it would be nice to apply the filter *before* constructing the skb, 
but afaict most things in BPF today only operate on sk_buffs. How should
we use *BPF on something other than an sk_buff?

--Sowmini

^ permalink raw reply

* Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps
From: David Miller @ 2018-10-11 19:28 UTC (permalink / raw)
  To: dsahern; +Cc: jhs, sowmini.varadhan, stephen, dsahern, netdev
In-Reply-To: <199349c3-fd14-1401-86d1-814776372917@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Thu, 11 Oct 2018 12:44:49 -0600

> I disagree with your overall premise of bpf the end-all hammer. It is a
> tool but not the only tool. For starters, you are proposing building the
> message, run the filter on it, and potentially back the message up to
> drop the recently added piece because the filter does not want it
> included. That is still wasting a lot of cpu cycles to build and drop. I
> am thinking about scale to 1 million routes -- I do not need the dump
> loop building a message for 1 million entries only to drop 99% of them.
> That is crazy.

I completely agree.

Once you've composed the message, the whole point of filtering is lost.

^ permalink raw reply

* Re: [PATCH net] net: udp: fix handling of CHECKSUM_COMPLETE packets
From: Eric Dumazet @ 2018-10-11 19:20 UTC (permalink / raw)
  To: Sean Tranchetti, davem, netdev
In-Reply-To: <1539282601-19795-1-git-send-email-stranche@codeaurora.org>



On 10/11/2018 11:30 AM, Sean Tranchetti wrote:
> Current handling of CHECKSUM_COMPLETE packets by the UDP stack is
> incorrect for any packet that has an incorrect checksum value.
> 
> udp4/6_csum_init() will both make a call to
> __skb_checksum_validate_complete() to initialize/validate the csum
> field when receiving a CHECKSUM_COMPLETE packet. When this packet
> fails validation, skb->csum will be overwritten with the pseudoheader
> checksum so the packet can be fully validated by software, but the
> skb->ip_summed value will be left as CHECKSUM_COMPLETE so that way
> the stack can later warn the user about their hardware spewing bad
> checksums. Unfortunately, leaving the SKB in this state can cause
> problems later on in the checksum calculation.
> 
> Since the the packet is still marked as CHECKSUM_COMPLETE,
> udp_csum_pull_header() will SUBTRACT the checksum of the UDP header
> from skb->csum instead of adding it, leaving us with a garbage value
> in that field. Once we try to copy the packet to userspace in the
> udp4/6_recvmsg(), we'll make a call to skb_copy_and_csum_datagram_msg()
> to checksum the packet data and add it in the garbage skb->csum value
> to perform our final validation check.
> 
> Since the value we're validating is not the proper checksum, it's possible
> that the folded value could come out to 0, causing us not to drop the
> packet. Instead, we believe that the packet was checksummed incorrectly
> by hardware since skb->ip_summed is still CHECKSUM_COMPLETE, and we attempt
> to warn the user with netdev_rx_csum_fault(skb->dev);
> 
> Unfortunately, since this is the UDP path, skb->dev has been overwritten
> by skb->dev_scratch and is no longer a valid pointer, so we end up
> reading invalid memory.
> 
> This patch addresses this problem in two ways:
> 	1) Remove the invalid netdev_rx_csum_fault(skb->dev) call from
> 	   skb_copy_and_csum_datagram_msg(). Since this is used by UDP
> 	   where skb->dev is invalid, trying to warn doesn't make sense.
> 
> 	2) Add better CHECKSUM_COMPLETE handling to udp4/6_csum_init().
> 	   If we receive a packet that's CHECKSUM_COMPLETE that fails
> 	   verification (i.e. skb->csum_valid == 0), check who performed
> 	   the calculation. It's possible that the checksum was done in
> 	   software by the network stack earlier (such as Netfilter's
> 	   CONNTRACK module), and if that says the checksum is bad,
> 	   we can drop the packet immediately instead of waiting until
> 	   we try and copy it to userspace. Otherwise, we need to
> 	   mark the SKB as CHECKSUM_NONE, since the skb->csum field
> 	   no longer contains the full packet checksum after the
> 	   call to __skb_checksum_validate_complete().
> 
> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>

It would be nice if you could provide a Fixes: tag, so that we know what kind
of backport work is needed.

You could also CC the patch author, it is always a nice thing to do.

If really we could emit a message using skb->dev after skb has been queued
and RCU section gone, this always has been buggy, and maybe we should not
even try to use skb->dev when warning is sent.

Alternative would be to use dev index and perform a lookup to find the device,
if device still exists.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] nfp: replace long license headers with SPDX
From: David Miller @ 2018-10-11 19:16 UTC (permalink / raw)
  To: jakub.kicinski
  Cc: netdev, oss-drivers, simon.horman, edwin.peer, nick.viljoen
In-Reply-To: <20181011155742.19633-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Thu, 11 Oct 2018 08:57:42 -0700

> Replace the repeated license text with SDPX identifiers.
> While at it bump the Copyright dates for files we touched
> this year.
> 
> Signed-off-by: Edwin Peer <edwin.peer@netronome.com>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Nic Viljoen <nick.viljoen@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: phy: sfp: remove sfp_mutex's definition
From: David Miller @ 2018-10-11 19:10 UTC (permalink / raw)
  To: bigeasy; +Cc: netdev, tglx, andrew, f.fainelli
In-Reply-To: <20181011150621.8466-1-bigeasy@linutronix.de>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Thu, 11 Oct 2018 17:06:21 +0200

> The sfp_mutex variable is defined but never used in this file. Not even
> in the commit that introduced that variable.
> 
> Remove sfp_mutex, it has no purpose.
> 
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Applied.

^ permalink raw reply

* [PATCH net-next] net: fddi: skfp: Remove unused macros 'PNMI_GET_ID' and 'PNMI_SET_ID'
From: YueHaibing @ 2018-10-12  2:37 UTC (permalink / raw)
  To: davem, natechancellor; +Cc: linux-kernel, netdev, YueHaibing

The two PNMI macros are never used

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/fddi/skfp/h/cmtdef.h | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/fddi/skfp/h/cmtdef.h b/drivers/net/fddi/skfp/h/cmtdef.h
index a12f464..448d66c 100644
--- a/drivers/net/fddi/skfp/h/cmtdef.h
+++ b/drivers/net/fddi/skfp/h/cmtdef.h
@@ -655,14 +655,6 @@ void dump_hex(char *p, int len);
 #ifndef PNMI_INIT
 #define	PNMI_INIT(smc)	/* Nothing */
 #endif
-#ifndef PNMI_GET_ID
-#define PNMI_GET_ID( smc, ndis_oid, buf, len, BytesWritten, BytesNeeded ) \
-		( 1 ? (-1) : (-1) )
-#endif
-#ifndef PNMI_SET_ID
-#define PNMI_SET_ID( smc, ndis_oid, buf, len, BytesRead, BytesNeeded, \
-		set_type) ( 1 ? (-1) : (-1) )
-#endif
 
 /*
  * SMT_PANIC defines
-- 
2.7.0

^ permalink raw reply related

* Re: [PATCH 1/1] net: socionext: clear rx irq correctly
From: David Miller @ 2018-10-11 19:04 UTC (permalink / raw)
  To: ilias.apalodimas
  Cc: netdev, jaswinder.singh, ard.biesheuvel, masami.hiramatsu
In-Reply-To: <1539260906-29596-1-git-send-email-ilias.apalodimas@linaro.org>

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Date: Thu, 11 Oct 2018 15:28:26 +0300

> commit 63ae7949e94a ("net: socionext: Use descriptor info instead of MMIO reads on Rx")
> removed constant mmio reads from the driver and started using a descriptor 
> field to check if packet should be processed.
> This lead the napi rx handler being constantly called while no packets
> needed processing and ksoftirq getting 100% cpu usage. Issue one mmio read 
> to clear the irq correcty after processing packets
> 
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Applied.

^ permalink raw reply

* Re: [RFC PATCH 2/2] net/ncsi: Configure multi-package, multi-channel modes with failover
From: Samuel Mendoza-Jonas @ 2018-10-12  2:24 UTC (permalink / raw)
  To: Justin.Lee1, netdev; +Cc: davem, linux-kernel, openbmc
In-Reply-To: <e20a7a3da4ca4aec913800355cd6cdaf@AUSX13MPS302.AMER.DELL.COM>

On Thu, 2018-10-11 at 22:09 +0000, Justin.Lee1@Dell.com wrote:
> Hi Sam,
> 
> Please see my comments below.
> 
> Thanks,
> Justin
>  
>  
> > On Wed, 2018-10-10 at 22:36 +0000, Justin.Lee1@Dell.com wrote:
> > > Hi Samuel,
> > > 
> > > I am still testing your change and have some comments below.
> > > 
> > > Thanks,
> > > Justin
> > > 
> > > 
> > > > This patch extends the ncsi-netlink interface with two new commands and
> > > > three new attributes to configure multiple packages and/or channels at
> > > > once, and configure specific failover modes.
> > > > 
> > > > NCSI_CMD_SET_PACKAGE mask and NCSI_CMD_SET_CHANNEL_MASK set a whitelist
> > > > of packages or channels allowed to be configured with the
> > > > NCSI_ATTR_PACKAGE_MASK and NCSI_ATTR_CHANNEL_MASK attributes
> > > > respectively. If one of these whitelists is set only packages or
> > > > channels matching the whitelist are considered for the channel queue in
> > > > ncsi_choose_active_channel().
> > > > 
> > > > These commands may also use the NCSI_ATTR_MULTI_FLAG to signal that
> > > > multiple packages or channels may be configured simultaneously. NCSI
> > > > hardware arbitration (HWA) must be available in order to enable
> > > > multi-package mode. Multi-channel mode is always available.
> > > > 
> > > > If the NCSI_ATTR_CHANNEL_ID attribute is present in the
> > > > NCSI_CMD_SET_CHANNEL_MASK command the it sets the preferred channel as
> > > > with the NCSI_CMD_SET_INTERFACE command. The combination of preferred
> > > > channel and channel whitelist defines a primary channel and the allowed
> > > > failover channels.
> > > > If the NCSI_ATTR_MULTI_FLAG attribute is also present then the preferred
> > > > channel is configured for Tx/Rx and the other channels are enabled only
> > > > for Rx.
> > > > 
> > > > Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
> > > > ---
> > > >  include/uapi/linux/ncsi.h |  16 +++
> > > >  net/ncsi/internal.h       |  11 +-
> > > >  net/ncsi/ncsi-aen.c       |   2 +-
> > > >  net/ncsi/ncsi-manage.c    | 138 ++++++++++++++++--------
> > > >  net/ncsi/ncsi-netlink.c   | 217 +++++++++++++++++++++++++++++++++-----
> > > >  net/ncsi/ncsi-rsp.c       |   2 +-
> > > >  6 files changed, 312 insertions(+), 74 deletions(-)
> > > > 
> > > > diff --git a/include/uapi/linux/ncsi.h b/include/uapi/linux/ncsi.h
> > > > index 4c292ecbb748..035fba1693f9 100644
> > > > --- a/include/uapi/linux/ncsi.h
> > > > +++ b/include/uapi/linux/ncsi.h
> > > > @@ -23,6 +23,13 @@
> > > >   *	optionally the preferred NCSI_ATTR_CHANNEL_ID.
> > > >   * @NCSI_CMD_CLEAR_INTERFACE: clear any preferred package/channel combination.
> > > >   *	Requires NCSI_ATTR_IFINDEX.
> > > > + * @NCSI_CMD_SET_PACKAGE_MASK: set a whitelist of allowed packages.
> > > > + * @NCSI_CMD_SET_PACKAGE_MASK: set a whitelist of allowed channels.
> > > > + *	Requires NCSI_ATTR_IFINDEX and NCSI_ATTR_PACKAGE_MASK.
> > > > + * @NCSI_CMD_SET_PACKAGE_MASK: set a whitelist of allowed channels.
> > > > + *	Requires NCSI_ATTR_IFINDEX, NCSI_ATTR_PACKAGE_ID, and
> > > > + *	NCSI_ATTR_CHANNEL_MASK. If NCSI_ATTR_CHANNEL_ID is present it sets
> > > > + *	the primary channel.
> > > >   * @NCSI_CMD_MAX: highest command number
> > > >   */
> > > 
> > > There are some typo in the description.
> > > * @NCSI_CMD_SET_PACKAGE_MASK: set a whitelist of allowed packages.
> > >  *	Requires NCSI_ATTR_IFINDEX and NCSI_ATTR_PACKAGE_MASK.
> > >  * @NCSI_CMD_SET_CHANNEL_MASK: set a whitelist of allowed channels.
> > >  *	Requires NCSI_ATTR_IFINDEX, NCSI_ATTR_PACKAGE_ID, and
> > >  *	NCSI_ATTR_CHANNEL_MASK. If NCSI_ATTR_CHANNEL_ID is present it sets
> > >  *	the primary channel.
> > 
> > Haha, yes I threw that in at the end, thanks for catching it.
> > 
> > > >  enum ncsi_nl_commands {
> > > > @@ -30,6 +37,8 @@ enum ncsi_nl_commands {
> > > >  	NCSI_CMD_PKG_INFO,
> > > >  	NCSI_CMD_SET_INTERFACE,
> > > >  	NCSI_CMD_CLEAR_INTERFACE,
> > > > +	NCSI_CMD_SET_PACKAGE_MASK,
> > > > +	NCSI_CMD_SET_CHANNEL_MASK,
> > > >  
> > > >  	__NCSI_CMD_AFTER_LAST,
> > > >  	NCSI_CMD_MAX = __NCSI_CMD_AFTER_LAST - 1
> > > > @@ -43,6 +52,10 @@ enum ncsi_nl_commands {
> > > >   * @NCSI_ATTR_PACKAGE_LIST: nested array of NCSI_PKG_ATTR attributes
> > > >   * @NCSI_ATTR_PACKAGE_ID: package ID
> > > >   * @NCSI_ATTR_CHANNEL_ID: channel ID
> > > > + * @NCSI_ATTR_MULTI_FLAG: flag to signal that multi-mode should be enabled with
> > > > + *	NCSI_CMD_SET_PACKAGE_MASK or NCSI_CMD_SET_CHANNEL_MASK.
> > > > + * @NCSI_ATTR_PACKAGE_MASK: 32-bit mask of allowed packages.
> > > > + * @NCSI_ATTR_CHANNEL_MASK: 32-bit mask of allowed channels.
> > > >   * @NCSI_ATTR_MAX: highest attribute number
> > > >   */
> > > >  enum ncsi_nl_attrs {
> > > > @@ -51,6 +64,9 @@ enum ncsi_nl_attrs {
> > > >  	NCSI_ATTR_PACKAGE_LIST,
> > > >  	NCSI_ATTR_PACKAGE_ID,
> > > >  	NCSI_ATTR_CHANNEL_ID,
> > > > +	NCSI_ATTR_MULTI_FLAG,
> > > > +	NCSI_ATTR_PACKAGE_MASK,
> > > > +	NCSI_ATTR_CHANNEL_MASK,
> > > 
> > > Is there a case that we might set these two masks at the same time?
> > > If not, maybe we can just have one generic MASK attribute.
> > > 
> > 
> > I thought of this too: not yet, but I wonder if we might in the future.
> > I'll have a think about it.
> > 
> > > >  
> > > >  	__NCSI_ATTR_AFTER_LAST,
> > > >  	NCSI_ATTR_MAX = __NCSI_ATTR_AFTER_LAST - 1
> > > > diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
> > > > index 3d0a33b874f5..8437474d0a78 100644
> > > > --- a/net/ncsi/internal.h
> > > > +++ b/net/ncsi/internal.h
> > > > @@ -213,6 +213,10 @@ struct ncsi_package {
> > > >  	unsigned int         channel_num; /* Number of channels     */
> > > >  	struct list_head     channels;    /* List of chanels        */
> > > >  	struct list_head     node;        /* Form list of packages  */
> > > > +
> > > > +	bool                 multi_channel; /* Enable multiple channels  */
> > > > +	u32                  channel_whitelist; /* Channels to configure */
> > > > +	struct ncsi_channel  *preferred_channel; /* Primary channel      */
> > > >  };
> > > >  
> > > >  struct ncsi_request {
> > > > @@ -280,8 +284,6 @@ struct ncsi_dev_priv {
> > > >  	unsigned int        package_num;     /* Number of packages         */
> > > >  	struct list_head    packages;        /* List of packages           */
> > > >  	struct ncsi_channel *hot_channel;    /* Channel was ever active    */
> > > > -	struct ncsi_package *force_package;  /* Force a specific package   */
> > > > -	struct ncsi_channel *force_channel;  /* Force a specific channel   */
> > > >  	struct ncsi_request requests[256];   /* Request table              */
> > > >  	unsigned int        request_id;      /* Last used request ID       */
> > > >  #define NCSI_REQ_START_IDX	1
> > > > @@ -294,6 +296,9 @@ struct ncsi_dev_priv {
> > > >  	struct list_head    node;            /* Form NCSI device list      */
> > > >  #define NCSI_MAX_VLAN_VIDS	15
> > > >  	struct list_head    vlan_vids;       /* List of active VLAN IDs */
> > > > +
> > > > +	bool                multi_package;   /* Enable multiple packages   */
> > > > +	u32                 package_whitelist; /* Packages to configure    */
> > > >  };
> > > >  
> > > >  struct ncsi_cmd_arg {
> > > > @@ -345,6 +350,8 @@ struct ncsi_request *ncsi_alloc_request(struct ncsi_dev_priv *ndp,
> > > >  void ncsi_free_request(struct ncsi_request *nr);
> > > >  struct ncsi_dev *ncsi_find_dev(struct net_device *dev);
> > > >  int ncsi_process_next_channel(struct ncsi_dev_priv *ndp);
> > > > +bool ncsi_channel_is_last(struct ncsi_dev_priv *ndp,
> > > > +			  struct ncsi_channel *channel);
> > > >  
> > > >  /* Packet handlers */
> > > >  u32 ncsi_calculate_checksum(unsigned char *data, int len);
> > > > diff --git a/net/ncsi/ncsi-aen.c b/net/ncsi/ncsi-aen.c
> > > > index 65f47a648be3..eac56aee30c4 100644
> > > > --- a/net/ncsi/ncsi-aen.c
> > > > +++ b/net/ncsi/ncsi-aen.c
> > > > @@ -86,7 +86,7 @@ static int ncsi_aen_handler_lsc(struct ncsi_dev_priv *ndp,
> > > >  	    !(state == NCSI_CHANNEL_ACTIVE && !(data & 0x1)))
> > > >  		return 0;
> > > >  
> > > > -	if (state == NCSI_CHANNEL_ACTIVE)
> > > > +	if (state == NCSI_CHANNEL_ACTIVE && ncsi_channel_is_last(ndp, nc))
> > > >  		ndp->flags |= NCSI_DEV_RESHUFFLE;
> > > >  
> > > >  	ncsi_stop_channel_monitor(nc);
> > > > diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
> > > > index 665bee25ec44..6a55df700bcb 100644
> > > > --- a/net/ncsi/ncsi-manage.c
> > > > +++ b/net/ncsi/ncsi-manage.c
> > > > @@ -27,6 +27,24 @@
> > > >  LIST_HEAD(ncsi_dev_list);
> > > >  DEFINE_SPINLOCK(ncsi_dev_lock);
> > > >  
> > > > +/* Returns true if the given channel is the last channel available */
> > > > +bool ncsi_channel_is_last(struct ncsi_dev_priv *ndp,
> > > > +			  struct ncsi_channel *channel)
> > > > +{
> > > > +	struct ncsi_package *np;
> > > > +	struct ncsi_channel *nc;
> > > > +
> > > > +	NCSI_FOR_EACH_PACKAGE(ndp, np)
> > > > +		NCSI_FOR_EACH_CHANNEL(np, nc) {
> > > > +			if (nc == channel)
> > > > +				continue;
> > > > +			if (nc->state == NCSI_CHANNEL_ACTIVE)
> > > > +				return false;
> > > > +		}
> > > > +
> > > > +	return true;
> > > > +}
> > > > +
> > > >  static void ncsi_report_link(struct ncsi_dev_priv *ndp, bool force_down)
> > > >  {
> > > >  	struct ncsi_dev *nd = &ndp->ndev;
> > > > @@ -266,6 +284,7 @@ struct ncsi_package *ncsi_add_package(struct ncsi_dev_priv *ndp,
> > > >  	np->ndp = ndp;
> > > >  	spin_lock_init(&np->lock);
> > > >  	INIT_LIST_HEAD(&np->channels);
> > > > +	np->channel_whitelist = UINT_MAX;
> > > >  
> > > >  	spin_lock_irqsave(&ndp->lock, flags);
> > > >  	tmp = ncsi_find_package(ndp, id);
> > > > @@ -633,6 +652,34 @@ static int set_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc,
> > > >  	return 0;
> > > >  }
> > > >  
> > > > +/* Determine if a given channel should be the Tx channel */
> > > > +bool ncsi_channel_is_tx(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc)
> > > > +{
> > > > +	struct ncsi_package *np = nc->package;
> > > > +	struct ncsi_channel *channel;
> > > > +	struct ncsi_channel_mode *ncm;
> 
> Learn from Dave. All local variable declarations from longest to shortest
> line. :)
> 
> > > > +
> > > > +	NCSI_FOR_EACH_CHANNEL(np, channel) {
> > > > +		ncm = &channel->modes[NCSI_MODE_TX_ENABLE];
> > > > +		/* Another channel is already Tx */
> > > > +		if (ncm->enable)
> > > > +			return false;
> > > > +	}
> > > > +
> > > > +	if (!np->preferred_channel)
> > > > +		return true;
> > > > +
> > > > +	if (np->preferred_channel == nc)
> > > > +		return true;
> > > > +
> > > > +	/* The preferred channel is not in the queue and not active */
> > > > +	if (list_empty(&np->preferred_channel->link) &&
> > > > +	    np->preferred_channel->state != NCSI_CHANNEL_ACTIVE)
> > > > +		return true;
> > > > +
> > > > +	return false;
> > > > +}
> > > > +
> > > >  static void ncsi_configure_channel(struct ncsi_dev_priv *ndp)
> > > >  {
> > > >  	struct ncsi_dev *nd = &ndp->ndev;
> > > > @@ -745,18 +792,22 @@ static void ncsi_configure_channel(struct ncsi_dev_priv *ndp)
> > > >  		} else if (nd->state == ncsi_dev_state_config_ebf) {
> > > >  			nca.type = NCSI_PKT_CMD_EBF;
> > > >  			nca.dwords[0] = nc->caps[NCSI_CAP_BC].cap;
> > > > -			nd->state = ncsi_dev_state_config_ecnt;
> > > > +			if (ncsi_channel_is_tx(ndp, nc))
> > > > +				nd->state = ncsi_dev_state_config_ecnt;
> > > > +			else
> > > > +				nd->state = ncsi_dev_state_config_ec;
> > > >  #if IS_ENABLED(CONFIG_IPV6)
> > > >  			if (ndp->inet6_addr_num > 0 &&
> > > >  			    (nc->caps[NCSI_CAP_GENERIC].cap &
> > > >  			     NCSI_CAP_GENERIC_MC))
> > > >  				nd->state = ncsi_dev_state_config_egmf;
> > > > -			else
> > > > -				nd->state = ncsi_dev_state_config_ecnt;
> > > >  		} else if (nd->state == ncsi_dev_state_config_egmf) {
> > > >  			nca.type = NCSI_PKT_CMD_EGMF;
> > > >  			nca.dwords[0] = nc->caps[NCSI_CAP_MC].cap;
> > > > -			nd->state = ncsi_dev_state_config_ecnt;
> > > > +			if (ncsi_channel_is_tx(ndp, nc))
> > > > +				nd->state = ncsi_dev_state_config_ecnt;
> > > > +			else
> > > > +				nd->state = ncsi_dev_state_config_ec;
> > > >  #endif /* CONFIG_IPV6 */
> > > >  		} else if (nd->state == ncsi_dev_state_config_ecnt) {
> > > >  			nca.type = NCSI_PKT_CMD_ECNT;
> > > > @@ -840,43 +891,35 @@ static void ncsi_configure_channel(struct ncsi_dev_priv *ndp)
> > > >  
> > > >  static int ncsi_choose_active_channel(struct ncsi_dev_priv *ndp)
> > > >  {
> > > > -	struct ncsi_package *np, *force_package;
> > > > -	struct ncsi_channel *nc, *found, *hot_nc, *force_channel;
> > > > +	struct ncsi_package *np;
> > > > +	struct ncsi_channel *nc, *found, *hot_nc;
> > > >  	struct ncsi_channel_mode *ncm;
> > > > -	unsigned long flags;
> > > > +	unsigned long flags, cflags;
> > > > +	bool with_link;
> 
> All local variable declarations from longest to shortest
> line.
> 
> > > >  
> > > >  	spin_lock_irqsave(&ndp->lock, flags);
> > > >  	hot_nc = ndp->hot_channel;
> > > > -	force_channel = ndp->force_channel;
> > > > -	force_package = ndp->force_package;
> > > >  	spin_unlock_irqrestore(&ndp->lock, flags);
> > > >  
> > > > -	/* Force a specific channel whether or not it has link if we have been
> > > > -	 * configured to do so
> > > > -	 */
> > > > -	if (force_package && force_channel) {
> > > > -		found = force_channel;
> > > > -		ncm = &found->modes[NCSI_MODE_LINK];
> > > > -		if (!(ncm->data[2] & 0x1))
> > > > -			netdev_info(ndp->ndev.dev,
> > > > -				    "NCSI: Channel %u forced, but it is link down\n",
> > > > -				    found->id);
> > > > -		goto out;
> > > > -	}
> > > > -
> > > > -	/* The search is done once an inactive channel with up
> > > > -	 * link is found.
> > > > +	/* By default the search is done once an inactive channel with up
> > > > +	 * link is found, unless a preferred channel is set.
> > > > +	 * If multi_package or multi_channel are configured all channels in the
> > > > +	 * whitelist with link are added to the channel queue.
> > > >  	 */
> > > >  	found = NULL;
> > > > +	with_link = false;
> > > >  	NCSI_FOR_EACH_PACKAGE(ndp, np) {
> > > > -		if (ndp->force_package && np != ndp->force_package)
> > > > +		if (!(ndp->package_whitelist & (0x1 << np->id)))
> > > >  			continue;
> > > >  		NCSI_FOR_EACH_CHANNEL(np, nc) {
> > > > -			spin_lock_irqsave(&nc->lock, flags);
> > > > +			if (!(np->channel_whitelist & (0x1 << nc->id)))
> > > > +				continue;
> > > > +
> > > > +			spin_lock_irqsave(&nc->lock, cflags);
> > > >  
> > > >  			if (!list_empty(&nc->link) ||
> > > >  			    nc->state != NCSI_CHANNEL_INACTIVE) {
> > > > -				spin_unlock_irqrestore(&nc->lock, flags);
> > > > +				spin_unlock_irqrestore(&nc->lock, cflags);
> > > >  				continue;
> > > >  			}
> > > >  
> > > > @@ -888,32 +931,42 @@ static int ncsi_choose_active_channel(struct ncsi_dev_priv *ndp)
> > > >  
> > > >  			ncm = &nc->modes[NCSI_MODE_LINK];
> > > >  			if (ncm->data[2] & 0x1) {
> > > 
> > > This data will not be updated if the channel monitor for it is not running.
> > > If I move the cable from the current configured channel to the other channel,
> > > NC-SI module will not detect the link status as the other channel is not configured
> > > and AEN will not happen.
> > > Is it per design that NC-SI module will always use the first interface with the link?
> > 
> > We'll check the link state of every channel at init, and per-package on
> > suspend events, but you're right that we only get AENs right now from
> > configured channels. There's probably no reason why we could at least
> > enable AENs on all channels even if we don't use them for network, maybe
> > during the package probe step.
> > 
> 
> To receive the AEN packet, the channel (RX) needs to be enabled.
> It seems that we need to send Get Link Status command to all channels before choosing
> The active channel.

We mostly already do a GLS before this; either we've just started the
NCSI device in which case we sent a GLS to every package as part of
probing, or some other event has occured and we've gone through
ncsi_suspend_channel() which will do ncsi_dev_state_suspend_gls.
However there are some gaps here, for example the
ncsi_stop_dev()/ncsi_start_dev() path won't cause GLS commands to be sent
so we'll have stale information. Continued below..

> 
> > > > -				spin_unlock_irqrestore(&nc->lock, flags);
> > > >  				found = nc;
> > > > -				goto out;
> > > > +				with_link = true;
> > > > +
> > > > +				spin_lock_irqsave(&ndp->lock, flags);
> > > > +				list_add_tail_rcu(&found->link,
> > > > +						  &ndp->channel_queue);
> > > > +				spin_unlock_irqrestore(&ndp->lock, flags);
> > > > +
> > > > +				netdev_dbg(ndp->ndev.dev,
> > > > +					   "NCSI: Channel %u added to queue (link %s)\n",
> > > > +					   found->id,
> > > > +					   ncm->data[2] & 0x1 ? "up" : "down");
> > > >  			}
> 
> If multi_channel is enabled, the interface without link still needs to be processed.
> Something likes below?
> 			} else if (np->multi_channel) {
> 				found = nc;
> 
> 				spin_lock_irqsave(&ndp->lock, flags);
> 				list_add_tail_rcu(&found->link,
> 						  &ndp->channel_queue);
> 				spin_unlock_irqrestore(&ndp->lock, flags);
> 
> 				netdev_dbg(ndp->ndev.dev,
> 					   "NCSI: pkg %u ch %u added to queue (link %s)\n",
> 					   found->package->id,
> 					   found->id,
> 					   ncm->data[2] & 0x1 ? "up" : "down");
> 			}

Yes we should just configure every channel in the whitelist if
multi_channel is set - this gives us AENs for that channel and we'll
recognise when it goes link up.

It would be good to have a better way of "bouncing" the NCSI
configuration in these cases though, especially to include a re-probe of
channel states. I'll include something like that for this series.

> 
> > > > +			spin_unlock_irqrestore(&nc->lock, cflags);
> > > >  
> > > > -			spin_unlock_irqrestore(&nc->lock, flags);
> > > > +			if (with_link && !np->multi_channel)
> > > > +				break;
> > > >  		}
> > > > +		if (with_link && !ndp->multi_package)
> > > > +			break;
> > > >  	}
> > > >  
> > > > -	if (!found) {
> > > > +	if (!with_link && found) {
> > > > +		netdev_info(ndp->ndev.dev,
> > > > +			    "NCSI: No channel with link found, configuring channel %u\n",
> > > > +			    found->id);
> > > > +		spin_lock_irqsave(&ndp->lock, flags);
> > > > +		list_add_tail_rcu(&found->link, &ndp->channel_queue);
> > > > +		spin_unlock_irqrestore(&ndp->lock, flags);
> > > > +	} else if (!found) {
> > > >  		netdev_warn(ndp->ndev.dev,
> > > > -			    "NCSI: No channel found with link\n");
> > > > +			    "NCSI: No channel found to configure!\n");
> > > >  		ncsi_report_link(ndp, true);
> > > >  		return -ENODEV;
> > > >  	}
> > > >  
> > > > -	ncm = &found->modes[NCSI_MODE_LINK];
> > > > -	netdev_dbg(ndp->ndev.dev,
> > > > -		   "NCSI: Channel %u added to queue (link %s)\n",
> > > > -		   found->id, ncm->data[2] & 0x1 ? "up" : "down");
> > > > -
> > > > -out:
> > > > -	spin_lock_irqsave(&ndp->lock, flags);
> > > > -	list_add_tail_rcu(&found->link, &ndp->channel_queue);
> > > > -	spin_unlock_irqrestore(&ndp->lock, flags);
> > > > -
> > > >  	return ncsi_process_next_channel(ndp);
> > > >  }
> > > >  
> > > > @@ -1428,6 +1481,7 @@ struct ncsi_dev *ncsi_register_dev(struct net_device *dev,
> > > >  	INIT_LIST_HEAD(&ndp->channel_queue);
> > > >  	INIT_LIST_HEAD(&ndp->vlan_vids);
> > > >  	INIT_WORK(&ndp->work, ncsi_dev_work);
> > > > +	ndp->package_whitelist = UINT_MAX;
> > > >  
> > > >  	/* Initialize private NCSI device */
> > > >  	spin_lock_init(&ndp->lock);
> > > > diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
> > > > index 32cb7751d216..33a091e6f466 100644
> > > > --- a/net/ncsi/ncsi-netlink.c
> > > > +++ b/net/ncsi/ncsi-netlink.c
> > > 
> > > Is the following missed in the patch?
> > > static const struct nla_policy ncsi_genl_policy[NCSI_ATTR_MAX + 1] = {
> > > ...
> > > 	[NCSI_ATTR_MULTI_FLAG] =	{ .type = NLA_FLAG },
> > > 	[NCSI_ATTR_PACKAGE_MASK] =	{ .type = NLA_U32 },
> > > 	[NCSI_ATTR_CHANNEL_MASK] =	{ .type = NLA_U32 },
> > 
> > Oops, added.
> > 
> > > > @@ -67,7 +67,7 @@ static int ncsi_write_channel_info(struct sk_buff *skb,
> > > >  	nla_put_u32(skb, NCSI_CHANNEL_ATTR_LINK_STATE, m->data[2]);
> > > >  	if (nc->state == NCSI_CHANNEL_ACTIVE)
> > > >  		nla_put_flag(skb, NCSI_CHANNEL_ATTR_ACTIVE);
> > > > -	if (ndp->force_channel == nc)
> > > > +	if (nc == nc->package->preferred_channel)
> > > >  		nla_put_flag(skb, NCSI_CHANNEL_ATTR_FORCED);
> > > >  
> > > >  	nla_put_u32(skb, NCSI_CHANNEL_ATTR_VERSION_MAJOR, nc->version.version);
> > > > @@ -112,7 +112,7 @@ static int ncsi_write_package_info(struct sk_buff *skb,
> > > >  		if (!pnest)
> > > >  			return -ENOMEM;
> > > >  		nla_put_u32(skb, NCSI_PKG_ATTR_ID, np->id);
> > > > -		if (ndp->force_package == np)
> > > > +		if ((0x1 << np->id) == ndp->package_whitelist)
> > > >  			nla_put_flag(skb, NCSI_PKG_ATTR_FORCED);
> > > >  		cnest = nla_nest_start(skb, NCSI_PKG_ATTR_CHANNEL_LIST);
> > > >  		if (!cnest) {
> > > > @@ -288,45 +288,54 @@ static int ncsi_set_interface_nl(struct sk_buff *msg, struct genl_info *info)
> > > >  	package_id = nla_get_u32(info->attrs[NCSI_ATTR_PACKAGE_ID]);
> > > >  	package = NULL;
> > > >  
> > > > -	spin_lock_irqsave(&ndp->lock, flags);
> > > > -
> > > >  	NCSI_FOR_EACH_PACKAGE(ndp, np)
> > > >  		if (np->id == package_id)
> > > >  			package = np;
> > > >  	if (!package) {
> > > >  		/* The user has set a package that does not exist */
> > > > -		spin_unlock_irqrestore(&ndp->lock, flags);
> > > >  		return -ERANGE;
> > > >  	}
> > > >  
> > > >  	channel = NULL;
> > > > -	if (!info->attrs[NCSI_ATTR_CHANNEL_ID]) {
> > > > -		/* Allow any channel */
> > > > -		channel_id = NCSI_RESERVED_CHANNEL;
> > > > -	} else {
> > > > +	if (info->attrs[NCSI_ATTR_CHANNEL_ID]) {
> > > >  		channel_id = nla_get_u32(info->attrs[NCSI_ATTR_CHANNEL_ID]);
> > > >  		NCSI_FOR_EACH_CHANNEL(package, nc)
> > > > -			if (nc->id == channel_id)
> > > > +			if (nc->id == channel_id) {
> > > >  				channel = nc;
> > > > +				break;
> > > > +			}
> > > > +		if (!channel) {
> > > > +			netdev_info(ndp->ndev.dev,
> > > > +				    "NCSI: Channel %u does not exist!\n",
> > > > +				    channel_id);
> > > > +			return -ERANGE;
> > > > +		}
> > > >  	}
> > > >  
> > > > -	if (channel_id != NCSI_RESERVED_CHANNEL && !channel) {
> > > > -		/* The user has set a channel that does not exist on this
> > > > -		 * package
> > > > -		 */
> > > > -		spin_unlock_irqrestore(&ndp->lock, flags);
> > > > -		netdev_info(ndp->ndev.dev, "NCSI: Channel %u does not exist!\n",
> > > > -			    channel_id);
> > > > -		return -ERANGE;
> > > > -	}
> > > > -
> > > > -	ndp->force_package = package;
> > > > -	ndp->force_channel = channel;
> > > > +	spin_lock_irqsave(&ndp->lock, flags);
> > > > +	ndp->package_whitelist = 0x1 << package->id;
> > > > +	ndp->multi_package = false;
> > > >  	spin_unlock_irqrestore(&ndp->lock, flags);
> > > >  
> > > > -	netdev_info(ndp->ndev.dev, "Set package 0x%x, channel 0x%x%s as preferred\n",
> > > > -		    package_id, channel_id,
> > > > -		    channel_id == NCSI_RESERVED_CHANNEL ? " (any)" : "");
> > > > +	spin_lock_irqsave(&package->lock, flags);
> > > > +	package->multi_channel = false;
> > > > +	if (channel) {
> > > > +		package->channel_whitelist = 0x1 << channel->id;
> > > > +		package->preferred_channel = channel;
> > > > +	} else {
> > > > +		/* Allow any channel */
> > > > +		package->channel_whitelist = UINT_MAX;
> > > > +		package->preferred_channel = NULL;
> > > > +	}
> > > > +	spin_unlock_irqrestore(&package->lock, flags);
> > > > +
> > > > +	if (channel)
> > > > +		netdev_info(ndp->ndev.dev,
> > > > +			    "Set package 0x%x, channel 0x%x as preferred\n",
> > > > +			    package_id, channel_id);
> > > > +	else
> > > > +		netdev_info(ndp->ndev.dev, "Set package 0x%x as preferred\n",
> > > > +			    package_id);
> > > >  
> > > >  	/* Bounce the NCSI channel to set changes */
> > > >  	ncsi_stop_dev(&ndp->ndev);
> > > > @@ -338,6 +347,7 @@ static int ncsi_set_interface_nl(struct sk_buff *msg, struct genl_info *info)
> > > >  static int ncsi_clear_interface_nl(struct sk_buff *msg, struct genl_info *info)
> > > >  {
> > > >  	struct ncsi_dev_priv *ndp;
> > > > +	struct ncsi_package *np;
> > > >  	unsigned long flags;
> > > >  
> > > >  	if (!info || !info->attrs)
> > > > @@ -351,11 +361,19 @@ static int ncsi_clear_interface_nl(struct sk_buff *msg, struct genl_info *info)
> > > >  	if (!ndp)
> > > >  		return -ENODEV;
> > > >  
> > > > -	/* Clear any override */
> > > > +	/* Reset any whitelists and disable multi mode */
> > > >  	spin_lock_irqsave(&ndp->lock, flags);
> > > > -	ndp->force_package = NULL;
> > > > -	ndp->force_channel = NULL;
> > > > +	ndp->package_whitelist = UINT_MAX;
> > > > +	ndp->multi_package = false;
> > > >  	spin_unlock_irqrestore(&ndp->lock, flags);
> > > > +
> > > > +	NCSI_FOR_EACH_PACKAGE(ndp, np) {
> > > > +		spin_lock_irqsave(&np->lock, flags);
> > > > +		np->multi_channel = false;
> > > > +		np->channel_whitelist = UINT_MAX;
> > > > +		np->preferred_channel = NULL;
> > > > +		spin_unlock_irqrestore(&np->lock, flags);
> > > > +	}
> > > >  	netdev_info(ndp->ndev.dev, "NCSI: Cleared preferred package/channel\n");
> > > >  
> > > >  	/* Bounce the NCSI channel to set changes */
> > > > @@ -365,6 +383,137 @@ static int ncsi_clear_interface_nl(struct sk_buff *msg, struct genl_info *info)
> > > >  	return 0;
> > > >  }
> > > >  
> > > > +static int ncsi_set_package_mask_nl(struct sk_buff *msg,
> > > > +				    struct genl_info *info)
> > > > +{
> > > > +	struct ncsi_dev_priv *ndp;
> > > > +	unsigned long flags;
> > > > +	int rc;
> > > > +
> > > > +	if (!info || !info->attrs)
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (!info->attrs[NCSI_ATTR_IFINDEX])
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (!info->attrs[NCSI_ATTR_PACKAGE_MASK])
> > > > +		return -EINVAL;
> > > > +
> > > > +	ndp = ndp_from_ifindex(get_net(sock_net(msg->sk)),
> > > > +			       nla_get_u32(info->attrs[NCSI_ATTR_IFINDEX]));
> > > > +	if (!ndp)
> > > > +		return -ENODEV;
> > > > +
> > > > +	spin_lock_irqsave(&ndp->lock, flags);
> > > > +	ndp->package_whitelist =
> > > > +		nla_get_u32(info->attrs[NCSI_ATTR_PACKAGE_MASK]);
> > > > +
> > > > +	if (nla_get_flag(info->attrs[NCSI_ATTR_MULTI_FLAG])) {
> > > > +		if (ndp->flags & NCSI_DEV_HWA) {
> > > > +			ndp->multi_package = true;
> > > > +			rc = 0;
> > > > +		} else {
> > > > +			netdev_err(ndp->ndev.dev,
> > > > +				   "NCSI: Can't use multiple packages without HWA\n");
> > > > +			rc = -EPERM;
> > > > +		}
> > > > +	} else {
> > > > +		rc = 0;
> > > > +	}
> 
> If the flag is not set, do we need to clear the multi_package flag? It is possible that it is
> user's intension to disable the multi-mode.
> 	} else {
> 		ndp->multi_package = false;
> 		rc = 0;
> 	}

Yep, good catch (although technically multi_package could not have been
set true in the past if HWA is not available).

> 
> > > > +
> > > > +	spin_unlock_irqrestore(&ndp->lock, flags);
> > > > +
> > > > +	if (!rc) {
> > > > +		/* Bounce the NCSI channel to set changes */
> > > > +		ncsi_stop_dev(&ndp->ndev);
> > > > +		ncsi_start_dev(&ndp->ndev);
> > > 
> > > Is it possible to delay the restart? If we have two packages, we might send
> > > set_package_mask command once and set_channel_mask command twice.
> > > We will see the unnecessary reconfigurations in a very short period time.
> > 
> > We could probably do with a better way of bouncing the link here too. I
> > suppose we could schedule the actual link 'bounce' to occur a second or
> > two later, and delay if we receive new configuration in that time.
> > Perhaps something outside of the scope of this patch though.
> > 
> > > > +	}
> > > > +
> > > > +	return rc;
> > > > +}
> > > > +
> > > > +static int ncsi_set_channel_mask_nl(struct sk_buff *msg,
> > > > +				    struct genl_info *info)
> > > > +{
> > > > +	struct ncsi_package *np, *package;
> > > > +	struct ncsi_channel *nc, *channel;
> > > > +	struct ncsi_dev_priv *ndp;
> > > > +	unsigned long flags;
> > > > +	u32 package_id, channel_id;
> 
> All local variable declarations from longest to shortest
> line.
> 
> > > > +
> > > > +	if (!info || !info->attrs)
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (!info->attrs[NCSI_ATTR_IFINDEX])
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (!info->attrs[NCSI_ATTR_PACKAGE_ID])
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (!info->attrs[NCSI_ATTR_CHANNEL_MASK])
> > > > +		return -EINVAL;
> > > > +
> > > > +	ndp = ndp_from_ifindex(get_net(sock_net(msg->sk)),
> > > > +			       nla_get_u32(info->attrs[NCSI_ATTR_IFINDEX]));
> > > > +	if (!ndp)
> > > > +		return -ENODEV;
> > > > +
> > > > +	package_id = nla_get_u32(info->attrs[NCSI_ATTR_PACKAGE_ID]);
> > > > +	package = NULL;
> > > > +	NCSI_FOR_EACH_PACKAGE(ndp, np)
> > > > +		if (np->id == package_id) {
> > > > +			package = np;
> > > > +			break;
> > > > +		}
> > > > +	if (!package)
> > > > +		return -ERANGE;
> > > > +
> > > > +	spin_lock_irqsave(&package->lock, flags);
> > > > +
> > > > +	channel = NULL;
> > > > +	if (info->attrs[NCSI_ATTR_CHANNEL_ID]) {
> > > > +		channel_id = nla_get_u32(info->attrs[NCSI_ATTR_CHANNEL_ID]);
> > > > +		NCSI_FOR_EACH_CHANNEL(np, nc)
> > > > +			if (nc->id == channel_id) {
> > > > +				channel = nc;
> > > > +				break;
> > > > +			}
> > > > +		if (!channel) {
> > > > +			spin_unlock_irqrestore(&package->lock, flags);
> > > > +			return -ERANGE;
> > > > +		}
> > > > +		netdev_dbg(ndp->ndev.dev,
> > > > +			   "NCSI: Channel %u set as preferred channel\n",
> > > > +			   channel->id);
> > > > +	}
> > > > +
> > > > +	package->channel_whitelist =
> > > > +		nla_get_u32(info->attrs[NCSI_ATTR_CHANNEL_MASK]);
> > > > +	if (package->channel_whitelist == 0)
> > > > +		netdev_dbg(ndp->ndev.dev,
> > > > +			   "NCSI: Package %u set to all channels disabled\n",
> > > > +			   package->id);
> > > > +
> > > > +	package->preferred_channel = channel;
> > > > +
> > > > +	if (nla_get_flag(info->attrs[NCSI_ATTR_MULTI_FLAG])) {
> > > > +		package->multi_channel = true;
> > > > +		netdev_info(ndp->ndev.dev,
> > > > +			    "NCSI: Multi-channel enabled on package %u\n",
> > > > +			    package_id);
> > > > +	} else {
> > > > +		package->multi_channel = false;
> > > > +	}
> > > > +
> > > > +	spin_unlock_irqrestore(&package->lock, flags);
> > > > +
> > > > +	/* Bounce the NCSI channel to set changes */
> > > > +	ncsi_stop_dev(&ndp->ndev);
> > > > +	ncsi_start_dev(&ndp->ndev);
> > > 
> > > Same question as set_package_mask function.
> > > Is it possible to delay the restart? If we have two packages, we might send
> > > set_package_mask command once and set_channel_mask command twice.
> > > We will see the unnecessary reconfigurations in a very short period time.
> > > 
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  static const struct genl_ops ncsi_ops[] = {
> > > >  	{
> > > >  		.cmd = NCSI_CMD_PKG_INFO,
> > > > @@ -385,6 +534,18 @@ static const struct genl_ops ncsi_ops[] = {
> > > >  		.doit = ncsi_clear_interface_nl,
> > > >  		.flags = GENL_ADMIN_PERM,
> > > >  	},
> > > > +	{
> > > > +		.cmd = NCSI_CMD_SET_PACKAGE_MASK,
> > > > +		.policy = ncsi_genl_policy,
> > > > +		.doit = ncsi_set_package_mask_nl,
> > > > +		.flags = GENL_ADMIN_PERM,
> > > > +	},
> > > > +	{
> > > > +		.cmd = NCSI_CMD_SET_CHANNEL_MASK,
> > > > +		.policy = ncsi_genl_policy,
> > > > +		.doit = ncsi_set_channel_mask_nl,
> > > > +		.flags = GENL_ADMIN_PERM,
> > > > +	},
> > > >  };
> > > >  
> > > >  static struct genl_family ncsi_genl_family __ro_after_init = {
> > > > diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
> > > > index d66b34749027..02ce7626b579 100644
> > > > --- a/net/ncsi/ncsi-rsp.c
> > > > +++ b/net/ncsi/ncsi-rsp.c
> > > > @@ -241,7 +241,7 @@ static int ncsi_rsp_handler_dcnt(struct ncsi_request *nr)
> > > >  	if (!ncm->enable)
> > > >  		return 0;
> > > >  
> > > > -	ncm->enable = 1;
> > > > +	ncm->enable = 0;
> > > >  	return 0;
> > > >  }
> > > >  
> > > > -- 
> > > > 2.19.0

^ permalink raw reply

* Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps
From: David Ahern @ 2018-10-11 18:44 UTC (permalink / raw)
  To: Jamal Hadi Salim, Sowmini Varadhan, Stephen Hemminger
  Cc: David Ahern, netdev, davem
In-Reply-To: <a8810dd6-fb22-8309-77a2-3e761feea871@mojatatu.com>

On 10/11/18 12:05 PM, Jamal Hadi Salim wrote:
> On 2018-10-11 1:04 p.m., David Ahern wrote:
> 
>> You can already filter link dumps by kind. How? By passing in the KIND
>> attribute on a dump request. This type of filtering exists for link
>> dumps, neighbor dumps, fdb dumps. Why is there a push to make route
>> dumps different? Why can't they be consistent and use existing semantics?
> 
> I think you meant filtering by ifindex in neighbor.

I meant the general API of users passing filter arguments as attributes
to the dump (or values in the header) -- KIND, MASTER, device index,
etc. This is an existing API and existing capability.

> note: I would argue that there are already "adhoc" ways of filtering
> in place, mostly use case driven). Otherwise Sowmini wouldnt have to
> craft that bpf filter. There are netlink users who have none or some
> weird filtering involved. There is no arguement that your approach
> works for rtm. But the rest of the users missing filters will require
> similar kernel changes. Could this be made generic enough to benefit
> other netlink users?
> The problem is there's always one new attribute that would make sense
> for some use case which requires a kernel change ("send me an event only
> if you get link down" or "dump all ports with link down").
> 

I disagree with your overall premise of bpf the end-all hammer. It is a
tool but not the only tool. For starters, you are proposing building the
message, run the filter on it, and potentially back the message up to
drop the recently added piece because the filter does not want it
included. That is still wasting a lot of cpu cycles to build and drop. I
am thinking about scale to 1 million routes -- I do not need the dump
loop building a message for 1 million entries only to drop 99% of them.
That is crazy.

The way the kernel manages route tables says I should pass in the table
id as it is a major differentiator on what is returned. From there
lookup the specific table (I need to fix this part per my response to
Andrew), and then only walk it. The existing semantics, capabilities
that exist for other dump commands is the most efficient for some of
these high level, big hammer filters.

What you want gets into the tiniest of details and yes the imagination
can go wild with combinations of filter options. So maybe this scanning
of post-built messages is reasonable *after* the high level sorting is done.

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2018-10-12  2:06 UTC (permalink / raw)
  To: gregkh; +Cc: akpm, netdev, linux-kernel


1) RXRPC receive path fixes from David Howells.

2) Re-export __skb_recv_udp(), from Jiri Kosina.

3) Fix refcounting in u32 classificer, from Al Viro.

4) Userspace netlink ABI fixes from Eugene Syromiatnikov.

5) Don't double iounmap on rmmod in ena driver, from Arthur
   Kiyanovski.

6) Fix devlink string attribute handling, we must pull a copy
   into a kernel buffer if the lifetime extends past the netlink
   request.  From Moshe Shemesh.

7) Fix hangs in RDS, from Ka-Cheong Poon.

8) Fix recursive locking lockdep warnings in tipc, from Ying Xue.

9) Clear RX irq correctly in socionext, from Ilias Apalodimas.

10) bcm_sf2 fixes from Florian Fainelli.

Please pull, thanks a lot!

The following changes since commit c1d84a1b42ef70d8ae601df9cadedc7ed4f1beb1:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2018-10-06 02:11:30 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 6b9bab550cac108d86c731c207e3e74ea10eb638:

  Merge branch 'net-dsa-bcm_sf2-Couple-of-fixes' (2018-10-11 15:20:00 -0700)

----------------------------------------------------------------
Al Viro (1):
      net: sched: cls_u32: fix hnode refcounting

Arthur Kiyanovski (4):
      net: ena: fix warning in rmmod caused by double iounmap
      net: ena: fix rare bug when failed restart/resume is followed by driver removal
      net: ena: fix NULL dereference due to untimely napi initialization
      net: ena: fix auto casting to boolean

David Howells (10):
      rxrpc: Fix some missed refs to init_net
      rxrpc: Fix the data_ready handler
      rxrpc: Use the UDP encap_rcv hook
      rxrpc: Don't need to take the RCU read lock in the packet receiver
      rxrpc: Don't check RXRPC_CALL_TX_LAST after calling rxrpc_rotate_tx_window()
      rxrpc: Carry call state out of locked section in rxrpc_rotate_tx_window()
      rxrpc: Only take the rwind and mtu values from latest ACK
      rxrpc: Fix connection-level abort handling
      rxrpc: Fix the rxrpc_tx_packet trace line
      rxrpc: Fix the packet reception routine

David S. Miller (7):
      Merge branch 'net-smc-userspace-breakage-fixes'
      Merge branch 'ena-fixes'
      Merge branch 'devlink-param-type-string-fixes'
      Merge tag 'rxrpc-fixes-20181008' of git://git.kernel.org/.../dhowells/linux-fs
      Merge branch 'net-ipv4-fixes-for-PMTU-when-link-MTU-changes'
      Merge branch 'net-explicitly-requires-bash-when-needed'
      Merge branch 'net-dsa-bcm_sf2-Couple-of-fixes'

Eric Dumazet (1):
      net: make skb_partial_csum_set() more robust against overflows

Eugene Syromiatnikov (2):
      net/smc: use __aligned_u64 for 64-bit smc_diag fields
      net/smc: retain old name for diag_mode field

Florian Fainelli (2):
      net: dsa: bcm_sf2: Fix unbind ordering
      net: dsa: bcm_sf2: Call setup during switch resume

Giacinto Cifelli (1):
      qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface

Ilias Apalodimas (1):
      net: socionext: clear rx irq correctly

Jiri Kosina (1):
      udp: Unbreak modules that rely on external __skb_recv_udp() availability

Jon Maloy (1):
      tipc: set link tolerance correctly in broadcast link

Ka-Cheong Poon (1):
      rds: RDS (tcp) hangs on sendto() to unresponding address

Maciej S. Szmigiero (1):
      r8169: set RX_MULTI_EN bit in RxConfig for 8168F-family chips

Mike Rapoport (1):
      net/ipv6: stop leaking percpu memory in fib6 info

Moshe Shemesh (4):
      devlink: Fix param set handling for string type
      devlink: Fix param cmode driverinit for string type
      devlink: Add helper function for safely copy string param
      net/mlx4_core: Fix warnings during boot on driverinit param set failures

Paolo Abeni (2):
      selftests: rtnetlink.sh explicitly requires bash.
      selftests: udpgso_bench.sh explicitly requires bash

Parthasarathy Bhuvaragan (1):
      tipc: queue socket protocol error messages into socket receive buffer

Sabrina Dubroca (2):
      net: ipv4: update fnhe_pmtu when first hop's MTU changes
      net: ipv4: don't let PMTU updates increase route MTU

Sebastian Andrzej Siewior (1):
      net: phy: sfp: remove sfp_mutex's definition

Ying Xue (1):
      tipc: eliminate possible recursive locking detected by LOCKDEP

 drivers/net/dsa/bcm_sf2.c                     |  14 ++-----
 drivers/net/ethernet/amazon/ena/ena_eth_com.c |   8 ++--
 drivers/net/ethernet/amazon/ena/ena_netdev.c  |  22 +++++-----
 drivers/net/ethernet/mellanox/mlx4/main.c     |  43 +++++++------------
 drivers/net/ethernet/realtek/r8169.c          |   4 +-
 drivers/net/ethernet/socionext/netsec.c       |   5 ++-
 drivers/net/phy/sfp.c                         |   2 -
 drivers/net/usb/qmi_wwan.c                    |   1 +
 include/linux/netdevice.h                     |   7 ++++
 include/net/devlink.h                         |  12 +++++-
 include/net/ip_fib.h                          |   1 +
 include/trace/events/rxrpc.h                  |   1 +
 include/uapi/linux/smc_diag.h                 |  25 ++++++-----
 include/uapi/linux/udp.h                      |   1 +
 net/core/dev.c                                |  28 ++++++++++++-
 net/core/devlink.c                            |  43 ++++++++++++++++---
 net/core/skbuff.c                             |  12 +++---
 net/ipv4/fib_frontend.c                       |  12 ++++--
 net/ipv4/fib_semantics.c                      |  50 ++++++++++++++++++++++
 net/ipv4/route.c                              |   7 ++--
 net/ipv4/udp.c                                |   2 +-
 net/ipv6/ip6_fib.c                            |   2 +
 net/rds/send.c                                |  13 ++++--
 net/rxrpc/ar-internal.h                       |  23 +++++-----
 net/rxrpc/call_accept.c                       |  27 ++++++++----
 net/rxrpc/call_object.c                       |   5 ++-
 net/rxrpc/conn_client.c                       |  10 +++--
 net/rxrpc/conn_event.c                        |  26 +++++++-----
 net/rxrpc/input.c                             | 253 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------------------
 net/rxrpc/local_object.c                      |  30 ++++++++++---
 net/rxrpc/peer_event.c                        |   5 +++
 net/rxrpc/peer_object.c                       |  29 ++++++++-----
 net/sched/cls_u32.c                           |  10 ++---
 net/tipc/link.c                               |  27 ++++++++----
 net/tipc/socket.c                             |  14 ++++++-
 tools/testing/selftests/net/rtnetlink.sh      |   2 +-
 tools/testing/selftests/net/udpgso_bench.sh   |   2 +-
 37 files changed, 493 insertions(+), 285 deletions(-)

^ permalink raw reply

* [PATCH net] net: udp: fix handling of CHECKSUM_COMPLETE packets
From: Sean Tranchetti @ 2018-10-11 18:30 UTC (permalink / raw)
  To: eric.dumazet, davem, netdev; +Cc: Sean Tranchetti

Current handling of CHECKSUM_COMPLETE packets by the UDP stack is
incorrect for any packet that has an incorrect checksum value.

udp4/6_csum_init() will both make a call to
__skb_checksum_validate_complete() to initialize/validate the csum
field when receiving a CHECKSUM_COMPLETE packet. When this packet
fails validation, skb->csum will be overwritten with the pseudoheader
checksum so the packet can be fully validated by software, but the
skb->ip_summed value will be left as CHECKSUM_COMPLETE so that way
the stack can later warn the user about their hardware spewing bad
checksums. Unfortunately, leaving the SKB in this state can cause
problems later on in the checksum calculation.

Since the the packet is still marked as CHECKSUM_COMPLETE,
udp_csum_pull_header() will SUBTRACT the checksum of the UDP header
from skb->csum instead of adding it, leaving us with a garbage value
in that field. Once we try to copy the packet to userspace in the
udp4/6_recvmsg(), we'll make a call to skb_copy_and_csum_datagram_msg()
to checksum the packet data and add it in the garbage skb->csum value
to perform our final validation check.

Since the value we're validating is not the proper checksum, it's possible
that the folded value could come out to 0, causing us not to drop the
packet. Instead, we believe that the packet was checksummed incorrectly
by hardware since skb->ip_summed is still CHECKSUM_COMPLETE, and we attempt
to warn the user with netdev_rx_csum_fault(skb->dev);

Unfortunately, since this is the UDP path, skb->dev has been overwritten
by skb->dev_scratch and is no longer a valid pointer, so we end up
reading invalid memory.

This patch addresses this problem in two ways:
	1) Remove the invalid netdev_rx_csum_fault(skb->dev) call from
	   skb_copy_and_csum_datagram_msg(). Since this is used by UDP
	   where skb->dev is invalid, trying to warn doesn't make sense.

	2) Add better CHECKSUM_COMPLETE handling to udp4/6_csum_init().
	   If we receive a packet that's CHECKSUM_COMPLETE that fails
	   verification (i.e. skb->csum_valid == 0), check who performed
	   the calculation. It's possible that the checksum was done in
	   software by the network stack earlier (such as Netfilter's
	   CONNTRACK module), and if that says the checksum is bad,
	   we can drop the packet immediately instead of waiting until
	   we try and copy it to userspace. Otherwise, we need to
	   mark the SKB as CHECKSUM_NONE, since the skb->csum field
	   no longer contains the full packet checksum after the
	   call to __skb_checksum_validate_complete().

Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
---
 net/core/datagram.c     |  3 ---
 net/ipv4/udp.c          | 20 ++++++++++++++++++--
 net/ipv6/ip6_checksum.c | 20 ++++++++++++++++++--
 3 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 9aac0d6..ab679c5 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -807,9 +807,6 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 			iov_iter_revert(&msg->msg_iter, chunk);
 			return -EINVAL;
 		}
-
-		if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE))
-			netdev_rx_csum_fault(skb->dev);
 	}
 	return 0;
 fault:
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c32a4c1..f8183fd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2120,8 +2120,24 @@ static inline int udp4_csum_init(struct sk_buff *skb, struct udphdr *uh,
 	/* Note, we are only interested in != 0 or == 0, thus the
 	 * force to int.
 	 */
-	return (__force int)skb_checksum_init_zero_check(skb, proto, uh->check,
-							 inet_compute_pseudo);
+	err = (__force int)skb_checksum_init_zero_check(skb, proto, uh->check,
+							inet_compute_pseudo);
+	if (err)
+		return err;
+
+	if (skb->ip_summed == CHECKSUM_COMPLETE && !skb->csum_valid) {
+		/* If SW calculated the value, we know it's bad */
+		if (skb->csum_complete_sw)
+			return 1;
+
+		/* HW says the value is bad. Let's validate that.
+		 * skb->csum is no longer the full packet checksum,
+		 * so don't treat it as such.
+		 */
+		skb_checksum_complete_unset(skb);
+	}
+
+	return 0;
 }
 
 /* wrapper for udp_queue_rcv_skb tacking care of csum conversion and
diff --git a/net/ipv6/ip6_checksum.c b/net/ipv6/ip6_checksum.c
index 547515e..3777170 100644
--- a/net/ipv6/ip6_checksum.c
+++ b/net/ipv6/ip6_checksum.c
@@ -88,8 +88,24 @@ int udp6_csum_init(struct sk_buff *skb, struct udphdr *uh, int proto)
 	 * Note, we are only interested in != 0 or == 0, thus the
 	 * force to int.
 	 */
-	return (__force int)skb_checksum_init_zero_check(skb, proto, uh->check,
-							 ip6_compute_pseudo);
+	err = (__force int)skb_checksum_init_zero_check(skb, proto, uh->check,
+							ip6_compute_pseudo);
+	if (err)
+		return err;
+
+	if (skb->ip_summed == CHECKSUM_COMPLETE && !skb->csum_valid) {
+		/* If SW calculated the value, we know it's bad */
+		if (skb->csum_complete_sw)
+			return 1;
+
+		/* HW says the value is bad. Let's validate that.
+		 * skb->csum is no longer the full packet checksum,
+		 * so don't treat is as such.
+		 */
+		skb_checksum_complete_unset(skb);
+	}
+
+	return 0;
 }
 EXPORT_SYMBOL(udp6_csum_init);
 
-- 
1.9.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox