Netdev List
 help / color / mirror / Atom feed
* [PATCH net v2 6/9] ibmvnic: track pending login
From: Dany Madden @ 2020-11-23 23:57 UTC (permalink / raw)
  To: netdev; +Cc: drt, sukadev, ljp
In-Reply-To: <20201123235757.6466-1-drt@linux.ibm.com>

From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>

If after ibmvnic sends a LOGIN it gets a FAILOVER, it is possible that
the worker thread will start reset process and free the login response
buffer before it gets a (now stale) LOGIN_RSP. The ibmvnic tasklet will
then try to access the login response buffer and crash.

Have ibmvnic track pending logins and discard any stale login responses.

Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol")

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 17 +++++++++++++++++
 drivers/net/ethernet/ibm/ibmvnic.h |  1 +
 2 files changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index a11af9e0a0a1..6d01cc8ec6b6 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3838,6 +3838,8 @@ static int send_login(struct ibmvnic_adapter *adapter)
 	crq.login.cmd = LOGIN;
 	crq.login.ioba = cpu_to_be32(buffer_token);
 	crq.login.len = cpu_to_be32(buffer_size);
+
+	adapter->login_pending = true;
 	ibmvnic_send_crq(adapter, &crq);
 
 	return 0;
@@ -4390,6 +4392,15 @@ static int handle_login_rsp(union ibmvnic_crq *login_rsp_crq,
 	u64 *size_array;
 	int i;
 
+	/* CHECK: Test/set of login_pending does not need to be atomic
+	 * because only ibmvnic_tasklet tests/clears this.
+	 */
+	if (!adapter->login_pending) {
+		netdev_warn(netdev, "Ignoring unexpected login response\n");
+		return 0;
+	}
+	adapter->login_pending = false;
+
 	dma_unmap_single(dev, adapter->login_buf_token, adapter->login_buf_sz,
 			 DMA_TO_DEVICE);
 	dma_unmap_single(dev, adapter->login_rsp_buf_token,
@@ -4761,6 +4772,11 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq,
 		case IBMVNIC_CRQ_INIT:
 			dev_info(dev, "Partner initialized\n");
 			adapter->from_passive_init = true;
+			/* Discard any stale login responses from prev reset.
+			 * CHECK: should we clear even on INIT_COMPLETE?
+			 */
+			adapter->login_pending = false;
+
 			if (!completion_done(&adapter->init_done)) {
 				complete(&adapter->init_done);
 				adapter->init_done_rc = -EIO;
@@ -5190,6 +5206,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id)
 	dev_set_drvdata(&dev->dev, netdev);
 	adapter->vdev = dev;
 	adapter->netdev = netdev;
+	adapter->login_pending = false;
 
 	ether_addr_copy(adapter->mac_addr, mac_addr_p);
 	ether_addr_copy(netdev->dev_addr, adapter->mac_addr);
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index 217dcc7ded70..6f0a701c4a38 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -1087,6 +1087,7 @@ struct ibmvnic_adapter {
 	struct delayed_work ibmvnic_delayed_reset;
 	unsigned long resetting;
 	bool napi_enabled, from_passive_init;
+	bool login_pending;
 
 	bool failover_pending;
 	bool force_reset_recovery;
-- 
2.26.2


^ permalink raw reply related

* [PATCH net v2 4/9] ibmvnic: restore adapter state on failed reset
From: Dany Madden @ 2020-11-23 23:57 UTC (permalink / raw)
  To: netdev; +Cc: drt, sukadev, ljp
In-Reply-To: <20201123235757.6466-1-drt@linux.ibm.com>

In a failed reset, driver could end up in VNIC_PROBED or VNIC_CLOSED
state and cannot recover in subsequent resets, leaving it offline.
This patch restores the adapter state to reset_state, the original
state when reset was called.

Fixes: b27507bb59ed5 ("net/ibmvnic: unlock rtnl_lock in reset so linkwatch_event can run")
Fixes: 2770a7984db58 ("ibmvnic: Introduce hard reset recovery")

Signed-off-by: Dany Madden <drt@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 67 ++++++++++++++++--------------
 1 file changed, 36 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index e84255c2fc72..97e01f01c458 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1857,7 +1857,7 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter,
 	if (reset_state == VNIC_OPEN) {
 		rc = __ibmvnic_close(netdev);
 		if (rc)
-			return rc;
+			goto out;
 	}
 
 	release_resources(adapter);
@@ -1875,24 +1875,25 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter,
 	}
 
 	rc = ibmvnic_reset_init(adapter, true);
-	if (rc)
-		return IBMVNIC_INIT_FAILED;
+	if (rc) {
+		rc = IBMVNIC_INIT_FAILED;
+		goto out;
+	}
 
 	/* If the adapter was in PROBE state prior to the reset,
 	 * exit here.
 	 */
 	if (reset_state == VNIC_PROBED)
-		return 0;
+		goto out;
 
 	rc = ibmvnic_login(netdev);
 	if (rc) {
-		adapter->state = reset_state;
-		return rc;
+		goto out;
 	}
 
 	rc = init_resources(adapter);
 	if (rc)
-		return rc;
+		goto out;
 
 	ibmvnic_disable_irqs(adapter);
 
@@ -1902,8 +1903,10 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter,
 		return 0;
 
 	rc = __ibmvnic_open(netdev);
-	if (rc)
-		return IBMVNIC_OPEN_FAILED;
+	if (rc) {
+		rc = IBMVNIC_OPEN_FAILED;
+		goto out;
+	}
 
 	/* refresh device's multicast list */
 	ibmvnic_set_multi(netdev);
@@ -1912,7 +1915,10 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter,
 	for (i = 0; i < adapter->req_rx_queues; i++)
 		napi_schedule(&adapter->napi[i]);
 
-	return 0;
+out:
+	if (rc)
+		adapter->state = reset_state;
+	return rc;
 }
 
 /**
@@ -2015,7 +2021,6 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 
 		rc = ibmvnic_login(netdev);
 		if (rc) {
-			adapter->state = reset_state;
 			goto out;
 		}
 
@@ -2083,6 +2088,9 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 	rc = 0;
 
 out:
+	/* restore the adapter state if reset failed */
+	if (rc)
+		adapter->state = reset_state;
 	rtnl_unlock();
 
 	return rc;
@@ -2115,43 +2123,46 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter,
 	if (rc) {
 		netdev_err(adapter->netdev,
 			   "Couldn't initialize crq. rc=%d\n", rc);
-		return rc;
+		goto out;
 	}
 
 	rc = ibmvnic_reset_init(adapter, false);
 	if (rc)
-		return rc;
+		goto out;
 
 	/* If the adapter was in PROBE state prior to the reset,
 	 * exit here.
 	 */
 	if (reset_state == VNIC_PROBED)
-		return 0;
+		goto out;
 
 	rc = ibmvnic_login(netdev);
-	if (rc) {
-		adapter->state = VNIC_PROBED;
-		return 0;
-	}
+	if (rc)
+		goto out;
 
 	rc = init_resources(adapter);
 	if (rc)
-		return rc;
+		goto out;
 
 	ibmvnic_disable_irqs(adapter);
 	adapter->state = VNIC_CLOSED;
 
 	if (reset_state == VNIC_CLOSED)
-		return 0;
+		goto out;
 
 	rc = __ibmvnic_open(netdev);
-	if (rc)
-		return IBMVNIC_OPEN_FAILED;
+	if (rc) {
+		rc = IBMVNIC_OPEN_FAILED;
+		goto out;
+	}
 
 	call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, netdev);
 	call_netdevice_notifiers(NETDEV_RESEND_IGMP, netdev);
-
-	return 0;
+out:
+	/* restore adapter state if reset failed */
+	if (rc)
+		adapter->state = reset_state;
+	return rc;
 }
 
 static struct ibmvnic_rwi *get_next_rwi(struct ibmvnic_adapter *adapter)
@@ -2236,13 +2247,7 @@ static void __ibmvnic_reset(struct work_struct *work)
 			rc = do_reset(adapter, rwi, reset_state);
 		}
 		kfree(rwi);
-		if (rc == IBMVNIC_OPEN_FAILED) {
-			if (list_empty(&adapter->rwi_list))
-				adapter->state = VNIC_CLOSED;
-			else
-				adapter->state = reset_state;
-			rc = 0;
-		}
+
 		if (rc)
 			netdev_dbg(adapter->netdev, "Reset failed, rc=%d\n", rc);
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH net v2 9/9] ibmvnic: reduce wait for completion time
From: Dany Madden @ 2020-11-23 23:57 UTC (permalink / raw)
  To: netdev; +Cc: drt, sukadev, ljp
In-Reply-To: <20201123235757.6466-1-drt@linux.ibm.com>

Reduce the wait time for Command Response Queue response from 30 seconds
to 20 seconds, as recommended by VIOS and Power Hypervisor teams.

Fixes: bd0b672313941 ("ibmvnic: Move login and queue negotiation into
ibmvnic_open")
Fixes: 53da09e92910f ("ibmvnic: Add set_link_state routine for setting
adapter link state")

Signed-off-by: Dany Madden <drt@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 3bfdf9f2edff..63b39744a07a 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -834,7 +834,7 @@ static void release_napi(struct ibmvnic_adapter *adapter)
 static int ibmvnic_login(struct net_device *netdev)
 {
 	struct ibmvnic_adapter *adapter = netdev_priv(netdev);
-	unsigned long timeout = msecs_to_jiffies(30000);
+	unsigned long timeout = msecs_to_jiffies(20000);
 	int retry_count = 0;
 	int retries = 10;
 	bool retry;
@@ -938,7 +938,7 @@ static void release_resources(struct ibmvnic_adapter *adapter)
 static int set_link_state(struct ibmvnic_adapter *adapter, u8 link_state)
 {
 	struct net_device *netdev = adapter->netdev;
-	unsigned long timeout = msecs_to_jiffies(30000);
+	unsigned long timeout = msecs_to_jiffies(20000);
 	union ibmvnic_crq crq;
 	bool resend;
 	int rc;
@@ -5124,7 +5124,7 @@ static int init_crq_queue(struct ibmvnic_adapter *adapter)
 static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter, bool reset)
 {
 	struct device *dev = &adapter->vdev->dev;
-	unsigned long timeout = msecs_to_jiffies(30000);
+	unsigned long timeout = msecs_to_jiffies(20000);
 	u64 old_num_rx_queues, old_num_tx_queues;
 	int rc;
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH net v2 5/9] ibmvnic: delay next reset if hard reset fails
From: Dany Madden @ 2020-11-23 23:57 UTC (permalink / raw)
  To: netdev; +Cc: drt, sukadev, ljp
In-Reply-To: <20201123235757.6466-1-drt@linux.ibm.com>

From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>

If auto-priority failover is enabled, the backing device needs time
to settle if hard resetting fails for any reason. Add a delay of 60
seconds before retrying the hard-reset.

Fixes: 2770a7984db5 ("ibmvnic: Introduce hard reset recovery")

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 97e01f01c458..a11af9e0a0a1 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2242,6 +2242,14 @@ static void __ibmvnic_reset(struct work_struct *work)
 				rc = do_hard_reset(adapter, rwi, reset_state);
 				rtnl_unlock();
 			}
+			if (rc) {
+				/* give backing device time to settle down */
+				netdev_dbg(adapter->netdev,
+					   "[S:%d] Hard reset failed, waiting 60 secs\n",
+					   adapter->state);
+				set_current_state(TASK_UNINTERRUPTIBLE);
+				schedule_timeout(60 * HZ);
+			}
 		} else if (!(rwi->reset_reason == VNIC_RESET_FATAL &&
 				adapter->from_passive_init)) {
 			rc = do_reset(adapter, rwi, reset_state);
-- 
2.26.2


^ permalink raw reply related

* [PATCH net v2 8/9] ibmvnic: no reset timeout for 5 seconds after reset
From: Dany Madden @ 2020-11-23 23:57 UTC (permalink / raw)
  To: netdev; +Cc: drt, sukadev, ljp
In-Reply-To: <20201123235757.6466-1-drt@linux.ibm.com>

Reset timeout is going off right after adapter reset. This patch ensures
that timeout is scheduled if it has been 5 seconds since the last reset.
5 seconds is the default watchdog timeout.

Fixes: ed651a10875f1 ("ibmvnic: Updated reset handling")

Signed-off-by: Dany Madden <drt@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 11 +++++++++--
 drivers/net/ethernet/ibm/ibmvnic.h |  2 ++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 79a8b78d93ca..3bfdf9f2edff 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2253,6 +2253,7 @@ static void __ibmvnic_reset(struct work_struct *work)
 			rc = do_reset(adapter, rwi, reset_state);
 		}
 		kfree(rwi);
+		adapter->last_reset_time = jiffies;
 
 		if (rc)
 			netdev_dbg(adapter->netdev, "Reset failed, rc=%d\n", rc);
@@ -2356,7 +2357,13 @@ static void ibmvnic_tx_timeout(struct net_device *dev, unsigned int txqueue)
 			   "Adapter is resetting, skip timeout reset\n");
 		return;
 	}
-
+	/* No queuing up reset until at least 5 seconds (default watchdog val)
+	 * after last reset
+	 */
+	if (time_before(jiffies, (adapter->last_reset_time + dev->watchdog_timeo))) {
+		netdev_dbg(dev, "Not yet time to tx timeout.\n");
+		return;
+	}
 	ibmvnic_reset(adapter, VNIC_RESET_TIMEOUT);
 }
 
@@ -5276,7 +5283,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id)
 	adapter->state = VNIC_PROBED;
 
 	adapter->wait_for_reset = false;
-
+	adapter->last_reset_time = jiffies;
 	return 0;
 
 ibmvnic_register_fail:
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index 6f0a701c4a38..b21092f5f9c1 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -1088,6 +1088,8 @@ struct ibmvnic_adapter {
 	unsigned long resetting;
 	bool napi_enabled, from_passive_init;
 	bool login_pending;
+	/* last device reset time */
+	unsigned long last_reset_time;
 
 	bool failover_pending;
 	bool force_reset_recovery;
-- 
2.26.2


^ permalink raw reply related

* [PATCH net v2 3/9] ibmvnic: avoid memset null scrq msgs
From: Dany Madden @ 2020-11-23 23:57 UTC (permalink / raw)
  To: netdev; +Cc: drt, sukadev, ljp
In-Reply-To: <20201123235757.6466-1-drt@linux.ibm.com>

scrq->msgs could be NULL during device reset, causing Linux to crash.
So, check before memset scrq->msgs.

Fixes: c8b2ad0a4a901 ("ibmvnic: Sanitize entire SCRQ buffer on reset")

Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index d5a927bb4954..e84255c2fc72 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2844,16 +2844,26 @@ static int reset_one_sub_crq_queue(struct ibmvnic_adapter *adapter,
 				   struct ibmvnic_sub_crq_queue *scrq)
 {
 	int rc;
+	if (!scrq) {
+		netdev_dbg(adapter->netdev,
+			   "Invalid scrq reset. irq (%d) or msgs (%p).\n",
+			   scrq->irq, scrq->msgs);
+		return -EINVAL;
+	}
 
 	if (scrq->irq) {
 		free_irq(scrq->irq, scrq);
 		irq_dispose_mapping(scrq->irq);
 		scrq->irq = 0;
 	}
-
-	memset(scrq->msgs, 0, 4 * PAGE_SIZE);
-	atomic_set(&scrq->used, 0);
-	scrq->cur = 0;
+	if (scrq->msgs) {
+		memset(scrq->msgs, 0, 4 * PAGE_SIZE);
+		atomic_set(&scrq->used, 0);
+		scrq->cur = 0;
+	} else {
+		netdev_dbg(adapter->netdev, "Invalid scrq reset\n");
+		return -EINVAL;
+	}
 
 	rc = h_reg_sub_crq(adapter->vdev->unit_address, scrq->msg_token,
 			   4 * PAGE_SIZE, &scrq->crq_num, &scrq->hw_irq);
-- 
2.26.2


^ permalink raw reply related

* [PATCH net v2 2/9] ibmvnic: stop free_all_rwi on failed reset
From: Dany Madden @ 2020-11-23 23:57 UTC (permalink / raw)
  To: netdev; +Cc: drt, sukadev, ljp
In-Reply-To: <20201123235757.6466-1-drt@linux.ibm.com>

When ibmvnic fails to reset, it breaks out of the reset loop and frees
all of the remaining resets from the workqueue. Doing so prevents the
adapter from recovering if no reset is scheduled after that. Instead,
have the driver continue to process resets on the workqueue.

Remove the no longer need free_all_rwi().

Fixes: ed651a10875f1 ("ibmvnic: Updated reset handling")

Signed-off-by: Dany Madden <drt@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 22 +++-------------------
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index dcb23015b6b4..d5a927bb4954 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2173,17 +2173,6 @@ static struct ibmvnic_rwi *get_next_rwi(struct ibmvnic_adapter *adapter)
 	return rwi;
 }
 
-static void free_all_rwi(struct ibmvnic_adapter *adapter)
-{
-	struct ibmvnic_rwi *rwi;
-
-	rwi = get_next_rwi(adapter);
-	while (rwi) {
-		kfree(rwi);
-		rwi = get_next_rwi(adapter);
-	}
-}
-
 static void __ibmvnic_reset(struct work_struct *work)
 {
 	struct ibmvnic_rwi *rwi;
@@ -2253,9 +2242,9 @@ static void __ibmvnic_reset(struct work_struct *work)
 			else
 				adapter->state = reset_state;
 			rc = 0;
-		} else if (rc && rc != IBMVNIC_INIT_FAILED &&
-		    !adapter->force_reset_recovery)
-			break;
+		}
+		if (rc)
+			netdev_dbg(adapter->netdev, "Reset failed, rc=%d\n", rc);
 
 		rwi = get_next_rwi(adapter);
 
@@ -2269,11 +2258,6 @@ static void __ibmvnic_reset(struct work_struct *work)
 		complete(&adapter->reset_done);
 	}
 
-	if (rc) {
-		netdev_dbg(adapter->netdev, "Reset failed\n");
-		free_all_rwi(adapter);
-	}
-
 	clear_bit_unlock(0, &adapter->resetting);
 }
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH net v2 7/9] ibmvnic: send_login should check for crq errors
From: Dany Madden @ 2020-11-23 23:57 UTC (permalink / raw)
  To: netdev; +Cc: drt, sukadev, ljp
In-Reply-To: <20201123235757.6466-1-drt@linux.ibm.com>

send_login() does not check for the result of ibmvnic_send_crq() of the
login request. This results in the driver needlessly retrying the login
10 times even when CRQ is no longer active. Check the return code and
give up in case of errors in sending the CRQ.

The only time we want to retry is if we get a PARITALSUCCESS response
from the partner.

Fixes: 032c5e82847a2 ("Driver for IBM System i/p VNIC protocol")

Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 6d01cc8ec6b6..79a8b78d93ca 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -850,10 +850,8 @@ static int ibmvnic_login(struct net_device *netdev)
 		adapter->init_done_rc = 0;
 		reinit_completion(&adapter->init_done);
 		rc = send_login(adapter);
-		if (rc) {
-			netdev_warn(netdev, "Unable to login\n");
+		if (rc)
 			return rc;
-		}
 
 		if (!wait_for_completion_timeout(&adapter->init_done,
 						 timeout)) {
@@ -3726,15 +3724,16 @@ static int send_login(struct ibmvnic_adapter *adapter)
 	struct ibmvnic_login_rsp_buffer *login_rsp_buffer;
 	struct ibmvnic_login_buffer *login_buffer;
 	struct device *dev = &adapter->vdev->dev;
+	struct vnic_login_client_data *vlcd;
 	dma_addr_t rsp_buffer_token;
 	dma_addr_t buffer_token;
 	size_t rsp_buffer_size;
 	union ibmvnic_crq crq;
+	int client_data_len;
 	size_t buffer_size;
 	__be64 *tx_list_p;
 	__be64 *rx_list_p;
-	int client_data_len;
-	struct vnic_login_client_data *vlcd;
+	int rc;
 	int i;
 
 	if (!adapter->tx_scrq || !adapter->rx_scrq) {
@@ -3840,16 +3839,23 @@ static int send_login(struct ibmvnic_adapter *adapter)
 	crq.login.len = cpu_to_be32(buffer_size);
 
 	adapter->login_pending = true;
-	ibmvnic_send_crq(adapter, &crq);
+	rc = ibmvnic_send_crq(adapter, &crq);
+	if (rc) {
+		adapter->login_pending = false;
+		netdev_err(adapter->netdev, "Failed to send login, rc=%d\n", rc);
+		goto buf_rsp_map_failed;
+	}
 
 	return 0;
 
 buf_rsp_map_failed:
 	kfree(login_rsp_buffer);
+	adapter->login_rsp_buf = NULL;
 buf_rsp_alloc_failed:
 	dma_unmap_single(dev, buffer_token, buffer_size, DMA_TO_DEVICE);
 buf_map_failed:
 	kfree(login_buffer);
+	adapter->login_buf = NULL;
 buf_alloc_failed:
 	return -1;
 }
-- 
2.26.2


^ permalink raw reply related

* [PATCH net-next v2] ibmvnic: process HMC disable command
From: Dany Madden @ 2020-11-23 23:58 UTC (permalink / raw)
  To: netdev; +Cc: drt, sukadev, ljp

Currently ibmvnic does not support the "Disable vNIC" command from
the Hardware Management Console. The HMC uses this command to disconnect
the adapter from the network if the adapter is misbehaving or sending
malicious traffic. The effect of this command is equivalent to setting
the link to the "down" state on the linux client.

Enable support in ibmvnic driver for the Disable vNIC command.

Signed-off-by: Dany Madden <drt@linux.ibm.com>
---
V2 changes based on Jakub Kicinski's feedback:
- Broke from "[PATCH net 00/15] ibmvnic: assorted bug fixes" sent by Lijun Pan.
- Expanded on the description
- Submitting to net-next
---
 drivers/net/ethernet/ibm/ibmvnic.c | 40 ++++++++++++++++++++++++++++++
 drivers/net/ethernet/ibm/ibmvnic.h |  3 ++-
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 63b39744a07a..47446e5f8ec5 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -109,6 +109,8 @@ static void release_crq_queue(struct ibmvnic_adapter *);
 static int __ibmvnic_set_mac(struct net_device *, u8 *);
 static int init_crq_queue(struct ibmvnic_adapter *adapter);
 static int send_query_phys_parms(struct ibmvnic_adapter *adapter);
+static void ibmvnic_disable(struct ibmvnic_adapter *adapter);
+static int ibmvnic_close(struct net_device *netdev);
 
 struct ibmvnic_stat {
 	char name[ETH_GSTRING_LEN];
@@ -1207,6 +1209,42 @@ static int ibmvnic_open(struct net_device *netdev)
 	return rc;
 }
 
+static void ibmvnic_disable(struct ibmvnic_adapter *adapter)
+{
+	struct list_head *entry, *tmp_entry;
+	struct net_device *netdev = adapter->netdev;
+	int rc = 0;
+
+	/* cancel all pending resets in the queue */
+	if (!list_empty(&adapter->rwi_list)) {
+		list_for_each_safe(entry, tmp_entry, &adapter->rwi_list)
+			list_del(entry);
+	}
+
+	/* wait for current reset to finish */
+	flush_work(&adapter->ibmvnic_reset);
+	flush_delayed_work(&adapter->ibmvnic_delayed_reset);
+
+	if (test_bit(0, &adapter->resetting) ||
+	    adapter->state == VNIC_PROBED ||
+	    adapter->state == VNIC_OPEN ||
+	    adapter->state == VNIC_OPENING) {
+		rc = ibmvnic_close(netdev);
+		/* Expect -EINVAL when crq is no longer active. Set link down
+		 * would fail.
+		 */
+		if (rc && rc != -EINVAL) {
+			netdev_err(netdev, "Failed to disable adapter, rc=%d\n", rc);
+			return;
+		}
+	} else {
+		netdev_dbg(netdev, "Disable adapter request ignored (state=%d)\n", adapter->state);
+		return;
+	}
+
+	netdev_dbg(netdev, "Adapter disabled\n");
+}
+
 static void clean_rx_pools(struct ibmvnic_adapter *adapter)
 {
 	struct ibmvnic_rx_pool *rx_pool;
@@ -4825,6 +4863,8 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq,
 		} else if (gen_crq->cmd == IBMVNIC_DEVICE_FAILOVER) {
 			dev_info(dev, "Backing device failover detected\n");
 			adapter->failover_pending = true;
+		} else if (gen_crq->cmd == IBMVNIC_DEVICE_DISABLE) {
+			ibmvnic_disable(adapter);
 		} else {
 			/* The adapter lost the connection */
 			dev_err(dev, "Virtual Adapter failed (rc=%d)\n",
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index b21092f5f9c1..d15866cbc2a6 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -834,10 +834,11 @@ enum ibmvnic_crq_type {
 	IBMVNIC_CRQ_XPORT_EVENT		= 0xFF,
 };
 
-enum ibmvfc_crq_format {
+enum ibmvnic_crq_format {
 	IBMVNIC_CRQ_INIT                 = 0x01,
 	IBMVNIC_CRQ_INIT_COMPLETE        = 0x02,
 	IBMVNIC_PARTITION_MIGRATED       = 0x06,
+	IBMVNIC_DEVICE_DISABLE		 = 0x07,
 	IBMVNIC_DEVICE_FAILOVER          = 0x08,
 };
 
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH bpf-next v3 01/10] net: introduce preferred busy-polling
From: Jakub Kicinski @ 2020-11-24  0:04 UTC (permalink / raw)
  To: Björn Töpel
  Cc: netdev, bpf, Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, edumazet, jonathan.lemon, maximmi
In-Reply-To: <20201119083024.119566-2-bjorn.topel@gmail.com>

On Thu, 19 Nov 2020 09:30:15 +0100 Björn Töpel wrote:
> +	/* The NAPI context has more processing work, but busy-polling
> +	 * is preferred. Exit early.
> +	 */
> +	if (napi_prefer_busy_poll(n)) {
> +		if (napi_complete_done(n, work)) {
> +			/* If timeout is not set, we need to make sure
> +			 * that the NAPI is re-scheduled.
> +			 */
> +			napi_schedule(n);
> +		}
> +		goto out_unlock;
> +	}

Do we really need to go through napi_complete_done() here?

Isn't it sufficient to check:

	if (napi_prefer_busy_poll(n) && 
	    hrtimer_active(&n->timer)) // not 100% sure this is the
	                               // right helper for the check

If timer is scheduled it will fire and worst case sirq will kick back
in after timeout. napi_complete_done() should had been called by the
driver already to schedule the timer. If the driver doesn't call
napi_complete_done() we should not allow it to use busy_poll() anyway.

^ permalink raw reply

* Re: [PATCH bpf-next v7 00/34] bpf: switch to memcg-based memory accounting
From: Roman Gushchin @ 2020-11-24  0:05 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: bpf, ast, netdev, andrii, akpm, linux-mm, linux-kernel,
	kernel-team
In-Reply-To: <9134a408-e26c-a7f2-23a7-5fc221bafdde@iogearbox.net>

On Mon, Nov 23, 2020 at 02:30:09PM +0100, Daniel Borkmann wrote:
> On 11/19/20 6:37 PM, Roman Gushchin wrote:
> > Currently bpf is using the memlock rlimit for the memory accounting.
> > This approach has its downsides and over time has created a significant
> > amount of problems:
> > 
> > 1) The limit is per-user, but because most bpf operations are performed
> >     as root, the limit has a little value.
> > 
> > 2) It's hard to come up with a specific maximum value. Especially because
> >     the counter is shared with non-bpf users (e.g. memlock() users).
> >     Any specific value is either too low and creates false failures
> >     or too high and useless.
> > 
> > 3) Charging is not connected to the actual memory allocation. Bpf code
> >     should manually calculate the estimated cost and precharge the counter,
> >     and then take care of uncharging, including all fail paths.
> >     It adds to the code complexity and makes it easy to leak a charge.
> > 
> > 4) There is no simple way of getting the current value of the counter.
> >     We've used drgn for it, but it's far from being convenient.
> > 
> > 5) Cryptic -EPERM is returned on exceeding the limit. Libbpf even had
> >     a function to "explain" this case for users.
> > 
> > In order to overcome these problems let's switch to the memcg-based
> > memory accounting of bpf objects. With the recent addition of the percpu
> > memory accounting, now it's possible to provide a comprehensive accounting
> > of the memory used by bpf programs and maps.
> > 
> > This approach has the following advantages:
> > 1) The limit is per-cgroup and hierarchical. It's way more flexible and allows
> >     a better control over memory usage by different workloads. Of course, it
> >     requires enabled cgroups and kernel memory accounting and properly configured
> >     cgroup tree, but it's a default configuration for a modern Linux system.
> > 
> > 2) The actual memory consumption is taken into account. It happens automatically
> >     on the allocation time if __GFP_ACCOUNT flags is passed. Uncharging is also
> >     performed automatically on releasing the memory. So the code on the bpf side
> >     becomes simpler and safer.
> > 
> > 3) There is a simple way to get the current value and statistics.
> > 
> > In general, if a process performs a bpf operation (e.g. creates or updates
> > a map), it's memory cgroup is charged. However map updates performed from
> > an interrupt context are charged to the memory cgroup which contained
> > the process, which created the map.
> > 
> > Providing a 1:1 replacement for the rlimit-based memory accounting is
> > a non-goal of this patchset. Users and memory cgroups are completely
> > orthogonal, so it's not possible even in theory.
> > Memcg-based memory accounting requires a properly configured cgroup tree
> > to be actually useful. However, it's the way how the memory is managed
> > on a modern Linux system.

Hi Daniel!

> 
> The cover letter here only describes the advantages of this series, but leaves
> out discussion of the disadvantages. They definitely must be part of the series
> to provide a clear description of the semantic changes to readers.

Honestly, I don't see them as disadvantages. Cgroups are basic units in which
resource control limits/guarantees/accounting are expressed. If there are
no cgroups created and configured in the system, it's obvious (maybe only to me)
that no limits are applied.

Users (rlimits) are to some extent similar units, but they do not provide
a comprehensive resource control system. Some parts are deprecated (like rss limits),
some parts are just missing. Aside from bpf nobody uses users to control
the memory as a physical resource. It simple doesn't work (and never did).
If somebody expects that a non-privileged user can't harm the system by depleting
it's memory (and other resources), it's simple not correct. There are multiple ways
for doing it. Accounting or not accounting bpf maps doesn't really change anything.
If we see them not as a real security mechanism, but as a way to prevent "mistakes",
which can harm the system, it's to some extent legit. The question is only if
it justifies the amount of problems we had with these limits.

Switching to memory cgroups, which are the way how the memory control is expressed,
IMO doesn't need an additional justification. During the last year I remember 2 or 3
times when various people (internally in Fb and in public mailing lists) were asking
why bpf memory is not accounted to memory cgroups. I think it's basically expected
these days.

I'll try to make more obvious that we're switching from users to cgroups and
describe the consequences of it on an unconfigured system. I'll update the cover.

> Last time we
> discussed them, they were i) no mem limits in general on unprivileged users when
> memory cgroups was not configured in the kernel, and ii) no mem limits by default
> if not configured in the cgroup specifically.
> Did we made any progress on these
> in the meantime? How do we want to address them? What is the concrete justification
> to not address them?

I don't see how they can and should be addressed.
Cgroups are the way how the resource consumption of a group of processes can be
limited. If there are no cgroups configured, it means all resources are available
to everyone. Maybe a user wants to use the whole memory for a bpf map? Why not?

Do you have any specific use case in your mind?
If you see a real value in the old system (I don't), which can justify an additional
complexity of keeping them both in a working state, we can discuss this option too.
We can make a switch in few steps, if you think it's too risky.

> 
> Also I wonder what are the risk of regressions here, for example, if an existing
> orchestrator has configured memory cgroup limits that are tailored to the application's
> needs.. now, with kernel upgrade BPF will start to interfere, e.g. if a BPF program
> attached to cgroups (e.g. connect/sendmsg/recvmsg or general cgroup skb egress hook)
> starts charging to the process' memcg due to map updates?

Well, if somebody has a tight memory limit and large bpf map(s), they can see a "regression".
However the kernel memory usage and accounting implementation details vary from a version
to a version, so nobody should expect that limits once set will work forever.
If for some strange reason it'll create a critical problem, as a workaround it's possible
to disable the kernel memory accounting as a whole (via a boot option).

Actually, it seems that the usefulness of strict limits is generally limited, because
it's hard to get and assign any specific value. They are always either too relaxed
(and have no value), either too strict (and causing production issues). Memory cgroups
are generally moving towards soft limits and protections. But it's a separate theme...

Thanks!

^ permalink raw reply

* Re: [PATCH bpf-next v3 01/10] net: introduce preferred busy-polling
From: Jakub Kicinski @ 2020-11-24  0:11 UTC (permalink / raw)
  To: Björn Töpel
  Cc: netdev, bpf, Björn Töpel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, edumazet, jonathan.lemon, maximmi
In-Reply-To: <20201119083024.119566-2-bjorn.topel@gmail.com>

On Thu, 19 Nov 2020 09:30:15 +0100 Björn Töpel wrote:
> @@ -105,7 +105,8 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
>  	unsigned int napi_id = READ_ONCE(sk->sk_napi_id);
>  
>  	if (napi_id >= MIN_NAPI_ID)
> -		napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk);
> +		napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk,
> +			       READ_ONCE(sk->sk_prefer_busy_poll));

Perhaps a noob question, but aren't all accesses to the new sk members
under the socket lock? Do we really need the READ_ONCE() / WRITE_ONCE()?

^ permalink raw reply

* Re: [PATCH bpf-next v3 00/10] Introduce preferred busy-polling
From: Jakub Kicinski @ 2020-11-24  0:14 UTC (permalink / raw)
  To: Björn Töpel
  Cc: netdev, bpf, bjorn.topel, magnus.karlsson, ast, daniel,
	maciej.fijalkowski, sridhar.samudrala, jesse.brandeburg,
	qi.z.zhang, edumazet, jonathan.lemon, maximmi
In-Reply-To: <20201119083024.119566-1-bjorn.topel@gmail.com>

On Thu, 19 Nov 2020 09:30:14 +0100 Björn Töpel wrote:
> Performance netperf UDP_RR:
> 
> Note that netperf UDP_RR is not a heavy traffic tests, and preferred
> busy-polling is not typically something we want to use here.
> 
>   $ echo 20 | sudo tee /proc/sys/net/core/busy_read
>   $ netperf -H 192.168.1.1 -l 30 -t UDP_RR -v 2 -- \
>       -o min_latency,mean_latency,max_latency,stddev_latency,transaction_rate
> 
> busy-polling blocking sockets:            12,13.33,224,0.63,74731.177
> 
> I hacked netperf to use non-blocking sockets and re-ran:
> 
> busy-polling non-blocking sockets:        12,13.46,218,0.72,73991.172
> prefer busy-polling non-blocking sockets: 12,13.62,221,0.59,73138.448
> 
> Using the preferred busy-polling mode does not impact performance.
> 
> The above tests was done for the 'ice' driver.

Any interest in this work form ADQ folks? I recall they were using
memcache with busy polling for their tests, it'd cool to see how much
this helps memcache on P99+ latency!

^ permalink raw reply

* Re: [PATCH net-next] net: sfp: add debugfs support
From: Andrew Lunn @ 2020-11-24  0:14 UTC (permalink / raw)
  To: Russell King; +Cc: Heiner Kallweit, David S. Miller, netdev, Jakub Kicinski
In-Reply-To: <E1khJyS-0003UU-9O@rmk-PC.armlinux.org.uk>

On Mon, Nov 23, 2020 at 10:06:16PM +0000, Russell King wrote:
> Add debugfs support to SFP so that the internal state of the SFP state
> machines and hardware signal state can be viewed from userspace, rather
> than having to compile a debug kernel to view state state transitions
> in the kernel log.  The 'state' output looks like:
> 
> Module state: empty
> Module probe attempts: 0 0
> Device state: up
> Main state: down
> Fault recovery remaining retries: 5
> PHY probe remaining retries: 12
> moddef0: 0
> rx_los: 1
> tx_fault: 1
> tx_disable: 1
> 
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

Hi Russell

This looks useful. I always seem to end up recompiling the kernel,
which as you said, this should avoid.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next] net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround
From: kernel test robot @ 2020-11-24  0:18 UTC (permalink / raw)
  To: Russell King, Andrew Lunn, Heiner Kallweit
  Cc: kbuild-all, netdev, Jakub Kicinski
In-Reply-To: <E1khJlv-0003Jq-ET@rmk-PC.armlinux.org.uk>

[-- Attachment #1: Type: text/plain, Size: 2855 bytes --]

Hi Russell,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Russell-King/net-sfp-VSOL-V2801F-CarlitoxxPro-CPGOS03-0490-v2-0-workaround/20201124-055921
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 8e1e33ffa696b2d779dd5cd422a80960b88e508c
config: arc-randconfig-r016-20201123 (attached as .config)
compiler: arc-elf-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/90849b26224de3b2e508f1c3fa31665f4fd72d0a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Russell-King/net-sfp-VSOL-V2801F-CarlitoxxPro-CPGOS03-0490-v2-0-workaround/20201124-055921
        git checkout 90849b26224de3b2e508f1c3fa31665f4fd72d0a
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/net/phy/sfp.c: In function 'sfp_i2c_read':
>> drivers/net/phy/sfp.c:339:9: warning: variable 'block_size' set but not used [-Wunused-but-set-variable]
     339 |  size_t block_size;
         |         ^~~~~~~~~~

vim +/block_size +339 drivers/net/phy/sfp.c

   334	
   335	static int sfp_i2c_read(struct sfp *sfp, bool a2, u8 dev_addr, void *buf,
   336				size_t len)
   337	{
   338		struct i2c_msg msgs[2];
 > 339		size_t block_size;
   340		size_t this_len;
   341		u8 bus_addr;
   342		int ret;
   343	
   344		if (a2) {
   345			block_size = 16;
   346			bus_addr = 0x51;
   347		} else {
   348			block_size = sfp->i2c_block_size;
   349			bus_addr = 0x50;
   350		}
   351	
   352		msgs[0].addr = bus_addr;
   353		msgs[0].flags = 0;
   354		msgs[0].len = 1;
   355		msgs[0].buf = &dev_addr;
   356		msgs[1].addr = bus_addr;
   357		msgs[1].flags = I2C_M_RD;
   358		msgs[1].len = len;
   359		msgs[1].buf = buf;
   360	
   361		while (len) {
   362			this_len = len;
   363			if (this_len > sfp->i2c_block_size)
   364				this_len = sfp->i2c_block_size;
   365	
   366			msgs[1].len = this_len;
   367	
   368			ret = i2c_transfer(sfp->i2c, msgs, ARRAY_SIZE(msgs));
   369			if (ret < 0)
   370				return ret;
   371	
   372			if (ret != ARRAY_SIZE(msgs))
   373				break;
   374	
   375			msgs[1].buf += this_len;
   376			dev_addr += this_len;
   377			len -= this_len;
   378		}
   379	
   380		return msgs[1].buf - (u8 *)buf;
   381	}
   382	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26847 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround
From: Andrew Lunn @ 2020-11-24  0:20 UTC (permalink / raw)
  To: Russell King; +Cc: Heiner Kallweit, David S. Miller, netdev, Jakub Kicinski
In-Reply-To: <E1khJlv-0003Jq-ET@rmk-PC.armlinux.org.uk>

> @@ -335,10 +336,19 @@ static int sfp_i2c_read(struct sfp *sfp, bool a2, u8 dev_addr, void *buf,
>  			size_t len)
>  {
>  	struct i2c_msg msgs[2];
> -	u8 bus_addr = a2 ? 0x51 : 0x50;
> +	size_t block_size;
>  	size_t this_len;
> +	u8 bus_addr;
>  	int ret;
>  
> +	if (a2) {
> +		block_size = 16;
> +		bus_addr = 0x51;

Hi Russell, Thomas

Does this man the diagnostic page can be read 16 bytes at a time, even
when the other page has to be 1 bytes at a time? That seems rather
odd. Or is the diagnostic page not implemented in these SFPs?

     Andrew

^ permalink raw reply

* VRF NS for lladdr sent on the wrong interface
From: Stephen Suryaputra @ 2020-11-24  0:23 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 5469 bytes --]

Hi,

I'm running into a problem with lladdr pinging all-host mcast all nodes
addr. The ping intially works but after cycling the interface that
receives the ping, the echo request packet causes a neigh solicitation
being sent on a different interface.

To repro, I included the attached namespace scripts. This is the
topology and an output of my test.

# +-------+     +----------+   +-------+
# | h0    |     |    r0    |   |    h1 |
# |    v00+-----+v00    v01+---+v10    |
# |       |     |          |   |       |
# +-------+     +----------+   +-------+

Setup and list of addresses:

$ sudo sh ./setup.sh
$ sudo ip netns exec h0 ip addr show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
28: h0_v00@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b2:72:0a:25:7d:0f brd ff:ff:ff:ff:ff:ff link-netns r0
    inet6 fe80::b072:aff:fe25:7d0f/64 scope link 
       valid_lft forever preferred_lft forever
$ sudo ip netns exec r0 ip addr show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: vrf_r0: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc noqueue state UP group default qlen 1000
    link/ether 5a:48:08:1e:e6:38 brd ff:ff:ff:ff:ff:ff
    inet 127.0.0.1/8 scope host vrf_r0
       valid_lft forever preferred_lft forever
27: r0_v00@if28: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrf_r0 state UP group default qlen 1000
    link/ether aa:9d:a5:cf:ab:75 brd ff:ff:ff:ff:ff:ff link-netns h0
    inet6 fe80::a89d:a5ff:fecf:ab75/64 scope link 
       valid_lft forever preferred_lft forever
30: r0_v01@if29: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrf_r0 state UP group default qlen 1000
    link/ether 52:0f:70:b5:a8:6a brd ff:ff:ff:ff:ff:ff link-netns h1
    inet6 fe80::500f:70ff:feb5:a86a/64 scope link 
       valid_lft forever preferred_lft forever
$ sudo ip netns exec h1 ip addr show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
29: h1_v10@if30: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether aa:3c:c1:a0:07:7c brd ff:ff:ff:ff:ff:ff link-netns r0
    inet6 fe80::a83c:c1ff:fea0:77c/64 scope link 
       valid_lft forever preferred_lft forever

Initially ping is replied by r0_v00:

$ sudo ip netns exec h0 ping -c 1 ff02::1%h0_v00
PING ff02::1%h0_v00(ff02::1%h0_v00) 56 data bytes
64 bytes from fe80::a89d:a5ff:fecf:ab75%h0_v00: icmp_seq=1 ttl=64 time=4.44 ms

--- ff02::1%h0_v00 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 4.443/4.443/4.443/0.000 ms

Cycle r0_v00. Ping isn't replied and the tcpdump shows that the NS for h0_v00 lladdr
is sent over r0_v01:

$ sudo ip netns exec r0 ip link set r0_v00 down
$ sudo ip netns exec r0 ip link set r0_v00 up
$ sudo ip netns exec r0 ip addr show dev r0_v00
27: r0_v00@if28: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrf_r0 state UP group default qlen 1000
    link/ether aa:9d:a5:cf:ab:75 brd ff:ff:ff:ff:ff:ff link-netns h0
    inet6 fe80::a89d:a5ff:fecf:ab75/64 scope link 
       valid_lft forever preferred_lft forever

$ sudo ip netns exec h0 ping -c 1 ff02::1%h0_v00
PING ff02::1%h0_v00(ff02::1%h0_v00) 56 data bytes

--- ff02::1%h0_v00 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

$ sudo ip netns exec h1 tcpdump -i h1_v10
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on h1_v10, link-type EN10MB (Ethernet), capture size 262144 bytes
^C12:36:12.210524 IP6 fe80::a83c:c1ff:fea0:77c > ip6-allrouters: ICMP6, router solicitation, length 16
12:36:34.456650 IP6 fe80::500f:70ff:feb5:a86a > ff02::1:ff25:7d0f: ICMP6, neighbor solicitation, who has fe80::b072:aff:fe25:7d0f, length 32
12:36:35.474519 IP6 fe80::500f:70ff:feb5:a86a > ff02::1:ff25:7d0f: ICMP6, neighbor solicitation, who has fe80::b072:aff:fe25:7d0f, length 32
12:36:36.498455 IP6 fe80::500f:70ff:feb5:a86a > ff02::1:ff25:7d0f: ICMP6, neighbor solicitation, who has fe80::b072:aff:fe25:7d0f, length 32

4 packets captured
4 packets received by filter
0 packets dropped by kernel

I'm thinking that the following patch is needed:

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index f2793ffde191..30f4e867fe5b 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1315,11 +1315,14 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
 	int orig_iif = skb->skb_iif;
 	bool need_strict = rt6_need_strict(&ipv6_hdr(skb)->daddr);
 	bool is_ndisc = ipv6_ndisc_frame(skb);
+	bool is_ll_src;
 
 	/* loopback, multicast & non-ND link-local traffic; do not push through
 	 * packet taps again. Reset pkt_type for upper layers to process skb
 	 */
-	if (skb->pkt_type == PACKET_LOOPBACK || (need_strict && !is_ndisc)) {
+	is_ll_src = ipv6_addr_type(&ipv6_hdr(skb)->saddr) & IPV6_ADDR_LINKLOCAL;
+	if (skb->pkt_type == PACKET_LOOPBACK ||
+	    (need_strict && !is_ndisc && !is_ll_src)) {
 		skb->dev = vrf_dev;
 		skb->skb_iif = vrf_dev->ifindex;
 		IP6CB(skb)->flags |= IP6SKB_L3SLAVE;

But wanting to probe first to see if I could have missed something. Or
is there a better patch. I would be happy to follow up with a formal patch.

Thank you,

Stephen.

[-- Attachment #2: setup.sh --]
[-- Type: application/x-sh, Size: 915 bytes --]

[-- Attachment #3: teardown.sh --]
[-- Type: application/x-sh, Size: 184 bytes --]

^ permalink raw reply related

* Re: LRO: creating vlan subports affects parent port's LRO settings
From: Jakub Kicinski @ 2020-11-24  0:26 UTC (permalink / raw)
  To: Limin Wang; +Cc: netdev
In-Reply-To: <CACpdL32SRKb8hXzuF_APybip+hyxkXRYoxCW_OMhn0odRSQKuw@mail.gmail.com>

On Thu, 19 Nov 2020 20:37:27 -0500 Limin Wang wrote:
> Under relatively recent kernels (v4.4+), creating a vlan subport on a
> LRO supported parent NIC may turn LRO off on the parent port and
> further render its LRO feature practically unchangeable.

That does sound like an oversight in commit fd867d51f889 ("net/core:
generic support for disabling netdev features down stack").

Are you able to create a patch to fix this?

^ permalink raw reply

* Re: [PATCH net] net: stmmac: Use hrtimer for TX coalescing
From: Jakub Kicinski @ 2020-11-24  0:46 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: peppe.cavallaro, alexandre.torgue, joabreu, davem, kernel, netdev
In-Reply-To: <20201120150208.6838-1-vincent.whitchurch@axis.com>

On Fri, 20 Nov 2020 16:02:08 +0100 Vincent Whitchurch wrote:
> This driver uses a normal timer for TX coalescing, which means that the
> with the default tx-usecs of 1000 microseconds the cleanups actually
> happen 10 ms or more later with HZ=100.  This leads to very low
> througput with TCP when bridged to a slow link such as a 4G modem.  Fix
> this by using an hrtimer instead.
> 
> On my ARM platform with HZ=100 and the default TX coalescing settings
> (tx-frames 25 tx-usecs 1000), with "tc qdisc add dev eth0 root netem
> delay 60ms 40ms rate 50Mbit" run on the server, netperf's TCP_STREAM
> improves from ~5.5 Mbps to ~100 Mbps.
> 
> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>

Looks perfectly reasonable, but you marked it for net. Do you consider
this to be a bug fix, and need it backported to stable? Otherwise I'd
rather apply it to net-next.

^ permalink raw reply

* Re: [PATCH net] vsock/virtio: discard packets only when socket is really closed
From: patchwork-bot+netdevbpf @ 2020-11-24  0:50 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, slp, davem, justin.he, kvm, linux-kernel, stefanha,
	virtualization, kuba
In-Reply-To: <20201120104736.73749-1-sgarzare@redhat.com>

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Fri, 20 Nov 2020 11:47:36 +0100 you wrote:
> Starting from commit 8692cefc433f ("virtio_vsock: Fix race condition
> in virtio_transport_recv_pkt"), we discard packets in
> virtio_transport_recv_pkt() if the socket has been released.
> 
> When the socket is connected, we schedule a delayed work to wait the
> RST packet from the other peer, also if SHUTDOWN_MASK is set in
> sk->sk_shutdown.
> This is done to complete the virtio-vsock shutdown algorithm, releasing
> the port assigned to the socket definitively only when the other peer
> has consumed all the packets.
> 
> [...]

Here is the summary with links:
  - [net] vsock/virtio: discard packets only when socket is really closed
    https://git.kernel.org/netdev/net/c/3fe356d58efa

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v8] tcp: fix race condition when creating child sockets from syncookies
From: patchwork-bot+netdevbpf @ 2020-11-24  0:50 UTC (permalink / raw)
  To: Ricardo Dias
  Cc: davem, kuba, kuznet, yoshfuji, edumazet, netdev, linux-kernel
In-Reply-To: <20201120111133.GA67501@rdias-suse-pc.lan>

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Fri, 20 Nov 2020 11:11:33 +0000 you wrote:
> When the TCP stack is in SYN flood mode, the server child socket is
> created from the SYN cookie received in a TCP packet with the ACK flag
> set.
> 
> The child socket is created when the server receives the first TCP
> packet with a valid SYN cookie from the client. Usually, this packet
> corresponds to the final step of the TCP 3-way handshake, the ACK
> packet. But is also possible to receive a valid SYN cookie from the
> first TCP data packet sent by the client, and thus create a child socket
> from that SYN cookie.
> 
> [...]

Here is the summary with links:
  - [v8] tcp: fix race condition when creating child sockets from syncookies
    https://git.kernel.org/netdev/net/c/01770a166165

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v2 1/1] ath10k: add option for chip-id based BDF selection
From: Doug Anderson @ 2020-11-24  0:56 UTC (permalink / raw)
  To: Abhishek Kumar
  Cc: Kalle Valo, Rakesh Pillai, LKML, ath10k, Brian Norris,
	linux-wireless, David S. Miller, Jakub Kicinski, netdev
In-Reply-To: <20201112200856.v2.1.Ia526132a366886e3b5cf72433d0d58bb7bb1be0f@changeid>

Hi,

On Thu, Nov 12, 2020 at 12:09 PM Abhishek Kumar <kuabhs@chromium.org> wrote:
>
> In some devices difference in chip-id should be enough to pick
> the right BDF. Add another support for chip-id based BDF selection.
> With this new option, ath10k supports 2 fallback options.
>
> The board name with chip-id as option looks as follows
> board name 'bus=snoc,qmi-board-id=ff,qmi-chip-id=320'
>
> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2-00696-QCAHLSWMTPL-1
> Tested-on: QCA6174 HW3.2 WLAN.RM.4.4.1-00157-QCARMSWPZ-1
> Signed-off-by: Abhishek Kumar <kuabhs@chromium.org>
> ---
>
> (no changes since v1)

I think you need to work on the method you're using to generate your
patches.  There are most definitely changes since v1.  You described
them in your cover letter (which you don't really need for a singleton
patch) instead of here.


> @@ -1438,12 +1439,17 @@ static int ath10k_core_create_board_name(struct ath10k *ar, char *name,
>         }
>
>         if (ar->id.qmi_ids_valid) {
> -               if (with_variant && ar->id.bdf_ext[0] != '\0')
> +               if (with_additional_params && ar->id.bdf_ext[0] != '\0')
>                         scnprintf(name, name_len,
>                                   "bus=%s,qmi-board-id=%x,qmi-chip-id=%x%s",
>                                   ath10k_bus_str(ar->hif.bus),
>                                   ar->id.qmi_board_id, ar->id.qmi_chip_id,
>                                   variant);
> +               else if (with_additional_params)
> +                       scnprintf(name, name_len,
> +                                 "bus=%s,qmi-board-id=%x,qmi-chip-id=%x",
> +                                 ath10k_bus_str(ar->hif.bus),
> +                                 ar->id.qmi_board_id, ar->id.qmi_chip_id);

I believe this is exactly opposite of what Rakesh was requesting.
Specifically, he was trying to eliminate the extra scnprintf() but I
think he still agreed that it was a good idea to generate 3 different
strings.  I believe the proper diff to apply to v1 is:

https://crrev.com/c/255643

-Doug

^ permalink raw reply

* Re: [PATCH 000/141] Fix fall-through warnings for Clang
From: Finn Thain @ 2020-11-24  0:58 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: James Bottomley, Kees Cook, Jakub Kicinski, Gustavo A. R. Silva,
	linux-kernel, alsa-devel, amd-gfx, bridge, ceph-devel,
	cluster-devel, coreteam, devel, dm-devel, drbd-dev, dri-devel,
	GR-everest-linux-l2, GR-Linux-NIC-Dev, intel-gfx, intel-wired-lan,
	keyrings, linux1394-devel, linux-acpi, linux-afs, Linux ARM,
	linux-arm-msm, linux-atm-general, linux-block, linux-can,
	linux-cifs, Linux Crypto Mailing List, linux-decnet-user,
	Ext4 Developers List, linux-fbdev, linux-geode, linux-gpio,
	linux-hams, linux-hwmon, linux-i3c, linux-ide, linux-iio,
	linux-input, linux-integrity, linux-mediatek,
	Linux Media Mailing List, linux-mmc, Linux-MM, linux-mtd,
	linux-nfs, linux-rdma, linux-renesas-soc, linux-scsi, linux-sctp,
	linux-security-module, linux-stm32, linux-usb, linux-watchdog,
	linux-wireless, Network Development, netfilter-devel, nouveau,
	op-tee, oss-drivers, patches, rds-devel, reiserfs-devel,
	samba-technical, selinux, target-devel, tipc-discussion,
	usb-storage, virtualization, wcn36xx,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), xen-devel,
	linux-hardening, Nick Desaulniers, Nathan Chancellor,
	Miguel Ojeda, Joe Perches
In-Reply-To: <CANiq72=z+tmuey9wj3Kk7wX5s0hTHpsQdLhAqcOVNrHon6xn5Q@mail.gmail.com>


On Mon, 23 Nov 2020, Miguel Ojeda wrote:

> On Mon, 23 Nov 2020, Finn Thain wrote:
> 
> > On Sun, 22 Nov 2020, Miguel Ojeda wrote:
> > 
> > > 
> > > It isn't that much effort, isn't it? Plus we need to take into 
> > > account the future mistakes that it might prevent, too.
> > 
> > We should also take into account optimisim about future improvements 
> > in tooling.
> > 
> Not sure what you mean here. There is no reliable way to guess what the 
> intention was with a missing fallthrough, even if you parsed whitespace 
> and indentation.
> 

What I meant was that you've used pessimism as if it was fact.

For example, "There is no way to guess what the effect would be if the 
compiler trained programmers to add a knee-jerk 'break' statement to avoid 
a warning".

Moreover, what I meant was that preventing programmer mistakes is a 
problem to be solved by development tools. The idea that retro-fitting new 
language constructs onto mature code is somehow necessary to "prevent 
future mistakes" is entirely questionable.

> > > So even if there were zero problems found so far, it is still a 
> > > positive change.
> > > 
> > 
> > It is if you want to spin it that way.
> > 
> How is that a "spin"? It is a fact that we won't get *implicit* 
> fallthrough mistakes anymore (in particular if we make it a hard error).
> 

Perhaps "handwaving" is a better term?

> > > I would agree if these changes were high risk, though; but they are 
> > > almost trivial.
> > > 
> > 
> > This is trivial:
> > 
> >  case 1:
> >         this();
> > +       fallthrough;
> >  case 2:
> >         that();
> > 
> > But what we inevitably get is changes like this:
> > 
> >  case 3:
> >         this();
> > +       break;
> >  case 4:
> >         hmmm();
> > 
> > Why? Mainly to silence the compiler. Also because the patch author 
> > argued successfully that they had found a theoretical bug, often in 
> > mature code.
> > 
> If someone changes control flow, that is on them. Every kernel developer 
> knows what `break` does.
> 

Sure. And if you put -Wimplicit-fallthrough into the Makefile and if that 
leads to well-intentioned patches that cause regressions, it is partly on 
you.

Have you ever considered the overall cost of the countless 
-Wpresume-incompetence flags?

Perhaps you pay the power bill for a build farm that produces logs that 
no-one reads? Perhaps you've run git bisect, knowing that the compiler 
messages are not interesting? Or compiled software in using a language 
that generates impenetrable messages? If so, here's a tip:

# grep CFLAGS /etc/portage/make.conf 
CFLAGS="... -Wno-all -Wno-extra ..."
CXXFLAGS="${CFLAGS}"

Now allow me some pessimism: the hardware upgrades, gigawatt hours and 
wait time attributable to obligatory static analyses are a net loss.

> > But is anyone keeping score of the regressions? If unreported bugs 
> > count, what about unreported regressions?
> > 
> Introducing `fallthrough` does not change semantics. If you are really 
> keen, you can always compare the objects because the generated code 
> shouldn't change.
> 

No, it's not for me to prove that such patches don't affect code 
generation. That's for the patch author and (unfortunately) for reviewers.

> Cheers,
> Miguel
> 

^ permalink raw reply

* Re: [PATCH net-next 0/2] net: dsa: hellcreek: Minor cleanups
From: patchwork-bot+netdevbpf @ 2020-11-24  1:00 UTC (permalink / raw)
  To: Kurt Kanzenbach
  Cc: andrew, vivien.didelot, f.fainelli, olteanv, davem, kuba, netdev
In-Reply-To: <20201121114455.22422-1-kurt@linutronix.de>

Hello:

This series was applied to netdev/net-next.git (refs/heads/master):

On Sat, 21 Nov 2020 12:44:53 +0100 you wrote:
> Hi,
> 
> fix two minor issues in the hellcreek driver.
> 
> Thanks,
> Kurt
> 
> [...]

Here is the summary with links:
  - [net-next,1/2] net: dsa: tag_hellcreek: Cleanup includes
    https://git.kernel.org/netdev/net-next/c/8551fad63cd3
  - [net-next,2/2] net: dsa: hellcreek: Don't print error message on defer
    https://git.kernel.org/netdev/net-next/c/ed5ef9fb2023

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net 1/1] i40e: Fix removing driver while bare-metal VFs pass traffic
From: patchwork-bot+netdevbpf @ 2020-11-24  1:00 UTC (permalink / raw)
  To: Nguyen, Anthony L
  Cc: davem, kuba, sylwesterx.dziedziuch, netdev, sassmann,
	slawomirx.laba, brett.creeley, konrad0.jankowski
In-Reply-To: <20201120180640.3654474-1-anthony.l.nguyen@intel.com>

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Fri, 20 Nov 2020 10:06:40 -0800 you wrote:
> From: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
> 
> Prevent VFs from resetting when PF driver is being unloaded:
> - introduce new pf state: __I40E_VF_RESETS_DISABLED;
> - check if pf state has __I40E_VF_RESETS_DISABLED state set,
>   if so, disable any further VFLR event notifications;
> - when i40e_remove (rmmod i40e) is called, disable any resets on
>   the VFs;
> 
> [...]

Here is the summary with links:
  - [net,1/1] i40e: Fix removing driver while bare-metal VFs pass traffic
    https://git.kernel.org/netdev/net/c/2980cbd4dce7

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox