Netdev List
 help / color / mirror / Atom feed
* RE: [PATCH v4 net-next 4/6] net: dsa: sja1105: Advertise the 8 TX queues
From: Jose Abreu @ 2019-09-15 16:41 UTC (permalink / raw)
  To: Vladimir Oltean, f.fainelli@gmail.com, vivien.didelot@gmail.com,
	andrew@lunn.ch, davem@davemloft.net, vinicius.gomes@intel.com,
	vedang.patel@intel.com, richardcochran@gmail.com
  Cc: weifeng.voon@intel.com, jiri@mellanox.com, m-karicheri2@ti.com,
	jose.abreu@synopsys.com, ilias.apalodimas@linaro.org,
	jhs@mojatatu.com, xiyou.wangcong@gmail.com,
	kurt.kanzenbach@linutronix.de, joergen.andreasen@microchip.com,
	netdev@vger.kernel.org
In-Reply-To: <20190915020003.27926-5-olteanv@gmail.com>

From: Vladimir Oltean <olteanv@gmail.com>
Date: Sep/15/2019, 03:00:01 (UTC+00:00)

> Instead of looking directly at skb->priority during xmit, let's get the
> netdev queue and the queue-to-traffic-class mapping, and put the
> resulting traffic class into the dsa_8021q PCP field. The switch is
> configured with a 1-to-1 PCP-to-ingress-queue-to-egress-queue mapping
> (see vlan_pmap in sja1105_main.c), so the effect is that we can inject
> into a front-panel's egress traffic class through VLAN tagging from
> Linux, completely transparently.

Wouldn't it be better to just rely on skb queue mapping as per userspace 
settings ? I mean this way you cant create some complex scenarios 
without having the VLAN ID need.

I generally use u32 filter and skbedit queue_mapping action to achieve 
this.

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* [PATCH V1 net-next 5/5] net: ena: ethtool: support set_channels callback
From: sameehj @ 2019-09-15 15:27 UTC (permalink / raw)
  To: davem, netdev
  Cc: Sameeh Jubran, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, akiyano
In-Reply-To: <20190915152722.8240-1-sameehj@amazon.com>

From: Sameeh Jubran <sameehj@amazon.com>

Set channels callback enables the user to change the count of queues
used by the driver using ethtool. We decided to currently support only
equal number of rx and tx queues, this might change in the future.

Also rename dev_up to dev_was_up in ena_update_queue_count() to make
it clearer.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_ethtool.c | 17 ++++++++++++++
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 22 ++++++++++++++++---
 drivers/net/ethernet/amazon/ena/ena_netdev.h  |  3 +++
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index c9d760465..f58fc3c68 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -744,6 +744,22 @@ static void ena_get_channels(struct net_device *netdev,
 	channels->combined_count = 0;
 }
 
+static int ena_set_channels(struct net_device *netdev,
+			    struct ethtool_channels *channels)
+{
+	struct ena_adapter *adapter = netdev_priv(netdev);
+	u32 new_channel_count;
+
+	if (channels->rx_count != channels->tx_count ||
+	    channels->max_tx != channels->max_rx)
+		return -EINVAL;
+
+	new_channel_count = clamp_val(channels->tx_count,
+				      ENA_MIN_NUM_IO_QUEUES, channels->max_tx);
+
+	return ena_update_queue_count(adapter, new_channel_count);
+}
+
 static int ena_get_tunable(struct net_device *netdev,
 			   const struct ethtool_tunable *tuna, void *data)
 {
@@ -807,6 +823,7 @@ static const struct ethtool_ops ena_ethtool_ops = {
 	.get_rxfh		= ena_get_rxfh,
 	.set_rxfh		= ena_set_rxfh,
 	.get_channels		= ena_get_channels,
+	.set_channels		= ena_set_channels,
 	.get_tunable		= ena_get_tunable,
 	.set_tunable		= ena_set_tunable,
 };
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 9d3abb184..5844ba07a 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -2051,14 +2051,30 @@ int ena_update_queue_sizes(struct ena_adapter *adapter,
 			   u32 new_tx_size,
 			   u32 new_rx_size)
 {
-	bool dev_up;
+	bool dev_was_up;
 
-	dev_up = test_bit(ENA_FLAG_DEV_UP, &adapter->flags);
+	dev_was_up = test_bit(ENA_FLAG_DEV_UP, &adapter->flags);
 	ena_close(adapter->netdev);
 	adapter->requested_tx_ring_size = new_tx_size;
 	adapter->requested_rx_ring_size = new_rx_size;
 	ena_init_io_rings(adapter);
-	return dev_up ? ena_up(adapter) : 0;
+	return dev_was_up ? ena_up(adapter) : 0;
+}
+
+int ena_update_queue_count(struct ena_adapter *adapter, u32 new_channel_count)
+{
+	struct ena_com_dev *ena_dev = adapter->ena_dev;
+	bool dev_was_up;
+
+	dev_was_up = test_bit(ENA_FLAG_DEV_UP, &adapter->flags);
+	ena_close(adapter->netdev);
+	adapter->num_io_queues = new_channel_count;
+       /* We need to destroy the rss table so that the indirection
+	* table will be reinitialized by ena_up()
+	*/
+	ena_com_rss_destroy(ena_dev);
+	ena_init_io_rings(adapter);
+	return dev_was_up ? ena_open(adapter->netdev) : 0;
 }
 
 static void ena_tx_csum(struct ena_com_tx_ctx *ena_tx_ctx, struct sk_buff *skb)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 7499afb58..bffd778f2 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -82,6 +82,8 @@
 #define ENA_DEFAULT_RING_SIZE	(1024)
 #define ENA_MIN_RING_SIZE	(256)
 
+#define ENA_MIN_NUM_IO_QUEUES	(1)
+
 #define ENA_TX_WAKEUP_THRESH		(MAX_SKB_FRAGS + 2)
 #define ENA_DEFAULT_RX_COPYBREAK	(256 - NET_IP_ALIGN)
 
@@ -388,6 +390,7 @@ void ena_dump_stats_to_buf(struct ena_adapter *adapter, u8 *buf);
 int ena_update_queue_sizes(struct ena_adapter *adapter,
 			   u32 new_tx_size,
 			   u32 new_rx_size);
+int ena_update_queue_count(struct ena_adapter *adapter, u32 new_channel_count);
 
 int ena_get_sset_count(struct net_device *netdev, int sset);
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH V1 net-next 4/5] net: ena:remove redundant print of number of queues and placement policy
From: sameehj @ 2019-09-15 15:27 UTC (permalink / raw)
  To: davem, netdev
  Cc: Sameeh Jubran, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, akiyano
In-Reply-To: <20190915152722.8240-1-sameehj@amazon.com>

From: Sameeh Jubran <sameehj@amazon.com>

The number of queues and the placement policy are printed in the process
of queue creation in ena_up(). No need to print them in ena_probe()

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index a98422667..9d3abb184 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3441,7 +3441,6 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct net_device *netdev;
 	static int adapters_found;
 	u32 max_num_io_queues;
-	char *queue_type_str;
 	bool wd_state;
 	int bars, rc;
 
@@ -3607,15 +3606,10 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	timer_setup(&adapter->timer_service, ena_timer_service, 0);
 	mod_timer(&adapter->timer_service, round_jiffies(jiffies + HZ));
 
-	if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_HOST)
-		queue_type_str = "Regular";
-	else
-		queue_type_str = "Low Latency";
-
 	dev_info(&pdev->dev,
-		 "%s found at mem %lx, mac addr %pM Queues %d, Placement policy: %s\n",
+		 "%s found at mem %lx, mac addr %pM\n",
 		 DEVICE_NAME, (long)pci_resource_start(pdev, 0),
-		 netdev->dev_addr, max_num_io_queues, queue_type_str);
+		 netdev->dev_addr);
 
 	set_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags);
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH V1 net-next 3/5] make ethtool -l show correct max number of queues
From: sameehj @ 2019-09-15 15:27 UTC (permalink / raw)
  To: davem, netdev
  Cc: Sameeh Jubran, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, akiyano
In-Reply-To: <20190915152722.8240-1-sameehj@amazon.com>

From: Sameeh Jubran <sameehj@amazon.com>

- Update ena_ethtool:ena_get_channels() to return adapter->max_io_queues
  so that ethtool -l returns the correct maximum queue number.

- Change the name of ena_calc_io_queue_num() to
  ena_calc_max_io_queue_num() as it returns the maximum number of io
  queues and actual number of queues can be smaller if changed
  by ethtool -L which is implemented in a later commit.

- Change variable name from io_queue_num to max_num_io_queues in
  ena_calc_max_io_queue_num() and ena_probe().

- Make all types of variables that convey the number and sizeof queues
  to be u32, for consistency with the API between the driver and the
  device.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_ethtool.c |  4 +-
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 41 ++++++++++---------
 drivers/net/ethernet/amazon/ena/ena_netdev.h  | 11 ++---
 3 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index 3ad661fd9..c9d760465 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -734,8 +734,8 @@ static void ena_get_channels(struct net_device *netdev,
 {
 	struct ena_adapter *adapter = netdev_priv(netdev);
 
-	channels->max_rx = adapter->num_io_queues;
-	channels->max_tx = adapter->num_io_queues;
+	channels->max_rx = adapter->max_num_io_queues;
+	channels->max_tx = adapter->max_num_io_queues;
 	channels->max_other = 0;
 	channels->max_combined = 0;
 	channels->rx_count = adapter->num_io_queues;
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index b7cd80c5f..a98422667 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3143,16 +3143,16 @@ static void ena_timer_service(struct timer_list *t)
 	mod_timer(&adapter->timer_service, jiffies + HZ);
 }
 
-static int ena_calc_io_queue_num(struct pci_dev *pdev,
-				 struct ena_com_dev *ena_dev,
-				 struct ena_com_dev_get_features_ctx *get_feat_ctx)
+static int ena_calc_max_io_queue_num(struct pci_dev *pdev,
+				     struct ena_com_dev *ena_dev,
+				     struct ena_com_dev_get_features_ctx *get_feat_ctx)
 {
-	int io_tx_sq_num, io_tx_cq_num, io_rx_num, io_queue_num;
+	int io_tx_sq_num, io_tx_cq_num, io_rx_num, max_num_io_queues;
 
 	if (ena_dev->supported_features & BIT(ENA_ADMIN_MAX_QUEUES_EXT)) {
 		struct ena_admin_queue_ext_feature_fields *max_queue_ext =
 			&get_feat_ctx->max_queue_ext.max_queue_ext;
-		io_rx_num = min_t(int, max_queue_ext->max_rx_sq_num,
+		io_rx_num = min_t(u32, max_queue_ext->max_rx_sq_num,
 				  max_queue_ext->max_rx_cq_num);
 
 		io_tx_sq_num = max_queue_ext->max_tx_sq_num;
@@ -3162,25 +3162,25 @@ static int ena_calc_io_queue_num(struct pci_dev *pdev,
 			&get_feat_ctx->max_queues;
 		io_tx_sq_num = max_queues->max_sq_num;
 		io_tx_cq_num = max_queues->max_cq_num;
-		io_rx_num = min_t(int, io_tx_sq_num, io_tx_cq_num);
+		io_rx_num = min_t(u32, io_tx_sq_num, io_tx_cq_num);
 	}
 
 	/* In case of LLQ use the llq fields for the tx SQ/CQ */
 	if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV)
 		io_tx_sq_num = get_feat_ctx->llq.max_llq_num;
 
-	io_queue_num = min_t(int, num_online_cpus(), ENA_MAX_NUM_IO_QUEUES);
-	io_queue_num = min_t(int, io_queue_num, io_rx_num);
-	io_queue_num = min_t(int, io_queue_num, io_tx_sq_num);
-	io_queue_num = min_t(int, io_queue_num, io_tx_cq_num);
+	max_num_io_queues = min_t(u32, num_online_cpus(), ENA_MAX_NUM_IO_QUEUES);
+	max_num_io_queues = min_t(u32, max_num_io_queues, io_rx_num);
+	max_num_io_queues = min_t(u32, max_num_io_queues, io_tx_sq_num);
+	max_num_io_queues = min_t(u32, max_num_io_queues, io_tx_cq_num);
 	/* 1 IRQ for for mgmnt and 1 IRQs for each IO direction */
-	io_queue_num = min_t(int, io_queue_num, pci_msix_vec_count(pdev) - 1);
-	if (unlikely(!io_queue_num)) {
+	max_num_io_queues = min_t(u32, max_num_io_queues, pci_msix_vec_count(pdev) - 1);
+	if (unlikely(!max_num_io_queues)) {
 		dev_err(&pdev->dev, "The device doesn't have io queues\n");
 		return -EFAULT;
 	}
 
-	return io_queue_num;
+	return max_num_io_queues;
 }
 
 static int ena_set_queues_placement_policy(struct pci_dev *pdev,
@@ -3438,11 +3438,12 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct ena_llq_configurations llq_config;
 	struct ena_com_dev *ena_dev = NULL;
 	struct ena_adapter *adapter;
-	int io_queue_num, bars, rc;
 	struct net_device *netdev;
 	static int adapters_found;
+	u32 max_num_io_queues;
 	char *queue_type_str;
 	bool wd_state;
+	int bars, rc;
 
 	dev_dbg(&pdev->dev, "%s\n", __func__);
 
@@ -3508,15 +3509,15 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	ena_dev->intr_moder_tx_interval = ENA_INTR_INITIAL_TX_INTERVAL_USECS;
 	ena_dev->intr_moder_rx_interval = ENA_INTR_INITIAL_RX_INTERVAL_USECS;
 	ena_dev->intr_delay_resolution = ENA_DEFAULT_INTR_DELAY_RESOLUTION;
-	io_queue_num = ena_calc_io_queue_num(pdev, ena_dev, &get_feat_ctx);
+	max_num_io_queues = ena_calc_max_io_queue_num(pdev, ena_dev, &get_feat_ctx);
 	rc = ena_calc_io_queue_size(&calc_queue_ctx);
-	if (rc || io_queue_num <= 0) {
+	if (rc || !max_num_io_queues) {
 		rc = -EFAULT;
 		goto err_device_destroy;
 	}
 
 	/* dev zeroed in init_etherdev */
-	netdev = alloc_etherdev_mq(sizeof(struct ena_adapter), io_queue_num);
+	netdev = alloc_etherdev_mq(sizeof(struct ena_adapter), max_num_io_queues);
 	if (!netdev) {
 		dev_err(&pdev->dev, "alloc_etherdev_mq failed\n");
 		rc = -ENOMEM;
@@ -3544,7 +3545,9 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	adapter->max_tx_sgl_size = calc_queue_ctx.max_tx_sgl_size;
 	adapter->max_rx_sgl_size = calc_queue_ctx.max_rx_sgl_size;
 
-	adapter->num_io_queues = io_queue_num;
+	adapter->num_io_queues = max_num_io_queues;
+	adapter->max_num_io_queues = max_num_io_queues;
+
 	adapter->last_monitored_tx_qid = 0;
 
 	adapter->rx_copybreak = ENA_DEFAULT_RX_COPYBREAK;
@@ -3612,7 +3615,7 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dev_info(&pdev->dev,
 		 "%s found at mem %lx, mac addr %pM Queues %d, Placement policy: %s\n",
 		 DEVICE_NAME, (long)pci_resource_start(pdev, 0),
-		 netdev->dev_addr, io_queue_num, queue_type_str);
+		 netdev->dev_addr, max_num_io_queues, queue_type_str);
 
 	set_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags);
 
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 7e8e51e32..7499afb58 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -161,10 +161,10 @@ struct ena_calc_queue_size_ctx {
 	struct ena_com_dev_get_features_ctx *get_feat_ctx;
 	struct ena_com_dev *ena_dev;
 	struct pci_dev *pdev;
-	u16 tx_queue_size;
-	u16 rx_queue_size;
-	u16 max_tx_queue_size;
-	u16 max_rx_queue_size;
+	u32 tx_queue_size;
+	u32 rx_queue_size;
+	u32 max_tx_queue_size;
+	u32 max_rx_queue_size;
 	u16 max_tx_sgl_size;
 	u16 max_rx_sgl_size;
 };
@@ -324,7 +324,8 @@ struct ena_adapter {
 	u32 rx_copybreak;
 	u32 max_mtu;
 
-	int num_io_queues;
+	u32 num_io_queues;
+	u32 max_num_io_queues;
 
 	int msix_vecs;
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH V1 net-next 2/5] net: ena: multiple queue creation related cleanups
From: sameehj @ 2019-09-15 15:27 UTC (permalink / raw)
  To: davem, netdev
  Cc: Sameeh Jubran, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, akiyano
In-Reply-To: <20190915152722.8240-1-sameehj@amazon.com>

From: Sameeh Jubran <sameehj@amazon.com>

- Move the print to dmesg of creating io queues from ena_probe to
  ena_up. ena_up is the place where queues are actually created.
- Rename ena_calc_queue_size() to ena_calc_io_queue_size() for clarity
  and consistency
- Remove redundant number of io queues parameter in functions
  ena_enable_msix() and ena_enable_msix_and_set_admin_interrupts(),
  which already get adapter parameter, so use adapter->num_io_queues
  in the function instead.
- Use the local variable ena_dev instead of ctx->ena_dev in
  ena_calc_io_queue_size
- Fix multi row comment alignments

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 40 ++++++++++----------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index d4d2639d2..b7cd80c5f 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1332,7 +1332,7 @@ static irqreturn_t ena_intr_msix_io(int irq, void *data)
  * the number of potential io queues is the minimum of what the device
  * supports and the number of vCPUs.
  */
-static int ena_enable_msix(struct ena_adapter *adapter, int num_queues)
+static int ena_enable_msix(struct ena_adapter *adapter)
 {
 	int msix_vecs, irq_cnt;
 
@@ -1343,7 +1343,7 @@ static int ena_enable_msix(struct ena_adapter *adapter, int num_queues)
 	}
 
 	/* Reserved the max msix vectors we might need */
-	msix_vecs = ENA_MAX_MSIX_VEC(num_queues);
+	msix_vecs = ENA_MAX_MSIX_VEC(adapter->num_io_queues);
 	netif_dbg(adapter, probe, adapter->netdev,
 		  "trying to enable MSI-X, vectors %d\n", msix_vecs);
 
@@ -1885,6 +1885,13 @@ static int ena_up(struct ena_adapter *adapter)
 	if (rc)
 		goto err_req_irq;
 
+	netif_info(adapter, ifup, adapter->netdev, "creating %d io queues. rx queue size: %d tx queue size. %d LLQ is %s\n",
+		   adapter->num_io_queues,
+		   adapter->requested_rx_ring_size,
+		   adapter->requested_tx_ring_size,
+		   (adapter->ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) ?
+		   "ENABLED" : "DISABLED");
+
 	rc = create_queues_with_size_backoff(adapter);
 	if (rc)
 		goto err_create_queues_with_backoff;
@@ -2683,14 +2690,13 @@ err_mmio_read_less:
 	return rc;
 }
 
-static int ena_enable_msix_and_set_admin_interrupts(struct ena_adapter *adapter,
-						    int io_vectors)
+static int ena_enable_msix_and_set_admin_interrupts(struct ena_adapter *adapter)
 {
 	struct ena_com_dev *ena_dev = adapter->ena_dev;
 	struct device *dev = &adapter->pdev->dev;
 	int rc;
 
-	rc = ena_enable_msix(adapter, io_vectors);
+	rc = ena_enable_msix(adapter);
 	if (rc) {
 		dev_err(dev, "Can not reserve msix vectors\n");
 		return rc;
@@ -2783,8 +2789,7 @@ static int ena_restore_device(struct ena_adapter *adapter)
 		goto err_device_destroy;
 	}
 
-	rc = ena_enable_msix_and_set_admin_interrupts(adapter,
-						      adapter->num_io_queues);
+	rc = ena_enable_msix_and_set_admin_interrupts(adapter);
 	if (rc) {
 		dev_err(&pdev->dev, "Enable MSI-X failed\n");
 		goto err_device_destroy;
@@ -3350,7 +3355,7 @@ static void set_default_llq_configurations(struct ena_llq_configurations *llq_co
 	llq_config->llq_ring_entry_size_value = 128;
 }
 
-static int ena_calc_queue_size(struct ena_calc_queue_size_ctx *ctx)
+static int ena_calc_io_queue_size(struct ena_calc_queue_size_ctx *ctx)
 {
 	struct ena_admin_feature_llq_desc *llq = &ctx->get_feat_ctx->llq;
 	struct ena_com_dev *ena_dev = ctx->ena_dev;
@@ -3359,7 +3364,7 @@ static int ena_calc_queue_size(struct ena_calc_queue_size_ctx *ctx)
 	u32 max_tx_queue_size;
 	u32 max_rx_queue_size;
 
-	if (ctx->ena_dev->supported_features & BIT(ENA_ADMIN_MAX_QUEUES_EXT)) {
+	if (ena_dev->supported_features & BIT(ENA_ADMIN_MAX_QUEUES_EXT)) {
 		struct ena_admin_queue_ext_feature_fields *max_queue_ext =
 			&ctx->get_feat_ctx->max_queue_ext.max_queue_ext;
 		max_rx_queue_size = min_t(u32, max_queue_ext->max_rx_cq_depth,
@@ -3497,26 +3502,19 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	calc_queue_ctx.get_feat_ctx = &get_feat_ctx;
 	calc_queue_ctx.pdev = pdev;
 
-	/* Initial Tx and RX interrupt delay. Assumes 1 usec granularity.
-	* Updated during device initialization with the real granularity
-	*/
+	/* initial Tx interrupt delay, Assumes 1 usec granularity.
+	 * Updated during device initialization with the real granularity
+	 */
 	ena_dev->intr_moder_tx_interval = ENA_INTR_INITIAL_TX_INTERVAL_USECS;
 	ena_dev->intr_moder_rx_interval = ENA_INTR_INITIAL_RX_INTERVAL_USECS;
 	ena_dev->intr_delay_resolution = ENA_DEFAULT_INTR_DELAY_RESOLUTION;
 	io_queue_num = ena_calc_io_queue_num(pdev, ena_dev, &get_feat_ctx);
-	rc = ena_calc_queue_size(&calc_queue_ctx);
+	rc = ena_calc_io_queue_size(&calc_queue_ctx);
 	if (rc || io_queue_num <= 0) {
 		rc = -EFAULT;
 		goto err_device_destroy;
 	}
 
-	dev_info(&pdev->dev, "creating %d io queues. rx queue size: %d tx queue size. %d LLQ is %s\n",
-		 io_queue_num,
-		 calc_queue_ctx.rx_queue_size,
-		 calc_queue_ctx.tx_queue_size,
-		 (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) ?
-		 "ENABLED" : "DISABLED");
-
 	/* dev zeroed in init_etherdev */
 	netdev = alloc_etherdev_mq(sizeof(struct ena_adapter), io_queue_num);
 	if (!netdev) {
@@ -3570,7 +3568,7 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	u64_stats_init(&adapter->syncp);
 
-	rc = ena_enable_msix_and_set_admin_interrupts(adapter, io_queue_num);
+	rc = ena_enable_msix_and_set_admin_interrupts(adapter);
 	if (rc) {
 		dev_err(&pdev->dev,
 			"Failed to enable and set the admin interrupts\n");
-- 
2.17.1


^ permalink raw reply related

* [PATCH V1 net-next 0/5] Introduce ethtool's set_channels
From: sameehj @ 2019-09-15 15:27 UTC (permalink / raw)
  To: davem, netdev
  Cc: Sameeh Jubran, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, akiyano

From: Sameeh Jubran <sameehj@amazon.com>

This patch series introduces the support of "ethtool --set-channels/-L"
command to the ena driver.

This series is also a preparation for the upcoming xdp support in the ena
driver.

This patch series has been rebased over the series:
"net: ena: implement adaptive interrupt moderation using dim"

Sameeh Jubran (5):
  net: ena: change num_queues to num_io_queues for clarity and
    consistency
  net: ena: multiple queue creation related cleanups
  make ethtool -l show correct max number of queues
  net: ena:remove redundant print of number of queues and placement
    policy
  net: ena: ethtool: support set_channels callback

 drivers/net/ethernet/amazon/ena/ena_ethtool.c |  37 +++-
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 173 ++++++++++--------
 drivers/net/ethernet/amazon/ena/ena_netdev.h  |  14 +-
 3 files changed, 128 insertions(+), 96 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [PATCH V1 net-next 1/5] net: ena: change num_queues to num_io_queues for clarity and consistency
From: sameehj @ 2019-09-15 15:27 UTC (permalink / raw)
  To: davem, netdev
  Cc: Sameeh Jubran, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, akiyano
In-Reply-To: <20190915152722.8240-1-sameehj@amazon.com>

From: Sameeh Jubran <sameehj@amazon.com>

Most places in the code refer to the IO queues as io_queues and not
simply queues. Examples - max_io_queues_per_vf, ENA_MAX_NUM_IO_QUEUES,
ena_destroy_all_io_queues() etc..

We are also adding the new max_num_io_queues field to struct ena_adapter
in the following commit.

The changes included in this commit are:
struct ena_adapter->num_queues => struct ena_adapter->num_io_queues

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_ethtool.c | 20 +++---
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 66 +++++++++----------
 drivers/net/ethernet/amazon/ena/ena_netdev.h  |  2 +-
 3 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index 16553d92f..3ad661fd9 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -133,7 +133,7 @@ static void ena_queue_stats(struct ena_adapter *adapter, u64 **data)
 	u64 *ptr;
 	int i, j;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		/* Tx stats */
 		ring = &adapter->tx_ring[i];
 
@@ -205,7 +205,7 @@ int ena_get_sset_count(struct net_device *netdev, int sset)
 	if (sset != ETH_SS_STATS)
 		return -EOPNOTSUPP;
 
-	return  adapter->num_queues * (ENA_STATS_ARRAY_TX + ENA_STATS_ARRAY_RX)
+	return  adapter->num_io_queues * (ENA_STATS_ARRAY_TX + ENA_STATS_ARRAY_RX)
 		+ ENA_STATS_ARRAY_GLOBAL + ENA_STATS_ARRAY_ENA_COM;
 }
 
@@ -214,7 +214,7 @@ static void ena_queue_strings(struct ena_adapter *adapter, u8 **data)
 	const struct ena_stats *ena_stats;
 	int i, j;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		/* Tx stats */
 		for (j = 0; j < ENA_STATS_ARRAY_TX; j++) {
 			ena_stats = &ena_stats_tx_strings[j];
@@ -333,7 +333,7 @@ static void ena_update_tx_rings_intr_moderation(struct ena_adapter *adapter)
 
 	val = ena_com_get_nonadaptive_moderation_interval_tx(adapter->ena_dev);
 
-	for (i = 0; i < adapter->num_queues; i++)
+	for (i = 0; i < adapter->num_io_queues; i++)
 		adapter->tx_ring[i].smoothed_interval = val;
 }
 
@@ -344,7 +344,7 @@ static void ena_update_rx_rings_intr_moderation(struct ena_adapter *adapter)
 
 	val = ena_com_get_nonadaptive_moderation_interval_rx(adapter->ena_dev);
 
-	for (i = 0; i < adapter->num_queues; i++)
+	for (i = 0; i < adapter->num_io_queues; i++)
 		adapter->rx_ring[i].smoothed_interval = val;
 }
 
@@ -612,7 +612,7 @@ static int ena_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *info,
 
 	switch (info->cmd) {
 	case ETHTOOL_GRXRINGS:
-		info->data = adapter->num_queues;
+		info->data = adapter->num_io_queues;
 		rc = 0;
 		break;
 	case ETHTOOL_GRXFH:
@@ -734,12 +734,12 @@ static void ena_get_channels(struct net_device *netdev,
 {
 	struct ena_adapter *adapter = netdev_priv(netdev);
 
-	channels->max_rx = adapter->num_queues;
-	channels->max_tx = adapter->num_queues;
+	channels->max_rx = adapter->num_io_queues;
+	channels->max_tx = adapter->num_io_queues;
 	channels->max_other = 0;
 	channels->max_combined = 0;
-	channels->rx_count = adapter->num_queues;
-	channels->tx_count = adapter->num_queues;
+	channels->rx_count = adapter->num_io_queues;
+	channels->tx_count = adapter->num_io_queues;
 	channels->other_count = 0;
 	channels->combined_count = 0;
 }
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 2b79fb5f7..d4d2639d2 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -101,7 +101,7 @@ static void update_rx_ring_mtu(struct ena_adapter *adapter, int mtu)
 {
 	int i;
 
-	for (i = 0; i < adapter->num_queues; i++)
+	for (i = 0; i < adapter->num_io_queues; i++)
 		adapter->rx_ring[i].mtu = mtu;
 }
 
@@ -129,10 +129,10 @@ static int ena_init_rx_cpu_rmap(struct ena_adapter *adapter)
 	u32 i;
 	int rc;
 
-	adapter->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(adapter->num_queues);
+	adapter->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(adapter->num_io_queues);
 	if (!adapter->netdev->rx_cpu_rmap)
 		return -ENOMEM;
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		int irq_idx = ENA_IO_IRQ_IDX(i);
 
 		rc = irq_cpu_rmap_add(adapter->netdev->rx_cpu_rmap,
@@ -172,7 +172,7 @@ static void ena_init_io_rings(struct ena_adapter *adapter)
 
 	ena_dev = adapter->ena_dev;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		txr = &adapter->tx_ring[i];
 		rxr = &adapter->rx_ring[i];
 
@@ -294,7 +294,7 @@ static int ena_setup_all_tx_resources(struct ena_adapter *adapter)
 {
 	int i, rc = 0;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		rc = ena_setup_tx_resources(adapter, i);
 		if (rc)
 			goto err_setup_tx;
@@ -322,7 +322,7 @@ static void ena_free_all_io_tx_resources(struct ena_adapter *adapter)
 {
 	int i;
 
-	for (i = 0; i < adapter->num_queues; i++)
+	for (i = 0; i < adapter->num_io_queues; i++)
 		ena_free_tx_resources(adapter, i);
 }
 
@@ -428,7 +428,7 @@ static int ena_setup_all_rx_resources(struct ena_adapter *adapter)
 {
 	int i, rc = 0;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		rc = ena_setup_rx_resources(adapter, i);
 		if (rc)
 			goto err_setup_rx;
@@ -456,7 +456,7 @@ static void ena_free_all_io_rx_resources(struct ena_adapter *adapter)
 {
 	int i;
 
-	for (i = 0; i < adapter->num_queues; i++)
+	for (i = 0; i < adapter->num_io_queues; i++)
 		ena_free_rx_resources(adapter, i);
 }
 
@@ -600,7 +600,7 @@ static void ena_refill_all_rx_bufs(struct ena_adapter *adapter)
 	struct ena_ring *rx_ring;
 	int i, rc, bufs_num;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		rx_ring = &adapter->rx_ring[i];
 		bufs_num = rx_ring->ring_size - 1;
 		rc = ena_refill_rx_bufs(rx_ring, bufs_num);
@@ -616,7 +616,7 @@ static void ena_free_all_rx_bufs(struct ena_adapter *adapter)
 {
 	int i;
 
-	for (i = 0; i < adapter->num_queues; i++)
+	for (i = 0; i < adapter->num_io_queues; i++)
 		ena_free_rx_bufs(adapter, i);
 }
 
@@ -688,7 +688,7 @@ static void ena_free_all_tx_bufs(struct ena_adapter *adapter)
 	struct ena_ring *tx_ring;
 	int i;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		tx_ring = &adapter->tx_ring[i];
 		ena_free_tx_bufs(tx_ring);
 	}
@@ -699,7 +699,7 @@ static void ena_destroy_all_tx_queues(struct ena_adapter *adapter)
 	u16 ena_qid;
 	int i;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		ena_qid = ENA_IO_TXQ_IDX(i);
 		ena_com_destroy_io_queue(adapter->ena_dev, ena_qid);
 	}
@@ -710,7 +710,7 @@ static void ena_destroy_all_rx_queues(struct ena_adapter *adapter)
 	u16 ena_qid;
 	int i;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		ena_qid = ENA_IO_RXQ_IDX(i);
 		cancel_work_sync(&adapter->ena_napi[i].dim.work);
 		ena_com_destroy_io_queue(adapter->ena_dev, ena_qid);
@@ -1360,7 +1360,7 @@ static int ena_enable_msix(struct ena_adapter *adapter, int num_queues)
 		netif_notice(adapter, probe, adapter->netdev,
 			     "enable only %d MSI-X (out of %d), reduce the number of queues\n",
 			     irq_cnt, msix_vecs);
-		adapter->num_queues = irq_cnt - ENA_ADMIN_MSIX_VEC;
+		adapter->num_io_queues = irq_cnt - ENA_ADMIN_MSIX_VEC;
 	}
 
 	if (ena_init_rx_cpu_rmap(adapter))
@@ -1398,7 +1398,7 @@ static void ena_setup_io_intr(struct ena_adapter *adapter)
 
 	netdev = adapter->netdev;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		irq_idx = ENA_IO_IRQ_IDX(i);
 		cpu = i % num_online_cpus();
 
@@ -1530,7 +1530,7 @@ static void ena_del_napi(struct ena_adapter *adapter)
 {
 	int i;
 
-	for (i = 0; i < adapter->num_queues; i++)
+	for (i = 0; i < adapter->num_io_queues; i++)
 		netif_napi_del(&adapter->ena_napi[i].napi);
 }
 
@@ -1539,7 +1539,7 @@ static void ena_init_napi(struct ena_adapter *adapter)
 	struct ena_napi *napi;
 	int i;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		napi = &adapter->ena_napi[i];
 
 		netif_napi_add(adapter->netdev,
@@ -1556,7 +1556,7 @@ static void ena_napi_disable_all(struct ena_adapter *adapter)
 {
 	int i;
 
-	for (i = 0; i < adapter->num_queues; i++)
+	for (i = 0; i < adapter->num_io_queues; i++)
 		napi_disable(&adapter->ena_napi[i].napi);
 }
 
@@ -1564,7 +1564,7 @@ static void ena_napi_enable_all(struct ena_adapter *adapter)
 {
 	int i;
 
-	for (i = 0; i < adapter->num_queues; i++)
+	for (i = 0; i < adapter->num_io_queues; i++)
 		napi_enable(&adapter->ena_napi[i].napi);
 }
 
@@ -1674,7 +1674,7 @@ static int ena_create_all_io_tx_queues(struct ena_adapter *adapter)
 	struct ena_com_dev *ena_dev = adapter->ena_dev;
 	int rc, i;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		rc = ena_create_io_tx_queue(adapter, i);
 		if (rc)
 			goto create_err;
@@ -1742,7 +1742,7 @@ static int ena_create_all_io_rx_queues(struct ena_adapter *adapter)
 	struct ena_com_dev *ena_dev = adapter->ena_dev;
 	int rc, i;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		rc = ena_create_io_rx_queue(adapter, i);
 		if (rc)
 			goto create_err;
@@ -1765,7 +1765,7 @@ static void set_io_rings_size(struct ena_adapter *adapter,
 {
 	int i;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		adapter->tx_ring[i].ring_size = new_tx_size;
 		adapter->rx_ring[i].ring_size = new_rx_size;
 	}
@@ -1903,14 +1903,14 @@ static int ena_up(struct ena_adapter *adapter)
 	set_bit(ENA_FLAG_DEV_UP, &adapter->flags);
 
 	/* Enable completion queues interrupt */
-	for (i = 0; i < adapter->num_queues; i++)
+	for (i = 0; i < adapter->num_io_queues; i++)
 		ena_unmask_interrupt(&adapter->tx_ring[i],
 				     &adapter->rx_ring[i]);
 
 	/* schedule napi in case we had pending packets
 	 * from the last time we disable napi
 	 */
-	for (i = 0; i < adapter->num_queues; i++)
+	for (i = 0; i < adapter->num_io_queues; i++)
 		napi_schedule(&adapter->ena_napi[i].napi);
 
 	return rc;
@@ -1985,13 +1985,13 @@ static int ena_open(struct net_device *netdev)
 	int rc;
 
 	/* Notify the stack of the actual queue counts. */
-	rc = netif_set_real_num_tx_queues(netdev, adapter->num_queues);
+	rc = netif_set_real_num_tx_queues(netdev, adapter->num_io_queues);
 	if (rc) {
 		netif_err(adapter, ifup, netdev, "Can't set num tx queues\n");
 		return rc;
 	}
 
-	rc = netif_set_real_num_rx_queues(netdev, adapter->num_queues);
+	rc = netif_set_real_num_rx_queues(netdev, adapter->num_io_queues);
 	if (rc) {
 		netif_err(adapter, ifup, netdev, "Can't set num rx queues\n");
 		return rc;
@@ -2496,7 +2496,7 @@ static void ena_get_stats64(struct net_device *netdev,
 	if (!test_bit(ENA_FLAG_DEV_UP, &adapter->flags))
 		return;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		u64 bytes, packets;
 
 		tx_ring = &adapter->tx_ring[i];
@@ -2784,7 +2784,7 @@ static int ena_restore_device(struct ena_adapter *adapter)
 	}
 
 	rc = ena_enable_msix_and_set_admin_interrupts(adapter,
-						      adapter->num_queues);
+						      adapter->num_io_queues);
 	if (rc) {
 		dev_err(&pdev->dev, "Enable MSI-X failed\n");
 		goto err_device_destroy;
@@ -2949,7 +2949,7 @@ static void check_for_missing_completions(struct ena_adapter *adapter)
 
 	budget = ENA_MONITORED_TX_QUEUES;
 
-	for (i = adapter->last_monitored_tx_qid; i < adapter->num_queues; i++) {
+	for (i = adapter->last_monitored_tx_qid; i < adapter->num_io_queues; i++) {
 		tx_ring = &adapter->tx_ring[i];
 		rx_ring = &adapter->rx_ring[i];
 
@@ -2966,7 +2966,7 @@ static void check_for_missing_completions(struct ena_adapter *adapter)
 			break;
 	}
 
-	adapter->last_monitored_tx_qid = i % adapter->num_queues;
+	adapter->last_monitored_tx_qid = i % adapter->num_io_queues;
 }
 
 /* trigger napi schedule after 2 consecutive detections */
@@ -2996,7 +2996,7 @@ static void check_for_empty_rx_ring(struct ena_adapter *adapter)
 	if (test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))
 		return;
 
-	for (i = 0; i < adapter->num_queues; i++) {
+	for (i = 0; i < adapter->num_io_queues; i++) {
 		rx_ring = &adapter->rx_ring[i];
 
 		refill_required =
@@ -3303,7 +3303,7 @@ static int ena_rss_init_default(struct ena_adapter *adapter)
 	}
 
 	for (i = 0; i < ENA_RX_RSS_TABLE_SIZE; i++) {
-		val = ethtool_rxfh_indir_default(i, adapter->num_queues);
+		val = ethtool_rxfh_indir_default(i, adapter->num_io_queues);
 		rc = ena_com_indirect_table_fill_entry(ena_dev, i,
 						       ENA_IO_RXQ_IDX(val));
 		if (unlikely(rc && (rc != -EOPNOTSUPP))) {
@@ -3546,7 +3546,7 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	adapter->max_tx_sgl_size = calc_queue_ctx.max_tx_sgl_size;
 	adapter->max_rx_sgl_size = calc_queue_ctx.max_rx_sgl_size;
 
-	adapter->num_queues = io_queue_num;
+	adapter->num_io_queues = io_queue_num;
 	adapter->last_monitored_tx_qid = 0;
 
 	adapter->rx_copybreak = ENA_DEFAULT_RX_COPYBREAK;
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 72ee51a82..7e8e51e32 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -324,7 +324,7 @@ struct ena_adapter {
 	u32 rx_copybreak;
 	u32 max_mtu;
 
-	int num_queues;
+	int num_io_queues;
 
 	int msix_vecs;
 
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v4 net-next 4/6] net: dsa: sja1105: Advertise the 8 TX queues
From: Florian Fainelli @ 2019-09-15 15:15 UTC (permalink / raw)
  To: Vladimir Oltean, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran
  Cc: weifeng.voon, jiri, m-karicheri2, jose.abreu, ilias.apalodimas,
	jhs, xiyou.wangcong, kurt.kanzenbach, joergen.andreasen, netdev
In-Reply-To: <20190915020003.27926-5-olteanv@gmail.com>



On 9/14/2019 7:00 PM, Vladimir Oltean wrote:
> This is a preparation patch for the tc-taprio offload (and potentially
> for other future offloads such as tc-mqprio).
> 
> Instead of looking directly at skb->priority during xmit, let's get the
> netdev queue and the queue-to-traffic-class mapping, and put the
> resulting traffic class into the dsa_8021q PCP field. The switch is
> configured with a 1-to-1 PCP-to-ingress-queue-to-egress-queue mapping
> (see vlan_pmap in sja1105_main.c), so the effect is that we can inject
> into a front-panel's egress traffic class through VLAN tagging from
> Linux, completely transparently.
> 
> Unfortunately the switch doesn't look at the VLAN PCP in the case of
> management traffic to/from the CPU (link-local frames at
> 01-80-C2-xx-xx-xx or 01-1B-19-xx-xx-xx) so we can't alter the
> transmission queue of this type of traffic on a frame-by-frame basis. It
> is only selected through the "hostprio" setting which ATM is harcoded in
> the driver to 7.
> 
> Signed-off-by: Vladimir Oltean <olteanv@gmail.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH v4 2/2] net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable
From: Florian Fainelli @ 2019-09-15 15:11 UTC (permalink / raw)
  To: Andrew Lunn, Alexandru Ardelean
  Cc: netdev, devicetree, linux-kernel, davem, robh+dt, mark.rutland,
	hkallweit1, mkubecek
In-Reply-To: <20190914152931.GK27922@lunn.ch>



On 9/14/2019 8:29 AM, Andrew Lunn wrote:
> On Thu, Sep 12, 2019 at 07:28:12PM +0300, Alexandru Ardelean wrote:
> 
>> +static int adin_set_edpd(struct phy_device *phydev, u16 tx_interval)
>> +{
>> +	u16 val;
>> +
>> +	if (tx_interval == ETHTOOL_PHY_EDPD_DISABLE)
>> +		return phy_clear_bits(phydev, ADIN1300_PHY_CTRL_STATUS2,
>> +				(ADIN1300_NRG_PD_EN | ADIN1300_NRG_PD_TX_EN));
>> +
>> +	val = ADIN1300_NRG_PD_EN;
>> +
>> +	switch (tx_interval) {
>> +	case 1000: /* 1 second */
>> +		/* fallthrough */
>> +	case ETHTOOL_PHY_EDPD_DFLT_TX_MSECS:
>> +		val |= ADIN1300_NRG_PD_TX_EN;
>> +		/* fallthrough */
>> +	case ETHTOOL_PHY_EDPD_NO_TX:
>> +		break;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	return phy_modify(phydev, ADIN1300_PHY_CTRL_STATUS2,
>> +			  (ADIN1300_NRG_PD_EN | ADIN1300_NRG_PD_TX_EN),
>> +			  val);
>> +}
>> +
> 
>>  
>> +	rc = adin_set_edpd(phydev, 1);
>> +	if (rc < 0)
>> +		return rc;
> 
> Hi Alexandru
> 
> Shouldn't this be adin_set_edpd(phydev, 1000);

That does sound like the intended use, or use
ETHTOOL_PHY_EDPD_DFLT_TX_MSECS, with that fixed:

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH v4 1/2] ethtool: implement Energy Detect Powerdown support via phy-tunable
From: Florian Fainelli @ 2019-09-15 15:08 UTC (permalink / raw)
  To: Alexandru Ardelean, netdev, devicetree, linux-kernel
  Cc: davem, robh+dt, mark.rutland, hkallweit1, andrew, mkubecek
In-Reply-To: <20190912162812.402-2-alexandru.ardelean@analog.com>



On 9/12/2019 9:28 AM, Alexandru Ardelean wrote:
> The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like
> this feature is common across other PHYs (like EEE), and defining
> `ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long.
> 
> The way EDPD works, is that the RX block is put to a lower power mode,
> except for link-pulse detection circuits. The TX block is also put to low
> power mode, but the PHY wakes-up periodically to send link pulses, to avoid
> lock-ups in case the other side is also in EDPD mode.
> 
> Currently, there are 2 PHY drivers that look like they could use this new
> PHY tunable feature: the `adin` && `micrel` PHYs.
> 
> The ADIN's datasheet mentions that TX pulses are at intervals of 1 second
> default each, and they can be disabled. For the Micrel KSZ9031 PHY, the
> datasheet does not mention whether they can be disabled, but mentions that
> they can modified.
> 
> The way this change is structured, is similar to the PHY tunable downshift
> control:
> * a `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` value is exposed to cover a default
>   TX interval; some PHYs could specify a certain value that makes sense
> * `ETHTOOL_PHY_EDPD_NO_TX` would disable TX when EDPD is enabled
> * `ETHTOOL_PHY_EDPD_DISABLE` will disable EDPD
> 
> As noted by the `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` the interval unit is 1
> millisecond, which should cover a reasonable range of intervals:
>  - from 1 millisecond, which does not sound like much of a power-saver
>  - to ~65 seconds which is quite a lot to wait for a link to come up when
>    plugging a cable
> 
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* [PATCH V1 net] net: ena: don't wake up tx queue when down
From: sameehj @ 2019-09-15 14:29 UTC (permalink / raw)
  To: davem, netdev
  Cc: Sameeh Jubran, dwmw, zorik, matua, saeedb, msw, aliguori, nafea,
	gtzalik, netanel, alisaidi, benh, akiyano

From: Sameeh Jubran <sameehj@amazon.com>

There is a race condition that can occur when calling ena_down().
The ena_clean_tx_irq() - which is a part of the napi handler -
function might wake up the tx queue when the queue is supposed
to be down (during recovery or changing the size of the queues
for example) This causes the ena_start_xmit() function to trigger
and possibly try to access the destroyed queues.

The race is illustrated below:

Flow A:                                       Flow B(napi handler)
ena_down()
   netif_carrier_off()
   netif_tx_disable()
                                                      ena_clean_tx_irq()
                                                         netif_tx_wake_queue()
   ena_napi_disable_all()
   ena_destroy_all_io_queues()

After these flows the tx queue is active and ena_start_xmit() accesses
the destroyed queue which leads to a kernel panic.

fixes: 1738cd3ed342 (net: ena: Add a driver for Amazon Elastic Network Adapters (ENA))

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 664e3ed97..d118ed4c5 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -823,7 +823,8 @@ static int ena_clean_tx_irq(struct ena_ring *tx_ring, u32 budget)
 		above_thresh =
 			ena_com_sq_have_enough_space(tx_ring->ena_com_io_sq,
 						     ENA_TX_WAKEUP_THRESH);
-		if (netif_tx_queue_stopped(txq) && above_thresh) {
+		if (netif_tx_queue_stopped(txq) && above_thresh &&
+		    test_bit(ENA_FLAG_DEV_UP, &tx_ring->adapter->flags)) {
 			netif_tx_wake_queue(txq);
 			u64_stats_update_begin(&tx_ring->syncp);
 			tx_ring->tx_stats.queue_wakeup++;
-- 
2.17.1


^ permalink raw reply related

* RE: [PATCH net] udp: correct reuseport selection with connected sockets
From: Steve Zabele @ 2019-09-15 13:36 UTC (permalink / raw)
  To: 'Willem de Bruijn', netdev
  Cc: davem, edumazet, kraig, pabeni, mark.keaton,
	'Willem de Bruijn'
In-Reply-To: <20190913011639.55895-1-willemdebruijn.kernel@gmail.com>

Hey Willem,

Thanks a bunch for getting this resolved, *very* much appreciated. This is a
really big help for us

Do you know if this will be backported to 4.19 stable, and if so when it
might be available??

Thanks again

Steve

-----Original Message-----
From: Willem de Bruijn [mailto:willemdebruijn.kernel@gmail.com] 
Sent: Thursday, September 12, 2019 9:17 PM
To: netdev@vger.kernel.org
Cc: davem@davemloft.net; edumazet@google.com; kraig@google.com;
zabele@comcast.net; pabeni@redhat.com; mark.keaton@raytheon.com; Willem de
Bruijn
Subject: [PATCH net] udp: correct reuseport selection with connected sockets

From: Willem de Bruijn <willemb@google.com>

UDP reuseport groups can hold a mix unconnected and connected sockets.
Ensure that connections only receive all traffic to their 4-tuple.

Fast reuseport returns on the first reuseport match on the assumption
that all matches are equal. Only if connections are present, return to
the previous behavior of scoring all sockets.

Record if connections are present and if so (1) treat such connected
sockets as an independent match from the group, (2) only return
2-tuple matches from reuseport and (3) do not return on the first
2-tuple reuseport match to allow for a higher scoring match later.

New field has_conns is set without locks. No other fields in the
bitmap are modified at runtime and the field is only ever set
unconditionally, so an RMW cannot miss a change.

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Link:
http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw
@mail.gmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>

---

I was unable to compile some older kernels, so the Fixes tag is based
on basic analysis, not bisected to by the regression test.
---
 include/net/sock_reuseport.h | 20 +++++++++++++++++++-
 net/core/sock_reuseport.c    | 15 +++++++++++++--
 net/ipv4/datagram.c          |  2 ++
 net/ipv4/udp.c               |  5 +++--
 net/ipv6/datagram.c          |  2 ++
 net/ipv6/udp.c               |  5 +++--
 6 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h
index d9112de85261..43f4a818d88f 100644
--- a/include/net/sock_reuseport.h
+++ b/include/net/sock_reuseport.h
@@ -21,7 +21,8 @@ struct sock_reuseport {
 	unsigned int		synq_overflow_ts;
 	/* ID stays the same even after the size of socks[] grows. */
 	unsigned int		reuseport_id;
-	bool			bind_inany;
+	unsigned int		bind_inany:1;
+	unsigned int		has_conns:1;
 	struct bpf_prog __rcu	*prog;		/* optional BPF sock
selector */
 	struct sock		*socks[0];	/* array of sock pointers */
 };
@@ -37,6 +38,23 @@ extern struct sock *reuseport_select_sock(struct sock
*sk,
 extern int reuseport_attach_prog(struct sock *sk, struct bpf_prog *prog);
 extern int reuseport_detach_prog(struct sock *sk);
 
+static inline bool reuseport_has_conns(struct sock *sk, bool set)
+{
+	struct sock_reuseport *reuse;
+	bool ret = false;
+
+	rcu_read_lock();
+	reuse = rcu_dereference(sk->sk_reuseport_cb);
+	if (reuse) {
+		if (set)
+			reuse->has_conns = 1;
+		ret = reuse->has_conns;
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
 int reuseport_get_id(struct sock_reuseport *reuse);
 
 #endif  /* _SOCK_REUSEPORT_H */
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index 9408f9264d05..f3ceec93f392 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -295,8 +295,19 @@ struct sock *reuseport_select_sock(struct sock *sk,
 
 select_by_hash:
 		/* no bpf or invalid bpf result: fall back to hash usage */
-		if (!sk2)
-			sk2 = reuse->socks[reciprocal_scale(hash, socks)];
+		if (!sk2) {
+			int i, j;
+
+			i = j = reciprocal_scale(hash, socks);
+			while (reuse->socks[i]->sk_state == TCP_ESTABLISHED)
{
+				i++;
+				if (i >= reuse->num_socks)
+					i = 0;
+				if (i == j)
+					goto out;
+			}
+			sk2 = reuse->socks[i];
+		}
 	}
 
 out:
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 7bd29e694603..9a0fe0c2fa02 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -15,6 +15,7 @@
 #include <net/sock.h>
 #include <net/route.h>
 #include <net/tcp_states.h>
+#include <net/sock_reuseport.h>
 
 int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int
addr_len)
 {
@@ -69,6 +70,7 @@ int __ip4_datagram_connect(struct sock *sk, struct
sockaddr *uaddr, int addr_len
 	}
 	inet->inet_daddr = fl4->daddr;
 	inet->inet_dport = usin->sin_port;
+	reuseport_has_conns(sk, true);
 	sk->sk_state = TCP_ESTABLISHED;
 	sk_set_txhash(sk);
 	inet->inet_id = jiffies;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d88821c794fb..16486c8b708b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -423,12 +423,13 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
-			if (sk->sk_reuseport) {
+			if (sk->sk_reuseport &&
+			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp_ehashfn(net, daddr, hnum,
 						   saddr, sport);
 				result = reuseport_select_sock(sk, hash,
skb,
 							sizeof(struct
udphdr));
-				if (result)
+				if (result && !reuseport_has_conns(sk,
false))
 					return result;
 			}
 			badness = score;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 9ab897ded4df..96f939248d2f 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -27,6 +27,7 @@
 #include <net/ip6_route.h>
 #include <net/tcp_states.h>
 #include <net/dsfield.h>
+#include <net/sock_reuseport.h>
 
 #include <linux/errqueue.h>
 #include <linux/uaccess.h>
@@ -254,6 +255,7 @@ int __ip6_datagram_connect(struct sock *sk, struct
sockaddr *uaddr,
 		goto out;
 	}
 
+	reuseport_has_conns(sk, true);
 	sk->sk_state = TCP_ESTABLISHED;
 	sk_set_txhash(sk);
 out:
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 827fe7385078..5995fdc99d3f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -158,13 +158,14 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
-			if (sk->sk_reuseport) {
+			if (sk->sk_reuseport &&
+			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp6_ehashfn(net, daddr, hnum,
 						    saddr, sport);
 
 				result = reuseport_select_sock(sk, hash,
skb,
 							sizeof(struct
udphdr));
-				if (result)
+				if (result && !reuseport_has_conns(sk,
false))
 					return result;
 			}
 			result = sk;
-- 
2.23.0.237.gc6a4ce50a0-goog


^ permalink raw reply related

* [bpf-next,v4] samples: bpf: add max_pckt_size option at xdp_adjust_tail
From: Daniel T. Lee @ 2019-09-15 12:47 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, bpf

Currently, at xdp_adjust_tail_kern.c, MAX_PCKT_SIZE is limited
to 600. To make this size flexible, a new map 'pcktsz' is added.

By updating new packet size to this map from the userland,
xdp_adjust_tail_kern.o will use this value as a new max_pckt_size.

If no '-P <MAX_PCKT_SIZE>' option is used, the size of maximum packet
will be 600 as a default.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>

---
Changes in v4:
    - make pckt_size no less than ICMP_TOOBIG_SIZE
    - Fix code style
Changes in v2:
    - Change the helper to fetch map from 'bpf_map__next' to 
    'bpf_object__find_map_fd_by_name'.

 samples/bpf/xdp_adjust_tail_kern.c | 23 +++++++++++++++++++----
 samples/bpf/xdp_adjust_tail_user.c | 28 ++++++++++++++++++++++------
 2 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/samples/bpf/xdp_adjust_tail_kern.c b/samples/bpf/xdp_adjust_tail_kern.c
index 411fdb21f8bc..8869bbb160d2 100644
--- a/samples/bpf/xdp_adjust_tail_kern.c
+++ b/samples/bpf/xdp_adjust_tail_kern.c
@@ -25,6 +25,13 @@
 #define ICMP_TOOBIG_SIZE 98
 #define ICMP_TOOBIG_PAYLOAD_SIZE 92
 
+struct bpf_map_def SEC("maps") pcktsz = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u32),
+	.max_entries = 1,
+};
+
 struct bpf_map_def SEC("maps") icmpcnt = {
 	.type = BPF_MAP_TYPE_ARRAY,
 	.key_size = sizeof(__u32),
@@ -64,7 +71,8 @@ static __always_inline void ipv4_csum(void *data_start, int data_size,
 	*csum = csum_fold_helper(*csum);
 }
 
-static __always_inline int send_icmp4_too_big(struct xdp_md *xdp)
+static __always_inline int send_icmp4_too_big(struct xdp_md *xdp,
+					      __u32 max_pckt_size)
 {
 	int headroom = (int)sizeof(struct iphdr) + (int)sizeof(struct icmphdr);
 
@@ -92,7 +100,7 @@ static __always_inline int send_icmp4_too_big(struct xdp_md *xdp)
 	orig_iph = data + off;
 	icmp_hdr->type = ICMP_DEST_UNREACH;
 	icmp_hdr->code = ICMP_FRAG_NEEDED;
-	icmp_hdr->un.frag.mtu = htons(MAX_PCKT_SIZE-sizeof(struct ethhdr));
+	icmp_hdr->un.frag.mtu = htons(max_pckt_size - sizeof(struct ethhdr));
 	icmp_hdr->checksum = 0;
 	ipv4_csum(icmp_hdr, ICMP_TOOBIG_PAYLOAD_SIZE, &csum);
 	icmp_hdr->checksum = csum;
@@ -118,14 +126,21 @@ static __always_inline int handle_ipv4(struct xdp_md *xdp)
 {
 	void *data_end = (void *)(long)xdp->data_end;
 	void *data = (void *)(long)xdp->data;
+	__u32 max_pckt_size = MAX_PCKT_SIZE;
 	int pckt_size = data_end - data;
+	__u32 *pckt_sz;
+	__u32 key = 0;
 	int offset;
 
-	if (pckt_size > MAX_PCKT_SIZE) {
+	pckt_sz = bpf_map_lookup_elem(&pcktsz, &key);
+	if (pckt_sz && *pckt_sz)
+		max_pckt_size = *pckt_sz;
+
+	if (pckt_size > max(max_pckt_size, ICMP_TOOBIG_SIZE)) {
 		offset = pckt_size - ICMP_TOOBIG_SIZE;
 		if (bpf_xdp_adjust_tail(xdp, 0 - offset))
 			return XDP_PASS;
-		return send_icmp4_too_big(xdp);
+		return send_icmp4_too_big(xdp, max_pckt_size);
 	}
 	return XDP_PASS;
 }
diff --git a/samples/bpf/xdp_adjust_tail_user.c b/samples/bpf/xdp_adjust_tail_user.c
index a3596b617c4c..99e965c68054 100644
--- a/samples/bpf/xdp_adjust_tail_user.c
+++ b/samples/bpf/xdp_adjust_tail_user.c
@@ -23,6 +23,7 @@
 #include "libbpf.h"
 
 #define STATS_INTERVAL_S 2U
+#define MAX_PCKT_SIZE 600
 
 static int ifindex = -1;
 static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
@@ -72,6 +73,7 @@ static void usage(const char *cmd)
 	printf("Usage: %s [...]\n", cmd);
 	printf("    -i <ifname|ifindex> Interface\n");
 	printf("    -T <stop-after-X-seconds> Default: 0 (forever)\n");
+	printf("    -P <MAX_PCKT_SIZE> Default: %u\n", MAX_PCKT_SIZE);
 	printf("    -S use skb-mode\n");
 	printf("    -N enforce native mode\n");
 	printf("    -F force loading prog\n");
@@ -85,13 +87,14 @@ int main(int argc, char **argv)
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
 	unsigned char opt_flags[256] = {};
-	const char *optstr = "i:T:SNFh";
+	const char *optstr = "i:T:P:SNFh";
 	struct bpf_prog_info info = {};
 	__u32 info_len = sizeof(info);
+	__u32 max_pckt_size = 0;
+	__u32 key = 0;
 	unsigned int kill_after_s = 0;
 	int i, prog_fd, map_fd, opt;
 	struct bpf_object *obj;
-	struct bpf_map *map;
 	char filename[256];
 	int err;
 
@@ -110,6 +113,9 @@ int main(int argc, char **argv)
 		case 'T':
 			kill_after_s = atoi(optarg);
 			break;
+		case 'P':
+			max_pckt_size = atoi(optarg);
+			break;
 		case 'S':
 			xdp_flags |= XDP_FLAGS_SKB_MODE;
 			break;
@@ -150,12 +156,22 @@ int main(int argc, char **argv)
 	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
 		return 1;
 
-	map = bpf_map__next(NULL, obj);
-	if (!map) {
-		printf("finding a map in obj file failed\n");
+	/* update pcktsz map */
+	if (max_pckt_size) {
+		map_fd = bpf_object__find_map_fd_by_name(obj, "pcktsz");
+		if (map_fd < 0) {
+			printf("finding a pcktsz map in obj file failed\n");
+			return 1;
+		}
+		bpf_map_update_elem(map_fd, &key, &max_pckt_size, BPF_ANY);
+	}
+
+	/* fetch icmpcnt map */
+	map_fd = bpf_object__find_map_fd_by_name(obj, "icmpcnt");
+	if (map_fd < 0) {
+		printf("finding a icmpcnt map in obj file failed\n");
 		return 1;
 	}
-	map_fd = bpf_map__fd(map);
 
 	if (!prog_fd) {
 		printf("load_bpf_file: %s\n", strerror(errno));
-- 
2.20.1


^ permalink raw reply related

* Re: [bpf-next,v3] samples: bpf: add max_pckt_size option at xdp_adjust_tail
From: Daniel T. Lee @ 2019-09-15  3:24 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Daniel Borkmann, Alexei Starovoitov, netdev@vger.kernel.org,
	bpf@vger.kernel.org
In-Reply-To: <7add91c8-a22c-f10e-76a4-495d8be09c9b@fb.com>

On Sat, Sep 14, 2019 at 7:34 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 9/11/19 8:02 PM, Daniel T. Lee wrote:
> > Currently, at xdp_adjust_tail_kern.c, MAX_PCKT_SIZE is limited
> > to 600. To make this size flexible, a new map 'pcktsz' is added.
> >
> > By updating new packet size to this map from the userland,
> > xdp_adjust_tail_kern.o will use this value as a new max_pckt_size.
> >
> > If no '-P <MAX_PCKT_SIZE>' option is used, the size of maximum packet
> > will be 600 as a default.
> >
> > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> >
> > ---
> > Changes in v2:
> >      - Change the helper to fetch map from 'bpf_map__next' to
> >      'bpf_object__find_map_fd_by_name'.
> >
> >   samples/bpf/xdp_adjust_tail_kern.c | 23 +++++++++++++++++++----
> >   samples/bpf/xdp_adjust_tail_user.c | 28 ++++++++++++++++++++++------
> >   2 files changed, 41 insertions(+), 10 deletions(-)
> >
> > diff --git a/samples/bpf/xdp_adjust_tail_kern.c b/samples/bpf/xdp_adjust_tail_kern.c
> > index 411fdb21f8bc..d6d84ffe6a7a 100644
> > --- a/samples/bpf/xdp_adjust_tail_kern.c
> > +++ b/samples/bpf/xdp_adjust_tail_kern.c
> > @@ -25,6 +25,13 @@
> >   #define ICMP_TOOBIG_SIZE 98
> >   #define ICMP_TOOBIG_PAYLOAD_SIZE 92
> >
> > +struct bpf_map_def SEC("maps") pcktsz = {
> > +     .type = BPF_MAP_TYPE_ARRAY,
> > +     .key_size = sizeof(__u32),
> > +     .value_size = sizeof(__u32),
> > +     .max_entries = 1,
> > +};
>
> We have new map definition format like in
> tools/testing/selftests/bpf/progs/bpf_flow.c.
> But looks like most samples/bpf still use SEC("maps").
> I guess we can leave it for now, and if needed,
> later on a massive conversion for all samples/bpf/
> bpf programs can be done.
>

Thanks for the detailed review!
Didn't notice there was an update with map definition.
This new map definition format looks very neat.

> > +
> >   struct bpf_map_def SEC("maps") icmpcnt = {
> >       .type = BPF_MAP_TYPE_ARRAY,
> >       .key_size = sizeof(__u32),
> > @@ -64,7 +71,8 @@ static __always_inline void ipv4_csum(void *data_start, int data_size,
> >       *csum = csum_fold_helper(*csum);
> >   }
> >
> > -static __always_inline int send_icmp4_too_big(struct xdp_md *xdp)
> > +static __always_inline int send_icmp4_too_big(struct xdp_md *xdp,
> > +                                           __u32 max_pckt_size)
> >   {
> >       int headroom = (int)sizeof(struct iphdr) + (int)sizeof(struct icmphdr);
> >
> > @@ -92,7 +100,7 @@ static __always_inline int send_icmp4_too_big(struct xdp_md *xdp)
> >       orig_iph = data + off;
> >       icmp_hdr->type = ICMP_DEST_UNREACH;
> >       icmp_hdr->code = ICMP_FRAG_NEEDED;
> > -     icmp_hdr->un.frag.mtu = htons(MAX_PCKT_SIZE-sizeof(struct ethhdr));
> > +     icmp_hdr->un.frag.mtu = htons(max_pckt_size - sizeof(struct ethhdr));
> >       icmp_hdr->checksum = 0;
> >       ipv4_csum(icmp_hdr, ICMP_TOOBIG_PAYLOAD_SIZE, &csum);
> >       icmp_hdr->checksum = csum;
> > @@ -118,14 +126,21 @@ static __always_inline int handle_ipv4(struct xdp_md *xdp)
> >   {
> >       void *data_end = (void *)(long)xdp->data_end;
> >       void *data = (void *)(long)xdp->data;
> > +     __u32 max_pckt_size = MAX_PCKT_SIZE;
> > +     __u32 *pckt_sz;
> > +     __u32 key = 0;
>
> The above two new definitions may the code not in
> reverse Christmas definition order, could you fix it?
>

I'll fix it right away!

> >       int pckt_size = data_end - data;
> >       int offset;
> >
> > -     if (pckt_size > MAX_PCKT_SIZE) {
> > +     pckt_sz = bpf_map_lookup_elem(&pcktsz, &key);
> > +     if (pckt_sz && *pckt_sz)
> > +             max_pckt_size = *pckt_sz;
> > +
> > +     if (pckt_size > max_pckt_size) {
> >               offset = pckt_size - ICMP_TOOBIG_SIZE;
> >               if (bpf_xdp_adjust_tail(xdp, 0 - offset))
> >                       return XDP_PASS;
>
> We could have the following scenario:
>    max_pckt_size = 1
>    pckt_size = 2
>    offset = -96
>    bpf_xdp_adjust_tail return -EINVAL
>    so we return XDP_PASS now
>
> Maybe you want to do
>     if (pckt_size > max(max_pckt_size, ICMP_TOOBIG_SIZE)) {
>        ...
>     }
> as in original code, bpf_xdp_adjust_tail(...) already succeeds.
>

I'll try to fix this way.

> > -             return send_icmp4_too_big(xdp);
> > +             return send_icmp4_too_big(xdp, max_pckt_size);
> >       }
> >       return XDP_PASS;
> >   }
> > diff --git a/samples/bpf/xdp_adjust_tail_user.c b/samples/bpf/xdp_adjust_tail_user.c
> > index a3596b617c4c..aef6c69a48a7 100644
> > --- a/samples/bpf/xdp_adjust_tail_user.c
> > +++ b/samples/bpf/xdp_adjust_tail_user.c
> > @@ -23,6 +23,7 @@
> >   #include "libbpf.h"
> >
> >   #define STATS_INTERVAL_S 2U
> > +#define MAX_PCKT_SIZE 600
> >
> >   static int ifindex = -1;
> >   static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
> > @@ -72,6 +73,7 @@ static void usage(const char *cmd)
> >       printf("Usage: %s [...]\n", cmd);
> >       printf("    -i <ifname|ifindex> Interface\n");
> >       printf("    -T <stop-after-X-seconds> Default: 0 (forever)\n");
> > +     printf("    -P <MAX_PCKT_SIZE> Default: %u\n", MAX_PCKT_SIZE);
> >       printf("    -S use skb-mode\n");
> >       printf("    -N enforce native mode\n");
> >       printf("    -F force loading prog\n");
> > @@ -85,13 +87,14 @@ int main(int argc, char **argv)
> >               .prog_type      = BPF_PROG_TYPE_XDP,
> >       };
> >       unsigned char opt_flags[256] = {};
> > -     const char *optstr = "i:T:SNFh";
> > +     const char *optstr = "i:T:P:SNFh";
> >       struct bpf_prog_info info = {};
> >       __u32 info_len = sizeof(info);
> > +     __u32 max_pckt_size = 0;
> > +     __u32 key = 0;
> >       unsigned int kill_after_s = 0;
> >       int i, prog_fd, map_fd, opt;
> >       struct bpf_object *obj;
> > -     struct bpf_map *map;
> >       char filename[256];
> >       int err;
> >
> > @@ -110,6 +113,9 @@ int main(int argc, char **argv)
> >               case 'T':
> >                       kill_after_s = atoi(optarg);
> >                       break;
> > +             case 'P':
> > +                     max_pckt_size = atoi(optarg);
> > +                     break;
> >               case 'S':
> >                       xdp_flags |= XDP_FLAGS_SKB_MODE;
> >                       break;
> > @@ -150,12 +156,22 @@ int main(int argc, char **argv)
> >       if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
> >               return 1;
> >
> > -     map = bpf_map__next(NULL, obj);
> > -     if (!map) {
> > -             printf("finding a map in obj file failed\n");
> > +     /* update pcktsz map */
> > +     if (max_pckt_size) {
> > +             map_fd = bpf_object__find_map_fd_by_name(obj, "pcktsz");
> > +             if (!map_fd) {
>
> Let us test map_fd and below prog_fd with '< 0" instead of "!= 0'.
> In this particular sample, "! = 0" is okay since we did not close
> stdin. But in programs if stdin is closed, the fd 0 may be reused
> for map_fd. Let us just keep good coding practice here.
>

I didn't think of those details. I'll update this right away!

Once again, I really appreciate your time and effort for the review.
Thank you.

Best,
Daniel

> > +                     printf("finding a pcktsz map in obj file failed\n");
> > +                     return 1;
> > +             }
> > +             bpf_map_update_elem(map_fd, &key, &max_pckt_size, BPF_ANY);
> > +     }
> > +
> > +     /* fetch icmpcnt map */
> > +     map_fd = bpf_object__find_map_fd_by_name(obj, "icmpcnt");
> > +     if (!map_fd) {
> > +             printf("finding a icmpcnt map in obj file failed\n");
> >               return 1;
> >       }
> > -     map_fd = bpf_map__fd(map);
> >
> >       if (!prog_fd) {
> >               printf("load_bpf_file: %s\n", strerror(errno));
> >

^ permalink raw reply

* Re: pull-request: wireless-drivers-next 2019-09-14
From: David Miller @ 2019-09-15 11:38 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <87muf5df3i.fsf@kamboji.qca.qualcomm.com>

From: Kalle Valo <kvalo@codeaurora.org>
Date: Sun, 15 Sep 2019 13:32:49 +0300

> David Miller <davem@davemloft.net> writes:
> 
>> From: Kalle Valo <kvalo@codeaurora.org>
>> Date: Sat, 14 Sep 2019 13:14:40 +0300
>>
>>> here's a pull request to net-next tree for v5.4, more info below. Please
>>> let me know if there are any problems.
>>
>> Pulled, thanks Kalle.
> 
> Thanks for pulling this but I don't see it in net-next, maybe you forgot
> to push? Nothing important, just making sure it didn't get lost.

I feel asleep while the build test was running, it should be there now :-)

^ permalink raw reply

* Re: [PATCH v2] net: mdio: switch to using gpiod_get_optional()
From: Andy Shevchenko @ 2019-09-15 11:13 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Andy Shevchenko, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	David S. Miller, Linus Walleij, netdev, Linux Kernel Mailing List
In-Reply-To: <20190915060524.GC237523@dtor-ws>

On Sun, Sep 15, 2019 at 9:26 AM Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:
> On Sat, Sep 14, 2019 at 08:09:33PM +0300, Andy Shevchenko wrote:
> > On Fri, Sep 13, 2019 at 03:55:47PM -0700, Dmitry Torokhov wrote:

> > > +   mdiodev->reset_gpio = gpiod_get_optional(&mdiodev->dev,
> > > +                                            "reset", GPIOD_OUT_LOW);
> > > +   error = PTR_ERR_OR_ZERO(mdiodev->reset_gpio);
> > > +   if (error)
> > > +           return error;

> > > +   if (mdiodev->reset_gpio)
> >
> > This is redundant check.
>
> I see that gpiod_* API handle NULL desc and usually return immediately,
> but frankly I am not that comfortable with it. I'm OK with functions
> that free/destroy objects that recognize NULL resources,

> but it is
> unusual for other types of APIs.

CLK, reset, ... frameworks do check for NULL pointer since they
provide an *_optional() getters. So, it's not unusual.
--
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: pull-request: wireless-drivers-next 2019-09-14
From: Kalle Valo @ 2019-09-15 10:32 UTC (permalink / raw)
  To: David Miller; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20190914.140843.945413345284987204.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Kalle Valo <kvalo@codeaurora.org>
> Date: Sat, 14 Sep 2019 13:14:40 +0300
>
>> here's a pull request to net-next tree for v5.4, more info below. Please
>> let me know if there are any problems.
>
> Pulled, thanks Kalle.

Thanks for pulling this but I don't see it in net-next, maybe you forgot
to push? Nothing important, just making sure it didn't get lost.

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [patch iproute2-next 2/2] devlink: extend reload command to add support for network namespace change
From: Jiri Pirko @ 2019-09-15  9:44 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, idosch, dsahern, jakub.kicinski, tariqt, saeedm,
	kuznet, yoshfuji, shuah, mlxsw
In-Reply-To: <20190915071639.GA8776@splinter>

Sun, Sep 15, 2019 at 09:16:39AM CEST, idosch@idosch.org wrote:
>On Sat, Sep 14, 2019 at 08:57:57AM +0200, Jiri Pirko wrote:
>> diff --git a/man/man8/devlink-dev.8 b/man/man8/devlink-dev.8
>> index 1804463b2321..0e1a5523fa7b 100644
>> --- a/man/man8/devlink-dev.8
>> +++ b/man/man8/devlink-dev.8
>> @@ -25,6 +25,13 @@ devlink-dev \- devlink device configuration
>>  .ti -8
>>  .B devlink dev help
>>  
>> +.ti -8
>> +.BR "devlink dev set"
>> +.IR DEV
>> +.RI "[ "
>> +.BI "netns { " PID " | " NAME " | " ID " }
>> +.RI "]"
>> +
>>  .ti -8
>>  .BR "devlink dev eswitch set"
>>  .IR DEV
>> @@ -92,6 +99,11 @@ Format is:
>>  .in +2
>>  BUS_NAME/BUS_ADDRESS
>>  
>> +.SS devlink dev set  - sets devlink device attributes
>> +
>> +.TP
>> +.BI "netns { " PID " | " NAME " | " ID " }
>
>This looks like leftover from previous version?

Will fix. Thanks!

>
>> +
>>  .SS devlink dev eswitch show - display devlink device eswitch attributes
>>  .SS devlink dev eswitch set  - sets devlink device eswitch attributes
>>  
>> -- 
>> 2.21.0
>> 

^ permalink raw reply

* Re: [patch net-next 14/15] net: devlink: allow to change namespaces during reload
From: Jiri Pirko @ 2019-09-15  9:43 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, idosch, dsahern, jakub.kicinski, tariqt, saeedm,
	kuznet, yoshfuji, shuah, mlxsw
In-Reply-To: <20190915085829.GE11194@splinter>

Sun, Sep 15, 2019 at 10:58:29AM CEST, idosch@idosch.org wrote:
>On Sat, Sep 14, 2019 at 08:46:07AM +0200, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> All devlink instances are created in init_net and stay there for a
>> lifetime. Allow user to be able to move devlink instances into
>> namespaces during devlink reload operation. That ensures proper
>> re-instantiation of driver objects, including netdevices.
>> 
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  drivers/net/ethernet/mellanox/mlx4/main.c |   4 +
>>  include/uapi/linux/devlink.h              |   4 +
>>  net/core/devlink.c                        | 155 ++++++++++++++++++++--
>>  3 files changed, 155 insertions(+), 8 deletions(-)
>> 
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
>> index ef3f3d06ff1e..989d0882aaa9 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
>> @@ -3942,6 +3942,10 @@ static int mlx4_devlink_reload_down(struct devlink *devlink,
>>  	struct mlx4_dev *dev = &priv->dev;
>>  	struct mlx4_dev_persistent *persist = dev->persist;
>>  
>> +	if (!net_eq(devlink_net(devlink), &init_net)) {
>> +		NL_SET_ERR_MSG_MOD(extack, "Namespace change is not supported");
>> +		return -EOPNOTSUPP;
>> +	}
>
>Are you sure that this actually works? I see that you first invoke
>reload_down(), then set the new namespace, then invoke reload_up().
>
>So shouldn't this check be done in reload_up() callback instead?

Correct, need to fix this. But I need to do this in down phase so the
objects are not removed.


>
>>  	if (persist->num_vfs)
>>  		mlx4_warn(persist->dev, "Reload performed on PF, will cause reset on operating Virtual Functions\n");
>>  	mlx4_restart_one_down(persist->pdev);
>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>> index 580b7a2e40e1..b558ea88b766 100644
>> --- a/include/uapi/linux/devlink.h
>> +++ b/include/uapi/linux/devlink.h
>> @@ -421,6 +421,10 @@ enum devlink_attr {
>>  
>>  	DEVLINK_ATTR_RELOAD_FAILED,			/* u8 0 or 1 */
>>  
>> +	DEVLINK_ATTR_NETNS_FD,			/* u32 */
>> +	DEVLINK_ATTR_NETNS_PID,			/* u32 */
>> +	DEVLINK_ATTR_NETNS_ID,			/* u32 */
>> +
>>  	/* add new attributes above here, update the policy in devlink.c */
>>  
>>  	__DEVLINK_ATTR_MAX,
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index 362cbbcca225..2a5db95cce3c 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -435,8 +435,16 @@ static void devlink_nl_post_doit(const struct genl_ops *ops,
>>  {
>>  	struct devlink *devlink;
>>  
>> -	devlink = devlink_get_from_info(info);
>> -	if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
>> +	/* When devlink changes netns, it would not be found
>> +	 * by devlink_get_from_info(). So try if it is stored first.
>> +	 */
>> +	if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEVLINK) {
>> +		devlink = info->user_ptr[0];
>> +	} else {
>> +		devlink = devlink_get_from_info(info);
>> +		WARN_ON(IS_ERR(devlink));
>> +	}
>> +	if (!IS_ERR(devlink) && ~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
>>  		mutex_unlock(&devlink->lock);
>>  	mutex_unlock(&devlink_mutex);
>>  }
>> @@ -2675,6 +2683,73 @@ devlink_resources_validate(struct devlink *devlink,
>>  	return err;
>>  }
>>  
>> +static struct net *devlink_netns_get(struct sk_buff *skb,
>> +				     struct devlink *devlink,
>> +				     struct genl_info *info)
>> +{
>> +	struct nlattr *netns_pid_attr = info->attrs[DEVLINK_ATTR_NETNS_PID];
>> +	struct nlattr *netns_fd_attr = info->attrs[DEVLINK_ATTR_NETNS_FD];
>> +	struct nlattr *netns_id_attr = info->attrs[DEVLINK_ATTR_NETNS_ID];
>> +	struct net *net;
>> +
>> +	if (!!netns_pid_attr + !!netns_fd_attr + !!netns_id_attr > 1) {
>> +		NL_SET_ERR_MSG(info->extack, "multiple netns identifying attributes specified");
>> +		return ERR_PTR(-EINVAL);
>> +	}
>> +
>> +	if (netns_pid_attr) {
>> +		net = get_net_ns_by_pid(nla_get_u32(netns_pid_attr));
>> +	} else if (netns_fd_attr) {
>> +		net = get_net_ns_by_fd(nla_get_u32(netns_fd_attr));
>> +	} else if (netns_id_attr) {
>> +		net = get_net_ns_by_id(sock_net(skb->sk),
>> +				       nla_get_u32(netns_id_attr));
>> +		if (!net)
>> +			net = ERR_PTR(-EINVAL);
>> +	} else {
>> +		WARN_ON(1);
>> +		net = ERR_PTR(-EINVAL);
>> +	}
>> +	if (IS_ERR(net)) {
>> +		NL_SET_ERR_MSG(info->extack, "Unknown network namespace");
>> +		return ERR_PTR(-EINVAL);
>> +	}
>> +	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) {
>> +		put_net(net);
>> +		return ERR_PTR(-EPERM);
>> +	}
>> +	return net;
>> +}
>> +
>> +static void devlink_param_notify(struct devlink *devlink,
>> +				 unsigned int port_index,
>> +				 struct devlink_param_item *param_item,
>> +				 enum devlink_command cmd);
>> +
>> +static void devlink_reload_netns_change(struct devlink *devlink,
>> +					struct net *dest_net)
>> +{
>> +	struct devlink_param_item *param_item;
>> +
>> +	/* Userspace needs to be notified about devlink objects
>> +	 * removed from original and entering new network namespace.
>> +	 * The rest of the devlink objects are re-created during
>> +	 * reload process so the notifications are generated separatelly.
>> +	 */
>> +
>> +	list_for_each_entry(param_item, &devlink->param_list, list)
>> +		devlink_param_notify(devlink, 0, param_item,
>> +				     DEVLINK_CMD_PARAM_DEL);
>> +	devlink_notify(devlink, DEVLINK_CMD_DEL);
>> +
>> +	devlink_net_set(devlink, dest_net);
>> +
>> +	devlink_notify(devlink, DEVLINK_CMD_NEW);
>> +	list_for_each_entry(param_item, &devlink->param_list, list)
>> +		devlink_param_notify(devlink, 0, param_item,
>> +				     DEVLINK_CMD_PARAM_NEW);
>> +}
>> +
>>  static bool devlink_reload_supported(struct devlink *devlink)
>>  {
>>  	return devlink->ops->reload_down && devlink->ops->reload_up;
>> @@ -2695,9 +2770,27 @@ bool devlink_is_reload_failed(const struct devlink *devlink)
>>  }
>>  EXPORT_SYMBOL_GPL(devlink_is_reload_failed);
>>  
>> +static int devlink_reload(struct devlink *devlink, struct net *dest_net,
>> +			  struct netlink_ext_ack *extack)
>> +{
>> +	int err;
>> +
>> +	err = devlink->ops->reload_down(devlink, extack);
>> +	if (err)
>> +		return err;
>> +
>> +	if (dest_net && !net_eq(dest_net, devlink_net(devlink)))
>> +		devlink_reload_netns_change(devlink, dest_net);
>> +
>> +	err = devlink->ops->reload_up(devlink, extack);
>> +	devlink_reload_failed_set(devlink, !!err);
>> +	return err;
>> +}
>> +
>>  static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
>>  {
>>  	struct devlink *devlink = info->user_ptr[0];
>> +	struct net *dest_net = NULL;
>>  	int err;
>>  
>>  	if (!devlink_reload_supported(devlink))
>> @@ -2708,11 +2801,20 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
>>  		NL_SET_ERR_MSG_MOD(info->extack, "resources size validation failed");
>>  		return err;
>>  	}
>> -	err = devlink->ops->reload_down(devlink, info->extack);
>> -	if (err)
>> -		return err;
>> -	err = devlink->ops->reload_up(devlink, info->extack);
>> -	devlink_reload_failed_set(devlink, !!err);
>> +
>> +	if (info->attrs[DEVLINK_ATTR_NETNS_PID] ||
>> +	    info->attrs[DEVLINK_ATTR_NETNS_FD] ||
>> +	    info->attrs[DEVLINK_ATTR_NETNS_ID]) {
>> +		dest_net = devlink_netns_get(skb, devlink, info);
>
>Hmm, you're never using 'devlink' there, so I guess you can drop it.

Right, will do.


>
>> +		if (IS_ERR(dest_net))
>> +			return PTR_ERR(dest_net);
>> +	}
>> +
>> +	err = devlink_reload(devlink, dest_net, info->extack);
>> +
>> +	if (dest_net)
>> +		put_net(dest_net);
>> +
>>  	return err;
>>  }
>>  
>> @@ -5794,6 +5896,9 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
>>  	[DEVLINK_ATTR_TRAP_NAME] = { .type = NLA_NUL_STRING },
>>  	[DEVLINK_ATTR_TRAP_ACTION] = { .type = NLA_U8 },
>>  	[DEVLINK_ATTR_TRAP_GROUP_NAME] = { .type = NLA_NUL_STRING },
>> +	[DEVLINK_ATTR_NETNS_PID] = { .type = NLA_U32 },
>> +	[DEVLINK_ATTR_NETNS_FD] = { .type = NLA_U32 },
>> +	[DEVLINK_ATTR_NETNS_ID] = { .type = NLA_U32 },
>>  };
>>  
>>  static const struct genl_ops devlink_nl_ops[] = {
>> @@ -8061,9 +8166,43 @@ int devlink_compat_switch_id_get(struct net_device *dev,
>>  	return 0;
>>  }
>>  
>> +static void __net_exit devlink_pernet_pre_exit(struct net *net)
>> +{
>> +	struct devlink *devlink;
>> +	int err;
>> +
>> +	/* In case network namespace is getting destroyed, reload
>> +	 * all devlink instances from this namespace into init_net.
>> +	 */
>> +	mutex_lock(&devlink_mutex);
>> +	list_for_each_entry(devlink, &devlink_list, list) {
>> +		if (net_eq(devlink_net(devlink), net)) {
>> +			if (WARN_ON(!devlink_reload_supported(devlink)))
>> +				continue;
>> +			err = devlink_reload(devlink, &init_net, NULL);
>> +			if (err)
>> +				pr_warn("Failed to reload devlink instance into init_net\n");
>> +		}
>> +	}
>> +	mutex_unlock(&devlink_mutex);
>> +}
>> +
>> +static struct pernet_operations devlink_pernet_ops __net_initdata = {
>> +	.pre_exit = devlink_pernet_pre_exit,
>> +};
>> +
>>  static int __init devlink_init(void)
>>  {
>> -	return genl_register_family(&devlink_nl_family);
>> +	int err;
>> +
>> +	err = genl_register_family(&devlink_nl_family);
>> +	if (err)
>> +		goto out;
>> +	err = register_pernet_subsys(&devlink_pernet_ops);
>> +
>> +out:
>> +	WARN_ON(err);
>> +	return err;
>>  }
>>  
>>  subsys_initcall(devlink_init);
>> -- 
>> 2.21.0
>> 

^ permalink raw reply

* Re: [patch net-next 03/15] net: fib_notifier: propagate possible error during fib notifier registration
From: Jiri Pirko @ 2019-09-15  9:41 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, idosch, dsahern, jakub.kicinski, tariqt, saeedm,
	kuznet, yoshfuji, shuah, mlxsw
In-Reply-To: <20190915081746.GB11194@splinter>

Sun, Sep 15, 2019 at 10:17:46AM CEST, idosch@idosch.org wrote:
>On Sat, Sep 14, 2019 at 08:45:56AM +0200, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Unlike events for registered notifier, during the registration, the
>> errors that happened for the block being registered are not propagated
>> up to the caller. For fib rules, this is already present, but not for
>
>What do you mean by "already present" ? You added it below for rules as
>well...

Right, will fix.


>
>> fib entries. So make sure the error is propagated for those as well.
>> 
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/net/ip_fib.h    |  2 +-
>>  net/core/fib_notifier.c |  2 --
>>  net/core/fib_rules.c    | 11 ++++++++---
>>  net/ipv4/fib_notifier.c |  4 +---
>>  net/ipv4/fib_trie.c     | 31 ++++++++++++++++++++++---------
>>  net/ipv4/ipmr_base.c    | 22 +++++++++++++++-------
>>  net/ipv6/ip6_fib.c      | 36 ++++++++++++++++++++++++------------
>>  7 files changed, 71 insertions(+), 37 deletions(-)
>> 
>> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
>> index 4cec9ecaa95e..caae0fa610aa 100644
>> --- a/include/net/ip_fib.h
>> +++ b/include/net/ip_fib.h
>> @@ -229,7 +229,7 @@ int __net_init fib4_notifier_init(struct net *net);
>>  void __net_exit fib4_notifier_exit(struct net *net);
>>  
>>  void fib_info_notify_update(struct net *net, struct nl_info *info);
>> -void fib_notify(struct net *net, struct notifier_block *nb);
>> +int fib_notify(struct net *net, struct notifier_block *nb);
>>  
>>  struct fib_table {
>>  	struct hlist_node	tb_hlist;
>> diff --git a/net/core/fib_notifier.c b/net/core/fib_notifier.c
>> index b965f3c0ec9a..fbd029425638 100644
>> --- a/net/core/fib_notifier.c
>> +++ b/net/core/fib_notifier.c
>> @@ -65,8 +65,6 @@ static int fib_net_dump(struct net *net, struct notifier_block *nb)
>>  
>>  	rcu_read_lock();
>>  	list_for_each_entry_rcu(ops, &fn_net->fib_notifier_ops, list) {
>> -		int err;
>
>Looks like this should have been removed in previous patch

Correct, will move.


>
>> -
>>  		if (!try_module_get(ops->owner))
>>  			continue;
>>  		err = ops->fib_dump(net, nb);
>> diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
>> index 28cbf07102bc..592d8aef90e3 100644
>> --- a/net/core/fib_rules.c
>> +++ b/net/core/fib_rules.c
>> @@ -354,15 +354,20 @@ int fib_rules_dump(struct net *net, struct notifier_block *nb, int family)
>>  {
>>  	struct fib_rules_ops *ops;
>>  	struct fib_rule *rule;
>> +	int err = 0;
>>  
>>  	ops = lookup_rules_ops(net, family);
>>  	if (!ops)
>>  		return -EAFNOSUPPORT;
>> -	list_for_each_entry_rcu(rule, &ops->rules_list, list)
>> -		call_fib_rule_notifier(nb, FIB_EVENT_RULE_ADD, rule, family);
>> +	list_for_each_entry_rcu(rule, &ops->rules_list, list) {
>> +		err = call_fib_rule_notifier(nb, FIB_EVENT_RULE_ADD,
>> +					     rule, family);
>
>Here you add it for rules
>
>> +		if (err)
>> +			break;
>> +	}
>>  	rules_ops_put(ops);
>>  
>> -	return 0;
>> +	return err;
>>  }
>>  EXPORT_SYMBOL_GPL(fib_rules_dump);

^ permalink raw reply

* Re: [patch net-next 02/15] net: fib_notifier: make FIB notifier per-netns
From: Jiri Pirko @ 2019-09-15  9:37 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, idosch, dsahern, jakub.kicinski, tariqt, saeedm,
	kuznet, yoshfuji, shuah, mlxsw
In-Reply-To: <20190915080602.GA11194@splinter>

Sun, Sep 15, 2019 at 10:06:02AM CEST, idosch@idosch.org wrote:
>On Sat, Sep 14, 2019 at 08:45:55AM +0200, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Currently all users of FIB notifier only cares about events in init_net.
>
>s/cares/care/

ok


>
>> Later in this patchset, users get interested in other namespaces too.
>> However, for every registered block user is interested only about one
>> namespace. Make the FIB notifier registration per-netns and avoid
>> unnecessary calls of notifier block for other namespaces.
>
>...
>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
>> index 5d20d615663e..fe0cc969cf94 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
>> @@ -248,9 +248,6 @@ static int mlx5_lag_fib_event(struct notifier_block *nb,
>>  	struct net_device *fib_dev;
>>  	struct fib_info *fi;
>>  
>> -	if (!net_eq(info->net, &init_net))
>> -		return NOTIFY_DONE;
>
>I don't see anymore uses of 'info->net'. Can it be removed from 'struct
>fib_notifier_info' ?

correct. I missed that.


^ permalink raw reply

* rt_uses_gateway was removed?
From: Julian Anastasov @ 2019-09-15  9:08 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, Ido Schimmel


	Hello,

	Looks like I'm a bit late with the storm of changes
in the routing.

	By default, after allocation rt_uses_gateway was set to 0.
Later it can be set to 1 if nh_gw is not the final route target,
i.e. it is indirect GW and not a target on LAN (the RT_SCOPE_LINK
check in rt_set_nexthop).

	What remains hidden for the rt_uses_gateway semantic
is this code in rt_set_nexthop:

	if (unlikely(!cached)) {
		/* Routes we intend to cache in nexthop exception or
		 * FIB nexthop have the DST_NOCACHE bit clear.
		 * However, if we are unsuccessful at storing this
		 * route into the cache we really need to set it.
		 */
		if (!rt->rt_gateway)
			rt->rt_gateway = daddr;
		rt_add_uncached_list(rt);
	}

and this code in rt_bind_exception:

	if (!rt->rt_gateway)
		rt->rt_gateway = daddr;

	I.e. even if rt_uses_gateway remains 0, rt_gateway
can contain address, i.e. the target is on LAN and not
behind nh_gw.

	Now I see commit 1550c171935d wrongly changes that to
"If rt_gw_family is set it implies rt_uses_gateway.".
As result, we set rt_gw_family while rt_uses_gateway was 0
for above cases. Think about it in this way: there should be
a reason why we used rt_uses_gateway flag instead a simple
"rt_gateway != 0" check, right?

	Replacing rt->rt_gateway checks with rt_gw_family
checks is right but rt_uses_gateway checks should be put
back because they indicates the route has more hops to
target.

	As the problem is related to some FNHW and non-cached
routes, redirects, etc the easiest way to see the problem is with
'ip route get LOCAL_IP oif eth0' where extra 'via GW' line is
shown.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [patch net-next 14/15] net: devlink: allow to change namespaces during reload
From: Ido Schimmel @ 2019-09-15  8:58 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, dsahern, jakub.kicinski, tariqt, saeedm,
	kuznet, yoshfuji, shuah, mlxsw
In-Reply-To: <20190914064608.26799-15-jiri@resnulli.us>

On Sat, Sep 14, 2019 at 08:46:07AM +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> All devlink instances are created in init_net and stay there for a
> lifetime. Allow user to be able to move devlink instances into
> namespaces during devlink reload operation. That ensures proper
> re-instantiation of driver objects, including netdevices.
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/main.c |   4 +
>  include/uapi/linux/devlink.h              |   4 +
>  net/core/devlink.c                        | 155 ++++++++++++++++++++--
>  3 files changed, 155 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
> index ef3f3d06ff1e..989d0882aaa9 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
> @@ -3942,6 +3942,10 @@ static int mlx4_devlink_reload_down(struct devlink *devlink,
>  	struct mlx4_dev *dev = &priv->dev;
>  	struct mlx4_dev_persistent *persist = dev->persist;
>  
> +	if (!net_eq(devlink_net(devlink), &init_net)) {
> +		NL_SET_ERR_MSG_MOD(extack, "Namespace change is not supported");
> +		return -EOPNOTSUPP;
> +	}

Are you sure that this actually works? I see that you first invoke
reload_down(), then set the new namespace, then invoke reload_up().

So shouldn't this check be done in reload_up() callback instead?

>  	if (persist->num_vfs)
>  		mlx4_warn(persist->dev, "Reload performed on PF, will cause reset on operating Virtual Functions\n");
>  	mlx4_restart_one_down(persist->pdev);
> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> index 580b7a2e40e1..b558ea88b766 100644
> --- a/include/uapi/linux/devlink.h
> +++ b/include/uapi/linux/devlink.h
> @@ -421,6 +421,10 @@ enum devlink_attr {
>  
>  	DEVLINK_ATTR_RELOAD_FAILED,			/* u8 0 or 1 */
>  
> +	DEVLINK_ATTR_NETNS_FD,			/* u32 */
> +	DEVLINK_ATTR_NETNS_PID,			/* u32 */
> +	DEVLINK_ATTR_NETNS_ID,			/* u32 */
> +
>  	/* add new attributes above here, update the policy in devlink.c */
>  
>  	__DEVLINK_ATTR_MAX,
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index 362cbbcca225..2a5db95cce3c 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -435,8 +435,16 @@ static void devlink_nl_post_doit(const struct genl_ops *ops,
>  {
>  	struct devlink *devlink;
>  
> -	devlink = devlink_get_from_info(info);
> -	if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
> +	/* When devlink changes netns, it would not be found
> +	 * by devlink_get_from_info(). So try if it is stored first.
> +	 */
> +	if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEVLINK) {
> +		devlink = info->user_ptr[0];
> +	} else {
> +		devlink = devlink_get_from_info(info);
> +		WARN_ON(IS_ERR(devlink));
> +	}
> +	if (!IS_ERR(devlink) && ~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
>  		mutex_unlock(&devlink->lock);
>  	mutex_unlock(&devlink_mutex);
>  }
> @@ -2675,6 +2683,73 @@ devlink_resources_validate(struct devlink *devlink,
>  	return err;
>  }
>  
> +static struct net *devlink_netns_get(struct sk_buff *skb,
> +				     struct devlink *devlink,
> +				     struct genl_info *info)
> +{
> +	struct nlattr *netns_pid_attr = info->attrs[DEVLINK_ATTR_NETNS_PID];
> +	struct nlattr *netns_fd_attr = info->attrs[DEVLINK_ATTR_NETNS_FD];
> +	struct nlattr *netns_id_attr = info->attrs[DEVLINK_ATTR_NETNS_ID];
> +	struct net *net;
> +
> +	if (!!netns_pid_attr + !!netns_fd_attr + !!netns_id_attr > 1) {
> +		NL_SET_ERR_MSG(info->extack, "multiple netns identifying attributes specified");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	if (netns_pid_attr) {
> +		net = get_net_ns_by_pid(nla_get_u32(netns_pid_attr));
> +	} else if (netns_fd_attr) {
> +		net = get_net_ns_by_fd(nla_get_u32(netns_fd_attr));
> +	} else if (netns_id_attr) {
> +		net = get_net_ns_by_id(sock_net(skb->sk),
> +				       nla_get_u32(netns_id_attr));
> +		if (!net)
> +			net = ERR_PTR(-EINVAL);
> +	} else {
> +		WARN_ON(1);
> +		net = ERR_PTR(-EINVAL);
> +	}
> +	if (IS_ERR(net)) {
> +		NL_SET_ERR_MSG(info->extack, "Unknown network namespace");
> +		return ERR_PTR(-EINVAL);
> +	}
> +	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) {
> +		put_net(net);
> +		return ERR_PTR(-EPERM);
> +	}
> +	return net;
> +}
> +
> +static void devlink_param_notify(struct devlink *devlink,
> +				 unsigned int port_index,
> +				 struct devlink_param_item *param_item,
> +				 enum devlink_command cmd);
> +
> +static void devlink_reload_netns_change(struct devlink *devlink,
> +					struct net *dest_net)
> +{
> +	struct devlink_param_item *param_item;
> +
> +	/* Userspace needs to be notified about devlink objects
> +	 * removed from original and entering new network namespace.
> +	 * The rest of the devlink objects are re-created during
> +	 * reload process so the notifications are generated separatelly.
> +	 */
> +
> +	list_for_each_entry(param_item, &devlink->param_list, list)
> +		devlink_param_notify(devlink, 0, param_item,
> +				     DEVLINK_CMD_PARAM_DEL);
> +	devlink_notify(devlink, DEVLINK_CMD_DEL);
> +
> +	devlink_net_set(devlink, dest_net);
> +
> +	devlink_notify(devlink, DEVLINK_CMD_NEW);
> +	list_for_each_entry(param_item, &devlink->param_list, list)
> +		devlink_param_notify(devlink, 0, param_item,
> +				     DEVLINK_CMD_PARAM_NEW);
> +}
> +
>  static bool devlink_reload_supported(struct devlink *devlink)
>  {
>  	return devlink->ops->reload_down && devlink->ops->reload_up;
> @@ -2695,9 +2770,27 @@ bool devlink_is_reload_failed(const struct devlink *devlink)
>  }
>  EXPORT_SYMBOL_GPL(devlink_is_reload_failed);
>  
> +static int devlink_reload(struct devlink *devlink, struct net *dest_net,
> +			  struct netlink_ext_ack *extack)
> +{
> +	int err;
> +
> +	err = devlink->ops->reload_down(devlink, extack);
> +	if (err)
> +		return err;
> +
> +	if (dest_net && !net_eq(dest_net, devlink_net(devlink)))
> +		devlink_reload_netns_change(devlink, dest_net);
> +
> +	err = devlink->ops->reload_up(devlink, extack);
> +	devlink_reload_failed_set(devlink, !!err);
> +	return err;
> +}
> +
>  static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
>  {
>  	struct devlink *devlink = info->user_ptr[0];
> +	struct net *dest_net = NULL;
>  	int err;
>  
>  	if (!devlink_reload_supported(devlink))
> @@ -2708,11 +2801,20 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
>  		NL_SET_ERR_MSG_MOD(info->extack, "resources size validation failed");
>  		return err;
>  	}
> -	err = devlink->ops->reload_down(devlink, info->extack);
> -	if (err)
> -		return err;
> -	err = devlink->ops->reload_up(devlink, info->extack);
> -	devlink_reload_failed_set(devlink, !!err);
> +
> +	if (info->attrs[DEVLINK_ATTR_NETNS_PID] ||
> +	    info->attrs[DEVLINK_ATTR_NETNS_FD] ||
> +	    info->attrs[DEVLINK_ATTR_NETNS_ID]) {
> +		dest_net = devlink_netns_get(skb, devlink, info);

Hmm, you're never using 'devlink' there, so I guess you can drop it.

> +		if (IS_ERR(dest_net))
> +			return PTR_ERR(dest_net);
> +	}
> +
> +	err = devlink_reload(devlink, dest_net, info->extack);
> +
> +	if (dest_net)
> +		put_net(dest_net);
> +
>  	return err;
>  }
>  
> @@ -5794,6 +5896,9 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
>  	[DEVLINK_ATTR_TRAP_NAME] = { .type = NLA_NUL_STRING },
>  	[DEVLINK_ATTR_TRAP_ACTION] = { .type = NLA_U8 },
>  	[DEVLINK_ATTR_TRAP_GROUP_NAME] = { .type = NLA_NUL_STRING },
> +	[DEVLINK_ATTR_NETNS_PID] = { .type = NLA_U32 },
> +	[DEVLINK_ATTR_NETNS_FD] = { .type = NLA_U32 },
> +	[DEVLINK_ATTR_NETNS_ID] = { .type = NLA_U32 },
>  };
>  
>  static const struct genl_ops devlink_nl_ops[] = {
> @@ -8061,9 +8166,43 @@ int devlink_compat_switch_id_get(struct net_device *dev,
>  	return 0;
>  }
>  
> +static void __net_exit devlink_pernet_pre_exit(struct net *net)
> +{
> +	struct devlink *devlink;
> +	int err;
> +
> +	/* In case network namespace is getting destroyed, reload
> +	 * all devlink instances from this namespace into init_net.
> +	 */
> +	mutex_lock(&devlink_mutex);
> +	list_for_each_entry(devlink, &devlink_list, list) {
> +		if (net_eq(devlink_net(devlink), net)) {
> +			if (WARN_ON(!devlink_reload_supported(devlink)))
> +				continue;
> +			err = devlink_reload(devlink, &init_net, NULL);
> +			if (err)
> +				pr_warn("Failed to reload devlink instance into init_net\n");
> +		}
> +	}
> +	mutex_unlock(&devlink_mutex);
> +}
> +
> +static struct pernet_operations devlink_pernet_ops __net_initdata = {
> +	.pre_exit = devlink_pernet_pre_exit,
> +};
> +
>  static int __init devlink_init(void)
>  {
> -	return genl_register_family(&devlink_nl_family);
> +	int err;
> +
> +	err = genl_register_family(&devlink_nl_family);
> +	if (err)
> +		goto out;
> +	err = register_pernet_subsys(&devlink_pernet_ops);
> +
> +out:
> +	WARN_ON(err);
> +	return err;
>  }
>  
>  subsys_initcall(devlink_init);
> -- 
> 2.21.0
> 

^ permalink raw reply

* Здравствуйте! Вас интересуют клиентские базы данных?
From: netdev @ 2019-09-14 23:39 UTC (permalink / raw)
  To: netdev

Здравствуйте! Вас интересуют клиентские базы данных?

^ permalink raw reply

* Re: [patch net-next 09/15] mlxsw: Propagate extack down to register_fib_notifier()
From: Ido Schimmel @ 2019-09-15  8:39 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, dsahern, jakub.kicinski, tariqt, saeedm,
	kuznet, yoshfuji, shuah, mlxsw
In-Reply-To: <20190914064608.26799-10-jiri@resnulli.us>

On Sat, Sep 14, 2019 at 08:46:02AM +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> During the devlink reaload the extack is present, so propagate it all

s/reaload/reload/

> the way down to register_fib_notifier() call in spectrum_router.c.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox