netdev.vger.kernel.org archive mirror
* [PATCHv2 0/2] forcedeth: recv cache to make NIC work steadily
@ 2019-07-21 12:53 Zhu Yanjun
  2019-07-21 12:53 ` [PATCHv2 1/2] forcedeth: add recv cache to make nic " Zhu Yanjun
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Zhu Yanjun @ 2019-07-21 12:53 UTC (permalink / raw)
  To: yanjun.zhu, davem, netdev

These patches address the following scenario:

"
When the host runs for a long time, a lot of memory fragments build up
in the host, and it is possible that the kernel will compact these
fragments. But normally it is difficult for the NIC driver to allocate
memory from the kernel. From the variable stat_rx_dropped, we can
confirm that the NIC driver very frequently fails to allocate an skb.
"

Since the NIC driver cannot allocate skbs in time, some important tasks
cannot be completed in time.
To avoid this, a recv cache is created to pre-allocate skbs for the NIC
driver, which lets the important tasks complete in time.
From Nan's tests in the lab, these patches make the NIC driver work
steadily. These patches are now applied on production hosts.

With these patches, one NIC port needs 125 MiB reserved. This memory
cannot be used by anything else. On a host where communications are not
mandatory, it is not necessary to reserve so much memory, so this recv
cache is disabled by default.

V1->V2:
1. ndelay is replaced with GFP_KERNEL function __netdev_alloc_skb.
2. skb_queue_purge is used when recv cache is destroyed.
3. RECV_LIST_ALLOCATE bit is removed.
4. schedule_delayed_work is moved out of while loop.

Zhu Yanjun (2):
  forcedeth: add recv cache to make nic work steadily
  forcedeth: disable recv cache by default

 drivers/net/ethernet/nvidia/Kconfig     |  11 +++
 drivers/net/ethernet/nvidia/Makefile    |   1 +
 drivers/net/ethernet/nvidia/forcedeth.c | 129 +++++++++++++++++++++++++++++++-
 3 files changed, 139 insertions(+), 2 deletions(-)

-- 
2.7.4



* [PATCHv2 1/2] forcedeth: add recv cache to make nic work steadily
  2019-07-21 12:53 [PATCHv2 0/2] forcedeth: recv cache to make NIC work steadily Zhu Yanjun
@ 2019-07-21 12:53 ` Zhu Yanjun
  2019-07-21 12:53 ` [PATCHv2 2/2] forcedeth: disable recv cache by default Zhu Yanjun
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Zhu Yanjun @ 2019-07-21 12:53 UTC (permalink / raw)
  To: yanjun.zhu, davem, netdev

A recv cache is added. Its length is 125 MiB (1000 Mbit) divided by the
skb length. When system memory is low, this recv cache keeps the nic
working steadily.
When the nic is up, the recv cache and a delayed work item are created.
When the nic is down, the recv cache is destroyed and the delayed work
is canceled.
When the nic is polled or an rx interrupt is triggered, the rx handler
takes an skb from the recv cache, and a work item is then queued to
refill the recv cache. When the skb size changes, the old recv cache is
destroyed and a new one is created.
When system memory is low, GFP_ATOMIC skb allocation can fail. The recv
cache then falls back to allocating skbs with GFP_KERNEL until the
cache is filled up. This keeps the nic working steadily under memory
pressure, and the recv cache also enhances nic performance.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Tested-by: Nan san <nan.1986san@gmail.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c | 103 +++++++++++++++++++++++++++++++-
 1 file changed, 101 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index b327b29..f8e766f 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -674,6 +674,11 @@ struct nv_ethtool_stats {
 	u64 tx_broadcast;
 };
 
+/* 1000Mb is 125M bytes, 125 * 1024 * 1024 bytes
+ * The length of recv cache is 125M / skb_length
+ */
+#define RECV_CACHE_LIST_LENGTH		(125 * 1024 * 1024 / np->rx_buf_sz)
+
 #define NV_DEV_STATISTICS_V3_COUNT (sizeof(struct nv_ethtool_stats)/sizeof(u64))
 #define NV_DEV_STATISTICS_V2_COUNT (NV_DEV_STATISTICS_V3_COUNT - 3)
 #define NV_DEV_STATISTICS_V1_COUNT (NV_DEV_STATISTICS_V2_COUNT - 6)
@@ -844,6 +849,11 @@ struct fe_priv {
 	char name_rx[IFNAMSIZ + 3];       /* -rx    */
 	char name_tx[IFNAMSIZ + 3];       /* -tx    */
 	char name_other[IFNAMSIZ + 6];    /* -other */
+
+	/* This is to schedule work */
+	struct delayed_work     recv_cache_work;
+	/* This list is to store skb queue for recv */
+	struct sk_buff_head recv_list;
 };
 
 /*
@@ -1804,7 +1814,8 @@ static int nv_alloc_rx(struct net_device *dev)
 		less_rx = np->last_rx.orig;
 
 	while (np->put_rx.orig != less_rx) {
-		struct sk_buff *skb = netdev_alloc_skb(dev, np->rx_buf_sz + NV_RX_ALLOC_PAD);
+		struct sk_buff *skb = skb_dequeue(&np->recv_list);
+
 		if (likely(skb)) {
 			np->put_rx_ctx->skb = skb;
 			np->put_rx_ctx->dma = dma_map_single(&np->pci_dev->dev,
@@ -1829,9 +1840,15 @@ static int nv_alloc_rx(struct net_device *dev)
 			u64_stats_update_begin(&np->swstats_rx_syncp);
 			np->stat_rx_dropped++;
 			u64_stats_update_end(&np->swstats_rx_syncp);
+
+			schedule_delayed_work(&np->recv_cache_work, 0);
+
 			return 1;
 		}
 	}
+
+	schedule_delayed_work(&np->recv_cache_work, 0);
+
 	return 0;
 }
 
@@ -1845,7 +1862,8 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
 		less_rx = np->last_rx.ex;
 
 	while (np->put_rx.ex != less_rx) {
-		struct sk_buff *skb = netdev_alloc_skb(dev, np->rx_buf_sz + NV_RX_ALLOC_PAD);
+		struct sk_buff *skb = skb_dequeue(&np->recv_list);
+
 		if (likely(skb)) {
 			np->put_rx_ctx->skb = skb;
 			np->put_rx_ctx->dma = dma_map_single(&np->pci_dev->dev,
@@ -1871,9 +1889,15 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
 			u64_stats_update_begin(&np->swstats_rx_syncp);
 			np->stat_rx_dropped++;
 			u64_stats_update_end(&np->swstats_rx_syncp);
+
+			schedule_delayed_work(&np->recv_cache_work, 0);
+
 			return 1;
 		}
 	}
+
+	schedule_delayed_work(&np->recv_cache_work, 0);
+
 	return 0;
 }
 
@@ -1957,6 +1981,43 @@ static void nv_init_tx(struct net_device *dev)
 	}
 }
 
+static void nv_init_recv_cache(struct net_device *dev)
+{
+	struct fe_priv *np = netdev_priv(dev);
+
+	skb_queue_head_init(&np->recv_list);
+	while (skb_queue_len(&np->recv_list) < RECV_CACHE_LIST_LENGTH) {
+		struct sk_buff *skb = netdev_alloc_skb(dev,
+				 np->rx_buf_sz + NV_RX_ALLOC_PAD);
+		/* skb is null. This indicates that memory is not
+		 * enough.
+		 */
+		if (unlikely(!skb)) {
+			/* When allocating memory with GFP_ATOMIC fails,
+			 * allocating with GFP_KERNEL will get memory
+			 * finally.
+			 */
+			skb = __netdev_alloc_skb(dev,
+						 np->rx_buf_sz +
+						 NV_RX_ALLOC_PAD,
+						 GFP_KERNEL);
+		}
+
+		skb_queue_tail(&np->recv_list, skb);
+	}
+}
+
+static void nv_destroy_recv_cache(struct net_device *dev)
+{
+	struct fe_priv *np = netdev_priv(dev);
+
+	cancel_delayed_work_sync(&np->recv_cache_work);
+	WARN_ON(delayed_work_pending(&np->recv_cache_work));
+
+	skb_queue_purge(&np->recv_list);
+	WARN_ON(skb_queue_len(&np->recv_list));
+}
+
 static int nv_init_ring(struct net_device *dev)
 {
 	struct fe_priv *np = netdev_priv(dev);
@@ -3047,6 +3108,8 @@ static int nv_change_mtu(struct net_device *dev, int new_mtu)
 		nv_drain_rxtx(dev);
 		/* reinit driver view of the rx queue */
 		set_bufsize(dev);
+		nv_destroy_recv_cache(dev);
+		nv_init_recv_cache(dev);
 		if (nv_init_ring(dev)) {
 			if (!np->in_shutdown)
 				mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
@@ -4074,6 +4137,32 @@ static void nv_free_irq(struct net_device *dev)
 	}
 }
 
+static void nv_recv_cache_worker(struct work_struct *work)
+{
+	struct fe_priv *np = container_of(work, struct fe_priv,
+					  recv_cache_work.work);
+
+	while (skb_queue_len(&np->recv_list) < RECV_CACHE_LIST_LENGTH) {
+		struct sk_buff *skb = netdev_alloc_skb(np->dev,
+				np->rx_buf_sz + NV_RX_ALLOC_PAD);
+		/* skb is null. This indicates that memory is not
+		 * enough.
+		 */
+		if (unlikely(!skb)) {
+			/* When allocating memory with GFP_ATOMIC fails,
+			 * allocating with GFP_KERNEL will get memory
+			 * finally.
+			 */
+			skb = __netdev_alloc_skb(np->dev,
+						 np->rx_buf_sz +
+						 NV_RX_ALLOC_PAD,
+						 GFP_KERNEL);
+		}
+
+		skb_queue_tail(&np->recv_list, skb);
+	}
+}
+
 static void nv_do_nic_poll(struct timer_list *t)
 {
 	struct fe_priv *np = from_timer(np, t, nic_poll);
@@ -4129,6 +4218,8 @@ static void nv_do_nic_poll(struct timer_list *t)
 			nv_drain_rxtx(dev);
 			/* reinit driver view of the rx queue */
 			set_bufsize(dev);
+			nv_destroy_recv_cache(dev);
+			nv_init_recv_cache(dev);
 			if (nv_init_ring(dev)) {
 				if (!np->in_shutdown)
 					mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
@@ -4681,6 +4772,8 @@ static int nv_set_ringparam(struct net_device *dev, struct ethtool_ringparam* ri
 	if (netif_running(dev)) {
 		/* reinit driver view of the queues */
 		set_bufsize(dev);
+		nv_destroy_recv_cache(dev);
+		nv_init_recv_cache(dev);
 		if (nv_init_ring(dev)) {
 			if (!np->in_shutdown)
 				mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
@@ -5402,6 +5495,9 @@ static int nv_open(struct net_device *dev)
 
 	/* initialize descriptor rings */
 	set_bufsize(dev);
+	nv_init_recv_cache(dev);
+
+	INIT_DELAYED_WORK(&np->recv_cache_work, nv_recv_cache_worker);
 	oom = nv_init_ring(dev);
 
 	writel(0, base + NvRegLinkSpeed);
@@ -5583,6 +5679,9 @@ static int nv_close(struct net_device *dev)
 		nv_txrx_gate(dev, true);
 	}
 
+	/* free all SKBs in recv cache */
+	nv_destroy_recv_cache(dev);
+
 	/* FIXME: power down nic */
 
 	return 0;
-- 
2.7.4



* [PATCHv2 2/2] forcedeth: disable recv cache by default
  2019-07-21 12:53 [PATCHv2 0/2] forcedeth: recv cache to make NIC work steadily Zhu Yanjun
  2019-07-21 12:53 ` [PATCHv2 1/2] forcedeth: add recv cache to make nic " Zhu Yanjun
@ 2019-07-21 12:53 ` Zhu Yanjun
  2019-07-21 14:48   ` Andrew Lunn
  2019-07-21 14:53 ` [PATCHv2 0/2] forcedeth: recv cache to make NIC work steadily Andrew Lunn
  2019-07-21 18:45 ` David Miller
  3 siblings, 1 reply; 6+ messages in thread
From: Zhu Yanjun @ 2019-07-21 12:53 UTC (permalink / raw)
  To: yanjun.zhu, davem, netdev

The recv cache reserves 125 MiB of memory for the NIC.
In production use this recv cache has worked very well. When memory
is scarce, the recv cache keeps memory reserved for the NIC, so
communication through this NIC is not affected by the memory shortage,
and NIC performance is better because of this recv cache.
But this recv cache reserves 125 MiB for one NIC port, and normally
there are 2 NIC ports on one card, so about 250 MiB of memory is
reserved for NIC ports on a host. On a host where communications are
not mandatory, it is not necessary to reserve this memory.
So this recv cache is disabled by default.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Tested-by: Nan san <nan.1986san@gmail.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
 drivers/net/ethernet/nvidia/Kconfig     | 11 ++++++++
 drivers/net/ethernet/nvidia/Makefile    |  1 +
 drivers/net/ethernet/nvidia/forcedeth.c | 46 ++++++++++++++++++++++++++-------
 3 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/Kconfig b/drivers/net/ethernet/nvidia/Kconfig
index faacbd1..9a9f42a 100644
--- a/drivers/net/ethernet/nvidia/Kconfig
+++ b/drivers/net/ethernet/nvidia/Kconfig
@@ -26,4 +26,15 @@ config FORCEDETH
 	  To compile this driver as a module, choose M here. The module
 	  will be called forcedeth.
 
+config	FORCEDETH_RECV_CACHE
+	bool "nForce Ethernet recv cache support"
+	depends on FORCEDETH
+	default n
+	---help---
+	  The recv cache can make nic work steadily when the system memory is
+	  not enough. And it can also enhance nic performance. But to a host
+	  on which the communications are not mandatory, it is not necessary
+	  to reserve 125MiB memory for NIC.
+	  So recv cache is disabled by default.
+
 endif # NET_VENDOR_NVIDIA
diff --git a/drivers/net/ethernet/nvidia/Makefile b/drivers/net/ethernet/nvidia/Makefile
index 8935699..40c055e 100644
--- a/drivers/net/ethernet/nvidia/Makefile
+++ b/drivers/net/ethernet/nvidia/Makefile
@@ -4,3 +4,4 @@
 #
 
 obj-$(CONFIG_FORCEDETH) += forcedeth.o
+ccflags-$(CONFIG_FORCEDETH_RECV_CACHE)	:=	-DFORCEDETH_RECV_CACHE
diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index f8e766f..deda276 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -674,10 +674,12 @@ struct nv_ethtool_stats {
 	u64 tx_broadcast;
 };
 
+#ifdef FORCEDETH_RECV_CACHE
 /* 1000Mb is 125M bytes, 125 * 1024 * 1024 bytes
  * The length of recv cache is 125M / skb_length
  */
 #define RECV_CACHE_LIST_LENGTH		(125 * 1024 * 1024 / np->rx_buf_sz)
+#endif
 
 #define NV_DEV_STATISTICS_V3_COUNT (sizeof(struct nv_ethtool_stats)/sizeof(u64))
 #define NV_DEV_STATISTICS_V2_COUNT (NV_DEV_STATISTICS_V3_COUNT - 3)
@@ -850,10 +852,12 @@ struct fe_priv {
 	char name_tx[IFNAMSIZ + 3];       /* -tx    */
 	char name_other[IFNAMSIZ + 6];    /* -other */
 
+#ifdef FORCEDETH_RECV_CACHE
 	/* This is to schedule work */
 	struct delayed_work     recv_cache_work;
 	/* This list is to store skb queue for recv */
 	struct sk_buff_head recv_list;
+#endif
 };
 
 /*
@@ -1814,8 +1818,12 @@ static int nv_alloc_rx(struct net_device *dev)
 		less_rx = np->last_rx.orig;
 
 	while (np->put_rx.orig != less_rx) {
+#ifdef FORCEDETH_RECV_CACHE
 		struct sk_buff *skb = skb_dequeue(&np->recv_list);
-
+#else
+		struct sk_buff *skb = netdev_alloc_skb(np->dev,
+					 np->rx_buf_sz + NV_RX_ALLOC_PAD);
+#endif
 		if (likely(skb)) {
 			np->put_rx_ctx->skb = skb;
 			np->put_rx_ctx->dma = dma_map_single(&np->pci_dev->dev,
@@ -1840,15 +1848,15 @@ static int nv_alloc_rx(struct net_device *dev)
 			u64_stats_update_begin(&np->swstats_rx_syncp);
 			np->stat_rx_dropped++;
 			u64_stats_update_end(&np->swstats_rx_syncp);
-
+#ifdef FORCEDETH_RECV_CACHE
 			schedule_delayed_work(&np->recv_cache_work, 0);
-
+#endif
 			return 1;
 		}
 	}
-
+#ifdef FORCEDETH_RECV_CACHE
 	schedule_delayed_work(&np->recv_cache_work, 0);
-
+#endif
 	return 0;
 }
 
@@ -1862,7 +1870,12 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
 		less_rx = np->last_rx.ex;
 
 	while (np->put_rx.ex != less_rx) {
+#ifdef FORCEDETH_RECV_CACHE
 		struct sk_buff *skb = skb_dequeue(&np->recv_list);
+#else
+		struct sk_buff *skb = netdev_alloc_skb(np->dev,
+					np->rx_buf_sz + NV_RX_ALLOC_PAD);
+#endif
 
 		if (likely(skb)) {
 			np->put_rx_ctx->skb = skb;
@@ -1889,15 +1902,15 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
 			u64_stats_update_begin(&np->swstats_rx_syncp);
 			np->stat_rx_dropped++;
 			u64_stats_update_end(&np->swstats_rx_syncp);
-
+#ifdef FORCEDETH_RECV_CACHE
 			schedule_delayed_work(&np->recv_cache_work, 0);
-
+#endif
 			return 1;
 		}
 	}
-
+#ifdef FORCEDETH_RECV_CACHE
 	schedule_delayed_work(&np->recv_cache_work, 0);
-
+#endif
 	return 0;
 }
 
@@ -1981,6 +1994,7 @@ static void nv_init_tx(struct net_device *dev)
 	}
 }
 
+#ifdef FORCEDETH_RECV_CACHE
 static void nv_init_recv_cache(struct net_device *dev)
 {
 	struct fe_priv *np = netdev_priv(dev);
@@ -2017,6 +2031,7 @@ static void nv_destroy_recv_cache(struct net_device *dev)
 	skb_queue_purge(&np->recv_list);
 	WARN_ON(skb_queue_len(&np->recv_list));
 }
+#endif
 
 static int nv_init_ring(struct net_device *dev)
 {
@@ -3108,8 +3123,10 @@ static int nv_change_mtu(struct net_device *dev, int new_mtu)
 		nv_drain_rxtx(dev);
 		/* reinit driver view of the rx queue */
 		set_bufsize(dev);
+#ifdef FORCEDETH_RECV_CACHE
 		nv_destroy_recv_cache(dev);
 		nv_init_recv_cache(dev);
+#endif
 		if (nv_init_ring(dev)) {
 			if (!np->in_shutdown)
 				mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
@@ -4137,6 +4154,7 @@ static void nv_free_irq(struct net_device *dev)
 	}
 }
 
+#ifdef FORCEDETH_RECV_CACHE
 static void nv_recv_cache_worker(struct work_struct *work)
 {
 	struct fe_priv *np = container_of(work, struct fe_priv,
@@ -4162,6 +4180,7 @@ static void nv_recv_cache_worker(struct work_struct *work)
 		skb_queue_tail(&np->recv_list, skb);
 	}
 }
+#endif
 
 static void nv_do_nic_poll(struct timer_list *t)
 {
@@ -4218,8 +4237,10 @@ static void nv_do_nic_poll(struct timer_list *t)
 			nv_drain_rxtx(dev);
 			/* reinit driver view of the rx queue */
 			set_bufsize(dev);
+#ifdef FORCEDETH_RECV_CACHE
 			nv_destroy_recv_cache(dev);
 			nv_init_recv_cache(dev);
+#endif
 			if (nv_init_ring(dev)) {
 				if (!np->in_shutdown)
 					mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
@@ -4772,8 +4793,10 @@ static int nv_set_ringparam(struct net_device *dev, struct ethtool_ringparam* ri
 	if (netif_running(dev)) {
 		/* reinit driver view of the queues */
 		set_bufsize(dev);
+#ifdef FORCEDETH_RECV_CACHE
 		nv_destroy_recv_cache(dev);
 		nv_init_recv_cache(dev);
+#endif
 		if (nv_init_ring(dev)) {
 			if (!np->in_shutdown)
 				mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
@@ -5495,9 +5518,11 @@ static int nv_open(struct net_device *dev)
 
 	/* initialize descriptor rings */
 	set_bufsize(dev);
+#ifdef FORCEDETH_RECV_CACHE
 	nv_init_recv_cache(dev);
 
 	INIT_DELAYED_WORK(&np->recv_cache_work, nv_recv_cache_worker);
+#endif
 	oom = nv_init_ring(dev);
 
 	writel(0, base + NvRegLinkSpeed);
@@ -5679,9 +5704,10 @@ static int nv_close(struct net_device *dev)
 		nv_txrx_gate(dev, true);
 	}
 
+#ifdef FORCEDETH_RECV_CACHE
 	/* free all SKBs in recv cache */
 	nv_destroy_recv_cache(dev);
-
+#endif
 	/* FIXME: power down nic */
 
 	return 0;
-- 
2.7.4



* Re: [PATCHv2 2/2] forcedeth: disable recv cache by default
  2019-07-21 12:53 ` [PATCHv2 2/2] forcedeth: disable recv cache by default Zhu Yanjun
@ 2019-07-21 14:48   ` Andrew Lunn
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Lunn @ 2019-07-21 14:48 UTC (permalink / raw)
  To: Zhu Yanjun; +Cc: davem, netdev

On Sun, Jul 21, 2019 at 08:53:53AM -0400, Zhu Yanjun wrote:
> The recv cache reserves 125 MiB of memory for the NIC.
> In production use this recv cache has worked very well. When memory
> is scarce, the recv cache keeps memory reserved for the NIC, so
> communication through this NIC is not affected by the memory shortage,
> and NIC performance is better because of this recv cache.
> But this recv cache reserves 125 MiB for one NIC port, and normally
> there are 2 NIC ports on one card, so about 250 MiB of memory is
> reserved for NIC ports on a host. On a host where communications are
> not mandatory, it is not necessary to reserve this memory.
> So this recv cache is disabled by default.
> 
> CC: Joe Jin <joe.jin@oracle.com>
> CC: Junxiao Bi <junxiao.bi@oracle.com>
> Tested-by: Nan san <nan.1986san@gmail.com>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> ---
>  drivers/net/ethernet/nvidia/Kconfig     | 11 ++++++++
>  drivers/net/ethernet/nvidia/Makefile    |  1 +
>  drivers/net/ethernet/nvidia/forcedeth.c | 46 ++++++++++++++++++++++++++-------
>  3 files changed, 48 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/ethernet/nvidia/Kconfig b/drivers/net/ethernet/nvidia/Kconfig
> index faacbd1..9a9f42a 100644
> --- a/drivers/net/ethernet/nvidia/Kconfig
> +++ b/drivers/net/ethernet/nvidia/Kconfig
> @@ -26,4 +26,15 @@ config FORCEDETH
>  	  To compile this driver as a module, choose M here. The module
>  	  will be called forcedeth.
>  
> +config	FORCEDETH_RECV_CACHE
> +	bool "nForce Ethernet recv cache support"
> +	depends on FORCEDETH
> +	default n
> +	---help---
> +	  The recv cache can make nic work steadily when the system memory is
> +	  not enough. And it can also enhance nic performance. But to a host
> +	  on which the communications are not mandatory, it is not necessary
> +	  to reserve 125MiB memory for NIC.
> +	  So recv cache is disabled by default.
> +
>  endif # NET_VENDOR_NVIDIA
> diff --git a/drivers/net/ethernet/nvidia/Makefile b/drivers/net/ethernet/nvidia/Makefile
> index 8935699..40c055e 100644
> --- a/drivers/net/ethernet/nvidia/Makefile
> +++ b/drivers/net/ethernet/nvidia/Makefile
> @@ -4,3 +4,4 @@
>  #
>  
>  obj-$(CONFIG_FORCEDETH) += forcedeth.o
> +ccflags-$(CONFIG_FORCEDETH_RECV_CACHE)	:=	-DFORCEDETH_RECV_CACHE
> diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
> index f8e766f..deda276 100644
> --- a/drivers/net/ethernet/nvidia/forcedeth.c
> +++ b/drivers/net/ethernet/nvidia/forcedeth.c
> @@ -674,10 +674,12 @@ struct nv_ethtool_stats {
>  	u64 tx_broadcast;
>  };
>  
> +#ifdef FORCEDETH_RECV_CACHE
>  /* 1000Mb is 125M bytes, 125 * 1024 * 1024 bytes
>   * The length of recv cache is 125M / skb_length
>   */
>  #define RECV_CACHE_LIST_LENGTH		(125 * 1024 * 1024 / np->rx_buf_sz)
> +#endif
>  
>  #define NV_DEV_STATISTICS_V3_COUNT (sizeof(struct nv_ethtool_stats)/sizeof(u64))
>  #define NV_DEV_STATISTICS_V2_COUNT (NV_DEV_STATISTICS_V3_COUNT - 3)
> @@ -850,10 +852,12 @@ struct fe_priv {
>  	char name_tx[IFNAMSIZ + 3];       /* -tx    */
>  	char name_other[IFNAMSIZ + 6];    /* -other */
>  
> +#ifdef FORCEDETH_RECV_CACHE
>  	/* This is to schedule work */
>  	struct delayed_work     recv_cache_work;
>  	/* This list is to store skb queue for recv */
>  	struct sk_buff_head recv_list;
> +#endif
>  };
>  
>  /*
> @@ -1814,8 +1818,12 @@ static int nv_alloc_rx(struct net_device *dev)
>  		less_rx = np->last_rx.orig;
>  
>  	while (np->put_rx.orig != less_rx) {
> +#ifdef FORCEDETH_RECV_CACHE
>  		struct sk_buff *skb = skb_dequeue(&np->recv_list);
> -
> +#else
> +		struct sk_buff *skb = netdev_alloc_skb(np->dev,
> +					 np->rx_buf_sz + NV_RX_ALLOC_PAD);
> +#endif
>  		if (likely(skb)) {
>  			np->put_rx_ctx->skb = skb;
>  			np->put_rx_ctx->dma = dma_map_single(&np->pci_dev->dev,
> @@ -1840,15 +1848,15 @@ static int nv_alloc_rx(struct net_device *dev)
>  			u64_stats_update_begin(&np->swstats_rx_syncp);
>  			np->stat_rx_dropped++;
>  			u64_stats_update_end(&np->swstats_rx_syncp);
> -
> +#ifdef FORCEDETH_RECV_CACHE
>  			schedule_delayed_work(&np->recv_cache_work, 0);
> -
> +#endif

All these #ifdefs are pretty ugly. They also make for easy-to-break
code, since most of the time this option will not be enabled. Please
refactor the code so that it uses

if (IS_ENABLED(CONFIG_FORCEDETH_RECV_CACHE))

so that the compiler at least compiles the code every time, and then
optimizes it out.

	   Andrew


* Re: [PATCHv2 0/2] forcedeth: recv cache to make NIC work steadily
  2019-07-21 12:53 [PATCHv2 0/2] forcedeth: recv cache to make NIC work steadily Zhu Yanjun
  2019-07-21 12:53 ` [PATCHv2 1/2] forcedeth: add recv cache to make nic " Zhu Yanjun
  2019-07-21 12:53 ` [PATCHv2 2/2] forcedeth: disable recv cache by default Zhu Yanjun
@ 2019-07-21 14:53 ` Andrew Lunn
  2019-07-21 18:45 ` David Miller
  3 siblings, 0 replies; 6+ messages in thread
From: Andrew Lunn @ 2019-07-21 14:53 UTC (permalink / raw)
  To: Zhu Yanjun; +Cc: davem, netdev

On Sun, Jul 21, 2019 at 08:53:51AM -0400, Zhu Yanjun wrote:
> These patches address the following scenario:
> 
> "
> When the host runs for a long time, a lot of memory fragments build up
> in the host, and it is possible that the kernel will compact these
> fragments. But normally it is difficult for the NIC driver to allocate
> memory from the kernel. From the variable stat_rx_dropped, we can
> confirm that the NIC driver very frequently fails to allocate an skb.
> "
> 
> Since the NIC driver cannot allocate skbs in time, some important tasks
> cannot be completed in time.
> To avoid this, a recv cache is created to pre-allocate skbs for the NIC
> driver, which lets the important tasks complete in time.
> From Nan's tests in the lab, these patches make the NIC driver work
> steadily. These patches are now applied on production hosts.
> 
> With these patches, one NIC port needs 125 MiB reserved. This memory
> cannot be used by anything else. On a host where communications are not
> mandatory, it is not necessary to reserve so much memory, so this recv
> cache is disabled by default.
> 
> V1->V2:
> 1. ndelay is replaced with GFP_KERNEL function __netdev_alloc_skb.
> 2. skb_queue_purge is used when recv cache is destroyed.
> 3. RECV_LIST_ALLOCATE bit is removed.
> 4. schedule_delayed_work is moved out of while loop.

Hi Zhu

You don't appear to have addressed David's comment that this is
probably the wrong way to do this; it should be a generic solution.

Also, there should be enough atomic memory in the system anyway. Have
you looked at what other drivers are doing with atomic memory? It
could be that you actually need to debug some other driver, rather
than add hacks to forcedeth.

    Andrew


* Re: [PATCHv2 0/2] forcedeth: recv cache to make NIC work steadily
  2019-07-21 12:53 [PATCHv2 0/2] forcedeth: recv cache to make NIC work steadily Zhu Yanjun
                   ` (2 preceding siblings ...)
  2019-07-21 14:53 ` [PATCHv2 0/2] forcedeth: recv cache to make NIC work steadily Andrew Lunn
@ 2019-07-21 18:45 ` David Miller
  3 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2019-07-21 18:45 UTC (permalink / raw)
  To: yanjun.zhu; +Cc: netdev


I made it abundantly clear that I am completely not supportive of
changes like this.

If anything, we need to improve the behavior of the core kernel
allocators, and the mid-level networking interfaces which use them,
to fix problems like this.

It is absolutely not sustainable to have every driver implement
a cache of some sort to "improve" allocation behavior.

Sorry, there is no way in the world I'm applying changes like
these.

