* [PATCH net-next v2 1/5] lan743x: boost performance on cpu archs w/o dma cache snooping
2021-02-11 16:18 [PATCH net-next v2 0/5] lan743x speed boost Sven Van Asbroeck
@ 2021-02-11 16:18 ` Sven Van Asbroeck
2021-02-12 0:18 ` Sergej Bauer
2021-02-12 20:22 ` Bryan.Whitehead
2021-02-11 16:18 ` [PATCH net-next v2 2/5] lan743x: sync only the received area of an rx ring buffer Sven Van Asbroeck
` (4 subsequent siblings)
5 siblings, 2 replies; 14+ messages in thread
From: Sven Van Asbroeck @ 2021-02-11 16:18 UTC
To: Bryan Whitehead, UNGLinuxDriver, David S Miller, Jakub Kicinski
Cc: Sven Van Asbroeck, Andrew Lunn, Alexey Denisov, Sergej Bauer,
Tim Harvey, Anders Rønningen, Hillf Danton,
Christoph Hellwig, Willem de Bruijn, netdev, linux-kernel
From: Sven Van Asbroeck <thesven73@gmail.com>
The buffers in the lan743x driver's receive ring are always 9K,
even when the largest packet that can be received (the mtu) is
much smaller. This performs particularly badly on cpu archs
without dma cache snooping (such as ARM): each received packet
results in a 9K dma_{map|unmap} operation, which is very expensive
because cpu caches need to be invalidated.
Careful measurement of the driver rx path on armv7 reveals that
the cpu spends the majority of its time waiting for cache
invalidation.
Optimize by keeping the rx ring buffer size as close as possible
to the mtu. This limits the amount of cache that requires
invalidation.
This optimization would normally force us to re-allocate all
ring buffers when the mtu is changed - a disruptive event,
because it can only happen when the network interface is down.
Remove the need to re-allocate all ring buffers by adding support
for multi-buffer frames. Now any combination of mtu and ring
buffer size will work. When the mtu changes from mtu1 to mtu2,
consumed buffers of size mtu1 are lazily replaced by newly
allocated buffers of size mtu2.
These optimizations double the rx performance on armv7.
Third parties report 3x rx speedup on armv8.
Tested with iperf3 on a freescale imx6qp + lan7430, both sides
set to mtu 1500 bytes, measure rx performance:
Before:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-20.00 sec 550 MBytes 231 Mbits/sec 0
After:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-20.00 sec 1.33 GBytes 570 Mbits/sec 0
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
---
To: Bryan Whitehead <bryan.whitehead@microchip.com>
To: UNGLinuxDriver@microchip.com
To: "David S. Miller" <davem@davemloft.net>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Alexey Denisov <rtgbnm@gmail.com>
Cc: Sergej Bauer <sbauer@blackbox.su>
Cc: Tim Harvey <tharvey@gateworks.com>
Cc: Anders Rønningen <anders@ronningen.priv.no>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
drivers/net/ethernet/microchip/lan743x_main.c | 325 ++++++++----------
drivers/net/ethernet/microchip/lan743x_main.h | 5 +-
2 files changed, 149 insertions(+), 181 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index f1f6eba4ace4..0c48bb559719 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1955,15 +1955,6 @@ static int lan743x_rx_next_index(struct lan743x_rx *rx, int index)
return ((++index) % rx->ring_size);
}
-static struct sk_buff *lan743x_rx_allocate_skb(struct lan743x_rx *rx)
-{
- int length = 0;
-
- length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
- return __netdev_alloc_skb(rx->adapter->netdev,
- length, GFP_ATOMIC | GFP_DMA);
-}
-
static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
{
/* update the tail once per 8 descriptors */
@@ -1972,36 +1963,40 @@ static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
index);
}
-static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index,
- struct sk_buff *skb)
+static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
{
+ struct net_device *netdev = rx->adapter->netdev;
+ struct device *dev = &rx->adapter->pdev->dev;
struct lan743x_rx_buffer_info *buffer_info;
struct lan743x_rx_descriptor *descriptor;
- int length = 0;
+ struct sk_buff *skb;
+ dma_addr_t dma_ptr;
+ int length;
+
+ length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;
- length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
- buffer_info->skb = skb;
- if (!(buffer_info->skb))
+ skb = __netdev_alloc_skb(netdev, length, GFP_ATOMIC | GFP_DMA);
+ if (!skb)
return -ENOMEM;
- buffer_info->dma_ptr = dma_map_single(&rx->adapter->pdev->dev,
- buffer_info->skb->data,
- length,
- DMA_FROM_DEVICE);
- if (dma_mapping_error(&rx->adapter->pdev->dev,
- buffer_info->dma_ptr)) {
- buffer_info->dma_ptr = 0;
+ dma_ptr = dma_map_single(dev, skb->data, length, DMA_FROM_DEVICE);
+ if (dma_mapping_error(dev, dma_ptr)) {
+ dev_kfree_skb_any(skb);
return -ENOMEM;
}
+ if (buffer_info->dma_ptr)
+ dma_unmap_single(dev, buffer_info->dma_ptr,
+ buffer_info->buffer_length, DMA_FROM_DEVICE);
+ buffer_info->skb = skb;
+ buffer_info->dma_ptr = dma_ptr;
buffer_info->buffer_length = length;
descriptor->data1 = cpu_to_le32(DMA_ADDR_LOW32(buffer_info->dma_ptr));
descriptor->data2 = cpu_to_le32(DMA_ADDR_HIGH32(buffer_info->dma_ptr));
descriptor->data3 = 0;
descriptor->data0 = cpu_to_le32((RX_DESC_DATA0_OWN_ |
(length & RX_DESC_DATA0_BUF_LENGTH_MASK_)));
- skb_reserve(buffer_info->skb, RX_HEAD_PADDING);
lan743x_rx_update_tail(rx, index);
return 0;
@@ -2050,16 +2045,32 @@ static void lan743x_rx_release_ring_element(struct lan743x_rx *rx, int index)
memset(buffer_info, 0, sizeof(*buffer_info));
}
-static int lan743x_rx_process_packet(struct lan743x_rx *rx)
+static struct sk_buff *
+lan743x_rx_trim_skb(struct sk_buff *skb, int frame_length)
+{
+ if (skb_linearize(skb)) {
+ dev_kfree_skb_irq(skb);
+ return NULL;
+ }
+ frame_length = max_t(int, 0, frame_length - RX_HEAD_PADDING - 2);
+ if (skb->len > frame_length) {
+ skb->tail -= skb->len - frame_length;
+ skb->len = frame_length;
+ }
+ return skb;
+}
+
+static int lan743x_rx_process_buffer(struct lan743x_rx *rx)
{
- struct skb_shared_hwtstamps *hwtstamps = NULL;
- int result = RX_PROCESS_RESULT_NOTHING_TO_DO;
int current_head_index = le32_to_cpu(*rx->head_cpu_ptr);
+ struct lan743x_rx_descriptor *descriptor, *desc_ext;
+ struct net_device *netdev = rx->adapter->netdev;
+ int result = RX_PROCESS_RESULT_NOTHING_TO_DO;
struct lan743x_rx_buffer_info *buffer_info;
- struct lan743x_rx_descriptor *descriptor;
+ int frame_length, buffer_length;
int extension_index = -1;
- int first_index = -1;
- int last_index = -1;
+ bool is_last, is_first;
+ struct sk_buff *skb;
if (current_head_index < 0 || current_head_index >= rx->ring_size)
goto done;
@@ -2067,163 +2078,121 @@ static int lan743x_rx_process_packet(struct lan743x_rx *rx)
if (rx->last_head < 0 || rx->last_head >= rx->ring_size)
goto done;
- if (rx->last_head != current_head_index) {
- descriptor = &rx->ring_cpu_ptr[rx->last_head];
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_OWN_)
- goto done;
+ if (rx->last_head == current_head_index)
+ goto done;
- if (!(le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_FS_))
- goto done;
+ descriptor = &rx->ring_cpu_ptr[rx->last_head];
+ if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_OWN_)
+ goto done;
+ buffer_info = &rx->buffer_info[rx->last_head];
- first_index = rx->last_head;
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_LS_) {
- last_index = rx->last_head;
- } else {
- int index;
-
- index = lan743x_rx_next_index(rx, first_index);
- while (index != current_head_index) {
- descriptor = &rx->ring_cpu_ptr[index];
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_OWN_)
- goto done;
-
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_LS_) {
- last_index = index;
- break;
- }
- index = lan743x_rx_next_index(rx, index);
- }
- }
- if (last_index >= 0) {
- descriptor = &rx->ring_cpu_ptr[last_index];
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_EXT_) {
- /* extension is expected to follow */
- int index = lan743x_rx_next_index(rx,
- last_index);
- if (index != current_head_index) {
- descriptor = &rx->ring_cpu_ptr[index];
- if (le32_to_cpu(descriptor->data0) &
- RX_DESC_DATA0_OWN_) {
- goto done;
- }
- if (le32_to_cpu(descriptor->data0) &
- RX_DESC_DATA0_EXT_) {
- extension_index = index;
- } else {
- goto done;
- }
- } else {
- /* extension is not yet available */
- /* prevent processing of this packet */
- first_index = -1;
- last_index = -1;
- }
- }
- }
- }
- if (first_index >= 0 && last_index >= 0) {
- int real_last_index = last_index;
- struct sk_buff *skb = NULL;
- u32 ts_sec = 0;
- u32 ts_nsec = 0;
-
- /* packet is available */
- if (first_index == last_index) {
- /* single buffer packet */
- struct sk_buff *new_skb = NULL;
- int packet_length;
-
- new_skb = lan743x_rx_allocate_skb(rx);
- if (!new_skb) {
- /* failed to allocate next skb.
- * Memory is very low.
- * Drop this packet and reuse buffer.
- */
- lan743x_rx_reuse_ring_element(rx, first_index);
- goto process_extension;
- }
+ is_last = le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_LS_;
+ is_first = le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_FS_;
- buffer_info = &rx->buffer_info[first_index];
- skb = buffer_info->skb;
- descriptor = &rx->ring_cpu_ptr[first_index];
-
- /* unmap from dma */
- if (buffer_info->dma_ptr) {
- dma_unmap_single(&rx->adapter->pdev->dev,
- buffer_info->dma_ptr,
- buffer_info->buffer_length,
- DMA_FROM_DEVICE);
- buffer_info->dma_ptr = 0;
- buffer_info->buffer_length = 0;
- }
- buffer_info->skb = NULL;
- packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
- (le32_to_cpu(descriptor->data0));
- skb_put(skb, packet_length - 4);
- skb->protocol = eth_type_trans(skb,
- rx->adapter->netdev);
- lan743x_rx_init_ring_element(rx, first_index, new_skb);
- } else {
- int index = first_index;
+ if (is_last && le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_EXT_) {
+ /* extension is expected to follow */
+ int index = lan743x_rx_next_index(rx, rx->last_head);
- /* multi buffer packet not supported */
- /* this should not happen since
- * buffers are allocated to be at least jumbo size
- */
+ if (index == current_head_index)
+ /* extension not yet available */
+ goto done;
+ desc_ext = &rx->ring_cpu_ptr[index];
+ if (le32_to_cpu(desc_ext->data0) & RX_DESC_DATA0_OWN_)
+ /* extension not yet available */
+ goto done;
+ if (!(le32_to_cpu(desc_ext->data0) & RX_DESC_DATA0_EXT_))
+ goto move_forward;
+ extension_index = index;
+ }
- /* clean up buffers */
- if (first_index <= last_index) {
- while ((index >= first_index) &&
- (index <= last_index)) {
- lan743x_rx_reuse_ring_element(rx,
- index);
- index = lan743x_rx_next_index(rx,
- index);
- }
- } else {
- while ((index >= first_index) ||
- (index <= last_index)) {
- lan743x_rx_reuse_ring_element(rx,
- index);
- index = lan743x_rx_next_index(rx,
- index);
- }
- }
- }
+ /* Only the last buffer in a multi-buffer frame contains the total frame
+ * length. All other buffers have a zero frame length. The chip
+ * occasionally sends more buffers than strictly required to reach the
+ * total frame length.
+ * Handle this by adding all buffers to the skb in their entirety.
+ * Once the real frame length is known, trim the skb.
+ */
+ frame_length =
+ RX_DESC_DATA0_FRAME_LENGTH_GET_(le32_to_cpu(descriptor->data0));
+ buffer_length = buffer_info->buffer_length;
+
+ netdev_dbg(netdev, "%s%schunk: %d/%d",
+ is_first ? "first " : " ",
+ is_last ? "last " : " ",
+ frame_length, buffer_length);
+
+ /* save existing skb, allocate new skb and map to dma */
+ skb = buffer_info->skb;
+ if (lan743x_rx_init_ring_element(rx, rx->last_head)) {
+ /* failed to allocate next skb.
+ * Memory is very low.
+ * Drop this packet and reuse buffer.
+ */
+ lan743x_rx_reuse_ring_element(rx, rx->last_head);
+ /* drop packet that was being assembled */
+ dev_kfree_skb_irq(rx->skb_head);
+ rx->skb_head = NULL;
+ goto process_extension;
+ }
+
+ /* add buffers to skb via skb->frag_list */
+ if (is_first) {
+ skb_reserve(skb, RX_HEAD_PADDING);
+ skb_put(skb, buffer_length - RX_HEAD_PADDING);
+ if (rx->skb_head)
+ dev_kfree_skb_irq(rx->skb_head);
+ rx->skb_head = skb;
+ } else if (rx->skb_head) {
+ skb_put(skb, buffer_length);
+ if (skb_shinfo(rx->skb_head)->frag_list)
+ rx->skb_tail->next = skb;
+ else
+ skb_shinfo(rx->skb_head)->frag_list = skb;
+ rx->skb_tail = skb;
+ rx->skb_head->len += skb->len;
+ rx->skb_head->data_len += skb->len;
+ rx->skb_head->truesize += skb->truesize;
+ } else {
+ /* packet to assemble has already been dropped because one or
+ * more of its buffers could not be allocated
+ */
+ netdev_dbg(netdev, "drop buffer intended for dropped packet");
+ dev_kfree_skb_irq(skb);
+ }
process_extension:
- if (extension_index >= 0) {
- descriptor = &rx->ring_cpu_ptr[extension_index];
- buffer_info = &rx->buffer_info[extension_index];
-
- ts_sec = le32_to_cpu(descriptor->data1);
- ts_nsec = (le32_to_cpu(descriptor->data2) &
- RX_DESC_DATA2_TS_NS_MASK_);
- lan743x_rx_reuse_ring_element(rx, extension_index);
- real_last_index = extension_index;
- }
+ if (extension_index >= 0) {
+ u32 ts_sec;
+ u32 ts_nsec;
- if (!skb) {
- result = RX_PROCESS_RESULT_PACKET_DROPPED;
- goto move_forward;
- }
+ ts_sec = le32_to_cpu(desc_ext->data1);
+ ts_nsec = (le32_to_cpu(desc_ext->data2) &
+ RX_DESC_DATA2_TS_NS_MASK_);
+ if (rx->skb_head)
+ skb_hwtstamps(rx->skb_head)->hwtstamp =
+ ktime_set(ts_sec, ts_nsec);
+ lan743x_rx_reuse_ring_element(rx, extension_index);
+ rx->last_head = extension_index;
+ netdev_dbg(netdev, "process extension");
+ }
- if (extension_index < 0)
- goto pass_packet_to_os;
- hwtstamps = skb_hwtstamps(skb);
- if (hwtstamps)
- hwtstamps->hwtstamp = ktime_set(ts_sec, ts_nsec);
+ if (is_last && rx->skb_head)
+ rx->skb_head = lan743x_rx_trim_skb(rx->skb_head, frame_length);
-pass_packet_to_os:
- /* pass packet to OS */
- napi_gro_receive(&rx->napi, skb);
- result = RX_PROCESS_RESULT_PACKET_RECEIVED;
+ if (is_last && rx->skb_head) {
+ rx->skb_head->protocol = eth_type_trans(rx->skb_head,
+ rx->adapter->netdev);
+ netdev_dbg(netdev, "sending %d byte frame to OS",
+ rx->skb_head->len);
+ napi_gro_receive(&rx->napi, rx->skb_head);
+ rx->skb_head = NULL;
+ }
move_forward:
- /* push tail and head forward */
- rx->last_tail = real_last_index;
- rx->last_head = lan743x_rx_next_index(rx, real_last_index);
- }
+ /* push tail and head forward */
+ rx->last_tail = rx->last_head;
+ rx->last_head = lan743x_rx_next_index(rx, rx->last_head);
+ result = RX_PROCESS_RESULT_BUFFER_RECEIVED;
done:
return result;
}
@@ -2242,12 +2211,12 @@ static int lan743x_rx_napi_poll(struct napi_struct *napi, int weight)
DMAC_INT_BIT_RXFRM_(rx->channel_number));
}
for (count = 0; count < weight; count++) {
- result = lan743x_rx_process_packet(rx);
+ result = lan743x_rx_process_buffer(rx);
if (result == RX_PROCESS_RESULT_NOTHING_TO_DO)
break;
}
rx->frame_count += count;
- if (count == weight || result == RX_PROCESS_RESULT_PACKET_RECEIVED)
+ if (count == weight || result == RX_PROCESS_RESULT_BUFFER_RECEIVED)
return weight;
if (!napi_complete_done(napi, count))
@@ -2359,9 +2328,7 @@ static int lan743x_rx_ring_init(struct lan743x_rx *rx)
rx->last_head = 0;
for (index = 0; index < rx->ring_size; index++) {
- struct sk_buff *new_skb = lan743x_rx_allocate_skb(rx);
-
- ret = lan743x_rx_init_ring_element(rx, index, new_skb);
+ ret = lan743x_rx_init_ring_element(rx, index);
if (ret)
goto cleanup;
}
diff --git a/drivers/net/ethernet/microchip/lan743x_main.h b/drivers/net/ethernet/microchip/lan743x_main.h
index 751f2bc9ce84..40dfb564c4f7 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.h
+++ b/drivers/net/ethernet/microchip/lan743x_main.h
@@ -698,6 +698,8 @@ struct lan743x_rx {
struct napi_struct napi;
u32 frame_count;
+
+ struct sk_buff *skb_head, *skb_tail;
};
struct lan743x_adapter {
@@ -831,8 +833,7 @@ struct lan743x_rx_buffer_info {
#define LAN743X_RX_RING_SIZE (65)
#define RX_PROCESS_RESULT_NOTHING_TO_DO (0)
-#define RX_PROCESS_RESULT_PACKET_RECEIVED (1)
-#define RX_PROCESS_RESULT_PACKET_DROPPED (2)
+#define RX_PROCESS_RESULT_BUFFER_RECEIVED (1)
u32 lan743x_csr_read(struct lan743x_adapter *adapter, int offset);
void lan743x_csr_write(struct lan743x_adapter *adapter, int offset, u32 data);
--
2.17.1
* Re: [PATCH net-next v2 1/5] lan743x: boost performance on cpu archs w/o dma cache snooping
2021-02-11 16:18 ` [PATCH net-next v2 1/5] lan743x: boost performance on cpu archs w/o dma cache snooping Sven Van Asbroeck
@ 2021-02-12 0:18 ` Sergej Bauer
2021-02-12 0:27 ` Sven Van Asbroeck
2021-02-12 20:22 ` Bryan.Whitehead
1 sibling, 1 reply; 14+ messages in thread
From: Sergej Bauer @ 2021-02-12 0:18 UTC
To: thesven73
Cc: andrew, Markus.Elfring, rtgbnm, tharvey, anders, sbauer,
Bryan Whitehead, maintainer:MICROCHIP LAN743X ETHERNET DRIVER,
David S. Miller, Jakub Kicinski,
open list:MICROCHIP LAN743X ETHERNET DRIVER, open list
On Thursday, February 11, 2021 7:18:26 PM MSK you wrote:
> From: Sven Van Asbroeck <thesven73@gmail.com>
>
> The buffers in the lan743x driver's receive ring are always 9K,
> even when the largest packet that can be received (the mtu) is
> much smaller. This performs particularly badly on cpu archs
> without dma cache snooping (such as ARM): each received packet
> results in a 9K dma_{map|unmap} operation, which is very expensive
> because cpu caches need to be invalidated.
>
> Careful measurement of the driver rx path on armv7 reveals that
> the cpu spends the majority of its time waiting for cache
> invalidation.
>
> Optimize by keeping the rx ring buffer size as close as possible
> to the mtu. This limits the amount of cache that requires
> invalidation.
>
> This optimization would normally force us to re-allocate all
> ring buffers when the mtu is changed - a disruptive event,
> because it can only happen when the network interface is down.
>
> Remove the need to re-allocate all ring buffers by adding support
> for multi-buffer frames. Now any combination of mtu and ring
> buffer size will work. When the mtu changes from mtu1 to mtu2,
> consumed buffers of size mtu1 are lazily replaced by newly
> allocated buffers of size mtu2.
>
> These optimizations double the rx performance on armv7.
> Third parties report 3x rx speedup on armv8.
>
> Tested with iperf3 on a freescale imx6qp + lan7430, both sides
> set to mtu 1500 bytes, measure rx performance:
>
> Before:
> [ ID] Interval Transfer Bandwidth Retr
> [ 4] 0.00-20.00 sec 550 MBytes 231 Mbits/sec 0
> After:
> [ ID] Interval Transfer Bandwidth Retr
> [ 4] 0.00-20.00 sec 1.33 GBytes 570 Mbits/sec 0
>
> Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
> ---
(For reference on the current speed, my response to v1 of the patch can be found at
https://lkml.org/lkml/2021/2/5/472)
Hi Sven,
The whole set of tests might be overly extensive, but after applying patch v2 [1/5]
the test results are:
sbauer@metamini ~/devel/kernel-works/net-next.git lan743x_virtual_phy$ ifmtu eth7 500
mtu = 500
sbauer@metamini ~/devel/kernel-works/net-next.git lan743x_virtual_phy$ sudo test_ber -l eth7 -c 1000 -n 1000000 -f500 --no-conf
...
number of sent packets = 1000000
number of received packets = 747411
number of lost packets = 252589
number of out of order packets = 0
number of bit errors = 0
total errors detected = 252589
bit error rate = 0.252589
average speed: 408.0757 Mbit/s
...
number of sent packets = 1000000
number of received packets = 738377
number of lost packets = 261623
number of out of order packets = 0
number of bit errors = 0
total errors detected = 261623
bit error rate = 0.261623
average speed: 413.1470 Mbit/s
...
number of sent packets = 1000000
number of received packets = 738142
number of lost packets = 261858
number of out of order packets = 0
number of bit errors = 0
total errors detected = 261858
bit error rate = 0.261858
average speed: 413.2262 Mbit/s
...
number of sent packets = 1000000
number of received packets = 708973
number of lost packets = 291027
number of out of order packets = 0
number of bit errors = 0
total errors detected = 291027
bit error rate = 0.291027
average speed: 430.6224 Mbit/s
...
number of sent packets = 1000000
number of received packets = 725452
number of lost packets = 274548
number of out of order packets = 0
number of bit errors = 0
total errors detected = 274548
bit error rate = 0.274548
average speed: 420.7341 Mbit/s
sbauer@metamini ~/devel/kernel-works/net-next.git lan743x_virtual_phy$ ifmtu eth7 1500
mtu = 1500
sbauer@metamini ~/devel/kernel-works/net-next.git lan743x_virtual_phy$ sudo test_ber -l eth7 -c 1000 -n 1000000 -f500 --no-conf
...
number of sent packets = 1000000
number of received packets = 714228
number of lost packets = 285772
number of out of order packets = 0
number of bit errors = 0
total errors detected = 285772
bit error rate = 0.285772
average speed: 427.1300 Mbit/s
...
number of sent packets = 1000000
number of received packets = 750055
number of lost packets = 249945
number of out of order packets = 0
number of bit errors = 0
total errors detected = 249945
bit error rate = 0.249945
average speed: 405.0383 Mbit/s
...
number of sent packets = 1000000
number of received packets = 689458
number of lost packets = 310542
number of out of order packets = 0
number of bit errors = 0
total errors detected = 310542
bit error rate = 0.310542
average speed: 442.5301 Mbit/s
number of sent packets = 1000000
number of received packets = 676830
number of lost packets = 323170
number of out of order packets = 0
number of bit errors = 0
total errors detected = 323170
bit error rate = 0.32317
average speed: 450.9439 Mbit/s
number of sent packets = 1000000
number of received packets = 701719
number of lost packets = 298281
number of out of order packets = 0
number of bit errors = 0
total errors detected = 298281
bit error rate = 0.298281
average speed: 434.7563 Mbit/s
sbauer@metamini ~/devel/kernel-works/net-next.git lan743x_virtual_phy$ sudo test_ber -l eth7 -c 1000 -n 1000000 -f1500 --no-conf
...
number of sent packets = 1000000
number of received packets = 1000000
number of lost packets = 0
number of out of order packets = 0
number of bit errors = 0
total errors detected = 0
bit error rate = 0
average speed: 643.5758 Mbit/s
...
number of sent packets = 1000000
number of received packets = 1000000
number of lost packets = 0
number of out of order packets = 0
number of bit errors = 0
total errors detected = 0
bit error rate = 0
average speed: 644.7713 Mbit/s
...
number of sent packets = 1000000
number of received packets = 1000000
number of lost packets = 0
number of out of order packets = 0
number of bit errors = 0
total errors detected = 0
bit error rate = 0
average speed: 645.4407 Mbit/s
...
number of sent packets = 1000000
number of received packets = 1000000
number of lost packets = 0
number of out of order packets = 0
number of bit errors = 0
total errors detected = 0
bit error rate = 0
average speed: 645.6741 Mbit/s
...
number of sent packets = 1000000
number of received packets = 1000000
number of lost packets = 0
number of out of order packets = 0
number of bit errors = 0
total errors detected = 0
bit error rate = 0
average speed: 646.0109 Mbit/s
sbauer@metamini ~/devel/kernel-works/net-next.git lan743x_virtual_phy$ ifmtu eth7 9216
mtu = 9216
sbauer@metamini ~/devel/kernel-works/net-next.git lan743x_virtual_phy$ sudo test_ber -l eth7 -c 1000 -n 1000000 -f1500 --no-conf
...
number of sent packets = 1000000
number of received packets = 575141
number of lost packets = 424859
number of out of order packets = 0
number of bit errors = 0
total errors detected = 424859
bit error rate = 0.424859
average speed: 646.7859 Mbit/s
...
number of sent packets = 1000000
number of received packets = 583353
number of lost packets = 416647
number of out of order packets = 0
number of bit errors = 0
total errors detected = 416647
bit error rate = 0.416647
average speed: 637.8472 Mbit/s
...
number of sent packets = 1000000
number of received packets = 577127
number of lost packets = 422873
number of out of order packets = 0
number of bit errors = 0
total errors detected = 422873
bit error rate = 0.422873
average speed: 644.5562 Mbit/s
...
number of sent packets = 1000000
number of received packets = 576916
number of lost packets = 423084
number of out of order packets = 0
number of bit errors = 0
total errors detected = 423084
bit error rate = 0.423084
average speed: 644.8260 Mbit/s
...
number of sent packets = 1000000
number of received packets = 577154
number of lost packets = 422846
number of out of order packets = 0
number of bit errors = 0
total errors detected = 422846
bit error rate = 0.422846
average speed: 644.6815 Mbit/s
sbauer@metamini ~/devel/kernel-works/net-next.git lan743x_virtual_phy$ sudo test_ber -l eth7 -c 1000 -n 1000000 -f9216 --no-conf
...
number of sent packets = 1000000
number of received packets = 1000000
number of lost packets = 0
number of out of order packets = 0
number of bit errors = 0
total errors detected = 0
bit error rate = 0
average speed: 775.2005 Mbit/s
...
number of sent packets = 1000000
number of received packets = 999998
number of lost packets = 2
number of out of order packets = 0
number of bit errors = 0
total errors detected = 2
bit error rate = 2e-06
average speed: 775.0468 Mbit/s
...
number of sent packets = 1000000
number of received packets = 999998
number of lost packets = 2
number of out of order packets = 0
number of bit errors = 0
total errors detected = 2
bit error rate = 2e-06
average speed: 775.2150 Mbit/s
...
number of sent packets = 1000000
number of received packets = 999997
number of lost packets = 3
number of out of order packets = 0
number of bit errors = 0
total errors detected = 3
bit error rate = 3e-06
average speed: 775.2666 Mbit/s
...
number of sent packets = 1000000
number of received packets = 999999
number of lost packets = 1
number of out of order packets = 0
number of bit errors = 0
total errors detected = 1
bit error rate = 1e-06
average speed: 775.2182 Mbit/s
* Re: [PATCH net-next v2 1/5] lan743x: boost performance on cpu archs w/o dma cache snooping
2021-02-12 0:18 ` Sergej Bauer
@ 2021-02-12 0:27 ` Sven Van Asbroeck
2021-02-12 1:31 ` Sergej Bauer
0 siblings, 1 reply; 14+ messages in thread
From: Sven Van Asbroeck @ 2021-02-12 0:27 UTC
To: Sergej Bauer
Cc: Andrew Lunn, Markus.Elfring, Alexey Denisov, Tim Harvey,
Anders Rønningen, Bryan Whitehead,
maintainer:MICROCHIP LAN743X ETHERNET DRIVER, David S. Miller,
Jakub Kicinski, open list:MICROCHIP LAN743X ETHERNET DRIVER,
open list
Hi Sergej, thank you for testing this!
On Thu, Feb 11, 2021 at 7:18 PM Sergej Bauer <sbauer@blackbox.su> wrote:
>
> The whole set of tests might be overly extensive, but after applying patch v2 [1/5]
> the test results are:
I am unfamiliar with the test_ber tool. Does this patch improve things?
* Re: [PATCH net-next v2 1/5] lan743x: boost performance on cpu archs w/o dma cache snooping
2021-02-12 0:27 ` Sven Van Asbroeck
@ 2021-02-12 1:31 ` Sergej Bauer
0 siblings, 0 replies; 14+ messages in thread
From: Sergej Bauer @ 2021-02-12 1:31 UTC
To: thesven73
Cc: andrew, Markus.Elfring, rtgbnm, tharvey, anders, sbauer,
Bryan Whitehead, maintainer:MICROCHIP LAN743X ETHERNET DRIVER,
David S. Miller, Jakub Kicinski,
open list:MICROCHIP LAN743X ETHERNET DRIVER, open list
On Friday, February 12, 2021 3:27:40 AM MSK you wrote:
> Hi Sergej, thank you for testing this!
Don't mention it, it was just a little help.
> On Thu, Feb 11, 2021 at 7:18 PM Sergej Bauer <sbauer@blackbox.su> wrote:
> > The whole set of tests might be overly extensive, but after
> > applying patch v2 [1/5]
> > the test results are:
> I am unfamiliar with the test_ber tool. Does this patch improve things?
v1 does a great job:
the number of lost packets decreased by a factor of 2.5-3.
Apart from this, without the patch I had a bit error rate of about 0.423531 with MTU=1500,
and now with this patch BER=0.
The results of v2 are about the same as the results of v1.
Tomorrow I can test it over a wider range of frame sizes.
I can also test v2 again tomorrow, if it needs to be tested again.
* RE: [PATCH net-next v2 1/5] lan743x: boost performance on cpu archs w/o dma cache snooping
2021-02-11 16:18 ` [PATCH net-next v2 1/5] lan743x: boost performance on cpu archs w/o dma cache snooping Sven Van Asbroeck
2021-02-12 0:18 ` Sergej Bauer
@ 2021-02-12 20:22 ` Bryan.Whitehead
1 sibling, 0 replies; 14+ messages in thread
From: Bryan.Whitehead @ 2021-02-12 20:22 UTC
To: thesven73, UNGLinuxDriver, davem, kuba
Cc: andrew, rtgbnm, sbauer, tharvey, anders, hdanton, hch,
willemdebruijn.kernel, netdev, linux-kernel
Hi Sven,
> Subject: [PATCH net-next v2 1/5] lan743x: boost performance on cpu archs
> w/o dma cache snooping
>
> From: Sven Van Asbroeck <thesven73@gmail.com>
>
> The buffers in the lan743x driver's receive ring are always 9K, even when the
> largest packet that can be received (the mtu) is much smaller. This performs
> particularly badly on cpu archs without dma cache snooping (such as ARM):
> each received packet results in a 9K dma_{map|unmap} operation, which is
> very expensive because cpu caches need to be invalidated.
>
> Careful measurement of the driver rx path on armv7 reveals that the cpu
> spends the majority of its time waiting for cache invalidation.
>
> Optimize by keeping the rx ring buffer size as close as possible to the mtu.
> This limits the amount of cache that requires invalidation.
>
> This optimization would normally force us to re-allocate all ring buffers when
> the mtu is changed - a disruptive event, because it can only happen when
> the network interface is down.
>
> Remove the need to re-allocate all ring buffers by adding support for multi-
> buffer frames. Now any combination of mtu and ring buffer size will work.
> When the mtu changes from mtu1 to mtu2, consumed buffers of size mtu1
> are lazily replaced by newly allocated buffers of size mtu2.
>
> These optimizations double the rx performance on armv7.
> Third parties report 3x rx speedup on armv8.
>
> Tested with iperf3 on a freescale imx6qp + lan7430, both sides set to mtu
> 1500 bytes, measure rx performance:
>
> Before:
> [ ID] Interval Transfer Bandwidth Retr
> [ 4] 0.00-20.00 sec 550 MBytes 231 Mbits/sec 0
> After:
> [ ID] Interval Transfer Bandwidth Retr
> [ 4] 0.00-20.00 sec 1.33 GBytes 570 Mbits/sec 0
>
> Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Looks good
Reviewed-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
* [PATCH net-next v2 2/5] lan743x: sync only the received area of an rx ring buffer
2021-02-11 16:18 [PATCH net-next v2 0/5] lan743x speed boost Sven Van Asbroeck
2021-02-11 16:18 ` [PATCH net-next v2 1/5] lan743x: boost performance on cpu archs w/o dma cache snooping Sven Van Asbroeck
@ 2021-02-11 16:18 ` Sven Van Asbroeck
2021-02-12 20:45 ` Bryan.Whitehead
2021-02-11 16:18 ` [PATCH net-next v2 3/5] TEST ONLY: lan743x: limit rx ring buffer size to 500 bytes Sven Van Asbroeck
` (3 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Sven Van Asbroeck @ 2021-02-11 16:18 UTC
To: Bryan Whitehead, UNGLinuxDriver, David S Miller, Jakub Kicinski
Cc: Sven Van Asbroeck, Andrew Lunn, Alexey Denisov, Sergej Bauer,
Tim Harvey, Anders Rønningen, Hillf Danton,
Christoph Hellwig, Willem de Bruijn, netdev, linux-kernel
From: Sven Van Asbroeck <thesven73@gmail.com>
On cpu architectures w/o dma cache snooping, dma_unmap() is a
very expensive operation, because its resulting sync
needs to invalidate cpu caches.
Increase efficiency/performance by syncing only those sections
of the lan743x's rx ring buffers that are actually in use.
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
---
To: Bryan Whitehead <bryan.whitehead@microchip.com>
To: UNGLinuxDriver@microchip.com
To: "David S. Miller" <davem@davemloft.net>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Alexey Denisov <rtgbnm@gmail.com>
Cc: Sergej Bauer <sbauer@blackbox.su>
Cc: Tim Harvey <tharvey@gateworks.com>
Cc: Anders Rønningen <anders@ronningen.priv.no>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
drivers/net/ethernet/microchip/lan743x_main.c | 32 +++++++++++++------
1 file changed, 23 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index 0c48bb559719..36cc67c72851 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1968,35 +1968,49 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
struct net_device *netdev = rx->adapter->netdev;
struct device *dev = &rx->adapter->pdev->dev;
struct lan743x_rx_buffer_info *buffer_info;
+ unsigned int buffer_length, packet_length;
struct lan743x_rx_descriptor *descriptor;
struct sk_buff *skb;
dma_addr_t dma_ptr;
- int length;
- length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;
+ buffer_length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;
descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
- skb = __netdev_alloc_skb(netdev, length, GFP_ATOMIC | GFP_DMA);
+ skb = __netdev_alloc_skb(netdev, buffer_length, GFP_ATOMIC | GFP_DMA);
if (!skb)
return -ENOMEM;
- dma_ptr = dma_map_single(dev, skb->data, length, DMA_FROM_DEVICE);
+ dma_ptr = dma_map_single(dev, skb->data, buffer_length, DMA_FROM_DEVICE);
if (dma_mapping_error(dev, dma_ptr)) {
dev_kfree_skb_any(skb);
return -ENOMEM;
}
- if (buffer_info->dma_ptr)
- dma_unmap_single(dev, buffer_info->dma_ptr,
- buffer_info->buffer_length, DMA_FROM_DEVICE);
+ if (buffer_info->dma_ptr) {
+ /* unmap from dma */
+ packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
+ (le32_to_cpu(descriptor->data0));
+ if (packet_length == 0 ||
+ packet_length > buffer_info->buffer_length)
+ /* buffer is part of multi-buffer packet: fully used */
+ packet_length = buffer_info->buffer_length;
+ /* sync used part of buffer only */
+ dma_sync_single_for_cpu(dev, buffer_info->dma_ptr,
+ packet_length,
+ DMA_FROM_DEVICE);
+ dma_unmap_single_attrs(dev, buffer_info->dma_ptr,
+ buffer_info->buffer_length,
+ DMA_FROM_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC);
+ }
buffer_info->skb = skb;
buffer_info->dma_ptr = dma_ptr;
- buffer_info->buffer_length = length;
+ buffer_info->buffer_length = buffer_length;
descriptor->data1 = cpu_to_le32(DMA_ADDR_LOW32(buffer_info->dma_ptr));
descriptor->data2 = cpu_to_le32(DMA_ADDR_HIGH32(buffer_info->dma_ptr));
descriptor->data3 = 0;
descriptor->data0 = cpu_to_le32((RX_DESC_DATA0_OWN_ |
- (length & RX_DESC_DATA0_BUF_LENGTH_MASK_)));
+ (buffer_length & RX_DESC_DATA0_BUF_LENGTH_MASK_)));
lan743x_rx_update_tail(rx, index);
return 0;
--
2.17.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* RE: [PATCH net-next v2 2/5] lan743x: sync only the received area of an rx ring buffer
2021-02-11 16:18 ` [PATCH net-next v2 2/5] lan743x: sync only the received area of an rx ring buffer Sven Van Asbroeck
@ 2021-02-12 20:45 ` Bryan.Whitehead
2021-02-12 22:38 ` Sven Van Asbroeck
0 siblings, 1 reply; 14+ messages in thread
From: Bryan.Whitehead @ 2021-02-12 20:45 UTC (permalink / raw)
To: thesven73, UNGLinuxDriver, davem, kuba
Cc: andrew, rtgbnm, sbauer, tharvey, anders, hdanton, hch,
willemdebruijn.kernel, netdev, linux-kernel
Hi Sven, see below.
> + if (buffer_info->dma_ptr) {
> + /* unmap from dma */
> + packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
> + (le32_to_cpu(descriptor->data0));
> + if (packet_length == 0 ||
> + packet_length > buffer_info->buffer_length)
> + /* buffer is part of multi-buffer packet: fully used */
> + packet_length = buffer_info->buffer_length;
According to the document I have, FRAME_LENGTH is only valid when LS bit is set, and reserved otherwise.
Therefore, I'm not sure you can rely on it being zero when LS is not set, even if your experiments say it is.
Future chip revisions might use those bits differently.
Can you change this so the LS bit is checked?
If set you can use the smaller of FRAME_LENGTH or buffer length.
If clear you can just use buffer length.
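The rule above can be sketched roughly as follows. This is only an illustration, not the v3 patch: the helper name, the macro names and the bit positions are placeholders chosen for the sketch and are not taken from the datasheet or the driver header. It also assumes data0 has already been converted to cpu byte order (e.g. with le32_to_cpu()).

```c
#include <stdint.h>

/* Placeholder layout, assumed for illustration only. */
#define SKETCH_DATA0_LS              (UINT32_C(1) << 30)
#define SKETCH_DATA0_FRAME_LEN_SHIFT 16
#define SKETCH_DATA0_FRAME_LEN_MASK  UINT32_C(0x3FFF)

/*
 * Return how many bytes of an rx buffer would need a cpu sync.
 * - LS clear: FRAME_LENGTH is reserved, so treat the buffer as fully used.
 * - LS set:   use the smaller of FRAME_LENGTH and the buffer length.
 */
static unsigned int
sketch_rx_sync_length(uint32_t data0, unsigned int buffer_length)
{
	unsigned int frame_length;

	if (!(data0 & SKETCH_DATA0_LS))
		return buffer_length;

	frame_length = (data0 >> SKETCH_DATA0_FRAME_LEN_SHIFT) &
		       SKETCH_DATA0_FRAME_LEN_MASK;

	return frame_length < buffer_length ? frame_length : buffer_length;
}
```

With this shape the FRAME_LENGTH bits are never interpreted unless LS is set, so the result does not depend on what the chip writes into the reserved field.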
> + /* sync used part of buffer only */
> + dma_sync_single_for_cpu(dev, buffer_info->dma_ptr,
> + packet_length,
> + DMA_FROM_DEVICE);
> + dma_unmap_single_attrs(dev, buffer_info->dma_ptr,
> + buffer_info->buffer_length,
> + DMA_FROM_DEVICE,
> + DMA_ATTR_SKIP_CPU_SYNC);
> + }
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v2 2/5] lan743x: sync only the received area of an rx ring buffer
2021-02-12 20:45 ` Bryan.Whitehead
@ 2021-02-12 22:38 ` Sven Van Asbroeck
2021-02-13 19:15 ` Bryan.Whitehead
0 siblings, 1 reply; 14+ messages in thread
From: Sven Van Asbroeck @ 2021-02-12 22:38 UTC (permalink / raw)
To: Bryan Whitehead
Cc: Microchip Linux Driver Support, David Miller, Jakub Kicinski,
Andrew Lunn, Alexey Denisov, Sergej Bauer, Tim Harvey,
Anders Rønningen, Hillf Danton, Christoph Hellwig,
Willem de Bruijn, netdev, Linux Kernel Mailing List
Hi Bryan,
On Fri, Feb 12, 2021 at 3:45 PM <Bryan.Whitehead@microchip.com> wrote:
>
> According to the document I have, FRAME_LENGTH is only valid when LS bit is set, and reserved otherwise.
> Therefore, I'm not sure you can rely on it being zero when LS is not set, even if your experiments say it is.
> Future chip revisions might use those bits differently.
That's good to know. I didn't find any documentation related to
multi-buffer frames, so I had to go with what I saw the chip do
experimentally. It's great that you were able to double-check against
the official docs.
>
> Can you change this so the LS bit is checked?
> If set you can use the smaller of FRAME_LENGTH or buffer length.
> If clear you can just use buffer length.
Will do. Are you planning to hold off your tests until v3? It
shouldn't take too long.
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [PATCH net-next v2 2/5] lan743x: sync only the received area of an rx ring buffer
2021-02-12 22:38 ` Sven Van Asbroeck
@ 2021-02-13 19:15 ` Bryan.Whitehead
0 siblings, 0 replies; 14+ messages in thread
From: Bryan.Whitehead @ 2021-02-13 19:15 UTC (permalink / raw)
To: thesven73
Cc: UNGLinuxDriver, davem, kuba, andrew, rtgbnm, sbauer, tharvey,
anders, hdanton, hch, willemdebruijn.kernel, netdev, linux-kernel
> Will do. Are you planning to hold off your tests until v3? It shouldn't take too
> long.
Sure, we will wait for v3
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH net-next v2 3/5] TEST ONLY: lan743x: limit rx ring buffer size to 500 bytes
2021-02-11 16:18 [PATCH net-next v2 0/5] lan743x speed boost Sven Van Asbroeck
2021-02-11 16:18 ` [PATCH net-next v2 1/5] lan743x: boost performance on cpu archs w/o dma cache snooping Sven Van Asbroeck
2021-02-11 16:18 ` [PATCH net-next v2 2/5] lan743x: sync only the received area of an rx ring buffer Sven Van Asbroeck
@ 2021-02-11 16:18 ` Sven Van Asbroeck
2021-02-11 16:18 ` [PATCH net-next v2 4/5] TEST ONLY: lan743x: skb_alloc failure test Sven Van Asbroeck
` (2 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Sven Van Asbroeck @ 2021-02-11 16:18 UTC (permalink / raw)
To: Bryan Whitehead, UNGLinuxDriver, David S Miller, Jakub Kicinski
Cc: Sven Van Asbroeck, Andrew Lunn, Alexey Denisov, Sergej Bauer,
Tim Harvey, Anders Rønningen, Hillf Danton,
Christoph Hellwig, Willem de Bruijn, netdev, linux-kernel
From: Sven Van Asbroeck <thesven73@gmail.com>
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
---
To: Bryan Whitehead <bryan.whitehead@microchip.com>
To: UNGLinuxDriver@microchip.com
To: "David S. Miller" <davem@davemloft.net>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Alexey Denisov <rtgbnm@gmail.com>
Cc: Sergej Bauer <sbauer@blackbox.su>
Cc: Tim Harvey <tharvey@gateworks.com>
Cc: Anders Rønningen <anders@ronningen.priv.no>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
drivers/net/ethernet/microchip/lan743x_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index 36cc67c72851..90d49231494d 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1973,7 +1973,7 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
struct sk_buff *skb;
dma_addr_t dma_ptr;
- buffer_length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;
+ buffer_length = 500 + ETH_HLEN + 4 + RX_HEAD_PADDING;
descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
--
2.17.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next v2 4/5] TEST ONLY: lan743x: skb_alloc failure test
2021-02-11 16:18 [PATCH net-next v2 0/5] lan743x speed boost Sven Van Asbroeck
` (2 preceding siblings ...)
2021-02-11 16:18 ` [PATCH net-next v2 3/5] TEST ONLY: lan743x: limit rx ring buffer size to 500 bytes Sven Van Asbroeck
@ 2021-02-11 16:18 ` Sven Van Asbroeck
2021-02-11 16:18 ` [PATCH net-next v2 5/5] TEST ONLY: lan743x: skb_trim " Sven Van Asbroeck
2021-02-12 20:15 ` [PATCH net-next v2 0/5] lan743x speed boost Bryan.Whitehead
5 siblings, 0 replies; 14+ messages in thread
From: Sven Van Asbroeck @ 2021-02-11 16:18 UTC (permalink / raw)
To: Bryan Whitehead, UNGLinuxDriver, David S Miller, Jakub Kicinski
Cc: Sven Van Asbroeck, Andrew Lunn, Alexey Denisov, Sergej Bauer,
Tim Harvey, Anders Rønningen, Hillf Danton,
Christoph Hellwig, Willem de Bruijn, netdev, linux-kernel
From: Sven Van Asbroeck <thesven73@gmail.com>
Simulate low-memory in lan743x_rx_allocate_skb(): fail 10
allocations in a row in every 100.
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
---
To: Bryan Whitehead <bryan.whitehead@microchip.com>
To: UNGLinuxDriver@microchip.com
To: "David S. Miller" <davem@davemloft.net>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Alexey Denisov <rtgbnm@gmail.com>
Cc: Sergej Bauer <sbauer@blackbox.su>
Cc: Tim Harvey <tharvey@gateworks.com>
Cc: Anders Rønningen <anders@ronningen.priv.no>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
drivers/net/ethernet/microchip/lan743x_main.c | 21 +++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index 90d49231494d..0094ecac5741 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1963,7 +1963,20 @@ static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
index);
}
-static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
+static struct sk_buff *
+lan743x_alloc_skb(struct net_device *netdev, int length, bool can_fail)
+{
+ static int rx_alloc;
+ int counter = rx_alloc++ % 100;
+
+ if (can_fail && counter >= 20 && counter < 30)
+ return NULL;
+
+ return __netdev_alloc_skb(netdev, length, GFP_ATOMIC | GFP_DMA);
+}
+
+static int
+lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index, bool can_fail)
{
struct net_device *netdev = rx->adapter->netdev;
struct device *dev = &rx->adapter->pdev->dev;
@@ -1977,7 +1990,7 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
- skb = __netdev_alloc_skb(netdev, buffer_length, GFP_ATOMIC | GFP_DMA);
+ skb = lan743x_alloc_skb(netdev, buffer_length, can_fail);
if (!skb)
return -ENOMEM;
dma_ptr = dma_map_single(dev, skb->data, buffer_length, DMA_FROM_DEVICE);
@@ -2137,7 +2150,7 @@ static int lan743x_rx_process_buffer(struct lan743x_rx *rx)
/* save existing skb, allocate new skb and map to dma */
skb = buffer_info->skb;
- if (lan743x_rx_init_ring_element(rx, rx->last_head)) {
+ if (lan743x_rx_init_ring_element(rx, rx->last_head, true)) {
/* failed to allocate next skb.
* Memory is very low.
* Drop this packet and reuse buffer.
@@ -2342,7 +2355,7 @@ static int lan743x_rx_ring_init(struct lan743x_rx *rx)
rx->last_head = 0;
for (index = 0; index < rx->ring_size; index++) {
- ret = lan743x_rx_init_ring_element(rx, index);
+ ret = lan743x_rx_init_ring_element(rx, index, false);
if (ret)
goto cleanup;
}
--
2.17.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next v2 5/5] TEST ONLY: lan743x: skb_trim failure test
2021-02-11 16:18 [PATCH net-next v2 0/5] lan743x speed boost Sven Van Asbroeck
` (3 preceding siblings ...)
2021-02-11 16:18 ` [PATCH net-next v2 4/5] TEST ONLY: lan743x: skb_alloc failure test Sven Van Asbroeck
@ 2021-02-11 16:18 ` Sven Van Asbroeck
2021-02-12 20:15 ` [PATCH net-next v2 0/5] lan743x speed boost Bryan.Whitehead
5 siblings, 0 replies; 14+ messages in thread
From: Sven Van Asbroeck @ 2021-02-11 16:18 UTC (permalink / raw)
To: Bryan Whitehead, UNGLinuxDriver, David S Miller, Jakub Kicinski
Cc: Sven Van Asbroeck, Andrew Lunn, Alexey Denisov, Sergej Bauer,
Tim Harvey, Anders Rønningen, Hillf Danton,
Christoph Hellwig, Willem de Bruijn, netdev, linux-kernel
From: Sven Van Asbroeck <thesven73@gmail.com>
Simulate low-memory in lan743x_rx_trim_skb(): fail one allocation
in every 100.
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
---
To: Bryan Whitehead <bryan.whitehead@microchip.com>
To: UNGLinuxDriver@microchip.com
To: "David S. Miller" <davem@davemloft.net>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Alexey Denisov <rtgbnm@gmail.com>
Cc: Sergej Bauer <sbauer@blackbox.su>
Cc: Tim Harvey <tharvey@gateworks.com>
Cc: Anders Rønningen <anders@ronningen.priv.no>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
drivers/net/ethernet/microchip/lan743x_main.c | 28 ++++++++-----------
1 file changed, 11 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index 0094ecac5741..53c2b93b82b4 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1963,20 +1963,7 @@ static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
index);
}
-static struct sk_buff *
-lan743x_alloc_skb(struct net_device *netdev, int length, bool can_fail)
-{
- static int rx_alloc;
- int counter = rx_alloc++ % 100;
-
- if (can_fail && counter >= 20 && counter < 30)
- return NULL;
-
- return __netdev_alloc_skb(netdev, length, GFP_ATOMIC | GFP_DMA);
-}
-
-static int
-lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index, bool can_fail)
+static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
{
struct net_device *netdev = rx->adapter->netdev;
struct device *dev = &rx->adapter->pdev->dev;
@@ -1990,7 +1977,7 @@ lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index, bool can_fail)
descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
- skb = lan743x_alloc_skb(netdev, buffer_length, can_fail);
+ skb = __netdev_alloc_skb(netdev, buffer_length, GFP_ATOMIC | GFP_DMA);
if (!skb)
return -ENOMEM;
dma_ptr = dma_map_single(dev, skb->data, buffer_length, DMA_FROM_DEVICE);
@@ -2075,6 +2062,13 @@ static void lan743x_rx_release_ring_element(struct lan743x_rx *rx, int index)
static struct sk_buff *
lan743x_rx_trim_skb(struct sk_buff *skb, int frame_length)
{
+ static int trim_cnt;
+
+ if ((trim_cnt++ % 100) == 77) {
+ dev_kfree_skb_irq(skb);
+ return NULL;
+ }
+
if (skb_linearize(skb)) {
dev_kfree_skb_irq(skb);
return NULL;
@@ -2150,7 +2144,7 @@ static int lan743x_rx_process_buffer(struct lan743x_rx *rx)
/* save existing skb, allocate new skb and map to dma */
skb = buffer_info->skb;
- if (lan743x_rx_init_ring_element(rx, rx->last_head, true)) {
+ if (lan743x_rx_init_ring_element(rx, rx->last_head)) {
/* failed to allocate next skb.
* Memory is very low.
* Drop this packet and reuse buffer.
@@ -2355,7 +2349,7 @@ static int lan743x_rx_ring_init(struct lan743x_rx *rx)
rx->last_head = 0;
for (index = 0; index < rx->ring_size; index++) {
- ret = lan743x_rx_init_ring_element(rx, index, false);
+ ret = lan743x_rx_init_ring_element(rx, index);
if (ret)
goto cleanup;
}
--
2.17.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* RE: [PATCH net-next v2 0/5] lan743x speed boost
2021-02-11 16:18 [PATCH net-next v2 0/5] lan743x speed boost Sven Van Asbroeck
` (4 preceding siblings ...)
2021-02-11 16:18 ` [PATCH net-next v2 5/5] TEST ONLY: lan743x: skb_trim " Sven Van Asbroeck
@ 2021-02-12 20:15 ` Bryan.Whitehead
5 siblings, 0 replies; 14+ messages in thread
From: Bryan.Whitehead @ 2021-02-12 20:15 UTC (permalink / raw)
To: thesven73, UNGLinuxDriver, davem, kuba
Cc: andrew, rtgbnm, sbauer, tharvey, anders, hdanton, hch,
willemdebruijn.kernel, netdev, linux-kernel
Hi Sven, see below
> - Bryan Whitehead:
> + multi-buffer patch concept "looks good".
> As a result, I will squash the intermediate "dma buffer only" patch which
> demonstrated the speed boost using an inflexible solution
> (w/o multi-buffers).
> + Rename lan743x_rx_process_buffer() to lan743x_rx_process_packet()
You meant "Rename lan743x_rx_process_packet() to lan743x_rx_process_buffer()"
> + Remove unused RX_PROCESS_RESULT_PACKET_DROPPED
> + Rename RX_PROCESS_RESULT_BUFFER_RECEIVED to
> RX_PROCESS_RESULT_PACKET_RECEIVED
You meant "Rename RX_PROCESS_RESULT_PACKET_RECEIVED to RX_PROCESS_RESULT_BUFFER_RECEIVED"
I don't think you need a new version for just these typos, because the patch is correct. But if you do a new version then you can change it.
Regards,
Bryan
^ permalink raw reply [flat|nested] 14+ messages in thread