* [RFC TRY#2][PATCH] bgmac: pass received packet to the netif instead of copying it
@ 2013-10-28 17:42 Rafał Miłecki
2013-10-28 20:28 ` David Miller
2013-10-29 6:50 ` Nathan Hintz
0 siblings, 2 replies; 5+ messages in thread
From: Rafał Miłecki @ 2013-10-28 17:42 UTC (permalink / raw)
To: netdev, David S. Miller; +Cc: openwrt-devel, Rafał Miłecki
Copying whole packets with skb_copy_from_linear_data_offset is a pretty
bad idea. CPU was spending time in __copy_user_common and network
performance was lower. With the new solution iperf-measured speed
increased from 116Mb/s to 134Mb/s.
Another way to improve performance could be switching to build_skb. It
is cache-specific, so will require testing of various devices.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
---
drivers/net/ethernet/broadcom/bgmac.c | 71 ++++++++++++++++++++-------------
1 file changed, 44 insertions(+), 27 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index 6b7541f..fde9a11 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -307,7 +307,6 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
struct device *dma_dev = bgmac->core->dma_dev;
struct bgmac_slot_info *slot = &ring->slots[ring->start];
struct sk_buff *skb = slot->skb;
- struct sk_buff *new_skb;
struct bgmac_rx_header *rx;
u16 len, flags;
@@ -320,38 +319,56 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
len = le16_to_cpu(rx->len);
flags = le16_to_cpu(rx->flags);
- /* Check for poison and drop or pass the packet */
- if (len == 0xdead && flags == 0xbeef) {
- bgmac_err(bgmac, "Found poisoned packet at slot %d, DMA issue!\n",
- ring->start);
- } else {
+ do {
+ struct sk_buff *old_skb = slot->skb;
+ dma_addr_t old_dma_addr = slot->dma_addr;
+ int err;
+
+ /* Check for poison and drop or pass the packet */
+ if (len == 0xdead && flags == 0xbeef) {
+ bgmac_err(bgmac, "Found poisoned packet at slot %d, DMA issue!\n",
+ ring->start);
+ dma_sync_single_for_device(dma_dev,
+ slot->dma_addr,
+ BGMAC_RX_BUF_SIZE,
+ DMA_FROM_DEVICE);
+ break;
+ }
+
/* Omit CRC. */
len -= ETH_FCS_LEN;
- new_skb = netdev_alloc_skb_ip_align(bgmac->net_dev, len);
- if (new_skb) {
- skb_put(new_skb, len);
- skb_copy_from_linear_data_offset(skb, BGMAC_RX_FRAME_OFFSET,
- new_skb->data,
- len);
- skb_checksum_none_assert(skb);
- new_skb->protocol =
- eth_type_trans(new_skb, bgmac->net_dev);
- netif_receive_skb(new_skb);
- handled++;
- } else {
- bgmac->net_dev->stats.rx_dropped++;
- bgmac_err(bgmac, "Allocation of skb for copying packet failed!\n");
+ /* Prepare new skb as replacement */
+ err = bgmac_dma_rx_skb_for_slot(bgmac, slot);
+ if (err) {
+ slot->skb = old_skb;
+ slot->dma_addr = old_dma_addr;
+
+ /* Poison the old skb */
+ rx->len = cpu_to_le16(0xdead);
+ rx->flags = cpu_to_le16(0xbeef);
+
+ dma_sync_single_for_device(dma_dev,
+ slot->dma_addr,
+ BGMAC_RX_BUF_SIZE,
+ DMA_FROM_DEVICE);
+ break;
}
+ bgmac_dma_rx_setup_desc(bgmac, ring, ring->start);
- /* Poison the old skb */
- rx->len = cpu_to_le16(0xdead);
- rx->flags = cpu_to_le16(0xbeef);
- }
+ /* Unmap old skb, we'll pass it to the netfif */
+ dma_unmap_single(dma_dev, old_dma_addr,
+ BGMAC_RX_BUF_SIZE,
+ DMA_FROM_DEVICE);
+
+ skb_put(skb, BGMAC_RX_FRAME_OFFSET + len);
+ skb_pull(skb, BGMAC_RX_FRAME_OFFSET);
- /* Make it back accessible to the hardware */
- dma_sync_single_for_device(dma_dev, slot->dma_addr,
- BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);
+ skb_checksum_none_assert(skb);
+ skb->protocol = eth_type_trans(skb, bgmac->net_dev);
+ netif_receive_skb(skb);
+ handled++;
+ } while (0);
if (++ring->start >= BGMAC_RX_RING_SLOTS)
ring->start = 0;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [RFC TRY#2][PATCH] bgmac: pass received packet to the netif instead of copying it
2013-10-28 17:42 [RFC TRY#2][PATCH] bgmac: pass received packet to the netif instead of copying it Rafał Miłecki
@ 2013-10-28 20:28 ` David Miller
2013-10-28 20:36 ` Rafał Miłecki
2013-10-29 6:50 ` Nathan Hintz
1 sibling, 1 reply; 5+ messages in thread
From: David Miller @ 2013-10-28 20:28 UTC (permalink / raw)
To: zajec5; +Cc: netdev, openwrt-devel
From: Rafał Miłecki <zajec5@gmail.com>
Date: Mon, 28 Oct 2013 18:42:22 +0100
> Copying whole packets with skb_copy_from_linear_data_offset is a pretty
> bad idea. CPU was spending time in __copy_user_common and network
> performance was lower. With the new solution iperf-measured speed
> increased from 116Mb/s to 134Mb/s.
>
> Another way to improve performance could be switching to build_skb. It
> is cache-specific, so will require testing of various devices.
>
> Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Semantically this looks good but it needs some coding style fixes:
The following is indented correctly:
> + dma_sync_single_for_device(dma_dev,
> + slot->dma_addr,
> + BGMAC_RX_BUF_SIZE,
> + DMA_FROM_DEVICE);
> + break;
> }
However, this is not:
> + /* Unmap old skb, we'll pass it to the netfif */
> + dma_unmap_single(dma_dev, old_dma_addr,
> + BGMAC_RX_BUF_SIZE,
> + DMA_FROM_DEVICE);
> +
Like the first case, please align the arguments on the second and subsequently
line to start at the first column after the openning parenthesis on the
first line.
Every time I see this not done correctly, it's because the person doing it
is trying to use TAB characters only to indent. You should be using the
appropriate number of TAB _and_ SPACE characters necessary to get things
lines up properly.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC TRY#2][PATCH] bgmac: pass received packet to the netif instead of copying it
2013-10-28 20:28 ` David Miller
@ 2013-10-28 20:36 ` Rafał Miłecki
0 siblings, 0 replies; 5+ messages in thread
From: Rafał Miłecki @ 2013-10-28 20:36 UTC (permalink / raw)
To: David Miller; +Cc: Network Development, OpenWrt Development List
2013/10/28 David Miller <davem@davemloft.net>:
> However, this is not:
>
>> + /* Unmap old skb, we'll pass it to the netfif */
>> + dma_unmap_single(dma_dev, old_dma_addr,
>> + BGMAC_RX_BUF_SIZE,
>> + DMA_FROM_DEVICE);
>> +
Oh, sorry for that. I had to copy & paste it in some silly way... Will fix that!
--
Rafał
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC TRY#2][PATCH] bgmac: pass received packet to the netif instead of copying it
2013-10-28 17:42 [RFC TRY#2][PATCH] bgmac: pass received packet to the netif instead of copying it Rafał Miłecki
2013-10-28 20:28 ` David Miller
@ 2013-10-29 6:50 ` Nathan Hintz
2013-10-29 6:59 ` [OpenWrt-Devel] " Rafał Miłecki
1 sibling, 1 reply; 5+ messages in thread
From: Nathan Hintz @ 2013-10-29 6:50 UTC (permalink / raw)
To: OpenWrt Development List; +Cc: netdev, David S. Miller
On Mon, 28 Oct 2013 18:42:22 +0100
Rafał Miłecki <zajec5@gmail.com> wrote:
Hi:
A few questions/comments inline...
Nathan
> Copying whole packets with skb_copy_from_linear_data_offset is a pretty
> bad idea. CPU was spending time in __copy_user_common and network
> performance was lower. With the new solution iperf-measured speed
> increased from 116Mb/s to 134Mb/s.
>
> Another way to improve performance could be switching to build_skb. It
> is cache-specific, so will require testing of various devices.
>
> Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
> ---
> drivers/net/ethernet/broadcom/bgmac.c | 71 ++++++++++++++++++++-------------
> 1 file changed, 44 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
> index 6b7541f..fde9a11 100644
> --- a/drivers/net/ethernet/broadcom/bgmac.c
> +++ b/drivers/net/ethernet/broadcom/bgmac.c
> @@ -307,7 +307,6 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
> struct device *dma_dev = bgmac->core->dma_dev;
> struct bgmac_slot_info *slot = &ring->slots[ring->start];
> struct sk_buff *skb = slot->skb;
> - struct sk_buff *new_skb;
> struct bgmac_rx_header *rx;
> u16 len, flags;
>
> @@ -320,38 +319,56 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
> len = le16_to_cpu(rx->len);
> flags = le16_to_cpu(rx->flags);
>
> - /* Check for poison and drop or pass the packet */
> - if (len == 0xdead && flags == 0xbeef) {
> - bgmac_err(bgmac, "Found poisoned packet at slot %d, DMA issue!\n",
> - ring->start);
> - } else {
> + do {
"old_skb" duplicates "skb" stored above, can that one be used (renamed) instead of creating a new one here?
> + struct sk_buff *old_skb = slot->skb;
> + dma_addr_t old_dma_addr = slot->dma_addr;
> + int err;
> +
> + /* Check for poison and drop or pass the packet */
> + if (len == 0xdead && flags == 0xbeef) {
> + bgmac_err(bgmac, "Found poisoned packet at slot %d, DMA issue!\n",
> + ring->start);
Nothing in the buffer has been changed by the cpu yet, so is this sync necessary?
> + dma_sync_single_for_device(dma_dev,
> + slot->dma_addr,
> + BGMAC_RX_BUF_SIZE,
> + DMA_FROM_DEVICE);
> + break;
> + }
> +
> /* Omit CRC. */
> len -= ETH_FCS_LEN;
>
> - new_skb = netdev_alloc_skb_ip_align(bgmac->net_dev, len);
> - if (new_skb) {
> - skb_put(new_skb, len);
> - skb_copy_from_linear_data_offset(skb, BGMAC_RX_FRAME_OFFSET,
> - new_skb->data,
> - len);
> - skb_checksum_none_assert(skb);
> - new_skb->protocol =
> - eth_type_trans(new_skb, bgmac->net_dev);
> - netif_receive_skb(new_skb);
> - handled++;
> - } else {
> - bgmac->net_dev->stats.rx_dropped++;
> - bgmac_err(bgmac, "Allocation of skb for copying packet failed!\n");
> + /* Prepare new skb as replacement */
> + err = bgmac_dma_rx_skb_for_slot(bgmac, slot);
> + if (err) {
I've sent a separate patch against "bgmac_dma_rx_skb_for_slot" to not corrupt the slot at all if an
error occurs (skb alloc or dma mapping), and free the skb that was allocated if a dma mapping error
occurs. Assuming that patch is accepted, then the following two lines would not be needed.
With "bgmac_dma_rx_skb_for_slot" as it currently exists, this would leak an skb for a dma mapping
error (this was pre-existing to the changes in this patch).
> + slot->skb = old_skb;
> + slot->dma_addr = old_dma_addr;
> +
> + /* Poison the old skb */
> + rx->len = cpu_to_le16(0xdead);
> + rx->flags = cpu_to_le16(0xbeef);
> +
> + dma_sync_single_for_device(dma_dev,
> + slot->dma_addr,
> + BGMAC_RX_BUF_SIZE,
> + DMA_FROM_DEVICE);
> + break;
> }
> + bgmac_dma_rx_setup_desc(bgmac, ring, ring->start);
>
> - /* Poison the old skb */
> - rx->len = cpu_to_le16(0xdead);
> - rx->flags = cpu_to_le16(0xbeef);
> - }
> + /* Unmap old skb, we'll pass it to the netfif */
> + dma_unmap_single(dma_dev, old_dma_addr,
> + BGMAC_RX_BUF_SIZE,
> + DMA_FROM_DEVICE);
> +
> + skb_put(skb, BGMAC_RX_FRAME_OFFSET + len);
> + skb_pull(skb, BGMAC_RX_FRAME_OFFSET);
>
> - /* Make it back accessible to the hardware */
> - dma_sync_single_for_device(dma_dev, slot->dma_addr,
> - BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);
> + skb_checksum_none_assert(skb);
> + skb->protocol = eth_type_trans(skb, bgmac->net_dev);
> + netif_receive_skb(skb);
> + handled++;
> + } while (0);
>
> if (++ring->start >= BGMAC_RX_RING_SLOTS)
> ring->start = 0;
--
Nathan
_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [OpenWrt-Devel] [RFC TRY#2][PATCH] bgmac: pass received packet to the netif instead of copying it
2013-10-29 6:50 ` Nathan Hintz
@ 2013-10-29 6:59 ` Rafał Miłecki
0 siblings, 0 replies; 5+ messages in thread
From: Rafał Miłecki @ 2013-10-29 6:59 UTC (permalink / raw)
To: Nathan Hintz
Cc: OpenWrt Development List, Network Development, David S. Miller
2013/10/29 Nathan Hintz <nlhintz@hotmail.com>:
> On Mon, 28 Oct 2013 18:42:22 +0100
> Rafał Miłecki <zajec5@gmail.com> wrote:
>
> Hi:
>
> A few questions/comments inline...
>
> Nathan
>
>> Copying whole packets with skb_copy_from_linear_data_offset is a pretty
>> bad idea. CPU was spending time in __copy_user_common and network
>> performance was lower. With the new solution iperf-measured speed
>> increased from 116Mb/s to 134Mb/s.
>>
>> Another way to improve performance could be switching to build_skb. It
>> is cache-specific, so will require testing of various devices.
>>
>> Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
>> ---
>> drivers/net/ethernet/broadcom/bgmac.c | 71 ++++++++++++++++++++-------------
>> 1 file changed, 44 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
>> index 6b7541f..fde9a11 100644
>> --- a/drivers/net/ethernet/broadcom/bgmac.c
>> +++ b/drivers/net/ethernet/broadcom/bgmac.c
>> @@ -307,7 +307,6 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
>> struct device *dma_dev = bgmac->core->dma_dev;
>> struct bgmac_slot_info *slot = &ring->slots[ring->start];
>> struct sk_buff *skb = slot->skb;
>> - struct sk_buff *new_skb;
>> struct bgmac_rx_header *rx;
>> u16 len, flags;
>>
>> @@ -320,38 +319,56 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
>> len = le16_to_cpu(rx->len);
>> flags = le16_to_cpu(rx->flags);
>>
>> - /* Check for poison and drop or pass the packet */
>> - if (len == 0xdead && flags == 0xbeef) {
>> - bgmac_err(bgmac, "Found poisoned packet at slot %d, DMA issue!\n",
>> - ring->start);
>> - } else {
>> + do {
>
> "old_skb" duplicates "skb" stored above, can that one be used (renamed) instead of creating a new one here?
I've focused on clean code too much. That won't be needed anyway when
I rebase my patch on top of yours.
>> + struct sk_buff *old_skb = slot->skb;
>> + dma_addr_t old_dma_addr = slot->dma_addr;
>> + int err;
>> +
>> + /* Check for poison and drop or pass the packet */
>> + if (len == 0xdead && flags == 0xbeef) {
>> + bgmac_err(bgmac, "Found poisoned packet at slot %d, DMA issue!\n",
>> + ring->start);
>> + dma_sync_single_for_device(dma_dev,
>> + slot->dma_addr,
>> + BGMAC_RX_BUF_SIZE,
>> + DMA_FROM_DEVICE);
>
> Nothing in the buffer has been changed by the cpu yet, so is this sync necessary?
I've moved your comment a line below, so it comments the code *above*.
I'm using LDD3 to understand DMA and it contains this explanation:
> void dma_sync_single_for_cpu(struct device *dev, dma_handle_t bus_addr,
> size_t size, enum dma_data_direction direction);
>
> This function should be called before the processor accesses a streaming DMA buffer. Once the call has been made, the CPU “owns” the DMA buffer and can work with it as needed. Before the device accesses the buffer, however, ownership should be transferred back to it with:
>
> void dma_sync_single_for_device(struct device *dev, dma_handle_t bus_addr,
> size_t size, enum dma_data_direction direction);
So even if I didn't change anything in the buffer, I believe we still
need to "sync" it back to make it accessible to the hardware.
> I've sent a separate patch against "bgmac_dma_rx_skb_for_slot" to not corrupt the slot at all if an
> error occurs (skb alloc or dma mapping), and free the skb that was allocated if a dma mapping error
> occurs. Assuming that patch is accepted, then the following two lines would not be needed.
> With "bgmac_dma_rx_skb_for_slot" as it currently exists, this would leak an skb for a dma mapping
> error (this was pre-existing to the changes in this patch).
Thanks, I'll rebase my patch.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-10-29 6:59 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-28 17:42 [RFC TRY#2][PATCH] bgmac: pass received packet to the netif instead of copying it Rafał Miłecki
2013-10-28 20:28 ` David Miller
2013-10-28 20:36 ` Rafał Miłecki
2013-10-29 6:50 ` Nathan Hintz
2013-10-29 6:59 ` [OpenWrt-Devel] " Rafał Miłecki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).