* [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location @ 2012-07-16 20:03 Andy Cress 2012-07-17 0:59 ` Paul Gortmaker 2012-07-17 7:09 ` Eric Dumazet 0 siblings, 2 replies; 10+ messages in thread From: Andy Cress @ 2012-07-16 20:03 UTC (permalink / raw) To: netdev Author: Zhong Hongbo <hongbo.zhong@windriver.com> Due to some unknown hardware limitations the pch_gbe hardware cannot calculate checksums when the length of network package is less than 64 bytes, where we will surprisingly encounter a problem of the destination IP incorrectly changed. When forwarding network packages at the network layer the IP packages won't be relayed to the upper transport layer and analyzed there, consequently, skb->transport_header pointer will be mistakenly remained the same as that of skb->network_header, resulting in TCP checksum wrongly filled into the field of destination IP in IP header. We can fix this issue by manually calculate the offset of the TCP checksum and update it accordingly. We would normally use the skb_checksum_start_offset(skb) here, but in this case it is sometimes -2 (csum_start=0 - skb_headroom=2 => -2), hence the manual calculation. Signed-off-by: Zhong Hongbo <hongbo.zhong@windriver.com> Merged-by: Andy Cress <andy.cress@us.kontron.com> --- drivers/net/pch_gbe/pch_gbe_main.c | 14 ++++++++------ 1 files changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c index 3787c64..1642bff 100644 --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c @@ -1178,32 +1178,35 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, /* * It is because the hardware accelerator does not support a checksum, * when the received data size is less than 64 bytes. + * Note: skb_checksum_start_offset(skb) is sometimes -2 here. */ if (skb->len < PCH_GBE_SHORT_PKT && skb->ip_summed != CHECKSUM_NONE) { + struct iphdr *iph = ip_hdr(skb); frame_ctrl |= PCH_GBE_TXD_CTRL_APAD | PCH_GBE_TXD_CTRL_TCPIP_ACC_OFF; if (skb->protocol == htons(ETH_P_IP)) { - struct iphdr *iph = ip_hdr(skb); unsigned int offset; - offset = skb_transport_offset(skb); + offset = (unsigned char *)((u8 *)iph + iph->ihl * 4) - skb->data; if (iph->protocol == IPPROTO_TCP) { + struct tcphdr *tcphdr_point = (struct tcphdr *)((u8 *)iph + iph->ihl * 4); skb->csum = 0; - tcp_hdr(skb)->check = 0; + tcphdr_point->check = 0; skb->csum = skb_checksum(skb, offset, skb->len - offset, 0); - tcp_hdr(skb)->check = + tcphdr_point->check = csum_tcpudp_magic(iph->saddr, iph->daddr, skb->len - offset, IPPROTO_TCP, skb->csum); } else if (iph->protocol == IPPROTO_UDP) { + struct udphdr *udphdr_point = (struct udphdr *)((u8 *)iph + iph->ihl * 4); skb->csum = 0; - udp_hdr(skb)->check = 0; + udphdr_point->check = 0; skb->csum = skb_checksum(skb, offset, skb->len - offset, 0); - udp_hdr(skb)->check = + udphdr_point->check = csum_tcpudp_magic(iph->saddr, iph->daddr, skb->len - offset, ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location 2012-07-16 20:03 [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location Andy Cress @ 2012-07-17 0:59 ` Paul Gortmaker 2012-07-17 7:09 ` Eric Dumazet 1 sibling, 0 replies; 10+ messages in thread From: Paul Gortmaker @ 2012-07-17 0:59 UTC (permalink / raw) To: Andy Cress; +Cc: netdev Hi Andy, On Mon, Jul 16, 2012 at 4:03 PM, Andy Cress <andy.cress@us.kontron.com> wrote: > > Author: Zhong Hongbo <hongbo.zhong@windriver.com> I'm curious as to how you are generating these mails, since if it were with the "normal" method, then the above would have a "From: " prefix, and not "Author: " prefix. (More on what is the "normal" method below...) > > Due to some unknown hardware limitations the pch_gbe hardware cannot > calculate checksums when the length of network package is less > than 64 bytes, where we will surprisingly encounter a problem of > the destination IP incorrectly changed. This has the feel of a problem that isn't properly understood, and a solution that is aimed at masking the symptom at the point where it rears its ugly head... Not a good sign, when it comes to requesting upstream acceptance. > > When forwarding network packages at the network layer the IP packages s/packages/packets/ > won't be relayed to the upper transport layer and analyzed there, > consequently, skb->transport_header pointer will be mistakenly remained > the same as that of skb->network_header, resulting in TCP checksum > wrongly > filled into the field of destination IP in IP header. Similarly here, the one-word-per-line symptom seems to hint at it being a case of data manually entered into a GUI MUA, which generally translates into headaches for the maintainer who has to integrate your work. If you can adopt a "standard" workflow, it will be easier for both you and the maintainer. And that "standard" workflow could be something as simple as this: -------------------------- cd /path/to/my/git-repo/of/net-next git pull git checkout -b pch-fixes master <create commits, by cherry-pick or manual creation> <validate with compilation, boot testing> git format-patch --subject-prefix="PATCH net-next" --cover-letter -o sendme master..pch-fixes vi sendme/0000-cover-letter.patch <enter text describing your series of patches> git send-email --to netdev@vger.kernel.org -cc maintainer@someaddress.com sendme ------------------------- > > We can fix this issue by manually calculate the offset of the TCP > checksum > and update it accordingly. > > We would normally use the skb_checksum_start_offset(skb) here, but in > this > case it is sometimes -2 (csum_start=0 - skb_headroom=2 => -2), hence the > manual calculation. > > Signed-off-by: Zhong Hongbo <hongbo.zhong@windriver.com> > Merged-by: Andy Cress <andy.cress@us.kontron.com> There really isn't a precedent for Merged-by: -- you may want to see: http://kerneltrap.org/taxonomy/term/245 A "real" merge actually creates a merge commit, which is unique in its own right. So here, if you've carried forward an older patch, or similar, you'd be most likely to add your own SOB line underneath. Also a Cc: addition might be in order, say based on the output of: ./scripts/get_maintainer.pl -f drivers/net/ethernet/oki-semi/pch_gbe A Cc: line put here goes on record. If you use the --cc option mentioned above, the Cc: record is limited to the mail header in the netdev mail archive. > > --- > drivers/net/pch_gbe/pch_gbe_main.c | 14 ++++++++------ > 1 files changed, 8 insertions(+), 6 deletions(-) > > diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c > b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c > index 3787c64..1642bff 100644 > --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c > +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c > @@ -1178,32 +1178,35 @@ static void pch_gbe_tx_queue(struct > pch_gbe_adapter *adapter, > /* > * It is because the hardware accelerator does not support a > checksum, > * when the received data size is less than 64 bytes. > + * Note: skb_checksum_start_offset(skb) is sometimes -2 here. This comment is borderline useless, without some level of justification of how/why that happens, and why it is OK/acceptable. > */ > if (skb->len < PCH_GBE_SHORT_PKT && skb->ip_summed != > CHECKSUM_NONE) { > + struct iphdr *iph = ip_hdr(skb); > frame_ctrl |= PCH_GBE_TXD_CTRL_APAD | > PCH_GBE_TXD_CTRL_TCPIP_ACC_OFF; > if (skb->protocol == htons(ETH_P_IP)) { > - struct iphdr *iph = ip_hdr(skb); > unsigned int offset; > - offset = skb_transport_offset(skb); > + offset = (unsigned char *)((u8 *)iph + iph->ihl > * 4) - skb->data; > if (iph->protocol == IPPROTO_TCP) { > + struct tcphdr *tcphdr_point = (struct > tcphdr *)((u8 *)iph + iph->ihl * 4); There are several things going on here that are not ideal. There are casts of casts with unexplained math magic, there is the attempt to encode variable types ("_point") in the variable name, and there is apparently mailer munging which inserts rogue newlines. At this point, I'd have to suggest that you go back to trying to describe the use case, and the exact symptoms you are seeing there; describe that and how you believe it comes about as articulately as possible, and ask for input on how best to deal with it; giving reference to this as a workaround that was deployed some time in the past. In the end, this is a recipe for a good commit log anyways, i.e. describing the end user symptom, the use case that triggers it, the underlying reason as to why it happens, and finally the technically correct fix to deal with solving it (and indicate why it is the correct fix, if not obvious.) Hope this helps with basic process and expectations. Paul. -- > skb->csum = 0; > - tcp_hdr(skb)->check = 0; > + tcphdr_point->check = 0; > skb->csum = skb_checksum(skb, offset, > skb->len - > offset, 0); > - tcp_hdr(skb)->check = > + tcphdr_point->check = > csum_tcpudp_magic(iph->saddr, > iph->daddr, > skb->len - > offset, > IPPROTO_TCP, > skb->csum); > } else if (iph->protocol == IPPROTO_UDP) { > + struct udphdr *udphdr_point = (struct > udphdr *)((u8 *)iph + iph->ihl * 4); > skb->csum = 0; > - udp_hdr(skb)->check = 0; > + udphdr_point->check = 0; > skb->csum = > skb_checksum(skb, offset, > skb->len - offset, > 0); > - udp_hdr(skb)->check = > + udphdr_point->check = > csum_tcpudp_magic(iph->saddr, > iph->daddr, > skb->len - > offset, > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location 2012-07-16 20:03 [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location Andy Cress 2012-07-17 0:59 ` Paul Gortmaker @ 2012-07-17 7:09 ` Eric Dumazet 2012-07-17 7:33 ` Eric Dumazet 2012-07-17 8:04 ` [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location Zhong Hongbo 1 sibling, 2 replies; 10+ messages in thread From: Eric Dumazet @ 2012-07-17 7:09 UTC (permalink / raw) To: Andy Cress; +Cc: netdev, Zhong Hongbo On Mon, 2012-07-16 at 13:03 -0700, Andy Cress wrote: > Author: Zhong Hongbo <hongbo.zhong@windriver.com> > > Due to some unknown hardware limitations the pch_gbe hardware cannot > calculate checksums when the length of network package is less > than 64 bytes, where we will surprisingly encounter a problem of > the destination IP incorrectly changed. > > When forwarding network packages at the network layer the IP packages > won't be relayed to the upper transport layer and analyzed there, > consequently, skb->transport_header pointer will be mistakenly remained > the same as that of skb->network_header, resulting in TCP checksum > wrongly > filled into the field of destination IP in IP header. > > We can fix this issue by manually calculate the offset of the TCP > checksum > and update it accordingly. > > We would normally use the skb_checksum_start_offset(skb) here, but in > this > case it is sometimes -2 (csum_start=0 - skb_headroom=2 => -2), hence the > manual calculation. > > Signed-off-by: Zhong Hongbo <hongbo.zhong@windriver.com> > Merged-by: Andy Cress <andy.cress@us.kontron.com> Hmm... I fail to understand why you care about NIC doing checksums, while pch_gbe_tx_queue() make a _copy_ of each outgoing packets. There _must_ be a way to avoid most of these copies (ie not touching payload), only mess with the header to insert these 2 nul bytes ? /* [Header:14][payload] ---> [Header:14][paddong:2][payload] */ So at device setup : dev->needed_headroom = 2; and in xmit, if (skb_headroom(skb) < 2) { struct sk_buff *skb_new; skb_new = skb_realloc_headroom(skb, 2); if (!skb_new) { handle error } consume_skb(skb); skb = skb_new; } ptr = skb_push(skb, 2); memmove(ptr, ptr + 2, ETH_HLEN); ptr[ETH_HLEN] = 0; ptr[ETH_HLEN + 1] = 0; ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location 2012-07-17 7:09 ` Eric Dumazet @ 2012-07-17 7:33 ` Eric Dumazet 2012-07-17 14:20 ` Andy Cress 2012-07-25 20:10 ` Andy Cress 2012-07-17 8:04 ` [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location Zhong Hongbo 1 sibling, 2 replies; 10+ messages in thread From: Eric Dumazet @ 2012-07-17 7:33 UTC (permalink / raw) To: Andy Cress; +Cc: netdev, Zhong Hongbo On Tue, 2012-07-17 at 09:09 +0200, Eric Dumazet wrote: > Hmm... I fail to understand why you care about NIC doing checksums, > while pch_gbe_tx_queue() make a _copy_ of each outgoing > packets. > > There _must_ be a way to avoid most of these copies (ie not touching > payload), only mess with the header to insert these 2 nul bytes ? > > /* [Header:14][payload] ---> [Header:14][paddong:2][payload] */ > > So at device setup : dev->needed_headroom = 2; > > and in xmit, > > if (skb_headroom(skb) < 2) { > struct sk_buff *skb_new; > > skb_new = skb_realloc_headroom(skb, 2); > if (!skb_new) { handle error } > consume_skb(skb); > skb = skb_new; > } > ptr = skb_push(skb, 2); > memmove(ptr, ptr + 2, ETH_HLEN); > ptr[ETH_HLEN] = 0; > ptr[ETH_HLEN + 1] = 0; > > Something like the following (untested) patch drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 55 +++++----- 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c index b100656..2d3d982 100644 --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c @@ -1163,7 +1163,7 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, struct pch_gbe_hw *hw = &adapter->hw; struct pch_gbe_tx_desc *tx_desc; struct pch_gbe_buffer *buffer_info; - struct sk_buff *tmp_skb; + char *ptr; unsigned int frame_ctrl; unsigned int ring_num; @@ -1221,18 +1221,27 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, buffer_info = &tx_ring->buffer_info[ring_num]; - tmp_skb = buffer_info->skb; + if (skb_headroom(skb) < 2) { + struct sk_buff *skb_new; + + skb_new = skb_realloc_headroom(skb, 2); + if (!skb_new) { + tx_ring->next_to_use = ring_num; + dev_kfree_skb_any(skb); + return; + } + consume_skb(skb); + skb = skb_new; + } /* [Header:14][payload] ---> [Header:14][paddong:2][payload] */ - memcpy(tmp_skb->data, skb->data, ETH_HLEN); - tmp_skb->data[ETH_HLEN] = 0x00; - tmp_skb->data[ETH_HLEN + 1] = 0x00; - tmp_skb->len = skb->len; - memcpy(&tmp_skb->data[ETH_HLEN + 2], &skb->data[ETH_HLEN], - (skb->len - ETH_HLEN)); + ptr = skb_push(skb, 2); + memmove(ptr, ptr + 2, ETH_HLEN); + ptr[ETH_HLEN] = 0x00; + ptr[ETH_HLEN + 1] = 0x00; /*-- Set Buffer information --*/ - buffer_info->length = tmp_skb->len; - buffer_info->dma = dma_map_single(&adapter->pdev->dev, tmp_skb->data, + buffer_info->length = skb->len; + buffer_info->dma = dma_map_single(&adapter->pdev->dev, skb->data, buffer_info->length, DMA_TO_DEVICE); if (dma_mapping_error(&adapter->pdev->dev, buffer_info->dma)) { @@ -1240,18 +1249,20 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, buffer_info->dma = 0; buffer_info->time_stamp = 0; tx_ring->next_to_use = ring_num; + dev_kfree_skb_any(skb); return; } buffer_info->mapped = true; buffer_info->time_stamp = jiffies; + buffer_info->skb = skb; /*-- Set Tx descriptor --*/ tx_desc = PCH_GBE_TX_DESC(*tx_ring, ring_num); - tx_desc->buffer_addr = (buffer_info->dma); - tx_desc->length = (tmp_skb->len); - tx_desc->tx_words_eob = ((tmp_skb->len + 3)); + tx_desc->buffer_addr = buffer_info->dma; + tx_desc->length = skb->len; + tx_desc->tx_words_eob = skb->len + 3; tx_desc->tx_frame_ctrl = (frame_ctrl); - tx_desc->gbec_status = (DSC_INIT16); + tx_desc->gbec_status = DSC_INIT16; if (unlikely(++ring_num == tx_ring->count)) ring_num = 0; @@ -1265,7 +1276,6 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, pch_tx_timestamp(adapter, skb); #endif - dev_kfree_skb_any(skb); } /** @@ -1543,19 +1553,12 @@ static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter, struct pch_gbe_tx_ring *tx_ring) { struct pch_gbe_buffer *buffer_info; - struct sk_buff *skb; unsigned int i; - unsigned int bufsz; struct pch_gbe_tx_desc *tx_desc; - bufsz = - adapter->hw.mac.max_frame_size + PCH_GBE_DMA_ALIGN + NET_IP_ALIGN; - for (i = 0; i < tx_ring->count; i++) { buffer_info = &tx_ring->buffer_info[i]; - skb = netdev_alloc_skb(adapter->netdev, bufsz); - skb_reserve(skb, PCH_GBE_DMA_ALIGN); - buffer_info->skb = skb; + buffer_info->skb = NULL; tx_desc = PCH_GBE_TX_DESC(*tx_ring, i); tx_desc->gbec_status = (DSC_INIT16); } @@ -1622,9 +1625,9 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter, buffer_info->length, DMA_TO_DEVICE); buffer_info->mapped = false; } - if (buffer_info->skb) { - pr_debug("trim buffer_info->skb : %d\n", i); - skb_trim(buffer_info->skb, 0); + if (skb) { + dev_kfree_skb_any(skb); + buffer_info->skb = NULL; } tx_desc->gbec_status = DSC_INIT16; if (unlikely(++i == tx_ring->count)) ^ permalink raw reply related [flat|nested] 10+ messages in thread
* RE: [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location 2012-07-17 7:33 ` Eric Dumazet @ 2012-07-17 14:20 ` Andy Cress 2012-07-25 20:10 ` Andy Cress 1 sibling, 0 replies; 10+ messages in thread From: Andy Cress @ 2012-07-17 14:20 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Zhong Hongbo Eric, This is intriguing, and the data copy also would explain why this transmit path is slow, and is susceptible to transmit timeouts. I want to apply and test your proposed patch, but I'll have to do that next week. Andy -----Original Message----- From: Eric Dumazet [mailto:eric.dumazet@gmail.com] Sent: Tuesday, July 17, 2012 3:33 AM To: Andy Cress Cc: netdev@vger.kernel.org; Zhong Hongbo Subject: Re: [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location On Tue, 2012-07-17 at 09:09 +0200, Eric Dumazet wrote: > Hmm... I fail to understand why you care about NIC doing checksums, > while pch_gbe_tx_queue() make a _copy_ of each outgoing > packets. > > There _must_ be a way to avoid most of these copies (ie not touching > payload), only mess with the header to insert these 2 nul bytes ? > > /* [Header:14][payload] ---> [Header:14][paddong:2][payload] */ > > So at device setup : dev->needed_headroom = 2; > > and in xmit, > > if (skb_headroom(skb) < 2) { > struct sk_buff *skb_new; > > skb_new = skb_realloc_headroom(skb, 2); > if (!skb_new) { handle error } > consume_skb(skb); > skb = skb_new; > } > ptr = skb_push(skb, 2); > memmove(ptr, ptr + 2, ETH_HLEN); > ptr[ETH_HLEN] = 0; > ptr[ETH_HLEN + 1] = 0; > > Something like the following (untested) patch drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 55 +++++----- 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c index b100656..2d3d982 100644 --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c @@ -1163,7 +1163,7 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, struct pch_gbe_hw *hw = &adapter->hw; struct pch_gbe_tx_desc *tx_desc; struct pch_gbe_buffer *buffer_info; - struct sk_buff *tmp_skb; + char *ptr; unsigned int frame_ctrl; unsigned int ring_num; @@ -1221,18 +1221,27 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, buffer_info = &tx_ring->buffer_info[ring_num]; - tmp_skb = buffer_info->skb; + if (skb_headroom(skb) < 2) { + struct sk_buff *skb_new; + + skb_new = skb_realloc_headroom(skb, 2); + if (!skb_new) { + tx_ring->next_to_use = ring_num; + dev_kfree_skb_any(skb); + return; + } + consume_skb(skb); + skb = skb_new; + } /* [Header:14][payload] ---> [Header:14][paddong:2][payload] */ - memcpy(tmp_skb->data, skb->data, ETH_HLEN); - tmp_skb->data[ETH_HLEN] = 0x00; - tmp_skb->data[ETH_HLEN + 1] = 0x00; - tmp_skb->len = skb->len; - memcpy(&tmp_skb->data[ETH_HLEN + 2], &skb->data[ETH_HLEN], - (skb->len - ETH_HLEN)); + ptr = skb_push(skb, 2); + memmove(ptr, ptr + 2, ETH_HLEN); + ptr[ETH_HLEN] = 0x00; + ptr[ETH_HLEN + 1] = 0x00; /*-- Set Buffer information --*/ - buffer_info->length = tmp_skb->len; - buffer_info->dma = dma_map_single(&adapter->pdev->dev, tmp_skb->data, + buffer_info->length = skb->len; + buffer_info->dma = dma_map_single(&adapter->pdev->dev, skb->data, buffer_info->length, DMA_TO_DEVICE); if (dma_mapping_error(&adapter->pdev->dev, buffer_info->dma)) { @@ -1240,18 +1249,20 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, buffer_info->dma = 0; buffer_info->time_stamp = 0; tx_ring->next_to_use = ring_num; + dev_kfree_skb_any(skb); return; } buffer_info->mapped = true; buffer_info->time_stamp = jiffies; + buffer_info->skb = skb; /*-- Set Tx descriptor --*/ tx_desc = PCH_GBE_TX_DESC(*tx_ring, ring_num); - tx_desc->buffer_addr = (buffer_info->dma); - tx_desc->length = (tmp_skb->len); - tx_desc->tx_words_eob = ((tmp_skb->len + 3)); + tx_desc->buffer_addr = buffer_info->dma; + tx_desc->length = skb->len; + tx_desc->tx_words_eob = skb->len + 3; tx_desc->tx_frame_ctrl = (frame_ctrl); - tx_desc->gbec_status = (DSC_INIT16); + tx_desc->gbec_status = DSC_INIT16; if (unlikely(++ring_num == tx_ring->count)) ring_num = 0; @@ -1265,7 +1276,6 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, pch_tx_timestamp(adapter, skb); #endif - dev_kfree_skb_any(skb); } /** @@ -1543,19 +1553,12 @@ static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter, struct pch_gbe_tx_ring *tx_ring) { struct pch_gbe_buffer *buffer_info; - struct sk_buff *skb; unsigned int i; - unsigned int bufsz; struct pch_gbe_tx_desc *tx_desc; - bufsz = - adapter->hw.mac.max_frame_size + PCH_GBE_DMA_ALIGN + NET_IP_ALIGN; - for (i = 0; i < tx_ring->count; i++) { buffer_info = &tx_ring->buffer_info[i]; - skb = netdev_alloc_skb(adapter->netdev, bufsz); - skb_reserve(skb, PCH_GBE_DMA_ALIGN); - buffer_info->skb = skb; + buffer_info->skb = NULL; tx_desc = PCH_GBE_TX_DESC(*tx_ring, i); tx_desc->gbec_status = (DSC_INIT16); } @@ -1622,9 +1625,9 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter, buffer_info->length, DMA_TO_DEVICE); buffer_info->mapped = false; } - if (buffer_info->skb) { - pr_debug("trim buffer_info->skb : %d\n", i); - skb_trim(buffer_info->skb, 0); + if (skb) { + dev_kfree_skb_any(skb); + buffer_info->skb = NULL; } tx_desc->gbec_status = DSC_INIT16; if (unlikely(++i == tx_ring->count)) ^ permalink raw reply related [flat|nested] 10+ messages in thread
* RE: [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location 2012-07-17 7:33 ` Eric Dumazet 2012-07-17 14:20 ` Andy Cress @ 2012-07-25 20:10 ` Andy Cress 2012-08-06 14:19 ` pch_gbe: dont-copy-payload (was [PATCH 1/4] ...) Andy Cress 1 sibling, 1 reply; 10+ messages in thread From: Andy Cress @ 2012-07-25 20:10 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Zhong Hongbo Eric, For the checksum patch, that needs to be tabled until Hongbo has a chance to evaluate/revise it, but that isn't vital to the goal of avoiding transmit timeouts, so the other 3 patches should still go forward. I'll submit a revised patch set with just those 3. I tried applying the dont-copy-payload patch you outlined below, and for some reason it won't send successfully with that change. I'm not sure why though. This approach seems sound, and should greatly help transmit performance, if I could figure out what else it needs. Andy -----Original Message----- From: Eric Dumazet [mailto:eric.dumazet@gmail.com] Sent: Tuesday, July 17, 2012 3:33 AM To: Andy Cress Cc: netdev@vger.kernel.org; Zhong Hongbo Subject: Re: [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location On Tue, 2012-07-17 at 09:09 +0200, Eric Dumazet wrote: > Hmm... I fail to understand why you care about NIC doing checksums, > while pch_gbe_tx_queue() make a _copy_ of each outgoing > packets. > > There _must_ be a way to avoid most of these copies (ie not touching > payload), only mess with the header to insert these 2 nul bytes ? > > /* [Header:14][payload] ---> [Header:14][paddong:2][payload] */ > > So at device setup : dev->needed_headroom = 2; > > and in xmit, > > if (skb_headroom(skb) < 2) { > struct sk_buff *skb_new; > > skb_new = skb_realloc_headroom(skb, 2); > if (!skb_new) { handle error } > consume_skb(skb); > skb = skb_new; > } > ptr = skb_push(skb, 2); > memmove(ptr, ptr + 2, ETH_HLEN); > ptr[ETH_HLEN] = 0; > ptr[ETH_HLEN + 1] = 0; > > Something like the following (untested) patch drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 55 +++++----- 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c index b100656..2d3d982 100644 --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c @@ -1163,7 +1163,7 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, struct pch_gbe_hw *hw = &adapter->hw; struct pch_gbe_tx_desc *tx_desc; struct pch_gbe_buffer *buffer_info; - struct sk_buff *tmp_skb; + char *ptr; unsigned int frame_ctrl; unsigned int ring_num; @@ -1221,18 +1221,27 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, buffer_info = &tx_ring->buffer_info[ring_num]; - tmp_skb = buffer_info->skb; + if (skb_headroom(skb) < 2) { + struct sk_buff *skb_new; + + skb_new = skb_realloc_headroom(skb, 2); + if (!skb_new) { + tx_ring->next_to_use = ring_num; + dev_kfree_skb_any(skb); + return; + } + consume_skb(skb); + skb = skb_new; + } /* [Header:14][payload] ---> [Header:14][paddong:2][payload] */ - memcpy(tmp_skb->data, skb->data, ETH_HLEN); - tmp_skb->data[ETH_HLEN] = 0x00; - tmp_skb->data[ETH_HLEN + 1] = 0x00; - tmp_skb->len = skb->len; - memcpy(&tmp_skb->data[ETH_HLEN + 2], &skb->data[ETH_HLEN], - (skb->len - ETH_HLEN)); + ptr = skb_push(skb, 2); + memmove(ptr, ptr + 2, ETH_HLEN); + ptr[ETH_HLEN] = 0x00; + ptr[ETH_HLEN + 1] = 0x00; /*-- Set Buffer information --*/ - buffer_info->length = tmp_skb->len; - buffer_info->dma = dma_map_single(&adapter->pdev->dev, tmp_skb->data, + buffer_info->length = skb->len; + buffer_info->dma = dma_map_single(&adapter->pdev->dev, skb->data, buffer_info->length, DMA_TO_DEVICE); if (dma_mapping_error(&adapter->pdev->dev, buffer_info->dma)) { @@ -1240,18 +1249,20 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, buffer_info->dma = 0; buffer_info->time_stamp = 0; tx_ring->next_to_use = ring_num; + dev_kfree_skb_any(skb); return; } buffer_info->mapped = true; buffer_info->time_stamp = jiffies; + buffer_info->skb = skb; /*-- Set Tx descriptor --*/ tx_desc = PCH_GBE_TX_DESC(*tx_ring, ring_num); - tx_desc->buffer_addr = (buffer_info->dma); - tx_desc->length = (tmp_skb->len); - tx_desc->tx_words_eob = ((tmp_skb->len + 3)); + tx_desc->buffer_addr = buffer_info->dma; + tx_desc->length = skb->len; + tx_desc->tx_words_eob = skb->len + 3; tx_desc->tx_frame_ctrl = (frame_ctrl); - tx_desc->gbec_status = (DSC_INIT16); + tx_desc->gbec_status = DSC_INIT16; if (unlikely(++ring_num == tx_ring->count)) ring_num = 0; @@ -1265,7 +1276,6 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, pch_tx_timestamp(adapter, skb); #endif - dev_kfree_skb_any(skb); } /** @@ -1543,19 +1553,12 @@ static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter, struct pch_gbe_tx_ring *tx_ring) { struct pch_gbe_buffer *buffer_info; - struct sk_buff *skb; unsigned int i; - unsigned int bufsz; struct pch_gbe_tx_desc *tx_desc; - bufsz = - adapter->hw.mac.max_frame_size + PCH_GBE_DMA_ALIGN + NET_IP_ALIGN; - for (i = 0; i < tx_ring->count; i++) { buffer_info = &tx_ring->buffer_info[i]; - skb = netdev_alloc_skb(adapter->netdev, bufsz); - skb_reserve(skb, PCH_GBE_DMA_ALIGN); - buffer_info->skb = skb; + buffer_info->skb = NULL; tx_desc = PCH_GBE_TX_DESC(*tx_ring, i); tx_desc->gbec_status = (DSC_INIT16); } @@ -1622,9 +1625,9 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter, buffer_info->length, DMA_TO_DEVICE); buffer_info->mapped = false; } - if (buffer_info->skb) { - pr_debug("trim buffer_info->skb : %d\n", i); - skb_trim(buffer_info->skb, 0); + if (skb) { + dev_kfree_skb_any(skb); + buffer_info->skb = NULL; } tx_desc->gbec_status = DSC_INIT16; if (unlikely(++i == tx_ring->count)) ^ permalink raw reply related [flat|nested] 10+ messages in thread
* RE: pch_gbe: dont-copy-payload (was [PATCH 1/4] ...) 2012-07-25 20:10 ` Andy Cress @ 2012-08-06 14:19 ` Andy Cress 2012-08-06 14:52 ` Eric Dumazet 0 siblings, 1 reply; 10+ messages in thread From: Andy Cress @ 2012-08-06 14:19 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev I found out why the proposed dont-copy-payload patch didn't work with this pch_gbe NIC. This NIC PHY requires 64-byte-aligned DMA, and the transmit buffers won't be transferred if skb->data is not 64-byte-aligned. Apparently the data copy has a by-product of aligning the buffers. I tried using skb_reserve(skb,64) in pch_gbe_alloc_tx_buffers and pch-gbe_alloc_rx_buffers, but that didn't seem to resolve it. How can I make sure that the transmit data buffers are 64-byte-aligned? Andy -----Original Message----- From: Andy Cress Sent: Wednesday, July 25, 2012 4:11 PM Subject: RE: [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location [...] I tried applying the dont-copy-payload patch you outlined below, and for some reason it won't send successfully with that change. I'm not sure why though. This approach seems sound, and should greatly help transmit performance, if I could figure out what else it needs. Andy -----Original Message----- From: Eric Dumazet Sent: Tuesday, July 17, 2012 3:33 AM To: Andy Cress Cc: netdev@vger.kernel.org; Zhong Hongbo Subject: Re: [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location On Tue, 2012-07-17 at 09:09 +0200, Eric Dumazet wrote: [...] > There _must_ be a way to avoid most of these copies (ie not touching > payload), only mess with the header to insert these 2 nul bytes ? > > /* [Header:14][payload] ---> [Header:14][paddong:2][payload] */ [...] Something like the following (untested) patch drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 55 +++++----- 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c index b100656..2d3d982 100644 --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c @@ -1163,7 +1163,7 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, struct pch_gbe_hw *hw = &adapter->hw; struct pch_gbe_tx_desc *tx_desc; struct pch_gbe_buffer *buffer_info; - struct sk_buff *tmp_skb; + char *ptr; unsigned int frame_ctrl; unsigned int ring_num; @@ -1221,18 +1221,27 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, buffer_info = &tx_ring->buffer_info[ring_num]; - tmp_skb = buffer_info->skb; + if (skb_headroom(skb) < 2) { + struct sk_buff *skb_new; + + skb_new = skb_realloc_headroom(skb, 2); + if (!skb_new) { + tx_ring->next_to_use = ring_num; + dev_kfree_skb_any(skb); + return; + } + consume_skb(skb); + skb = skb_new; + } /* [Header:14][payload] ---> [Header:14][paddong:2][payload] */ - memcpy(tmp_skb->data, skb->data, ETH_HLEN); - tmp_skb->data[ETH_HLEN] = 0x00; - tmp_skb->data[ETH_HLEN + 1] = 0x00; - tmp_skb->len = skb->len; - memcpy(&tmp_skb->data[ETH_HLEN + 2], &skb->data[ETH_HLEN], - (skb->len - ETH_HLEN)); + ptr = skb_push(skb, 2); + memmove(ptr, ptr + 2, ETH_HLEN); + ptr[ETH_HLEN] = 0x00; + ptr[ETH_HLEN + 1] = 0x00; /*-- Set Buffer information --*/ - buffer_info->length = tmp_skb->len; - buffer_info->dma = dma_map_single(&adapter->pdev->dev, tmp_skb->data, + buffer_info->length = skb->len; + buffer_info->dma = dma_map_single(&adapter->pdev->dev, skb->data, buffer_info->length, DMA_TO_DEVICE); if (dma_mapping_error(&adapter->pdev->dev, buffer_info->dma)) { @@ -1240,18 +1249,20 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, buffer_info->dma = 0; buffer_info->time_stamp = 0; tx_ring->next_to_use = ring_num; + dev_kfree_skb_any(skb); return; } buffer_info->mapped = true; buffer_info->time_stamp = jiffies; + buffer_info->skb = skb; /*-- Set Tx descriptor --*/ tx_desc = PCH_GBE_TX_DESC(*tx_ring, ring_num); - tx_desc->buffer_addr = (buffer_info->dma); - tx_desc->length = (tmp_skb->len); - tx_desc->tx_words_eob = ((tmp_skb->len + 3)); + tx_desc->buffer_addr = buffer_info->dma; + tx_desc->length = skb->len; + tx_desc->tx_words_eob = skb->len + 3; tx_desc->tx_frame_ctrl = (frame_ctrl); - tx_desc->gbec_status = (DSC_INIT16); + tx_desc->gbec_status = DSC_INIT16; if (unlikely(++ring_num == tx_ring->count)) ring_num = 0; @@ -1265,7 +1276,6 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, pch_tx_timestamp(adapter, skb); #endif - dev_kfree_skb_any(skb); } /** @@ -1543,19 +1553,12 @@ static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter, struct pch_gbe_tx_ring *tx_ring) { struct pch_gbe_buffer *buffer_info; - struct sk_buff *skb; unsigned int i; - unsigned int bufsz; struct pch_gbe_tx_desc *tx_desc; - bufsz = - adapter->hw.mac.max_frame_size + PCH_GBE_DMA_ALIGN + NET_IP_ALIGN; - for (i = 0; i < tx_ring->count; i++) { buffer_info = &tx_ring->buffer_info[i]; - skb = netdev_alloc_skb(adapter->netdev, bufsz); - skb_reserve(skb, PCH_GBE_DMA_ALIGN); - buffer_info->skb = skb; + buffer_info->skb = NULL; tx_desc = PCH_GBE_TX_DESC(*tx_ring, i); tx_desc->gbec_status = (DSC_INIT16); } @@ -1622,9 +1625,9 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter, buffer_info->length, DMA_TO_DEVICE); buffer_info->mapped = false; } - if (buffer_info->skb) { - pr_debug("trim buffer_info->skb : %d\n", i); - skb_trim(buffer_info->skb, 0); + if (skb) { + dev_kfree_skb_any(skb); + buffer_info->skb = NULL; } tx_desc->gbec_status = DSC_INIT16; if (unlikely(++i == tx_ring->count)) ^ permalink raw reply related [flat|nested] 10+ messages in thread
* RE: pch_gbe: dont-copy-payload (was [PATCH 1/4] ...) 2012-08-06 14:19 ` pch_gbe: dont-copy-payload (was [PATCH 1/4] ...) Andy Cress @ 2012-08-06 14:52 ` Eric Dumazet 0 siblings, 0 replies; 10+ messages in thread From: Eric Dumazet @ 2012-08-06 14:52 UTC (permalink / raw) To: Andy Cress; +Cc: netdev On Mon, 2012-08-06 at 07:19 -0700, Andy Cress wrote: > I found out why the proposed dont-copy-payload patch didn't work with > this pch_gbe NIC. > This NIC PHY requires 64-byte-aligned DMA, and the transmit buffers > won't be transferred if skb->data is not 64-byte-aligned. Apparently > the data copy has a by-product of aligning the buffers. > > I tried using skb_reserve(skb,64) in pch_gbe_alloc_tx_buffers and > pch-gbe_alloc_rx_buffers, but that didn't seem to resolve it. > > How can I make sure that the transmit data buffers are > 64-byte-aligned? > There is no support for such requirement in linux stacks. Only solution for the driver is to copy all frames to 64-byte-aligned bounce buffers. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location 2012-07-17 7:09 ` Eric Dumazet 2012-07-17 7:33 ` Eric Dumazet @ 2012-07-17 8:04 ` Zhong Hongbo 2012-07-17 8:48 ` Eric Dumazet 1 sibling, 1 reply; 10+ messages in thread From: Zhong Hongbo @ 2012-07-17 8:04 UTC (permalink / raw) To: Eric Dumazet; +Cc: Andy Cress, netdev On 07/17/2012 03:09 PM, Eric Dumazet wrote: > On Mon, 2012-07-16 at 13:03 -0700, Andy Cress wrote: >> Author: Zhong Hongbo <hongbo.zhong@windriver.com> >> >> Due to some unknown hardware limitations the pch_gbe hardware cannot >> calculate checksums when the length of network package is less >> than 64 bytes, where we will surprisingly encounter a problem of >> the destination IP incorrectly changed. >> >> When forwarding network packages at the network layer the IP packages >> won't be relayed to the upper transport layer and analyzed there, >> consequently, skb->transport_header pointer will be mistakenly remained >> the same as that of skb->network_header, resulting in TCP checksum >> wrongly >> filled into the field of destination IP in IP header. >> >> We can fix this issue by manually calculate the offset of the TCP >> checksum >> and update it accordingly. >> >> We would normally use the skb_checksum_start_offset(skb) here, but in >> this >> case it is sometimes -2 (csum_start=0 - skb_headroom=2 => -2), hence the >> manual calculation. >> >> Signed-off-by: Zhong Hongbo <hongbo.zhong@windriver.com> >> Merged-by: Andy Cress <andy.cress@us.kontron.com> > > Hmm... I fail to understand why you care about NIC doing checksums, Hi Eric, When forwarding network packages at the network layer, the variable value of skb->transport_header is unknown. In my test, the variable value of skb->transport_header is equal to skb->network_header. So When you count the checksum as following: offset = skb_transport_offset(skb); skb->csum = skb_checksum(skb, offset, skb->len - offset, 0); We should only count the TCP checksum, But it maybe include IP part. tcp_hdr(skb)->check = csum_tcpudp_magic(iph->saddr, iph->daddr, skb->len - offset, IPPROTO_TCP, skb->csum); We should fill the checksum in TCP package, But maybe fill it in other location and cover the useful information, such as source ip. So We should count the TCP checksum and fill it in the correct location. Or else the forwarding network package will be drop for the error checksum. > while pch_gbe_tx_queue() make a _copy_ of each outgoing > packets. > > There _must_ be a way to avoid most of these copies (ie not touching > payload), only mess with the header to insert these 2 nul bytes ? This is other fix, my patch just fix checksum error issue. Thanks, hongbo > > /* [Header:14][payload] ---> [Header:14][paddong:2][payload] */ > > So at device setup : dev->needed_headroom = 2; > > and in xmit, > > if (skb_headroom(skb) < 2) { > struct sk_buff *skb_new; > > skb_new = skb_realloc_headroom(skb, 2); > if (!skb_new) { handle error } > consume_skb(skb); > skb = skb_new; > } > ptr = skb_push(skb, 2); > memmove(ptr, ptr + 2, ETH_HLEN); > ptr[ETH_HLEN] = 0; > ptr[ETH_HLEN + 1] = 0; > > > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location 2012-07-17 8:04 ` [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location Zhong Hongbo @ 2012-07-17 8:48 ` Eric Dumazet 0 siblings, 0 replies; 10+ messages in thread From: Eric Dumazet @ 2012-07-17 8:48 UTC (permalink / raw) To: Zhong Hongbo; +Cc: Andy Cress, netdev On Tue, 2012-07-17 at 16:04 +0800, Zhong Hongbo wrote: > Hi Eric, > > When forwarding network packages at the network layer, the variable > value of skb->transport_header is unknown. In my test, the variable > value of skb->transport_header is equal to skb->network_header. So > When you count the checksum as following: > > offset = skb_transport_offset(skb); > > skb->csum = skb_checksum(skb, offset, skb->len - offset, 0); > We should only count the TCP checksum, But it maybe include IP part. > > tcp_hdr(skb)->check = csum_tcpudp_magic(iph->saddr, iph->daddr, > skb->len - offset, IPPROTO_TCP, skb->csum); > We should fill the checksum in TCP package, But maybe fill it in other > location and cover the useful information, such as source ip. > > So We should count the TCP checksum and fill it in the correct > location. Or else the forwarding network package will be drop for the > error checksum. > So maybe you should instead test if (skb->ip_summed == CHECKSUM_PARTIAL) { ... skb_checksum_start_offset(skb); /* is valid */ } ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-08-06 14:52 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-07-16 20:03 [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location Andy Cress 2012-07-17 0:59 ` Paul Gortmaker 2012-07-17 7:09 ` Eric Dumazet 2012-07-17 7:33 ` Eric Dumazet 2012-07-17 14:20 ` Andy Cress 2012-07-25 20:10 ` Andy Cress 2012-08-06 14:19 ` pch_gbe: dont-copy-payload (was [PATCH 1/4] ...) Andy Cress 2012-08-06 14:52 ` Eric Dumazet 2012-07-17 8:04 ` [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location Zhong Hongbo 2012-07-17 8:48 ` Eric Dumazet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox