* skb_try_coalesce bug? @ 2014-04-22 12:01 Erik Hugne 2014-04-22 13:11 ` Eric Dumazet 0 siblings, 1 reply; 12+ messages in thread From: Erik Hugne @ 2014-04-22 12:01 UTC (permalink / raw) To: netdev It seems that if the head skb of a reassembly chain have enough tailroom to hold the data of a received fragment, skb_try_coalesce() will append this directly to the head, even if preceding fragments have been put on a frag list. This will cause a corrupted buffer to be passed to userland when skb_copy_datagram_iovec() later copies the contents of head, and then each frag one by one to the target iovec. Is skb_try_coalesce() broken, or are we using it wrongly in tipc? //E ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: skb_try_coalesce bug? 2014-04-22 12:01 skb_try_coalesce bug? Erik Hugne @ 2014-04-22 13:11 ` Eric Dumazet 2014-04-22 19:38 ` Jon Maloy 0 siblings, 1 reply; 12+ messages in thread From: Eric Dumazet @ 2014-04-22 13:11 UTC (permalink / raw) To: Erik Hugne; +Cc: netdev On Tue, 2014-04-22 at 14:01 +0200, Erik Hugne wrote: > It seems that if the head skb of a reassembly chain have enough tailroom > to hold the data of a received fragment, skb_try_coalesce() will append this > directly to the head, even if preceding fragments have been put on a frag list. > This will cause a corrupted buffer to be passed to userland when > skb_copy_datagram_iovec() later copies the contents of head, and then each frag > one by one to the target iovec. > > Is skb_try_coalesce() broken, or are we using it wrongly in tipc? I am not sure how it could happen with the current implementation ? static inline bool skb_is_nonlinear(const struct sk_buff *skb) { return skb->data_len; } static inline int skb_tailroom(const struct sk_buff *skb) { return skb_is_nonlinear(skb) ? 0 : skb->end - skb->tail; } /** * skb_try_coalesce - try to merge skb to prior one * @to: prior buffer * @from: buffer to add * @fragstolen: pointer to boolean * @delta_truesize: how much more was allocated than was requested */ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, bool *fragstolen, int *delta_truesize) { int i, delta, len = from->len; *fragstolen = false; if (skb_cloned(to)) return false; if (len <= skb_tailroom(to)) { BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len)); ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: skb_try_coalesce bug? 2014-04-22 13:11 ` Eric Dumazet @ 2014-04-22 19:38 ` Jon Maloy 2014-04-22 20:05 ` Eric Dumazet 0 siblings, 1 reply; 12+ messages in thread From: Jon Maloy @ 2014-04-22 19:38 UTC (permalink / raw) To: Eric Dumazet, Erik Hugne; +Cc: netdev On 04/22/2014 09:11 AM, Eric Dumazet wrote: > On Tue, 2014-04-22 at 14:01 +0200, Erik Hugne wrote: >> It seems that if the head skb of a reassembly chain have enough tailroom >> to hold the data of a received fragment, skb_try_coalesce() will append this >> directly to the head, even if preceding fragments have been put on a frag list. >> This will cause a corrupted buffer to be passed to userland when >> skb_copy_datagram_iovec() later copies the contents of head, and then each frag >> one by one to the target iovec. >> >> Is skb_try_coalesce() broken, or are we using it wrongly in tipc? > > I am not sure how it could happen with the current implementation ? > > static inline bool skb_is_nonlinear(const struct sk_buff *skb) > { > return skb->data_len; > } > > static inline int skb_tailroom(const struct sk_buff *skb) > { > return skb_is_nonlinear(skb) ? 0 : skb->end - skb->tail; > } > > /** > * skb_try_coalesce - try to merge skb to prior one > * @to: prior buffer > * @from: buffer to add > * @fragstolen: pointer to boolean > * @delta_truesize: how much more was allocated than was requested > */ > bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, > bool *fragstolen, int *delta_truesize) > { > int i, delta, len = from->len; > > *fragstolen = false; > > if (skb_cloned(to)) > return false; > > if (len <= skb_tailroom(to)) { > BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len)); In the case I encountered, our head buffer is linear (skb->data_len == 0), so it is the real tailroom value that is returned. An alas, that one is big enough to contain the last (small) fragment of the message. ///jon > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: skb_try_coalesce bug? 2014-04-22 19:38 ` Jon Maloy @ 2014-04-22 20:05 ` Eric Dumazet 2014-04-22 20:35 ` Jon Maloy 0 siblings, 1 reply; 12+ messages in thread From: Eric Dumazet @ 2014-04-22 20:05 UTC (permalink / raw) To: Jon Maloy; +Cc: Erik Hugne, netdev On Tue, 2014-04-22 at 15:38 -0400, Jon Maloy wrote: > > In the case I encountered, our head buffer is linear (skb->data_len == 0), > so it is the real tailroom value that is returned. An alas, that one is big > enough to contain the last (small) fragment of the message. Whole point of skb_try_coalesce() is to coalesce as much as possible, without guarantee of keeping some sort of 'segments' skb_try_coalesce - try to merge skb to prior one If you do not want this to happen, (you seem to want nothing else in your head buffer skb->head), you need to add some logic. A helper temporarily setting head->tail = head->end would do it I guess. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: skb_try_coalesce bug? 2014-04-22 20:05 ` Eric Dumazet @ 2014-04-22 20:35 ` Jon Maloy 2014-04-22 21:28 ` Jon Maloy 2014-04-22 21:29 ` Eric Dumazet 0 siblings, 2 replies; 12+ messages in thread From: Jon Maloy @ 2014-04-22 20:35 UTC (permalink / raw) To: Eric Dumazet; +Cc: Erik Hugne, netdev On 04/22/2014 04:05 PM, Eric Dumazet wrote: > On Tue, 2014-04-22 at 15:38 -0400, Jon Maloy wrote: > >> >> In the case I encountered, our head buffer is linear (skb->data_len == 0), >> so it is the real tailroom value that is returned. An alas, that one is big >> enough to contain the last (small) fragment of the message. > > > Whole point of skb_try_coalesce() is to coalesce as much as possible, > without guarantee of keeping some sort of 'segments' > > skb_try_coalesce - try to merge skb to prior one > > If you do not want this to happen, (you seem to want nothing else in > your head buffer skb->head), you need to add some logic. Ok. I should have given a little background. 1: We send a message of 3041 bytes, inclusive TIPC header, via loopback interface. 2: This one gets chopped up in three fragments: 1420, 1420,and 201 bytes. (The mtu was of course wrong, but this is how I discovered the problem). 3: First fragment is received, uncloned, and serves as head. 4; Second fragment (a clone) is received. skb_try_coalesce() fails at the skb_head_is_locked() test, because the buffer is a clone. Because of this, we add the buffer to skb_shinfo(head)->frag_list instead. 5: Third fragment (also a clone) is received. Now, since we check for space in tailroom of header before we do anything else, it slips in there, and bypasses the already chained-up second segment. Regards ///jon > > A helper temporarily setting head->tail = head->end would do it I guess. > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: skb_try_coalesce bug? 2014-04-22 20:35 ` Jon Maloy @ 2014-04-22 21:28 ` Jon Maloy 2014-04-22 21:29 ` Eric Dumazet 1 sibling, 0 replies; 12+ messages in thread From: Jon Maloy @ 2014-04-22 21:28 UTC (permalink / raw) To: Eric Dumazet; +Cc: Erik Hugne, netdev On 04/22/2014 04:35 PM, Jon Maloy wrote: > On 04/22/2014 04:05 PM, Eric Dumazet wrote: >> On Tue, 2014-04-22 at 15:38 -0400, Jon Maloy wrote: >> >>> >>> In the case I encountered, our head buffer is linear (skb->data_len == 0), >>> so it is the real tailroom value that is returned. An alas, that one is big >>> enough to contain the last (small) fragment of the message. >> >> >> Whole point of skb_try_coalesce() is to coalesce as much as possible, >> without guarantee of keeping some sort of 'segments' >> >> skb_try_coalesce - try to merge skb to prior one >> >> If you do not want this to happen, (you seem to want nothing else in >> your head buffer skb->head), you need to add some logic. > > Ok. I should have given a little background. > > 1: We send a message of 3041 bytes, inclusive TIPC header, via loopback interface. > > 2: This one gets chopped up in three fragments: 1420, 1420,and 201 bytes. > (The mtu was of course wrong, but this is how I discovered the problem). > > 3: First fragment is received, uncloned, and serves as head. > > 4; Second fragment (a clone) is received. skb_try_coalesce() fails at > the skb_head_is_locked() test, because the buffer is a clone. > Because of this, we add the buffer to skb_shinfo(head)->frag_list > instead. > > 5: Third fragment (also a clone) is received. Now, since we i.e., skb_try_coalesce(head, frag) check for > space in tailroom of header before we do anything else, it slips > in there, and bypasses the already chained-up second segment. More background: our reassembly code is based on the one found in ip_fragment.c::ip_frag_reasm(), which always first try to coalecse a buffer with head. That is a bad idea, I guess, but I wonder why they don't see this problem in ipv4. > > Regards > ///jon > > >> >> A helper temporarily setting head->tail = head->end would do it I guess. That would work. Or just check skb_has_frag_list(head) first, and make the call to skb_try_coalesce() conditional to the the result. It just feels a little unnecessary, since that test is done inside skb_try_coalesce() anyway. ///jon >> >> > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: skb_try_coalesce bug? 2014-04-22 20:35 ` Jon Maloy 2014-04-22 21:28 ` Jon Maloy @ 2014-04-22 21:29 ` Eric Dumazet 2014-04-22 21:31 ` Jon Maloy 2014-04-22 21:37 ` Eric Dumazet 1 sibling, 2 replies; 12+ messages in thread From: Eric Dumazet @ 2014-04-22 21:29 UTC (permalink / raw) To: Jon Maloy; +Cc: Erik Hugne, netdev On Tue, 2014-04-22 at 16:35 -0400, Jon Maloy wrote: > On 04/22/2014 04:05 PM, Eric Dumazet wrote: > > On Tue, 2014-04-22 at 15:38 -0400, Jon Maloy wrote: > > > >> > >> In the case I encountered, our head buffer is linear (skb->data_len == 0), > >> so it is the real tailroom value that is returned. An alas, that one is big > >> enough to contain the last (small) fragment of the message. > > > > > > Whole point of skb_try_coalesce() is to coalesce as much as possible, > > without guarantee of keeping some sort of 'segments' > > > > skb_try_coalesce - try to merge skb to prior one > > > > If you do not want this to happen, (you seem to want nothing else in > > your head buffer skb->head), you need to add some logic. > > Ok. I should have given a little background. > > 1: We send a message of 3041 bytes, inclusive TIPC header, via loopback interface. > > 2: This one gets chopped up in three fragments: 1420, 1420,and 201 bytes. > (The mtu was of course wrong, but this is how I discovered the problem). > > 3: First fragment is received, uncloned, and serves as head. > > 4; Second fragment (a clone) is received. skb_try_coalesce() fails at > the skb_head_is_locked() test, because the buffer is a clone. > Because of this, we add the buffer to skb_shinfo(head)->frag_list > instead. Then if you do that, you also need to change head->data_len ! > > 5: Third fragment (also a clone) is received. Now, since we check for > space in tailroom of header before we do anything else, it slips > in there, and bypasses the already chained-up second segment. Only because you failed to tell the world @head was no longer linear. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: skb_try_coalesce bug? 2014-04-22 21:29 ` Eric Dumazet @ 2014-04-22 21:31 ` Jon Maloy 2014-04-22 21:37 ` Eric Dumazet 1 sibling, 0 replies; 12+ messages in thread From: Jon Maloy @ 2014-04-22 21:31 UTC (permalink / raw) To: Eric Dumazet; +Cc: Erik Hugne, netdev On 04/22/2014 05:29 PM, Eric Dumazet wrote: > On Tue, 2014-04-22 at 16:35 -0400, Jon Maloy wrote: >> On 04/22/2014 04:05 PM, Eric Dumazet wrote: >>> On Tue, 2014-04-22 at 15:38 -0400, Jon Maloy wrote: >>> >>>> >>>> In the case I encountered, our head buffer is linear (skb->data_len == 0), >>>> so it is the real tailroom value that is returned. An alas, that one is big >>>> enough to contain the last (small) fragment of the message. >>> >>> >>> Whole point of skb_try_coalesce() is to coalesce as much as possible, >>> without guarantee of keeping some sort of 'segments' >>> >>> skb_try_coalesce - try to merge skb to prior one >>> >>> If you do not want this to happen, (you seem to want nothing else in >>> your head buffer skb->head), you need to add some logic. >> >> Ok. I should have given a little background. >> >> 1: We send a message of 3041 bytes, inclusive TIPC header, via loopback interface. >> >> 2: This one gets chopped up in three fragments: 1420, 1420,and 201 bytes. >> (The mtu was of course wrong, but this is how I discovered the problem). >> >> 3: First fragment is received, uncloned, and serves as head. >> >> 4; Second fragment (a clone) is received. skb_try_coalesce() fails at >> the skb_head_is_locked() test, because the buffer is a clone. >> Because of this, we add the buffer to skb_shinfo(head)->frag_list >> instead. > > Then if you do that, you also need to change head->data_len ! > >> >> 5: Third fragment (also a clone) is received. Now, since we check for >> space in tailroom of header before we do anything else, it slips >> in there, and bypasses the already chained-up second segment. > > Only because you failed to tell the world @head was no longer linear. Ok. Thank you. Lesson learned. ///jon > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: skb_try_coalesce bug? 2014-04-22 21:29 ` Eric Dumazet 2014-04-22 21:31 ` Jon Maloy @ 2014-04-22 21:37 ` Eric Dumazet 2014-04-23 16:56 ` Jon Maloy 1 sibling, 1 reply; 12+ messages in thread From: Eric Dumazet @ 2014-04-22 21:37 UTC (permalink / raw) To: Jon Maloy; +Cc: Erik Hugne, netdev On Tue, 2014-04-22 at 14:29 -0700, Eric Dumazet wrote: > Then if you do that, you also need to change head->data_len ! Untested patch would be : diff --git a/net/tipc/link.c b/net/tipc/link.c index c5190ab75290..85077dd7c63e 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -2349,6 +2349,7 @@ int tipc_link_frag_rcv(struct sk_buff **head, struct sk_buff **tail, (*tail)->next = frag; *tail = frag; (*head)->truesize += frag->truesize; + (*head)->data_len += frag->len; } if (fragid == LAST_FRAGMENT) { *fbuf = *head; ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: skb_try_coalesce bug? 2014-04-22 21:37 ` Eric Dumazet @ 2014-04-23 16:56 ` Jon Maloy 2014-04-23 17:33 ` David Miller 0 siblings, 1 reply; 12+ messages in thread From: Jon Maloy @ 2014-04-23 16:56 UTC (permalink / raw) To: Eric Dumazet; +Cc: Erik Hugne, netdev On 04/22/2014 05:37 PM, Eric Dumazet wrote: > On Tue, 2014-04-22 at 14:29 -0700, Eric Dumazet wrote: > >> Then if you do that, you also need to change head->data_len ! > > Untested patch would be : > > diff --git a/net/tipc/link.c b/net/tipc/link.c > index c5190ab75290..85077dd7c63e 100644 > --- a/net/tipc/link.c > +++ b/net/tipc/link.c > @@ -2349,6 +2349,7 @@ int tipc_link_frag_rcv(struct sk_buff **head, struct sk_buff **tail, > (*tail)->next = frag; > *tail = frag; > (*head)->truesize += frag->truesize; > + (*head)->data_len += frag->len; Just to confirm, does this mean that head's own (linear) data is not included in data_len? ///jon > } > if (fragid == LAST_FRAGMENT) { > *fbuf = *head; > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: skb_try_coalesce bug? 2014-04-23 16:56 ` Jon Maloy @ 2014-04-23 17:33 ` David Miller 2014-04-23 17:54 ` Jon Maloy 0 siblings, 1 reply; 12+ messages in thread From: David Miller @ 2014-04-23 17:33 UTC (permalink / raw) To: jon.maloy; +Cc: eric.dumazet, erik.hugne, netdev From: Jon Maloy <jon.maloy@ericsson.com> Date: Wed, 23 Apr 2014 12:56:20 -0400 > On 04/22/2014 05:37 PM, Eric Dumazet wrote: >> On Tue, 2014-04-22 at 14:29 -0700, Eric Dumazet wrote: >> >>> Then if you do that, you also need to change head->data_len ! >> >> Untested patch would be : >> >> diff --git a/net/tipc/link.c b/net/tipc/link.c >> index c5190ab75290..85077dd7c63e 100644 >> --- a/net/tipc/link.c >> +++ b/net/tipc/link.c >> @@ -2349,6 +2349,7 @@ int tipc_link_frag_rcv(struct sk_buff **head, struct sk_buff **tail, >> (*tail)->next = frag; >> *tail = frag; >> (*head)->truesize += frag->truesize; >> + (*head)->data_len += frag->len; > > Just to confirm, does this mean that head's own (linear) data is not > included in data_len? For a given SKB, skb->len is the entire length of the packet, fragments and all. skb->data_len counts the sum of all of the page and SKB based fragments, ie. all bytes which are not in the top-level SKBs linear area. So the linear length is always "skb->len - skb->data_len", and this is exactly what skb_headlen() does. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: skb_try_coalesce bug? 2014-04-23 17:33 ` David Miller @ 2014-04-23 17:54 ` Jon Maloy 0 siblings, 0 replies; 12+ messages in thread From: Jon Maloy @ 2014-04-23 17:54 UTC (permalink / raw) To: David Miller; +Cc: eric.dumazet, erik.hugne, netdev On 04/23/2014 01:33 PM, David Miller wrote: > From: Jon Maloy <jon.maloy@ericsson.com> > Date: Wed, 23 Apr 2014 12:56:20 -0400 > >> On 04/22/2014 05:37 PM, Eric Dumazet wrote: >>> On Tue, 2014-04-22 at 14:29 -0700, Eric Dumazet wrote: >>> >>>> Then if you do that, you also need to change head->data_len ! >>> >>> Untested patch would be : >>> >>> diff --git a/net/tipc/link.c b/net/tipc/link.c >>> index c5190ab75290..85077dd7c63e 100644 >>> --- a/net/tipc/link.c >>> +++ b/net/tipc/link.c >>> @@ -2349,6 +2349,7 @@ int tipc_link_frag_rcv(struct sk_buff **head, struct sk_buff **tail, >>> (*tail)->next = frag; >>> *tail = frag; >>> (*head)->truesize += frag->truesize; >>> + (*head)->data_len += frag->len; >> >> Just to confirm, does this mean that head's own (linear) data is not >> included in data_len? > > For a given SKB, skb->len is the entire length of the packet, fragments and > all. > > skb->data_len counts the sum of all of the page and SKB based fragments, ie. > all bytes which are not in the top-level SKBs linear area. > > So the linear length is always "skb->len - skb->data_len", and this is exactly > what skb_headlen() does. > Thank you for the clarification. We'll have to fix this. I am just puzzled that our defragmentation algorithm has worked as well as it has until now. ///jon ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-04-23 17:54 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-04-22 12:01 skb_try_coalesce bug? Erik Hugne 2014-04-22 13:11 ` Eric Dumazet 2014-04-22 19:38 ` Jon Maloy 2014-04-22 20:05 ` Eric Dumazet 2014-04-22 20:35 ` Jon Maloy 2014-04-22 21:28 ` Jon Maloy 2014-04-22 21:29 ` Eric Dumazet 2014-04-22 21:31 ` Jon Maloy 2014-04-22 21:37 ` Eric Dumazet 2014-04-23 16:56 ` Jon Maloy 2014-04-23 17:33 ` David Miller 2014-04-23 17:54 ` Jon Maloy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).