* skb_try_coalesce bug?
@ 2014-04-22 12:01 Erik Hugne
2014-04-22 13:11 ` Eric Dumazet
0 siblings, 1 reply; 12+ messages in thread
From: Erik Hugne @ 2014-04-22 12:01 UTC (permalink / raw)
To: netdev
It seems that if the head skb of a reassembly chain has enough tailroom
to hold the data of a received fragment, skb_try_coalesce() will append this
directly to the head, even if preceding fragments have been put on a frag list.
This will cause a corrupted buffer to be passed to userland when
skb_copy_datagram_iovec() later copies the contents of head, and then each frag
one by one to the target iovec.
Is skb_try_coalesce() broken, or are we using it wrongly in tipc?
//E
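A minimal userspace sketch of the copy order described above, assuming a much-simplified buffer model. struct model_buf and the model_* functions are hypothetical stand-ins, not kernel APIs. The reader walks the head's linear bytes first and then each chained fragment, so a third fragment appended to the head after the second was chained comes out ahead of it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an skb; not the kernel's struct sk_buff. */
struct model_buf {
	char linear[16];             /* linear data area of this buffer */
	size_t linear_len;           /* bytes used in linear[] */
	struct model_buf *frag_list; /* first chained fragment (head only) */
	struct model_buf *next;      /* next buffer on a frag list */
};

/* Copy the head's linear bytes first, then each chained fragment in order. */
size_t model_copy_datagram(const struct model_buf *head, char *out)
{
	size_t off = 0;
	const struct model_buf *f;

	memcpy(out + off, head->linear, head->linear_len);
	off += head->linear_len;
	for (f = head->frag_list; f; f = f->next) {
		memcpy(out + off, f->linear, f->linear_len);
		off += f->linear_len;
	}
	return off;
}

/* Layout from the report: "A" in head, "B" chained, "C" appended to head. */
size_t model_misordered_copy(char *out)
{
	struct model_buf second = { "B", 1, NULL, NULL };
	struct model_buf head = { "A", 1, &second, NULL };

	/* The third fragment lands in the head's tailroom, right after "A". */
	head.linear[head.linear_len++] = 'C';
	return model_copy_datagram(&head, out);
}
```

With the fragments tagged A, B and C in arrival order, model_misordered_copy() produces the bytes in the order A, C, B.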
* Re: skb_try_coalesce bug?
2014-04-22 12:01 skb_try_coalesce bug? Erik Hugne
@ 2014-04-22 13:11 ` Eric Dumazet
2014-04-22 19:38 ` Jon Maloy
0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2014-04-22 13:11 UTC (permalink / raw)
To: Erik Hugne; +Cc: netdev
On Tue, 2014-04-22 at 14:01 +0200, Erik Hugne wrote:
> It seems that if the head skb of a reassembly chain has enough tailroom
> to hold the data of a received fragment, skb_try_coalesce() will append this
> directly to the head, even if preceding fragments have been put on a frag list.
> This will cause a corrupted buffer to be passed to userland when
> skb_copy_datagram_iovec() later copies the contents of head, and then each frag
> one by one to the target iovec.
>
> Is skb_try_coalesce() broken, or are we using it wrongly in tipc?
I am not sure how it could happen with the current implementation ?
static inline bool skb_is_nonlinear(const struct sk_buff *skb)
{
	return skb->data_len;
}

static inline int skb_tailroom(const struct sk_buff *skb)
{
	return skb_is_nonlinear(skb) ? 0 : skb->end - skb->tail;
}

/**
 * skb_try_coalesce - try to merge skb to prior one
 * @to: prior buffer
 * @from: buffer to add
 * @fragstolen: pointer to boolean
 * @delta_truesize: how much more was allocated than was requested
 */
bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
		      bool *fragstolen, int *delta_truesize)
{
	int i, delta, len = from->len;

	*fragstolen = false;

	if (skb_cloned(to))
		return false;

	if (len <= skb_tailroom(to)) {
		BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));
* Re: skb_try_coalesce bug?
2014-04-22 13:11 ` Eric Dumazet
@ 2014-04-22 19:38 ` Jon Maloy
2014-04-22 20:05 ` Eric Dumazet
0 siblings, 1 reply; 12+ messages in thread
From: Jon Maloy @ 2014-04-22 19:38 UTC (permalink / raw)
To: Eric Dumazet, Erik Hugne; +Cc: netdev
On 04/22/2014 09:11 AM, Eric Dumazet wrote:
> On Tue, 2014-04-22 at 14:01 +0200, Erik Hugne wrote:
>> It seems that if the head skb of a reassembly chain has enough tailroom
>> to hold the data of a received fragment, skb_try_coalesce() will append this
>> directly to the head, even if preceding fragments have been put on a frag list.
>> This will cause a corrupted buffer to be passed to userland when
>> skb_copy_datagram_iovec() later copies the contents of head, and then each frag
>> one by one to the target iovec.
>>
>> Is skb_try_coalesce() broken, or are we using it wrongly in tipc?
>
> I am not sure how it could happen with the current implementation ?
>
> static inline bool skb_is_nonlinear(const struct sk_buff *skb)
> {
> 	return skb->data_len;
> }
>
> static inline int skb_tailroom(const struct sk_buff *skb)
> {
> 	return skb_is_nonlinear(skb) ? 0 : skb->end - skb->tail;
> }
>
> /**
>  * skb_try_coalesce - try to merge skb to prior one
>  * @to: prior buffer
>  * @from: buffer to add
>  * @fragstolen: pointer to boolean
>  * @delta_truesize: how much more was allocated than was requested
>  */
> bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> 		      bool *fragstolen, int *delta_truesize)
> {
> 	int i, delta, len = from->len;
>
> 	*fragstolen = false;
>
> 	if (skb_cloned(to))
> 		return false;
>
> 	if (len <= skb_tailroom(to)) {
> 		BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));
In the case I encountered, our head buffer is linear (skb->data_len == 0),
so it is the real tailroom value that is returned. And alas, that one is big
enough to contain the last (small) fragment of the message.
///jon
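A small userspace model of the tailroom test quoted above, assuming a linear head that holds one 1420-byte fragment in a 2048-byte buffer. The buffer size and all model_* names are assumptions made for illustration; the thread does not state the real allocation size, and the fragment sizes are the ones reported later in the thread. It shows why a head with data_len == 0 reports its real tailroom, and why a 201-byte fragment then passes the first size test:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few sk_buff fields the test looks at. */
struct model_skb {
	unsigned int len;      /* total bytes, linear plus fragments */
	unsigned int data_len; /* bytes held outside the linear area */
	unsigned int tail;     /* offset of the end of the linear data */
	unsigned int end;      /* offset of the end of the linear buffer */
};

/* Same rule as the quoted skb_is_nonlinear(): any data_len means nonlinear. */
bool model_is_nonlinear(const struct model_skb *skb)
{
	return skb->data_len != 0;
}

/* Same rule as the quoted skb_tailroom(): nonlinear buffers report zero. */
int model_tailroom(const struct model_skb *skb)
{
	return model_is_nonlinear(skb) ? 0 : (int)(skb->end - skb->tail);
}

/* Mirrors only the first size test quoted from skb_try_coalesce(). */
bool model_fits_in_tailroom(const struct model_skb *to, unsigned int from_len)
{
	return (int)from_len <= model_tailroom(to);
}
```

Under these assumed sizes the head reports 628 bytes of tailroom, so a 201-byte fragment fits while a second 1420-byte fragment does not.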
>
>
>
* Re: skb_try_coalesce bug?
2014-04-22 19:38 ` Jon Maloy
@ 2014-04-22 20:05 ` Eric Dumazet
2014-04-22 20:35 ` Jon Maloy
0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2014-04-22 20:05 UTC (permalink / raw)
To: Jon Maloy; +Cc: Erik Hugne, netdev
On Tue, 2014-04-22 at 15:38 -0400, Jon Maloy wrote:
>
> In the case I encountered, our head buffer is linear (skb->data_len == 0),
> so it is the real tailroom value that is returned. And alas, that one is big
> enough to contain the last (small) fragment of the message.
Whole point of skb_try_coalesce() is to coalesce as much as possible,
without guarantee of keeping some sort of 'segments'
skb_try_coalesce - try to merge skb to prior one
If you do not want this to happen, (you seem to want nothing else in
your head buffer skb->head), you need to add some logic.
A helper temporarily setting head->tail = head->end would do it I guess.
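One way to picture such a helper is the userspace sketch below. It is only a model: struct model_skb, model_try_coalesce() and model_try_coalesce_no_tailroom() are hypothetical names, tail and end are treated as plain offsets, and the stand-in merge function implements nothing beyond the tailroom test. The wrapper hides the head's tailroom for the duration of the call and restores it afterwards:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sk_buff fields involved. */
struct model_skb {
	unsigned int data_len; /* bytes held outside the linear area */
	unsigned int tail;     /* offset of the end of the linear data */
	unsigned int end;      /* offset of the end of the linear buffer */
};

int model_tailroom(const struct model_skb *skb)
{
	return skb->data_len ? 0 : (int)(skb->end - skb->tail);
}

/* Stand-in merge: copies into tailroom when the data fits, else refuses. */
bool model_try_coalesce(struct model_skb *to, unsigned int from_len)
{
	if ((int)from_len <= model_tailroom(to)) {
		to->tail += from_len;
		return true;
	}
	return false;
}

/* Hide the tailroom while trying to merge, then put the real tail back. */
bool model_try_coalesce_no_tailroom(struct model_skb *head,
				    unsigned int from_len)
{
	unsigned int saved_tail = head->tail;
	bool merged;

	head->tail = head->end; /* tailroom now reads as zero */
	merged = model_try_coalesce(head, from_len);
	head->tail = saved_tail;
	return merged;
}
```

In this model a 201-byte fragment is refused through the wrapper and the head's tail offset is left where it was, while a direct call still copies into the tailroom.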
* Re: skb_try_coalesce bug?
2014-04-22 20:05 ` Eric Dumazet
@ 2014-04-22 20:35 ` Jon Maloy
2014-04-22 21:28 ` Jon Maloy
2014-04-22 21:29 ` Eric Dumazet
0 siblings, 2 replies; 12+ messages in thread
From: Jon Maloy @ 2014-04-22 20:35 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Erik Hugne, netdev
On 04/22/2014 04:05 PM, Eric Dumazet wrote:
> On Tue, 2014-04-22 at 15:38 -0400, Jon Maloy wrote:
>
>>
>> In the case I encountered, our head buffer is linear (skb->data_len == 0),
>> so it is the real tailroom value that is returned. And alas, that one is big
>> enough to contain the last (small) fragment of the message.
>
>
> Whole point of skb_try_coalesce() is to coalesce as much as possible,
> without guarantee of keeping some sort of 'segments'
>
> skb_try_coalesce - try to merge skb to prior one
>
> If you do not want this to happen, (you seem to want nothing else in
> your head buffer skb->head), you need to add some logic.
Ok. I should have given a little background.
1: We send a message of 3041 bytes, including the TIPC header, via the loopback interface.
2: This one gets chopped up into three fragments: 1420, 1420, and 201 bytes.
(The MTU was of course wrong, but this is how I discovered the problem.)
3: The first fragment is received, uncloned, and serves as head.
4: The second fragment (a clone) is received. skb_try_coalesce() fails at
the skb_head_is_locked() test, because the buffer is a clone.
Because of this, we add the buffer to skb_shinfo(head)->frag_list
instead.
5: The third fragment (also a clone) is received. Now, since we check for
space in the tailroom of the head before we do anything else, it slips
in there, and bypasses the already chained-up second segment.
Regards
///jon
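The five steps above can be replayed in a small userspace model. Everything named model_* is hypothetical, the 2048-byte buffer size is an assumption, and the stand-in merge function refuses anything that does not fit in tailroom (which stands in for the skb_head_is_locked() refusal of the cloned second fragment). The reassembly step deliberately leaves data_len untouched, as in the scenario described:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_place { MODEL_IN_HEAD, MODEL_ON_FRAG_LIST };

/* Hypothetical stand-in for an skb in a reassembly chain. */
struct model_skb {
	unsigned int len;            /* total bytes reachable from this buffer */
	unsigned int data_len;       /* bytes held outside the linear area */
	unsigned int tail, end;      /* linear data end and buffer end (offsets) */
	struct model_skb *frag_list; /* first chained fragment */
	struct model_skb *next;      /* next fragment on the chain */
};

int model_tailroom(const struct model_skb *skb)
{
	return skb->data_len ? 0 : (int)(skb->end - skb->tail);
}

/* Stand-in merge: tailroom test first; everything else is refused. */
bool model_try_coalesce(struct model_skb *to, const struct model_skb *from)
{
	if ((int)from->len <= model_tailroom(to)) {
		to->tail += from->len;
		to->len += from->len;
		return true;
	}
	return false;
}

/* Reassembly step as described in the steps, with no data_len update. */
enum model_place model_frag_rcv(struct model_skb *head,
				struct model_skb **tail,
				struct model_skb *frag)
{
	if (model_try_coalesce(head, frag))
		return MODEL_IN_HEAD;
	if (!head->frag_list)
		head->frag_list = frag;
	else
		(*tail)->next = frag;
	*tail = frag;
	return MODEL_ON_FRAG_LIST;
}
```

Feeding the model a 1420-byte head followed by 1420-byte and 201-byte fragments puts the second fragment on the frag list and the third one into the head, ahead of the second.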
>
> A helper temporarily setting head->tail = head->end would do it I guess.
>
>
* Re: skb_try_coalesce bug?
2014-04-22 20:35 ` Jon Maloy
@ 2014-04-22 21:28 ` Jon Maloy
2014-04-22 21:29 ` Eric Dumazet
1 sibling, 0 replies; 12+ messages in thread
From: Jon Maloy @ 2014-04-22 21:28 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Erik Hugne, netdev
On 04/22/2014 04:35 PM, Jon Maloy wrote:
> On 04/22/2014 04:05 PM, Eric Dumazet wrote:
>> On Tue, 2014-04-22 at 15:38 -0400, Jon Maloy wrote:
>>
>>>
>>> In the case I encountered, our head buffer is linear (skb->data_len == 0),
>>> so it is the real tailroom value that is returned. And alas, that one is big
>>> enough to contain the last (small) fragment of the message.
>>
>>
>> Whole point of skb_try_coalesce() is to coalesce as much as possible,
>> without guarantee of keeping some sort of 'segments'
>>
>> skb_try_coalesce - try to merge skb to prior one
>>
>> If you do not want this to happen, (you seem to want nothing else in
>> your head buffer skb->head), you need to add some logic.
>
> Ok. I should have given a little background.
>
> 1: We send a message of 3041 bytes, including the TIPC header, via the loopback interface.
>
> 2: This one gets chopped up into three fragments: 1420, 1420, and 201 bytes.
> (The MTU was of course wrong, but this is how I discovered the problem.)
>
> 3: The first fragment is received, uncloned, and serves as head.
>
> 4: The second fragment (a clone) is received. skb_try_coalesce() fails at
> the skb_head_is_locked() test, because the buffer is a clone.
> Because of this, we add the buffer to skb_shinfo(head)->frag_list
> instead.
>
> 5: The third fragment (also a clone) is received. Now, since we
i.e., skb_try_coalesce(head, frag)
> check for
> space in the tailroom of the head before we do anything else, it slips
> in there, and bypasses the already chained-up second segment.
More background: our reassembly code is based on the one found in
ip_fragment.c::ip_frag_reasm(), which always first tries to coalesce
a buffer with head. That is a bad idea, I guess, but I wonder why
they don't see this problem in ipv4.
>
> Regards
> ///jon
>
>
>>
>> A helper temporarily setting head->tail = head->end would do it I guess.
That would work. Or just check skb_has_frag_list(head) first, and make
the call to skb_try_coalesce() conditional on the result.
It just feels a little unnecessary, since that test is done inside
skb_try_coalesce() anyway.
///jon
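A rough userspace sketch of that conditional call, under the same kind of simplified model. All model_* names are hypothetical, and the stand-in merge function only implements the tailroom test. In the code quoted earlier in the thread, the tailroom test is the first size check in skb_try_coalesce(), so a guard in the caller gets to run before it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff fields involved. */
struct model_skb {
	unsigned int data_len;       /* bytes held outside the linear area */
	unsigned int tail, end;      /* linear data end and buffer end (offsets) */
	struct model_skb *frag_list; /* first chained fragment, if any */
};

bool model_has_frag_list(const struct model_skb *skb)
{
	return skb->frag_list != NULL;
}

int model_tailroom(const struct model_skb *skb)
{
	return skb->data_len ? 0 : (int)(skb->end - skb->tail);
}

/* Stand-in merge: copies into tailroom when the data fits, else refuses. */
bool model_try_coalesce(struct model_skb *to, unsigned int from_len)
{
	if ((int)from_len <= model_tailroom(to)) {
		to->tail += from_len;
		return true;
	}
	return false;
}

/* Only attempt a merge while nothing has been chained onto head yet. */
bool model_coalesce_unless_chained(struct model_skb *head,
				   unsigned int from_len)
{
	if (model_has_frag_list(head))
		return false;
	return model_try_coalesce(head, from_len);
}
```

With a fragment already chained and data_len still zero, the guarded call refuses the 201-byte fragment, while the unguarded call would copy it into the head.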
>>
>>
>
>
* Re: skb_try_coalesce bug?
2014-04-22 20:35 ` Jon Maloy
2014-04-22 21:28 ` Jon Maloy
@ 2014-04-22 21:29 ` Eric Dumazet
2014-04-22 21:31 ` Jon Maloy
2014-04-22 21:37 ` Eric Dumazet
1 sibling, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2014-04-22 21:29 UTC (permalink / raw)
To: Jon Maloy; +Cc: Erik Hugne, netdev
On Tue, 2014-04-22 at 16:35 -0400, Jon Maloy wrote:
> On 04/22/2014 04:05 PM, Eric Dumazet wrote:
> > On Tue, 2014-04-22 at 15:38 -0400, Jon Maloy wrote:
> >
> >>
> >> In the case I encountered, our head buffer is linear (skb->data_len == 0),
> >> so it is the real tailroom value that is returned. And alas, that one is big
> >> enough to contain the last (small) fragment of the message.
> >
> >
> > Whole point of skb_try_coalesce() is to coalesce as much as possible,
> > without guarantee of keeping some sort of 'segments'
> >
> > skb_try_coalesce - try to merge skb to prior one
> >
> > If you do not want this to happen, (you seem to want nothing else in
> > your head buffer skb->head), you need to add some logic.
>
> Ok. I should have given a little background.
>
> 1: We send a message of 3041 bytes, including the TIPC header, via the loopback interface.
>
> 2: This one gets chopped up into three fragments: 1420, 1420, and 201 bytes.
> (The MTU was of course wrong, but this is how I discovered the problem.)
>
> 3: The first fragment is received, uncloned, and serves as head.
>
> 4: The second fragment (a clone) is received. skb_try_coalesce() fails at
> the skb_head_is_locked() test, because the buffer is a clone.
> Because of this, we add the buffer to skb_shinfo(head)->frag_list
> instead.
Then if you do that, you also need to change head->data_len !
>
> 5: The third fragment (also a clone) is received. Now, since we check for
> space in the tailroom of the head before we do anything else, it slips
> in there, and bypasses the already chained-up second segment.
Only because you failed to tell the world @head was no longer linear.
* Re: skb_try_coalesce bug?
2014-04-22 21:29 ` Eric Dumazet
@ 2014-04-22 21:31 ` Jon Maloy
2014-04-22 21:37 ` Eric Dumazet
1 sibling, 0 replies; 12+ messages in thread
From: Jon Maloy @ 2014-04-22 21:31 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Erik Hugne, netdev
On 04/22/2014 05:29 PM, Eric Dumazet wrote:
> On Tue, 2014-04-22 at 16:35 -0400, Jon Maloy wrote:
>> On 04/22/2014 04:05 PM, Eric Dumazet wrote:
>>> On Tue, 2014-04-22 at 15:38 -0400, Jon Maloy wrote:
>>>
>>>>
>>>> In the case I encountered, our head buffer is linear (skb->data_len == 0),
>>>> so it is the real tailroom value that is returned. And alas, that one is big
>>>> enough to contain the last (small) fragment of the message.
>>>
>>>
>>> Whole point of skb_try_coalesce() is to coalesce as much as possible,
>>> without guarantee of keeping some sort of 'segments'
>>>
>>> skb_try_coalesce - try to merge skb to prior one
>>>
>>> If you do not want this to happen, (you seem to want nothing else in
>>> your head buffer skb->head), you need to add some logic.
>>
>> Ok. I should have given a little background.
>>
>> 1: We send a message of 3041 bytes, including the TIPC header, via the loopback interface.
>>
>> 2: This one gets chopped up into three fragments: 1420, 1420, and 201 bytes.
>> (The MTU was of course wrong, but this is how I discovered the problem.)
>>
>> 3: The first fragment is received, uncloned, and serves as head.
>>
>> 4: The second fragment (a clone) is received. skb_try_coalesce() fails at
>> the skb_head_is_locked() test, because the buffer is a clone.
>> Because of this, we add the buffer to skb_shinfo(head)->frag_list
>> instead.
>
> Then if you do that, you also need to change head->data_len !
>
>>
>> 5: The third fragment (also a clone) is received. Now, since we check for
>> space in the tailroom of the head before we do anything else, it slips
>> in there, and bypasses the already chained-up second segment.
>
> Only because you failed to tell the world @head was no longer linear.
Ok. Thank you. Lesson learned.
///jon
>
>
>
* Re: skb_try_coalesce bug?
2014-04-22 21:29 ` Eric Dumazet
2014-04-22 21:31 ` Jon Maloy
@ 2014-04-22 21:37 ` Eric Dumazet
2014-04-23 16:56 ` Jon Maloy
1 sibling, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2014-04-22 21:37 UTC (permalink / raw)
To: Jon Maloy; +Cc: Erik Hugne, netdev
On Tue, 2014-04-22 at 14:29 -0700, Eric Dumazet wrote:
> Then if you do that, you also need to change head->data_len !
Untested patch would be :
diff --git a/net/tipc/link.c b/net/tipc/link.c
index c5190ab75290..85077dd7c63e 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -2349,6 +2349,7 @@ int tipc_link_frag_rcv(struct sk_buff **head, struct sk_buff **tail,
 			(*tail)->next = frag;
 		*tail = frag;
 		(*head)->truesize += frag->truesize;
+		(*head)->data_len += frag->len;
 	}
 	if (fragid == LAST_FRAGMENT) {
 		*fbuf = *head;
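To see what the added line changes, here is a userspace model of the bookkeeping only. The model_* names, the 2048-byte buffer size and the truesize values are assumptions for illustration; the sketch mirrors just the accounting lines visible in the patch and nothing else from tipc_link_frag_rcv(). Once data_len is nonzero, the tailroom rule quoted earlier in the thread reports zero, so a later small fragment can no longer be copied into the head:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sk_buff fields touched by the patch. */
struct model_skb {
	unsigned int len;      /* total bytes reachable from this buffer */
	unsigned int data_len; /* bytes held outside the linear area */
	unsigned int truesize; /* memory accounted to this buffer */
	unsigned int tail;     /* offset of the end of the linear data */
	unsigned int end;      /* offset of the end of the linear buffer */
};

int model_tailroom(const struct model_skb *skb)
{
	return skb->data_len ? 0 : (int)(skb->end - skb->tail);
}

/* Account for a fragment that was just chained onto head's frag list. */
void model_chain_frag(struct model_skb *head, const struct model_skb *frag,
		      bool update_data_len)
{
	head->truesize += frag->truesize;
	if (update_data_len)
		head->data_len += frag->len; /* the line added by the patch */
}
```

Without the update the modelled head still reports 628 bytes of tailroom after chaining; with it, the head reads as nonlinear and reports none.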
* Re: skb_try_coalesce bug?
2014-04-22 21:37 ` Eric Dumazet
@ 2014-04-23 16:56 ` Jon Maloy
2014-04-23 17:33 ` David Miller
0 siblings, 1 reply; 12+ messages in thread
From: Jon Maloy @ 2014-04-23 16:56 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Erik Hugne, netdev
On 04/22/2014 05:37 PM, Eric Dumazet wrote:
> On Tue, 2014-04-22 at 14:29 -0700, Eric Dumazet wrote:
>
>> Then if you do that, you also need to change head->data_len !
>
> Untested patch would be :
>
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index c5190ab75290..85077dd7c63e 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -2349,6 +2349,7 @@ int tipc_link_frag_rcv(struct sk_buff **head, struct sk_buff **tail,
>  			(*tail)->next = frag;
>  		*tail = frag;
>  		(*head)->truesize += frag->truesize;
> +		(*head)->data_len += frag->len;
Just to confirm, does this mean that head's own (linear) data is not
included in data_len?
///jon
>  	}
>  	if (fragid == LAST_FRAGMENT) {
>  		*fbuf = *head;
>
>
* Re: skb_try_coalesce bug?
2014-04-23 16:56 ` Jon Maloy
@ 2014-04-23 17:33 ` David Miller
2014-04-23 17:54 ` Jon Maloy
0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2014-04-23 17:33 UTC (permalink / raw)
To: jon.maloy; +Cc: eric.dumazet, erik.hugne, netdev
From: Jon Maloy <jon.maloy@ericsson.com>
Date: Wed, 23 Apr 2014 12:56:20 -0400
> On 04/22/2014 05:37 PM, Eric Dumazet wrote:
>> On Tue, 2014-04-22 at 14:29 -0700, Eric Dumazet wrote:
>>
>>> Then if you do that, you also need to change head->data_len !
>>
>> Untested patch would be :
>>
>> diff --git a/net/tipc/link.c b/net/tipc/link.c
>> index c5190ab75290..85077dd7c63e 100644
>> --- a/net/tipc/link.c
>> +++ b/net/tipc/link.c
>> @@ -2349,6 +2349,7 @@ int tipc_link_frag_rcv(struct sk_buff **head, struct sk_buff **tail,
>>  			(*tail)->next = frag;
>>  		*tail = frag;
>>  		(*head)->truesize += frag->truesize;
>> +		(*head)->data_len += frag->len;
>
> Just to confirm, does this mean that head's own (linear) data is not
> included in data_len?
For a given SKB, skb->len is the entire length of the packet, fragments and
all.
skb->data_len counts the sum of all of the page and SKB based fragments, i.e.
all bytes which are not in the top-level SKB's linear area.
So the linear length is always "skb->len - skb->data_len", and this is exactly
what skb_headlen() does.
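The arithmetic can be checked with a tiny userspace model, using the fragment sizes reported earlier in the thread. The model_* names are hypothetical. Note one consequence of these definitions: for the linear length to stay at 1420 after a fragment is chained, both len and data_len have to grow by the fragment's length. Whether TIPC already adjusts len somewhere else is not stated in this thread, so the helper below is an assumption about consistent bookkeeping rather than a description of the actual code:

```c
#include <assert.h>

/* Hypothetical stand-in for the two length fields being discussed. */
struct model_skb {
	unsigned int len;      /* entire packet: linear area plus fragments */
	unsigned int data_len; /* bytes outside the top-level linear area */
};

/* Linear length, computed the way the message above describes it. */
unsigned int model_headlen(const struct model_skb *skb)
{
	return skb->len - skb->data_len;
}

/* Consistent accounting for one chained fragment (assumed, see above). */
void model_account_frag(struct model_skb *head, unsigned int frag_len)
{
	head->len += frag_len;
	head->data_len += frag_len;
}
```

Starting from a 1420-byte linear head and accounting for fragments of 1420 and 201 bytes gives len == 3041 and data_len == 1621, with the linear length unchanged at 1420.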
* Re: skb_try_coalesce bug?
2014-04-23 17:33 ` David Miller
@ 2014-04-23 17:54 ` Jon Maloy
0 siblings, 0 replies; 12+ messages in thread
From: Jon Maloy @ 2014-04-23 17:54 UTC (permalink / raw)
To: David Miller; +Cc: eric.dumazet, erik.hugne, netdev
On 04/23/2014 01:33 PM, David Miller wrote:
> From: Jon Maloy <jon.maloy@ericsson.com>
> Date: Wed, 23 Apr 2014 12:56:20 -0400
>
>> On 04/22/2014 05:37 PM, Eric Dumazet wrote:
>>> On Tue, 2014-04-22 at 14:29 -0700, Eric Dumazet wrote:
>>>
>>>> Then if you do that, you also need to change head->data_len !
>>>
>>> Untested patch would be :
>>>
>>> diff --git a/net/tipc/link.c b/net/tipc/link.c
>>> index c5190ab75290..85077dd7c63e 100644
>>> --- a/net/tipc/link.c
>>> +++ b/net/tipc/link.c
>>> @@ -2349,6 +2349,7 @@ int tipc_link_frag_rcv(struct sk_buff **head, struct sk_buff **tail,
>>>  			(*tail)->next = frag;
>>>  		*tail = frag;
>>>  		(*head)->truesize += frag->truesize;
>>> +		(*head)->data_len += frag->len;
>>
>> Just to confirm, does this mean that head's own (linear) data is not
>> included in data_len?
>
> For a given SKB, skb->len is the entire length of the packet, fragments and
> all.
>
> skb->data_len counts the sum of all of the page and SKB based fragments, i.e.
> all bytes which are not in the top-level SKB's linear area.
>
> So the linear length is always "skb->len - skb->data_len", and this is exactly
> what skb_headlen() does.
>
Thank you for the clarification. We'll have to fix this. I am just puzzled that
our defragmentation algorithm has worked as well as it has until now.
///jon