* [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO @ 2008-06-18 16:07 ` Octavian Purdila 2008-06-20 6:37 ` Jarek Poplawski 2008-06-28 0:27 ` David Miller 0 siblings, 2 replies; 15+ messages in thread From: Octavian Purdila @ 2008-06-18 16:07 UTC (permalink / raw) To: netdev [-- Attachment #1: Type: text/plain, Size: 511 bytes --] This is a resend of "[PATCH] Fix frag list handling in TCP splice receive", modified to conform to the new commit header line rules.The commit message has also been edited, and hopefully it is more clear now. It fixes an infinite loop problem, in tcp_read_splice, that I can easily reproduce on my setup with a simple splice test program on the receive side. AFAICS, this problem should affect all setups which use software LRO and TCP splice receive. Did anybody tried this combination? Thanks, tavi [-- Attachment #2: [PATCH] tcp: fix for splice receive when used with software LRO --] [-- Type: text/plain, Size: 2213 bytes --] commit 7c1ca9b811e2d38b8a6b5ea02b8211bf7df4e4ab Author: Octavian Purdila <opurdila@ixiacom.com> Date: Wed Jun 18 18:24:06 2008 +0300 tcp: fix for splice receive when used with software LRO If an skb has nr_frags set to zero but its frag_list is not empty (as it can happen if software LRO is enabled), and a previous tcp_read_sock has consumed the linear part of the skb, then __skb_splice_bits: (a) incorrectly reports an error and (b) forgets to update the offset to account for the linear part Any of the two problems will cause the subsequent __skb_splice_bits call (the one that handles the frag_list skbs) to either skip data, or, if the unadjusted offset is greater then the size of the next skb in the frag_list, make tcp_splice_read loop forever. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 874790b..27cb0d3 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1198,12 +1198,14 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, { unsigned int nr_pages = spd->nr_pages; unsigned int poff, plen, len, toff, tlen; - int headlen, seg; + int headlen, seg, error = 0; toff = *offset; tlen = *total_len; - if (!tlen) + if (!tlen) { + error = 1; goto err; + } /* * if the offset is greater than the linear part, go directly to @@ -1245,7 +1247,8 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, * just jump directly to update and return, no point * in going over fragments when the output is full. */ - if (spd_fill_page(spd, virt_to_page(p), plen, poff, skb)) + error = spd_fill_page(spd, virt_to_page(p), plen, poff, skb); + if (error) goto done; tlen -= plen; @@ -1278,7 +1281,8 @@ map_frag: if (!plen) break; - if (spd_fill_page(spd, f->page, plen, poff, skb)) + error = spd_fill_page(spd, f->page, plen, poff, skb); + if (error) break; tlen -= plen; @@ -1291,7 +1295,10 @@ done: return 0; } err: - return 1; + /* update the offset to reflect the linear part skip, if any */ + if (!error) + *offset = toff; + return error; } /* ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-18 16:07 ` [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO Octavian Purdila @ 2008-06-20 6:37 ` Jarek Poplawski 2008-06-20 10:09 ` Octavian Purdila 2008-06-28 0:27 ` David Miller 1 sibling, 1 reply; 15+ messages in thread From: Jarek Poplawski @ 2008-06-20 6:37 UTC (permalink / raw) To: Octavian Purdila; +Cc: netdev On 18-06-2008 18:07, Octavian Purdila wrote: ... > tcp: fix for splice receive when used with software LRO > > If an skb has nr_frags set to zero but its frag_list is not empty (as > it can happen if software LRO is enabled), and a previous > tcp_read_sock has consumed the linear part of the skb, then > __skb_splice_bits: > > (a) incorrectly reports an error and > > (b) forgets to update the offset to account for the linear part > > Any of the two problems will cause the subsequent __skb_splice_bits > call (the one that handles the frag_list skbs) to either skip data, > or, if the unadjusted offset is greater then the size of the next skb > in the frag_list, make tcp_splice_read loop forever. > > Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 874790b..27cb0d3 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -1198,12 +1198,14 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, > { > unsigned int nr_pages = spd->nr_pages; > unsigned int poff, plen, len, toff, tlen; > - int headlen, seg; > + int headlen, seg, error = 0; > > toff = *offset; > tlen = *total_len; > - if (!tlen) > + if (!tlen) { > + error = 1; > goto err; > + } > > /* > * if the offset is greater than the linear part, go directly to > @@ -1245,7 +1247,8 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, > * just jump directly to update and return, no point > * in going over fragments when the output is full. > */ > - if (spd_fill_page(spd, virt_to_page(p), plen, poff, skb)) > + error = spd_fill_page(spd, virt_to_page(p), plen, poff, skb); > + if (error) > goto done; > > tlen -= plen; > @@ -1278,7 +1281,8 @@ map_frag: > if (!plen) > break; > > - if (spd_fill_page(spd, f->page, plen, poff, skb)) > + error = spd_fill_page(spd, f->page, plen, poff, skb); > + if (error) > break; Hi, This patch looks fine to me, but I wonder if, btw., this place can't be optimized a bit, so why can't we simply: if (spd_fill_page(spd, f->page, plen, poff, skb)) goto err; in both cases, since nothing more can't be filled after this? Regards, Jarek P. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-20 6:37 ` Jarek Poplawski @ 2008-06-20 10:09 ` Octavian Purdila 2008-06-20 11:01 ` Jarek Poplawski 2008-06-20 21:57 ` Jarek Poplawski 0 siblings, 2 replies; 15+ messages in thread From: Octavian Purdila @ 2008-06-20 10:09 UTC (permalink / raw) To: Jarek Poplawski; +Cc: netdev [-- Attachment #1: Type: text/plain, Size: 524 bytes --] On Friday 20 June 2008, Jarek Poplawski wrote: > This patch looks fine to me, but I wonder if, btw., this place can't > be optimized a bit, so why can't we simply: > > if (spd_fill_page(spd, f->page, plen, poff, skb)) > goto err; > > in both cases, since nothing more can't be filled after this? Yes, you are right. Here is the patch with your suggestions in place, tested. BTW, I have another trivial fix and a RFC related to tcp splice read that I've sent a while ago. Should I resend them as well? Thanks, tavi [-- Attachment #2: x --] [-- Type: text/plain, Size: 2249 bytes --] commit b4671a70a6d70cdf11537a5231193271b06a3efa Author: Octavian Purdila <opurdila@ixiacom.com> Date: Fri Jun 20 12:49:24 2008 +0300 tcp: fix for splice receive when used with software LRO If an skb has nr_frags set to zero but its frag_list is not empty (as it can happen if software LRO is enabled), and a previous tcp_read_sock has consumed the linear part of the skb, then __skb_splice_bits: (a) incorrectly reports an error and (b) forgets to update the offset to account for the linear part Any of the two problems will cause the subsequent __skb_splice_bits call (the one that handles the frag_list skbs) to either skip data, or, if the unadjusted offset is greater then the size of the next skb in the frag_list, make tcp_splice_read loop forever. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 25fa74d..4262006 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1198,12 +1198,14 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, { unsigned int nr_pages = spd->nr_pages; unsigned int poff, plen, len, toff, tlen; - int headlen, seg; + int headlen, seg, error = 0; toff = *offset; tlen = *total_len; - if (!tlen) + if (!tlen) { + error = 1; goto err; + } /* * if the offset is greater than the linear part, go directly to @@ -1245,8 +1247,9 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, * just jump directly to update and return, no point * in going over fragments when the output is full. */ - if (spd_fill_page(spd, virt_to_page(p), plen, poff, skb)) - goto done; + error = spd_fill_page(spd, virt_to_page(p), plen, poff, skb); + if (error) + goto err; tlen -= plen; } @@ -1275,8 +1278,9 @@ map_frag: if (!plen) break; - if (spd_fill_page(spd, f->page, plen, poff, skb)) - break; + error = spd_fill_page(spd, f->page, plen, poff, skb); + if (error) + goto err; tlen -= plen; } @@ -1288,7 +1292,10 @@ done: return 0; } err: - return 1; + /* update the offset to reflect the linear part skip, if any */ + if (!error) + *offset = toff; + return error; } /* ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-20 10:09 ` Octavian Purdila @ 2008-06-20 11:01 ` Jarek Poplawski 2008-06-20 12:39 ` Octavian Purdila 2008-06-20 21:57 ` Jarek Poplawski 1 sibling, 1 reply; 15+ messages in thread From: Jarek Poplawski @ 2008-06-20 11:01 UTC (permalink / raw) To: Octavian Purdila; +Cc: netdev On Fri, Jun 20, 2008 at 01:09:35PM +0300, Octavian Purdila wrote: > On Friday 20 June 2008, Jarek Poplawski wrote: > > > This patch looks fine to me, but I wonder if, btw., this place can't > > be optimized a bit, so why can't we simply: > > > > if (spd_fill_page(spd, f->page, plen, poff, skb)) > > goto err; > > > > in both cases, since nothing more can't be filled after this? > > Yes, you are right. Here is the patch with your suggestions in place, tested. Hmm... Of course, this patch looks fine too, but since I didn't expect you respond so fast with a patch, I "forgot" to add a "if so" question (sorry for this and for breaking this thread!): If so, can't we make it all simpler now, without this new "error" variable, and simply skipping this: "if (spd->nr_pages - nr_pages)" test under "done:", so, doing this block unconditional or maybe (I didn't check this enough) changing this test a little to catch there this offset update for frag_list? > > BTW, I have another trivial fix and a RFC related to tcp splice read that I've > sent a while ago. Should I resend them as well? Since this was quite a while ago, I guess you could. And, btw. maybe try to Cc David Miller. Thanks, Jarek P. > > Thanks, > tavi > commit b4671a70a6d70cdf11537a5231193271b06a3efa > Author: Octavian Purdila <opurdila@ixiacom.com> > Date: Fri Jun 20 12:49:24 2008 +0300 > > tcp: fix for splice receive when used with software LRO > > If an skb has nr_frags set to zero but its frag_list is not empty (as > it can happen if software LRO is enabled), and a previous > tcp_read_sock has consumed the linear part of the skb, then > __skb_splice_bits: > > (a) incorrectly reports an error and > > (b) forgets to update the offset to account for the linear part > > Any of the two problems will cause the subsequent __skb_splice_bits > call (the one that handles the frag_list skbs) to either skip data, > or, if the unadjusted offset is greater then the size of the next skb > in the frag_list, make tcp_splice_read loop forever. > > Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 25fa74d..4262006 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -1198,12 +1198,14 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, > { > unsigned int nr_pages = spd->nr_pages; > unsigned int poff, plen, len, toff, tlen; > - int headlen, seg; > + int headlen, seg, error = 0; > > toff = *offset; > tlen = *total_len; > - if (!tlen) > + if (!tlen) { > + error = 1; > goto err; > + } > > /* > * if the offset is greater than the linear part, go directly to > @@ -1245,8 +1247,9 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, > * just jump directly to update and return, no point > * in going over fragments when the output is full. > */ > - if (spd_fill_page(spd, virt_to_page(p), plen, poff, skb)) > - goto done; > + error = spd_fill_page(spd, virt_to_page(p), plen, poff, skb); > + if (error) > + goto err; > > tlen -= plen; > } > @@ -1275,8 +1278,9 @@ map_frag: > if (!plen) > break; > > - if (spd_fill_page(spd, f->page, plen, poff, skb)) > - break; > + error = spd_fill_page(spd, f->page, plen, poff, skb); > + if (error) > + goto err; > > tlen -= plen; > } > @@ -1288,7 +1292,10 @@ done: > return 0; > } > err: > - return 1; > + /* update the offset to reflect the linear part skip, if any */ > + if (!error) > + *offset = toff; > + return error; > } > > /* ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-20 11:01 ` Jarek Poplawski @ 2008-06-20 12:39 ` Octavian Purdila 2008-06-20 13:01 ` Jarek Poplawski 0 siblings, 1 reply; 15+ messages in thread From: Octavian Purdila @ 2008-06-20 12:39 UTC (permalink / raw) To: Jarek Poplawski; +Cc: netdev On Friday 20 June 2008, Jarek Poplawski wrote: > If so, can't we make it all simpler now, without this new "error" > variable, and simply skipping this: "if (spd->nr_pages - nr_pages)" > test under "done:", so, doing this block unconditional or maybe (I > didn't check this enough) changing this test a little to catch there > this offset update for frag_list? Let me try to wrap my brain around this again :) and see if we don't miss anything. In the process, maybe I can even find a way to make this code easier to understand. Thanks, tavi ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-20 12:39 ` Octavian Purdila @ 2008-06-20 13:01 ` Jarek Poplawski 2008-06-20 20:44 ` Octavian Purdila 0 siblings, 1 reply; 15+ messages in thread From: Jarek Poplawski @ 2008-06-20 13:01 UTC (permalink / raw) To: Octavian Purdila; +Cc: netdev On Fri, Jun 20, 2008 at 03:39:24PM +0300, Octavian Purdila wrote: > On Friday 20 June 2008, Jarek Poplawski wrote: > > > If so, can't we make it all simpler now, without this new "error" > > variable, and simply skipping this: "if (spd->nr_pages - nr_pages)" > > test under "done:", so, doing this block unconditional or maybe (I > > didn't check this enough) changing this test a little to catch there > > this offset update for frag_list? > > Let me try to wrap my brain around this again :) and see if we don't miss > anything. In the process, maybe I can even find a way to make this code > easier to understand. > Sure, no need to hurry, and after all, I'm OK with the current patch too. Jarek P. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-20 13:01 ` Jarek Poplawski @ 2008-06-20 20:44 ` Octavian Purdila 2008-06-22 21:07 ` Jarek Poplawski 0 siblings, 1 reply; 15+ messages in thread From: Octavian Purdila @ 2008-06-20 20:44 UTC (permalink / raw) To: Jarek Poplawski; +Cc: netdev [-- Attachment #1: Type: text/plain, Size: 862 bytes --] On Friday 20 June 2008, Jarek Poplawski wrote: > > > > Let me try to wrap my brain around this again :) and see if we don't miss > > anything. In the process, maybe I can even find a way to make this code > > easier to understand. > > Sure, no need to hurry, and after all, I'm OK with the current patch too. > OK, here is my attempt at making it a bit more readable: - move all of the details on offsets, lengths and buffers into a single function instead of doing these operation from multiple places - use a bottom up approach - try to avoid details in the high level functions, introduce them gradually as you go deeper in the function call stack I've attached both the patch and a full excerpt - it might be easier to comment on it this way. I've minimally tested the patch, no obvious functional or performance problems were found. Thanks, tavi [-- Attachment #2: y --] [-- Type: text/plain, Size: 3161 bytes --] static inline void __segment_seek(struct page **page, int *poff, int *plen, int off) { *poff += off; *page += *poff / PAGE_SIZE; *poff = *poff % PAGE_SIZE; *plen -= off; } static inline int __splice_segment(struct page *page, unsigned int poff, unsigned int plen, unsigned int *off, unsigned int *len, struct sk_buff *skb, struct splice_pipe_desc *spd) { /* skip this segment if already processed */ if (*off >= plen) { *off -= plen; return 0; } /* ignore any bits we already processed */ if (*off) { __segment_seek(&page, &poff, &plen, *off); *off = 0; } do { unsigned int flen = min(*len, plen); if (spd_fill_page(spd, page, flen, poff, skb)) return 1; __segment_seek(&page, &poff, &plen, flen); *len -= flen; } while (*len && plen); return 0; } int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, unsigned int *len, struct splice_pipe_desc *spd) { int seg; /* * map the linear part */ if (__splice_segment(virt_to_page(skb->data), (unsigned long) skb->data & (PAGE_SIZE - 1), skb_headlen(skb), offset, len, skb, spd)) return 1; /* * then map the fragments */ for (seg = 0; seg < skb_shinfo(skb)->nr_frags; seg++) { const skb_frag_t *f = &skb_shinfo(skb)->frags[seg]; if (__splice_segment(f->page, f->page_offset, f->size, offset, len, skb, spd)) return 1; } /* * now see if we have a frag_list to map */ if (skb_shinfo(skb)->frag_list) { struct sk_buff *list = skb_shinfo(skb)->frag_list; for (; list && *len; list = list->next) if (__skb_splice_bits(list, offset, len, spd)) return 1; } return 0; } /* * Map data from the skb to a pipe. Should handle both the linear part, * the fragments, and the frag list. */ int skb_splice_bits(struct sk_buff *__skb, unsigned int offset, struct pipe_inode_info *pipe, unsigned int tlen, unsigned int flags) { struct partial_page partial[PIPE_BUFFERS]; struct page *pages[PIPE_BUFFERS]; struct splice_pipe_desc spd = { .pages = pages, .partial = partial, .flags = flags, .ops = &sock_pipe_buf_ops, .spd_release = sock_spd_release, }; struct sk_buff *skb; /* * I'd love to avoid the clone here, but tcp_read_sock() * ignores reference counts and unconditonally kills the sk_buff * on return from the actor. */ skb = skb_clone(__skb, GFP_KERNEL); if (unlikely(!skb)) return -ENOMEM; __skb_splice_bits(skb, &offset, &tlen, &spd); /* * drop our reference to the clone, the pipe consumption will * drop the rest. */ kfree_skb(skb); if (spd.nr_pages) { int ret; struct sock *sk = __skb->sk; /* * Drop the socket lock, otherwise we have reverse * locking dependencies between sk_lock and i_mutex * here as compared to sendfile(). We enter here * with the socket lock held, and splice_to_pipe() will * grab the pipe inode lock. For sendfile() emulation, * we call into ->sendpage() with the i_mutex lock held * and networking will grab the socket lock. */ release_sock(sk); ret = splice_to_pipe(pipe, &spd); lock_sock(sk); return ret; } return 0; } [-- Attachment #3: x --] [-- Type: text/plain, Size: 4985 bytes --] diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 25fa74d..017daef 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1188,114 +1188,89 @@ static inline int spd_fill_page(struct splice_pipe_desc *spd, struct page *page, return 0; } -/* - * Map linear and fragment data from the skb to spd. Returns number of - * pages mapped. - */ -static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, - unsigned int *total_len, - struct splice_pipe_desc *spd) +static inline void __segment_seek(struct page **page, int *poff, int *plen, + int off) { - unsigned int nr_pages = spd->nr_pages; - unsigned int poff, plen, len, toff, tlen; - int headlen, seg; + *poff += off; + *page += *poff / PAGE_SIZE; + *poff = *poff % PAGE_SIZE; + *plen -= off; +} - toff = *offset; - tlen = *total_len; - if (!tlen) - goto err; +static inline int __splice_segment(struct page *page, unsigned int poff, + unsigned int plen, unsigned int *off, + unsigned int *len, struct sk_buff *skb, + struct splice_pipe_desc *spd) +{ + /* skip this segment if already processed */ + if (*off >= plen) { + *off -= plen; + return 0; + } - /* - * if the offset is greater than the linear part, go directly to - * the fragments. - */ - headlen = skb_headlen(skb); - if (toff >= headlen) { - toff -= headlen; - goto map_frag; + /* ignore any bits we already processed */ + if (*off) { + __segment_seek(&page, &poff, &plen, *off); + *off = 0; } - /* - * first map the linear region into the pages/partial map, skipping - * any potential initial offset. - */ - len = 0; - while (len < headlen) { - void *p = skb->data + len; - - poff = (unsigned long) p & (PAGE_SIZE - 1); - plen = min_t(unsigned int, headlen - len, PAGE_SIZE - poff); - len += plen; - - if (toff) { - if (plen <= toff) { - toff -= plen; - continue; - } - plen -= toff; - poff += toff; - toff = 0; - } + do { + unsigned int flen = min(*len, plen); - plen = min(plen, tlen); - if (!plen) - break; + if (spd_fill_page(spd, page, flen, poff, skb)) + return 1; - /* - * just jump directly to update and return, no point - * in going over fragments when the output is full. - */ - if (spd_fill_page(spd, virt_to_page(p), plen, poff, skb)) - goto done; + __segment_seek(&page, &poff, &plen, flen); + *len -= flen; - tlen -= plen; - } + } while (*len && plen); + + return 0; +} + +int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, + unsigned int *len, + struct splice_pipe_desc *spd) +{ + int seg; + + /* + * map the linear part + */ + if (__splice_segment(virt_to_page(skb->data), + (unsigned long) skb->data & (PAGE_SIZE - 1), + skb_headlen(skb), + offset, len, skb, spd)) + return 1; /* * then map the fragments */ -map_frag: for (seg = 0; seg < skb_shinfo(skb)->nr_frags; seg++) { const skb_frag_t *f = &skb_shinfo(skb)->frags[seg]; - plen = f->size; - poff = f->page_offset; - - if (toff) { - if (plen <= toff) { - toff -= plen; - continue; - } - plen -= toff; - poff += toff; - toff = 0; - } - - plen = min(plen, tlen); - if (!plen) - break; + if (__splice_segment(f->page, f->page_offset, f->size, + offset, len, skb, spd)) + return 1; + } - if (spd_fill_page(spd, f->page, plen, poff, skb)) - break; + /* + * now see if we have a frag_list to map + */ + if (skb_shinfo(skb)->frag_list) { + struct sk_buff *list = skb_shinfo(skb)->frag_list; - tlen -= plen; + for (; list && *len; list = list->next) + if (__skb_splice_bits(list, offset, len, spd)) + return 1; } -done: - if (spd->nr_pages - nr_pages) { - *offset = 0; - *total_len = tlen; - return 0; - } -err: - return 1; + return 0; } /* * Map data from the skb to a pipe. Should handle both the linear part, - * the fragments, and the frag list. It does NOT handle frag lists within - * the frag list, if such a thing exists. We'd probably need to recurse to - * handle that cleanly. + * the fragments, and the frag list. */ int skb_splice_bits(struct sk_buff *__skb, unsigned int offset, struct pipe_inode_info *pipe, unsigned int tlen, @@ -1321,28 +1296,9 @@ int skb_splice_bits(struct sk_buff *__skb, unsigned int offset, if (unlikely(!skb)) return -ENOMEM; - /* - * __skb_splice_bits() only fails if the output has no room left, - * so no point in going over the frag_list for the error case. - */ - if (__skb_splice_bits(skb, &offset, &tlen, &spd)) - goto done; - else if (!tlen) - goto done; - - /* - * now see if we have a frag_list to map - */ - if (skb_shinfo(skb)->frag_list) { - struct sk_buff *list = skb_shinfo(skb)->frag_list; - for (; list && tlen; list = list->next) { - if (__skb_splice_bits(list, &offset, &tlen, &spd)) - break; - } - } + __skb_splice_bits(skb, &offset, &tlen, &spd); -done: /* * drop our reference to the clone, the pipe consumption will * drop the rest. ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-20 20:44 ` Octavian Purdila @ 2008-06-22 21:07 ` Jarek Poplawski 2008-06-23 9:50 ` Octavian Purdila 0 siblings, 1 reply; 15+ messages in thread From: Jarek Poplawski @ 2008-06-22 21:07 UTC (permalink / raw) To: Octavian Purdila; +Cc: netdev On Fri, Jun 20, 2008 at 11:44:12PM +0300, Octavian Purdila wrote: ... > OK, here is my attempt at making it a bit more readable: > > - move all of the details on offsets, lengths and buffers into a > single function instead of doing these operation from multiple places > > - use a bottom up approach - try to avoid details in the high level functions, > introduce them gradually as you go deeper in the function call stack > > I've attached both the patch and a full excerpt - it might be easier to > comment on it this way. > > I've minimally tested the patch, no obvious functional or performance problems > were found. IMHO it's really more readable, and probably should be sometimes faster if these divisions are optimized by a compiler. So, since the work is done anyway, you could try to submit this - you got nothing to lose. But, I think it's better to separate the change of functionality (a recursive processing of frag_list) to another patch (if there is a practical reason for this). A few of my doubts below as //?? comments. Regards, Jarek P. static inline void __segment_seek(struct page **page, int *poff, int *plen, int off) //?? unsigned ints (especially *poff for "/,%" optimization)? { *poff += off; *page += *poff / PAGE_SIZE; *poff = *poff % PAGE_SIZE; *plen -= off; } static inline int __splice_segment(struct page *page, unsigned int poff, unsigned int plen, unsigned int *off, unsigned int *len, struct sk_buff *skb, struct splice_pipe_desc *spd) { //?? if (!*len) //?? return 1; /* skip this segment if already processed */ if (*off >= plen) { *off -= plen; return 0; } /* ignore any bits we already processed */ if (*off) { __segment_seek(&page, &poff, &plen, *off); *off = 0; } do { unsigned int flen = min(*len, plen); //?? needed for a linear region?: //?? flen = min_t(unsigned int, flen, PAGE_SIZE - poff); if (spd_fill_page(spd, page, flen, poff, skb)) return 1; __segment_seek(&page, &poff, &plen, flen); *len -= flen; } while (*len && plen); return 0; } //?? The original comment with fixed "Returns..." (unless changed to void)? int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, unsigned int *len, struct splice_pipe_desc *spd) { int seg; /* * map the linear part */ if (__splice_segment(virt_to_page(skb->data), (unsigned long) skb->data & (PAGE_SIZE - 1), skb_headlen(skb), offset, len, skb, spd)) return 1; /* * then map the fragments */ for (seg = 0; seg < skb_shinfo(skb)->nr_frags; seg++) { const skb_frag_t *f = &skb_shinfo(skb)->frags[seg]; if (__splice_segment(f->page, f->page_offset, f->size, offset, len, skb, spd)) return 1; } //?? I'm not sure this recursion is really needed here, so I'd prefer //?? to move this back to skb_splice_bits() for now, and maybe to //?? propose this change as a separate patch giving the reasons for this. /* * now see if we have a frag_list to map */ if (skb_shinfo(skb)->frag_list) { struct sk_buff *list = skb_shinfo(skb)->frag_list; for (; list && *len; list = list->next) if (__skb_splice_bits(list, offset, len, spd)) return 1; } return 0; } ... ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-22 21:07 ` Jarek Poplawski @ 2008-06-23 9:50 ` Octavian Purdila 2008-06-23 20:48 ` Jarek Poplawski 0 siblings, 1 reply; 15+ messages in thread From: Octavian Purdila @ 2008-06-23 9:50 UTC (permalink / raw) To: Jarek Poplawski; +Cc: netdev On Monday 23 June 2008, Jarek Poplawski wrote: > > IMHO it's really more readable, and probably should be sometimes faster > if these divisions are optimized by a compiler. So, since the work is > done anyway, you could try to submit this - you got nothing to lose. Cool, thanks for reviewing it. > But, I think it's better to separate the change of functionality (a > recursive processing of frag_list) to another patch (if there is a > practical reason for this). > Yes, it makes sense, I'll remove the recursion from this patch. About the recursion: don't know if it makes a difference from a functional perspective (I don't think that we can have frag_lists in frag_lists), but I've noticed that skb_copy_bits does recurse. Any idea why? > static inline void __segment_seek(struct page **page, int *poff, int *plen, > int off) > > //?? unsigned ints (especially *poff for "/,%" optimization)? Ack, will fix. > static inline int __splice_segment(struct page *page, unsigned int poff, > unsigned int plen, unsigned int *off, > unsigned int *len, struct sk_buff *skb, > struct splice_pipe_desc *spd) > { > //?? if (!*len) > //?? return 1; Ack, will fix. > /* skip this segment if already processed */ > if (*off >= plen) { > *off -= plen; > return 0; > } > > /* ignore any bits we already processed */ > if (*off) { > __segment_seek(&page, &poff, &plen, *off); > *off = 0; > } > > do { > unsigned int flen = min(*len, plen); > > //?? needed for a linear region?: > //?? flen = min_t(unsigned int, flen, PAGE_SIZE - poff); > Oops, missed that, thanks for catching it. > > //?? The original comment with fixed "Returns..." (unless changed to void)? > Will add the comment back, since removing the recursions will make it correct again and will need the return type to decide if we go into the frag list or not, up in skb_splice_bits. > > //?? I'm not sure this recursion is really needed here, so I'd prefer > //?? to move this back to skb_splice_bits() for now, and maybe to > //?? propose this change as a separate patch giving the reasons for this. Ack. Thanks, tavi ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-23 9:50 ` Octavian Purdila @ 2008-06-23 20:48 ` Jarek Poplawski 0 siblings, 0 replies; 15+ messages in thread From: Jarek Poplawski @ 2008-06-23 20:48 UTC (permalink / raw) To: Octavian Purdila; +Cc: netdev On Mon, Jun 23, 2008 at 12:50:30PM +0300, Octavian Purdila wrote: > On Monday 23 June 2008, Jarek Poplawski wrote: ... > > But, I think it's better to separate the change of functionality (a > > recursive processing of frag_list) to another patch (if there is a > > practical reason for this). > > > > Yes, it makes sense, I'll remove the recursion from this patch. > > About the recursion: don't know if it makes a difference from a functional > perspective (I don't think that we can have frag_lists in frag_lists), but > I've noticed that skb_copy_bits does recurse. Any idea why? I meant it's potentially the change of functionality in this place. Probably this recursion is intended for 1 level only, and is natural in this place. But, since this place didn't use this, it should be safer to separate this change to another patch (of course, consistency with skb_copy_bits could be a good argument). Jarek P. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-20 10:09 ` Octavian Purdila 2008-06-20 11:01 ` Jarek Poplawski @ 2008-06-20 21:57 ` Jarek Poplawski 2008-06-21 0:40 ` Octavian Purdila 1 sibling, 1 reply; 15+ messages in thread From: Jarek Poplawski @ 2008-06-20 21:57 UTC (permalink / raw) To: Octavian Purdila; +Cc: netdev Octavian Purdila wrote, On 06/20/2008 12:09 PM: > On Friday 20 June 2008, Jarek Poplawski wrote: > >> This patch looks fine to me, but I wonder if, btw., this place can't >> be optimized a bit, so why can't we simply: >> >> if (spd_fill_page(spd, f->page, plen, poff, skb)) >> goto err; >> >> in both cases, since nothing more can't be filled after this? > > Yes, you are right. Here is the patch with your suggestions in place, tested. > > BTW, I have another trivial fix and a RFC related to tcp splice read that I've > sent a while ago. Should I resend them as well? > > Thanks, > tavi > > commit b4671a70a6d70cdf11537a5231193271b06a3efa > Author: Octavian Purdila <opurdila@ixiacom.com> > Date: Fri Jun 20 12:49:24 2008 +0300 > > tcp: fix for splice receive when used with software LRO > > If an skb has nr_frags set to zero but its frag_list is not empty (as > it can happen if software LRO is enabled), and a previous > tcp_read_sock has consumed the linear part of the skb, then > __skb_splice_bits: > > (a) incorrectly reports an error and > > (b) forgets to update the offset to account for the linear part > > Any of the two problems will cause the subsequent __skb_splice_bits > call (the one that handles the frag_list skbs) to either skip data, > or, if the unadjusted offset is greater then the size of the next skb > in the frag_list, make tcp_splice_read loop forever. > > Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 25fa74d..4262006 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -1198,12 +1198,14 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, > Octavian, since these readability changes later in this thread are quite substantial, my proposal is to separate them from this bug fix. I'll need more time to check them, but they are probably rather for net-next, while the bug fix could probably make to current. If you agree with this try to resend this once more as a new thread with David in To or Cc (and maybe re-diff this to more current tree). You can add my ack below if you like. Thanks, Jarek P. Acked-by: Jarek Poplawski <jarkao2@gmail.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-20 21:57 ` Jarek Poplawski @ 2008-06-21 0:40 ` Octavian Purdila 2008-06-21 8:39 ` Jarek Poplawski 0 siblings, 1 reply; 15+ messages in thread From: Octavian Purdila @ 2008-06-21 0:40 UTC (permalink / raw) To: Jarek Poplawski; +Cc: netdev [-- Attachment #1: Type: text/plain, Size: 764 bytes --] On Saturday 21 June 2008, Jarek Poplawski wrote: > Octavian, since these readability changes later in this thread are quite > substantial, my proposal is to separate them from this bug fix. I'll need > more time to check them, but they are probably rather for net-next, while > the bug fix could probably make to current. If you agree with this try to > resend this once more as a new thread with David in To or Cc (and maybe > re-diff this to more current tree). You can add my ack below if you like. > Yes, I completely agree. Here is yet another patch which incorporates your earlier suggestions, hope I get them right. Minimally tested, rediffed to current net-2.6. I'll send this one to David once you acked it. Thanks for the shepherding :) , tavi [-- Attachment #2: z --] [-- Type: text/x-diff, Size: 1883 bytes --] commit 41f5beb8a6e12e0c2588aee82ba68c46baf9d5f2 Author: Octavian Purdila <opurdila@ixiacom.com> Date: Sat Jun 21 03:17:10 2008 +0300 tcp: fix for splice receive when used with software LRO If an skb has nr_frags set to zero but its frag_list is not empty (as it can happen if software LRO is enabled), and a previous tcp_read_sock has consumed the linear part of the skb, then __skb_splice_bits: (a) incorrectly reports an error and (b) forgets to update the offset to account for the linear part Any of the two problems will cause the subsequent __skb_splice_bits call (the one that handles the frag_list skbs) to either skip data, or, if the unadjusted offset is greater then the size of the next skb in the frag_list, make tcp_splice_read loop forever. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 1e556d3..d912982 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1290,7 +1290,6 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, unsigned int *total_len, struct splice_pipe_desc *spd) { - unsigned int nr_pages = spd->nr_pages; unsigned int poff, plen, len, toff, tlen; int headlen, seg; @@ -1340,7 +1339,7 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, * in going over fragments when the output is full. */ if (spd_fill_page(spd, virt_to_page(p), plen, poff, skb)) - goto done; + goto err; tlen -= plen; } @@ -1370,17 +1369,15 @@ map_frag: break; if (spd_fill_page(spd, f->page, plen, poff, skb)) - break; + goto err; tlen -= plen; } -done: - if (spd->nr_pages - nr_pages) { - *offset = 0; - *total_len = tlen; - return 0; - } + *offset = toff; + *total_len = tlen; + + return 0; err: return 1; } ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-21 0:40 ` Octavian Purdila @ 2008-06-21 8:39 ` Jarek Poplawski 2008-06-21 10:32 ` Octavian Purdila 0 siblings, 1 reply; 15+ messages in thread From: Jarek Poplawski @ 2008-06-21 8:39 UTC (permalink / raw) To: Octavian Purdila; +Cc: netdev On Sat, Jun 21, 2008 at 03:40:48AM +0300, Octavian Purdila wrote: > On Saturday 21 June 2008, Jarek Poplawski wrote: > > > Octavian, since these readability changes later in this thread are quite > > substantial, my proposal is to separate them from this bug fix. I'll need > > more time to check them, but they are probably rather for net-next, while > > the bug fix could probably make to current. If you agree with this try to > > resend this once more as a new thread with David in To or Cc (and maybe > > re-diff this to more current tree). You can add my ack below if you like. > > > > Yes, I completely agree. Here is yet another patch which incorporates your > earlier suggestions, hope I get them right. Minimally tested, rediffed to > current net-2.6. I'll send this one to David once you acked it. > Hmm..., I hope my ack will not hinder too much... Thanks, Jarek P. PS: I see this other "readability" patch seems to change even more, so I'll really need some free time to figure this out. > commit 41f5beb8a6e12e0c2588aee82ba68c46baf9d5f2 > Author: Octavian Purdila <opurdila@ixiacom.com> > Date: Sat Jun 21 03:17:10 2008 +0300 > > tcp: fix for splice receive when used with software LRO > > If an skb has nr_frags set to zero but its frag_list is not empty (as > it can happen if software LRO is enabled), and a previous > tcp_read_sock has consumed the linear part of the skb, then > __skb_splice_bits: > > (a) incorrectly reports an error and > > (b) forgets to update the offset to account for the linear part > > Any of the two problems will cause the subsequent __skb_splice_bits > call (the one that handles the frag_list skbs) to either skip data, > or, if the unadjusted offset is greater then the size of the next skb > in the frag_list, make tcp_splice_read loop forever. > > Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Acked-by: Jarek Poplawski <jarkao2@gmail.com> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 1e556d3..d912982 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -1290,7 +1290,6 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, > unsigned int *total_len, > struct splice_pipe_desc *spd) > { > - unsigned int nr_pages = spd->nr_pages; > unsigned int poff, plen, len, toff, tlen; > int headlen, seg; > > @@ -1340,7 +1339,7 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset, > * in going over fragments when the output is full. > */ > if (spd_fill_page(spd, virt_to_page(p), plen, poff, skb)) > - goto done; > + goto err; > > tlen -= plen; > } > @@ -1370,17 +1369,15 @@ map_frag: > break; > > if (spd_fill_page(spd, f->page, plen, poff, skb)) > - break; > + goto err; > > tlen -= plen; > } > > -done: > - if (spd->nr_pages - nr_pages) { > - *offset = 0; > - *total_len = tlen; > - return 0; > - } > + *offset = toff; > + *total_len = tlen; > + > + return 0; > err: > return 1; > } ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-21 8:39 ` Jarek Poplawski @ 2008-06-21 10:32 ` Octavian Purdila 0 siblings, 0 replies; 15+ messages in thread From: Octavian Purdila @ 2008-06-21 10:32 UTC (permalink / raw) To: Jarek Poplawski; +Cc: netdev On Saturday 21 June 2008, Jarek Poplawski wrote: > > Hmm..., I hope my ack will not hinder too much... > No, it actually helped making this a better patch and me learning a great deal. > PS: I see this other "readability" patch seems to change even more, so > I'll really need some free time to figure this out. > Sure, no hurry. To be honest this is a rewrite which I started to see if I understand the tcp read splice path properly. It ended up being cleaner than the original code so I thought I should share this - but of course being the one who wrote it I am completely biased on that assessment. So if it is not obviously more "readable" to someone else, than its probably not worth spending too much time on it since we already have working / tested code :) Thanks, tavi ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO 2008-06-18 16:07 ` [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO Octavian Purdila 2008-06-20 6:37 ` Jarek Poplawski @ 2008-06-28 0:27 ` David Miller 1 sibling, 0 replies; 15+ messages in thread From: David Miller @ 2008-06-28 0:27 UTC (permalink / raw) To: opurdila; +Cc: netdev From: Octavian Purdila <opurdila@ixiacom.com> Date: Wed, 18 Jun 2008 19:07:16 +0300 > tcp: fix for splice receive when used with software LRO > > If an skb has nr_frags set to zero but its frag_list is not empty (as > it can happen if software LRO is enabled), and a previous > tcp_read_sock has consumed the linear part of the skb, then > __skb_splice_bits: > > (a) incorrectly reports an error and > > (b) forgets to update the offset to account for the linear part > > Any of the two problems will cause the subsequent __skb_splice_bits > call (the one that handles the frag_list skbs) to either skip data, > or, if the unadjusted offset is greater then the size of the next skb > in the frag_list, make tcp_splice_read loop forever. > > Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Applied, th anks Octavian. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2008-06-28 0:27 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <485B4ADE.8070102@domat.com.pl>
2008-06-18 16:07 ` [RESEND] [PATCH] tcp: fix for splice receive when used with software LRO Octavian Purdila
2008-06-20 6:37 ` Jarek Poplawski
2008-06-20 10:09 ` Octavian Purdila
2008-06-20 11:01 ` Jarek Poplawski
2008-06-20 12:39 ` Octavian Purdila
2008-06-20 13:01 ` Jarek Poplawski
2008-06-20 20:44 ` Octavian Purdila
2008-06-22 21:07 ` Jarek Poplawski
2008-06-23 9:50 ` Octavian Purdila
2008-06-23 20:48 ` Jarek Poplawski
2008-06-20 21:57 ` Jarek Poplawski
2008-06-21 0:40 ` Octavian Purdila
2008-06-21 8:39 ` Jarek Poplawski
2008-06-21 10:32 ` Octavian Purdila
2008-06-28 0:27 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).