From: Willy Tarreau <w@1wt.eu>
To: Jarek Poplawski <jarkao2@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>,
Changli Gao <xiaosuo@gmail.com>,
Evgeniy Polyakov <zbr@ioremap.net>,
Herbert Xu <herbert@gondor.apana.org.au>,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: Data corruption issue with splice() on 2.6.27.10
Date: Wed, 7 Jan 2009 13:22:05 +0100 [thread overview]
Message-ID: <20090107122205.GA6051@1wt.eu> (raw)
In-Reply-To: <20090107093915.GA6899@ff.dom.local>
[ CCing Evgeniy and Herbert who also participate to the thread ]
On Wed, Jan 07, 2009 at 09:39:15AM +0000, Jarek Poplawski wrote:
> On Tue, Jan 06, 2009 at 04:57:15PM +0100, Willy Tarreau wrote:
> > On Tue, Jan 06, 2009 at 10:01:13AM +0000, Jarek Poplawski wrote:
> > > On Tue, Jan 06, 2009 at 10:41:38AM +0100, Willy Tarreau wrote:
> > > ...
> > > > > Great story! Alas I don't understand this fully either, but it seems
> > > > > Changli Gao was concerned with sendpage sending this "as pages", so
> > > > > when NETIF_F_SG flag is available. Did you try this without SG btw?
> > > >
> > > > No I did not. I can try, it's not too hard. It would in part defeat the
> > > > purpose of the mechanism (especially at 10 Gbps) but at least it will
> > > > help narrow the problem down.
> > >
> > > Yes, I meant it only as a proof of concept. BTW, delaying TCP acks a
> > > bit for these sendpages should then make it more reproducible, I guess.
> >
> > OK here is an update. It does not change anything to turn off any acceleration
> > feature on the interface (tg3) :
> >
> > root@wtap:~# ethtool -k eth0
> > Offload parameters for eth0:
> > rx-checksumming: off
> > tx-checksumming: off
> > scatter-gather: off
> > tcp segmentation offload: off
> >
> > It still forwards corrupted data like mad. I noticed that the corruption rate
> > is 10-100 times higher when forwarding from eth0 to eth0 than from eth0 to lo.
> >
> > Maybe this can help find the culprit ?
>
> I hope it will, but I still don't get it. Anyway, here is an untested
> patch, which I guess partly tries Changli Gao's recommendation to give
> real pages to splice/pipe (but here it's always - not for sendpage
> only).
>
> BTW, I added Changli to Cc - great review!
>
> Thanks,
> Jarek P.
Well, I've just tested it. It did not fix the problem but made it worse.
Sending a few bytes at a time, the corruption is still there, from the
beginning. Here's what I fed :
willy@pcw:~$ nc -lp4001
azerazerazerazerazerazer
eiguhaeihgaeighaeighaeirghiareg
aeroigjaeorgjaeorgjaoeigjaeoig
ioejrgoiaerjgoiaerjgoiaerjgaoiejgoaiejg
Here's what I got :
willy@pcw:~$ telnet 10.0.3.2 4000
Trying 10.0.3.2...
Connected to 10.0.3.2.
Escape character is '^]'.
_J0s9ñuMG1S9Ðt2D$EðL$T$
However, when feeding /dev/zero as in previous tests, the kernel paniced
in skb_release_data().
Regards,
Willy
> ---
>
> net/core/skbuff.c | 41 +++++++++++++++++++++++++++--------------
> 1 files changed, 27 insertions(+), 14 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 5110b35..4c080cd 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -73,17 +73,13 @@ static struct kmem_cache *skbuff_fclone_cache __read_mostly;
> static void sock_pipe_buf_release(struct pipe_inode_info *pipe,
> struct pipe_buffer *buf)
> {
> - struct sk_buff *skb = (struct sk_buff *) buf->private;
> -
> - kfree_skb(skb);
> + put_page(buf->page);
> }
>
> static void sock_pipe_buf_get(struct pipe_inode_info *pipe,
> struct pipe_buffer *buf)
> {
> - struct sk_buff *skb = (struct sk_buff *) buf->private;
> -
> - skb_get(skb);
> + get_page(buf->page);
> }
>
> static int sock_pipe_buf_steal(struct pipe_inode_info *pipe,
> @@ -1334,9 +1330,19 @@ fault:
> */
> static void sock_spd_release(struct splice_pipe_desc *spd, unsigned int i)
> {
> - struct sk_buff *skb = (struct sk_buff *) spd->partial[i].private;
> + put_page(spd->pages[i]);
> +}
>
> - kfree_skb(skb);
> +static inline struct page *linear_to_page(struct page *page, unsigned int len,
> + unsigned int offset)
> +{
> + struct page *p = alloc_pages(GFP_KERNEL, 0);
> +
> + if (!p)
> + return NULL;
> + memcpy(p + offset, page + offset, len);
> +
> + return p;
> }
>
> /*
> @@ -1344,16 +1350,23 @@ static void sock_spd_release(struct splice_pipe_desc *spd, unsigned int i)
> */
> static inline int spd_fill_page(struct splice_pipe_desc *spd, struct page *page,
> unsigned int len, unsigned int offset,
> - struct sk_buff *skb)
> + struct sk_buff *skb, int linear)
> {
> if (unlikely(spd->nr_pages == PIPE_BUFFERS))
> return 1;
>
> + if (linear) {
> + page = linear_to_page(page, len, offset);
> + if (!page)
> + return 1;
> + }
> +
> spd->pages[spd->nr_pages] = page;
> spd->partial[spd->nr_pages].len = len;
> spd->partial[spd->nr_pages].offset = offset;
> - spd->partial[spd->nr_pages].private = (unsigned long) skb_get(skb);
> spd->nr_pages++;
> + get_page(page);
> +
> return 0;
> }
>
> @@ -1369,7 +1382,7 @@ static inline void __segment_seek(struct page **page, unsigned int *poff,
> static inline int __splice_segment(struct page *page, unsigned int poff,
> unsigned int plen, unsigned int *off,
> unsigned int *len, struct sk_buff *skb,
> - struct splice_pipe_desc *spd)
> + struct splice_pipe_desc *spd, int linear)
> {
> if (!*len)
> return 1;
> @@ -1392,7 +1405,7 @@ static inline int __splice_segment(struct page *page, unsigned int poff,
> /* the linear region may spread across several pages */
> flen = min_t(unsigned int, flen, PAGE_SIZE - poff);
>
> - if (spd_fill_page(spd, page, flen, poff, skb))
> + if (spd_fill_page(spd, page, flen, poff, skb, linear))
> return 1;
>
> __segment_seek(&page, &poff, &plen, flen);
> @@ -1419,7 +1432,7 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset,
> if (__splice_segment(virt_to_page(skb->data),
> (unsigned long) skb->data & (PAGE_SIZE - 1),
> skb_headlen(skb),
> - offset, len, skb, spd))
> + offset, len, skb, spd, 1))
> return 1;
>
> /*
> @@ -1429,7 +1442,7 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset,
> const skb_frag_t *f = &skb_shinfo(skb)->frags[seg];
>
> if (__splice_segment(f->page, f->page_offset, f->size,
> - offset, len, skb, spd))
> + offset, len, skb, spd, 0))
> return 1;
> }
>
next prev parent reply other threads:[~2009-01-07 12:25 UTC|newest]
Thread overview: 70+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-12-24 15:28 Data corruption issue with splice() on 2.6.27.10 Willy Tarreau
2009-01-06 8:54 ` Jarek Poplawski
2009-01-06 9:41 ` Willy Tarreau
2009-01-06 10:01 ` Jarek Poplawski
2009-01-06 10:04 ` Willy Tarreau
2009-01-06 15:57 ` Willy Tarreau
2009-01-07 9:39 ` Jarek Poplawski
2009-01-07 12:22 ` Willy Tarreau [this message]
2009-01-07 12:24 ` Herbert Xu
2009-01-07 12:38 ` Jarek Poplawski
2009-01-07 12:31 ` Jarek Poplawski
2009-01-07 12:35 ` Jens Axboe
2009-01-07 12:40 ` Evgeniy Polyakov
2009-01-07 12:52 ` Willy Tarreau
2009-01-07 12:53 ` Herbert Xu
2009-01-07 12:57 ` Evgeniy Polyakov
2009-01-07 13:08 ` Willy Tarreau
2009-01-07 12:49 ` Jarek Poplawski
2009-01-07 12:52 ` Herbert Xu
2009-01-07 13:00 ` Willy Tarreau
2009-01-07 13:01 ` Herbert Xu
2009-01-07 13:02 ` Jarek Poplawski
2009-01-12 12:02 ` Herbert Xu
2009-01-12 12:45 ` Evgeniy Polyakov
2009-01-12 12:56 ` Herbert Xu
2009-01-12 12:59 ` Evgeniy Polyakov
2009-01-12 21:11 ` Herbert Xu
2009-01-12 13:15 ` Jarek Poplawski
2009-01-12 21:12 ` Herbert Xu
2009-01-19 7:32 ` Jarek Poplawski
2009-01-07 12:39 ` Willy Tarreau
2009-01-07 12:56 ` Jarek Poplawski
2009-01-07 12:44 ` Herbert Xu
2009-01-06 17:42 ` Ben Mansell
2009-01-06 18:15 ` Willy Tarreau
2009-01-08 7:16 ` Jarek Poplawski
2009-01-08 8:05 ` Willy Tarreau
2009-01-08 14:53 ` Ingo Molnar
2009-01-08 15:16 ` Ben Mansell
2009-01-08 17:14 ` Willy Tarreau
2009-01-06 18:32 ` Evgeniy Polyakov
2009-01-06 18:37 ` Jens Axboe
2009-01-06 18:55 ` Willy Tarreau
2009-01-07 4:42 ` Herbert Xu
2009-01-07 6:38 ` Willy Tarreau
2009-01-07 9:52 ` Herbert Xu
2009-01-07 9:54 ` Willy Tarreau
2009-01-07 11:52 ` Herbert Xu
2009-01-07 8:17 ` Jens Axboe
2009-01-07 11:29 ` Evgeniy Polyakov
2009-01-07 11:50 ` Herbert Xu
2009-01-07 11:56 ` Evgeniy Polyakov
2009-01-07 11:59 ` Herbert Xu
2009-01-07 12:15 ` Evgeniy Polyakov
2009-01-07 12:22 ` Herbert Xu
2009-01-07 12:27 ` Herbert Xu
2009-01-07 12:30 ` Herbert Xu
2009-01-07 12:37 ` Evgeniy Polyakov
2009-01-07 12:42 ` Herbert Xu
2009-01-07 12:46 ` Evgeniy Polyakov
2009-01-07 12:55 ` Willy Tarreau
2009-01-07 12:57 ` Herbert Xu
2009-01-07 13:02 ` Evgeniy Polyakov
2009-01-07 13:10 ` Jarek Poplawski
2009-01-07 13:15 ` Willy Tarreau
2009-01-07 13:22 ` Jarek Poplawski
2009-01-07 14:01 ` Jarek Poplawski
2009-01-06 18:50 ` Willy Tarreau
2009-01-19 8:39 ` Lennert Buytenhek
2009-01-19 9:53 ` Willy Tarreau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090107122205.GA6051@1wt.eu \
--to=w@1wt.eu \
--cc=herbert@gondor.apana.org.au \
--cc=jarkao2@gmail.com \
--cc=jens.axboe@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=xiaosuo@gmail.com \
--cc=zbr@ioremap.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.