* [PATCH][RFC] network splice receive v2
@ 2007-06-11 11:59 Jens Axboe
2007-06-12 18:17 ` Evgeniy Polyakov
0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2007-06-11 11:59 UTC (permalink / raw)
To: netdev; +Cc: olaf.kirch, johnpol
[-- Attachment #1: Type: text/plain, Size: 1095 bytes --]
Hi,
Here's an updated implementation of tcp network splice receive support.
It actually works for me now, no data corruption seen.
For the original announcement and how to test it, see:
http://marc.info/?l=linux-netdev&m=118103093400770&w=2
I hang on to the original skb by creating a clone of it and holding
references to that. I really don't need a clone, but tcp_read_sock() is
not being very helpful by calling sk_eat_skb() which blissfully ignores
any reference counts and uncondtionally frees the skb - why is that??
The clone works around that issue.
The current code also gets rid of the data_ready hack, and it should now
handle linear/fragments/fraglist in the skb just fine. Thanks to Olaf
for providing review of that stuff!
Patches are against the #splice branch of the block repo, "official" url
of that is:
git://git.kernel.dk/data/git/linux-2.6-block.git/
and it's based on Linus main tree. Let me know if I should supply netdev
branch patches instead, or even just provide a rolled up patch (or patch
series) for anyone curious to test or play with it.
--
Jens Axboe
[-- Attachment #2: 0001-splice-don-t-assume-regular-pages-in-splice_to_pipe.patch --]
[-- Type: text/x-patch, Size: 2138 bytes --]
>From fd79bf84fdeb15b72f256c342609257ae8a56235 Mon Sep 17 00:00:00 2001
From: Jens Axboe <jens.axboe@oracle.com>
Date: Mon, 11 Jun 2007 13:00:32 +0200
Subject: [PATCH] splice: don't assume regular pages in splice_to_pipe()
Allow caller to pass in a release function, there might be
other resources that need releasing as well. Needed for
network receive.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
fs/splice.c | 9 ++++++++-
include/linux/splice.h | 1 +
2 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index f24e367..25ec9c8 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -247,11 +247,16 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
}
while (page_nr < spd->nr_pages)
- page_cache_release(spd->pages[page_nr++]);
+ spd->spd_release(spd, page_nr++);
return ret;
}
+static void spd_release_page(struct splice_pipe_desc *spd, unsigned int i)
+{
+ page_cache_release(spd->pages[i]);
+}
+
static int
__generic_file_splice_read(struct file *in, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
@@ -270,6 +275,7 @@ __generic_file_splice_read(struct file *in, loff_t *ppos,
.partial = partial,
.flags = flags,
.ops = &page_cache_pipe_buf_ops,
+ .spd_release = spd_release_page,
};
index = *ppos >> PAGE_CACHE_SHIFT;
@@ -1442,6 +1448,7 @@ static long vmsplice_to_pipe(struct file *file, const struct iovec __user *iov,
.partial = partial,
.flags = flags,
.ops = &user_page_pipe_buf_ops,
+ .spd_release = spd_release_page,
};
pipe = pipe_info(file->f_path.dentry->d_inode);
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 1a1182b..04c1068 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -53,6 +53,7 @@ struct splice_pipe_desc {
int nr_pages; /* number of pages in map */
unsigned int flags; /* splice flags */
const struct pipe_buf_operations *ops;/* ops associated with output pipe */
+ void (*spd_release)(struct splice_pipe_desc *, unsigned int);
};
typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *,
--
1.5.2.1.174.gcd03
[-- Attachment #3: 0002-tcp_read_sock-alloc-recv_actor-return-return-nega.patch --]
[-- Type: text/x-patch, Size: 1105 bytes --]
>From 95a1ee277f2a6df5c95d786b9229ea0ffa46850d Mon Sep 17 00:00:00 2001
From: Jens Axboe <jens.axboe@oracle.com>
Date: Mon, 4 Jun 2007 15:06:43 +0200
Subject: [PATCH] tcp_read_sock: alloc recv_actor() return return negative error value
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
net/ipv4/tcp.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index cd3c7e9..450f44b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1064,7 +1064,11 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
break;
}
used = recv_actor(desc, skb, offset, len);
- if (used <= len) {
+ if (used < 0) {
+ if (!copied)
+ copied = used;
+ break;
+ } else if (used <= len) {
seq += used;
copied += used;
offset += used;
@@ -1086,7 +1090,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
tcp_rcv_space_adjust(sk);
/* Clean up data we have read: This will do ACK frames. */
- if (copied)
+ if (copied > 0)
tcp_cleanup_rbuf(sk, copied);
return copied;
}
--
1.5.2.1.174.gcd03
[-- Attachment #4: 0003-TCP-splice-receive-support.patch --]
[-- Type: text/x-patch, Size: 13049 bytes --]
>From f0329226c6c1f521c2069358699bae5ed48f5a43 Mon Sep 17 00:00:00 2001
From: Jens Axboe <jens.axboe@oracle.com>
Date: Mon, 11 Jun 2007 13:51:57 +0200
Subject: [PATCH] TCP splice receive support
Support for network splice receive.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
include/linux/net.h | 3 +
include/linux/skbuff.h | 5 ++
include/net/tcp.h | 3 +
net/core/skbuff.c | 175 ++++++++++++++++++++++++++++++++++++++++++++++++
net/ipv4/af_inet.c | 1 +
net/ipv4/tcp.c | 122 +++++++++++++++++++++++++++++++++
net/socket.c | 13 ++++
7 files changed, 322 insertions(+), 0 deletions(-)
diff --git a/include/linux/net.h b/include/linux/net.h
index efc4517..472ee12 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -19,6 +19,7 @@
#define _LINUX_NET_H
#include <linux/wait.h>
+#include <linux/splice.h>
#include <asm/socket.h>
struct poll_table_struct;
@@ -165,6 +166,8 @@ struct proto_ops {
struct vm_area_struct * vma);
ssize_t (*sendpage) (struct socket *sock, struct page *page,
int offset, size_t size, int flags);
+ ssize_t (*splice_read)(struct socket *sock, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len, unsigned int flags);
};
struct net_proto_family {
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e7367c7..64e3eed 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1504,6 +1504,11 @@ extern int skb_store_bits(struct sk_buff *skb, int offset,
extern __wsum skb_copy_and_csum_bits(const struct sk_buff *skb,
int offset, u8 *to, int len,
__wsum csum);
+extern int skb_splice_bits(struct sk_buff *skb,
+ unsigned int offset,
+ struct pipe_inode_info *pipe,
+ unsigned int len,
+ unsigned int flags);
extern void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
extern void skb_split(struct sk_buff *skb,
struct sk_buff *skb1, const u32 len);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index a8af9ae..8e86697 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -308,6 +308,9 @@ extern int tcp_twsk_unique(struct sock *sk,
extern void tcp_twsk_destructor(struct sock *sk);
+extern ssize_t tcp_splice_read(struct socket *sk, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len, unsigned int flags);
+
static inline void tcp_dec_quickack_mode(struct sock *sk,
const unsigned int pkts)
{
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7c6a34e..cbbeccb 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -52,6 +52,7 @@
#endif
#include <linux/string.h>
#include <linux/skbuff.h>
+#include <linux/splice.h>
#include <linux/cache.h>
#include <linux/rtnetlink.h>
#include <linux/init.h>
@@ -71,6 +72,40 @@
static struct kmem_cache *skbuff_head_cache __read_mostly;
static struct kmem_cache *skbuff_fclone_cache __read_mostly;
+static void sock_pipe_buf_release(struct pipe_inode_info *pipe,
+ struct pipe_buffer *buf)
+{
+ struct sk_buff *skb = (struct sk_buff *) buf->private;
+
+ kfree_skb(skb);
+}
+
+static void sock_pipe_buf_get(struct pipe_inode_info *pipe,
+ struct pipe_buffer *buf)
+{
+ struct sk_buff *skb = (struct sk_buff *) buf->private;
+
+ skb_get(skb);
+}
+
+static int sock_pipe_buf_steal(struct pipe_inode_info *pipe,
+ struct pipe_buffer *buf)
+{
+ return 1;
+}
+
+
+/* Pipe buffer operations for a socket. */
+static struct pipe_buf_operations sock_pipe_buf_ops = {
+ .can_merge = 0,
+ .map = generic_pipe_buf_map,
+ .unmap = generic_pipe_buf_unmap,
+ .pin = generic_pipe_buf_pin,
+ .release = sock_pipe_buf_release,
+ .steal = sock_pipe_buf_steal,
+ .get = sock_pipe_buf_get,
+};
+
/*
* Keep out-of-line to prevent kernel bloat.
* __builtin_return_address is not used because it is not always
@@ -1116,6 +1151,146 @@ fault:
return -EFAULT;
}
+static void sock_spd_release(struct splice_pipe_desc *spd, unsigned int i)
+{
+ struct sk_buff *skb = (struct sk_buff *) spd->partial[i].private;
+
+ kfree_skb(skb);
+}
+
+/*
+ * Fill page/offset/length into spd, if it can hold more pages.
+ */
+static inline int spd_fill_page(struct splice_pipe_desc *spd, struct page *page,
+ unsigned int len, unsigned int offset,
+ struct sk_buff *skb)
+{
+ if (unlikely(spd->nr_pages == PIPE_BUFFERS))
+ return 1;
+
+ spd->pages[spd->nr_pages] = page;
+ spd->partial[spd->nr_pages].len = len;
+ spd->partial[spd->nr_pages].offset = offset;
+ spd->partial[spd->nr_pages].private = (unsigned long) skb_get(skb);
+ spd->nr_pages++;
+ return 0;
+}
+
+/*
+ * Map linear and fragment data from the skb to spd. Returns number of
+ * pages mapped.
+ */
+static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset,
+ unsigned int *total_len,
+ struct splice_pipe_desc *spd)
+{
+ unsigned int nr_pages = spd->nr_pages;
+ unsigned int poff, plen, len, toff;
+ int headlen, seg;
+
+ toff = *offset;
+
+ /*
+ * first map the linear region into the pages/partial map, skipping
+ * any potential initial offset.
+ */
+ headlen = skb_headlen(skb);
+ len = 0;
+ for (seg = 0; len < headlen; len += PAGE_SIZE - poff) {
+ void *p = skb->data + len;
+
+ poff = (unsigned long) p & (PAGE_SIZE - 1);
+ plen = min_t(unsigned, headlen - len, PAGE_SIZE - poff);
+
+ if (toff) {
+ if (plen <= toff) {
+ toff -= plen;
+ continue;
+ }
+ plen -= toff;
+ toff = 0;
+ }
+
+ if (spd_fill_page(spd, virt_to_page(p), plen, poff, skb))
+ break;
+ }
+
+ /*
+ * then map the fragments
+ */
+ for (seg = 0; seg < skb_shinfo(skb)->nr_frags; seg++) {
+ const skb_frag_t *f = &skb_shinfo(skb)->frags[seg];
+
+ plen = f->size;
+ poff = f->page_offset;
+
+ if (toff) {
+ if (plen <= toff) {
+ toff -= plen;
+ continue;
+ }
+ plen -= toff;
+ toff = 0;
+ }
+
+ if (spd_fill_page(spd, f->page, plen, poff, skb))
+ break;
+ }
+
+ *offset = toff;
+ return spd->nr_pages - nr_pages;
+}
+
+/*
+ * Map data from the skb to a pipe. Should handle both the linear part,
+ * the fragments, and the frag list. It does NOT handle frag lists within
+ * the frag list, if such a thing exists. We'd probably need to recurse to
+ * handle that cleanly.
+ */
+int skb_splice_bits(struct sk_buff *__skb, unsigned int offset,
+ struct pipe_inode_info *pipe, unsigned int tlen,
+ unsigned int flags)
+{
+ struct partial_page partial[PIPE_BUFFERS];
+ struct page *pages[PIPE_BUFFERS];
+ struct splice_pipe_desc spd = {
+ .pages = pages,
+ .partial = partial,
+ .flags = flags,
+ .ops = &sock_pipe_buf_ops,
+ .spd_release = sock_spd_release,
+ };
+ struct sk_buff *skb;
+ ssize_t ret = 0;
+
+ /*
+ * I'd love to avoid the clone here, but tcp_read_sock()
+ * ignores reference counts and unconditonally kills the sk_buff
+ * on return from the actor.
+ */
+ skb = skb_clone(__skb, GFP_KERNEL);
+ if (unlikely(!skb))
+ return -ENOMEM;
+
+ __skb_splice_bits(skb, &offset, &tlen, &spd);
+
+ /*
+ * now see if we have a frag_list to map
+ */
+ if (skb_shinfo(skb)->frag_list) {
+ struct sk_buff *list = skb_shinfo(skb)->frag_list;
+
+ for (; list && tlen; list = list->next)
+ __skb_splice_bits(skb, &offset, &tlen, &spd);
+ }
+
+ if (spd.nr_pages)
+ ret = splice_to_pipe(pipe, &spd);
+
+ kfree_skb(skb);
+ return ret;
+}
+
/**
* skb_store_bits - store bits from kernel buffer to skb
* @skb: destination buffer
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 041fba3..0ff9f86 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -835,6 +835,7 @@ const struct proto_ops inet_stream_ops = {
.recvmsg = sock_common_recvmsg,
.mmap = sock_no_mmap,
.sendpage = tcp_sendpage,
+ .splice_read = tcp_splice_read,
#ifdef CONFIG_COMPAT
.compat_setsockopt = compat_sock_common_setsockopt,
.compat_getsockopt = compat_sock_common_getsockopt,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 450f44b..437e77f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -253,6 +253,10 @@
#include <linux/poll.h>
#include <linux/init.h>
#include <linux/fs.h>
+#include <linux/skbuff.h>
+#include <linux/splice.h>
+#include <linux/net.h>
+#include <linux/socket.h>
#include <linux/random.h>
#include <linux/bootmem.h>
#include <linux/cache.h>
@@ -264,6 +268,7 @@
#include <net/xfrm.h>
#include <net/ip.h>
#include <net/netdma.h>
+#include <net/sock.h>
#include <asm/uaccess.h>
#include <asm/ioctls.h>
@@ -291,6 +296,15 @@ EXPORT_SYMBOL(tcp_memory_allocated);
EXPORT_SYMBOL(tcp_sockets_allocated);
/*
+ * Create a TCP splice context.
+ */
+struct tcp_splice_state {
+ struct pipe_inode_info *pipe;
+ size_t len;
+ unsigned int flags;
+};
+
+/*
* Pressure flag: try to collapse.
* Technical note: it is used by multiple contexts non atomically.
* All the sk_stream_mem_schedule() is of this nature: accounting
@@ -500,6 +514,113 @@ static inline void tcp_push(struct sock *sk, int flags, int mss_now,
}
}
+int tcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
+ unsigned int offset, size_t len)
+{
+ struct tcp_splice_state *tss = rd_desc->arg.data;
+
+ return skb_splice_bits(skb, offset, tss->pipe, tss->len, tss->flags);
+}
+
+static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
+{
+ /* Store TCP splice context information in read_descriptor_t. */
+ read_descriptor_t rd_desc = {
+ .arg.data = tss,
+ };
+
+ return tcp_read_sock(sk, &rd_desc, tcp_splice_data_recv);
+}
+
+/*
+ * tcp_splice_read - splice data from TCP socket to a pipe
+ * @sock: socket to splice from
+ * @pipe: pipe to splice to
+ * @len: number of bytes to splice
+ * @flags: splice modifier flags
+ *
+ * Will read pages from given socket and fill them into a pipe.
+ */
+ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
+{
+ struct sock *sk = sock->sk;
+ struct tcp_splice_state tss = {
+ .pipe = pipe,
+ .len = len,
+ .flags = flags,
+ };
+ long timeo;
+ ssize_t spliced;
+ int ret;
+
+ if (*ppos != 0)
+ return -EINVAL;
+
+ lock_sock(sk);
+
+ ret = spliced = 0;
+ timeo = sock_rcvtimeo(sk, flags & SPLICE_F_NONBLOCK);
+ while (tss.len) {
+ ret = __tcp_splice_read(sk, &tss);
+ if (ret < 0)
+ break;
+ else if (!ret) {
+ if (spliced)
+ break;
+ if (flags & SPLICE_F_NONBLOCK) {
+ ret = -EAGAIN;
+ break;
+ }
+ if (sock_flag(sk, SOCK_DONE))
+ break;
+ if (sk->sk_err) {
+ ret = sock_error(sk);
+ break;
+ }
+ if (sk->sk_shutdown & RCV_SHUTDOWN)
+ break;
+ if (sk->sk_state == TCP_CLOSE) {
+ /*
+ * This occurs when user tries to read
+ * from never connected socket.
+ */
+ if (!sock_flag(sk, SOCK_DONE))
+ ret = -ENOTCONN;
+ break;
+ }
+ if (!timeo) {
+ ret = -EAGAIN;
+ break;
+ }
+ sk_wait_data(sk, &timeo);
+ if (signal_pending(current)) {
+ ret = sock_intr_errno(timeo);
+ break;
+ }
+ continue;
+ }
+ tss.len -= ret;
+ spliced += ret;
+
+ release_sock(sk);
+ lock_sock(sk);
+
+ if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
+ (sk->sk_shutdown & RCV_SHUTDOWN) || !timeo ||
+ signal_pending(current))
+ break;
+ }
+
+ release_sock(sk);
+
+ if (spliced)
+ return spliced;
+
+ return ret;
+}
+
static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
size_t psize, int flags)
{
@@ -2515,6 +2636,7 @@ EXPORT_SYMBOL(tcp_poll);
EXPORT_SYMBOL(tcp_read_sock);
EXPORT_SYMBOL(tcp_recvmsg);
EXPORT_SYMBOL(tcp_sendmsg);
+EXPORT_SYMBOL(tcp_splice_read);
EXPORT_SYMBOL(tcp_sendpage);
EXPORT_SYMBOL(tcp_setsockopt);
EXPORT_SYMBOL(tcp_shutdown);
diff --git a/net/socket.c b/net/socket.c
index f453019..41240f5 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -111,6 +111,9 @@ static long compat_sock_ioctl(struct file *file,
static int sock_fasync(int fd, struct file *filp, int on);
static ssize_t sock_sendpage(struct file *file, struct page *page,
int offset, size_t size, loff_t *ppos, int more);
+static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags);
/*
* Socket files have a set of 'special' operations as well as the generic file ones. These don't appear
@@ -133,6 +136,7 @@ static const struct file_operations socket_file_ops = {
.fasync = sock_fasync,
.sendpage = sock_sendpage,
.splice_write = generic_splice_sendpage,
+ .splice_read = sock_splice_read,
};
/*
@@ -691,6 +695,15 @@ static ssize_t sock_sendpage(struct file *file, struct page *page,
return sock->ops->sendpage(sock, page, offset, size, flags);
}
+static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
+{
+ struct socket *sock = file->private_data;
+
+ return sock->ops->splice_read(sock, ppos, pipe, len, flags);
+}
+
static struct sock_iocb *alloc_sock_iocb(struct kiocb *iocb,
struct sock_iocb *siocb)
{
--
1.5.2.1.174.gcd03
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH][RFC] network splice receive v2
2007-06-11 11:59 [PATCH][RFC] network splice receive v2 Jens Axboe
@ 2007-06-12 18:17 ` Evgeniy Polyakov
2007-06-12 18:17 ` Jens Axboe
0 siblings, 1 reply; 14+ messages in thread
From: Evgeniy Polyakov @ 2007-06-12 18:17 UTC (permalink / raw)
To: Jens Axboe; +Cc: netdev, olaf.kirch
On Mon, Jun 11, 2007 at 01:59:26PM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> Patches are against the #splice branch of the block repo, "official" url
> of that is:
>
> git://git.kernel.dk/data/git/linux-2.6-block.git/
>
> and it's based on Linus main tree. Let me know if I should supply netdev
> branch patches instead, or even just provide a rolled up patch (or patch
> series) for anyone curious to test or play with it.
Hi Jens.
I've just pulled your tree (splice-net, but splice tree looks the same, git pull says
'Already up-to-date.') on top of linus git and got following bug trace.
I will investigate it further tomorrow.
[ 51.942373] ------------[ cut here ]------------
[ 51.947041] kernel BUG at include/linux/mm.h:285!
[ 51.951786] invalid opcode: 0000 [1] PREEMPT SMP
[ 51.956680] CPU 0
[ 51.958784] Modules linked in: button loop snd_intel8x0
snd_ac97_codec psmouse ac97_bus snd_pcm snd_timer snd soundcore
snd_page_alloc k8temp i2c_nforcen
[ 51.988793] Pid: 2604, comm: splice-fromnet Not tainted
2.6.22-rc4-splice #2
[ 51.995886] RIP: 0010:[<ffffffff80389b15>] [<ffffffff80389b15>]
__skb_splice_bits+0xcd/0x201
[ 52.004520] RSP: 0018:ffff810037f23c28 EFLAGS: 00010246
[ 52.009872] RAX: 0000000000000000 RBX: ffff810037f23d98 RCX:
000000000000003f
[ 52.017053] RDX: ffff81003fe93808 RSI: ffff81003fe93808 RDI:
000000000003c0a3
[ 52.024233] RBP: ffff810037f23c78 R08: 0000000000000000 R09:
ffff81003780e4b8
[ 52.031412] R10: ffffffff803b01d9 R11: ffff810037f23de8 R12:
000000000000009a
[ 52.038591] R13: 0000000000000000 R14: ffff810037f23c90 R15:
00000000000005a8
[ 52.045771] FS: 00002b9181d2c6d0(0000) GS:ffffffff804fb000(0000)
knlGS:0000000000000000
[ 52.053920] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 52.059714] CR2: 00002b9181bb60e0 CR3: 000000003d109000 CR4:
00000000000006e0
[ 52.066894] Process splice-fromnet (pid: 2604, threadinfo
ffff810037f22000, task ffff8100010f4100)
[ 52.075908] Stack: 00000040000612d0 ffff810037f23c94
ffff81003780e4b8 0000000037f23c78
[ 52.084214] 0000faf20000050e ffff81003780e4b8 ffff81003780e4b8
ffff81003e8f22d8
[ 52.091860] ffff81003c99c820 000000004d5f4ede ffff810037f23dd8
ffffffff8038bf20
[ 52.099265] Call Trace:
[ 52.101998] [<ffffffff8038bf20>] skb_splice_bits+0x6c/0xd0
[ 52.107619] [<ffffffff803dc720>] _read_unlock_irq+0x31/0x4e
[ 52.113330] [<ffffffff803afc1c>] tcp_splice_data_recv+0x20/0x22
[ 52.119386] [<ffffffff803afaf3>] tcp_read_sock+0xa2/0x1ab
[ 52.124920] [<ffffffff803afbfc>] tcp_splice_data_recv+0x0/0x22
[ 52.130888] [<ffffffff803b0232>] tcp_splice_read+0xa1/0x21b
[ 52.136593] [<ffffffff803891cf>] sock_def_readable+0x0/0x6f
[ 52.142303] [<ffffffff80384a25>] sock_splice_read+0x15/0x17
[ 52.148010] [<ffffffff8029e773>] do_splice_to+0x76/0x88
[ 52.153370] [<ffffffff8029fc87>] sys_splice+0x1a8/0x232
[ 52.158733] [<ffffffff802097ce>] system_call+0x7e/0x83
[ 52.164005]
[ 52.165544]
[ 52.165545] Code: 0f 0b eb fe 44 39 65 d4 8b 4d d4 41 0f 47 cc 90 ff
42 08 48
[ 52.175364] RIP [<ffffffff80389b15>] __skb_splice_bits+0xcd/0x201
[ 52.181636] RSP <ffff810037f23c28>
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH][RFC] network splice receive v2
2007-06-12 18:17 ` Evgeniy Polyakov
@ 2007-06-12 18:17 ` Jens Axboe
2007-06-12 18:18 ` Jens Axboe
2007-06-13 8:01 ` Evgeniy Polyakov
0 siblings, 2 replies; 14+ messages in thread
From: Jens Axboe @ 2007-06-12 18:17 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: netdev, olaf.kirch
On Tue, Jun 12 2007, Evgeniy Polyakov wrote:
> On Mon, Jun 11, 2007 at 01:59:26PM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> > Patches are against the #splice branch of the block repo, "official" url
> > of that is:
> >
> > git://git.kernel.dk/data/git/linux-2.6-block.git/
> >
> > and it's based on Linus main tree. Let me know if I should supply netdev
> > branch patches instead, or even just provide a rolled up patch (or patch
> > series) for anyone curious to test or play with it.
>
> Hi Jens.
>
> I've just pulled your tree (splice-net, but splice tree looks the
> same, git pull says 'Already up-to-date.') on top of linus git and got
> following bug trace. I will investigate it further tomorrow.
Please tell me the contents of splice-net, it looks like you didn't
actually use the new code. That BUG_ON() is in get_page(), which
splice-net no longer uses. So the bug report cannot be valid for the
current code.
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH][RFC] network splice receive v2
2007-06-12 18:17 ` Jens Axboe
@ 2007-06-12 18:18 ` Jens Axboe
2007-06-13 8:01 ` Evgeniy Polyakov
1 sibling, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2007-06-12 18:18 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: netdev, olaf.kirch
On Tue, Jun 12 2007, Jens Axboe wrote:
> On Tue, Jun 12 2007, Evgeniy Polyakov wrote:
> > On Mon, Jun 11, 2007 at 01:59:26PM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> > > Patches are against the #splice branch of the block repo, "official" url
> > > of that is:
> > >
> > > git://git.kernel.dk/data/git/linux-2.6-block.git/
> > >
> > > and it's based on Linus main tree. Let me know if I should supply netdev
> > > branch patches instead, or even just provide a rolled up patch (or patch
> > > series) for anyone curious to test or play with it.
> >
> > Hi Jens.
> >
> > I've just pulled your tree (splice-net, but splice tree looks the
> > same, git pull says 'Already up-to-date.') on top of linus git and got
> > following bug trace. I will investigate it further tomorrow.
>
> Please tell me the contents of splice-net, it looks like you didn't
> actually use the new code. That BUG_ON() is in get_page(), which
> splice-net no longer uses. So the bug report cannot be valid for the
> current code.
Note that I rebase my branches all the time, so you most often want to
use git pull -f to force an update of them.
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH][RFC] network splice receive v2
2007-06-12 18:17 ` Jens Axboe
2007-06-12 18:18 ` Jens Axboe
@ 2007-06-13 8:01 ` Evgeniy Polyakov
2007-06-13 19:50 ` Jens Axboe
2007-06-14 11:57 ` Evgeniy Polyakov
1 sibling, 2 replies; 14+ messages in thread
From: Evgeniy Polyakov @ 2007-06-13 8:01 UTC (permalink / raw)
To: Jens Axboe; +Cc: netdev, olaf.kirch
On Tue, Jun 12, 2007 at 08:17:32PM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> On Tue, Jun 12 2007, Evgeniy Polyakov wrote:
> > On Mon, Jun 11, 2007 at 01:59:26PM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> > > Patches are against the #splice branch of the block repo, "official" url
> > > of that is:
> > >
> > > git://git.kernel.dk/data/git/linux-2.6-block.git/
> > >
> > > and it's based on Linus main tree. Let me know if I should supply netdev
> > > branch patches instead, or even just provide a rolled up patch (or patch
> > > series) for anyone curious to test or play with it.
> >
> > Hi Jens.
> >
> > I've just pulled your tree (splice-net, but splice tree looks the
> > same, git pull says 'Already up-to-date.') on top of linus git and got
> > following bug trace. I will investigate it further tomorrow.
>
> Please tell me the contents of splice-net, it looks like you didn't
> actually use the new code. That BUG_ON() is in get_page(), which
> splice-net no longer uses. So the bug report cannot be valid for the
> current code.
This is the last commit in that tree:
commit c90a6ce8242d108a5bc6fd0bc1b2aca72a2b5944
Author: Jens Axboe <jens.axboe@oracle.com>
Date: Mon Jun 11 21:59:50 2007 +0200
TCP splice receive support
Support for network splice receive.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
:100644 100644 efc4517... 472ee12... M include/linux/net.h
:100644 100644 e7367c7... 64e3eed... M include/linux/skbuff.h
:100644 100644 a8af9ae... 8e86697... M include/net/tcp.h
:100644 100644 7c6a34e... daea7b0... M net/core/skbuff.c
:100644 100644 041fba3... 0ff9f86... M net/ipv4/af_inet.c
:100644 100644 450f44b... 63efd7a... M net/ipv4/tcp.c
:100644 100644 f453019... 41240f5... M net/socket.c
I will rebase my tree, likely something was not merged correctly.
> --
> Jens Axboe
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH][RFC] network splice receive v2
2007-06-13 8:01 ` Evgeniy Polyakov
@ 2007-06-13 19:50 ` Jens Axboe
2007-06-14 11:57 ` Evgeniy Polyakov
1 sibling, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2007-06-13 19:50 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: netdev, olaf.kirch
On Wed, Jun 13 2007, Evgeniy Polyakov wrote:
> On Tue, Jun 12, 2007 at 08:17:32PM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> > On Tue, Jun 12 2007, Evgeniy Polyakov wrote:
> > > On Mon, Jun 11, 2007 at 01:59:26PM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> > > > Patches are against the #splice branch of the block repo, "official" url
> > > > of that is:
> > > >
> > > > git://git.kernel.dk/data/git/linux-2.6-block.git/
> > > >
> > > > and it's based on Linus main tree. Let me know if I should supply netdev
> > > > branch patches instead, or even just provide a rolled up patch (or patch
> > > > series) for anyone curious to test or play with it.
> > >
> > > Hi Jens.
> > >
> > > I've just pulled your tree (splice-net, but splice tree looks the
> > > same, git pull says 'Already up-to-date.') on top of linus git and got
> > > following bug trace. I will investigate it further tomorrow.
> >
> > Please tell me the contents of splice-net, it looks like you didn't
> > actually use the new code. That BUG_ON() is in get_page(), which
> > splice-net no longer uses. So the bug report cannot be valid for the
> > current code.
>
> This is the last commit in that tree:
>
> commit c90a6ce8242d108a5bc6fd0bc1b2aca72a2b5944
> Author: Jens Axboe <jens.axboe@oracle.com>
> Date: Mon Jun 11 21:59:50 2007 +0200
>
> TCP splice receive support
>
> Support for network splice receive.
>
> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
>
> :100644 100644 efc4517... 472ee12... M include/linux/net.h
> :100644 100644 e7367c7... 64e3eed... M include/linux/skbuff.h
> :100644 100644 a8af9ae... 8e86697... M include/net/tcp.h
> :100644 100644 7c6a34e... daea7b0... M net/core/skbuff.c
> :100644 100644 041fba3... 0ff9f86... M net/ipv4/af_inet.c
> :100644 100644 450f44b... 63efd7a... M net/ipv4/tcp.c
> :100644 100644 f453019... 41240f5... M net/socket.c
>
> I will rebase my tree, likely something was not merged correctly.
It must have been, please let me know how the current stuf works for
you!
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH][RFC] network splice receive v2
2007-06-13 8:01 ` Evgeniy Polyakov
2007-06-13 19:50 ` Jens Axboe
@ 2007-06-14 11:57 ` Evgeniy Polyakov
2007-06-14 11:59 ` Jens Axboe
1 sibling, 1 reply; 14+ messages in thread
From: Evgeniy Polyakov @ 2007-06-14 11:57 UTC (permalink / raw)
To: Jens Axboe; +Cc: netdev, olaf.kirch
On Wed, Jun 13, 2007 at 12:01:04PM +0400, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> I will rebase my tree, likely something was not merged correctly.
Ok, I've just rebased a tree from recent git and pulled from brick -
things seems to be settled. I've ran several tests with different
filesizes and all files were received correctly without kernel crashes.
There is skb leak indeed, and it looks like it is in the
__splice_from_pipe() for the last page.
Thanks Jens, good work.
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH][RFC] network splice receive v2
2007-06-14 11:57 ` Evgeniy Polyakov
@ 2007-06-14 11:59 ` Jens Axboe
2007-06-15 8:06 ` Evgeniy Polyakov
0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2007-06-14 11:59 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: netdev, olaf.kirch
On Thu, Jun 14 2007, Evgeniy Polyakov wrote:
> On Wed, Jun 13, 2007 at 12:01:04PM +0400, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> > I will rebase my tree, likely something was not merged correctly.
>
> Ok, I've just rebased a tree from recent git and pulled from brick -
> things seems to be settled. I've ran several tests with different
> filesizes and all files were received correctly without kernel crashes.
> There is skb leak indeed, and it looks like it is in the
> __splice_from_pipe() for the last page.
Uh, the leak, right - I had forgotten about that, was working on getting
vmsplice into shape the last two days. Interesting that you mention the
last page, I'll dig in now! Any more info on this (how did you notice
the leak originates from there)?
> Thanks Jens, good work.
Thanks, you're welcome!
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH][RFC] network splice receive v2
2007-06-14 11:59 ` Jens Axboe
@ 2007-06-15 8:06 ` Evgeniy Polyakov
2007-06-15 8:43 ` Jens Axboe
0 siblings, 1 reply; 14+ messages in thread
From: Evgeniy Polyakov @ 2007-06-15 8:06 UTC (permalink / raw)
To: Jens Axboe; +Cc: netdev, olaf.kirch
On Thu, Jun 14, 2007 at 01:59:02PM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> On Thu, Jun 14 2007, Evgeniy Polyakov wrote:
> > On Wed, Jun 13, 2007 at 12:01:04PM +0400, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> > > I will rebase my tree, likely something was not merged correctly.
> >
> > Ok, I've just rebased a tree from recent git and pulled from brick -
> > things seems to be settled. I've ran several tests with different
> > filesizes and all files were received correctly without kernel crashes.
> > There is skb leak indeed, and it looks like it is in the
> > __splice_from_pipe() for the last page.
>
> Uh, the leak, right - I had forgotten about that, was working on getting
> vmsplice into shape the last two days. Interesting that you mention the
> last page, I'll dig in now! Any more info on this (how did you notice
> the leak originates from there)?
I first observed leak via slabinfo data (not a leak, but number of skbs
did not dropped after quite huge files were transferred), then added
number of allocated and freed objects in skbuff.c, they did not match
for big files, so I started to check splice source and found that
sometimes ->release callback is not called, but code breaks out of the
loop. I've put some printks in __splice_from_pipe() and found following
case, when skb is leaked:
when the same cloned skb was shared multiple times (no more than 2 though),
only one copy was freed.
Further analysis description requires some splice background (obvious
for you, but that clears it for me):
pipe_buffer only contains 16 pages.
There is a code, which copies pages (pointers) from spd to pipe_buffer
(splice_to_pipe()).
skb allocations happens in chunks of different size (i.e. with different
number of skbs/pages per call), so it is possible that number of
allocated skbs will be less than pipe_buffer size (16), and then the
rest of the pages will be put into different (or the same) pipe_buffer later.
Sometimes two skbs from above example happens to be on the boundary of
the pipe buffer, so only one of them is being copied into pipe_buffer,
which is then transferred over the pipe.
So, we have a case, when spd has (it had more, but this two are special)
2 pages (actually the same page, but two references to it), but pipe_buffer
has a place only for one. In that case second page from spd will be missed.
So, things turned down to be not in the __splice_from_pipe(), but
splice_to_pipe(). Attached patch fixes a leak for me.
It was tested with different data files and all were received correctly.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
diff --git a/fs/splice.c b/fs/splice.c
index bc481f1..365bfd9 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -211,8 +211,6 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
break;
if (pipe->nrbufs < PIPE_BUFFERS)
continue;
-
- break;
}
if (spd->flags & SPLICE_F_NONBLOCK) {
--
Evgeniy Polyakov
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH][RFC] network splice receive v2
2007-06-15 8:06 ` Evgeniy Polyakov
@ 2007-06-15 8:43 ` Jens Axboe
2007-06-15 9:14 ` Evgeniy Polyakov
0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2007-06-15 8:43 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: netdev, olaf.kirch
On Fri, Jun 15 2007, Evgeniy Polyakov wrote:
> On Thu, Jun 14, 2007 at 01:59:02PM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> > On Thu, Jun 14 2007, Evgeniy Polyakov wrote:
> > > On Wed, Jun 13, 2007 at 12:01:04PM +0400, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> > > > I will rebase my tree, likely something was not merged correctly.
> > >
> > > Ok, I've just rebased a tree from recent git and pulled from brick -
> > > things seems to be settled. I've ran several tests with different
> > > filesizes and all files were received correctly without kernel crashes.
> > > There is skb leak indeed, and it looks like it is in the
> > > __splice_from_pipe() for the last page.
> >
> > Uh, the leak, right - I had forgotten about that, was working on getting
> > vmsplice into shape the last two days. Interesting that you mention the
> > last page, I'll dig in now! Any more info on this (how did you notice
> > the leak originates from there)?
>
> I first observed leak via slabinfo data (not a leak, but number of skbs
> did not dropped after quite huge files were transferred), then added
> number of allocated and freed objects in skbuff.c, they did not match
> for big files, so I started to check splice source and found that
> sometimes ->release callback is not called, but code breaks out of the
> loop. I've put some printks in __splice_from_pipe() and found following
> case, when skb is leaked:
> when the same cloned skb was shared multiple times (no more than 2 though),
> only one copy was freed.
>
> Further analysis description requires some splice background (obvious
> for you, but that clears it for me):
> pipe_buffer only contains 16 pages.
> There is a code, which copies pages (pointers) from spd to pipe_buffer
> (splice_to_pipe()).
> skb allocations happens in chunks of different size (i.e. with different
> number of skbs/pages per call), so it is possible that number of
> allocated skbs will be less than pipe_buffer size (16), and then the
> rest of the pages will be put into different (or the same) pipe_buffer later.
> Sometimes two skbs from above example happens to be on the boundary of
> the pipe buffer, so only one of them is being copied into pipe_buffer,
> which is then transferred over the pipe.
> So, we have a case, when spd has (it had more, but this two are special)
> 2 pages (actually the same page, but two references to it), but pipe_buffer
> has a place only for one. In that case second page from spd will be missed.
>
> So, things turned down to be not in the __splice_from_pipe(), but
> splice_to_pipe(). Attached patch fixes a leak for me.
> It was tested with different data files and all were received correctly.
>
> Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
>
> diff --git a/fs/splice.c b/fs/splice.c
> index bc481f1..365bfd9 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -211,8 +211,6 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
> break;
> if (pipe->nrbufs < PIPE_BUFFERS)
> continue;
> -
> - break;
> }
>
> if (spd->flags & SPLICE_F_NONBLOCK) {
>
Hmm, curious. If we hit that location, then two conditions are true:
- Pipe is full
- We transferred some data
if you remove the break, then you'll end up blocking in pipe_wait()
(unless you have SPLICE_F_NONBLOCK also set). And we don't want to block
waiting for more room, if we already transferred some data. In that case
we just want to return a short splice. Looking at pipe_write(), it'll
block as well though. Just doesn't seem optimal to me, but...
So the question is why would doing the break there cause a leak? I just
don't yet see how it can happen, I'd love to fix that condition instead.
For the case you describe, we should have page_nr == 1 and spd->nr_pages
== 2. Is the:
while (page_nr < spd->nr_pages)
spd->spd_release(spd, page_nr++);
not dropping the right reference?
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH][RFC] network splice receive v2
2007-06-15 8:43 ` Jens Axboe
@ 2007-06-15 9:14 ` Evgeniy Polyakov
2007-06-15 9:34 ` Jens Axboe
0 siblings, 1 reply; 14+ messages in thread
From: Evgeniy Polyakov @ 2007-06-15 9:14 UTC (permalink / raw)
To: Jens Axboe; +Cc: netdev, olaf.kirch
On Fri, Jun 15, 2007 at 10:43:18AM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> > So, things turned down to be not in the __splice_from_pipe(), but
> > splice_to_pipe(). Attached patch fixes a leak for me.
> > It was tested with different data files and all were received correctly.
> >
> > Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
> >
> > diff --git a/fs/splice.c b/fs/splice.c
> > index bc481f1..365bfd9 100644
> > --- a/fs/splice.c
> > +++ b/fs/splice.c
> > @@ -211,8 +211,6 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
> > break;
> > if (pipe->nrbufs < PIPE_BUFFERS)
> > continue;
> > -
> > - break;
> > }
> >
> > if (spd->flags & SPLICE_F_NONBLOCK) {
> >
>
> Hmm, curious. If we hit that location, then two conditions are true:
>
> - Pipe is full
> - We transferred some data
Yep.
> if you remove the break, then you'll end up blocking in pipe_wait()
> (unless you have SPLICE_F_NONBLOCK also set). And we don't want to block
> waiting for more room, if we already transferred some data. In that case
> we just want to return a short splice. Looking at pipe_write(), it'll
> block as well though. Just doesn't seem optimal to me, but...
>
> So the question is why would doing the break there cause a leak? I just
> don't yet see how it can happen, I'd love to fix that condition instead.
> For the case you describe, we should have page_nr == 1 and spd->nr_pages
> == 2. Is the:
>
> while (page_nr < spd->nr_pages)
> spd->spd_release(spd, page_nr++);
>
> not dropping the right reference?
Both spd->nr_pages and page_nr are equal to 1. When spd->nr_pages
was 2 there was only 1 free slot in pipe_buffer.
spd_fill_page: allocated: 89, freed: 73, data: ffff81003d606d28
spd_fill_page: allocated: 90, freed: 73, data: ffff81003d606d28
splice_to_pipe: priv: ffff81003d606d28, spd_nrpages: 1, pipe_nrbufs: 16, page_nr: 1.
spd_fill_page: allocated: 91, freed: 73, data: fff81003d6549c8 // next data
...
__splice_from_pipe: process: sd_len: 0, buf_len: 0, buf_priv: ffff81003d606d28.
__splice_from_pipe: release ffff81003d606d28.
sock_pipe_buf_release: allocated: 91, freed: 89, ptr: ffff81003d606d28
splice_to_pipe: priv: ffff81003d6549c8, spd_nrpages: 0, pipe_nrbufs: 1, page_nr: 1. // next data
> --
> Jens Axboe
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH][RFC] network splice receive v2
2007-06-15 9:14 ` Evgeniy Polyakov
@ 2007-06-15 9:34 ` Jens Axboe
2007-06-15 9:44 ` Evgeniy Polyakov
0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2007-06-15 9:34 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: netdev, olaf.kirch
On Fri, Jun 15 2007, Evgeniy Polyakov wrote:
> On Fri, Jun 15, 2007 at 10:43:18AM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> > > So, things turned down to be not in the __splice_from_pipe(), but
> > > splice_to_pipe(). Attached patch fixes a leak for me.
> > > It was tested with different data files and all were received correctly.
> > >
> > > Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
> > >
> > > diff --git a/fs/splice.c b/fs/splice.c
> > > index bc481f1..365bfd9 100644
> > > --- a/fs/splice.c
> > > +++ b/fs/splice.c
> > > @@ -211,8 +211,6 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
> > > break;
> > > if (pipe->nrbufs < PIPE_BUFFERS)
> > > continue;
> > > -
> > > - break;
> > > }
> > >
> > > if (spd->flags & SPLICE_F_NONBLOCK) {
> > >
> >
> > Hmm, curious. If we hit that location, then two conditions are true:
> >
> > - Pipe is full
> > - We transferred some data
>
> Yep.
>
> > if you remove the break, then you'll end up blocking in pipe_wait()
> > (unless you have SPLICE_F_NONBLOCK also set). And we don't want to block
> > waiting for more room, if we already transferred some data. In that case
> > we just want to return a short splice. Looking at pipe_write(), it'll
> > block as well though. Just doesn't seem optimal to me, but...
> >
> > So the question is why would doing the break there cause a leak? I just
> > don't yet see how it can happen, I'd love to fix that condition instead.
> > For the case you describe, we should have page_nr == 1 and spd->nr_pages
> > == 2. Is the:
> >
> > while (page_nr < spd->nr_pages)
> > spd->spd_release(spd, page_nr++);
> >
> > not dropping the right reference?
>
> Both spd->nr_pages and page_nr are equal to 1. When spd->nr_pages
> was 2 there was only 1 free slot in pipe_buffer.
Ah duh, indeed, it is a dumb bug indeed. This should work.
diff --git a/fs/splice.c b/fs/splice.c
index 89871c6..5327cbf 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -172,6 +172,7 @@ static const struct pipe_buf_operations user_page_pipe_buf_ops = {
ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
struct splice_pipe_desc *spd)
{
+ unsigned int spd_pages = spd->nr_pages;
int ret, do_wakeup, page_nr;
ret = 0;
@@ -252,7 +253,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
}
}
- while (page_nr < spd->nr_pages)
+ while (page_nr < spd_pages)
spd->spd_release(spd, page_nr++);
return ret;
--
Jens Axboe
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH][RFC] network splice receive v2
2007-06-15 9:34 ` Jens Axboe
@ 2007-06-15 9:44 ` Evgeniy Polyakov
2007-06-15 13:31 ` Jens Axboe
0 siblings, 1 reply; 14+ messages in thread
From: Evgeniy Polyakov @ 2007-06-15 9:44 UTC (permalink / raw)
To: Jens Axboe; +Cc: netdev, olaf.kirch
On Fri, Jun 15, 2007 at 11:34:30AM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> > Both spd->nr_pages and page_nr are equal to 1. When spd->nr_pages
> > was 2 there was only 1 free slot in pipe_buffer.
>
> Ah duh, indeed, it is a dumb bug indeed. This should work.
Yep, this one works too.
> diff --git a/fs/splice.c b/fs/splice.c
> index 89871c6..5327cbf 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -172,6 +172,7 @@ static const struct pipe_buf_operations user_page_pipe_buf_ops = {
> ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
> struct splice_pipe_desc *spd)
> {
> + unsigned int spd_pages = spd->nr_pages;
> int ret, do_wakeup, page_nr;
>
> ret = 0;
> @@ -252,7 +253,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
> }
> }
>
> - while (page_nr < spd->nr_pages)
> + while (page_nr < spd_pages)
> spd->spd_release(spd, page_nr++);
>
> return ret;
>
> --
> Jens Axboe
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH][RFC] network splice receive v2
2007-06-15 9:44 ` Evgeniy Polyakov
@ 2007-06-15 13:31 ` Jens Axboe
0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2007-06-15 13:31 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: netdev, olaf.kirch
On Fri, Jun 15 2007, Evgeniy Polyakov wrote:
> On Fri, Jun 15, 2007 at 11:34:30AM +0200, Jens Axboe (jens.axboe@oracle.com) wrote:
> > > Both spd->nr_pages and page_nr are equal to 1. When spd->nr_pages
> > > was 2 there was only 1 free slot in pipe_buffer.
> >
> > Ah duh, indeed, it is a dumb bug indeed. This should work.
>
> Yep, this one works too.
Great, thanks for testing!
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2007-06-15 13:31 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-11 11:59 [PATCH][RFC] network splice receive v2 Jens Axboe
2007-06-12 18:17 ` Evgeniy Polyakov
2007-06-12 18:17 ` Jens Axboe
2007-06-12 18:18 ` Jens Axboe
2007-06-13 8:01 ` Evgeniy Polyakov
2007-06-13 19:50 ` Jens Axboe
2007-06-14 11:57 ` Evgeniy Polyakov
2007-06-14 11:59 ` Jens Axboe
2007-06-15 8:06 ` Evgeniy Polyakov
2007-06-15 8:43 ` Jens Axboe
2007-06-15 9:14 ` Evgeniy Polyakov
2007-06-15 9:34 ` Jens Axboe
2007-06-15 9:44 ` Evgeniy Polyakov
2007-06-15 13:31 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).