* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor [not found] <200706112159.l5BLxF5x004043@hera.kernel.org> @ 2007-06-11 23:34 ` Andrew Morton 2007-06-12 6:35 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Andrew Morton @ 2007-06-11 23:34 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-kernel On Mon, 11 Jun 2007 21:59:15 GMT Linux Kernel Mailing List <linux-kernel@vger.kernel.org> wrote: > Gitweb: http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=20d698db67059a63d217030dfd02872cb5f88dfb > Commit: 20d698db67059a63d217030dfd02872cb5f88dfb > Parent: 17374ff1aa9ce2a0597416a16729474b538af443 > Author: Jens Axboe <jens.axboe@oracle.com> > AuthorDate: Tue Jun 5 11:05:11 2007 +0200 > Committer: Jens Axboe <jens.axboe@oracle.com> > CommitDate: Fri Jun 8 08:33:59 2007 +0200 > > splice: move balance_dirty_pages_ratelimited() outside of splice actor > > I've seen inode related deadlocks, so move this call outside of the > actor itself, which may hold the inode lock. > eh? If the pipe_to_file() caller holds inode_lock, our problems are large. I doubt if that's true, so what problem is this patch really fixing?? > --- > fs/splice.c | 3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/fs/splice.c b/fs/splice.c > index b78a7f0..6349d31 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -652,7 +652,6 @@ find_page: > * accessed, we are now done! > */ > mark_page_accessed(page); > - balance_dirty_pages_ratelimited(mapping); > out: > page_cache_release(page); > unlock_page(page); > @@ -823,6 +822,7 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out, > if (err) > ret = err; > } > + balance_dirty_pages_ratelimited(mapping); > } > > return ret; > @@ -876,6 +876,7 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > if (err) > ret = err; > } > + balance_dirty_pages_ratelimited(mapping); > } > balance_dirty_pages_ratelimited() is supposed to be called once-per-dirtied-page. This caller can dirty an arbitrarily large amount of memory and hence should use balance_dirty_pages_ratelimited_nr(). As things stand, a large splice() could potentially cause the dirty limits to be exceeded. btw, can we please arrange to get patches reviewed prior to them being merged? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor 2007-06-11 23:34 ` splice: move balance_dirty_pages_ratelimited() outside of splice actor Andrew Morton @ 2007-06-12 6:35 ` Jens Axboe 2007-06-12 11:20 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2007-06-12 6:35 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Mon, Jun 11 2007, Andrew Morton wrote: > On Mon, 11 Jun 2007 21:59:15 GMT > Linux Kernel Mailing List <linux-kernel@vger.kernel.org> wrote: > > > Gitweb: http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=20d698db67059a63d217030dfd02872cb5f88dfb > > Commit: 20d698db67059a63d217030dfd02872cb5f88dfb > > Parent: 17374ff1aa9ce2a0597416a16729474b538af443 > > Author: Jens Axboe <jens.axboe@oracle.com> > > AuthorDate: Tue Jun 5 11:05:11 2007 +0200 > > Committer: Jens Axboe <jens.axboe@oracle.com> > > CommitDate: Fri Jun 8 08:33:59 2007 +0200 > > > > splice: move balance_dirty_pages_ratelimited() outside of splice actor > > > > I've seen inode related deadlocks, so move this call outside of the > > actor itself, which may hold the inode lock. > > > > eh? If the pipe_to_file() caller holds inode_lock, our problems are large. > > I doubt if that's true, so what problem is this patch really fixing?? I can repeatedly lock up balance_dirty_pages() if it's called inside the double lock of splice_from_pipe(). I'll fire up the test box and reproduce, then post the backtraces. > > --- > > fs/splice.c | 3 ++- > > 1 files changed, 2 insertions(+), 1 deletions(-) > > > > diff --git a/fs/splice.c b/fs/splice.c > > index b78a7f0..6349d31 100644 > > --- a/fs/splice.c > > +++ b/fs/splice.c > > @@ -652,7 +652,6 @@ find_page: > > * accessed, we are now done! > > */ > > mark_page_accessed(page); > > - balance_dirty_pages_ratelimited(mapping); > > out: > > page_cache_release(page); > > unlock_page(page); > > @@ -823,6 +822,7 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out, > > if (err) > > ret = err; > > } > > + balance_dirty_pages_ratelimited(mapping); > > } > > > > return ret; > > @@ -876,6 +876,7 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > > if (err) > > ret = err; > > } > > + balance_dirty_pages_ratelimited(mapping); > > } > > > > balance_dirty_pages_ratelimited() is supposed to be called > once-per-dirtied-page. This caller can dirty an arbitrarily large amount > of memory and hence should use balance_dirty_pages_ratelimited_nr(). > > As things stand, a large splice() could potentially cause the dirty limits > to be exceeded. It's mostly (at most) 16 pages, not an arbitrarily large amount. So I doubt it makes a lot of difference. > btw, can we please arrange to get patches reviewed prior to them being > merged? Sure, I should have posted the series here. Mostly simple stuff though, and others have seen them. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor 2007-06-12 6:35 ` Jens Axboe @ 2007-06-12 11:20 ` Jens Axboe 2007-06-12 11:31 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2007-06-12 11:20 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Tue, Jun 12 2007, Jens Axboe wrote: > On Mon, Jun 11 2007, Andrew Morton wrote: > > On Mon, 11 Jun 2007 21:59:15 GMT > > Linux Kernel Mailing List <linux-kernel@vger.kernel.org> wrote: > > > > > Gitweb: http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=20d698db67059a63d217030dfd02872cb5f88dfb > > > Commit: 20d698db67059a63d217030dfd02872cb5f88dfb > > > Parent: 17374ff1aa9ce2a0597416a16729474b538af443 > > > Author: Jens Axboe <jens.axboe@oracle.com> > > > AuthorDate: Tue Jun 5 11:05:11 2007 +0200 > > > Committer: Jens Axboe <jens.axboe@oracle.com> > > > CommitDate: Fri Jun 8 08:33:59 2007 +0200 > > > > > > splice: move balance_dirty_pages_ratelimited() outside of splice actor > > > > > > I've seen inode related deadlocks, so move this call outside of the > > > actor itself, which may hold the inode lock. > > > > > > > eh? If the pipe_to_file() caller holds inode_lock, our problems are large. > > > > I doubt if that's true, so what problem is this patch really fixing?? > > I can repeatedly lock up balance_dirty_pages() if it's called inside the > double lock of splice_from_pipe(). I'll fire up the test box and > reproduce, then post the backtraces. OK, here it is. Running $ ./splice-fromnet 9999 | ./splice-out xxx on one box (receive from network port 9999, store in file xxx) and a simple $ cat SUSE-Linux-10.1-GM-DVD-i386.iso | netcat centera 9999 on another box. After ~180mb has been transferred, splice-out gets stuck. With the balancing moved outside of the splice actor, everything works perfectly regardless of memory pressure. splice-fromne S 00000000 0 9829 9778 (NOTLB) efe63d4c 00000046 f524cb20 00000000 efe63cfc c0144ae3 00000000 f524cb20 c03c33fa f1b9ee0c 00000009 f524cb20 02ad51ce 00000024 00002406 f524cc30 c1cf3980 00000001 efe63d44 00000246 f6692040 00000282 efe63d54 f2f83600 Call Trace: [<c01809bd>] pipe_wait+0x6d/0x90 [<c0198006>] splice_to_pipe+0x196/0x240 [<c035eb4d>] skb_splice_bits+0xcd/0xe0 [<c038157b>] tcp_splice_data_recv+0x2b/0x40 [<c0381501>] tcp_read_sock+0x131/0x180 [<c0381f55>] tcp_splice_read+0x85/0x1e0 [<c03572f5>] sock_splice_read+0x25/0x30 [<c019743e>] do_splice_to+0x5e/0x80 [<c0198beb>] sys_splice+0x1eb/0x200 [<c010511e>] sysenter_past_esp+0x5f/0x99 ======================= splice-out D EF423C98 0 9830 9778 (NOTLB) ef423cac 00000046 00000002 ef423c98 ef423c94 00000000 00000046 f760ba98 f760ba98 f760b9a0 00000006 f7cba0f0 02ca49b4 00000024 001cf7e6 f7cba200 c1cf3980 00000001 c0144c73 c1cdaf78 f55c64c0 00000046 c1cdaf78 fffd5d5b Call Trace: [<c03c272d>] io_schedule+0x1d/0x30 [<c0154979>] sync_page+0x39/0x50 [<c03c2d20>] __wait_on_bit_lock+0x40/0x70 [<c0154929>] __lock_page+0x69/0x70 [<c015a9c5>] write_cache_pages+0x105/0x2f0 [<c015abd3>] generic_writepages+0x23/0x30 [<c015ac29>] do_writepages+0x49/0x50 [<c01960a4>] __writeback_single_inode+0x94/0x370 [<c0196783>] sync_sb_inodes+0x1b3/0x270 [<c0196aa8>] writeback_inodes+0xb8/0xf0 [<c015afed>] balance_dirty_pages_ratelimited_nr+0x9d/0x1b0 [<c01978b3>] pipe_to_file+0x163/0x260 [<c0197a0f>] __splice_from_pipe+0x5f/0x250 [<c0197d18>] splice_from_pipe+0x58/0x80 [<c0197dc4>] generic_file_splice_write+0x54/0x100 [<c01974bf>] do_splice_from+0x5f/0x80 [<c0198bce>] sys_splice+0x1ce/0x200 [<c010511e>] sysenter_past_esp+0x5f/0x99 There are two pdflushes, doing nothing: ======================= pdflush S F7D8FF90 0 171 2 (L-TLB) f7d8ffa4 00000046 00000002 f7d8ff90 f7d8ff8c 00000000 f7c50b60 c055ebac f7c51090 00000001 0000000a f7c50b60 6fc07726 00000026 0002552e f7c50c70 c1ce9980 00000000 c015b750 f7d8ff98 f71d8b80 c0498c8c 00000001 fffd86d8 Call Trace: [<c015b7ec>] pdflush+0x9c/0x1c0 [<c013aa42>] kthread+0x42/0x70 [<c0105e0b>] kernel_thread_helper+0x7/0x1c ======================= pdflush S F7D91F90 0 172 2 (L-TLB) f7d91fa4 00000046 00000002 f7d91f90 f7d91f8c 00000000 f7ca6aa0 c055ebac f7ca6fd0 00000001 00000009 f7ca6aa0 01d76a76 00000024 003c4c0a f7ca6bb0 c1ce9980 00000000 c015b750 f7d91f98 f7255dc0 c0498c8c 00000001 fffd5df9 Call Trace: [<c015b7ec>] pdflush+0x9c/0x1c0 [<c013aa42>] kthread+0x42/0x70 [<c0105e0b>] kernel_thread_helper+0x7/0x1c -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor 2007-06-12 11:20 ` Jens Axboe @ 2007-06-12 11:31 ` Jens Axboe 2007-06-12 12:06 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2007-06-12 11:31 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Tue, Jun 12 2007, Jens Axboe wrote: > On Tue, Jun 12 2007, Jens Axboe wrote: > > On Mon, Jun 11 2007, Andrew Morton wrote: > > > On Mon, 11 Jun 2007 21:59:15 GMT > > > Linux Kernel Mailing List <linux-kernel@vger.kernel.org> wrote: > > > > > > > Gitweb: http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=20d698db67059a63d217030dfd02872cb5f88dfb > > > > Commit: 20d698db67059a63d217030dfd02872cb5f88dfb > > > > Parent: 17374ff1aa9ce2a0597416a16729474b538af443 > > > > Author: Jens Axboe <jens.axboe@oracle.com> > > > > AuthorDate: Tue Jun 5 11:05:11 2007 +0200 > > > > Committer: Jens Axboe <jens.axboe@oracle.com> > > > > CommitDate: Fri Jun 8 08:33:59 2007 +0200 > > > > > > > > splice: move balance_dirty_pages_ratelimited() outside of splice actor > > > > > > > > I've seen inode related deadlocks, so move this call outside of the > > > > actor itself, which may hold the inode lock. > > > > > > > > > > eh? If the pipe_to_file() caller holds inode_lock, our problems are large. > > > > > > I doubt if that's true, so what problem is this patch really fixing?? > > > > I can repeatedly lock up balance_dirty_pages() if it's called inside the > > double lock of splice_from_pipe(). I'll fire up the test box and > > reproduce, then post the backtraces. > > OK, here it is. Running > > $ ./splice-fromnet 9999 | ./splice-out xxx > > on one box (receive from network port 9999, store in file xxx) and a > simple > > $ cat SUSE-Linux-10.1-GM-DVD-i386.iso | netcat centera 9999 > > on another box. After ~180mb has been transferred, splice-out gets > stuck. With the balancing moved outside of the splice actor, everything > works perfectly regardless of memory pressure. > > splice-fromne S 00000000 0 9829 9778 (NOTLB) > efe63d4c 00000046 f524cb20 00000000 efe63cfc c0144ae3 00000000 f524cb20 > c03c33fa f1b9ee0c 00000009 f524cb20 02ad51ce 00000024 00002406 f524cc30 > c1cf3980 00000001 efe63d44 00000246 f6692040 00000282 efe63d54 f2f83600 > Call Trace: > [<c01809bd>] pipe_wait+0x6d/0x90 > [<c0198006>] splice_to_pipe+0x196/0x240 > [<c035eb4d>] skb_splice_bits+0xcd/0xe0 > [<c038157b>] tcp_splice_data_recv+0x2b/0x40 > [<c0381501>] tcp_read_sock+0x131/0x180 > [<c0381f55>] tcp_splice_read+0x85/0x1e0 > [<c03572f5>] sock_splice_read+0x25/0x30 > [<c019743e>] do_splice_to+0x5e/0x80 > [<c0198beb>] sys_splice+0x1eb/0x200 > [<c010511e>] sysenter_past_esp+0x5f/0x99 > ======================= > splice-out D EF423C98 0 9830 9778 (NOTLB) > ef423cac 00000046 00000002 ef423c98 ef423c94 00000000 00000046 f760ba98 > f760ba98 f760b9a0 00000006 f7cba0f0 02ca49b4 00000024 001cf7e6 f7cba200 > c1cf3980 00000001 c0144c73 c1cdaf78 f55c64c0 00000046 c1cdaf78 fffd5d5b > Call Trace: > [<c03c272d>] io_schedule+0x1d/0x30 > [<c0154979>] sync_page+0x39/0x50 > [<c03c2d20>] __wait_on_bit_lock+0x40/0x70 > [<c0154929>] __lock_page+0x69/0x70 > [<c015a9c5>] write_cache_pages+0x105/0x2f0 > [<c015abd3>] generic_writepages+0x23/0x30 > [<c015ac29>] do_writepages+0x49/0x50 > [<c01960a4>] __writeback_single_inode+0x94/0x370 > [<c0196783>] sync_sb_inodes+0x1b3/0x270 > [<c0196aa8>] writeback_inodes+0xb8/0xf0 > [<c015afed>] balance_dirty_pages_ratelimited_nr+0x9d/0x1b0 > [<c01978b3>] pipe_to_file+0x163/0x260 > [<c0197a0f>] __splice_from_pipe+0x5f/0x250 > [<c0197d18>] splice_from_pipe+0x58/0x80 > [<c0197dc4>] generic_file_splice_write+0x54/0x100 > [<c01974bf>] do_splice_from+0x5f/0x80 > [<c0198bce>] sys_splice+0x1ce/0x200 > [<c010511e>] sysenter_past_esp+0x5f/0x99 Alright, I think the "mystery" is solved - it's not the inode locks (pheew), it's pipe_to_file() calling balance_dirty_pages() with the page lock still held. The rest is self-explanatory, we end up blocking on the page lock. Would you prefer this change, then? I'd prefer keeping the current code, unless it's absolutely critical that we call balance_dirty_pages_ratelimited() for each and every page instead of eg every 16 pages here. diff --git a/fs/splice.c b/fs/splice.c index 25ec9c8..21e6b92 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -660,6 +699,8 @@ find_page: out: page_cache_release(page); unlock_page(page); + if (ret) + balance_dirty_pages_ratelimited(mapping); out_ret: return ret; } @@ -857,7 +898,6 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out, if (err) ret = err; } - balance_dirty_pages_ratelimited(mapping); } return ret; @@ -913,7 +953,6 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, if (err) ret = err; } - balance_dirty_pages_ratelimited(mapping); } return ret; -- Jens Axboe ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor 2007-06-12 11:31 ` Jens Axboe @ 2007-06-12 12:06 ` Peter Zijlstra 2007-06-12 12:10 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2007-06-12 12:06 UTC (permalink / raw) To: Jens Axboe; +Cc: Andrew Morton, linux-kernel On Tue, 2007-06-12 at 13:31 +0200, Jens Axboe wrote: > Would you prefer this change, then? I'd prefer keeping the current code, > unless it's absolutely critical that we call > balance_dirty_pages_ratelimited() for each and every page instead of eg > every 16 pages here. For that we should call: balance_dirty_pages_ratelimited_nr(mapping, nr); Which is ok, for small nr. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor 2007-06-12 12:06 ` Peter Zijlstra @ 2007-06-12 12:10 ` Jens Axboe 2007-06-12 12:16 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2007-06-12 12:10 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Andrew Morton, linux-kernel On Tue, Jun 12 2007, Peter Zijlstra wrote: > On Tue, 2007-06-12 at 13:31 +0200, Jens Axboe wrote: > > > Would you prefer this change, then? I'd prefer keeping the current code, > > unless it's absolutely critical that we call > > balance_dirty_pages_ratelimited() for each and every page instead of eg > > every 16 pages here. > > For that we should call: > balance_dirty_pages_ratelimited_nr(mapping, nr); > > Which is ok, for small nr. OK, then this should be better: diff --git a/fs/splice.c b/fs/splice.c index 25ec9c8..ed40967 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -844,6 +883,9 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out, sd.file = out; ret = __splice_from_pipe(pipe, &sd, pipe_to_file); if (ret > 0) { + unsigned long nr_pages; + + nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; *ppos += ret; /* @@ -857,7 +899,7 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out, if (err) ret = err; } - balance_dirty_pages_ratelimited(mapping); + balance_dirty_pages_ratelimited_nr(mapping, nr_pages); } return ret; @@ -898,6 +940,9 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, ret = splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_file); if (ret > 0) { + unsigned long nr_pages; + + nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; *ppos += ret; /* @@ -913,7 +958,7 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, if (err) ret = err; } - balance_dirty_pages_ratelimited(mapping); + balance_dirty_pages_ratelimited_nr(mapping, nr_pages); } return ret; -- Jens Axboe ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor 2007-06-12 12:10 ` Jens Axboe @ 2007-06-12 12:16 ` Peter Zijlstra 2007-06-12 12:44 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2007-06-12 12:16 UTC (permalink / raw) To: Jens Axboe; +Cc: Andrew Morton, linux-kernel On Tue, 2007-06-12 at 14:10 +0200, Jens Axboe wrote: > On Tue, Jun 12 2007, Peter Zijlstra wrote: > > On Tue, 2007-06-12 at 13:31 +0200, Jens Axboe wrote: > > > > > Would you prefer this change, then? I'd prefer keeping the current code, > > > unless it's absolutely critical that we call > > > balance_dirty_pages_ratelimited() for each and every page instead of eg > > > every 16 pages here. > > > > For that we should call: > > balance_dirty_pages_ratelimited_nr(mapping, nr); > > > > Which is ok, for small nr. > > OK, then this should be better: > > diff --git a/fs/splice.c b/fs/splice.c > index 25ec9c8..ed40967 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -844,6 +883,9 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out, > sd.file = out; > ret = __splice_from_pipe(pipe, &sd, pipe_to_file); > if (ret > 0) { > + unsigned long nr_pages; > + > + nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; perhaps? nr_pages = DIV_ROUND_UP(ret, PAGE_CACHE_SIZE); not sure how horrid that turns out to be; you never know with gcc. Otherwise, looks good. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor 2007-06-12 12:16 ` Peter Zijlstra @ 2007-06-12 12:44 ` Jens Axboe 2007-06-12 16:02 ` Andrew Morton 2007-06-12 16:46 ` Andrew Morton 0 siblings, 2 replies; 14+ messages in thread From: Jens Axboe @ 2007-06-12 12:44 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Andrew Morton, linux-kernel On Tue, Jun 12 2007, Peter Zijlstra wrote: > On Tue, 2007-06-12 at 14:10 +0200, Jens Axboe wrote: > > On Tue, Jun 12 2007, Peter Zijlstra wrote: > > > On Tue, 2007-06-12 at 13:31 +0200, Jens Axboe wrote: > > > > > > > Would you prefer this change, then? I'd prefer keeping the current code, > > > > unless it's absolutely critical that we call > > > > balance_dirty_pages_ratelimited() for each and every page instead of eg > > > > every 16 pages here. > > > > > > For that we should call: > > > balance_dirty_pages_ratelimited_nr(mapping, nr); > > > > > > Which is ok, for small nr. > > > > OK, then this should be better: > > > > diff --git a/fs/splice.c b/fs/splice.c > > index 25ec9c8..ed40967 100644 > > --- a/fs/splice.c > > +++ b/fs/splice.c > > @@ -844,6 +883,9 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out, > > sd.file = out; > > ret = __splice_from_pipe(pipe, &sd, pipe_to_file); > > if (ret > 0) { > > + unsigned long nr_pages; > > + > > + nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; > > perhaps? > nr_pages = DIV_ROUND_UP(ret, PAGE_CACHE_SIZE); > > not sure how horrid that turns out to be; you never know with gcc. Well, I think such macros are horribly ugly and that the original code is MUCH easier to read. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor 2007-06-12 12:44 ` Jens Axboe @ 2007-06-12 16:02 ` Andrew Morton 2007-06-12 16:46 ` Andrew Morton 1 sibling, 0 replies; 14+ messages in thread From: Andrew Morton @ 2007-06-12 16:02 UTC (permalink / raw) To: Jens Axboe; +Cc: Peter Zijlstra, linux-kernel On Tue, 12 Jun 2007 14:44:50 +0200 Jens Axboe <jens.axboe@oracle.com> wrote: > On Tue, Jun 12 2007, Peter Zijlstra wrote: > > On Tue, 2007-06-12 at 14:10 +0200, Jens Axboe wrote: > > > On Tue, Jun 12 2007, Peter Zijlstra wrote: > > > > On Tue, 2007-06-12 at 13:31 +0200, Jens Axboe wrote: > > > > > > > > > Would you prefer this change, then? I'd prefer keeping the current code, > > > > > unless it's absolutely critical that we call > > > > > balance_dirty_pages_ratelimited() for each and every page instead of eg > > > > > every 16 pages here. > > > > > > > > For that we should call: > > > > balance_dirty_pages_ratelimited_nr(mapping, nr); > > > > > > > > Which is ok, for small nr. > > > > > > OK, then this should be better: yup, the patch looks fine, thanks. It will in practice be OK calling balance_dirty_pages() once per 16 pages but one should try to be a good little kernel citizen and do the right thing, I guess. > > > diff --git a/fs/splice.c b/fs/splice.c > > > index 25ec9c8..ed40967 100644 > > > --- a/fs/splice.c > > > +++ b/fs/splice.c > > > @@ -844,6 +883,9 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out, > > > sd.file = out; > > > ret = __splice_from_pipe(pipe, &sd, pipe_to_file); > > > if (ret > 0) { > > > + unsigned long nr_pages; > > > + > > > + nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; > > > > perhaps? > > nr_pages = DIV_ROUND_UP(ret, PAGE_CACHE_SIZE); > > > > not sure how horrid that turns out to be; you never know with gcc. > > Well, I think such macros are horribly ugly and that the original code > is MUCH easier to read. I think the macros are pretty foul too (perhaps we should migrate it to "div_round_up()"). But we should migrate to them. Because once one has learned what a particular helper like this does, the code becomes more readable, more understandable and easier to verify correct behaviour. Whereas the open-coded arithmetic is _always_ suspect and always needs to be checked each time one's eye passes over it. And then there's the ongoing pitter-patter of "convert to DIV_ROUND_UP" patches in my inbox ;) It's not, however, our biggest problem. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor 2007-06-12 12:44 ` Jens Axboe 2007-06-12 16:02 ` Andrew Morton @ 2007-06-12 16:46 ` Andrew Morton 2007-06-12 17:32 ` Jens Axboe 1 sibling, 1 reply; 14+ messages in thread From: Andrew Morton @ 2007-06-12 16:46 UTC (permalink / raw) To: Jens Axboe; +Cc: Peter Zijlstra, linux-kernel On Tue, 12 Jun 2007 14:44:50 +0200 Jens Axboe <jens.axboe@oracle.com> wrote: > splice btw, I'm staring in profound mystification at this: int generic_pipe_buf_steal(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { struct page *page = buf->page; if (page_count(page) == 1) { lock_page(page); return 0; } return 1; } afacit that `if page_count(page)' test could be replaced by `if today_is_tuesday()'. But then I don't have the foggiest idea what it's trying to do. It would be nice to get some comments in and around here. Also, I was trying to work out the role and responsibility of the ->pin callback, and gave up. There isn't a lot of point in explaining this over email - one should be able to gain an understanding of these things by reading the code. I think the best way of tackling this would be to comprehensively document pipe_buf_operations and pipe_inode_info, please... ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor 2007-06-12 16:46 ` Andrew Morton @ 2007-06-12 17:32 ` Jens Axboe 2007-06-12 18:15 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2007-06-12 17:32 UTC (permalink / raw) To: Andrew Morton; +Cc: Peter Zijlstra, linux-kernel On Tue, Jun 12 2007, Andrew Morton wrote: > On Tue, 12 Jun 2007 14:44:50 +0200 Jens Axboe <jens.axboe@oracle.com> wrote: > > > splice > > btw, I'm staring in profound mystification at this: > > int generic_pipe_buf_steal(struct pipe_inode_info *pipe, > struct pipe_buffer *buf) > { > struct page *page = buf->page; > > if (page_count(page) == 1) { > lock_page(page); > return 0; > } > > return 1; > } > > > afacit that `if page_count(page)' test could be replaced by > `if today_is_tuesday()'. But then I don't have the foggiest idea > what it's trying to do. > > It would be nice to get some comments in and around here. > > Also, I was trying to work out the role and responsibility of the ->pin > callback, and gave up. > > There isn't a lot of point in explaining this over email - one should be > able to gain an understanding of these things by reading the code. I think > the best way of tackling this would be to comprehensively document > pipe_buf_operations and pipe_inode_info, please... OK so I wont explain it in detail here, I'll write up a good set of comments tonight. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor 2007-06-12 17:32 ` Jens Axboe @ 2007-06-12 18:15 ` Jens Axboe 2007-06-12 18:43 ` Andrew Morton 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2007-06-12 18:15 UTC (permalink / raw) To: Andrew Morton; +Cc: Peter Zijlstra, linux-kernel On Tue, Jun 12 2007, Jens Axboe wrote: > On Tue, Jun 12 2007, Andrew Morton wrote: > > On Tue, 12 Jun 2007 14:44:50 +0200 Jens Axboe <jens.axboe@oracle.com> wrote: > > > > > splice > > > > btw, I'm staring in profound mystification at this: > > > > int generic_pipe_buf_steal(struct pipe_inode_info *pipe, > > struct pipe_buffer *buf) > > { > > struct page *page = buf->page; > > > > if (page_count(page) == 1) { > > lock_page(page); > > return 0; > > } > > > > return 1; > > } > > > > > > afacit that `if page_count(page)' test could be replaced by > > `if today_is_tuesday()'. But then I don't have the foggiest idea > > what it's trying to do. > > > > It would be nice to get some comments in and around here. > > > > Also, I was trying to work out the role and responsibility of the ->pin > > callback, and gave up. > > > > There isn't a lot of point in explaining this over email - one should be > > able to gain an understanding of these things by reading the code. I think > > the best way of tackling this would be to comprehensively document > > pipe_buf_operations and pipe_inode_info, please... > > OK so I wont explain it in detail here, I'll write up a good set of > comments tonight. I'll merge this into the #splice branch. diff --git a/fs/pipe.c b/fs/pipe.c index 3694af1..d007830 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -164,6 +164,20 @@ static void anon_pipe_buf_release(struct pipe_inode_info *pipe, page_cache_release(page); } +/** + * generic_pipe_buf_map - virtually map a pipe buffer + * @pipe: the pipe that the buffer belongs to + * @buf: the buffer that should be mapped + * @atomic: whether to use an atomic map + * + * Description: + * This function returns a kernel virtual address mapping for the + * passed in @pipe_buffer. If @atomic is set, an atomic map is provided + * and the caller has to be careful not to fault before calling + * the unmap function. + * + * Note that this function occupies KM_USER0 if @atomic != 0. + */ void *generic_pipe_buf_map(struct pipe_inode_info *pipe, struct pipe_buffer *buf, int atomic) { @@ -175,6 +189,15 @@ void *generic_pipe_buf_map(struct pipe_inode_info *pipe, return kmap(buf->page); } +/** + * generic_pipe_buf_unmap - unmap a previously mapped pipe buffer + * @pipe: the pipe that the buffer belongs to + * @buf: the buffer that should be unmapped + * @map_data: the data that the mapping function returned + * + * Description: + * This function undoes the mapping that ->map() provided. + */ void generic_pipe_buf_unmap(struct pipe_inode_info *pipe, struct pipe_buffer *buf, void *map_data) { @@ -185,11 +208,28 @@ void generic_pipe_buf_unmap(struct pipe_inode_info *pipe, kunmap(buf->page); } +/** + * generic_pipe_buf_steal - attempt to take ownership of a @pipe_buffer + * @pipe: the pipe that the buffer belongs to + * @buf: the buffer to attempt to steal + * + * Description: + * This function attempts to steal the @struct page attached to + * @buf. If successful, this function returns 0 and returns with + * the page locked. The caller may then reuse the page for whatever + * he wishes, the typical use is insertion into a different file + * page cache. + */ int generic_pipe_buf_steal(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { struct page *page = buf->page; + /* + * A reference of one is golden, that means that the owner of this + * page is the only one holding a reference to it. lock the page + * and return OK. + */ if (page_count(page) == 1) { lock_page(page); return 0; @@ -198,11 +238,30 @@ int generic_pipe_buf_steal(struct pipe_inode_info *pipe, return 1; } -void generic_pipe_buf_get(struct pipe_inode_info *info, struct pipe_buffer *buf) +/** + * generic_pipe_buf_get - get a reference to a @struct pipe_buffer + * @pipe: the pipe that the buffer belongs to + * @buf: the buffer to get a reference to + * + * Description: + * This function grabs an extra reference to @buf. It's used in + * in the tee() system call, when we duplicate the buffers in one + * pipe into another. + */ +void generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { page_cache_get(buf->page); } +/** + * generic_pipe_buf_confirm - verify contents of the pipe buffer + * @pipe: the pipe that the buffer belongs to + * @buf: the buffer to confirm + * + * Description: + * This function does nothing, because the generic pipe code uses + * pages that are always good when inserted into the pipe. + */ int generic_pipe_buf_confirm(struct pipe_inode_info *info, struct pipe_buffer *buf) { diff --git a/fs/splice.c b/fs/splice.c index b125a61..dd3be2c 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -85,6 +85,10 @@ static void page_cache_pipe_buf_release(struct pipe_inode_info *pipe, buf->flags &= ~PIPE_BUF_FLAG_LRU; } +/* + * Check whether the contents of buf is OK to access. Since the content + * is a page cache page, IO may be in flight. + */ static int page_cache_pipe_buf_confirm(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index cc09fe8..333f389 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -17,6 +17,22 @@ struct pipe_buffer { unsigned long private; }; +/** + * struct pipe_inode_info - a linux kernel pipe + * @wait: reader/writer wait point in case of empty/full pipe + * @nrbufs: the number of non-empty pipe buffers in this pipe + * @curbuf: the current pipe buffer entry + * @tmp_page: cached released page + * @readers: number of current readers of this pipe + * @writers: number of current writers of this pipe + * @waiting_writers: number of writers blocked waiting for room + * @r_counter: reader counter + * @w_counter: writer counter + * @fasync_readers: reader side fasync + * @fasync_writers: writer side fasync + * @inode: inode this pipe is attached to + * @bufs: the circular array of pipe buffers + **/ struct pipe_inode_info { wait_queue_head_t wait; unsigned int nrbufs, curbuf; @@ -43,15 +59,65 @@ struct pipe_inode_info { * ->unmap() * * That is, ->map() must be called on a confirmed buffer, - * same goes for ->steal(). + * same goes for ->steal(). See below for the meaning of each + * operation. Also see kerneldoc in fs/pipe.c for the pipe + * and generic variants of these hooks. */ struct pipe_buf_operations { + /* + * This is set to 1, if the generic pipe read/write may coalesce + * data into an existing buffer. If this is set to 0, a new pipe + * page segment is always used for new data. + */ int can_merge; + + /* + * ->map() returns a virtual address mapping of the pipe buffer. + * The last integer flag reflects whether this should be an atomic + * mapping or not. The atomic map is faster, however you can't take + * page faults before calling ->unmap() again. So if you need to eg + * access user data through copy_to/from_user(), then you must get + * a non-atomic map. ->map() uses the KM_USER0 atomic slot for + * atomic maps, so you can't map more than one pipe_buffer at once + * and you have to be careful if mapping another page as source + * or destination for a copy (IOW, it has to use something else + * than KM_USER0). + */ void * (*map)(struct pipe_inode_info *, struct pipe_buffer *, int); + + /* + * Undoes ->map(), finishes the virtual mapping of the pipe buffer. + */ void (*unmap)(struct pipe_inode_info *, struct pipe_buffer *, void *); + + /* + * ->confirm() verifies that the data in the pipe buffer is there + * and that the contents are good. If the pages in the pipe belong + * to a file system, we may need to wait for IO completion in this + * hook. Returns 0 for good, or a negative error value in case of + * error. + */ int (*confirm)(struct pipe_inode_info *, struct pipe_buffer *); + + /* + * When the contents of this pipe buffer has been completely + * consumed by a reader, ->release() is called. + */ void (*release)(struct pipe_inode_info *, struct pipe_buffer *); + + /* + * Attempt to take ownership of the pipe buffer and its contents. + * ->steal() returns 0 for success, in which case the contents + * of the pipe (the buf->page) is locked and now completely owned + * by the caller. The page may then be transferred to a different + * mapping, the most often used case is insertion into different + * file address space cache. + */ int (*steal)(struct pipe_inode_info *, struct pipe_buffer *); + + /* + * Get a reference to the pipe buffer. + */ void (*get)(struct pipe_inode_info *, struct pipe_buffer *); }; -- Jens Axboe ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor 2007-06-12 18:15 ` Jens Axboe @ 2007-06-12 18:43 ` Andrew Morton 2007-06-12 18:48 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Andrew Morton @ 2007-06-12 18:43 UTC (permalink / raw) To: Jens Axboe; +Cc: Peter Zijlstra, linux-kernel On Tue, 12 Jun 2007 20:15:41 +0200 Jens Axboe <jens.axboe@oracle.com> wrote: > On Tue, Jun 12 2007, Jens Axboe wrote: > > On Tue, Jun 12 2007, Andrew Morton wrote: > > > On Tue, 12 Jun 2007 14:44:50 +0200 Jens Axboe <jens.axboe@oracle.com> wrote: > > > > > > > splice > > > > > > btw, I'm staring in profound mystification at this: > > > > > > int generic_pipe_buf_steal(struct pipe_inode_info *pipe, > > > struct pipe_buffer *buf) > > > { > > > struct page *page = buf->page; > > > > > > if (page_count(page) == 1) { > > > lock_page(page); > > > return 0; > > > } > > > > > > return 1; > > > } > > > > > > > > > afacit that `if page_count(page)' test could be replaced by > > > `if today_is_tuesday()'. But then I don't have the foggiest idea > > > what it's trying to do. > > > > > > It would be nice to get some comments in and around here. > > > > > > Also, I was trying to work out the role and responsibility of the ->pin > > > callback, and gave up. > > > > > > There isn't a lot of point in explaining this over email - one should be > > > able to gain an understanding of these things by reading the code. I think > > > the best way of tackling this would be to comprehensively document > > > pipe_buf_operations and pipe_inode_info, please... > > > > OK so I wont explain it in detail here, I'll write up a good set of > > comments tonight. > > I'll merge this into the #splice branch. Great, thanks. > +/** > + * generic_pipe_buf_steal - attempt to take ownership of a @pipe_buffer > + * @pipe: the pipe that the buffer belongs to > + * @buf: the buffer to attempt to steal > + * > + * Description: > + * This function attempts to steal the @struct page attached to > + * @buf. If successful, this function returns 0 and returns with > + * the page locked. The caller may then reuse the page for whatever > + * he wishes, the typical use is insertion into a different file > + * page cache. > + */ > int generic_pipe_buf_steal(struct pipe_inode_info *pipe, > struct pipe_buffer *buf) > { > struct page *page = buf->page; > > + /* > + * A reference of one is golden, that means that the owner of this > + * page is the only one holding a reference to it. lock the page > + * and return OK. > + */ > if (page_count(page) == 1) { > lock_page(page); > return 0; I still don't get this code. I guess I should have asked for pipe_buffer docs too ;) What sorts of pages can find themselves inside a pipe_buffer, and which of these types of pages is the above test detecting? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor 2007-06-12 18:43 ` Andrew Morton @ 2007-06-12 18:48 ` Jens Axboe 0 siblings, 0 replies; 14+ messages in thread From: Jens Axboe @ 2007-06-12 18:48 UTC (permalink / raw) To: Andrew Morton; +Cc: Peter Zijlstra, linux-kernel On Tue, Jun 12 2007, Andrew Morton wrote: > On Tue, 12 Jun 2007 20:15:41 +0200 > Jens Axboe <jens.axboe@oracle.com> wrote: > > > On Tue, Jun 12 2007, Jens Axboe wrote: > > > On Tue, Jun 12 2007, Andrew Morton wrote: > > > > On Tue, 12 Jun 2007 14:44:50 +0200 Jens Axboe <jens.axboe@oracle.com> wrote: > > > > > > > > > splice > > > > > > > > btw, I'm staring in profound mystification at this: > > > > > > > > int generic_pipe_buf_steal(struct pipe_inode_info *pipe, > > > > struct pipe_buffer *buf) > > > > { > > > > struct page *page = buf->page; > > > > > > > > if (page_count(page) == 1) { > > > > lock_page(page); > > > > return 0; > > > > } > > > > > > > > return 1; > > > > } > > > > > > > > > > > > afacit that `if page_count(page)' test could be replaced by > > > > `if today_is_tuesday()'. But then I don't have the foggiest idea > > > > what it's trying to do. > > > > > > > > It would be nice to get some comments in and around here. > > > > > > > > Also, I was trying to work out the role and responsibility of the ->pin > > > > callback, and gave up. > > > > > > > > There isn't a lot of point in explaining this over email - one should be > > > > able to gain an understanding of these things by reading the code. I think > > > > the best way of tackling this would be to comprehensively document > > > > pipe_buf_operations and pipe_inode_info, please... > > > > > > OK so I wont explain it in detail here, I'll write up a good set of > > > comments tonight. > > > > I'll merge this into the #splice branch. > > Great, thanks. > > > +/** > > + * generic_pipe_buf_steal - attempt to take ownership of a @pipe_buffer > > + * @pipe: the pipe that the buffer belongs to > > + * @buf: the buffer to attempt to steal > > + * > > + * Description: > > + * This function attempts to steal the @struct page attached to > > + * @buf. If successful, this function returns 0 and returns with > > + * the page locked. The caller may then reuse the page for whatever > > + * he wishes, the typical use is insertion into a different file > > + * page cache. > > + */ > > int generic_pipe_buf_steal(struct pipe_inode_info *pipe, > > struct pipe_buffer *buf) > > { > > struct page *page = buf->page; > > > > + /* > > + * A reference of one is golden, that means that the owner of this > > + * page is the only one holding a reference to it. lock the page > > + * and return OK. > > + */ > > if (page_count(page) == 1) { > > lock_page(page); > > return 0; > > I still don't get this code. I guess I should have asked for pipe_buffer > docs too ;) I just love commenting stuff, so I'll treat you to a pipe_buffer commentary as well :-) > What sorts of pages can find themselves inside a pipe_buffer, and which of > these types of pages is the above test detecting? Any type of page, really. But for generic_pipe_buf_steal(), it's a page allocated through alloc_page(). The ops must match the buf contents (hence it's placed in the pipe_buffer, not the pipe itself). So if you shoving different pages in there, then your steal function (and others) must reflect that. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2007-06-12 18:51 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200706112159.l5BLxF5x004043@hera.kernel.org>
2007-06-11 23:34 ` splice: move balance_dirty_pages_ratelimited() outside of splice actor Andrew Morton
2007-06-12 6:35 ` Jens Axboe
2007-06-12 11:20 ` Jens Axboe
2007-06-12 11:31 ` Jens Axboe
2007-06-12 12:06 ` Peter Zijlstra
2007-06-12 12:10 ` Jens Axboe
2007-06-12 12:16 ` Peter Zijlstra
2007-06-12 12:44 ` Jens Axboe
2007-06-12 16:02 ` Andrew Morton
2007-06-12 16:46 ` Andrew Morton
2007-06-12 17:32 ` Jens Axboe
2007-06-12 18:15 ` Jens Axboe
2007-06-12 18:43 ` Andrew Morton
2007-06-12 18:48 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).