linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: splice: move balance_dirty_pages_ratelimited() outside of splice actor
       [not found] <200706112159.l5BLxF5x004043@hera.kernel.org>
@ 2007-06-11 23:34 ` Andrew Morton
  2007-06-12  6:35   ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2007-06-11 23:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Mon, 11 Jun 2007 21:59:15 GMT
Linux Kernel Mailing List <linux-kernel@vger.kernel.org> wrote:

> Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=20d698db67059a63d217030dfd02872cb5f88dfb
> Commit:     20d698db67059a63d217030dfd02872cb5f88dfb
> Parent:     17374ff1aa9ce2a0597416a16729474b538af443
> Author:     Jens Axboe <jens.axboe@oracle.com>
> AuthorDate: Tue Jun 5 11:05:11 2007 +0200
> Committer:  Jens Axboe <jens.axboe@oracle.com>
> CommitDate: Fri Jun 8 08:33:59 2007 +0200
> 
>     splice: move balance_dirty_pages_ratelimited() outside of splice actor
>     
>     I've seen inode related deadlocks, so move this call outside of the
>     actor itself, which may hold the inode lock.
>     

eh?  If the pipe_to_file() caller holds inode_lock, our problems are large.

I doubt if that's true, so what problem is this patch really fixing??

> ---
>  fs/splice.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/splice.c b/fs/splice.c
> index b78a7f0..6349d31 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -652,7 +652,6 @@ find_page:
>  	 * accessed, we are now done!
>  	 */
>  	mark_page_accessed(page);
> -	balance_dirty_pages_ratelimited(mapping);
>  out:
>  	page_cache_release(page);
>  	unlock_page(page);
> @@ -823,6 +822,7 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out,
>  			if (err)
>  				ret = err;
>  		}
> +		balance_dirty_pages_ratelimited(mapping);
>  	}
>  
>  	return ret;
> @@ -876,6 +876,7 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
>  			if (err)
>  				ret = err;
>  		}
> +		balance_dirty_pages_ratelimited(mapping);
>  	}
>  

balance_dirty_pages_ratelimited() is supposed to be called
once-per-dirtied-page.  This caller can dirty an arbitrarily large amount
of memory and hence should use balance_dirty_pages_ratelimited_nr().

As things stand, a large splice() could potentially cause the dirty limits
to be exceeded.

btw, can we please arrange to get patches reviewed prior to them being
merged?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: splice: move balance_dirty_pages_ratelimited() outside of  splice actor
  2007-06-11 23:34 ` splice: move balance_dirty_pages_ratelimited() outside of splice actor Andrew Morton
@ 2007-06-12  6:35   ` Jens Axboe
  2007-06-12 11:20     ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2007-06-12  6:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Mon, Jun 11 2007, Andrew Morton wrote:
> On Mon, 11 Jun 2007 21:59:15 GMT
> Linux Kernel Mailing List <linux-kernel@vger.kernel.org> wrote:
> 
> > Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=20d698db67059a63d217030dfd02872cb5f88dfb
> > Commit:     20d698db67059a63d217030dfd02872cb5f88dfb
> > Parent:     17374ff1aa9ce2a0597416a16729474b538af443
> > Author:     Jens Axboe <jens.axboe@oracle.com>
> > AuthorDate: Tue Jun 5 11:05:11 2007 +0200
> > Committer:  Jens Axboe <jens.axboe@oracle.com>
> > CommitDate: Fri Jun 8 08:33:59 2007 +0200
> > 
> >     splice: move balance_dirty_pages_ratelimited() outside of splice actor
> >     
> >     I've seen inode related deadlocks, so move this call outside of the
> >     actor itself, which may hold the inode lock.
> >     
> 
> eh?  If the pipe_to_file() caller holds inode_lock, our problems are large.
> 
> I doubt if that's true, so what problem is this patch really fixing??

I can repeatedly lock up balance_dirty_pages() if it's called inside the
double lock of splice_from_pipe(). I'll fire up the test box and
reproduce, then post the backtraces.

> > ---
> >  fs/splice.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/fs/splice.c b/fs/splice.c
> > index b78a7f0..6349d31 100644
> > --- a/fs/splice.c
> > +++ b/fs/splice.c
> > @@ -652,7 +652,6 @@ find_page:
> >  	 * accessed, we are now done!
> >  	 */
> >  	mark_page_accessed(page);
> > -	balance_dirty_pages_ratelimited(mapping);
> >  out:
> >  	page_cache_release(page);
> >  	unlock_page(page);
> > @@ -823,6 +822,7 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out,
> >  			if (err)
> >  				ret = err;
> >  		}
> > +		balance_dirty_pages_ratelimited(mapping);
> >  	}
> >  
> >  	return ret;
> > @@ -876,6 +876,7 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
> >  			if (err)
> >  				ret = err;
> >  		}
> > +		balance_dirty_pages_ratelimited(mapping);
> >  	}
> >  
> 
> balance_dirty_pages_ratelimited() is supposed to be called
> once-per-dirtied-page.  This caller can dirty an arbitrarily large amount
> of memory and hence should use balance_dirty_pages_ratelimited_nr().
> 
> As things stand, a large splice() could potentially cause the dirty limits
> to be exceeded.

It's mostly (at most) 16 pages, not an arbitrarily large amount. So I
doubt it makes a lot of difference.

> btw, can we please arrange to get patches reviewed prior to them being
> merged?

Sure, I should have posted the series here. Mostly simple stuff though,
and others have seen them.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: splice: move balance_dirty_pages_ratelimited() outside of  splice actor
  2007-06-12  6:35   ` Jens Axboe
@ 2007-06-12 11:20     ` Jens Axboe
  2007-06-12 11:31       ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2007-06-12 11:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Tue, Jun 12 2007, Jens Axboe wrote:
> On Mon, Jun 11 2007, Andrew Morton wrote:
> > On Mon, 11 Jun 2007 21:59:15 GMT
> > Linux Kernel Mailing List <linux-kernel@vger.kernel.org> wrote:
> > 
> > > Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=20d698db67059a63d217030dfd02872cb5f88dfb
> > > Commit:     20d698db67059a63d217030dfd02872cb5f88dfb
> > > Parent:     17374ff1aa9ce2a0597416a16729474b538af443
> > > Author:     Jens Axboe <jens.axboe@oracle.com>
> > > AuthorDate: Tue Jun 5 11:05:11 2007 +0200
> > > Committer:  Jens Axboe <jens.axboe@oracle.com>
> > > CommitDate: Fri Jun 8 08:33:59 2007 +0200
> > > 
> > >     splice: move balance_dirty_pages_ratelimited() outside of splice actor
> > >     
> > >     I've seen inode related deadlocks, so move this call outside of the
> > >     actor itself, which may hold the inode lock.
> > >     
> > 
> > eh?  If the pipe_to_file() caller holds inode_lock, our problems are large.
> > 
> > I doubt if that's true, so what problem is this patch really fixing??
> 
> I can repeatedly lock up balance_dirty_pages() if it's called inside the
> double lock of splice_from_pipe(). I'll fire up the test box and
> reproduce, then post the backtraces.

OK, here it is. Running

$ ./splice-fromnet 9999 | ./splice-out xxx

on one box (receive from network port 9999, store in file xxx) and a
simple

$ cat SUSE-Linux-10.1-GM-DVD-i386.iso | netcat centera 9999

on another box. After ~180mb has been transferred, splice-out gets
stuck. With the balancing moved outside of the splice actor, everything
works perfectly regardless of memory pressure.

splice-fromne S 00000000     0  9829   9778 (NOTLB)
   efe63d4c 00000046 f524cb20 00000000 efe63cfc c0144ae3 00000000 f524cb20
   c03c33fa f1b9ee0c 00000009 f524cb20 02ad51ce 00000024 00002406 f524cc30
   c1cf3980 00000001 efe63d44 00000246 f6692040 00000282 efe63d54 f2f83600
Call Trace:
 [<c01809bd>] pipe_wait+0x6d/0x90
 [<c0198006>] splice_to_pipe+0x196/0x240
 [<c035eb4d>] skb_splice_bits+0xcd/0xe0
 [<c038157b>] tcp_splice_data_recv+0x2b/0x40
 [<c0381501>] tcp_read_sock+0x131/0x180
 [<c0381f55>] tcp_splice_read+0x85/0x1e0
 [<c03572f5>] sock_splice_read+0x25/0x30
 [<c019743e>] do_splice_to+0x5e/0x80
 [<c0198beb>] sys_splice+0x1eb/0x200
 [<c010511e>] sysenter_past_esp+0x5f/0x99
 =======================
splice-out    D EF423C98     0  9830   9778 (NOTLB)
   ef423cac 00000046 00000002 ef423c98 ef423c94 00000000 00000046 f760ba98
   f760ba98 f760b9a0 00000006 f7cba0f0 02ca49b4 00000024 001cf7e6 f7cba200
   c1cf3980 00000001 c0144c73 c1cdaf78 f55c64c0 00000046 c1cdaf78 fffd5d5b
Call Trace:
 [<c03c272d>] io_schedule+0x1d/0x30
 [<c0154979>] sync_page+0x39/0x50
 [<c03c2d20>] __wait_on_bit_lock+0x40/0x70
 [<c0154929>] __lock_page+0x69/0x70
 [<c015a9c5>] write_cache_pages+0x105/0x2f0
 [<c015abd3>] generic_writepages+0x23/0x30
 [<c015ac29>] do_writepages+0x49/0x50
 [<c01960a4>] __writeback_single_inode+0x94/0x370
 [<c0196783>] sync_sb_inodes+0x1b3/0x270
 [<c0196aa8>] writeback_inodes+0xb8/0xf0
 [<c015afed>] balance_dirty_pages_ratelimited_nr+0x9d/0x1b0
 [<c01978b3>] pipe_to_file+0x163/0x260
 [<c0197a0f>] __splice_from_pipe+0x5f/0x250
 [<c0197d18>] splice_from_pipe+0x58/0x80
 [<c0197dc4>] generic_file_splice_write+0x54/0x100
 [<c01974bf>] do_splice_from+0x5f/0x80
 [<c0198bce>] sys_splice+0x1ce/0x200
 [<c010511e>] sysenter_past_esp+0x5f/0x99

There are two pdflushes, doing nothing:

 =======================
pdflush       S F7D8FF90     0   171      2 (L-TLB)
   f7d8ffa4 00000046 00000002 f7d8ff90 f7d8ff8c 00000000 f7c50b60 c055ebac
   f7c51090 00000001 0000000a f7c50b60 6fc07726 00000026 0002552e f7c50c70
   c1ce9980 00000000 c015b750 f7d8ff98 f71d8b80 c0498c8c 00000001 fffd86d8
Call Trace:
 [<c015b7ec>] pdflush+0x9c/0x1c0
 [<c013aa42>] kthread+0x42/0x70
 [<c0105e0b>] kernel_thread_helper+0x7/0x1c
 =======================
pdflush       S F7D91F90     0   172      2 (L-TLB)
   f7d91fa4 00000046 00000002 f7d91f90 f7d91f8c 00000000 f7ca6aa0 c055ebac
   f7ca6fd0 00000001 00000009 f7ca6aa0 01d76a76 00000024 003c4c0a f7ca6bb0
   c1ce9980 00000000 c015b750 f7d91f98 f7255dc0 c0498c8c 00000001 fffd5df9
Call Trace:
 [<c015b7ec>] pdflush+0x9c/0x1c0
 [<c013aa42>] kthread+0x42/0x70
 [<c0105e0b>] kernel_thread_helper+0x7/0x1c

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: splice: move balance_dirty_pages_ratelimited() outside of  splice actor
  2007-06-12 11:20     ` Jens Axboe
@ 2007-06-12 11:31       ` Jens Axboe
  2007-06-12 12:06         ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2007-06-12 11:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Tue, Jun 12 2007, Jens Axboe wrote:
> On Tue, Jun 12 2007, Jens Axboe wrote:
> > On Mon, Jun 11 2007, Andrew Morton wrote:
> > > On Mon, 11 Jun 2007 21:59:15 GMT
> > > Linux Kernel Mailing List <linux-kernel@vger.kernel.org> wrote:
> > > 
> > > > Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=20d698db67059a63d217030dfd02872cb5f88dfb
> > > > Commit:     20d698db67059a63d217030dfd02872cb5f88dfb
> > > > Parent:     17374ff1aa9ce2a0597416a16729474b538af443
> > > > Author:     Jens Axboe <jens.axboe@oracle.com>
> > > > AuthorDate: Tue Jun 5 11:05:11 2007 +0200
> > > > Committer:  Jens Axboe <jens.axboe@oracle.com>
> > > > CommitDate: Fri Jun 8 08:33:59 2007 +0200
> > > > 
> > > >     splice: move balance_dirty_pages_ratelimited() outside of splice actor
> > > >     
> > > >     I've seen inode related deadlocks, so move this call outside of the
> > > >     actor itself, which may hold the inode lock.
> > > >     
> > > 
> > > eh?  If the pipe_to_file() caller holds inode_lock, our problems are large.
> > > 
> > > I doubt if that's true, so what problem is this patch really fixing??
> > 
> > I can repeatedly lock up balance_dirty_pages() if it's called inside the
> > double lock of splice_from_pipe(). I'll fire up the test box and
> > reproduce, then post the backtraces.
> 
> OK, here it is. Running
> 
> $ ./splice-fromnet 9999 | ./splice-out xxx
> 
> on one box (receive from network port 9999, store in file xxx) and a
> simple
> 
> $ cat SUSE-Linux-10.1-GM-DVD-i386.iso | netcat centera 9999
> 
> on another box. After ~180mb has been transferred, splice-out gets
> stuck. With the balancing moved outside of the splice actor, everything
> works perfectly regardless of memory pressure.
> 
> splice-fromne S 00000000     0  9829   9778 (NOTLB)
>    efe63d4c 00000046 f524cb20 00000000 efe63cfc c0144ae3 00000000 f524cb20
>    c03c33fa f1b9ee0c 00000009 f524cb20 02ad51ce 00000024 00002406 f524cc30
>    c1cf3980 00000001 efe63d44 00000246 f6692040 00000282 efe63d54 f2f83600
> Call Trace:
>  [<c01809bd>] pipe_wait+0x6d/0x90
>  [<c0198006>] splice_to_pipe+0x196/0x240
>  [<c035eb4d>] skb_splice_bits+0xcd/0xe0
>  [<c038157b>] tcp_splice_data_recv+0x2b/0x40
>  [<c0381501>] tcp_read_sock+0x131/0x180
>  [<c0381f55>] tcp_splice_read+0x85/0x1e0
>  [<c03572f5>] sock_splice_read+0x25/0x30
>  [<c019743e>] do_splice_to+0x5e/0x80
>  [<c0198beb>] sys_splice+0x1eb/0x200
>  [<c010511e>] sysenter_past_esp+0x5f/0x99
>  =======================
> splice-out    D EF423C98     0  9830   9778 (NOTLB)
>    ef423cac 00000046 00000002 ef423c98 ef423c94 00000000 00000046 f760ba98
>    f760ba98 f760b9a0 00000006 f7cba0f0 02ca49b4 00000024 001cf7e6 f7cba200
>    c1cf3980 00000001 c0144c73 c1cdaf78 f55c64c0 00000046 c1cdaf78 fffd5d5b
> Call Trace:
>  [<c03c272d>] io_schedule+0x1d/0x30
>  [<c0154979>] sync_page+0x39/0x50
>  [<c03c2d20>] __wait_on_bit_lock+0x40/0x70
>  [<c0154929>] __lock_page+0x69/0x70
>  [<c015a9c5>] write_cache_pages+0x105/0x2f0
>  [<c015abd3>] generic_writepages+0x23/0x30
>  [<c015ac29>] do_writepages+0x49/0x50
>  [<c01960a4>] __writeback_single_inode+0x94/0x370
>  [<c0196783>] sync_sb_inodes+0x1b3/0x270
>  [<c0196aa8>] writeback_inodes+0xb8/0xf0
>  [<c015afed>] balance_dirty_pages_ratelimited_nr+0x9d/0x1b0
>  [<c01978b3>] pipe_to_file+0x163/0x260
>  [<c0197a0f>] __splice_from_pipe+0x5f/0x250
>  [<c0197d18>] splice_from_pipe+0x58/0x80
>  [<c0197dc4>] generic_file_splice_write+0x54/0x100
>  [<c01974bf>] do_splice_from+0x5f/0x80
>  [<c0198bce>] sys_splice+0x1ce/0x200
>  [<c010511e>] sysenter_past_esp+0x5f/0x99

Alright, I think the "mystery" is solved - it's not the inode locks
(pheew), it's pipe_to_file() calling balance_dirty_pages() with the page
lock still held. The rest is self-explanatory, we end up blocking on the
page lock.

Would you prefer this change, then? I'd prefer keeping the current code,
unless it's absolutely critical that we call
balance_dirty_pages_ratelimited() for each and every page instead of eg
every 16 pages here.

diff --git a/fs/splice.c b/fs/splice.c
index 25ec9c8..21e6b92 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -660,6 +699,8 @@ find_page:
 out:
 	page_cache_release(page);
 	unlock_page(page);
+	if (ret)
+		balance_dirty_pages_ratelimited(mapping);
 out_ret:
 	return ret;
 }
@@ -857,7 +898,6 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out,
 			if (err)
 				ret = err;
 		}
-		balance_dirty_pages_ratelimited(mapping);
 	}
 
 	return ret;
@@ -913,7 +953,6 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 			if (err)
 				ret = err;
 		}
-		balance_dirty_pages_ratelimited(mapping);
 	}
 
 	return ret;

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: splice: move balance_dirty_pages_ratelimited() outside of  splice actor
  2007-06-12 11:31       ` Jens Axboe
@ 2007-06-12 12:06         ` Peter Zijlstra
  2007-06-12 12:10           ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2007-06-12 12:06 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-kernel

On Tue, 2007-06-12 at 13:31 +0200, Jens Axboe wrote:

> Would you prefer this change, then? I'd prefer keeping the current code,
> unless it's absolutely critical that we call
> balance_dirty_pages_ratelimited() for each and every page instead of eg
> every 16 pages here.

For that we should call:
  balance_dirty_pages_ratelimited_nr(mapping, nr);

Which is ok, for small nr.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: splice: move balance_dirty_pages_ratelimited() outside of   splice actor
  2007-06-12 12:06         ` Peter Zijlstra
@ 2007-06-12 12:10           ` Jens Axboe
  2007-06-12 12:16             ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2007-06-12 12:10 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andrew Morton, linux-kernel

On Tue, Jun 12 2007, Peter Zijlstra wrote:
> On Tue, 2007-06-12 at 13:31 +0200, Jens Axboe wrote:
> 
> > Would you prefer this change, then? I'd prefer keeping the current code,
> > unless it's absolutely critical that we call
> > balance_dirty_pages_ratelimited() for each and every page instead of eg
> > every 16 pages here.
> 
> For that we should call:
>   balance_dirty_pages_ratelimited_nr(mapping, nr);
> 
> Which is ok, for small nr.

OK, then this should be better:

diff --git a/fs/splice.c b/fs/splice.c
index 25ec9c8..ed40967 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -844,6 +883,9 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out,
 	sd.file = out;
 	ret = __splice_from_pipe(pipe, &sd, pipe_to_file);
 	if (ret > 0) {
+		unsigned long nr_pages;
+
+		nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 		*ppos += ret;
 
 		/*
@@ -857,7 +899,7 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out,
 			if (err)
 				ret = err;
 		}
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited_nr(mapping, nr_pages);
 	}
 
 	return ret;
@@ -898,6 +940,9 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 
 	ret = splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_file);
 	if (ret > 0) {
+		unsigned long nr_pages;
+
+		nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 		*ppos += ret;
 
 		/*
@@ -913,7 +958,7 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 			if (err)
 				ret = err;
 		}
-		balance_dirty_pages_ratelimited(mapping);
+		balance_dirty_pages_ratelimited_nr(mapping, nr_pages);
 	}
 
 	return ret;

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: splice: move balance_dirty_pages_ratelimited() outside of   splice actor
  2007-06-12 12:10           ` Jens Axboe
@ 2007-06-12 12:16             ` Peter Zijlstra
  2007-06-12 12:44               ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2007-06-12 12:16 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-kernel

On Tue, 2007-06-12 at 14:10 +0200, Jens Axboe wrote:
> On Tue, Jun 12 2007, Peter Zijlstra wrote:
> > On Tue, 2007-06-12 at 13:31 +0200, Jens Axboe wrote:
> > 
> > > Would you prefer this change, then? I'd prefer keeping the current code,
> > > unless it's absolutely critical that we call
> > > balance_dirty_pages_ratelimited() for each and every page instead of eg
> > > every 16 pages here.
> > 
> > For that we should call:
> >   balance_dirty_pages_ratelimited_nr(mapping, nr);
> > 
> > Which is ok, for small nr.
> 
> OK, then this should be better:
> 
> diff --git a/fs/splice.c b/fs/splice.c
> index 25ec9c8..ed40967 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -844,6 +883,9 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out,
>  	sd.file = out;
>  	ret = __splice_from_pipe(pipe, &sd, pipe_to_file);
>  	if (ret > 0) {
> +		unsigned long nr_pages;
> +
> +		nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;

perhaps?
	nr_pages = DIV_ROUND_UP(ret, PAGE_CACHE_SIZE); 

not sure how horrid that turns out to be; you never know with gcc.

Otherwise, looks good.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: splice: move balance_dirty_pages_ratelimited() outside of    splice actor
  2007-06-12 12:16             ` Peter Zijlstra
@ 2007-06-12 12:44               ` Jens Axboe
  2007-06-12 16:02                 ` Andrew Morton
  2007-06-12 16:46                 ` Andrew Morton
  0 siblings, 2 replies; 14+ messages in thread
From: Jens Axboe @ 2007-06-12 12:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andrew Morton, linux-kernel

On Tue, Jun 12 2007, Peter Zijlstra wrote:
> On Tue, 2007-06-12 at 14:10 +0200, Jens Axboe wrote:
> > On Tue, Jun 12 2007, Peter Zijlstra wrote:
> > > On Tue, 2007-06-12 at 13:31 +0200, Jens Axboe wrote:
> > > 
> > > > Would you prefer this change, then? I'd prefer keeping the current code,
> > > > unless it's absolutely critical that we call
> > > > balance_dirty_pages_ratelimited() for each and every page instead of eg
> > > > every 16 pages here.
> > > 
> > > For that we should call:
> > >   balance_dirty_pages_ratelimited_nr(mapping, nr);
> > > 
> > > Which is ok, for small nr.
> > 
> > OK, then this should be better:
> > 
> > diff --git a/fs/splice.c b/fs/splice.c
> > index 25ec9c8..ed40967 100644
> > --- a/fs/splice.c
> > +++ b/fs/splice.c
> > @@ -844,6 +883,9 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out,
> >  	sd.file = out;
> >  	ret = __splice_from_pipe(pipe, &sd, pipe_to_file);
> >  	if (ret > 0) {
> > +		unsigned long nr_pages;
> > +
> > +		nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> 
> perhaps?
> 	nr_pages = DIV_ROUND_UP(ret, PAGE_CACHE_SIZE); 
> 
> not sure how horrid that turns out to be; you never know with gcc.

Well, I think such macros are horribly ugly and that the original code
is MUCH easier to read.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: splice: move balance_dirty_pages_ratelimited() outside of    splice actor
  2007-06-12 12:44               ` Jens Axboe
@ 2007-06-12 16:02                 ` Andrew Morton
  2007-06-12 16:46                 ` Andrew Morton
  1 sibling, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2007-06-12 16:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Peter Zijlstra, linux-kernel

On Tue, 12 Jun 2007 14:44:50 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:

> On Tue, Jun 12 2007, Peter Zijlstra wrote:
> > On Tue, 2007-06-12 at 14:10 +0200, Jens Axboe wrote:
> > > On Tue, Jun 12 2007, Peter Zijlstra wrote:
> > > > On Tue, 2007-06-12 at 13:31 +0200, Jens Axboe wrote:
> > > > 
> > > > > Would you prefer this change, then? I'd prefer keeping the current code,
> > > > > unless it's absolutely critical that we call
> > > > > balance_dirty_pages_ratelimited() for each and every page instead of eg
> > > > > every 16 pages here.
> > > > 
> > > > For that we should call:
> > > >   balance_dirty_pages_ratelimited_nr(mapping, nr);
> > > > 
> > > > Which is ok, for small nr.
> > > 
> > > OK, then this should be better:

yup, the patch looks fine, thanks.

It will in practice be OK calling balance_dirty_pages() once per 16 pages
but one should try to be a good little kernel citizen and do the right
thing, I guess.

> > > diff --git a/fs/splice.c b/fs/splice.c
> > > index 25ec9c8..ed40967 100644
> > > --- a/fs/splice.c
> > > +++ b/fs/splice.c
> > > @@ -844,6 +883,9 @@ generic_file_splice_write_nolock(struct pipe_inode_info *pipe, struct file *out,
> > >  	sd.file = out;
> > >  	ret = __splice_from_pipe(pipe, &sd, pipe_to_file);
> > >  	if (ret > 0) {
> > > +		unsigned long nr_pages;
> > > +
> > > +		nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> > 
> > perhaps?
> > 	nr_pages = DIV_ROUND_UP(ret, PAGE_CACHE_SIZE); 
> > 
> > not sure how horrid that turns out to be; you never know with gcc.
> 
> Well, I think such macros are horribly ugly and that the original code
> is MUCH easier to read.

I think the macros are pretty foul too (perhaps we should migrate it to
"div_round_up()").

But we should migrate to them.  Because once one has learned what a
particular helper like this does, the code becomes more readable, more
understandable and easier to verify correct behaviour.  Whereas the
open-coded arithmetic is _always_ suspect and always needs to be checked
each time one's eye passes over it.

And then there's the ongoing pitter-patter of "convert to DIV_ROUND_UP"
patches in my inbox ;)

It's not, however, our biggest problem.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: splice: move balance_dirty_pages_ratelimited() outside of    splice actor
  2007-06-12 12:44               ` Jens Axboe
  2007-06-12 16:02                 ` Andrew Morton
@ 2007-06-12 16:46                 ` Andrew Morton
  2007-06-12 17:32                   ` Jens Axboe
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2007-06-12 16:46 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Peter Zijlstra, linux-kernel

On Tue, 12 Jun 2007 14:44:50 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:

> splice

btw, I'm staring in profound mystification at this:

int generic_pipe_buf_steal(struct pipe_inode_info *pipe,
			   struct pipe_buffer *buf)
{
	struct page *page = buf->page;

	if (page_count(page) == 1) {
		lock_page(page);
		return 0;
	}

	return 1;
}


afacit that `if page_count(page)' test could be replaced by
`if today_is_tuesday()'.  But then I don't have the foggiest idea
what it's trying to do.

It would be nice to get some comments in and around here.

Also, I was trying to work out the role and responsibility of the ->pin
callback, and gave up.

There isn't a lot of point in explaining this over email - one should be
able to gain an understanding of these things by reading the code.  I think
the best way of tackling this would be to comprehensively document
pipe_buf_operations and pipe_inode_info, please...



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: splice: move balance_dirty_pages_ratelimited() outside of     splice actor
  2007-06-12 16:46                 ` Andrew Morton
@ 2007-06-12 17:32                   ` Jens Axboe
  2007-06-12 18:15                     ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2007-06-12 17:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Peter Zijlstra, linux-kernel

On Tue, Jun 12 2007, Andrew Morton wrote:
> On Tue, 12 Jun 2007 14:44:50 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > splice
> 
> btw, I'm staring in profound mystification at this:
> 
> int generic_pipe_buf_steal(struct pipe_inode_info *pipe,
> 			   struct pipe_buffer *buf)
> {
> 	struct page *page = buf->page;
> 
> 	if (page_count(page) == 1) {
> 		lock_page(page);
> 		return 0;
> 	}
> 
> 	return 1;
> }
> 
> 
> afacit that `if page_count(page)' test could be replaced by
> `if today_is_tuesday()'.  But then I don't have the foggiest idea
> what it's trying to do.
> 
> It would be nice to get some comments in and around here.
> 
> Also, I was trying to work out the role and responsibility of the ->pin
> callback, and gave up.
> 
> There isn't a lot of point in explaining this over email - one should be
> able to gain an understanding of these things by reading the code.  I think
> the best way of tackling this would be to comprehensively document
> pipe_buf_operations and pipe_inode_info, please...

OK so I wont explain it in detail here, I'll write up a good set of
comments tonight.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: splice: move balance_dirty_pages_ratelimited() outside of     splice actor
  2007-06-12 17:32                   ` Jens Axboe
@ 2007-06-12 18:15                     ` Jens Axboe
  2007-06-12 18:43                       ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2007-06-12 18:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Peter Zijlstra, linux-kernel

On Tue, Jun 12 2007, Jens Axboe wrote:
> On Tue, Jun 12 2007, Andrew Morton wrote:
> > On Tue, 12 Jun 2007 14:44:50 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > > splice
> > 
> > btw, I'm staring in profound mystification at this:
> > 
> > int generic_pipe_buf_steal(struct pipe_inode_info *pipe,
> > 			   struct pipe_buffer *buf)
> > {
> > 	struct page *page = buf->page;
> > 
> > 	if (page_count(page) == 1) {
> > 		lock_page(page);
> > 		return 0;
> > 	}
> > 
> > 	return 1;
> > }
> > 
> > 
> > afacit that `if page_count(page)' test could be replaced by
> > `if today_is_tuesday()'.  But then I don't have the foggiest idea
> > what it's trying to do.
> > 
> > It would be nice to get some comments in and around here.
> > 
> > Also, I was trying to work out the role and responsibility of the ->pin
> > callback, and gave up.
> > 
> > There isn't a lot of point in explaining this over email - one should be
> > able to gain an understanding of these things by reading the code.  I think
> > the best way of tackling this would be to comprehensively document
> > pipe_buf_operations and pipe_inode_info, please...
> 
> OK so I wont explain it in detail here, I'll write up a good set of
> comments tonight.

I'll merge this into the #splice branch.

diff --git a/fs/pipe.c b/fs/pipe.c
index 3694af1..d007830 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -164,6 +164,20 @@ static void anon_pipe_buf_release(struct pipe_inode_info *pipe,
 		page_cache_release(page);
 }
 
+/**
+ * generic_pipe_buf_map - virtually map a pipe buffer
+ * @pipe:	the pipe that the buffer belongs to
+ * @buf:	the buffer that should be mapped
+ * @atomic:	whether to use an atomic map
+ *
+ * Description:
+ *	This function returns a kernel virtual address mapping for the
+ *	passed in @pipe_buffer. If @atomic is set, an atomic map is provided
+ *	and the caller has to be careful not to fault before calling
+ *	the unmap function.
+ *
+ *	Note that this function occupies KM_USER0 if @atomic != 0.
+ */
 void *generic_pipe_buf_map(struct pipe_inode_info *pipe,
 			   struct pipe_buffer *buf, int atomic)
 {
@@ -175,6 +189,15 @@ void *generic_pipe_buf_map(struct pipe_inode_info *pipe,
 	return kmap(buf->page);
 }
 
+/**
+ * generic_pipe_buf_unmap - unmap a previously mapped pipe buffer
+ * @pipe:	the pipe that the buffer belongs to
+ * @buf:	the buffer that should be unmapped
+ * @map_data:	the data that the mapping function returned
+ *
+ * Description:
+ *	This function undoes the mapping that ->map() provided.
+ */
 void generic_pipe_buf_unmap(struct pipe_inode_info *pipe,
 			    struct pipe_buffer *buf, void *map_data)
 {
@@ -185,11 +208,28 @@ void generic_pipe_buf_unmap(struct pipe_inode_info *pipe,
 		kunmap(buf->page);
 }
 
+/**
+ * generic_pipe_buf_steal - attempt to take ownership of a @pipe_buffer
+ * @pipe:	the pipe that the buffer belongs to
+ * @buf:	the buffer to attempt to steal
+ *
+ * Description:
+ *	This function attempts to steal the @struct page attached to
+ *	@buf. If successful, this function returns 0 and returns with
+ *	the page locked. The caller may then reuse the page for whatever
+ *	he wishes, the typical use is insertion into a different file
+ *	page cache.
+ */
 int generic_pipe_buf_steal(struct pipe_inode_info *pipe,
 			   struct pipe_buffer *buf)
 {
 	struct page *page = buf->page;
 
+	/*
+	 * A reference of one is golden, that means that the owner of this
+	 * page is the only one holding a reference to it. lock the page
+	 * and return OK.
+	 */
 	if (page_count(page) == 1) {
 		lock_page(page);
 		return 0;
@@ -198,11 +238,30 @@ int generic_pipe_buf_steal(struct pipe_inode_info *pipe,
 	return 1;
 }
 
-void generic_pipe_buf_get(struct pipe_inode_info *info, struct pipe_buffer *buf)
+/**
+ * generic_pipe_buf_get - get a reference to a @struct pipe_buffer
+ * @pipe:	the pipe that the buffer belongs to
+ * @buf:	the buffer to get a reference to
+ *
+ * Description:
+ *	This function grabs an extra reference to @buf. It's used in
+ *	in the tee() system call, when we duplicate the buffers in one
+ *	pipe into another.
+ */
+void generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf)
 {
 	page_cache_get(buf->page);
 }
 
+/**
+ * generic_pipe_buf_confirm - verify contents of the pipe buffer
+ * @pipe:	the pipe that the buffer belongs to
+ * @buf:	the buffer to confirm
+ *
+ * Description:
+ *	This function does nothing, because the generic pipe code uses
+ *	pages that are always good when inserted into the pipe.
+ */
 int generic_pipe_buf_confirm(struct pipe_inode_info *info,
 			     struct pipe_buffer *buf)
 {
diff --git a/fs/splice.c b/fs/splice.c
index b125a61..dd3be2c 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -85,6 +85,10 @@ static void page_cache_pipe_buf_release(struct pipe_inode_info *pipe,
 	buf->flags &= ~PIPE_BUF_FLAG_LRU;
 }
 
+/*
+ * Check whether the contents of buf is OK to access. Since the content
+ * is a page cache page, IO may be in flight.
+ */
 static int page_cache_pipe_buf_confirm(struct pipe_inode_info *pipe,
 				       struct pipe_buffer *buf)
 {
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index cc09fe8..333f389 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -17,6 +17,22 @@ struct pipe_buffer {
 	unsigned long private;
 };
 
+/**
+ *	struct pipe_inode_info - a linux kernel pipe
+ *	@wait: reader/writer wait point in case of empty/full pipe
+ *	@nrbufs: the number of non-empty pipe buffers in this pipe
+ *	@curbuf: the current pipe buffer entry
+ *	@tmp_page: cached released page
+ *	@readers: number of current readers of this pipe
+ *	@writers: number of current writers of this pipe
+ *	@waiting_writers: number of writers blocked waiting for room
+ *	@r_counter: reader counter
+ *	@w_counter: writer counter
+ *	@fasync_readers: reader side fasync
+ *	@fasync_writers: writer side fasync
+ *	@inode: inode this pipe is attached to
+ *	@bufs: the circular array of pipe buffers
+ **/
 struct pipe_inode_info {
 	wait_queue_head_t wait;
 	unsigned int nrbufs, curbuf;
@@ -43,15 +59,65 @@ struct pipe_inode_info {
  *	->unmap()
  *
  * That is, ->map() must be called on a confirmed buffer,
- * same goes for ->steal().
+ * same goes for ->steal(). See below for the meaning of each
+ * operation. Also see kerneldoc in fs/pipe.c for the pipe
+ * and generic variants of these hooks.
  */
 struct pipe_buf_operations {
+	/*
+	 * This is set to 1, if the generic pipe read/write may coalesce
+	 * data into an existing buffer. If this is set to 0, a new pipe
+	 * page segment is always used for new data.
+	 */
 	int can_merge;
+
+	/*
+	 * ->map() returns a virtual address mapping of the pipe buffer.
+	 * The last integer flag reflects whether this should be an atomic
+	 * mapping or not. The atomic map is faster, however you can't take
+	 * page faults before calling ->unmap() again. So if you need to eg
+	 * access user data through copy_to/from_user(), then you must get
+	 * a non-atomic map. ->map() uses the KM_USER0 atomic slot for
+	 * atomic maps, so you can't map more than one pipe_buffer at once
+	 * and you have to be careful if mapping another page as source
+	 * or destination for a copy (IOW, it has to use something else
+	 * than KM_USER0).
+	 */
 	void * (*map)(struct pipe_inode_info *, struct pipe_buffer *, int);
+
+	/*
+	 * Undoes ->map(), finishes the virtual mapping of the pipe buffer.
+	 */
 	void (*unmap)(struct pipe_inode_info *, struct pipe_buffer *, void *);
+
+	/*
+	 * ->confirm() verifies that the data in the pipe buffer is there
+	 * and that the contents are good. If the pages in the pipe belong
+	 * to a file system, we may need to wait for IO completion in this
+	 * hook. Returns 0 for good, or a negative error value in case of
+	 * error.
+	 */
 	int (*confirm)(struct pipe_inode_info *, struct pipe_buffer *);
+
+	/*
+	 * When the contents of this pipe buffer has been completely
+	 * consumed by a reader, ->release() is called.
+	 */
 	void (*release)(struct pipe_inode_info *, struct pipe_buffer *);
+
+	/*
+	 * Attempt to take ownership of the pipe buffer and its contents.
+	 * ->steal() returns 0 for success, in which case the contents
+	 * of the pipe (the buf->page) is locked and now completely owned
+	 * by the caller. The page may then be transferred to a different
+	 * mapping, the most often used case is insertion into different
+	 * file address space cache.
+	 */
 	int (*steal)(struct pipe_inode_info *, struct pipe_buffer *);
+
+	/*
+	 * Get a reference to the pipe buffer.
+	 */
 	void (*get)(struct pipe_inode_info *, struct pipe_buffer *);
 };
 

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: splice: move balance_dirty_pages_ratelimited() outside of     splice actor
  2007-06-12 18:15                     ` Jens Axboe
@ 2007-06-12 18:43                       ` Andrew Morton
  2007-06-12 18:48                         ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2007-06-12 18:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Peter Zijlstra, linux-kernel

On Tue, 12 Jun 2007 20:15:41 +0200
Jens Axboe <jens.axboe@oracle.com> wrote:

> On Tue, Jun 12 2007, Jens Axboe wrote:
> > On Tue, Jun 12 2007, Andrew Morton wrote:
> > > On Tue, 12 Jun 2007 14:44:50 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > > 
> > > > splice
> > > 
> > > btw, I'm staring in profound mystification at this:
> > > 
> > > int generic_pipe_buf_steal(struct pipe_inode_info *pipe,
> > > 			   struct pipe_buffer *buf)
> > > {
> > > 	struct page *page = buf->page;
> > > 
> > > 	if (page_count(page) == 1) {
> > > 		lock_page(page);
> > > 		return 0;
> > > 	}
> > > 
> > > 	return 1;
> > > }
> > > 
> > > 
> > > afacit that `if page_count(page)' test could be replaced by
> > > `if today_is_tuesday()'.  But then I don't have the foggiest idea
> > > what it's trying to do.
> > > 
> > > It would be nice to get some comments in and around here.
> > > 
> > > Also, I was trying to work out the role and responsibility of the ->pin
> > > callback, and gave up.
> > > 
> > > There isn't a lot of point in explaining this over email - one should be
> > > able to gain an understanding of these things by reading the code.  I think
> > > the best way of tackling this would be to comprehensively document
> > > pipe_buf_operations and pipe_inode_info, please...
> > 
> > OK so I wont explain it in detail here, I'll write up a good set of
> > comments tonight.
> 
> I'll merge this into the #splice branch.

Great, thanks.

> +/**
> + * generic_pipe_buf_steal - attempt to take ownership of a @pipe_buffer
> + * @pipe:	the pipe that the buffer belongs to
> + * @buf:	the buffer to attempt to steal
> + *
> + * Description:
> + *	This function attempts to steal the @struct page attached to
> + *	@buf. If successful, this function returns 0 and returns with
> + *	the page locked. The caller may then reuse the page for whatever
> + *	he wishes, the typical use is insertion into a different file
> + *	page cache.
> + */
>  int generic_pipe_buf_steal(struct pipe_inode_info *pipe,
>  			   struct pipe_buffer *buf)
>  {
>  	struct page *page = buf->page;
>  
> +	/*
> +	 * A reference of one is golden, that means that the owner of this
> +	 * page is the only one holding a reference to it. lock the page
> +	 * and return OK.
> +	 */
>  	if (page_count(page) == 1) {
>  		lock_page(page);
>  		return 0;

I still don't get this code.  I guess I should have asked for pipe_buffer
docs too ;)

What sorts of pages can find themselves inside a pipe_buffer, and which of
these types of pages is the above test detecting?



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: splice: move balance_dirty_pages_ratelimited() outside of      splice actor
  2007-06-12 18:43                       ` Andrew Morton
@ 2007-06-12 18:48                         ` Jens Axboe
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2007-06-12 18:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Peter Zijlstra, linux-kernel

On Tue, Jun 12 2007, Andrew Morton wrote:
> On Tue, 12 Jun 2007 20:15:41 +0200
> Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Tue, Jun 12 2007, Jens Axboe wrote:
> > > On Tue, Jun 12 2007, Andrew Morton wrote:
> > > > On Tue, 12 Jun 2007 14:44:50 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > 
> > > > > splice
> > > > 
> > > > btw, I'm staring in profound mystification at this:
> > > > 
> > > > int generic_pipe_buf_steal(struct pipe_inode_info *pipe,
> > > > 			   struct pipe_buffer *buf)
> > > > {
> > > > 	struct page *page = buf->page;
> > > > 
> > > > 	if (page_count(page) == 1) {
> > > > 		lock_page(page);
> > > > 		return 0;
> > > > 	}
> > > > 
> > > > 	return 1;
> > > > }
> > > > 
> > > > 
> > > > afacit that `if page_count(page)' test could be replaced by
> > > > `if today_is_tuesday()'.  But then I don't have the foggiest idea
> > > > what it's trying to do.
> > > > 
> > > > It would be nice to get some comments in and around here.
> > > > 
> > > > Also, I was trying to work out the role and responsibility of the ->pin
> > > > callback, and gave up.
> > > > 
> > > > There isn't a lot of point in explaining this over email - one should be
> > > > able to gain an understanding of these things by reading the code.  I think
> > > > the best way of tackling this would be to comprehensively document
> > > > pipe_buf_operations and pipe_inode_info, please...
> > > 
> > > OK so I wont explain it in detail here, I'll write up a good set of
> > > comments tonight.
> > 
> > I'll merge this into the #splice branch.
> 
> Great, thanks.
> 
> > +/**
> > + * generic_pipe_buf_steal - attempt to take ownership of a @pipe_buffer
> > + * @pipe:	the pipe that the buffer belongs to
> > + * @buf:	the buffer to attempt to steal
> > + *
> > + * Description:
> > + *	This function attempts to steal the @struct page attached to
> > + *	@buf. If successful, this function returns 0 and returns with
> > + *	the page locked. The caller may then reuse the page for whatever
> > + *	he wishes, the typical use is insertion into a different file
> > + *	page cache.
> > + */
> >  int generic_pipe_buf_steal(struct pipe_inode_info *pipe,
> >  			   struct pipe_buffer *buf)
> >  {
> >  	struct page *page = buf->page;
> >  
> > +	/*
> > +	 * A reference of one is golden, that means that the owner of this
> > +	 * page is the only one holding a reference to it. lock the page
> > +	 * and return OK.
> > +	 */
> >  	if (page_count(page) == 1) {
> >  		lock_page(page);
> >  		return 0;
> 
> I still don't get this code.  I guess I should have asked for pipe_buffer
> docs too ;)

I just love commenting stuff, so I'll treat you to a pipe_buffer
commentary as well :-)

> What sorts of pages can find themselves inside a pipe_buffer, and which of
> these types of pages is the above test detecting?

Any type of page, really. But for generic_pipe_buf_steal(), it's a page
allocated through alloc_page(). The ops must match the buf contents
(hence it's placed in the pipe_buffer, not the pipe itself). So if you
shoving different pages in there, then your steal function (and others)
must reflect that.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2007-06-12 18:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <200706112159.l5BLxF5x004043@hera.kernel.org>
2007-06-11 23:34 ` splice: move balance_dirty_pages_ratelimited() outside of splice actor Andrew Morton
2007-06-12  6:35   ` Jens Axboe
2007-06-12 11:20     ` Jens Axboe
2007-06-12 11:31       ` Jens Axboe
2007-06-12 12:06         ` Peter Zijlstra
2007-06-12 12:10           ` Jens Axboe
2007-06-12 12:16             ` Peter Zijlstra
2007-06-12 12:44               ` Jens Axboe
2007-06-12 16:02                 ` Andrew Morton
2007-06-12 16:46                 ` Andrew Morton
2007-06-12 17:32                   ` Jens Axboe
2007-06-12 18:15                     ` Jens Axboe
2007-06-12 18:43                       ` Andrew Morton
2007-06-12 18:48                         ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).