Re: [RFC][PATCH] Interface to invalidate regions of mmaps

public inbox for linux-kernel@vger.kernel.org

* Re: [RFC][PATCH] Interface to invalidate regions of mmaps
       [not found] <20030513133636.C2929@us.ibm.com>
@ 2003-05-13 22:00 ` William Lee Irwin III
  2003-05-13 22:21 ` Andrew Morton
  1 sibling, 0 replies; 7+ messages in thread
From: William Lee Irwin III @ 2003-05-13 22:00 UTC
  To: Paul E. McKenney; +Cc: linux-kernel, linux-mm, akpm, mjbligh

On Tue, May 13, 2003 at 01:36:36PM -0700, Paul E. McKenney wrote:
> This patch adds an API to allow networked and distributed filesystems
> to invalidate portions of (or all of) a file.  This is needed to 
> provide POSIX or near-POSIX semantics in such filesystems, as
> discussed on LKML late last year:
> 	http://marc.theaimsgroup.com/?l=linux-kernel&m=103609089604576&w=2
> 	http://marc.theaimsgroup.com/?l=linux-kernel&m=103167761917669&w=2

It looks possible to consolidate this with the internals of vmtruncate()
by passing in the maximum value representable by loff_t as the length.
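
A minimal userspace sketch of the idea above, not code from the patch: if
a byte-range invalidation saturates at the largest loff_t value, then
truncation at `offset` is just the range (offset, maximum length). The
names `SKETCH_PAGE_SHIFT`, `first_whole_page()`, `range_last_byte()` and
`last_page()` are hypothetical, 4 KiB pages are assumed, and `long long`
stands in for loff_t.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1LL << SKETCH_PAGE_SHIFT)

/* First page lying wholly at or beyond byte `offset` (rounds up). */
unsigned long long first_whole_page(long long offset)
{
	assert(offset >= 0 && offset <= LLONG_MAX - (SKETCH_PAGE_SIZE - 1));
	return (unsigned long long)((offset + SKETCH_PAGE_SIZE - 1)
				    >> SKETCH_PAGE_SHIFT);
}

/* Last byte of [offset, offset + len), saturating instead of overflowing. */
long long range_last_byte(long long offset, long long len)
{
	assert(offset >= 0 && len > 0);
	if (len - 1 > LLONG_MAX - offset)
		return LLONG_MAX;
	return offset + len - 1;
}

/* Last page touched by [offset, offset + len). */
unsigned long long last_page(long long offset, long long len)
{
	return (unsigned long long)(range_last_byte(offset, len)
				    >> SKETCH_PAGE_SHIFT);
}
```

With these helpers, a truncate to `offset` would cover pages
first_whole_page(offset) through last_page(offset, LLONG_MAX), so no
separate "to end of file" code path is needed.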


-- wli


* Re: [RFC][PATCH] Interface to invalidate regions of mmaps
       [not found] <20030513133636.C2929@us.ibm.com>
  2003-05-13 22:00 ` [RFC][PATCH] Interface to invalidate regions of mmaps William Lee Irwin III
@ 2003-05-13 22:21 ` Andrew Morton
  2003-05-13 22:43   ` Paul E. McKenney
  2003-05-13 23:11   ` Zach Brown
  1 sibling, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2003-05-13 22:21 UTC
  To: paulmck; +Cc: linux-kernel, linux-mm, mjbligh

"Paul E. McKenney" <paulmck@us.ibm.com> wrote:
>
> This patch adds an API to allow networked and distributed filesystems
> to invalidate portions of (or all of) a file.  This is needed to 
> provide POSIX or near-POSIX semantics in such filesystems, as
> discussed on LKML late last year:
> 
> 	http://marc.theaimsgroup.com/?l=linux-kernel&m=103609089604576&w=2
> 	http://marc.theaimsgroup.com/?l=linux-kernel&m=103167761917669&w=2
> 
> Thoughts?

What filesystems would be needing this, and when could we see live code
which actually uses it?

> +/*
> + * Helper function for invalidate_mmap_range().
> + * Both hba and hlen are page numbers in PAGE_SIZE units.
> + */
> +static void 
> +invalidate_mmap_range_list(struct list_head *head,
> +			   unsigned long const hba,
> +			   unsigned long const hlen)

Be nice to consolidate this with vmtruncate_list, so that it gets
exercised.



* Re: [RFC][PATCH] Interface to invalidate regions of mmaps
  2003-05-13 22:21 ` Andrew Morton
@ 2003-05-13 22:43   ` Paul E. McKenney
  2003-05-13 23:11   ` Zach Brown
  1 sibling, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2003-05-13 22:43 UTC
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, mjbligh

On Tue, May 13, 2003 at 03:21:41PM -0700, Andrew Morton wrote:
> "Paul E. McKenney" <paulmck@us.ibm.com> wrote:
> >
> > This patch adds an API to allow networked and distributed filesystems
> > to invalidate portions of (or all of) a file.  This is needed to 
> > provide POSIX or near-POSIX semantics in such filesystems, as
> > discussed on LKML late last year:
> > 
> > 	http://marc.theaimsgroup.com/?l=linux-kernel&m=103609089604576&w=2
> > 	http://marc.theaimsgroup.com/?l=linux-kernel&m=103167761917669&w=2
> > 
> > Thoughts?
> 
> What filesystems would be needing this, and when could we see live code
> which actually uses it?

Working on getting it out...  But I suspect that others need
this functionality as well, given the threads noted above.

> > +/*
> > + * Helper function for invalidate_mmap_range().
> > + * Both hba and hlen are page numbers in PAGE_SIZE units.
> > + */
> > +static void 
> > +invalidate_mmap_range_list(struct list_head *head,
> > +			   unsigned long const hba,
> > +			   unsigned long const hlen)
> 
> Be nice to consolidate this with vmtruncate_list, so that it gets
> exercised.

Good point from both you and wli -- here is the updated vmtruncate
patch (now depends on the invalidate_mmap_range patch).

						Thanx, Paul

diff -urN -X dontdiff linux-2.5.69.invalidate_mmap_range/mm/memory.c linux-2.5.69.vmtruncate/mm/memory.c
--- linux-2.5.69.invalidate_mmap_range/mm/memory.c	Tue May 13 14:56:41 2003
+++ linux-2.5.69.vmtruncate/mm/memory.c	Tue May 13 15:19:23 2003
@@ -1063,6 +1063,7 @@
 /*
  * Helper function for invalidate_mmap_range().
  * Both hba and hlen are page numbers in PAGE_SIZE units.
+ * An hlen of zero blows away the entire portion of the file after hba.
  */
 static void 
 invalidate_mmap_range_list(struct list_head *head,
@@ -1078,6 +1079,8 @@
 	unsigned long zea;
 
 	hea = hba + hlen - 1;	/* avoid overflow. */
+	if (hea < hba)
+		hea = ULONG_MAX;
 	list_for_each(curr, head) {
 		vp = list_entry(curr, struct vm_area_struct, shared);
 		vba = vp->vm_pgoff;
@@ -1128,37 +1131,6 @@
 	up(&mapping->i_shared_sem);
 }       
 
-static void vmtruncate_list(struct list_head *head, unsigned long pgoff)
-{
-	unsigned long start, end, len, diff;
-	struct vm_area_struct *vma;
-	struct list_head *curr;
-
-	list_for_each(curr, head) {
-		vma = list_entry(curr, struct vm_area_struct, shared);
-		start = vma->vm_start;
-		end = vma->vm_end;
-		len = end - start;
-
-		/* mapping wholly truncated? */
-		if (vma->vm_pgoff >= pgoff) {
-			zap_page_range(vma, start, len);
-			continue;
-		}
-
-		/* mapping wholly unaffected? */
-		len = len >> PAGE_SHIFT;
-		diff = pgoff - vma->vm_pgoff;
-		if (diff >= len)
-			continue;
-
-		/* Ok, partially affected.. */
-		start += diff << PAGE_SHIFT;
-		len = (len - diff) << PAGE_SHIFT;
-		zap_page_range(vma, start, len);
-	}
-}
-
 /*
  * Handle all mappings that got truncated by a "truncate()"
  * system call.
@@ -1176,17 +1148,12 @@
 	if (inode->i_size < offset)
 		goto do_expand;
 	inode->i_size = offset;
+	pgoff = (offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	down(&mapping->i_shared_sem);
-	if (list_empty(&mapping->i_mmap) && list_empty(&mapping->i_mmap_shared))
-		goto out_unlock;
-
-	pgoff = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-	if (!list_empty(&mapping->i_mmap))
-		vmtruncate_list(&mapping->i_mmap, pgoff);
-	if (!list_empty(&mapping->i_mmap_shared))
-		vmtruncate_list(&mapping->i_mmap_shared, pgoff);
-
-out_unlock:
+	if (unlikely(!list_empty(&mapping->i_mmap)))
+		invalidate_mmap_range_list(&mapping->i_mmap, pgoff, 0);
+	if (unlikely(!list_empty(&mapping->i_mmap_shared)))
+		invalidate_mmap_range_list(&mapping->i_mmap_shared, pgoff, 0);
 	up(&mapping->i_shared_sem);
 	truncate_inode_pages(mapping, offset);
 	goto out_truncate;
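
The end-of-hole clamp added in the hunk above can be tried out in
userspace. The sketch below reuses the patch's hba/hlen/hea naming, but
the functions `hole_end()` and `hole_overlap()` are hypothetical, and the
intersection step is an assumption about how a mapping's page range would
be compared against the hole; the full loop body is not shown in this
thread.

```c
#include <assert.h>
#include <limits.h>

/*
 * Last page (inclusive) of a hole that starts at page hba and is hlen
 * pages long.  hlen == 0 means "everything from hba onward": the
 * subtraction wraps below hba (or straight to ULONG_MAX when hba == 0),
 * and the clamp turns that, and any genuine overflow, into ULONG_MAX.
 */
unsigned long hole_end(unsigned long hba, unsigned long hlen)
{
	unsigned long hea = hba + hlen - 1;

	if (hea < hba)
		hea = ULONG_MAX;
	return hea;
}

/*
 * Intersect the hole with a mapping that covers vlen pages starting at
 * file page vba.  Returns 1 and stores the inclusive overlap in
 * *zba..*zea, or returns 0 if the two ranges do not overlap.
 */
int hole_overlap(unsigned long hba, unsigned long hlen,
		 unsigned long vba, unsigned long vlen,
		 unsigned long *zba, unsigned long *zea)
{
	unsigned long hea = hole_end(hba, hlen);
	unsigned long vea;

	assert(vlen > 0);
	vea = vba + vlen - 1;
	if (vea < vba)			/* mapping range wrapped: clamp too */
		vea = ULONG_MAX;
	if (hea < vba || vea < hba)
		return 0;
	*zba = hba > vba ? hba : vba;
	*zea = hea < vea ? hea : vea;
	return 1;
}
```

For instance, a truncate to page 4 (hba = 4, hlen = 0) against a mapping
of file pages 0..7 would yield the overlap 4..7, while a truncate to page
8 would leave that mapping untouched.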


* Re: [RFC][PATCH] Interface to invalidate regions of mmaps
  2003-05-13 22:21 ` Andrew Morton
  2003-05-13 22:43   ` Paul E. McKenney
@ 2003-05-13 23:11   ` Zach Brown
  2003-05-13 23:19     ` Andrew Morton
  2003-05-13 23:26     ` William Lee Irwin III
  1 sibling, 2 replies; 7+ messages in thread
From: Zach Brown @ 2003-05-13 23:11 UTC
  To: Andrew Morton; +Cc: paulmck, linux-kernel, linux-mm, mjbligh

Andrew Morton wrote:

> What filesystems would be needing this, and when could we see live code
> which actually uses it?

on the one hand, lustre would very much like something like this.  our
posix IO guarantees are centered around a DLM that knows about file
extents and the presence of pages in the page cache is tied to holding
these locks.  it's very common for us to get a lock cancelation which
invalidates a region of a file that falls in the middle of what is cached.

worse still, our (possibly gi-normous) files are backed by striping the
file across multiple storage targets and the locks live on these
targets.  if you imagine a file that is built by alternating 64k-wide
stripes across 4 targets, we can get a lock cancelation that invalidates
pages at offset 0->15, 64->79, 128->143, and so on.
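
The page ranges in that example follow from simple arithmetic. Here is a
small sketch, assuming 4 KiB pages so that a 64k stripe is 16 pages; the
struct and function names are hypothetical and this is not lustre code.

```c
#include <assert.h>

/* Inclusive range of page offsets within the file. */
struct page_range {
	unsigned long first;
	unsigned long last;
};

/*
 * Page range of the k-th stripe (k = 0, 1, 2, ...) held by `target`
 * when a file is striped round-robin across `ntargets` targets in
 * stripes of `stripe_pages` pages each.
 */
struct page_range stripe_range(unsigned long stripe_pages,
			       unsigned long ntargets,
			       unsigned long target,
			       unsigned long k)
{
	struct page_range r;

	assert(stripe_pages > 0 && target < ntargets);
	r.first = (k * ntargets + target) * stripe_pages;
	r.last = r.first + stripe_pages - 1;
	return r;
}
```

With stripe_pages = 16 and ntargets = 4, target 0 yields 0..15, 64..79 and
128..143 for k = 0, 1, 2, so a single lock cancelation on one target
touches many disjoint page ranges of the file.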

so what we'd like most is the ability to invalidate a region of the file
in an efficient go.

void truncate_inode_pages(struct address_space * mapping, loff_t lstart,
loff_t end)

that sort of thing.  this might not suck so bad if the page cache was an
rbtree :)   in any case, what we've been doing so far is tracking dirty
page offsets in our own rbtree thing in lustre and calling
truncate_complete_page for these offsets as locks are canceled.  (our
locks are page-aligned, so we don't worry so much about partial page
pain in these particular paths).

but on the other hand, this doesn't solve another problem we have with
opportunistic lock extents and sparse page cache populations.  Ideally
we'd like a FS specific pointer in struct page so we can associate pages
in the cache with a lock, but I can't imagine suggesting such a thing
within earshot of wli.  so we'd still have to track the dirty offsets to
avoid having to pass through offsets 0 ... i_size only to find that one
page in the 8T file that was cached.

	https://lxr.lustre.org/source/llite/file.c?v=b_devel#602

is the most relevant part of the story.

- z


* Re: [RFC][PATCH] Interface to invalidate regions of mmaps
  2003-05-13 23:11   ` Zach Brown
@ 2003-05-13 23:19     ` Andrew Morton
  2003-05-13 23:57       ` Zach Brown
  2003-05-13 23:26     ` William Lee Irwin III
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2003-05-13 23:19 UTC
  To: Zach Brown; +Cc: paulmck, linux-kernel, linux-mm, mjbligh

Zach Brown <zab@zabbo.net> wrote:
>
> so what we'd like most is the ability to invalidate a region of the file
> in an efficient go.
> 
> void truncate_inode_pages(struct address_space * mapping, loff_t lstart,
> loff_t end)
> 
> that sort of thing.

That's trivial in 2.5.

>  this might not suck so bad if the page cache was an
> rbtree :)

Or a radix tree.

> but on the other hand, this doesn't solve another problem we have with
> opportunistic lock extents and sparse page cache populations.  Ideally
> we'd like a FS specific pointer in struct page so we can associate pages
> in the cache with a lock,

In 2.5, page->buffers was abstracted out to page->private, and is available
to filesystems for functions such as this.
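
As a rough userspace model only: the structs below are hypothetical
stand-ins and not the kernel's struct page. A filesystem could stash a
pointer to its lock in an unsigned long private word and recover it
later, which is the kind of page-to-lock association asked for above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extent lock covering an inclusive range of pages. */
struct sketch_lock {
	unsigned long first_page;
	unsigned long last_page;
};

/* Hypothetical stand-in for a cached page with a private word. */
struct sketch_page {
	unsigned long index;	/* page offset within the file */
	unsigned long private;	/* filesystem-owned; 0 when unused */
};

/*
 * Tie a cached page to the lock that covers it.  The pointer is stored
 * in an unsigned long, which assumes a pointer fits in that type.
 */
void sketch_page_attach(struct sketch_page *pg, struct sketch_lock *lk)
{
	assert(pg->index >= lk->first_page && pg->index <= lk->last_page);
	pg->private = (unsigned long)lk;
}

/* Recover the lock attached to a page, or NULL if none is attached. */
struct sketch_lock *sketch_page_lock(const struct sketch_page *pg)
{
	return (struct sketch_lock *)pg->private;
}
```

Whether this fits a given filesystem depends on what else already uses
the private word for that page; the model ignores that entirely.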


> but I can't imagine suggesting such a thing
> within earshot of wli. 

wli doesn't have to run your kernel.  If you want to add a pointer to the
pageframe, go add it.  But I'd suggest that you do it with a view to
migrating it to page->private.

When you finally decide to do your development in a development kernel ;)




* Re: [RFC][PATCH] Interface to invalidate regions of mmaps
  2003-05-13 23:19     ` Andrew Morton
@ 2003-05-13 23:57       ` Zach Brown
  0 siblings, 0 replies; 7+ messages in thread
From: Zach Brown @ 2003-05-13 23:57 UTC
  To: Andrew Morton; +Cc: paulmck, linux-kernel, linux-mm, mjbligh


> In 2.5, page->buffers was abstracted out to page->private, and is available
> to filesystems for functions such as this.

that's great news!

> When you finally decide to do your development in a development kernel ;)

customers seem to have the strangest aversion to development kernels :)

but, yeah, I should be doing 2.5 work soon and will holler if
simplifications make themselves apparent.

- z



* Re: [RFC][PATCH] Interface to invalidate regions of mmaps
  2003-05-13 23:11   ` Zach Brown
  2003-05-13 23:19     ` Andrew Morton
@ 2003-05-13 23:26     ` William Lee Irwin III
  1 sibling, 0 replies; 7+ messages in thread
From: William Lee Irwin III @ 2003-05-13 23:26 UTC
  To: Zach Brown; +Cc: Andrew Morton, paulmck, linux-kernel, linux-mm, mjbligh

On Tue, May 13, 2003 at 04:11:31PM -0700, Zach Brown wrote:
> but on the other hand, this doesn't solve another problem we have with
> opportunistic lock extents and sparse page cache populations.  Ideally
> we'd like a FS specific pointer in struct page so we can associate pages
> in the cache with a lock, but I can't imagine suggesting such a thing
> within earshot of wli.  so we'd still have to track the dirty offsets to
> avoid having to pass through offsets 0 ... i_size only to find that one
> page in the 8T file that was cached.

Nah, don't worry about sizeof(struct page) anymore; I'll just jack up
PAGE_SIZE to compensate.


-- wli


end of thread

Thread overview: 7+ messages
     [not found] <20030513133636.C2929@us.ibm.com>
2003-05-13 22:00 ` [RFC][PATCH] Interface to invalidate regions of mmaps William Lee Irwin III
2003-05-13 22:21 ` Andrew Morton
2003-05-13 22:43   ` Paul E. McKenney
2003-05-13 23:11   ` Zach Brown
2003-05-13 23:19     ` Andrew Morton
2003-05-13 23:57       ` Zach Brown
2003-05-13 23:26     ` William Lee Irwin III
