* Re: [RFC][PATCH] Interface to invalidate regions of mmaps
2003-05-13 22:21 ` Andrew Morton
@ 2003-05-13 22:43 ` Paul E. McKenney
2003-05-13 23:11 ` Zach Brown
1 sibling, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2003-05-13 22:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, mjbligh
On Tue, May 13, 2003 at 03:21:41PM -0700, Andrew Morton wrote:
> "Paul E. McKenney" <paulmck@us.ibm.com> wrote:
> >
> > This patch adds an API to allow networked and distributed filesystems
> > to invalidate portions of (or all of) a file. This is needed to
> > provide POSIX or near-POSIX semantics in such filesystems, as
> > discussed on LKML late last year:
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=103609089604576&w=2
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=103167761917669&w=2
> >
> > Thoughts?
>
> What filesystems would be needing this, and when could we see live code
> which actually uses it?
Working on getting it out... But I suspect that others need
this functionality as well, given the threads noted above.
> > +/*
> > + * Helper function for invalidate_mmap_range().
> > + * Both hba and hlen are page numbers in PAGE_SIZE units.
> > + */
> > +static void
> > +invalidate_mmap_range_list(struct list_head *head,
> > + unsigned long const hba,
> > + unsigned long const hlen)
>
> Be nice to consolidate this with vmtruncate_list, so that it gets
> exercised.
Good point from both you and wli -- here is the updated vmtruncate
patch (now depends on the invalidate_mmap_range patch).
Thanx, Paul
diff -urN -X dontdiff linux-2.5.69.invalidate_mmap_range/mm/memory.c linux-2.5.69.vmtruncate/mm/memory.c
--- linux-2.5.69.invalidate_mmap_range/mm/memory.c Tue May 13 14:56:41 2003
+++ linux-2.5.69.vmtruncate/mm/memory.c Tue May 13 15:19:23 2003
@@ -1063,6 +1063,7 @@
/*
* Helper function for invalidate_mmap_range().
* Both hba and hlen are page numbers in PAGE_SIZE units.
+ * An hlen of zero blows away the entire portion file after hba.
*/
static void
invalidate_mmap_range_list(struct list_head *head,
@@ -1078,6 +1079,8 @@
unsigned long zea;
hea = hba + hlen - 1; /* avoid overflow. */
+ if (hea < hba)
+ hea = ULONG_MAX;
list_for_each(curr, head) {
vp = list_entry(curr, struct vm_area_struct, shared);
vba = vp->vm_pgoff;
@@ -1128,37 +1131,6 @@
up(&mapping->i_shared_sem);
}
-static void vmtruncate_list(struct list_head *head, unsigned long pgoff)
-{
- unsigned long start, end, len, diff;
- struct vm_area_struct *vma;
- struct list_head *curr;
-
- list_for_each(curr, head) {
- vma = list_entry(curr, struct vm_area_struct, shared);
- start = vma->vm_start;
- end = vma->vm_end;
- len = end - start;
-
- /* mapping wholly truncated? */
- if (vma->vm_pgoff >= pgoff) {
- zap_page_range(vma, start, len);
- continue;
- }
-
- /* mapping wholly unaffected? */
- len = len >> PAGE_SHIFT;
- diff = pgoff - vma->vm_pgoff;
- if (diff >= len)
- continue;
-
- /* Ok, partially affected.. */
- start += diff << PAGE_SHIFT;
- len = (len - diff) << PAGE_SHIFT;
- zap_page_range(vma, start, len);
- }
-}
-
/*
* Handle all mappings that got truncated by a "truncate()"
* system call.
@@ -1176,17 +1148,12 @@
if (inode->i_size < offset)
goto do_expand;
inode->i_size = offset;
+ pgoff = (offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
down(&mapping->i_shared_sem);
- if (list_empty(&mapping->i_mmap) && list_empty(&mapping->i_mmap_shared))
- goto out_unlock;
-
- pgoff = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
- if (!list_empty(&mapping->i_mmap))
- vmtruncate_list(&mapping->i_mmap, pgoff);
- if (!list_empty(&mapping->i_mmap_shared))
- vmtruncate_list(&mapping->i_mmap_shared, pgoff);
-
-out_unlock:
+ if (unlikely(!list_empty(&mapping->i_mmap)))
+ invalidate_mmap_range_list(&mapping->i_mmap, pgoff, 0);
+ if (unlikely(!list_empty(&mapping->i_mmap_shared)))
+ invalidate_mmap_range_list(&mapping->i_mmap_shared, pgoff, 0);
up(&mapping->i_shared_sem);
truncate_inode_pages(mapping, offset);
goto out_truncate;
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [RFC][PATCH] Interface to invalidate regions of mmaps
2003-05-13 22:21 ` Andrew Morton
2003-05-13 22:43 ` Paul E. McKenney
@ 2003-05-13 23:11 ` Zach Brown
2003-05-13 23:19 ` Andrew Morton
2003-05-13 23:26 ` William Lee Irwin III
1 sibling, 2 replies; 7+ messages in thread
From: Zach Brown @ 2003-05-13 23:11 UTC (permalink / raw)
To: Andrew Morton; +Cc: paulmck, linux-kernel, linux-mm, mjbligh
Andrew Morton wrote:
> What filesystems would be needing this, and when could we see live code
> which actually uses it?
on the one hand, lustre would very much like something like this. our
posix IO guarantees are centered around a DLM that knows about file
extents and the presence of pages in the page cache is tied to holding
these locks. its very common for us to get a lock cancelation which
invalidates a region of a file that falls in the middle of what is cached.
worse still, our (possibly gi-normous) files are backed by striping the
file across multiple storage targets and the locks live on these
targets. if you imagine a file that is built by alternating 64k-wide
stripes across 4 targets, we can get a lock cancelation that invalidates
pages at offset 0->15, 64->79,128->143, and so on.
so what we'd like most is the ability to invalidate a region of the file
in an efficient go.
void truncate_inode_pages(struct address_space * mapping, loff_t lstart,
loff_t end)
that sort of thing. this might not suck so bad if the page cache was an
rbtree :) in any case, what we've been doing so far is tracking dirty
page offsets in our own rbtree thing in lustre and calling
truncate_complete_page for these offsets as locks are canceled. (our
locks are page-aligned, so we don't worry so much about partial page
pain in these particular paths).
but on the other hand, this doesn't solve another problem we have with
opportunistic lock extents and sparse page cache populations. Ideally
we'd like a FS specific pointer in struct page so we can associate pages
in the cache with a lock, but I can't imagine suggesting such a thing
within earshot of wli. so we'd still have to track the dirty offsets to
avoid having to pass through offsets 0 ... i_size only to find that one
page in the 8T file that was cached.
https://lxr.lustre.org/source/llite/file.c?v=b_devel#602
is the most relevant part of the story.
- z
^ permalink raw reply [flat|nested] 7+ messages in thread