[PATCH] AFS: Implement shared-writable mmap [try #2]

public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] AFS: Implement shared-writable mmap [try #2]
@ 2007-05-16 10:02 David Howells
  2007-05-16 12:07 ` Nick Piggin
  0 siblings, 1 reply; 15+ messages in thread
From: David Howells @ 2007-05-16 10:02 UTC (permalink / raw)
  To: akpm, nickpiggin; +Cc: linux-kernel, linux-fsdevel, dhowells

Implement shared-writable mmap for AFS.

The key with which to access the file is obtained from the VMA at the point
where the PTE is made writable by the page_mkwrite() VMA op and cached in the
affected page.

If there's an outstanding write on the page made with a different key, then
page_mkwrite() will flush it before attaching a record of the new key.

[try #2] Only flush the page if the page is still part of the mapping (truncate
may have discarded it).

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/file.c     |   22 ++++++++++++++++++++--
 fs/afs/internal.h |    1 +
 fs/afs/write.c    |   32 ++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 9c0e721..da2a18b 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -22,6 +22,7 @@ static int afs_readpage(struct file *file, struct page *page);
 static void afs_invalidatepage(struct page *page, unsigned long offset);
 static int afs_releasepage(struct page *page, gfp_t gfp_flags);
 static int afs_launder_page(struct page *page);
+static int afs_mmap(struct file *file, struct vm_area_struct *vma);
 
 const struct file_operations afs_file_operations = {
 	.open		= afs_open,
@@ -31,7 +32,7 @@ const struct file_operations afs_file_operations = {
 	.write		= do_sync_write,
 	.aio_read	= generic_file_aio_read,
 	.aio_write	= afs_file_write,
-	.mmap		= generic_file_readonly_mmap,
+	.mmap		= afs_mmap,
 	.sendfile	= generic_file_sendfile,
 	.fsync		= afs_fsync,
 };
@@ -54,6 +55,12 @@ const struct address_space_operations afs_fs_aops = {
 	.writepages	= afs_writepages,
 };
 
+static struct vm_operations_struct afs_file_vm_ops = {
+	.nopage		= filemap_nopage,
+	.populate	= filemap_populate,
+	.page_mkwrite	= afs_page_mkwrite,
+};
+
 /*
  * open an AFS file or directory and attach a key to it
  */
@@ -266,7 +273,6 @@ static void afs_invalidatepage(struct page *page, unsigned long offset)
 static int afs_launder_page(struct page *page)
 {
 	_enter("{%lu}", page->index);
-
 	return 0;
 }
 
@@ -293,3 +299,15 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 	_leave(" = 0");
 	return 0;
 }
+
+/*
+ * memory map part of an AFS file
+ */
+static int afs_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	_enter("");
+
+	file_accessed(file);
+	vma->vm_ops = &afs_file_vm_ops;
+	return 0;
+}
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 4953ba5..bf4ef07 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -709,6 +709,7 @@ extern ssize_t afs_file_write(struct kiocb *, const struct iovec *,
 			      unsigned long, loff_t);
 extern int afs_writeback_all(struct afs_vnode *);
 extern int afs_fsync(struct file *, struct dentry *, int);
+extern int afs_page_mkwrite(struct vm_area_struct *, struct page *);
 
 
 /*****************************************************************************/
diff --git a/fs/afs/write.c b/fs/afs/write.c
index a03b92a..208fbc0 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -174,6 +174,8 @@ static int afs_prepare_page(struct afs_vnode *vnode, struct page *page,
  * prepare to perform part of a write to a page
  * - the caller holds the page locked, preventing it from being written out or
  *   modified by anyone else
+ * - may be called from afs_page_mkwrite() to set up a page for modification
+ *   through shared-writable mmap
  */
 int afs_prepare_write(struct file *file, struct page *page,
 		      unsigned offset, unsigned to)
@@ -825,3 +827,33 @@ int afs_fsync(struct file *file, struct dentry *dentry, int datasync)
 	_leave(" = %d", ret);
 	return ret;
 }
+
+/*
+ * notification that a previously read-only page is about to become writable
+ * - if it returns an error, the caller will deliver a bus error signal
+ *
+ * we use this to make a record of the key with which the writeback should be
+ * performed and to flush any outstanding writes made with a different key
+ *
+ * the key to be used is attached to the file pinned by the VMA
+ */
+int afs_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	struct afs_vnode *vnode = AFS_FS_I(vma->vm_file->f_mapping->host);
+	struct key *key = vma->vm_file->private_data;
+	int ret;
+
+	_enter("{{%x:%u},%x},{%lx}",
+	       vnode->fid.vid, vnode->fid.vnode, key_serial(key), page->index);
+
+	lock_page(page);
+	if (page->mapping == vma->vm_file->f_mapping)
+		ret = afs_prepare_write(vma->vm_file, page, 0, 0);
+	else
+		ret = 0; /* seems truncate interfered - let the caller deal
+			  * with it (presumably the PTE changed too) */
+	unlock_page(page);
+
+	_leave(" = %d", ret);
+	return ret;
+}


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-16 10:02 [PATCH] AFS: Implement shared-writable mmap [try #2] David Howells
@ 2007-05-16 12:07 ` Nick Piggin
  2007-05-16 13:16   ` David Howells
  0 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2007-05-16 12:07 UTC (permalink / raw)
  To: David Howells; +Cc: akpm, linux-kernel, linux-fsdevel

David Howells wrote:
> Implement shared-writable mmap for AFS.
> 
> The key with which to access the file is obtained from the VMA at the point
> where the PTE is made writable by the page_mkwrite() VMA op and cached in the
> affected page.
> 
> If there's an outstanding write on the page made with a different key, then
> page_mkwrite() will flush it before attaching a record of the new key.
> 
> [try #2] Only flush the page if the page is still part of the mapping (truncate
> may have discarded it).

Couple more issues...

> Signed-off-by: David Howells <dhowells@redhat.com>

> +/*
> + * notification that a previously read-only page is about to become writable
> + * - if it returns an error, the caller will deliver a bus error signal
> + *
> + * we use this to make a record of the key with which the writeback should be
> + * performed and to flush any outstanding writes made with a different key
> + *
> + * the key to be used is attached to the file pinned by the VMA
> + */
> +int afs_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> +{
> +	struct afs_vnode *vnode = AFS_FS_I(vma->vm_file->f_mapping->host);
> +	struct key *key = vma->vm_file->private_data;
> +	int ret;
> +
> +	_enter("{{%x:%u},%x},{%lx}",
> +	       vnode->fid.vid, vnode->fid.vnode, key_serial(key), page->index);
> +
> +	lock_page(page);
> +	if (page->mapping == vma->vm_file->f_mapping)
> +		ret = afs_prepare_write(vma->vm_file, page, 0, 0);

I would strongly suggest you used (0, PAGE_CACHE_SIZE) for the range, and
have your nopage function DTRT.

Minor issue: you can just check for `if (!page->mapping)` for truncation,
which is the usual signal to tell the reader you're checking for truncate.
Then you can remove the comment...


> +	else
> +		ret = 0; /* seems truncate interfered - let the caller deal
> +			  * with it (presumably the PTE changed too) */

Rather than add this (not always correct) comment about the VM workings, I'd
just add a directive in the page_mkwrite API documentation that the filesystem
is to return 0 if the page has been truncated.


> +	unlock_page(page);
> +
> +	_leave(" = %d", ret);
> +	return ret;
> +}

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-16 12:07 ` Nick Piggin
@ 2007-05-16 13:16   ` David Howells
  2007-05-16 13:32     ` Nick Piggin
  0 siblings, 1 reply; 15+ messages in thread
From: David Howells @ 2007-05-16 13:16 UTC (permalink / raw)
  To: Nick Piggin; +Cc: akpm, linux-kernel, linux-fsdevel

Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> I would strongly suggest you used (0, PAGE_CACHE_SIZE) for the range,

That tells prepare_write() that the region to be modified is the whole page -
which is incorrect.  We're going to change a little bit of it.

Hmmm... Thinking about it again, I probably shouldn't be using
afs_prepare_write() at all.  afs_prepare_write() does two things:

 (1) Fills in the bits around the edges of the region to be modified if the
     page is not up to date.

 (2) Records the region of the page to be modified.

If afs_prepare_write() function is invoked by write(), then the region in (2)
is known, and thus the edges are known.

But if it's called from page_mkwrite(), we don't know where the edges are, so
we have to refresh the entire page if it's not up to date, and then we have to
record that the entire page needs writing back as we don't know which bits
have changed.

> and have your nopage function DTRT.

That assumes nopage() will be called in all cases - which it won't.

> Rather than add this (not always correct) comment about the VM workings, I'd
> just add a directive in the page_mkwrite API documentation that the filesystem
> is to return 0 if the page has been truncated.

I still want to put a comment in my code to say *why* I'm doing this.

> Minor issue: you can just check for `if (!page->mapping)` for truncation,
> which is the usual signal to tell the reader you're checking for truncate.

That's inconsistent with other core code, truncate_complete_page() for
example.

> Then you can remove the comment...

The comment stays, though I'm willing to improve it.

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-16 13:16   ` David Howells
@ 2007-05-16 13:32     ` Nick Piggin
  2007-05-16 16:12       ` David Howells
  0 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2007-05-16 13:32 UTC (permalink / raw)
  To: David Howells; +Cc: akpm, linux-kernel, linux-fsdevel

David Howells wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>I would strongly suggest you used (0, PAGE_CACHE_SIZE) for the range,
> 
> 
> That tells prepare_write() that the region to be modified is the whole page -
> which is incorrect.  We're going to change a little bit of it.

You don't know how much you're going to change, but it could be anything
in the range of 0 to PAGE_CACHE_SIZE. Clearly a (0, PAGE_CACHE_SIZE)
range is the right range to pass in.

> Hmmm... Thinking about it again, I probably shouldn't be using
> afs_prepare_write() at all.  afs_prepare_write() does two things:
> 
>  (1) Fills in the bits around the edges of the region to be modified if the
>      page is not up to date.
> 
>  (2) Records the region of the page to be modified.
> 
> If afs_prepare_write() function is invoked by write(), then the region in (2)
> is known, and thus the edges are known.
> 
> But if it's called from page_mkwrite(), we don't know where the edges are, so
> we have to refresh the entire page if it's not up to date, and then we have to
> record that the entire page needs writing back as we don't know which bits
> have changed.

Oh god you're doing ClearPageUptodate directly on pagecache pages?

In general (modulo bugs and crazy filesystems), you're not allowed to have
!uptodate pages mapped into user addresses because that implies the user
would be allowed to see garbage.

If you follow that rule, then you can never have a !uptodate page be passed
into page_mkwrite (unless it has just been truncated, in which case you
catch that case anyway).

>>and have your nopage function DTRT.
> 
> 
> That assumes nopage() will be called in all cases - which it won't.

No, just assumes that nopage brings the page uptodate and people use the
correct invalidation functions to invalidate pagecache, so you never have
!uptodate pages that are mapped.

>>Rather than add this (not always correct) comment about the VM workings, I'd
>>just add a directive in the page_mkwrite API documentation that the filesystem
>>is to return 0 if the page has been truncated.
> 
> 
> I still want to put a comment in my code to say *why* I'm doing this.

The one you put there was wrong.

>>Minor issue: you can just check for `if (!page->mapping)` for truncation,
>>which is the usual signal to tell the reader you're checking for truncate.
> 
> 
> That's inconsistent with other core code, truncate_complete_page() for
> example.

Your filesystem internally moves pages between mappings like tmpfs?

>>Then you can remove the comment...
> 
> 
> The comment stays, though I'm willing to improve it.

Well I don't really care, so I'll just concentrate on trying to get the
code fixed.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-16 13:32     ` Nick Piggin
@ 2007-05-16 16:12       ` David Howells
  2007-05-16 16:32         ` Nick Piggin
  0 siblings, 1 reply; 15+ messages in thread
From: David Howells @ 2007-05-16 16:12 UTC (permalink / raw)
  To: Nick Piggin; +Cc: akpm, linux-kernel, linux-fsdevel

Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> In general (modulo bugs and crazy filesystems), you're not allowed to have
> !uptodate pages mapped into user addresses because that implies the user
> would be allowed to see garbage.

Ths situation I have to deal with is a tricky one.  Consider:

 (1) User A modifies a page with his key.  This change gets made in the
     pagecache, but is not written back immediately.

 (2) User B then wants to modify the same page, but with a different key.
     This means that afs_prepare_write() has to flush A's writes back to the
     server before B is permitted to write.

 (3) The flush fails because A is no longer permitted to write to that file.
     This means that the change in the page cache is now stale.  We can't just
     write it back as B because B didn't make the change.

What I've made afs_prepare_write() do in this situation is to nuke A's entire
write.  We can't write any of it back.  I can't call invalidate_inode_pages()
or similar because that might incorrectly kill one of B's writes (or someone
else's writes); besides, the on-server file hasn't changed.

To nuke A's write, each page that makes up that write is marked non-uptodate
and then reloaded.  Whilst I might wish to call invalidate_inode_pages_range(),
I can't as it can/would deadlock if called from prepare_write() in two
different ways.

> >>Minor issue: you can just check for `if (!page->mapping)` for truncation,
> >>which is the usual signal to tell the reader you're checking for truncate.
> >
> >
> > That's inconsistent with other core code, truncate_complete_page() for
> > example.
> 
> Your filesystem internally moves pages between mappings like tmpfs?

You misunderstand me.  truncate_complete_page() uses this:

	if (page->mapping != mapping)

not this:

	if (!page->mapping)

I think that both cases should work in page_mkwrite().  But !page->mapping does
not appear to be the "usual signal" from what I've seen.

However, that's a minor matter.

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-16 16:12       ` David Howells
@ 2007-05-16 16:32         ` Nick Piggin
  2007-05-16 16:56           ` David Howells
  0 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2007-05-16 16:32 UTC (permalink / raw)
  To: David Howells; +Cc: akpm, linux-kernel, linux-fsdevel

David Howells wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>In general (modulo bugs and crazy filesystems), you're not allowed to have
>>!uptodate pages mapped into user addresses because that implies the user
>>would be allowed to see garbage.
> 
> 
> Ths situation I have to deal with is a tricky one.  Consider:
> 
>  (1) User A modifies a page with his key.  This change gets made in the
>      pagecache, but is not written back immediately.
> 
>  (2) User B then wants to modify the same page, but with a different key.
>      This means that afs_prepare_write() has to flush A's writes back to the
>      server before B is permitted to write.
> 
>  (3) The flush fails because A is no longer permitted to write to that file.
>      This means that the change in the page cache is now stale.  We can't just
>      write it back as B because B didn't make the change.
> 
> What I've made afs_prepare_write() do in this situation is to nuke A's entire
> write.  We can't write any of it back.  I can't call invalidate_inode_pages()
> or similar because that might incorrectly kill one of B's writes (or someone
> else's writes); besides, the on-server file hasn't changed.

Why would that kill anyone's writes?


> To nuke A's write, each page that makes up that write is marked non-uptodate
> and then reloaded.  Whilst I might wish to call invalidate_inode_pages_range(),
> I can't as it can/would deadlock if called from prepare_write() in two
> different ways.

Which ways? Are you talking about prepare_write being called from page_mkwrite,
or anywhere?

More generally it sounds like a nasty thing to have a writeback cache if it can
become incoherent (due to dirty pages that subsequently cannot be written
back) without notification. Have you tried doing a write-through one?

You may be clearing PG_uptodate, but isn't there still an underlying problem
that you can have mappings to the page at that point? If that isn't a problem
for you, then I don't know why you would have to clear PG_uptodate at all.


>>>>Minor issue: you can just check for `if (!page->mapping)` for truncation,
>>>>which is the usual signal to tell the reader you're checking for truncate.
>>>
>>>
>>>That's inconsistent with other core code, truncate_complete_page() for
>>>example.
>>
>>Your filesystem internally moves pages between mappings like tmpfs?
> 
> 
> You misunderstand me.  truncate_complete_page() uses this:
> 
> 	if (page->mapping != mapping)
> 
> not this:
> 
> 	if (!page->mapping)
> 
> I think that both cases should work in page_mkwrite().  But !page->mapping does
> not appear to be the "usual signal" from what I've seen.

truncate_complete_page does that because it has to handle the case where
the mapping changes from one thing to something else that is non-NULL,
which tmpfs does.

This is not the case for most code in fs.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-16 16:32         ` Nick Piggin
@ 2007-05-16 16:56           ` David Howells
  2007-05-16 17:28             ` Nick Piggin
  0 siblings, 1 reply; 15+ messages in thread
From: David Howells @ 2007-05-16 16:56 UTC (permalink / raw)
  To: Nick Piggin; +Cc: akpm, linux-kernel, linux-fsdevel

Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > I can't call invalidate_inode_pages() or similar because that might
> > incorrectly kill one of B's writes (or someone else's writes); besides,
> > the on-server file hasn't changed.
> 
> Why would that kill anyone's writes?

Because invalidate_inode_pages() forcibly removes the dirty flag from each page
in the inode and then calls invalidatepage() - and thus they don't get written
back, but some of those pages may contain writes from other processes.  The
whole inode isn't owned by one user at a time.

I hadn't considered invalidate_inode_pages_range(), but that suffers from the
deadlock problem.

> > I can't as it can/would deadlock if called from prepare_write() in two
> > different ways.
> 
> Which ways? Are you talking about prepare_write being called from
> page_mkwrite, or anywhere?

 (1) prepare_write() is called with the target page locked and does not release
     the lock.  The truncation routines lock the page prior to invalidating it.
     Any truncation routine that skips locked pages is of no use.

 (2) Consider a run of pages that make up a single write by one user.  Two
     other writes from other users may be attempting to overwrite that run at
     the same time.  Each of them would need to invalidate the other's locked
     page(s).

Furthermore, the caller of prepare_write() probably won't take kindly to the
page it's dealing with being evicted from the pagecache.

> More generally it sounds like a nasty thing to have a writeback cache if it
> can become incoherent (due to dirty pages that subsequently cannot be
> written back) without notification. Have you tried doing a write-through
> one?

How do you do a write-through cache for shared-writable mmap?

> You may be clearing PG_uptodate, but isn't there still an underlying problem
> that you can have mappings to the page at that point? If that isn't a problem
> for you, then I don't know why you would have to clear PG_uptodate at all.

There might be, yes.  I guess I should ask the VM to nuke all PTEs to each of
these pages too.

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-16 16:56           ` David Howells
@ 2007-05-16 17:28             ` Nick Piggin
  2007-05-16 17:46               ` David Howells
  2007-05-16 18:45               ` David Howells
  0 siblings, 2 replies; 15+ messages in thread
From: Nick Piggin @ 2007-05-16 17:28 UTC (permalink / raw)
  To: David Howells; +Cc: akpm, linux-kernel, linux-fsdevel

David Howells wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>>I can't call invalidate_inode_pages() or similar because that might
>>>incorrectly kill one of B's writes (or someone else's writes); besides,
>>>the on-server file hasn't changed.
>>
>>Why would that kill anyone's writes?
> 
> 
> Because invalidate_inode_pages() forcibly removes the dirty flag from each page

It had better not. We use that sucker to nuke pagecache when we're trying to
reclaim inodes, for example...


>>>I can't as it can/would deadlock if called from prepare_write() in two
>>>different ways.
>>
>>Which ways? Are you talking about prepare_write being called from
>>page_mkwrite, or anywhere?
> 
> 
>  (1) prepare_write() is called with the target page locked and does not release
>      the lock.  The truncation routines lock the page prior to invalidating it.
>      Any truncation routine that skips locked pages is of no use.

You can drop the lock, do the invalidation, and return AOP_TRUNCATED_PAGE. The
new aops patches will provide a better solution, but that will work today.


>  (2) Consider a run of pages that make up a single write by one user.  Two
>      other writes from other users may be attempting to overwrite that run at
>      the same time.  Each of them would need to invalidate the other's locked
>      page(s).

See #1.


> Furthermore, the caller of prepare_write() probably won't take kindly to the
> page it's dealing with being evicted from the pagecache.

It's fine if you return AOP_TRUNCATED_PAGE.


>>More generally it sounds like a nasty thing to have a writeback cache if it
>>can become incoherent (due to dirty pages that subsequently cannot be
>>written back) without notification. Have you tried doing a write-through
>>one?
> 
> 
> How do you do a write-through cache for shared-writable mmap?

I just mean more generally. simple write(2) writes, for starters.

For shared writable mmap? I don't know... does POSIX require mmap data
to be coherent with read(2)/write(2)? ;)


>>You may be clearing PG_uptodate, but isn't there still an underlying problem
>>that you can have mappings to the page at that point? If that isn't a problem
>>for you, then I don't know why you would have to clear PG_uptodate at all.
> 
> 
> There might be, yes.  I guess I should ask the VM to nuke all PTEs to each of
> these pages too.

That's what the invalidate / truncate routines do.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-16 17:28             ` Nick Piggin
@ 2007-05-16 17:46               ` David Howells
  2007-05-16 17:59                 ` Nick Piggin
  2007-05-16 18:45               ` David Howells
  1 sibling, 1 reply; 15+ messages in thread
From: David Howells @ 2007-05-16 17:46 UTC (permalink / raw)
  To: Nick Piggin; +Cc: akpm, linux-kernel, linux-fsdevel

Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > How do you do a write-through cache for shared-writable mmap?
> 
> For shared writable mmap? I don't know...

You can't do write-through caching for shared-writable mmap because the writes
go directly into the pagecache once the page is made writable, at least, short
of instruction emulation.

At some point in the future we'll be asked to turf the data back to the
server.

> does POSIX require mmap data to be coherent with read(2)/write(2)? ;)

I suspect so, but I don't know offhand.  I want it to be coherent anyway,
otherwise it's inconsistent with OpenAFS and Arla (or at least more so).

Note also that the choice of write-through or write-back caching also has
implications for local on-disk caching of modified data and disconnected
operation.

> I just mean more generally. simple write(2) writes, for starters.

Given that writing through an mmap'd section is write-back by its very
nature[*] and so since I have to do write-back anyway, why consider doing
write-through too?

[*] Unless we want to do instruction emulation or single stepping.

Hmmm... I wonder if O_SYNC should use write-through caching.

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-16 17:46               ` David Howells
@ 2007-05-16 17:59                 ` Nick Piggin
  0 siblings, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2007-05-16 17:59 UTC (permalink / raw)
  To: David Howells; +Cc: akpm, linux-kernel, linux-fsdevel

So did we just get your issues sorted? I _think_ *snip* is the
Howells code for "OK", but I can never be sure ;)

FWIW, as a rule, ClearPageUptodate should never be done by anyone,
least of all a filesystem on regular file pagecache. I need to go
through and audit this stuff... but so much backlog :P

David Howells wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>>How do you do a write-through cache for shared-writable mmap?
>>
>>For shared writable mmap? I don't know...

Anyway, *snip* the side discussion.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-16 17:28             ` Nick Piggin
  2007-05-16 17:46               ` David Howells
@ 2007-05-16 18:45               ` David Howells
  2007-05-17  6:39                 ` Nick Piggin
  1 sibling, 1 reply; 15+ messages in thread
From: David Howells @ 2007-05-16 18:45 UTC (permalink / raw)
  To: Nick Piggin; +Cc: akpm, linux-kernel, linux-fsdevel

Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> You can drop the lock, do the invalidation,

Hmmm...  There's a danger of incurring a race by doing that.  Consider two
processes both trying to write to a dirty page for which writeback will be
rejected:

 (1) The first process gets EKEYREJECTED from the server, drops its lock and
     is then preempted.

 (2) The second process gets EKEYREJECTED from the server, drops its lock,
     truncates the page, reloads the page and modifies it.

 (3) The first process resumes and truncates the page, thereby splatting the
     second process's write.

Or:

 (1) The first process gets EKEYREJECTED from the server, clears the writeback
     information from the page, drops its lock and is then preempted.

 (2) The second process attaches its own writeback information to the page and
     modifies it.

 (3) The first process resumes and truncates the page, thereby splatting the
     second process's write.

Really, what I want to do is pass the page lock to truncate to deal with.
Better still, I want truncate to be selective, based on whether or not a page
is still associated with the rejected writeback.  I wonder if I should call
truncate_complete_page() or invalidate_complete_page() directly.

What might help is that PG_writeback is set on all pages in the rejected run,
even those that are unlocked.

Network filesystems are such fun:-)

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-16 18:45               ` David Howells
@ 2007-05-17  6:39                 ` Nick Piggin
  2007-05-17 12:30                   ` David Howells
  0 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2007-05-17  6:39 UTC (permalink / raw)
  To: David Howells; +Cc: akpm, linux-kernel, linux-fsdevel

David Howells wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>You can drop the lock, do the invalidation,
> 
> 
> Hmmm...  There's a danger of incurring a race by doing that.  Consider two
> processes both trying to write to a dirty page for which writeback will be
> rejected:
> 
>  (1) The first process gets EKEYREJECTED from the server, drops its lock and
>      is then preempted.
> 
>  (2) The second process gets EKEYREJECTED from the server, drops its lock,
>      truncates the page, reloads the page and modifies it.
> 
>  (3) The first process resumes and truncates the page, thereby splatting the
>      second process's write.
> 
> Or:
> 
>  (1) The first process gets EKEYREJECTED from the server, clears the writeback
>      information from the page, drops its lock and is then preempted.
> 
>  (2) The second process attaches its own writeback information to the page and
>      modifies it.
> 
>  (3) The first process resumes and truncates the page, thereby splatting the
>      second process's write.

If there are race issues involving concurrent invalidations, then you'd
fix that up by taking a filesystem specific lock to prevent them.

Generic write path should be holding i_mutex, but I don't think you can
take that from page_mkwrite... Just add one of your own.


> Really, what I want to do is pass the page lock to truncate to deal with.
> Better still, I want truncate to be selective, based on whether or not a page
> is still associated with the rejected writeback.  I wonder if I should call
> truncate_complete_page() or invalidate_complete_page() directly.

No, you shouldn't. We could theoretically introduce a new API for this,
but I think it would be preferable if you can fix the race in the fs.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-17  6:39                 ` Nick Piggin
@ 2007-05-17 12:30                   ` David Howells
  2007-05-17 17:46                     ` David Howells
  2007-05-18  2:29                     ` Nick Piggin
  0 siblings, 2 replies; 15+ messages in thread
From: David Howells @ 2007-05-17 12:30 UTC (permalink / raw)
  To: Nick Piggin; +Cc: akpm, linux-kernel, linux-fsdevel

Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> No, you shouldn't. We could theoretically introduce a new API for this,
> but I think it would be preferable if you can fix the race in the fs.

Actually, I might be able to do better.

When making a StoreData call to the AFS server, I send all the parameters
first, and at that point, the server will abort it, I think, if permission is
not available, and won't wait for the payload to be delivered.

So if I tell AF_RXRPC to send the parameter data with an explicit ACK request
and then wait till it's either hard-ACK'd or aborted, I should then be able to
deal with the permissions failure at a state where I have locked *all* the
pages to be sent.

At that point, I should be able to tell truncate to simple discard all these
locked pages.

How's that sound?

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-17 12:30                   ` David Howells
@ 2007-05-17 17:46                     ` David Howells
  2007-05-18  2:29                     ` Nick Piggin
  1 sibling, 0 replies; 15+ messages in thread
From: David Howells @ 2007-05-17 17:46 UTC (permalink / raw)
  Cc: Nick Piggin, akpm, linux-kernel, linux-fsdevel

David Howells <dhowells@redhat.com> wrote:

> When making a StoreData call to the AFS server, I send all the parameters
> first, and at that point, the server will abort it, I think, if permission is
> not available, and won't wait for the payload to be delivered.

Make that "should abort it".  openafs-1.4.2 seems to let you execute a
StoreData, even if you don't have permission to do so.  I can tell I don't
have permission to do so because the reply from the StoreData op says so...

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] AFS: Implement shared-writable mmap [try #2]
  2007-05-17 12:30                   ` David Howells
  2007-05-17 17:46                     ` David Howells
@ 2007-05-18  2:29                     ` Nick Piggin
  1 sibling, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2007-05-18  2:29 UTC (permalink / raw)
  To: David Howells; +Cc: akpm, linux-kernel, linux-fsdevel

David Howells wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>No, you shouldn't. We could theoretically introduce a new API for this,
>>but I think it would be preferable if you can fix the race in the fs.
> 
> 
> Actually, I might be able to do better.
> 
> When making a StoreData call to the AFS server, I send all the parameters
> first, and at that point, the server will abort it, I think, if permission is
> not available, and won't wait for the payload to be delivered.
> 
> So if I tell AF_RXRPC to send the parameter data with an explicit ACK request
> and then wait till it's either hard-ACK'd or aborted, I should then be able to
> deal with the permissions failure at a state where I have locked *all* the
> pages to be sent.
> 
> At that point, I should be able to tell truncate to simple discard all these
> locked pages.
> 
> How's that sound?

Truncate as it stands still needs to be given unlocked pages. So we would
either have to create a new API, or I think preferably it would be nice if
you could see if you can first solve it with a private lock?

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2007-05-18  2:29 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-16 10:02 [PATCH] AFS: Implement shared-writable mmap [try #2] David Howells
2007-05-16 12:07 ` Nick Piggin
2007-05-16 13:16   ` David Howells
2007-05-16 13:32     ` Nick Piggin
2007-05-16 16:12       ` David Howells
2007-05-16 16:32         ` Nick Piggin
2007-05-16 16:56           ` David Howells
2007-05-16 17:28             ` Nick Piggin
2007-05-16 17:46               ` David Howells
2007-05-16 17:59                 ` Nick Piggin
2007-05-16 18:45               ` David Howells
2007-05-17  6:39                 ` Nick Piggin
2007-05-17 12:30                   ` David Howells
2007-05-17 17:46                     ` David Howells
2007-05-18  2:29                     ` Nick Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox