From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Jeff Layton <jlayton@samba.org>
Cc: Boaz Harrosh <bharrosh@panasas.com>,
axboe@kernel.dk, lucho@ionkov.net, tytso@mit.edu,
sage@inktank.com, ericvh@gmail.com, mfasheh@suse.com,
dedekind1@gmail.com, adrian.hunter@intel.com,
dhowells@redhat.com, sfrench@samba.org, jlbec@evilplan.org,
rminnich@sandia.gov, linux-cifs@vger.kernel.org, jack@suse.cz,
martin.petersen@oracle.com, neilb@suse.de, david@fromorbit.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-mtd@lists.infradead.org, linux-fsdevel@vger.kernel.org,
v9fs-developer@lists.sourceforge.net, ceph-devel@vger.kernel.org,
linux-ext4@vger.kernel.org, linux-afs@lists.infradead.org,
ocfs2-devel@oss.oracle.com
Subject: Re: [PATCH 3/3] fs: Fix remaining filesystems to wait for stable page writeback
Date: Thu, 1 Nov 2012 15:47:30 -0700 [thread overview]
Message-ID: <20121101224730.GJ19591@blackbox.djwong.org> (raw)
In-Reply-To: <20121101162254.03dbbd9a@tlielax.poochiereds.net>
On Thu, Nov 01, 2012 at 04:22:54PM -0400, Jeff Layton wrote:
> On Thu, 1 Nov 2012 11:43:26 -0700
> Boaz Harrosh <bharrosh@panasas.com> wrote:
>
> > On 11/01/2012 12:58 AM, Darrick J. Wong wrote:
> > > Fix up the filesystems that provide their own ->page_mkwrite handlers to
> > > provide stable page writes if necessary.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > > fs/9p/vfs_file.c | 1 +
> > > fs/afs/write.c | 4 ++--
> > > fs/ceph/addr.c | 1 +
> > > fs/cifs/file.c | 1 +
> > > fs/ocfs2/mmap.c | 1 +
> > > fs/ubifs/file.c | 4 ++--
> > > 6 files changed, 8 insertions(+), 4 deletions(-)
> > >
> > >
> > > diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
> > > index c2483e9..aa253f0 100644
> > > --- a/fs/9p/vfs_file.c
> > > +++ b/fs/9p/vfs_file.c
> > > @@ -620,6 +620,7 @@ v9fs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> > > lock_page(page);
> > > if (page->mapping != inode->i_mapping)
> > > goto out_unlock;
> > > + wait_on_stable_page_write(page);
> > >
> >
> > Good god thanks, yes please ;-)
> >
> > > return VM_FAULT_LOCKED;
> > > out_unlock:
> > > diff --git a/fs/afs/write.c b/fs/afs/write.c
> > > index 9aa52d9..39eb2a4 100644
> > > --- a/fs/afs/write.c
> > > +++ b/fs/afs/write.c
> > > @@ -758,7 +758,7 @@ int afs_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> >
> > afs, is it not a network filesystem? which means that it has it's own emulated none-block-device
> > BDI, registered internally. So if you do need stable pages someone should call
> > bdi_require_stable_pages()
> >
> > But again since it is a network filesystem I don't see how it is needed, and/or it might be
> > taken care of already.
> >
> > > #ifdef CONFIG_AFS_FSCACHE
> > > fscache_wait_on_page_write(vnode->cache, page);
> > > #endif
> > > -
> > > + wait_on_stable_page_write(page);
> > > _leave(" = 0");
> > > - return 0;
> > > + return VM_FAULT_LOCKED;
> > > }
> > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> >
> > CEPH for sure has it's own "emulated none-block-device BDI". This one is also
> > a pure networking filesystem.
> >
> > And it already does what it needs to do with wait_on_writeback().
> >
> > So i do not think you should touch CEPH
> >
> > > index 6690269..e9734bf 100644
> > > --- a/fs/ceph/addr.c
> > > +++ b/fs/ceph/addr.c
> > > @@ -1208,6 +1208,7 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> > > set_page_dirty(page);
> > > up_read(&mdsc->snap_rwsem);
> > > ret = VM_FAULT_LOCKED;
> > > + wait_on_stable_page_write(page);
> > > } else {
> > > if (ret == -ENOMEM)
> > > ret = VM_FAULT_OOM;
> > > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> >
> > Cifs also self-BDI network filesystem, but
> >
> > > index edb25b4..a8770bf 100644
> > > --- a/fs/cifs/file.c
> > > +++ b/fs/cifs/file.c
> > > @@ -2997,6 +2997,7 @@ cifs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> > > struct page *page = vmf->page;
> > >
> > > lock_page(page);
> >
> > It waits by locking the page, that's cifs naive way of waiting for writeback
> >
> > > + wait_on_stable_page_write(page);
> >
> > Instead it could do better and not override page_mkwrite at all, and all it needs
> > to do is call bdi_require_stable_pages() at it's own registered BDI
> >
>
> Hmm...I don't know...
>
> I've never been crazy about using the page lock for this, but in the
> absence of a better way to guarantee stable pages, it was what I ended
> up with at the time. cifs_writepages will hold the page lock until
> kernel_sendmsg returns. At that point the TCP layer will have copied
> off the page data so it's safe to release it.
>
> With this change though, we're going to end up blocking until the
> writeback flag clears, right? And I think that will happen when the
> reply comes in? So, we'll end up blocking for much longer than is
> really necessary in page_mkwrite with this change.
That's a very good point to make-- network FSes can stop the stable-waiting
after the request is sent. Can I interest you in a new page flag (PG_stable)
that indicates when a page has to be held for stable write? Along with a
modification to wait_on_stable_page_write that uses the new PG_stable flag
instead of just writeback? Then, you can clear PG_stable right after the
sendmsg() and release the page for further activity without having to overload
the page lock.
I wrote a patch that does exactly that as part of my work to defer the
integrity checksumming until the last possible instant. However, I haven't
gotten that part to work yet, so I left the PG_stable patch out of this
submission. On the other hand, it sounds like you could use it.
--D
>
> --
> Jeff Layton <jlayton@samba.org>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2012-11-01 22:47 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-01 7:58 [RFC PATCH v2 0/3] mm/fs: Implement faster stable page writes on filesystems Darrick J. Wong
2012-11-01 7:58 ` [PATCH 1/3] bdi: Track users that require stable page writes Darrick J. Wong
2012-11-01 13:31 ` Jan Kara
2012-11-01 18:21 ` Boaz Harrosh
2012-11-01 18:57 ` Darrick J. Wong
2012-11-01 22:56 ` Boaz Harrosh
[not found] ` <5092FE22.30906-C4P08NqkoRlBDgjK7y7TUQ@public.gmane.org>
2012-11-01 23:15 ` Jan Kara
2012-11-01 7:58 ` [PATCH 2/3] mm: Only enforce stable page writes if the backing device requires it Darrick J. Wong
[not found] ` <20121101075821.16153.38301.stgit-yuuUpGxbzT9UbpRmUfBrXUB+6BGkLq7r@public.gmane.org>
2012-11-01 13:28 ` Jan Kara
2012-11-01 7:58 ` [PATCH 3/3] fs: Fix remaining filesystems to wait for stable page writeback Darrick J. Wong
[not found] ` <20121101075829.16153.92036.stgit-yuuUpGxbzT9UbpRmUfBrXUB+6BGkLq7r@public.gmane.org>
2012-11-01 12:36 ` Jan Kara
2012-11-01 18:43 ` Boaz Harrosh
2012-11-01 20:22 ` Jeff Layton
2012-11-01 22:23 ` Boaz Harrosh
2012-11-01 22:47 ` Darrick J. Wong [this message]
2012-11-02 0:36 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121101224730.GJ19591@blackbox.djwong.org \
--to=darrick.wong@oracle.com \
--cc=adrian.hunter@intel.com \
--cc=axboe@kernel.dk \
--cc=bharrosh@panasas.com \
--cc=ceph-devel@vger.kernel.org \
--cc=david@fromorbit.com \
--cc=dedekind1@gmail.com \
--cc=dhowells@redhat.com \
--cc=ericvh@gmail.com \
--cc=jack@suse.cz \
--cc=jlayton@samba.org \
--cc=jlbec@evilplan.org \
--cc=linux-afs@lists.infradead.org \
--cc=linux-cifs@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-mtd@lists.infradead.org \
--cc=lucho@ionkov.net \
--cc=martin.petersen@oracle.com \
--cc=mfasheh@suse.com \
--cc=neilb@suse.de \
--cc=ocfs2-devel@oss.oracle.com \
--cc=rminnich@sandia.gov \
--cc=sage@inktank.com \
--cc=sfrench@samba.org \
--cc=tytso@mit.edu \
--cc=v9fs-developer@lists.sourceforge.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).