All of lore.kernel.org
 help / color / mirror / Atom feed
From: Omar Sandoval <osandov@osandov.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	David Sterba <dsterba@suse.cz>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO
Date: Thu, 18 Dec 2014 22:24:05 -0800	[thread overview]
Message-ID: <20141219062405.GA11486@mew> (raw)
In-Reply-To: <20141217220313.GK22149@ZenIV.linux.org.uk>

On Wed, Dec 17, 2014 at 10:03:13PM +0000, Al Viro wrote:
> On Wed, Dec 17, 2014 at 10:52:56AM -0800, Christoph Hellwig wrote:
> > On Wed, Dec 17, 2014 at 06:58:32AM -0800, Omar Sandoval wrote:
> > > See my previous message. If we use O_DIRECT on the original open, then
> > > filesystems that implement bmap but not direct_IO will no longer work.
> > > These are the ones that I found in my tree:
> > 
> > In the long run I don't think they are worth keeping.  But to keep you
> > out of that discussion you can just try an open without O_DIRECT if the
> > open with the flag failed.
> 
> Umm...  That's one possibility, of course (and if swapon(2) is on someone's
> hotpath, I really would like to see what the hell they are doing - it has
> to be interesting in a sick way).

If this is the approach you'd prefer, I'll go ahead and do that for v2.
I personally think it looks pretty kludgey, but I'm fine either way:

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 63f55cc..c1b3073 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2379,7 +2379,16 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
                name = NULL;
                goto bad_swap;
        }
-       swap_file = file_open_name(name, O_RDWR|O_LARGEFILE, 0);
+       swap_file = file_open_name(name, O_RDWR | O_LARGEFILE | O_DIRECT, 0);
+       if (IS_ERR(swap_file) && PTR_ERR(swap_file) == -EINVAL)
+               swap_file = file_open_name(name, O_RDWR | O_LARGEFILE, 0);
        if (IS_ERR(swap_file)) {
                error = PTR_ERR(swap_file);
                swap_file = NULL;

> BTW, speaking of read/write vs. swap - what's the story with e.g. AFS
> write() checking IS_SWAPFILE() and failing with -EBUSY?  Note that
> 	* it's done before acquiring i_mutex, so it isn't race-free
> 	* it's dubious from the POSIX POV - EBUSY isn't in the error
> list for write(2).
> 	* other filesystems generally don't have anything of that sort.
> NFS does, but local ones do not...
> Besides, do we even allow swapfiles on AFS?

AFS doesn't implement ->bmap or ->swap_activate, so that code is dead,
probably cargo-culted from the NFS code. It seems pretty pointless, not
only because it's inconsistent with the local filesystems like you
mentioned, but also because it's trivial to bypass with O_DIRECT on NFS:

ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
{
	struct file *file = iocb->ki_filp;
	struct inode *inode = file_inode(file);
	unsigned long written = 0;
	ssize_t result;
	size_t count = iov_iter_count(from);
	loff_t pos = iocb->ki_pos;

	result = nfs_key_timeout_notify(file, inode);
	if (result)
		return result;

	if (file->f_flags & O_DIRECT)
		return nfs_file_direct_write(iocb, from, pos);

	dprintk("NFS: write(%pD2, %zu@%Ld)\n",
		file, count, (long long) pos);

	result = -EBUSY;
	if (IS_SWAPFILE(inode))
		goto out_swapfile;

I think it's safe to scrap that code. However, this also led me to find that
NFS doesn't prevent truncates on an active swapfile. I'm submitting a patch for
that now.

-- 
Omar

WARNING: multiple messages have this Message-ID (diff)
From: Omar Sandoval <osandov@osandov.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	David Sterba <dsterba@suse.cz>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO
Date: Thu, 18 Dec 2014 22:24:05 -0800	[thread overview]
Message-ID: <20141219062405.GA11486@mew> (raw)
In-Reply-To: <20141217220313.GK22149@ZenIV.linux.org.uk>

On Wed, Dec 17, 2014 at 10:03:13PM +0000, Al Viro wrote:
> On Wed, Dec 17, 2014 at 10:52:56AM -0800, Christoph Hellwig wrote:
> > On Wed, Dec 17, 2014 at 06:58:32AM -0800, Omar Sandoval wrote:
> > > See my previous message. If we use O_DIRECT on the original open, then
> > > filesystems that implement bmap but not direct_IO will no longer work.
> > > These are the ones that I found in my tree:
> > 
> > In the long run I don't think they are worth keeping.  But to keep you
> > out of that discussion you can just try an open without O_DIRECT if the
> > open with the flag failed.
> 
> Umm...  That's one possibility, of course (and if swapon(2) is on someone's
> hotpath, I really would like to see what the hell they are doing - it has
> to be interesting in a sick way).

If this is the approach you'd prefer, I'll go ahead and do that for v2.
I personally think it looks pretty kludgey, but I'm fine either way:

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 63f55cc..c1b3073 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2379,7 +2379,16 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
                name = NULL;
                goto bad_swap;
        }
-       swap_file = file_open_name(name, O_RDWR|O_LARGEFILE, 0);
+       swap_file = file_open_name(name, O_RDWR | O_LARGEFILE | O_DIRECT, 0);
+       if (IS_ERR(swap_file) && PTR_ERR(swap_file) == -EINVAL)
+               swap_file = file_open_name(name, O_RDWR | O_LARGEFILE, 0);
        if (IS_ERR(swap_file)) {
                error = PTR_ERR(swap_file);
                swap_file = NULL;

> BTW, speaking of read/write vs. swap - what's the story with e.g. AFS
> write() checking IS_SWAPFILE() and failing with -EBUSY?  Note that
> 	* it's done before acquiring i_mutex, so it isn't race-free
> 	* it's dubious from the POSIX POV - EBUSY isn't in the error
> list for write(2).
> 	* other filesystems generally don't have anything of that sort.
> NFS does, but local ones do not...
> Besides, do we even allow swapfiles on AFS?

AFS doesn't implement ->bmap or ->swap_activate, so that code is dead,
probably cargo-culted from the NFS code. It seems pretty pointless, not
only because it's inconsistent with the local filesystems like you
mentioned, but also because it's trivial to bypass with O_DIRECT on NFS:

ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
{
	struct file *file = iocb->ki_filp;
	struct inode *inode = file_inode(file);
	unsigned long written = 0;
	ssize_t result;
	size_t count = iov_iter_count(from);
	loff_t pos = iocb->ki_pos;

	result = nfs_key_timeout_notify(file, inode);
	if (result)
		return result;

	if (file->f_flags & O_DIRECT)
		return nfs_file_direct_write(iocb, from, pos);

	dprintk("NFS: write(%pD2, %zu@%Ld)\n",
		file, count, (long long) pos);

	result = -EBUSY;
	if (IS_SWAPFILE(inode))
		goto out_swapfile;

I think it's safe to scrap that code. However, this also led me to find that
NFS doesn't prevent truncates on an active swapfile. I'm submitting a patch for
that now.

-- 
Omar

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-12-19  6:24 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-15  5:26 [PATCH 0/8] clean up and generalize swap-over-NFS Omar Sandoval
2014-12-15  5:26 ` Omar Sandoval
2014-12-15  5:26 ` [PATCH 1/8] nfs: follow direct I/O write locking convention Omar Sandoval
2014-12-15  5:26   ` Omar Sandoval
2014-12-15 12:49   ` Trond Myklebust
2014-12-15 12:49     ` Trond Myklebust
2014-12-15 15:42     ` Omar Sandoval
2014-12-15 15:42       ` Omar Sandoval
2014-12-15  5:26 ` [PATCH 2/8] swap: lock i_mutex for swap_writepage direct_IO Omar Sandoval
2014-12-15  5:26   ` Omar Sandoval
2014-12-15 16:27   ` Jan Kara
2014-12-15 16:27     ` Jan Kara
2014-12-15 16:56     ` Christoph Hellwig
2014-12-15 16:56       ` Christoph Hellwig
2014-12-15 22:11       ` Omar Sandoval
2014-12-15 22:11         ` Omar Sandoval
2014-12-16  8:35         ` Christoph Hellwig
2014-12-16  8:35           ` Christoph Hellwig
2014-12-16  8:35           ` Christoph Hellwig
2014-12-16  8:56           ` Omar Sandoval
2014-12-16  8:56             ` Omar Sandoval
2014-12-17  8:06             ` Christoph Hellwig
2014-12-17  8:06               ` Christoph Hellwig
2014-12-17  8:20               ` Al Viro
2014-12-17  8:20                 ` Al Viro
2014-12-17  8:24                 ` Christoph Hellwig
2014-12-17  8:24                   ` Christoph Hellwig
2014-12-17 14:58                   ` Omar Sandoval
2014-12-17 14:58                     ` Omar Sandoval
2014-12-17 18:52                     ` Christoph Hellwig
2014-12-17 18:52                       ` Christoph Hellwig
2014-12-17 22:03                       ` Al Viro
2014-12-17 22:03                         ` Al Viro
2014-12-19  6:24                         ` Omar Sandoval [this message]
2014-12-19  6:24                           ` Omar Sandoval
2014-12-19  6:28                           ` Al Viro
2014-12-19  6:28                             ` Al Viro
2014-12-19  6:28                             ` Al Viro
2014-12-20  6:51       ` Al Viro
2014-12-20  6:51         ` Al Viro
2014-12-22  7:26         ` Omar Sandoval
2014-12-22  7:26           ` Omar Sandoval
2014-12-23  9:37         ` Christoph Hellwig
2014-12-23  9:37           ` Christoph Hellwig
2014-12-23  9:37           ` Christoph Hellwig
2014-12-15  5:26 ` [PATCH 3/8] swap: don't add ITER_BVEC flag to direct_IO rw Omar Sandoval
2014-12-15  5:26   ` Omar Sandoval
2014-12-15  6:16   ` Al Viro
2014-12-15  6:16     ` Al Viro
2014-12-15 15:57     ` Omar Sandoval
2014-12-15 15:57       ` Omar Sandoval
2014-12-15  5:26 ` [PATCH 4/8] iov_iter: add iov_iter_bvec and convert callers Omar Sandoval
2014-12-15  5:26   ` Omar Sandoval
2014-12-15  5:26 ` [PATCH 5/8] direct-io: don't dirty ITER_BVEC pages on read Omar Sandoval
2014-12-15  5:26   ` Omar Sandoval
2014-12-15  5:27 ` [PATCH 6/8] nfs: don't dirty ITER_BVEC pages read through direct I/O Omar Sandoval
2014-12-15  5:27   ` Omar Sandoval
2014-12-15  6:17   ` Al Viro
2014-12-15  6:17     ` Al Viro
2014-12-15  5:27 ` [PATCH 7/8] swap: use direct I/O for SWP_FILE swap_readpage Omar Sandoval
2014-12-15  5:27   ` Omar Sandoval
2014-12-15  5:27 ` [PATCH 8/8] vfs: update swap_{,de}activate documentation Omar Sandoval
2014-12-15  5:27   ` Omar Sandoval

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141219062405.GA11486@mew \
    --to=osandov@osandov.com \
    --cc=akpm@linux-foundation.org \
    --cc=dsterba@suse.cz \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trond.myklebust@primarydata.com \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.