From: Joel Becker <Joel.Becker@oracle.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [GIT PULL] ocfs2 changes for 2.6.32
Date: Thu, 17 Sep 2009 18:43:33 -0700
Message-ID: <20090918014333.GD15620@mail.oracle.com>
In-Reply-To: <alpine.LFD.2.01.0909170856050.4950@localhost.localdomain>

On Thu, Sep 17, 2009 at 09:29:14AM -0700, Linus Torvalds wrote:
> Why would anybody want to hide it at all? Why even the libc hiding?
> 
> Nobody is going to use this except for special apps. Let them see what 
> they can do, in all its glory. 

	I expect everyone will use this through cp(1), so that cp(1) can
try to get server-side copy on the network filesystems.
	Speaking of "all its glory", what we have now is:

int sys_copyfileat(int oldfd, const char *oldname, int newfd,
                   const char *newname, int flags, int atflags)

> So I'd suggest something like having two system calls: one to start the 
> operation, and one to control it. And for a filesystem that does atomic 
> copies, the 'start' one obviously would also finish it, so the 'control' 
> it would be a no-op, because there would never be any outstanding ones.
> 
> See what I'm saying? It wouldn't complicate _your_ life, but it would 
> allow for filesystems that can't do it atomically (or even quickly).
> 
> So the first one would be something like
> 
> 	int copyfile(const char *src, const char *dest, unsigned long flags);
> 
> which would return:
> 
>  - zero on success
>  - negative (with errno) on error
>  - positive cookie on "I started it, here's my cookie". For extra bonus 
>    points, maybe the cookie would actually be a file descriptor (for 
>    poll/select users), but it would _not_ be a file descriptor to the 
>    resulting _file_, it would literally be a "cookie" to the actual 
>    copyfile event.

	Actually, if the cookie is a magic file descriptor, you don't
need ctl.  You can play tricks like polling for completion,
read(magic_fd, &remain, sizeof(loff_t)) for status, and close(magic_fd)
for cancel.  Might be a bit overloaded, though.
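
	A minimal userspace sketch of that idea follows. Nothing here
is an existing interface: copyfile_fd_status() is a hypothetical helper,
and the notion that the descriptor becomes readable when status is
available is an assumption. int64_t stands in for loff_t so the snippet
builds without extra feature macros. Cancel would simply be
close(magic_fd).

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for the "magic" copyfile
 * descriptor to become readable, then read the number of bytes still
 * to be copied into *remain.
 *
 * Returns 0 on success, -EAGAIN if nothing was readable before the
 * timeout, -EIO on a short read, or another negative errno value.
 */
static int copyfile_fd_status(int magic_fd, int timeout_ms, int64_t *remain)
{
	struct pollfd pfd = { .fd = magic_fd, .events = POLLIN };
	ssize_t got;
	int n;

	assert(remain != NULL);

	/* Retry if a signal interrupts the wait. */
	do {
		n = poll(&pfd, 1, timeout_ms);
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return -errno;
	if (n == 0)
		return -EAGAIN;

	/* The status record is assumed to be a single 64-bit count. */
	got = read(magic_fd, remain, sizeof(*remain));
	if (got < 0)
		return -errno;
	if (got != (ssize_t)sizeof(*remain))
		return -EIO;
	return 0;
}
```

	Any pollable descriptor can stand in for the magic fd when
trying this out, for example the read end of a pipe.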

> and then for ocfs2 you'd never return positive cookies. You'd never have 
> to worry about it.

	I suspect we'll later take advantage of copyfile's other
modes.  I did reflink as reflink only for the simple fact of doing one
thing and doing it well, not because I think copyfile isn't good.

> Then the second interface would be something like
> 
> 	int copyfile_ctrl(long cookie, unsigned long cmd);
> 
> where you'd just have some way to wait for completion and ask how much has 
> been copied. The 'cmd' would be some set of 'cancel', 'status' or 
> 'uninterruptible wait' or whatever, and the return value would again be
> 
>  - negative (with errno) for errors (copy failed) - cookie released
>  - zero for 'done' - cookie released
>  - positive for 'percent remaining' or whatever - cookie still valid
> 
> and this would be another callback into the filesystem code, but you'd 
> never have to worry about it, since you'd never see it (just leave it 
> NULL).

	I was going to ask about how to fit both calls into one inode
operation, but I see you're giving this as an additional inode
operation.
	This leaves us with a similar-to-reflink inode copyfile op and a
control op:

    ->copyfile(old_dentry, dir_inode, new_dentry, flags)
    ->copyfile_ctl(int cookie, unsigned int cmd)
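
	From the caller's side, the return conventions quoted above
(zero for done, negative for error, positive for a cookie or for work
remaining) could be driven roughly like this. It is only a sketch:
COPYFILE_CTL_STATUS, copyfile_finish() and fake_ctl() are names made up
for illustration, and fake_ctl() merely simulates a filesystem that
needs a few status calls before it reports completion.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command value; the thread does not define one. */
#define COPYFILE_CTL_STATUS	1UL

/* Same shape as the proposed copyfile_ctrl(cookie, cmd). */
typedef int (*copyfile_ctl_fn)(long cookie, unsigned long cmd);

/*
 * Given the return value of the "start" call, run the copy to the end.
 * Zero or negative means the start call already finished or failed.
 * A positive value is a cookie, which stays valid for as long as the
 * control call keeps returning a positive "remaining" value.
 */
static int copyfile_finish(int start_ret, copyfile_ctl_fn ctl)
{
	int r;

	if (start_ret <= 0)
		return start_ret;

	assert(ctl != NULL);
	do {
		r = ctl(start_ret, COPYFILE_CTL_STATUS);
	} while (r > 0);

	/* r <= 0: done or failed, and the cookie has been released. */
	return r;
}

/* Stand-in for a filesystem: completes after three status calls. */
static int fake_steps_left = 3;

static int fake_ctl(long cookie, unsigned long cmd)
{
	(void)cookie;
	if (cmd != COPYFILE_CTL_STATUS)
		return -EINVAL;
	if (fake_steps_left > 0)
		fake_steps_left--;
	return fake_steps_left;
}
```

	A real caller would more likely issue a blocking wait command
than spin on status; the loop is only there to show when the cookie
stops being valid.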

	I have to change the flags a little, as my original proposal
didn't handle backoff correctly.

#define COPYFILE_WAIT		0x0001	/* Block until complete */
#define COPYFILE_ATOMIC		0x0002	/* Things copied must be
					   point-in-time and it must
					   fail or succeed completely. */
#define COPYFILE_ALLOW_COW	0x0004	/* The filesystem may share data
					   extents between the source
					   and target in a Copy-on-Write
					   fashion.  If neither
					   COPYFILE_ALLOW_COW nor
					   COPYFILE_REQUIRE_COW are
					   specified, data extents must
					   NOT be shared.  When neither
					   COW flag is provided, most
					   filesystems should return
					   -ENOTSUPP, as userspace can
					   do read-write looping
					   itself */
#define COPYFILE_REQUIRE_COW	0x0008	/* Data extents MUST be shared
					   between the source and target
					   in a Copy-on-Write fashion */
#define COPYFILE_UNPRIV_ATTRS	0x0010	/* Unprivileged attributes
					   should be copied from the
					   source to the target */
#define COPYFILE_PRIV_ATTRS	0x0020	/* Privileged attributes should
					   be copied from the source to
					   the target if the caller has
					   the necessary privileges */
#define COPYFILE_REQUIRE_ATTRS	0x0040	/* Combined with the other
					   attribute flags, the call
					   MUST fail if the caller lacks
					   the necessary privileges to
					   copy every attribute
					   requested */

#define COPYFILE_SNAPSHOT_ASYNC	(COPYFILE_REQUIRE_COW | \
				 COPYFILE_UNPRIV_ATTRS | \
				 COPYFILE_PRIV_ATTRS | \
				 COPYFILE_ATOMIC)
#define COPYFILE_SNAPSHOT_STRICT_ASYNC	(COPYFILE_SNAPSHOT_ASYNC | \
					 COPYFILE_REQUIRE_ATTRS)
#define COPYFILE_SNAPSHOT	(COPYFILE_SNAPSHOT_ASYNC | \
				 COPYFILE_WAIT)
#define COPYFILE_SNAPSHOT_STRICT	(COPYFILE_SNAPSHOT_STRICT_ASYNC | \
					 COPYFILE_WAIT)
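
	To make the composites concrete, here is a small standalone
sketch that repeats the flag values above and shows the check a
reflink-only filesystem might make. cow_flags_check() and
check_composites() are hypothetical names, and EOPNOTSUPP is used as a
userspace stand-in because the kernel's ENOTSUPP is not exported
through <errno.h>.

```c
#include <assert.h>
#include <errno.h>

#define COPYFILE_WAIT		0x0001
#define COPYFILE_ATOMIC		0x0002
#define COPYFILE_ALLOW_COW	0x0004
#define COPYFILE_REQUIRE_COW	0x0008
#define COPYFILE_UNPRIV_ATTRS	0x0010
#define COPYFILE_PRIV_ATTRS	0x0020
#define COPYFILE_REQUIRE_ATTRS	0x0040

#define COPYFILE_SNAPSHOT_ASYNC	(COPYFILE_REQUIRE_COW | \
				 COPYFILE_UNPRIV_ATTRS | \
				 COPYFILE_PRIV_ATTRS | \
				 COPYFILE_ATOMIC)
#define COPYFILE_SNAPSHOT_STRICT_ASYNC	(COPYFILE_SNAPSHOT_ASYNC | \
					 COPYFILE_REQUIRE_ATTRS)
#define COPYFILE_SNAPSHOT	(COPYFILE_SNAPSHOT_ASYNC | \
				 COPYFILE_WAIT)
#define COPYFILE_SNAPSHOT_STRICT	(COPYFILE_SNAPSHOT_STRICT_ASYNC | \
					 COPYFILE_WAIT)

/*
 * Hypothetical check for a filesystem that can only share extents:
 * with neither COW flag set the data must NOT be shared, so the
 * request is refused and userspace falls back to a read-write loop.
 */
static int cow_flags_check(unsigned int flags)
{
	if (!(flags & (COPYFILE_ALLOW_COW | COPYFILE_REQUIRE_COW)))
		return -EOPNOTSUPP;	/* stand-in for -ENOTSUPP */
	return 0;
}

/* Spell out the numeric value of each composite. */
static void check_composites(void)
{
	assert(COPYFILE_SNAPSHOT_ASYNC == 0x3A);
	assert(COPYFILE_SNAPSHOT_STRICT_ASYNC == 0x7A);
	assert(COPYFILE_SNAPSHOT == 0x3B);
	assert(COPYFILE_SNAPSHOT_STRICT == 0x7B);
}
```

	Note that every SNAPSHOT composite carries COPYFILE_REQUIRE_COW,
so it passes the check, while a plain COPYFILE_WAIT request does not.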

> I dunno. The above seems like a fairly simple and powerful interface, and 
> I _think_ it would be ok for NFS and CIFS. And in fact, if that whole 
> "background copy" ends up being used a lot, maybe even a local filesystem 
> would implement it just to get easy overlapping IO - even if it would just 
> be a trivial common wrapper function that says "start a thread to do a 
> trivial manual copy".

	NFS and CIFS folks, please speak up.

Joel

-- 

"There is no more evil thing on earth than race prejudice, none at 
 all.  I write deliberately -- it is the worst single thing in life 
 now.  It justifies and holds together more baseness, cruelty and
 abomination than any other sort of error in the world." 
        - H. G. Wells

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
