From: Joel Becker <Joel.Becker@oracle.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [GIT PULL] ocfs2 changes for 2.6.32
Date: Thu, 17 Sep 2009 18:43:33 -0700
Message-ID: <20090918014333.GD15620@mail.oracle.com>
In-Reply-To: <alpine.LFD.2.01.0909170856050.4950@localhost.localdomain>
On Thu, Sep 17, 2009 at 09:29:14AM -0700, Linus Torvalds wrote:
> Why would anybody want to hide it at all? Why even the libc hiding?
>
> Nobody is going to use this except for special apps. Let them see what
> they can do, in all its glory.
I expect everyone will use this through cp(1), so that cp(1) can
try to get server-side copy on the network filesystems.
Speaking of "all its glory", what we have now is:
int sys_copyfileat(int oldfd, const char *oldname, int newfd,
const char *newname, int flags, int atflags)
> So I'd suggest something like having two system calls: one to start the
> operation, and one to control it. And for a filesystem that does atomic
> copies, the 'start' one obviously would also finish it, so the 'control'
> it would be a no-op, because there would never be any outstanding ones.
>
> See what I'm saying? It wouldn't complicate _your_ life, but it would
> allow for filesystems that can't do it atomically (or even quickly).
>
> So the first one would be something like
>
> int copyfile(const char *src, const char *dest, unsigned long flags);
>
> which would return:
>
> - zero on success
> - negative (with errno) on error
> - positive cookie on "I started it, here's my cookie". For extra bonus
> points, maybe the cookie would actually be a file descriptor (for
> poll/select users), but it would _not_ be a file descriptor to the
> resulting _file_, it would literally be a "cookie" to the actual
> copyfile event.
Actually, if the cookie is a magic file descriptor, you don't
need ctl. You can play tricks like polling for completion,
read(magic_fd, &remain, sizeof(loff_t)) for status, and close(magic_fd)
for cancel. Might be a bit overloaded, though.
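To make the magic-fd idea concrete, here is a minimal userspace sketch of the poll/read/close pattern. It is only an illustration under assumptions: no copyfile() syscall exists, copyfile_wait() and copyfile_cancel() are hypothetical helper names, the fd is assumed to deliver one "bytes remaining" record per read, and off_t stands in for loff_t so the code builds without _GNU_SOURCE.

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: magic_fd stands for the descriptor a copyfile()
 * call might return. Any readable fd that delivers off_t-sized records
 * (a pipe, for instance) can stand in for it when experimenting.
 *
 * Returns 0 once a record reports nothing remaining, -1 on error,
 * timeout, or a short read. *remain holds the last value read.
 */
int copyfile_wait(int magic_fd, off_t *remain, int timeout_ms)
{
	struct pollfd pfd = { .fd = magic_fd, .events = POLLIN };

	assert(remain != NULL);

	for (;;) {
		/* Poll for completion or progress. */
		int n = poll(&pfd, 1, timeout_ms);
		if (n <= 0)
			return -1;	/* error or timeout */

		/* Read for status: how much is left to copy. */
		ssize_t got = read(magic_fd, remain, sizeof(*remain));
		if (got != (ssize_t)sizeof(*remain))
			return -1;	/* short read, EOF, or error */

		if (*remain == 0)
			return 0;	/* copy complete */
	}
}

/* Close for cancel: under the assumed semantics this aborts the copy. */
int copyfile_cancel(int magic_fd)
{
	return close(magic_fd);
}
```

The "overloaded" worry shows up right here: close() has to mean both "cancel" and "I am done with the cookie", so a caller that merely wants to drop the descriptor after completion relies on cancel being a no-op for a finished copy.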
> and then for ocfs2 you'd never return positive cookies. You'd never have
> to worry about it.
I suspect we'll later take advantage of copyfile's other
modes. I did reflink as reflink only for the simple fact of doing one
thing and doing it well, not because I think copyfile isn't good.
> Then the second interface would be something like
>
> int copyfile_ctrl(long cookie, unsigned long cmd);
>
> where you'd just have some way to wait for completion and ask how much has
> been copied. The 'cmd' would be some set of 'cancel', 'status' or
> 'uninterruptible wait' or whatever, and the return value would again be
>
> - negative (with errno) for errors (copy failed) - cookie released
> - zero for 'done' - cookie released
> - positive for 'percent remaining' or whatever - cookie still valid
>
> and this would be another callback into the filesystem code, but you'd
> never have to worry about it, since you'd never see it (just leave it
> NULL).
I was going to ask about how to fit both calls into one inode
operation, but I see you're giving this as an additional inode
operation.
This leaves us with a similar-to-reflink inode copyfile op and a
control op:
->copyfile(old_dentry, dir_inode, new_dentry, flags)
->copyfile_ctl(int cookie, unsigned int cmd)
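As a rough sketch of how a caller might dispatch to these two ops, including the "just leave it NULL" case for a filesystem that always completes in the start call, here is a userspace mock. Everything in it is an assumption for illustration: struct copyfile_ops, mock_copyfile(), mock_copyfile_ctl(), atomic_fs_copyfile() and the errno values chosen for missing ops are hypothetical, and struct dentry / struct inode are opaque stand-ins for the kernel types.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel types; only pointers are passed around. */
struct dentry;
struct inode;

/* Hypothetical mock of the two proposed inode operations. */
struct copyfile_ops {
	int (*copyfile)(struct dentry *old_dentry, struct inode *dir,
			struct dentry *new_dentry, unsigned int flags);
	int (*copyfile_ctl)(int cookie, unsigned int cmd);
};

/* Start a copy: 0 = done, < 0 = error, > 0 = cookie for a copy in flight. */
int mock_copyfile(const struct copyfile_ops *ops, struct dentry *old_dentry,
		  struct inode *dir, struct dentry *new_dentry,
		  unsigned int flags)
{
	assert(ops != NULL);
	if (!ops->copyfile)
		return -EOPNOTSUPP;	/* assumed errno: no copyfile op */
	return ops->copyfile(old_dentry, dir, new_dentry, flags);
}

/* Control a copy in flight; a NULL op means no cookie was ever handed out. */
int mock_copyfile_ctl(const struct copyfile_ops *ops, int cookie,
		      unsigned int cmd)
{
	assert(ops != NULL);
	if (!ops->copyfile_ctl)
		return -EINVAL;		/* assumed errno: cookie cannot be valid */
	return ops->copyfile_ctl(cookie, cmd);
}

/* A filesystem that always finishes in the start call, reflink-style. */
int atomic_fs_copyfile(struct dentry *old_dentry, struct inode *dir,
		       struct dentry *new_dentry, unsigned int flags)
{
	(void)old_dentry;
	(void)dir;
	(void)new_dentry;
	(void)flags;
	return 0;			/* never returns a positive cookie */
}

const struct copyfile_ops atomic_fs_ops = {
	.copyfile	= atomic_fs_copyfile,
	.copyfile_ctl	= NULL,		/* nothing is ever outstanding */
};
```

With atomic_fs_ops, mock_copyfile() returns 0 and any later mock_copyfile_ctl() call fails, since the filesystem never issued a cookie to control.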
I have to change the flags a little, as my original proposal
didn't handle backoff correctly.
#define COPYFILE_WAIT           0x0001  /* Block until complete */
#define COPYFILE_ATOMIC         0x0002  /* Things copied must be
                                           point-in-time and it must
                                           fail or succeed completely. */
#define COPYFILE_ALLOW_COW      0x0004  /* The filesystem may share data
                                           extents between the source
                                           and target in a Copy-on-Write
                                           fashion.  If neither
                                           COPYFILE_ALLOW_COW nor
                                           COPYFILE_REQUIRE_COW is
                                           specified, data extents must
                                           NOT be shared.  When neither
                                           COW flag is provided, most
                                           filesystems should return
                                           -ENOTSUPP, as userspace can
                                           do read-write looping
                                           itself */
#define COPYFILE_REQUIRE_COW    0x0008  /* Data extents MUST be shared
                                           between the source and target
                                           in a Copy-on-Write fashion */
#define COPYFILE_UNPRIV_ATTRS   0x0010  /* Unprivileged attributes
                                           should be copied from the
                                           source to the target */
#define COPYFILE_PRIV_ATTRS     0x0020  /* Privileged attributes should
                                           be copied from the source to
                                           the target if the caller has
                                           the necessary privileges */
#define COPYFILE_REQUIRE_ATTRS  0x0040  /* Combined with the other
                                           attribute flags, the call
                                           MUST fail if the caller lacks
                                           the necessary privileges to
                                           copy every attribute
                                           requested */

#define COPYFILE_SNAPSHOT_ASYNC         (COPYFILE_REQUIRE_COW | \
                                         COPYFILE_UNPRIV_ATTRS | \
                                         COPYFILE_PRIV_ATTRS | \
                                         COPYFILE_ATOMIC)
#define COPYFILE_SNAPSHOT_STRICT_ASYNC  (COPYFILE_SNAPSHOT_ASYNC | \
                                         COPYFILE_REQUIRE_ATTRS)
#define COPYFILE_SNAPSHOT               (COPYFILE_SNAPSHOT_ASYNC | \
                                         COPYFILE_WAIT)
#define COPYFILE_SNAPSHOT_STRICT        (COPYFILE_SNAPSHOT_STRICT_ASYNC | \
                                         COPYFILE_WAIT)
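For anyone who wants to poke at how these flags compose, here is a small self-contained sketch. The flag values are the ones proposed above; the predicate helpers (copyfile_may_share(), copyfile_must_share(), copyfile_is_async()) are hypothetical names made up for illustration and are not part of any proposed interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as proposed in this message. */
#define COPYFILE_WAIT			0x0001
#define COPYFILE_ATOMIC			0x0002
#define COPYFILE_ALLOW_COW		0x0004
#define COPYFILE_REQUIRE_COW		0x0008
#define COPYFILE_UNPRIV_ATTRS		0x0010
#define COPYFILE_PRIV_ATTRS		0x0020
#define COPYFILE_REQUIRE_ATTRS		0x0040

#define COPYFILE_SNAPSHOT_ASYNC		(COPYFILE_REQUIRE_COW | \
					 COPYFILE_UNPRIV_ATTRS | \
					 COPYFILE_PRIV_ATTRS | \
					 COPYFILE_ATOMIC)
#define COPYFILE_SNAPSHOT_STRICT_ASYNC	(COPYFILE_SNAPSHOT_ASYNC | \
					 COPYFILE_REQUIRE_ATTRS)
#define COPYFILE_SNAPSHOT		(COPYFILE_SNAPSHOT_ASYNC | \
					 COPYFILE_WAIT)
#define COPYFILE_SNAPSHOT_STRICT	(COPYFILE_SNAPSHOT_STRICT_ASYNC | \
					 COPYFILE_WAIT)

/* The two COW flags must occupy distinct bits to be tested separately. */
static_assert((COPYFILE_ALLOW_COW & COPYFILE_REQUIRE_COW) == 0,
	      "COW flags must be distinct bits");

/* Hypothetical helper: may the filesystem share extents at all? */
bool copyfile_may_share(unsigned int flags)
{
	return (flags & (COPYFILE_ALLOW_COW | COPYFILE_REQUIRE_COW)) != 0;
}

/* Hypothetical helper: must the call fail if extents cannot be shared? */
bool copyfile_must_share(unsigned int flags)
{
	return (flags & COPYFILE_REQUIRE_COW) != 0;
}

/* Hypothetical helper: without COPYFILE_WAIT the call may hand back a cookie. */
bool copyfile_is_async(unsigned int flags)
{
	return (flags & COPYFILE_WAIT) == 0;
}
```

Read this way, COPYFILE_SNAPSHOT is the reflink-equivalent request: extents must be shared, both attribute classes are copied, the result is point-in-time, and the call blocks until it is done.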
> I dunno. The above seems like a fairly simple and powerful interface, and
> I _think_ it would be ok for NFS and CIFS. And in fact, if that whole
> "background copy" ends up being used a lot, maybe even a local filesystem
> would implement it just to get easy overlapping IO - even if it would just
> be a trivial common wrapper function that says "start a thread to do a
> trivial manual copy".
NFS and CIFS folks, please speak up.
Joel
--
"There is no more evil thing on earth than race prejudice, none at
all. I write deliberately -- it is the worst single thing in life
now. It justifies and holds together more baseness, cruelty and
abomination than any other sort of error in the world."
- H. G. Wells
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127