From: Peter W. Morreale <pmorreale@novell.com>
To: Joel Becker <Joel.Becker@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Mark Fasheh <mfasheh@suse.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [GIT PULL] ocfs2 changes for 2.6.32
Date: Fri, 18 Sep 2009 11:23:33 -0600
Message-ID: <1253294613.31359.136.camel@hermosa>
In-Reply-To: <20090918014333.GD15620@mail.oracle.com>
On Thu, 2009-09-17 at 18:43 -0700, Joel Becker wrote:
> On Thu, Sep 17, 2009 at 09:29:14AM -0700, Linus Torvalds wrote:
> > Why would anybody want to hide it at all? Why even the libc hiding?
> >
> > Nobody is going to use this except for special apps. Let them see what
> > they can do, in all its glory.
>
> I expect everyone will use this through cp(1), so that cp(1) can
> try to get server-side copy on the network filesystems.
> Speaking of "all its glory", what we have now is:
>
> int sys_copyfileat(int oldfd, const char *oldname, int newfd,
> const char *newname, int flags, int atflags)
Would it be worthwhile to consider adding an offset and a length?
Then we would (potentially) get dd(1) as well.
Best,
-PWM
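To make the offset-and-length suggestion concrete: without filesystem help, a ranged copy reduces to a pread(2)/pwrite(2) loop over a byte range, much as dd(1) does with skip= and seek=. The sketch below is a minimal userspace fallback under that assumption; the name copy_range_fallback and its argument order are hypothetical and were not part of anyone's proposal.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes from infd at in_off to outfd at
 * out_off with a plain read/write loop. Returns the number of bytes
 * copied (short only if the source hits EOF), or -errno on failure. */
static ssize_t copy_range_fallback(int infd, off_t in_off,
                                   int outfd, off_t out_off, size_t len)
{
    char buf[65536];
    size_t done = 0;

    while (done < len) {
        size_t want = len - done;
        if (want > sizeof(buf))
            want = sizeof(buf);

        ssize_t n = pread(infd, buf, want, in_off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        if (n == 0)
            break;              /* source EOF: stop short */

        /* Write out everything that was read, handling short writes. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = pwrite(outfd, buf + off, (size_t)(n - off),
                               out_off + (off_t)done + off);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -errno;
            }
            off += w;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Because it uses positioned I/O, the helper needs seekable descriptors; on a pipe, pread(2) fails with ESPIPE.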
>
> > So I'd suggest something like having two system calls: one to start the
> > operation, and one to control it. And for a filesystem that does atomic
> > copies, the 'start' one obviously would also finish it, so the 'control'
> > one would be a no-op, because there would never be any outstanding ones.
> >
> > See what I'm saying? It wouldn't complicate _your_ life, but it would
> > allow for filesystems that can't do it atomically (or even quickly).
> >
> > So the first one would be something like
> >
> > int copyfile(const char *src, const char *dest, unsigned long flags);
> >
> > which would return:
> >
> > - zero on success
> > - negative (with errno) on error
> > - positive cookie on "I started it, here's my cookie". For extra bonus
> > points, maybe the cookie would actually be a file descriptor (for
> > poll/select users), but it would _not_ be a file descriptor to the
> > resulting _file_, it would literally be a "cookie" to the actual
> > copyfile event.
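A minimal sketch of how a caller might interpret the three-way return value described above. copyfile() itself was only proposed, so the sketch classifies a return value rather than calling it; the enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical states matching the proposed return convention. */
enum copyfile_state { CF_DONE, CF_FAILED, CF_PENDING };

/* 0: the copy finished; < 0: the copy failed (errno says why);
 * > 0: a cookie for a copy still running in the background. */
static enum copyfile_state copyfile_classify(long ret, long *cookie)
{
    if (ret == 0)
        return CF_DONE;
    if (ret < 0)
        return CF_FAILED;
    *cookie = ret;      /* caller must later wait on, poll, or cancel it */
    return CF_PENDING;
}
```

A filesystem that always completes the copy synchronously would only ever produce CF_DONE or CF_FAILED.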
>
> Actually, if the cookie is a magic file descriptor, you don't
> need ctl. You can play tricks like polling for completion,
> read(magic_fd, &remain, sizeof(loff_t)) for status, and close(magic_fd)
> for cancel. Might be a bit overloaded, though.
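Joel's alternative treats the cookie as a file descriptor. Below is a sketch of the waiting side, assuming (hypothetically) that the descriptor becomes readable with an int64_t count of bytes remaining and reports end-of-file once the copy completes; int64_t stands in for the kernel's loff_t. Cancelling would simply be close(cookie_fd).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/* Wait on a hypothetical copy "cookie" fd: poll() until readable, then
 * read() an int64_t count of bytes still to copy. EOF is assumed to mean
 * "copy finished". Returns the last count seen, or -1 on error. */
static int64_t wait_copy_cookie(int cookie_fd)
{
    int64_t last = -1;
    struct pollfd pfd = { .fd = cookie_fd, .events = POLLIN };

    for (;;) {
        if (poll(&pfd, 1, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        int64_t remain;
        ssize_t n = read(cookie_fd, &remain, sizeof(remain));
        if (n == 0)
            return last;            /* EOF: the copy is done */
        if (n != (ssize_t)sizeof(remain))
            return -1;              /* error or short read */
        last = remain;              /* progress report */
    }
}
```

In a test, a pipe can stand in for the magic descriptor: write a few status records, close the write end, and the waiter returns the final count.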
>
> > and then for ocfs2 you'd never return positive cookies. You'd never have
> > to worry about it.
>
> I suspect we'll later take advantage of copyfile's other
> modes. I did reflink as reflink only for the sake of doing one
> thing well, not because I think copyfile isn't good.
>
> > Then the second interface would be something like
> >
> > int copyfile_ctrl(long cookie, unsigned long cmd);
> >
> > where you'd just have some way to wait for completion and ask how much has
> > been copied. The 'cmd' would be some set of 'cancel', 'status' or
> > 'uninterruptible wait' or whatever, and the return value would again be
> >
> > - negative (with errno) for errors (copy failed) - cookie released
> > - zero for 'done' - cookie released
> > - positive for 'percent remaining' or whatever - cookie still valid
> >
> > and this would be another callback into the filesystem code, but you'd
> > never have to worry about it, since you'd never see it (just leave it
> > NULL).
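A sketch of a caller-side wait loop built on the proposed control call's return convention. The command constant, the function-pointer type, and the mock are hypothetical stand-ins so that the loop can be exercised without kernel support.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command value; the proposal only names 'status' informally. */
enum { COPYFILE_CMD_STATUS = 1 };

typedef long (*copyfile_ctrl_fn)(long cookie, unsigned long cmd);

/* Poll a background copy until it ends. Per the proposal: > 0 means
 * "percent remaining, cookie still valid"; 0 (done) and < 0 (failed) both
 * release the cookie. Returns 0 or the negative error. */
static long copyfile_wait(copyfile_ctrl_fn ctrl, long cookie,
                          void (*progress)(long pct_remaining))
{
    for (;;) {
        long ret = ctrl(cookie, COPYFILE_CMD_STATUS);
        if (ret <= 0)
            return ret;             /* cookie is gone either way */
        if (progress)
            progress(ret);
        /* A real caller would sleep or poll here rather than spin. */
    }
}

/* Mock control call: reports 75, 50, 25 percent remaining, then done. */
static long mock_remaining = 100;
static long mock_ctrl(long cookie, unsigned long cmd)
{
    (void)cookie;
    (void)cmd;
    mock_remaining -= 25;
    return mock_remaining;
}
```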
>
> I was going to ask about how to fit both calls into one inode
> operation, but I see you're giving this as an additional inode
> operation.
> This leaves us with a similar-to-reflink inode copyfile op and a
> control op:
>
> ->copyfile(old_dentry, dir_inode, new_dentry, flags)
> ->copyfile_ctl(int cookie, unsigned int cmd)
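A userspace model of how the VFS might dispatch to the two proposed operations, with a NULL ->copyfile_ctl for a filesystem that only copies synchronously. struct copyfile_ops, vfs_copyfile() and vfs_copyfile_ctl() are hypothetical names, the kernel types are left opaque, and the error values chosen for missing operations are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel types; this is not kernel code. */
struct dentry;
struct inode;

/* Hypothetical subset of inode operations carrying the two proposed ops. */
struct copyfile_ops {
    int (*copyfile)(struct dentry *old_dentry, struct inode *dir,
                    struct dentry *new_dentry, unsigned int flags);
    int (*copyfile_ctl)(int cookie, unsigned int cmd);
};

static int vfs_copyfile(const struct copyfile_ops *ops, struct dentry *old,
                        struct inode *dir, struct dentry *new_dentry,
                        unsigned int flags)
{
    if (!ops->copyfile)
        return -EOPNOTSUPP;     /* caller falls back to read/write */
    return ops->copyfile(old, dir, new_dentry, flags);
}

/* A synchronous-only filesystem never hands out cookies, so any cookie
 * aimed at it is invalid. */
static int vfs_copyfile_ctl(const struct copyfile_ops *ops,
                            int cookie, unsigned int cmd)
{
    if (!ops->copyfile_ctl)
        return -EINVAL;
    return ops->copyfile_ctl(cookie, cmd);
}

/* Example synchronous filesystem: the copy completes inside ->copyfile. */
static int syncfs_copyfile(struct dentry *o, struct inode *d,
                           struct dentry *n, unsigned int f)
{
    (void)o; (void)d; (void)n; (void)f;
    return 0;
}

static const struct copyfile_ops syncfs_ops = { .copyfile = syncfs_copyfile };
```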
>
> I have to change the flags a little, as my original proposal
> didn't handle backoff correctly.
>
> #define COPYFILE_WAIT 0x0001 /* Block until complete */
> #define COPYFILE_ATOMIC 0x0002 /* Things copied must be
> point-in-time and it must
> fail or succeed completely. */
> #define COPYFILE_ALLOW_COW 0x0004 /* The filesystem may share data
> extents between the source
> and target in a Copy-on-Write
> fashion. If neither
> COPYFILE_ALLOW_COW nor
> COPYFILE_REQUIRE_COW are
> specified, data extents must
> NOT be shared. When neither
> COW flag is provided, most
> filesystems should return
> -ENOTSUPP, as userspace can
> do read-write looping
> itself */
> #define COPYFILE_REQUIRE_COW 0x0008 /* Data extents MUST be shared
> between the source and target
> in a Copy-on-Write fashion */
> #define COPYFILE_UNPRIV_ATTRS 0x0010 /* Unprivileged attributes
> should be copied from the
> source to the target */
> #define COPYFILE_PRIV_ATTRS 0x0020 /* Privileged attributes should
> be copied from the source to
> the target if the caller has
> the necessary privileges */
> #define COPYFILE_REQUIRE_ATTRS 0x0040 /* Combined with the other
> attribute flags, the call
> MUST fail if the caller lacks
> the necessary privileges to
> copy every attribute
> requested */
>
> #define COPYFILE_SNAPSHOT_ASYNC (COPYFILE_REQUIRE_COW | \
>                                  COPYFILE_UNPRIV_ATTRS | \
>                                  COPYFILE_PRIV_ATTRS | \
>                                  COPYFILE_ATOMIC)
> #define COPYFILE_SNAPSHOT_STRICT_ASYNC (COPYFILE_SNAPSHOT_ASYNC | \
>                                         COPYFILE_REQUIRE_ATTRS)
> #define COPYFILE_SNAPSHOT (COPYFILE_SNAPSHOT_ASYNC | \
>                            COPYFILE_WAIT)
> #define COPYFILE_SNAPSHOT_STRICT (COPYFILE_SNAPSHOT_STRICT_ASYNC | \
>                                   COPYFILE_WAIT)
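A sketch of the flag checking that a filesystem whose only fast path is extent sharing might do with the proposed flags. The flag values are the ones above; COPYFILE_VALID_FLAGS, the function name, and the -EINVAL cases are assumptions. The email names -ENOTSUPP, which is a kernel-internal error code; this userspace model uses the exported EOPNOTSUPP instead.

```c
#include <assert.h>
#include <errno.h>

/* Flag values as proposed in the email. */
#define COPYFILE_WAIT          0x0001
#define COPYFILE_ATOMIC        0x0002
#define COPYFILE_ALLOW_COW     0x0004
#define COPYFILE_REQUIRE_COW   0x0008
#define COPYFILE_UNPRIV_ATTRS  0x0010
#define COPYFILE_PRIV_ATTRS    0x0020
#define COPYFILE_REQUIRE_ATTRS 0x0040

#define COPYFILE_SNAPSHOT_ASYNC (COPYFILE_REQUIRE_COW | COPYFILE_UNPRIV_ATTRS | \
                                 COPYFILE_PRIV_ATTRS | COPYFILE_ATOMIC)
#define COPYFILE_SNAPSHOT_STRICT_ASYNC (COPYFILE_SNAPSHOT_ASYNC | \
                                        COPYFILE_REQUIRE_ATTRS)
#define COPYFILE_SNAPSHOT (COPYFILE_SNAPSHOT_ASYNC | COPYFILE_WAIT)
#define COPYFILE_SNAPSHOT_STRICT (COPYFILE_SNAPSHOT_STRICT_ASYNC | COPYFILE_WAIT)

/* Hypothetical mask of every bit defined so far. */
#define COPYFILE_VALID_FLAGS 0x007f

/* Hypothetical check for a CoW-only (reflink-style) filesystem.
 * Returns 0 if the request can proceed, or a negative errno. */
static int cowonly_check_flags(unsigned int flags)
{
    if (flags & ~COPYFILE_VALID_FLAGS)
        return -EINVAL;             /* unknown bits (assumption) */
    /* REQUIRE_ATTRS only qualifies the attribute flags (assumption). */
    if ((flags & COPYFILE_REQUIRE_ATTRS) &&
        !(flags & (COPYFILE_UNPRIV_ATTRS | COPYFILE_PRIV_ATTRS)))
        return -EINVAL;
    /* Neither COW flag: extents must not be shared, and userspace can
     * do the read/write loop itself, so decline. */
    if (!(flags & (COPYFILE_ALLOW_COW | COPYFILE_REQUIRE_COW)))
        return -EOPNOTSUPP;
    return 0;
}
```

The composite values work out to COPYFILE_SNAPSHOT = 0x003b and COPYFILE_SNAPSHOT_STRICT = 0x007b.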
>
> > I dunno. The above seems like a fairly simple and powerful interface, and
> > I _think_ it would be ok for NFS and CIFS. And in fact, if that whole
> > "background copy" ends up being used a lot, maybe even a local filesystem
> > would implement it just to get easy overlapping IO - even if it would just
> > be a trivial common wrapper function that says "start a thread to do a
> > trivial manual copy".
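A sketch of the "start a thread to do a trivial manual copy" wrapper Linus describes, written in userspace with POSIX threads. struct bg_copy and the bg_copy_*() functions are hypothetical; bg_copy_status() follows the proposed "percent remaining" convention, and bg_copy_wait() returns 0 or a negative errno like the proposed control call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical state for one background copy; a pointer to it plays the
 * role of the "cookie". */
struct bg_copy {
    int in_fd, out_fd;
    long long total;        /* expected size, used only for progress */
    long long copied;       /* bytes written so far; protected by lock */
    int error;              /* 0 or an errno value; protected by lock */
    pthread_mutex_t lock;
    pthread_t thread;
};

static void bg_copy_fail(struct bg_copy *c, int err)
{
    pthread_mutex_lock(&c->lock);
    c->error = err;
    pthread_mutex_unlock(&c->lock);
}

/* Worker thread: the "trivial manual copy", a read/write loop to EOF. */
static void *bg_copy_worker(void *arg)
{
    struct bg_copy *c = arg;
    char buf[4096];

    for (;;) {
        ssize_t n = read(c->in_fd, buf, sizeof(buf));
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0) {
            bg_copy_fail(c, errno);
            return NULL;
        }
        if (n == 0)
            return NULL;            /* EOF: copy complete */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(c->out_fd, buf + off, (size_t)(n - off));
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0) {
                bg_copy_fail(c, errno);
                return NULL;
            }
            off += w;
        }
        pthread_mutex_lock(&c->lock);
        c->copied += n;
        pthread_mutex_unlock(&c->lock);
    }
}

static int bg_copy_start(struct bg_copy *c, int in_fd, int out_fd,
                         long long total)
{
    c->in_fd = in_fd;
    c->out_fd = out_fd;
    c->total = total;
    c->copied = 0;
    c->error = 0;
    int rc = pthread_mutex_init(&c->lock, NULL);
    if (rc)
        return -rc;
    rc = pthread_create(&c->thread, NULL, bg_copy_worker, c);
    if (rc) {
        pthread_mutex_destroy(&c->lock);
        return -rc;
    }
    return 0;
}

/* Percent remaining (> 0 while incomplete), 0 when done, -errno on error. */
static long bg_copy_status(struct bg_copy *c)
{
    pthread_mutex_lock(&c->lock);
    long long copied = c->copied;
    int err = c->error;
    pthread_mutex_unlock(&c->lock);
    if (err)
        return -err;
    if (c->total <= 0 || copied >= c->total)
        return 0;
    return (long)(100 - copied * 100 / c->total);
}

/* Block until the worker exits; releases the cookie's resources. */
static int bg_copy_wait(struct bg_copy *c)
{
    pthread_join(c->thread, NULL);
    int err = c->error;             /* worker has exited: no lock needed */
    pthread_mutex_destroy(&c->lock);
    return -err;
}
```

Build with -pthread. A filesystem with a native fast path would bypass this wrapper entirely.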
>
> NFS and CIFS folks, please speak up.
>
> Joel
>