From: "Peter W. Morreale" <pmorreale@novell.com>
To: Joel Becker <Joel.Becker@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Mark Fasheh <mfasheh@suse.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] [GIT PULL] ocfs2 changes for 2.6.32
Date: Fri, 18 Sep 2009 11:23:33 -0600
Message-ID: <1253294613.31359.136.camel@hermosa>
In-Reply-To: <20090918014333.GD15620@mail.oracle.com>
On Thu, 2009-09-17 at 18:43 -0700, Joel Becker wrote:
> On Thu, Sep 17, 2009 at 09:29:14AM -0700, Linus Torvalds wrote:
> > Why would anybody want to hide it at all? Why even the libc hiding?
> >
> > Nobody is going to use this except for special apps. Let them see what
> > they can do, in all its glory.
>
> I expect everyone will use this through cp(1), so that cp(1) can
> try to get server-side copy on the network filesystems.
> Speaking of "all its glory", what we have now is:
>
> int sys_copyfileat(int oldfd, const char *oldname, int newfd,
> const char *newname, int flags, int atflags)
Would it be worthwhile to consider adding an offset and length?
Then we (potentially) get dd as well.
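The offset/length idea can be illustrated with a minimal userspace sketch. It assumes a hypothetical helper, `copy_range()` (not part of any proposed API), that copies `len` bytes from `in_off` in one file to `out_off` in another using pread(2)/pwrite(2). That is roughly what dd's skip=/seek=/count= do, and what a read-write fallback would have to do if sys_copyfileat() grew offset and length arguments.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes from in_fd at in_off to out_fd at
 * out_off.  Returns the number of bytes copied (short if the source hits
 * EOF first) or -1 with errno set.  File offsets are left untouched. */
static ssize_t copy_range(int in_fd, off_t in_off, int out_fd, off_t out_off,
                          size_t len)
{
    char buf[4096];
    size_t done = 0;

    while (done < len) {
        size_t want = len - done;
        if (want > sizeof(buf))
            want = sizeof(buf);

        ssize_t n = pread(in_fd, buf, want, in_off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)             /* EOF on the source */
            break;

        /* pwrite() may write less than asked; finish this chunk. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = pwrite(out_fd, buf + off, (size_t)(n - off),
                               out_off + (off_t)done + off);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

For example, with a source containing "0123456789", `copy_range(src, 2, dst, 0, 5)` copies "23456" to the start of the destination, much like `dd skip=2 count=5 bs=1`.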
Best,
-PWM
>
> > So I'd suggest something like having two system calls: one to start the
> > operation, and one to control it. And for a filesystem that does atomic
> > copies, the 'start' one obviously would also finish it, so the 'control'
> > one would be a no-op, because there would never be any outstanding ones.
> >
> > See what I'm saying? It wouldn't complicate _your_ life, but it would
> > allow for filesystems that can't do it atomically (or even quickly).
> >
> > So the first one would be something like
> >
> > int copyfile(const char *src, const char *dest, unsigned long flags);
> >
> > which would return:
> >
> > - zero on success
> > - negative (with errno) on error
> > - positive cookie on "I started it, here's my cookie". For extra bonus
> > points, maybe the cookie would actually be a file descriptor (for
> > poll/select users), but it would _not_ be a file descriptor to the
> > resulting _file_, it would literally be a "cookie" to the actual
> > copyfile event.
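Seen from the caller, the three-way return convention above might be handled like this. It is a sketch under the assumption that some `copyfile()` call follows that convention; `classify_copyfile()` and the `copy_state` enum are hypothetical names, not part of the proposal.

```c
#include <assert.h>

enum copy_state { COPY_DONE, COPY_FAILED, COPY_IN_PROGRESS };

/* Hypothetical: interpret the return value of a copyfile()-style call.
 * On COPY_IN_PROGRESS, *cookie receives the handle for the later control
 * call; on COPY_FAILED, errno is left as the call set it. */
static enum copy_state classify_copyfile(long ret, long *cookie)
{
    if (ret < 0)
        return COPY_FAILED;     /* errno describes the failure */
    if (ret == 0)
        return COPY_DONE;       /* e.g. an atomic, reflink-style copy */
    *cookie = ret;              /* background copy started */
    return COPY_IN_PROGRESS;
}
```

A filesystem like ocfs2 that always finishes in the call would only ever produce the first two cases, so a caller's COPY_IN_PROGRESS branch would simply never run for it.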
>
> Actually, if the cookie is a magic file descriptor, you don't
> need ctl. You can play tricks like polling for completion,
> read(magic_fd, &remain, sizeof(loff_t)) for status, and close(magic_fd)
> for cancel. Might be a bit overloaded, though.
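The magic-descriptor variant needs nothing beyond ordinary descriptor calls on the user side. Since no such descriptor exists, this sketch assumes (hypothetically) that it becomes readable whenever progress is available and that each read(2) returns the remaining byte count as a 64-bit integer. `copy_fd_status()` is an illustrative name, `int64_t` stands in for `loff_t` for portability, and a pipe plays the kernel's part so the example runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/* Wait up to timeout_ms for a progress report on a (hypothetical) copy
 * descriptor.  Returns 1 and stores the remaining byte count, 0 on
 * timeout, -1 on error or if the descriptor was closed. */
static int copy_fd_status(int magic_fd, int timeout_ms, int64_t *remain)
{
    struct pollfd pfd = { .fd = magic_fd, .events = POLLIN };

    int r = poll(&pfd, 1, timeout_ms);
    if (r <= 0)
        return r;               /* 0: timed out, -1: poll() failed */

    ssize_t n = read(magic_fd, remain, sizeof(*remain));
    return n == (ssize_t)sizeof(*remain) ? 1 : -1;
}
```

In this model, cancelling is just close(magic_fd); the overloading Joel mentions is that one descriptor carries completion, status, and cancellation at once.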
>
> > and then for ocfs2 you'd never return positive cookies. You'd never have
> > to worry about it.
>
> I suspect we'll later take advantage of copyfile's other
> modes. I did reflink as reflink only for the simple reason of doing one
> thing and doing it well, not because I think copyfile isn't good.
>
> > Then the second interface would be something like
> >
> > int copyfile_ctrl(long cookie, unsigned long cmd);
> >
> > where you'd just have some way to wait for completion and ask how much has
> > been copied. The 'cmd' would be some set of 'cancel', 'status' or
> > 'uninterruptible wait' or whatever, and the return value would again be
> >
> > - negative (with errno) for errors (copy failed) - cookie released
> > - zero for 'done' - cookie released
> > - positive for 'percent remaining' or whatever - cookie still valid
> >
> > and this would be another callback into the filesystem code, but you'd
> > never have to worry about it, since you'd never see it (just leave it
> > NULL).
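A waiting loop on top of the proposed control call could look like the following sketch. The command values are hypothetical (the thread names 'cancel', 'status' and 'wait' only informally), and `copyfile_ctrl()` here is a stand-in stub that simulates one background copy (cookie 42). It returns kernel-style negative errno values, zero for done, and a positive percent remaining.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical command values; the proposal names these only informally. */
enum { COPYFILE_CTRL_STATUS = 1, COPYFILE_CTRL_CANCEL = 2 };

/* Stand-in for the proposed call: one outstanding copy (cookie 42) whose
 * percent remaining drops by 25 on each status query. */
static int fake_remaining = 100;

static int copyfile_ctrl(long cookie, unsigned long cmd)
{
    if (cookie != 42 || fake_remaining <= 0)
        return -EBADF;          /* unknown or already released cookie */
    if (cmd == COPYFILE_CTRL_CANCEL) {
        fake_remaining = 0;     /* cookie released */
        return -ECANCELED;
    }
    fake_remaining -= 25;
    return fake_remaining;      /* 0 means done, cookie released */
}

/* Query until the copy completes or fails.  Returns 0 on success or the
 * negative error; *queries counts the status calls made. */
static int wait_for_copy(long cookie, int *queries)
{
    int ret;

    *queries = 0;
    do {
        ret = copyfile_ctrl(cookie, COPYFILE_CTRL_STATUS);
        (*queries)++;
        /* A real caller would sleep or use a blocking 'wait' command
         * here instead of spinning. */
    } while (ret > 0);
    return ret;
}
```

Because both zero and negative returns release the cookie, the caller never needs a separate "free cookie" step; any later call on it fails.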
>
> I was going to ask about how to fit both calls into one inode
> operation, but I see you're giving this as an additional inode
> operation.
> This leaves us with a similar-to-reflink inode copyfile op and a
> control op:
>
> ->copyfile(old_dentry, dir_inode, new_dentry, flags)
> ->copyfile_ctl(int cookie, unsigned int cmd)
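How the two operations might fit together on the VFS side, as a userspace-compilable sketch: `struct copyfile_ops`, `vfs_copyfile()` and `vfs_copyfile_ctl()` are hypothetical names, and the dentry/inode arguments from the prototypes above are replaced by plain strings so the example is self-contained. It shows Linus's point that a filesystem doing only atomic copies leaves `copyfile_ctl` NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified operations table; the real proposal passes
 * dentries and the target directory inode instead of names. */
struct copyfile_ops {
    int (*copyfile)(const char *src, const char *dst, unsigned int flags);
    int (*copyfile_ctl)(int cookie, unsigned int cmd);
};

static int vfs_copyfile(const struct copyfile_ops *ops, const char *src,
                        const char *dst, unsigned int flags)
{
    if (!ops->copyfile)
        return -EOPNOTSUPP;     /* userspace falls back to read/write */

    int ret = ops->copyfile(src, dst, flags);
    /* Defensive check (an assumption, not part of the proposal): a
     * filesystem without a control op must never hand out a cookie. */
    if (ret > 0 && !ops->copyfile_ctl)
        return -EIO;
    return ret;
}

static int vfs_copyfile_ctl(const struct copyfile_ops *ops, int cookie,
                            unsigned int cmd)
{
    /* With no control op there can be no outstanding cookies. */
    if (!ops->copyfile_ctl)
        return -EINVAL;
    return ops->copyfile_ctl(cookie, cmd);
}

/* An ocfs2-style filesystem: the copy (a reflink) completes in the call. */
static int atomic_copyfile(const char *src, const char *dst,
                           unsigned int flags)
{
    (void)src;
    (void)dst;
    (void)flags;
    return 0;
}

static const struct copyfile_ops atomic_fs_ops = {
    .copyfile = atomic_copyfile,
    .copyfile_ctl = NULL,
};
```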
>
> I have to change the flags a little, as my original proposal
> didn't handle backoff correctly.
>
> #define COPYFILE_WAIT          0x0001  /* Block until complete */
> #define COPYFILE_ATOMIC        0x0002  /* Things copied must be
>                                           point-in-time and it must
>                                           fail or succeed completely. */
> #define COPYFILE_ALLOW_COW     0x0004  /* The filesystem may share data
>                                           extents between the source
>                                           and target in a Copy-on-Write
>                                           fashion. If neither
>                                           COPYFILE_ALLOW_COW nor
>                                           COPYFILE_REQUIRE_COW is
>                                           specified, data extents must
>                                           NOT be shared. When neither
>                                           COW flag is provided, most
>                                           filesystems should return
>                                           -ENOTSUPP, as userspace can
>                                           do read-write looping
>                                           itself. */
> #define COPYFILE_REQUIRE_COW   0x0008  /* Data extents MUST be shared
>                                           between the source and target
>                                           in a Copy-on-Write fashion. */
> #define COPYFILE_UNPRIV_ATTRS  0x0010  /* Unprivileged attributes
>                                           should be copied from the
>                                           source to the target. */
> #define COPYFILE_PRIV_ATTRS    0x0020  /* Privileged attributes should
>                                           be copied from the source to
>                                           the target if the caller has
>                                           the necessary privileges. */
> #define COPYFILE_REQUIRE_ATTRS 0x0040  /* Combined with the other
>                                           attribute flags, the call
>                                           MUST fail if the caller lacks
>                                           the necessary privileges to
>                                           copy every attribute
>                                           requested. */
>
> #define COPYFILE_SNAPSHOT_ASYNC        (COPYFILE_REQUIRE_COW | \
>                                         COPYFILE_UNPRIV_ATTRS | \
>                                         COPYFILE_PRIV_ATTRS | \
>                                         COPYFILE_ATOMIC)
> #define COPYFILE_SNAPSHOT_STRICT_ASYNC (COPYFILE_SNAPSHOT_ASYNC | \
>                                         COPYFILE_REQUIRE_ATTRS)
> #define COPYFILE_SNAPSHOT              (COPYFILE_SNAPSHOT_ASYNC | \
>                                         COPYFILE_WAIT)
> #define COPYFILE_SNAPSHOT_STRICT       (COPYFILE_SNAPSHOT_STRICT_ASYNC | \
>                                         COPYFILE_WAIT)
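To see how the composite flags decompose and how a filesystem might vet a request, here is a sketch that repeats the flag values listed above and adds a hypothetical validator, `cow_only_fs_check()`, for a filesystem that can only copy by sharing extents (as reflink does). `COPYFILE_KNOWN_FLAGS` and the validator's policy are assumptions read off the flag comments, not part of the proposal; it returns -EOPNOTSUPP where the comment says -ENOTSUPP so that it compiles in userspace.

```c
#include <assert.h>
#include <errno.h>

#define COPYFILE_WAIT          0x0001
#define COPYFILE_ATOMIC        0x0002
#define COPYFILE_ALLOW_COW     0x0004
#define COPYFILE_REQUIRE_COW   0x0008
#define COPYFILE_UNPRIV_ATTRS  0x0010
#define COPYFILE_PRIV_ATTRS    0x0020
#define COPYFILE_REQUIRE_ATTRS 0x0040

#define COPYFILE_SNAPSHOT_ASYNC (COPYFILE_REQUIRE_COW | COPYFILE_UNPRIV_ATTRS | \
                                 COPYFILE_PRIV_ATTRS | COPYFILE_ATOMIC)
#define COPYFILE_SNAPSHOT_STRICT_ASYNC (COPYFILE_SNAPSHOT_ASYNC | \
                                        COPYFILE_REQUIRE_ATTRS)
#define COPYFILE_SNAPSHOT        (COPYFILE_SNAPSHOT_ASYNC | COPYFILE_WAIT)
#define COPYFILE_SNAPSHOT_STRICT (COPYFILE_SNAPSHOT_STRICT_ASYNC | COPYFILE_WAIT)

/* Hypothetical mask of every flag defined above. */
#define COPYFILE_KNOWN_FLAGS   0x007f

/* Hypothetical validator for a filesystem that can only share extents.
 * Returns 0 if the request can be honoured, else a negative errno. */
static int cow_only_fs_check(unsigned int flags, int caller_privileged)
{
    if (flags & ~COPYFILE_KNOWN_FLAGS)
        return -EINVAL;
    /* No COW flag: extents must not be shared, so this filesystem cannot
     * help; userspace can do the read/write loop itself. */
    if (!(flags & (COPYFILE_ALLOW_COW | COPYFILE_REQUIRE_COW)))
        return -EOPNOTSUPP;
    /* Strict attribute copying must fail outright without privilege. */
    if ((flags & COPYFILE_REQUIRE_ATTRS) && (flags & COPYFILE_PRIV_ATTRS) &&
        !caller_privileged)
        return -EPERM;
    return 0;
}
```

Worked out, COPYFILE_SNAPSHOT_ASYNC is 0x003a, COPYFILE_SNAPSHOT_STRICT_ASYNC is 0x007a, and the WAIT variants add 0x0001 to each.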
>
> > I dunno. The above seems like a fairly simple and powerful interface, and
> > I _think_ it would be ok for NFS and CIFS. And in fact, if that whole
> > "background copy" ends up being used a lot, maybe even a local filesystem
> > would implement it just to get easy overlapping IO - even if it would just
> > be a trivial common wrapper function that says "start a thread to do a
> > trivial manual copy".
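The "trivial common wrapper" Linus describes could look roughly like this in userspace terms: start a thread that runs a plain read/write loop and return a handle that can be queried or waited on. All names here are hypothetical, POSIX threads stand in for a kernel thread, and cancellation and EINTR handling are omitted to keep it short.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handle for one background copy. */
struct bg_copy {
    int in_fd, out_fd;
    atomic_long copied;         /* bytes written so far, readable any time */
    int error;                  /* 0, or errno from the failing call */
    pthread_t thread;
};

static void *bg_copy_worker(void *arg)
{
    struct bg_copy *c = arg;
    char buf[4096];
    ssize_t n;

    while ((n = read(c->in_fd, buf, sizeof(buf))) > 0) {
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(c->out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                c->error = errno;
                return NULL;
            }
            off += w;
            atomic_fetch_add(&c->copied, w);
        }
    }
    if (n < 0)
        c->error = errno;
    return NULL;
}

/* "Start": returns 0 once the worker thread is running. */
static int bg_copy_start(struct bg_copy *c, int in_fd, int out_fd)
{
    c->in_fd = in_fd;
    c->out_fd = out_fd;
    atomic_init(&c->copied, 0);
    c->error = 0;
    return pthread_create(&c->thread, NULL, bg_copy_worker, c);
}

/* "Wait": block until the copy ends; returns 0 or -1 on failure. */
static int bg_copy_wait(struct bg_copy *c)
{
    pthread_join(c->thread, NULL);
    return c->error ? -1 : 0;
}
```

While the copy runs, `atomic_load(&c->copied)` gives the 'status' answer; comparing it with the source size would yield the "percent remaining" Linus mentions.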
>
> NFS and CIFS folks, please speak up.
>
> Joel
>
Thread overview: 31+ messages
2009-09-11 20:04 [GIT PULL] ocfs2 changes for 2.6.32 Joel Becker
2009-09-14 21:32 ` Linus Torvalds
2009-09-14 22:14 ` Joel Becker
2009-09-14 23:27 ` Linus Torvalds
2009-09-15 0:04 ` Joel Becker
2009-09-15 0:31 ` Linus Torvalds
2009-09-15 0:54 ` Joel Becker
2009-09-15 2:01 ` Linus Torvalds
2009-09-15 4:05 ` Arjan van de Ven
2009-09-15 4:35 ` Joel Becker
2009-09-15 4:06 ` Joel Becker
2009-09-15 16:30 ` Linus Torvalds
2009-09-15 21:45 ` Joel Becker
2009-09-16 4:20 ` Linus Torvalds
2009-09-16 4:40 ` Joel Becker
2009-09-17 16:29 ` Linus Torvalds
2009-09-17 16:38 ` Arjan van de Ven
2009-09-17 20:16 ` Linus Torvalds
2009-09-17 18:40 ` Roland Dreier
2009-09-17 20:17 ` Linus Torvalds
2009-09-17 20:34 ` Joel Becker
2009-09-18 0:29 ` Linus Torvalds
2009-09-17 20:42 ` Roland Dreier
2009-09-17 20:55 ` Linus Torvalds
2009-09-18 1:43 ` [Ocfs2-devel] " Joel Becker
2009-09-18 13:34 ` Pádraig Brady
2009-09-18 18:37 ` Joel Becker
2009-09-18 17:23 ` Peter W. Morreale [this message]
2009-09-18 18:39 ` Joel Becker
2009-09-15 6:44 ` Miklos Szeredi
2009-09-23 11:02 ` [GIT PULL] ocfs2 changes for 2.6.32 (take 2, no syscall) Joel Becker