* [GIT PULL] ocfs2 changes for 2.6.32
@ 2009-09-11 20:04 Joel Becker
2009-09-14 21:32 ` Linus Torvalds
0 siblings, 1 reply; 33+ messages in thread
From: Joel Becker @ 2009-09-11 20:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Mark Fasheh, Andrew Morton, linux-kernel, ocfs2-devel
Linus, et al,
Here are the ocfs2 feature changes for 2.6.32. The big ticket
item is the reflinkat(2) system call and ocfs2's support for it. The
ocfs2 support accounts for all but a handful of the changes. The
remaining few patches are fixes.
Please pull.
Joel
The following changes since commit 8379e7c46cc48f51197dd663fc6676f47f2a1e71:
Sunil Mushran (1):
ocfs2: ocfs2_write_begin_nolock() should handle len=0
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2.git upstream-linus
Coly Li (1):
dlmglue.c: add missed mlog lines
Joel Becker (41):
ocfs2: Make the ocfs2_caching_info structure self-contained.
ocfs2: Change metadata caching locks to an operations structure.
ocfs2: Take the inode out of the metadata read/write paths.
ocfs2: move ip_last_trans to struct ocfs2_caching_info
ocfs2: move ip_created_trans to struct ocfs2_caching_info
ocfs2: Pass struct ocfs2_caching_info to the journal functions.
ocfs2: Store the ocfs2_caching_info on ocfs2_extent_tree.
ocfs2: Pass ocfs2_caching_info to ocfs2_read_extent_block().
ocfs2: ocfs2_find_path() only needs the caching info
ocfs2: ocfs2_create_new_meta_bhs() doesn't need struct inode.
ocfs2: Pass ocfs2_extent_tree to ocfs2_unlink_path()
ocfs2: ocfs2_complete_edge_insert() doesn't need struct inode at all.
ocfs2: Get inode out of ocfs2_rotate_subtree_root_right().
ocfs2: Pass ocfs2_extent_tree to ocfs2_get_subtree_root()
ocfs2: Drop struct inode from ocfs2_extent_tree_operations.
ocfs2: ocfs2_rotate_tree_right() doesn't need struct inode.
ocfs2: ocfs2_update_edge_lengths() doesn't need struct inode.
ocfs2: ocfs2_rotate_subtree_left() doesn't need struct inode.
ocfs2: __ocfs2_rotate_tree_left() doesn't need struct inode.
ocfs2: ocfs2_rotate_tree_left() no longer needs struct inode.
ocfs2: ocfs2_merge_rec_left/right() no longer need struct inode.
ocfs2: ocfs2_try_to_merge_extent() doesn't need struct inode.
ocfs2: ocfs2_grow_branch() and ocfs2_append_rec_to_path() lose struct inode.
ocfs2: ocfs2_truncate_rec() doesn't need struct inode.
ocfs2: Make truncating the extent map an extent_tree_operation.
ocfs2: ocfs2_insert_at_leaf() doesn't need struct inode.
ocfs2: Give ocfs2_split_record() an extent_tree instead of an inode.
ocfs2: ocfs2_do_insert_extent() and ocfs2_insert_path() no longer need an inode.
ocfs2: ocfs2_extent_contig() only requires the superblock.
ocfs2: Swap inode for extent_tree in ocfs2_figure_merge_contig_type().
ocfs2: Remove inode from ocfs2_figure_extent_contig().
ocfs2: ocfs2_figure_insert_type() no longer needs struct inode.
ocfs2: Make extent map insertion an extent_tree_operation.
ocfs2: ocfs2_insert_extent() no longer needs struct inode.
ocfs2: ocfs2_add_clusters_in_btree() no longer needs struct inode.
ocfs2: ocfs2_remove_extent() no longer needs struct inode.
ocfs2: ocfs2_split_and_insert() no longer needs struct inode.
ocfs2: Teach ocfs2_replace_extent_rec() to use an extent_tree.
ocfs2: __ocfs2_mark_extent_written() doesn't need struct inode.
ocfs2: Pass ocfs2_caching_info into ocfs_init_*_extent_tree().
fs: Add the reflink() operation and reflinkat(2) system call.
Sunil Mushran (1):
ocfs2: __ocfs2_abort() should not enable panic for local mounts
Tao Ma (42):
ocfs2: Define refcount tree structure.
ocfs2: Add metaecc for ocfs2_refcount_block.
ocfs2: Add ocfs2_read_refcount_block.
ocfs2: Abstract caching info checkpoint.
ocfs2: Add new refcount tree lock resource in dlmglue.
ocfs2: Add caching info for refcount tree.
ocfs2: Add refcount tree lock mechanism.
ocfs2: Basic tree root operation.
ocfs2: Wrap ocfs2_extent_contig in ocfs2_extent_tree.
ocfs2: Abstract extent split process.
ocfs2: Add refcount b-tree as a new extent tree.
ocfs2: move tree path functions to alloc.h.
ocfs2: Add support for incrementing refcount in the tree.
ocfs2: Add support of decrementing refcount for delete.
ocfs2: Add functions for extents refcounted.
ocfs2: Decrement refcount when truncating refcounted extents.
ocfs2: Add CoW support.
ocfs2: CoW refcount tree improvement.
ocfs2: Integrate CoW in file write.
ocfs2: CoW a reflinked cluster when it is truncated.
ocfs2: Add normal functions for reflink a normal file's extents.
ocfs2: handle file attributes issue for reflink.
ocfs2: Return extent flags for xattr value tree.
ocfs2: Abstract duplicate clusters process in CoW.
ocfs2: Add CoW support for xattr.
ocfs2: Remove inode from ocfs2_xattr_bucket_get_name_value.
ocfs2: Abstract the creation of xattr block.
ocfs2: Abstract ocfs2 xattr tree extend rec iteration process.
ocfs2: Attach xattr clusters to refcount tree.
ocfs2: Call refcount tree remove process properly.
ocfs2: Create an xattr indexed block if needed.
ocfs2: Add reflink support for xattr.
ocfs2: Modify removing xattr process for refcount.
ocfs2: Don't merge in 1st refcount ops of reflink.
ocfs2: Make transaction extend more efficient.
ocfs2: Use proper parameter for some inode operation.
ocfs2: Create reflinked file in orphan dir.
ocfs2: Add preserve to reflink.
ocfs2: Implement ocfs2_reflink.
ocfs2: Enable refcount tree support.
ocfs2: Add ioctl for reflink.
ocfs2: Use buffer IO if we are appending a file.
Wengang Wang (1):
ocfs2: add spinlock protection when dealing with lockres->purge.
Documentation/filesystems/reflink.txt | 174 ++
Documentation/filesystems/vfs.txt | 4 +
arch/x86/ia32/ia32entry.S | 1 +
arch/x86/include/asm/unistd_32.h | 1 +
arch/x86/include/asm/unistd_64.h | 2 +
arch/x86/kernel/syscall_table_32.S | 1 +
fs/namei.c | 137 ++
fs/ocfs2/Makefile | 1 +
fs/ocfs2/alloc.c | 1342 ++++++-----
fs/ocfs2/alloc.h | 101 +-
fs/ocfs2/aops.c | 37 +-
fs/ocfs2/aops.h | 2 +
fs/ocfs2/buffer_head_io.c | 47 +-
fs/ocfs2/buffer_head_io.h | 8 +-
fs/ocfs2/cluster/masklog.c | 1 +
fs/ocfs2/cluster/masklog.h | 1 +
fs/ocfs2/dir.c | 107 +-
fs/ocfs2/dlm/dlmthread.c | 6 +-
fs/ocfs2/dlmglue.c | 105 +-
fs/ocfs2/dlmglue.h | 6 +
fs/ocfs2/extent_map.c | 33 +-
fs/ocfs2/extent_map.h | 8 +-
fs/ocfs2/file.c | 151 ++-
fs/ocfs2/file.h | 2 +
fs/ocfs2/inode.c | 86 +-
fs/ocfs2/inode.h | 20 +-
fs/ocfs2/ioctl.c | 14 +
fs/ocfs2/journal.c | 82 +-
fs/ocfs2/journal.h | 94 +-
fs/ocfs2/localalloc.c | 12 +-
fs/ocfs2/namei.c | 343 +++-
fs/ocfs2/namei.h | 6 +
fs/ocfs2/ocfs2.h | 52 +-
fs/ocfs2/ocfs2_fs.h | 107 +-
fs/ocfs2/ocfs2_lockid.h | 5 +
fs/ocfs2/quota_global.c | 5 +-
fs/ocfs2/quota_local.c | 26 +-
fs/ocfs2/refcounttree.c | 4249 +++++++++++++++++++++++++++++++++
fs/ocfs2/refcounttree.h | 108 +
fs/ocfs2/resize.c | 16 +-
fs/ocfs2/slot_map.c | 10 +-
fs/ocfs2/suballoc.c | 35 +-
fs/ocfs2/super.c | 13 +-
fs/ocfs2/uptodate.c | 265 ++-
fs/ocfs2/uptodate.h | 51 +-
fs/ocfs2/xattr.c | 2056 +++++++++++++++--
fs/ocfs2/xattr.h | 15 +-
include/linux/fcntl.h | 8 +
include/linux/fs.h | 2 +
include/linux/security.h | 23 +
include/linux/syscalls.h | 3 +
security/capability.c | 7 +
security/security.c | 8 +
53 files changed, 8823 insertions(+), 1176 deletions(-)
create mode 100644 Documentation/filesystems/reflink.txt
create mode 100644 fs/ocfs2/refcounttree.c
create mode 100644 fs/ocfs2/refcounttree.h
--
Life's Little Instruction Book #99
"Think big thoughts, but relish small pleasures."
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 33+ messages in thread* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-11 20:04 [GIT PULL] ocfs2 changes for 2.6.32 Joel Becker @ 2009-09-14 21:32 ` Linus Torvalds 2009-09-14 22:14 ` Joel Becker ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Linus Torvalds @ 2009-09-14 21:32 UTC (permalink / raw) To: Joel Becker; +Cc: Mark Fasheh, Andrew Morton, linux-kernel, ocfs2-devel On Fri, 11 Sep 2009, Joel Becker wrote: > > Linus, et al, > Here are the ocfs2 feature changes for 2.6.32. The big ticket > item is the reflinkat(2) system call and ocfs2's support for it. The > ocfs2 support accounts for all but a handful of the changes. The > remaining few patches are fixes. I _really_ want some kind of ack's for new filesystem system calls like this. I'm not going to pull a new 'reflink[at]()' system call just based on a single filesystem. Yes, there's clearly been _some_ discussion, but (a) I've not seen it (since it's been on 'fsdevel', which is one of those single-topic mailing lists that I'm totally uninterested in, since they tend to become clique groups) and (b) you don't even say whether the thing has been acked by things like the security angle etc. So I'm not pulling this. Not until I get the feeling that there is consensus. I also don't understand why it's called 'reflink'. Why not 'copyfile'? We should not name things by implementation, we should name things by what they _do_. And I'm not seeing what is so 'reflink' about this that it's not a 'copyfile'. I also am not entirely clear on why you need the source name, and not - for example - an 'fd'. Are we going to add 'freflink[at]()' at some point? So I want explanations for the naming, I want sign-offs from other filesystem (and security) people, etc. What I do _not_ want is to get a "please pull" request for a filesystem, and notice that it's suddenly not all about just that particular filesystem, without any indication of who you've been talking to etc etc. Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-14 21:32 ` Linus Torvalds @ 2009-09-14 22:14 ` Joel Becker 2009-09-14 23:27 ` Linus Torvalds 2009-09-15 6:44 ` Miklos Szeredi 2009-09-23 11:02 ` [GIT PULL] ocfs2 changes for 2.6.32 (take 2, no syscall) Joel Becker 2 siblings, 1 reply; 33+ messages in thread From: Joel Becker @ 2009-09-14 22:14 UTC (permalink / raw) To: Linus Torvalds; +Cc: Mark Fasheh, Andrew Morton, linux-kernel, ocfs2-devel On Mon, Sep 14, 2009 at 02:32:36PM -0700, Linus Torvalds wrote: > On Fri, 11 Sep 2009, Joel Becker wrote: > > > > Linus, et al, > > Here are the ocfs2 feature changes for 2.6.32. The big ticket > > item is the reflinkat(2) system call and ocfs2's support for it. The > > ocfs2 support accounts for all but a handful of the changes. The > > remaining few patches are fixes. > > I _really_ want some kind of ack's for new filesystem system calls like > this. I'm not going to pull a new 'reflink[at]()' system call just based > on a single filesystem. I'll get specific acks. I sent it via ocfs2.git because others recommended I not send it upstream in June but instead wait until I had at least one filesystem implementing it. > Yes, there's clearly been _some_ discussion, but (a) I've not seen it > (since it's been on 'fsdevel', which is one of those single-topic mailing > lists that I'm totally uninterested in, since they tend to become clique > groups) and (b) you don't even say whether the thing has been acked by > things like the security angle etc. Fair enough. Don't worry, the security folks were involved. I'll get direct acks. > I also don't understand why it's called 'reflink'. Why not 'copyfile'? We > should not name things by implementation, we should name things by what > they _do_. And I'm not seeing what is so 'reflink' about this that it's > not a 'copyfile'. I also am not entirely clear on why you need the source > name, and not - for example - an 'fd'. > > Are we going to add 'freflink[at]()' at some point? It's a link(2) analogue. symlink(2) has the loosest coupling, and reflink(2) the highest. We're not going to add freflink[at](). It's a snap, not a copy. It can be used to implement a copy, and copyfile() in libc can be written with reflinkat(2), but it isn't just a copy. Joel -- "There is shadow under this red rock. (Come in under the shadow of this red rock) And I will show you something different from either Your shadow at morning striding behind you Or your shadow at evening rising to meet you. I will show you fear in a handful of dust." Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-14 22:14 ` Joel Becker @ 2009-09-14 23:27 ` Linus Torvalds 2009-09-15 0:04 ` Joel Becker 0 siblings, 1 reply; 33+ messages in thread From: Linus Torvalds @ 2009-09-14 23:27 UTC (permalink / raw) To: Joel Becker; +Cc: Mark Fasheh, Andrew Morton, linux-kernel, ocfs2-devel On Mon, 14 Sep 2009, Joel Becker wrote: > > It's a link(2) analogue. symlink(2) has the loosest coupling, > and reflink(2) the highest. We're not going to add freflink[at](). > It's a snap, not a copy. It can be used to implement a copy, and > copyfile() in libc can be written with reflinkat(2), but it isn't just a > copy. >From all but a performance standpoint, it's a copy. It has absolutely _zero_ "link" semantics. When you do a symlink or a hardlink, you see it in the resulting semantics: changing one changes the other. This 'reflink' has no such semantics that I can tell. It has purely copy semantics, never mind that it's optimized. And the thing to note is that it doesn't even have to be optimized as a "link". Think about network filesystems: maybe they want to implement this thing as a server-side "copy" operation (with atomicity guarantees). In other words, I can well imagine that for some filesystems, there really is no refcounting or linking implied, and that's why I think naming should be about semantics, not some random implementation issue that just happens to be true for some particular class of filesystems. So tell me - are there actually any non-copying semantics as far as the _user_ is concerned? Is there some reason why a NFS server might not implement this as a server-side copy? Is there something fundamentally in this all that is about reference counting as far as a user is concerned? I also still didn't get any answer to the "freflink()" question. You just said that we wouldn't do it, with no explanation. Why? We've discussed 'flink()' in the past, I just want to know that when we do a new system call there is some _reason_ why it's not going to explode into many different variants later... Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-14 23:27 ` Linus Torvalds @ 2009-09-15 0:04 ` Joel Becker 2009-09-15 0:31 ` Linus Torvalds 0 siblings, 1 reply; 33+ messages in thread From: Joel Becker @ 2009-09-15 0:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: Mark Fasheh, Andrew Morton, linux-kernel, ocfs2-devel On Mon, Sep 14, 2009 at 04:27:59PM -0700, Linus Torvalds wrote: > On Mon, 14 Sep 2009, Joel Becker wrote: > >From all but a performance standpoint, it's a copy. It has absolutely > _zero_ "link" semantics. When you do a symlink or a hardlink, you see it > in the resulting semantics: changing one changes the other. It's creating a new entry in the name space based on an old one. > And the thing to note is that it doesn't even have to be optimized as a > "link". Think about network filesystems: maybe they want to implement this > thing as a server-side "copy" operation (with atomicity guarantees). reflink doesn't merely guarantee atomicity, it guarantees the shared data extents. Under the auspices of reflink a network filesystem cannot merely provide an atomic copy. A separate copyfile call might allow that, but reflink doesn't. This is deliberate, because the caller wants the shared storage, not just a copy. > I also still didn't get any answer to the "freflink()" question. You just > said that we wouldn't do it, with no explanation. Why? We've discussed > 'flink()' in the past, I just want to know that when we do a new system > call there is some _reason_ why it's not going to explode into many > different variants later... Well, obviously I started from the fact that we don't have flink(). But it doesn't really fit anyway. reflink is a namespace operation - give me a new item in the namespace that shares the data extents of the old item. So working from a file descriptor doesn't quite fit. Plus, flink and freflink would have to deal with recovering already-orphaned inodes. Where do you stand on flink? If it actually makes sense to you, then perhaps we should consider it and freflinkat. It doesn't strike me as the way to go, but throughout all the discussion I'm quite willing to be convinced. Joel -- "I don't know anything about music. In my line you don't have to." - Elvis Presley Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-15 0:04 ` Joel Becker @ 2009-09-15 0:31 ` Linus Torvalds 2009-09-15 0:54 ` Joel Becker 0 siblings, 1 reply; 33+ messages in thread From: Linus Torvalds @ 2009-09-15 0:31 UTC (permalink / raw) To: Joel Becker; +Cc: Mark Fasheh, Andrew Morton, linux-kernel, ocfs2-devel On Mon, 14 Sep 2009, Joel Becker wrote: > > It's creating a new entry in the name space based on an old one. That's just a cumbersome way of saying "copyfile". Here's a challenge for you: go outside, take the first five people you meet at random, and ask them what a 'copyfile()' system call would do. Then, do the same thing with 'reflink()'. Feel free to stack the deck, so that the people you ask about 'reflink()' actually know computers. Then report back which group guessed better what the system call does. > reflink doesn't merely guarantee atomicity, it guarantees the > shared data extents. Why? That just limits its usefulness. What's the reason for that sophistry, except to try to argue for a name that makes no sense? > Well, obviously I started from the fact that we don't have > flink(). But it doesn't really fit anyway. reflink is a namespace > operation - give me a new item in the namespace that shares the data > extents of the old item. That's not a namespace op, EXCEPT FOR THE NEW NAME. The data you share from has no namespace component to it, except as a lookup. But a 'fd' is equally descriptive of the shared data. Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-15 0:31 ` Linus Torvalds @ 2009-09-15 0:54 ` Joel Becker 2009-09-15 2:01 ` Linus Torvalds 0 siblings, 1 reply; 33+ messages in thread From: Joel Becker @ 2009-09-15 0:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: Mark Fasheh, Andrew Morton, linux-kernel, ocfs2-devel On Mon, Sep 14, 2009 at 05:31:27PM -0700, Linus Torvalds wrote: > On Mon, 14 Sep 2009, Joel Becker wrote: > > reflink doesn't merely guarantee atomicity, it guarantees the > > shared data extents. > > Why? > > That just limits its usefulness. What's the reason for that sophistry, > except to try to argue for a name that makes no sense? This originally came from the idea of creating file snapshots. That was our original goal, but the more generic reflink call allows more than snapshots to be built. You can use it to implement copyfile or clone or a variety of things. But the snapshot capability is what really motivates, and removing the shared data requirement means removing that capability. Like any API we have, if it can degrade, you have to assume it degraded. A reflink/copyfile that can just copy means you have assume it copied and didn't conserve space. This makes it useless for snapshotting or cloning. In the reflink discussion before, I proposed that a separate copyfile() syscall could be written that uses the same ->reflink() inode operation but allows degradation in the storage handling. This would be a little more capable than a glibc copyfile() written around reflink because it would get the atomicity right. The separate copyfile/reflink calls would handle the different requirements of storage handling. I just concentrated on reflink and didn't worry about that alternate copyfile at the time being. I'm open to another proposal on how to do it. As a user, I need a way to ask for a reflink/copyfile that fails if it can't share the data. Things like snapshots and cloning gold VM images can't be doubling the storage. They become pointless. About the name, the reflink name came out of "you call it like link(2)" and "the storage is reference counted CoW". It really works well as "ln -r". Folks at the filesystem summit liked it, so I didn't change it. It's not so much that it has to be "reflink", but I've avoided "copyfile" because copyfile intuitively sounds like you describe, including the plain-copy fallback. Want me to call the requires-shared-data-because-its-a-snap version snapfileat(2)? Something better? > > Well, obviously I started from the fact that we don't have > > flink(). But it doesn't really fit anyway. reflink is a namespace > > operation - give me a new item in the namespace that shares the data > > extents of the old item. > > That's not a namespace op, EXCEPT FOR THE NEW NAME. > > The data you share from has no namespace component to it, except as a > lookup. But a 'fd' is equally descriptive of the shared data. Ok, I gather that you find freflink (and by extension, flink) compelling. I can certainly implement it. Joel -- A good programming language should have features that make the kind of people who use the phrase "software engineering" shake their heads disapprovingly. - Paul Graham Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-15 0:54 ` Joel Becker @ 2009-09-15 2:01 ` Linus Torvalds 2009-09-15 4:05 ` Arjan van de Ven 2009-09-15 4:06 ` Joel Becker 0 siblings, 2 replies; 33+ messages in thread From: Linus Torvalds @ 2009-09-15 2:01 UTC (permalink / raw) To: Joel Becker Cc: Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Mon, 14 Sep 2009, Joel Becker wrote: > > In the reflink discussion before, I proposed that a separate > copyfile() syscall could be written that uses the same ->reflink() inode > operation but allows degradation in the storage handling. .. exactly how? If you're talking about falling back to manually just copying the data, then nobody is interested in that. User space can do that better with a simple read-write loop or with splice, or whatever. There's no reaason what-so-ever to do that. But the thing is, network filesystems may be able to do server-side copies, and the point being that they can do so _without_ transferring the data to the client (and back). And if we do 'copyfile' (under whatever name) for one filesystem, then I think we should strive to make sure that it's useful for other filesystems too. Just google for "NFS Server-side Copy". And SMB has had a COPY command from the very beginning, I think. And as far as I can tell, neither NFS nor CIFS could use your definition of 'reflink()'. They aren't reflinks. Or rather, _could_ be, on the server, of course, but what some people want to do is to avoid moving data over the network. So it's not about "don't use more diskspace" for that kind of application. Do we really want to introduce a new filesystem operation that is likely to be broken for something like that? Now, it's possible that nobody will ever care, and that NFS server-side copy goes the way of a lot of other failed trials. But I really hope you have at least _talked_ to some CIFS/NFS people about this. [ Btw, it's quite possible that CIFS/NFS people would want more than a single entrypoint. I think they might want partial copies and status updates etc, which would likely mean that a single ->copyfile() thing isn't sufficient. Maybe it's not worth it, and the complexity of something like that gets to be too annoying. But I don't get the feeling that you've even _tried_ to see if this can be generalized to something that would be much more widely useful ] Now, I can see that you might want to say "fail rather than use double the diskspace for data". But why not just do that as a flag? You already have flags for 'copy extended attributes or not'. Why not have a flag that says 'copy only if you can do it without any extra space'? Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-15 2:01 ` Linus Torvalds @ 2009-09-15 4:05 ` Arjan van de Ven 2009-09-15 4:35 ` Joel Becker 2009-09-15 4:06 ` Joel Becker 1 sibling, 1 reply; 33+ messages in thread From: Arjan van de Ven @ 2009-09-15 4:05 UTC (permalink / raw) To: Linus Torvalds Cc: Joel Becker, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Mon, 14 Sep 2009 19:01:06 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Mon, 14 Sep 2009, Joel Becker wrote: > > > > In the reflink discussion before, I proposed that a separate > > copyfile() syscall could be written that uses the same ->reflink() > > inode operation but allows degradation in the storage handling. > > .. exactly how? > > If you're talking about falling back to manually just copying the > data, then nobody is interested in that. User space can do that > better with a simple read-write loop or with splice, or whatever. > There's no reaason what-so-ever to do that. > > But the thing is, network filesystems may be able to do server-side > copies, and the point being that they can do so _without_ > transferring the data to the client (and back). And if we do > 'copyfile' (under whatever name) for one filesystem, then I think we > should strive to make sure that it's useful for other filesystems too. COW filesystems like btrfs may also be able to do interesting things with copyfile() btw by just sharing all data blocks COW. That would make copyfile() useful for me, much more so than the network filesystem side... -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-15 4:05 ` Arjan van de Ven @ 2009-09-15 4:35 ` Joel Becker 0 siblings, 0 replies; 33+ messages in thread From: Joel Becker @ 2009-09-15 4:35 UTC (permalink / raw) To: Arjan van de Ven Cc: Linus Torvalds, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Tue, Sep 15, 2009 at 06:05:59AM +0200, Arjan van de Ven wrote: > COW filesystems like btrfs may also be able to do interesting things > with copyfile() btw by just sharing all data blocks COW. > > That would make copyfile() useful for me, much more so than the network > filesystem side... btrfs reflink is definitely in the cards. Joel -- "The first thing we do, let's kill all the lawyers." -Henry VI, IV:ii Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-15 2:01 ` Linus Torvalds 2009-09-15 4:05 ` Arjan van de Ven @ 2009-09-15 4:06 ` Joel Becker 2009-09-15 16:30 ` Linus Torvalds 1 sibling, 1 reply; 33+ messages in thread From: Joel Becker @ 2009-09-15 4:06 UTC (permalink / raw) To: Linus Torvalds Cc: Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Mon, Sep 14, 2009 at 07:01:06PM -0700, Linus Torvalds wrote: > On Mon, 14 Sep 2009, Joel Becker wrote: > > > > In the reflink discussion before, I proposed that a separate > > copyfile() syscall could be written that uses the same ->reflink() inode > > operation but allows degradation in the storage handling. > > .. exactly how? > > If you're talking about falling back to manually just copying the data, > then nobody is interested in that. User space can do that better with a > simple read-write loop or with splice, or whatever. There's no reaason > what-so-ever to do that. I'm talking about any facility for copying that isn't just a userspace loop. Like your discussion of network filesystems. > But the thing is, network filesystems may be able to do server-side > copies, and the point being that they can do so _without_ transferring the > data to the client (and back). And if we do 'copyfile' (under whatever > name) for one filesystem, then I think we should strive to make sure that > it's useful for other filesystems too. Hence I brought this to the filesystem summit and then fsdevel rather than just implementing it in ocfs2. I know NFS folks were in the room in April, and they said the call definition was workable. Can't remember if CIFS folks were there, but I think so. > [ Btw, it's quite possible that CIFS/NFS people would want more than a > single entrypoint. I think they might want partial copies and status > updates etc, which would likely mean that a single ->copyfile() thing > isn't sufficient. > > Maybe it's not worth it, and the complexity of something like that gets > to be too annoying. But I don't get the feeling that you've even _tried_ > to see if this can be generalized to something that would be much more > widely useful ] I brought it up in a forum with everyone there precisely so that I wouldn't miss their concerns via myopia. reflink() is a generic application of the specific "let's snapshot inodes" idea. It doesn't do "atomic copy of data into duplicate storage", nor does it do "send byte ranges". The goal was something straightforward, not a kitchen sink. > Now, I can see that you might want to say "fail rather than use double > the diskspace for data". But why not just do that as a flag? You already > have flags for 'copy extended attributes or not'. Why not have a flag that > says 'copy only if you can do it without any extra space'? We could. Like I said, I really wanted something simple and clean. I tried hard to avoid that other flag, but I had to give up due to (correct) concerns from the security folks. I'm looking at both the ease of calling the call and how we define userspace programs to use it. reflink(1), the program, is essentially a synonym for 'ln -r' right now. That's pretty nice to use from a script. Other ideas have been 'cp --reflink' or 'cp --clone', but every proposal for a cp argument has felt awful and clunky. If I were doing a straight copyfile(), ignoring the reflink symantics, I'd want something that could be done by cp(1) at all times (rc = copyfile(); if ENOSYS do_normal_copy()). I mean, if we do it right, why not take advantage at all times. Using reflink here violates peoples expectations, because a reflink, with its shared data extents, can ENOSPC when you do CoW. Whereas a copyfile() that expects to duplicate the storage can fit within defualt cp. Joel -- "Heav'n hath no rage like love to hatred turn'd, nor Hell a fury, like a woman scorn'd." - William Congreve Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-15 4:06 ` Joel Becker @ 2009-09-15 16:30 ` Linus Torvalds 2009-09-15 21:45 ` Joel Becker 0 siblings, 1 reply; 33+ messages in thread From: Linus Torvalds @ 2009-09-15 16:30 UTC (permalink / raw) To: Joel Becker Cc: Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Mon, 14 Sep 2009, Joel Becker wrote: > > > > If you're talking about falling back to manually just copying the data, > > then nobody is interested in that. User space can do that better with a > > simple read-write loop or with splice, or whatever. There's no reaason > > what-so-ever to do that. > > I'm talking about any facility for copying that isn't just a > userspace loop. Like your discussion of network filesystems. HOW? We need to have a per-filesystem interface to that. Having a '->copyfile()' function would be great. But don't you see how _idiotic_ it is to then also having a '->reflink()' function that does _conceptually_ the exact same thing, except it does it by incrementing a usage count instead? Do you see why I'm so unhappy to add a ->reflink() function? > Hence I brought this to the filesystem summit and then fsdevel > rather than just implementing it in ocfs2. I know NFS folks were in the > room in April, and they said the call definition was workable. Can't > remember if CIFS folks were there, but I think so. It's not workable if you define the 'reflink()' function to not use any disk space on the filesystem. Because SMB _will_ do a copy (and I presume the NFS thing will too). So it would not in general be what you call reflink, it will not be a "snapshot". So if you _define_ the semantics of "reflink" to be that it's atomic and doesn't use any new diskspace (apart from the new inode/directory entry, of course), then it will be almost totally useless to other filesystems. In fact, it's entirely possible to have filesystems that can avoid copying the _data_ blocks, but would need to copy the indirect blocks - maybe the data blocks are ref-counted, but the metadata needs to be per-file (I can see many reasons to do it that way, even if it's organized as a tree - it's how we do page table COW, for example, and it makes some things much simpler). Would that be a 'reflink()' or not? I have no way of knowing, because you have decided on reflink on a purely ocfs2-specific implementation basis. But I do know that such a filesystem would be perfectly happy to have a 'copyfile' function. This is why I want the VFS pointers to be about _semantics_, not about some random implementation detail. Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-15 16:30 ` Linus Torvalds @ 2009-09-15 21:45 ` Joel Becker 2009-09-16 4:20 ` Linus Torvalds 0 siblings, 1 reply; 33+ messages in thread From: Joel Becker @ 2009-09-15 21:45 UTC (permalink / raw) To: Linus Torvalds Cc: Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Tue, Sep 15, 2009 at 09:30:54AM -0700, Linus Torvalds wrote: > HOW? > > We need to have a per-filesystem interface to that. No argument here. > But don't you see how _idiotic_ it is to then also having a '->reflink()' > function that does _conceptually_ the exact same thing, except it does it > by incrementing a usage count instead? > > Do you see why I'm so unhappy to add a ->reflink() function? I got it the first time. You see reflink() as a copyfile(), and distinguishing the inode operations doesn't make sense to you. Quite frankly, it doesn't to me either. There is the user<->kernel interface of the system call, and there is the filesystem interface of the inode operation. One inode op that can support multiple variations of user<->kernel is find with me! Let's step back a second. I'm not married to the name 'reflink'. I'm not opposed to a copyfile() syscall. I think I have a clearer idea of what I see. More below. > Would that be a 'reflink()' or not? I have no way of knowing, because you > have decided on reflink on a purely ocfs2-specific implementation basis. > But I do know that such a filesystem would be perfectly happy to have a > 'copyfile' function. That's not fair. I deliberately defined it as something outside of the ocfs2 implementation. Apparently I didn't do a good enough job. > This is why I want the VFS pointers to be about _semantics_, not about > some random implementation detail. Again, no argument here. The syscall interface better be reasonably obvious to the userspace programmer. The VFS pointer better be an efficient and clean way to implement the syscall interface. I'm seeing three things here: 1. A CoW snapshot of an inode. This is reflink. It expressly defines metadata as copyable, but data must be shared in a CoW fashion (to answer your question about indirect blocks). You either get a snapshot or nothing. Call it snapfile() if you like. Don't care. 2. An efficient copy. This is what you're talking about with CIFS COPY, etc. You want to be guaranteed it does NOT do CoW, because it would be great for a naive cp(1) to use it without the ENOSPC surprise of CoW. You'd like the kernel call to fail if you're just going to get read-write-loops, because userspace can implement that better. Maybe we have it such that only network filesystems implement this action, all the others return -ENOTSUPP, and then glibc handles the read-write-loop. This allows everyone to call copyfile() and get what they expected. 3. A space-saving copy. This is doing CoW linkup of the data storage if possible, like a snapshot but without the atomicity guarantee. It has the ENOSPC surprise, but someone using it should know that. I think it would be great for Linux to provide all three. I chose to only attack (1) because I could define it well. I left (2) and (3), what I see as copyfile(), for later work. And I fully expected that the VFS operation could change later - it's an internal thing, after all. I want to get a good user<->kernel interface, because that's the one that is set in stone. What I didn't want was to create another kitchen-sink call, or another POSIXy thing that has a million special cases that trip folks up. I'm glad you've taken an interest, because you're pretty damned good at architecture. If we can expand to cover copyfile sanely too, win-win. To me, the user<->kernel interface really is two system calls: reflink/snapfile for (1) and copyfile for (2) & (3). The kernel VFS interface I would think you could do in one inode operation. If you want to name it ->copyfile, that's fine. Perhaps ->copyfile takes the following flags: #define ALLOW_COW_SHARED 0x0001 #define REQUIRE_COW_SHARED 0x0002 #define REQUIRE_BASIC_ATTRS 0x0004 #define REQUIRE_FULL_ATTRS 0x0008 #define REQUIRE_ATOMIC 0x0010 #define SNAPSHOT (REQUIRE_COW_SHARED | REQUIRE_BASIC_ATTRS | REQUIRE_ATOMIC) #define SNAPSHOT_PRESERVE (SNAPSHOT | REQUIRE_FULL_ATTRS) Thus, sys_reflink/sys_snapfile(oldpath, newpath, 0) becomes: ->copyfile(oldpath, newpath, SNAPSHOT) and sys_reflink/sys_snapfile(oldpath, newpath, ATTR_PRESERVE) becomes: ->copyfile(oldpath, newpath, SNAPSHOT_PRESERVE) while sys_copyfile(oldpath, newpath, 0) is: ->copyfile(oldpath, newpath, 0) and sys_copyfile(oldpath, newpath, ALLOW_COW) is: ->copyfile(oldpath, newpath, ALLOW_COW_SHARED) What do you think? Other ideas? Joel -- "The lawgiver, of all beings, most owes the law allegiance. He of all men should behave as though the law compelled him. But it is the universal weakness of mankind that what we are given to administer we presently imagine we own." - H.G. Wells Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-15 21:45 ` Joel Becker @ 2009-09-16 4:20 ` Linus Torvalds 2009-09-16 4:40 ` Joel Becker 0 siblings, 1 reply; 33+ messages in thread From: Linus Torvalds @ 2009-09-16 4:20 UTC (permalink / raw) To: Joel Becker Cc: Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Tue, 15 Sep 2009, Joel Becker wrote: > > Perhaps ->copyfile takes the following flags: > > #define ALLOW_COW_SHARED 0x0001 > #define REQUIRE_COW_SHARED 0x0002 > #define REQUIRE_BASIC_ATTRS 0x0004 > #define REQUIRE_FULL_ATTRS 0x0008 > #define REQUIRE_ATOMIC 0x0010 > #define SNAPSHOT (REQUIRE_COW_SHARED | > REQUIRE_BASIC_ATTRS | > REQUIRE_ATOMIC) > #define SNAPSHOT_PRESERVE (SNAPSHOT | REQUIRE_FULL_ATTRS) > > Thus, sys_reflink/sys_snapfile(oldpath, newpath, 0) becomes: > ... Yes. The above all sounds sane to me. I still worry that especially the non-atomic case will want some kind of partial-copy updates (think graphical file managers that want to show the progress of the copy), and that (think EINTR and continuing) makes me think "that could get really complex really quickly", but that's something that the NFS/SMB people would have to pipe up on. I'm pretty sure the NFS spec has some kind "partial completion notification" model, I dunno about SMB. Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-16 4:20 ` Linus Torvalds @ 2009-09-16 4:40 ` Joel Becker 2009-09-17 16:29 ` Linus Torvalds 0 siblings, 1 reply; 33+ messages in thread From: Joel Becker @ 2009-09-16 4:40 UTC (permalink / raw) To: Linus Torvalds Cc: Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Tue, Sep 15, 2009 at 09:20:47PM -0700, Linus Torvalds wrote: > On Tue, 15 Sep 2009, Joel Becker wrote: > > > > Perhaps ->copyfile takes the following flags: > > > > #define ALLOW_COW_SHARED 0x0001 > > #define REQUIRE_COW_SHARED 0x0002 > > #define REQUIRE_BASIC_ATTRS 0x0004 > > #define REQUIRE_FULL_ATTRS 0x0008 > > #define REQUIRE_ATOMIC 0x0010 > > #define SNAPSHOT (REQUIRE_COW_SHARED | > > REQUIRE_BASIC_ATTRS | > > REQUIRE_ATOMIC) > > #define SNAPSHOT_PRESERVE (SNAPSHOT | REQUIRE_FULL_ATTRS) > > > > Thus, sys_reflink/sys_snapfile(oldpath, newpath, 0) becomes: > > ... > > Yes. The above all sounds sane to me. Ok. Where do you see the exposure level? What I mean is, I just defined a vfs op that handles these things, but accessed it via two syscalls, sys_snapfile() and sys_copyfile(). We could also just provide one system call and allow userspace to use these flags itself, creating snapfile(3) and copyfile(3) in libc, hiding the details (kind of like clone being hidden by pthreads, though ignoring that pthreads has "issues"). Or we could explicitly make this the public API and expect something like cp(1) to directly use the flags. Thoughts? > I still worry that especially the non-atomic case will want some kind of > partial-copy updates (think graphical file managers that want to show the > progress of the copy), and that (think EINTR and continuing) makes me > think "that could get really complex really quickly", but that's something > that the NFS/SMB people would have to pipe up on. I'm pretty sure the NFS > spec has some kind "partial completion notification" model, I dunno about > SMB. I'm really wary of combining a ranged interface with this one. Not only does it make no sense for snapshots, but I think it falls down in any "create a new inode" scheme entirely. btrfs has an ioctl that basically says "link up range x->y of file 1 to file 2". Chris is using the underlying machinery to implement reflink, but I think the concept actually would work nicely as a splice flag. If you have two existing files, not creating one, you can just ask splice to do efficient things with a SPLICE_F_EFFICIENT_COPY for yoru CIFS COPY-style thing or SPLICE_F_COW_COPY for btrfs- and ocfs2-style data sharing. Joel -- "Nothing is wrong with California that a rise in the ocean level wouldn't cure." - Ross MacDonald Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-16 4:40 ` Joel Becker @ 2009-09-17 16:29 ` Linus Torvalds 2009-09-17 16:38 ` Arjan van de Ven ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Linus Torvalds @ 2009-09-17 16:29 UTC (permalink / raw) To: Joel Becker Cc: Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Tue, 15 Sep 2009, Joel Becker wrote: > > Ok. Where do you see the exposure level? What I mean is, I > just defined a vfs op that handles these things, but accessed it via two > syscalls, sys_snapfile() and sys_copyfile(). We could also just provide > one system call and allow userspace to use these flags itself, creating > snapfile(3) and copyfile(3) in libc Why would anybody want to hide it at all? Why even the libc hiding? Nobody is going to use this except for special apps. Let them see what they can do, in all its glory. > > I still worry that especially the non-atomic case will want some kind of > > partial-copy updates (think graphical file managers that want to show the > > progress of the copy), and that (think EINTR and continuing) makes me > > think "that could get really complex really quickly", but that's something > > that the NFS/SMB people would have to pipe up on. I'm pretty sure the NFS > > spec has some kind "partial completion notification" model, I dunno about > > SMB. > > I'm really wary of combining a ranged interface with this one. > Not only does it make no sense for snapshots, but I think it falls down > in any "create a new inode" scheme entirely. Oh, I wouldn't suggest a ranged interface, just one that allows for status updates and cancelling - _if_ the initial op isn't atomic to begin with. There's also the issue of concurrency in IO: maybe you want to start several things without necessarily waiting for them (think high-throughput "cp -R" on NFS or something like that). So I'd suggest something like having two system calls: one to start the operation, and one to control it. And for a filesystem that does atomic copies, the 'start' one obviously would also finish it, so the 'control' it would be a no-op, because there would never be any outstanding ones. See what I'm saying? It wouldn't complicate _your_ life, but it would allow for filesystems that can't do it atomically (or even quickly). So the first one would be something like int copyfile(const char *src, const char *dest, unsigned long flags); which would return: - zero on success - negative (with errno) on error - positive cookie on "I started it, here's my cookie". For extra bonus points, maybe the cookie would actually be a file descriptor (for poll/select users), but it would _not_ be a file descriptor to the resulting _file_, it would literally be a "cookie" to the actual copyfile event. and then for ocfs2 you'd never return positive cookies. You'd never have to worry about it. Then the second interface would be something like int copyfile_ctrl(long cookie, unsigned long cmd); where you'd just have some way to wait for completion and ask how much has been copied. The 'cmd' would be some set of 'cancel', 'status' or 'uninterruptible wait' or whatever, and the return value would again be - negative (with errno) for errors (copy failed) - cookie released - zero for 'done' - cookie released - positive for 'percent remaining' or whatever - cookie still valid and this would be another callback into the filesystem code, but you'd never have to worry about it, since you'd never see it (just leave it NULL). NOTE! The above is a rough idea - I have not spent tons of time thinking about it, or looking at exactly what something like NFS would really want. But the _concept_ is simple, and usage should be pretty trivial. A simple case would be something like this: int copy_file(const char *src, const char *dst) { /* Start a file copy */ int cookie = copyfile(src, dst, 0); /* Async case? */ if (cookie > 0) { int ret; while ((ret = copyfile_ctrl(cookie, COPYFILE_WAIT)) > 0) /* nothing */; /* Error handling is shared for async/sync */ cookie = ret; } if (cookie < 0) { perror("copyfile failed"); return -1; } return 0; } doesn't that look fairly easy to use? And the advantage here is that you _can_ - still fairly easily - do much more involved things. For example, let's say that you wanted to do a very efficient parallel copy, so you'd do something like this: #define MAX_PEND 10 static int pending[MAX_PEND]; static int nr_pending = 0; static int wait_for_completion(int nr_left) { int ret; while (nr_pending > nr_left) { int cookie = pending[0], i; /* Wait for completion of the oldest entry */ while ((i = copyfile_ctrl(cookie, COPYFILE_WAIT)) > 0) /* nothing */; /* Save the "we had an error" case */ if (i < 0) ret = i; /* Move the other entries down */ memmove(pending, pending+1, sizeof(int)*--nr_pending); } return ret; } int start_copy(src, dst) { int cookie, ret; cookie = copyfile(src, dst, 0); if (cookie <= 0) return cookie; ret = 0; if (nr_pending == MAX_PENDING) ret = wait_for_completion(pending, MAX_PENDING/2); pending[nr_pending++] = cookie; return ret; } int stop_copy(void) { return wait_for_completion(pending, 0); } which basically ends up having ten copyfile() calls outstanding (and when we hit the limit, we wait for half of them to complete), so now you can do an efficient "cp -R" with concurrent server-side IO. And it wasn't so hard, was it? (Ok, so the above would need to be fleshed out to remember the filenames so that you can report _which_ file failed etc, but you get the idea). And again, it wouldn't be any more complicated for your case. Your copyfile would always just return 0 or negative for error. But it would be _way_ more powerful for filesystems that want to do potentially lots of IO for the file copy. I dunno. The above seems like a fairly simple and powerful interface, and I _think_ it would be ok for NFS and CIFS. And in fact, if that whole "background copy" ends up being used a lot, maybe even a local filesystem would implement it just to get easy overlapping IO - even if it would just be a trivial common wrapper function that says "start a thread to do a trivial manual copy". Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-17 16:29 ` Linus Torvalds @ 2009-09-17 16:38 ` Arjan van de Ven 2009-09-17 20:16 ` Linus Torvalds 2009-09-17 18:40 ` Roland Dreier 2009-09-18 1:43 ` [Ocfs2-devel] " Joel Becker 2 siblings, 1 reply; 33+ messages in thread From: Arjan van de Ven @ 2009-09-17 16:38 UTC (permalink / raw) To: Linus Torvalds Cc: Joel Becker, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Thu, 17 Sep 2009 09:29:14 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Tue, 15 Sep 2009, Joel Becker wrote: > > > > Ok. Where do you see the exposure level? What I mean is, I > > just defined a vfs op that handles these things, but accessed it > > via two syscalls, sys_snapfile() and sys_copyfile(). We could also > > just provide one system call and allow userspace to use these flags > > itself, creating snapfile(3) and copyfile(3) in libc > > Why would anybody want to hide it at all? Why even the libc hiding? > > Nobody is going to use this except for special apps. Let them see > what they can do, in all its glory. > > > > I still worry that especially the non-atomic case will want some > > > kind of partial-copy updates (think graphical file managers that > > > want to show the progress of the copy), and that (think EINTR and > > > continuing) makes me think "that could get really complex really > > > quickly", but that's something that the NFS/SMB people would have > > > to pipe up on. I'm pretty sure the NFS spec has some kind > > > "partial completion notification" model, I dunno about SMB. > > > > I'm really wary of combining a ranged interface with this > > one. Not only does it make no sense for snapshots, but I think it > > falls down in any "create a new inode" scheme entirely. > > Oh, I wouldn't suggest a ranged interface, just one that allows for > status updates and cancelling - _if_ the initial op isn't atomic to > begin with. There's also the issue of concurrency in IO: maybe you > want to start several things without necessarily waiting for them > (think high-throughput "cp -R" on NFS or something like that). > > So I'd suggest something like having two system calls: one to start > the operation, and one to control it. And for a filesystem that does > atomic copies, the 'start' one obviously would also finish it, so the > 'control' it would be a no-op, because there would never be any > outstanding ones. > > See what I'm saying? It wouldn't complicate _your_ life, but it would > allow for filesystems that can't do it atomically (or even quickly). > > So the first one would be something like > > int copyfile(const char *src, const char *dest, unsigned long > flags); > > which would return: > > - zero on success > - negative (with errno) on error > - positive cookie on "I started it, here's my cookie". For extra > bonus points, maybe the cookie would actually be a file descriptor > (for poll/select users), but it would _not_ be a file descriptor to > the resulting _file_, it would literally be a "cookie" to the actual > copyfile event. > > and then for ocfs2 you'd never return positive cookies. You'd never > have to worry about it. > > Then the second interface would be something like > > int copyfile_ctrl(long cookie, unsigned long cmd); > > where you'd just have some way to wait for completion and ask how > much has been copied. The 'cmd' would be some set of 'cancel', > 'status' or 'uninterruptible wait' or whatever, and the return value > would again be > > - negative (with errno) for errors (copy failed) - cookie released > - zero for 'done' - cookie released > - positive for 'percent remaining' or whatever - cookie still valid > > and this would be another callback into the filesystem code, but > you'd never have to worry about it, since you'd never see it (just > leave it NULL). > > NOTE! The above is a rough idea - I have not spent tons of time > thinking about it, or looking at exactly what something like NFS > would really want. But the _concept_ is simple, and usage should be > pretty trivial. A simple case would be something like this: > > int copy_file(const char *src, const char *dst) > { > /* Start a file copy */ > int cookie = copyfile(src, dst, 0); > > /* Async case? */ > if (cookie > 0) { > int ret; > > while ((ret = copyfile_ctrl(cookie, COPYFILE_WAIT)) > > 0) /* nothing */; > > /* Error handling is shared for async/sync */ > cookie = ret; > } > if (cookie < 0) { > perror("copyfile failed"); > return -1; > } > return 0; > } > > doesn't that look fairly easy to use? > > And the advantage here is that you _can_ - still fairly easily - do > much more involved things. For example, let's say that you wanted to > do a very efficient parallel copy, so you'd do something like this: > > #define MAX_PEND 10 > static int pending[MAX_PEND]; > static int nr_pending = 0; > > static int wait_for_completion(int nr_left) > { > int ret; > > while (nr_pending > nr_left) { > int cookie = pending[0], i; > > /* Wait for completion of the oldest entry */ > while ((i = copyfile_ctrl(cookie, > COPYFILE_WAIT)) > 0) /* nothing */; > > /* Save the "we had an error" case */ > if (i < 0) > ret = i; > > /* Move the other entries down */ > memmove(pending, pending+1, > sizeof(int)*--nr_pending); } > return ret; > } > > int start_copy(src, dst) > { > int cookie, ret; > > cookie = copyfile(src, dst, 0); > if (cookie <= 0) > return cookie; > > ret = 0; > if (nr_pending == MAX_PENDING) > ret = wait_for_completion(pending, > MAX_PENDING/2); > > pending[nr_pending++] = cookie; > return ret; > } > > int stop_copy(void) > { > return wait_for_completion(pending, 0); > } > > which basically ends up having ten copyfile() calls outstanding (and > when we hit the limit, we wait for half of them to complete), so now > you can do an efficient "cp -R" with concurrent server-side IO. And > it wasn't so hard, was it? > > (Ok, so the above would need to be fleshed out to remember the > filenames so that you can report _which_ file failed etc, but you get > the idea). > > And again, it wouldn't be any more complicated for your case. Your > copyfile would always just return 0 or negative for error. But it > would be _way_ more powerful for filesystems that want to do > potentially lots of IO for the file copy. > > I dunno. The above seems like a fairly simple and powerful interface, > and I _think_ it would be ok for NFS and CIFS. And in fact, if that > whole "background copy" ends up being used a lot, maybe even a local > filesystem would implement it just to get easy overlapping IO - even > if it would just be a trivial common wrapper function that says > "start a thread to do a trivial manual copy". or make it one level simpler? Have a "wait for all started copies" call only.... saves a ton of book keeping, and is likely what people will use it for in the end anyway. (implementation wise the fallback implementation could then just use the async function calls if it wanted to, and just wait for all copies to finish in the complete call) -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-17 16:38 ` Arjan van de Ven @ 2009-09-17 20:16 ` Linus Torvalds 0 siblings, 0 replies; 33+ messages in thread From: Linus Torvalds @ 2009-09-17 20:16 UTC (permalink / raw) To: Arjan van de Ven Cc: Joel Becker, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Thu, 17 Sep 2009, Arjan van de Ven wrote: > > or make it one level simpler? > Have a "wait for all started copies" call only.... saves a ton of book > keeping, and is likely what people will use it for in the end anyway. No. That wouldn't work. For a few reasons: - the case I didn't show was the "graphical file manager client" thing that wants to show the copy as it progresses. It needs to know how much is left, and for which file. - if errors happen, you need to indicate which file had an error. Again, my example code didn't show that, since it was written as an example and obviously just while writing email anyway. But it's a major requirement for any sane and reliable filesystem model! - even in my example, I wanted to show how you don't want to wait for _all_ of them in the middle, you just want to wait for some of them. If you wait for all of them - just to make room for more - you're going to have hickups in your IO patterns and you cannot saturate your server or disks well. So I really think there needs to be a cookie per outstanding file copy (of course, the kernel is likely to not allow a single user more than 'n' outstanding copies anyway, but that's a separate issue). Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-17 16:29 ` Linus Torvalds 2009-09-17 16:38 ` Arjan van de Ven @ 2009-09-17 18:40 ` Roland Dreier 2009-09-17 20:17 ` Linus Torvalds 2009-09-18 1:43 ` [Ocfs2-devel] " Joel Becker 2 siblings, 1 reply; 33+ messages in thread From: Roland Dreier @ 2009-09-17 18:40 UTC (permalink / raw) To: Linus Torvalds Cc: Joel Becker, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel > int copy_file(const char *src, const char *dst) > { > /* Start a file copy */ > int cookie = copyfile(src, dst, 0); > > /* Async case? */ > if (cookie > 0) { > int ret; > > while ((ret = copyfile_ctrl(cookie, COPYFILE_WAIT)) > 0) > /* nothing */; > > /* Error handling is shared for async/sync */ > cookie = ret; > } > if (cookie < 0) { > perror("copyfile failed"); > return -1; > } I guess one bit of semantics to figure out is what happens if copyfile() does the async case but then copyfile_ctrl() returns an error halfway through... is the state of the dest file just undefined? - R. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-17 18:40 ` Roland Dreier @ 2009-09-17 20:17 ` Linus Torvalds 2009-09-17 20:34 ` Joel Becker 2009-09-17 20:42 ` Roland Dreier 0 siblings, 2 replies; 33+ messages in thread From: Linus Torvalds @ 2009-09-17 20:17 UTC (permalink / raw) To: Roland Dreier Cc: Joel Becker, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Thu, 17 Sep 2009, Roland Dreier wrote: > > I guess one bit of semantics to figure out is what happens if copyfile() > does the async case but then copyfile_ctrl() returns an error halfway > through... is the state of the dest file just undefined? I think that's the one that most filesystems would prefer. Maybe the file is there, it's just that it's only half copied because the filesystem filled up. Making filesystems give atomicity guarantees would be hard for the async case. Of course, if the filesystem can do the copy entirely atomically (ie by just incrementing a refcount), then it can give atomicity guarantees, but then you'd never see the async case either. Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-17 20:17 ` Linus Torvalds @ 2009-09-17 20:34 ` Joel Becker 2009-09-18 0:29 ` Linus Torvalds 2009-09-17 20:42 ` Roland Dreier 1 sibling, 1 reply; 33+ messages in thread From: Joel Becker @ 2009-09-17 20:34 UTC (permalink / raw) To: Linus Torvalds Cc: Roland Dreier, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Thu, Sep 17, 2009 at 01:17:55PM -0700, Linus Torvalds wrote: > On Thu, 17 Sep 2009, Roland Dreier wrote: > > > > I guess one bit of semantics to figure out is what happens if copyfile() > > does the async case but then copyfile_ctrl() returns an error halfway > > through... is the state of the dest file just undefined? > > I think that's the one that most filesystems would prefer. Maybe the file > is there, it's just that it's only half copied because the filesystem > filled up. I have to say, adding 'undefined behavior' things isn't fun in a call that is already potentially confusing. We have a bunch of flags and behaviors we're covering. > Making filesystems give atomicity guarantees would be hard for the async > case. Note that "cleaning up after an error" and "atomic" are not the same. Atomicity implies that not only do you see all or none, but that the contents are a point-in-time of the source file. A non-atomic implementation may be affected by writes that happen during the copy (like any read-write-loop copy would be). As an example, ocfs2_reflink() builds the target inode in the orphan directory. If the operation fails at any point, it's removed. If we crash, orphan cleanup happens. Only if it succeeds do we move it to the target directory. ocfs2_reflink() is an atomic snapshot, of course, but recoverability is certainly possible for a non-atomic copyfile() on filesystems with similar orphan schemes (ext3 is the obvious example). Of course, how the network filesystems might see it, I don't know. NFS/CIFS folks, please speak up. > Of course, if the filesystem can do the copy entirely atomically (ie by > just incrementing a refcount), then it can give atomicity guarantees, but > then you'd never see the async case either. Even the atomic copy might take a little time (say, to bump up and write out the metadata structures). Do you want to define that as not being async? I was figuring COPYFILE_ATOMIC and COPYFILE_WAIT to be separate flags. Joel -- "Behind every successful man there's a lot of unsuccessful years." - Bob Brown Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-17 20:34 ` Joel Becker @ 2009-09-18 0:29 ` Linus Torvalds 0 siblings, 0 replies; 33+ messages in thread From: Linus Torvalds @ 2009-09-18 0:29 UTC (permalink / raw) To: Joel Becker Cc: Roland Dreier, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Thu, 17 Sep 2009, Joel Becker wrote: > > I have to say, adding 'undefined behavior' things isn't fun in a > call that is already potentially confusing. We have a bunch of flags > and behaviors we're covering. I don't think it's "undefined". It's just not complete. >From a user standpoint, there is no difference between such a system call and doing the thing as a library routine (which has to be the fallback anyway for something like 'cp'). > Note that "cleaning up after an error" and "atomic" are not the > same. Atomicity implies that not only do you see all or none, but that > the contents are a point-in-time of the source file. A non-atomic > implementation may be affected by writes that happen during the copy > (like any read-write-loop copy would be). Sure. There are middle grounds that may not need the cleanup. I was more looking at the two extreme ends. > > Of course, if the filesystem can do the copy entirely atomically (ie by > > just incrementing a refcount), then it can give atomicity guarantees, but > > then you'd never see the async case either. > > Even the atomic copy might take a little time (say, to bump up > and write out the metadata structures). Do you want to define that as > not being async? I was figuring COPYFILE_ATOMIC and COPYFILE_WAIT to be > separate flags. I don't think that's wrong, and yeah, you could decide that you actually want to be able to support the "ten outstanding 'copy' commands from user space" model too. So yeah, separate COPYFILE_ATOMIC (only succeed if you can do it as an atomic op in the filesystem) and COPYFILE_WAIT (only return when fully done) bits sounds conceptually fine to me. Whether it's worth it for a filesystem that effectively only needs a couple of writes (that can be buffered too), I dunno. But it's certainly not something I'd object to on an interface level. Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-17 20:17 ` Linus Torvalds 2009-09-17 20:34 ` Joel Becker @ 2009-09-17 20:42 ` Roland Dreier 2009-09-17 20:55 ` Linus Torvalds 1 sibling, 1 reply; 33+ messages in thread From: Roland Dreier @ 2009-09-17 20:42 UTC (permalink / raw) To: Linus Torvalds Cc: Joel Becker, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel > > I guess one bit of semantics to figure out is what happens if copyfile() > > does the async case but then copyfile_ctrl() returns an error halfway > > through... is the state of the dest file just undefined? > I think that's the one that most filesystems would prefer. Maybe the file > is there, it's just that it's only half copied because the filesystem > filled up. Makes sense... and even having the state of having the file half-copied is not really well-defined since a filesystem may want to optimize things by copying out of order etc. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-17 20:42 ` Roland Dreier @ 2009-09-17 20:55 ` Linus Torvalds 0 siblings, 0 replies; 33+ messages in thread From: Linus Torvalds @ 2009-09-17 20:55 UTC (permalink / raw) To: Roland Dreier Cc: Joel Becker, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Thu, 17 Sep 2009, Roland Dreier wrote: > > > I guess one bit of semantics to figure out is what happens if copyfile() > > > does the async case but then copyfile_ctrl() returns an error halfway > > > through... is the state of the dest file just undefined? > > > I think that's the one that most filesystems would prefer. Maybe the file > > is there, it's just that it's only half copied because the filesystem > > filled up. > > Makes sense... and even having the state of having the file half-copied > is not really well-defined since a filesystem may want to optimize > things by copying out of order etc. Yeah. The thing to remember is that where you'd use a non-atomic 'copyfile()' system call is as a replacement for just doing the same thing by hand in user space, so any users of this system call would basically have to handle the "oops, it failed with ENOSPC in the middle" _anyway_. So there is no downside to saying "ok, it failed in the middle, you clean up the result", because the user needs to support that anyway. The ones that use copyfile for filesystem-specific snapshots and use the ATOMIC bit to say so obviously don't have this issue. But they aren't looking for a "faster 'cp'" thing - they're looking for very specific semantics, and for the ATOMIC case we can provide those kinds of nice atomic "everything or nothing" semantics. So the fact that some random server-side copy over CIFS/NFS may need cleanup by the user after failing at half-point is not actually a downside. It doesn't affect Joel's kind of fancy use. Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Ocfs2-devel] [GIT PULL] ocfs2 changes for 2.6.32 2009-09-17 16:29 ` Linus Torvalds 2009-09-17 16:38 ` Arjan van de Ven 2009-09-17 18:40 ` Roland Dreier @ 2009-09-18 1:43 ` Joel Becker 2009-09-18 13:34 ` Pádraig Brady 2009-09-18 17:23 ` Peter W. Morreale 2 siblings, 2 replies; 33+ messages in thread From: Joel Becker @ 2009-09-18 1:43 UTC (permalink / raw) To: Linus Torvalds Cc: Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Thu, Sep 17, 2009 at 09:29:14AM -0700, Linus Torvalds wrote: > Why would anybody want to hide it at all? Why even the libc hiding? > > Nobody is going to use this except for special apps. Let them see what > they can do, in all its glory. I expect everyone will use this through cp(1), so that cp(1) can try to get server-side copy on the network filesystms. Speaking of "all its glory", what we have now is: int sys_copyfileat(int oldfd, const char *oldname, int newfd, const char *newname, int flags, int atflags) > So I'd suggest something like having two system calls: one to start the > operation, and one to control it. And for a filesystem that does atomic > copies, the 'start' one obviously would also finish it, so the 'control' > it would be a no-op, because there would never be any outstanding ones. > > See what I'm saying? It wouldn't complicate _your_ life, but it would > allow for filesystems that can't do it atomically (or even quickly). > > So the first one would be something like > > int copyfile(const char *src, const char *dest, unsigned long flags); > > which would return: > > - zero on success > - negative (with errno) on error > - positive cookie on "I started it, here's my cookie". For extra bonus > points, maybe the cookie would actually be a file descriptor (for > poll/select users), but it would _not_ be a file descriptor to the > resulting _file_, it would literally be a "cookie" to the actual > copyfile event. Actually, if the cookie is a magic file descriptor, you don't need ctl. You can play tricks like polling for completoin, read(magic_fd, &remain, sizeof(loff_t)) for status, and close(magic_fd) for cancel. Might be a bit overloaded, though. > and then for ocfs2 you'd never return positive cookies. You'd never have > to worry about it. I suspect we'll later take advantage of copyfile's other modes. I did reflink as reflink only for the simple fact of doing one thing and well, not because I think copyfile isn't good. > Then the second interface would be something like > > int copyfile_ctrl(long cookie, unsigned long cmd); > > where you'd just have some way to wait for completion and ask how much has > been copied. The 'cmd' would be some set of 'cancel', 'status' or > 'uninterruptible wait' or whatever, and the return value would again be > > - negative (with errno) for errors (copy failed) - cookie released > - zero for 'done' - cookie released > - positive for 'percent remaining' or whatever - cookie still valid > > and this would be another callback into the filesystem code, but you'd > never have to worry about it, since you'd never see it (just leave it > NULL). I was going to ask about how to fit both calls into one inode operation, but I see you're giving this as an additional inode operation. This leaves us with a simliar-to-reflink inode copyfile op and a control op: ->copyfile(old_dentry, dir_inode, new_dentry, flags) ->copyfile_ctl(int cookie, unsigned int cmd) I have to change the flags a little, as my original proposal didn't handle backoff correctly. #define COPYFILE_WAIT 0x0001 /* Block until complete */ #define COPYFILE_ATOMIC 0x0002 /* Things copied must be point-in-time and it must fail or succeed completely. */ #define COPYFILE_ALLOW_COW 0x0004 /* The filesystem may share data extents between the source and target in a Copy-on-Write fashion. If neither COPYFILE_ALLOW_COW nor COPYFILE_REQUIRE_COW are specified, data extents must NOT be shared. When neither COW flag is provided, most filesystems should return -ENOTSUPP, as userspace can do read-write looping itself */ #define COPYFILE_REQUIRE_COW 0x0008 /* Data extents MUST be shared between the source and target in a Copy-on-Write fashion */ #define COPYFILE_UNPRIV_ATTRS 0x0010 /* Unprivileged attributes should be copied from the source to the target */ #define COPYFILE_PRIV_ATTRS 0x0020 /* Privileged attributes should be copied from the source to the target if the caller has the necessary privileges */ #define COPYFILE_REQUIRE_ATTRS 0x0040 /* Combined with the other attribute flags, the call MUST fail if the caller lacks the necessary privileges to copy ever attribute requested */ #define COPYFILE_SNAPSHOT_ASYNC (COPYFILE_REQUIRE_COW | COPYFILE_UNPRIV_ATTRS | COPYFILE_PRIV_ATTRS | COPYFILE_ATOMIC) #define COPYFILE_SNAPSHOT_STRICT_ASYNC (COPYFILE_SNAPSHOT_ASYNC | COPYFILE_REQUIRE_ATTRS) #define COPYFILE_SNAPSHOT (COPYFILE_SNAPSHOT_ASYNC | COPYFILE_WAIT) #define COPYFILE_SNAPSHOT_STRICT (COPYFILE_SNAPSHOT_STRICT_ASYNC | COPYFILE_WAIT) > I dunno. The above seems like a fairly simple and powerful interface, and > I _think_ it would be ok for NFS and CIFS. And in fact, if that whole > "background copy" ends up being used a lot, maybe even a local filesystem > would implement it just to get easy overlapping IO - even if it would just > be a trivial common wrapper function that says "start a thread to do a > trivial manual copy". NFS and CIFS folks, please speak up. Joel -- "There is no more evil thing on earth than race prejudice, none at all. I write deliberately -- it is the worst single thing in life now. It justifies and holds together more baseness, cruelty and abomination than any other sort of error in the world." - H. G. Wells Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Ocfs2-devel] [GIT PULL] ocfs2 changes for 2.6.32 2009-09-18 1:43 ` [Ocfs2-devel] " Joel Becker @ 2009-09-18 13:34 ` Pádraig Brady 2009-09-18 18:37 ` Joel Becker 2009-09-18 17:23 ` Peter W. Morreale 1 sibling, 1 reply; 33+ messages in thread From: Pádraig Brady @ 2009-09-18 13:34 UTC (permalink / raw) To: Linus Torvalds, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel Joel Becker wrote: > On Thu, Sep 17, 2009 at 09:29:14AM -0700, Linus Torvalds wrote: >> Why would anybody want to hide it at all? Why even the libc hiding? >> >> Nobody is going to use this except for special apps. Let them see what >> they can do, in all its glory. > > I expect everyone will use this through cp(1), so that cp(1) can > try to get server-side copy on the network filesystms. For reference, cp(1) has a --reflink option as of coreutils-7.5 which currently just does: ioctl (dest_fd, BTRFS_IOC_CLONE, src_fd); There is a specific option in cp to do this because a "reflink copy" was seen to have these disadvantages: 1. one copy of data blocks so more chances of data loss 2. disk head seeking deferred to modification process 3. possible fragmentation on write 4. possible ENOSPC on write Now 2. will go away with time, and 3 & 4 may be alleviated by the use of fallocate(), but 1. was deemed important enough to not enable by default. cheers, Pádraig. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Ocfs2-devel] [GIT PULL] ocfs2 changes for 2.6.32 2009-09-18 13:34 ` Pádraig Brady @ 2009-09-18 18:37 ` Joel Becker 0 siblings, 0 replies; 33+ messages in thread From: Joel Becker @ 2009-09-18 18:37 UTC (permalink / raw) To: Pádraig Brady Cc: Linus Torvalds, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Fri, Sep 18, 2009 at 02:34:18PM +0100, Pádraig Brady wrote: > Joel Becker wrote: > > On Thu, Sep 17, 2009 at 09:29:14AM -0700, Linus Torvalds wrote: > >> Why would anybody want to hide it at all? Why even the libc hiding? > >> > >> Nobody is going to use this except for special apps. Let them see what > >> they can do, in all its glory. > > > > I expect everyone will use this through cp(1), so that cp(1) can > > try to get server-side copy on the network filesystms. > > For reference, cp(1) has a --reflink option as of > coreutils-7.5 which currently just does: > > ioctl (dest_fd, BTRFS_IOC_CLONE, src_fd); Note that the btrfs ioctl is not a reflink(), so this probably wants changing (OCFS2_IOC_REFLINK is the ocfs2 ioctl, sys_reflink() was going to be the syscall). > There is a specific option in cp to do this because > a "reflink copy" was seen to have these disadvantages: > > 1. one copy of data blocks so more chances of data loss > 2. disk head seeking deferred to modification process > 3. possible fragmentation on write > 4. possible ENOSPC on write > > Now 2. will go away with time, and 3 & 4 may be alleviated > by the use of fallocate(), but 1. was deemed important > enough to not enable by default. 1, 2, and 3 are definitely in the category of "it would be nice to choose the behavior". 4 is the big one, because it breaks default cp(1) assumptions. The good news is that the current copyfile idea of copyfile(src, dst, 0) would satisfy 1-4 and be efficient or return -ENOTSUPP/-ENOSYS if it couldn't be. Then cp(1) falls back to the read-write loop. cp --reflink would become copyfile(src, dst, COPYFILE_SNAPSHOT) Joel -- Life's Little Instruction Book #451 "Don't be afraid to say, 'I'm sorry.'" Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Ocfs2-devel] [GIT PULL] ocfs2 changes for 2.6.32 2009-09-18 1:43 ` [Ocfs2-devel] " Joel Becker 2009-09-18 13:34 ` Pádraig Brady @ 2009-09-18 17:23 ` Peter W. Morreale 2009-09-18 18:39 ` Joel Becker 1 sibling, 1 reply; 33+ messages in thread From: Peter W. Morreale @ 2009-09-18 17:23 UTC (permalink / raw) To: Joel Becker Cc: Linus Torvalds, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Thu, 2009-09-17 at 18:43 -0700, Joel Becker wrote: > On Thu, Sep 17, 2009 at 09:29:14AM -0700, Linus Torvalds wrote: > > Why would anybody want to hide it at all? Why even the libc hiding? > > > > Nobody is going to use this except for special apps. Let them see what > > they can do, in all its glory. > > I expect everyone will use this through cp(1), so that cp(1) can > try to get server-side copy on the network filesystms. > Speaking of "all its glory", what we have now is: > > int sys_copyfileat(int oldfd, const char *oldname, int newfd, > const char *newname, int flags, int atflags) Would it be worthwhile to consider adding an offset and length? Then we get dd as well. (potentially) Best, -PWM > > > So I'd suggest something like having two system calls: one to start the > > operation, and one to control it. And for a filesystem that does atomic > > copies, the 'start' one obviously would also finish it, so the 'control' > > it would be a no-op, because there would never be any outstanding ones. > > > > See what I'm saying? It wouldn't complicate _your_ life, but it would > > allow for filesystems that can't do it atomically (or even quickly). > > > > So the first one would be something like > > > > int copyfile(const char *src, const char *dest, unsigned long flags); > > > > which would return: > > > > - zero on success > > - negative (with errno) on error > > - positive cookie on "I started it, here's my cookie". For extra bonus > > points, maybe the cookie would actually be a file descriptor (for > > poll/select users), but it would _not_ be a file descriptor to the > > resulting _file_, it would literally be a "cookie" to the actual > > copyfile event. > > Actually, if the cookie is a magic file descriptor, you don't > need ctl. You can play tricks like polling for completoin, > read(magic_fd, &remain, sizeof(loff_t)) for status, and close(magic_fd) > for cancel. Might be a bit overloaded, though. > > > and then for ocfs2 you'd never return positive cookies. You'd never have > > to worry about it. > > I suspect we'll later take advantage of copyfile's other > modes. I did reflink as reflink only for the simple fact of doing one > thing and well, not because I think copyfile isn't good. > > > Then the second interface would be something like > > > > int copyfile_ctrl(long cookie, unsigned long cmd); > > > > where you'd just have some way to wait for completion and ask how much has > > been copied. The 'cmd' would be some set of 'cancel', 'status' or > > 'uninterruptible wait' or whatever, and the return value would again be > > > > - negative (with errno) for errors (copy failed) - cookie released > > - zero for 'done' - cookie released > > - positive for 'percent remaining' or whatever - cookie still valid > > > > and this would be another callback into the filesystem code, but you'd > > never have to worry about it, since you'd never see it (just leave it > > NULL). > > I was going to ask about how to fit both calls into one inode > operation, but I see you're giving this as an additional inode > operation. > This leaves us with a simliar-to-reflink inode copyfile op and a > control op: > > ->copyfile(old_dentry, dir_inode, new_dentry, flags) > ->copyfile_ctl(int cookie, unsigned int cmd) > > I have to change the flags a little, as my original proposal > didn't handle backoff correctly. > > #define COPYFILE_WAIT 0x0001 /* Block until complete */ > #define COPYFILE_ATOMIC 0x0002 /* Things copied must be > point-in-time and it must > fail or succeed completely. */ > #define COPYFILE_ALLOW_COW 0x0004 /* The filesystem may share data > extents between the source > and target in a Copy-on-Write > fashion. If neither > COPYFILE_ALLOW_COW nor > COPYFILE_REQUIRE_COW are > specified, data extents must > NOT be shared. When neither > COW flag is provided, most > filesystems should return > -ENOTSUPP, as userspace can > do read-write looping > itself */ > #define COPYFILE_REQUIRE_COW 0x0008 /* Data extents MUST be shared > between the source and target > in a Copy-on-Write fashion */ > #define COPYFILE_UNPRIV_ATTRS 0x0010 /* Unprivileged attributes > should be copied from the > source to the target */ > #define COPYFILE_PRIV_ATTRS 0x0020 /* Privileged attributes should > be copied from the source to > the target if the caller has > the necessary privileges */ > #define COPYFILE_REQUIRE_ATTRS 0x0040 /* Combined with the other > attribute flags, the call > MUST fail if the caller lacks > the necessary privileges to > copy ever attribute > requested */ > > #define COPYFILE_SNAPSHOT_ASYNC (COPYFILE_REQUIRE_COW | > COPYFILE_UNPRIV_ATTRS | > COPYFILE_PRIV_ATTRS | > COPYFILE_ATOMIC) > #define COPYFILE_SNAPSHOT_STRICT_ASYNC (COPYFILE_SNAPSHOT_ASYNC | > COPYFILE_REQUIRE_ATTRS) > #define COPYFILE_SNAPSHOT (COPYFILE_SNAPSHOT_ASYNC | > COPYFILE_WAIT) > #define COPYFILE_SNAPSHOT_STRICT (COPYFILE_SNAPSHOT_STRICT_ASYNC | > COPYFILE_WAIT) > > > I dunno. The above seems like a fairly simple and powerful interface, and > > I _think_ it would be ok for NFS and CIFS. And in fact, if that whole > > "background copy" ends up being used a lot, maybe even a local filesystem > > would implement it just to get easy overlapping IO - even if it would just > > be a trivial common wrapper function that says "start a thread to do a > > trivial manual copy". > > NFS and CIFS folks, please speak up. > > Joel > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Ocfs2-devel] [GIT PULL] ocfs2 changes for 2.6.32 2009-09-18 17:23 ` Peter W. Morreale @ 2009-09-18 18:39 ` Joel Becker 0 siblings, 0 replies; 33+ messages in thread From: Joel Becker @ 2009-09-18 18:39 UTC (permalink / raw) To: Peter W. Morreale Cc: Linus Torvalds, Mark Fasheh, Andrew Morton, Linux Kernel Mailing List, ocfs2-devel On Fri, Sep 18, 2009 at 11:23:33AM -0600, Peter W. Morreale wrote: > On Thu, 2009-09-17 at 18:43 -0700, Joel Becker wrote: > > On Thu, Sep 17, 2009 at 09:29:14AM -0700, Linus Torvalds wrote: > > > Why would anybody want to hide it at all? Why even the libc hiding? > > > > > > Nobody is going to use this except for special apps. Let them see what > > > they can do, in all its glory. > > > > I expect everyone will use this through cp(1), so that cp(1) can > > try to get server-side copy on the network filesystms. > > Speaking of "all its glory", what we have now is: > > > > int sys_copyfileat(int oldfd, const char *oldname, int newfd, > > const char *newname, int flags, int atflags) > > > Would it be worthwhile to consider adding an offset and length? > > Then we get dd as well. (potentially) I'm with Linus that a range attribute really makes this complicated. I also think it doesn't work well with a call that is supposed to create newname. Joel -- Pitchers and catchers report. Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32 2009-09-14 21:32 ` Linus Torvalds 2009-09-14 22:14 ` Joel Becker @ 2009-09-15 6:44 ` Miklos Szeredi 2009-09-23 11:02 ` [GIT PULL] ocfs2 changes for 2.6.32 (take 2, no syscall) Joel Becker 2 siblings, 0 replies; 33+ messages in thread From: Miklos Szeredi @ 2009-09-15 6:44 UTC (permalink / raw) To: torvalds; +Cc: Joel.Becker, mfasheh, akpm, linux-kernel, ocfs2-devel On Mon, 14 Sep 2009, Linus Torvalds wrote: > Are we going to add 'freflink[at]()' at some point? We already have an interface for "freflink()" called splice(). Splice has all the arguments needed to implement refcounted data copies and more. Yes, splice's name implies more of a "piecing data together" type of operation, but we can look at reflink() or copyfile() as just "splice one big piece from here to there". Thanks, Miklos ^ permalink raw reply [flat|nested] 33+ messages in thread
* [GIT PULL] ocfs2 changes for 2.6.32 (take 2, no syscall) 2009-09-14 21:32 ` Linus Torvalds 2009-09-14 22:14 ` Joel Becker 2009-09-15 6:44 ` Miklos Szeredi @ 2009-09-23 11:02 ` Joel Becker 2 siblings, 0 replies; 33+ messages in thread From: Joel Becker @ 2009-09-23 11:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: Mark Fasheh, Andrew Morton, linux-kernel, ocfs2-devel On Mon, Sep 14, 2009 at 02:32:36PM -0700, Linus Torvalds wrote: > So I'm not pulling this. Not until I get the feeling that there is > consensus. Linus, Here are all of the ocfs2 changes without the reflink(2) system call. The snapshot functionality is still available via the ocfs2 ioctl(2). I'm going to formulate up a new system call based on the discussion we've had, but clearly that's not something for 2.6.32 anymore. Please pull. The following changes since commit 8379e7c46cc48f51197dd663fc6676f47f2a1e71: Sunil Mushran (1): ocfs2: ocfs2_write_begin_nolock() should handle len=0 are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2.git upstream-linus Coly Li (1): dlmglue.c: add missed mlog lines Joel Becker (40): ocfs2: Make the ocfs2_caching_info structure self-contained. ocfs2: Change metadata caching locks to an operations structure. ocfs2: Take the inode out of the metadata read/write paths. ocfs2: move ip_last_trans to struct ocfs2_caching_info ocfs2: move ip_created_trans to struct ocfs2_caching_info ocfs2: Pass struct ocfs2_caching_info to the journal functions. ocfs2: Store the ocfs2_caching_info on ocfs2_extent_tree. ocfs2: Pass ocfs2_caching_info to ocfs2_read_extent_block(). ocfs2: ocfs2_find_path() only needs the caching info ocfs2: ocfs2_create_new_meta_bhs() doesn't need struct inode. ocfs2: Pass ocfs2_extent_tree to ocfs2_unlink_path() ocfs2: ocfs2_complete_edge_insert() doesn't need struct inode at all. ocfs2: Get inode out of ocfs2_rotate_subtree_root_right(). ocfs2: Pass ocfs2_extent_tree to ocfs2_get_subtree_root() ocfs2: Drop struct inode from ocfs2_extent_tree_operations. ocfs2: ocfs2_rotate_tree_right() doesn't need struct inode. ocfs2: ocfs2_update_edge_lengths() doesn't need struct inode. ocfs2: ocfs2_rotate_subtree_left() doesn't need struct inode. ocfs2: __ocfs2_rotate_tree_left() doesn't need struct inode. ocfs2: ocfs2_rotate_tree_left() no longer needs struct inode. ocfs2: ocfs2_merge_rec_left/right() no longer need struct inode. ocfs2: ocfs2_try_to_merge_extent() doesn't need struct inode. ocfs2: ocfs2_grow_branch() and ocfs2_append_rec_to_path() lose struct inode. ocfs2: ocfs2_truncate_rec() doesn't need struct inode. ocfs2: Make truncating the extent map an extent_tree_operation. ocfs2: ocfs2_insert_at_leaf() doesn't need struct inode. ocfs2: Give ocfs2_split_record() an extent_tree instead of an inode. ocfs2: ocfs2_do_insert_extent() and ocfs2_insert_path() no longer need an inode. ocfs2: ocfs2_extent_contig() only requires the superblock. ocfs2: Swap inode for extent_tree in ocfs2_figure_merge_contig_type(). ocfs2: Remove inode from ocfs2_figure_extent_contig(). ocfs2: ocfs2_figure_insert_type() no longer needs struct inode. ocfs2: Make extent map insertion an extent_tree_operation. ocfs2: ocfs2_insert_extent() no longer needs struct inode. ocfs2: ocfs2_add_clusters_in_btree() no longer needs struct inode. ocfs2: ocfs2_remove_extent() no longer needs struct inode. ocfs2: ocfs2_split_and_insert() no longer needs struct inode. ocfs2: Teach ocfs2_replace_extent_rec() to use an extent_tree. ocfs2: __ocfs2_mark_extent_written() doesn't need struct inode. ocfs2: Pass ocfs2_caching_info into ocfs_init_*_extent_tree(). Sunil Mushran (1): ocfs2: __ocfs2_abort() should not enable panic for local mounts Tao Ma (42): ocfs2: Define refcount tree structure. ocfs2: Add metaecc for ocfs2_refcount_block. ocfs2: Add ocfs2_read_refcount_block. ocfs2: Abstract caching info checkpoint. ocfs2: Add new refcount tree lock resource in dlmglue. ocfs2: Add caching info for refcount tree. ocfs2: Add refcount tree lock mechanism. ocfs2: Basic tree root operation. ocfs2: Wrap ocfs2_extent_contig in ocfs2_extent_tree. ocfs2: Abstract extent split process. ocfs2: Add refcount b-tree as a new extent tree. ocfs2: move tree path functions to alloc.h. ocfs2: Add support for incrementing refcount in the tree. ocfs2: Add support of decrementing refcount for delete. ocfs2: Add functions for extents refcounted. ocfs2: Decrement refcount when truncating refcounted extents. ocfs2: Add CoW support. ocfs2: CoW refcount tree improvement. ocfs2: Integrate CoW in file write. ocfs2: CoW a reflinked cluster when it is truncated. ocfs2: Add normal functions for reflink a normal file's extents. ocfs2: handle file attributes issue for reflink. ocfs2: Return extent flags for xattr value tree. ocfs2: Abstract duplicate clusters process in CoW. ocfs2: Add CoW support for xattr. ocfs2: Remove inode from ocfs2_xattr_bucket_get_name_value. ocfs2: Abstract the creation of xattr block. ocfs2: Abstract ocfs2 xattr tree extend rec iteration process. ocfs2: Attach xattr clusters to refcount tree. ocfs2: Call refcount tree remove process properly. ocfs2: Create an xattr indexed block if needed. ocfs2: Add reflink support for xattr. ocfs2: Modify removing xattr process for refcount. ocfs2: Don't merge in 1st refcount ops of reflink. ocfs2: Make transaction extend more efficient. ocfs2: Use proper parameter for some inode operation. ocfs2: Create reflinked file in orphan dir. ocfs2: Add preserve to reflink. ocfs2: Implement ocfs2_reflink. ocfs2: Enable refcount tree support. ocfs2: Add ioctl for reflink. ocfs2: Use buffer IO if we are appending a file. Wengang Wang (1): ocfs2: add spinlock protection when dealing with lockres->purge. fs/ocfs2/Makefile | 1 + fs/ocfs2/alloc.c | 1342 ++++++++------ fs/ocfs2/alloc.h | 101 +- fs/ocfs2/aops.c | 37 +- fs/ocfs2/aops.h | 2 + fs/ocfs2/buffer_head_io.c | 47 +- fs/ocfs2/buffer_head_io.h | 8 +- fs/ocfs2/cluster/masklog.c | 1 + fs/ocfs2/cluster/masklog.h | 1 + fs/ocfs2/dir.c | 107 +- fs/ocfs2/dlm/dlmthread.c | 6 +- fs/ocfs2/dlmglue.c | 105 +- fs/ocfs2/dlmglue.h | 6 + fs/ocfs2/extent_map.c | 33 +- fs/ocfs2/extent_map.h | 8 +- fs/ocfs2/file.c | 151 ++- fs/ocfs2/file.h | 2 + fs/ocfs2/inode.c | 86 +- fs/ocfs2/inode.h | 20 +- fs/ocfs2/ioctl.c | 14 + fs/ocfs2/journal.c | 82 +- fs/ocfs2/journal.h | 94 +- fs/ocfs2/localalloc.c | 12 +- fs/ocfs2/namei.c | 341 ++++- fs/ocfs2/namei.h | 6 + fs/ocfs2/ocfs2.h | 52 +- fs/ocfs2/ocfs2_fs.h | 107 ++- fs/ocfs2/ocfs2_lockid.h | 5 + fs/ocfs2/quota_global.c | 5 +- fs/ocfs2/quota_local.c | 26 +- fs/ocfs2/refcounttree.c | 4313 ++++++++++++++++++++++++++++++++++++++++++++ fs/ocfs2/refcounttree.h | 106 ++ fs/ocfs2/resize.c | 16 +- fs/ocfs2/slot_map.c | 10 +- fs/ocfs2/suballoc.c | 35 +- fs/ocfs2/super.c | 13 +- fs/ocfs2/uptodate.c | 265 ++-- fs/ocfs2/uptodate.h | 51 +- fs/ocfs2/xattr.c | 2056 +++++++++++++++++++-- fs/ocfs2/xattr.h | 15 +- 40 files changed, 8512 insertions(+), 1176 deletions(-) create mode 100644 fs/ocfs2/refcounttree.c create mode 100644 fs/ocfs2/refcounttree.h -- "But then she looks me in the eye And says, 'We're going to last forever,' And man you know I can't begin to doubt it. Cause it just feels so good and so free and so right, I know we ain't never going to change our minds about it, Hey! Here comes my girl." Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [GIT PULL] ocfs2 changes for 2.6.32
@ 2009-09-22 0:51 George Spelvin
0 siblings, 0 replies; 33+ messages in thread
From: George Spelvin @ 2009-09-22 0:51 UTC (permalink / raw)
To: Joel.Becker; +Cc: linux, linux-kernel, torvalds
> Perhaps ->copyfile takes the following flags:
>
> #define ALLOW_COW_SHARED 0x0001
> #define REQUIRE_COW_SHARED 0x0002
> #define REQUIRE_BASIC_ATTRS 0x0004
> #define REQUIRE_FULL_ATTRS 0x0008
> #define REQUIRE_ATOMIC 0x0010
> #define SNAPSHOT (REQUIRE_COW_SHARED |
> REQUIRE_BASIC_ATTRS |
> REQUIRE_ATOMIC)
> #define SNAPSHOT_PRESERVE (SNAPSHOT | REQUIRE_FULL_ATTRS)
Um, could I strongly suggest that flags == 0 be the "succeed if at all
possible case", and various options limit it.
In particular, invert ALLOW_COW_SHARED to REQUIRE_ALLOCATE.
Another possibly useful flag would be REQUIRE_OPTIMIZED.
I.e. if it's not appreciably faster than a read/write loop, perhaps
the application would prefer to do it itself.
We also have to define the error code to return in case of a flag
violation. ENOTSUP?
^ permalink raw reply [flat|nested] 33+ messages in thread* Re: [GIT PULL] ocfs2 changes for 2.6.32
@ 2009-09-22 3:28 George Spelvin
0 siblings, 0 replies; 33+ messages in thread
From: George Spelvin @ 2009-09-22 3:28 UTC (permalink / raw)
To: arjan; +Cc: linux, linux-kernel, torvalds
> or make it one level simpler?
> Have a "wait for all started copies" call only.... saves a ton of book
> keeping, and is likely what people will use it for in the end anyway.
Aieeee! No, no, a thousand times, no!
We do NOT need another blocking primitive that can't play well with others.
That would be a HORRIBLE design mistake.
The whole benefit of Linus' scheme is that it returns a file descriptor.
Any waiting should be done by the standard event-wait system call, poll().
It should return POLLIN when there's an interesting event (such as copy
completion), and should remain valid until explicitly close()d.
There's nothing wrong with a convenience function that waits for all
started copies, but I don't see a reason to design a new kernel interface
for the purpose.
A few more points:
- If the file descriptor returned by copyfile() is guaranteed not to be 0
(even if that is available), perhaps it should be guaranteed to be >= 3.
- We might as well make the returned file descriptor O_CLOEXEC by default.
You can always change it back if you want to.
^ permalink raw reply [flat|nested] 33+ messages in threadend of thread, other threads:[~2009-09-23 11:02 UTC | newest] Thread overview: 33+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-09-11 20:04 [GIT PULL] ocfs2 changes for 2.6.32 Joel Becker 2009-09-14 21:32 ` Linus Torvalds 2009-09-14 22:14 ` Joel Becker 2009-09-14 23:27 ` Linus Torvalds 2009-09-15 0:04 ` Joel Becker 2009-09-15 0:31 ` Linus Torvalds 2009-09-15 0:54 ` Joel Becker 2009-09-15 2:01 ` Linus Torvalds 2009-09-15 4:05 ` Arjan van de Ven 2009-09-15 4:35 ` Joel Becker 2009-09-15 4:06 ` Joel Becker 2009-09-15 16:30 ` Linus Torvalds 2009-09-15 21:45 ` Joel Becker 2009-09-16 4:20 ` Linus Torvalds 2009-09-16 4:40 ` Joel Becker 2009-09-17 16:29 ` Linus Torvalds 2009-09-17 16:38 ` Arjan van de Ven 2009-09-17 20:16 ` Linus Torvalds 2009-09-17 18:40 ` Roland Dreier 2009-09-17 20:17 ` Linus Torvalds 2009-09-17 20:34 ` Joel Becker 2009-09-18 0:29 ` Linus Torvalds 2009-09-17 20:42 ` Roland Dreier 2009-09-17 20:55 ` Linus Torvalds 2009-09-18 1:43 ` [Ocfs2-devel] " Joel Becker 2009-09-18 13:34 ` Pádraig Brady 2009-09-18 18:37 ` Joel Becker 2009-09-18 17:23 ` Peter W. Morreale 2009-09-18 18:39 ` Joel Becker 2009-09-15 6:44 ` Miklos Szeredi 2009-09-23 11:02 ` [GIT PULL] ocfs2 changes for 2.6.32 (take 2, no syscall) Joel Becker -- strict thread matches above, loose matches on Subject: below -- 2009-09-22 0:51 [GIT PULL] ocfs2 changes for 2.6.32 George Spelvin 2009-09-22 3:28 George Spelvin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox