* Re: [RFC][PATCH 0/3] ext4: online defrag (ver 1.0)
From: Chris Mason @ 2009-01-30 20:15 UTC
To: Akira Fujita; +Cc: Theodore Tso, linux-ext4, linux-fsdevel

On Fri, 2009-01-30 at 15:11 +0900, Akira Fujita wrote:
> Hi,
>
> I have rewritten ext4 online defrag patches based on the comments from Ted.
> In the new defrag, create donor inode in the user space instead of kernel space,
> and then allocate contiguous blocks to it with fallocate().
> In kernel space, exchange the blocks between target inode and donor inode,
> and then copy the file data of target inode to donor inode every 64MB.
> The EXT4_IOC_DEFRAG ioctl becomes simpler than the old one,
> so it may be useful for other purposes.
>

One thing you'll want to handle is swap files. The swap code uses the
bmap ioctl to make a mapping of extents in the files, and expects that
mapping not to change. So, defragging a swap file will lead to some
serious problems.

Btrfs is currently getting around this by dropping bmap support, so
swapfiles on btrfs won't work at all. A real long term solution is
required ;)

For ext4 you should be able to just detect swapfile and disallow the
defrag on it.

-chris
* Re: [RFC][PATCH 0/3] ext4: online defrag (ver 1.0)
From: Akira Fujita @ 2009-02-03 8:00 UTC
To: Chris Mason; +Cc: Theodore Tso, linux-ext4, linux-fsdevel

Hi Chris,

Chris Mason wrote:
> On Fri, 2009-01-30 at 15:11 +0900, Akira Fujita wrote:
>> Hi,
>>
>> I have rewritten ext4 online defrag patches based on the comments from Ted.
>> In the new defrag, create donor inode in the user space instead of kernel space,
>> and then allocate contiguous blocks to it with fallocate().
>> In kernel space, exchange the blocks between target inode and donor inode,
>> and then copy the file data of target inode to donor inode every 64MB.
>> The EXT4_IOC_DEFRAG ioctl becomes simpler than the old one,
>> so it may be useful for other purposes.
>>
>
> One thing you'll want to handle is swap files. The swap code uses the
> bmap ioctl to make a mapping of extents in the files, and expects that
> mapping not to change. So, defragging a swap file will lead to some
> serious problems.
>
> Btrfs is currently getting around this by dropping bmap support, so
> swapfiles on btrfs won't work at all. A real long term solution is
> required ;)
>
> For ext4 you should be able to just detect swapfile and disallow the
> defrag on it.
>

Thank you for teaching. ;)
I'll add the swapfile checks to command and kernel.
If target file is swapfile, ext4 online defrag returns an error
without doing defrag in the next version.

Regards,
Akira Fujita
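Editorial sketch of what the command-side swapfile check mentioned above might look like. This is not code from the patch set: it assumes the active swap areas can be read from /proc/swaps, and the helper names same_file() and is_active_swapfile() are made up for illustration. On the kernel side the corresponding test would presumably be the existing IS_SWAPFILE(inode) macro.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return 1 if the two stat results describe the same file. */
static int same_file(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/*
 * Hypothetical helper: return 1 if `path` is listed as an active swap
 * area in `swaps_path` (normally "/proc/swaps"), 0 if it is not, and
 * -1 if `path` or `swaps_path` cannot be examined.
 */
static int is_active_swapfile(const char *path, const char *swaps_path)
{
    struct stat target, entry;
    char line[4096], name[4096];
    FILE *fp;
    int found = 0;

    if (stat(path, &target) != 0)
        return -1;

    fp = fopen(swaps_path, "r");
    if (!fp)
        return -1;

    /* The first line is the column header; skip it. */
    if (!fgets(line, sizeof(line), fp)) {
        fclose(fp);
        return 0;
    }

    while (fgets(line, sizeof(line), fp)) {
        /* The first whitespace-delimited field is the swap area's path. */
        if (sscanf(line, "%4095s", name) != 1)
            continue;
        if (stat(name, &entry) == 0 && same_file(&target, &entry)) {
            found = 1;
            break;
        }
    }

    fclose(fp);
    return found;
}
```

A defrag command could call is_active_swapfile(target, "/proc/swaps") before touching the file and bail out on a non-zero result. Comparing device and inode numbers rather than path strings means a swapfile reached through a different path or a hard link is still caught. Paths containing spaces appear escaped in /proc/swaps and are not handled by this sketch.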
* Re: [RFC][PATCH 0/3] ext4: online defrag (ver 1.0)
From: Greg Freemyer @ 2009-01-30 22:33 UTC
To: Akira Fujita; +Cc: Theodore Tso, linux-ext4, linux-fsdevel

On Fri, Jan 30, 2009 at 1:11 AM, Akira Fujita <a-fujita@rs.jp.nec.com> wrote:
> Hi,
>
> I have rewritten ext4 online defrag patches based on the comments from Ted.
> In the new defrag, create donor inode in the user space instead of kernel space,
> and then allocate contiguous blocks to it with fallocate().
> In kernel space, exchange the blocks between target inode and donor inode,
> and then copy the file data of target inode to donor inode every 64MB.
> The EXT4_IOC_DEFRAG ioctl becomes simpler than the old one,
> so it may be useful for other purposes.
>
> #define EXT4_IOC_DEFRAG _IOW('f', 15, struct move_extent)
>

Do we want the ioctl name to be specific to defrag? I thought Ted's
goal was to make it more generic? I can also envision this same ioctl
being implemented by other file systems so EXT4 seems an inappropriate
prefix.

Thoughts?

> struct move_extent {
>         int org_fd;          /* original file descriptor */
>         int dest_fd;         /* destination file descriptor */
>         ext4_lblk_t start;   /* logical offset of org_fd and dest_fd */
>         ext4_lblk_t len;     /* exchange block length */
> };

I would also like to see .dest_fd changed to .donor_fd.

I would like to see the ABI be more flexible and have .start be broken
into 2 fields:

.start_orig
.start_donor

And I don't think they should be of type ext4_lblk_t. Something more
generic seems appropriate.

Thoughts?

Greg
--
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
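Editorial sketch of the userspace half of the design quoted above, where the donor inode is created in user space and given preallocated blocks. Everything here is an assumption for illustration: the helper name create_donor(), the ".donor.XXXXXX" file name, and the use of posix_fallocate() as a portable stand-in for the fallocate() call the cover letter mentions. Preallocation only asks the filesystem for space; whether the resulting blocks are actually contiguous is up to the allocator and would need to be checked separately.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an unlinked temporary donor file in `dir`
 * (which should live on the same filesystem as the target file) and
 * preallocate `size` bytes for it.
 * Returns an open file descriptor, or -1 with errno set.
 */
static int create_donor(const char *dir, off_t size)
{
    char path[4096];
    int fd, err;

    if (snprintf(path, sizeof(path), "%s/.donor.XXXXXX", dir)
            >= (int)sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = mkstemp(path);
    if (fd < 0)
        return -1;

    /* The donor only lends blocks; it never needs a visible name. */
    unlink(path);

    /* posix_fallocate() returns an error number instead of setting errno. */
    err = posix_fallocate(fd, 0, size);
    if (err != 0) {
        close(fd);
        errno = err;
        return -1;
    }
    return fd;
}
```

Unlinking the donor right after creation means its blocks are released automatically when the descriptor is closed, even if the defrag tool is killed partway through.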
* Re: [RFC][PATCH 0/3] ext4: online defrag (ver 1.0)
From: Akira Fujita @ 2009-02-04 8:07 UTC
To: Greg Freemyer; +Cc: Theodore Tso, linux-ext4, linux-fsdevel

Hi Greg,

Greg Freemyer wrote:
> On Fri, Jan 30, 2009 at 1:11 AM, Akira Fujita <a-fujita@rs.jp.nec.com> wrote:
>> Hi,
>>
>> I have rewritten ext4 online defrag patches based on the comments from Ted.
>> In the new defrag, create donor inode in the user space instead of kernel space,
>> and then allocate contiguous blocks to it with fallocate().
>> In kernel space, exchange the blocks between target inode and donor inode,
>> and then copy the file data of target inode to donor inode every 64MB.
>> The EXT4_IOC_DEFRAG ioctl becomes simpler than the old one,
>> so it may be useful for other purposes.
>>
>> #define EXT4_IOC_DEFRAG _IOW('f', 15, struct move_extent)
>>
>

I see. Does EXT4_IOC_MOVE_EXT sound better for you?

#define EXT4_IOC_MOVE_EXT _IOW('f', 15, struct move_extent)

> Do we want the ioctl name to be specific to defrag? I thought Ted's
> goal was to make it more generic? I can also envision this same ioctl
> being implemented by other file systems so EXT4 seems an inappropriate
> prefix.

Other filesystems (e.g. xfs, btrfs) have their own defrag ioctl,
and ext2/3 can not use this ioctl because they do not handle
extent file, though.
What kind of advantage do you think by moving this ioctl
to vfs layer?

> Thoughts?
>
>> struct move_extent {
>>         int org_fd;          /* original file descriptor */
>>         int dest_fd;         /* destination file descriptor */
>>         ext4_lblk_t start;   /* logical offset of org_fd and dest_fd */
>>         ext4_lblk_t len;     /* exchange block length */
>> };
>
> I would also like to see .dest_fd changed to .donor_fd.
>
> I would like to see the ABI be more flexible and have .start be broken
> into 2 fields:
>
> .start_orig
> .start_donor
>
> And I don't think they should be of type ext4_lblk_t. Something more
> generic seems appropriate.
>

OK, I broke .start into .orig_start and .donor_start
and changed the entry type from ext4_lblk_t to __u64.
The new move_extent structure is as follows:

struct move_extent {
        int orig_fd;         /* original file descriptor */
        int donor_fd;        /* donor file descriptor */
        __u64 orig_start;    /* logical start offset in block for orig */
        __u64 donor_start;   /* logical start offset in block for donor */
        __u64 len;           /* exchange block length */
};

Any comments?

Regards,
Akira Fujita
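Editorial sketch of how userspace might fill in and issue the request with the structure and ioctl number proposed in the message above. The layout and the _IOW('f', 15, ...) encoding are copied from this thread and may not match whatever is eventually merged; the wrapper name move_extent_range() is hypothetical, as is the choice to issue the ioctl on the original file's descriptor.

```c
#include <linux/ioctl.h>
#include <linux/types.h>
#include <string.h>
#include <sys/ioctl.h>

/* Layout as proposed in this thread; not necessarily the final kernel ABI. */
struct move_extent {
    int orig_fd;        /* original file descriptor */
    int donor_fd;       /* donor file descriptor */
    __u64 orig_start;   /* logical start offset in block for orig */
    __u64 donor_start;  /* logical start offset in block for donor */
    __u64 len;          /* exchange block length */
};

#define EXT4_IOC_MOVE_EXT _IOW('f', 15, struct move_extent)

/*
 * Hypothetical wrapper: ask the filesystem to exchange `len` blocks
 * starting at `orig_start` in the original file with the blocks starting
 * at `donor_start` in the donor file.
 * Returns the ioctl() result: 0 on success, -1 with errno set on failure.
 */
static int move_extent_range(int orig_fd, int donor_fd,
                             __u64 orig_start, __u64 donor_start, __u64 len)
{
    struct move_extent me;

    /* Zero the whole request so no uninitialized bytes reach the kernel. */
    memset(&me, 0, sizeof(me));
    me.orig_fd = orig_fd;
    me.donor_fd = donor_fd;
    me.orig_start = orig_start;
    me.donor_start = donor_start;
    me.len = len;

    return ioctl(orig_fd, EXT4_IOC_MOVE_EXT, &me);
}
```

With two int fields ahead of the three __u64 fields, the structure should come out at 32 bytes with no implicit padding on common Linux ABIs, which keeps the layout the same for 32-bit and 64-bit callers.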
* Re: [RFC][PATCH 0/3] ext4: online defrag (ver 1.0)
From: Greg Freemyer @ 2009-02-04 12:25 UTC
To: Akira Fujita; +Cc: Theodore Tso, linux-ext4, linux-fsdevel

Hi Akira,

On Wed, Feb 4, 2009 at 3:07 AM, Akira Fujita <a-fujita@rs.jp.nec.com> wrote:
> Hi Greg,
>
> Greg Freemyer wrote:
>>
>> On Fri, Jan 30, 2009 at 1:11 AM, Akira Fujita <a-fujita@rs.jp.nec.com> wrote:
>>>
>>> Hi,
>>>
>>> I have rewritten ext4 online defrag patches based on the comments from Ted.
>>> In the new defrag, create donor inode in the user space instead of kernel space,
>>> and then allocate contiguous blocks to it with fallocate().
>>> In kernel space, exchange the blocks between target inode and donor inode,
>>> and then copy the file data of target inode to donor inode every 64MB.
>>> The EXT4_IOC_DEFRAG ioctl becomes simpler than the old one,
>>> so it may be useful for other purposes.
>>>
>>> #define EXT4_IOC_DEFRAG _IOW('f', 15, struct move_extent)
>>>
>>
>
> I see. Does EXT4_IOC_MOVE_EXT sound better for you?
>
> #define EXT4_IOC_MOVE_EXT _IOW('f', 15, struct move_extent)

I like it better, but a core developer should weigh in.

>> Do we want the ioctl name to be specific to defrag? I thought Ted's
>> goal was to make it more generic? I can also envision this same ioctl
>> being implemented by other file systems so EXT4 seems an inappropriate
>> prefix.
>
> Other filesystems (e.g. xfs, btrfs) have their own defrag ioctl,
> and ext2/3 can not use this ioctl because they do not handle
> extent file, though.

I don't want ext2/3 to share any kernel code. I do hope that
userspace code could eventually be written to exercise
EXT4_IOC_MOVE_EXT type functionality for all 3 filesystems. Do we
really need a new ioctl for each one?

> What kind of advantage do you think by moving this ioctl
> to vfs layer?

I only got interested in this code because I started monitoring the
OHSM project (http://code.google.com/p/fscops/). They don't need
defrag, but they do need the functionality of EXT4_IOC_MOVE_EXT.

They are currently writing their code around ext2 and have a proof of
concept implementation almost ready. Each time they add a filesystem
(ext3, ext4, etc.) they will need to have a way to trigger the block
re-org from userspace. Having a single ioctl that can be expanded to
handle more and more underlying filesystems would benefit them.

Equally important, if other users of EXT4_IOC_MOVE_EXT come along, they
may want it to be more filesystem generic as well.

>> Thoughts?
>>
>>> struct move_extent {
>>>         int org_fd;          /* original file descriptor */
>>>         int dest_fd;         /* destination file descriptor */
>>>         ext4_lblk_t start;   /* logical offset of org_fd and dest_fd */
>>>         ext4_lblk_t len;     /* exchange block length */
>>> };
>>
>> I would also like to see .dest_fd changed to .donor_fd.
>>
>> I would like to see the ABI be more flexible and have .start be broken
>> into 2 fields:
>>
>> .start_orig
>> .start_donor
>>
>> And I don't think they should be of type ext4_lblk_t. Something more
>> generic seems appropriate.
>>
> OK, I broke .start into .orig_start and .donor_start
> and changed the entry type from ext4_lblk_t to __u64.
> The new move_extent structure is as follows:
>
> struct move_extent {
>         int orig_fd;         /* original file descriptor */
>         int donor_fd;        /* donor file descriptor */
>         __u64 orig_start;    /* logical start offset in block for orig */
>         __u64 donor_start;   /* logical start offset in block for donor */
>         __u64 len;           /* exchange block length */
> };
>
> Any comments?

I like that much better. With OHSM as an example, this gives them the
flexibility to re-org a large file even if there is not enough free
space to alloc a full redundant copy.

> Regards,
> Akira Fujita
>

Greg
--
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
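Editorial sketch of why separate start offsets help when free space is tight, as suggested above. With independent offsets for the original and the donor, a large file could in principle be walked in fixed-size chunks against a small donor that is reused at offset 0 each time, instead of a donor as large as the whole file. The struct chunk_req type and both helpers are hypothetical, and the sketch only does the offset arithmetic; it assumes the caller re-prepares the donor between chunks and issues the actual exchange itself.

```c
#include <stdint.h>

/* Hypothetical request triple mirroring orig_start/donor_start/len. */
struct chunk_req {
    uint64_t orig_start;   /* first block of this chunk in the original */
    uint64_t donor_start;  /* first block to take from the donor */
    uint64_t len;          /* number of blocks in this chunk */
};

/* Number of chunks needed to cover `total_blocks` (0 if chunk size is 0). */
static uint64_t chunk_count(uint64_t total_blocks, uint64_t chunk_blocks)
{
    if (chunk_blocks == 0)
        return 0;
    return total_blocks / chunk_blocks + (total_blocks % chunk_blocks != 0);
}

/*
 * Fill `out` with the request for chunk number `index` (0-based).
 * Returns 0 on success, -1 if `index` is past the last chunk.
 */
static int chunk_at(uint64_t total_blocks, uint64_t chunk_blocks,
                    uint64_t index, struct chunk_req *out)
{
    uint64_t remaining;

    if (index >= chunk_count(total_blocks, chunk_blocks))
        return -1;

    out->orig_start = index * chunk_blocks;
    out->donor_start = 0;  /* assumption: the small donor is reused each time */
    remaining = total_blocks - out->orig_start;
    out->len = remaining < chunk_blocks ? remaining : chunk_blocks;
    return 0;
}
```

For a 10-block file and a 4-block donor this yields three requests covering blocks 0-3, 4-7 and 8-9, so the extra space needed at any moment is bounded by the chunk size rather than the file size.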
* Re: [RFC][PATCH 0/3] ext4: online defrag (ver 1.0)
From: Theodore Tso @ 2009-02-04 14:09 UTC
To: Akira Fujita; +Cc: Greg Freemyer, linux-ext4, linux-fsdevel

On Wed, Feb 04, 2009 at 05:07:48PM +0900, Akira Fujita wrote:
>> Do we want the ioctl name to be specific to defrag? I thought Ted's
>> goal was to make it more generic? I can also envision this same ioctl
>> being implemented by other file systems so EXT4 seems an inappropriate
>> prefix.

When I said generic I meant in terms of decomposing the functionality
into multiple ioctls which each could be useful for multiple purposes.
Not necessarily in terms of being used by other filesystem, because
they will almost certainly have their own requirements.

So for example, primitives like "allocate blocks for this inode from
this region of the disk", or "don't allocate blocks for any inode in
this region of disk", can be used for multiple things (such as on-line
shrink), and not just defragmentation.

I don't want to move this to the VFS layer, since it will involve huge
amounts of time while people argue over generic issues regarding the
interface. Look at how long it took to settle on the FIEMAP
interface; that's not an experience I care to repeat.

>>> struct move_extent {
>>>         int org_fd;          /* original file descriptor */
>>>         int dest_fd;         /* destination file descriptor */
>>>         ext4_lblk_t start;   /* logical offset of org_fd and dest_fd */
>>>         ext4_lblk_t len;     /* exchange block length */
>>> };
>>
>> I would also like to see .dest_fd changed to .donor_fd.

Agreed --- dest_fd is very confusing, because while the data is moving
to the blocks contributed by the donor_fd, the actual inode which
remains pointed to by all of the directory entries is the org_fd.
But people who think of the operation as the blocks moving to the
"destination fd", will get completely confused. Donor makes more
sense, since it has the sense of "organ transplant", which makes a lot
more sense.

- Ted
* Re: [RFC][PATCH 0/3] ext4: online defrag (ver 1.0)
From: Greg Freemyer @ 2009-02-04 14:51 UTC
To: Theodore Tso; +Cc: Akira Fujita, linux-ext4, linux-fsdevel

On Wed, Feb 4, 2009 at 9:09 AM, Theodore Tso <tytso@mit.edu> wrote:
> On Wed, Feb 04, 2009 at 05:07:48PM +0900, Akira Fujita wrote:
>>> Do we want the ioctl name to be specific to defrag? I thought Ted's
>>> goal was to make it more generic? I can also envision this same ioctl
>>> being implemented by other file systems so EXT4 seems an inappropriate
>>> prefix.
>
> When I said generic I meant in terms of decomposing the functionality
> into multiple ioctls which each could be useful for multiple purposes.
> Not necessarily in terms of being used by other filesystem, because
> they will almost certainly have their own requirements.
>
> So for example, primitives like "allocate blocks for this inode from
> this region of the disk", or "don't allocate blocks for any inode in
> this region of disk", can be used for multiple things (such as on-line
> shrink), and not just defragmentation.
>
> I don't want to move this to the VFS layer, since it will involve huge
> amounts of time while people argue over generic issues regarding the
> interface. Look at how long it took to settle on the FIEMAP
> interface; that's not an experience I care to repeat.

Convinced and request withdrawn.

Talking about this ioctl, can anyone say:

If the OHSM team implements a similar ioctl for ext2 and ext3 and
submits them for mainline at some point, do they have a chance of
being accepted or are ext2 and ext3 feature frozen?

Thanks
Greg
--
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
* Re: [RFC][PATCH 0/3] ext4: online defrag (ver 1.0)
From: Theodore Tso @ 2009-02-04 15:32 UTC
To: Greg Freemyer; +Cc: Akira Fujita, linux-ext4, linux-fsdevel

On Wed, Feb 04, 2009 at 09:51:07AM -0500, Greg Freemyer wrote:
>
> If the OHSM team implements a similar ioctl for ext2 and ext3 and
> submits them for mainline at some point, do they have a chance of
> being accepted or are ext2 and ext3 feature frozen?

It seems unlikely it would be accepted. If the patch could be done in
a way that seriously minimized the chances of destabilizing the code,
maybe --- but consider also that the OHSM design is a pretty terrible
hack. I'm not at all convinced they will be able to stabilize it for
production use, and a scheme that involves using dmapi across multiple
block devices.

Note that they apparently need to make other changes to the core
filesystem code besides just the ioctl --- to the block allocation
code, at the very least.

The right answer is really to use a stackable filesystem, and to use
separate filesystems for each different tier, and then build on top of
unionfs to give it its policy support. I suspect that OHSM will be a
cute student project, but it won't become anything serious given its
architecture/design, unfortunately.

- Ted
Thread overview: 8+ messages
     [not found] <49829A1D.5090002@rs.jp.nec.com>
2009-01-30 20:15 ` [RFC][PATCH 0/3] ext4: online defrag (ver 1.0) Chris Mason
2009-02-03  8:00   ` Akira Fujita
2009-01-30 22:33 ` Greg Freemyer
2009-02-04  8:07   ` Akira Fujita
2009-02-04 12:25     ` Greg Freemyer
2009-02-04 14:09     ` Theodore Tso
2009-02-04 14:51       ` Greg Freemyer
2009-02-04 15:32         ` Theodore Tso