* Some nilfs comments
@ 2008-12-18 2:52 Chris Mason
[not found] ` <1229568775.27170.134.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2008-12-18 2:52 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg
Hello,
I've been meaning to read through nilfs for a while, and I grabbed the
code today to take a look. The code is very clean and it ran well out
of the box here.
One problem I hit early on was that nilfs doesn't seem to zero out the
block device during mkfs. So after mkfs.nilfs2, mount still found my
old btrfs filesystem. It would be a good idea to zero out the first and
last 1MB on the device (except for the first sector which might have the
partition table).
I haven't dug too deeply in yet, but if there are parts you're most
interested in comments on, please let me know.
It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by
do_sync_mapping_range().
nilfs_page_mkwrite doesn't seem to dirty the page? block_page_mkwrite
does more, including checks against i_size and others.
readpages and O_DIRECT both set b_size on the map_bh to reflect the
total size of the region they are trying to map. It seems like an easy
optimization to map more in nilfs_get_block.
Hopefully this helps, I'll try to send a few more ideas after I get to
know the code better.
-chris
^ permalink raw reply [flat|nested] 10+ messages in thread[parent not found: <1229568775.27170.134.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>]
* Re: Some nilfs comments [not found] ` <1229568775.27170.134.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org> @ 2008-12-22 9:07 ` Ryusuke Konishi [not found] ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org> 0 siblings, 1 reply; 10+ messages in thread From: Ryusuke Konishi @ 2008-12-22 9:07 UTC (permalink / raw) To: users-JrjvKiOkagjYtjvyW6yDsg, chris.mason-QHcLZuEGTsvQT0dZR+AlfA Hi Chris, On Wed, 17 Dec 2008 21:52:55 -0500, Chris Mason wrote: > Hello, > > I've been meaning to read through nilfs for a while, and I grabbed the > code today to take a look. The code is very clean and it ran well out > of the box here. Thank you very much for helpful comments! > One problem I hit early on was that nilfs doesn't seem to zero out the > block device during mkfs. So after mkfs.nilfs2, mount still found my > old btrfs filesystem. It would be a good idea to zero out the first and > last 1MB on the device (except for the first sector which might have the > partition table). Okay, I'll take this idea in mkfs.nilfs2. > I haven't dug too deeply in yet, but if there are parts you're most > interested in comments on, please let me know. Well, I feel that the following two matters are particularlly questionable and need to be checked: - struct the_nilfs: NILFS allows users to mount snapshots without making additional devices or volumes. This is achieved by sharing a block device among multiple mount instances (i.e. super_block structs). the_nilfs struct is used for this sharing. This approach seems to be peculiar to nilfs, and I feel it needs attention. - ioctl: Ioctl interface (routines and structures) were implemented in an own way. These seems to be checked whether to comply with the rules of ioctl design. > It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by > do_sync_mapping_range(). Thanks! I didn't notice that this function was added. Uum, it seems to require reconsidering the way to initiate writing of data pages. > nilfs_page_mkwrite doesn't seem to dirty the page? block_page_mkwrite > does more, including checks against i_size and others. I got it. The design of this function seems so old. It should be revised to do checks and block allocations previously at this function. > readpages and O_DIRECT both set b_size on the map_bh to reflect the > total size of the region they are trying to map. It seems like an easy > optimization to map more in nilfs_get_block. Yes, that's so true! Actually, I have unfinished patches to do this. This would decrease the numbers of get_block callbacks and b-tree lookups. And, as you pointed out, this seems one of the easiest optimization in some todos on performance. > Hopefully this helps, I'll try to send a few more ideas after I get to > know the code better. > > -chris Thanks, I've also started to read btrfs. I'll see it during the Christmas holidays ;) With regards, Ryusuke Konishi ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: Some nilfs comments [not found] ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org> @ 2008-12-22 13:01 ` Reinoud Zandijk [not found] ` <20081222130152.GA23725-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org> 2008-12-23 1:47 ` Chris Mason 2009-01-06 19:55 ` Chris Mason 2 siblings, 1 reply; 10+ messages in thread From: Reinoud Zandijk @ 2008-12-22 13:01 UTC (permalink / raw) To: NILFS Users mailing list; +Cc: chris.mason-QHcLZuEGTsvQT0dZR+AlfA Hi Ryusuke, Chris, On Mon, Dec 22, 2008 at 06:07:19PM +0900, Ryusuke Konishi wrote: > Well, I feel that the following two matters are particularlly > questionable and need to be checked: > > - struct the_nilfs: > NILFS allows users to mount snapshots without making additional > devices or volumes. This is achieved by sharing a block device > among multiple mount instances (i.e. super_block structs). > the_nilfs struct is used for this sharing. > > This approach seems to be peculiar to nilfs, and I feel it needs > attention. I've dug into this too and decided to use the same mechanism for my implementation. I think its nice to have all the fs's administration centered at one place; i even have all the btree stuff there. One thing indeed is that its not possible to do say `unmount /dev/wd0a' if there are multiple mounts on it... but maybe thats better even ;) or its even a positive thing to just unmount all the mounts in one go. > - ioctl: > Ioctl interface (routines and structures) were implemented in an > own way. These seems to be checked whether to comply with the rules > of ioctl design. I can't comment on that yet since i haven't dug into that... > Thanks, I've also started to read btrfs. > I'll see it during the Christmas holidays ;) Whats your first impression? Want to take over things/ideas? Isn't that a quite different FS structurally since it doesn't have segments? (AFAIK!) With regards, Reinoud Zandijk ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <20081222130152.GA23725-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>]
* Re: Some nilfs comments [not found] ` <20081222130152.GA23725-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org> @ 2008-12-23 17:23 ` Ryusuke Konishi [not found] ` <20081224.022337.30445207.ryusuke-sG5X7nlA6pw@public.gmane.org> 0 siblings, 1 reply; 10+ messages in thread From: Ryusuke Konishi @ 2008-12-23 17:23 UTC (permalink / raw) To: users-JrjvKiOkagjYtjvyW6yDsg, reinoud-S783fYmB3Ccdnm+yROfE0A Cc: chris.mason-QHcLZuEGTsvQT0dZR+AlfA Hi Reinoud, On Mon, 22 Dec 2008 14:01:52 +0100, Reinoud Zandijk <reinoud-S783fYmB3Ccdnm+yROfE0A@public.gmane.org> wrote: > On Mon, Dec 22, 2008 at 06:07:19PM +0900, Ryusuke Konishi wrote: > > Well, I feel that the following two matters are particularlly > > questionable and need to be checked: > > > > - struct the_nilfs: > > NILFS allows users to mount snapshots without making additional > > devices or volumes. This is achieved by sharing a block device > > among multiple mount instances (i.e. super_block structs). > > the_nilfs struct is used for this sharing. > > > > This approach seems to be peculiar to nilfs, and I feel it needs > > attention. > > I've dug into this too and decided to use the same mechanism for my > implementation. I think its nice to have all the fs's administration centered > at one place; i even have all the btree stuff there. One thing indeed is that > its not possible to do say `unmount /dev/wd0a' if there are multiple mounts on > it... but maybe thats better even ;) or its even a positive thing to just > unmount all the mounts in one go. I don't know the mount/umount interface of NetBSD, but perhaps unmount system call takes a directory argument instead of a device. If so, your kernel code would become similar to our nilfs. Though the linux umount command can take a device argument, only the latest mount is detached ;) > > Thanks, I've also started to read btrfs. > > I'll see it during the Christmas holidays ;) > > Whats your first impression? Want to take over things/ideas? Isn't that a > quite different FS structurally since it doesn't have segments? (AFAIK!) Yeah, btrfs has flexible and rich volume management layer, so very interesting. The snapshot code and c-tree also look like fun. Regards, Ryusuke ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <20081224.022337.30445207.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: Some nilfs comments [not found] ` <20081224.022337.30445207.ryusuke-sG5X7nlA6pw@public.gmane.org> @ 2008-12-23 20:49 ` Chris Mason 0 siblings, 0 replies; 10+ messages in thread From: Chris Mason @ 2008-12-23 20:49 UTC (permalink / raw) To: Ryusuke Konishi Cc: reinoud-S783fYmB3Ccdnm+yROfE0A, users-JrjvKiOkagjYtjvyW6yDsg On Wed, 2008-12-24 at 02:23 +0900, Ryusuke Konishi wrote: > > > Thanks, I've also started to read btrfs. > > > I'll see it during the Christmas holidays ;) > > > > Whats your first impression? Want to take over things/ideas? Isn't that a > > quite different FS structurally since it doesn't have segments? (AFAIK!) > > Yeah, btrfs has flexible and rich volume management layer, so very > interesting. The snapshot code and c-tree also look like fun. The ctree (checksummed btree is really what it stands for) code itself is pretty basic. Aside from the checksumming, it is a simple btree with generation numbers to find the nodes that have changed since a given transaction. In terms of interesting/new code that others might want look at, the btrfs tree logging to do fsync efficiently (tree-log.c) and device management (volumes.c) are probably the most interesting. -chris ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Some nilfs comments [not found] ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org> 2008-12-22 13:01 ` Reinoud Zandijk @ 2008-12-23 1:47 ` Chris Mason [not found] ` <1229996820.4812.7.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org> 2009-01-06 19:55 ` Chris Mason 2 siblings, 1 reply; 10+ messages in thread From: Chris Mason @ 2008-12-23 1:47 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg On Mon, 2008-12-22 at 18:07 +0900, Ryusuke Konishi wrote: > Hi Chris, > > On Wed, 17 Dec 2008 21:52:55 -0500, Chris Mason wrote: > > Hello, > > > > I've been meaning to read through nilfs for a while, and I grabbed the > > code today to take a look. The code is very clean and it ran well out > > of the box here. > > Thank you very much for helpful comments! > ;) I'll be slow to answer since I'm traveling a bit. > > One problem I hit early on was that nilfs doesn't seem to zero out the > > block device during mkfs. So after mkfs.nilfs2, mount still found my > > old btrfs filesystem. It would be a good idea to zero out the first and > > last 1MB on the device (except for the first sector which might have the > > partition table). > > Okay, I'll take this idea in mkfs.nilfs2. > > > I haven't dug too deeply in yet, but if there are parts you're most > > interested in comments on, please let me know. > > Well, I feel that the following two matters are particularlly > questionable and need to be checked: > > - struct the_nilfs: > NILFS allows users to mount snapshots without making additional > devices or volumes. This is achieved by sharing a block device > among multiple mount instances (i.e. super_block structs). > the_nilfs struct is used for this sharing. > > This approach seems to be peculiar to nilfs, and I feel it needs > attention. Btrfs also shares super blocks between snapshot mounts, and each snapshot becomes a private inode number space. So, it sounds like we have some things in common there. I'm looking at using more explicit mounts than I am now (with some hints from Christoph). I'll take a look at the nilfs code in this area and see how similar we really are. > > - ioctl: > Ioctl interface (routines and structures) were implemented in an > own way. These seems to be checked whether to comply with the rules > of ioctl design. Ok, I'll take a look here as well. The same goes for btrfs ;) > > > > It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by > > do_sync_mapping_range(). > > Thanks! I didn't notice that this function was added. > Uum, it seems to require reconsidering the way to initiate writing of > data pages. Nick Piggin has been making noise to change do_sync_mapping_range to pass WB_SYNC_ALL instead. You can trick it a bit while things get worked out and use current_is_pdflush() to only treat WB_SYNC_NONE as WB_SYNC_NONE when pdflush is the one calling (sorry, its an ugly suggestion). -chris ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <1229996820.4812.7.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>]
* Re: Some nilfs comments [not found] ` <1229996820.4812.7.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org> @ 2008-12-23 15:57 ` Ryusuke Konishi 0 siblings, 0 replies; 10+ messages in thread From: Ryusuke Konishi @ 2008-12-23 15:57 UTC (permalink / raw) To: chris.mason-QHcLZuEGTsvQT0dZR+AlfA Cc: users-JrjvKiOkagjYtjvyW6yDsg, konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg Hi, On Mon, 22 Dec 2008 20:47:00 -0500, Chris Mason wrote: > > - struct the_nilfs: > > NILFS allows users to mount snapshots without making additional > > devices or volumes. This is achieved by sharing a block device > > among multiple mount instances (i.e. super_block structs). > > the_nilfs struct is used for this sharing. > > > > This approach seems to be peculiar to nilfs, and I feel it needs > > attention. > > Btrfs also shares super blocks between snapshot mounts, and each > snapshot becomes a private inode number space. > > So, it sounds like we have some things in common there. I'm looking at > using more explicit mounts than I am now (with some hints from > Christoph). I'll take a look at the nilfs code in this area and see how > similar we really are. Yeah, it sounds nice. So, I may not have to care this so much. Comprehending common points between the two may be indeed meaningful when we rethink mount/umount vfs interface. > > > > - ioctl: > > Ioctl interface (routines and structures) were implemented in an > > own way. These seems to be checked whether to comply with the rules > > of ioctl design. > > Ok, I'll take a look here as well. The same goes for btrfs ;) All right ;) > > > It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by > > > do_sync_mapping_range(). > > > > Thanks! I didn't notice that this function was added. > > Uum, it seems to require reconsidering the way to initiate writing of > > data pages. > > Nick Piggin has been making noise to change do_sync_mapping_range to > pass WB_SYNC_ALL instead. You can trick it a bit while things get > worked out and use current_is_pdflush() to only treat WB_SYNC_NONE as > WB_SYNC_NONE when pdflush is the one calling (sorry, its an ugly > suggestion). Thanks. I've found the Nick's patch in mmotm. In either case, nilfs_writepages (or nilfs_writepage) needs some more changes to ensure data integrity writeout against sys_sync_file_range() because writeback flag on data pages is not set in nilfs_writepages() and wait_on_page_writeback_range() will ignore the pages. For other write operations, the write sync is ensured in a file level operation like the journal data mode of ext3/ext4. But I think nilfs_writepages() should be revised to initiate writeback in the function if possible. Regards, Ryusuke ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Some nilfs comments [not found] ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org> 2008-12-22 13:01 ` Reinoud Zandijk 2008-12-23 1:47 ` Chris Mason @ 2009-01-06 19:55 ` Chris Mason [not found] ` <1231271709.4888.27.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org> 2 siblings, 1 reply; 10+ messages in thread From: Chris Mason @ 2009-01-06 19:55 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg On Mon, 2008-12-22 at 18:07 +0900, Ryusuke Konishi wrote: > > I haven't dug too deeply in yet, but if there are parts you're most > > interested in comments on, please let me know. > > Well, I feel that the following two matters are particularlly > questionable and need to be checked: > > - ioctl: > Ioctl interface (routines and structures) were implemented in an > own way. These seems to be checked whether to comply with the rules > of ioctl design. > It took me far too long to look at this, I'm sorry. But, looking through the ioctl code I think you'll have a few (easy to fix) problems. struct nilfs_argv { void *v_base; size_t v_nmembs; /* number of members */ size_t v_size; /* size of members */ int v_index; int v_flags; }; The structs that get passed from userland to kernel should use fixed sized types. I get around this by using u64s (and __u64s) everywhere. It looks like you have compat ioctls to fix this too, I find it easier to use the fixed types. Otherwise the ioctl code looks pretty reasonable. -chris ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <1231271709.4888.27.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>]
* Re: Some nilfs comments [not found] ` <1231271709.4888.27.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org> @ 2009-01-07 5:20 ` Ryusuke Konishi [not found] ` <20090107.142009.106620982.ryusuke-sG5X7nlA6pw@public.gmane.org> 0 siblings, 1 reply; 10+ messages in thread From: Ryusuke Konishi @ 2009-01-07 5:20 UTC (permalink / raw) To: chris.mason-QHcLZuEGTsvQT0dZR+AlfA Cc: users-JrjvKiOkagjYtjvyW6yDsg, konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg On Tue, 06 Jan 2009 14:55:09 -0500, Chris Mason wrote: > On Mon, 2008-12-22 at 18:07 +0900, Ryusuke Konishi wrote: > > > I haven't dug too deeply in yet, but if there are parts you're most > > > interested in comments on, please let me know. > > > > Well, I feel that the following two matters are particularlly > > questionable and need to be checked: > > > > > - ioctl: > > Ioctl interface (routines and structures) were implemented in an > > own way. These seems to be checked whether to comply with the rules > > of ioctl design. > > > > It took me far too long to look at this, I'm sorry. But, looking > through the ioctl code I think you'll have a few (easy to fix) problems. > > struct nilfs_argv { > void *v_base; > size_t v_nmembs; /* number of members */ > size_t v_size; /* size of members */ > int v_index; > int v_flags; > }; > > The structs that get passed from userland to kernel should use fixed > sized types. > > I get around this by using u64s (and __u64s) everywhere. It looks like > you have compat ioctls to fix this too, I find it easier to use the > fixed types. > > Otherwise the ioctl code looks pretty reasonable. > > -chris Thank you for this comment and for taking time from your busy schedule. I think it actually should be fixed to comply with the kernel rules. The same problem is found in a few other ioctl structures. I'd like to modify them and remove compat ioctls some time soon. Thanks, Ryusuke Konishi ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <20090107.142009.106620982.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: Some nilfs comments [not found] ` <20090107.142009.106620982.ryusuke-sG5X7nlA6pw@public.gmane.org> @ 2009-01-07 14:16 ` Chris Mason 0 siblings, 0 replies; 10+ messages in thread From: Chris Mason @ 2009-01-07 14:16 UTC (permalink / raw) To: Ryusuke Konishi Cc: users-JrjvKiOkagjYtjvyW6yDsg, konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg On Wed, 2009-01-07 at 14:20 +0900, Ryusuke Konishi wrote: > On Tue, 06 Jan 2009 14:55:09 -0500, Chris Mason wrote: > > On Mon, 2008-12-22 at 18:07 +0900, Ryusuke Konishi wrote: > > > > I haven't dug too deeply in yet, but if there are parts you're most > > > > interested in comments on, please let me know. > > > > > > Well, I feel that the following two matters are particularlly > > > questionable and need to be checked: > > > > > > > > - ioctl: > > > Ioctl interface (routines and structures) were implemented in an > > > own way. These seems to be checked whether to comply with the rules > > > of ioctl design. > > > > > > > It took me far too long to look at this, I'm sorry. But, looking > > through the ioctl code I think you'll have a few (easy to fix) problems. > > > > struct nilfs_argv { > > void *v_base; > > size_t v_nmembs; /* number of members */ > > size_t v_size; /* size of members */ > > int v_index; > > int v_flags; > > }; > > > > The structs that get passed from userland to kernel should use fixed > > sized types. > > > > I get around this by using u64s (and __u64s) everywhere. It looks like > > you have compat ioctls to fix this too, I find it easier to use the > > fixed types. > > > > Otherwise the ioctl code looks pretty reasonable. > > > > I think it actually should be fixed to comply with the kernel rules. > The same problem is found in a few other ioctl structures. > > I'd like to modify them and remove compat ioctls some time soon. Strictly speaking, your code is correct. I suspect that if you ask 10 people you'll get 6 different answers on the best way. The u64 is my personal preference ;) -chris ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2009-01-07 14:16 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-18 2:52 Some nilfs comments Chris Mason
[not found] ` <1229568775.27170.134.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
2008-12-22 9:07 ` Ryusuke Konishi
[not found] ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-12-22 13:01 ` Reinoud Zandijk
[not found] ` <20081222130152.GA23725-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
2008-12-23 17:23 ` Ryusuke Konishi
[not found] ` <20081224.022337.30445207.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-12-23 20:49 ` Chris Mason
2008-12-23 1:47 ` Chris Mason
[not found] ` <1229996820.4812.7.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
2008-12-23 15:57 ` Ryusuke Konishi
2009-01-06 19:55 ` Chris Mason
[not found] ` <1231271709.4888.27.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
2009-01-07 5:20 ` Ryusuke Konishi
[not found] ` <20090107.142009.106620982.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-01-07 14:16 ` Chris Mason
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox