* Some nilfs comments
@ 2008-12-18 2:52 Chris Mason
[not found] ` <1229568775.27170.134.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2008-12-18 2:52 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg
Hello,
I've been meaning to read through nilfs for a while, and I grabbed the
code today to take a look. The code is very clean and it ran well out
of the box here.
One problem I hit early on was that nilfs doesn't seem to zero out the
block device during mkfs. So after mkfs.nilfs2, mount still found my
old btrfs filesystem. It would be a good idea to zero out the first and
last 1MB on the device (except for the first sector which might have the
partition table).
I haven't dug too deeply in yet, but if there are parts you're most
interested in comments on, please let me know.
It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by
do_sync_mapping_range().
nilfs_page_mkwrite doesn't seem to dirty the page? block_page_mkwrite
does more, including checks against i_size and others.
readpages and O_DIRECT both set b_size on the map_bh to reflect the
total size of the region they are trying to map. It seems like an easy
optimization to map more in nilfs_get_block.
Hopefully this helps, I'll try to send a few more ideas after I get to
know the code better.
-chris
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Some nilfs comments
[not found] ` <1229568775.27170.134.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
@ 2008-12-22 9:07 ` Ryusuke Konishi
[not found] ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2008-12-22 9:07 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, chris.mason-QHcLZuEGTsvQT0dZR+AlfA
Hi Chris,
On Wed, 17 Dec 2008 21:52:55 -0500, Chris Mason wrote:
> Hello,
>
> I've been meaning to read through nilfs for a while, and I grabbed the
> code today to take a look. The code is very clean and it ran well out
> of the box here.
Thank you very much for helpful comments!
> One problem I hit early on was that nilfs doesn't seem to zero out the
> block device during mkfs. So after mkfs.nilfs2, mount still found my
> old btrfs filesystem. It would be a good idea to zero out the first and
> last 1MB on the device (except for the first sector which might have the
> partition table).
Okay, I'll take this idea in mkfs.nilfs2.
> I haven't dug too deeply in yet, but if there are parts you're most
> interested in comments on, please let me know.
Well, I feel that the following two matters are particularlly
questionable and need to be checked:
- struct the_nilfs:
NILFS allows users to mount snapshots without making additional
devices or volumes. This is achieved by sharing a block device
among multiple mount instances (i.e. super_block structs).
the_nilfs struct is used for this sharing.
This approach seems to be peculiar to nilfs, and I feel it needs
attention.
- ioctl:
Ioctl interface (routines and structures) were implemented in an
own way. These seems to be checked whether to comply with the rules
of ioctl design.
> It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by
> do_sync_mapping_range().
Thanks! I didn't notice that this function was added.
Uum, it seems to require reconsidering the way to initiate writing of
data pages.
> nilfs_page_mkwrite doesn't seem to dirty the page? block_page_mkwrite
> does more, including checks against i_size and others.
I got it. The design of this function seems so old. It should be
revised to do checks and block allocations previously at this
function.
> readpages and O_DIRECT both set b_size on the map_bh to reflect the
> total size of the region they are trying to map. It seems like an easy
> optimization to map more in nilfs_get_block.
Yes, that's so true!
Actually, I have unfinished patches to do this. This would decrease
the numbers of get_block callbacks and b-tree lookups. And, as you
pointed out, this seems one of the easiest optimization in some todos
on performance.
> Hopefully this helps, I'll try to send a few more ideas after I get to
> know the code better.
>
> -chris
Thanks, I've also started to read btrfs.
I'll see it during the Christmas holidays ;)
With regards,
Ryusuke Konishi
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Some nilfs comments
[not found] ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-12-22 13:01 ` Reinoud Zandijk
[not found] ` <20081222130152.GA23725-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
2008-12-23 1:47 ` Chris Mason
2009-01-06 19:55 ` Chris Mason
2 siblings, 1 reply; 10+ messages in thread
From: Reinoud Zandijk @ 2008-12-22 13:01 UTC (permalink / raw)
To: NILFS Users mailing list; +Cc: chris.mason-QHcLZuEGTsvQT0dZR+AlfA
Hi Ryusuke, Chris,
On Mon, Dec 22, 2008 at 06:07:19PM +0900, Ryusuke Konishi wrote:
> Well, I feel that the following two matters are particularlly
> questionable and need to be checked:
>
> - struct the_nilfs:
> NILFS allows users to mount snapshots without making additional
> devices or volumes. This is achieved by sharing a block device
> among multiple mount instances (i.e. super_block structs).
> the_nilfs struct is used for this sharing.
>
> This approach seems to be peculiar to nilfs, and I feel it needs
> attention.
I've dug into this too and decided to use the same mechanism for my
implementation. I think its nice to have all the fs's administration centered
at one place; i even have all the btree stuff there. One thing indeed is that
its not possible to do say `unmount /dev/wd0a' if there are multiple mounts on
it... but maybe thats better even ;) or its even a positive thing to just
unmount all the mounts in one go.
> - ioctl:
> Ioctl interface (routines and structures) were implemented in an
> own way. These seems to be checked whether to comply with the rules
> of ioctl design.
I can't comment on that yet since i haven't dug into that...
> Thanks, I've also started to read btrfs.
> I'll see it during the Christmas holidays ;)
Whats your first impression? Want to take over things/ideas? Isn't that a
quite different FS structurally since it doesn't have segments? (AFAIK!)
With regards,
Reinoud Zandijk
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Some nilfs comments
[not found] ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-12-22 13:01 ` Reinoud Zandijk
@ 2008-12-23 1:47 ` Chris Mason
[not found] ` <1229996820.4812.7.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
2009-01-06 19:55 ` Chris Mason
2 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2008-12-23 1:47 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg
On Mon, 2008-12-22 at 18:07 +0900, Ryusuke Konishi wrote:
> Hi Chris,
>
> On Wed, 17 Dec 2008 21:52:55 -0500, Chris Mason wrote:
> > Hello,
> >
> > I've been meaning to read through nilfs for a while, and I grabbed the
> > code today to take a look. The code is very clean and it ran well out
> > of the box here.
>
> Thank you very much for helpful comments!
>
;) I'll be slow to answer since I'm traveling a bit.
> > One problem I hit early on was that nilfs doesn't seem to zero out the
> > block device during mkfs. So after mkfs.nilfs2, mount still found my
> > old btrfs filesystem. It would be a good idea to zero out the first and
> > last 1MB on the device (except for the first sector which might have the
> > partition table).
>
> Okay, I'll take this idea in mkfs.nilfs2.
>
> > I haven't dug too deeply in yet, but if there are parts you're most
> > interested in comments on, please let me know.
>
> Well, I feel that the following two matters are particularlly
> questionable and need to be checked:
>
> - struct the_nilfs:
> NILFS allows users to mount snapshots without making additional
> devices or volumes. This is achieved by sharing a block device
> among multiple mount instances (i.e. super_block structs).
> the_nilfs struct is used for this sharing.
>
> This approach seems to be peculiar to nilfs, and I feel it needs
> attention.
Btrfs also shares super blocks between snapshot mounts, and each
snapshot becomes a private inode number space.
So, it sounds like we have some things in common there. I'm looking at
using more explicit mounts than I am now (with some hints from
Christoph). I'll take a look at the nilfs code in this area and see how
similar we really are.
>
> - ioctl:
> Ioctl interface (routines and structures) were implemented in an
> own way. These seems to be checked whether to comply with the rules
> of ioctl design.
Ok, I'll take a look here as well. The same goes for btrfs ;)
>
>
> > It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by
> > do_sync_mapping_range().
>
> Thanks! I didn't notice that this function was added.
> Uum, it seems to require reconsidering the way to initiate writing of
> data pages.
Nick Piggin has been making noise to change do_sync_mapping_range to
pass WB_SYNC_ALL instead. You can trick it a bit while things get
worked out and use current_is_pdflush() to only treat WB_SYNC_NONE as
WB_SYNC_NONE when pdflush is the one calling (sorry, its an ugly
suggestion).
-chris
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Some nilfs comments
[not found] ` <1229996820.4812.7.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
@ 2008-12-23 15:57 ` Ryusuke Konishi
0 siblings, 0 replies; 10+ messages in thread
From: Ryusuke Konishi @ 2008-12-23 15:57 UTC (permalink / raw)
To: chris.mason-QHcLZuEGTsvQT0dZR+AlfA
Cc: users-JrjvKiOkagjYtjvyW6yDsg,
konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg
Hi,
On Mon, 22 Dec 2008 20:47:00 -0500, Chris Mason wrote:
> > - struct the_nilfs:
> > NILFS allows users to mount snapshots without making additional
> > devices or volumes. This is achieved by sharing a block device
> > among multiple mount instances (i.e. super_block structs).
> > the_nilfs struct is used for this sharing.
> >
> > This approach seems to be peculiar to nilfs, and I feel it needs
> > attention.
>
> Btrfs also shares super blocks between snapshot mounts, and each
> snapshot becomes a private inode number space.
>
> So, it sounds like we have some things in common there. I'm looking at
> using more explicit mounts than I am now (with some hints from
> Christoph). I'll take a look at the nilfs code in this area and see how
> similar we really are.
Yeah, it sounds nice.
So, I may not have to care this so much.
Comprehending common points between the two may be indeed meaningful
when we rethink mount/umount vfs interface.
> >
> > - ioctl:
> > Ioctl interface (routines and structures) were implemented in an
> > own way. These seems to be checked whether to comply with the rules
> > of ioctl design.
>
> Ok, I'll take a look here as well. The same goes for btrfs ;)
All right ;)
> > > It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by
> > > do_sync_mapping_range().
> >
> > Thanks! I didn't notice that this function was added.
> > Uum, it seems to require reconsidering the way to initiate writing of
> > data pages.
>
> Nick Piggin has been making noise to change do_sync_mapping_range to
> pass WB_SYNC_ALL instead. You can trick it a bit while things get
> worked out and use current_is_pdflush() to only treat WB_SYNC_NONE as
> WB_SYNC_NONE when pdflush is the one calling (sorry, its an ugly
> suggestion).
Thanks. I've found the Nick's patch in mmotm.
In either case, nilfs_writepages (or nilfs_writepage) needs some more
changes to ensure data integrity writeout against
sys_sync_file_range() because writeback flag on data pages is not set
in nilfs_writepages() and wait_on_page_writeback_range() will ignore
the pages.
For other write operations, the write sync is ensured in a file level
operation like the journal data mode of ext3/ext4.
But I think nilfs_writepages() should be revised to initiate writeback
in the function if possible.
Regards,
Ryusuke
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Some nilfs comments
[not found] ` <20081222130152.GA23725-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2008-12-23 17:23 ` Ryusuke Konishi
[not found] ` <20081224.022337.30445207.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2008-12-23 17:23 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, reinoud-S783fYmB3Ccdnm+yROfE0A
Cc: chris.mason-QHcLZuEGTsvQT0dZR+AlfA
Hi Reinoud,
On Mon, 22 Dec 2008 14:01:52 +0100, Reinoud Zandijk <reinoud-S783fYmB3Ccdnm+yROfE0A@public.gmane.org> wrote:
> On Mon, Dec 22, 2008 at 06:07:19PM +0900, Ryusuke Konishi wrote:
> > Well, I feel that the following two matters are particularlly
> > questionable and need to be checked:
> >
> > - struct the_nilfs:
> > NILFS allows users to mount snapshots without making additional
> > devices or volumes. This is achieved by sharing a block device
> > among multiple mount instances (i.e. super_block structs).
> > the_nilfs struct is used for this sharing.
> >
> > This approach seems to be peculiar to nilfs, and I feel it needs
> > attention.
>
> I've dug into this too and decided to use the same mechanism for my
> implementation. I think its nice to have all the fs's administration centered
> at one place; i even have all the btree stuff there. One thing indeed is that
> its not possible to do say `unmount /dev/wd0a' if there are multiple mounts on
> it... but maybe thats better even ;) or its even a positive thing to just
> unmount all the mounts in one go.
I don't know the mount/umount interface of NetBSD, but perhaps unmount
system call takes a directory argument instead of a device.
If so, your kernel code would become similar to our nilfs.
Though the linux umount command can take a device argument, only the
latest mount is detached ;)
> > Thanks, I've also started to read btrfs.
> > I'll see it during the Christmas holidays ;)
>
> Whats your first impression? Want to take over things/ideas? Isn't that a
> quite different FS structurally since it doesn't have segments? (AFAIK!)
Yeah, btrfs has flexible and rich volume management layer, so very
interesting. The snapshot code and c-tree also look like fun.
Regards,
Ryusuke
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Some nilfs comments
[not found] ` <20081224.022337.30445207.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-12-23 20:49 ` Chris Mason
0 siblings, 0 replies; 10+ messages in thread
From: Chris Mason @ 2008-12-23 20:49 UTC (permalink / raw)
To: Ryusuke Konishi
Cc: reinoud-S783fYmB3Ccdnm+yROfE0A, users-JrjvKiOkagjYtjvyW6yDsg
On Wed, 2008-12-24 at 02:23 +0900, Ryusuke Konishi wrote:
> > > Thanks, I've also started to read btrfs.
> > > I'll see it during the Christmas holidays ;)
> >
> > Whats your first impression? Want to take over things/ideas? Isn't that a
> > quite different FS structurally since it doesn't have segments? (AFAIK!)
>
> Yeah, btrfs has flexible and rich volume management layer, so very
> interesting. The snapshot code and c-tree also look like fun.
The ctree (checksummed btree is really what it stands for) code itself
is pretty basic. Aside from the checksumming, it is a simple btree with
generation numbers to find the nodes that have changed since a given
transaction.
In terms of interesting/new code that others might want look at, the
btrfs tree logging to do fsync efficiently (tree-log.c) and device
management (volumes.c) are probably the most interesting.
-chris
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Some nilfs comments
[not found] ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-12-22 13:01 ` Reinoud Zandijk
2008-12-23 1:47 ` Chris Mason
@ 2009-01-06 19:55 ` Chris Mason
[not found] ` <1231271709.4888.27.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
2 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2009-01-06 19:55 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg
On Mon, 2008-12-22 at 18:07 +0900, Ryusuke Konishi wrote:
> > I haven't dug too deeply in yet, but if there are parts you're most
> > interested in comments on, please let me know.
>
> Well, I feel that the following two matters are particularlly
> questionable and need to be checked:
>
> - ioctl:
> Ioctl interface (routines and structures) were implemented in an
> own way. These seems to be checked whether to comply with the rules
> of ioctl design.
>
It took me far too long to look at this, I'm sorry. But, looking
through the ioctl code I think you'll have a few (easy to fix) problems.
struct nilfs_argv {
void *v_base;
size_t v_nmembs; /* number of members */
size_t v_size; /* size of members */
int v_index;
int v_flags;
};
The structs that get passed from userland to kernel should use fixed
sized types.
I get around this by using u64s (and __u64s) everywhere. It looks like
you have compat ioctls to fix this too, I find it easier to use the
fixed types.
Otherwise the ioctl code looks pretty reasonable.
-chris
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Some nilfs comments
[not found] ` <1231271709.4888.27.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
@ 2009-01-07 5:20 ` Ryusuke Konishi
[not found] ` <20090107.142009.106620982.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2009-01-07 5:20 UTC (permalink / raw)
To: chris.mason-QHcLZuEGTsvQT0dZR+AlfA
Cc: users-JrjvKiOkagjYtjvyW6yDsg,
konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg
On Tue, 06 Jan 2009 14:55:09 -0500, Chris Mason wrote:
> On Mon, 2008-12-22 at 18:07 +0900, Ryusuke Konishi wrote:
> > > I haven't dug too deeply in yet, but if there are parts you're most
> > > interested in comments on, please let me know.
> >
> > Well, I feel that the following two matters are particularlly
> > questionable and need to be checked:
> >
>
> > - ioctl:
> > Ioctl interface (routines and structures) were implemented in an
> > own way. These seems to be checked whether to comply with the rules
> > of ioctl design.
> >
>
> It took me far too long to look at this, I'm sorry. But, looking
> through the ioctl code I think you'll have a few (easy to fix) problems.
>
> struct nilfs_argv {
> void *v_base;
> size_t v_nmembs; /* number of members */
> size_t v_size; /* size of members */
> int v_index;
> int v_flags;
> };
>
> The structs that get passed from userland to kernel should use fixed
> sized types.
>
> I get around this by using u64s (and __u64s) everywhere. It looks like
> you have compat ioctls to fix this too, I find it easier to use the
> fixed types.
>
> Otherwise the ioctl code looks pretty reasonable.
>
> -chris
Thank you for this comment and for taking time from your busy
schedule.
I think it actually should be fixed to comply with the kernel rules.
The same problem is found in a few other ioctl structures.
I'd like to modify them and remove compat ioctls some time soon.
Thanks,
Ryusuke Konishi
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Some nilfs comments
[not found] ` <20090107.142009.106620982.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-01-07 14:16 ` Chris Mason
0 siblings, 0 replies; 10+ messages in thread
From: Chris Mason @ 2009-01-07 14:16 UTC (permalink / raw)
To: Ryusuke Konishi
Cc: users-JrjvKiOkagjYtjvyW6yDsg,
konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg
On Wed, 2009-01-07 at 14:20 +0900, Ryusuke Konishi wrote:
> On Tue, 06 Jan 2009 14:55:09 -0500, Chris Mason wrote:
> > On Mon, 2008-12-22 at 18:07 +0900, Ryusuke Konishi wrote:
> > > > I haven't dug too deeply in yet, but if there are parts you're most
> > > > interested in comments on, please let me know.
> > >
> > > Well, I feel that the following two matters are particularlly
> > > questionable and need to be checked:
> > >
> >
> > > - ioctl:
> > > Ioctl interface (routines and structures) were implemented in an
> > > own way. These seems to be checked whether to comply with the rules
> > > of ioctl design.
> > >
> >
> > It took me far too long to look at this, I'm sorry. But, looking
> > through the ioctl code I think you'll have a few (easy to fix) problems.
> >
> > struct nilfs_argv {
> > void *v_base;
> > size_t v_nmembs; /* number of members */
> > size_t v_size; /* size of members */
> > int v_index;
> > int v_flags;
> > };
> >
> > The structs that get passed from userland to kernel should use fixed
> > sized types.
> >
> > I get around this by using u64s (and __u64s) everywhere. It looks like
> > you have compat ioctls to fix this too, I find it easier to use the
> > fixed types.
> >
> > Otherwise the ioctl code looks pretty reasonable.
> >
>
> I think it actually should be fixed to comply with the kernel rules.
> The same problem is found in a few other ioctl structures.
>
> I'd like to modify them and remove compat ioctls some time soon.
Strictly speaking, your code is correct. I suspect that if you ask 10
people you'll get 6 different answers on the best way. The u64 is my
personal preference ;)
-chris
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2009-01-07 14:16 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-18 2:52 Some nilfs comments Chris Mason
[not found] ` <1229568775.27170.134.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
2008-12-22 9:07 ` Ryusuke Konishi
[not found] ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-12-22 13:01 ` Reinoud Zandijk
[not found] ` <20081222130152.GA23725-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
2008-12-23 17:23 ` Ryusuke Konishi
[not found] ` <20081224.022337.30445207.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-12-23 20:49 ` Chris Mason
2008-12-23 1:47 ` Chris Mason
[not found] ` <1229996820.4812.7.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
2008-12-23 15:57 ` Ryusuke Konishi
2009-01-06 19:55 ` Chris Mason
[not found] ` <1231271709.4888.27.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
2009-01-07 5:20 ` Ryusuke Konishi
[not found] ` <20090107.142009.106620982.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-01-07 14:16 ` Chris Mason
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox