Linux NILFS development
 help / color / mirror / Atom feed
* Some nilfs comments
@ 2008-12-18  2:52 Chris Mason
       [not found] ` <1229568775.27170.134.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2008-12-18  2:52 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg

Hello,

I've been meaning to read through nilfs for a while, and I grabbed the
code today to take a look.  The code is very clean and it ran well out
of the box here.

One problem I hit early on was that nilfs doesn't seem to zero out the
block device during mkfs.  So after mkfs.nilfs2, mount still found my
old btrfs filesystem.  It would be a good idea to zero out the first and
last 1MB on the device (except for the first sector which might have the
partition table).

I haven't dug too deeply in yet, but if there are parts you're most
interested in comments on, please let me know.

It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by
do_sync_mapping_range().

nilfs_page_mkwrite doesn't seem to dirty the page?  block_page_mkwrite
does more, including checks against i_size and others.

readpages and O_DIRECT both set b_size on the map_bh to reflect the
total size of the region they are trying to map.  It seems like an easy
optimization to map more in nilfs_get_block.

Hopefully this helps, I'll try to send a few more ideas after I get to
know the code better.

-chris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Some nilfs comments
       [not found] ` <1229568775.27170.134.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
@ 2008-12-22  9:07   ` Ryusuke Konishi
       [not found]     ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2008-12-22  9:07 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, chris.mason-QHcLZuEGTsvQT0dZR+AlfA

Hi Chris,

On Wed, 17 Dec 2008 21:52:55 -0500, Chris Mason wrote:
> Hello,
> 
> I've been meaning to read through nilfs for a while, and I grabbed the
> code today to take a look.  The code is very clean and it ran well out
> of the box here.

Thank you very much for helpful comments!

> One problem I hit early on was that nilfs doesn't seem to zero out the
> block device during mkfs.  So after mkfs.nilfs2, mount still found my
> old btrfs filesystem.  It would be a good idea to zero out the first and
> last 1MB on the device (except for the first sector which might have the
> partition table).

Okay, I'll take this idea in mkfs.nilfs2.

> I haven't dug too deeply in yet, but if there are parts you're most
> interested in comments on, please let me know.

Well, I feel that the following two matters are particularlly
questionable and need to be checked:

- struct the_nilfs:
  NILFS allows users to mount snapshots without making additional
  devices or volumes.  This is achieved by sharing a block device
  among multiple mount instances (i.e. super_block structs).
  the_nilfs struct is used for this sharing.

  This approach seems to be peculiar to nilfs, and I feel it needs
  attention.

- ioctl:
  Ioctl interface (routines and structures) were implemented in an
  own way.  These seems to be checked whether to comply with the rules
  of ioctl design.


> It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by
> do_sync_mapping_range().

Thanks!  I didn't notice that this function was added.
Uum, it seems to require reconsidering the way to initiate writing of
data pages.

> nilfs_page_mkwrite doesn't seem to dirty the page?  block_page_mkwrite
> does more, including checks against i_size and others.

I got it.  The design of this function seems so old.  It should be
revised to do checks and block allocations previously at this
function.
 
> readpages and O_DIRECT both set b_size on the map_bh to reflect the
> total size of the region they are trying to map.  It seems like an easy
> optimization to map more in nilfs_get_block.

Yes, that's so true!

Actually, I have unfinished patches to do this.  This would decrease
the numbers of get_block callbacks and b-tree lookups.  And, as you
pointed out, this seems one of the easiest optimization in some todos
on performance.
 
> Hopefully this helps, I'll try to send a few more ideas after I get to
> know the code better.
> 
> -chris

Thanks, I've also started to read btrfs.
I'll see it during the Christmas holidays  ;)

With regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Some nilfs comments
       [not found]     ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-12-22 13:01       ` Reinoud Zandijk
       [not found]         ` <20081222130152.GA23725-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
  2008-12-23  1:47       ` Chris Mason
  2009-01-06 19:55       ` Chris Mason
  2 siblings, 1 reply; 10+ messages in thread
From: Reinoud Zandijk @ 2008-12-22 13:01 UTC (permalink / raw)
  To: NILFS Users mailing list; +Cc: chris.mason-QHcLZuEGTsvQT0dZR+AlfA

Hi Ryusuke, Chris,

On Mon, Dec 22, 2008 at 06:07:19PM +0900, Ryusuke Konishi wrote:
> Well, I feel that the following two matters are particularlly
> questionable and need to be checked:
> 
> - struct the_nilfs:
>   NILFS allows users to mount snapshots without making additional
>   devices or volumes.  This is achieved by sharing a block device
>   among multiple mount instances (i.e. super_block structs).
>   the_nilfs struct is used for this sharing.
> 
>   This approach seems to be peculiar to nilfs, and I feel it needs
>   attention.

I've dug into this too and decided to use the same mechanism for my
implementation. I think its nice to have all the fs's administration centered
at one place; i even have all the btree stuff there. One thing indeed is that
its not possible to do say `unmount /dev/wd0a' if there are multiple mounts on
it... but maybe thats better even ;) or its even a positive thing to just
unmount all the mounts in one go.

> - ioctl:
>   Ioctl interface (routines and structures) were implemented in an
>   own way.  These seems to be checked whether to comply with the rules
>   of ioctl design.

I can't comment on that yet since i haven't dug into that...

> Thanks, I've also started to read btrfs.
> I'll see it during the Christmas holidays  ;)

Whats your first impression? Want to take over things/ideas? Isn't that a
quite different FS structurally since it doesn't have segments? (AFAIK!)

With regards,
Reinoud Zandijk

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Some nilfs comments
       [not found]     ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org>
  2008-12-22 13:01       ` Reinoud Zandijk
@ 2008-12-23  1:47       ` Chris Mason
       [not found]         ` <1229996820.4812.7.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
  2009-01-06 19:55       ` Chris Mason
  2 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2008-12-23  1:47 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

On Mon, 2008-12-22 at 18:07 +0900, Ryusuke Konishi wrote:
> Hi Chris,
> 
> On Wed, 17 Dec 2008 21:52:55 -0500, Chris Mason wrote:
> > Hello,
> > 
> > I've been meaning to read through nilfs for a while, and I grabbed the
> > code today to take a look.  The code is very clean and it ran well out
> > of the box here.
> 
> Thank you very much for helpful comments!
> 

;) I'll be slow to answer since I'm traveling a bit.

> > One problem I hit early on was that nilfs doesn't seem to zero out the
> > block device during mkfs.  So after mkfs.nilfs2, mount still found my
> > old btrfs filesystem.  It would be a good idea to zero out the first and
> > last 1MB on the device (except for the first sector which might have the
> > partition table).
> 
> Okay, I'll take this idea in mkfs.nilfs2.
> 
> > I haven't dug too deeply in yet, but if there are parts you're most
> > interested in comments on, please let me know.
> 
> Well, I feel that the following two matters are particularlly
> questionable and need to be checked:
> 
> - struct the_nilfs:
>   NILFS allows users to mount snapshots without making additional
>   devices or volumes.  This is achieved by sharing a block device
>   among multiple mount instances (i.e. super_block structs).
>   the_nilfs struct is used for this sharing.
> 
>   This approach seems to be peculiar to nilfs, and I feel it needs
>   attention.

Btrfs also shares super blocks between snapshot mounts, and each
snapshot becomes a private inode number space.

So, it sounds like we have some things in common there.  I'm looking at
using more explicit mounts than I am now (with some hints from
Christoph). I'll take a look at the nilfs code in this area and see how
similar we really are.

> 
> - ioctl:
>   Ioctl interface (routines and structures) were implemented in an
>   own way.  These seems to be checked whether to comply with the rules
>   of ioctl design.

Ok, I'll take a look here as well.  The same goes for btrfs ;)
> 
> 
> > It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by
> > do_sync_mapping_range().
> 
> Thanks!  I didn't notice that this function was added.
> Uum, it seems to require reconsidering the way to initiate writing of
> data pages.

Nick Piggin has been making noise to change do_sync_mapping_range to
pass WB_SYNC_ALL instead.  You can trick it a bit while things get
worked out and use current_is_pdflush() to only treat WB_SYNC_NONE as
WB_SYNC_NONE when pdflush is the one calling (sorry, its an ugly
suggestion).

-chris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Some nilfs comments
       [not found]         ` <1229996820.4812.7.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
@ 2008-12-23 15:57           ` Ryusuke Konishi
  0 siblings, 0 replies; 10+ messages in thread
From: Ryusuke Konishi @ 2008-12-23 15:57 UTC (permalink / raw)
  To: chris.mason-QHcLZuEGTsvQT0dZR+AlfA
  Cc: users-JrjvKiOkagjYtjvyW6yDsg,
	konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg

Hi,

On Mon, 22 Dec 2008 20:47:00 -0500, Chris Mason wrote:
> > - struct the_nilfs:
> >   NILFS allows users to mount snapshots without making additional
> >   devices or volumes.  This is achieved by sharing a block device
> >   among multiple mount instances (i.e. super_block structs).
> >   the_nilfs struct is used for this sharing.
> > 
> >   This approach seems to be peculiar to nilfs, and I feel it needs
> >   attention.
> 
> Btrfs also shares super blocks between snapshot mounts, and each
> snapshot becomes a private inode number space.
> 
> So, it sounds like we have some things in common there.  I'm looking at
> using more explicit mounts than I am now (with some hints from
> Christoph). I'll take a look at the nilfs code in this area and see how
> similar we really are.

Yeah, it sounds nice.
So, I may not have to care this so much.
Comprehending common points between the two may be indeed meaningful
when we rethink mount/umount vfs interface.

> > 
> > - ioctl:
> >   Ioctl interface (routines and structures) were implemented in an
> >   own way.  These seems to be checked whether to comply with the rules
> >   of ioctl design.
> 
> Ok, I'll take a look here as well.  The same goes for btrfs ;)

All right ;)

> > > It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by
> > > do_sync_mapping_range().
> > 
> > Thanks!  I didn't notice that this function was added.
> > Uum, it seems to require reconsidering the way to initiate writing of
> > data pages.
> 
> Nick Piggin has been making noise to change do_sync_mapping_range to
> pass WB_SYNC_ALL instead.  You can trick it a bit while things get
> worked out and use current_is_pdflush() to only treat WB_SYNC_NONE as
> WB_SYNC_NONE when pdflush is the one calling (sorry, its an ugly
> suggestion).

Thanks. I've found the Nick's patch in mmotm.

In either case, nilfs_writepages (or nilfs_writepage) needs some more
changes to ensure data integrity writeout against
sys_sync_file_range() because writeback flag on data pages is not set
in nilfs_writepages() and wait_on_page_writeback_range() will ignore
the pages.

For other write operations, the write sync is ensured in a file level
operation like the journal data mode of ext3/ext4.
But I think nilfs_writepages() should be revised to initiate writeback
in the function if possible.

Regards,
Ryusuke

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Some nilfs comments
       [not found]         ` <20081222130152.GA23725-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2008-12-23 17:23           ` Ryusuke Konishi
       [not found]             ` <20081224.022337.30445207.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2008-12-23 17:23 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, reinoud-S783fYmB3Ccdnm+yROfE0A
  Cc: chris.mason-QHcLZuEGTsvQT0dZR+AlfA

Hi Reinoud,
On Mon, 22 Dec 2008 14:01:52 +0100, Reinoud Zandijk <reinoud-S783fYmB3Ccdnm+yROfE0A@public.gmane.org> wrote:
> On Mon, Dec 22, 2008 at 06:07:19PM +0900, Ryusuke Konishi wrote:
> > Well, I feel that the following two matters are particularlly
> > questionable and need to be checked:
> > 
> > - struct the_nilfs:
> >   NILFS allows users to mount snapshots without making additional
> >   devices or volumes.  This is achieved by sharing a block device
> >   among multiple mount instances (i.e. super_block structs).
> >   the_nilfs struct is used for this sharing.
> > 
> >   This approach seems to be peculiar to nilfs, and I feel it needs
> >   attention.
> 
> I've dug into this too and decided to use the same mechanism for my
> implementation. I think its nice to have all the fs's administration centered
> at one place; i even have all the btree stuff there. One thing indeed is that
> its not possible to do say `unmount /dev/wd0a' if there are multiple mounts on
> it... but maybe thats better even ;) or its even a positive thing to just
> unmount all the mounts in one go.

I don't know the mount/umount interface of NetBSD, but perhaps unmount
system call takes a directory argument instead of a device.
If so, your kernel code would become similar to our nilfs.

Though the linux umount command can take a device argument, only the
latest mount is detached ;)

> > Thanks, I've also started to read btrfs.
> > I'll see it during the Christmas holidays  ;)
> 
> Whats your first impression? Want to take over things/ideas? Isn't that a
> quite different FS structurally since it doesn't have segments? (AFAIK!)

Yeah, btrfs has flexible and rich volume management layer, so very
interesting.  The snapshot code and c-tree also look like fun.

Regards,
Ryusuke

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Some nilfs comments
       [not found]             ` <20081224.022337.30445207.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-12-23 20:49               ` Chris Mason
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Mason @ 2008-12-23 20:49 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: reinoud-S783fYmB3Ccdnm+yROfE0A, users-JrjvKiOkagjYtjvyW6yDsg

On Wed, 2008-12-24 at 02:23 +0900, Ryusuke Konishi wrote:

> > > Thanks, I've also started to read btrfs.
> > > I'll see it during the Christmas holidays  ;)
> > 
> > Whats your first impression? Want to take over things/ideas? Isn't that a
> > quite different FS structurally since it doesn't have segments? (AFAIK!)
> 
> Yeah, btrfs has flexible and rich volume management layer, so very
> interesting.  The snapshot code and c-tree also look like fun.

The ctree (checksummed btree is really what it stands for) code itself
is pretty basic.  Aside from the checksumming, it is a simple btree with
generation numbers to find the nodes that have changed since a given
transaction.

In terms of interesting/new code that others might want look at, the
btrfs tree logging to do fsync efficiently (tree-log.c) and device
management (volumes.c) are probably the most interesting.

-chris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Some nilfs comments
       [not found]     ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org>
  2008-12-22 13:01       ` Reinoud Zandijk
  2008-12-23  1:47       ` Chris Mason
@ 2009-01-06 19:55       ` Chris Mason
       [not found]         ` <1231271709.4888.27.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
  2 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2009-01-06 19:55 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

On Mon, 2008-12-22 at 18:07 +0900, Ryusuke Konishi wrote:
> > I haven't dug too deeply in yet, but if there are parts you're most
> > interested in comments on, please let me know.
> 
> Well, I feel that the following two matters are particularlly
> questionable and need to be checked:
> 

> - ioctl:
>   Ioctl interface (routines and structures) were implemented in an
>   own way.  These seems to be checked whether to comply with the rules
>   of ioctl design.
> 

It took me far too long to look at this, I'm sorry.  But, looking
through the ioctl code I think you'll have a few (easy to fix) problems.

struct nilfs_argv {
        void *v_base;
        size_t v_nmembs;        /* number of members */
        size_t v_size;          /* size of members */
        int v_index;
        int v_flags;
};

The structs that get passed from userland to kernel should use fixed
sized types.

I get around this by using u64s (and __u64s) everywhere.  It looks like
you have compat ioctls to fix this too, I find it easier to use the
fixed types.

Otherwise the ioctl code looks pretty reasonable.

-chris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Some nilfs comments
       [not found]         ` <1231271709.4888.27.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
@ 2009-01-07  5:20           ` Ryusuke Konishi
       [not found]             ` <20090107.142009.106620982.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2009-01-07  5:20 UTC (permalink / raw)
  To: chris.mason-QHcLZuEGTsvQT0dZR+AlfA
  Cc: users-JrjvKiOkagjYtjvyW6yDsg,
	konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg

On Tue, 06 Jan 2009 14:55:09 -0500, Chris Mason wrote:
> On Mon, 2008-12-22 at 18:07 +0900, Ryusuke Konishi wrote:
> > > I haven't dug too deeply in yet, but if there are parts you're most
> > > interested in comments on, please let me know.
> > 
> > Well, I feel that the following two matters are particularlly
> > questionable and need to be checked:
> > 
> 
> > - ioctl:
> >   Ioctl interface (routines and structures) were implemented in an
> >   own way.  These seems to be checked whether to comply with the rules
> >   of ioctl design.
> > 
> 
> It took me far too long to look at this, I'm sorry.  But, looking
> through the ioctl code I think you'll have a few (easy to fix) problems.
> 
> struct nilfs_argv {
>         void *v_base;
>         size_t v_nmembs;        /* number of members */
>         size_t v_size;          /* size of members */
>         int v_index;
>         int v_flags;
> };
> 
> The structs that get passed from userland to kernel should use fixed
> sized types.
> 
> I get around this by using u64s (and __u64s) everywhere.  It looks like
> you have compat ioctls to fix this too, I find it easier to use the
> fixed types.
> 
> Otherwise the ioctl code looks pretty reasonable.
> 
> -chris

Thank you for this comment and for taking time from your busy
schedule.

I think it actually should be fixed to comply with the kernel rules.
The same problem is found in a few other ioctl structures.

I'd like to modify them and remove compat ioctls some time soon.


Thanks,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Some nilfs comments
       [not found]             ` <20090107.142009.106620982.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-01-07 14:16               ` Chris Mason
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Mason @ 2009-01-07 14:16 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: users-JrjvKiOkagjYtjvyW6yDsg,
	konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg

On Wed, 2009-01-07 at 14:20 +0900, Ryusuke Konishi wrote:
> On Tue, 06 Jan 2009 14:55:09 -0500, Chris Mason wrote:
> > On Mon, 2008-12-22 at 18:07 +0900, Ryusuke Konishi wrote:
> > > > I haven't dug too deeply in yet, but if there are parts you're most
> > > > interested in comments on, please let me know.
> > > 
> > > Well, I feel that the following two matters are particularlly
> > > questionable and need to be checked:
> > > 
> > 
> > > - ioctl:
> > >   Ioctl interface (routines and structures) were implemented in an
> > >   own way.  These seems to be checked whether to comply with the rules
> > >   of ioctl design.
> > > 
> > 
> > It took me far too long to look at this, I'm sorry.  But, looking
> > through the ioctl code I think you'll have a few (easy to fix) problems.
> > 
> > struct nilfs_argv {
> >         void *v_base;
> >         size_t v_nmembs;        /* number of members */
> >         size_t v_size;          /* size of members */
> >         int v_index;
> >         int v_flags;
> > };
> > 
> > The structs that get passed from userland to kernel should use fixed
> > sized types.
> > 
> > I get around this by using u64s (and __u64s) everywhere.  It looks like
> > you have compat ioctls to fix this too, I find it easier to use the
> > fixed types.
> > 
> > Otherwise the ioctl code looks pretty reasonable.
> > 
>
> I think it actually should be fixed to comply with the kernel rules.
> The same problem is found in a few other ioctl structures.
> 
> I'd like to modify them and remove compat ioctls some time soon.

Strictly speaking, your code is correct.  I suspect that if you ask 10
people you'll get 6 different answers on the best way.  The u64 is my
personal preference ;)

-chris

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2009-01-07 14:16 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-18  2:52 Some nilfs comments Chris Mason
     [not found] ` <1229568775.27170.134.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
2008-12-22  9:07   ` Ryusuke Konishi
     [not found]     ` <20081222.180719.88488712.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-12-22 13:01       ` Reinoud Zandijk
     [not found]         ` <20081222130152.GA23725-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
2008-12-23 17:23           ` Ryusuke Konishi
     [not found]             ` <20081224.022337.30445207.ryusuke-sG5X7nlA6pw@public.gmane.org>
2008-12-23 20:49               ` Chris Mason
2008-12-23  1:47       ` Chris Mason
     [not found]         ` <1229996820.4812.7.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
2008-12-23 15:57           ` Ryusuke Konishi
2009-01-06 19:55       ` Chris Mason
     [not found]         ` <1231271709.4888.27.camel-cGoWVVl3WGUrkklhUoBCrlaTQe2KTcn/@public.gmane.org>
2009-01-07  5:20           ` Ryusuke Konishi
     [not found]             ` <20090107.142009.106620982.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-01-07 14:16               ` Chris Mason

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox