From: "Jörn Engel" <joern@logfs.org>
To: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, kihara.seiji@lab.ntt.co.jp,
	amagai.yoshiji@lab.ntt.co.jp
Subject: Re: [PATCH 25/27] nilfs2: block cache for garbage collection
Date: Thu, 18 Sep 2008 00:49:53 +0200
Message-ID: <20080917224953.GB14644@logfs.org>
In-Reply-To: <20080918.040945.32654226.ryusuke@osrg.net>

On Thu, 18 September 2008 04:09:45 +0900, Ryusuke Konishi wrote:
> 
> > Using dummy inodes is... unusual.  Why can you not use the actual inodes
> > those blocks belong to?
> 
> Because we have to treat blocks that belong to the same file but have
> different checkpoint numbers.  (NILFS2 keeps multiple
> checkpoints/snapshots across GC)
> 
> Of course, if the standard inode hash is applicable, I prefer it.
> ilookup5 or its variant may be applicable for this.

If that is possible I would definitely prefer it.
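
To make the idea concrete, here is a rough userspace sketch of a lookup
keyed on (inode number, checkpoint number), written in the style of the
test callback that ilookup5() takes.  Everything in it is hypothetical:
gc_inode, gc_key, gc_test, gc_insert and gc_lookup are made-up names that
only model the pattern, not NILFS2 or VFS code.  It assumes that hashing
on the inode number alone is acceptable, so that all versions of a file
land on one chain and the callback tells them apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: gc_inode stands in for a kernel inode. */
struct gc_inode {
	unsigned long ino;	/* inode number */
	uint64_t cno;		/* checkpoint number this version belongs to */
	struct gc_inode *next;	/* hash chain */
};

/* Lookup key: the same file can appear once per checkpoint. */
struct gc_key {
	unsigned long ino;
	uint64_t cno;
};

#define GC_HASH_BUCKETS 64

struct gc_inode_table {
	struct gc_inode *bucket[GC_HASH_BUCKETS];
};

/* Hash on ino only, so every version of a file shares one chain. */
static size_t gc_hash(unsigned long hashval)
{
	return hashval % GC_HASH_BUCKETS;
}

/* Test callback: match on both the inode number and the checkpoint. */
static int gc_test(const struct gc_inode *inode, const void *data)
{
	const struct gc_key *key = data;

	return inode->ino == key->ino && inode->cno == key->cno;
}

static void gc_insert(struct gc_inode_table *t, struct gc_inode *inode)
{
	size_t h = gc_hash(inode->ino);

	inode->next = t->bucket[h];
	t->bucket[h] = inode;
}

/* Walk one chain and return the first entry the callback accepts. */
static struct gc_inode *gc_lookup(const struct gc_inode_table *t,
				  unsigned long hashval,
				  int (*test)(const struct gc_inode *,
					      const void *),
				  const void *data)
{
	struct gc_inode *p;

	assert(test != NULL);
	for (p = t->bucket[gc_hash(hashval)]; p; p = p->next)
		if (test(p, data))
			return p;
	return NULL;
}
```

With two entries for inode 12 at checkpoints 3 and 7, a lookup with key
(12, 7) returns the second entry and a lookup with key (12, 9) returns
NULL.  The point is that there is one table and one object per version,
so there is no second copy to keep coherent.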

> If so, the remaining problem would be the lock dependencies as you
> mentioned before.

You should have the same problem already - in some shape or another.  If
you can have two data structures for the same content, a real inode and
a dummy inode, you have a race condition.  Quite possibly one involving
data corruption.
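
A deliberately simplified, single-threaded toy model of that hazard
follows.  It is not NILFS2 code and does not model a log-structured
layout; block_model and its helpers are hypothetical names, and the
"disk" is just one buffer.  It only shows what can go wrong once two
independent in-memory copies of one block exist and nothing orders them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Toy model: one on-disk block and two independent in-memory copies. */
struct block_model {
	char disk[BLOCK_SIZE];		/* what is persisted */
	char real_cache[BLOCK_SIZE];	/* copy reached via the real inode */
	int real_dirty;
	char dummy_cache[BLOCK_SIZE];	/* copy reached via the dummy inode */
};

static void model_init(struct block_model *m, const char *contents)
{
	assert(strlen(contents) < BLOCK_SIZE);
	memset(m, 0, sizeof(*m));
	snprintf(m->disk, sizeof(m->disk), "%s", contents);
	memcpy(m->real_cache, m->disk, BLOCK_SIZE);
}

/* GC reads the block into its private copy. */
static void gc_read(struct block_model *m)
{
	memcpy(m->dummy_cache, m->disk, BLOCK_SIZE);
}

/* A regular write only touches the real inode's copy. */
static void file_write(struct block_model *m, const char *contents)
{
	assert(strlen(contents) < BLOCK_SIZE);
	memset(m->real_cache, 0, BLOCK_SIZE);
	snprintf(m->real_cache, BLOCK_SIZE, "%s", contents);
	m->real_dirty = 1;
}

/* Writeback persists the real inode's copy if it is dirty. */
static void writeback(struct block_model *m)
{
	if (m->real_dirty) {
		memcpy(m->disk, m->real_cache, BLOCK_SIZE);
		m->real_dirty = 0;
	}
}

/* GC "moves" the block by writing out whatever its own copy holds. */
static void gc_move(struct block_model *m)
{
	memcpy(m->disk, m->dummy_cache, BLOCK_SIZE);
}
```

Running gc_read(), file_write("new"), writeback(), gc_move() in that
order on a block that started as "old" leaves "old" on the model's disk
while the real inode's cache still says "new".  Real code has more
moving parts, but some lock or ordering rule has to rule out the
equivalent interleaving.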

Well, one way to avoid both the race and the locking complexity is to
stop all writes during GC and destroy all dummy inodes before writes
resume.  But that would be inefficient in at least two ways.  When
GC'ing data that is dirty in the caches, you move the old, stale data
during GC and then write the new data soon after.  And you always have
to flush the caches after GC, even if the machine has no better use for
the memory.

So unless I missed something important, I believe the locking is well
worth the effort.

BTW: Some of the explanation you just gave me would do well as
documentation in the source file.  That's the sort of background
information new developers can spend months of mistakes and reverse
engineering on. :)

Jörn

-- 
Those who come seeking peace without a treaty are plotting.
-- Sun Tzu

