From: "Jörn Engel" <joern@logfs.org>
To: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] nilfs2: continuous snapshotting file system
Date: Wed, 27 Aug 2008 20:13:38 +0200
Message-ID: <20080827181338.GC1371@logfs.org>
In-Reply-To: <200808261654.AA00216@capsicum.lab.ntt.co.jp>
On Wed, 27 August 2008 01:54:30 +0900, Ryusuke Konishi wrote:
>
> Yeah, it was very tough battle :)
> Read is OK. But write was hard. I looked at the vfs code over again and
> again.
> We've implemented NILFS without bringing specific changes into vfs.
> However, if we can find common basis for LFSes, I'm glad to cooperate
> with you.
> Though I don't know whether exporting inode_lock is the case or not ;)
Well, I was looking more for something like a list of problems and
solutions. Partially because I am plain curious and partially because I
know those are the problem areas of any log-structured filesystem and
they deserve special attention in a review.
In logfs, garbage collection may read (and write) any inode and any
block from any file. And since garbage collection may be called from
writepage() and write_inode(), the fun included:
P: iget() on the inode being currently written back and locked.
S: Split I_LOCK into I_LOCK and I_SYNC. Has been merged upstream.
P: iget() on an inode in I_FREEING or I_WILL_FREE state.
S: Add inodes to a list in drop_inode() and remove them again in
destroy_inode(). iget() in GC context is wrapped in a method that
checks said list first and returns an inode from the list when
applicable. Used to hold inode_lock to prevent races, but a
logfs-local lock is actually sufficient.
If either of the two problems above is solved by calling
ilookup5_nowait() I bet you a fiver that a race with data corruption is
lurking somewhere in the area.
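To make the second solution concrete, here is a minimal userspace sketch of the "check the list of dying inodes first" idea. It is not logfs code: struct model_inode, model_drop_inode(), model_destroy_inode() and model_gc_iget() are hypothetical names, a pthread mutex stands in for the filesystem-local lock, and reference counting is left out entirely.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct inode; not the kernel structure. */
struct model_inode {
	unsigned long ino;
	struct model_inode *next_dying;
};

/* Fallback lookup used when the inode is not on the dying list. */
typedef struct model_inode *(*lookup_fn)(unsigned long ino);

static struct model_inode *dying_head;
/* Filesystem-local lock; plays the role that inode_lock used to play. */
static pthread_mutex_t dying_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models the drop_inode() hook: remember the inode while it is dying. */
void model_drop_inode(struct model_inode *inode)
{
	assert(inode != NULL);
	pthread_mutex_lock(&dying_lock);
	inode->next_dying = dying_head;
	dying_head = inode;
	pthread_mutex_unlock(&dying_lock);
}

/* Models the destroy_inode() hook: forget the inode again. */
void model_destroy_inode(struct model_inode *inode)
{
	struct model_inode **pp;

	assert(inode != NULL);
	pthread_mutex_lock(&dying_lock);
	for (pp = &dying_head; *pp != NULL; pp = &(*pp)->next_dying) {
		if (*pp == inode) {
			*pp = inode->next_dying;
			inode->next_dying = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&dying_lock);
}

/* GC-side wrapper: consult the dying list before the regular lookup. */
struct model_inode *model_gc_iget(unsigned long ino, lookup_fn regular_lookup)
{
	struct model_inode *it;

	pthread_mutex_lock(&dying_lock);
	for (it = dying_head; it != NULL; it = it->next_dying)
		if (it->ino == ino)
			break;
	pthread_mutex_unlock(&dying_lock);

	if (it != NULL)
		return it;
	return regular_lookup ? regular_lookup(ino) : NULL;
}
```

The sketch returns the pointer after dropping the lock, so it says nothing about how long the inode stays valid. A real implementation has to guarantee that the inode cannot be destroyed while GC is still using it.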
P: find_get_page() or some variant on a page handed to
logfs_writepage().
S: Use the one available page flag, PG_owner_priv_1 to mark pages that
are waiting for the single-threaded logfs write path. If any page GC
needs is locked, check for PG_owner_priv_1 and if it is set, just use
the page anyway. Whoever has set the flag cannot clear it until GC
has finished.
If the flag is not set, the page might still be somewhere in the
logfs write path - before setting the flag. So simply do the check
in a loop, call schedule() each time, knock on wood and keep your
fingers crossed that the page will either become unlocked or get
PG_owner_priv_1 set sometime soon. I'm not proud of this solution but
know no better one.
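The retry loop described above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: struct model_page, the MODEL_PG_* bits and model_gc_get_page() are hypothetical, C11 atomics replace the kernel page flag helpers, sched_yield() replaces schedule(), and the max_tries bound is an addition of this sketch so that the loop can terminate in a test. The scheme in the mail retries until one of the two conditions holds.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical flag bits modelling PG_locked and PG_owner_priv_1. */
#define MODEL_PG_LOCKED     (1u << 0)
#define MODEL_PG_OWNER_PRIV (1u << 1)

enum gc_page_result {
	GC_PAGE_LOCKED_BY_GC,	/* page was unlocked, GC took the lock */
	GC_PAGE_BORROWED,	/* page is locked by a writer waiting on GC */
	GC_PAGE_GAVE_UP		/* bound reached; sketch-only outcome */
};

struct model_page {
	atomic_uint flags;
};

enum gc_page_result model_gc_get_page(struct model_page *page,
				      unsigned max_tries)
{
	unsigned i;

	assert(page != NULL);
	for (i = 0; i < max_tries; i++) {
		unsigned flags = atomic_load(&page->flags);

		if (!(flags & MODEL_PG_LOCKED)) {
			/* Unlocked: try to take the lock ourselves. */
			if (atomic_compare_exchange_strong(&page->flags, &flags,
					flags | MODEL_PG_LOCKED))
				return GC_PAGE_LOCKED_BY_GC;
			continue;	/* lost a race, look again */
		}
		if (flags & MODEL_PG_OWNER_PRIV)
			/* Owner waits for the write path; use page anyway. */
			return GC_PAGE_BORROWED;

		/* Locked but not yet marked: let the other side run. */
		sched_yield();
	}
	return GC_PAGE_GAVE_UP;
}
```

The borrowed case is only safe because of the rule stated above: whoever set the flag cannot clear it until GC has finished.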
So something like the above for nilfs would be useful. And maybe, just
to be on the safe side, try the following testcase overnight:
- Create tiny filesystem (32M or so).
- Fill filesystem 100% with a single file.
- Rewrite random parts of the file in an endless loop.
Or even better, combine this testcase with some automated system crashes
and do an fsck every time the system comes back up. ;)
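For what it is worth, the fill-and-rewrite part of that testcase could look roughly like the sketch below. The function names fill_file() and rewrite_random_blocks(), the 4096-byte block size and the fixed number of rounds are assumptions of this sketch. Creating and mounting the tiny filesystem, the endless outer loop, the crashes and the fsck runs are left to the caller.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096	/* assumed block size, adjust as needed */

/*
 * Fill one file until the filesystem reports ENOSPC or max_bytes have
 * been written. Returns the number of bytes written, or -1 on error.
 */
long long fill_file(const char *path, long long max_bytes)
{
	char buf[BLOCK_SIZE];
	long long total = 0;
	int fd;

	assert(path != NULL);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	memset(buf, 0xAA, sizeof(buf));
	while (total + BLOCK_SIZE <= max_bytes) {
		ssize_t n = write(fd, buf, sizeof(buf));

		if (n < 0) {
			if (errno == ENOSPC)
				break;	/* filesystem is full: done */
			close(fd);
			return -1;
		}
		if (n == 0)
			break;
		total += n;
	}
	close(fd);
	return total;
}

/*
 * Overwrite `rounds` randomly chosen blocks of an existing file.
 * Returns 0 on success, -1 if a write or the final fsync fails.
 */
int rewrite_random_blocks(const char *path, long long file_bytes,
			  long rounds, unsigned seed)
{
	char buf[BLOCK_SIZE];
	long long nblocks = file_bytes / BLOCK_SIZE;
	long i;
	int fd, ret = 0;

	assert(path != NULL);
	if (nblocks <= 0)
		return -1;
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	srand(seed);
	for (i = 0; i < rounds; i++) {
		long long blk = rand() % nblocks;

		memset(buf, (int)(i & 0xff), sizeof(buf));
		if (pwrite(fd, buf, sizeof(buf), (off_t)(blk * BLOCK_SIZE)) < 0) {
			ret = -1;
			break;
		}
	}
	if (ret == 0 && fsync(fd) < 0)
		ret = -1;
	close(fd);
	return ret;
}
```

A driver would call fill_file() once on the freshly created filesystem and then call rewrite_random_blocks() in a loop. Since overwrites in place should not need additional space, any failure from the rewrite step on a full log-structured filesystem is worth investigating.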
Jörn
--
Geld macht nicht glücklich.
Glück macht nicht satt.