From: Russell Cattelan <cattelan@thebarn.com>
To: wcheng@redhat.com
Cc: Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] prune_icache_sb
Date: Mon, 04 Dec 2006 10:51:19 -0600
Message-ID: <45745207.4030107@thebarn.com>
In-Reply-To: <45730E36.10309@redhat.com>
Wendy Cheng wrote:
> Andrew Morton wrote:
>
>> On Thu, 30 Nov 2006 11:05:32 -0500
>> Wendy Cheng <wcheng@redhat.com> wrote:
>>
>>> The idea is that, instead of unconditionally dropping every buffer
>>> associated with a particular mount point (which defeats the purpose
>>> of page caching), the base kernel exports a "drop_pagecache_sb()"
>>> call that allows the page cache to be trimmed. More importantly, it
>>> is changed to offer the choice of purging not just any buffer but
>>> only the ones that seem to be unused (i_state is zero and i_count is
>>> zero). This will encourage filesystem(s) to proactively respond to
>>> VM memory shortage if they choose to.
>>>
>>
>>
>> argh.
>>
>>
> I read this as: it is OK to give system admin(s) commands (which is
> what this "drop_pagecache_sb()" call is all about) to drop the page
> cache, but it is not OK to give filesystem developer(s) this very
> same function to trim their own page cache if the filesystems choose
> to do so?
>
>> In Linux a filesystem is a dumb layer which sits between the VFS and the
>> I/O layer and provides dumb services such as reading/writing inodes,
>> reading/writing directory entries, mapping pagecache offsets to disk
>> blocks, etc. (This model is to varying degrees incorrect for every
>> post-ext2 filesystem, but that's the way it is).
>>
>>
> The Linux kernel, particularly the VFS layer, is starting to show
> signs of inadequacy as the software components built upon it keep
> growing. I have doubts that it can keep up and handle this complexity
> with a development policy like the one you just described (the
> filesystem is a dumb layer?). Aren't these DIO_xxx_LOCKING flags
> inside __blockdev_direct_IO() a perfect example of why trying to do
> too many things inside the VFS layer for so many filesystems is a bad
> idea? By the way, since we're on this subject, could we discuss the
> VFS rename call a little (or I can start a new discussion thread)?
>
> Note that Linux's do_rename() starts with the usual lookup logic,
> followed by lock_rename(), then a final round of dentry lookup, and
> finally calls the filesystem's i_op->rename. Since lock_rename()
> only takes VFS-layer locks that are local to this particular
> machine, for a cluster filesystem there is a huge window between
> the final lookup and the filesystem's i_op->rename call in which the
> file could be deleted from another node before the fs can do anything
> about it. Would it be possible to get a new function pointer
> (lock_rename) in the inode_operations structure so that a cluster
> filesystem can do proper locking?
It looks like the ocfs2 guys have a similar problem:
http://ftp.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/ocfs2_git_patches/ocfs2-upstream-linus-20060924/0009-PATCH-Allow-file-systems-to-manually-d_move-inside-of-rename.txt
Would this change help fix the gfs lock ordering problem as well?
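
To make the window described above concrete, here is a rough user-space
model of the ordering (filesystem lock hook, final lookup, then
i_op->rename). This is only a sketch under stated assumptions, not
kernel code: every name in it (model_file, model_inode_ops,
model_do_rename, remote_unlink and so on) is made up for illustration,
and the lock_rename/unlock_rename members stand in for the hook being
proposed, which does not exist in inode_operations today.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a file visible to every node in the cluster. */
struct model_file {
	bool exists;
};

/*
 * Hypothetical subset of inode_operations. lock_rename/unlock_rename model
 * the proposed hook; they are assumptions, not existing kernel members.
 */
struct model_inode_ops {
	int (*lock_rename)(struct model_file *f);    /* optional, may be NULL */
	void (*unlock_rename)(struct model_file *f); /* optional, may be NULL */
	int (*rename)(struct model_file *f);         /* required */
};

/* Models a cluster-wide lock: while set, other nodes cannot unlink. */
static int cluster_lock_held;

static int model_cluster_lock(struct model_file *f)
{
	(void)f;
	cluster_lock_held = 1;
	return 0;
}

static void model_cluster_unlock(struct model_file *f)
{
	(void)f;
	cluster_lock_held = 0;
}

/* Models another node deleting the file; it only succeeds if unlocked. */
static void remote_unlink(struct model_file *f)
{
	if (!cluster_lock_held)
		f->exists = false;
}

/* Models the filesystem's rename: fails if the source has vanished. */
static int model_fs_rename(struct model_file *f)
{
	return f->exists ? 0 : -ENOENT;
}

/*
 * Models the call order: the machine-local VFS lock would be taken first
 * (it is omitted here because it has no effect on other nodes), then the
 * optional filesystem hook, the final lookup, and finally ->rename.
 */
static int model_do_rename(const struct model_inode_ops *ops,
			   struct model_file *f)
{
	int err = 0;

	assert(ops->rename != NULL);

	if (ops->lock_rename) {
		err = ops->lock_rename(f);
		if (err)
			return err;
	}

	if (!f->exists) {		/* final lookup */
		err = -ENOENT;
		goto out;
	}

	remote_unlink(f);		/* the window: another node acts here */

	err = ops->rename(f);
out:
	if (ops->lock_rename && ops->unlock_rename)
		ops->unlock_rename(f);
	return err;
}
```

With only the rename member set, the simulated remote unlink lands
between the final lookup and the rename, and the rename fails with
-ENOENT. With the two lock members also filled in, the remote unlink is
held off and the rename succeeds. A real cluster filesystem would of
course take a distributed lock there rather than set a flag.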
-Russell Cattelan
cattelan@xfs.org