From mboxrd@z Thu Jan  1 00:00:00 1970
From: Al Viro <viro@ZenIV.linux.org.uk>
Subject: Re: [patch 0/3] A few patches for dcache
Date: Fri, 29 Jul 2011 08:24:41 +0100
Message-ID: <20110729072441.GA2203@ZenIV.linux.org.uk>
References: <20110728131219.146414619@openvz.org>
 <20110729032503.GD5404@dastard>
 <20110729055918.GB15883@sun>
 <20110729065951.GE5404@dastard>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Cyrill Gorcunov <gorcunov@gmail.com>,
	LINUXFS-ML <linux-fsdevel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	James Bottomley <jbottomley@parallels.com>, xemul@openvz.org
To: Dave Chinner <david@fromorbit.com>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from zeniv.linux.org.uk ([195.92.253.2]:34416 "EHLO
	ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754803Ab1G2HYr (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Fri, 29 Jul 2011 03:24:47 -0400
Content-Disposition: inline
In-Reply-To: <20110729065951.GE5404@dastard>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Fri, Jul 29, 2011 at 04:59:51PM +1000, Dave Chinner wrote:
> On Fri, Jul 29, 2011 at 09:59:18AM +0400, Cyrill Gorcunov wrote:
> > On Fri, Jul 29, 2011 at 01:25:03PM +1000, Dave Chinner wrote:
> > ...
> > > 
> > > The VFS shrinker code is now already called on a per-sb basis. Each
> > > sb has it's own shrinker context that deals with dentries, inodes
> > > and anything a filesystem wants to have shrunk in the call. That
> > > solves the original issue I had with your "limit the dentry cache
> > > size" patch series in that it didn't shrink or limit the other VFS
> > > caches that were the ones that were really consuming all your
> > > memory...
> > 
> > Thanks for comments, Dave! Still the read only lock without
> > increasing sequence number might be useful, no? (patch 1)
> 
> I'll defer to Al on that one - the intricacies of the rename locking
> are way over my head.

I'm not sure that's safe.  Note that one use of rename_lock is that
we allow hash lookup to race with d_move().  Which can move object
from one hash chain to another, so hash lookup may end up jumping
from one chain to another and getting a false negative.  That's
why __d_lookup() is not safe without read_seqretry loop (or seq_writelock,
of course).

But look what happens if we do per-sb locks - d_move() derailing the
hash lookup might happen on *any* filesystem.  They all share the
same hash table.  So just checking that we hadn't done renames on
our filesystem is not enough to make sure we hadn't hit a false
negative.

Unless we go for making the hashtable itself per-superblock (and I really
doubt that it's a good idea), I don't see any obvious ways to avoid that
kind of race.  IOW, how would you implement safe d_lookup()?