public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andreas Dilger <adilger@clusterfs.com>
To: Christopher Li <chrisl@vmware.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>,
	Jeremy Fitzhardinge <jeremy@goop.org>,
	Ext2 devel <ext2-devel@lists.sourceforge.net>,
	Linux Kernel List <linux-kernel@vger.kernel.org>
Subject: Re: [Ext2-devel] Re: [PATCH] Fix bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink
Date: Wed, 6 Nov 2002 16:27:57 -0700	[thread overview]
Message-ID: <20021106232757.GT588@clusterfs.com> (raw)
In-Reply-To: <20021106172455.A7778@vmware.com>

On Nov 06, 2002  17:24 -0800, Christopher Li wrote:
> On Wed, Nov 06, 2002 at 04:40:27PM -0500, Theodore Ts'o wrote:
> > On Wed, Nov 06, 2002 at 12:25:00AM -0800, chrisl@vmware.com wrote:
> > > This should fix the ext3 htree rename problem. Please try it again.
> > 
> > I've looked over the patch, and I've got some comments....
> > 
> > >      handle = ext3_journal_start(old_dir, 2 * EXT3_DATA_TRANS_BLOCKS +
> > > -                                    EXT3_INDEX_EXTRA_TRANS_BLOCKS + 2);
> > > +                                    EXT3_INDEX_EXTRA_TRANS_BLOCKS + 3);
> > 
> > There's no need to increase the number of blocks that might need to be
> > dirtied; if ext3_delete_entry() can't find the missing entry, it won't
> > dirty the block, so the number of blocks that might need to modified
> > remains constant.
> 
> Even for the same block dirty twice? I am not sure about that case
> so I put it there. I got a lots of thing to do tonight.

Ted is correct on this one - if we hit this bug because both the source
and target names were in the same block before the split, then after the
split we will still need to dirty only 2 blocks because of the rename
(the split blocks are accounted for in EXT3_INDEX_EXTRA_TRANS_BLOCKS).

> > Simply retrying the ext3_delete_entry() isn't sufficient, since
> > another ext3_add_entry() could move the directory entry again while
> > you're reading in the blocks as part of ext3_find_entry().  OK, that
> > would be pretty rare, since enough other directory adds would have
> > to fill up enough that another split could happen, but *is* possible.
> > (Surely our scheduler isn't that unfair....)
> 
> I think we have the lock on ext3_rename. I might be wrong.
> If other process can change the dir between the add new entry
> and delete old entry. We should also need to check the entry have
> been delete from other process in case fall into dead loop? 

Chris is right on this one.  Like Al said, the VFS holds i_sem on
"both" directories (or the single directory if src_dir & tgt_dir are
the same).  We don't need additional locking within ext3...(yet)

<aside>
For _real_ scaling of large directories, it would be good to allow
locking just individual leaf blocks of the directories instead of the
entire directory.  Since the source and target directory leaf blocks
are the only possible locations for those filenames, we do not have
any races w.r.t. other threads adding/deleting files of the same name
if we lock those directory blocks.

We will probably need to do this in the next year or so for Lustre where
we have a requirement for millions of files being created/renamed/deleted
in the same directory by thousands of clients, and tens of metadata
servers load-balancing those requests.
</aside>

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/


      parent reply	other threads:[~2002-11-06 23:23 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-11-05  4:47 bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink Jeremy Fitzhardinge
2002-11-05 21:24 ` [Ext2-devel] " chrisl
2002-11-06  5:02   ` Jeremy Fitzhardinge
2002-11-06  8:25   ` [PATCH] Fix " chrisl
2002-11-06  8:44     ` Andrew Morton
2002-11-06 21:40     ` Theodore Ts'o
2002-11-07  1:24       ` Christopher Li
2002-11-06 22:41         ` Theodore Ts'o
2002-11-06 22:47           ` Alexander Viro
2002-11-07  2:44             ` Theodore Ts'o
2002-11-07  1:58           ` Christopher Li
2002-11-06 23:27         ` Andreas Dilger [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20021106232757.GT588@clusterfs.com \
    --to=adilger@clusterfs.com \
    --cc=chrisl@vmware.com \
    --cc=ext2-devel@lists.sourceforge.net \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox