From: Dave Hansen <haveblue@us.ibm.com>
To: Mark Fasheh <mark.fasheh@oracle.com>
Cc: linux-kernel@vger.kernel.org, viro@ftp.linux.org.uk,
herbert@13thfloor.at, hch@infradead.org
Subject: Re: [PATCH 04/28] OCFS2 is (not) screwy
Date: Tue, 01 Aug 2006 20:19:55 -0700
Message-ID: <1154488795.7232.17.camel@localhost.localdomain>
In-Reply-To: <20060802021411.GG29686@ca-server1.us.oracle.com>
[-- Attachment #1: Type: text/plain, Size: 1685 bytes --]
On Tue, 2006-08-01 at 19:14 -0700, Mark Fasheh wrote:
> On Tue, Aug 01, 2006 at 04:52:43PM -0700, Dave Hansen wrote:
> > OCFS2 plays some games with i_nlink. It modifies it a bunch in
> > its unlink function, but rolls back the changes if an error
> > occurs. So, we can't just assume that iput_final() will happen
> > whenever i_nlink hits 0 in ocfs's unlink().
> Huh? Did you read the code? Or is it just easier to call things "screwy" and
> start hacking away?
>
> i_nlink only gets rolled back in the case that the file system wasn't able to
> actually complete the unlink / orphan operation. The idea is to keep it in
> sync with what's actually on disk. So when we call iput() in the unlink
> path, disk and struct inode should be accurate.
BTW, some gunk appears to have migrated into this patch that should have
been earlier in the series. I'll fix that up.
What do you think about the attached patch? It delays actually touching
i_nlink until the place where saved_nlink used to be zeroed. I assume
that is the point when we're sure that the inode is going to go away.
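For illustration only, the ordering described above can be modeled in user space. This is a minimal sketch, not OCFS2 code: `struct toy_inode`, `toy_step()`, `toy_unlink()` and the `fail_at` parameter are all hypothetical names. The only point it makes is that if i_nlink is left alone until every fallible step has succeeded, there is no saved copy to keep and nothing to roll back on error.

```c
#include <assert.h>

/* Hypothetical stand-in for struct inode; only the link count matters. */
struct toy_inode {
	unsigned int i_nlink;
};

/* Hypothetical fallible step: returns a negative errno-style value when
 * its number matches fail_at, and 0 otherwise. */
static int toy_step(int step, int fail_at)
{
	return step == fail_at ? -5 : 0;
}

/* Run every fallible step first and touch i_nlink only after the last
 * one has succeeded. On any error i_nlink was never modified, so no
 * saved value and no rollback are needed. */
static int toy_unlink(struct toy_inode *inode, int fail_at)
{
	int status;
	int step;

	for (step = 1; step <= 3; step++) {
		status = toy_step(step, fail_at);
		if (status < 0)
			return status;
	}

	/* Point of no return: the unlink is now considered successful. */
	assert(inode->i_nlink > 0);
	inode->i_nlink--;
	return 0;
}
```

Calling `toy_unlink(&inode, 2)` returns the error and leaves `i_nlink` as it was; calling it with a `fail_at` that matches no step (for example 0) drops the count by one.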
Also, instead of just clearing i_nlink for the directory case, I just do
two decrements. I did that for a few other filesystems as well. I
guess it can be collapsed to a single operation, but I'm not sure it is
worth the trouble.
Completely and utterly untested, uncompiled patch attached. Please
consider its filename a formal apology for calling your filesystem
screwy. :)
It might also be worth putting the 'double decrement i_nlink if it is a
directory' behavior in libfs.c. It appears to be pretty common logic
around the different filesystems.
Thanks for the thorough review!
-- Dave
[-- Attachment #2: ocfs-is-not-screwy.patch --]
[-- Type: text/x-patch, Size: 2017 bytes --]
--- linux-2.6-patches/fs/ocfs2/namei.c.orig 2006-08-01 20:00:58.000000000 -0700
+++ linux-2.6-patches/fs/ocfs2/namei.c 2006-08-01 20:08:14.000000000 -0700
@@ -748,7 +748,6 @@
struct dentry *dentry)
{
int status;
- unsigned int saved_nlink = 0;
struct inode *inode = dentry->d_inode;
struct ocfs2_super *osb = OCFS2_SB(dir->i_sb);
u64 blkno;
@@ -823,16 +822,6 @@
}
}
- /* There are still a few steps left until we can consider the
- * unlink to have succeeded. Save off nlink here before
- * modification so we can set it back in case we hit an issue
- * before commit. */
- saved_nlink = inode->i_nlink;
- if (S_ISDIR(inode->i_mode))
- inode->i_nlink = 0;
- else
- inode->i_nlink--;
-
status = ocfs2_request_unlink_vote(inode, dentry,
(unsigned int) inode->i_nlink);
if (status < 0) {
@@ -842,7 +831,7 @@
goto leave;
}
- if (!inode->i_nlink) {
+ if (inode->i_nlink == 1) {
status = ocfs2_prepare_orphan_dir(osb, handle, inode,
orphan_name,
&orphan_entry_bh);
@@ -869,7 +858,7 @@
fe = (struct ocfs2_dinode *) fe_bh->b_data;
- if (!inode->i_nlink) {
+ if (inode->i_nlink == 1) {
status = ocfs2_orphan_add(osb, handle, inode, fe, orphan_name,
orphan_entry_bh);
if (status < 0) {
@@ -888,7 +877,9 @@
/* We can set nlink on the dinode now. clear the saved version
* so that it doesn't get set later. */
fe->i_links_count = cpu_to_le16(inode->i_nlink);
- saved_nlink = 0;
+ inode_drop_nlink(inode);
+ if (S_ISDIR(inode->i_mode))
+ inode_drop_nlink(inode);
status = ocfs2_journal_dirty(handle, fe_bh);
if (status < 0) {
@@ -897,19 +888,15 @@
}
if (S_ISDIR(inode->i_mode)) {
- dir->i_nlink--;
status = ocfs2_mark_inode_dirty(handle, dir,
parent_node_bh);
- if (status < 0) {
+ if (status < 0)
mlog_errno(status);
- dir->i_nlink++;
- }
+ else
+ inode_drop_nlink(dir);
}
leave:
- if (status < 0 && saved_nlink)
- inode->i_nlink = saved_nlink;
-
if (handle)
ocfs2_commit_trans(handle);