linux-fsdevel.vger.kernel.org archive mirror
From: Luis Henriques <lhenriques@suse.de>
To: Dominique Martinet <asmadeus@codewreck.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>,
	Latchesar Ionkov <lucho@ionkov.net>,
	David Howells <dhowells@redhat.com>,
	linux-fsdevel@vger.kernel.org,
	v9fs-developer@lists.sourceforge.net
Subject: Re: 9p: fscache duplicate cookie
Date: Mon, 10 May 2021 12:47:28 +0100
Message-ID: <871rae24kv.fsf@suse.de>
In-Reply-To: <875yzq270z.fsf@suse.de> (Luis Henriques's message of "Mon, 10 May 2021 11:54:36 +0100")

[-- Attachment #1: Type: text/plain, Size: 5029 bytes --]

Luis Henriques <lhenriques@suse.de> writes:

> Dominique Martinet <asmadeus@codewreck.org> writes:
>
>> Luis Henriques wrote on Fri, May 07, 2021 at 05:36:29PM +0100:
>>> Ok, I spent some more time on this issue today.  I've hacked together
>>> some code to keep track of new inodes' qids, and I'm convinced there
>>> are no duplicates when this issue happens.
>>
>> Ok, that's good.
>> Just to make sure: what did you look at, aside from the qid, to confirm
>> it's unique? i_ino comes straight from qid->path too, so we don't have
>> any great key available, which is why I hadn't suggested building a
>> debug table.
>> (... well, actually that means we'll never try to allocate two inodes
>> with the same inode number because of how v9fs_qid_iget_dotl() works, so
>> if there is a collision in qid paths we wouldn't see it through a cookie
>> collision in the first place. I'm not sure that's good? But at least
>> that clears up that theory, sorry for the bad suggestion.)
>>
>
> Ok, I should probably have added some more details in my email.  So,
> here's what I did:
>
> I've created a list (actually a tree...) of objects that keep track of
> each new inode in v9fs_qid_iget_dotl().  These objects contain the inode
> number and the qid (type, version, path).  These are then removed from
> the list in v9fs_evict_inode().
>
> Whenever there's an error in v9fs_cache_inode_get_cookie(), i.e. when
> v9inode->fscache == NULL, I search this list to see if that inode
> number is there (or if I can find the qid.path and qid.version).
>
> I have never seen a hit in this search, hence my claim that no duplicate
> qids are involved.  Of course, my hack could be buggy and I may have
> completely missed it ;-)
>
>>> OTOH, I've done another quick test: in v9fs_cache_inode_get_cookie(), I do
>>> an fscache_acquire_cookie() retry when it fails (due to the dup error),
>>> and this retry *does* succeed.  Which means, I guess, there's a race going
>>> on.  I didn't manage to dig too deep yet, but my current theory is that
>>> the inode is being evicted while an open is triggered.  A new inode is
>>> allocated but the old inode's fscache cookie hasn't yet been relinquished.
>>> Does this make any sense?
>>
>> Hm, if the retry goes through I guess that'd make sense; if both
>> inodes were used in parallel the second call would fail all the same,
>> so that's definitely a likely explanation.
>>
>> It wouldn't hurt to check in v9fs_evict_inode whether that's correct...
>> There definitely is a window between the point where the inode is no
>> longer findable (thus leading to allocation of a new one) and the call
>> to fscache_relinquish_cookie() at eviction, but looking at e.g. afs they
>> are doing exactly the same as 9p here (iget5_locked, and if that gets a
>> new inode then call fscache_acquire_cookie // fscache_relinquish_cookie
>> in the evict op), so I'm not sure what we're missing.
>>
>>
>> David, do you have an idea?
>
> I've just done a quick experiment: moving the call to
> v9fs_cache_inode_put_cookie() in v9fs_evict_inode() to the beginning
> (before truncate_inode_pages_final()), and it seems to at least narrow
> the window -- I'm not able to reproduce the issue anymore.  But I'll
> need to look closer.
>

And right after sending this email I decided to try a different
experiment.  Here's the code I had in v9fs_evict_inode():

void v9fs_evict_inode(struct inode *inode)
{
	struct v9fs_inode *v9inode = V9FS_I(inode);

	v9fs_debug_remove(inode->i_ino); /* <------------------------- */
	truncate_inode_pages_final(&inode->i_data);
	clear_inode(inode);
	filemap_fdatawrite(&inode->i_data);

	v9fs_cache_inode_put_cookie(inode);
	/* clunk the fid stashed in writeback_fid */
	if (v9inode->writeback_fid) {
		p9_client_clunk(v9inode->writeback_fid);
		v9inode->writeback_fid = NULL;
	}
}

v9fs_debug_remove() is the function that would remove the inode from my
debug tree.  In my new experiment, I changed it to:

void v9fs_evict_inode(struct inode *inode)
{
	struct v9fs_inode *v9inode = V9FS_I(inode);

	v9fs_debug_tag(inode->i_ino);
	(...)
	v9fs_debug_remove(inode->i_ino);
}

So I effectively "tag" the inode for removal but only remove it from the
list at the end.  The result is that I actually started seeing inodes
tagged for removal in v9fs_cache_inode_get_cookie() when fscache detects
the duplicate cookie!
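
To make the suspected interleaving explicit, here's a sketch of the race
I believe the tagging exposes -- this is only my reading of the debug
output, not something I have proven yet:

/*
 * task A (eviction)                    task B (lookup of the same file)
 * -----------------                    --------------------------------
 * old inode becomes unfindable
 * v9fs_evict_inode(old)                v9fs_qid_iget_dotl()
 *   truncate_inode_pages_final()         iget5_locked() -> new inode
 *   clear_inode()                        v9fs_cache_inode_get_cookie()
 *   ...                                    fscache_acquire_cookie()
 *                                            -> duplicate cookie (old one
 *                                               not yet relinquished)
 *   v9fs_cache_inode_put_cookie(old)       ... which is why a retry works
 */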

I'm attaching the debug patch I'm using.  Obviously, I'm not really proud
of this code and it is not for merging (it's *really* hacky!), but maybe
it helps clarify what I tried to explain above, i.e. that
fscache_relinquish_cookie() should probably be invoked earlier when
evicting an inode.

FTR, with this patch applied I occasionally (but not always!) see the
following:

  [DEBUG] inode: 24187397 qid: 0.1711203.633864d4
  [DEBUG] found in tree qid: 0.1711203.633864d4 removing: 1
  [DEBUG] found dup ino: 24187397 0.1711203.633864d4 removing: 1

(Sometimes I do not see the duplicate being found, which probably means I
didn't hit the race.)
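
For reference, here's the reordering of v9fs_evict_inode() I mentioned
above, which made the problem unreproducible for me.  This is just a
sketch of the experiment, not a proposed fix -- I haven't checked whether
relinquishing the cookie before truncate_inode_pages_final() is actually
safe with respect to fscache's handling of cached pages:

void v9fs_evict_inode(struct inode *inode)
{
	struct v9fs_inode *v9inode = V9FS_I(inode);

	/*
	 * Relinquish the cookie first, so that a racing lookup that
	 * allocates a new inode for the same qid can re-acquire it.
	 */
	v9fs_cache_inode_put_cookie(inode);

	truncate_inode_pages_final(&inode->i_data);
	clear_inode(inode);
	filemap_fdatawrite(&inode->i_data);

	/* clunk the fid stashed in writeback_fid */
	if (v9inode->writeback_fid) {
		p9_client_clunk(v9inode->writeback_fid);
		v9inode->writeback_fid = NULL;
	}
}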

David, does this make sense?

Cheers
-- 
Luis


[-- Attachment #2: debug-code.patch --]
[-- Type: text/x-diff, Size: 5633 bytes --]

diff --git a/fs/9p/cache.c b/fs/9p/cache.c
index eb2151fb6049..59b59291609c 100644
--- a/fs/9p/cache.c
+++ b/fs/9p/cache.c
@@ -122,6 +122,30 @@ void v9fs_cache_inode_get_cookie(struct inode *inode)
 						  i_size_read(&v9inode->vfs_inode),
 						  true);
 
+	if (!v9inode->fscache) {
+		struct v9fs_debug *db, *next;
+
+		mutex_lock(&v9fs_debug_tree_mutex);
+		db = v9fs_debug_search(inode->i_ino);
+		printk("[DEBUG] inode: %ld quid: %x.%llx.%x\n",
+		       inode->i_ino,
+		       v9inode->qid.type,
+		       (unsigned long long)v9inode->qid.path,
+		       v9inode->qid.version);
+		if (db)
+			printk("[DEBUG] found in tree qid: %x.%llx.%x removing: %d\n",
+			       db->type, (unsigned long long)db->path,
+			       db->version, db->removing);
+		rbtree_postorder_for_each_entry_safe(db, next, &v9fs_debug_tree, node) {
+			if (db->path == v9inode->qid.path &&
+			    db->version == v9inode->qid.version)
+				printk("[DEBUG] found dup ino: %ld %x.%llx.%x removing: %d\n",
+				       db->inode,
+				       db->type, (unsigned long long)db->path,
+				       db->version, db->removing);
+		}
+		mutex_unlock(&v9fs_debug_tree_mutex);
+	}
 	p9_debug(P9_DEBUG_FSC, "inode %p get cookie %p\n",
 		 inode, v9inode->fscache);
 }
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 4ca56c5dd637..7935ab56453c 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -119,6 +119,22 @@ struct v9fs_inode {
 	struct inode vfs_inode;
 };
 
+struct v9fs_debug {
+	struct rb_node node;
+	unsigned long inode;
+	u8 type;
+	u32 version;
+	u64 path;
+	bool removing;
+};
+
+extern struct rb_root v9fs_debug_tree;
+extern struct mutex v9fs_debug_tree_mutex;
+extern void v9fs_debug_remove(unsigned long inode);
+extern void v9fs_debug_tag(unsigned long inode);
+extern struct v9fs_debug *v9fs_debug_search(unsigned long inode);
+extern void v9fs_debug_insert(unsigned long inode, u8 type, u32 version, u64 path);
+
 static inline struct v9fs_inode *V9FS_I(const struct inode *inode)
 {
 	return container_of(inode, struct v9fs_inode, vfs_inode);
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 795706520b5e..e5ebec787c28 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -377,6 +377,7 @@ void v9fs_evict_inode(struct inode *inode)
 {
 	struct v9fs_inode *v9inode = V9FS_I(inode);
 
+	v9fs_debug_tag(inode->i_ino);
 	truncate_inode_pages_final(&inode->i_data);
 	clear_inode(inode);
 	filemap_fdatawrite(&inode->i_data);
@@ -387,6 +388,7 @@ void v9fs_evict_inode(struct inode *inode)
 		p9_client_clunk(v9inode->writeback_fid);
 		v9inode->writeback_fid = NULL;
 	}
+	v9fs_debug_remove(inode->i_ino);
 }
 
 static int v9fs_test_inode(struct inode *inode, void *data)
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index e1c0240b51c0..56a7bc0c627b 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -32,6 +32,9 @@
 #include "xattr.h"
 #include "acl.h"
 
+struct rb_root v9fs_debug_tree = RB_ROOT;
+DEFINE_MUTEX(v9fs_debug_tree_mutex);
+
 static int
 v9fs_vfs_mknod_dotl(struct user_namespace *mnt_userns, struct inode *dir,
 		    struct dentry *dentry, umode_t omode, dev_t rdev);
@@ -94,6 +97,94 @@ static int v9fs_set_inode_dotl(struct inode *inode,  void *data)
 	return 0;
 }
 
+struct v9fs_debug *v9fs_debug_search(unsigned long inode)
+{
+	struct rb_node *node = v9fs_debug_tree.rb_node;
+
+	while (node) {
+		struct v9fs_debug *data = container_of(node, struct v9fs_debug, node);
+
+		if (data->inode < inode)
+			node = node->rb_left;
+		else if (data->inode > inode)
+			node = node->rb_right;
+		else
+			return data;
+	}
+	return NULL;
+}
+
+void v9fs_debug_tag(unsigned long inode)
+{
+	struct v9fs_debug *node;
+
+	mutex_lock(&v9fs_debug_tree_mutex);
+	node = v9fs_debug_search(inode);
+	if (node)
+		node->removing = true;
+	else
+		printk("[DEBUG] can't find %ld for tagging\n", inode);
+	mutex_unlock(&v9fs_debug_tree_mutex);
+}
+
+void v9fs_debug_remove(unsigned long inode)
+{
+	struct v9fs_debug *node;
+
+	mutex_lock(&v9fs_debug_tree_mutex);
+	node = v9fs_debug_search(inode);
+	if (node) {
+		rb_erase(&node->node, &v9fs_debug_tree);
+		kfree(node);
+	} else
+		printk("[DEBUG] can't find %ld\n", inode);
+	mutex_unlock(&v9fs_debug_tree_mutex);
+}
+
+void v9fs_debug_insert(unsigned long inode, u8 type, u32 version, u64 path)
+{
+	struct v9fs_debug *db = NULL;
+	struct rb_node **node, *parent = NULL;
+
+	mutex_lock(&v9fs_debug_tree_mutex);
+	node = &(v9fs_debug_tree.rb_node);
+	while (*node) {
+		parent = *node;
+		db = container_of(*node, struct v9fs_debug, node);
+
+		if (db->inode < inode)
+			node = &((*node)->rb_left);
+		else if (db->inode > inode)
+			node = &((*node)->rb_right);
+		else
+			break;
+	}
+	if (!db || (db->inode != inode)) {
+		db = kmalloc(sizeof(*db), GFP_KERNEL);
+		if (db) {
+			db->inode = inode;
+			db->type = type;
+			db->version = version;
+			db->path = path;
+			db->removing = false;
+			rb_link_node(&db->node, parent, node);
+			rb_insert_color(&db->node, &v9fs_debug_tree);
+		} else
+			printk("[DEBUG] Failed to alloc memory!\n");
+	}
+	if (db) {
+		if (type != db->type ||
+		    version != db->version ||
+		    path != db->path) {
+			db->inode = inode;
+			db->type = type;
+			db->version = version;
+			db->path = path;
+		}
+	}
+	mutex_unlock(&v9fs_debug_tree_mutex);
+}
+
 static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
 					struct p9_qid *qid,
 					struct p9_fid *fid,
@@ -134,6 +225,7 @@ static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
 	if (retval)
 		goto error;
 
+	v9fs_debug_insert(inode->i_ino, qid->type, qid->version, qid->path);
 	unlock_new_inode(inode);
 	return inode;
 error:

[-- Attachment #3: Type: text/plain, Size: 620 bytes --]


> And yeah, I had checked other filesystems too and they seem to follow this
> pattern.  So either I'm looking in the wrong place or this is something
> that is simply much more likely to be triggered by 9p.
>
> Cheers,
> -- 
> Luis
>
>>> If this theory is correct, I'm not sure what the best way to close this
>>> race is, because the v9inode->fscache_lock can't really be used.  But I
>>> still need to prove this is really what's happening.
>>
>> Yes, I think we're going to need proof before thinking of a solution; I
>> can't think of anything simple either.
>>
>>
>> Thanks again for looking into it,
>> -- 
>> Dominique



Thread overview: 23+ messages
2021-05-06 10:03 9p: fscache duplicate cookie Luis Henriques
2021-05-06 10:45 ` Dominique Martinet
2021-05-06 12:18   ` Luis Henriques
2021-05-07 16:36     ` Luis Henriques
2021-05-08  0:47       ` Dominique Martinet
2021-05-10 10:54         ` Luis Henriques
2021-05-10 11:47           ` Luis Henriques [this message]
2021-05-11 12:44           ` David Howells
2021-05-12 10:10             ` Luis Henriques
2021-05-11 12:53         ` David Howells
2021-05-11 12:38 ` David Howells
2021-05-12 10:07   ` Luis Henriques
2021-05-12 11:04   ` David Howells
2021-05-12 11:58     ` Luis Henriques
2021-05-12 12:26       ` Dominique Martinet
2021-05-12 12:57       ` What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie] David Howells
2021-05-12 13:45         ` Al Viro
2021-05-12 14:12         ` David Howells
2021-05-14 16:10           ` Luis Henriques
2021-05-14 21:16             ` Dominique Martinet
2021-05-17 15:56               ` Luis Henriques
2021-05-17 17:39               ` Aneesh Kumar K.V
2021-05-12 11:09   ` 9p: fscache duplicate cookie David Howells
