From: Luis Henriques <lhenriques@suse.de>
To: Dominique Martinet <asmadeus@codewreck.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>,
Latchesar Ionkov <lucho@ionkov.net>,
David Howells <dhowells@redhat.com>,
linux-fsdevel@vger.kernel.org,
v9fs-developer@lists.sourceforge.net
Subject: Re: 9p: fscache duplicate cookie
Date: Mon, 10 May 2021 12:47:28 +0100 [thread overview]
Message-ID: <871rae24kv.fsf@suse.de> (raw)
In-Reply-To: <875yzq270z.fsf@suse.de> (Luis Henriques's message of "Mon, 10 May 2021 11:54:36 +0100")
[-- Attachment #1: Type: text/plain, Size: 5029 bytes --]
Luis Henriques <lhenriques@suse.de> writes:
> Dominique Martinet <asmadeus@codewreck.org> writes:
>
>> Luis Henriques wrote on Fri, May 07, 2021 at 05:36:29PM +0100:
>>> Ok, I spent some more time on this issue today. I've hacked a bit of code
>>> to keep track of new inodes' qids and I'm convinced there are no
>>> duplicates when this issue happens.
>>
>> Ok, that's good.
>> Just to make sure what did you look at aside of the qid to make sure
>> it's unique? i_ino comes straight from qid->path too so we don't have
>> any great key available which is why I hadn't suggesting building a
>> debug table.
>> (... well, actually that means we'll never try to allocate two inodes
>> with the same inode number because of how v9fs_qid_iget_dotl() works, so
>> if there is a collision in qid paths we wouldn't see it through cookies
>> collision in the first place. I'm not sure that's good? But at least
>> that clears up that theory, sorry for the bad suggestion)
>>
>
> Ok, I should probably have added some more details in my email. So,
> here's what I did:
>
> I've created a list (actually a tree...) of objects that keep track of
> each new inode in v9fs_qid_iget_dotl(). These objects contain the inode
> number, and the qid (type, version, path). These are then removed from
> the list in v9fs_evict_inode().
>
> Whenever there's an error in v9fs_cache_inode_get_cookie(), i.e. when
> v9inode->fscache == NULL, I'll search this list to see if that inode
> number was there (or if I can find the qid.path and qid.version).
>
> I have never seen a hit in this search, hence my claim of not having a
> duplicate qids involved. Of course my hack can be buggy and I completely
> miss it ;-)
>
>>> OTOH, I've done another quick test: in v9fs_cache_inode_get_cookie(), I do
>>> an fscache_acquire_cookie() retry when it fails (due to the dup error),
>>> and this retry *does* succeed. Which means, I guess, there's a race going
>>> on. I didn't managed to look too deep yet, but my current theory is that
>>> the inode is being evicted while an open is triggered. A new inode is
>>> allocated but the old inode fscache cookie hasn't yet been relinquished.
>>> Does this make any sense?
>>
>> hm, if the retry goes through I guess that'd make sense; if they both
>> were used in parallel the second call should fail all the same so that's
>> definitely a likely explanation.
>>
>> It wouldn't hurt to check with v9fs_evict_inode if that's correct...
>> There definitely is a window where inode is no longer findable (thus
>> leading to allocation of a new one) and the call to the
>> fscache_relinquish_cookie() at eviction, but looking at e.g. afs they
>> are doing exactly the same as 9p here (iget5_locked, if that gets a new
>> inode then call fscache_acquire_cookie // fscache_relinquish_cookie in
>> evict op) so I'm not sure what we're missing.
>>
>>
>> David, do you have an idea?
>
> I've just done a quick experiment: moving the call to function
> v9fs_cache_inode_put_cookie() in v9fs_evict_inode() to the beginning
> (before truncate_inode_pages_final()) and it seems to, at least, narrow
> the window -- I'm not able to reproduce the issue anymore. But I'll need
> to look closer.
>
And right after sending this email I decided to try a different
experiment. Here's the code I had in v9fs_evict_inode():
void v9fs_evict_inode(struct inode *inode)
{
struct v9fs_inode *v9inode = V9FS_I(inode);
v9fs_debug_remove(inode->i_ino); /* <------------------------- */
truncate_inode_pages_final(&inode->i_data);
clear_inode(inode);
filemap_fdatawrite(&inode->i_data);
v9fs_cache_inode_put_cookie(inode);
/* clunk the fid stashed in writeback_fid */
if (v9inode->writeback_fid) {
p9_client_clunk(v9inode->writeback_fid);
v9inode->writeback_fid = NULL;
}
}
v9fs_debug_remove() is the function that would remove the inode from my
debug tree. In my new experiment, I changed this with:
void v9fs_evict_inode(struct inode *inode)
{
struct v9fs_inode *v9inode = V9FS_I(inode);
v9fs_debug_tag(inode->i_ino);
(...)
v9fs_debug_remove(inode->i_ino);
}
So, I effectively "tag" the inode for removing it from the list but only
remove it in the end. The result is that I actually started seeing this
inode tagged for removing in v9fs_cache_inode_get_cookie() when fscache
detects the duplicate cookie!
I'm attaching the debug patch I'm using. Obviously, I'm not really proud
of this code and is not for merging (it's *really* hacky!), but maybe it
helps clarifying what I tried to explain above. I.e. that
fscache_relinquish_cookie() should probably be invoked early when evicting
an inode.
FTR, with this patch applied I occasionally (but not always!) see the
following:
[DEBUG] inode: 24187397 quid: 0.1711203.633864d4
[DEBUG] found in tree qid: 0.1711203.633864d4 removing: 1
[DEBUG] found dup ino: 24187397 0.1711203.633864d4 removing: 1
(Sometimes I do not see the duplicate being found, which probably means I
didn't hit the race.)
David, does this make sense?
Cheers
--
Luis
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: debug-code.patch --]
[-- Type: text/x-diff, Size: 5633 bytes --]
diff --git a/fs/9p/cache.c b/fs/9p/cache.c
index eb2151fb6049..59b59291609c 100644
--- a/fs/9p/cache.c
+++ b/fs/9p/cache.c
@@ -122,6 +122,30 @@ void v9fs_cache_inode_get_cookie(struct inode *inode)
i_size_read(&v9inode->vfs_inode),
true);
+ if (!v9inode->fscache) {
+ struct v9fs_debug *db, *next;
+
+ mutex_lock(&v9fs_debug_tree_mutex);
+ db = v9fs_debug_search(inode->i_ino);
+ printk("[DEBUG] inode: %ld quid: %x.%llx.%x\n",
+ inode->i_ino,
+ v9inode->qid.type,
+ (unsigned long long)v9inode->qid.path,
+ v9inode->qid.version);
+ if (db)
+ printk("[DEBUG] found in tree qid: %x.%llx.%x removing: %d\n",
+ db->type, (unsigned long long)db->path,
+ db->version, db->removing);
+ rbtree_postorder_for_each_entry_safe(db, next, &v9fs_debug_tree, node) {
+ if (db->path == v9inode->qid.path &&
+ db->version == v9inode->qid.version)
+ printk("[DEBUG] found dup ino: %ld %x.%llx.%x removing: %d\n",
+ db->inode,
+ db->type, (unsigned long long)db->path,
+ db->version, db->removing);
+ }
+ mutex_unlock(&v9fs_debug_tree_mutex);
+ }
p9_debug(P9_DEBUG_FSC, "inode %p get cookie %p\n",
inode, v9inode->fscache);
}
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 4ca56c5dd637..7935ab56453c 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -119,6 +119,22 @@ struct v9fs_inode {
struct inode vfs_inode;
};
+struct v9fs_debug {
+ struct rb_node node;
+ unsigned long inode;
+ u8 type;
+ u32 version;
+ u64 path;
+ bool removing;
+};
+
+extern struct rb_root v9fs_debug_tree;
+extern struct mutex v9fs_debug_tree_mutex;
+extern void v9fs_debug_remove(unsigned long inode);
+extern void v9fs_debug_tag(unsigned long inode);
+extern struct v9fs_debug *v9fs_debug_search(unsigned long inode);
+extern void v9fs_debug_insert(unsigned long inode, u8 type, u32 version, u64 path);
+
static inline struct v9fs_inode *V9FS_I(const struct inode *inode)
{
return container_of(inode, struct v9fs_inode, vfs_inode);
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 795706520b5e..e5ebec787c28 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -377,6 +377,7 @@ void v9fs_evict_inode(struct inode *inode)
{
struct v9fs_inode *v9inode = V9FS_I(inode);
+ v9fs_debug_tag(inode->i_ino);
truncate_inode_pages_final(&inode->i_data);
clear_inode(inode);
filemap_fdatawrite(&inode->i_data);
@@ -387,6 +388,7 @@ void v9fs_evict_inode(struct inode *inode)
p9_client_clunk(v9inode->writeback_fid);
v9inode->writeback_fid = NULL;
}
+ v9fs_debug_remove(inode->i_ino);
}
static int v9fs_test_inode(struct inode *inode, void *data)
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index e1c0240b51c0..56a7bc0c627b 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -32,6 +32,9 @@
#include "xattr.h"
#include "acl.h"
+struct rb_root v9fs_debug_tree = RB_ROOT;
+DEFINE_MUTEX(v9fs_debug_tree_mutex);
+
static int
v9fs_vfs_mknod_dotl(struct user_namespace *mnt_userns, struct inode *dir,
struct dentry *dentry, umode_t omode, dev_t rdev);
@@ -94,6 +97,94 @@ static int v9fs_set_inode_dotl(struct inode *inode, void *data)
return 0;
}
+struct v9fs_debug *v9fs_debug_search(unsigned long inode)
+{
+ struct rb_node *node = v9fs_debug_tree.rb_node;
+
+ while (node) {
+ struct v9fs_debug *data = container_of(node, struct v9fs_debug, node);
+
+ if (data->inode < inode)
+ node = node->rb_left;
+ else if (data->inode > inode)
+ node = node->rb_right;
+ else
+ return data;
+ }
+ return NULL;
+}
+
+void v9fs_debug_tag(unsigned long inode)
+{
+ struct v9fs_debug *node;
+
+ mutex_lock(&v9fs_debug_tree_mutex);
+ node = v9fs_debug_search(inode);
+ if (node)
+ node->removing = true;
+ else
+ printk("[DEBUG] can't find %ld for tagging\n", inode);
+ mutex_unlock(&v9fs_debug_tree_mutex);
+}
+
+void v9fs_debug_remove(unsigned long inode)
+{
+ struct v9fs_debug *node;
+
+ mutex_lock(&v9fs_debug_tree_mutex);
+ node = v9fs_debug_search(inode);
+ if (node) {
+ rb_erase(&node->node, &v9fs_debug_tree);
+ kfree(node);
+ } else
+ printk("[DEBUG] can't find %ld\n", inode);
+ mutex_unlock(&v9fs_debug_tree_mutex);
+}
+
+void v9fs_debug_insert(unsigned long inode, u8 type, u32 version, u64 path)
+{
+ struct v9fs_debug *db = NULL;
+ struct rb_node **node, *parent = NULL;
+
+ mutex_lock(&v9fs_debug_tree_mutex);
+ node = &(v9fs_debug_tree.rb_node);
+ while (*node) {
+ parent = *node;
+ db = container_of(*node, struct v9fs_debug, node);
+
+ if (db->inode < inode)
+ node = &((*node)->rb_left);
+ else if (db->inode > inode)
+ node = &((*node)->rb_right);
+ else
+ break;
+ }
+ if (!db || (db->inode != inode)) {
+ db = kmalloc(sizeof(*db), GFP_KERNEL);
+ if (db) {
+ db->inode = inode;
+ db->type = type;
+ db->version = version;
+ db->path = path;
+ db->removing = false;
+ rb_link_node(&db->node, parent, node);
+ rb_insert_color(&db->node, &v9fs_debug_tree);
+ } else
+ printk("[DEBUG] Failed to alloc memory!\n");
+ }
+ if (db) {
+ if (type != db->type ||
+ version != db->version ||
+ path != db->path) {
+ db->inode = inode;
+ db->type = type;
+ db->version = version;
+ db->path = path;
+ }
+ }
+ mutex_unlock(&v9fs_debug_tree_mutex);
+}
+
static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
struct p9_qid *qid,
struct p9_fid *fid,
@@ -134,6 +225,7 @@ static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
if (retval)
goto error;
+ v9fs_debug_insert(inode->i_ino, qid->type, qid->version, qid->path);
unlock_new_inode(inode);
return inode;
error:
[-- Attachment #3: Type: text/plain, Size: 620 bytes --]
> And yeah, I had checked other filesystems too and they seem to follow this
> pattern. So either I'm looking at the wrong place or this is something
> that is much more likely to be triggered by 9p.
>
> Cheers,
> --
> Luis
>
>>> If this theory is correct, I'm not sure what's the best way to close this
>>> race because the v9inode->fscache_lock can't really be used. But I still
>>> need to proof this is really what's happening.
>>
>> Yes, I think we're going to need proof before thinking of a solution, I
>> can't think of anything simple either.
>>
>>
>> Thanks again for looking into it,
>> --
>> Dominique
next prev parent reply other threads:[~2021-05-10 12:05 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-06 10:03 9p: fscache duplicate cookie Luis Henriques
2021-05-06 10:45 ` Dominique Martinet
2021-05-06 12:18 ` Luis Henriques
2021-05-07 16:36 ` Luis Henriques
2021-05-08 0:47 ` Dominique Martinet
2021-05-10 10:54 ` Luis Henriques
2021-05-10 11:47 ` Luis Henriques [this message]
2021-05-11 12:44 ` David Howells
2021-05-12 10:10 ` Luis Henriques
2021-05-11 12:53 ` David Howells
2021-05-11 12:38 ` David Howells
2021-05-12 10:07 ` Luis Henriques
2021-05-12 11:04 ` David Howells
2021-05-12 11:58 ` Luis Henriques
2021-05-12 12:26 ` Dominique Martinet
2021-05-12 12:57 ` What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie] David Howells
2021-05-12 13:45 ` Al Viro
2021-05-12 14:12 ` David Howells
2021-05-14 16:10 ` Luis Henriques
2021-05-14 21:16 ` Dominique Martinet
2021-05-17 15:56 ` Luis Henriques
2021-05-17 17:39 ` Aneesh Kumar K.V
2021-05-12 11:09 ` 9p: fscache duplicate cookie David Howells
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871rae24kv.fsf@suse.de \
--to=lhenriques@suse.de \
--cc=asmadeus@codewreck.org \
--cc=dhowells@redhat.com \
--cc=ericvh@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=lucho@ionkov.net \
--cc=v9fs-developer@lists.sourceforge.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).