* Race condition introduced in 4bf46a27 VFS: Impose ordering on accesses of d_inode and d_flags
@ 2015-07-22 15:45 Dominique Martinet
2015-07-23 9:43 ` Dominique Martinet
2015-07-30 11:50 ` Dominique Martinet
0 siblings, 2 replies; 5+ messages in thread
From: Dominique Martinet @ 2015-07-22 15:45 UTC (permalink / raw)
To: linux-fsdevel; +Cc: David Howells, Al Viro, Aurelien CEDEYN
I'm getting a sneaky race condition failure since this commit (after
painstakingly bisecting it through the day):
commit 4bf46a272647d89e780126b52eda04737defd9f4
Author: David Howells <dhowells@redhat.com>
Date: Thu Mar 5 14:09:22 2015 +0000
VFS: Impose ordering on accesses of d_inode and d_flags
Impose ordering on accesses of d_inode and d_flags to avoid the need
to do this:
if (!dentry->d_inode || d_is_negative(dentry)) {
when this:
if (d_is_negative(dentry)) {
should suffice.
This check is especially problematic if a dentry can have its type
field set to something other than DENTRY_MISS_TYPE when d_inode is
NULL (as in unionmount).
What we really need to do is stick a write barrier between setting
d_inode and setting d_flags and a read barrier between reading
d_flags and reading d_inode.
Test run:
make -j17 on a kernel over 9P/RDMA (nfs-ganesha serving a tmpfs
directory)
This patch has nothing 9p specific so I'll try to reproduce with virtio
or another filesystem tomorrow. I think 9P hits this more easily because
the lack of cache makes it create and delete inodes alot...
I tried to make a reproducer but couldn't come up with anything, usually
open()/read()/close() like mad gets me similar bugs but that didn't work
out this time; it might be missing some fiddling with directories.
Error:
mostly this message during the build:
"fixdep: error opening config file: arch/x86/include/uapi/asm/ioctl.h:
Not a directory"
sometimes I get ENOENT but I get ENOTDIR 4 times out of 5.
Header file varies, but the whole build rarely passes (defconfig)
I managed to reproduce while dumping pcap of the build and did not see
any ENOTDIR over the wire, so assuming the race condition happens in the
kernel... Which I'm pretty confident about after bisecting the client
(and changing the server didn't help)
Soo. The barriers do look right, I'm not quite sure what it could be.
A co-worker pointed out that __d_set_inode_and_type clears d_flags for
DCACHE_ENTRY_TYPE and DCACHE_FALLTHRU, and __d_obtain_alias didn't clear
these before, but I still get the same error if I don't (although it did
seem longer to reproduce).
I have no idea what __d_obtain_alias does or if it's ok if it clears it,
just saying I tried.
My guess is that I'm just seeing a race condition that already existed
but the barriers make it easier to reproduce it.
This commit came with 2b0143b5c9 "VFS: normal filesystems (and lustre):
d_inode() annotations" which made 9P access d_inode through the helper,
which could have helped with ordering, but I'm still hitting the bug
"easily" with that commit.
Any idea? Anything I could test to help diagnose this?
Thanks,
--
Dominique Martinet
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Race condition introduced in 4bf46a27 VFS: Impose ordering on accesses of d_inode and d_flags
2015-07-22 15:45 Race condition introduced in 4bf46a27 VFS: Impose ordering on accesses of d_inode and d_flags Dominique Martinet
@ 2015-07-23 9:43 ` Dominique Martinet
2015-07-30 11:50 ` Dominique Martinet
1 sibling, 0 replies; 5+ messages in thread
From: Dominique Martinet @ 2015-07-23 9:43 UTC (permalink / raw)
To: linux-fsdevel; +Cc: David Howells, Al Viro, Aurelien CEDEYN
Dominique Martinet wrote on Wed, Jul 22, 2015 at 05:45:19PM +0200:
> Test run:
> make -j17 on a kernel over 9P/RDMA (nfs-ganesha serving a tmpfs
> directory)
>
> This patch has nothing 9p specific so I'll try to reproduce with virtio
> or another filesystem tomorrow. I think 9P hits this more easily because
> the lack of cache makes it create and delete inodes alot...
building a kernel on a 9p/virtio mount (tried tmpfs underneath first,
but also fails with xfs on a "slow" disk)
reproduced this as well, should be easier to play with than getting
RDMA.
It's actually failing alot faster, so will try to get simpler
reproducers with that again.
no specific mount option.
--
Dominique
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Race condition introduced in 4bf46a27 VFS: Impose ordering on accesses of d_inode and d_flags
2015-07-22 15:45 Race condition introduced in 4bf46a27 VFS: Impose ordering on accesses of d_inode and d_flags Dominique Martinet
2015-07-23 9:43 ` Dominique Martinet
@ 2015-07-30 11:50 ` Dominique Martinet
2015-07-31 12:28 ` Dominique Martinet
1 sibling, 1 reply; 5+ messages in thread
From: Dominique Martinet @ 2015-07-30 11:50 UTC (permalink / raw)
To: linux-fsdevel; +Cc: David Howells, Al Viro, Aurelien CEDEYN
Dominique Martinet wrote on Wed, Jul 22, 2015 at 05:45:19PM +0200:
> I'm getting a sneaky race condition failure since this commit (after
> painstakingly bisecting it through the day):
>
> commit 4bf46a272647d89e780126b52eda04737defd9f4
> Author: David Howells <dhowells@redhat.com>
> Date: Thu Mar 5 14:09:22 2015 +0000
>
> VFS: Impose ordering on accesses of d_inode and d_flags
>
> Impose ordering on accesses of d_inode and d_flags to avoid the need
> to do this:
>
> if (!dentry->d_inode || d_is_negative(dentry)) {
>
> when this:
>
> if (d_is_negative(dentry)) {
>
> should suffice.
>
> This check is especially problematic if a dentry can have its type
> field set to something other than DENTRY_MISS_TYPE when d_inode is
> NULL (as in unionmount).
>
> What we really need to do is stick a write barrier between setting
> d_inode and setting d_flags and a read barrier between reading
> d_flags and reading d_inode.
>
>
>
>
> Test run:
Ok so, instead of kernel compile, I got a simpler reproducer, on a virtio-9P
mount:
#!/bin/bash
for dir in {0..2}; do
mkdir -p d.${dir}/{d,d2}/d/d
echo foo > d.${dir}/d/d/d/f
done
function foo() {
# return code 255 tells xargs to stop immediately
cat d.$(($1%3))/{d,d2}/{d,d2}/d/f | grep -q foo || return 255
}
export -f foo
date
while ! seq 1 30000 | xargs -n 1 -P 16 -I{} bash -c "foo {}" 2>&1 |\
grep -E 'd.[0-9]/d/d/d/f'; do
date
done
After looking at what the compilation was doing, basically trying to
open a lot of header files where there is done then going forward to the
next directory etc, the reproducer needs to try to open non-existing
entries to hit the bug.
This seems to be getting ENOENT way more often than ENOTDIR though,
since it's one ENOENT in the middle of plenty of "valid" ENOENT it's not
easy to debug...
I also unfortunately couldn't reproduce on my 2-core+hyperthreading
laptop, the host where the test fails is a bi-socket 8 core+ht so 32
logical core.
Since it's a memory barrier issue, what might help most is the bisocket
part, I'd try to pin tasks but it doesn't strike me as obvious from
inside a VM.
> A co-worker pointed out that __d_set_inode_and_type clears d_flags for
> DCACHE_ENTRY_TYPE and DCACHE_FALLTHRU, and __d_obtain_alias didn't clear
> these before, but I still get the same error if I don't (although it did
> seem longer to reproduce).
> I have no idea what __d_obtain_alias does or if it's ok if it clears it,
> just saying I tried.
Could use an enlightened opinion on wether __d_obtain_alias should clear
DCACHE_ENTRY_TYPE and DCACHE_FALLTHU, independantly of the problem at
hand?
> My guess is that I'm just seeing a race condition that already existed
> but the barriers make it easier to reproduce it.
> This commit came with 2b0143b5c9 "VFS: normal filesystems (and lustre):
> d_inode() annotations" which made 9P access d_inode through the helper,
> which could have helped with ordering, but I'm still hitting the bug
> "easily" with that commit.
Still looking for ideas to help diagnose further...
--
Dominique
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Race condition introduced in 4bf46a27 VFS: Impose ordering on accesses of d_inode and d_flags
2015-07-30 11:50 ` Dominique Martinet
@ 2015-07-31 12:28 ` Dominique Martinet
2015-07-31 15:28 ` Dominique Martinet
0 siblings, 1 reply; 5+ messages in thread
From: Dominique Martinet @ 2015-07-31 12:28 UTC (permalink / raw)
To: linux-fsdevel; +Cc: David Howells, Al Viro, Aurelien CEDEYN
Dominique Martinet wrote on Thu, Jul 30, 2015 at 01:50:45PM +0200:
> > My guess is that I'm just seeing a race condition that already existed
> > but the barriers make it easier to reproduce it.
> > This commit came with 2b0143b5c9 "VFS: normal filesystems (and lustre):
> > d_inode() annotations" which made 9P access d_inode through the helper,
> > which could have helped with ordering, but I'm still hitting the bug
> > "easily" with that commit.
>
> Still looking for ideas to help diagnose further...
Slowly progressing, through carefully placed systemtap probes I got down
to this check in link_path_walk in fs/namei.c:
if (!d_can_lookup(nd->path.dentry)) {
err = -ENOTDIR;
break;
}
going all the way to path_init -> path_openat -> do_filp_open -> user.
Could then add a break in gdb on the err = -ENOTDIR instruction:
#0 link_path_walk (name=0xffff88042b47b027 "f", nd=0xffff88042c19eff8)
at fs/namei.c:1845
#1 0xffffffff81209209 in path_init (dfd=<optimized out>, name=<optimized out>,
flags=64, nd=0x1 <irq_stack_union+1>) at fs/namei.c:1952
#2 0xffffffff8120bd42 in path_openat (dfd=<optimized out>, pathname=0xffff88042b47b000,
nd=0xffff88042a9cfe28, op=0xffff88042a9cff1c, flags=65) at fs/namei.c:3230
#3 0xffffffff8120d8e9 in do_filp_open (dfd=-100, pathname=0xffff88042b47b000,
op=0xffff88042a9cff1c) at fs/namei.c:3280
#4 0xffffffff811fb2d7 in do_sys_open (dfd=683933728, filename=<optimized out>,
flags=<optimized out>, mode=<optimized out>) at fs/open.c:1010
#5 0xffffffff811fb3fe in SYSC_open (mode=<optimized out>, flags=<optimized out>,
filename=<optimized out>) at fs/open.c:1028
#6 SyS_open (filename=<optimized out>, flags=<optimized out>, mode=<optimized out>)
at fs/open.c:1023
Unfortunately can't seem to get much more out of it, nd->path is borked
by the time gdb gets here:
(gdb) p nd->path
$1 = {mnt = 0x6f666e69000064, dentry = 0x7379656b64616564}
Looking at other values around:
- next->dentry seems to be the last dir (need to rename directories in
my test to check, bad idea to name them all the same)
- name is the name of the file that does exist
It's easy enough to reproduce for me with the script I got, so happy to
give more infos if someone has an idea...
--
Dominique
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Race condition introduced in 4bf46a27 VFS: Impose ordering on accesses of d_inode and d_flags
2015-07-31 12:28 ` Dominique Martinet
@ 2015-07-31 15:28 ` Dominique Martinet
0 siblings, 0 replies; 5+ messages in thread
From: Dominique Martinet @ 2015-07-31 15:28 UTC (permalink / raw)
To: linux-fsdevel; +Cc: David Howells, Al Viro, Aurelien CEDEYN
Dominique Martinet wrote on Fri, Jul 31, 2015 at 02:28:10PM +0200:
> Unfortunately can't seem to get much more out of it, nd->path is borked
> by the time gdb gets here:
> (gdb) p nd->path
> $1 = {mnt = 0x6f666e69000064, dentry = 0x7379656b64616564}
Looks like gdb is just messing up here. Looking at the value from nd on
the stack (it's in do_filp_open's), nd->path is perfectly valid as
expected -- and has no type:
(gdb) p/x ((struct dentry*)0xffff88042c1eb800)->d_flags
$7 = 0x8088
So somewhere between the check in walk_component:
if (!inode || d_is_negative(path->dentry))
and the d_can_lookup, the entry that wasn't a miss became a miss...
Any hint in how we could tell the "something" clearing it that we use
it? I'm really not familiar with how entries are selected for
deletion...
> Looking at other values around:
> - next->dentry seems to be the last dir (need to rename directories in
> my test to check, bad idea to name them all the same)
> - name is the name of the file that does exist
Doesn't matter much anymore, but changed my directory structures and
caught a few breaks to see more values, it can fail earlier than on last
file.
link_path_walk is probably called with the full d/d2/d3/d4/file path and
cwd entry.
Well that's as far as I'm going to get today, further clues on how to
proceed welcome!
--
Dominique
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-07-31 15:28 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-22 15:45 Race condition introduced in 4bf46a27 VFS: Impose ordering on accesses of d_inode and d_flags Dominique Martinet
2015-07-23 9:43 ` Dominique Martinet
2015-07-30 11:50 ` Dominique Martinet
2015-07-31 12:28 ` Dominique Martinet
2015-07-31 15:28 ` Dominique Martinet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).