* [2.4] VFS locking problem during concurrent link/unlink
@ 2003-01-16 11:00 Oleg Drokin
2003-01-16 15:39 ` Chris Mason
0 siblings, 1 reply; 9+ messages in thread
From: Oleg Drokin @ 2003-01-16 11:00 UTC (permalink / raw)
To: linux-kernel; +Cc: eazgwmir, viro, nikita
Hello!
While debugging a reiserfs problem that can be demonstrated with a script
created by Zygo Blaxell, I started to wonder whether the problem is really
a reiserfs fault and not a VFS one.
Although Zygo claims the script only produces problems on reiserfs, I am
now trying it myself on ext2 (which will take some time).

Debugging shows that reiserfs_link() is sometimes called for inodes whose
i_nlink is zero (and whose corresponding data has already been deleted).
So my current guess of what's going on is this:
process 1                            process 2
---------                            ---------
sys_unlink("a/b")                    sys_link("a/b", "c/d")
down(inode of "a")                   down(inode of "c")
lock_kernel()
reiserfs_unlink("a/b")
  decreases i_nlink of a/b to zero
  and removes name "b" from "a"
unlock_kernel()
d_delete("a/b")
  (*)
                                     lock_kernel()
                                     reiserfs_link()
                                       at this point we do the usual
                                       stuff, but the inode's i_nlink
                                       is zero and the file data is
                                       already removed, which indicates
                                       that reiserfs_delete_inode() was
                                       already called at (*)
                                     unlock_kernel()
So my question is: "Is it really OK that sys_link()/vfs_link() does not
take the semaphore on the parent directory of the original path?"  Or
should we actually put a workaround into the reiserfs code to avoid such
a situation?
Bye,
Oleg
PS: Here's the script:
#!/bin/bash
# Create an empty filesystem:
mkreiserfs -f -f /dev/hdb2
mount /dev/hdb2 /data1 -t reiserfs
cd /data1
# Script used to control the load average. Note that, as written, the
# loops below will keep spawning new processes, so we need some way to
# throttle them. Change the '-lt 10' to another number to change the
# number of processes.
cat <<'LC' > loadcheck && chmod 755 loadcheck
#!/bin/sh
read av1 av5 av15 rest < /proc/loadavg
echo -n "Load Average: $av1 ... "
av1=${av1%.*}
if [ $av1 -lt 10 ]; then
echo OK
exit 0
else
echo "Whoa, Nellie!"
exit 1
fi
LC
# Create directories used by test
mkdir foo bar
mkdir foo/etc foo/usr foo/var
# Start up some rsyncs. I use /etc, /usr, and /var because there's a
# good mixture of files with some hardlinks between them, and on a normal
# Linux system some of them change from time to time.
while sleep 1m; do
./loadcheck || continue;
for x in usr etc var; do
rsync -avxHS --delete /$x/. foo/$x/. &
done;
done &
# Start up some cp -al's and rm -rf's. Note there are two concurrent
# sets of 'cp's and two concurrent sets of 'rm's, and each of those
# has different instances of 'cp' and 'rm' running at different times.
for x in 1 2; do
while sleep 1m; do
./loadcheck || continue;
cp -al foo bar/`date +%s` &
done &
while sleep 1m; do
./loadcheck || continue;
for x in bar/*; do
rm -rf $x;
sleep 1m;
done &
done &
done &
^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [2.4] VFS locking problem during concurrent link/unlink
  2003-01-16 11:00 [2.4] VFS locking problem during concurrent link/unlink Oleg Drokin
@ 2003-01-16 15:39 ` Chris Mason
  2003-01-16 15:43   ` Oleg Drokin
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Mason @ 2003-01-16 15:39 UTC (permalink / raw)
To: Oleg Drokin; +Cc: linux-kernel, eazgwmir, viro, nikita

On Thu, 2003-01-16 at 06:00, Oleg Drokin wrote:
> Hello!
>
> While debugging a reiserfs problem that can be demonstrated with a script
> created by Zygo Blaxell, I started to wonder whether the problem is really
> a reiserfs fault and not a VFS one.
> Although Zygo claims the script only produces problems on reiserfs, I am
> now trying it myself on ext2 (which will take some time).
>
> Debugging shows that reiserfs_link() is sometimes called for inodes whose
> i_nlink is zero (and whose corresponding data has already been deleted).
> So my current guess of what's going on is this:

No, this is a reiserfs bug, since we schedule after doing the link checks
in reiserfs_link and reiserfs_unlink.  I sent a patch to the reiserfs dev
list a while ago; I'll pull it out of the SUSE kernel and rediff it
against 2.4.20.

-chris

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [2.4] VFS locking problem during concurrent link/unlink
  2003-01-16 15:39 ` Chris Mason
@ 2003-01-16 15:43   ` Oleg Drokin
  2003-01-16 16:02     ` Chris Mason
  0 siblings, 1 reply; 9+ messages in thread
From: Oleg Drokin @ 2003-01-16 15:43 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-kernel, eazgwmir, viro, nikita

Hello!

On Thu, Jan 16, 2003 at 10:39:41AM -0500, Chris Mason wrote:
> > While debugging a reiserfs problem that can be demonstrated with a script
> > created by Zygo Blaxell, I started to wonder whether the problem is really
> > a reiserfs fault and not a VFS one.
> > Although Zygo claims the script only produces problems on reiserfs, I am
> > now trying it myself on ext2 (which will take some time).
> >
> > Debugging shows that reiserfs_link() is sometimes called for inodes whose
> > i_nlink is zero (and whose corresponding data has already been deleted).
> > So my current guess of what's going on is this:
> No, this is a reiserfs bug, since we schedule after doing the link checks
> in reiserfs_link and reiserfs_unlink.  I sent a patch to the reiserfs dev
> list a while ago; I'll pull it out of the SUSE kernel and rediff it
> against 2.4.20.

Yes, we do.  But on the other hand, I've put a check at the beginning of
reiserfs_link and I am still seeing these links on inodes with
i_nlink == 0.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [2.4] VFS locking problem during concurrent link/unlink
  2003-01-16 15:43 ` Oleg Drokin
@ 2003-01-16 16:02   ` Chris Mason
  2003-01-16 16:06     ` Nikita Danilov
  2003-01-16 16:07     ` Oleg Drokin
  0 siblings, 2 replies; 9+ messages in thread
From: Chris Mason @ 2003-01-16 16:02 UTC (permalink / raw)
To: Oleg Drokin; +Cc: linux-kernel, eazgwmir, viro, nikita

On Thu, 2003-01-16 at 10:43, Oleg Drokin wrote:
> Hello!
>
> On Thu, Jan 16, 2003 at 10:39:41AM -0500, Chris Mason wrote:
> > > While debugging a reiserfs problem that can be demonstrated with a
> > > script created by Zygo Blaxell, I started to wonder whether the
> > > problem is really a reiserfs fault and not a VFS one.
> > > Although Zygo claims the script only produces problems on reiserfs,
> > > I am now trying it myself on ext2 (which will take some time).
> > >
> > > Debugging shows that reiserfs_link() is sometimes called for inodes
> > > whose i_nlink is zero (and whose corresponding data has already been
> > > deleted).
> > > So my current guess of what's going on is this:
> > No, this is a reiserfs bug, since we schedule after doing the link
> > checks in reiserfs_link and reiserfs_unlink.  I sent a patch to the
> > reiserfs dev list a while ago; I'll pull it out of the SUSE kernel
> > and rediff it against 2.4.20.
>
> Yes, we do.
> But on the other hand, I've put a check at the beginning of reiserfs_link
> and I am still seeing these links on inodes with i_nlink == 0.

That's because we don't inc the link count in reiserfs_link before we
schedule.  The bug works a little like this:

link count at 1
reiserfs_link:   make new directory entry for link, schedule()
reiserfs_unlink: dec link count to zero, remove file stat data
reiserfs_link:   inc link count, return thinking the stat data is still
                 there

All of which leads to expanding chaos, as we process this link pointing
to nowhere but still have a valid in-ram inode pointing to it.

-chris

^ permalink raw reply	[flat|nested] 9+ messages in thread
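[Editorial note: the four-step sequence above hinges on where the link
count is bumped relative to the schedule point. The following is a toy
sketch in plain C, not kernel code; the struct, function names, and the
explicit "concurrent unlink" call standing in for schedule() are all
illustrative, under the assumption that the only interleaving point is
the schedule window Chris describes.]

```c
#include <assert.h>

/* Toy stand-in for the on-disk/in-ram inode state in the sequence above. */
struct toy_inode {
    int i_nlink;            /* link count */
    int stat_data_deleted;  /* set when the file stat data is removed */
};

/* A concurrent unlink that runs while the link path "schedules":
 * it drops the count and, at zero, removes the stat data. */
static void concurrent_unlink(struct toy_inode *inode)
{
    inode->i_nlink--;
    if (inode->i_nlink == 0)
        inode->stat_data_deleted = 1;
}

/* Buggy ordering: the directory entry is made, the CPU is lost, and
 * only then is i_nlink bumped -- so unlink can reach zero meanwhile.
 * Returns nonzero if the new link ends up pointing to deleted data. */
static int link_inc_after_schedule(struct toy_inode *inode)
{
    concurrent_unlink(inode);         /* runs inside the schedule window */
    inode->i_nlink++;                 /* too late: stat data may be gone */
    return inode->stat_data_deleted;
}

/* Fixed ordering (what ext2_link effectively does): take the reference
 * before any blocking point, so a concurrent unlink cannot drop the
 * count to zero underneath us. */
static int link_inc_before_schedule(struct toy_inode *inode)
{
    inode->i_nlink++;                 /* inc before scheduling */
    concurrent_unlink(inode);         /* now leaves i_nlink >= 1 */
    return inode->stat_data_deleted;
}
```

With a starting link count of 1, the first ordering lets the unlink
delete the stat data (the "expanding chaos" case), while the second
keeps the count at 1 or above throughout, so the data survives.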
* Re: [2.4] VFS locking problem during concurrent link/unlink
  2003-01-16 16:02 ` Chris Mason
@ 2003-01-16 16:06   ` Nikita Danilov
  2003-01-16 16:22     ` Chris Mason
  2003-01-16 16:07   ` Oleg Drokin
  1 sibling, 1 reply; 9+ messages in thread
From: Nikita Danilov @ 2003-01-16 16:06 UTC (permalink / raw)
To: Chris Mason; +Cc: Oleg Drokin, linux-kernel, eazgwmir, viro

Chris Mason writes:
 > On Thu, 2003-01-16 at 10:43, Oleg Drokin wrote:

[...]

 > link count at 1
 > reiserfs_link:   make new directory entry for link, schedule()
 > reiserfs_unlink: dec link count to zero, remove file stat data
 > reiserfs_link:   inc link count, return thinking the stat data is
 >                  still there

What protects ext2 from doing the same on SMP?

 >
 > All of which leads to expanding chaos, as we process this link pointing
 > to nowhere but still have a valid in-ram inode pointing to it.
 >
 > -chris

Nikita.

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [2.4] VFS locking problem during concurrent link/unlink
  2003-01-16 16:06 ` Nikita Danilov
@ 2003-01-16 16:22   ` Chris Mason
  2003-01-16 17:03     ` Oleg Drokin
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Mason @ 2003-01-16 16:22 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Oleg Drokin, linux-kernel, eazgwmir, viro

On Thu, 2003-01-16 at 11:06, Nikita Danilov wrote:
> Chris Mason writes:
>  > On Thu, 2003-01-16 at 10:43, Oleg Drokin wrote:
>
> [...]
>
>  > link count at 1
>  > reiserfs_link:   make new directory entry for link, schedule()
>  > reiserfs_unlink: dec link count to zero, remove file stat data
>  > reiserfs_link:   inc link count, return thinking the stat data is
>  >                  still there
>
> What protects ext2 from doing the same on SMP?

They inc the link count in ext2_link before scheduling.  The patch below
is what I had in mind, but it is untested.

-chris

--- 1.24/fs/reiserfs/namei.c	Fri Aug  9 11:22:33 2002
+++ edited/fs/reiserfs/namei.c	Thu Jan 16 11:20:11 2003
@@ -748,6 +748,7 @@
     int windex ;
     struct reiserfs_transaction_handle th ;
     int jbegin_count;
+    unsigned long savelink;
 
     inode = dentry->d_inode;
 
@@ -783,11 +784,20 @@
 	inode->i_nlink = 1;
     }
+    inode->i_nlink--;
+
+    /*
+     * we schedule before doing the add_save_link call, save the link
+     * count so we don't race
+     */
+    savelink = inode->i_nlink;
+
     retval = reiserfs_cut_from_item (&th, &path, &(de.de_entry_key), dir, NULL, 0);
-    if (retval < 0)
+    if (retval < 0) {
+	inode->i_nlink++;
 	goto end_unlink;
-
-    inode->i_nlink--;
+    }
     inode->i_ctime = CURRENT_TIME;
     reiserfs_update_sd (&th, inode);
 
@@ -796,7 +806,7 @@
     dir->i_ctime = dir->i_mtime = CURRENT_TIME;
     reiserfs_update_sd (&th, dir);
 
-    if (!inode->i_nlink)
+    if (!savelink)
 	/* prevent file from getting lost */
 	add_save_link (&th, inode, 0/* not truncate */);
 
@@ -900,6 +910,12 @@
 	//FIXME: sd_nlink is 32 bit for new files
 	return -EMLINK;
     }
+    if (inode->i_nlink == 0) {
+	return -ENOENT;
+    }
+
+    /* inc before scheduling so reiserfs_unlink knows we are here */
+    inode->i_nlink++;
 
     journal_begin(&th, dir->i_sb, jbegin_count) ;
     windex = push_journal_writer("reiserfs_link") ;
@@ -912,12 +928,12 @@
     reiserfs_update_inode_transaction(dir) ;
 
     if (retval) {
+	inode->i_nlink--;
 	pop_journal_writer(windex) ;
 	journal_end(&th, dir->i_sb, jbegin_count) ;
 	return retval;
     }
-
-    inode->i_nlink++;
     ctime = CURRENT_TIME;
     inode->i_ctime = ctime;
     reiserfs_update_sd (&th, inode);
@@ -992,6 +1008,7 @@
     int jbegin_count ;
     umode_t old_inode_mode;
     time_t ctime;
+    unsigned long savelink = 1;
 
     /* two balancings: old name removal, new name insertion or "save" link,
@@ -1161,6 +1178,7 @@
 	} else {
 	    new_dentry_inode->i_nlink--;
 	}
+	savelink = new_dentry_inode->i_nlink;
 	ctime = CURRENT_TIME;
 	new_dentry_inode->i_ctime = ctime;
@@ -1196,7 +1214,7 @@
     reiserfs_update_sd (&th, new_dir);
 
     if (new_dentry_inode) {
-	if (new_dentry_inode->i_nlink == 0)
+	if (savelink == 0)
 	    add_save_link (&th, new_dentry_inode, 0/* not truncate */);
 	reiserfs_update_sd (&th, new_dentry_inode);
     }

^ permalink raw reply	[flat|nested] 9+ messages in thread
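[Editorial note: the patch's `savelink` idea can be illustrated outside
the kernel. The sketch below is a toy model in plain C; the struct, the
global flag, and the function standing in for reiserfs_cut_from_item()
are all illustrative assumptions, not reiserfs code. The point is that
the link count is snapshotted at the decision point, before an operation
that can block, and the later "file may be lost" check uses the snapshot
rather than re-reading i_nlink, which a concurrent link may have bumped
in the meantime.]

```c
#include <assert.h>

/* Toy stand-in for struct inode. */
struct toy_inode {
    int i_nlink;
};

/* Records whether add_save_link() would have been called. */
static int save_link_added;

/* Stand-in for reiserfs_cut_from_item(): while we "sleep" here,
 * another process may run the link path and bump i_nlink. */
static void cut_entry_may_schedule(struct toy_inode *inode,
                                   int concurrent_link)
{
    if (concurrent_link)
        inode->i_nlink++;
}

/* Unlink following the patch's pattern: decrement, snapshot, block,
 * then decide on the save link from the snapshot. */
static void toy_unlink(struct toy_inode *inode, int concurrent_link)
{
    int savelink;

    inode->i_nlink--;
    savelink = inode->i_nlink;              /* snapshot before blocking */

    cut_entry_may_schedule(inode, concurrent_link);

    if (savelink == 0)                      /* NOT inode->i_nlink == 0 */
        save_link_added = 1;                /* add_save_link() in the patch */
}
```

If the check re-read `inode->i_nlink` instead, a link racing in during
the blocking call would make the count nonzero and the save link would
be skipped, which is exactly the race the comment in the patch warns
about.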
* Re: [2.4] VFS locking problem during concurrent link/unlink
  2003-01-16 16:22 ` Chris Mason
@ 2003-01-16 17:03   ` Oleg Drokin
  2003-01-16 17:20     ` Oleg Drokin
  0 siblings, 1 reply; 9+ messages in thread
From: Oleg Drokin @ 2003-01-16 17:03 UTC (permalink / raw)
To: Chris Mason; +Cc: Nikita Danilov, linux-kernel, eazgwmir, viro

Hello!

On Thu, Jan 16, 2003 at 11:22:32AM -0500, Chris Mason wrote:
> They inc the link count in ext2_link before scheduling.  The patch below
> is what I had in mind, but it is untested.

It does not change anything.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [2.4] VFS locking problem during concurrent link/unlink
  2003-01-16 17:03 ` Oleg Drokin
@ 2003-01-16 17:20   ` Oleg Drokin
  0 siblings, 0 replies; 9+ messages in thread
From: Oleg Drokin @ 2003-01-16 17:20 UTC (permalink / raw)
To: Chris Mason; +Cc: Nikita Danilov, linux-kernel, eazgwmir, viro

Hello!

On Thu, Jan 16, 2003 at 08:03:41PM +0300, Oleg Drokin wrote:
> > They inc the link count in ext2_link before scheduling.  The patch
> > below is what I had in mind, but it is untested.
> It does not change anything.

Err, I mean I can still reproduce the problem with this patch.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [2.4] VFS locking problem during concurrent link/unlink
  2003-01-16 16:02 ` Chris Mason
  2003-01-16 16:06 ` Nikita Danilov
@ 2003-01-16 16:07   ` Oleg Drokin
  1 sibling, 0 replies; 9+ messages in thread
From: Oleg Drokin @ 2003-01-16 16:07 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-kernel, eazgwmir, viro, nikita

Hello!

On Thu, Jan 16, 2003 at 11:02:08AM -0500, Chris Mason wrote:
> > Yes, we do.
> > But on the other hand, I've put a check at the beginning of
> > reiserfs_link and I am still seeing these links on inodes with
> > i_nlink == 0.
> That's because we don't inc the link count in reiserfs_link before we
> schedule.  The bug works a little like this:
> link count at 1
> reiserfs_link:   make new directory entry for link, schedule()
> reiserfs_unlink: dec link count to zero, remove file stat data
> reiserfs_link:   inc link count, return thinking the stat data is still
>                  there

That was my original assumption yesterday, but the debug prints I put in
place did not confirm it.  Also, if we have a dentry pinned in memory (by
sys_link), the inode should not be deleted, so the stat data should stay
in place, as Nikita argues.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 9+ messages in thread
end of thread, other threads:[~2003-01-16 17:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-01-16 11:00 [2.4] VFS locking problem during concurrent link/unlink Oleg Drokin
2003-01-16 15:39 ` Chris Mason
2003-01-16 15:43   ` Oleg Drokin
2003-01-16 16:02     ` Chris Mason
2003-01-16 16:06       ` Nikita Danilov
2003-01-16 16:22         ` Chris Mason
2003-01-16 17:03           ` Oleg Drokin
2003-01-16 17:20             ` Oleg Drokin
2003-01-16 16:07       ` Oleg Drokin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox