* 2.4.20, reiserfs, md linear, and "Permission denied"...
@ 2003-01-11 19:30 Zygo Blaxell
2003-01-14 0:37 ` Zygo Blaxell
2003-01-14 16:56 ` How to break a reiserfs on Linux 2.4.20 Zygo Blaxell
0 siblings, 2 replies; 16+ messages in thread
From: Zygo Blaxell @ 2003-01-11 19:30 UTC
To: reiserfs-list
I think I'm seeing a pattern of failure. I'm wondering if there is a
problem with MD linear personality (aka JBOD) and reiserfs.
Here's the recipe for disaster:
Ingredients:
reiserfs, of course ;-)
2.4.18 and 2.4.20 kernels (compiled for SMP but running on UP)
Arrays of disks in MD linear mode (set up with raidtools2)
lots of metadata I/O (cp -al & rm -rf)
Directions:
Start with a large collection of files (e.g. the contents of /etc, /var,
and /usr). Put this on a reiserfs filesystem under '(mountpoint)/foo'.
In one thread, do 'cp -al foo bar/`date +%Y%m%d%H%M%S`'. Create a new
directory for each copy. The thread should check for free disk space
after each 'cp' command and throttle itself as the filesystem gets too
full (i.e. when there is less space free than the size of 'foo').
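A minimal sketch of the copying thread described above, assuming GNU coreutils (`cp -al`, `df -P`, `du -sk`) and a POSIX sh. The helper names `free_kb`, `tree_kb`, `copy_once` and `copier_loop` are hypothetical, and "too full" is taken literally as "free space is smaller than the current size of foo".

```shell
#!/bin/sh
# Hypothetical helpers for the copying ("cp -al") thread.

# Free space, in KiB, on the filesystem holding path $1 (POSIX df output).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Disk usage, in KiB, of the tree rooted at $1.
tree_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# One copy step: hardlink-copy $1 (foo) into a new timestamped directory
# under $2 (bar). Returns 1 without copying when the filesystem is "too full".
copy_once() {
    src=$1
    dest_root=$2
    if [ "$(free_kb "$dest_root")" -lt "$(tree_kb "$src")" ]; then
        return 1
    fi
    cp -al "$src" "$dest_root/$(date +%Y%m%d%H%M%S)"
}

# The thread itself: copy forever, backing off for a minute when full.
copier_loop() {
    while :; do
        copy_once "$1" "$2" || sleep 60
    done
}
```

Started as `copier_loop /mnt/foo /mnt/bar &` (paths illustrative), each copy costs only metadata, since every file in the new directory is another hardlink to the same inode in foo.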
In a second and third thread, do 'rm -rf `ls -d bar/* | head -1`'.
The third thread should sleep for one hour between rm commands, while the
second thread will not sleep. Both threads should check for free disk
space and throttle themselves when the filesystem gets too empty
(e.g. when there is more space free than the size of 'foo'). Note that
there will sometimes be two 'rm -rf's processing the same directories
at the same time.
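The deleting threads can be sketched the same way, under the same assumptions (`oldest_copy`, `delete_once` and `deleter_loop` are hypothetical names). Because the copies are named with a fixed-width timestamp, the lexical order that `ls` produces is also chronological order, so `head -1` picks the oldest copy. Nothing stops two deleters from choosing the same victim, which is the overlap described above.

```shell
#!/bin/sh
# Hypothetical helpers for the deleting ("rm -rf") threads.

free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

tree_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# Oldest copy under $1: fixed-width timestamps sort chronologically.
oldest_copy() {
    ls -d "$1"/* 2>/dev/null | head -1
}

# One delete step: remove the oldest copy under $1 (bar), unless the
# filesystem is already "too empty" (more free space than the size of $2, foo).
delete_once() {
    root=$1
    src=$2
    if [ "$(free_kb "$root")" -gt "$(tree_kb "$src")" ]; then
        return 1
    fi
    victim=$(oldest_copy "$root")
    [ -n "$victim" ] || return 1
    rm -rf "$victim"
}

# The thread itself; $3 is the pause after each deletion
# (0 for the second thread, 3600 for the third).
deleter_loop() {
    while :; do
        if delete_once "$1" "$2"; then
            sleep "$3"
        else
            sleep 60
        fi
    done
}
```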
In a fourth thread, run 'find -ls >/dev/null' over the entire filesystem
continuously.
In a fifth thread, replace a few of the files in 'foo'. In my case these
files are actually rsync-ed from a live Linux system that provided the
original contents of 'foo'.
If I run this for about two weeks, one day the 'find' and 'rm' threads
will start to emit lots of "Permission denied" messages when trying to
access random files under 'bar'. This is especially apparent in the 'rm'
threads, because they'll stop being useful if they're unable to completely
remove the oldest directory.
reiserfsck --fix-fixable runs for 24.5 hours, then says:
[several megabytes deleted]
k_semantic_pass: name "gtk" in directory 5015381 5015470 points to nowhere - removed
dir 5015381 5015470 has wrong sd_size 72, has to be 48
check_semantic_pass: name "Yell-O" in directory 5009649 5015381 points to nowhere - removed
dir 5009649 5015381 has wrong sd_size 72, has to be 48
check_semantic_pass: name "themes" in directory 5005514 5009649 points to nowhere - removed
dir 5005514 5009649 has wrong sd_size 200, has to be 48
check_semantic_pass: name "share" in directory 5004046 5005514 points to nowhere - removed
dir 5004046 5005514 has wrong sd_size 120, has to be 48
check_semantic_pass: name "usr" in directory 5004045 5004046 points to nowhere - removed
dir 5004045 5004046 has wrong sd_size 96, has to be 48
No corruptions found
There are on the filesystem:
Leaves 723046
Internal nodes 5073
Directories 1226338
Other files 15857491
Data block pointers 35351178 (zero of them 169766)
Safe links 0
###########
reiserfsck finished at Sat Jan 11 10:17:45 2003
###########
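To see which names reiserfsck dropped, the "points to nowhere" messages can be pulled out of a saved log. This is a sketch that assumes the reiserfsck output was saved to a file and that several messages may run together on one line, as they sometimes do above; `nowhere_names` is a hypothetical helper. It prints the two numbers that identify the directory in each message, followed by the removed name.

```shell
#!/bin/sh
# Print "<dir number 1> <dir number 2> <name>" for every name that reiserfsck
# reports as pointing to nowhere in log file $1. grep -o finds each message
# even when several are run together on one line.
nowhere_names() {
    grep -o 'name "[^"]*" in directory [0-9]* [0-9]* points to nowhere' "$1" |
        sed 's/^name "\([^"]*\)" in directory \([0-9]*\) \([0-9]*\).*/\2 \3 \1/'
}
```

For example, `nowhere_names fsck.log | wc -l` (file name illustrative) counts how many names were removed.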
Observations:
I've seen this phenomenon occur ten times in the last two months.
I thought that upgrading from 2.4.18 to 2.4.20 might fix the problem,
but it has occurred three times on 2.4.20 machines.
There do not appear to be any disk errors in the kernel logs. I have
no reason to believe that there are CPU, RAM, or cooling problems.
All of those tend to manifest themselves in other ways, none of which
I have observed.
There does not appear to be any data corruption or loss in the filesystem
other than the names pointing to nowhere. The affected files would be
newly created hardlinks or recently deleted hardlinks--neither of which is
"data loss" per se. Other hardlinks to these files seem to be unaffected
(I've never seen any part of 'foo' damaged, only 'bar').
Both reiserfs and linear (JBOD) arrays of disks seem to be a requirement
to reproduce the problem. I have several machines running a very similar
workload and software--they're actually mirrored servers with automated
failover, so whatever filesystem activity one machine does, another
machine does soon after. Machines using reiserfs on RAID0, RAID1, or
single disks do not have these problems, nor are there problems with
machines using ext3 on linear arrays. All machines are running the
same kernels and they're all in the same building (so they have the same
power failures and resulting unclean shutdowns). There's a wide variety
of hardware involved ranging from P133 to P4 and several different disk
vendors.
If there is an unclean shutdown, there will definitely be names pointing
to nowhere; however, this problem has recently occurred once on a machine
that was not shut down at all, cleanly or otherwise, since mkreiserfs.
--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
* Re: 2.4.20, reiserfs, md linear, and "Permission denied"...
2003-01-11 19:30 2.4.20, reiserfs, md linear, and "Permission denied" Zygo Blaxell
@ 2003-01-14 0:37 ` Zygo Blaxell
2003-01-14 16:56 ` How to break a reiserfs on Linux 2.4.20 Zygo Blaxell
1 sibling, 0 replies; 16+ messages in thread
From: Zygo Blaxell @ 2003-01-14 0:37 UTC
To: reiserfs-list
In article <avprcs$bfi$1@satsuki.furryterror.org>,
Zygo Blaxell <eazgwmir@umail.furryterror.org> wrote:
>I think I'm seeing a pattern of failure. I'm wondering if there is a
>problem with MD linear personality (aka JBOD) and reiserfs.
This has just now happened to a machine running a RAID0 array. There
was no unclean shutdown this time. That seems to rule out linear mode
as a possible culprit (although MD might have a bug that affects both
RAID0 and linear).
--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
* How to break a reiserfs on Linux 2.4.20
2003-01-11 19:30 2.4.20, reiserfs, md linear, and "Permission denied" Zygo Blaxell
2003-01-14 0:37 ` Zygo Blaxell
@ 2003-01-14 16:56 ` Zygo Blaxell
2003-01-14 17:05 ` Nikita Danilov
2003-01-14 17:53 ` Oleg Drokin
1 sibling, 2 replies; 16+ messages in thread
From: Zygo Blaxell @ 2003-01-14 16:56 UTC
To: reiserfs-list
In article <avprcs$bfi$1@satsuki.furryterror.org>,
Zygo Blaxell <eazgwmir@umail.furryterror.org> wrote:
>I think I'm seeing a pattern of failure. ...
And now I can reliably reproduce it. It has nothing to do with MD,
linear, raid, SMP, or unclean shutdowns.
I can reproduce this bug on a plain IDE disk partition in about three
hours on Linux 2.4.20 (compiled for SMP but running on UP, full .config
and system details available on request). My test system has about 4 gigs
under /etc, /usr, and /var, /dev/hdc2 is 25GB, and there is 1G of swap.
BEGIN cut-and-paste-into-a-root-shell
# Create an empty filesystem (this destroys everything on /dev/hdc2):
mkreiserfs -f -f /dev/hdc2
mkdir -p /test
mount /dev/hdc2 /test
cd /test
# Script used to control the load average. Note that as written the loops
# below will keep spawning new processes, so we need some way to throttle
# them. Change the '-lt 10' to another number to change the number
# of processes.
cat <<'LC' > loadcheck && chmod 755 loadcheck
#!/bin/sh
# The first field of /proc/loadavg is the 1-minute load average.
read av1 av5 av15 rest < /proc/loadavg
printf 'Load Average: %s ... ' "$av1"
# Drop the fractional part so that [ ... -lt ... ] can compare integers.
av1=${av1%.*}
if [ "$av1" -lt 10 ]; then
    echo OK
    exit 0
else
    echo "Whoa, Nellie!"
    exit 1
fi
LC
# Create directories used by test. rsync cannot create its destination
# when it is given as foo/$x/., so foo/etc, foo/usr and foo/var are made here.
mkdir -p foo/etc foo/usr foo/var bar
# Start up some rsyncs. I use /etc, /usr, and /var because there's a
# good mixture of files with some hardlinks between them, and on a normal
# Linux system some of them change from time to time.
while sleep 1m; do
    ./loadcheck || continue
    for x in usr etc var; do
        rsync -avxHS --delete "/$x/." "foo/$x/." &
    done
done &
# Start up some cp -al's and rm -rf's. Note there are two concurrent
# sets of 'cp's and two concurrent sets of 'rm's, and each of those
# has different instances of 'cp' and 'rm' running at different times.
for n in 1 2; do
    # Every minute, start a hardlink copy of foo in a new directory
    while sleep 1m; do
        ./loadcheck || continue
        cp -al foo "bar/$(date +%s)" &
    done &
    # Every minute, start a background pass that removes the copies
    # in bar/ one at a time, a minute apart
    while sleep 1m; do
        ./loadcheck || continue
        for d in bar/*; do
            rm -rf "$d"
            sleep 1m
        done &
    done &
done &
END cut-and-paste-into-a-root-shell
rm will frequently, and cp occasionally, complain about "No such file
or directory". This is normal. After about 3 hours, the following
abnormal messages appear:
readlink lib/R/library/base/help/contrasts: Permission denied
readlink lib/R/library/base/html/hsv.html: Permission denied
rm: cannot remove `bar/1042550428/usr/src/kernel-source-2.4.20-zb-586-smp/drivers/net/appletalk/ltpc.o': Permission denied
rm: cannot remove `bar/1042550428/usr/src/kernel-source-2.4.20-zb-586-smp/drivers/net/aironet4500_proc.c': Permission denied
cp: cannot stat `foo/usr/src/kernel-source-2.4.20-zb-586-smp/drivers/net/e1000/.e1000_ethtool.o.flags': Permission denied
cp: cannot stat `foo/usr/src/kernel-source-2.4.20-zb-586-smp/drivers/net/.eepro.o.flags': Permission denied
This needs a 'reiserfsck --fix-fixable' to fix.
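A crude way to watch for the symptom, assuming GNU find: walk the tree and count the "Permission denied" errors that find writes to stderr. `count_denied` is a hypothetical helper; on a healthy tree walked as root it should print 0, while a non-root user may see legitimate denials on unreadable directories.

```shell
#!/bin/sh
# Count "Permission denied" errors reported while listing the tree under $1.
# 2>&1 >/dev/null sends find's stderr into the pipe and discards its stdout.
# grep -c still prints 0 when nothing matches; "|| true" only hides its
# non-zero exit status in that case.
count_denied() {
    find "$1" -ls 2>&1 >/dev/null | grep -c 'Permission denied' || true
}
```

Running it periodically over bar/ (or alongside the continuous find thread) gives a rough time at which the corruption first appears.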
It looks to me like there may be some sort of locking bug triggered by
concurrent link/unlink/rename calls, but I'm not even a filesystem expert,
much less a reiserfs expert. ;-)
--
Opinions expressed are my own, I don't speak for my employer, and all that.
Encrypted email preferred. Go ahead, you know you want to. ;-)
OpenPGP at work: 3528 A66A A62D 7ACE 7258 E561 E665 AA6F 263D 2C3D
* Re: How to break a reiserfs on Linux 2.4.20
2003-01-14 16:56 ` How to break a reiserfs on Linux 2.4.20 Zygo Blaxell
@ 2003-01-14 17:05 ` Nikita Danilov
2003-01-14 19:04 ` Zygo Blaxell
2003-01-15 22:44 ` Zygo Blaxell
2003-01-14 17:53 ` Oleg Drokin
1 sibling, 2 replies; 16+ messages in thread
From: Nikita Danilov @ 2003-01-14 17:05 UTC
To: Zygo Blaxell; +Cc: reiserfs-list
Zygo Blaxell writes:
> In article <avprcs$bfi$1@satsuki.furryterror.org>,
> Zygo Blaxell <eazgwmir@umail.furryterror.org> wrote:
> >I think I'm seeing a pattern of failure. ...
>
> And now I can reliably reproduce it. It has nothing to do with MD,
> linear, raid, SMP, or unclean shutdowns.
>
> I can reproduce this bug on a plain IDE disk partition in about three
> hours on Linux 2.4.20 (compiled for SMP but running on UP, full .config
> and system details available on request). My test system has about 4 gigs
> under /etc, /usr, and /var, /dev/hdc2 is 25GB, and there is 1G of swap.
>
Thanks for the report. We shall try to reproduce it tonight.
By the way, do you have REISERFS_CHECK compiled in?
>
>
Nikita.
* Re: How to break a reiserfs on Linux 2.4.20
2003-01-14 16:56 ` How to break a reiserfs on Linux 2.4.20 Zygo Blaxell
2003-01-14 17:05 ` Nikita Danilov
@ 2003-01-14 17:53 ` Oleg Drokin
2003-01-14 19:02 ` Zygo Blaxell
1 sibling, 1 reply; 16+ messages in thread
From: Oleg Drokin @ 2003-01-14 17:53 UTC
To: Zygo Blaxell; +Cc: reiserfs-list
Hello!
On Tue, Jan 14, 2003 at 04:56:01PM +0000, Zygo Blaxell wrote:
> rm and occasionally cp will frequently complain about "No such file
> or directory". This is normal. After about 3 hours, the following
> non-normal messages appear:
Are these normal too?
building file list ... building file list ... building file list ... done
mkdir foo/var/. : No such file or directory (1)
unexpected EOF in read_timeout
done
mkdir foo/etc/. : No such file or directory (1)
unexpected EOF in read_timeout
done
mkdir foo/usr/. : No such file or directory (1)
unexpected EOF in read_timeout
If they are normal, then I am going to wait for a few hours ;)
Bye,
Oleg
* Re: How to break a reiserfs on Linux 2.4.20
2003-01-14 17:53 ` Oleg Drokin
@ 2003-01-14 19:02 ` Zygo Blaxell
0 siblings, 0 replies; 16+ messages in thread
From: Zygo Blaxell @ 2003-01-14 19:02 UTC
To: reiserfs-list
In article <20030114205322.A19107@namesys.com>,
Oleg Drokin <green@namesys.com> wrote:
>Are these normal too?
>building file list ... building file list ... building file list ... done
>mkdir foo/var/. : No such file or directory (1)
>unexpected EOF in read_timeout
>done
>mkdir foo/etc/. : No such file or directory (1)
>unexpected EOF in read_timeout
>done
>mkdir foo/usr/. : No such file or directory (1)
>unexpected EOF in read_timeout
Oh, um, oops. ;-)
"mkdir foo/etc foo/usr foo/var" first.
--
Opinions expressed are my own, I don't speak for my employer, and all that.
Encrypted email preferred. Go ahead, you know you want to. ;-)
OpenPGP at work: 3528 A66A A62D 7ACE 7258 E561 E665 AA6F 263D 2C3D
* Re: How to break a reiserfs on Linux 2.4.20
2003-01-14 17:05 ` Nikita Danilov
@ 2003-01-14 19:04 ` Zygo Blaxell
2003-01-14 19:15 ` Nikita Danilov
2003-01-15 22:44 ` Zygo Blaxell
1 sibling, 1 reply; 16+ messages in thread
From: Zygo Blaxell @ 2003-01-14 19:04 UTC
To: reiserfs-list
In article <15908.17236.966170.982897@laputa.namesys.com>,
Nikita Danilov <Nikita@Namesys.COM> wrote:
>Thanks for the report. We shall try to reproduce it tonight.
>
>By the way, do you have REISERFS_CHECK compiled in?
No. I presume I should try again with it enabled, and see if it says
anything interesting?
--
Opinions expressed are my own, I don't speak for my employer, and all that.
Encrypted email preferred. Go ahead, you know you want to. ;-)
OpenPGP at work: 3528 A66A A62D 7ACE 7258 E561 E665 AA6F 263D 2C3D
* Re: How to break a reiserfs on Linux 2.4.20
2003-01-14 19:04 ` Zygo Blaxell
@ 2003-01-14 19:15 ` Nikita Danilov
2003-01-14 22:39 ` Zygo Blaxell
0 siblings, 1 reply; 16+ messages in thread
From: Nikita Danilov @ 2003-01-14 19:15 UTC
To: Zygo Blaxell; +Cc: reiserfs-list
Zygo Blaxell writes:
> In article <15908.17236.966170.982897@laputa.namesys.com>,
> Nikita Danilov <Nikita@Namesys.COM> wrote:
> >Thanks for the report. We shall try to reproduce it tonight.
> >
> >By the way, do you have REISERFS_CHECK compiled in?
>
> No. I presume I should try again with it enabled, and see if it says
> anything interesting?
That would be helpful.
>
Nikita.
* Re: How to break a reiserfs on Linux 2.4.20
2003-01-14 19:15 ` Nikita Danilov
@ 2003-01-14 22:39 ` Zygo Blaxell
0 siblings, 0 replies; 16+ messages in thread
From: Zygo Blaxell @ 2003-01-14 22:39 UTC
To: reiserfs-list
In article <15908.25054.542735.558616@laputa.namesys.com>,
Nikita Danilov <Nikita@Namesys.COM> wrote:
>Zygo Blaxell writes:
> > In article <15908.17236.966170.982897@laputa.namesys.com>,
> > Nikita Danilov <Nikita@Namesys.COM> wrote:
> > >By the way, do you have REISERFS_CHECK compiled in?
> > No. I presume I should try again with it enabled, and see if it says
> > anything interesting?
>That would be helpful.
I ran the script again and started getting "Permission denied" errors after
20 minutes. The kernel said:
Jan 14 17:13:07 berkelium kernel: reiserfs:warning: CONFIG_REISERFS_CHECK is set ON
Jan 14 17:13:07 berkelium kernel: reiserfs:warning: - it is slow mode for debugging.
Jan 14 17:13:07 berkelium kernel: reiserfs: checking transaction log (device 16:02) ...
Jan 14 17:13:08 berkelium kernel: journal-1225: No valid transactions found
Jan 14 17:13:08 berkelium kernel: journal-1299: Setting newest_mount_id to 10
Jan 14 17:13:08 berkelium kernel: Using r5 hash to sort names
Jan 14 17:13:08 berkelium kernel: ReiserFS version 3.6.25
...and that's all, even when repeatedly accessing and trying to
delete the offending names that point to nowhere.
--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
* Re: How to break a reiserfs on Linux 2.4.20
2003-01-14 17:05 ` Nikita Danilov
2003-01-14 19:04 ` Zygo Blaxell
@ 2003-01-15 22:44 ` Zygo Blaxell
2003-01-16 7:49 ` Oleg Drokin
1 sibling, 1 reply; 16+ messages in thread
From: Zygo Blaxell @ 2003-01-15 22:44 UTC
To: reiserfs-list
In article <15908.17236.966170.982897@laputa.namesys.com>,
Nikita Danilov <Nikita@Namesys.COM> wrote:
>Zygo Blaxell writes:
> > In article <avprcs$bfi$1@satsuki.furryterror.org>,
> > Zygo Blaxell <eazgwmir@umail.furryterror.org> wrote:
> > >I think I'm seeing a pattern of failure. ...
> >
> > And now I can reliably reproduce it. It has nothing to do with MD,
> > linear, raid, SMP, or unclean shutdowns.
> >
> > I can reproduce this bug on a plain IDE disk partition in about three
> > hours on Linux 2.4.20 (compiled for SMP but running on UP, full .config
> > and system details available on request). My test system has about 4 gigs
> > under /etc, /usr, and /var, /dev/hdc2 is 25GB, and there is 1G of swap.
>
>Thanks for the report. We shall try to reproduce it tonight.
Were you successful? If your experience is anything like mine, you
should have hundreds if not thousands of broken files by now...
--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
* Re: How to break a reiserfs on Linux 2.4.20
2003-01-15 22:44 ` Zygo Blaxell
@ 2003-01-16 7:49 ` Oleg Drokin
2003-01-16 14:16 ` Zygo Blaxell
0 siblings, 1 reply; 16+ messages in thread
From: Oleg Drokin @ 2003-01-16 7:49 UTC
To: Zygo Blaxell; +Cc: reiserfs-list
Hello!
On Wed, Jan 15, 2003 at 05:44:26PM -0500, Zygo Blaxell wrote:
> > > And now I can reliably reproduce it. It has nothing to do with MD,
> > > linear, raid, SMP, or unclean shutdowns.
> > >
> > > I can reproduce this bug on a plain IDE disk partition in about three
> > > hours on Linux 2.4.20 (compiled for SMP but running on UP, full .config
> > > and system details available on request). My test system has about 4 gigs
> > > under /etc, /usr, and /var, /dev/hdc2 is 25GB, and there is 1G of swap.
> >Thanks for the report. We shall try to reproduce it tonight.
> Were you successful? If your experience is anything like mine, you
> should have hundreds if not thousands of broken files by now...
Yes, we were able to reproduce the problem and now we are trying to fix it.
Thanks a lot for your help and for the script.
Bye,
Oleg
* Re: How to break a reiserfs on Linux 2.4.20
2003-01-16 7:49 ` Oleg Drokin
@ 2003-01-16 14:16 ` Zygo Blaxell
2003-01-16 15:22 ` Oleg Drokin
0 siblings, 1 reply; 16+ messages in thread
From: Zygo Blaxell @ 2003-01-16 14:16 UTC
To: reiserfs-list
In article <20030116104906.A7078@namesys.com>,
Oleg Drokin <green@namesys.com> wrote:
>Yes, we were able to reproduce the problem and now we are trying to fix it.
>Thanks a lot for your help and for the script.
Excellent! :-)
Just on a whim, I ran the tests on a different kernel image yesterday
and got some different results in the syslog:
Jan 15 18:15:54 berkelium kernel: reiserfs: checking transaction log (device 16:02) ...
Jan 15 18:15:54 berkelium kernel: Using r5 hash to sort names
Jan 15 18:15:54 berkelium kernel: ReiserFS version 3.6.25
[test script starts here]
Jan 15 18:26:00 berkelium kernel: mmit_list, block already dirty!
Jan 15 18:26:00 berkelium kernel: journal-569: flush_commit_list, block already dirty!
Jan 15 18:26:00 berkelium last message repeated 291 times
Jan 15 18:26:00 berkelium kernel: vs-13060: reiserfs_update_sd: stat data of object [537145 537147 0x0 SD] (nlink == 6) not found (pos 2)
[messages similar to the last one repeated for about 500 lines]
Then the kernel panicked. I'm going to try this again and try to capture the
panic message. The filesystem was pretty badly trashed when I tried to
mount it after rebooting--lots of files had appeared in the root of the
filesystem where they shouldn't be, and nothing was accessible.
The main difference between the two kernels (aside from whether various SCSI
and RAID drivers are built-in or modules) is the SMP flag and CPU type
(Pentium 3 uniprocessor vs. 586 SMP). Neither one had the REISERFS_CHECK
option set.
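When several hundred vs-13060 lines like the ones quoted above scroll by, it can help to reduce them to the distinct objects involved. This is a sketch that assumes the kernel messages were saved to a file (for example, copied from syslog); `sd_missing_objects` is a hypothetical name. It prints each object key (the first two numbers inside the brackets, which identify the object) once, prefixed by how many times it was reported.

```shell
#!/bin/sh
# Summarize "vs-13060: reiserfs_update_sd: stat data of object [A B ...] ...
# not found" messages in log file $1: one line per object key, with a count.
sd_missing_objects() {
    grep -o 'vs-13060: reiserfs_update_sd: stat data of object \[[0-9]* [0-9]*' "$1" |
        sed 's/.*\[//' |
        sort | uniq -c
}
```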
--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
* Re: How to break a reiserfs on Linux 2.4.20
2003-01-16 14:16 ` Zygo Blaxell
@ 2003-01-16 15:22 ` Oleg Drokin
2003-01-16 15:29 ` Chris Mason
2003-01-17 20:13 ` Zygo Blaxell
0 siblings, 2 replies; 16+ messages in thread
From: Oleg Drokin @ 2003-01-16 15:22 UTC
To: Zygo Blaxell; +Cc: reiserfs-list
Hello!
On Thu, Jan 16, 2003 at 09:16:11AM -0500, Zygo Blaxell wrote:
> Oleg Drokin <green@namesys.com> wrote:
> >Yes, we were able to reproduce the problem and now we are trying to fix it.
> >Thanks a lot for your help and for the script.
> Excellent! :-)
> Just on a whim, I ran the tests on a different kernel image yesterday
> and got some different results in the syslog:
> Jan 15 18:26:00 berkelium kernel: journal-569: flush_commit_list, block already dirty!
Hm, these are new to me.
> Jan 15 18:26:00 berkelium kernel: vs-13060: reiserfs_update_sd: stat data of object [537145 537147 0x0 SD] (nlink == 6) not found (pos 2)
I've seen these too right from the beginning.
> Then the kernel panicked. I'm going to try this again and try to capture the
Same here, and I actually saw even more debugging messages.
> The main difference between the two kernels (aside from whether various SCSI
> and RAID drivers are built-in or modules) is the SMP flag and CPU type
> (Pentium 3 uniprocessor vs. 586 SMP). Neither one had the REISER_CHECK
> option set.
Was the kernel in SMP mode? (I do my tests on UP)
Bye,
Oleg
* Re: How to break a reiserfs on Linux 2.4.20
2003-01-16 15:22 ` Oleg Drokin
@ 2003-01-16 15:29 ` Chris Mason
2003-01-16 15:32 ` Oleg Drokin
2003-01-17 20:13 ` Zygo Blaxell
1 sibling, 1 reply; 16+ messages in thread
From: Chris Mason @ 2003-01-16 15:29 UTC
To: Oleg Drokin; +Cc: Zygo Blaxell, reiserfs-list
On Thu, 2003-01-16 at 10:22, Oleg Drokin wrote:
> Hello!
>
> On Thu, Jan 16, 2003 at 09:16:11AM -0500, Zygo Blaxell wrote:
> > Oleg Drokin <green@namesys.com> wrote:
> > >Yes, we were able to reproduce the problem and now we are trying to fix it.
> > >Thanks a lot for your help and for the script.
> > Excellent! :-)
> > Just on a whim, I ran the tests on a different kernel image yesterday
> > and got some different results in the syslog:
> > Jan 15 18:26:00 berkelium kernel: journal-569: flush_commit_list, block already dirty!
>
> Hm, these are something new for me.
>
These aren't good at all, either the locking on the commit lists is
broken or the buffer head management is going wrong.
Oleg, do you have any leads or do you want me to try reproducing?
-chris
* Re: How to break a reiserfs on Linux 2.4.20
2003-01-16 15:29 ` Chris Mason
@ 2003-01-16 15:32 ` Oleg Drokin
0 siblings, 0 replies; 16+ messages in thread
From: Oleg Drokin @ 2003-01-16 15:32 UTC
To: Chris Mason; +Cc: Zygo Blaxell, reiserfs-list
Hello!
On Thu, Jan 16, 2003 at 10:29:03AM -0500, Chris Mason wrote:
> These aren't good at all, either the locking on the commit lists is
> broken or the buffer head management is going wrong.
> Oleg, do you have any leads or do you want me to try reproducing?
Well, I found something that looks like a race in the VFS (I sent a mail to lkml
earlier today): reiserfs_link is called with inodes that have zero i_nlink.
But I do not see how that would explain the journaling messages.
I have not tried the script on SMP yet, though.
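A much smaller stress test aimed only at the pattern described here might race `ln` against `rm` on the same source file, so that the link can be attempted while the inode is being unlinked. This is a hypothetical sketch (`link_unlink_race` is an illustrative name); it is not the namesys test and is not known to trigger the bug. On a correct filesystem each round ends with either no link or a valid one, and the cleanup leaves the directory empty.

```shell
#!/bin/sh
# Race ln against rm on the same file, $2 rounds, inside directory $1.
link_unlink_race() {
    dir=$1
    rounds=$2
    i=0
    while [ "$i" -lt "$rounds" ]; do
        : > "$dir/src"                          # create the source file
        rm -f "$dir/src" &                      # unlink it in the background...
        ln "$dir/src" "$dir/dst" 2>/dev/null    # ...while trying to link it
        wait                                    # let the rm finish
        rm -f "$dir/src" "$dir/dst"             # clean up whichever name survived
        i=$((i + 1))
    done
}
```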
Bye,
Oleg
* Re: How to break a reiserfs on Linux 2.4.20
2003-01-16 15:22 ` Oleg Drokin
2003-01-16 15:29 ` Chris Mason
@ 2003-01-17 20:13 ` Zygo Blaxell
1 sibling, 0 replies; 16+ messages in thread
From: Zygo Blaxell @ 2003-01-17 20:13 UTC
To: reiserfs-list
In article <20030116182201.A28414@namesys.com>,
Oleg Drokin <green@namesys.com> wrote:
>> The main difference between the two kernels (aside from whether various SCSI
>> and RAID drivers are built-in or modules) is the SMP flag and CPU type
>> (Pentium 3 uniprocessor vs. 586 SMP). Neither one had the REISER_CHECK
>> option set.
>
>Was the kernel in SMP mode? (I do my tests on UP)
All my test machines have single processors. My dual-processor machines
are few and all are in production.
--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
Thread overview: 16+ messages
2003-01-11 19:30 2.4.20, reiserfs, md linear, and "Permission denied" Zygo Blaxell
2003-01-14 0:37 ` Zygo Blaxell
2003-01-14 16:56 ` How to break a reiserfs on Linux 2.4.20 Zygo Blaxell
2003-01-14 17:05 ` Nikita Danilov
2003-01-14 19:04 ` Zygo Blaxell
2003-01-14 19:15 ` Nikita Danilov
2003-01-14 22:39 ` Zygo Blaxell
2003-01-15 22:44 ` Zygo Blaxell
2003-01-16 7:49 ` Oleg Drokin
2003-01-16 14:16 ` Zygo Blaxell
2003-01-16 15:22 ` Oleg Drokin
2003-01-16 15:29 ` Chris Mason
2003-01-16 15:32 ` Oleg Drokin
2003-01-17 20:13 ` Zygo Blaxell
2003-01-14 17:53 ` Oleg Drokin
2003-01-14 19:02 ` Zygo Blaxell