* 2.4.23 && md raid1 && reiserfs panic
@ 2004-02-07 11:23 James Bromberger
2004-02-08 17:22 ` Oleg Drokin
0 siblings, 1 reply; 4+ messages in thread
From: James Bromberger @ 2004-02-07 11:23 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1.1: Type: text/plain, Size: 1168 bytes --]
Hello all,
Can someone point me towards the right mailing list to help get this
issue resolved? I think I've hit something in either the MD, block or
reiserfs sections, but I could be completely wrong.
The symptoms: rm a file from a working RAID1 md reiserfs filesystem,
and I get a panic, rm(1) segfaults, and all further I/O to any interactive
shells stops. The entire system is rendered unusable; reboot (via
ctrl-alt-del) doesn't shut down and the only option is to hard reset the box.
Attached is a copy of the oops as recorded by syslog, and the script I run to
trigger the error. The exact line that segfaults is the rm of a 2.2 GB
tar file.
For reference, the system is documented at
http://www.james.rcpt.to/programs/debian/raid1/
This is using only the Debian package of the kernel,
kernel-image-2.4.23-686_2.4.23-1.
If there is any further info I can send, please let me know.
Please CC me as I am not subscribed to LKML. Ta.
Regards,
James
--
James Bromberger <james_AT_rcpt.to> www.james.rcpt.to
Remainder moved to http://www.james.rcpt.to/james/sig.html
I am in London on UK mobile +44 7952 042920.
[-- Attachment #1.2: error-2.4.23-1-686-md-reiserfs.txt --]
[-- Type: text/plain, Size: 4562 bytes --]
The kernel: 2.4.23-1-686-
The script:
DAY=`date +%d`
MONTH=`date +%m`
YEAR=`date +%Y`
FROM="/usr/local/fileshare /usr/local/psql"
TO="/usr/local/share/${YEAR}-${MONTH}-${DAY}-fileshare-backup.tbz"
TAR="/bin/tar"
CMD="$TAR -Pcjlf $TO $FROM"
/bin/echo -n "Removing old backup..."
/bin/rm /usr/local/share/????-??-??-fileshare-backup.tbz
The script running:
++ date +%d
+ DAY=07
++ date +%m
+ MONTH=02
++ date +%Y
+ YEAR=2004
+ FROM=/usr/local/fileshare /usr/local/psql
+ TO=/usr/local/share/2004-02-07-fileshare-backup.tbz
+ TAR=/bin/tar
+ CMD=/bin/tar -Pcjlf /usr/local/share/2004-02-07-fileshare-backup.tbz /usr/local/fileshare /usr/local/psql
+ /bin/echo -n 'Removing old backup...'
Removing old backup...+ /bin/rm /usr/local/share/2004-01-30-fileshare-backup.tbz
./makecompressedbackup.sh: line 23: 9729 Segmentation fault /bin/rm /usr/local/share/????-??-??-fileshare-backup.tbz
++ date
+ NOW=Sat Feb 7 17:21:19 WST 2004
+ echo 'Sat Feb 7 17:21:19 WST 2004 Doing /bin/tar -Pcjlf /usr/local/share/2004-02-07-fileshare-backup.tbz /usr/local/fileshare /usr/local/psql...'
Sat Feb 7 17:21:19 WST 2004 Doing /bin/tar -Pcjlf /usr/local/share/2004-02-07-fileshare-backup.tbz /usr/local/fileshare /usr/local/psql...
+ /bin/tar -Pcjf /usr/local/share/2004-02-07-fileshare-backup.tbz /usr/local/fileshare /usr/local/psql
From /var/log/syslog:
Feb 7 17:21:19 phobe kernel: md(9,5):vs-4075: reiserfs_free_block: block 1076949926 is out of range on md(9,5)
Feb 7 17:21:19 phobe kernel: md(9,5):vs-4075: reiserfs_free_block: block 2212879009 is out of range on md(9,5)
Feb 7 17:21:19 phobe kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000028c
Feb 7 17:21:19 phobe kernel: printing eip:
Feb 7 17:21:19 phobe kernel: e085124d
Feb 7 17:21:19 phobe kernel: *pde = 00000000
Feb 7 17:21:19 phobe kernel: Oops: 0002
Feb 7 17:21:19 phobe kernel: CPU: 0
Feb 7 17:21:19 phobe kernel: EIP: 0010:[ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1772979/96] Not tainted
Feb 7 17:21:19 phobe kernel: EFLAGS: 00010282
Feb 7 17:21:19 phobe kernel: eax: 00000000 ebx: 0001d0a2 ecx: df47dc00 edx: e0ca624c
Feb 7 17:21:19 phobe kernel: esi: 0000146b edi: e0c170d4 ebp: 000003f4 esp: d56b1b88
Feb 7 17:21:19 phobe kernel: ds: 0018 es: 0018 ss: 0018
Feb 7 17:21:19 phobe kernel: Process rm (pid: 9729, stackpage=d56b1000)
Feb 7 17:21:19 phobe kernel: Stack: d56b1f1c d56b1f1c e851146b 00000000 e0855af0 df47dc00 e851146b e0c170d4
Feb 7 17:21:19 phobe kernel: 00005aa1 e08327ca d56b1f1c e851146b c763be68 000003f4 e08328c6 d56b1f1c
Feb 7 17:21:19 phobe kernel: df47dc00 e851146b e851146b 00000065 e084d735 d56b1f1c e851146b cbf92b00
Feb 7 17:21:19 phobe kernel: Call Trace: [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1754384/96] [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1898550/96] [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1898298/96] [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1788107/96] [__make_request+892/1936]
Feb 7 17:21:19 phobe kernel: [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1541824/96] [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1541784/96] [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1786838/96] [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1784312/96] [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1782530/96] [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1785591/96]
Feb 7 17:21:19 phobe kernel: [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1868477/96] [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1717594/96] [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1868624/96] [ide-floppy:__insmod_ide-floppy_O/lib/modules/2.4.23-1-686/kernel/drive+-1716640/96] [iput+304/640] [cached_lookup+27/112]
Feb 7 17:21:19 phobe kernel: [vfs_unlink+242/448] [sys_unlink+186/288] [system_call+51/56]
Feb 7 17:21:19 phobe kernel:
Feb 7 17:21:19 phobe kernel: Code: 0f ab 30 8b 5c 24 04 31 c0 8b 74 24 08 8b 7c 24 0c 83 c4 10
...
Feb 7 17:26:23 phobe sm-mta[1274]: rejecting connections on daemon MTA: load average: 12
Feb 7 17:26:23 phobe sm-mta[1274]: rejecting connections on daemon MSA: load average: 12
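To get a feel for why the kernel calls these blocks out of range, it helps to turn a block number into a byte offset. The sketch below assumes a 4096-byte ReiserFS block size (the usual default, but not stated anywhere in the log), and the helper name `block_offset_gib` is made up for illustration.

```shell
#!/bin/sh
# Convert a filesystem block number into a byte offset, in whole GiB.
# BLOCK_SIZE=4096 is an assumption; the log does not state the block size.
BLOCK_SIZE=${BLOCK_SIZE:-4096}

block_offset_gib() {
    # $1: block number as printed in the vs-4075 message
    echo $(( $1 * $BLOCK_SIZE / 1073741824 ))
}

# The two block numbers from the reiserfs_free_block messages above.
for blk in 1076949926 2212879009; do
    echo "block $blk starts at about $(block_offset_gib "$blk") GiB"
done
```

Under that assumption the two blocks would sit roughly 4 TiB and 8 TiB into the device. The size of the array is not given in the thread, but offsets of that magnitude look more like garbage pointers than real block addresses.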
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: 2.4.23 && md raid1 && reiserfs panic
2004-02-07 11:23 2.4.23 && md raid1 && reiserfs panic James Bromberger
@ 2004-02-08 17:22 ` Oleg Drokin
2004-02-09 1:10 ` James Bromberger
0 siblings, 1 reply; 4+ messages in thread
From: Oleg Drokin @ 2004-02-08 17:22 UTC (permalink / raw)
To: linux-kernel, james
James Bromberger <james@rcpt.to> wrote:
JB> The symptoms: rm a file from a working RAID1 md reiserfs filesystem,
JB> and I get a panic, rm(1) segfaults, and all further I/O to any interactive
JB> shells stops. The entire system is rendered unusable; reboot (via
JB> ctrl-alt-del) doesn't shut down and the only option is to hard reset the box.
What if you run reiserfsck over the volume that seems to be corrupted,
then fix the errors and then retry the operation?
Bye,
Oleg
* Re: 2.4.23 && md raid1 && reiserfs panic
2004-02-08 17:22 ` Oleg Drokin
@ 2004-02-09 1:10 ` James Bromberger
2004-02-09 1:56 ` Oleg Drokin
0 siblings, 1 reply; 4+ messages in thread
From: James Bromberger @ 2004-02-09 1:10 UTC (permalink / raw)
To: Oleg Drokin; +Cc: linux-kernel
[-- Attachment #1.1: Type: text/plain, Size: 1031 bytes --]
Oleg Drokin (green@linuxhacker.ru) wrote:
> James Bromberger <james@rcpt.to> wrote:
>
> JB> The symptoms: rm a file from a working RAID1 md reiserfs filesystem,
> JB> and I get a panic, rm(1) segfaults, and all further I/O to any interactive
> JB> shells stops. The entire system is rendered unusable; reboot (via
> JB> ctrl-alt-del) doesn't shut down and the only option is to hard reset the box.
>
> What if you run reiserfsck over the volume that seems to be corrupted,
> then fix the errors and then retry the operation?
Yes! That was it. Attached is the output I captured from reiserfsck.
It identified the very file I was attempting to remove that was causing
the segfault in rm(1).
So I guess this is a reiserfs-specific issue, in that it kills all disk I/O
when this corruption happens. Hmm.
Thanks Oleg for your help.
James
--
James Bromberger <james_AT_rcpt.to> www.james.rcpt.to
Remainder moved to http://www.james.rcpt.to/james/sig.html
I am in London on UK mobile +44 7952 042920.
[-- Attachment #1.2: error-reiserfsck.txt --]
[-- Type: text/plain, Size: 2619 bytes --]
bad_indirect_item: block 1449156: The item [9 24 0x1458e2001 IND (1)] has the bad pointer (904) to the block (616563974)
bad_indirect_item: block 1449156: The item [9 24 0x1458e2001 IND (1)] has the bad pointer (905) to the block (2538674917)
bad_indirect_item: block 1449156: The item [9 24 0x1458e2001 IND (1)] has the bad pointer (906) to the block (1226131222)
bad_indirect_item: block 1449156: The item [9 24 0x1458e2001 IND (1)] has the bad pointer (907) to the block (2721249827)
bad_indirect_item: block 1449156: The item [9 24 0x1458e2001 IND (1)] has the bad pointer (908) to the block (15941101)
bad_indirect_item: block 1449156: The item [9 24 0x1458e2001 IND (1)] has the bad pointer (909) to the block (74275561)
bad_indirect_item: block 1449156: The item [9 24 0x1458e2001 IND (1)] has the bad pointer (910) to the block (3897627755)
bad_indirect_item: block 1449156: The item [9 24 0x1458e2001 IND (1)] has the bad pointer (911) to the block (2212879009)
bad_indirect_item: block 1449156: The item [9 24 0x1458e2001 IND (1)] has the bad pointer (912) to the block (1076949926)
/167 (of 167)/126 (of 127)
bad_indirect_item: block 1361383: The item (76625 76647 0xfa001 IND (1), len 4048, location 48 entry count 0, fsck need 0, format new) has the bad pointer (995) to the block (6306403), which is in tree already
finished
Comparing bitmaps..
vpf-10640: The on-disk and the correct bitmaps differs.
/share/2004-01-30-fileshare-backup.tbz
vpf-10670: The file [9 24] has the wrong size in the StatData (2295013376), should be (5466054656)
vpf-10680: The file [9 24] has the wrong block count in the StatData (0), should be (10675888)
finished
16 found corruptions can be fixed when running with --fix-fixable
Comparing bitmaps..
vpf-10630: The on-disk and the correct bitmaps differs. Will be fixed later.
Checking Semantic tree:
/fileshare/Documents/pics/standup block pics/standing for block.tif
vpf-10680: The file [76625 76647] has the wrong block count in the StatData (12192) - correct
/share/2004-01-30-fileshare-backup.tbz
vpf-10670: The file [9 24] has the wrong size in the StatData (2295013376) - corrected to (5466054656)
vpf-10680: The file [9 24] has the wrong block count in the StatData (0) - corrected to (10675800)
finished
No corruptions found
There are on the filesystem:
Leaves 24533
Internal nodes 168
Directories 2483
Other files 62601
Data block pointers 8767338 (143 of them are zero)
Safe links 0
###########
reiserfsck finished at Mon Feb 9 08:55:00 2004
###########
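The two block numbers in the vs-4075 kernel messages also appear among the bad pointers reported by reiserfsck, which ties the oops to that one indirect item. Here is a small sketch of that cross-check, using lines copied from this thread as sample input. The function names are illustrative, and `grep -o` is assumed to be available (GNU and BSD grep both provide it).

```shell
#!/bin/sh
# Sample input: lines copied from the syslog and reiserfsck output in this thread.
kernel_log='md(9,5):vs-4075: reiserfs_free_block: block 1076949926 is out of range on md(9,5)
md(9,5):vs-4075: reiserfs_free_block: block 2212879009 is out of range on md(9,5)'

fsck_log='bad_indirect_item: block 1449156: The item [9 24 0x1458e2001 IND (1)] has the bad pointer (910) to the block (3897627755)
bad_indirect_item: block 1449156: The item [9 24 0x1458e2001 IND (1)] has the bad pointer (911) to the block (2212879009)
bad_indirect_item: block 1449156: The item [9 24 0x1458e2001 IND (1)] has the bad pointer (912) to the block (1076949926)'

# Block numbers the kernel refused to free.
kernel_blocks() {
    printf '%s\n' "$kernel_log" |
        sed -n 's/.*reiserfs_free_block: block \([0-9][0-9]*\) is out of range.*/\1/p'
}

# Block numbers reiserfsck flagged as bad pointer targets.
fsck_blocks() {
    printf '%s\n' "$fsck_log" |
        grep -o 'to the block ([0-9]*)' |
        tr -cd '0-9\n'
}

# Print every kernel-reported block that reiserfsck also flagged.
confirmed_blocks() {
    for blk in $(kernel_blocks); do
        if fsck_blocks | grep -qx "$blk"; then
            echo "$blk"
        fi
    done
}

confirmed_blocks
```

With real logs, the two variables would be replaced by the contents of /var/log/syslog and the saved reiserfsck output.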
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: 2.4.23 && md raid1 && reiserfs panic
2004-02-09 1:10 ` James Bromberger
@ 2004-02-09 1:56 ` Oleg Drokin
0 siblings, 0 replies; 4+ messages in thread
From: Oleg Drokin @ 2004-02-09 1:56 UTC (permalink / raw)
To: James Bromberger; +Cc: linux-kernel
Hello!
On Mon, Feb 09, 2004 at 09:10:40AM +0800, James Bromberger wrote:
> > JB> The symptoms: rm a file from a working RAID1 md reiserfs filesystem,
> > JB> and I get a panic, rm(1) segfaults, and all further I/O to any interactive
> > JB> shells stops. The entire system is rendered unusable; reboot (via
> > JB> ctrl-alt-del) doesn't shut down and the only option is to hard reset the box.
> >
> > What if you run reiserfsck over the volume that seems to be corrupted,
> > then fix the errors and then retry the operation?
> Yes! That was it. Attached is the output I captured from reiserfsck.
> It identified the very file I was attempting to remove that was causing
> the segfault in rm(1).
> So I guess this is a reiserfs-specific issue, in that it kills all disk I/O
> when this corruption happens. Hmm.
Yes, in-kernel reiserfs does not yet handle corrupted filesystems
very well. The source of this corruption is still unknown, though.
You can fix the corruption with reiserfsck --fix-fixable, or
reiserfsck --rebuild-tree if the first one does not work.
Be sure to use the latest reiserfsprogs.
Bye,
Oleg
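The repair order suggested above could be sketched roughly as follows. This is only an outline: `/dev/md5` is an assumption inferred from the `md(9,5)` device in the kernel log, the `run` and `repair_plan` helpers are made up for illustration, and the script defaults to printing the commands rather than executing them. The filesystem should be unmounted before checking, and taking a backup of the device before any `--rebuild-tree` run is prudent.

```shell
#!/bin/sh
# Dry-run by default: prints the commands instead of running them.
DEVICE=${DEVICE:-/dev/md5}   # assumption, inferred from md(9,5) in the oops
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

repair_plan() {
    # Never check a mounted filesystem; stop if the unmount fails.
    run umount "$DEVICE" || return 1
    # Read-only pass first, to see what is wrong.
    run reiserfsck --check "$DEVICE"
    # Then repair what can be fixed without rebuilding the tree.
    run reiserfsck --fix-fixable "$DEVICE"
    # --rebuild-tree is deliberately not run here; it is the last resort.
    echo "if errors remain: reiserfsck --rebuild-tree $DEVICE"
}

repair_plan
```

Setting `DRY_RUN=0` would execute the commands for real; reiserfsck then asks for confirmation itself before touching the device.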