* kernel BUG at prints.c:334
From: Maciej Matysiak @ 2003-01-02 16:14 UTC
To: reiserfs-list
hi,
reiserfs went mad on one of my machines. it's debian woody, with 2.4.20 kernel.
in system logs i can see:
Jan 1 10:07:44 brzydal smartd: Device: /dev/sdc, Temperature changed 1 degrees to 40 degrees since last reading
Jan 1 10:27:03 brzydal -- MARK --
Jan 1 10:37:44 brzydal smartd: Device: /dev/sdb, Temperature changed -1 degrees to 46 degrees since last reading
Jan 1 10:39:34 brzydal smartd: /dev/sdc:Failed to read smart values
Jan 1 10:39:34 brzydal kernel: < 0xff) 29(c 0x60, s 0x17, l 0, t 0xff) 30(c 0x60, s 0x17, l 0, t 0xff) 31(c 0x60, s 0x17, l 0, t 0xff)
Jan 1 10:39:34 brzydal kernel: Pending list: 106(c 0x60, s 0x17, l 0), 103(c 0x60, s 0x27, l 0), 67(c 0x60, s 0x27, l 0), 83(c 0x60, s 0x37, l 0)
Jan 1 10:39:34 brzydal kernel: Kernel Free SCB list: 113 195 42 246 224 214 88 71 12 70 95 13 51 61 16 59 98 202 123 222 63 10 76 220 201 192 218 53 54 86 1 209 52 17 90 57 231
117 30 56 21 19 230 43 91 78 199 223 227 226 9 35 233 6 221 4 37 191 72 108 60 203 243 82 245 242 239 208 116 236 8 206 234 102 3 68 248 194 94 44 207 46 33 93 215 62 237 118 87
80 36 23 250 114 232 47 5 101 216 126 49 96 39 240 109 189 197 212 32 111 213 190 15 244 229 238 25 89 184 125 58 69 235 131 66 120 247 22 0 124 104 40 65 193 48 81 14 29 31 79 2
11 210 112 74 200 18 225 107 119 97 28 188 219 50 228 251 45 7 20 11 84 73 92 204 85 205 100 130 24 122 41 34 75 196 99 64 38 127 249 55 217 77 115 27 26 198 121 105 110 253 252
2 185 186 187 180 181 182 183 176 177 178 179 172 173 174 175 168 169 170 171 164 165 166 167 160 161 162 163 156 157 158 159 152 153 154 155 148 149 150 151 144 145 146 147 140
141 142 143 136 137 138 139 132 133 134 135 128 129
Jan 1 10:39:34 brzydal kernel: DevQ(0:1:0): 0 waiting
Jan 1 10:39:34 brzydal kernel: DevQ(0:2:0): 0 waiting
Jan 1 10:39:34 brzydal kernel: DevQ(0:3:0): 0 waiting
Jan 1 10:39:34 brzydal kernel: DevQ(0:4:0): 0 waiting
Jan 1 10:39:34 brzydal kernel: DevQ(0:8:0): 0 waiting
Jan 1 10:39:34 brzydal kernel: scsi0:0:1:0: Cmd aborted from QINFIFO
Jan 1 10:39:34 brzydal kernel: aic7xxx_abort returns 0x2002
Jan 1 10:39:34 brzydal kernel: scsi0:0:2:0: Attempting to queue an ABORT message
Jan 1 10:39:34 brzydal kernel: scsi0: Dumping Card State in Command phase, at SEQADDR 0x16d
Jan 1 10:39:34 brzydal kernel: ACCUM = 0x80, SINDEX = 0xa0, DINDEX = 0xe4, ARG_2 = 0x14
Jan 1 10:39:35 brzydal kernel: HCNT = 0x0 SCBPTR = 0xc
Jan 1 10:39:35 brzydal kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
Jan 1 10:39:35 brzydal kernel: DFCNTRL = 0x4, DFSTATUS = 0x89
Jan 1 10:39:35 brzydal kernel: LASTPHASE = 0x80, SCSISIGI = 0x84, SXFRCTL0 = 0x88
Jan 1 10:39:35 brzydal kernel: SSTAT0 = 0x5, SSTAT1 = 0x2
Jan 1 10:39:35 brzydal kernel: STACK == 0x17b, 0x165, 0x0, 0x35
Jan 1 10:39:35 brzydal kernel: SCB count = 254
Jan 1 10:39:35 brzydal kernel: Kernel NEXTQSCB = 67
Jan 1 10:39:35 brzydal kernel: Card NEXTQSCB = 241
Jan 1 10:39:35 brzydal kernel: QINFIFO entries: 241 103
Jan 1 10:39:35 brzydal kernel: Waiting Queue entries:
Jan 1 10:39:35 brzydal kernel: Disconnected Queue entries:
Jan 1 10:39:35 brzydal kernel: QOUTFIFO entries:
Jan 1 10:39:35 brzydal kernel: Sequencer Free SCB List: 0 11 3 20 23 8 2 17 13 10 27 14 26 28 15 18 4 31 30 1 24 21 29 7 6 16 22 25 9 5 19
then the above repeats several times, then:
Jan 1 10:39:35 brzydal kernel: DevQ(0:1:0): 0 waiting
Jan 1 10:39:35 brzydal kernel: DevQ(0:2:0): 0 waiting
Jan 1 10:39:35 brzydal kernel: DevQ(0:3:0): 0 waiting
Jan 1 10:39:35 brzydal kernel: DevQ(0:4:0): 0 waiting
Jan 1 10:39:35 brzydal kernel: DevQ(0:8:0): 0 waiting
Jan 1 10:39:35 brzydal kernel: scsi0:0:3:0: Device is active, asserting ATN
Jan 1 10:39:35 brzydal kernel: Recovery code sleeping
Jan 1 10:39:35 brzydal kernel: Recovery code awake
Jan 1 10:39:35 brzydal kernel: Timer Expired
Jan 1 10:39:35 brzydal kernel: aic7xxx_abort returns 0x2003
Jan 1 10:39:35 brzydal kernel: scsi0:0:1:0: Attempting to queue a TARGET RESET message
Jan 1 10:39:35 brzydal kernel: aic7xxx_dev_reset returns 0x2003
Jan 1 10:39:35 brzydal kernel: scsi0:0:2:0: Attempting to queue a TARGET RESET message
Jan 1 10:39:35 brzydal kernel: aic7xxx_dev_reset returns 0x2003
Jan 1 10:39:35 brzydal kernel: scsi0:0:3:0: Attempting to queue a TARGET RESET message
Jan 1 10:39:35 brzydal kernel: aic7xxx_dev_reset returns 0x2003
Jan 1 10:39:35 brzydal kernel: Recovery SCB completes
Jan 1 10:39:35 brzydal kernel: scsi: device set offline - not ready or command retry failed after bus reset: host 0 channel 0 id 3 lun 0
Jan 1 11:00:29 brzydal kernel: I/O error: dev 08:21, sector 65704
Jan 1 11:00:29 brzydal kernel: zam-7001: io error in reiserfs_find_entry
Jan 1 11:00:29 brzydal kernel: I/O error: dev 08:21, sector 65704
Jan 1 11:00:29 brzydal kernel: zam-7001: io error in reiserfs_find_entry
i've just tried to unmount the device. umount segfaulted, but the device is
no longer mounted. in logs i can see:
Jan 2 16:47:44 brzydal kernel: zam-7001: io error in reiserfs_find_entry
Jan 2 16:52:13 brzydal kernel: I/O error: dev 08:21, sector 64672
Jan 2 16:52:13 brzydal kernel: kernel BUG at prints.c:334!
Jan 2 16:52:13 brzydal kernel: invalid operand: 0000
Jan 2 16:52:13 brzydal kernel: CPU: 0
Jan 2 16:52:13 brzydal kernel: EIP: 0010:[reiserfs_panic+41/96] Not tainted
Jan 2 16:52:13 brzydal kernel: EFLAGS: 00010282
Jan 2 16:52:13 brzydal kernel: eax: 00000024 ebx: c022fc60 ecx: ceb40000 edx: 00000001
Jan 2 16:52:13 brzydal kernel: esi: cf372c00 edi: 00000000 ebp: cf372c00 esp: ce00fe44
Jan 2 16:52:13 brzydal kernel: ds: 0018 es: 0018 ss: 0018
Jan 2 16:52:13 brzydal kernel: Process umount (pid: 20067, stackpage=ce00f000)
Jan 2 16:52:13 brzydal kernel: Stack: c022e0da c02c68a0 c022fc60 ce00fe68 d0e3c3c4 00000002 c016f40f cf372c00
Jan 2 16:52:13 brzydal kernel: c022fc60 000013c4 00000012 00000010 00000000 d0e3c3f8 d0e3c3ec 00000003
Jan 2 16:52:13 brzydal kernel: 00000000 0000003b c1931bc0 c0172bbc cf372c00 d0e3c3c4 00000001 ce00ff38
Jan 2 16:52:13 brzydal kernel: Call Trace: [flush_commit_list+687/928] [do_journal_end+1896/2672] [do_journal_release+37/152] [reiserfs_put_super+74/328] [journal_release+17/24]
Jan 2 16:52:13 brzydal kernel: [reiserfs_put_super+84/328] [kill_super+165/220] [__mntput+30/36] [path_release+39/44] [sys_umount+111/124] [sys_munmap+53/84]
Jan 2 16:52:13 brzydal kernel: [sys_oldumount+12/16] [system_call+51/56]
Jan 2 16:52:13 brzydal kernel:
Jan 2 16:52:13 brzydal kernel: Code: 0f 0b 4e 01 e0 e0 22 c0 68 a0 68 2c c0 85 f6 74 16 0f b7 46
reiserfsck can't do anything:
brzydal:~/ReiserFS/reiserfsprogs-3.6.4/fsck# ./reiserfsck /dev/sdc1
<-------------reiserfsck, 2002------------->
reiserfsprogs 3.6.4
[...]
Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
reiserfsck: Cannot not open filesystem on "/dev/sdc1"
Aborted
the disk is:
Vendor: IBM Model: IC35L036UWD210-0 Rev: S5BS
Type: Direct-Access ANSI SCSI revision: 03
Attached scsi disk sdc at scsi0, channel 0, id 3, lun 0
(scsi0:A:3): 80.000MB/s transfers (40.000MHz, offset 63, 16bit)
SCSI device sdc: 71687340 512-byte hdwr sectors (36704 MB)
sdc: sdc1
it's brand new, just 3 weeks ago installed. it's attached to:
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
<Adaptec 2940B Ultra2 SCSI adapter>
aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs
what's wrong (disk or kernel?) and what can i do with this problem?
i'm a bit afraid to reboot the machine at the moment.
m.m.
--
use gnus, not guns!
* Re: kernel BUG at prints.c:334
From: Oleg Drokin @ 2003-01-06 7:11 UTC
To: reiserfs-list; +Cc: phoner.reiserfs
Hello!
On Thu, Jan 02, 2003 at 05:14:25PM +0100, Maciej Matysiak wrote:
> reiserfs went mad on one of my machines. it's debian woody, with 2.4.20 kernel.
> in system logs i can see:
The logs you provided show that your hard drive can no longer remember what
it was supposed to store.
> Jan 1 11:00:29 brzydal kernel: I/O error: dev 08:21, sector 65704
> Jan 1 11:00:29 brzydal kernel: zam-7001: io error in reiserfs_find_entry
> Jan 1 11:00:29 brzydal kernel: I/O error: dev 08:21, sector 65704
> Jan 1 11:00:29 brzydal kernel: zam-7001: io error in reiserfs_find_entry
See these IO errors? It cannot read the data off the HDD (/dev/sdc1).
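For readers wondering how "dev 08:21" maps to /dev/sdc1: a minimal sketch, assuming the 2.4-era convention that the kernel prints the device as major:minor with both fields in hexadecimal, and that major 8 covers the first SCSI disks with 16 minors per disk. The helper names decode_kdev and scsi_disk_name are made up for this illustration.

```python
def decode_kdev(dev):
    """Split a 'MM:mm' device string into (major, minor); both fields are read as hex."""
    major_hex, minor_hex = dev.split(":")
    return int(major_hex, 16), int(minor_hex, 16)


def scsi_disk_name(major, minor):
    """Map a major-8 device number to a conventional /dev/sdXN name (assumed layout)."""
    if major != 8:
        raise ValueError("this sketch only handles SCSI disk major 8")
    # 16 minors per disk: 0 is the whole disk, 1..15 are its partitions.
    disk, part = divmod(minor, 16)
    name = "/dev/sd" + chr(ord("a") + disk)
    return name if part == 0 else name + str(part)


major, minor = decode_kdev("08:21")
print(major, minor, scsi_disk_name(major, minor))
```

Under those assumptions 0x21 is minor 33, that is disk index 2 (sdc), partition 1, which agrees with the device named in the reply above.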
> i've just tried to unmount the device. umount segfaulted, but the device is
> no longer mounted. in logs i can see:
Yes, reiserfs is not very well prepared for I/O errors while writing to the
journal, hence it panicked.
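The path from umount to the panic can be read off the Call Trace in the original report. A small sketch for splitting that trace into frames, assuming the usual symbol+offset/size notation with decimal numbers; parse_frames is a hypothetical helper, and the trace string is copied from the report with the wrapped line joined.

```python
import re

# One frame looks like [symbol+offset/size].
FRAME = re.compile(r"\[(\w+)\+(\d+)/(\d+)\]")


def parse_frames(text):
    """Return (symbol, offset, size) tuples in the order they appear in the trace."""
    return [(name, int(off), int(size)) for name, off, size in FRAME.findall(text)]


trace = (
    "[flush_commit_list+687/928] [do_journal_end+1896/2672] "
    "[do_journal_release+37/152] [reiserfs_put_super+74/328] [journal_release+17/24] "
    "[reiserfs_put_super+84/328] [kill_super+165/220] [__mntput+30/36] "
    "[path_release+39/44] [sys_umount+111/124] [sys_munmap+53/84] "
    "[sys_oldumount+12/16] [system_call+51/56]"
)

frames = parse_frames(trace)
for name, off, size in frames:
    print(f"{name:22s} +{off}/{size}")
```

Read from the bottom up, the frames go from the umount system call through reiserfs_put_super and the journal release code to flush_commit_list, which is where reiserfs_panic was reached. Traces of this kind are built by scanning the stack for plausible return addresses, so an odd entry such as sys_munmap may simply be stale stack contents.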
> reiserfsck can't do anything:
No wonder, it cannot read anything off the disk either.
Maybe if you reboot, or just keep the box off for some time, the drive will
return to a somewhat normal state and you will be able to
read something off it, but maybe not. You will not know
for sure until you try.
> it's brand new, just 3 weeks ago installed. it's attached to:
> scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
> <Adaptec 2940B Ultra2 SCSI adapter>
> aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs
> what's wrong (disk or kernel?) and what can i do with this problem?
Try to reboot (halt, switch the box off, then turn it back on to be sure).
If everything then appears normal, the problem is most probably with the
controller or the aic7xxx driver.
If the drive is still inaccessible, then the drive is bad.
> i'm a bit afraid to reboot the machine at the moment.
Since the disk cannot be read anymore, you do not have many other choices, I'm afraid.
Bye,
Oleg
* Re: kernel BUG at prints.c:334
From: Maciej Matysiak @ 2003-01-06 7:36 UTC
To: reiserfs-list
On the 6th of January 2003 at 08:11, Oleg Drokin <green#namesys.com> wrote:
> The logs you provided show that your hard drive can no longer remember what
> it was supposed to store.
yes. it turned out that the disk had simply died. without any warning, it just
stopped spinning. i didn't know that when i sent the mail (i had only remote
access to the server), so please forgive me for wasting your time.
btw., it seems to me that ibm disks have something like a 'y2k3 problem'.
it's a poor joke, but i've already had 3 ibm disks die in my servers this
year. that one had been working for not even 3 weeks. all of them scsi, made
in hungary or italy. is it my bad luck, or should i buy seagate next time?
(rhetorical question, i don't want to start a flamewar here).
> Yes, reiserfs is not very well prepared for I/O errors while writing to the
> journal, hence it panicked.
but could the errors be a bit more descriptive, please :)
m.m.
--
in backup we trust.
* Re: kernel BUG at prints.c:334
From: Oleg Drokin @ 2003-01-06 7:44 UTC
To: reiserfs-list
Hello!
On Mon, Jan 06, 2003 at 08:36:22AM +0100, Maciej Matysiak wrote:
> btw., it seems to me that ibm disks have something like a 'y2k3 problem'.
> it's a poor joke, but i've already had 3 ibm disks die in my servers this
> year. that one had been working for not even 3 weeks. all of them scsi, made
> in hungary
The ones from Hungary have had quite a lot of bad press for two years already,
starting with the infamous (IDE) DTLA-307030 series, I believe.
> > Yes, reiserfs is not very well prepared for I/O errors while writing to the
> > journal, hence it panicked.
> but could the errors be a bit more descriptive, please :)
It said "I/O error doing so and so",
and there were lots of SCSI diagnostics from the SCSI layer prior to that.
What would you consider a more informative message (with an example, please)?
Bye,
Oleg
* Re: kernel BUG at prints.c:334
From: Maciej Matysiak @ 2003-01-06 9:23 UTC
To: reiserfs-list
On the 6th of January 2003 at 08:44, Oleg Drokin <green#namesys.com> wrote:
>> btw., it seems to me that ibm disks have something like a 'y2k3 problem'.
>> it's a poor joke, but i've already had 3 ibm disks die in my servers
>> this year. that one had been working for not even 3 weeks. all of them
>> scsi, made in hungary
> The ones from Hungary have had quite a lot of bad press for two years
> already, starting with the infamous (IDE) DTLA-307030 series, I believe.
all my dead ibm disks are scsi. the last one was identified as IC35L036UWD210-0.
>>> Yes, reiserfs is not very well prepared for I/O errors while writing to the
>>> journal, hence it panicked.
>> but could the errors be a bit more descriptive, please :)
> It said "I/O error doing so and so",
> and there were lots of SCSI diagnostics from the SCSI layer prior to that.
> What would you consider a more informative message (with an example, please)?
after thinking a bit more about that and looking at the logs again, i have to
admit that you're absolutely right. sorry for the noise.
m.m.
--
use gnus, not guns!
* Re: kernel BUG at prints.c:334
From: Hans Reiser @ 2003-01-06 8:42 UTC
To: Oleg Drokin; +Cc: reiserfs-list, phoner.reiserfs
Oleg Drokin wrote:
>Hello!
>
>On Thu, Jan 02, 2003 at 05:14:25PM +0100, Maciej Matysiak wrote:
>
>>reiserfs went mad on one of my machines. it's debian woody, with 2.4.20 kernel.
>>in system logs i can see:
>
>The logs you provided show that your hard drive can no longer remember what
>it was supposed to store.
>
Oleg, please consider finding all of our messages which say I/O error,
and adding something that says "i/o errors are almost always due to
hardware failure"
>>Jan 1 11:00:29 brzydal kernel: I/O error: dev 08:21, sector 65704
>>Jan 1 11:00:29 brzydal kernel: zam-7001: io error in reiserfs_find_entry
>>Jan 1 11:00:29 brzydal kernel: I/O error: dev 08:21, sector 65704
>>Jan 1 11:00:29 brzydal kernel: zam-7001: io error in reiserfs_find_entry
>
>See these IO errors? It cannot read the data off the HDD (/dev/sdc1).
>
>>i've just tried to unmount the device. umount segfaulted, but the device is
>>no longer mounted. in logs i can see:
>
>Yes, reiserfs is not very well prepared for I/O errors while writing to the
>journal, hence it panicked.
>
>>reiserfsck can't do anything:
>
>No wonder, it cannot read anything off the disk either.
>
>Maybe if you reboot, or just keep the box off for some time, the drive will
>return to a somewhat normal state and you will be able to
>read something off it, but maybe not. You will not know
>for sure until you try.
>
>>it's brand new, just 3 weeks ago installed. it's attached to:
>>scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
>> <Adaptec 2940B Ultra2 SCSI adapter>
>> aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs
>>what's wrong (disk or kernel?) and what can i do with this problem?
>
>Try to reboot (halt, switch the box off, then turn it back on to be sure).
>If everything then appears normal, the problem is most probably with the
>controller or the aic7xxx driver.
>If the drive is still inaccessible, then the drive is bad.
>
>>i'm a bit afraid to reboot the machine at the moment.
>
>Since the disk cannot be read anymore, you do not have many other choices, I'm afraid.
>
>Bye,
>    Oleg
>
--
Hans
* Re: kernel BUG at prints.c:334
From: Valdis.Kletnieks @ 2003-01-06 17:02 UTC
To: Hans Reiser; +Cc: reiserfs-list
On Mon, 06 Jan 2003 11:42:09 +0300, Hans Reiser said:
> Oleg, please consider finding all of our messages which say I/O error,
> and adding something that says "i/o errors are almost always due to
> hardware failure"
I had a sign outside my office, about 4 feet off the ground, that looked like:
========================
 Your            /|\
 clue           / | \
 must             |
 be at least      |
 THIS tall        |
 to ride the      |
 Internet.        |
>>Jan 1 11:00:29 brzydal kernel: I/O error: dev 08:21, sector 65704
>>Jan 1 11:00:29 brzydal kernel: zam-7001: io error in reiserfs_find_entry
>>Jan 1 11:00:29 brzydal kernel: I/O error: dev 08:21, sector 65704
>>Jan 1 11:00:29 brzydal kernel: zam-7001: io error in reiserfs_find_entry
If that combined with an error message from the SCSI/IDE/whatever driver
doesn't make it clear, adding "probably hardware failure" won't impart
any additional clue. If there isn't a SCSI/IDE/etc message, then that is
a *separate* issue that should be fixed in the driver.
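The correlation described here (a filesystem-level I/O error backed by a driver-level message) can be checked mechanically. A rough sketch, assuming syslog lines shaped like the ones quoted in this thread; the regular expressions and the summarize helper are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread.
IO_ERR = re.compile(r"I/O error: dev ([0-9a-f]{2}:[0-9a-f]{2}), sector (\d+)")
OFFLINE = re.compile(
    r"scsi: device set offline .*host (\d+) channel (\d+) id (\d+) lun (\d+)"
)


def summarize(lines):
    """Count block-layer I/O errors per device and collect SCSI offline events."""
    io_errors = Counter()
    offline = []
    for line in lines:
        m = IO_ERR.search(line)
        if m:
            io_errors[m.group(1)] += 1
            continue
        m = OFFLINE.search(line)
        if m:
            offline.append(tuple(int(g) for g in m.groups()))
    return io_errors, offline


sample = [
    "Jan 1 10:39:35 brzydal kernel: scsi: device set offline - not ready or "
    "command retry failed after bus reset: host 0 channel 0 id 3 lun 0",
    "Jan 1 11:00:29 brzydal kernel: I/O error: dev 08:21, sector 65704",
    "Jan 1 11:00:29 brzydal kernel: zam-7001: io error in reiserfs_find_entry",
    "Jan 1 11:00:29 brzydal kernel: I/O error: dev 08:21, sector 65704",
]

io_errors, offline = summarize(sample)
print(dict(io_errors), offline)
```

If I/O errors show up for a device and the same log also contains an offline or reset message from the SCSI layer for the matching host/channel/id/lun, the hardware path is the first suspect; I/O errors with no driver message at all would point at the separate driver issue mentioned above.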
--
Valdis Kletnieks
Computer Systems Senior Engineer
Virginia Tech
* Any paper on Journaling algorithm?
From: bmoon @ 2003-01-10 17:07 UTC
To: Hans Reiser; +Cc: reiserfs-list
Hello!
I am studying the journaling in the ReiserFS source code.
However, it is not easy to understand, and I could not find any good reference
on the journaling.
Please point me to a good reference or paper on the journaling algorithm.
Thanks,
Bo