* [Drbd-dev] drbd crashes the SyncSource if a read error is encountered during sync
@ 2005-06-23 19:37 Lars Marowsky-Bree
2005-06-24 11:38 ` Philipp Reisner
2005-06-24 11:41 ` Philipp Reisner
0 siblings, 2 replies; 4+ messages in thread
From: Lars Marowsky-Bree @ 2005-06-23 19:37 UTC
To: drbd-dev
This is essentially drbd-0.7-latest - kernel message dump:
> Linux version 2.6.5-7.155-SLRS (geeko@buildhost) (gcc version 3.3.3 (SuSE Linux)) #1 Tue Mar 29 14:36:35 UTC 2005
> ...
> drbd: initialised. Version: 0.7.5 (api:77/proto:74)
> drbd: SVN Revision: 1735 build by root@g237, 2005-02-17 16:14:41
> drbd: hijacking NBD device major!
> drbd: registered as block device major 43
> drbd0: resync bitmap: bits=2588788 words=80900
> drbd0: size = 9 GB (10355152 KB)
> drbd0: 8224 MB marked out-of-sync by on disk bit-map.
> drbd0: Found 6 transactions (106 active extents) in activity log.
> drbd0: Marked additional 12 MB as out-of-sync based on AL.
> drbd0: drbdsetup [7700]: cstate Unconfigured --> StandAlone
> drbd0: drbdsetup [7713]: cstate StandAlone --> Unconnected
> drbd0: drbd0_receiver [7714]: cstate Unconnected --> WFConnection
> drbd0: drbd0_receiver [7714]: cstate WFConnection --> WFReportParams
> drbd0: Handshake successful: DRBD Network Protocol version 74
> drbd0: Connection established.
> drbd0: I am(S): 1:00000006:00000001:00000002:00000001:11
> drbd0: Peer(S): 0:00000006:00000001:00000003:00000001:01
> drbd0: drbd0_receiver [7714]: cstate WFReportParams --> WFBitMapS
> drbd0: Secondary/Unknown --> Secondary/Secondary
> drbd0: drbd0_receiver [7714]: cstate WFBitMapS --> SyncSource
> drbd0: Resync started as SyncSource (need to sync 8524240 KB [2131060 bits set]).
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068664
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068664
> drbd0: Local IO failed. Detaching...
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068672
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068672
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068680
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068680
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068688
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068688
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068696
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068696
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068704
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068704
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068712
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068712
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068720
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068720
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068728
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068728
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068736
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068736
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068744
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068744
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068752
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068752
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068760
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068760
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068772, sector=8068768
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068768
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068776, sector=8068776
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068776
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068784, sector=8068784
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068784
> drbd0: Can not satisfy peer's read request, no local data.
> drbd0: Can not satisfy peer's read request, no local data.
> drbd0: Can not satisfy peer's read request, no local data.
> drbd0: Can not satisfy peer's read request, no local data.
> drbd0: Can not satisfy peer's read request, no local data.
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068792, sector=8068792
> ide: failed opcode was: unknown
> end_request: I/O error, dev hda, sector 8068792
> drbd0: drbd0_receiver [7714]: cstate SyncSource --> Timeout
> drbd0: short sent NegDReply size=32 sent=24
> drbd0: 4114 messages suppressed in /usr/src/packages/BUILD/kernel-SLRS-2.6.5/modules-2.6.5/drbd/drbd_receiver.c:1160.
> drbd0: Can not satisfy peer's read request, no local data.
> Unable to handle kernel NULL pointer dereference at virtual address 00000004
> printing eip:
> f8bf6cf8
> *pde = 00000000
> Oops: 0002 [#1]
> CPU: 0
> EIP: 0060:[<f8bf6cf8>] Tainted: G U
> EFLAGS: 00010086 (2.6.5-7.155-SLRS SLES9_SP1_BRANCH-200503291436350000)
> EIP is at receive_DataRequest+0x1b8/0x6f0 [drbd]
> eax: 00000000 ebx: 003ba238 ecx: f687b800 edx: f687bc74
> esi: 00000000 edi: f687bc74 ebp: 00000000 esp: f68d7fa8
> ds: 007b es: 007b ss: 0068
> Process drbd0_receiver (pid: 7714, threadinfo=f68d6000 task=f6a23360)
> Stack: 00004100 ffffff0a 00001000 f687b9d8 f687b800 f8bf6b40 f687b9d8 f687b800
> f687bbd8 f8bf63cc f687bbdc 00000000 f687bbd8 00000000 f8bfd624 f8bfd5c0
> 00000000 00000000 c0106005 f687bbd8 00000000 00000000
> Call Trace:
> [<f8bf6b40>] receive_DataRequest+0x0/0x6f0 [drbd]
> [<f8bf63cc>] drbdd_init+0xac/0x2a0 [drbd]
> [<f8bfd624>] drbd_thread_setup+0x64/0xb0 [drbd]
> [<f8bfd5c0>] drbd_thread_setup+0x0/0xb0 [drbd]
> [<c0106005>] kernel_thread_helper+0x5/0x10
>
> Code: 89 78 04 89 57 04 fb ff 81 b0 03 00 00 8b 81 bc 03 00 00 80
> Dumping to block device (3,1) on CPU 0 ...
While I agree the data on both nodes is toasted at this time, as we had
a second failure during a resync, I'm also thinking it shouldn't panic
(this is the SyncSource, not the primary).
I'd expect it to fail the device locally and set the inconsistent flag;
then, in fact, the primary/SyncTarget ought to do the panic thing (in
drbd_receiver.c).
But the secondary here might be hosting other services in a cross-over
configuration and shouldn't do that.
Comments?
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge"
 -- Charles Darwin
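A note on the resync figures in the kernel log above: they are consistent with one bitmap bit covering 4 KB and with 32-bit bitmap words. Both values are inferred from the numbers in the log, not taken from the drbd source, so the following is only a sanity-check sketch.

```python
# Assumptions inferred from the logged numbers, not from the drbd source.
KB_PER_BIT = 4       # each bitmap bit is assumed to cover 4 KB
BITS_PER_WORD = 32   # bitmap words are assumed to be 32 bits wide

# Values copied from the kernel log.
device_kb = 10355152     # "size = 9 GB (10355152 KB)"
logged_bits = 2588788    # "resync bitmap: bits=2588788"
logged_words = 80900     # "words=80900"
resync_bits = 2131060    # "[2131060 bits set]"
resync_kb = 8524240      # "need to sync 8524240 KB"


def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


bits = ceil_div(device_kb, KB_PER_BIT)   # bits needed to cover the device
words = ceil_div(bits, BITS_PER_WORD)    # words needed to hold those bits
to_sync_kb = resync_bits * KB_PER_BIT    # data covered by the set bits

print(bits == logged_bits, words == logged_words, to_sync_kb == resync_kb)
```

Under those two assumptions all three comparisons hold, so the bitmap size and the amount queued for resync agree with each other.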
* Re: [Drbd-dev] drbd crashes the SyncSource if a read error is encountered during sync
2005-06-23 19:37 [Drbd-dev] drbd crashes the SyncSource if a read error is encountered during sync Lars Marowsky-Bree
@ 2005-06-24 11:38 ` Philipp Reisner
2005-06-29 10:19 ` Lars Marowsky-Bree
2005-06-24 11:41 ` Philipp Reisner
1 sibling, 1 reply; 4+ messages in thread
From: Philipp Reisner @ 2005-06-24 11:38 UTC
To: drbd-dev
On Thursday, 23 June 2005 21:37, Lars Marowsky-Bree wrote:
> This is essentially drbd-0.7-latest - kernel message dump:
> > Linux version 2.6.5-7.155-SLRS (geeko@buildhost) (gcc version 3.3.3 (SuSE Linux)) #1 Tue Mar 29 14:36:35 UTC 2005
> > ...
> > drbd: initialised. Version: 0.7.5 (api:77/proto:74)
> > drbd: SVN Revision: 1735 build by root@g237, 2005-02-17 16:14:41
> > drbd: hijacking NBD device major!
NB: 1735 seems to be 0.7.9
-> 0.7.9 had that ugly LEAK BIOs BUG...!
[...]
> > drbd0: Can not satisfy peer's read request, no local data.
> > drbd0: Can not satisfy peer's read request, no local data.
> > drbd0: Can not satisfy peer's read request, no local data.
> > hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> > hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=8068792, sector=8068792
> > ide: failed opcode was: unknown
> > end_request: I/O error, dev hda, sector 8068792
> > drbd0: drbd0_receiver [7714]: cstate SyncSource --> Timeout
> > drbd0: short sent NegDReply size=32 sent=24
> > drbd0: 4114 messages suppressed in /usr/src/packages/BUILD/kernel-SLRS-2.6.5/modules-2.6.5/drbd/drbd_receiver.c:1160.
> > drbd0: Can not satisfy peer's read request, no local data.
[ 4114 messages, quite a number... ]
> > Unable to handle kernel NULL pointer dereference at virtual address 00000004
> > printing eip:
> > f8bf6cf8
> > *pde = 00000000
> > Oops: 0002 [#1]
> > CPU: 0
> > EIP: 0060:[<f8bf6cf8>] Tainted: G U
> > EFLAGS: 00010086 (2.6.5-7.155-SLRS SLES9_SP1_BRANCH-200503291436350000)
> > EIP is at receive_DataRequest+0x1b8/0x6f0 [drbd]
> > eax: 00000000 ebx: 003ba238 ecx: f687b800 edx: f687bc74
> > esi: 00000000 edi: f687bc74 ebp: 00000000 esp: f68d7fa8
> > ds: 007b es: 007b ss: 0068
> > Process drbd0_receiver (pid: 7714, threadinfo=f68d6000 task=f6a23360)
> > Stack: 00004100 ffffff0a 00001000 f687b9d8 f687b800 f8bf6b40 f687b9d8 f687b800
> > f687bbd8 f8bf63cc f687bbdc 00000000 f687bbd8 00000000 f8bfd624 f8bfd5c0
> > 00000000 00000000 c0106005 f687bbd8 00000000 00000000
> > Call Trace:
> > [<f8bf6b40>] receive_DataRequest+0x0/0x6f0 [drbd]
> > [<f8bf63cc>] drbdd_init+0xac/0x2a0 [drbd]
> > [<f8bfd624>] drbd_thread_setup+0x64/0xb0 [drbd]
> > [<f8bfd5c0>] drbd_thread_setup+0x0/0xb0 [drbd]
> > [<c0106005>] kernel_thread_helper+0x5/0x10
> >
> > Code: 89 78 04 89 57 04 fb ff 81 b0 03 00 00 8b 81 bc 03 00 00 80
> > Dumping to block device (3,1) on CPU 0 ...
>
> While I agree the data on both nodes is toasted at this time, as we had
> a second failure during a resync, I'm also thinking it shouldn't panic
> (this is the SyncSource, not the primary).
>
Hmmm, it did not panic(); it crashed by dereferencing a NULL pointer...
> I'd expect to fail the device locally, set the inconsistent flag, and in
> fact, then the primary/SyncTarget ought to do the panic thing. (in
> drbd_receiver.c)
>
> But the secondary here might be hosting other services in a cross-over
> configuration and shouldn't do that.
>
> Comments?
>
I guess the case where the SyncSource fails during resync needs to be
tested. -> Will do that as time permits.
-Philipp
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria http://www.linbit.com :
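On the panic-versus-oops point: the "Oops: 0002" line carries the i386 page-fault error code, and its low three bits already say what kind of access failed. A minimal sketch of that decoding follows. The function name is made up for illustration; the bit meanings are the standard i386 ones (bit 0 protection fault, bit 1 write, bit 2 user mode).

```python
def decode_i386_fault_code(code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


# "Oops: 0002" from the log; the kernel prints the value in hex.
info = decode_i386_fault_code(int("0002", 16))
print(info)
```

For 0002 this gives a kernel-mode write to a page that is not present, which fits a store through a NULL-based pointer at address 00000004 rather than a deliberate panic().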
* Re: [Drbd-dev] drbd crashes the SyncSource if a read error is encountered during sync
2005-06-24 11:38 ` Philipp Reisner
@ 2005-06-29 10:19 ` Lars Marowsky-Bree
0 siblings, 0 replies; 4+ messages in thread
From: Lars Marowsky-Bree @ 2005-06-29 10:19 UTC
To: Philipp Reisner, drbd-dev
On 2005-06-24T13:38:08, Philipp Reisner <philipp.reisner@linbit.com> wrote:
> > But the secondary here might be hosting other services in a cross-over
> > configuration and shouldn't do that.
> >
> > Comments?
> >
> I guess the case where the SyncSource fails during resync needs to be
> tested. -> Will do that as time permits.
OK. I was just checking whether you knew of any issues here off-hand.
I'll prepare to look into it "as time permits" too, but given that the
backup tapes need to be dug out anyway, it's not that critical, and in
the meantime we'll recommend local RAID beneath drbd ;-)
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge"
 -- Charles Darwin
* Re: [Drbd-dev] drbd crashes the SyncSource if a read error is encountered during sync
2005-06-23 19:37 [Drbd-dev] drbd crashes the SyncSource if a read error is encountered during sync Lars Marowsky-Bree
2005-06-24 11:38 ` Philipp Reisner
@ 2005-06-24 11:41 ` Philipp Reisner
1 sibling, 0 replies; 4+ messages in thread
From: Philipp Reisner @ 2005-06-24 11:41 UTC
To: drbd-dev
[...]
> > Unable to handle kernel NULL pointer dereference at virtual address 00000004
> > printing eip:
> > f8bf6cf8
> > *pde = 00000000
> > Oops: 0002 [#1]
> > CPU: 0
> > EIP: 0060:[<f8bf6cf8>] Tainted: G U
> > EFLAGS: 00010086 (2.6.5-7.155-SLRS SLES9_SP1_BRANCH-200503291436350000)
> > EIP is at receive_DataRequest+0x1b8/0x6f0 [drbd]
> > eax: 00000000 ebx: 003ba238 ecx: f687b800 edx: f687bc74
> > esi: 00000000 edi: f687bc74 ebp: 00000000 esp: f68d7fa8
> > ds: 007b es: 007b ss: 0068
> > Process drbd0_receiver (pid: 7714, threadinfo=f68d6000 task=f6a23360)
> > Stack: 00004100 ffffff0a 00001000 f687b9d8 f687b800 f8bf6b40 f687b9d8 f687b800
> > f687bbd8 f8bf63cc f687bbdc 00000000 f687bbd8 00000000 f8bfd624 f8bfd5c0
> > 00000000 00000000 c0106005 f687bbd8 00000000 00000000
> > Call Trace:
> > [<f8bf6b40>] receive_DataRequest+0x0/0x6f0 [drbd]
> > [<f8bf63cc>] drbdd_init+0xac/0x2a0 [drbd]
> > [<f8bfd624>] drbd_thread_setup+0x64/0xb0 [drbd]
> > [<f8bfd5c0>] drbd_thread_setup+0x0/0xb0 [drbd]
> > [<c0106005>] kernel_thread_helper+0x5/0x10
> >
> > Code: 89 78 04 89 57 04 fb ff 81 b0 03 00 00 8b 81 bc 03 00 00 80
> > Dumping to block device (3,1) on CPU 0 ...
[...]
N.b. In case it is urgent, it would help a lot if you could run
ksymoops on the machine (for decoding of the Code line), as
well as a
gcc -S -g of that drbd_receiver.c with that compiler on that kernel
with the right other gcc options :)
-phil
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria http://www.linbit.com :
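Short of a full ksymoops run, the first instruction of the Code line can be decoded by hand. The sketch below assumes the 20 code bytes start at EIP (the dump carries no marker saying otherwise) and handles only the one instruction form needed here. decode_mov_store is a hypothetical helper, not part of any tool.

```python
# 32-bit register names in x86 ModRM encoding order.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_store(code):
    """Decode 'mov %reg, disp8(%base)': opcode 0x89, ModRM mod=01, no SIB."""
    opcode, modrm, disp8 = code[0], code[1], code[2]
    if opcode != 0x89:
        raise ValueError("only opcode 0x89 (MOV r/m32, r32) is handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the disp8 form without a SIB byte is handled")
    disp = disp8 - 256 if disp8 >= 128 else disp8  # sign-extend the disp8
    return REGS32[reg], REGS32[rm], disp


# Code line and register values copied from the oops.
code = bytes.fromhex("89 78 04 89 57 04 fb ff 81 b0 03 00 00 8b 81 bc 03 00 00 80")
registers = {"eax": 0x00000000, "edx": 0xF687BC74, "edi": 0xF687BC74}

src, base, disp = decode_mov_store(code)
fault_addr = (registers[base] + disp) & 0xFFFFFFFF
print(f"mov %{src},{disp:#x}(%{base}) writes to {fault_addr:#010x}")
```

With eax = 00000000 from the register dump, the store targets address 0x4, which matches the reported fault address. Which structure that store belongs to still needs the ksymoops and gcc -S output asked for above.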