From: Lars Ellenberg <Lars.Ellenberg@linbit.com>
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] oopses in 2.6.19.1
Date: Thu, 25 Jan 2007 23:26:30 +0100
Message-ID: <20070125222630.GC8857@soda.linbit>
In-Reply-To: <20070125213210.GK7738@soda.linbit>
/ 2007-01-25 22:32:10 +0100
\ Lars Ellenberg:
>
> first, there is 2.6.19.2 already.
> second, there is drbd 8.0.0 already.
> though, there have not been any interesting changes in this area since revision 2695,
> which you apparently use.
> > drbd0: conn( WFSyncUUID -> SyncTarget )
> > drbd0: Began resync as SyncTarget (will sync 1158770664 KB [289692666 bits set]).
> > drbd0: Writing meta data super block now.
> > eth1: no IPv6 routers present
> > eth0: no IPv6 routers present
> > ----------- [cut here ] --------- [please bite here ] ---------
> > Kernel BUG at ...ed/kernel/tyan-s2891/modules/drbd/drbd/lru_cache.c:312
> > invalid opcode: 0000 [1] SMP
> > Call Trace:
> > [<ffffffff88077ecf>] :drbd:drbd_rs_complete_io+0xcf/0x130
> > [<ffffffff8806b94d>] :drbd:drbd_endio_write_sec+0x1bd/0x2d0
>
> > RIP [<ffffffff8807997f>] :drbd:lc_put+0x4f/0xc0
> > NMI Watchdog detected LOCKUP on CPU 0
> > RIP: 0010:[<ffffffff8026b4ba>] [<ffffffff8026b4ba>] _spin_lock_irqsave+0xa/0x20
> > Call Trace:
> > [<ffffffff8807772b>] :drbd:__drbd_set_in_sync+0x1bb/0x2e0
> > [<ffffffff88071048>] :drbd:e_end_resync_block+0x68/0x100
> > [<ffffffff8806f50b>] :drbd:drbd_process_done_ee+0xdb/0x140
> > [<ffffffff88071688>] :drbd:drbd_asender+0xe8/0x580
I'd love it if this were not a logic bug, but rather drbd not being robust
and paranoid enough...

one possibility for this to happen would be:

  we are SyncTarget, and request some resync blocks.
  this also does the drbd_rs_begin_io.

  the SyncSource sends us some RSDataReply (with an ID of -1ULL, and
  some sector offset).

  we currently do not verify whether we expected this sector offset.
  we just read in the data and submit it. [there is a FIXME paranoia
  comment in place in receive_RSDataReply, though]

  later, the drbd_endio_write_sec callback does the drbd_rs_complete_io
  for the corresponding resync extent.

  now, if that extent was in the resync lru because we used it before,
  but the RSDataReply was for a sector we had not requested [*],
  the refcnt is likely to be imbalanced, and we might hit the BUG_ON
  for the refcnt being zero in lc_put...

[*] how that could happen, I don't know yet...
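
to illustrate the suspected imbalance, here is a minimal sketch in plain
userspace C. it is a toy model, not the drbd code: struct toy_extent,
toy_get, toy_put and toy_scenario are hypothetical names, and toy_put
returns -1 where the real lc_put would hit its BUG_ON.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a resync extent kept in the LRU;
 * this is NOT drbd's struct lc_element. */
struct toy_extent {
	unsigned int refcnt;
};

/* Take a reference, as issuing a resync request would
 * (compare drbd_rs_begin_io in the text above). */
static void toy_get(struct toy_extent *e)
{
	assert(e != NULL);
	e->refcnt++;
}

/* Drop a reference. Returns the new count, or -1 if the count was
 * already zero (the situation the BUG_ON in lc_put complains about). */
static int toy_put(struct toy_extent *e)
{
	assert(e != NULL);
	if (e->refcnt == 0)
		return -1;
	return (int)--e->refcnt;
}

/* One requested block, but two replies complete against the same extent.
 * Returns the result of the second, unbalanced put. */
static int toy_scenario(void)
{
	struct toy_extent ext = { 0 };

	toy_get(&ext);        /* we requested one block in this extent */
	(void)toy_put(&ext);  /* the requested reply completes, refcnt -> 0 */
	return toy_put(&ext); /* a reply we never requested completes */
}
```

in this model the second put finds the refcnt already at zero, which is
exactly the state a more defensive drbd_rs_complete_io would have to
detect before calling lc_put.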
in any case, regardless of whether this is a logic bug, an (smp) race
condition, or anything else, we need to become more robust there:
Index: drbd_actlog.c
===================================================================
--- drbd_actlog.c	(revision 2715)
+++ drbd_actlog.c	(working copy)
@@ -1098,6 +1098,13 @@
 		return;
 	}
 
+	if(bm_ext->lce.refcnt == 0) {
+		spin_unlock_irqrestore(&mdev->al_lock,flags);
+		ERR("drbd_rs_complete_io(,%llu [=%u]) called, but refcnt is 0!?\n",
+		    (unsigned long long)sector, enr);
+		return;
+	}
+
 	if( lc_put(mdev->resync,(struct lc_element *)bm_ext) == 0 ) {
 		clear_bit(BME_LOCKED,&bm_ext->flags);
 		clear_bit(BME_NO_WRITES,&bm_ext->flags);
(I did not dare to commit this, in case all of this is nonsense...
I feel too tired now)
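
the other half of the paranoia, checking that an RSDataReply is for a
sector we actually requested, might look roughly like the sketch below.
again this is a userspace toy under stated assumptions: struct
toy_pending, toy_request, toy_accept_reply and the fixed table size are
all made up for illustration and do not exist in drbd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PENDING 16 /* hypothetical limit, chosen for illustration */

/* Hypothetical table of sectors with an outstanding resync request. */
struct toy_pending {
	uint64_t sector[TOY_MAX_PENDING];
	bool used[TOY_MAX_PENDING];
};

/* Remember that we requested this sector; false if the table is full. */
static bool toy_request(struct toy_pending *p, uint64_t sector)
{
	assert(p != NULL);
	for (size_t i = 0; i < TOY_MAX_PENDING; i++) {
		if (!p->used[i]) {
			p->used[i] = true;
			p->sector[i] = sector;
			return true;
		}
	}
	return false;
}

/* Accept a reply only if its sector was requested, and consume the entry.
 * On false the caller would log the event and drop the reply instead of
 * submitting the data. */
static bool toy_accept_reply(struct toy_pending *p, uint64_t sector)
{
	assert(p != NULL);
	for (size_t i = 0; i < TOY_MAX_PENDING; i++) {
		if (p->used[i] && p->sector[i] == sector) {
			p->used[i] = false;
			return true;
		}
	}
	return false;
}
```

with such a check, a reply for an unrequested sector would be rejected
before any write is submitted, so the completion path would never try to
drop a reference it does not hold.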
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :