* [Drbd-dev] Crash in _req_may_be_done()
@ 2006-09-12 7:36 Philipp Reisner
From: Philipp Reisner @ 2006-09-12 7:36 UTC
To: drbd-dev
Hi Lars,
First, do not read e-mails while on vacation.
OK, I am just documenting my findings here in case Simon is working on the
same thing; I do not want us to hunt the same bugs.
I am currently running drbd in my UML setup and hit a crash in _req_may_be_done():
    /* remove the request from the conflict detection
     * respective block_id verification hash */
    hlist_del(&req->colision);    <<<<<<==== HERE!
    /* FIXME not yet implemented...
     * in case we got "suspended" (on_disconnect: freeze io)
     * we may not yet complete the request...
To understand how this came about, I added the VERBOSE_REQUEST_CODE instrumentation:
[42950452.520000] drbd0: _req_mod(a101c744,to_be_submitted)
[42950452.520000] drbd0: _req_mod(a101c744,completed_ok)
[42950452.520000] drbd0: _req_may_be_done(a101c744 L-coN-----)
******* without modifications it would crash here **********
[42950452.540000] drbd0: _req_mod(a101c744,to_be_send)
[42950452.540000] drbd0: _req_mod(a101c744,to_be_submitted)
[42950452.540000] drbd0: _req_mod(a101c744,queue_for_net_write)
[42950452.540000] drbd0: _req_mod(a101c744,handed_over_to_network)
[42950452.540000] drbd0: _req_may_be_done(a101c744 Lp--Np-s--)
[42950452.540000] drbd0: _req_mod(a101c744,completed_ok)
[42950452.540000] drbd0: _req_may_be_done(a101c744 L-coNp-s--)
[42950452.540000] drbd0: _req_mod(a101c744,recv_acked_by_peer)
[42950452.540000] drbd0: _req_may_be_done(a101c744 L-coN--s-o)
What we see here is that UML's block layer finishes the write of the
block before we even mark the request as one that should be sent.
That is strange, since the code in drbd_make_request_common() is:
    if (remote) _req_mod(req, to_be_send);
    if (local)  _req_mod(req, to_be_submitted);
    [...]
    if (local) {
        BUG_ON(req->private_bio->bi_bdev == NULL);
        generic_make_request(req->private_bio);
    }
The code reads: set RQ_NET_PENDING first, then RQ_LOCAL_PENDING, and only
after that issue the local request (= call generic_make_request()).
What is happening here? Aggressive reordering by the compiler? I cannot
believe that.
-Philipp
* RE: [Drbd-dev] Crash in _req_may_be_done()
@ 2006-09-12 14:00 Graham, Simon
From: Graham, Simon @ 2006-09-12 14:00 UTC
To: Philipp Reisner, drbd-dev
Philipp,
> Ok, I just document here my findings, in case Simon works on the same,
> I do not want that we hunt the same bugs ...
>
> Currently I run drbd in my UML setup and hit a crash in.
> _req_may_be_done()
>
I think I may be looking at the same thing, although it's tricky to
locate the source code from the optimized binary. I am certainly seeing
a crash in _req_may_be_done(); I just haven't figured out where yet (too
much inlined, optimized code!)
My plan is to take your new instrumentation this morning and run again
but I'll also watch out for any updates from you.
Simon
PS: re -
> To understand how this came I added that VERBOSE_REQUEST_CODE
>
> [42950452.520000] drbd0: _req_mod(a101c744,to_be_submitted)
> [42950452.520000] drbd0: _req_mod(a101c744,completed_ok)
> [42950452.520000] drbd0: _req_may_be_done(a101c744 L-coN-----)
> ******* without modifications it would crash here
> **********
> [42950452.540000] drbd0: _req_mod(a101c744,to_be_send)
> [42950452.540000] drbd0: _req_mod(a101c744,to_be_submitted)
> [42950452.540000] drbd0: _req_mod(a101c744,queue_for_net_write)
> [42950452.540000] drbd0: _req_mod(a101c744,handed_over_to_network)
> [42950452.540000] drbd0: _req_may_be_done(a101c744 Lp--Np-s--)
> [42950452.540000] drbd0: _req_mod(a101c744,completed_ok)
> [42950452.540000] drbd0: _req_may_be_done(a101c744 L-coNp-s--)
> [42950452.540000] drbd0: _req_mod(a101c744,recv_acked_by_peer)
> [42950452.540000] drbd0: _req_may_be_done(a101c744 L-coN--s-o)
>
> What we see here is, that UML's block layer finishes the write of the
> block before we even mark the request that it should be sent.
> Strange, since the code in drbd_make_request_common() is:
>
Well, are you talking about the first few lines in the list above, where
the order is to_be_submitted, completed_ok, to_be_send, to_be_submitted?
If so, I would suspect that the lines AFTER the crash location are for a
different request that happens to use the same req structure... It might
be a good idea to implement some sort of monotonically increasing
sequence number for each drbd_req_new() that is done and output that in
the trace (the address of the req structure is really not very
interesting for debugging anyway) -- an additional atomic_increment
inside drbd_req_new shouldn't be too bad...
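The sequence-number idea could be sketched roughly as follows in user-space C11. The names sketch_req, sketch_req_new() and sketch_trace() are hypothetical stand-ins, not DRBD's real structures or functions, and the trace format is invented for illustration; kernel code would use the kernel's own atomic helpers (for example atomic_inc_return()) rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request object; only what the sketch needs. */
struct sketch_req {
    unsigned long seq;  /* unique per request, never reused */
    int is_write;       /* 0 = read, 1 = write */
};

/* Global counter; static storage, so it starts at zero. */
static atomic_ulong sketch_next_seq;

/* Allocate a request and stamp it with the next sequence number. */
static struct sketch_req *sketch_req_new(int is_write)
{
    struct sketch_req *req = malloc(sizeof(*req));

    if (!req)
        return NULL;
    /* atomic_fetch_add returns the old value, so numbering starts at 1 */
    req->seq = atomic_fetch_add(&sketch_next_seq, 1) + 1;
    req->is_write = is_write;
    return req;
}

/* Trace by sequence number; the address is printed only as a secondary hint. */
static void sketch_trace(const struct sketch_req *req, const char *event)
{
    assert(req != NULL);
    printf("req #%lu (%p): %s\n", req->seq, (const void *)req, event);
}
```

With this, two requests that happen to reuse the same memory still show up under different numbers in the trace, at the cost of one atomic add per request.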
* Re: [Drbd-dev] Crash in _req_may_be_done()
@ 2006-09-12 14:18 ` Philipp Reisner
From: Philipp Reisner @ 2006-09-12 14:18 UTC
To: Graham, Simon; +Cc: drbd-dev
On Tuesday, 12 September 2006 16:00, Graham, Simon wrote:
> Philipp,
>
> > Ok, I just document here my findings, in case Simon works on the same,
> > I do not want that we hunt the same bugs ...
> >
> > Currently I run drbd in my UML setup and hit a crash in.
> > _req_may_be_done()
>
> I think I may be looking at the same thing although it's tricky to
> locate the source code from the optimized binary. I am certainly seeing
> a crash in _req_may_be_done I just haven't figured out where yet (too
> much inlined optimized code!)
>
> My plan is to take your new instrumentation this morning and run again
> but I'll also watch out for any updates from you.
Hi Simon,
I fixed two bugs during the day. See:
http://lists.linbit.com/pipermail/drbd-cvs/2006-September/001219.html
which was the unconditional hlist_del(),
and
http://lists.linbit.com/pipermail/drbd-cvs/2006-September/001221.html
which was the missing dec_ap_bio(mdev).
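To illustrate why an unconditional hlist_del() can crash, here is a minimal user-space sketch of a conditional unlink. It is not the commit linked above, whose contents are not reproduced in this thread. The types hnode and hhead and the helpers are hypothetical copies of the kernel's hlist layout, and the assumption is that some requests are never added to the hash, so their node has no valid back pointer. In the kernel, hlist_unhashed() and hlist_del_init() provide the same kind of check.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space imitation of the kernel's hlist layout (illustrative). */
struct hnode { struct hnode *next, **pprev; };
struct hhead { struct hnode *first; };

/* Mark a node as "not on any list". */
static void hnode_init(struct hnode *n)
{
    n->next = NULL;
    n->pprev = NULL;
}

static int hnode_unhashed(const struct hnode *n)
{
    return n->pprev == NULL;
}

/* Insert at the head of the bucket. */
static void hnode_add_head(struct hnode *n, struct hhead *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink only if the node is on a list; calling it twice is harmless. */
static void hnode_del_if_hashed(struct hnode *n)
{
    if (hnode_unhashed(n))
        return;             /* never hashed: an unconditional unlink would
                             * dereference the NULL pprev here */
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    hnode_init(n);
}
```

The guard relies on every node being initialised before use, so that "never hashed" is distinguishable from "hashed".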
[...]
> > [42950452.520000] drbd0: _req_mod(a101c744,to_be_submitted)
> > [42950452.520000] drbd0: _req_mod(a101c744,completed_ok)
> > [42950452.520000] drbd0: _req_may_be_done(a101c744 L-coN-----)
> > ******* without modifications it would crash here
> > **********
> > [42950452.540000] drbd0: _req_mod(a101c744,to_be_send)
> > [42950452.540000] drbd0: _req_mod(a101c744,to_be_submitted)
> > [42950452.540000] drbd0: _req_mod(a101c744,queue_for_net_write)
> > [42950452.540000] drbd0: _req_mod(a101c744,handed_over_to_network)
> > [42950452.540000] drbd0: _req_may_be_done(a101c744 Lp--Np-s--)
> > [42950452.540000] drbd0: _req_mod(a101c744,completed_ok)
> > [42950452.540000] drbd0: _req_may_be_done(a101c744 L-coNp-s--)
> > [42950452.540000] drbd0: _req_mod(a101c744,recv_acked_by_peer)
> > [42950452.540000] drbd0: _req_may_be_done(a101c744 L-coN--s-o)
> >
> > What we see here is, that UML's block layer finishes the write of the
> > block before we even mark the request that it should be sent.
> > Strange, since the code in drbd_make_request_common() is:
>
> Well, Are you talking about the 1st few lines in the list above? Where
> the order is to_be_submitted, completed_ok, to_be_send, to_be_submitted?
> If so, I would suspect that the lines AFTER the crash location are for a
> different request that happens to use the same req structure... It might
> be a good idea to implement some sort of monotonically increasing
> sequence number for each drbd_req_new() that is done and output that in
> the trace (the address of the req structure is really not very
> interesting for debugging anyway) -- an additional atomic_increment
> inside drbd_req_new shouldn't be too bad...
Right, the first three lines were from a read request, and the other
lines from the following write request. Afterwards I improved the tracing
code so that it also prints the direction (R/W).
That is it from me for today, I think.
-Philipp
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria http://www.linbit.com :