From: Philipp Reisner <philipp.reisner@linbit.com>
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] DRBD-8 - crash due to NULL page* in drbd_send_page
Date: Wed, 16 Aug 2006 10:44:31 +0200
Message-ID: <200608161044.31669.philipp.reisner@linbit.com>
In-Reply-To: <342BAC0A5467384983B586A6B0B37671036252EC@EXNA.corp.stratus.com>
On Tuesday, 15 August 2006 21:46, Graham, Simon wrote:
> Have now traced the network and I am very confused -- I'm still
> convinced that the problem is that we are still in drbd_send_zc_bio when
> the Ack for the write is received BUT the data is correctly and
> completely sent on the wire to the peer who turns around and sends a
> WriteAck to it.
>
> I suppose it's theoretically possible that sending the final portion of
> the data from drbd_send_zc_bio might end up being pended; maybe the pipe
> is full when we go to send it which causes the worker thread to get
> suspended. That being the case, it's possible that this thread doesn't
> get rescheduled until waaaaay later - specifically, AFTER the Ack has
> been received and the bio completed and freed -- now we return to the
> worker thread and attempt to continue to loop through the (now free) bio
> with __bio_for_each_segment -- does this seem feasible?
>
> Assuming for the minute that this IS the cause, what would a suitable
> solution be? We really need to delay processing the Ack until the
> send-dblock/send-block has finished -- i.e. we should wait until the
> RQ_DRBD_ON_WIRE flag is set in the request -- is there something
> suitable we could issue a wait_event_interruptible() on in
> got_BlockAck() to wait for this?
>
Simon,
I think a suitable solution would be to complete the request only
after all three of the following have happened:
1) it was written locally,
2) the ack was received,
3) we finished sending it [new].
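To make the rule concrete, here is a minimal user-space sketch of it.
The flag values are copied from the drbd_int.h hunk in the attached
patch; struct sketch_req and sketch_end_req() are hypothetical
stand-ins, not the real struct drbd_request or drbd_end_req(), and the
req_lock locking that the real code needs is left out.

```c
#include <assert.h>

/* Flag values as they appear in the drbd_int.h hunk of the patch. */
#define RQ_DRBD_SENT    0x0010  /* we got an ack */
#define RQ_DRBD_LOCAL   0x0020  /* we wrote it to the local disk */
#define RQ_DRBD_ON_WIRE 0x0080  /* sending to the socket has finished */
#define RQ_DRBD_DONE    (RQ_DRBD_SENT | RQ_DRBD_LOCAL | RQ_DRBD_ON_WIRE)

/* Hypothetical stand-in for the request; not the real struct. */
struct sketch_req {
	unsigned int rq_status; /* events seen so far */
	int completed;          /* how often the bio would have been completed */
};

/*
 * Hypothetical simplification of drbd_end_req(): record one event and
 * complete the request only once all three events have been seen.
 * Returns 1 if this call completed the request, 0 otherwise.
 * The real code would have to do this under mdev->req_lock.
 */
static int sketch_end_req(struct sketch_req *req, unsigned int event)
{
	/* Only the three completion events are valid here. */
	assert(event != 0 && (event & ~RQ_DRBD_DONE) == 0);

	req->rq_status |= event;

	if ((req->rq_status & RQ_DRBD_DONE) != RQ_DRBD_DONE)
		return 0;       /* still waiting for at least one event */
	if (req->completed)
		return 0;       /* already completed, never complete twice */

	req->completed = 1;     /* this is where the bio would be completed */
	return 1;
}
```

With this rule an ack that arrives while the worker is still inside
the send path only sets RQ_DRBD_SENT; the request stays alive until
the sender reports RQ_DRBD_ON_WIRE. With the old RQ_DRBD_DONE value of
0x0030, LOCAL plus SENT alone were enough to complete it.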
I have attached the patch; it is completely untested. I assume you
will rerun your tests with it.
I take from Lars' mail yesterday that he could not reproduce this
problem on our main test cluster here, so it is up to you to
verify it.
-philipp
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria http://www.linbit.com :
[-- Attachment #2: for_simon.diff --]
[-- Type: text/x-diff, Size: 1275 bytes --]
Index: drbd_worker.c
===================================================================
--- drbd_worker.c (revision 2373)
+++ drbd_worker.c (working copy)
@@ -564,12 +564,10 @@
ok = drbd_send_dblock(mdev,req);
if (ok) {
- spin_lock_irq(&mdev->req_lock);
- req->rq_status |= RQ_DRBD_ON_WIRE;
- spin_unlock_irq(&mdev->req_lock);
-
inc_ap_pending(mdev);
+ drbd_end_req(req,RQ_DRBD_ON_WIRE,1,drbd_req_get_sector(req));
+
if(mdev->net_conf->wire_protocol == DRBD_PROT_A) {
dec_ap_pending(mdev);
drbd_end_req(req, RQ_DRBD_SENT, 1,
Index: drbd_int.h
===================================================================
--- drbd_int.h (revision 2373)
+++ drbd_int.h (working copy)
@@ -233,9 +233,9 @@
#define RQ_DRBD_NOTHING 0x0001
#define RQ_DRBD_SENT 0x0010 // We got an ack
#define RQ_DRBD_LOCAL 0x0020 // We wrote it to the local disk
-#define RQ_DRBD_DONE 0x0030 // We are done ;)
#define RQ_DRBD_IN_TL 0x0040 // Set when it is in the TL
#define RQ_DRBD_ON_WIRE 0x0080 // Set as soon as it is on the socket...
+#define RQ_DRBD_DONE ( RQ_DRBD_SENT + RQ_DRBD_LOCAL + RQ_DRBD_ON_WIRE )
/* drbd_meta-data.c (still in drbd_main.c) */
#define DRBD_MD_MAGIC (DRBD_MAGIC+4) // 4th incarnation of the disk layout.