[Drbd-dev] [DRBD8.0 PATCH] Updated fix to ensure stale state is not sent if a cluster wide state change is in progress

Distributed Replicated Block Device (DRBD) development
 help / color / mirror / Atom feed

* [Drbd-dev] [DRBD8.0 PATCH] Updated fix to ensure stale state is not sent if a cluster wide state change is in progress
@ 2008-02-23 20:44 Graham, Simon
  2008-02-25 15:22 ` Philipp Reisner
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Graham, Simon @ 2008-02-23 20:44 UTC (permalink / raw)
  To: drbd-dev

[-- Attachment #1.1: Type: text/plain, Size: 424 bytes --]

This is an addendum to a patch I previously submitted (applied to the
8.0 tree as ID 6e9fdc92) that blocks sending state information if a
cluster wide state change is in progress - there was a hole in this
because the code in drbd_request_state() actually released the drbd
state lock BEFORE updating the state locally leaving a small window
where a request to send state can send the old stale state.

Simon

[-- Attachment #1.2: Type: text/html, Size: 3699 bytes --]

[-- Attachment #2: 0001-Close-window-where-old-state-can-be-sent-by-drbd_sen.patch --]
[-- Type: application/octet-stream, Size: 1392 bytes --]

From 1b4149bc5e0775e3f4d8680cffc5a90654467f62 Mon Sep 17 00:00:00 2001
From: Simon P. Graham <Simon.Graham@stratus.com>
Date: Sat, 23 Feb 2008 15:06:42 -0500
Subject: [PATCH] Close window where old state can be sent by drbd_send_state()

A previous change made this routine acquire the drbd_state_lock() to
ensure that it wont be sent when there is a cluster wide state change
in progress. Unfortunately, the code in drbd_request_state released
the lock before updating the local state which means that it was still
possible for drbd_send_state() to send stale data.

This fix sets the local state whilst still holding the state lock.
---
 drbd/drbd_main.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drbd/drbd_main.c b/drbd/drbd_main.c
index d03ed70..1d717cc 100644
--- a/drbd/drbd_main.c
+++ b/drbd/drbd_main.c
@@ -496,14 +496,18 @@ int _drbd_request_state(drbd_dev* mdev, drbd_state_t mask, drbd_state_t val,
 			if( f & ChgStateVerbose ) print_st_err(mdev,os,ns,rv);
 			return rv;
 		}
+
 		spin_lock_irqsave(&mdev->req_lock,flags);
 		os = mdev->state;
 		ns.i = (os.i & ~mask.i) | val.i;
+		rv = _drbd_set_state(mdev, ns, f);
+
 		drbd_state_unlock(mdev);
 	}
+	else {
+		rv = _drbd_set_state(mdev, ns, f);
+	}

-	rv = _drbd_set_state(mdev, ns, f);
-	ns = mdev->state;
 	spin_unlock_irqrestore(&mdev->req_lock,flags);

 	return rv;
-- 
1.5.4.rc1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Drbd-dev] [DRBD8.0 PATCH] Updated fix to ensure stale state is not sent if a cluster wide state change is in progress
  2008-02-23 20:44 [Drbd-dev] [DRBD8.0 PATCH] Updated fix to ensure stale state is not sent if a cluster wide state change is in progress Graham, Simon
@ 2008-02-25 15:22 ` Philipp Reisner
  2008-02-25 20:31 ` [Drbd-dev] I/O can hang on primary synctarget after an io error Montrose, Ernest
       [not found] ` <BD7042533C2F8943A6A4257A9E31C454F47ACB@EXNA.corp.str atus.com>
  2 siblings, 0 replies; 6+ messages in thread
From: Philipp Reisner @ 2008-02-25 15:22 UTC (permalink / raw)
  To: drbd-dev

Am Samstag, 23. Februar 2008 21:44:10 schrieb Graham, Simon:
> This is an addendum to a patch I previously submitted (applied to the
> 8.0 tree as ID 6e9fdc92) that blocks sending state information if a
> cluster wide state change is in progress - there was a hole in this
> because the code in drbd_request_state() actually released the drbd
> state lock BEFORE updating the state locally leaving a small window
> where a request to send state can send the old stale state.
>

Hi simon,

just applied. Thanks!

-phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Drbd-dev] I/O can hang on primary synctarget after an io error.
  2008-02-23 20:44 [Drbd-dev] [DRBD8.0 PATCH] Updated fix to ensure stale state is not sent if a cluster wide state change is in progress Graham, Simon
  2008-02-25 15:22 ` Philipp Reisner
@ 2008-02-25 20:31 ` Montrose, Ernest
  2008-02-25 21:06   ` Lars Ellenberg
       [not found] ` <BD7042533C2F8943A6A4257A9E31C454F47ACB@EXNA.corp.str atus.com>
  2 siblings, 1 reply; 6+ messages in thread
From: Montrose, Ernest @ 2008-02-25 20:31 UTC (permalink / raw)
  To: drbd-dev

[-- Attachment #1: Type: text/plain, Size: 2225 bytes --]

Hi all,
We are seeing an issue where I/O to a volume that received an I/O error
during re-sync as the sync target hangs. Looking at the logs it seems
that what's going on is that we are skipping a dec_local().  My theory
is that after_state_ch() is blocked forever waiting for local_cnt to be
0 as we are becoming Diskless.  So the worker will not do any work,
hence the hang I/O.  Here is the relevant logs:

Feb 13 03:48:55 node0 kernel: drbd5: Began resync as SyncTarget (will
sync 1048508 KB [262127 bits set]).
Feb 13 03:48:55 node0 kernel: drbd5: Writing meta data super block now.
Feb 13 03:48:55 node0 kernel: drbd5: Creating new epoch in
drbd_try_rs_begin_io
Feb 13 03:48:55 node0 kernel: drbd5: ***Simulating Resync write failure
Feb 13 03:48:56 node0 kernel: drbd5: Resync aborted.
Feb 13 03:48:56 node0 kernel: drbd5: conn( SyncTarget -> Connected )
disk( Inconsistent -> Failed )
Feb 13 03:48:56 node0 kernel: drbd5: Local IO failed. Detaching...
Feb 13 03:48:56 node0 kernel: drbd5: disk( Failed -> Diskless )
Feb 13 03:48:56 node0 kernel: drbd5: Notified peer that my disk is
broken.
Feb 13 03:48:56 node0 kernel: drbd5: Can not write resync data to local
disk.
Feb 13 03:54:57 node0 kernel: drbd5: drbd_nl_disk_conf: mdev->bc not
NULL.

Notice the last line of the log.  Our test environment must have tried
to do an "attach" so since local_cnt is not 0 we never freed the "bc".

But from the "Can not write resync data to local disk." we can go to
drbd_endio_write_sec() and there we see a suspicious :
If(bio->bi_size) return 1; 
We are supposed to do the dec_local at the end of drbd_endio_write
sec(). I am guessing that's where the problem is.  But I do not know why
bi_size would be greater then 0.  Is the fix simply to dec_local while
returning?

BTW, the code below  inserts the fault in question but won't necessarily
make the hang happens:

#!/bin/sh
echo inserting fault....
echo 0x4 >/sys/module/drbd/parameters/enable_faults
echo 0x20 >/sys/module/drbd/parameters/fault_devs
echo 5 > /sys/module/drbd/parameters/fault_rate
echo starting resync
sleep 5
drbdadm -c /etc/drbd.conf.avance invalidate drbd5.vol
echo done....

Thanks,

EM--

[-- Attachment #2: Type: text/html, Size: 12677 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Drbd-dev] I/O can hang on primary synctarget after an io error.
  2008-02-25 20:31 ` [Drbd-dev] I/O can hang on primary synctarget after an io error Montrose, Ernest
@ 2008-02-25 21:06   ` Lars Ellenberg
  0 siblings, 0 replies; 6+ messages in thread
From: Lars Ellenberg @ 2008-02-25 21:06 UTC (permalink / raw)
  To: drbd-dev

On Mon, Feb 25, 2008 at 03:31:01PM -0500, Montrose, Ernest wrote:
> 
>    Hi all,
> 
>    We are seeing an issue where I/O to a volume that received an I/O
>    error during re-sync as the sync target hangs. Looking at the logs it
>    seems that what's going on is that we are skipping a dec_local().  My
>    theory is that after_state_ch() is blocked forever waiting for
>    local_cnt to be 0 as we are becoming Diskless.  So the worker will not
>    do any work, hence the hang I/O.  Here is the relevant logs:
> 
> 
>    Feb 13 03:48:55 node0 kernel: drbd5: Began resync as SyncTarget (will
>    sync 1048508 KB [262127 bits set]).
>    Feb 13 03:48:55 node0 kernel: drbd5: Writing meta data super block
>    now.
>    Feb 13 03:48:55 node0 kernel: drbd5: Creating new epoch in
>    drbd_try_rs_begin_io
>    Feb 13 03:48:55 node0 kernel: drbd5: ***Simulating Resync write
>    failure
>    Feb 13 03:48:56 node0 kernel: drbd5: Resync aborted.
>    Feb 13 03:48:56 node0 kernel: drbd5: conn( SyncTarget -> Connected )
>    disk( Inconsistent -> Failed )
>    Feb 13 03:48:56 node0 kernel: drbd5: Local IO failed. Detaching...
>    Feb 13 03:48:56 node0 kernel: drbd5: disk( Failed -> Diskless )
>    Feb 13 03:48:56 node0 kernel: drbd5: Notified peer that my disk is
>    broken.
>    Feb 13 03:48:56 node0 kernel: drbd5: Can not write resync data to
>    local disk.
>    Feb 13 03:54:57 node0 kernel: drbd5: drbd_nl_disk_conf: mdev->bc not
>    NULL.
> 
> 
>    Notice the last line of the log.  Our test environment must have tried
>    to do an "attach" so since local_cnt is not 0 we never freed the "bc".
> 
> 
>    But from the "Can not write resync data to local disk." we can go to
>    drbd_endio_write_sec() and there we see a suspicious :
> 
>    If(bio->bi_size) return 1;

it's not suspicious. it's "standard procedure".
it even got removed from the internal kernel API recently.

>    We are supposed to do the dec_local at the end of drbd_endio_write
>    sec(). I am guessing that's where the problem is.  But I do not know
>    why bi_size would be greater then 0.  Is the fix simply to dec_local
>    while returning?

IF there is imbalance in the local refcounting, then elsewhere.
drbd_endio_write_sec is correct, afaics.

do you have this a009fc907a14f69026b32fbb48a4db6f1cdd5ecd
commit included in your code base?

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :

^ permalink raw reply	[flat|nested] 6+ messages in thread

[parent not found: <BD7042533C2F8943A6A4257A9E31C454F47ACB@EXNA.corp.str atus.com>]

* RE: [Drbd-dev] I/O can hang on primary synctarget after an io error.
       [not found] ` <BD7042533C2F8943A6A4257A9E31C454F47ACB@EXNA.corp.str atus.com>
@ 2008-02-25 21:53   ` Montrose, Ernest
  2008-02-26 12:49     ` Lars Ellenberg
  0 siblings, 1 reply; 6+ messages in thread
From: Montrose, Ernest @ 2008-02-25 21:53 UTC (permalink / raw)
  To: Lars Ellenberg, drbd-dev

Lars,
We appear to have a009fc907a14f69026b32fbb48a4db6f1cdd5ecd.  Reading
your response what I get is that we are guaranteed that if we return
early in drbd_end_write_sec() then someone else would have done the
dec_local near the end or an inc_local was never done?  

Hmmm... our testing was not done with the latest git stuff.  I will do
some things with the latest.

Thanks!
EM--

-----Original Message-----
From: drbd-dev-bounces@linbit.com [mailto:drbd-dev-bounces@linbit.com]
On Behalf Of Lars Ellenberg
Sent: Monday, February 25, 2008 4:07 PM
To: drbd-dev@linbit.com
Subject: Re: [Drbd-dev] I/O can hang on primary synctarget after an io
error.

On Mon, Feb 25, 2008 at 03:31:01PM -0500, Montrose, Ernest wrote:
> 
>    Hi all,
> 
>    We are seeing an issue where I/O to a volume that received an I/O
>    error during re-sync as the sync target hangs. Looking at the logs
it
>    seems that what's going on is that we are skipping a dec_local().
My
>    theory is that after_state_ch() is blocked forever waiting for
>    local_cnt to be 0 as we are becoming Diskless.  So the worker will
not
>    do any work, hence the hang I/O.  Here is the relevant logs:
> 
> 
>    Feb 13 03:48:55 node0 kernel: drbd5: Began resync as SyncTarget
(will
>    sync 1048508 KB [262127 bits set]).
>    Feb 13 03:48:55 node0 kernel: drbd5: Writing meta data super block
>    now.
>    Feb 13 03:48:55 node0 kernel: drbd5: Creating new epoch in
>    drbd_try_rs_begin_io
>    Feb 13 03:48:55 node0 kernel: drbd5: ***Simulating Resync write
>    failure
>    Feb 13 03:48:56 node0 kernel: drbd5: Resync aborted.
>    Feb 13 03:48:56 node0 kernel: drbd5: conn( SyncTarget -> Connected
)
>    disk( Inconsistent -> Failed )
>    Feb 13 03:48:56 node0 kernel: drbd5: Local IO failed. Detaching...
>    Feb 13 03:48:56 node0 kernel: drbd5: disk( Failed -> Diskless )
>    Feb 13 03:48:56 node0 kernel: drbd5: Notified peer that my disk is
>    broken.
>    Feb 13 03:48:56 node0 kernel: drbd5: Can not write resync data to
>    local disk.
>    Feb 13 03:54:57 node0 kernel: drbd5: drbd_nl_disk_conf: mdev->bc
not
>    NULL.
> 
> 
>    Notice the last line of the log.  Our test environment must have
tried
>    to do an "attach" so since local_cnt is not 0 we never freed the
"bc".
> 
> 
>    But from the "Can not write resync data to local disk." we can go
to
>    drbd_endio_write_sec() and there we see a suspicious :
> 
>    If(bio->bi_size) return 1;

it's not suspicious. it's "standard procedure".
it even got removed from the internal kernel API recently.

>    We are supposed to do the dec_local at the end of drbd_endio_write
>    sec(). I am guessing that's where the problem is.  But I do not
know
>    why bi_size would be greater then 0.  Is the fix simply to
dec_local
>    while returning?

IF there is imbalance in the local refcounting, then elsewhere.
drbd_endio_write_sec is correct, afaics.

do you have this a009fc907a14f69026b32fbb48a4db6f1cdd5ecd
commit included in your code base?

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
_______________________________________________
drbd-dev mailing list
drbd-dev@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-dev

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Drbd-dev] I/O can hang on primary synctarget after an io error.
  2008-02-25 21:53   ` Montrose, Ernest
@ 2008-02-26 12:49     ` Lars Ellenberg
  0 siblings, 0 replies; 6+ messages in thread
From: Lars Ellenberg @ 2008-02-26 12:49 UTC (permalink / raw)
  To: drbd-dev

On Mon, Feb 25, 2008 at 04:53:35PM -0500, Montrose, Ernest wrote:
> Lars,
> We appear to have a009fc907a14f69026b32fbb48a4db6f1cdd5ecd.  Reading
> your response what I get is that we are guaranteed that if we return
> early in drbd_end_write_sec() then someone else would have done the
> dec_local near the end or an inc_local was never done?  

this "return early" does not happen.
because almost no driver actually implemented partial completion
notification in the first place, and if some driver should indeed
implement it, it is guaranteed to eventually call us with the the bio
comlpetely done.
finally, you trigger the "io-error" by fault insertion,
which directly calls the endio handler with bi_size == 0.

> Hmmm... our testing was not done with the latest git stuff.
> I will do some things with the latest.

keep us posted.

-- 
: Lars Ellenberg                            Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-02-26 12:49 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-23 20:44 [Drbd-dev] [DRBD8.0 PATCH] Updated fix to ensure stale state is not sent if a cluster wide state change is in progress Graham, Simon
2008-02-25 15:22 ` Philipp Reisner
2008-02-25 20:31 ` [Drbd-dev] I/O can hang on primary synctarget after an io error Montrose, Ernest
2008-02-25 21:06   ` Lars Ellenberg
     [not found] ` <BD7042533C2F8943A6A4257A9E31C454F47ACB@EXNA.corp.str atus.com>
2008-02-25 21:53   ` Montrose, Ernest
2008-02-26 12:49     ` Lars Ellenberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox