Distributed Replicated Block Device (DRBD) development
* [Drbd-dev] Re: [DRBD-cvs] svn commit by phil - r1985 - branches/drbd-0.7/drbd - Fixed a self made SMP lockup; showing up in drbd_al_com
From: Lars Marowsky-Bree @ 2005-10-17 13:43 UTC
  To: drbd-dev

On 2005-10-17T15:40:12, drbd-cvs@lists.linbit.com wrote:

> Author: phil
> Date: 2005-10-17 15:40:11 +0200 (Mon, 17 Oct 2005)
> New Revision: 1985
> 
> Modified:
>    branches/drbd-0.7/drbd/drbd_bitmap.c
> Log:
> Fixed a self made SMP lockup; showing up in drbd_al_complete_io()
> 
>   With drbd-0.7.12 we moved the al_lock before the bm_clear_bit()
>   in  __drbd_set_in_sync(). That by itself is okay, but 
>   in bm_clear_bit() we used the spin_[un]lock_irq() functions,
>   therefore reenabling interrupts...
>   This is fixed now.

Hi Philipp, how critical is this bug, and how likely is it that it will
be hit in practice?


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge"
	 -- Charles Darwin



* Re: [Drbd-dev] Re: [DRBD-cvs] svn commit by phil - r1985 - branches/drbd-0.7/drbd - Fixed a self made SMP lockup; showing up in drbd_al_com
From: Lars Ellenberg @ 2005-10-17 13:56 UTC
  To: drbd-dev

/ 2005-10-17 15:43:49 +0200
\ Lars Marowsky-Bree:
> On 2005-10-17T15:40:12, drbd-cvs@lists.linbit.com wrote:
> 
> > Author: phil
> > Date: 2005-10-17 15:40:11 +0200 (Mon, 17 Oct 2005)
> > New Revision: 1985
> > 
> > Modified:
> >    branches/drbd-0.7/drbd/drbd_bitmap.c
> > Log:
> > Fixed a self made SMP lockup; showing up in drbd_al_complete_io()
> > 
> >   With drbd-0.7.12 we moved the al_lock before the bm_clear_bit()
> >   in  __drbd_set_in_sync(). That by itself is okay, but 
> >   in bm_clear_bit() we used the spin_[un]lock_irq() functions,
> >   therefore reenabling interrupts...
> >   This is fixed now.
> 
> Hi Philipp, how critical is this bug, and how likely is it that it will
> be hit in practice?

that depends on the actual timing...

once we started looking for it, we could reproduce it on our
test cluster within minutes.  it occurs on Primary/SyncSource, which
is the typical case.

it will be hit. rating is CRITICAL (for smp).

there will be a 0.7.14 today, latest tomorrow, including this fix,
and the fix for not notifying the peer about "io-error on read".

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :


* Re: [Drbd-dev] Re: [DRBD-cvs] svn commit by phil - r1985 - branches/drbd-0.7/drbd - Fixed a self made SMP lockup; showing up in drbd_al_com
From: Lars Marowsky-Bree @ 2005-10-17 13:57 UTC
  To: drbd-dev

On 2005-10-17T15:56:18, Lars Ellenberg <Lars.Ellenberg@linbit.com> wrote:

> it will be hit. rating is CRITICAL (for smp).
> 
> there will be a 0.7.14 today, latest tomorrow, including this fix,
> and the fix for not notifying the peer about "io-error on read".

Great, thank you. I'll pull that into SLES9 then.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge"
	 -- Charles Darwin



* Re: [Drbd-dev] Re: [DRBD-cvs] svn commit by phil - r1985 - branches/drbd-0.7/drbd - Fixed a self made SMP lockup; showing up in drbd_al_com
From: Lars Ellenberg @ 2005-10-17 14:15 UTC
  To: drbd-dev

/ 2005-10-17 15:57:49 +0200
\ Lars Marowsky-Bree:
> On 2005-10-17T15:56:18, Lars Ellenberg <Lars.Ellenberg@linbit.com> wrote:
> 
> > it will be hit. rating is CRITICAL (for smp).
> > 
> > there will be a 0.7.14 today, latest tomorrow, including this fix,
> > and the fix for not notifying the peer about "io-error on read".
> 
> Great, thank you. I'll pull that into SLES9 then.

better that way.

it is a race, and it will only trigger if you have seriously high
application io load during drbd resync on a Primary/SyncSource.

But yes, SLES9 SMP systems probably will match that description most of
the time, and if you then reconnect the secondary after some downtime ...

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
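
The commit log for r1985 says that bm_clear_bit() used the
spin_[un]lock_irq() functions while being called with al_lock already
held, "therefore reenabling interrupts". The following is a minimal
userspace sketch of that pattern, not the DRBD code. Everything in it is
an assumption made for illustration: the model_* helpers, the
irqs_enabled flag, and the idea that the outer lock is taken with an
irqsave-style call. The thread also does not say which change r1985
actually made; the irqsave/irqrestore variant shown here is only the
usual kernel idiom for code that may be entered with interrupts already
disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of one CPU's interrupt-enable flag (hypothetical). */
static bool irqs_enabled = true;

/* A toy lock that only tracks whether it is held (hypothetical). */
struct model_lock { bool held; };

/* Models spin_lock_irq(): disables interrupts unconditionally. */
static void model_lock_irq(struct model_lock *l)
{
    irqs_enabled = false;
    assert(!l->held);
    l->held = true;
}

/* Models spin_unlock_irq(): enables interrupts unconditionally,
 * regardless of what the state was before the matching lock call. */
static void model_unlock_irq(struct model_lock *l)
{
    assert(l->held);
    l->held = false;
    irqs_enabled = true;
}

/* Models spin_lock_irqsave(): returns the previous interrupt state. */
static bool model_lock_irqsave(struct model_lock *l)
{
    bool flags = irqs_enabled;
    irqs_enabled = false;
    assert(!l->held);
    l->held = true;
    return flags;
}

/* Models spin_unlock_irqrestore(): puts back the saved state. */
static void model_unlock_irqrestore(struct model_lock *l, bool flags)
{
    assert(l->held);
    l->held = false;
    irqs_enabled = flags;
}

/* Takes an outer lock (standing in for al_lock), runs an inner locked
 * section (standing in for bm_clear_bit()), and reports whether
 * interrupts are still disabled while the outer lock is held. */
static bool outer_section_stays_protected(bool inner_uses_irqsave)
{
    struct model_lock al_lock = { false };
    struct model_lock bm_lock = { false };
    bool protected_after_inner;

    irqs_enabled = true;
    bool outer_flags = model_lock_irqsave(&al_lock);

    if (inner_uses_irqsave) {
        bool inner_flags = model_lock_irqsave(&bm_lock);
        model_unlock_irqrestore(&bm_lock, inner_flags);
    } else {
        model_lock_irq(&bm_lock);
        model_unlock_irq(&bm_lock); /* interrupts are back on here */
    }

    protected_after_inner = !irqs_enabled;
    model_unlock_irqrestore(&al_lock, outer_flags);
    return protected_after_inner;
}
```

Calling outer_section_stays_protected(false) models the pattern from the
commit log: the inner unlock turns interrupts back on while the outer
lock is still held. In a real kernel, an interrupt handler on the same
CPU that then tries to take the same outer lock would spin forever,
which is presumably how the lockup showed up in drbd_al_complete_io().
With outer_section_stays_protected(true) the inner section leaves the
interrupt state as it found it.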
