[Drbd-dev] Crash in lru_cache.c

Distributed Replicated Block Device (DRBD) development

* [Drbd-dev] Crash in lru_cache.c
@ 2008-01-10 19:00 Graham, Simon
  2008-01-10 20:19 ` Lars Ellenberg
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Graham, Simon @ 2008-01-10 19:00 UTC
  To: drbd-dev

I've been seeing an occasional crash (using DRBD8.0) recently in the
lru_cache.c file that looks like this:

Dec  5 05:57:09 ------------[ cut here ]------------
Dec  5 05:57:09 kernel BUG at /test_logs/builds/SuperNova/trunk/20071205-r21536/src/platform/drbd/src/drbd/lru_cache.c:312!
Dec  5 05:57:09 invalid opcode: 0000 [#1]
Dec  5 05:57:09 SMP 
Dec  5 05:57:09 Modules linked in: tun drbd cn ipmi_devintf ipmi_si
ipmi_msghandler bridge ipv6 binfmt_misc dm_mirror dm_multipath dm_mod
video thermal sbs processor i2c_ec i2c_core fan container button battery
asus_acpi ac parport_pc lp parport nvram ide_cd cdrom evdev intel_rng
pcspkr sg bnx2 shpchp piix zlib_inflate pci_hotplug serio_raw
serial_core rtc mptspi scsi_transport_spi ide_disk mptsas mptscsih
mptbase scsi_transport_sas sd_mod scsi_mod raid1 ehci_hcd ohci_hcd
uhci_hcd usbcore
Dec  5 05:57:09 CPU:    1
Dec  5 05:57:09 EIP:    0061:[<ee297564>]    Tainted: GF    VLI
Dec  5 05:57:09 EFLAGS: 00010046  (2.6.18-xen #1) 
Dec  5 05:57:09 EIP is at lc_put+0x84/0xc0 [drbd]
Dec  5 05:57:10 eax: 00000000  ebx: ee24e000  ecx: ed3a4000  edx: ee24fc50
Dec  5 05:57:10 esi: ee24fc50  edi: ed3a4000  ebp: deb55e30  esp: deb55e28
Dec  5 05:57:10 ds: 007b  es: 007b  ss: 0069
Dec  5 05:57:10 Process drbd5_asender (pid: 12691, ti=deb54000 task=e6d960f0 task.ti=deb54000)
Dec  5 05:57:10 Stack: ed3a43b0 00000001 deb55e70 ee29480e 00000000 00000000 00004100 deb55e48
Dec  5 05:57:10        00000000 c031e320 00000000 00000000 deb55f38 ed3a4000 00000000 000000e6
Dec  5 05:57:10        c8d32288 ed3a4000 deb55ee8 ee29149e 00000001 ffffffff 00000000 00000000
Dec  5 05:57:10 Call Trace:
Dec  5 05:57:10  [<c0105dc1>] show_stack_log_lvl+0xb1/0xe0
Dec  5 05:57:10  [<c0105ffa>] show_registers+0x1aa/0x230
Dec  5 05:57:10  [<c01061b6>] die+0x136/0x300
Dec  5 05:57:10  [<c01063ff>] do_trap+0x7f/0xb0
Dec  5 05:57:10  [<c0106be7>] do_invalid_op+0x97/0xb0
Dec  5 05:57:10  [<c01058f3>] error_code+0x2b/0x30
Dec  5 05:57:10  [<ee29480e>] drbd_al_complete_io+0x6e/0x130 [drbd]
Dec  5 05:57:10  [<ee29149e>] _req_may_be_done+0x5ee/0x780 [drbd]
Dec  5 05:57:10  [<ee291993>] _req_mod+0x363/0xab0 [drbd]
Dec  5 05:57:10  [<ee29e7c1>] tl_release+0x51/0x1f0 [drbd]
Dec  5 05:57:10  [<ee28c576>] got_BarrierAck+0x16/0xb0 [drbd]
Dec  5 05:57:10  [<ee28d7b9>] drbd_asender+0x2e9/0x5a0 [drbd]
Dec  5 05:57:10  [<ee29ea0f>] drbd_thread_setup+0xaf/0xf0 [drbd]
Dec  5 05:57:10  [<c0103005>] kernel_thread_helper+0x5/0x10

The failing line of code is asserting in lc_put that the current ref
count is greater than zero. Now, I think this bug has been there for a
while if you are using protocol A or B and has now been exposed when
using protocol C because of the recent change to maintain the transfer
log for all protocols (i.e. it's my fault it got exposed!)

My theory is that the following occurred:

1. We were running normally; this means that the TL has at least one
   entry most of the time - this entry is a request that includes a
   reference to the AL cache for that write operation.

2. The local disk is detached for some reason (failure, or 'drbdsetup
   detach') - this causes the AL cache to be discarded.

3. The local disk is reattached - this creates a brand spanking new AL
   with no hot entries.

4. We process a second write for the same AL area as the one above -
   this will create a new hot entry in the cache, but the refcount will
   only be one even though there are two I/Os outstanding for the AL
   area covered by the entry.

5. Now we get a barrier ack that allows us to clear both entries from
   the TL; when we attempt the lc_put for the second one, we crash
   because the ref count is already zero.
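
To make the sequence concrete, here is a toy model of it in C. It is only
a sketch under assumptions: struct toy_al and the toy_* helpers are
made-up names standing in for one AL extent and its get/put calls, not
the real lru_cache.c interface, and a detach/attach cycle is reduced to
zeroing the counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one activity-log (AL) extent; not DRBD code. */
struct toy_al {
    unsigned int refcnt;   /* outstanding writes that hold this extent */
};

/* A write takes a reference on the extent it touches. */
static void toy_get(struct toy_al *al) { al->refcnt++; }

/* Detach followed by attach: the cache is rebuilt with no hot entries. */
static void toy_reset(struct toy_al *al) { al->refcnt = 0; }

/*
 * Drop a reference. Returns false at the point where the real lc_put
 * would trip its "refcount must be > 0" assertion.
 */
static bool toy_put(struct toy_al *al)
{
    if (al->refcnt == 0)
        return false;
    al->refcnt--;
    return true;
}

/* Replays steps 1-5; returns true if only the second put underflows. */
static bool toy_replay_underflows(void)
{
    struct toy_al al = { 0 };

    toy_get(&al);      /* 1. first write, its request stays in the TL  */
    toy_reset(&al);    /* 2+3. detach, then reattach with a fresh AL   */
    toy_get(&al);      /* 4. second write to the same AL area          */
    assert(al.refcnt == 1);   /* two I/Os outstanding, one reference   */

    /* 5. the barrier ack releases both requests from the TL */
    bool first_ok = toy_put(&al);
    bool second_ok = toy_put(&al);
    return first_ok && !second_ok;
}
```

toy_replay_underflows() returns true: the first put succeeds and the
second finds the count already at zero, which is where the real lc_put
would hit its BUG.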

So, first of all, does this seem a reasonable explanation, or did I miss
something?

Secondly, assuming I'm right, I see a couple of possible solutions:

1. Remember in the req structure if this request has a reference to the
   AL cache entry. When clearing the AL because of a detach, go through
   the TL list at that time and clear the flag - thus when we eventually
   remove the entry, we won't even try the lc_put.

2. When attaching a disk, run through the current TL and allocate AL
   entries for each request currently in the list. The problem with this
   is that the AL cache size might have changed in a way that doesn't
   allow sufficient hot entries (i.e. the cache size is less than the
   number of unique entries required by the current TL list).
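
Option 1 could look roughly like the following. This is a minimal sketch
under assumptions: the per-request flag holds_al_ref, the toy_* names, and
the plain array standing in for the TL are all hypothetical and do not
exist in DRBD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request record; the flag is the only field that matters. */
struct toy_req {
    bool holds_al_ref;   /* true while this request pins an AL extent */
};

/* Hypothetical AL extent with a plain reference count. */
struct toy_extent {
    unsigned int refcnt;
};

/* Issue a write: take the AL reference and remember that we did. */
static void toy_issue(struct toy_req *req, struct toy_extent *ext)
{
    ext->refcnt++;
    req->holds_al_ref = true;
}

/* On detach the AL is discarded, so every request in the TL forgets
 * the reference it used to hold. */
static void toy_detach(struct toy_req *tl, size_t n, struct toy_extent *ext)
{
    for (size_t i = 0; i < n; i++)
        tl[i].holds_al_ref = false;
    ext->refcnt = 0;
}

/* Complete a request: only put the reference if it still owns one. */
static void toy_complete(struct toy_req *req, struct toy_extent *ext)
{
    if (!req->holds_al_ref)
        return;               /* the reference died with the old AL */
    assert(ext->refcnt > 0);  /* mirrors the check that fired in lc_put */
    ext->refcnt--;
    req->holds_al_ref = false;
}

/* Replays the crash sequence with the flag in place and returns the
 * final reference count. */
static unsigned int toy_replay_with_flag(void)
{
    struct toy_req tl[2] = { { false }, { false } };
    struct toy_extent ext = { 0 };

    toy_issue(&tl[0], &ext);     /* write before the detach             */
    toy_detach(tl, 1, &ext);     /* detach, then reattach with fresh AL */
    toy_issue(&tl[1], &ext);     /* second write to the same AL area    */
    toy_complete(&tl[0], &ext);  /* barrier ack: skipped, no reference  */
    toy_complete(&tl[1], &ext);  /* barrier ack: drops the live one     */
    return ext.refcnt;
}
```

With the flag, the stale request is skipped and the count ends at zero
without the assertion firing. The cost is a walk over the TL at detach
time.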

Thoughts? I'm about to start on fixing this, so would welcome ideas...

Thanks,
Simon

* Re: [Drbd-dev] Crash in lru_cache.c
  2008-01-10 19:00 [Drbd-dev] Crash in lru_cache.c Graham, Simon
@ 2008-01-10 20:19 ` Lars Ellenberg
  2008-01-10 20:31 ` Graham, Simon
       [not found] ` <342BAC0A5467384983B586A6B0B3767107C5AE95@EXNA.corp.stratus.com>
  2 siblings, 0 replies; 8+ messages in thread
From: Lars Ellenberg @ 2008-01-10 20:19 UTC
  To: drbd-dev

On Thu, Jan 10, 2008 at 02:00:06PM -0500, Graham, Simon wrote:
> I've been seeing an occasional crash (using DRBD8.0) recently in the
> lru_cache.c file that looks like this:
>
> Dec  5 05:57:09 ------------[ cut here ]------------
> Dec  5 05:57:09 kernel BUG at /test_logs/builds/SuperNova/trunk/20071205-r21536/src/platform/drbd/src/drbd/lru_cache.c:312!

in what exact codebase do you see this?
up to which point have you merged upstream drbd-8.0.git?
what local patches are applied?

> Dec  5 05:57:09 invalid opcode: 0000 [#1]
> Dec  5 05:57:09 SMP 
> Dec  5 05:57:09 Modules linked in: tun drbd cn ipmi_devintf ipmi_si
> ipmi_msghandler bridge ipv6 binfmt_misc dm_mirror dm_multipath dm_mod
> video thermal sbs processor i2c_ec i2c_core fan container button battery
> asus_acpi ac parport_pc lp parport nvram ide_cd cdrom evdev intel_rng
> pcspkr sg bnx2 shpchp piix zlib_inflate pci_hotplug serio_raw
> serial_core rtc mptspi scsi_transport_spi ide_disk mptsas mptscsih
> mptbase scsi_transport_sas sd_mod scsi_mod raid1 ehci_hcd ohci_hcd
> uhci_hcd usbcore
> Dec  5 05:57:09 CPU:    1
> Dec  5 05:57:09 EIP:    0061:[<ee297564>]    Tainted: GF    VLI
> Dec  5 05:57:09 EFLAGS: 00010046  (2.6.18-xen #1) 
> Dec  5 05:57:09 EIP is at lc_put+0x84/0xc0 [drbd]
> Dec  5 05:57:10 eax: 00000000  ebx: ee24e000  ecx: ed3a4000  edx: ee24fc50
> Dec  5 05:57:10 esi: ee24fc50  edi: ed3a4000  ebp: deb55e30  esp: deb55e28
> Dec  5 05:57:10 ds: 007b  es: 007b  ss: 0069
> Dec  5 05:57:10 Process drbd5_asender (pid: 12691, ti=deb54000 task=e6d960f0 task.ti=deb54000)
> Dec  5 05:57:10 Stack: ed3a43b0 00000001 deb55e70 ee29480e 00000000 00000000 00004100 deb55e48 
> Dec  5 05:57:10        00000000 c031e320 00000000 00000000 deb55f38 ed3a4000 00000000 000000e6 
> Dec  5 05:57:10        c8d32288 ed3a4000 deb55ee8 ee29149e 00000001 ffffffff 00000000 00000000 
> Dec  5 05:57:10 Call Trace:
> Dec  5 05:57:10  [<c0105dc1>] show_stack_log_lvl+0xb1/0xe0
> Dec  5 05:57:10  [<c0105ffa>] show_registers+0x1aa/0x230
> Dec  5 05:57:10  [<c01061b6>] die+0x136/0x300
> Dec  5 05:57:10  [<c01063ff>] do_trap+0x7f/0xb0
> Dec  5 05:57:10  [<c0106be7>] do_invalid_op+0x97/0xb0
> Dec  5 05:57:10  [<c01058f3>] error_code+0x2b/0x30
> Dec  5 05:57:10  [<ee29480e>] drbd_al_complete_io+0x6e/0x130 [drbd]
> Dec  5 05:57:10  [<ee29149e>] _req_may_be_done+0x5ee/0x780 [drbd]
> Dec  5 05:57:10  [<ee291993>] _req_mod+0x363/0xab0 [drbd]
> Dec  5 05:57:10  [<ee29e7c1>] tl_release+0x51/0x1f0 [drbd]
> Dec  5 05:57:10  [<ee28c576>] got_BarrierAck+0x16/0xb0 [drbd]
> Dec  5 05:57:10  [<ee28d7b9>] drbd_asender+0x2e9/0x5a0 [drbd]
> Dec  5 05:57:10  [<ee29ea0f>] drbd_thread_setup+0xaf/0xf0 [drbd]
> Dec  5 05:57:10  [<c0103005>] kernel_thread_helper+0x5/0x10
> 
> The failing line of code is asserting in lc_put that the current ref
> count is greater than zero. Now, I think this bug has been there for a
> while if you are using protocol A or B and has now been exposed when
> using protocol C because of the recent change to maintain the transfer
> log for all protocols (i.e. it's my fault it got exposed!)
> 
> My theory is that the following occurred:
> 
> 1. We were running normally; this means that the TL has at least one
> entry most of the time - this entry is a request that includes a
> reference to the AL cache for that write operation.
> 
> 2. The local disk is detached for some reason (failure, or 'drbdsetup
> detach') - this causes the AL cache to be discarded
> 
> 3. The local disk is reattached - this creates a brand spanking new AL
> with no hot entries
> 
> 4. We process a second write for the same AL area as the one above -
> this will create a new hot entry in the cache, but the refcount will
> only be one even though there are two I/O's outstanding for the AL
> area covered by the entry
> 
> 5. Now we get a barrier ack that allows us to clear both entries from
> the TL; when we attempt the lc_put for the second one, we crash because
> the ref count is already zero.
> 
> So, first of all, does this seem a reasonable explanation, or did I miss
> something?
> 
> Secondly, assuming I'm right, I see a couple of possible solutions:
> 
> 1. Remember in the req structure if this request has a reference to the
> AL cache entry.  When clearing the AL because of a detach, go through
> the TL list at that time and clear the flag - thus when we eventually
> remove the entry, we won't even try the lc_put.
> 
> 2. When attaching a disk, run through the current TL and allocate AL
> entries for each request currently in the list. The problem with this
> is that the AL cache size might have changed in a way that doesn't
> allow sufficient hot entries (i.e. the cache size is less than the
> number of unique entries required by the current TL list).
> 
> Thoughts? I'm about to start on fixing this, so would welcome ideas...

in my version of drbd-8.0,
that would be in this code path:
                if (s & RQ_LOCAL_MASK) {
                        if (inc_local_if_state(mdev,Failed)) {
                                drbd_al_complete_io(mdev, req->sector);
                                dec_local(mdev);
                        } else {
                                WARN("Should have called drbd_al_complete_io(, %llu), "
                                     "but my Disk seems to have failed:(\n",
                                     (unsigned long long) req->sector);
                        }
                }

I don't see why there could possibly be requests in the tl
that have (s & RQ_LOCAL_MASK) when there is no disk.
if there are, that is the real bug, I think.

other than that, what about

3. when attaching a disk,
   suspend incoming requests and wait for the tl to become empty.
   then attach, and resume.

-- 
: Lars Ellenberg                            Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :

* RE: [Drbd-dev] Crash in lru_cache.c
  2008-01-10 19:00 [Drbd-dev] Crash in lru_cache.c Graham, Simon
  2008-01-10 20:19 ` Lars Ellenberg
@ 2008-01-10 20:31 ` Graham, Simon
  2008-01-12 13:51   ` Lars Ellenberg
       [not found] ` <342BAC0A5467384983B586A6B0B3767107C5AE95@EXNA.corp.stratus.com>
  2 siblings, 1 reply; 8+ messages in thread
From: Graham, Simon @ 2008-01-10 20:31 UTC
  To: Lars Ellenberg, drbd-dev

> > Dec  5 05:57:09 ------------[ cut here ]------------
> > Dec  5 05:57:09 kernel BUG at /test_logs/builds/SuperNova/trunk/20071205-r21536/src/platform/drbd/src/drbd/lru_cache.c:312!
> 
> in what exact codebase do you see this?
> up to which point have you merged upstream drbd-8.0.git?
> what local patches are applied?
> 

Yes - sorry... this is 8.0.4 plus a bunch of the fixes that are in 8.0.8
(but not all) plus a few more that I haven't submitted yet (but I will
once I wrestle git into submission); the specific change that exposes
this that I have pulled is the one to use the TL for Protocol C as well
as A and B -- however, I think this bug exists IF you are using A or B
without this fix.

> that would be in this code path:
>                 if (s & RQ_LOCAL_MASK) {
>                         if (inc_local_if_state(mdev,Failed)) {
>                                 drbd_al_complete_io(mdev, req->sector);
>                                 dec_local(mdev);
>                         } else {
>                                 WARN("Should have called drbd_al_complete_io(, %llu), "
>                                      "but my Disk seems to have failed:(\n",
>                                      (unsigned long long) req->sector);
>                         }
>                 }
> 

Exactly.

> I don't see why there could possibly be requests in the tl
> that have (s & RQ_LOCAL_MASK) when there is no disk.

Because there WAS a disk when the request was issued - in fact, the
local write to disk completed successfully, but the request is still
sitting in the TL waiting for the next barrier to complete. Subsequent
to that but while the request is still in the TL, the local disk is
detached.

> other than that, what about
> 
> 3. when attaching a disk,
>    suspend incoming requests and wait for the tl to become empty.
>    then attach, and resume.
> 

I think this might work but only as a side effect -- if you look back to
the sequence I documented, you will see that there has to be a write
request to the same AL area after the disk is reattached - this is
because drbd_al_complete_io quietly ignores the case where no active AL
extent is found for the request being completed. You would also need to
trigger a barrier op in this case to force the TL to be flushed.

Simon

* Re: [Drbd-dev] Crash in lru_cache.c
  2008-01-10 20:31 ` Graham, Simon
@ 2008-01-12 13:51   ` Lars Ellenberg
  0 siblings, 0 replies; 8+ messages in thread
From: Lars Ellenberg @ 2008-01-12 13:51 UTC
  To: drbd-dev

On Thu, Jan 10, 2008 at 03:31:02PM -0500, Graham, Simon wrote:
> > > Dec  5 05:57:09 ------------[ cut here ]------------
> > > Dec  5 05:57:09 kernel BUG at /test_logs/builds/SuperNova/trunk/20071205-r21536/src/platform/drbd/src/drbd/lru_cache.c:312!
> > 
> > in what exact codebase do you see this?
> > up to which point have you merged upstream drbd-8.0.git?
> > what local patches are applied?
> > 
> 
> Yes - sorry... this is 8.0.4 plus a bunch of the fixes that are in 8.0.8
> (but not all) plus a few more that I haven't submitted yet (but I will
> once I wrestle git into submission); the specific change that exposes
> this that I have pulled is the one to use the TL for Protocol C as well
> as A and B -- however, I think this bug exists IF you are using A or B
> without this fix.
> 
> > that would be in this code path:
> > if (s & RQ_LOCAL_MASK) {
> >         if (inc_local_if_state(mdev,Failed)) {
> >                 drbd_al_complete_io(mdev, req->sector);
> >                 dec_local(mdev);
> >         } else {
> >                 WARN("Should have called drbd_al_complete_io(, %llu), "
> >                      "but my Disk seems to have failed:(\n",
> >                      (unsigned long long) req->sector);
> >         }
> > }
> > 
> 
> Exactly.
> 
> > I don't see why there could possibly be requests in the tl
> > that have (s & RQ_LOCAL_MASK) when there is no disk.
> 
> Because there WAS a disk when the request was issued - in fact, the
> local write to disk completed successfully, but the request is still
> sitting in the TL waiting for the next barrier to complete. Subsequent
> to that but while the request is still in the TL, the local disk is
> detached.

AND it is re-attached so fast,
that we have a new (uhm; well, probably the same?) disk again,
while still the very same request is sitting there
waiting for that very barrier ack?

now, how unlikely is THAT to happen in real life.

but I think I understand your scenario.

but how do you test this, actually?
inject io failures, and trigger a re-attach
as soon as you see the detach event?

is that to implement a "hot-spare" feature?

> > other than that, what about
> > 
> > 3. when attaching a disk,
> >    suspend incoming requests and wait for the tl to become empty.
> >    then attach, and resume.
> > 
> 
> I think this might work but only as a side effect -- if you look back to
> the sequence I documented, you will see that there has to be a write
> request to the same AL area after the disk is reattached - this is
> because drbd_al_complete_io quietly ignores the case where no active AL
> extent is found for the request being completed.

huh?
I simply disallow re-attaching while there are still requests pending
from before the detach.
no more (s & RQ_LOCAL_MASK), no more un-accounted for references.

if I understand correctly,
you can reproduce this easily.
to underline my point,
does it still trigger when you do
 "dd if=/dev/drbdX of=/dev/null bs=1b count=1 iflag=direct ; sleep 5"
before the re-attach?
(the dd, even if it only reads, due to using directio,
 and drbd being diskless,
 will trigger any pending barrier to be sent)

> You would also need to trigger a barrier op in this case to force the
> TL to be flushed.

for other reasons, I think we need to rewrite the barrier code anyways
to send out the barrier as soon as possible, and not wait until the next
io request comes in.

-- 
: Lars Ellenberg                            Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :

[parent not found: <342BAC0A5467384983B586A6B0B3767107C5AE95@EXNA.corp.stratus.com>]

* RE: [Drbd-dev] Crash in lru_cache.c
       [not found] ` <342BAC0A5467384983B586A6B0B3767107C5AE95@EXNA.corp.stratus.com>
@ 2008-01-12 15:23   ` Graham, Simon
  2008-01-12 17:04     ` Lars Ellenberg
  2008-01-12 23:37     ` Graham, Simon
  0 siblings, 2 replies; 8+ messages in thread
From: Graham, Simon @ 2008-01-12 15:23 UTC
  To: Lars Ellenberg, drbd-dev

> > Because there WAS a disk when the request was issued - in fact, the
> > local write to disk completed successfully, but the request is still
> > sitting in the TL waiting for the next barrier to complete. Subsequent
> > to that but while the request is still in the TL, the local disk is
> > detached.
> 
> AND it is re-attached so fast,
> that we have a new (uhm; well, probably the same?) disk again,
> while still the very same request is sitting there
> waiting for that very barrier ack?
> 

You got it!

> now, how unlikely is THAT to happen in real life.
> 

Fairly rare I agree although someone could do a 'drbdadm detach' and
then 'drbdadm attach' -- that's how we hit this situation (and the
reason for THAT is as a way to test errors on meta-data reads)

Given that there is no real boundary on the lifetime of a request in the
TL, it's also feasible (although unlikely I agree) that a disk could
fail and be replaced and reattached whilst an old request is still in
the TL...

> > I think this might work but only as a side effect -- if you look back to
> > the sequence I documented, you will see that there has to be a write
> > request to the same AL area after the disk is reattached - this is
> > because drbd_al_complete_io quietly ignores the case where no active AL
> > extent is found for the request being completed.
> 
> huh?
> I simply disallow re-attaching while there are still requests pending
> from before the detach.
> no more (s & RQ_LOCAL_MASK), no more un-accounted for references.
> 

Yes but those requests that have unaccounted references from before the
detach are still in the TL -- it so happens that the code does not crash
in this case (completing a request in the TL when there is no matching
AL cache entry) but that's not very safe I think.

You also have to trigger a barrier as part of this -- not only block new
requests during attach until the TL is empty but also trigger a barrier
so that the TL will be emptied...

Both of these are why I like the idea of "reconnecting" the requests in
the TL to the AL cache when doing an attach...

> if I understand correctly,
> you can reproduce this easily.
> to underline my point,
> does it still trigger when you do
>  "dd if=/dev/drbdX of=/dev/null bs=1b count=1 iflag=direct ; sleep 5"
> before the re-attach?

So, the real test is to do this _before_ the DETACH, then see what
happens when the requests are removed from the TL.

> for other reasons, I think we need to rewrite the barrier code anyways
> to send out the barrier as soon as possible, and not wait until the next
> io request comes in.

That's an interesting idea -- it would also allow you to use the Linux
barrier mechanism to implement. Still wouldn't handle this case I think
though -- you can have requests in the TL that do not yet require a
barrier when you lose the local disk...

Simon

* Re: [Drbd-dev] Crash in lru_cache.c
  2008-01-12 15:23   ` Graham, Simon
@ 2008-01-12 17:04     ` Lars Ellenberg
  2008-01-12 23:37     ` Graham, Simon
  1 sibling, 0 replies; 8+ messages in thread
From: Lars Ellenberg @ 2008-01-12 17:04 UTC
  To: Graham, Simon; +Cc: drbd-dev

On Sat, Jan 12, 2008 at 10:23:58AM -0500, Graham, Simon wrote:
> > > Because there WAS a disk when the request was issued - in fact, the
> > > local write to disk completed successfully, but the request is still
> > > sitting in the TL waiting for the next barrier to complete. Subsequent
> > > to that but while the request is still in the TL, the local disk is
> > > detached.
> > 
> > AND it is re-attached so fast,
> > that we have a new (uhm; well, probably the same?) disk again,
> > while still the very same request is sitting there
> > waiting for that very barrier ack?
> > 
> 
> You got it!
> 
> > now, how unlikely is THAT to happen in real life.
> > 
> 
> Fairly rare I agree although someone could do a 'drbdadm detach' and
> then 'drbdadm attach' -- that's how we hit this situation (and the
> reason for THAT is as a way to test errors on meta-data reads)
> 
> Given that there is no real boundary on the lifetime of a request in the
> TL, it's also feasible (although unlikely I agree) that a disk could
> fail and be replaced and reattached whilst an old request is still in
> the TL...

well, there is.
the request will only live in the tl until either
 - connection is lost, and we call tl_clear
 - the corresponding barrier ack comes in

right, currently, a barrier is not sent when the epoch closes,
but before the next epoch starts, which may be a very long time.
but, we are changing this anyways, and will now send the barrier
as soon as we close the current epoch.
once that is done, soon (milliseconds) after any request is reported as
completed to upper layers (which is the event that is causing the
current epoch to close, the barrier to be sent),
it will also be cleared from the tl.

> > > I think this might work but only as a side effect -- if you look back to
> > > the sequence I documented, you will see that there has to be a write
> > > request to the same AL area after the disk is reattached - this is
> > > because drbd_al_complete_io quietly ignores the case where no active AL
> > > extent is found for the request being completed.
> > 
> > huh?
> > I simply disallow re-attaching while there are still requests pending
> > from before the detach.
> > no more (s & RQ_LOCAL_MASK), no more un-accounted for references.
> > 
> 
> Yes but those requests that have unaccounted references from before the
> detach are still in the TL 

no they are not, I just said I would not allow an attach
while they are still in there.

> -- it so happens that the code does not crash
> in this case (completing a request in the TL when there is no matching
> AL cache entry) but that's not very safe I think.
> 
> You also have to trigger a barrier as part of this -- not only block new
> requests during attach until the TL is empty but also trigger a barrier
> so that the TL will be emptied...

as outlined earlier, and implemented next week hopefully,
barriers will be sent as soon as the old epoch is closed,
not only when the first new request for the new epoch comes in.

> Both of these are why I like the idea of "reconnecting" the requests in
> the TL to the AL cache when doing an attach...
> 
> > if I understand correctly,
> > you can reproduce this easily.
> > to underline my point,
> > does it still trigger when you do
> >  "dd if=/dev/drbdX of=/dev/null bs=1b count=1 iflag=direct ; sleep 5"
> > before the re-attach?
> 
> So, the real test is to do this _before_ the DETACH, then see what
> happens when the requests are removed from the TL.

no. only a remote read can trigger a barrier.
as long as i have valid local data, all reads are local.

> > for other reasons, I think we need to rewrite the barrier code anyways
> > to send out the barrier as soon as possible, and not wait until the
> > next io request comes in.
> 
> That's an interesting idea -- it would also allow you to use the Linux
> barrier mechanism to implement. Still wouldn't handle this case I think
> though -- you can have requests in the TL that do not yet require a
> barrier when you lose the local disk...

sure I can have requests there, but they are not yet completed to upper
layers.  if they are, their corresponding barrier will have been sent out
already.

for attach, we would then do
  block new incoming requests
  wait for ap count to reach zero
  [in current code, send out a barrier now;
   with the idea outlined above, there is no need for that]
  wait for the latest barrier ack
      (tl now empty)
  attach
  unblock
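
the steps above, as a single-threaded sketch in C. everything here is an
assumption for illustration: struct toy_dev, its fields, and the toy_*
functions are hypothetical, and "wait" is modelled as a check that
returns false when the attach has to be deferred.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model; field names are illustrative only. */
struct toy_dev {
    bool blocked;            /* new application requests are held off */
    unsigned int ap_cnt;     /* application requests still in flight  */
    unsigned int tl_entries; /* requests still waiting in the TL      */
    bool attached;
};

/* Step 1: block new incoming requests. */
static void toy_block(struct toy_dev *d) { d->blocked = true; }

/* New requests are admitted only while the device is not blocked. */
static bool toy_submit(struct toy_dev *d)
{
    if (d->blocked)
        return false;
    d->ap_cnt++;
    d->tl_entries++;
    return true;
}

/* A request completes to upper layers; it stays in the TL until the
 * barrier ack arrives. */
static void toy_complete(struct toy_dev *d)
{
    assert(d->ap_cnt > 0);
    d->ap_cnt--;
}

/* The barrier ack empties the TL. */
static void toy_barrier_ack(struct toy_dev *d) { d->tl_entries = 0; }

/*
 * Remaining steps: attach only once nothing is in flight and the TL is
 * empty, then unblock. Returns false if the caller must keep waiting.
 */
static bool toy_try_attach(struct toy_dev *d)
{
    if (!d->blocked || d->ap_cnt != 0 || d->tl_entries != 0)
        return false;
    d->attached = true;
    d->blocked = false;
    return true;
}
```

in this model a request submitted before toy_block() keeps
toy_try_attach() returning false until both toy_complete() and
toy_barrier_ack() have run, so no request from before the detach can
outlive the attach.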

am I still missing something?

-- 
: Lars Ellenberg                            Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :

* RE: [Drbd-dev] Crash in lru_cache.c
  2008-01-12 15:23   ` Graham, Simon
  2008-01-12 17:04     ` Lars Ellenberg
@ 2008-01-12 23:37     ` Graham, Simon
  2008-01-13  3:14       ` Lars Ellenberg
  1 sibling, 1 reply; 8+ messages in thread
From: Graham, Simon @ 2008-01-12 23:37 UTC
  To: Lars Ellenberg; +Cc: drbd-dev

> sure I can have requests there, but they are not yet completed to upper
> layers.  if they are, their corresponding barrier will have been sent out
> already.
> 
> for attach, we would then do
>   block new incoming requests
>   wait for ap count to reach zero
>   [in current code, send out a barrier now;
>    with the idea outlined above, there is no need for that]
>   wait for the latest barrier ack
>       (tl now empty)
>   attach
>   unblock
> 
> am I still missing something?

I think that works. 

What's the appropriate mechanism for blocking new requests? There are
existing mechanisms based on locking the AL cache entry, but since there
is no AL at this time, we can't use that...

Thanks for the ideas!
Simon

* Re: [Drbd-dev] Crash in lru_cache.c
  2008-01-12 23:37     ` Graham, Simon
@ 2008-01-13  3:14       ` Lars Ellenberg
  0 siblings, 0 replies; 8+ messages in thread
From: Lars Ellenberg @ 2008-01-13  3:14 UTC
  To: drbd-dev

On Sat, Jan 12, 2008 at 06:37:47PM -0500, Graham, Simon wrote:
> > sure I can have requests there, but they are not yet completed to upper
> > layers.  if they are, their corresponding barrier will have been sent out
> > already.
> > 
> > for attach, we would then do
> >   block new incoming requests
> >   wait for ap count to reach zero
> >   [in current code, send out a barrier now;
> >    with the idea outlined above, there is no need for that]
> >   wait for the latest barrier ack
> >       (tl now empty)
> >   attach
> >   unblock
> > 
> > am I still missing something?
> 
> I think that works. 
> 
> What's the appropriate mechanism for blocking new requests? There are
> existing mechanisms based on locking the AL cache entry, but since there
> is no AL at this time, we can't use that...

there is another one, examining the drbd state,
right as the first statement in

	drbd_make_request_common
	  inc_ap_bio
	    __inc_ap_bio_cond

 :-)

I have to think about whether we need yet another intermediate state.
probably blocking anything for Diskless < state < Inconsistent
would be enough.
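
a sketch of what such a state check might look like. this is not the
actual __inc_ap_bio_cond: the enum, its placeholder members, and the
toy_* functions are hypothetical, and only the two endpoints (Diskless
and Inconsistent) come from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical disk states in ascending order. Only the endpoints,
 * Diskless and Inconsistent, are named in the thread; the members in
 * between are placeholders.
 */
enum toy_disk_state {
    TOY_DISKLESS,
    TOY_TRANSITION_A,   /* placeholder intermediate state */
    TOY_TRANSITION_B,   /* placeholder intermediate state */
    TOY_INCONSISTENT,
    TOY_UP_TO_DATE      /* placeholder for any state above Inconsistent */
};

/* True when a new application request may be admitted in this state. */
static bool toy_may_admit(enum toy_disk_state s)
{
    /* Block strictly between Diskless and Inconsistent. */
    return !(s > TOY_DISKLESS && s < TOY_INCONSISTENT);
}

/* Count the request as in flight only if the state allows it. */
static bool toy_inc_ap_cnt(enum toy_disk_state s, unsigned int *ap_cnt)
{
    assert(ap_cnt != NULL);
    if (!toy_may_admit(s))
        return false;   /* caller has to wait and retry */
    (*ap_cnt)++;
    return true;
}
```

requests pass while Diskless (they go to the peer) and from Inconsistent
upward; anything in the window in between is held back until the attach
has finished.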

-- 
: Lars Ellenberg                            Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
