* [Drbd-dev] DRBD-8: BUG when disk write errors occur during heavy I/O
@ 2007-01-06  5:15 Graham, Simon
  2007-01-08  9:51 ` Philipp Reisner
From: Graham, Simon @ 2007-01-06  5:15 UTC (permalink / raw)
  To: drbd-dev

We have encountered a BUG crash when inserting real disk errors during I/O as follows:

drbd0: drbd_md_sync_page_io(,8191929s,WRITE) failed!
drbd0: Notified peer that my disk is broken.
Jan  5 06:24:09  1:0:28:0: rejecting I/O to dead device
drbd0: got an _req_mod() errno of -5
drbd0: Local WRITE failed sec=675944s size=4096
tennille kernel: drbd0: got an _req_mod() errno of -5
------------[ cut here ]------------
kernel BUG at /test_logs/builds/SuperNova/trunk/070105/platform/drbd/src/drbd/lru_cache.c:120!

This is actually in this code:

struct lc_element* lc_find(struct lru_cache* lc, unsigned int enr)
{
    struct hlist_node *n;
    struct lc_element *e;

    BUG_ON(!lc);

called from 

void drbd_al_complete_io(struct Drbd_Conf *mdev, sector_t sector)
{
...
    spin_lock_irqsave(&mdev->al_lock,flags);

    extent = lc_find(mdev->act_log,enr);

So the act_log field was NULL when the lc_find executed.

Now, I believe the following is what happened:

1. We had a write error in the meta-data region of the disk -- the code I added a while back forces
   the error to be processed and changes the state to Diskless. This code path blocks waiting
   for mdev->local_cnt to reach zero (which it has not yet, because a number of I/Os are still
   outstanding).

2. The last outstanding local write completes (either with or without an error) and we end up running
   req_mod with write_completed_with_error or completed_ok. This code does a dec_local() BEFORE calling
   req_may_be_done -- thus it's entirely possible for the stalled code from above, which is waiting
   for local_cnt to reach zero, to run and release the act_log and resync data.

3. req_may_be_done() then runs for the last I/O and calls drbd_al_complete_io, which calls lc_find,
   which BUGs because act_log is now NULL.
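
The ordering in steps 1 to 3 can be replayed deterministically in user space. This is only a
minimal model, not DRBD code: struct race_dev and all race_* names are hypothetical stand-ins,
and the blocked state-change path is assumed to run the instant local_cnt reaches zero, which is
the worst case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few mdev fields involved in the race. */
struct race_dev {
    int   local_cnt;       /* outstanding local I/O references */
    void *act_log;         /* released once the device goes Diskless */
    bool  detach_pending;  /* step 1: state change waits for local_cnt == 0 */
};

static int race_log_storage; /* dummy object standing in for the lru_cache */

/* Models dec_local(); assumes the waiter from step 1 runs immediately. */
static void race_dec_local(struct race_dev *d)
{
    d->local_cnt--;
    if (d->detach_pending && d->local_cnt == 0) {
        d->act_log = NULL;          /* waiter releases act_log */
        d->detach_pending = false;
    }
}

/* Models drbd_al_complete_io(); false is where the real code hits BUG_ON(!lc). */
static bool race_al_complete_io(const struct race_dev *d)
{
    return d->act_log != NULL;
}

/* Step 2 and 3: the reference is dropped BEFORE the request is finished. */
bool race_complete_write(struct race_dev *d)
{
    race_dec_local(d);
    return race_al_complete_io(d);
}

/* One outstanding write with a pending detach; returns false (the BUG case). */
bool race_run_scenario(void)
{
    struct race_dev d = { 1, &race_log_storage, true };
    return race_complete_write(&d);
}
```

In this model race_run_scenario() returns false, i.e. the completion path sees a NULL act_log.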

Now, it seems to me there are a couple of ways to fix this:

1. We could delay calling dec_local() until all the code that might reference fields in the mdev is
   done -- i.e. after _req_may_be_done is called - I'm worried this might cause problems though.

2. Change drbd_al_complete_io to check act_log inside the spin lock, and change after_state_change
   to acquire the spinlock before freeing act_log and resync. Every other place that uses act_log
   or resync would also have to check for NULL. I'd be worried about finding all the possible
   places with this fix.
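
To make option 2 concrete, here is a rough user-space sketch of "check under the lock, free under
the same lock". Nothing here is the real DRBD API: struct sketch_dev, struct sketch_lru and the
sketch_* functions are hypothetical, and a C11 atomic_flag spin loop stands in for
spin_lock_irqsave() on mdev->al_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct lru_cache. */
struct sketch_lru { unsigned int nr_elements; };

struct sketch_dev {
    atomic_flag        al_lock;  /* stand-in for the al_lock spinlock */
    struct sketch_lru *act_log;  /* NULL once the disk has been detached */
};

static void sketch_lock(struct sketch_dev *d)
{
    while (atomic_flag_test_and_set_explicit(&d->al_lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void sketch_unlock(struct sketch_dev *d)
{
    atomic_flag_clear_explicit(&d->al_lock, memory_order_release);
}

/* Returns 0 on success, -1 if the allocation failed. */
int sketch_dev_init(struct sketch_dev *d)
{
    atomic_flag_clear(&d->al_lock);
    d->act_log = calloc(1, sizeof(*d->act_log));
    return d->act_log ? 0 : -1;
}

/* Release path: unpublish the pointer under the lock, free it afterwards. */
void sketch_release_act_log(struct sketch_dev *d)
{
    struct sketch_lru *old;

    sketch_lock(d);
    old = d->act_log;
    d->act_log = NULL;
    sketch_unlock(d);
    free(old);
}

/* Completion path: returns false instead of touching a NULL act_log. */
bool sketch_al_complete_io(struct sketch_dev *d, unsigned int enr)
{
    bool handled = false;

    sketch_lock(d);
    if (d->act_log != NULL) {
        /* the real code would call lc_find(mdev->act_log, enr) here */
        (void)enr;
        handled = true;
    }
    sketch_unlock(d);
    return handled;
}
```

The sketch also shows the cost I'm worried about: every reader of act_log needs the same
lock-and-check pattern, not just drbd_al_complete_io.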

So -- I'm looking for guidance on the best way to fix this synchronization issue.
Thanks,
Simon
    


* Re: [Drbd-dev] DRBD-8: BUG when disk write errors occur during heavy I/O
  2007-01-06  5:15 [Drbd-dev] DRBD-8: BUG when disk write errors occur during heavy I/O Graham, Simon
@ 2007-01-08  9:51 ` Philipp Reisner
From: Philipp Reisner @ 2007-01-08  9:51 UTC (permalink / raw)
  To: drbd-dev

On Saturday, 6 January 2007 06:15, Graham, Simon wrote:
> We have encountered a BUG crash when inserting real disk errors during I/O
> as follows:
>
> drbd0: drbd_md_sync_page_io(,8191929s,WRITE) failed!
> drbd0: Notified peer that my disk is broken.
> Jan  5 06:24:09  1:0:28:0: rejecting I/O to dead device
> drbd0: got an _req_mod() errno of -5
> drbd0: Local WRITE failed sec=675944s size=4096
> tennille kernel: drbd0: got an _req_mod() errno of -5
> ------------[ cut here ]------------
> kernel BUG at /test_logs/builds/SuperNova/trunk/070105/platform/drbd/src/drbd/lru_cache.c:120!
>
> This is actually in this code:
>
> struct lc_element* lc_find(struct lru_cache* lc, unsigned int enr)
> {
>     struct hlist_node *n;
>     struct lc_element *e;
>
>     BUG_ON(!lc);
>
> called from
>
> void drbd_al_complete_io(struct Drbd_Conf *mdev, sector_t sector)
> {
> ...
>     spin_lock_irqsave(&mdev->al_lock,flags);
>
>     extent = lc_find(mdev->act_log,enr);
>
> So the act_log field was NULL when the lc_find executed.
>
> Now, I believe the following is what happened:
>
> 1. We had a write error in the meta-data region of the disk -- the code I
> added a while back forces the error to be processed and changes the
> state to Diskless. This code path blocks waiting for mdev->local_cnt to
> reach zero (which it has not yet, because a number of I/Os are still
> outstanding).
>
> 2. The last outstanding local write completes (either with or without an
> error) and we end up running req_mod with write_completed_with_error or
> completed_ok. This code does a dec_local() BEFORE calling req_may_be_done
> -- thus it's entirely possible for the stalled code from above, which is
> waiting for local_cnt to reach zero, to run and release the act_log and
> resync data.
>
> 3. req_may_be_done() then runs for the last I/O and calls
> drbd_al_complete_io, which calls lc_find, which BUGs because act_log is now
> NULL.
>
> Now, it seems to me there are a couple of ways to fix this:
>
> 1. We could delay calling dec_local() until all the code that might
> reference fields in the mdev is done -- i.e. after _req_may_be_done is
> called - I'm worried this might cause problems though.

Simon, this is an excellent description of what is going on. I have gone
through it as well, and think that moving dec_local() is the correct
solution.

I have just committed it:
http://lists.linbit.com/pipermail/drbd-cvs/2007-January/001421.html
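
For readers without access to the commit, the idea of the reordering can be illustrated with a
small user-space model. This is a sketch under stated assumptions and not the committed patch:
struct fix_dev and all fix_* names are hypothetical, and the waiting state-change path is
assumed to release act_log the moment local_cnt reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few mdev fields involved. */
struct fix_dev {
    int   local_cnt;       /* outstanding local I/O references */
    void *act_log;         /* released once the device goes Diskless */
    bool  detach_pending;  /* state change waiting for local_cnt == 0 */
};

static int fix_log_storage; /* dummy object standing in for the lru_cache */

/* Models dec_local(); the waiter is assumed to run as soon as the count is 0. */
static void fix_dec_local(struct fix_dev *d)
{
    d->local_cnt--;
    if (d->detach_pending && d->local_cnt == 0) {
        d->act_log = NULL;
        d->detach_pending = false;
    }
}

/* Models drbd_al_complete_io(); false would correspond to BUG_ON(!lc). */
static bool fix_al_complete_io(const struct fix_dev *d)
{
    return d->act_log != NULL;
}

/* Finish the request first; drop the reference only afterwards. */
bool fix_complete_write(struct fix_dev *d)
{
    bool ok = fix_al_complete_io(d); /* act_log is still pinned by our reference */
    fix_dec_local(d);                /* only now may the waiter release it */
    return ok;
}

/* One outstanding write with a pending detach: completion succeeds,
 * and the detach still happens afterwards. */
bool fix_run_scenario(void)
{
    struct fix_dev d = { 1, &fix_log_storage, true };
    bool ok = fix_complete_write(&d);
    return ok && d.act_log == NULL && d.local_cnt == 0;
}
```

In this model fix_run_scenario() returns true: the completion path sees a valid act_log, and the
detach proceeds once the last reference is gone.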

-Phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :
