From: Reuben Farrelly <reuben-lkml@reub.net>
To: Neil Brown <neilb@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org
Subject: Re: 2.6.15-mm2
Date: Wed, 11 Jan 2006 18:15:40 +1300
Message-ID: <43C4947C.1040703@reub.net>
In-Reply-To: <17348.34472.105452.831193@cse.unsw.edu.au>



On 11/01/2006 5:16 p.m., Neil Brown wrote:
> On Tuesday January 10, mingo@elte.hu wrote:
>> * Andrew Morton <akpm@osdl.org> wrote:
>>
>>> Reuben Farrelly <reuben-lkml@reub.net> wrote:
>>>> Ok here's the latest one, this time with KALLSYMS_ALL, CONFIG_FRAME_POINTER, 
>>>>  CONFIG_DETECT_SOFTLOCKUP and the DEBUG_WARN_ON(current->state != TASK_RUNNING); 
>>>>  patch from Ingo.
>>> This is quite ugly.  I'd be suspecting a block layer problem: RAID or 
>>> the underlying device driver (ahci) has lost an IO.
>> yeah, now it more looks like that to me too. What happens is a raid1 
>> resync happens in the background - which is one of the more complex 
>> raid1 workloads - and there've been a good number of md patches 
>> recently. Reuben, does -git5 show the same symptoms?
> 
> There isn't a resync happening - if there was you would see a process
> called
>    mdX_resync
> (for some X).
> 
> What I see here is:
>  pdflush at:
> Call Trace:
>   [<c02a2f72>] md_write_start+0xbc/0x150
>   [<c029a659>] make_request+0x51/0x432
>   [<c01e1146>] generic_make_request+0xbe/0x13d
>   [<c01e120e>] submit_bio+0x49/0xd3
> 
> So it is trying to write to a raid1 which was 'clean' and needs to
> be marked 'dirty' (or 'active') before the first write.
> md_write_start arranges for the array's thread to do this.
> What is that thread doing?
> 
> md2_raid1     D F7227200     0   386     11           390   382 (L-TLB)
>   ...
> Call Trace:
>   [<c029d004>] md_super_wait+0xd5/0xea
>   [<c02a4f93>] bitmap_unplug+0x1d8/0x1df
>   [<c029b72b>] raid1d+0x7d/0x555
>   [<c02a211a>] md_thread+0x44/0x14f
> 
> It probably hasn't tried to write out the superblock, and just
> now it is writing out some write-intent-bitmap entries and waiting
> for the write to complete.
> 
> md_super_wait is waiting for 'pending_writes' to become zero.
> It is incremented when any superblock or bitmap write starts, and
> is decremented when that write completes.
> 
> So a lost write request in one of the components of the array could
> cause this, but it is too easy to simply blame it on someone else....
> 
> But there is something I don't understand....
> 
> If md2_raid1 is in bitmap_unplug, that means there are outstanding
> write requests to md2_raid1, so the one that pdflush is currently
> generating cannot be the first.
> 
> This suggests that pdflush is not writing to md2, but to something
> else.
> Ahhhh.. md0_raid1 is also blocked:
> Call Trace:
>   [<c029d004>] md_super_wait+0xd5/0xea
>   [<c029ec29>] md_update_sb+0xc9/0x153
>   [<c02a3a20>] md_check_recovery+0x182/0x437
>   [<c029b6cd>] raid1d+0x1f/0x555
> 
> It has just updated the superblocks for md0 and is waiting for those
> writes to complete.  But they don't seem to want to complete.
> 
> So it seems that two raid1 arrays are blocked in slightly different
> places.
> 
> I'm tempted to blame the IO scheduler, only because there have been
> vaguely similar problems in the recent past that can be avoided by
> changing the scheduler.
> 
> Reuben:  could you check what IO scheduler your drives are using, and 
> try changing it.  I suspect they use 'as' by default.  Try 'cfq' or
> 'deadline'.
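
The pending_writes accounting described in the quoted analysis can be
illustrated with a toy model.  This is only a sketch in Python, not the
kernel's md code: the names PendingWrites and run are made up for
illustration, and a thread stands in for an IO completion.  It shows why a
single lost completion leaves the waiter blocked.

```python
import threading


class PendingWrites:
    """Toy model of a pending-writes counter (not the kernel's md code)."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def start_write(self):
        # Called when a superblock or bitmap write is submitted.
        with self._cond:
            self._count += 1

    def end_write(self):
        # Called from the completion path of that write.
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait(self, timeout=None):
        # Block until the counter reaches zero; returns False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)


def run(writes, lost, timeout=0.2):
    """Submit `writes` writes, complete all but `lost`, then wait."""
    pw = PendingWrites()
    for _ in range(writes):
        pw.start_write()
    for _ in range(writes - lost):
        # Each thread plays the role of one IO completion.
        threading.Thread(target=pw.end_write).start()
    return pw.wait(timeout)
```

With run(3, 0) every write completes and the wait returns promptly.  With
run(3, 1) one completion never arrives, the counter stays at one, and the
wait only returns because the sketch adds a timeout.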

By default it was using 'deadline', but I just added elevator=as to my kernel 
command line, and it still failed in the same way :(  I'm building all four 
schedulers into the kernel (should probably optimise that to one someday but not 
now..)
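
For anyone following along: besides the elevator= boot parameter, the
scheduler in use can be read per device from sysfs, where the active one is
shown in square brackets.  A minimal sketch, assuming the drive is sda (a
placeholder, not a device named in this report) and that
/sys/block/<dev>/queue/scheduler is present on the running kernel:

```python
from pathlib import Path


def active_scheduler(text):
    """Return the bracketed (active) name from a sysfs scheduler line."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_scheduler(dev="sda"):
    # "sda" is a placeholder device name, not taken from this report.
    path = Path("/sys/block") / dev / "queue" / "scheduler"
    return active_scheduler(path.read_text())
```

Writing a scheduler name into the same file as root switches the scheduler
for that device at runtime, without a reboot.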

I'm tempted to see if I can narrow it down to a specific -gitX release, maybe 
that would narrow it down to say, 200 or so patches ;-)

reuben


