From: Reuben Farrelly <reuben-lkml@reub.net>
To: Neil Brown <neilb@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>, Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org
Subject: Re: 2.6.15-mm2
Date: Wed, 11 Jan 2006 18:15:40 +1300
Message-ID: <43C4947C.1040703@reub.net>
In-Reply-To: <17348.34472.105452.831193@cse.unsw.edu.au>
On 11/01/2006 5:16 p.m., Neil Brown wrote:
> On Tuesday January 10, mingo@elte.hu wrote:
>> * Andrew Morton <akpm@osdl.org> wrote:
>>
>>> Reuben Farrelly <reuben-lkml@reub.net> wrote:
>>>> Ok here's the latest one, this time with KALLSYMS_ALL, CONFIG_FRAME_POINTER,
>>>> CONFIG_DETECT_SOFTLOCKUP and the DEBUG_WARN_ON(current->state != TASK_RUNNING);
>>>> patch from Ingo.
>>> This is quite ugly. I'd be suspecting a block layer problem: RAID or
>>> the underlying device driver (ahci) has lost an IO.
>> yeah, now it more looks like that to me too. What happens is a raid1
>> resync happens in the background - which is one of the more complex
>> raid1 workloads - and there've been a good number of md patches
>> recently. Reuben, does -git5 show the same symptoms?
>
> There isn't a resync happening - if there were, you would see a process
> called
> mdX_resync
> (for some X).
>
> What I see here is:
> pdflush at:
> Call Trace:
> [<c02a2f72>] md_write_start+0xbc/0x150
> [<c029a659>] make_request+0x51/0x432
> [<c01e1146>] generic_make_request+0xbe/0x13d
> [<c01e120e>] submit_bio+0x49/0xd3
>
> So it is trying to write to a raid1 which was 'clean' and needs to
> be marked 'dirty' (or 'active') before the first write.
> md_write_start arranges for the array's thread to do this.
> What is that thread doing?
>
> md2_raid1 D F7227200 0 386 11 390 382 (L-TLB)
> ...
> Call Trace:
> [<c029d004>] md_super_wait+0xd5/0xea
> [<c02a4f93>] bitmap_unplug+0x1d8/0x1df
> [<c029b72b>] raid1d+0x7d/0x555
> [<c02a211a>] md_thread+0x44/0x14f
>
> It probably hasn't tried to write out the superblock, and just
> now it is writing out some write-intent-bitmap entries and waiting
> for the write to complete.
>
> md_super_wait is waiting for 'pending_writes' to become zero.
> It is incremented when any superblock or bitmap write starts, and
> is decremented when that write completes.
>
> So a lost write request in one of the components of the array could
> cause this, but it is too easy to simply blame it on someone else....
>
> But there is something I don't understand....
>
> If md2_raid1 is in bitmap_unplug, that means there are outstanding
> write requests to md2_raid1, so the one that pdflush is currently
> generating cannot be the first.
>
> This suggests that pdflush is not writing to md2, but to something
> else.
> Ahhhh.. md0_raid1 is also blocked:
> Call Trace:
> [<c029d004>] md_super_wait+0xd5/0xea
> [<c029ec29>] md_update_sb+0xc9/0x153
> [<c02a3a20>] md_check_recovery+0x182/0x437
> [<c029b6cd>] raid1d+0x1f/0x555
>
> It has just updated the superblocks for md0 and is waiting for those
> writes to complete. But they don't seem to want to complete.
>
> So it seems that two raid1 arrays are blocked in slightly different
> places.
>
> I'm tempted to blame the IO scheduler, only because there have been
> vaguely similar problems in the recent past that can be avoided by
> changing the scheduler.
>
> Reuben: could you check what IO scheduler your drives are using, and
> try changing it. I suspect they use 'as' by default. Try 'cfq' or
> 'deadline'.
By default it was using 'deadline', but I just added elevator=as to my kernel
command line, and it still failed in the same way :( I'm building all four
schedulers into the kernel (should probably optimise that to one someday but not
now..)
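
For anyone repeating this check: besides the elevator= boot parameter, 2.6 kernels expose the scheduler per device in sysfs, where /sys/block/<dev>/queue/scheduler lists the compiled-in schedulers and puts the active one in square brackets. Below is a minimal Python sketch of reading that file. The helper names and the "sda" device name are illustrative assumptions, not something taken from this report.

```python
import re


def active_scheduler(text):
    """Return the scheduler name shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def available_schedulers(text):
    """Return every scheduler name listed, with the brackets stripped."""
    return [name.strip("[]") for name in text.split()]


def read_scheduler(dev, sysfs_root="/sys/block"):
    """Read the active scheduler for one block device, e.g. dev='sda' (hypothetical)."""
    with open(f"{sysfs_root}/{dev}/queue/scheduler") as f:
        return active_scheduler(f.read())


if __name__ == "__main__":
    sample = "noop anticipatory [deadline] cfq"
    print(active_scheduler(sample))
    print(available_schedulers(sample))
```

Writing a scheduler name to the same file as root switches it at runtime, which avoids a reboot per test, though I have not checked how that interacts with the hang described here.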
I'm tempted to see if I can narrow it down to a specific -gitX release, maybe
that would narrow it down to say, 200 or so patches ;-)
reuben
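
To make Neil's description of md_super_wait concrete: it waits for a counter of in-flight superblock and bitmap writes to reach zero, so one completion that never arrives leaves the waiter blocked indefinitely. The following is a toy user-space model of that pattern in Python, not the kernel code. The class and method names are made up for illustration, and the timeout exists only so the demo terminates.

```python
import threading


class PendingWrites:
    """Toy model of a 'pending_writes' style counter (not kernel code)."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def start_write(self):
        # Called when a superblock or bitmap write is submitted.
        with self._cond:
            self._count += 1

    def end_write(self):
        # Called from the write's completion; wakes waiters at zero.
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait_idle(self, timeout):
        # Returns True if the counter reached zero before the timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout)


def demo():
    pw = PendingWrites()
    pw.start_write()
    pw.start_write()
    pw.end_write()                          # only one of two writes completes
    stuck = not pw.wait_idle(timeout=0.05)  # waiter cannot make progress
    pw.end_write()                          # the "lost" completion finally arrives
    recovered = pw.wait_idle(timeout=0.05)
    return stuck, recovered


if __name__ == "__main__":
    print(demo())
```

The real wait has no timeout, which would be consistent with md0_raid1 and md2_raid1 both sitting in D state if a completion from a component device never comes back.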