From: Reuben Farrelly <reuben-lkml@reub.net>
To: Neil Brown <neilb@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>, Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org
Subject: Re: 2.6.15-mm2
Date: Wed, 11 Jan 2006 18:15:40 +1300
Message-ID: <43C4947C.1040703@reub.net>
In-Reply-To: <17348.34472.105452.831193@cse.unsw.edu.au>
On 11/01/2006 5:16 p.m., Neil Brown wrote:
> On Tuesday January 10, mingo@elte.hu wrote:
>> * Andrew Morton <akpm@osdl.org> wrote:
>>
>>> Reuben Farrelly <reuben-lkml@reub.net> wrote:
>>>> Ok here's the latest one, this time with KALLSYMS_ALL, CONFIG_FRAME_POINTER,
>>>> CONFIG_DETECT_SOFTLOCKUP and the DEBUG_WARN_ON(current->state != TASK_RUNNING);
>>>> patch from Ingo.
>>> This is quite ugly. I'd be suspecting a block layer problem: RAID or
>>> the underlying device driver (ahci) has lost an IO.
>> yeah, now it more looks like that to me too. What happens is a raid1
>> resync happens in the background - which is one of the more complex
>> raid1 workloads - and there've been a good number of md patches
>> recently. Reuben, does -git5 show the same symptoms?
>
> There isn't a resync happening - if there were, you would see a process
> called
> mdX_resync
> (for some X).
>
> What I see here is:
> pdflush at:
> Call Trace:
> [<c02a2f72>] md_write_start+0xbc/0x150
> [<c029a659>] make_request+0x51/0x432
> [<c01e1146>] generic_make_request+0xbe/0x13d
> [<c01e120e>] submit_bio+0x49/0xd3
>
> So it is trying to write to a raid1 which was 'clean' and needs to
> be marked 'dirty' (or 'active') before the first write.
> md_write_start arranges for the array's thread to do this.
> What is that thread doing?
>
> md2_raid1 D F7227200 0 386 11 390 382 (L-TLB)
> ...
> Call Trace:
> [<c029d004>] md_super_wait+0xd5/0xea
> [<c02a4f93>] bitmap_unplug+0x1d8/0x1df
> [<c029b72b>] raid1d+0x7d/0x555
> [<c02a211a>] md_thread+0x44/0x14f
>
> It probably hasn't tried to write out the superblock, and just
> now it is writing out some write-intent-bitmap entries and waiting
> for the write to complete.
>
> md_super_wait is waiting for 'pending_writes' to become zero.
> It is incremented when any superblock or bitmap write starts, and
> is decremented when that write completes.
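
A rough user-space sketch of the accounting described above, assuming nothing about the real md code beyond what is stated here: a counter that goes up when a write is issued, down when it completes, and a waiter that sleeps until it reaches zero. The class and method names (PendingWrites, start_write, complete_write, wait_idle) are hypothetical and are not kernel symbols.

```python
import threading


class PendingWrites:
    """Toy model of a 'pending_writes' style counter (hypothetical names)."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def start_write(self):
        # Called when a superblock or bitmap write is issued.
        with self._cond:
            self._count += 1

    def complete_write(self):
        # Called from the write's completion path.
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait_idle(self, timeout=None):
        # Blocks until the counter is zero; returns False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)


pw = PendingWrites()
pw.start_write()
pw.start_write()
pw.complete_write()                  # only one of the two writes completes
stuck = not pw.wait_idle(timeout=0.1)
pw.complete_write()                  # the "lost" completion finally arrives
idle = pw.wait_idle(timeout=0.1)
```

In this model a single completion that never arrives leaves the waiter blocked indefinitely (here cut short by the timeout), which is the lost-write scenario being considered.
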
>
> So a lost write request in one of the components of the array could
> cause this, but it is too easy to simply blame it on someone else....
>
> But there is something I don't understand....
>
> If md2_raid1 is in bitmap_unplug, that means there are outstanding
> write requests to md2_raid1, so the one that pdflush is currently
> generating cannot be the first.
>
> This suggests that pdflush is not writing to md2, but to something
> else.
> Ahhhh.. md0_raid1 is also blocked:
> Call Trace:
> [<c029d004>] md_super_wait+0xd5/0xea
> [<c029ec29>] md_update_sb+0xc9/0x153
> [<c02a3a20>] md_check_recovery+0x182/0x437
> [<c029b6cd>] raid1d+0x1f/0x555
>
> It has just updated the superblocks for md0 and is waiting for those
> writes to complete. But they don't seem to want to complete.
>
> So it seems that two raid1 arrays are blocked in slightly different
> places.
>
> I'm tempted to blame the IO scheduler, only because there have been
> vaguely similar problems in the recent past that can be avoided by
> changing the scheduler.
>
> Reuben: could you check what IO scheduler your drives are using, and
> try changing it. I suspect they use 'as' by default. Try 'cfq' or
> 'deadline'.
By default it was using 'deadline', but I just added elevator=as to my kernel
command line, and it still failed in the same way :( I'm building all four
schedulers into the kernel (should probably optimise that to one someday but not
now..)
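
For checking which scheduler a given drive is actually using at runtime, here is a minimal sketch. It assumes the per-queue sysfs file /sys/block/<dev>/queue/scheduler, which lists the available schedulers on one line with the active one in square brackets. The helper names and the sample line are illustrative only, not output from this machine.

```python
def parse_scheduler_line(line):
    """Return (active, available) from a sysfs scheduler line."""
    active = None
    available = []
    for name in line.split():
        # The active scheduler is the entry wrapped in square brackets.
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_scheduler(dev):
    """Read the scheduler line for a block device such as 'sda'."""
    with open(f"/sys/block/{dev}/queue/scheduler") as f:
        return parse_scheduler_line(f.read())


# Illustrative sample line, not taken from the failing system.
sample = "noop anticipatory [deadline] cfq"
active, available = parse_scheduler_line(sample)
```

Writing one of the listed names back to the same file switches the scheduler for that queue without a reboot, which makes it quicker to try each one than editing the kernel command line.
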
I'm tempted to see if I can narrow it down to a specific -gitX release, maybe
that would narrow it down to say, 200 or so patches ;-)
reuben