From: NeilBrown <neilb@suse.de>
To: Jes Sorensen <Jes.Sorensen@redhat.com>
Cc: linux-raid@vger.kernel.org, Shaohua Li <shli@kernel.org>,
	Eryu Guan <eguan@redhat.com>
Subject: Re: raid5 lockups post ca64cae96037de16e4af92678814f5d4bf0c1c65
Date: Wed, 20 Mar 2013 11:55:18 +1100
Message-ID: <20130320115518.3f5afb71@notabene.brown>
In-Reply-To: <wrfj38vy5t92.fsf@redhat.com>

On Thu, 14 Mar 2013 08:35:05 +0100 Jes Sorensen <Jes.Sorensen@redhat.com>
wrote:

> NeilBrown <neilb@suse.de> writes:
> > On Tue, 12 Mar 2013 14:45:44 +0100 Jes Sorensen <Jes.Sorensen@redhat.com>
> > wrote:
> >
> >> NeilBrown <neilb@suse.de> writes:
> >> > On Tue, 12 Mar 2013 09:32:31 +1100 NeilBrown <neilb@suse.de> wrote:
> >> >
> >> >> On Wed, 06 Mar 2013 10:31:55 +0100 Jes Sorensen <Jes.Sorensen@redhat.com>
> >> >> wrote:
> >> >> 
> >> >
> >> >> > 
> >> >> > I am attaching the test script I am running too. It was written by Eryu
> >> >> > Guan.
> >> >> 
> >> >> Thanks for that.  I've tried using it but haven't managed to trigger a BUG
> >> >> yet.  What size are the loop files?  I mostly use fairly small ones, but
> >> >> maybe it needs to be bigger to trigger the problem.
> >> >
> >> > Shortly after I wrote that I got a bug-on!  It hasn't happened again though.
> >> >
> >> > This was using code without that latest patch I sent.  The bug was
> >> > 		BUG_ON(s->uptodate != disks);
> >> >
> >> > in the check_state_compute_result case of handle_parity_checks5() which is
> >> > probably the same cause as your most recent BUG.
> >> >
> >> > I've revised my thinking a bit and am now running with this patch which I
> >> > think should fix a problem that probably caused the symptoms we have seen.
> >> >
> >> > If you could run your tests for a while too and see whether it will
> >> > still crash for you, I'd really appreciate it.
> >> 
> >> Hi Neil,
> >> 
> >> Sorry I can't verify the line numbers of my old test since I managed to
> >> mess up my git tree in the process :(
> >> 
> >> However running with this new patch I have just hit another but
> >> different case. Looks like a deadlock.
> >
> > Your test setup is clearly different from mine.  I've been running all night
> > without a single hiccup.
> >
> >> 
> >> This is basically running ca64cae96037de16e4af92678814f5d4bf0c1c65 with
> >> your patch applied on top, and nothing else.
> >> 
> >> If you want me to try a more uptodate Linus tree, please let me know.
> >> 
> >> Cheers,
> >> Jes
> >> 
> >> 
> >> [17635.205927] INFO: task mkfs.ext4:20060 blocked for more than 120 seconds.
> >> [17635.213543] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> [17635.222291] mkfs.ext4 D ffff880236814100 0 20060 20026 0x00000080
> >> [17635.230199] ffff8801bc8bbb98 0000000000000082 ffff88022f0be540 ffff8801bc8bbfd8
> >> [17635.238518] ffff8801bc8bbfd8 ffff8801bc8bbfd8 ffff88022d47b2a0 ffff88022f0be540
> >> [17635.246837] ffff8801cea1f430 000000000001d5f0 ffff8801c7f4f430 ffff88022169a400
> >> [17635.255161] Call Trace:
> >> [17635.257891]  [<ffffffff81614f79>] schedule+0x29/0x70
> >> [17635.263433]  [<ffffffffa0386ada>] make_request+0x6da/0x6f0 [raid456]
> >> [17635.270525]  [<ffffffff81084210>] ? wake_up_bit+0x40/0x40
> >> [17635.276560]  [<ffffffff814a6633>] md_make_request+0xc3/0x200
> >> [17635.282884]  [<ffffffff81134655>] ? mempool_alloc_slab+0x15/0x20
> >> [17635.289586]  [<ffffffff812c70d2>] generic_make_request+0xc2/0x110
> >> [17635.296393]  [<ffffffff812c7199>] submit_bio+0x79/0x160
> >> [17635.302232]  [<ffffffff811ca625>] ? bio_alloc_bioset+0x65/0x120
> >> [17635.308844]  [<ffffffff812ce234>] blkdev_issue_discard+0x184/0x240
> >> [17635.315748]  [<ffffffff812cef76>] blkdev_ioctl+0x3b6/0x810
> >> [17635.321877]  [<ffffffff811cb971>] block_ioctl+0x41/0x50
> >> [17635.327714]  [<ffffffff811a6aa9>] do_vfs_ioctl+0x99/0x580
> >> [17635.333745]  [<ffffffff8128a19a>] ? inode_has_perm.isra.30.constprop.60+0x2a/0x30
> >> [17635.342103]  [<ffffffff8128b6d7>] ? file_has_perm+0x97/0xb0
> >> [17635.348329]  [<ffffffff811a7021>] sys_ioctl+0x91/0xb0
> >> [17635.353972]  [<ffffffff810de9dc>] ? __audit_syscall_exit+0x3ec/0x450
> >> [17635.361070]  [<ffffffff8161e759>] system_call_fastpath+0x16/0x1b
> >
> > There is a small race in the exclusion between discard and recovery.
> > This patch on top should fix it (I hope).
> > Thanks for testing.
> 
> Ok I spent most of yesterday running tests on this. With this additional
> patch applied I haven't been able to reproduce the hang so far - without
> it I could do it in about an hour, so I suspect it solves the problem.
> 
> Thanks!
> Jes

Thanks.  I'll get them queued for Linus and -stable shortly.

NeilBrown

