From: Jens Axboe <axboe@kernel.dk>
To: Will Deacon <will.deacon@arm.com>,
rusty@rustcorp.com.au, tj@kernel.org, hch@lst.de
Cc: linux-kernel@vger.kernel.org
Subject: Re: 3.17-rc2 percpu-ref oops with virtblk remove
Date: Thu, 28 Aug 2014 12:51:38 -0600 [thread overview]
Message-ID: <53FF7A3A.7060607@kernel.dk> (raw)
In-Reply-To: <20140828185046.GR22580@arm.com>
On 08/28/2014 12:50 PM, Will Deacon wrote:
> Hi guys,
>
> I've been debugging an issue triggered by virtio-blk and vfio on an
> arm64 system running 3.17-rc2. The problem occurs when removing a
> virtio-pci block device, which triggers the following warning in the
> percpu-ref code:
>
> WARNING: CPU: 0 PID: 1312 at lib/percpu-refcount.c:179 percpu_ref_kill_and_confirm+0x90/0x9c()
> percpu_ref_kill() called more than once!
> Modules linked in:
> CPU: 0 PID: 1312 Comm: vfio-setup.sh Not tainted 3.17.0-rc2+ #5
> Call trace:
> [<ffffffc0002b68b0>] percpu_ref_kill_and_confirm+0x8c/0x9c
> [<ffffffc0002938e8>] blk_mq_freeze_queue+0x34/0xc8
> [<ffffffc000294dac>] blk_mq_free_queue+0x1c/0x10c
> [<ffffffc00028cc1c>] blk_release_queue+0x68/0xa8
> [<ffffffc0002a7208>] kobject_release+0x40/0x84
> [<ffffffc0002a7284>] kobject_put+0x38/0x6c
> [<ffffffc000288414>] blk_put_queue+0xc/0x18
> [<ffffffc0002989b0>] disk_release+0x7c/0xb8
> [<ffffffc0003245f0>] device_release+0x30/0x98
> [<ffffffc0002a7208>] kobject_release+0x40/0x84
> [<ffffffc0002a7284>] kobject_put+0x38/0x6c
> [<ffffffc0002980cc>] put_disk+0x10/0x1c
> [<ffffffc00033b010>] virtblk_remove+0x74/0xd4
>
> Some more debugging shows that virtblk_remove earlier called
> blk_cleanup_queue:
>
> [<ffffffc0002b686c>] percpu_ref_kill_and_confirm+0x48/0x9c
> [<ffffffc0002938e8>] blk_mq_freeze_queue+0x34/0xc8
> [<ffffffc000288fd4>] blk_cleanup_queue+0x70/0xf4
> [<ffffffc00033afe0>] virtblk_remove+0x44/0xd4
>
> which ends up marking the percpu ref as dead and queues some RCU work
> to switch over to the atomic_t version using call_rcu_sched. The
> virtblk_remove function then continues happily and calls put_disk (first
> backtrace above), which ends up trying to get exclusive access to the
> queue by killing the usage counter and waiting for the percpu reference
> to become zero. Unfortunately, since the reference is already marked as
> dead, the call to percpu_ref_kill triggers the warning. Worse still, we
> then queue the RCU callback a second time, which dies with something
> like:
>
> Unable to handle kernel paging request at virtual address 87f971000
> pgd = ffffffc87aca8000
> [87f971000] *pgd=0000000000000000, *pud=0000000000000000
> Internal error: Oops: 96000005 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 0 PID: 1312 Comm: systemd-udevd Tainted: G W 3.17.0-rc2+ #9
> task: ffffffc87bb89480 ti: ffffffc87ace0000 task.ti: ffffffc87ace0000
> PC is at percpu_ref_kill_rcu+0x50/0x188
> LR is at percpu_ref_kill_rcu+0x6c/0x188
> pc : [<ffffffc0002b6a34>] lr : [<ffffffc0002b6a50>] pstate: 80000145
> sp : ffffffc87ace3840
> x29: ffffffc87ace3840 x28: ffffffc000470000
> x27: 0000000000000000 x26: ffffffc87ff74a00
> x25: ffffffc00061f238 x24: ffffffc07f858d68
> x23: ffffffc000657000 x22: ffffffc07f858d88
> x21: ffffffc00064bc68 x20: 0000000000000000
> x19: 0000000000000000 x18: 0000007fc81227e0
> x17: 0000007fb74f1194 x16: ffffffc000161a44
> x15: 0000007fb757b5a0 x14: 0000000000000008
> x13: 0000000000000000 x12: ffffffc000470c70
> x11: 0000000000000005 x10: 0101010101010101
> x9 : ffffffc000470c4b x8 : ffffffc000647318
> x7 : fffffffffffffe40 x6 : 0000000000005b00
> x5 : ffffffc00064bc68 x4 : 0000000000000038
> x3 : 00000000000000ff x2 : 0000000000000000
> x1 : ffffffc000657ee8 x0 : 000000087f971000
>
> because the percpu_ref has been freed from blk_mq_free_queue.
>
> I'm not really sure how to fix this -- it seems like we shouldn't try to
> kill a reference that is already dead, but using __pcpu_ref_alive isn't
> the right answer. Simply removing the warning works for me (patch below),
> but that also feels like a hack (we skip the confirm callback, for a
> start).
>
> Any ideas?
It's fixed in for-linus, which will go out soon, more stuff just kept
coming in. Fix:
http://git.kernel.dk/?p=linux-block.git;a=commit;h=cddd5d17642cc6881352732693c2ae6930e9ce65
--
Jens Axboe
next prev parent reply other threads:[~2014-08-28 18:51 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-28 18:50 3.17-rc2 percpu-ref oops with virtblk remove Will Deacon
2014-08-28 18:51 ` Jens Axboe [this message]
2014-08-28 18:54 ` Will Deacon
2014-08-28 20:42 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53FF7A3A.7060607@kernel.dk \
--to=axboe@kernel.dk \
--cc=hch@lst.de \
--cc=linux-kernel@vger.kernel.org \
--cc=rusty@rustcorp.com.au \
--cc=tj@kernel.org \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.