From: Nikolay Borisov <kernel@kyup.com>
To: Tejun Heo <tj@kernel.org>
Cc: "Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>,
SiteGround Operations <operations@siteground.com>
Subject: Re: corruption causing crash in __queue_work
Date: Thu, 10 Dec 2015 11:28:02 +0200
Message-ID: <566945A2.1050208@kyup.com>
In-Reply-To: <20151209162744.GN30240@mtj.duckdns.org>
On 12/09/2015 06:27 PM, Tejun Heo wrote:
> Hello,
>
> On Wed, Dec 09, 2015 at 06:23:15PM +0200, Nikolay Borisov wrote:
>> I think we are seeing this at least daily on at least 1 server (we have
>> multiple servers like that). So adding printk's would likely be the way
>> to go, anything in particular you might be interested in knowing? I see
>> RCU stuff around so might be tricky race condition.
>
> Printing out the workqueue's pointer, name, pwq's pointer, the node
> being installed for and the installed pointer should give us enough
> clues. There's RCU involved but the pointers shouldn't be becoming
> NULLs unless we're installing NULL ptrs.
The debug patch has been rolled out on 1 server and several more
are in progress. Here is what it prints:
WQ: ffff88046f00ba00 (events_unbound) old_pwq: (null) new_pwq: ffff88046f00d300 node: 0
WQ: ffff88046f00be00 (events_power_efficient) old_pwq: (null) new_pwq: ffff88046f00d400 node: 0
WQ: ffff88046d71c000 (events_freezable_power_) old_pwq: (null) new_pwq: ffff88046f00d500 node: 0
WQ: ffff88046ce9ca00 (khelper) old_pwq: (null) new_pwq: ffff88046f00d600 node: 0
WQ: ffff88046ce9c000 (netns) old_pwq: (null) new_pwq: ffff88046f00d700 node: 0
WQ: ffff88046ce9d400 (perf) old_pwq: (null) new_pwq: ffff88046f00d800 node: 0
WQ: ffff88046c408000 (writeback) old_pwq: (null) new_pwq: ffff88046c800000 node: 0
WQ: ffff88046c409200 (kacpi_hotplug) old_pwq: (null) new_pwq: ffff88046c42e200 node: 0
WQ: ffff880468455600 (scsi_tmf_0) old_pwq: (null) new_pwq: ffff88046c801f00 node: 0
WQ: ffff8804687f4400 (scsi_tmf_1) old_pwq: (null) new_pwq: ffff88046caa6700 node: 0
WQ: ffff8804687f4c00 (scsi_tmf_2) old_pwq: (null) new_pwq: ffff88046caa6900 node: 0
WQ: ffff8804687f5400 (scsi_tmf_3) old_pwq: (null) new_pwq: ffff88046caa6b00 node: 0
WQ: ffff8804687f5c00 (scsi_tmf_4) old_pwq: (null) new_pwq: ffff88046caa6d00 node: 0
WQ: ffff8804687f6400 (scsi_tmf_5) old_pwq: (null) new_pwq: ffff88046caa7000 node: 0
WQ: ffff8804687f6c00 (scsi_tmf_6) old_pwq: (null) new_pwq: ffff88046caa7300 node: 0
WQ: ffff880467964000 (kdmremove) old_pwq: (null) new_pwq: ffff880467a3c800 node: 0
WQ: ffff880467965000 (deferwq) old_pwq: (null) new_pwq: ffff880467a3c100 node: 0
WQ: ffff8804669bc600 (ib_addr) old_pwq: (null) new_pwq: ffff88046845a600 node: 0
WQ: ffff88007d167e00 (qib0_0) old_pwq: (null) new_pwq: ffff880466c19800 node: 0
WQ: ffff88007d165a00 (qib0_1) old_pwq: (null) new_pwq: ffff880466c18e00 node: 0
WQ: ffff88007d165200 (ib_mad1) old_pwq: (null) new_pwq: ffff880466c19d00 node: 0
WQ: ffff8804665d2000 (ib_mad2) old_pwq: (null) new_pwq: ffff880466c18a00 node: 0
WQ: ffff8804667d7600 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880469806100 node: 0
WQ: ffff880079a9fc00 (edac-poller) old_pwq: (null) new_pwq: ffff88007d5ebf00 node: 0
WQ: ffff88046b47cc00 (kvm-irqfd-cleanup) old_pwq: (null) new_pwq: ffff8804651f0f00 node: 0
WQ: ffff8804694baa00 (kloopd0) old_pwq: (null) new_pwq: ffff88046949d100 node: 0
WQ: ffff880079a9cc00 (kloopd1) old_pwq: (null) new_pwq: ffff8804698cb900 node: 0
WQ: ffff88046809dc00 (kloopd2) old_pwq: (null) new_pwq: ffff88046957aa00 node: 0
WQ: ffff88046809c000 (kloopd3) old_pwq: (null) new_pwq: ffff8804650acc00 node: 0
WQ: ffff880466f3b000 (kloopd4) old_pwq: (null) new_pwq: ffff880469575900 node: 0
WQ: ffff88046809e800 (kloopd5) old_pwq: (null) new_pwq: ffff880469888200 node: 0
WQ: ffff88046809de00 (kloopd6) old_pwq: (null) new_pwq: ffff880469827400 node: 0
WQ: ffff88007d5f1c00 (dm_bufio_cache) old_pwq: (null) new_pwq: ffff8804673dda00 node: 0
WQ: ffff88046c42a400 (dm-thin) old_pwq: (null) new_pwq: ffff880079955100 node: 0
WQ: ffff8804672d0800 (dm-thin) old_pwq: (null) new_pwq: ffff88046baed800 node: 0
WQ: ffff88046993fa00 (dm-thin) old_pwq: (null) new_pwq: ffff8804650ff100 node: 0
WQ: ffff88046993d400 (dm-thin) old_pwq: (null) new_pwq: ffff88046949d600 node: 0
WQ: ffff88046993e400 (dm-thin) old_pwq: (null) new_pwq: ffff88046b833000 node: 0
WQ: ffff880466466400 (dm-thin) old_pwq: (null) new_pwq: ffff88007da60d00 node: 0
WQ: ffff88046b3eb200 (dm-thin) old_pwq: (null) new_pwq: ffff88046633d200 node: 0
WQ: ffff8804672d0600 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880079955400 node: 0
WQ: ffff88046b3eb600 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880465684900 node: 0
WQ: ffff88046c42a400 (dm-thin) old_pwq: (null) new_pwq: ffff8800799ee900 node: 0
WQ: ffff880466f39a00 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880469849e00 node: 0
WQ: ffff880467b0cc00 (dm-thin) old_pwq: (null) new_pwq: ffff88007d52fa00 node: 0
WQ: ffff8804672d4e00 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff88046ca07f00 node: 0
WQ: ffff880079a9ca00 (dm-thin) old_pwq: (null) new_pwq: ffff8802d1be9e00 node: 0
WQ: ffff880466175000 (dm-thin) old_pwq: (null) new_pwq: ffff8802d8efec00 node: 0
WQ: ffff880403f28400 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff8802e224dd00 node: 0
WQ: ffff880403f29a00 (dm-thin) old_pwq: (null) new_pwq: ffff880465685300 node: 0
WQ: ffff8804672d6c00 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880466d69300 node: 0
WQ: ffff880466f3ba00 (dm-thin) old_pwq: (null) new_pwq: ffff880469576500 node: 0
WQ: ffff8804672d4600 (dm-thin) old_pwq: (null) new_pwq: ffff8802d1a1ee00 node: 0
WQ: ffff8803ccf5c200 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff8804657b3200 node: 0
Is this format OK? I also observed the exact same crash
on a machine running a 4.1.12 kernel.
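
In case it helps when more servers start reporting, here is a minimal
sketch of a script for checking this output mechanically. It is not part
of the debug patch. The regular expression is an assumption derived only
from the lines printed above (the patch's actual format string is not
shown here), and the names parse_line and summarize are hypothetical.
It flags any install of a NULL new_pwq and any workqueue address that
shows up more than once.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the debug output above:
#   WQ: <wq ptr> (<name>) old_pwq: <ptr|(null)> new_pwq: <ptr|(null)> node: <n>
LINE_RE = re.compile(
    r"WQ: (?P<wq>[0-9a-f]+) \((?P<name>[^)]*)\) "
    r"old_pwq: (?P<old>\(null\)|[0-9a-f]+) "
    r"new_pwq: (?P<new>\(null\)|[0-9a-f]+) node: (?P<node>-?\d+)"
)


def parse_line(line):
    """Return a dict for one debug line, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Represent the kernel's "(null)" rendering as Python None.
    for key in ("old", "new"):
        if rec[key] == "(null)":
            rec[key] = None
    rec["node"] = int(rec["node"])
    return rec


def summarize(lines):
    """Collect NULL installs and workqueue addresses seen more than once."""
    null_installs = []
    by_wq = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip unrelated dmesg lines
        if rec["new"] is None:
            null_installs.append(rec)
        by_wq[rec["wq"]].append(rec)
    repeated = {wq: recs for wq, recs in by_wq.items() if len(recs) > 1}
    return null_installs, repeated


# Three lines copied from the output above.
SAMPLE = [
    "WQ: ffff88046c42a400 (dm-thin) old_pwq: (null) new_pwq: ffff880079955100 node: 0",
    "WQ: ffff8804672d0800 (dm-thin) old_pwq: (null) new_pwq: ffff88046baed800 node: 0",
    "WQ: ffff88046c42a400 (dm-thin) old_pwq: (null) new_pwq: ffff8800799ee900 node: 0",
]

if __name__ == "__main__":
    nulls, repeated = summarize(SAMPLE)
    print("NULL installs:", len(nulls))
    for wq, recs in repeated.items():
        print(wq, [r["new"] for r in recs])
```

On the output above, no line installs a NULL new_pwq, and the address
ffff88046c42a400 (dm-thin) appears twice with different new_pwq values.
That may simply be the address being reused after an earlier workqueue
was destroyed; the log alone does not say.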
>
> Thanks.
>