From: Nikolay Borisov <kernel@kyup.com>
To: Tejun Heo <tj@kernel.org>
Cc: "Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>,
SiteGround Operations <operations@siteground.com>
Subject: Re: corruption causing crash in __queue_work
Date: Thu, 10 Dec 2015 11:28:02 +0200
Message-ID: <566945A2.1050208@kyup.com>
In-Reply-To: <20151209162744.GN30240@mtj.duckdns.org>
On 12/09/2015 06:27 PM, Tejun Heo wrote:
> Hello,
>
> On Wed, Dec 09, 2015 at 06:23:15PM +0200, Nikolay Borisov wrote:
>> I think we are seeing this at least daily on at least 1 server (we have
>> multiple servers like that). So adding printk's would likely be the way
>> to go, anything in particular you might be interested in knowing? I see
>> RCU stuff around so might be tricky race condition.
>
> Printing out the workqueue's pointer, name, pwq's pointer, the node
> being installed for and the installed pointer should give us enough
> clues. There's RCU involved but the pointers shouldn't be becoming
> NULLs unless we're installing NULL ptrs.
The debug patch has been rolled out on 1 server, and several more
are in progress. Here is what it prints:
WQ: ffff88046f00ba00 (events_unbound) old_pwq: (null) new_pwq: ffff88046f00d300 node: 0
WQ: ffff88046f00be00 (events_power_efficient) old_pwq: (null) new_pwq: ffff88046f00d400 node: 0
WQ: ffff88046d71c000 (events_freezable_power_) old_pwq: (null) new_pwq: ffff88046f00d500 node: 0
WQ: ffff88046ce9ca00 (khelper) old_pwq: (null) new_pwq: ffff88046f00d600 node: 0
WQ: ffff88046ce9c000 (netns) old_pwq: (null) new_pwq: ffff88046f00d700 node: 0
WQ: ffff88046ce9d400 (perf) old_pwq: (null) new_pwq: ffff88046f00d800 node: 0
WQ: ffff88046c408000 (writeback) old_pwq: (null) new_pwq: ffff88046c800000 node: 0
WQ: ffff88046c409200 (kacpi_hotplug) old_pwq: (null) new_pwq: ffff88046c42e200 node: 0
WQ: ffff880468455600 (scsi_tmf_0) old_pwq: (null) new_pwq: ffff88046c801f00 node: 0
WQ: ffff8804687f4400 (scsi_tmf_1) old_pwq: (null) new_pwq: ffff88046caa6700 node: 0
WQ: ffff8804687f4c00 (scsi_tmf_2) old_pwq: (null) new_pwq: ffff88046caa6900 node: 0
WQ: ffff8804687f5400 (scsi_tmf_3) old_pwq: (null) new_pwq: ffff88046caa6b00 node: 0
WQ: ffff8804687f5c00 (scsi_tmf_4) old_pwq: (null) new_pwq: ffff88046caa6d00 node: 0
WQ: ffff8804687f6400 (scsi_tmf_5) old_pwq: (null) new_pwq: ffff88046caa7000 node: 0
WQ: ffff8804687f6c00 (scsi_tmf_6) old_pwq: (null) new_pwq: ffff88046caa7300 node: 0
WQ: ffff880467964000 (kdmremove) old_pwq: (null) new_pwq: ffff880467a3c800 node: 0
WQ: ffff880467965000 (deferwq) old_pwq: (null) new_pwq: ffff880467a3c100 node: 0
WQ: ffff8804669bc600 (ib_addr) old_pwq: (null) new_pwq: ffff88046845a600 node: 0
WQ: ffff88007d167e00 (qib0_0) old_pwq: (null) new_pwq: ffff880466c19800 node: 0
WQ: ffff88007d165a00 (qib0_1) old_pwq: (null) new_pwq: ffff880466c18e00 node: 0
WQ: ffff88007d165200 (ib_mad1) old_pwq: (null) new_pwq: ffff880466c19d00 node: 0
WQ: ffff8804665d2000 (ib_mad2) old_pwq: (null) new_pwq: ffff880466c18a00 node: 0
WQ: ffff8804667d7600 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880469806100 node: 0
WQ: ffff880079a9fc00 (edac-poller) old_pwq: (null) new_pwq: ffff88007d5ebf00 node: 0
WQ: ffff88046b47cc00 (kvm-irqfd-cleanup) old_pwq: (null) new_pwq: ffff8804651f0f00 node: 0
WQ: ffff8804694baa00 (kloopd0) old_pwq: (null) new_pwq: ffff88046949d100 node: 0
WQ: ffff880079a9cc00 (kloopd1) old_pwq: (null) new_pwq: ffff8804698cb900 node: 0
WQ: ffff88046809dc00 (kloopd2) old_pwq: (null) new_pwq: ffff88046957aa00 node: 0
WQ: ffff88046809c000 (kloopd3) old_pwq: (null) new_pwq: ffff8804650acc00 node: 0
WQ: ffff880466f3b000 (kloopd4) old_pwq: (null) new_pwq: ffff880469575900 node: 0
WQ: ffff88046809e800 (kloopd5) old_pwq: (null) new_pwq: ffff880469888200 node: 0
WQ: ffff88046809de00 (kloopd6) old_pwq: (null) new_pwq: ffff880469827400 node: 0
WQ: ffff88007d5f1c00 (dm_bufio_cache) old_pwq: (null) new_pwq: ffff8804673dda00 node: 0
WQ: ffff88046c42a400 (dm-thin) old_pwq: (null) new_pwq: ffff880079955100 node: 0
WQ: ffff8804672d0800 (dm-thin) old_pwq: (null) new_pwq: ffff88046baed800 node: 0
WQ: ffff88046993fa00 (dm-thin) old_pwq: (null) new_pwq: ffff8804650ff100 node: 0
WQ: ffff88046993d400 (dm-thin) old_pwq: (null) new_pwq: ffff88046949d600 node: 0
WQ: ffff88046993e400 (dm-thin) old_pwq: (null) new_pwq: ffff88046b833000 node: 0
WQ: ffff880466466400 (dm-thin) old_pwq: (null) new_pwq: ffff88007da60d00 node: 0
WQ: ffff88046b3eb200 (dm-thin) old_pwq: (null) new_pwq: ffff88046633d200 node: 0
WQ: ffff8804672d0600 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880079955400 node: 0
WQ: ffff88046b3eb600 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880465684900 node: 0
WQ: ffff88046c42a400 (dm-thin) old_pwq: (null) new_pwq: ffff8800799ee900 node: 0
WQ: ffff880466f39a00 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880469849e00 node: 0
WQ: ffff880467b0cc00 (dm-thin) old_pwq: (null) new_pwq: ffff88007d52fa00 node: 0
WQ: ffff8804672d4e00 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff88046ca07f00 node: 0
WQ: ffff880079a9ca00 (dm-thin) old_pwq: (null) new_pwq: ffff8802d1be9e00 node: 0
WQ: ffff880466175000 (dm-thin) old_pwq: (null) new_pwq: ffff8802d8efec00 node: 0
WQ: ffff880403f28400 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff8802e224dd00 node: 0
WQ: ffff880403f29a00 (dm-thin) old_pwq: (null) new_pwq: ffff880465685300 node: 0
WQ: ffff8804672d6c00 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880466d69300 node: 0
WQ: ffff880466f3ba00 (dm-thin) old_pwq: (null) new_pwq: ffff880469576500 node: 0
WQ: ffff8804672d4600 (dm-thin) old_pwq: (null) new_pwq: ffff8802d1a1ee00 node: 0
WQ: ffff8803ccf5c200 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff8804657b3200 node: 0
Is this format OK? I have also observed the exact same crash
on a machine running a 4.1.12 kernel.
>
> Thanks.
>
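The debug output above can be scanned mechanically rather than by eye. For instance, the address ffff88046c42a400 (dm-thin) appears twice in the log, each time with old_pwq: (null) and a different new_pwq. That may simply mean the workqueue was destroyed and its address later reused by a new allocation; it is not by itself evidence of corruption. The following is a minimal sketch, assuming the log keeps exactly the line format shown above. The names LINE_RE, parse, and reused_wq_addresses are hypothetical helpers made up for this illustration, not part of any kernel tooling.

```python
import re
from collections import defaultdict

# Matches one line of the debug output, e.g.
# "WQ: ffff88046c42a400 (dm-thin) old_pwq: (null) new_pwq: ffff880079955100 node: 0"
LINE_RE = re.compile(
    r"WQ: (?P<wq>[0-9a-f]+) \((?P<name>[^)]*)\) "
    r"old_pwq: (?P<old>\S+) new_pwq: (?P<new>\S+) node: (?P<node>\d+)"
)


def parse(log):
    """Return one dict per recognised log line; unrelated lines are skipped."""
    entries = []
    for line in log.splitlines():
        match = LINE_RE.search(line)
        if match:
            entries.append(match.groupdict())
    return entries


def reused_wq_addresses(entries):
    """Map each workqueue address seen more than once to its new_pwq values."""
    by_wq = defaultdict(list)
    for entry in entries:
        by_wq[entry["wq"]].append(entry["new"])
    return {wq: pwqs for wq, pwqs in by_wq.items() if len(pwqs) > 1}


# Three lines copied from the log above; the first and third share a WQ address.
SAMPLE = """\
WQ: ffff88046c42a400 (dm-thin) old_pwq: (null) new_pwq: ffff880079955100 node: 0
WQ: ffff8804672d0800 (dm-thin) old_pwq: (null) new_pwq: ffff88046baed800 node: 0
WQ: ffff88046c42a400 (dm-thin) old_pwq: (null) new_pwq: ffff8800799ee900 node: 0
"""

if __name__ == "__main__":
    for wq, pwqs in reused_wq_addresses(parse(SAMPLE)).items():
        print(wq, pwqs)
```

Feeding the full dmesg output to parse() instead of SAMPLE would list every workqueue address that had a pwq installed more than once, which could then be compared against the pointers in a crash dump.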