From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sam Kappen
Subject: Re: schedule under irqs_disabled in SLUB problem
Date: Fri, 24 Nov 2017 12:09:16 +0530
Message-ID:
References: <20171102165009.u7a7ahmmywo2qugd@linutronix.de>
 <59FC4393.8030005@mcst.ru>
 <5A01812F.7040406@mcst.ru>
 <20171116160837.hfpnq4vb4j2osbuz@linutronix.de>
 <20171117173820.GM872@jcartwri.amer.corp.natinst.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
To: linux-rt-users@vger.kernel.org
Return-path:
Received: from mail-vk0-f51.google.com ([209.85.213.51]:46712 "EHLO
 mail-vk0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751643AbdKXGjS (ORCPT );
 Fri, 24 Nov 2017 01:39:18 -0500
Received: by mail-vk0-f51.google.com with SMTP id 138so1660581vko.13
 for ; Thu, 23 Nov 2017 22:39:18 -0800 (PST)
In-Reply-To: <20171117173820.GM872@jcartwri.amer.corp.natinst.com>
Sender: linux-rt-users-owner@vger.kernel.org
List-ID:

Hi,

I am also facing a similar issue on an x86 target while testing
3.10.105-rt119. The issue is seen during boot-up, when USB/SCSI
enumeration starts.

Below is the log from my console:

scsi 0:0:0:0: Direct-Access Linux scsi_debug 0004 PQ: 0 ANSI: 5
------------[ cut here ]------------
------------[ cut here ]------------
WARNING: at kernel/sched/core.c:3052 migrate_disable+0xee/0x100()
Modules linked in:
CPU: 3 PID: 7 Comm: kworker/u16:0 Not tainted 3.10.107-rt120+ #2
Hardware name: Intel Corporation S1200RP_SE/S1200RP_SE, BIOS S1200RP.86B.02.02.0005.102320140911 10/23/2014
Workqueue: events_unbound async_run_entry_fn
 0000000000000000 ffff880244927338 ffffffff8168b2f0 0000000000000000
 0000000000000009 ffff880244927370 ffffffff8105ef8c ffff8802448fb540
 0000000000000025 0000000000000004 0000000000000025 ffffffff81d9810c
Call Trace:
 [] dump_stack+0x4f/0x65
 [] warn_slowpath_common+0x5c/0xa0
 [] warn_slowpath_null+0x15/0x20
 [] migrate_disable+0xee/0x100
 [] call_console_drivers.constprop.14+0x4f/0xd0
 [] console_unlock+0x2a1/0x470
 [] vprintk_emit+0x2d2/0x550
 [] ? _raw_spin_unlock_irqrestore+0x19/0x50
 [] ? migrate_enable+0x15e/0x1f0
 [] printk+0x4a/0x52
 [] ? migrate_enable+0x15e/0x1f0
 [] warn_slowpath_common+0x2a/0xa0
 [] warn_slowpath_null+0x15/0x20
 [] migrate_enable+0x15e/0x1f0
 [] get_page_from_freelist+0x630/0xb90
 [] ? rt_spin_lock_slowlock+0x2ca/0x310
 [] __alloc_pages_nodemask+0x13d/0x9e0
 [] ? get_page_from_freelist+0x662/0xb90
 [] alloc_pages_current+0xb0/0x150
 [] new_slab+0x2b5/0x380
 [] __slab_alloc.isra.18+0x58a/0x670
 [] ? scsi_pool_alloc_command+0x20/0x70
 [] ? alloc_pages_current+0xb0/0x150
 [] kmem_cache_alloc+0xd6/0x100
 [] ? scsi_pool_alloc_command+0x20/0x70
 [] scsi_pool_alloc_command+0x20/0x70
 [] scsi_host_alloc_command.isra.1+0x1e/0x80
 [] __scsi_get_command+0x20/0xc0
 [] scsi_get_command+0x33/0xc0
 [] scsi_get_cmd_from_req+0x4a/0x60
 [] scsi_setup_blk_pc_cmnd+0x2b/0xf0
 [] scsi_prep_fn+0x3c/0x50
 [] blk_peek_request+0xf3/0x1c0
 [] scsi_request_fn+0x50/0x570
 [] __blk_run_queue+0x2e/0x40
 [] blk_execute_rq_nowait+0x70/0x100
 [] blk_execute_rq+0x88/0xe0
sd 0:0:0:0: Attached scsi generic sg0 type 0
 [] ? blk_rq_bio_prep+0x60/0xc0
 [] ? blk_rq_map_kern+0xf0/0x170
 [] ? blk_get_request+0x60/0xe0
 [] scsi_execute+0xf0/0x150
 [] scsi_execute_req_flags+0x82/0xf0
 [] read_capacity_16+0xcf/0x520
 [] sd_revalidate_disk+0x350/0x1bd0
 [] sd_probe_async+0xc4/0x1d0
 [] async_run_entry_fn+0x32/0x130
 [] process_one_work+0x145/0x420
 [] worker_thread+0x163/0x470
 [] ? preempt_schedule+0x4c/0x70
 [] ? manage_workers.isra.7+0x2d0/0x2d0
 [] kthread+0xbf/0xd0
 [] ? kthread_worker_fn+0x1a0/0x1a0
 [] ret_from_fork+0x4e/0x80
 [] ? kthread_worker_fn+0x1a0/0x1a0
---[ end trace 0000000000000001 ]---
------------[ cut here ]------------
WARNING: at kernel/sched/core.c:3087 migrate_enable+0x15e/0x1f0()
Modules linked in:
CPU: 3 PID: 7 Comm: kworker/u16:0 Tainted: G W 3.10.1

Test case to reproduce:
1. Enable PXE boot and mount the file system on a USB stick.
2. Continuously reboot the system with the USB stick connected.
3. We generally see the issue after 3 to 5 hours.

Looking at the issue, it appears that some piece of code calls
migrate_disable() with interrupts off, enables interrupts, and then
calls migrate_enable().

Instrumentation shows that for some SCSI layer calls (calls from
get_requests) the above condition does not evaluate to true, so
execution reaches buffered_rmqueue with irqs disabled.

From the call chain below:

buffered_rmqueue -> local_spin_lock_irqsave -> local_lock_irqsave ->
spin_lock -> rt_spin_lock -> rt_spin_lock_fastlock ->
rt_spin_lock_slowlock

In the normal case, when rt_spin_lock_slowlock is entered with irqs
disabled, it returns with irqs still disabled through this path:

	if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
		raw_spin_unlock(&lock->wait_lock);
		return;
	}

But in some cases the above condition is not met, and control reaches
the code below in the same function:

	pi_lock(&self->pi_lock);
	self->saved_state = self->state;
	__set_current_state(TASK_UNINTERRUPTIBLE);
	pi_unlock(&self->pi_lock);

pi_lock and pi_unlock disable and enable irqs respectively, so in this
special case the irq state is not preserved on exit from
rt_spin_lock_slowlock, and this results in the crash.

Could you please help resolve the issue?

Regards,
Sam

On Fri, Nov 17, 2017 at 11:08 PM, Julia Cartwright wrote:
> On Thu, Nov 16, 2017 at 05:08:37PM +0100, Sebastian Andrzej Siewior wrote:
>> + Steven & Julia
>>
>> On 2017-11-07 12:47:27 [+0300], Pavel V. Panteleev wrote:
>> > Thanks, it works.
>>
>> Okay, good to hear.
>>
>> Steven + Julia:
>> We need to decide what are going to do about this stable-wise. The bug
>> was reported against 3.14.79-rt85 and the devel tree is not affected*.
>> The thread starts at
>> https://www.spinics.net/lists/linux-rt-users/msg17560.html
>
> Your proposed patch seems reasonable to me to pull back into the
> relevant releases. Can you send a proper patch against the latest
> affected tree (4.9?) and the stable team will pull it back? It looks
> like it will need some minor massaging on it's way back, but that
> shouldn't be a problem.
>
> Thanks,
> Julia