Re: schedule under irqs_disabled in SLUB problem

From: Sam Kappen <skappen@mvista.com>
To: linux-rt-users@vger.kernel.org
Subject: Re: schedule under irqs_disabled in SLUB problem
Date: Fri, 24 Nov 2017 12:09:16 +0530
Message-ID: <CAJ9FNxsfHRuc+UZCPKGDJDtk5neApOb3i6thpGTy9c-oP8T4JA@mail.gmail.com>
In-Reply-To: <20171117173820.GM872@jcartwri.amer.corp.natinst.com>

Hi,

I am also facing a similar issue on an x86 target while testing
3.10.105-rt119.
The issue is seen during boot-up, when USB/SCSI enumeration starts.

Below is the log from my console:

scsi 0:0:0:0: Direct-Access     Linux    scsi_debug       0004 PQ: 0 ANSI: 5
------------[ cut here ]------------
------------[ cut here ]------------
WARNING: at kernel/sched/core.c:3052 migrate_disable+0xee/0x100()
Modules linked in:
CPU: 3 PID: 7 Comm: kworker/u16:0 Not tainted 3.10.107-rt120+ #2
Hardware name: Intel Corporation S1200RP_SE/S1200RP_SE, BIOS
S1200RP.86B.02.02.0005.102320140911 10/23/2014
Workqueue: events_unbound async_run_entry_fn
 0000000000000000 ffff880244927338 ffffffff8168b2f0 0000000000000000
 0000000000000009 ffff880244927370 ffffffff8105ef8c ffff8802448fb540
 0000000000000025 0000000000000004 0000000000000025 ffffffff81d9810c
Call Trace:
 [<ffffffff8168b2f0>] dump_stack+0x4f/0x65
 [<ffffffff8105ef8c>] warn_slowpath_common+0x5c/0xa0
 [<ffffffff8105f085>] warn_slowpath_null+0x15/0x20
 [<ffffffff8109355e>] migrate_disable+0xee/0x100
 [<ffffffff810600af>] call_console_drivers.constprop.14+0x4f/0xd0
 [<ffffffff81061241>] console_unlock+0x2a1/0x470
 [<ffffffff810616e2>] vprintk_emit+0x2d2/0x550
 [<ffffffff8168eb49>] ? _raw_spin_unlock_irqrestore+0x19/0x50
 [<ffffffff810936ce>] ? migrate_enable+0x15e/0x1f0
 [<ffffffff816892d3>] printk+0x4a/0x52
 [<ffffffff810936ce>] ? migrate_enable+0x15e/0x1f0
 [<ffffffff8105ef5a>] warn_slowpath_common+0x2a/0xa0
 [<ffffffff8105f085>] warn_slowpath_null+0x15/0x20
 [<ffffffff810936ce>] migrate_enable+0x15e/0x1f0
 [<ffffffff810fce40>] get_page_from_freelist+0x630/0xb90
 [<ffffffff8168e32a>] ? rt_spin_lock_slowlock+0x2ca/0x310
 [<ffffffff810fe36d>] __alloc_pages_nodemask+0x13d/0x9e0
 [<ffffffff810fce72>] ? get_page_from_freelist+0x662/0xb90
 [<ffffffff81133dd0>] alloc_pages_current+0xb0/0x150
 [<ffffffff81138e05>] new_slab+0x2b5/0x380
 [<ffffffff8113b67a>] __slab_alloc.isra.18+0x58a/0x670
 [<ffffffff813d3f40>] ? scsi_pool_alloc_command+0x20/0x70
 [<ffffffff81133dd0>] ? alloc_pages_current+0xb0/0x150
 [<ffffffff8113b956>] kmem_cache_alloc+0xd6/0x100
 [<ffffffff813d3f40>] ? scsi_pool_alloc_command+0x20/0x70
 [<ffffffff813d3f40>] scsi_pool_alloc_command+0x20/0x70
 [<ffffffff813d492e>] scsi_host_alloc_command.isra.1+0x1e/0x80
 [<ffffffff813d49b0>] __scsi_get_command+0x20/0xc0
 [<ffffffff813d4a83>] scsi_get_command+0x33/0xc0
 [<ffffffff813dad1a>] scsi_get_cmd_from_req+0x4a/0x60
 [<ffffffff813db6cb>] scsi_setup_blk_pc_cmnd+0x2b/0xf0
 [<ffffffff813db8fc>] scsi_prep_fn+0x3c/0x50
 [<ffffffff812c9ef3>] blk_peek_request+0xf3/0x1c0
 [<ffffffff813db960>] scsi_request_fn+0x50/0x570
 [<ffffffff812c6c6e>] __blk_run_queue+0x2e/0x40
 [<ffffffff812cdde0>] blk_execute_rq_nowait+0x70/0x100
 [<ffffffff812cdef8>] blk_execute_rq+0x88/0xe0
sd 0:0:0:0: Attached scsi generic sg0 type 0
 [<ffffffff812ca040>] ? blk_rq_bio_prep+0x60/0xc0
 [<ffffffff812cdcf0>] ? blk_rq_map_kern+0xf0/0x170
 [<ffffffff812c86c0>] ? blk_get_request+0x60/0xe0
 [<ffffffff813da050>] scsi_execute+0xf0/0x150
 [<ffffffff813da182>] scsi_execute_req_flags+0x82/0xf0
 [<ffffffff8145d87f>] read_capacity_16+0xcf/0x520
 [<ffffffff8145e060>] sd_revalidate_disk+0x350/0x1bd0
 [<ffffffff8145f9a4>] sd_probe_async+0xc4/0x1d0
 [<ffffffff8108e7c2>] async_run_entry_fn+0x32/0x130
 [<ffffffff8107f5a5>] process_one_work+0x145/0x420
 [<ffffffff81080903>] worker_thread+0x163/0x470
 [<ffffffff8168d91c>] ? preempt_schedule+0x4c/0x70
 [<ffffffff810807a0>] ? manage_workers.isra.7+0x2d0/0x2d0
 [<ffffffff8108735f>] kthread+0xbf/0xd0
 [<ffffffff810872a0>] ? kthread_worker_fn+0x1a0/0x1a0
 [<ffffffff8168f6be>] ret_from_fork+0x4e/0x80
 [<ffffffff810872a0>] ? kthread_worker_fn+0x1a0/0x1a0
---[ end trace 0000000000000001 ]---
------------[ cut here ]------------
WARNING: at kernel/sched/core.c:3087 migrate_enable+0x15e/0x1f0()
Modules linked in:
CPU: 3 PID: 7 Comm: kworker/u16:0 Tainted: G        W    3.10.1


Test case to reproduce:
1. Enable PXE boot and mount the file system on a USB stick.
2. Continuously reboot the system with the USB stick connected.
3. The issue generally shows up once every 3 to 5 hours.


Looking into the issue, it appears that some code path calls
migrate_disable() with interrupts off, then enables interrupts, and then
calls migrate_enable().
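
To make that sequence concrete, here is a minimal user-space sketch of the
accounting as I understand it. It is not kernel code: struct task_model and
the model_* functions are hypothetical names, and the assumption is that
with IRQs off both calls only adjust a debug counter, while with IRQs on
they warn if that counter is unbalanced or if there is no matching disable.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; every name here is hypothetical. */
struct task_model {
    bool irqs_off;       /* stands in for irqs_disabled() */
    int  atomic_count;   /* calls made on the IRQs-off path (assumed debug counter) */
    int  disable_depth;  /* nesting depth of real migrate-disable sections */
    int  warnings;       /* number of WARN-style checks that fired */
};

static void model_migrate_disable(struct task_model *t)
{
    if (t->irqs_off) {           /* IRQs off: only count the call */
        t->atomic_count++;
        return;
    }
    if (t->atomic_count != 0)    /* unbalanced IRQs-off accounting */
        t->warnings++;
    t->disable_depth++;
}

static void model_migrate_enable(struct task_model *t)
{
    if (t->irqs_off) {           /* IRQs off: undo the count */
        t->atomic_count--;
        return;
    }
    if (t->atomic_count != 0)    /* disable was counted on the other path */
        t->warnings++;
    if (t->disable_depth <= 0) { /* enable without a matching disable */
        t->warnings++;
        return;
    }
    t->disable_depth--;
}

/* Runs one disable/enable pair that starts with IRQs off and returns the
 * number of warnings raised. */
static int model_run(bool irqs_reenabled_in_between)
{
    struct task_model t = { .irqs_off = true };

    model_migrate_disable(&t);
    if (irqs_reenabled_in_between)
        t.irqs_off = false;      /* someone enabled IRQs behind our back */
    model_migrate_enable(&t);

    assert(t.disable_depth >= 0);
    return t.warnings;
}
```

In this model, model_run(false) stays silent, while model_run(true) trips
the checks in the enable path, which is the shape of the migrate_enable
warning in the log above.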

Instrumentation shows that for some SCSI layer calls (calls from
get_requests) the above condition does not evaluate to true, so execution
reaches buffered_rmqueue with IRQs disabled.

From the call chain below:

buffered_rmqueue-> local_spin_lock_irqsave -> local_lock_irqsave -> spin_lock
->rt_spin_lock -> rt_spin_lock_fastlock -> rt_spin_lock_slowlock

In the normal case, when rt_spin_lock_slowlock is entered with IRQs
disabled, it returns with IRQs still disabled through the branch below:

        if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
                raw_spin_unlock(&lock->wait_lock);
                return;
        }

But in some cases the above condition is not met, and control reaches the
code below in the same function:

        pi_lock(&self->pi_lock);
        self->saved_state = self->state;
        __set_current_state(TASK_UNINTERRUPTIBLE);
        pi_unlock(&self->pi_lock);


pi_lock and pi_unlock disable and enable IRQs respectively, so in this
special case the IRQ state is not preserved on exit from the
rt_spin_lock_slowlock function, and this results in the crash!
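
The difference can be illustrated with a small user-space model of the
interrupt flag. This is only a sketch, not a proposed fix: all model_* names
are hypothetical, and it assumes that the lock/unlock pair enables IRQs
unconditionally on unlock, compared with a save/restore pair that puts back
whatever state the caller had.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the CPU interrupt flag. */
static bool model_irqs_enabled = true;

/* Models a pair that disables on lock and unconditionally enables on unlock. */
static void model_lock_irq(void)   { model_irqs_enabled = false; }
static void model_unlock_irq(void) { model_irqs_enabled = true;  }

/* Models a pair that saves the previous state on lock and restores it. */
static bool model_lock_irqsave(void)
{
    bool flags = model_irqs_enabled;
    model_irqs_enabled = false;
    return flags;
}

static void model_unlock_irqrestore(bool flags)
{
    model_irqs_enabled = flags;
}

/* Returns true if the caller's IRQ state is the same after the critical
 * section as it was before. */
static bool model_state_preserved(bool caller_irqs_enabled, bool use_irqsave)
{
    model_irqs_enabled = caller_irqs_enabled;

    if (use_irqsave) {
        bool flags = model_lock_irqsave();
        assert(!model_irqs_enabled);   /* IRQs are off while "locked" */
        model_unlock_irqrestore(flags);
    } else {
        model_lock_irq();
        assert(!model_irqs_enabled);   /* IRQs are off while "locked" */
        model_unlock_irq();
    }
    return model_irqs_enabled == caller_irqs_enabled;
}
```

With the caller's IRQs disabled, the unconditional pair leaves them enabled
on exit (state not preserved), whereas the save/restore pair hands back the
disabled state.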

Could you please help to resolve the issue?

Regards,
Sam

On Fri, Nov 17, 2017 at 11:08 PM, Julia Cartwright <julia@ni.com> wrote:
> On Thu, Nov 16, 2017 at 05:08:37PM +0100, Sebastian Andrzej Siewior wrote:
>> + Steven & Julia
>>
>> On 2017-11-07 12:47:27 [+0300], Pavel V. Panteleev wrote:
>> > Thanks, it works.
>>
>> Okay, good to hear.
>>
>> Steven + Julia:
>> We need to decide what we are going to do about this stable-wise. The bug
>> was reported against 3.14.79-rt85 and the devel tree is not affected*.
>> The thread starts at
>>   https://www.spinics.net/lists/linux-rt-users/msg17560.html
>
> Your proposed patch seems reasonable to me to pull back into the
> relevant releases.  Can you send a proper patch against the latest
> affected tree (4.9?) and the stable team will pull it back?  It looks
> like it will need some minor massaging on its way back, but that
> shouldn't be a problem.
>
> Thanks,
>    Julia
