From: Paul Gortmaker <paul.gortmaker@windriver.com>
To: Thavatchai Makphaibulchoke <thavatchai.makpahibulchoke@hp.com>
Cc: Thavatchai Makphaibulchoke <tmac@hp.com>, <rostedt@goodmis.org>,
<linux-kernel@vger.kernel.org>, <mingo@redhat.com>,
<tglx@linutronix.de>, <linux-rt-users@vger.kernel.org>
Subject: Re: [PATCH RT v2] kernel/res_counter.c: Change lock of struct res_counter to raw_spinlock_t
Date: Fri, 13 Feb 2015 16:30:17 -0500
Message-ID: <20150213213016.GC7102@windriver.com>
In-Reply-To: <54DE6AD5.9070000@hp.com>
[Re: [PATCH RT v2] kernel/res_counter.c: Change lock of struct res_counter to raw_spinlock_t] On 13/02/2015 (Fri 14:21) Thavatchai Makphaibulchoke wrote:
>
>
> On 02/13/2015 12:19 PM, Paul Gortmaker wrote:
> >
> > I think there is more to this issue than just a lock conversion.
> > Firstly, if we look at the existing -rt patches, we've got the old
> > patch from ~2009 that is:
> >
>
> Thanks Paul for testing and reporting the problem.
>
> Yes, it looks like the issue probably involves more than converting to a
> raw_spinlock_t.
>
> > From: Ingo Molnar <mingo@elte.hu>
> > Date: Fri, 3 Jul 2009 08:44:33 -0500
> > Subject: [PATCH] core: Do not disable interrupts on RT in res_counter.c
> >
> > which changed the local_irq_save to local_irq_save_nort in order to
> > avoid such a raw lock conversion.
> >
>
> The patch did not quite state explicitly that the fix was to avoid raw
> lock conversion. I guess one could infer so.

Yes, it is kind of the implicit choice: either don't disable interrupts,
or don't use a sleeping lock.
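
To spell out that choice, here is a toy userspace model in C. It is only a
sketch and not kernel code: every model_* name, combo_is_legal() and
lock_is_legal() are made up for illustration. The assumptions are that on
-rt a plain spinlock_t may sleep while a raw_spinlock_t spins, and that
local_irq_save_nort() records the flags without disabling interrupts on -rt.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. All names here are hypothetical. */

/* Models whether interrupts are currently disabled on "this CPU". */
static bool irqs_disabled_flag;

enum lock_kind {
    LOCK_SLEEPING, /* models spinlock_t on -rt (may sleep) */
    LOCK_RAW       /* models raw_spinlock_t (always spins)  */
};

/* Models local_irq_save(): remember the old state, then disable. */
static void model_irq_save(bool *flags)
{
    *flags = irqs_disabled_flag;
    irqs_disabled_flag = true;
}

/* Models local_irq_save_nort() on -rt: record the state, disable nothing. */
static void model_irq_save_nort(bool *flags)
{
    *flags = irqs_disabled_flag;
}

/* Puts the modelled interrupt state back to what was saved. */
static void model_irq_restore(bool flags)
{
    irqs_disabled_flag = flags;
}

/* A sleeping lock must not be taken with interrupts off; a raw lock may. */
static bool lock_is_legal(enum lock_kind kind)
{
    return kind == LOCK_RAW || !irqs_disabled_flag;
}

/* Evaluates one combination of "disable interrupts?" and lock type. */
static bool combo_is_legal(bool disable_irqs, enum lock_kind kind)
{
    bool flags;
    bool ok;

    if (disable_irqs)
        model_irq_save(&flags);
    else
        model_irq_save_nort(&flags);

    ok = lock_is_legal(kind);
    model_irq_restore(flags);
    return ok;
}
```

In this model, combo_is_legal(false, LOCK_SLEEPING) corresponds to the 2009
_nort approach and combo_is_legal(true, LOCK_RAW) to the raw lock
conversion; only combo_is_legal(true, LOCK_SLEEPING) is rejected.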
>
> Anyway as the patch also mentioned, the code needs a second look.
>
> I'll try to see if I could rework my patch.
>
> > Also, when I test this patch on a larger machine with lots of cores, I
> > get boot up issues (general protection fault while trying to access the
> > raw lock) or RCU stalls that trigger broadcast NMI backtraces; both of which
> > implicate the same code area, and they go away with a revert.
> >
>
> Could you please let me know how many cores/threads you are running.

Interestingly, when I did a quick sanity test on a core2-duo (~5-year-old
desktop) it seemed fine. Only on the larger machine did it really go
pear-shaped. That machine looks like this:
root@yow-intel-canoe-pass-4-21774:~# cat /proc/cpuinfo |grep name|uniq
model name : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
root@yow-intel-canoe-pass-4-21774:~# cat /proc/cpuinfo |grep name|wc -l
20
root@yow-intel-canoe-pass-4-21774:~#
>
> Could you please also send me a stack trace for the protection fault
> problem, if available.

I rebooted several times and the RCU stall seemed to be the most common
failure. The machine was writing logs to /var/volatile so I don't have
a saved copy of that one :( -- if time permits I'll have a go at
rebooting a few more times with the patch to see if I can capture it.
Paul.
--
>
> Thanks again for reporting the problem.
>
> Thanks,
> Mak.
>
>
> > Stuff like the below. Figured I'd better mention it since Steve was
> > talking about rounding up patches for stable, and the solution to the
> > original problem reported here seems to need to be revisited.
> >
> > Paul.
> > --
> >
> >
> > [ 38.615736] NMI backtrace for cpu 15
> > [ 38.615739] CPU: 15 PID: 835 Comm: ovirt-engine.py Not tainted 3.14.33-rt28-WR7.0.0.0_ovp+ #3
> > [ 38.615740] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.01.0002.082220131453 08/22/2013
> > [ 38.615742] task: ffff880faca80000 ti: ffff880f9d890000 task.ti: ffff880f9d890000
> > [ 38.615751] RIP: 0010:[<ffffffff810820a1>] [<ffffffff810820a1>] preempt_count_add+0x41/0xb0
> > [ 38.615752] RSP: 0018:ffff880ffd5e3d00 EFLAGS: 00000097
> > [ 38.615754] RAX: 0000000000010002 RBX: 0000000000000001 RCX: 0000000000000000
> > [ 38.615755] RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000001
> > [ 38.615756] RBP: ffff880ffd5e3d08 R08: ffffffff82317700 R09: 0000000000000028
> > [ 38.615757] R10: 000000000000000f R11: 0000000000017484 R12: 0000000000044472
> > [ 38.615758] R13: 000000000000000f R14: 00000000c42caa68 R15: 0000000000000010
> > [ 38.615760] FS: 00007effa30c2700(0000) GS:ffff880ffd5e0000(0000) knlGS:0000000000000000
> > [ 38.615761] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 38.615762] CR2: 00007f19e3c29320 CR3: 0000000f9f9a3000 CR4: 00000000001407e0
> > [ 38.615763] Stack:
> > [ 38.615765] 00000000c42caa20 ffff880ffd5e3d38 ffffffff8140e524 0000000000001000
> > [ 38.615767] 00000000000003e9 0000000000000400 0000000000000002 ffff880ffd5e3d48
> > [ 38.615769] ffffffff8140e43f ffff880ffd5e3d58 ffffffff8140e477 ffff880ffd5e3d78
> > [ 38.615769] Call Trace:
> > [ 38.615771] <IRQ>
> > [ 38.615779] [<ffffffff8140e524>] delay_tsc+0x44/0xd0
> > [ 38.615782] [<ffffffff8140e43f>] __delay+0xf/0x20
> > [ 38.615784] [<ffffffff8140e477>] __const_udelay+0x27/0x30
> > [ 38.615788] [<ffffffff810355da>] native_safe_apic_wait_icr_idle+0x2a/0x60
> > [ 38.615792] [<ffffffff81036c80>] default_send_IPI_mask_sequence_phys+0xc0/0xe0
> > [ 38.615798] [<ffffffff8103a5f7>] physflat_send_IPI_all+0x17/0x20
> > [ 38.615801] [<ffffffff81036e80>] arch_trigger_all_cpu_backtrace+0x70/0xb0
> > [ 38.615807] [<ffffffff810b4d41>] rcu_check_callbacks+0x4f1/0x840
> > [ 38.615814] [<ffffffff8105365e>] ? raise_softirq_irqoff+0xe/0x40
> > [ 38.615821] [<ffffffff8105cc52>] update_process_times+0x42/0x70
> > [ 38.615826] [<ffffffff810c0336>] tick_sched_handle.isra.15+0x36/0x50
> > [ 38.615829] [<ffffffff810c0394>] tick_sched_timer+0x44/0x70
> > [ 38.615835] [<ffffffff8107598b>] __run_hrtimer+0x9b/0x2a0
> > [ 38.615838] [<ffffffff810c0350>] ? tick_sched_handle.isra.15+0x50/0x50
> > [ 38.615842] [<ffffffff81076cbe>] hrtimer_interrupt+0x12e/0x2e0
> > [ 38.615845] [<ffffffff810352c7>] local_apic_timer_interrupt+0x37/0x60
> > [ 38.615851] [<ffffffff81a376ef>] smp_apic_timer_interrupt+0x3f/0x50
> > [ 38.615854] [<ffffffff81a3664a>] apic_timer_interrupt+0x6a/0x70
> > [ 38.615855] <EOI>
> > [ 38.615861] [<ffffffff810dc604>] ? __res_counter_charge+0xc4/0x170
> > [ 38.615866] [<ffffffff81a34487>] ? _raw_spin_lock+0x47/0x60
> > [ 38.615882] [<ffffffff81a34457>] ? _raw_spin_lock+0x17/0x60
> > [ 38.615885] [<ffffffff810dc604>] __res_counter_charge+0xc4/0x170
> > [ 38.615888] [<ffffffff810dc6c0>] res_counter_charge+0x10/0x20
> > [ 38.615896] [<ffffffff81186645>] vm_cgroup_charge_shmem+0x35/0x50
> > [ 38.615900] [<ffffffff8113a686>] shmem_getpage_gfp+0x4b6/0x8e0
> > [ 38.615904] [<ffffffff8108201d>] ? get_parent_ip+0xd/0x50
> > [ 38.615908] [<ffffffff8113b626>] shmem_symlink+0xe6/0x210
> > [ 38.615914] [<ffffffff81195361>] ? __inode_permission+0x41/0xd0
> > [ 38.615917] [<ffffffff811961f0>] vfs_symlink+0x90/0xd0
> > [ 38.615923] [<ffffffff8119a762>] SyS_symlinkat+0x62/0xc0
> > [ 38.615927] [<ffffffff8119a7d6>] SyS_symlink+0x16/0x20
> > [ 38.615930] [<ffffffff81a359d6>] system_call_fastpath+0x1a/0x1f
> >
> >