From: Andy Lutomirski <luto@amacapital.net>
To: Vince Weaver <vince@deater.net>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: Linux 3.18 released
Date: Wed, 10 Dec 2014 16:38:58 -0800
Message-ID: <5488E7A2.1050400@amacapital.net>
In-Reply-To: <alpine.DEB.2.10.1412081331580.28373@pianoman.cluster.toy>

On 12/08/2014 10:39 AM, Vince Weaver wrote:
> On Sun, 7 Dec 2014, Linus Torvalds wrote:
> 
>> I'd love to say that we've figured out the problem that plagues 3.17
>> for a couple of people, but we haven't. At the same time, there's
>> absolutely no point in having everybody else twiddling their thumbs
>> when a couple of people are actively trying to bisect an older issue,
>> so holding up the release just didn't make sense. Especially since
>> that would just have then held things up entirely over the holiday
>> break.
>>
>> So the merge window for 3.19 is open, and DaveJ will hopefully get his
>> bisection done (or at least narrow things down sufficiently that we
>> have that "Ahaa" moment) over the next week. But in solidarity with
>> Dave (and to make my life easier too ;) let's try to avoid introducing
>> any _new_ nasty issues, ok?
> 
> It's probably unrelated to DaveJ's issue, but my perf_event fuzzer still 
> quickly locks the kernel pretty solid on 3.18.
> 
> Just 5 minutes of testing managed to trip over the following issue that 
> dates back to at least 3.15-rc7

Out of curiosity, can you see if this:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/paranoid-and-more&id=38e49874d0ab18276f753f5784420b091f4be6eb

makes the problem much worse?  (Don't take the whole series there --
just cherry-pick the one patch.)

--Andy

> 
> My notes say that last time I tracked the issue down as follows:
> 
>   What happens is in kernel/events/core.c  find_get_context()
>   somehow perf_lock_task_context() returns NULL 
>   due to !atomic_inc_not_zero(&ctx->refcount)
>   but task->perf_event_ctxp[ctxn] still has a valid value.
> 
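The failure mode in the notes above can be modelled in userspace C. This is only a sketch under stated assumptions, not the kernel code: `struct ctx`, `inc_not_zero()`, `lock_task_context()` and `find_get_context_model()` are hypothetical stand-ins, the retry shape (allocate, find the slot occupied, free, retry) is inferred from the alloc_perf_context and put_ctx frames in the traces below, and the `max_tries` bound exists only so the model terminates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a perf context; only the refcount is modelled. */
struct ctx {
    atomic_int refcount;
};

/* Userspace model of "increment unless zero": fails once the count is 0. */
static bool inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

/* Model of the lookup: a context whose refcount hit zero is reported as gone. */
static struct ctx *lock_task_context(struct ctx *slot)
{
    if (slot != NULL && inc_not_zero(&slot->refcount))
        return slot;
    return NULL;
}

/*
 * Model of the retry loop. Returns the attempt number that succeeded,
 * or -1 if max_tries attempts (or an allocation) failed.
 */
static int find_get_context_model(struct ctx **slot, int max_tries)
{
    assert(max_tries > 0);

    for (int attempt = 1; attempt <= max_tries; attempt++) {
        struct ctx *fresh;

        if (lock_task_context(*slot) != NULL)
            return attempt;          /* took a reference on the existing ctx */

        fresh = malloc(sizeof(*fresh));
        if (fresh == NULL)
            return -1;
        atomic_init(&fresh->refcount, 1);

        if (*slot == NULL) {
            *slot = fresh;           /* slot was empty: install the new ctx */
            return attempt;
        }

        free(fresh);                 /* slot still occupied: drop and retry */
    }
    return -1;
}
```

With a slot that still points at a context whose refcount is already zero, every pass fails the lookup, allocates, sees the slot occupied, frees and retries, so the model only stops because of `max_tries`. A loop without such a bound would spin for as long as the stale pointer stays in place, which would be consistent with the CPU being stuck in a loop as described.
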
> There are multiple perf related issues like this that are hard to track 
> down.  They are borderline heisenbugs that are possibly race conditions, 
> so bisecting doesn't work and even things like enabling ftrace will make
> the issue go away (or crash ftrace itself).
> 
> This particular manifestation of the bug (or bugs) wedges things but I can 
> use alt-sysrq from the serial console to see where it is stuck (see 
> below; the CPU is stuck in a loop).
> 
> 
> [ 2225.916004]  [<ffffffff810e61e9>] ? get_page_from_freelist+0x55/0x781
> [ 2225.916004]  [<ffffffff810e6a7c>] __alloc_pages_nodemask+0x167/0x6dc
> [ 2225.916004]  [<ffffffff8101a4a3>] ? intel_pmu_enable_all+0x28/0xa4
> [ 2225.916004]  [<ffffffff8111f0b3>] kmem_getpages+0x58/0xec
> [ 2225.916004]  [<ffffffff81120278>] cache_grow+0xad/0x1d8
> [ 2225.916004]  [<ffffffff81120021>] ____cache_alloc+0x237/0x2ce
> [ 2225.916004]  [<ffffffff811216b9>] __kmalloc+0x8f/0xf2
> [ 2225.916004]  [<ffffffff810dc35d>] ? T.1336+0xe/0x10
> [ 2225.916004]  [<ffffffff810dc35d>] T.1336+0xe/0x10
> [ 2225.916004]  [<ffffffff810dc8ca>] alloc_perf_context+0x20/0x51
> [ 2225.916004]  [<ffffffff810dca33>] find_get_context+0x138/0x1c7
> [ 2225.916004]  [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
> [ 2225.916004]  [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
> [ 2225.916004]  [<ffffffff81560016>] system_call_fastpath+0x16/0x1b
> 
> [ 2256.708004]  [<ffffffff810d7e36>] ? put_ctx+0x40/0x61
> [ 2256.708004]  [<ffffffff810dcaa4>] find_get_context+0x1a9/0x1c7
> [ 2256.708004]  [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
> [ 2256.708004]  [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
> [ 2256.708004]  [<ffffffff81560016>] system_call_fastpath+0x16/0x1b
> 
> [ 2303.796003]  [<ffffffff810fa6cb>] ? kmalloc_slab+0x7f/0x8d
> [ 2303.796003]  [<ffffffff81121653>] __kmalloc+0x29/0xf2
> [ 2303.796003]  [<ffffffff810dc35d>] ? T.1336+0xe/0x10
> [ 2303.796003]  [<ffffffff810dc35d>] T.1336+0xe/0x10
> [ 2303.796003]  [<ffffffff810dc8ca>] alloc_perf_context+0x20/0x51
> [ 2303.796003]  [<ffffffff810dca33>] find_get_context+0x138/0x1c7
> [ 2303.796003]  [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
> [ 2303.796003]  [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
> [ 2303.796003]  [<ffffffff81560016>] system_call_fastpath+0x16/0x1b
> 
> Vince
> 


Thread overview: 6+ messages
2014-12-08  0:10 Linux 3.18 released Linus Torvalds
2014-12-08 18:39 ` Vince Weaver
2014-12-08 19:11   ` Vince Weaver
2014-12-09 10:18   ` Ingo Molnar
2014-12-09 11:06     ` Ingo Molnar
2014-12-11  0:38   ` Andy Lutomirski [this message]
