All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Joerg Roedel <jroedel@suse.de>, rostedt <rostedt@goodmis.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shile Zhang <shile.zhang@linux.alibaba.com>,
	Andy Lutomirski <luto@amacapital.net>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()
Date: Thu, 30 Apr 2020 11:23:29 -0400 (EDT)	[thread overview]
Message-ID: <50145570.77817.1588260209983.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20200430141120.GA8135@suse.de>

----- On Apr 30, 2020, at 10:11 AM, Joerg Roedel jroedel@suse.de wrote:

> Hi,
> 
> On Wed, Apr 29, 2020 at 10:07:31AM -0400, Steven Rostedt wrote:
>> Talking with Mathieu about this on IRC, he pointed out that my code does
>> have a vzalloc() that is called:
>> 
>> in trace_pid_write()
>> 
>> 	pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
>> 
>> This is done when -P1,2 is on the trace-cmd command line.
> 
> Okay, tracked it down, some instrumentation in the page-fault and
> double-fault handler gave me the stack-traces. Here is what happens:
> 
> As already pointed out, it all happens because of page-faults on the
> vzalloc'ed pid bitmap. It starts with this stack-trace:
> 
> RIP: 0010:trace_event_ignore_this_pid+0x23/0x30
> Code: e9 c2 4b 6b 00 cc cc 48 8b 57 28 48 8b 8a b8 00 00 00 48 8b 82 c0 00 00 00
> 48 85 c0 74 11 48 8b 42 28 65 48 03 05 5d 9c e6 7e <0f> b6 40 7c c3 48 85 c9 75
> ea f3 c3 90 48 8b 4f 70 48 83 02 01 48
> RSP: 0018:ffffc90000673bd8 EFLAGS: 00010082
> RAX: ffffe8ffffd8c870 RBX: 0000000000000203 RCX: ffff88810734ca90
> RDX: ffff888451578000 RSI: ffffffff820f3d2a RDI: ffff888453594d68
> RBP: ffff888453594d68 R08: 000000000001845e R09: ffffffff81114ba0
> R10: 0000000000000000 R11: 000000000000000e R12: 4000000000000000
> R13: ffffffff820f3d2a R14: 000000000001845e R15: 4000000000000002
>  ? trace_event_raw_event_rcu_fqs+0xa0/0xa0
>  trace_event_raw_event_rcu_dyntick+0x89/0xa0
>  ? trace_event_raw_event_rcu_dyntick+0x89/0xa0
>  ? trace_event_raw_event_rcu_dyntick+0x89/0xa0
>  ? insn_get_prefixes.part.2+0x174/0x2d0
>  rcu_irq_enter+0xf0/0x1d0
>  rcu_irq_enter_irqson+0x21/0x50
>  switch_mm_irqs_off+0x43c/0x570
>  ? do_one_initcall+0x51/0x210
>  __text_poke+0x1a9/0x470
>  text_poke_bp_batch+0x73/0x180
>  text_poke_flush+0x43/0x50
>  arch_jump_label_transform_apply+0x16/0x30
>  __static_key_slow_dec_cpuslocked+0x42/0x50
>  static_key_slow_dec+0x1f/0x50
>  tracepoint_probe_unregister+0x1e2/0x220
>  trace_event_reg+0x6a/0x80
>  __ftrace_event_enable_disable+0x1ca/0x240
>  __ftrace_set_clr_event_nolock+0xe1/0x140
>  __ftrace_set_clr_event+0x3d/0x60
>  system_enable_write+0x76/0xa0
>  vfs_write+0xad/0x1a0
>  ? rcu_irq_exit+0xb8/0x170
>  ksys_write+0x48/0xb0
>  do_syscall_64+0x60/0x200
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> RIP: 0033:0x7f875b0ecdb0
> Code: Bad RIP value.
> RSP: 002b:00007ffc746be918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f875b0ecdb0
> RDX: 0000000000000001 RSI: 00007ffc746be93f RDI: 0000000000000004
> RBP: 00007ffc746be9a0 R08: 636172742f6c6500 R09: 0000000002242ee0
> R10: 6f662f7365636e61 R11: 0000000000000246 R12: 000000000040a4b0
> R13: 00007ffc746bebd0 R14: 0000000000000000 R15: 0000000000000000
> 
> I havn't figured out how rcu_irq_enter() calls down into
> trace_event_raw_event_rcu_dyntick() and further into
> trace_event_ignore_this_pid(), but the stacktrace shows it does.
> 
> So trace_event_ignore_this_pid() faults on the vzalloc()'ed memory, calling
> into the page-fault handler. What happens there is:
> 
> RIP: 0010:trace_event_ignore_this_pid+0x23/0x30
> Code: e9 c2 4b 6b 00 cc cc 48 8b 57 28 48 8b 8a b8 00 00 00 48 8b 82 c0 00 00 00
> 48 85 c0 74 11 48 8b 42 28 65 48 03 05 5d 9c e6 7e <0f> b6 40 7c c3 48 85 c9 75
> ea f3 c3 90 48 8b 4f 70 48 83 02 01 48
> RSP: 0018:ffffc90000673a50 EFLAGS: 00010082
> RAX: ffffe8ffffd8c870 RBX: 0000000000000203 RCX: ffff88810734ca90
> RDX: ffff888451578000 RSI: ffffe8ffffd8c8ec RDI: ffff88844fd98478
> RBP: ffff88844fd98478 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffe8ffffd8c8ec
> R13: 0000000000000000 R14: ffffc90000673b28 R15: 0000000000000000
>  trace_event_raw_event_x86_exceptions+0x87/0xa0
>  ? trace_event_buffer_lock_reserve+0x6e/0x110
>  do_page_fault+0x45e/0x630
>  ? trace_hardirqs_off_thunk+0x1a/0x37
>  page_fault+0x43/0x50
> 
> The page-fault handler calls a tracing function which again ends up in
> trace_event_ignore_this_pid(), where it faults again. From here on the CPU is in
> a page-fault loop, which continues until the stack overflows (with
> CONFIG_VMAP_STACK).
> 
> Then there is no mapped stack anymore, so the page-fault results in a
> double-fault, which uses an IST stack. The double-fault handler does
> ist_enter(), which calls into rcu_nmi_enter(), which also has trace-events down
> its call-path. I have no stack-trace for this, but what likely happens
> now is that it page-faults again while on the IST stack and the
> page-fault loops until the #DF IST stack overflows. Then the next #DF
> happens and the stack pointer is reset to the top of the #DF IST stack,
> starting the loop over again. This loops forever, causing the hang.

I suspect it's called by:

include/linux/trace_event.h:trace_trigger_soft_disabled():

static inline bool
trace_trigger_soft_disabled(struct trace_event_file *file)
{
        unsigned long eflags = file->flags;

        if (!(eflags & EVENT_FILE_FL_TRIGGER_COND)) {
                if (eflags & EVENT_FILE_FL_TRIGGER_MODE)
                        event_triggers_call(file, NULL, NULL);
                if (eflags & EVENT_FILE_FL_SOFT_DISABLED)
                        return true;
                if (eflags & EVENT_FILE_FL_PID_FILTER)
                        return trace_event_ignore_this_pid(file);
        }
        return false;
}

Thanks,

Mathieu

> 
> Regards,
> 
> 	Joerg

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

  parent reply	other threads:[~2020-04-30 15:23 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-29  9:48 [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke() Steven Rostedt
2020-04-29 10:59 ` Joerg Roedel
2020-04-29 12:28   ` Steven Rostedt
2020-04-29 14:07     ` Steven Rostedt
2020-04-29 14:10       ` Joerg Roedel
2020-04-29 14:32         ` Steven Rostedt
2020-04-29 15:44           ` Peter Zijlstra
2020-04-29 16:17       ` Joerg Roedel
2020-04-29 16:20         ` Joerg Roedel
2020-04-29 16:52           ` Steven Rostedt
2020-04-29 17:29             ` Mathieu Desnoyers
2020-04-29 18:51               ` Peter Zijlstra
2020-04-30 14:11       ` Joerg Roedel
2020-04-30 14:50         ` Joerg Roedel
2020-04-30 15:20           ` Mathieu Desnoyers
2020-04-30 16:16             ` Steven Rostedt
2020-04-30 16:18               ` Mathieu Desnoyers
2020-04-30 16:30                 ` Steven Rostedt
2020-04-30 16:35                   ` Mathieu Desnoyers
2020-04-30 15:23         ` Mathieu Desnoyers [this message]
2020-04-30 16:12           ` Steven Rostedt
2020-04-30 16:11         ` Steven Rostedt
2020-04-30 16:16           ` Mathieu Desnoyers
2020-04-30 16:25             ` Steven Rostedt
2020-04-30 19:14           ` Joerg Roedel
2020-05-01  1:13             ` Steven Rostedt
2020-05-01  2:26               ` Mathieu Desnoyers
2020-05-01  2:39                 ` Steven Rostedt
2020-05-01 10:16                   ` Joerg Roedel
2020-05-01 13:35                   ` Mathieu Desnoyers
2020-05-04 15:12                   ` [PATCH] percpu: Sync vmalloc mappings in pcpu_alloc() and free_percpu() Joerg Roedel
2020-05-04 15:28                     ` Mathieu Desnoyers
2020-05-04 15:31                       ` Joerg Roedel
2020-05-04 15:38                         ` Mathieu Desnoyers
2020-05-04 15:51                           ` Joerg Roedel
2020-05-04 17:04                           ` Steven Rostedt
2020-05-04 17:40                     ` Steven Rostedt
2020-05-04 18:38                       ` Joerg Roedel
2020-05-04 19:10                         ` Steven Rostedt
2020-05-05 12:31                           ` [PATCH] tracing: Call vmalloc_sync_mappings() after alloc_percpu() Joerg Roedel
2020-05-06 15:17                             ` Steven Rostedt
2020-05-08 14:42                               ` Joerg Roedel
2020-05-04 20:25                     ` [PATCH] percpu: Sync vmalloc mappings in pcpu_alloc() and free_percpu() Peter Zijlstra
2020-05-04 20:43                       ` Steven Rostedt
2020-05-01  4:20                 ` [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke() Steven Rostedt
2020-05-01 13:22                   ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50145570.77817.1588260209983.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=jroedel@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rostedt@goodmis.org \
    --cc=shile.zhang@linux.alibaba.com \
    --cc=tglx@linutronix.de \
    --cc=tz.stoyanov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.