From mboxrd@z Thu Jan 1 00:00:00 1970 From: Peter Zijlstra Subject: Re: [PATCH bpf-next 1/2] bpf: enable stackmap with build_id in nmi context Date: Wed, 2 May 2018 11:21:09 +0200 Message-ID: <20180502092109.GI12180@hirez.programming.kicks-ass.net> References: <20180502000220.2585320-1-songliubraving@fb.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: netdev@vger.kernel.org, kernel-team@fb.com, Alexei Starovoitov , Daniel Borkmann To: Song Liu Return-path: Received: from bombadil.infradead.org ([198.137.202.133]:57032 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751134AbeEBJVV (ORCPT ); Wed, 2 May 2018 05:21:21 -0400 Content-Disposition: inline In-Reply-To: <20180502000220.2585320-1-songliubraving@fb.com> Sender: netdev-owner@vger.kernel.org List-ID: On Tue, May 01, 2018 at 05:02:19PM -0700, Song Liu wrote: > @@ -267,17 +285,27 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, > { > int i; > struct vm_area_struct *vma; > + bool in_nmi_ctx = in_nmi(); > + bool irq_work_busy = false; > + struct stack_map_irq_work *work; > + > + if (in_nmi_ctx) { > + work = this_cpu_ptr(&irq_work); > + if (work->sem) > + /* cannot queue more up_read, fallback */ > + irq_work_busy = true; > + } > > /* > + * We cannot do up_read() in nmi context. To do build_id lookup > + * in nmi context, we need to run up_read() in irq_work. We use > + * a percpu variable to do the irq_work. If the irq_work is > + * already used by another lookup, we fall back to report ips. > * > * Same fallback is used for kernel stack (!user) on a stackmap > * with build_id. > */ > + if (!user || !current || !current->mm || irq_work_busy || > down_read_trylock(¤t->mm->mmap_sem) == 0) { > /* cannot access current->mm, fall back to ips */ > for (i = 0; i < trace_nr; i++) { > @@ -299,7 +327,13 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, > - vma->vm_start; > id_offs[i].status = BPF_STACK_BUILD_ID_VALID; > } > + > + if (!in_nmi_ctx) > + up_read(¤t->mm->mmap_sem); > + else { > + work->sem = ¤t->mm->mmap_sem; > + irq_work_queue(&work->work); > + } > } This is disguisting.. :-) It's broken though, I've bet you've never actually ran this with lockdep enabled for example. Also, you set work->sem before you do trylock, if the trylock fails you return early and keep work->sem set, which will thereafter always result in irq_work_busy.