From mboxrd@z Thu Jan  1 00:00:00 1970
Date:   Mon, 18 Feb 2019 09:29:34 -0800
From:   Stanislav Fomichev <sdf@fomichev.me>
To:     Daniel Borkmann <daniel@iogearbox.net>
Cc:     Stanislav Fomichev <sdf@google.com>, netdev@vger.kernel.org,
        davem@davemloft.net, ast@kernel.org,
        syzbot <syzkaller@googlegroups.com>
Subject: Re: [PATCH bpf 1/2] bpf/test_run: fix unkillable BPF_PROG_TEST_RUN
Message-ID: <20190218172934.GD20651@mini-arch>
References: <20190212234239.174386-1-sdf@google.com>
 <74457479-d54c-69fa-958a-3cfb1ee9e5a2@iogearbox.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <74457479-d54c-69fa-958a-3cfb1ee9e5a2@iogearbox.net>
User-Agent: Mutt/1.11.3 (2019-02-01)
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org

On 02/16, Daniel Borkmann wrote:
> On 02/13/2019 12:42 AM, Stanislav Fomichev wrote:
> > Syzbot found out that running BPF_PROG_TEST_RUN with repeat=0xffffffff
> > makes process unkillable. The problem is that when CONFIG_PREEMPT is
> > enabled, we never see need_resched() return true. This is due to the
> > fact that preempt_enable() (which we do in bpf_test_run_one on each
> > iteration) now handles resched if it's needed.
> > 
> > Let's disable preemption for the whole run, not per test. In this case
> > we can properly see whether resched is needed.
> > Let's also properly return -EINTR to the userspace in case of a signal
> > interrupt.
> > 
> > See recent discussion:
> > http://lore.kernel.org/netdev/CAH3MdRWHr4N8jei8jxDppXjmw-Nw=puNDLbu1dQOFQHxfU2onA@mail.gmail.com
> > 
> > I'll follow up with the same fix bpf_prog_test_run_flow_dissector in
> > bpf-next.
> > 
> > Reported-by: syzbot <syzkaller@googlegroups.com>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  net/bpf/test_run.c | 45 ++++++++++++++++++++++++---------------------
> >  1 file changed, 24 insertions(+), 21 deletions(-)
> > 
> > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > index fa2644d276ef..e31e1b20f7f4 100644
> > --- a/net/bpf/test_run.c
> > +++ b/net/bpf/test_run.c
> > @@ -13,27 +13,13 @@
> >  #include <net/sock.h>
> >  #include <net/tcp.h>
> >  
> > -static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
> > -		struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
> > -{
> > -	u32 ret;
> > -
> > -	preempt_disable();
> > -	rcu_read_lock();
> > -	bpf_cgroup_storage_set(storage);
> > -	ret = BPF_PROG_RUN(prog, ctx);
> > -	rcu_read_unlock();
> > -	preempt_enable();
> > -
> > -	return ret;
> > -}
> > -
> > -static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *ret,
> > -			u32 *time)
> > +static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
> > +			u32 *retval, u32 *time)
> >  {
> >  	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { 0 };
> >  	enum bpf_cgroup_storage_type stype;
> >  	u64 time_start, time_spent = 0;
> > +	int ret = 0;
> >  	u32 i;
> >  
> >  	for_each_cgroup_storage_type(stype) {
> > @@ -48,25 +34,42 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *ret,
> >  
> >  	if (!repeat)
> >  		repeat = 1;
> > +
> > +	rcu_read_lock();
> > +	preempt_disable();
> >  	time_start = ktime_get_ns();
> >  	for (i = 0; i < repeat; i++) {
> > -		*ret = bpf_test_run_one(prog, ctx, storage);
> > +		bpf_cgroup_storage_set(storage);
> > +		*retval = BPF_PROG_RUN(prog, ctx);
> > +
> > +		if (signal_pending(current)) {
> > +			ret = -EINTR;
> > +			break;
> > +		}
> 
> Wouldn't it be enough to just move the signal_pending() test to
> the above as you did to actually fix the unkillable issue? For
> CONFIG_PREEMPT the below need_resched() is never triggered as you
> mention as preempt_enable() handles rescheduling internally in
> this situation, so moving it only out should suffice.
> 
> The rationale for disabling preemption for the whole run is imho
> a bit different, namely that you would not screw up the ktime
> measurements due to rescheduling happening in between otherwise.
That's exactly the reason why we need to preempt_disable() the whole
run: if we get preempted on each per-iteration preempt_enable(), the
time spent scheduled out ends up in our ktime estimation.
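
To illustrate the timing concern, here is a minimal userspace sketch (not
kernel code). It mimics the accounting in the patched loop: the clock is
stopped before yielding and restarted afterwards, so time spent scheduled
out is not charged to the program. The fake clock, the fixed costs and all
sim_* names are assumptions made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical fake clock standing in for ktime_get_ns(). */
struct sim_clock {
	uint64_t now_ns;
};

/*
 * Returns the average per-run time in ns, excluding time spent
 * rescheduled. Each run advances the clock by run_cost_ns; every
 * resched_every runs the task is away for resched_cost_ns
 * (resched_every == 0 means it is never rescheduled).
 */
static uint64_t sim_measure(struct sim_clock *clk, uint32_t repeat,
			    uint64_t run_cost_ns, uint64_t resched_cost_ns,
			    uint32_t resched_every)
{
	uint64_t time_start, time_spent = 0;
	uint32_t i;

	if (!repeat)
		repeat = 1;

	time_start = clk->now_ns;
	for (i = 0; i < repeat; i++) {
		clk->now_ns += run_cost_ns;	/* stands in for BPF_PROG_RUN() */

		if (resched_every && (i + 1) % resched_every == 0) {
			/* Stop the clock before yielding the CPU. */
			time_spent += clk->now_ns - time_start;
			clk->now_ns += resched_cost_ns;	/* time away */
			/* Restart the clock once we are back. */
			time_start = clk->now_ns;
		}
	}
	time_spent += clk->now_ns - time_start;

	return time_spent / repeat;
}
```

With 10 runs of 100 ns each and a 5000 ns reschedule after every third
run, the fake clock advances by 16000 ns in total, but the reported
average stays at 100 ns per run.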

> But then, once preemption is disabled for the whole run, is there
> a need to move out the extra signal_pending() test (presumably as
> need_resched() does not handle TIF_SIGPENDING but only TIF_NEED_RESCHED
> but we still wouldn't get into a unkillable situation here, no)?
I'm not sure. They look like two separate flags, so it feels safer to
handle them separately (and we have a precedent in do_check in verifier.c).
While we do set both of them when sending a signal, it looks like
need_resched is for the cases where we wake up a task with a higher
priority. So, in theory, we can have signal_pending without need_resched.
(Also, with a CONFIG_PREEMPT=y kernel, there is another complication with
preempt_count().)
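
The difference between the two loop shapes can be sketched in plain
userspace C (not kernel code). The flag bits, the task struct and all
sim_* and run_* names below are hypothetical stand-ins, and the flags are
assumed not to change while the loop runs. The sketch only shows that a
signal check nested under the resched check never fires when the signal
flag is set alone.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two thread flags; not the kernel's values. */
enum {
	SIM_SIGPENDING   = 1u << 0,
	SIM_NEED_RESCHED = 1u << 1,
};

struct sim_task {
	unsigned int flags;
};

static bool sim_signal_pending(const struct sim_task *t)
{
	return (t->flags & SIM_SIGPENDING) != 0;
}

static bool sim_need_resched(const struct sim_task *t)
{
	return (t->flags & SIM_NEED_RESCHED) != 0;
}

/* Old shape: the signal is only looked at when a resched is also needed. */
static uint32_t run_nested_check(const struct sim_task *t, uint32_t repeat)
{
	uint32_t i, runs = 0;

	for (i = 0; i < repeat; i++) {
		runs++;			/* stands in for BPF_PROG_RUN() */
		if (sim_need_resched(t) && sim_signal_pending(t))
			break;
	}
	return runs;
}

/* New shape: the signal is checked on every iteration, independently. */
static uint32_t run_separate_check(const struct sim_task *t, uint32_t repeat,
				   int *err)
{
	uint32_t i, runs = 0;

	*err = 0;
	for (i = 0; i < repeat; i++) {
		runs++;			/* stands in for BPF_PROG_RUN() */
		if (sim_signal_pending(t)) {
			*err = -EINTR;
			break;
		}
		if (sim_need_resched(t)) {
			/* the real loop would call cond_resched() here */
		}
	}
	return runs;
}
```

For a task with only SIM_SIGPENDING set and repeat=1000,
run_nested_check() goes through all 1000 runs, while
run_separate_check() stops after the first run and reports -EINTR.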

> 
> >  		if (need_resched()) {
> > -			if (signal_pending(current))
> > -				break;
> >  			time_spent += ktime_get_ns() - time_start;
> > +			preempt_enable();
> > +			rcu_read_unlock();
> > +
> >  			cond_resched();
> > +
> > +			rcu_read_lock();
> > +			preempt_disable();
> >  			time_start = ktime_get_ns();
> >  		}
> >  	}
> >  	time_spent += ktime_get_ns() - time_start;
> > +	preempt_enable();
> > +	rcu_read_unlock();
> > +
> >  	do_div(time_spent, repeat);
> >  	*time = time_spent > U32_MAX ? U32_MAX : (u32)time_spent;
> >  
> >  	for_each_cgroup_storage_type(stype)
> >  		bpf_cgroup_storage_free(storage[stype]);
> >  
> > -	return 0;
> > +	return ret;
> >  }
> >  
> >  static int bpf_test_finish(const union bpf_attr *kattr,
> > 
>