All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jiri Olsa <olsajiri@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <olsajiri@gmail.com>, Yonghong Song <yhs@meta.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Song Liu <song@kernel.org>, Hao Sun <sunhao.th@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Yonghong Song <yhs@fb.com>, KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>,
	Thorsten Leemhuis <regressions@leemhuis.info>
Subject: Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp
Date: Sat, 10 Dec 2022 00:29:09 +0100	[thread overview]
Message-ID: <Y5PExSx9idXP2y/e@krava> (raw)
In-Reply-To: <Y5O/yxcjQLq5oDAv@krava>

On Sat, Dec 10, 2022 at 12:07:58AM +0100, Jiri Olsa wrote:

SNIP

> > > > > > 
> > > > > > looking at the code.. how do we ensure that code running through
> > > > > > bpf_prog_run_xdp will not get dispatcher image changed while
> > > > > > it's being exetuted
> > > > > > 
> > > > > > we use 'the other half' of the image when we add/remove programs,
> > > > > > but could bpf_dispatcher_update race with bpf_prog_run_xdp like:
> > > > > > 
> > > > > > 
> > > > > > cpu 0:                                  cpu 1:
> > > > > > 
> > > > > > bpf_prog_run_xdp
> > > > > >      ...
> > > > > >      bpf_dispatcher_xdp_func
> > > > > >         start exec image at offset 0x0
> > > > > > 
> > > > > >                                           bpf_dispatcher_update
> > > > > >                                                   update image at offset 0x800
> > > > > >                                           bpf_dispatcher_update
> > > > > >                                                   update image at offset 0x0
> > > > > > 
> > > > > >         still in image at offset 0x0
> > > > > > 
> > > > > > 
> > > > > > that might explain why I wasn't able to trigger that on
> > > > > > bare metal just in qemu
> > > > > 
> > > > > I tried patch below and it fixes the issue for me and seems
> > > > > to confirm the race above.. but not sure it's the best fix
> > > > > 
> > > > > jirka
> > > > > 
> > > > > 
> > > > > ---
> > > > > diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c
> > > > > index c19719f48ce0..6a2ced102fc7 100644
> > > > > --- a/kernel/bpf/dispatcher.c
> > > > > +++ b/kernel/bpf/dispatcher.c
> > > > > @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
> > > > >    	}
> > > > >    	__BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func);
> > > > > +	synchronize_rcu_tasks();
> > > > >    	if (new)
> > > > >    		d->image_off = noff;
> > > > 
> > > > This might work. In arch/x86/kernel/alternative.c, we have following
> > > > code and comments. For text_poke, synchronize_rcu_tasks() might be able
> > > > to avoid concurrent execution and update.
> > > 
> > > so my idea was that we need to ensure all the current callers of
> > > bpf_dispatcher_xdp_func (which should have rcu read lock, based
> > > on the comment in bpf_prog_run_xdp) are gone before and new ones
> > > execute the new image, so the next call to the bpf_dispatcher_update
> > > will be safe to overwrite the other half of the image
> > 
> > If v6.1-rc1 was indeed okay, then it looks like this may be related to
> > the trampoline patching for the static_call? Did it repro on v6.1-rc1
> > just with dbe69b299884 ("bpf: Fix dispatcher patchable function entry
> > to 5 bytes nop") cherry-picked?
> 
> I'll try that.. it looks to me like the problem was always there,
> maybe harder to trigger.. also to reproduce it you need to call
> bpf_dispatcher_update heavily, which is not probably the common
> use case
> 
> one other thing is that I think the fix might need rcu locking
> on the bpf_dispatcher_xdp_func side, because local_bh_disable
> seems not to be enough to make synchronize_rcu_tasks work
> 
> I'm now testing patch below
> 
> jirka
> 
> 
> ---
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index efc42a6e3aed..a27245b96d6b 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -772,7 +772,13 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
>  	 * under local_bh_disable(), which provides the needed RCU protection
>  	 * for accessing map entries.
>  	 */
> -	u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
> +	u32 act;
> +
> +	rcu_read_lock();
> +
> +	act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
> +
> +	rcu_read_unlock();
>  
>  	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
>  		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))
> diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c
> index c19719f48ce0..6a2ced102fc7 100644
> --- a/kernel/bpf/dispatcher.c
> +++ b/kernel/bpf/dispatcher.c
> @@ -124,6 +124,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
>  	}
>  
>  	__BPF_DISPATCHER_UPDATE(d, new ?: (void *)&bpf_dispatcher_nop_func);
> +	synchronize_rcu_tasks();
>  
>  	if (new)
>  		d->image_off = noff;

hm, so I'm eventually getting splats like below

I guess I'm missing some rcu/xdp detail, thoughts? ;-)

jirka


---
[ 1107.911088][   T41] INFO: task rcu_tasks_kthre:12 blocked for more than 122 seconds.
[ 1107.913332][   T41]       Not tainted 6.1.0-rc7+ #847
[ 1107.914801][   T41] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1107.916691][   T41] task:rcu_tasks_kthre state:D stack:14392 pid:12    ppid:2      flags:0x00004000
[ 1107.917324][   T41] Call Trace:
[ 1107.917563][   T41]  <TASK>
[ 1107.917784][   T41]  __schedule+0x419/0xe30
[ 1107.918764][   T41]  schedule+0x5d/0xe0
[ 1107.919061][   T41]  schedule_timeout+0x102/0x140
[ 1107.919386][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.919747][   T41]  ? lock_release+0x264/0x4f0
[ 1107.920079][   T41]  ? lock_acquired+0x207/0x470
[ 1107.920397][   T41]  ? trace_hardirqs_on+0x2b/0xd0
[ 1107.920723][   T41]  __wait_for_common+0xb6/0x210
[ 1107.921067][   T41]  ? usleep_range_state+0xb0/0xb0
[ 1107.921401][   T41]  __synchronize_srcu+0x151/0x1e0
[ 1107.921731][   T41]  ? rcu_tasks_pregp_step+0x10/0x10
[ 1107.922112][   T41]  ? ktime_get_mono_fast_ns+0x3a/0x90
[ 1107.922463][   T41]  ? synchronize_srcu+0xa1/0xe0
[ 1107.922784][   T41]  rcu_tasks_wait_gp+0x183/0x3b0
[ 1107.923129][   T41]  ? lock_release+0x264/0x4f0
[ 1107.923442][   T41]  rcu_tasks_one_gp+0x35a/0x3e0
[ 1107.923766][   T41]  ? rcu_tasks_postscan+0x20/0x20
[ 1107.924114][   T41]  rcu_tasks_kthread+0x31/0x40
[ 1107.924434][   T41]  kthread+0xf2/0x120
[ 1107.924713][   T41]  ? kthread_complete_and_exit+0x20/0x20
[ 1107.925095][   T41]  ret_from_fork+0x1f/0x30
[ 1107.925404][   T41]  </TASK>
[ 1107.925664][   T41] INFO: task ex:7319 blocked for more than 122 seconds.
[ 1107.926121][   T41]       Not tainted 6.1.0-rc7+ #847
[ 1107.926461][   T41] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1107.927090][   T41] task:ex              state:D stack:13648 pid:7319  ppid:677    flags:0x00004006
[ 1107.927791][   T41] Call Trace:
[ 1107.928079][   T41]  <TASK>
[ 1107.928334][   T41]  __schedule+0x419/0xe30
[ 1107.928683][   T41]  schedule+0x5d/0xe0
[ 1107.929019][   T41]  schedule_preempt_disabled+0x14/0x30
[ 1107.929440][   T41]  __mutex_lock+0x3fd/0x850
[ 1107.929799][   T41]  ? bpf_dispatcher_change_prog+0x3a/0x380
[ 1107.930235][   T41]  ? bpf_dispatcher_change_prog+0x3a/0x380
[ 1107.930609][   T41]  bpf_dispatcher_change_prog+0x3a/0x380
[ 1107.930977][   T41]  bpf_prog_test_run_xdp+0x39b/0x600
[ 1107.931340][   T41]  __sys_bpf+0x963/0x2bb0
[ 1107.931684][   T41]  ? futex_wait+0x175/0x250
[ 1107.932014][   T41]  ? lock_acquire+0x2ed/0x370
[ 1107.932328][   T41]  ? lock_release+0x264/0x4f0
[ 1107.932640][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.933028][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.933388][   T41]  ? lock_release+0x264/0x4f0
[ 1107.933700][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.934070][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.934432][   T41]  __x64_sys_bpf+0x1a/0x30
[ 1107.934733][   T41]  do_syscall_64+0x37/0x90
[ 1107.935050][   T41]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 1107.935428][   T41] RIP: 0033:0x7f02f9f0af3d
[ 1107.935731][   T41] RSP: 002b:00007f02fa0e9df8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[ 1107.936291][   T41] RAX: ffffffffffffffda RBX: 00007f02fa0ea640 RCX: 00007f02f9f0af3d
[ 1107.936811][   T41] RDX: 0000000000000048 RSI: 0000000020000140 RDI: 000000000000000a
[ 1107.937360][   T41] RBP: 00007f02fa0e9e20 R08: 0000000000000000 R09: 0000000000000000
[ 1107.937884][   T41] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffff80
[ 1107.938425][   T41] R13: 0000000000000011 R14: 00007ffda75fd290 R15: 00007f02fa0ca000
[ 1107.939050][   T41]  </TASK>
[ 1107.939315][   T41] INFO: task ex:7352 blocked for more than 122 seconds.
[ 1107.939744][   T41]       Not tainted 6.1.0-rc7+ #847
[ 1107.940095][   T41] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1107.940651][   T41] task:ex              state:D stack:13648 pid:7352  ppid:766    flags:0x00004006
[ 1107.941254][   T41] Call Trace:
[ 1107.941492][   T41]  <TASK>
[ 1107.941710][   T41]  __schedule+0x419/0xe30
[ 1107.942018][   T41]  ? lock_acquired+0x207/0x470
[ 1107.942339][   T41]  schedule+0x5d/0xe0
[ 1107.942616][   T41]  schedule_timeout+0x102/0x140
[ 1107.942955][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.943330][   T41]  ? lock_release+0x264/0x4f0
[ 1107.943643][   T41]  ? lock_acquired+0x207/0x470
[ 1107.943965][   T41]  ? trace_hardirqs_on+0x2b/0xd0
[ 1107.944318][   T41]  __wait_for_common+0xb6/0x210
[ 1107.944641][   T41]  ? usleep_range_state+0xb0/0xb0
[ 1107.950003][   T41]  __wait_rcu_gp+0x14d/0x170
[ 1107.950399][   T41]  ? 0xffffffffa0013840
[ 1107.950726][   T41]  synchronize_rcu_tasks_generic.part.0.isra.0+0x31/0x50
[ 1107.951207][   T41]  ? call_rcu_tasks_generic+0x350/0x350
[ 1107.951643][   T41]  ? rcu_tasks_pregp_step+0x10/0x10
[ 1107.952070][   T41]  bpf_dispatcher_change_prog+0x204/0x380
[ 1107.952521][   T41]  bpf_prog_test_run_xdp+0x39b/0x600
[ 1107.952941][   T41]  __sys_bpf+0x963/0x2bb0
[ 1107.953302][   T41]  ? futex_wait+0x175/0x250
[ 1107.953669][   T41]  ? lock_acquire+0x2ed/0x370
[ 1107.954058][   T41]  ? lock_release+0x264/0x4f0
[ 1107.954435][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.954868][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.955329][   T41]  ? lock_release+0x264/0x4f0
[ 1107.955705][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.956148][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.956582][   T41]  __x64_sys_bpf+0x1a/0x30
[ 1107.956937][   T41]  do_syscall_64+0x37/0x90
[ 1107.957312][   T41]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 1107.957771][   T41] RIP: 0033:0x7ffaa610af3d
[ 1107.958140][   T41] RSP: 002b:00007ffaa629adf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[ 1107.958792][   T41] RAX: ffffffffffffffda RBX: 00007ffaa629b640 RCX: 00007ffaa610af3d
[ 1107.959427][   T41] RDX: 0000000000000048 RSI: 0000000020000140 RDI: 000000000000000a
[ 1107.960054][   T41] RBP: 00007ffaa629ae20 R08: 0000000000000000 R09: 0000000000000000
[ 1107.960680][   T41] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffff80
[ 1107.961314][   T41] R13: 0000000000000011 R14: 00007ffef5c89e00 R15: 00007ffaa627b000
[ 1107.961948][   T41]  </TASK>
[ 1107.962226][   T41] INFO: task ex:7354 blocked for more than 122 seconds.
[ 1107.962756][   T41]       Not tainted 6.1.0-rc7+ #847
[ 1107.963178][   T41] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1107.963786][   T41] task:ex              state:D stack:13648 pid:7354  ppid:767    flags:0x00004006
[ 1107.964451][   T41] Call Trace:
[ 1107.964733][   T41]  <TASK>
[ 1107.965001][   T41]  __schedule+0x419/0xe30
[ 1107.965354][   T41]  schedule+0x5d/0xe0
[ 1107.965682][   T41]  schedule_preempt_disabled+0x14/0x30
[ 1107.966130][   T41]  __mutex_lock+0x3fd/0x850
[ 1107.966493][   T41]  ? lock_acquire+0x2ed/0x370
[ 1107.966870][   T41]  ? bpf_dispatcher_change_prog+0x3a/0x380
[ 1107.967340][   T41]  ? bpf_dispatcher_change_prog+0x3a/0x380
[ 1107.967792][   T41]  bpf_dispatcher_change_prog+0x3a/0x380
[ 1107.968236][   T41]  bpf_prog_test_run_xdp+0x2c8/0x600
[ 1107.968654][   T41]  __sys_bpf+0x963/0x2bb0
[ 1107.969012][   T41]  ? futex_wait+0x175/0x250
[ 1107.969380][   T41]  ? lock_acquire+0x2ed/0x370
[ 1107.969754][   T41]  ? lock_release+0x264/0x4f0
[ 1107.970135][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.970565][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.971008][   T41]  ? lock_release+0x264/0x4f0
[ 1107.971385][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.971813][   T41]  ? rcu_read_lock_sched_held+0x10/0x90
[ 1107.972257][   T41]  __x64_sys_bpf+0x1a/0x30
[ 1107.972614][   T41]  do_syscall_64+0x37/0x90
[ 1107.972984][   T41]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 1107.973385][   T41] RIP: 0033:0x7ffaa610af3d
[ 1107.973696][   T41] RSP: 002b:00007ffaa629adf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[ 1107.974261][   T41] RAX: ffffffffffffffda RBX: 00007ffaa629b640 RCX: 00007ffaa610af3d
[ 1107.974795][   T41] RDX: 0000000000000048 RSI: 0000000020000140 RDI: 000000000000000a
[ 1107.975348][   T41] RBP: 00007ffaa629ae20 R08: 0000000000000000 R09: 0000000000000000
[ 1107.975942][   T41] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffff80
[ 1107.976570][   T41] R13: 0000000000000011 R14: 00007ffef5c89e00 R15: 00007ffaa627b000
[ 1107.977216][   T41]  </TASK>

  reply	other threads:[~2022-12-09 23:29 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-06  3:28 BUG: unable to handle kernel paging request in bpf_dispatcher_xdp Hao Sun
2022-12-06  6:46 ` Hao Sun
2022-12-06 15:18   ` Jiri Olsa
2022-12-07 19:57     ` Alexei Starovoitov
2022-12-08 17:48       ` Alexei Starovoitov
2022-12-08 18:06         ` Jiri Olsa
2022-12-08 22:26           ` Jiri Olsa
2022-12-08 23:02             ` Jiri Olsa
2022-12-09  7:09               ` Jiri Olsa
2022-12-09 11:22                 ` Jiri Olsa
2022-12-09 13:50                   ` Jiri Olsa
2022-12-09 15:20                     ` Jiri Olsa
2022-12-09 20:31                       ` Yonghong Song
2022-12-09 21:53                         ` Jiri Olsa
2022-12-09 22:41                           ` Daniel Borkmann
2022-12-09 23:07                             ` Jiri Olsa
2022-12-09 23:29                               ` Jiri Olsa [this message]
2022-12-09 23:32                               ` Daniel Borkmann
2022-12-09 23:34                                 ` Jakub Kicinski
2022-12-10  0:06                                   ` Jiri Olsa
2022-12-10  0:38                                     ` Paul E. McKenney
2022-12-10 13:05                                       ` Jiri Olsa
2022-12-10  1:12                                     ` Alexei Starovoitov
2022-12-10 13:11                                       ` Jiri Olsa
2022-12-12 15:04                                         ` Jiri Olsa
2022-12-13  2:26                                           ` Hao Sun
2022-12-08  8:44 ` BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot Thorsten Leemhuis
2022-12-19  9:59   ` Thorsten Leemhuis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y5PExSx9idXP2y/e@krava \
    --to=olsajiri@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=haoluo@google.com \
    --cc=hawk@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=regressions@leemhuis.info \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=sunhao.th@gmail.com \
    --cc=yhs@fb.com \
    --cc=yhs@meta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.