All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qian Cai <cai@lca.pw>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: splice() rcu_sched self-detected stall on CPU
Date: Mon, 20 Jul 2020 16:12:04 -0400	[thread overview]
Message-ID: <20200720201204.GA1280@lca.pw> (raw)
In-Reply-To: <8678900D-8D52-4195-A767-9E6923EE0AAF@lca.pw>

On Tue, May 05, 2020 at 03:00:34PM -0400, Qian Cai wrote:
> 
> 
> > On May 5, 2020, at 2:57 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > 
> > On Mon, May 04, 2020 at 03:11:09PM -0400, Qian Cai wrote:
> >> Running a syscall fuzzer inside a container on linux-next floods
> >> systems with soft lockups. It looks like stuck in this line at
> >> iov_iter_copy_from_user_atomic(), Thoughts?
> >> 
> >> iterate_all_kinds(i, bytes, v,
> >>                copyin((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
> >>                memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page,
> >>                                 v.bv_offset, v.bv_len),
> >>                memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
> >>        )
> > 
> > If the size being copied is large enough, something like this might happen.
> > 
> > Is this a CONFIG_PREEMPT=n kernel?  And is the size passed in to
> 
> Yes, CONFIG_PREEMPT=n.
> 
> > iov_iter_copy_from_user_atomic() quite large, given that this is generated
> > by a fuzzer?  If so, one thing to try is to add cond_resched() in the
> > iterate_bvec(), iterate_kvec(), and iterate_iovec() macros.
> 
> I’ll try that.

Finally, had a chance to try it, but in
iov_iter_copy_from_user_atomic(), it calls kmap_atomic() first which
calls preempt_disable(). Thus, in_atomic() == 1. Later, those
cond_resched() would trigger the "sleeping function called from invalid
context" warnings.

Just to refresh the memory, it is quite easy to still reproduce this
today.

# git clone https://gitlab.com/cailca/linux-mm
# cd linux-mm; make
# ./random -x 0-100 -f

[10605.410827][   C34] watchdog: BUG: soft lockup - CPU#34 stuck for 23s! [trinity-c47:134866]
[10605.419192][   C34] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio loop processor ip_tables x_tables sd_mod ahci libahci mlx5_core firmware_class libata dm_mirror dm_region_hash dm_log dm_mod efivarfs
[10605.438424][   C34] irq event stamp: 439794
[10605.442615][   C34] hardirqs last  enabled at (439793): [<ffffa000101d38c4>] el1_irq+0xc4/0x140
[10605.451313][   C34] hardirqs last disabled at (439794): [<ffffa000101d387c>] el1_irq+0x7c/0x140
[10605.460013][   C34] softirqs last  enabled at (439790): [<ffffa000101d1b50>] efi_header_end+0xb50/0x14d4
[10605.469494][   C34] softirqs last disabled at (439783): [<ffffa000102bd898>] irq_exit+0x440/0x510
[10605.478368][   C34] CPU: 34 PID: 134866 Comm: trinity-c47 Not tainted 5.8.0-rc5-next-20200717+ #2
[10605.487236][   C34] Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
[10605.497669][   C34] pstate: 40400009 (nZcv daif +PAN -UAO BTYPE=--)
[10605.503938][   C34] pc : iov_iter_copy_from_user_atomic+0x640/0xa68
[10605.510208][   C34] lr : generic_perform_write+0x24c/0x370
[10605.515691][   C34] sp : ffff008921b57580
[10605.519697][   C34] x29: ffff008921b57580 x28: 0000000000000000
[10605.525712][   C34] x27: dfffa00000000000 x26: ffff00895cbe6500
[10605.531726][   C34] x25: 0000000000000000 x24: 0000000000000003
[10605.537740][   C34] x23: ffff008921b57b90 x22: 0000000000000000
[10605.543754][   C34] x21: 0000000000000000 x20: 0000000000000004
[10605.549769][   C34] x19: ffff000c5e817d39 x18: 0000000000000000
[10605.555782][   C34] x17: 0000000000000000 x16: 0000000000000000
[10605.561796][   C34] x15: 0000000000000000 x14: 1fffe0112436ae4a
[10605.567809][   C34] x13: ffff8012adcac617 x12: 1fffe012adcac616
[10605.573823][   C34] x11: 1fffe012adcac616 x10: ffff8012adcac616
[10605.579838][   C34] x9 : dfffa00000000000 x8 : ffff00956e5630b7
[10605.585852][   C34] x7 : ffff80112436af75 x6 : ffff8012adcac617
[10605.591865][   C34] x5 : 0000000000000000 x4 : 0000000000000004
[10605.597886][   C34] x3 : ffff00895cbe6500 x2 : 1fffe0112b97cca1
[10605.603926][   C34] x1 : 0000000000000007 x0 : ffff00895cbe650c
[10605.609941][   C34] Call trace:
[10605.613083][   C34]  iov_iter_copy_from_user_atomic+0x640/0xa68
[10605.619003][   C34]  generic_perform_write+0x24c/0x370
[10605.624140][   C34]  __generic_file_write_iter+0x280/0x428
[10605.629624][   C34]  generic_file_write_iter+0x2a4/0x8e0
[10605.634936][   C34]  do_iter_readv_writev+0x348/0x590
[10605.639986][   C34]  do_iter_write+0x124/0x468
[10605.644428][   C34]  vfs_iter_write+0x54/0x98
[10605.648786][   C34]  iter_file_splice_write+0x4a0/0x860
[10605.654009][   C34]  do_splice_from+0x60/0x120
[10605.658452][   C34]  do_splice+0x7ac/0x1310
[10605.662633][   C34]  __arm64_sys_splice+0x118/0x1e0
[10605.667510][   C34]  do_el0_svc+0x124/0x228
[10605.671693][   C34]  el0_sync_handler+0x260/0x410
[10605.676395][   C34]  el0_sync+0x140/0x180
[10648.198030][   C34] rcu: INFO: rcu_sched self-detected stall on CPU
[10648.204756][   C34] rcu: 	34-....: (6472 ticks this GP) idle=666/1/0x4000000000000002 softirq=36485/36485 fqs=1901
[10648.216008][   C34] 	(t=6501 jiffies g=587637 q=293554)
[10648.221240][   C34] Task dump for CPU 34:
[10648.225248][   C34] trinity-c47     R  running task    25856 134866 106496 0x00000007
[10648.233097][   C34] Call trace:
[10648.236245][   C34]  dump_backtrace+0x0/0x398
[10648.240603][   C34]  show_stack+0x14/0x20
[10648.244615][   C34]  sched_show_task+0x498/0x608
[10648.249232][   C34]  dump_cpu_task+0xcc/0xdc
[10648.253503][   C34]  rcu_dump_cpu_stacks+0x248/0x294
[10648.258467][   C34]  rcu_sched_clock_irq+0x12f4/0x1998
[10648.263606][   C34]  update_process_times+0x2c/0x90
[10648.268484][   C34]  tick_sched_handle+0x68/0x120
[10648.273187][   C34]  tick_sched_timer+0x60/0x128
[10648.277802][   C34]  __hrtimer_run_queues+0x7b4/0x1388
[10648.282940][   C34]  hrtimer_interrupt+0x284/0x640
[10648.287732][   C34]  arch_timer_handler_phys+0x48/0x68
[10648.292871][   C34]  handle_percpu_devid_irq+0x248/0xde0
[10648.298194][   C34]  generic_handle_irq+0x74/0xa0
[10648.302907][   C34]  __handle_domain_irq+0xa0/0x160
[10648.307784][   C34]  gic_handle_irq+0xc0/0x1a0
[10648.312227][   C34]  el1_irq+0xbc/0x140
[10648.316064][   C34]  iov_iter_copy_from_user_atomic+0x668/0xa68
[10648.321984][   C34]  generic_perform_write+0x24c/0x370
[10648.327121][   C34]  __generic_file_write_iter+0x280/0x428
[10648.332605][   C34]  generic_file_write_iter+0x2a4/0x8e0
[10648.337916][   C34]  do_iter_readv_writev+0x348/0x590
[10648.342966][   C34]  do_iter_write+0x124/0x468
[10648.347407][   C34]  vfs_iter_write+0x54/0x98
[10648.351764][   C34]  iter_file_splice_write+0x4a0/0x860
[10648.356987][   C34]  do_splice_from+0x60/0x120
[10648.361430][   C34]  do_splice+0x7ac/0x1310
[10648.365611][   C34]  __arm64_sys_splice+0x118/0x1e0
[10648.370489][   C34]  do_el0_svc+0x124/0x228
[10648.374672][   C34]  el0_sync_handler+0x260/0x410
[10648.379374][   C34]  el0_sync+0x140/0x180

[10718.696867][ T1302] INFO: task trinity-c44:135125 blocked for more than 123 seconds.
[10718.704644][ T1302]       Tainted: G             L    5.8.0-rc5-next-20200717+ #2
[10718.712163][ T1302] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10718.720728][ T1302] trinity-c44     D25760 135125      1 0x00000001
[10718.727040][ T1302] Call trace:
[10718.730222][ T1302]  __switch_to+0x1a0/0x258
[10718.734496][ T1302]  __schedule+0x770/0x18d0
[10718.738804][ T1302]  schedule+0x1e4/0x3e8
[10718.742824][ T1302]  rwsem_down_write_slowpath+0x80c/0xe28
[10718.748359][ T1302]  down_write+0x208/0x230
[10718.752547][ T1302]  generic_file_write_iter+0xd0/0x8e0
[10718.757774][ T1302]  do_iter_readv_writev+0x348/0x590
[10718.762864][ T1302]  do_iter_write+0x124/0x468
[10718.767311][ T1302]  vfs_writev+0x15c/0x250
[10718.771543][ T1302]  do_pwritev+0x128/0x160
[10718.775732][ T1302]  __arm64_sys_pwritev+0x88/0xc8
[10718.780561][ T1302]  do_el0_svc+0x124/0x228
[10718.784749][ T1302]  el0_sync_handler+0x260/0x410
[10718.789491][ T1302]  el0_sync+0x140/0x180

  reply	other threads:[~2020-07-20 20:12 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-04 19:11 splice() rcu_sched self-detected stall on CPU Qian Cai
2020-05-05 18:57 ` Paul E. McKenney
2020-05-05 19:00   ` Qian Cai
2020-07-20 20:12     ` Qian Cai [this message]
2020-07-20 22:10       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200720201204.GA1280@lca.pw \
    --to=cai@lca.pw \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.