From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5660795d-87de-46f5-add4-7729a02225ef@oss.qualcomm.com>
Date: Tue, 12 May 2026 13:56:55 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH] sched: disable preemption around blk_flush_plug in sched_submit_work
To: Ming Lei, Jens Axboe, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Michael Wu
References: <20260423125528.2917171-1-tom.leiming@gmail.com>
From: Xiaosen
In-Reply-To: <20260423125528.2917171-1-tom.leiming@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
There is another deadlock, caused by preemption while calling
blk_flush_plug() in sched_submit_work():

blk_mq_dispatch_list
  percpu_ref_get(&this_hctx->queue->q_usage_counter)
    percpu_ref_get_many(ref, 1);
  rcu_read_lock()
    __rcu_read_lock()
      rcu_lock_acquire
        lock_acquire
          preempt_schedule_irq --> writeback worker got preempted here
                                   and was scheduled out in D state

1. Task kworker/u32:6 had dirty pages from an f2fs node inode submitted
   to the block layer, and the corresponding request was added to the
   plug list of the current task.

2. Task snpe-net-run acquired gc_lock and was waiting for the request
   containing the node inode's page to complete.

3. Task kworker/u32:6 needed to acquire gc_lock to perform foreground
   GC. Since gc_lock was already held by snpe-net-run, it called
   blk_flush_plug() in sched_submit_work() before sleeping, precisely to
   avoid deadlocks. However, kworker/u32:6 was preempted inside the RCU
   critical section before it could run the hardware queue to issue the
   plugged requests, so those requests remained pending on the local
   request list while kworker/u32:6 was scheduled out, waiting to be
   woken by the release of gc_lock.

4. The result is a deadlock, which shows up as an RCU stall.

I think task kworker/u32:6 should not be scheduled out before returning
from blk_flush_plug(), and this patch should be able to fix such
deadlocks.
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:   Tasks blocked on level-0 rcu_node (CPUs 0-7): P18782/1:b..l
rcu:   (detected by 0, t=5255 jiffies, g=4338701, q=1751 ncpus=8)
task:kworker/u32:6 state:D stack:0 pid:18782 tgid:18782 ppid:2 task_flags:0x24208060 flags:0x00000010
Workqueue: writeback wb_workfn (flush-254:55)
Call trace:
 __switch_to+0x214/0x3fc (T)
 __schedule+0xa70/0x1048
 preempt_schedule_irq+0x70/0xd4
 raw_irqentry_exit_cond_resched+0x2c/0x44
 irqentry_exit+0x38/0x64
 exit_to_kernel_mode+0x28/0x38
 el1_interrupt+0x5c/0xa8
 el1h_64_irq_handler+0x18/0x24
 el1h_64_irq+0x84/0x88
 lock_acquire+0x170/0x29c (P)
 rcu_lock_acquire+0x38/0x44
 blk_mq_dispatch_list+0x190/0x69c
 blk_mq_flush_plug_list+0x13c/0x170
 __blk_flush_plug+0x11c/0x17c
 sched_submit_work+0x7c/0xb8
 schedule+0x38/0xc4
 schedule_preempt_disabled+0x18/0x2c
 rwsem_down_write_slowpath+0x768/0x10c0
 down_write+0x98/0x240
 f2fs_down_write_trace+0x30/0x84
 f2fs_balance_fs+0x130/0x17c
 f2fs_write_single_data_page+0x42c/0x738
 f2fs_write_data_pages+0x8c0/0xe80
 do_writepages+0xd4/0x1a0
 __writeback_single_inode+0x78/0x5bc
 writeback_sb_inodes+0x2b8/0x580
 __writeback_inodes_wb+0xa0/0xf0
 wb_writeback+0x188/0x4bc
 wb_workfn+0x3ec/0x658
 process_one_work+0x284/0x62c
 worker_thread+0x260/0x3b4
 kthread+0x150/0x288
 ret_from_fork+0x10/0x20

Task name: snpe-net-run [affinity: 0xff] pid: 10012 tgid: 10012 cpu: 2
prio: 120 start: 0xffffff8a1eafc6c0 state: 0x2[D] exit_state: 0x0
stack base: 0xffffffc0a77c0000
Stack:
 __switch_to+0x214
 __schedule+0xa70
 schedule+0x48
 schedule_timeout+0xa0
 io_schedule_timeout+0x48
 f2fs_wait_on_all_pages+0x84
 do_checkpoint+0x804
 f2fs_write_checkpoint+0x820
 f2fs_gc+0x1f0
 f2fs_balance_fs+0x14c
 f2fs_map_blocks+0xd1c
 f2fs_file_write_iter+0x3c0
 vfs_write+0x270
 ksys_write+0x78
 __arm64_sys_write+0x1c
 invoke_syscall+0x58
 el0_svc_common+0xa8
 do_el0_svc+0x1c
 el0_svc+0x40
 el0t_64_sync_handler+0x84
 el0t_64_sync+0x1c4

Regards,
Xiaosen

On 4/23/2026 8:55 PM, Ming Lei wrote:
> On preemptible kernels, a three-way
> deadlock can occur involving
> blk_mq_freeze_queue and blk_mq_dispatch_list:
> 
> - Task A holds a filesystem lock (e.g., f2fs io_rwsem) and enters
>   __bio_queue_enter(), waiting for mq_freeze_depth == 0
> - Task B holds mq_freeze_depth=1 (elevator_change) and waits for
>   q_usage_counter to reach zero in blk_mq_freeze_queue_wait()
> - Task C is going to sleep waiting for the filesystem lock. Before
>   sleeping, schedule() calls sched_submit_work() -> blk_flush_plug()
>   -> blk_mq_dispatch_list(), which acquires q_usage_counter via
>   percpu_ref_get(). If Task C gets preempted before percpu_ref_put(),
>   it will not be scheduled back because the task is already in
>   uninterruptible sleep state (TASK_UNINTERRUPTIBLE). This means it
>   holds the percpu_ref indefinitely, preventing freeze from completing.
> 
> This is fundamentally an ABBA deadlock between queue freeze and the
> filesystem lock, exposed by preemption creating an artificial hold
> on q_usage_counter during the plug flush.
> 
> Fix by disabling preemption around blk_flush_plug() in
> sched_submit_work(). The _notrace variants are used since this runs
> in scheduler context. preempt_enable_no_resched_notrace() is correct
> because we are already inside __schedule() and about to pick the next
> task.
> 
> Fixes: 73c101011926 ("block: initial patch for on-stack per-task plugging")
> Reported-by: Michael Wu
> Tested-by: Michael Wu
> Link: https://lore.kernel.org/linux-block/20260417082744.30124-1-michael@allwinnertech.com/
> Signed-off-by: Ming Lei
> ---
>  kernel/sched/core.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b7f77c165a6e..4217aaaa8e47 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6966,7 +6966,9 @@ static inline void sched_submit_work(struct task_struct *tsk)
>  	 * If we are going to sleep and we have plugged IO queued,
>  	 * make sure to submit it to avoid deadlocks.
>  	 */
> +	preempt_disable_notrace();
>  	blk_flush_plug(tsk->plug, true);
> +	preempt_enable_no_resched_notrace();
> 
>  	lock_map_release(&sched_map);
>  }
> -- 
> 2.53.0