From: Abd-Alrhman Masalkhi
To: Paul Menzel
Cc: song@kernel.org, yukuai@fnnas.com, shli@fb.com, neilb@suse.com,
	linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] md/raid1: fix bio splitting in raid1 thread to avoid recursion and deadlock
References: <20260427103446.300378-1-abd.masalkhi@gmail.com>
Date: Mon, 27 Apr 2026 19:44:31 +0200

Hi Paul,

Thank you for the feedback.

On Mon, Apr 27, 2026 at 16:49 +0200, Paul Menzel wrote:
> Dear Abd-Alrhman,
>
> Thank you for your patch.
>
> Am 27.04.26 um 12:34 schrieb Abd-Alrhman Masalkhi:
>> Splitting a bio while executing in the raid1 thread can lead to
>> recursion, as task->bio_list is NULL in this context.
>>
>> In addition, resubmitting an md_cloned_bio after splitting may lead to
>> a deadlock if the array is suspended before the md driver calls
>> percpu_ref_tryget_live(&mddev->active_io) on its path to
>> pers->make_request().
>>
>> Avoid splitting the bio in this context and require that it is either
>> read in full or not at all.
>>
>> This prevents recursion and avoids potential deadlocks during array
>> suspension.
>
> Do you have a reproducer?

I found this issue while reviewing the code and trying to understand
the read path. The problem can be triggered when the first rdev cannot
complete the md_cloned_bio successfully and RAID1 selects another rdev
that cannot fulfil the entire request. In that case,
raid1_read_request() splits the bio (the md_cloned_bio) via
bio_submit_split_bioset(), which in turn calls
submit_bio_noacct_nocheck(). Since current->bio_list is NULL in this
context, raid1_read_request() is entered again, resulting in recursion.
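For context on the mechanism: the block layer normally breaks exactly
this kind of recursion through the current->bio_list convention. Below
is a minimal sketch of that idea (simplified names and structure, not
the verbatim code in block/blk-core.c):

static void submit_bio_sketch(struct bio *bio)
{
	struct bio_list pending;

	if (current->bio_list) {
		/*
		 * A caller higher up the stack is already inside
		 * ->submit_bio(): queue this bio and let the outermost
		 * submission loop below drain it iteratively.
		 */
		bio_list_add(current->bio_list, bio);
		return;
	}

	/* Outermost submission: publish a list, then drain it. */
	bio_list_init(&pending);
	current->bio_list = &pending;
	do {
		bio->bi_bdev->bd_disk->fops->submit_bio(bio);
	} while ((bio = bio_list_pop(&pending)));
	current->bio_list = NULL;
}

The raid1d thread does not enter the read path through this loop, so
current->bio_list is NULL there, and a split submitted from the raid1
thread takes the "outermost" branch and re-enters md's ->submit_bio()
synchronously.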
I have also created a test that confirms this can occur, by modifying
the code with the following patch:

 drivers/md/raid1.c | 52 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index cc9914bd15c1..145e3ad0b1b8 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -362,11 +362,34 @@ static int find_bio_disk(struct r1bio *r1_bio, struct bio *bio)
 
 static void raid1_end_read_request(struct bio *bio)
 {
+	static int tmp = 75;
 	int uptodate = !bio->bi_status;
 	struct r1bio *r1_bio = bio->bi_private;
 	struct r1conf *conf = r1_bio->mddev->private;
 	struct md_rdev *rdev = conf->mirrors[r1_bio->read_disk].rdev;
 
+	if (tmp > 2) {
+		tmp--;
+		pr_info("--tmp = %d\n", tmp);
+	} else if (tmp == 2) {
+		if (r1_bio->sectors > 2 && uptodate) {
+			pr_info("I will start omitting errors\n");
+			bio->bi_status = BLK_STS_IOERR;
+			uptodate = false;
+			tmp = 0;
+		}
+	} else {
+		if (r1_bio->sectors > 2) {
+			if (tmp) {
+				bio->bi_status = BLK_STS_IOERR;
+				uptodate = false;
+				tmp = false;
+			} else {
+				tmp = true;
+			}
+		}
+	}
+
 	/*
 	 * this branch is our 'one mirror IO has finished' event handler:
 	 */
@@ -607,7 +630,7 @@ static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio,
 
 	/* choose the first disk even if it has some bad blocks. */
 	read_len = raid1_check_read_range(rdev, this_sector, &len);
-	if (read_len > 0) {
+	if (read_len > 0 && (!*max_sectors || read_len == r1_bio->sectors)) {
 		update_read_sectors(conf, disk, this_sector, read_len);
 		*max_sectors = read_len;
 		return disk;
@@ -704,8 +727,12 @@ static int choose_slow_rdev(struct r1conf *conf, struct r1bio *r1_bio,
 	}
 
 	if (bb_disk != -1) {
-		*max_sectors = bb_read_len;
-		update_read_sectors(conf, bb_disk, this_sector, bb_read_len);
+		if (!*max_sectors || bb_read_len == r1_bio->sectors) {
+			*max_sectors = bb_read_len;
+			update_read_sectors(conf, bb_disk, this_sector, bb_read_len);
+		} else {
+			bb_disk = -1;
+		}
 	}
 
 	return bb_disk;
@@ -884,9 +911,11 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio,
 	 * now spend a bit more time trying to find one with the most good
 	 * sectors.
 	 */
-	disk = choose_bb_rdev(conf, r1_bio, max_sectors);
-	if (disk >= 0)
-		return disk;
+	if (!*max_sectors) {
+		disk = choose_bb_rdev(conf, r1_bio, max_sectors);
+		if (disk >= 0)
+			return disk;
+	}
 
 	return choose_slow_rdev(conf, r1_bio, max_sectors);
 }
@@ -1320,6 +1349,11 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
 	int rdisk;
 	bool r1bio_existed = !!r1_bio;
 
+	if (mddev->thread && mddev->thread->tsk == current) {
+		pr_info("task->bio_list = %p, %d\n", current->bio_list,
+			r1bio_existed);
+	}
+
 	/*
 	 * If r1_bio is set, we are blocking the raid1d thread
 	 * so there is a tiny risk of deadlock. So ask for
@@ -1347,6 +1381,7 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
 	 * make_request() can abort the operation when read-ahead is being
 	 * used and no empty request is available.
	 */
+	max_sectors = r1bio_existed;
 	rdisk = read_balance(conf, r1_bio, &max_sectors);
 	if (rdisk < 0) {
 		/* couldn't find anywhere to read from */
@@ -1376,7 +1411,12 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
 		mddev->bitmap_ops->wait_behind_writes(mddev);
 	}
 
+	if (r1bio_existed)
+		max_sectors = max_sectors - 1;
+
 	if (max_sectors < bio_sectors(bio)) {
+		/* we are not allowed */
+		/* BUG_ON(r1bio_existed); */
 		bio = bio_submit_split_bioset(bio, max_sectors,
 					      &conf->bio_split);
 		if (!bio) {
-- 
2.43.0
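I recorded the read path with trace-cmd's function_graph tracer, along
these lines (reconstructed from memory, and /dev/md0 stands in for my
test array, so treat the exact flags and device as approximate):

  # graph the raid1 read path while generating reads on the array
  trace-cmd record -p function_graph -g raid1_read_request \
          dd if=/dev/md0 of=/dev/null bs=1M count=16
  trace-cmd report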
The trace shows the recursion clearly. Its output:

raid1_read_request() {
  bio_submit_split_bioset() {
    bio_split() {
      bio_alloc_clone() {
        bio_alloc_bioset() {
          mempool_alloc_noprof() { __cond_resched(); mempool_alloc_slab() { kmem_cache_alloc_noprof(); } }
          bio_associate_blkg() {
            __rcu_read_lock();
            kthread_blkcg();
            bio_associate_blkg_from_css() { __rcu_read_lock(); __rcu_read_unlock(); }
            __rcu_read_unlock();
          }
        }
        bio_clone_blkg_association() {
          bio_associate_blkg_from_css() { __rcu_read_lock(); __rcu_read_unlock(); __rcu_read_lock(); __rcu_read_unlock(); }
        }
      }
    }
    bio_chain();
    should_fail_bio();
    submit_bio_noacct_nocheck() {
      blk_cgroup_bio_start();
      __submit_bio() {
        __rcu_read_lock();
        __rcu_read_unlock();
        md_submit_bio() {
          bio_split_to_limits() {
            bio_split_rw() { bio_split_io_at(); bio_submit_split(); }
          }
          md_handle_request() {
            __rcu_read_lock();
            __rcu_read_unlock();
            raid1_make_request() {
              raid1_read_request() {
                _printk() {
                  vprintk() {
                    vprintk_default() {
                      vprintk_emit() {
                        panic_on_other_cpu();
                        nbcon_get_default_prio() { panic_on_this_cpu(); nbcon_get_cpu_emergency_nesting() { printk_percpu_data_ready(); } }
                        is_printk_legacy_deferred() { is_printk_cpu_sync_owner(); }
                        vprintk_store() {
                          local_clock();
                          printk_parse_prefix();
                          is_printk_force_console();
                          prb_reserve() { data_alloc() { data_push_tail(); } space_used.isra.0(); }
                          printk_sprint() { printk_parse_prefix(); }
                          prb_final_commit() {
                            desc_update_last_finalized() {
                              _prb_read_valid() { desc_read_finalized_seq(); }
                              _prb_read_valid() { desc_read_finalized_seq(); panic_on_this_cpu(); }
                            }
                          }
                        }
                        console_trylock() {
                          panic_on_other_cpu();
                          __printk_safe_enter();
                          down_trylock() { _raw_spin_lock_irqsave(); _raw_spin_unlock_irqrestore(); }
                          __printk_safe_exit();
                        }
                        console_unlock() {
                          nbcon_get_default_prio() { panic_on_this_cpu(); nbcon_get_cpu_emergency_nesting() { printk_percpu_data_ready(); } }
                          is_printk_legacy_deferred() { is_printk_cpu_sync_owner(); }
                          console_flush_one_record() {
                            nbcon_get_default_prio() { panic_on_this_cpu(); nbcon_get_cpu_emergency_nesting() { printk_percpu_data_ready(); } }
                            is_printk_legacy_deferred() { is_printk_cpu_sync_owner(); }
                            __srcu_read_lock();
                            printk_get_next_message() {
                              prb_read_valid() { _prb_read_valid() { desc_read_finalized_seq(); get_data(); desc_read_finalized_seq(); } }
                            }
                            panic_on_other_cpu();
                            __srcu_read_unlock();
                          }
                          console_flush_one_record() {
                            nbcon_get_default_prio() { panic_on_this_cpu(); nbcon_get_cpu_emergency_nesting() { printk_percpu_data_ready(); } }
                            is_printk_legacy_deferred() { is_printk_cpu_sync_owner(); }
                            __srcu_read_lock();
                            printk_get_next_message() {
                              prb_read_valid() { _prb_read_valid() { desc_read_finalized_seq(); panic_on_this_cpu(); } }
                            }
                            __srcu_read_unlock();
                          }
                          __printk_safe_enter();
                          up() { _raw_spin_lock_irqsave(); _raw_spin_unlock_irqrestore(); }
                          __printk_safe_exit();
                          prb_read_valid() { _prb_read_valid() { desc_read_finalized_seq(); panic_on_this_cpu(); } }
                        }
                        __wake_up_klogd();
                      }
                    }
                  }
                }
                mempool_alloc_noprof() { __cond_resched(); mempool_kmalloc() { __kmalloc_noprof(); } }
                md_account_bio() {
                  __rcu_read_lock();
                  __rcu_read_unlock();
                  bio_alloc_clone() {
                    bio_alloc_bioset() {
                      mempool_alloc_noprof() { __cond_resched(); mempool_alloc_slab() { kmem_cache_alloc_noprof(); } }
                      bio_associate_blkg() {
                        __rcu_read_lock();
                        kthread_blkcg();
                        bio_associate_blkg_from_css() { __rcu_read_lock(); __rcu_read_unlock(); }
                        __rcu_read_unlock();
                      }
                    }
                    bio_clone_blkg_association() {
                      bio_associate_blkg_from_css() { __rcu_read_lock(); __rcu_read_unlock(); __rcu_read_lock(); __rcu_read_unlock(); }
                    }
                  }
                  bio_start_io_acct() { update_io_ticks(); }
                }
                bio_alloc_clone() {
                  bio_alloc_bioset() {
                    mempool_alloc_noprof() { __cond_resched(); mempool_alloc_slab() { kmem_cache_alloc_noprof(); } }
                    bio_associate_blkg() {
                      __rcu_read_lock();
                      kthread_blkcg();
                      bio_associate_blkg_from_css() { __rcu_read_lock(); __rcu_read_unlock(); }
                      __rcu_read_unlock();
                    }
                  }
                  bio_clone_blkg_association() {
                    bio_associate_blkg_from_css() { __rcu_read_lock(); __rcu_read_unlock(); __rcu_read_lock(); __rcu_read_unlock(); }
                  }
                }
                submit_bio_noacct() {
                  __cond_resched();
                  submit_bio_noacct_nocheck() { blk_cgroup_bio_start(); }
                }
              }
            }
            __rcu_read_lock();
            __rcu_read_unlock();
          }
        }
        __rcu_read_lock();
        __rcu_read_unlock();
      }
      __submit_bio() {
        blk_mq_submit_bio() {
          __rcu_read_lock();
          __rcu_read_unlock();
          bio_split_rw() { bio_split_io_at(); bio_submit_split(); }
          blk_attempt_plug_merge();
          blk_mq_sched_bio_merge();
          __blk_mq_alloc_requests() {
            blk_mq_get_tag() { __blk_mq_get_tag(); }
            blk_mq_rq_ctx_init.isra.0();
          }
          ktime_get();
          update_io_ticks();
          blk_add_rq_to_plug();
        }
      }
    }
  }
  bio_alloc_clone() {
    bio_alloc_bioset() {
      mempool_alloc_noprof() {
        __cond_resched();
        mempool_alloc_slab() { kmem_cache_alloc_noprof() { __slab_alloc.isra.0() { ___slab_alloc(); } } }
      }
      bio_associate_blkg() {
        __rcu_read_lock();
        kthread_blkcg();
        bio_associate_blkg_from_css() { __rcu_read_lock(); __rcu_read_unlock(); }
        __rcu_read_unlock();
      }
    }
    bio_clone_blkg_association() {
      bio_associate_blkg_from_css() { __rcu_read_lock(); __rcu_read_unlock(); __rcu_read_lock(); __rcu_read_unlock(); }
    }
  }
  submit_bio_noacct() {
    __cond_resched();
    submit_bio_noacct_nocheck() {
      blk_cgroup_bio_start();
      __submit_bio() {
        blk_mq_submit_bio() {
          __rcu_read_lock();
          __rcu_read_unlock();
          bio_split_rw() { bio_split_io_at(); bio_submit_split(); }
          blk_attempt_plug_merge();
          blk_mq_sched_bio_merge();
          __blk_mq_alloc_requests() {
            blk_mq_get_tag() { __blk_mq_get_tag(); }
            blk_mq_rq_ctx_init.isra.0();
          }
          update_io_ticks() { bdev_count_inflight(); }
          blk_add_rq_to_plug();
        }
      }
    }
  }
}

>> Fixes: 689389a06ce7 ("md/raid1: simplify handle_read_error().")
>> Signed-off-by: Abd-Alrhman Masalkhi
>> ---
>> I sent an email about this issue two days ago, but at the time I was not
>> sure whether it was a real problem or a misunderstanding on my part.
>>
>> After further analysis, it appears that this issue can occur.
>>
>> Apologies for the earlier confusion, and thank you for your time.
>>
>> Abd-Alrhman
>
> I suggest always sharing the URL (lore.kernel.org) when referencing
> another thread. If relevant, maybe even reference your message with a
> Link: tag in the commit message.

Yes, I will make sure to do that next time.
Here is the link:
https://lore.kernel.org/linux-raid/20260425142938.5555-1-abd.masalkhi@gmail.com/T

>> ---
>>   drivers/md/raid1.c | 33 ++++++++++++++++++++++++---------
>>   1 file changed, 24 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index cc9914bd15c1..14f6d6625811 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -607,7 +607,7 @@ static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio,
>>
>>   	/* choose the first disk even if it has some bad blocks. */
>>   	read_len = raid1_check_read_range(rdev, this_sector, &len);
>> -	if (read_len > 0) {
>> +	if (read_len > 0 && (!*max_sectors || read_len == r1_bio->sectors)) {
>>   		update_read_sectors(conf, disk, this_sector, read_len);
>>   		*max_sectors = read_len;
>>   		return disk;
>> @@ -704,8 +704,13 @@ static int choose_slow_rdev(struct r1conf *conf, struct r1bio *r1_bio,
>>   	}
>>
>>   	if (bb_disk != -1) {
>> -		*max_sectors = bb_read_len;
>> -		update_read_sectors(conf, bb_disk, this_sector, bb_read_len);
>> +		if (!*max_sectors || bb_read_len == r1_bio->sectors) {
>> +			*max_sectors = bb_read_len;
>> +			update_read_sectors(conf, bb_disk, this_sector,
>> +					    bb_read_len);
>> +		} else {
>> +			bb_disk = -1;
>> +		}
>>   	}
>>
>>   	return bb_disk;
>> @@ -852,8 +857,9 @@ static int choose_best_rdev(struct r1conf *conf, struct r1bio *r1_bio)
>>    * disks and disks with bad blocks for now. Only pay attention to key disk
>>    * choice.
>>    *
>> - * 3) If we've made it this far, now look for disks with bad blocks and choose
>> - * the one with most number of sectors.
>> + * 3) If we've made it this far and *max_sectors is 0 (i.e., we are tolerant
>> + * of bad blocks), look for disks with bad blocks and choose the one with
>> + * the most sectors.
>>    *
>>    * 4) If we are all the way at the end, we have no choice but to use a disk even
>>    * if it is write mostly.
>> @@ -882,11 +888,13 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio,
>>   	/*
>>   	 * If we are here it means we didn't find a perfectly good disk so
>>   	 * now spend a bit more time trying to find one with the most good
>> -	 * sectors.
>> +	 * sectors. but only if we are tolerant of bad blocks.
>
> s/but/But/

I will fix this in v2.

>>   	 */
>> -	disk = choose_bb_rdev(conf, r1_bio, max_sectors);
>> -	if (disk >= 0)
>> -		return disk;
>> +	if (!*max_sectors) {
>> +		disk = choose_bb_rdev(conf, r1_bio, max_sectors);
>> +		if (disk >= 0)
>> +			return disk;
>> +	}
>>
>>   	return choose_slow_rdev(conf, r1_bio, max_sectors);
>>   }
>> @@ -1346,7 +1354,14 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
>>   	/*
>>   	 * make_request() can abort the operation when read-ahead is being
>>   	 * used and no empty request is available.
>> +	 *
>> +	 * If we allow splitting the bio while executing in the raid1 thread,
>> +	 * we may end up recursing (current->bio_list is NULL), and we might
>> +	 * also deadlock if we try to suspend the array, since we are
>> +	 * resubmitting an md_cloned_bio. Therefore, we must be read either
>
> … we must read …

I will fix this in v2.

>> +	 * all the sectors or none.
>>   	 */
>> +	max_sectors = r1bio_existed;
>
> Excuse my ignorance, but I do not get why a bool is assigned to an int
> representing the maximum sector value.

I modified read_balance() to interpret *max_sectors as a flag: if it is
0, the read path is allowed to be tolerant of bad blocks; otherwise, it
is not. In both cases, *max_sectors is eventually updated to the
maximum number of readable sectors once a suitable disk is found. I
used r1bio_existed to initialize this value, so assigning
max_sectors = r1bio_existed effectively encodes this behavior without
introducing an additional parameter.
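Roughly, the intent is the following (an illustrative sketch of the
encoding, not the literal patch hunk; the comments are mine):

	bool r1bio_existed = !!r1_bio;	/* true on raid1d's retry path */

	/*
	 * In: 0 means read_balance() may tolerate bad blocks and return
	 * a disk that covers only part of the request (the caller will
	 * split); non-zero means partial reads are forbidden, because
	 * splitting is unsafe in the raid1 thread.
	 */
	max_sectors = r1bio_existed;
	rdisk = read_balance(conf, r1_bio, &max_sectors);
	/* Out: on success, the number of sectors readable from rdisk. */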
>>   	rdisk = read_balance(conf, r1_bio, &max_sectors);
>>   	if (rdisk < 0) {
>>   		/* couldn't find anywhere to read from */
>
> Kind regards,
>
> Paul

-- 
Best Regards,
Abd-Alrhman