From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 69729358367 for ; Wed, 24 Jun 2026 07:24:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782285896; cv=none; b=gmfCavoLRBTTW5NX7yAhGH/OX8B5ojRvWe0wFYL669DIYplhLuScclEB4yoEkL+KqxJQIFPMpB9uF9Ok1CSsCd3XO1/U/RrIVh8qMP1UsYokGGeGi0oIsjBrn0ImPU2zq1qTuLLi7U3F/MoGIEuSjj+Z0PvPPRyyPp4JCW4gPnc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782285896; c=relaxed/simple; bh=FzyGfplLkDeUsMlhIW+FcHad0Wi3Gcm1HmNpwFByhdg=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=lXHh7R0owPO1hyRYH11XVfHoReeXyR95B1heBEdplds564V0D9j88t2JFSB056XRI0SFsKmiG6FokPSK/VxFiL4iBN0IYL/F1uOSxq3YyyoWvW0coeXgcTnGNuEuTOjQV6BUkZ9ZOoNGXFd76zbAv4AB+nripcJ8tT647aXaU9g= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=FqKpUVqB; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="FqKpUVqB" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CDE1F1F000E9; Wed, 24 Jun 2026 07:24:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1782285895; bh=ZHMGdhlrBq8N/8OTtJq2B4iT4WKQ/eYYmiuGO1SYfFk=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=FqKpUVqBcx5QGe/e4IDi0h4YZiDcZXN5LfE7bRW4PoBjh7ULXJOUMABdLflhAoYvo sVxvsmiRADOeL9khZ/IrHLe+DTBFW2DxCL4O9fk7dJ4xWUJK9LYwd0g9ArUhrY6a1s e1SB0I2Hfs+IFtszdqmVImlmIrptAS4rvfGWgLuL9Ezyh7OxEoy7XcClmBe0m7MYji PKppXyT3C2e87jZoBVMk5FjRi4Vm/6XPLvrEHks15lUSq28vSLuKMgsYszBGAxBH89 6vgwb1yieQsqZJLGLP8ZaiW9DMGOdPWlLetmXwknW4dr3tBL2iBg4/JYFGup/+nmGe /idd1BBTzuiDw== From: sashiko-bot@kernel.org Subject: Re: [PATCH v2 18/20] md/raid5: reject llbitmap reshape when md chunk shrinks Reply-To: sashiko-reviews@lists.linux.dev To: "Yu Kuai" Cc: linux-raid@vger.kernel.org, yukuai@fygo.io In-Reply-To: <9b06d3a655f709b35ab10995a872b30d2026f7c3.1782282042.git.yukuai@kernel.org> References: <20260605091527.2463539-1-yukuai@kernel.org> <9b06d3a655f709b35ab10995a872b30d2026f7c3.1782282042.git.yukuai@kernel.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Wed, 24 Jun 2026 07:24:54 +0000 Message-Id: <20260624072454.CDE1F1F000E9@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-raid@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 3 potential issue(= s) to consider: New issues: - [High] The validation in `check_reshape` fails to reject reshapes that de= crease the number of data disks (`mddev->delta_disks < 0`) for llbitmap, le= aving an identical bitmap coverage truncation vulnerability to the one the = patch attempts to fix. Pre-existing issues: - [High] Writing 0 or a very small/overflowing value to the `chunk_size` sy= sfs attribute bypasses chunk size validation, leading to a divide-by-zero k= ernel panic during reshape. - [High] Pre-existing memory leak of `stripe_head` pages in `resize_stripes= ()` if array reshape initialization fails. -- commit cd96767f06f730181f5e43d1d52815ff0535ad82 Author: Yu Kuai md/raid5: reject llbitmap reshape when md chunk shrinks =20 llbitmap reshape keeps one live bitmap and cannot safely make an existing bitmap bit cover a smaller data range. The llbitmap chunksize itself will not shrink when mddev->chunk_sectors stays the same or grows. However, shrinking mddev->chunk_sectors shrinks sectors_per_chunk used by raid5_bitmap_sector_map(). That can shrink the effective data range covered by each bit across the old and new RAID5 geometry. Reject that reshape while llbitmap is active. > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index 1613a42cc25dc..9de648f75dd04 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -8506,6 +8506,9 @@ static int check_reshape(struct mddev *mddev) > if (!check_stripe_cache(mddev)) > return -ENOSPC; > =20 > + if (mddev->bitmap_id =3D=3D ID_LLBITMAP && > + mddev->new_chunk_sectors < mddev->chunk_sectors) > + return -EOPNOTSUPP; [Severity: High] Does this validation fully address the logical coverage truncation? The commit message correctly notes that shrinking mddev->chunk_sectors shri= nks sectors_per_chunk, which is unsafe for llbitmap because it shrinks the effective data range covered by each bit. However, sectors_per_chunk is also reduced if mddev->delta_disks < 0: sectors_per_chunk =3D chunk_sectors * (raid_disks - max_degraded); Since the validation here only checks if the new chunk sectors are smaller, a reshape that reduces the number of data disks could bypass this check, resulting in the exact same logical coverage truncation vulnerability. Could this be updated to also check mddev->delta_disks < 0, or compute the product to verify sectors_per_chunk does not shrink? > if (mddev->new_chunk_sectors > mddev->chunk_sectors || > mddev->delta_disks > 0) [Severity: High] This is a pre-existing issue, but writing 0 or a very small value to the chunk_size sysfs attribute appears to bypass chunk size validation entirely, leading to a divide-by-zero panic during reshape. In drivers/md/md.c:chunk_size_store(), user input is right-shifted by 9 to compute mddev->new_chunk_sectors. If the user writes 0, it becomes 0. When raid5_check_reshape() is called: drivers/md/raid5.c:raid5_check_reshape() { ... if (new_chunk > 0) { if (!is_power_of_2(new_chunk)) return -EINVAL; ... } Because the checks are inside if (new_chunk > 0), a value of 0 skips the sanity bounds. check_reshape() then accepts it. Later, in raid5_compute_sector(), this chunk size causes a kernel panic: drivers/md/raid5.c:raid5_compute_sector() { ... chunk_offset =3D sector_div(r_sector, sectors_per_chunk); ... } Should chunk_size_store() or raid5_check_reshape() explicitly prevent a chunk size of 0? > if (resize_chunks(conf, > conf->previous_raid_disks > + max(0, mddev->delta_disks), > max(mddev->new_chunk_sectors, > mddev->chunk_sectors) > ) < 0) > return -ENOMEM; >=20 > if (conf->previous_raid_disks + mddev->delta_disks <=3D conf->pool_size) > return 0; /* never bother to shrink */ > return resize_stripes(conf, (conf->previous_raid_disks > + mddev->delta_disks)); > } [Severity: High] This isn't a bug introduced by this patch, but is it possible for resize_stripes() to leak the stripe_head pages if the array reshape initialization fails? If the array reshape fails under memory pressure during step 3 of resize_stripes() (for example, the ndisks allocation fails), the function structurally proceeds to step 4 without aborting: drivers/md/raid5.c:resize_stripes() { ... for (i=3Dconf->raid_disks; i < newsize; i++) if (nsh->dev[i].page =3D=3D NULL) { struct page *p =3D alloc_page(GFP_NOIO); nsh->dev[i].page =3D p; ... } It loops up to newsize and allocates physical pages for the expanded stripe slots. At the end of the function, because err is set, it skips updating conf->pool_size: drivers/md/raid5.c:resize_stripes() { ... if (!err) conf->pool_size =3D newsize; ... } Since conf->pool_size is unchanged, any future teardown via shrink_buffers() will intentionally ignore the newly allocated pages at indices >=3D pool_si= ze. When free_stripe() subsequently destroys the slab object without freeing th= ese pages, they appear to be permanently leaked. Is there a mechanism that cleans up these orphaned pages if conf->pool_size is not updated? --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/cover.1782282042.gi= t.yukuai@kernel.org?part=3D18