From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from va-1-112.ptr.blmpb.com (va-1-112.ptr.blmpb.com [209.127.230.112]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 80B7F326928 for ; Thu, 26 Mar 2026 02:26:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.112 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774491988; cv=none; b=qY/vznKJi9ouBkoxUvPCjooFj270WUZsc8FXHtg9K6oeZClNV9qyfjObzzZLgiSxJD26RgTR09Io/6cBT5wX6OKK8VrEDjJG8wiD5bNkLdEDF3UKF2jGunEx0vOfTK/yeHi5QJjbljazLCdcM2bHzlduo+szpC3Nr8DA7ovo/cQ= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774491988; c=relaxed/simple; bh=ynby6JrDwn5Ax6abbGMCPRmuLTqO16MRFElp7xF6HjQ=; h=From:References:In-Reply-To:Date:Message-Id:To:Cc:Subject: Mime-Version:Content-Type; b=Ct78pSYzPpAxKeikug/S8DD6PF8cKabcwfbAPeAAsZPM0LC98n5uwFhcNPImZV4IMf/7fGmjBYHyryNZOYjtlqz2SdtfCXJ/Z9uhgWPg3A2AGu7kepzON/ZhPIwGxLnztr4XzYjZCuzHFeL36Sz2FC+pj8l7Qzmt++YMtptyv3I= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=E5n+X4Q/; arc=none smtp.client-ip=209.127.230.112 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="E5n+X4Q/" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1774491981; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=ynby6JrDwn5Ax6abbGMCPRmuLTqO16MRFElp7xF6HjQ=; b=E5n+X4Q/L5G+c2DiV3CVngYaOn9sbMB7AIgJUv6IY7v9FVZ1RcFbsFwFgeFq8x7uyikqrQ TPKICGsDxpiYnpqHCAuYfQmp7c6qaeG6TC/0Sl96LjxHZ1AJRx61izL4RE6/Qy0Zuiowg9 8Eck6rrI3ozlqwKZeJFgrAqulfWIICwvdkiDGVV75vI6AL0G25UP1Sb80BI2tJMH8cB6wG hURQ+qW6w6hCnjJsOhbJ89362V+2/Ljbk18Zgiy6k94yInEXEqepHtF3b1R+ZYRX6sESd5 yyGZKACfNiBG5KsVR3sK+xqpVqMEhUvtpwYdeM/TB/TsJctUeSBbylpvbGqsTQ== From: "changfengnan" X-Lms-Return-Path: Content-Transfer-Encoding: quoted-printable References: <20260325093349.630193-1-diangangli@gmail.com> <20260325093349.630193-2-diangangli@gmail.com> In-Reply-To: Date: Thu, 26 Mar 2026 10:26:16 +0800 Message-Id: To: "Zhang Yi" Cc: "Diangang Li" , "Andreas Dilger" , "Diangang Li" , , , , Subject: Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO failure Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 > From: "Zhang Yi" > Date:=C2=A0 Wed, Mar 25, 2026, 22:27 > Subject:=C2=A0 Re: [RFC 1/1] ext4: fail fast on repeated metadata reads a= fter IO failure > To: "Diangang Li", "Andreas Dilger", "Diangang Li" > Cc: , , , , > Hi, Diangang, >=C2=A0 > On 3/25/2026 7:13 PM, Diangang Li wrote: > > Hi Andreas, > >=C2=A0 > > BH_Read_EIO is cleared on successful read or write. >=C2=A0 > I think what Andreas means is, since you modified the ext4_read_bh()=C2= =A0 > interface, if the bh to be read already has the Read_EIO flag set, then= =C2=A0 > subsequent read operations through this interface will directly return=C2= =A0 > failure without issuing a read I/O. At the same time, because its state IMO, we first need to reach a consensus on whether we can expect a retry to succeed after a read failure.=C2=A0 Given that current SCSI and NVMe drivers already perform multiple retries for I/O errors. IMO, this depends on the specific error. If the block layer returns BLK_STS_RESOURCE or BLK_STS_AGAIN, we can retry; however, if it returns BLK_STS_MEDIUM or BLK_STS_IOERR, there is no need to retry. For scenarios requiring a retry, we should also wait for a certain time window before retrying. Thanks. Fengnan. > is also not uptodate, for an existing block, a write request will not be= =C2=A0 > issued either. How can we clear this Read_EIO flag? IIRC, relying solely= =C2=A0 > on ext4_read_bh_nowait() doesn't seem sufficient to achieve this. >=C2=A0 > Thanks, > Yi. >=C2=A0 > >=C2=A0 > > In practice bad blocks are typically repaired/remapped on write, so we > > expect recovery after a successful rewrite. If the block is never > > rewritten, repeatedly issuing the same failing read does not help. > >=C2=A0 > > We clear the flag on successful reads so the buffer can recover > > immediately if the error was transient. Since read-ahead reads are not > > blocked, a later successful read-ahead will clear the flag and allow > > subsequent synchronous readers to proceed normally. > >=C2=A0 > > Best, > > Diangang > >=C2=A0 > > On 3/25/26 6:15 PM, Andreas Dilger wrote: > >> On Mar 25, 2026, at 03:33, Diangang Li wrote: > >>> > >>> From: Diangang Li > >>> > >>> ext4 metadata reads serialize on BH_Lock (lock_buffer). If the read f= ails, > >>> the buffer remains !Uptodate. With concurrent callers, each waiter ca= n > >>> retry the same failing read after the previous holder drops BH_Lock. = This > >>> amplifies device retry latency and may trigger hung tasks. > >>> > >>> In the normal read path the block driver already performs its own ret= ries. > >>> Once the retries keep failing, re-submitting the same metadata read f= rom > >>> the filesystem just amplifies the latency by serializing waiters on > >>> BH_Lock. > >>> > >>> Remember read failures on buffer_head and fail fast for ext4 metadata= reads > >>> once a buffer has already failed to read. Clear the flag on successfu= l > >>> read/write completion so the buffer can recover. ext4 read-ahead uses > >>> ext4_read_bh_nowait(), so it does not set the failure flag and remain= s > >>> best-effort. > >> > >> Not that the patch is bad, but if the BH_Read_EIO flag is set on a buf= fer > >> and it prevents other tasks from reading that block again, how would t= he > >> buffer ever become Uptodate to clear the flag? =C2=A0There isn't enoug= h state > >> in a 1-bit flag to have any kind of expiry and later retry. > >> > >> Cheers, Andreas > > >=C2=A0