From mboxrd@z Thu Jan 1 00:00:00 1970
From: Greg Kroah-Hartman
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman, patches@lists.linux.dev, Hans Holmberg, Damien Le Moal, Christoph Hellwig, Johannes Thumshirn, Hans Holmberg
Subject: [PATCH 6.1 139/181] zonefs: Always invalidate last cached page on append write
Date: Mon, 3 Apr 2023 16:09:34 +0200
Message-Id: <20230403140419.577729204@linuxfoundation.org>
In-Reply-To: <20230403140415.090615502@linuxfoundation.org>
References: <20230403140415.090615502@linuxfoundation.org>
User-Agent: quilt/0.67
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Mailing-List: stable@vger.kernel.org

From: Damien Le Moal

commit c1976bd8f23016d8706973908f2bb0ac0d852a8f upstream.

When a direct append write is executed, the append offset may correspond
to the last page of a sequential file inode, which might already have been
cached by buffered reads, page faults with mmap-read, or non-direct
readahead. To ensure that the on-disk and cached data are consistent for
such a last cached page, always invalidate it in zonefs_file_dio_append().
If the invalidation fails, return -EBUSY to userspace to differentiate
this case from IO errors.

This invalidation will always be a no-op when the FS block size (device
zone write granularity) is equal to the page size (e.g. 4K).
Reported-by: Hans Holmberg
Fixes: 02ef12a663c7 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal
Reviewed-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Tested-by: Hans Holmberg
Signed-off-by: Greg Kroah-Hartman
---
 fs/zonefs/file.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -382,6 +382,7 @@ static ssize_t zonefs_file_dio_append(st
 	struct zonefs_zone *z = zonefs_inode_zone(inode);
 	struct block_device *bdev = inode->i_sb->s_bdev;
 	unsigned int max = bdev_max_zone_append_sectors(bdev);
+	pgoff_t start, end;
 	struct bio *bio;
 	ssize_t size;
 	int nr_pages;
@@ -390,6 +391,19 @@ static ssize_t zonefs_file_dio_append(st
 	max = ALIGN_DOWN(max << SECTOR_SHIFT, inode->i_sb->s_blocksize);
 	iov_iter_truncate(from, max);
 
+	/*
+	 * If the inode block size (zone write granularity) is smaller than the
+	 * page size, we may be appending data belonging to the last page of
+	 * the inode straddling inode->i_size, with that page already cached
+	 * due to a buffered read or readahead. So make sure to invalidate that
+	 * page. This will always be a no-op for the case where the block size
+	 * is equal to the page size.
+	 */
+	start = iocb->ki_pos >> PAGE_SHIFT;
+	end = (iocb->ki_pos + iov_iter_count(from) - 1) >> PAGE_SHIFT;
+	if (invalidate_inode_pages2_range(inode->i_mapping, start, end))
+		return -EBUSY;
+
 	nr_pages = iov_iter_npages(from, BIO_MAX_VECS);
 	if (!nr_pages)
 		return 0;
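
As a reader's aid (not part of the patch): below is a minimal, self-contained
user-space sketch of the page-index arithmetic that the added invalidation
performs. PAGE_SHIFT, the append offset, and the write size are illustrative
assumptions chosen for the example, not values taken from zonefs or the kernel.

#include <stdio.h>

/* Illustrative assumption: 4 KiB pages, as on most x86-64 systems. */
#define PAGE_SHIFT	12

int main(void)
{
	/* Hypothetical append: the offset sits in the middle of page 1. */
	unsigned long long pos = 4096ULL + 512ULL;	/* stands in for iocb->ki_pos */
	unsigned long long count = 1024ULL;		/* stands in for iov_iter_count(from) */

	/* Same arithmetic as the patch: map the byte range to page-cache indices. */
	unsigned long start = pos >> PAGE_SHIFT;
	unsigned long end = (pos + count - 1) >> PAGE_SHIFT;

	/*
	 * With a sub-page zone write granularity (e.g. 512-byte blocks), page 1
	 * may already be cached by an earlier buffered read or readahead, so it
	 * must be invalidated before the zone-append DIO is issued. With a block
	 * size equal to the page size, the append offset is always page aligned
	 * and, per the commit message, the invalidation is a no-op.
	 */
	printf("invalidate page indices %lu..%lu\n", start, end);
	return 0;
}

Compiled and run as-is, this prints "invalidate page indices 1..1": a single
last cached page, which is exactly the range the patch hands to
invalidate_inode_pages2_range() before submitting the append bio.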