From: Mike Snitzer
To: Chuck Lever , Jeff Layton , Trond Myklebust , Anna Schumaker
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH v5 13/13] nfs/direct: add misaligned WRITE handling
Date: Thu, 24 Jul 2025 15:31:02 -0400
Message-ID: <20250724193102.65111-14-snitzer@kernel.org>
In-Reply-To: <20250724193102.65111-1-snitzer@kernel.org>

Because the NFS client already happily handles misaligned O_DIRECT IO
(by sending it to NFSD via RPC), this commit's new capabilities are for
the benefit of LOCALIO and require the nfs modparam:
  localio_O_DIRECT_align_misaligned_IO=Y

When enabled, misaligned WRITE IO is split into a start, middle and end
as needed. The large middle extent is DIO-aligned, and the start and/or
end are misaligned (because each is a partial page).

Like the READ support that came before this WRITE support, the
nfs_analyze_dio trace event shows how the NFS client split a given
misaligned IO into a mix of misaligned page(s) and a DIO-aligned extent.

This combination of trace events is useful for LOCALIO WRITEs:

  echo 1 > /sys/kernel/tracing/events/nfs/nfs_analyze_dio/enable
  echo 1 > /sys/kernel/tracing/events/nfs/nfs_initiate_write/enable
  echo 1 > /sys/kernel/tracing/events/nfs/nfs_writeback_done/enable
  echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable

For example, this dd command:

  dd if=/dev/zero of=/mnt/share1/test bs=47008 count=2 oflag=direct

produces:

  dd-63257 [001] ..... 83742.427650: nfs_analyze_dio: WRITE offset=0 len=47008 start=0+0 middle=0+45056 end=45056+1952
  dd-63257 [001] ..... 83742.427659: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=0 count=45056 stable=UNSTABLE
  dd-63257 [001] ..... 83742.427662: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=45056 count=1952 stable=UNSTABLE
  kworker/u193:3-62985 [011] ..... 83742.427664: xfs_file_direct_write: dev 259:22 ino 0x5e0000a3 disize 0x0 pos 0x0 bytecount 0xb000
  kworker/u193:3-62985 [011] ..... 83742.427695: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=0 count=45056 res=45056 stable=UNSTABLE verifier=a8b37e6803d1eb1e
  kworker/u193:4-63221 [004] ..... 83742.427699: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=45056 count=1952 res=1952 stable=UNSTABLE verifier=a8b37e6803d1eb1e
  dd-63257 [001] ..... 83742.427755: nfs_analyze_dio: WRITE offset=47008 len=47008 start=47008+2144 middle=49152+40960 end=90112+3904
  dd-63257 [001] ..... 83742.427758: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=47008 count=2144 stable=UNSTABLE
  dd-63257 [001] ..... 83742.427760: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=49152 count=40960 stable=UNSTABLE
  kworker/u193:4-63221 [004] ..... 83742.427761: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=47008 count=2144 res=2144 stable=UNSTABLE verifier=a8b37e6803d1eb1e
  dd-63257 [001] ..... 83742.427763: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=90112 count=3904 stable=UNSTABLE
  kworker/u193:4-63221 [004] ..... 83742.427763: xfs_file_direct_write: dev 259:22 ino 0x5e0000a3 disize 0xb7a0 pos 0xc000 bytecount 0xa000
  kworker/u193:4-63221 [004] ..... 83742.427783: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=49152 count=40960 res=40960 stable=UNSTABLE verifier=a8b37e6803d1eb1e
  kworker/u193:3-62985 [011] ..... 83742.427788: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=90112 count=3904 res=3904 stable=UNSTABLE verifier=a8b37e6803d1eb1e

Signed-off-by: Mike Snitzer
---
 fs/nfs/direct.c   | 84 ++++++++++++++++++++++++++++++++++++++++++++---
 fs/nfs/internal.h |  1 +
 2 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 4e1e668eaa1f..80c2ca37cf28 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -1048,11 +1048,19 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 		if (result < 0)
 			break;
 
-		bytes = result;
-		npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
+		/* Limit the amount of bytes serviced each iteration to aligned batches */
+		if (pos < dreq->middle_offset && dreq->start_len)
+			bytes = min_t(size_t, dreq->start_len, result);
+		else if (pos < dreq->end_offset && dreq->middle_len)
+			bytes = min_t(size_t, dreq->middle_len, result);
+		else
+			bytes = result;
+		npages = (bytes + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
+
 		for (i = 0; i < npages; i++) {
 			struct nfs_page *req;
 			unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
+			bool issue_dio_now = false;
 
 			req = nfs_page_create_from_page(dreq->ctx, pagevec[i],
 							pgbase, pos, req_len);
@@ -1068,6 +1076,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 			}
 
 			pgbase = 0;
+			result -= req_len;
 			bytes -= req_len;
 			requested_bytes += req_len;
 			pos += req_len;
@@ -1077,9 +1086,27 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 				continue;
 			}
 
+			/* Looking ahead, is this req the end of the start or middle? */
+			if (bytes == 0) {
+				if ((dreq->start_len &&
+				     pos == dreq->middle_offset && result >= dreq->middle_len) ||
+				    (dreq->end_len &&
+				     pos == dreq->end_offset && result == dreq->end_len)) {
+					desc.pg_doio_now = 1;
+					issue_dio_now = true;
+					/* Reset iter to the last boundary, issue the current
+					 * req and then handle iter to next boundary or end.
+					 */
+					iov_iter_revert(iter, result);
+				}
+			}
+
 			nfs_lock_request(req);
-			if (nfs_pageio_add_request(&desc, req))
+			if (nfs_pageio_add_request(&desc, req)) {
+				if (issue_dio_now)
+					break;
 				continue;
+			}
 
 			/* Exit on hard errors */
 			if (desc.pg_error < 0 && desc.pg_error != -EAGAIN) {
@@ -1120,6 +1147,50 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 	return requested_bytes;
 }
 
+/*
+ * If localio_O_DIRECT_align_misaligned_IO is enabled, split a misaligned
+ * WRITE into a DIO-aligned middle and a misaligned head and/or tail.
+ */
+static bool nfs_analyze_write_dio(loff_t offset, __u32 len,
+				  struct nfs_direct_req *dreq)
+{
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	/* Hardcoded to PAGE_SIZE (since we don't have the LOCALIO nfsd_file's
+	 * dio_alignment); works for smaller alignment too (e.g. 512b).
+	 */
+	u32 dio_blocksize = PAGE_SIZE;
+	loff_t start_end, orig_end, middle_end;
+
+	/* Return early if feature disabled, if IO is irreparably
+	 * misaligned (len < PAGE_SIZE) or if IO is already DIO-aligned.
+	 */
+	if (!nfs_localio_O_DIRECT_align_misaligned_IO() ||
+	    unlikely(len < dio_blocksize) ||
+	    (((offset | len) & (dio_blocksize-1)) == 0))
+		return false;
+
+	start_end = round_up(offset, dio_blocksize);
+	orig_end = offset + len;
+	middle_end = round_down(orig_end, dio_blocksize);
+
+	dreq->io_start = offset;
+	dreq->max_count = orig_end - offset;
+
+	dreq->start_len = start_end - offset;
+	dreq->middle_offset = start_end;
+	dreq->middle_len = middle_end - start_end;
+	dreq->end_offset = middle_end;
+	dreq->end_len = orig_end - middle_end;
+
+	trace_nfs_analyze_dio(WRITE, offset, len, offset, dreq->start_len,
+			      dreq->middle_offset, dreq->middle_len,
+			      dreq->end_offset, dreq->end_len);
+	return true;
+#else
+	return false;
+#endif
+}
+
 /**
  * nfs_file_direct_write - file direct write operation for NFS files
  * @iocb: target I/O control block
@@ -1176,9 +1247,12 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
 	if (!dreq)
 		goto out;
 
+	if (swap || !nfs_analyze_write_dio(pos, count, dreq)) {
+		dreq->max_count = count;
+		dreq->io_start = pos;
+	}
+
 	dreq->inode = inode;
-	dreq->max_count = count;
-	dreq->io_start = pos;
 	dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
 	l_ctx = nfs_get_lock_context(dreq->ctx);
 	if (IS_ERR(l_ctx)) {
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 06a15bf08357..8daed5b1aa50 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -995,6 +995,7 @@ struct nfs_direct_req {
 	struct bio_vec *	start_extra_bvec;
 	loff_t			middle_offset;	/* Offset for start of DIO-aligned middle */
 	loff_t			end_offset;	/* Offset for start of DIO-aligned end */
+	ssize_t			start_len;	/* Length for misaligned first page */
 	ssize_t			middle_len;	/* Length for DIO-aligned middle */
 	ssize_t			end_len;	/* Length for misaligned last page */
 };
-- 
2.44.0