From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-oo1-f42.google.com (mail-oo1-f42.google.com [209.85.161.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 42E654A39 for ; Wed, 1 May 2024 01:32:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.161.42 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714527128; cv=none; b=okoxNgIclnymPhFyMImOs5/lldDvD3BqeHzikPgdwric6W8D3dq2kcY2nFAgMVaaBHQFmLUDcsD8DVEZq9q6sTI8DQ52Rj41hwA7/4x+A914uCVj81FxzewXk7qT4+By5yaZYJkbB+hsXtfH4GtQ0tR7oRoST2NV6MYIeFyZn2k= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714527128; c=relaxed/simple; bh=nNA/+5lUlX8TZ8Hrv3FoOYIRmlTOzjp/ZOYWxKwxduc=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=McAa+xos5ECAnTi90AM7hnRayUQKz5BR/AsPLX8qqp/V/HUVMd47JdoqAxJM+lNr+JITC4mfvHFb3JId16ry78CCiBqVYPpQlYEm6cJWCLXB44lw7XlINOEq4t9KqmjIJ7OrngyOyoo5C/c5HIo7z4Yssu6tEHXltx8VrOG1F6E= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=fromorbit.com; spf=pass smtp.mailfrom=fromorbit.com; dkim=pass (2048-bit key) header.d=fromorbit-com.20230601.gappssmtp.com header.i=@fromorbit-com.20230601.gappssmtp.com header.b=1+OV8jIb; arc=none smtp.client-ip=209.85.161.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=fromorbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fromorbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fromorbit-com.20230601.gappssmtp.com header.i=@fromorbit-com.20230601.gappssmtp.com header.b="1+OV8jIb" Received: by mail-oo1-f42.google.com with SMTP id 006d021491bc7-5aa400b917dso4121564eaf.0 for ; Tue, 30 Apr 2024 18:32:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20230601.gappssmtp.com; s=20230601; t=1714527124; x=1715131924; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=KM7Hd5zki3USqEbBfz9TXSdoSf1EQG0WzU/UqVvexrQ=; b=1+OV8jIbYG66pZUK2m8IEQwBn5oVX6u5ZhY8tP/EIrnrCsEkWop/7yOIkM/+5SFJ/S N7mJCaNT5/m/VZDZG4VS54WhLNRVYlivJPx0Z9BwmpS+AAOJ1COCJT754NtxTsXJnQ7O 8UDilEVIa1VsvMJQvc2Jc71llQj05GbEoq88v3BUjPcLB0Q0q/NfTb7cx+0tqHo0vMKl LZrmUCAi4JnpYnDoRUEKI9x9eEPu0KYY9lOXQA4lq4Y7dXN2xndg+X2BP8YE281XEZ9K BSFbrJbZfweqbjTZKiXc7ZTbqb6kC3kgAW7cygSErGHV0QXj4VDJrhtFNEbZrZNawan/ g6HQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714527124; x=1715131924; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=KM7Hd5zki3USqEbBfz9TXSdoSf1EQG0WzU/UqVvexrQ=; b=fEoNsSGbmdc3Gu6Fi9u8a7tFz+CDT6lDr0k2sHENLWTLBzg2scHFgbQ5/OkmC9msC8 ASXcGjGdN5T6ZoqerqbJ+IaTAA1WWPIq4hY65kF+PxIwD/TGZuvmV8rfUXY7mGD+5p/e YUm7IC0H0PYTHqZF/97YpzkM2im7aTGKXNfTQbw0UUDKZCyQu/mQ1Ptcklv2pdowzHM2 dT8lK9HMaSaBhQlWrbQkBRu45QKq0OGHQAxcEd1V1HRqdxVmoGGgSh4jLupvOXZr/MFH XXK3FNeVlXiw5vSj5pFVObQh8Slfwiv8QevKmf12fmye8o7fnAw06CCwYi4NjsMj7WY6 R2Aw== X-Forwarded-Encrypted: i=1; AJvYcCVrgWfUAsM9hBqDFQU/AbNKVyArE7lxddWsmwWmyhBspj4UEjEoZOX/N5C4PM9gvs4/cl0jyp71MaypXKgO3HemVoO+qvqmrnu4 X-Gm-Message-State: AOJu0YwI2cUISgPY2nP/t+aX76xV2dSfymJLxFFV5Oh7oIw0Aj0D0lM4 As+ikyTwNkwzLFYmghHCR9Ba/bJF2LHhz1I93AZ4Zo2ro5I3xg/PAvtxC4cw1oQ= X-Google-Smtp-Source: AGHT+IEAQ8njw5Xk9bQwmGDIxl6ZK6d3U3YkxZ7bTcibYCg1oYT8UUm8HvkrTms16QE9uu7PwRaMpA== X-Received: by 2002:a05:6358:887:b0:17e:b867:cb99 with SMTP id m7-20020a056358088700b0017eb867cb99mr1868688rwj.1.1714527123767; Tue, 30 Apr 2024 18:32:03 -0700 (PDT) Received: from dread.disaster.area (pa49-179-32-121.pa.nsw.optusnet.com.au. [49.179.32.121]) by smtp.gmail.com with ESMTPSA id y124-20020a636482000000b006109431806dsm6349949pgb.92.2024.04.30.18.32.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Apr 2024 18:32:03 -0700 (PDT) Received: from dave by dread.disaster.area with local (Exim 4.96) (envelope-from ) id 1s1ypE-00GpU1-0H; Wed, 01 May 2024 11:32:00 +1000 Date: Wed, 1 May 2024 11:32:00 +1000 From: Dave Chinner To: John Garry Cc: djwong@kernel.org, hch@lst.de, viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, chandan.babu@oracle.com, willy@infradead.org, axboe@kernel.dk, martin.petersen@oracle.com, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com, ojaswin@linux.ibm.com, ritesh.list@gmail.com, mcgrof@kernel.org, p.raghav@samsung.com, linux-xfs@vger.kernel.org, catherine.hoang@oracle.com Subject: Re: [PATCH v3 15/21] fs: xfs: iomap: Sub-extent zeroing Message-ID: References: <20240429174746.2132161-1-john.g.garry@oracle.com> <20240429174746.2132161-16-john.g.garry@oracle.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240429174746.2132161-16-john.g.garry@oracle.com> On Mon, Apr 29, 2024 at 05:47:40PM +0000, John Garry wrote: > Set iomap->extent_size when sub-extent zeroing is required. > > We treat a sub-extent write same as an unaligned write, so we can leverage > the existing sub-FSblock unaligned write support, i.e. try a shared lock > with IOMAP_DIO_OVERWRITE_ONLY flag, if this fails then try the exclusive > lock. > > In xfs_iomap_write_unwritten(), FSB calcs are now based on the extsize. If forcedalign is set, should we just reject unaligned DIOs? ..... > Signed-off-by: John Garry > --- > fs/xfs/xfs_file.c | 35 ++++++++++++++++++++++------------- > fs/xfs/xfs_iomap.c | 13 +++++++++++-- > 2 files changed, 33 insertions(+), 15 deletions(-) > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index e81e01e6b22b..ee4f94cf6f4e 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -620,18 +620,19 @@ xfs_file_dio_write_aligned( > * Handle block unaligned direct I/O writes * Handle unaligned direct IO writes. > * > * In most cases direct I/O writes will be done holding IOLOCK_SHARED, allowing > - * them to be done in parallel with reads and other direct I/O writes. However, > - * if the I/O is not aligned to filesystem blocks, the direct I/O layer may need > - * to do sub-block zeroing and that requires serialisation against other direct > - * I/O to the same block. In this case we need to serialise the submission of > - * the unaligned I/O so that we don't get racing block zeroing in the dio layer. > - * In the case where sub-block zeroing is not required, we can do concurrent > - * sub-block dios to the same block successfully. > + * them to be done in parallel with reads and other direct I/O writes. > + * However if the I/O is not aligned to filesystem blocks/extent, the direct > + * I/O layer may need to do sub-block/extent zeroing and that requires > + * serialisation against other direct I/O to the same block/extent. In this > + * case we need to serialise the submission of the unaligned I/O so that we > + * don't get racing block/extent zeroing in the dio layer. > + * In the case where sub-block/extent zeroing is not required, we can do > + * concurrent sub-block/extent dios to the same block/extent successfully. > * > * Optimistically submit the I/O using the shared lock first, but use the > * IOMAP_DIO_OVERWRITE_ONLY flag to tell the lower layers to return -EAGAIN > - * if block allocation or partial block zeroing would be required. In that case > - * we try again with the exclusive lock. > + * if block/extent allocation or partial block/extent zeroing would be > + * required. In that case we try again with the exclusive lock. Rather than changing every "block" to "block/extent", leave the bulk of the comment unchanged and add another paragraph to it that says something like: * If forced extent alignment is turned on, then serialisation * constraints are extended from filesystem block alignment * to extent alignment boundaries. In this case, we treat any * non-extent-aligned DIO the same as a sub-block DIO. > */ > static noinline ssize_t > xfs_file_dio_write_unaligned( > @@ -646,9 +647,9 @@ xfs_file_dio_write_unaligned( > ssize_t ret; > > /* > - * Extending writes need exclusivity because of the sub-block zeroing > - * that the DIO code always does for partial tail blocks beyond EOF, so > - * don't even bother trying the fast path in this case. > + * Extending writes need exclusivity because of the sub-block/extent > + * zeroing that the DIO code always does for partial tail blocks > + * beyond EOF, so don't even bother trying the fast path in this case. > */ > if (iocb->ki_pos > isize || iocb->ki_pos + count >= isize) { > if (iocb->ki_flags & IOCB_NOWAIT) > @@ -714,11 +715,19 @@ xfs_file_dio_write( > struct xfs_inode *ip = XFS_I(file_inode(iocb->ki_filp)); > struct xfs_buftarg *target = xfs_inode_buftarg(ip); > size_t count = iov_iter_count(from); > + struct xfs_mount *mp = ip->i_mount; > + unsigned int blockmask; > > /* direct I/O must be aligned to device logical sector size */ > if ((iocb->ki_pos | count) & target->bt_logical_sectormask) > return -EINVAL; > - if ((iocb->ki_pos | count) & ip->i_mount->m_blockmask) > + > + if (xfs_inode_has_forcealign(ip) && ip->i_extsize > 1) > + blockmask = XFS_FSB_TO_B(mp, ip->i_extsize) - 1; > + else > + blockmask = mp->m_blockmask; alignmask = XFS_FSB_TO_B(mp, xfs_inode_alignment(ip)) - 1; Note that this would consider sub rt_extsize IO as unaligned, which may be undesirable. In that case, we should define a second helper such as xfs_inode_io_alignment() that doesn't take into account RT extent sizes because we can still do filesystem block sized unwritten extent conversion on those devices. The same IO-specific wrapper would be used for the other cases in this patch, too. > + > + if ((iocb->ki_pos | count) & blockmask) > return xfs_file_dio_write_unaligned(ip, iocb, from); > return xfs_file_dio_write_aligned(ip, iocb, from); > } > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index 4087af7f3c9f..1a3692bbc84d 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c > @@ -138,6 +138,8 @@ xfs_bmbt_to_iomap( > > iomap->validity_cookie = sequence_cookie; > iomap->folio_ops = &xfs_iomap_folio_ops; > + if (xfs_inode_has_forcealign(ip) && ip->i_extsize > 1) > + iomap->extent_size = XFS_FSB_TO_B(mp, ip->i_extsize); iomap->io_block_size = XFS_FSB_TO_B(mp, xfs_inode_alignment(ip)); > return 0; > } > > @@ -570,8 +572,15 @@ xfs_iomap_write_unwritten( > > trace_xfs_unwritten_convert(ip, offset, count); > > - offset_fsb = XFS_B_TO_FSBT(mp, offset); > - count_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count); > + if (xfs_inode_has_forcealign(ip) && ip->i_extsize > 1) { > + xfs_extlen_t extsize_bytes = mp->m_sb.sb_blocksize * ip->i_extsize; > + > + offset_fsb = XFS_B_TO_FSBT(mp, round_down(offset, extsize_bytes)); > + count_fsb = XFS_B_TO_FSB(mp, round_up(offset + count, extsize_bytes)); > + } else { > + offset_fsb = XFS_B_TO_FSBT(mp, offset); > + count_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count); > + } More places we can use a xfs_inode_alignment() helper. offset_fsb = XFS_B_TO_FSBT(mp, offset); count_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count); rounding = XFS_FSB_TO_B(mp, xfs_inode_alignment(ip)); if (rounding > 1) { offset_fsb = rounddown_64(offset_fsb, rounding); count_fsb = roundup_64(count_fsb, rounding); } Cheers, Dave. -- Dave Chinner david@fromorbit.com