From: Corey Hickey <bugfood-ml@fatooh.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: read-modify-write occurring for direct I/O on RAID-5
Date: Sun, 6 Aug 2023 11:21:38 -0700
Message-ID: <6ac1f404-2cd2-42db-87b3-e1c7d5933a2d@fatooh.org>
In-Reply-To: <ZM7PHRsOqfJ71fMN@dread.disaster.area>

On 2023-08-05 15:37, Dave Chinner wrote:
> On Fri, Aug 04, 2023 at 06:44:47PM -0700, Corey Hickey wrote:
>> On 2023-08-04 14:52, Dave Chinner wrote:
>>> On Fri, Aug 04, 2023 at 12:26:22PM -0700, Corey Hickey wrote:
>>>> On 2023-08-04 01:07, Dave Chinner wrote:
>>>>> If you want to force XFS to do stripe width aligned allocation for
>>>>> large files to match with how MD exposes its topology to
>>>>> filesystems, use the 'swalloc' mount option. The down side is that
>>>>> you'll hotspot the first disk in the MD array....
>>>>
>>>> If I use 'swalloc' with the autodetected (wrong) swidth, I don't see any
>>>> unaligned writes.
>>>>
>>>> If I manually specify the (I think) correct values, I do still get writes
>>>> aligned to sunit but not swidth, as before.
>>>
>>> Hmmm, it should not be doing that - where is the misalignment
>>> happening in the file? swalloc isn't widely used/tested, so there's
>>> every chance there's something unexpected going on in the code...
>>
>> I don't know how to tell the file position, but I wrote a one-liner for
>> blktrace that may help. This should tell the position within the block
>> device of writes enqueued.
>
> xfs_bmap will tell you the file extent layout (offset to lba relationship).
> (`xfs_bmap -vvp <file>` output is preferred if you are going to paste
> it into an email.)

Ah, nice; the flags even show the alignment.
Here are the results for a filesystem on a 2-data-disk RAID-5 with 128 KB
chunk size.
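
For reference, the sunit/swidth values passed to mkfs.xfs below follow from that geometry. This is only a sketch of the arithmetic, assuming mkfs.xfs takes -d sunit/swidth in 512-byte sectors and reports them back in filesystem blocks; the variable names are mine.

```shell
# Geometry from this message: 128 KiB chunk, 2 data disks, 4096-byte blocks.
chunk_kib=128
data_disks=2
fs_block=4096

# mkfs.xfs -d sunit=/swidth= are given in 512-byte sectors.
sunit_sectors=$(( chunk_kib * 1024 / 512 ))
swidth_sectors=$(( sunit_sectors * data_disks ))

# The mkfs.xfs summary prints the same values in filesystem blocks.
sunit_blocks=$(( sunit_sectors * 512 / fs_block ))
swidth_blocks=$(( swidth_sectors * 512 / fs_block ))

echo "sunit=${sunit_sectors} swidth=${swidth_sectors} (sectors)"
echo "sunit=${sunit_blocks} swidth=${swidth_blocks} (blocks)"
```

That gives 256/512 sectors and 32/64 blocks, which matches both the command line and the "data" section of the mkfs output.
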
$ sudo mkfs.xfs -s size=4096 -d sunit=256,swidth=512 /dev/md5 -f
meta-data=/dev/md5               isize=512    agcount=16, agsize=983008 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=15728128, imaxpct=25
         =                       sunit=32     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mount -o noatime,swalloc /dev/md5 /mnt/tmp
$ sudo dd if=/dev/zero of=/mnt/tmp/test.bin iflag=fullblock oflag=direct bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 62.6102 s, 171 MB/s
$ sudo xfs_bmap -vvp /mnt/tmp/test.bin
/mnt/tmp/test.bin:
 EXT: FILE-OFFSET             BLOCK-RANGE          AG AG-OFFSET         TOTAL   FLAGS
   0: [0..7806975]:           512..7807487          0 (512..7807487)    7806976 000000
   1: [7806976..15613951]:    7864576..15671551     1 (512..7807487)    7806976 000011
   2: [15613952..20971519]:   15728640..21086207    2 (512..5358079)    5357568 000000
FLAG Values:
0100000 Shared extent
0010000 Unwritten preallocated extent
0001000 Doesn't begin on stripe unit
0000100 Doesn't end on stripe unit
0000010 Doesn't begin on stripe width
0000001 Doesn't end on stripe width
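
As a sanity check, the flags can be reproduced from the BLOCK-RANGE column with shell arithmetic. This is a rough sketch, assuming the ranges are in 512-byte sectors and that each flag is a simple modulo test on the first sector and on the sector just past the end; extent_flags is a helper I made up, not part of xfsprogs.

```shell
# Stripe geometry in 512-byte sectors (sunit=256, swidth=512 from mkfs above).
sunit=256
swidth=512

# Print xfs_bmap-style alignment flags for a block range "start end".
# Each digit is only ever 0 or 1 here, so plain decimal addition is enough.
extent_flags() {
    start=$1
    next=$(( $2 + 1 ))   # first sector after the extent
    flags=0
    if [ $(( start % sunit )) -ne 0 ]; then flags=$(( flags + 1000 )); fi
    if [ $(( next % sunit )) -ne 0 ]; then flags=$(( flags + 100 )); fi
    if [ $(( start % swidth )) -ne 0 ]; then flags=$(( flags + 10 )); fi
    if [ $(( next % swidth )) -ne 0 ]; then flags=$(( flags + 1 )); fi
    printf '%06d\n' "$flags"
}

extent_flags 512 7807487          # extent 0
extent_flags 7864576 15671551     # extent 1
extent_flags 15728640 21086207    # extent 2
```

This prints 000000, 000011 and 000000, the same as the FLAGS column: extent 1 starts and ends on a stripe unit boundary, but 256 sectors past a stripe width boundary at both ends.
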
>>> One thing to try is to set extent size hints for the directories
>>> these large files are going to be written to. That takes a lot of
>>> the allocation decisions away from the size/shape of the individual
>>> IO and instead does large file offset aligned/sized allocations
>>> which are much more likely to be stripe width aligned. e.g. set an
>>> extent size hint of 16MB, and the first write into a hole will
>>> allocate a 16MB chunk around the write instead of just the size that
>>> covers the write IO.
>>
>> Can you please give me a documentation pointer for that? I wasn't able
>> to find the right thing via searching.
>
[...]
> $ man xfs_io
> ....
> extsize [ -R | -D ] [ value ]
[...]
Aha, thanks. That's what I was looking for.
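
For the archives, here is roughly what I understand the usage to be. Treat it as a sketch: the 16 MiB value is the example from above, /mnt/tmp/big is a made-up directory, and the xfs_io lines are commented out since they need a mounted XFS filesystem.

```shell
# A 16 MiB hint against the 256 KiB stripe width (512 sectors of 512 bytes).
hint_bytes=$(( 16 * 1024 * 1024 ))
swidth_bytes=$(( 512 * 512 ))

# The hint only helps stripe alignment if it is a whole number of stripes.
if [ $(( hint_bytes % swidth_bytes )) -eq 0 ]; then
    echo "16 MiB = $(( hint_bytes / swidth_bytes )) full stripes"
else
    echo "16 MiB is not a multiple of the stripe width"
fi

# Set the hint on the directory the large files will be written to,
# then print it back (hypothetical path, requires XFS):
#   sudo xfs_io -c "extsize 16m" /mnt/tmp/big
#   sudo xfs_io -c "extsize" /mnt/tmp/big
```

With this geometry the check reports 64 full stripes, so a 16 MiB hint should at least be compatible with swidth-aligned allocation.
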
-Corey