From: Brian Foster <bfoster@redhat.com>
To: Mao Cheng <chengmao2010@gmail.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: xfs_alloc_ag_vextent_near() takes about 30ms to complete
Date: Tue, 23 Oct 2018 10:53:40 -0400
Message-ID: <20181023145339.GA7537@bfoster>
In-Reply-To: <CAGiyNfAUewzAjASPBccOk91PJ+nLS0nF4U3EvO4n59XsYu5P-w@mail.gmail.com>
On Tue, Oct 23, 2018 at 03:56:51PM +0800, Mao Cheng wrote:
> Sorry for the trouble again. I wrote the wrong function name in my
> previous message, so I'm resending it. If you have already received
> that email, please ignore it, thanks.
>
> We have an XFS filesystem created with the mkfs "-k" option and mounted
> with the default options (rw,relatime,attr2,inode64,noquota); it is
> about 2.2TB in size and exported via Samba.
>
> [root@test1 home]# xfs_info /dev/sdk
> meta-data=/dev/sdk               isize=512    agcount=4, agsize=131072000 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=0 spinodes=0
> data     =                       bsize=4096   blocks=524288000, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal               bsize=4096   blocks=256000, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> free space by allocation group:
>      from        to  extents    blocks    pct
>         1         1        9         9   0.00
>         2         3    14291     29124   0.19
>         4         7     5689     22981   0.15
>         8        15      119      1422   0.01
>        16        31   754657  15093035  99.65
>        32        63        1        33   0.00
> total free extents 774766
> total free blocks 15146604
> average free extent size 19.5499
>      from        to  extents    blocks    pct
>         1         1      253       253   0.00
>         2         3     7706     16266   0.21
>         4         7     7718     30882   0.39
>         8        15       24       296   0.00
>        16        31   381976   7638130  96.71
>        32        63      753     38744   0.49
>    131072    262143        1    173552   2.20
> total free extents 398431
> total free blocks 7898123
> average free extent size 19.8231
>      from        to  extents    blocks    pct
>         1         1      370       370   0.00
>         2         3     2704      5775   0.01
>         4         7     1016      4070   0.01
>         8        15       24       254   0.00
>        16        31   546614  10931743  20.26
>        32        63    19191   1112600   2.06
>        64       127        2       184   0.00
>    131072    262143        1    163713   0.30
>    524288   1048575        2   1438626   2.67
>   1048576   2097151        4   5654463  10.48
>   2097152   4194303        1   3489060   6.47
>   4194304   8388607        2  12656637  23.46
>  16777216  33554431        1  18502975  34.29
> total free extents 569932
> total free blocks 53960470
> average free extent size 94.6788
>      from        to  extents    blocks    pct
>         1         1        8         8   0.00
>         2         3     5566     11229   0.06
>         4         7     9622     38537   0.21
>         8        15       57       686   0.00
>        16        31   747242  14944852  80.31
>        32        63      570     32236   0.17
>   2097152   4194303        1   3582074  19.25
> total free extents 763066
> total free blocks 18609622
> average free extent size 24.38
>
So it looks like free space in 3 out of 4 AGs is mostly fragmented into
16-31 block extents. Those same AGs appear to have a much higher number
(~15k-20k) of even smaller extents.
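(For reference, per-AG histograms like the ones above can be regenerated
with xfs_db; just a sketch, with the device name taken from your report.
Since -r opens the device read-only it's safe on a mounted fs, though the
numbers may be slightly stale:)

  for ag in 0 1 2 3; do
          echo "=== AG $ag ==="
          xfs_db -r -c "freesp -s -a $ag" /dev/sdk
  done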
> We copy small files (about 150KB each) from Windows to XFS via the SMB
> protocol. Sometimes a kworker process consumes 100% of one CPU, and
> "perf top" shows xfs_extent_busy_trim() and xfs_btree_increment()
> consuming most of the CPU time; ftrace also shows
> xfs_alloc_ag_vextent_near() taking about 30ms to complete.
>
This is kind of a vague performance report. Some process consumes a full
CPU and this is a problem for some (??) reason given unknown CPU and
unknown storage (with unknown amount of RAM). I assume that kworker task
is writeback, but you haven't really specified that either.
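If you want to confirm that, one quick check (a sketch; the PID below is a
placeholder for whatever top shows pegged at 100%) is to sample the
kworker's kernel stack, or to profile only that task:

  # Replace 1234 with the PID of the busy kworker.
  # A writeback worker typically shows writeback_* functions with the
  # xfs_* allocation path underneath.
  cat /proc/1234/stack

  # Or profile just that task for a few seconds:
  perf record -g -p 1234 -- sleep 10
  perf report --stdio | head -40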
xfs_alloc_ag_vextent_near() is one of several block allocation
algorithms in XFS. That function itself includes a couple of different
algorithms for the "near" allocation based on the state of the AG. One
looks like an intra-block search of the by-size free space btree (if not
many suitably sized extents are available) and the second looks like an
outward sweep of the by-block free space btree to find a suitably sized
extent. I could certainly see the latter taking some time for certain
sized allocation requests under fragmented free space conditions. If you
want more detail on what's going on here, I'd suggest capturing a
sample of the xfs_alloc* (and perhaps xfs_extent_busy*) tracepoints
during the workload.
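For example, something along these lines with the stock ftrace interface
would be enough (a sketch; trace-cmd record -e 'xfs:xfs_alloc*' works
just as well):

  cd /sys/kernel/debug/tracing
  echo 'xfs:xfs_alloc*' > set_event
  echo 'xfs:xfs_extent_busy*' >> set_event
  echo 1 > tracing_on
  # ... reproduce the slow copy from the Windows client ...
  echo 0 > tracing_on
  cp trace /tmp/xfs_alloc_trace.txt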
That aside, it's probably best to step back and describe for the list
the overall environment, workload and performance problem you observed
that caused this level of digging in the first place. For example, has
throughput degraded over time? Latency increased? How many writers are
active at once? Is preallocation involved (I thought Samba/Windows
triggered it in certain cases, but I don't recall)?
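(If you want to check the preallocation angle from your side, comparing
the allocated block count to the file size and dumping the extent list of
a freshly copied file would tell us; rough sketch, the path is a
placeholder:)

  # Allocated blocks well beyond what the file size needs suggests
  # preallocation (speculative or via fallocate).
  stat -c '%n: size=%s blocks=%b' /path/to/copied/file

  # Verbose extent list; unwritten (preallocated) extents are flagged
  # in the FLAGS column.
  xfs_bmap -v /path/to/copied/file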
Brian
> In addition, all tests were performed on CentOS 7.4 (3.10.0-693.el7.x86_64).
>
> Any suggestions are welcome.