From: Dave Chinner <david@fromorbit.com>
To: Alex Lyakas <alex@zadarastorage.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Brian Foster <bfoster@redhat.com>,
	linux-xfs@vger.kernel.org
Subject: Re: xfs_alloc_ag_vextent_near() takes minutes to complete
Date: Fri, 5 May 2017 13:29:05 +1000
Message-ID: <20170505032905.GF17542@dastard>
In-Reply-To: <DDDCF9D44B6A4D1493F62FCA5AC73CF6@alyakaslap>

On Thu, May 04, 2017 at 11:07:45AM +0300, Alex Lyakas wrote:
> Hello Brian, Christoph,
> 
> Thank you for your responses.
> 
> >The search overhead could be high due to either fragmented free space or
> >perhaps waiting on busy extents (since you have enabled online discard).
> >Do you have any threads freeing space and waiting on discard operations
> >when this occurs? Also, what does 'xfs_db -c "freesp -s" <dev>' show for
> >this filesystem?
> I disabled discard, but the problem still happens. Output of the
> freesp command is at [1]. To my understanding this means that 60% of
> the free space is in extents of 16-31 contiguous blocks, i.e.,
> 64kb-124kb. Does this count as fragmented free space?
> 
> I debugged the issue further, profiling the
> xfs_alloc_ag_vextent_near() call and what it does. Some results:
> 
> # it appears to not be triggering any READs of xfs_buf, i.e., no
> calls to xfs_buf_ioapply_map() with rw==READ or rw==READA in the
> same thread
> # most of the time (about 95%) is spent in xfs_buf_lock() waiting in
> "down(&bp->b_sema)" call
> # the average time to lock an xfs_buf is about 10-12 ms
> 
> For example, in one test it took 45778 ms to complete the
> xfs_alloc_ag_vextent_near() execution. During this time, 6240
> xfs_buf were locked, totalling 42810 ms spent locking the
> buffers, which is about 93%. On average 7 ms to lock a buffer.
> 
> # it is still not clear who is holding the lock
> 
> Christoph, I understand that kernel 3.18 is EOL at the moment, but it
> used to be a long-term kernel, so there is an expectation of
> stability, but perhaps not community support at this point.
> 
> Thanks,
> Alex.
> 
> 
> [1]
>   from      to   extents     blocks    pct
>      1       1    155759     155759   0.00
>      2       3      1319       3328   0.00
>      4       7     13153      56265   0.00
>      8      15    152663    1752813   0.03
>     16      31 143626908 4019133338  60.17

There's your problem. 143 million small free space extents totalling
4TB of free space. That's going to require (roughly speaking)
somewhere between 300,000 and 500,000 4k btree leaf blocks to index,
i.e. a footprint of 10-20GB of metadata.

Even accounting for it being evenly spread across 50 AGs, that's
still 5-10k btree blocks per free space btree per AG, and so if
that's not in cache when we end up doing a linear search for a near
block of a size that falls into this bucket, it's going to get stuck
reading btree leaf siblings from disk synchronously....
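
As a rough sanity check of the leaf block estimate, the arithmetic can
be sketched as below. The 8-byte record size and 56-byte block header
are assumptions about the on-disk format (not stated in this thread),
and the fill fractions are illustrative only:

```python
import math

BLOCK_SIZE = 4096      # 4k btree blocks, as stated above
RECORD_SIZE = 8        # assumption: 4-byte start block + 4-byte length per record
HEADER_SIZE = 56       # assumption: short-form btree block header (smaller without CRCs)
EXTENTS = 143_626_908  # extents in the 16-31 block bucket of the freesp output
AG_COUNT = 50          # AG count used in the estimate above


def leaf_blocks(extents, fill=1.0):
    """Leaf blocks needed to index `extents` records at the given fill fraction."""
    per_leaf = (BLOCK_SIZE - HEADER_SIZE) // RECORD_SIZE
    return math.ceil(extents / (per_leaf * fill))


full = leaf_blocks(EXTENTS)            # every leaf completely full
half = leaf_blocks(EXTENTS, fill=0.5)  # every leaf only half full

print("leaf blocks, full vs half-full leaves:", full, half)
print("per AG, per btree:", full // AG_COUNT, half // AG_COUNT)
```

Under those assumptions the result lands in the same few-hundred-thousand
range overall, and a few thousand to roughly ten thousand leaf blocks per
btree per AG.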

Perhaps this "near block" search needs to terminate at a
certain search radius, similar to how the old AGI btree searches
during inode allocation were terminated after a certain radius of
allocated inode clusters were searched for free inodes....
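
To make the idea concrete, here is a toy model of a radius-limited
search. It is not the kernel code: the sorted list of (start, length)
tuples stands in for the by-block free space btree, and find_near() and
max_radius are hypothetical names chosen for illustration only.

```python
from bisect import bisect_left


def find_near(extents, target, minlen, max_radius):
    """Look for a free extent of at least `minlen` blocks near `target`.

    `extents` is a list of (start, length) tuples sorted by start block.
    The search walks outwards from the target position, one record to the
    left and one to the right per step, and gives up (returns None) once
    `max_radius` records have been examined on each side.
    """
    pos = bisect_left(extents, (target, 0))
    for step in range(max_radius):
        candidates = []
        left, right = pos - 1 - step, pos + step
        if left >= 0:
            candidates.append(extents[left])
        if right < len(extents):
            candidates.append(extents[right])
        if not candidates:
            break  # ran off both ends of the list
        fits = [e for e in candidates if e[1] >= minlen]
        if fits:
            # Of the records seen at this step, prefer the one nearer the target.
            return min(fits, key=lambda e: abs(e[0] - target))
    return None  # radius exhausted; caller falls back to another strategy


free = [(0, 4), (10, 4), (20, 4), (30, 32), (40, 4)]
print(find_near(free, target=12, minlen=16, max_radius=1))
print(find_near(free, target=12, minlen=16, max_radius=2))
```

The point of the cap is that the cost of a failed search is bounded by
the radius rather than by the number of records in the tree, at the
price of sometimes missing a suitable extent that lies further away.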

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
