public inbox for linux-xfs@vger.kernel.org
From: Avi Kivity <avi@scylladb.com>
To: Dave Chinner <david@fromorbit.com>, Glauber Costa <glauber@scylladb.com>
Cc: Brian Foster <bfoster@redhat.com>, xfs@oss.sgi.com
Subject: Re: sleeps and waits during io_submit
Date: Tue, 1 Dec 2015 22:56:01 +0200
Message-ID: <565E0961.4060603@scylladb.com>
In-Reply-To: <20151201204535.GX19199@dastard>

On 12/01/2015 10:45 PM, Dave Chinner wrote:
> On Tue, Dec 01, 2015 at 09:01:13AM -0500, Glauber Costa wrote:
>> On Tue, Dec 1, 2015 at 8:58 AM, Avi Kivity <avi@scylladb.com> wrote:
>>> On 12/01/2015 03:11 PM, Brian Foster wrote:
>>>> It sounds to me that first and foremost you want to make sure you don't
>>>> have however many parallel operations you typically have running
>>>> contending on the same inodes or AGs. Hint: creating files under
>>>> separate subdirectories is a quick and easy way to allocate inodes under
>>>> separate AGs (the agno is encoded into the upper bits of the inode
>>>> number).
>>>
>>> Unfortunately our directory layout cannot be changed.  And doesn't this
>>> require having agcount == O(number of active files)?  That is easily in the
>>> thousands.
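(As an aside, the agno-in-the-upper-bits encoding Brian mentions can be sketched as below. This is only an illustration: the real split point comes from the superblock fields sb_agblklog and sb_inopblog, readable with xfs_db or derivable from xfs_info; the geometry values used here are assumptions.)

```python
def agno_of_inode(ino, agblklog, inopblog):
    """Decode which allocation group an XFS inode number lands in.

    XFS composes an inode number as (agno | block-within-AG | inode-within-
    block), so shifting off the low (agblklog + inopblog) bits leaves the
    AG number.  agblklog/inopblog here are hypothetical example values.
    """
    return ino >> (agblklog + inopblog)

# Example with assumed geometry: 2^16 blocks per AG, 2^4 inodes per block.
# An inode number with 3 in the upper bits decodes to AG 3:
example_ino = (3 << 20) | 12345
print(agno_of_inode(example_ino, 16, 4))
```

In practice you would feed it `os.stat(path).st_ino` to check how a given directory layout spreads files across AGs.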
>> Actually, wouldn't agcount == O(nr_cpus) be good enough?
> Not quite. What you need is agcount ~= O(nr_active_allocations).

Yes, this is what I mean by "active files".

>
> The difference is an allocation can block waiting on IO, and the
> CPU can then go off and run another process, which then tries to do
> an allocation. So you might only have 4 CPUs, but a workload that
> can have a hundred active allocations at once (not uncommon in
> file server workloads).

But for us, probably not much more.  We try to restrict active I/Os to 
the effective disk queue depth (more than that and they just turn sour 
waiting in the disk queue).
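A minimal sketch of that throttling, assuming a hypothetical submit/completion hook pair around io_submit (the class name and hooks are made up for illustration, not from any real library):

```python
import threading

class IoDepthLimiter:
    """Cap outstanding async I/Os at the effective device queue depth.

    Hypothetical helper: call before_submit() before issuing an I/O and
    on_completion() when it finishes; submitters block once 'depth' I/Os
    are in flight, instead of piling up in the disk queue.
    """

    def __init__(self, depth):
        self._slots = threading.BoundedSemaphore(depth)

    def before_submit(self):
        self._slots.acquire()      # blocks when 'depth' I/Os are outstanding

    def on_completion(self):
        self._slots.release()      # frees a slot for the next submission
```

In a real AIO event loop the release would happen in the io_getevents completion path rather than in the submitting thread.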


> On workloads that are roughly 1 process per CPU, it's typical that
> agcount = 2 * N cpus gives pretty good results on large filesystems.

This is probably using sync calls.  Using async calls you can have many 
more I/Os in progress (but still limited by effective disk queue depth).

> If you've got 400GB filesystems or you are using spinning disks,
> then you probably don't want to go above 16 AGs, because then you
> have problems with maintaining contiguous free space and you'll
> seek the spinning disks to death....

We're concentrating on SSDs for now.
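Putting Dave's two rules of thumb together (agcount ~ 2x CPUs, capped at 16 for small or spinning-disk filesystems), a sketch of the mkfs invocation might look like this; /dev/sdX is a placeholder and the command is echoed rather than run:

```shell
# Heuristic from this thread: start from 2 x online CPUs, cap at 16.
cpus=$(nproc)
agcount=$((2 * cpus))
if [ "$agcount" -gt 16 ]; then
    agcount=16                      # small fs / spinning disks: stay <= 16 AGs
fi
echo mkfs.xfs -d agcount="$agcount" /dev/sdX    # echoed only; device is hypothetical
```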

>
>>>> 'mount -o ikeep,'
>>>
>>> Interesting.  Our files are large so we could try this.
> Keep in mind that ikeep means that inode allocation permanently
> fragments free space, which can affect how large files are allocated
> once you truncate/rm the original files.
>
>

We can try to prime this by allocating a lot of inodes up front, then 
removing them, so that this doesn't happen.

Hurray ext2.
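A sketch of that priming pass, assuming an ikeep-mounted filesystem (the function name, scratch directory, and counts are all hypothetical):

```shell
# Allocate a burst of inodes once, up front, so that under ikeep the
# permanent inode chunks are carved out before large data files claim
# the contiguous free space.
prime_inodes() {
    dir="$1/.inode-prime"       # scratch subdir on the target filesystem
    n="$2"                      # how many inodes to pre-carve
    mkdir -p "$dir"
    i=1
    while [ "$i" -le "$n" ]; do
        : > "$dir/f$i"          # empty file: allocates an inode, no data blocks
        i=$((i + 1))
    done
    rm -rf "$dir"               # under ikeep, the inode chunks stay allocated
}
```

Usage would be something like `prime_inodes /mnt/xfs 100000` right after mkfs, before the workload starts writing.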

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 58+ messages
2015-11-28  2:43 sleeps and waits during io_submit Glauber Costa
2015-11-30 14:10 ` Brian Foster
2015-11-30 14:29   ` Avi Kivity
2015-11-30 16:14     ` Brian Foster
2015-12-01  9:08       ` Avi Kivity
2015-12-01 13:11         ` Brian Foster
2015-12-01 13:58           ` Avi Kivity
2015-12-01 14:01             ` Glauber Costa
2015-12-01 14:37               ` Avi Kivity
2015-12-01 20:45               ` Dave Chinner
2015-12-01 20:56                 ` Avi Kivity [this message]
2015-12-01 23:41                   ` Dave Chinner
2015-12-02  8:23                     ` Avi Kivity
2015-12-01 14:56             ` Brian Foster
2015-12-01 15:22               ` Avi Kivity
2015-12-01 16:01                 ` Brian Foster
2015-12-01 16:08                   ` Avi Kivity
2015-12-01 16:29                     ` Brian Foster
2015-12-01 17:09                       ` Avi Kivity
2015-12-01 18:03                         ` Carlos Maiolino
2015-12-01 19:07                           ` Avi Kivity
2015-12-01 21:19                             ` Dave Chinner
2015-12-01 21:38                               ` Avi Kivity
2015-12-01 23:06                                 ` Dave Chinner
2015-12-02  9:02                                   ` Avi Kivity
2015-12-02 12:57                                     ` Carlos Maiolino
2015-12-02 23:19                                     ` Dave Chinner
2015-12-03 12:52                                       ` Avi Kivity
2015-12-04  3:16                                         ` Dave Chinner
2015-12-08 13:52                                           ` Avi Kivity
2015-12-08 23:13                                             ` Dave Chinner
2015-12-01 18:51                         ` Brian Foster
2015-12-01 19:07                           ` Glauber Costa
2015-12-01 19:35                             ` Brian Foster
2015-12-01 19:45                               ` Avi Kivity
2015-12-01 19:26                           ` Avi Kivity
2015-12-01 19:41                             ` Christoph Hellwig
2015-12-01 19:50                               ` Avi Kivity
2015-12-02  0:13                             ` Brian Foster
2015-12-02  0:57                               ` Dave Chinner
2015-12-02  8:38                                 ` Avi Kivity
2015-12-02  8:34                               ` Avi Kivity
2015-12-08  6:03                                 ` Dave Chinner
2015-12-08 13:56                                   ` Avi Kivity
2015-12-08 23:32                                     ` Dave Chinner
2015-12-09  8:37                                       ` Avi Kivity
2015-12-01 21:04                 ` Dave Chinner
2015-12-01 21:10                   ` Glauber Costa
2015-12-01 21:39                     ` Dave Chinner
2015-12-01 21:24                   ` Avi Kivity
2015-12-01 21:31                     ` Glauber Costa
2015-11-30 15:49   ` Glauber Costa
2015-12-01 13:11     ` Brian Foster
2015-12-01 13:39       ` Glauber Costa
2015-12-01 14:02         ` Brian Foster
2015-11-30 23:10 ` Dave Chinner
2015-11-30 23:51   ` Glauber Costa
2015-12-01 20:30     ` Dave Chinner
