From: Johann Lombardi <johann@sun.com>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] Moving forward on Quotas
Date: Mon, 02 Jun 2008 14:22:07 +0200
Message-ID: <20080602122207.GD3628@lore>
In-Reply-To: <C4682B4E.5615%peter.braam@sun.com>

On Sun, Jun 01, 2008 at 10:32:46AM +0800, Peter Braam wrote:
> I am quite worried about the dynamic qunit patch.
> I am not convinced I want smaller qunits to stick around.
>
> Please PROVE RIGOROUSLY that qunits grow large quickly again, otherwise
> they create too much server - server overhead.

I've _not_ been involved in the design of the adaptive qunit feature (the DLD
pre-dates my involvement with Sun/CFS), but here is how it basically works:
* if remaining quota space < 4 * #osts * current_qunit, the qunit size is
  divided by 2,
* if remaining quota space > 8 * #osts * current_qunit, the qunit size is
  multiplied by 2.
The initial bunit size (also the maximum value) is the default one (i.e. 128MB).
The "4" and "8" can be tuned through /proc and there is a minimum value for
qunit (by default, 1MB = PTLRPC_MAX_BRW_SIZE for bunit).
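
A minimal sketch of the shrink/enlarge rule described above, for illustration
only. The function name, parameter names and the clamping to the minimum and
maximum are assumptions made for this example; they are not taken from the
patch itself.

```python
MB = 1 << 20  # bytes per megabyte (binary)

def adjust_qunit(remaining, n_osts, qunit,
                 shrink_factor=4, grow_factor=8,
                 min_qunit=1 * MB, max_qunit=128 * MB):
    """Return the next qunit size in bytes for one uid/gid.

    remaining: quota space still available, in bytes
    n_osts:    number of OSTs in the filesystem
    qunit:     current qunit size, in bytes
    The two factors stand for the "4" and "8" tunables mentioned above.
    """
    if remaining < shrink_factor * n_osts * qunit:
        # Getting close to the limit: halve, but never go below the minimum.
        return max(qunit // 2, min_qunit)
    if remaining > grow_factor * n_osts * qunit:
        # Plenty of space left: double, but never exceed the default size.
        return min(qunit * 2, max_qunit)
    # Between the two thresholds the qunit size is left unchanged.
    return qunit
```

Because the enlarge threshold is twice the shrink threshold, there is a band
of remaining space in which the qunit size stays put.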

Let's consider a cluster with 500 OSTs:
* the initial qunit size for a particular uid/gid is 128MB (unless the quota
  limit is too low)
* when left_quota = 256GB, bunit is shrunk to 64MB
* when left_quota = 128GB, bunit is shrunk to 32MB
* when left_quota = 64GB, bunit is shrunk to 16MB
* when left_quota = 32GB, bunit is shrunk to 8MB
* when left_quota = 16GB, bunit is shrunk to 4MB
* when left_quota = 8GB, bunit is shrunk to 2MB
* when left_quota = 4GB, bunit is shrunk to 1MB
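
The thresholds in the list above follow from 4 * #osts * current_qunit. A small
sketch that recomputes them, working in MB throughout; the function name and
its parameters are hypothetical and exist only for this example. The round
figures in the list correspond to counting 1GB as 1000MB (4 * 500 * 128MB =
256,000MB).

```python
def shrink_schedule(n_osts, max_qunit_mb=128, min_qunit_mb=1, shrink_factor=4):
    """List (threshold_mb, new_qunit_mb) for each halving step.

    A step fires when the remaining quota space drops below
    shrink_factor * n_osts * current_qunit.
    """
    steps = []
    qunit = max_qunit_mb
    while qunit > min_qunit_mb:
        threshold = shrink_factor * n_osts * qunit
        qunit //= 2
        steps.append((threshold, qunit))
    return steps

for threshold_mb, qunit_mb in shrink_schedule(500):
    print(f"left_quota < {threshold_mb} MB -> bunit shrunk to {qunit_mb} MB")
```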

Similarly, bunit is grown when the remaining quota space hits the same
thresholds. The dynamic qunit patch also maintains an accurate accounting of
how many threads are waiting for quota space from the master. Thus, slaves
can ask for more than one qunit at a time in a single DQACQ request.
IMO, the current algorithm/parameters are probably too aggressive and the
correct tuning has not been found yet.

> The cost of 100MB of disk space is barely more than a cent now; what are we trying
> to address with tiny qunits?

Today, a couple of customers are asking for accurate quotas. We should probably
talk to them to understand their motivations.
From my point of view, the interesting feature is not to support small quota
limits or tiny qunits, but to have the ability to adapt qunits for each uid/gid
depending on how much free quota space remains. We can now increase qunit
significantly without hurting quota accuracy, and performance should only be
impacted when getting closer to the quota limit (that was the original goal in
the DLD). That being said, adaptive qunits can be disabled easily by setting
the minimum qunit size to the default qunit size.

> Plan for 5000 OSS servers at the minimum and 1,000,000 clients, and up to
> 100TB/sec in I/O. Calculate quota RPC traffic from that. A server cannot
> handle more than 15,000 RPC's / sec.
>
> No arguing, or opinions here, numbers please.

With static qunits:
100TB/s / default_bunit_size ~ 1,000,000 RPCs / sec
To get below the 15,000 RPCs/s, we should increase bunit to ~6.7GB.
If each OST acquires 1 qunit ahead of time w/o actually using it, we "leak"
6.7GB * 5,000 OSTs = 33.5TB.
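
The static-qunit arithmetic above can be reproduced with a few lines. This is
a back-of-the-envelope sketch using decimal units; the helper names are made
up for the example. With exactly 128MB per bunit the rate comes out near
780,000 RPCs/s, the same order of magnitude as the rounded figure above.

```python
MB = 10**6
GB = 10**9
TB = 10**12

def rpc_rate(io_bytes_per_sec, bunit_bytes):
    """Quota RPCs per second if every bunit of I/O costs one acquire RPC."""
    return io_bytes_per_sec / bunit_bytes

def min_bunit(io_bytes_per_sec, max_rpcs_per_sec):
    """Smallest bunit that keeps the RPC rate at or below the given ceiling."""
    return io_bytes_per_sec / max_rpcs_per_sec

def worst_case_leak(bunit_bytes, n_osts):
    """Space tied up if every OST holds one unused bunit."""
    return bunit_bytes * n_osts

print(rpc_rate(100 * TB, 128 * MB))            # RPCs/s with the default bunit
print(min_bunit(100 * TB, 15_000) / GB)        # required bunit, in GB
print(worst_case_leak(6.7 * GB, 5_000) / TB)   # leaked space, in TB
```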

With adaptive qunits, we can set default bunit to a larger value (e.g. 10GB)
and the minimum bunit to 100MB. This way, quotas can remain "accurate" (maximum
leak is 100MB * 5,000 OSTs = 500GB) and performance would be impacted (more
RPCs sent) only when getting close to the quota limit.
However, the current shrink/enlarge algorithm is definitely not suitable for
such a big cluster since it decreases qunit too quickly.
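
For comparison, the same rough arithmetic applied to the adaptive case, with a
10GB default bunit and a 100MB minimum. Decimal units and the helper names are
assumptions for this sketch; it only illustrates the two extremes and says
nothing about how quickly the algorithm moves between them.

```python
MB = 10**6
GB = 10**9
TB = 10**12

IO_RATE = 100 * TB   # aggregate I/O, bytes per second
N_OSTS = 5_000

def quota_rpcs_per_sec(bunit_bytes, io_rate=IO_RATE):
    """RPC rate if every bunit of I/O costs one acquire RPC."""
    return io_rate / bunit_bytes

def max_leak(bunit_bytes, n_osts=N_OSTS):
    """Space tied up if every OST holds one unused bunit."""
    return bunit_bytes * n_osts

# Far from the limit: large bunit, few RPCs.
print(quota_rpcs_per_sec(10 * GB))   # 10,000 RPCs/s, under the 15,000 ceiling

# Close to the limit: small bunit, small leak, many more RPCs.
print(max_leak(100 * MB) / GB)       # 500 GB
print(quota_rpcs_per_sec(100 * MB))  # 1,000,000 RPCs/s
```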

> The original design I did 4 years ago limited quota calls from one OSS to the
> master to one per second.
> Qunits were made adaptive without solid reasoning or design.

IMHO, adaptive qunits are not such a bad feature, even if there is definitely
room for improvement.

Johann
