Re: [patch 02/29] knfsd: Add stats table infrastructure.


From: Greg Banks <gnb-xTcybq6BZ68@public.gmane.org>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Linux NFS ML <linux-nfs@vger.kernel.org>
Subject: Re: [patch 02/29] knfsd: Add stats table infrastructure.
Date: Sun, 26 Apr 2009 14:12:23 +1000
Message-ID: <ac442c870904252112u707f00den889fbe839799a36a@mail.gmail.com>
In-Reply-To: <20090425035624.GC24770@fieldses.org>

On Sat, Apr 25, 2009 at 1:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Wed, Apr 01, 2009 at 07:28:02AM +1100, Greg Banks wrote:

>> +int nfsd_stats_enabled = 1;
>> +int nfsd_stats_prune_period = 2*86400;
>
> For those of us that don't immediately recognize 86400 as the number of
> seconds in a day, writing that out as " = 2*24*60*60;" could be a useful
> hint.

Done.

>
> Also nice: a comment with any rationale (however minor) for the choice
> of period.

I've added this comment

/*
 * This number provides a bound on how long a record for a particular
 * stats entry survives after its last use (an entry will die between
 * 1x and 2x the prune period after its last use).  This is really only
 * particularly useful if a system admin is going to be trawling through
 * the /proc files manually and wants to see entries for (e.g.) clients
 * which have since unmounted.  If instead he uses some userspace
 * stats infrastructure which can handle rate conversion and instance
 * management, the prune period doesn't really matter.  The choice of
 * 2 days is really quite arbitrary.
 */

>> + * Stats hash pruning works thus.  A scan is run every prune period.
>> + * On every scan, hentries with the OLD flag are detached and
>> + * a reference dropped (usually that will be the last reference
>> + * and the hentry will be deleted).  Hentries without the OLD flag
>> + * have the OLD flag set; the flag is reset in nfsd_stats_get().
>> + * So hentries with active traffic in the last 2 prune periods
>> + * are not candidates for pruning.
>
> s/2 prune periods/prune period/ ?
>
> (From the description above: on exit from nfsd_stats_prune() all
> remaining entries have OLD set.  Therefore if an entry is not touched in
> the single period between two nfsd_stats_prune()'s, the second
> nfsd_stats_prune() run will drop it.)

Yeah, that was poorly phrased.  Fixed.
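
For reference, the aging scheme being discussed can be sketched in
userspace roughly as follows.  This is only a minimal model, not the
patch's code: struct entry, entry_touch() and prune_scan() are made-up
names, the hash table is flattened to an array, and locking is left out.
It shows why an entry that goes untouched for one full period between
two scans is dropped by the second scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a stats hentry; not the patch's struct. */
struct entry {
	bool old;   /* plays the role of NFSD_STATS_HENTRY_OLD */
	bool live;  /* false once the pruner has dropped the entry */
};

/* Models the effect of nfsd_stats_get(): any use clears the OLD flag. */
static void entry_touch(struct entry *e)
{
	e->old = false;
}

/* Models one prune scan; returns the number of entries dropped. */
static size_t prune_scan(struct entry *tab, size_t n)
{
	size_t dropped = 0;

	assert(tab != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!tab[i].live)
			continue;
		if (tab[i].old) {
			/* Not used since the previous scan: drop it. */
			tab[i].live = false;
			dropped++;
		} else {
			/* Used since the previous scan: mark it and move on. */
			tab[i].old = true;
		}
	}
	return dropped;
}
```

An entry last used just before a scan survives that scan (it only gets
marked) and dies at the next one, about one period later.  An entry last
used just after a scan is marked at the next scan and dies at the one
after that, about two periods later.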

>
>> + */
>> +static void nfsd_stats_prune(unsigned long closure)
>> +{
>> +     nfsd_stats_hash_t *sh = (nfsd_stats_hash_t *)closure;
>> +     unsigned int i;
>> +     nfsd_stats_hentry_t *se;
>> +     struct hlist_node *hn, *next;
>> +     struct hlist_head to_be_dropped = HLIST_HEAD_INIT;
>> +
>> +     dprintk("nfsd_stats_prune\n");
>> +
>> +     if (!down_write_trylock(&sh->sh_sem)) {
>> +             /* hash is busy...try again in a second */
>> +             dprintk("nfsd_stats_prune: busy\n");
>> +             mod_timer(&sh->sh_prune_timer, jiffies + HZ);
>
> Could we make sh_sem a spinlock?  It doesn't look like the critical
> sections ever need to sleep.
>
> (Or even consider rcu, if we need the read lock on every rpc?  OK, I'm
> mostly ignorant of rcu.)

So was I way back when I wrote this patch, and it was written for an
antique kernel which was missing some useful locking bits.  So I'm not
too surprised that the locking scheme could do with a rethink.  I'll
take another look and get back to you.

>
>> +             return;
>> +     }
>> +
>> +     for (i = 0 ; i < sh->sh_size ; i++) {
>> +             hlist_for_each_entry_safe(se, hn, next, &sh->sh_hash[i], se_node) {
>> +                     if (!test_and_set_bit(NFSD_STATS_HENTRY_OLD, &se->se_flags))
>
> It looks like this is only ever used under the lock, so the
> test_and_set_bit() is overkill.

It's cleared in nfsd_stats_get() without the sh_sem lock.
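
To make the point concrete, here is a rough userspace analogue using C11
atomics instead of the kernel bitops.  The names (struct entry,
ENTRY_OLD_BIT and the two helpers) are invented for illustration.  The
pruner's read-modify-write has to be atomic because the clearing side
can run at the same time without taking the pruner's lock.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number; stands in for NFSD_STATS_HENTRY_OLD. */
#define ENTRY_OLD_BIT 0

static_assert(ENTRY_OLD_BIT < sizeof(unsigned long) * CHAR_BIT,
	      "bit must fit in the flags word");

struct entry {
	atomic_ulong flags;
};

/* Pruner side: set OLD and report whether it was already set. */
static bool entry_test_and_set_old(struct entry *e)
{
	unsigned long mask = 1UL << ENTRY_OLD_BIT;

	return (atomic_fetch_or(&e->flags, mask) & mask) != 0;
}

/* Lookup side, possibly concurrent with the pruner: clear OLD on use. */
static void entry_clear_old(struct entry *e)
{
	atomic_fetch_and(&e->flags, ~(1UL << ENTRY_OLD_BIT));
}
```

With a plain load followed by a plain store in the pruner, a concurrent
clear could land between the two and be lost, so a recently used entry
might be treated as stale.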

>
>> +                             continue;
>> +                     hlist_del_init(&se->se_node);
>> +                     hlist_add_head(&se->se_node, &to_be_dropped);
>
> Replace those two by hlist_move_list?

If I read hlist_move_list() correctly, it moves an entire chain from
one hlist_head to another.  Here we want instead to move a single
hlist_node from one chain to another.  So, no.
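
The difference can be seen in a small userspace re-creation of the
hlist helpers.  This is a sketch written for illustration, modelled on
the kernel's list.h but not copied from any particular kernel version;
the assert() in hlist_move_list() is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copies of the hlist types, for illustration only. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Unlink a single node from whatever chain it is on. */
static void hlist_del_init(struct hlist_node *n)
{
	if (!n->pprev)
		return;			/* not on any chain */
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Push a single node onto the front of a chain. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Move the entire chain from one head to another. */
static void hlist_move_list(struct hlist_head *from, struct hlist_head *to)
{
	assert(to->first == NULL);	/* 'to' is overwritten, so it should be empty */
	to->first = from->first;
	if (to->first)
		to->first->pprev = &to->first;
	from->first = NULL;
}
```

The prune loop needs the first two calls: it picks individual stale
nodes out of a bucket and leaves the rest of the bucket in place.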

>
>> +             }
>> +     }
>> +
>> +     up_write(&sh->sh_sem);
>> +
>> +     dprintk("nfsd_stats_prune: deleting\n");
>> +     hlist_for_each_entry_safe(se, hn, next, &to_be_dropped, se_node)
>> +             nfsd_stats_put(se);
>
> nfsd_stats_put() can down a semaphore, which we probably don't want in a
> timer.

Ouch.  What the hell was I thinking <kicks self>

>
>> +
>> +     mod_timer(&sh->sh_prune_timer, jiffies + nfsd_stats_prune_period * HZ);
>> +}
>> +
>> +/*
>> + * Initialise a stats hash.  Array size scales with
>> + * server memory, as a loose heuristic for how many
>> + * clients or exports a server is likely to have.
>> + */
>> +static void nfsd_stats_hash_init(nfsd_stats_hash_t *sh, const char *which)
>> +{
>> +     unsigned int nbits;
>> +     unsigned int i;
>> +
>> +     init_rwsem(&sh->sh_sem);
>> +
>> +     nbits = 5 + ilog2(totalram_pages >> (30-PAGE_SHIFT));
>> +     sh->sh_size = (1<<nbits);
>> +     sh->sh_mask = (sh->sh_size-1);
>
> Some comment on the choice of scale factor?  Also, see:
>
>        http://marc.info/?l=linux-kernel&m=118299825922287&w=2
>
> and followups.

Ok, I'll look into those.

>
> Might consider a little helper function to do this kind of
> fraction-of-total-memory calculation since I think the server does it in
> 3 or 4 places.
>
>> +
>> +     sh->sh_hash = kmalloc(sizeof(struct hlist_head) * sh->sh_size, GFP_KERNEL);
>
> Can this be more than a page?

Yes, but it would need to be a fairly large-memory machine.  With 4K
pages and 8B pointers, total RAM would need to be 16G.  With 4B
pointers, we'd need 32G.

> (If so, could we just cap it at that
> size to avoid >order-0 allocations and keep the kmalloc failure
> unlikely?)

Well...I have no problem with capping it, but I don't think it's a
likely failure mode.  Firstly, there are *two* allocations, which are
probably only order 1, and they happen at nfsd module load time.
Secondly, the allocation order scales, really quite slowly, with
available RAM.  Thirdly, machines which have a lowmem split will hit
the >0 order later than more modern machines with flat address spaces.
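
The sizing arithmetic can be checked with a small userspace calculation.
This is a sketch under two assumptions: totalram_pages >> (30-PAGE_SHIFT)
is treated as "RAM in whole GiB" and is at least 1, and a struct
hlist_head is taken to be one pointer wide.  The helper names are made
up.

```c
#include <assert.h>
#include <stddef.h>

/* Integer log2, rounding down; x must be nonzero. */
static unsigned int ilog2_u(unsigned long x)
{
	unsigned int r = 0;

	assert(x != 0);
	while (x >>= 1)
		r++;
	return r;
}

/* Number of hash buckets for a machine with ram_gib GiB of RAM. */
static unsigned long stats_hash_buckets(unsigned long ram_gib)
{
	return 1UL << (5 + ilog2_u(ram_gib));
}

/* Bytes needed for the bucket array, given the size of one pointer. */
static unsigned long stats_hash_bytes(unsigned long ram_gib, size_t ptr_size)
{
	return stats_hash_buckets(ram_gib) * ptr_size;
}
```

By this reading of the formula, the array reaches exactly one 4K page at
16 GiB with 8-byte pointers (32 GiB with 4-byte pointers) and first
needs an order-1 allocation at the next doubling of RAM.  Each further
doubling adds only one more order.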

>
>> +     if (sh->sh_hash == NULL) {
>> +             printk(KERN_ERR "failed to allocate knfsd %s stats hashtable\n", which);
>> +             /* struggle on... */
>> +             return;
>> +     }
>> +     printk(KERN_INFO "knfsd %s stats hashtable, %u entries\n", which, sh->sh_size);
>
> Eh.  Make it a dprintk?

I don't think a dprintk() is useful.  This happens once during nfsd
module load, so there's no chance for an admin to enable dprintks
before it happens.

>  Or maybe expose this in the nfsd filesystem if
> it's not already?

There will be two files in the nfsd filesystem.  I'll remove the printk().

>> +     if (sh->sh_hash != NULL) {
>
> Drop the NULL check.

Done.

>> + * Drop a reference to a hentry, deleting the hentry if this
>> + * was the last reference.  Does it's own locking using the
>
> s/it's/its/

Done.

>
> (Contending for the nitpick-of-the-day award.)

:-)

>> +
>> +     if (atomic_read(&se->se_refcount)) {
>> +             /*
>> +              * We lost a race getting the write lock, and
>> +              * now there's a reference again.  Whatever.
>> +              */
>
> Some kind of atomic_dec_and_lock() might close the race.

Yep.  I'll address this when I rethink locking.
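
As background, the idea behind atomic_dec_and_lock() can be modelled in
userspace along these lines.  This is a sketch of the semantics only:
the toy spinlock and the name ref_dec_and_lock() are invented here, and
C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock built on atomic_flag; stands in for a kernel spinlock. */
struct toy_lock { atomic_flag held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void toy_lock_release(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Drop one reference.  Returns true with the lock held if the count
 * reached zero; otherwise returns false with the lock not held.
 */
static bool ref_dec_and_lock(atomic_int *ref, struct toy_lock *l)
{
	int old = atomic_load(ref);

	assert(old > 0);	/* caller must own a reference */

	/* Fast path: decrement without the lock unless we might hit zero. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(ref, &old, old - 1))
			return false;
	}

	/* Slow path: the possibly-final decrement happens under the lock. */
	toy_lock_acquire(l);
	if (atomic_fetch_sub(ref, 1) == 1)
		return true;
	toy_lock_release(l);
	return false;
}
```

Because the transition to zero only ever happens with the lock held, a
lookup that takes the same lock before bumping the count cannot revive
an entry that is in the middle of being torn down.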

>> +
>> +typedef struct nfsd_stats_hash               nfsd_stats_hash_t;
>> +typedef struct nfsd_stats_hentry     nfsd_stats_hentry_t;
>
> Absent unusual circumstances, standard kernel style is to drop the
> typedefs and use "struct nfsd_stats_{hash,hentry}" everywhere.

Sorry, it's a disgusting habit and I'll stop it right now.

-- 
Greg.

