All of lore.kernel.org
 help / color / mirror / Atom feed
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Greg Banks <gnb-xTcybq6BZ68@public.gmane.org>
Cc: Linux NFS ML <linux-nfs@vger.kernel.org>
Subject: Re: [patch 02/29] knfsd: Add stats table infrastructure.
Date: Fri, 24 Apr 2009 23:56:24 -0400	[thread overview]
Message-ID: <20090425035624.GC24770@fieldses.org> (raw)
In-Reply-To: <20090331202938.445359000@sgi.com>

On Wed, Apr 01, 2009 at 07:28:02AM +1100, Greg Banks wrote:
> This infrastructure will be used to implement per-client and per-export
> serverside stats.  Multiple stats objects are kept in a hashtable,
> keyed by a string name (e.g. client IP address or export path).
> Old entries are pruned from the table using a timer.  The function
> nfsd_stats_find() can be used to find an entry and create it if
> necessary.
> 
> Signed-off-by: Greg Banks <gnb@sgi.com>
> ---
> 
>  fs/nfsd/stats.c            |  231 ++++++++++++++++++++++++++++++++++
>  include/linux/nfsd/debug.h |    1 
>  include/linux/nfsd/stats.h |   43 ++++++
>  3 files changed, 275 insertions(+)
> 
> Index: bfields/fs/nfsd/stats.c
> ===================================================================
> --- bfields.orig/fs/nfsd/stats.c
> +++ bfields/fs/nfsd/stats.c
> @@ -29,17 +29,29 @@
>  #include <linux/seq_file.h>
>  #include <linux/stat.h>
>  #include <linux/module.h>
> +#include <linux/jhash.h>
> +#include <linux/list.h>
> +#include <linux/swap.h>
> +#include <linux/log2.h>
>  
>  #include <linux/sunrpc/svc.h>
>  #include <linux/sunrpc/stats.h>
>  #include <linux/nfsd/nfsd.h>
>  #include <linux/nfsd/stats.h>
>  
> +#define NFSDDBG_FACILITY		NFSDDBG_STATS
> +
> +#define hentry_from_hnode(hn) \
> +	hlist_entry((hn), nfsd_stats_hentry_t, se_node)
> +
>  struct nfsd_stats	nfsdstats;
>  struct svc_stat		nfsd_svcstats = {
>  	.program	= &nfsd_program,
>  };
>  
> +int nfsd_stats_enabled = 1;
> +int nfsd_stats_prune_period = 2*86400;

For those of us that don't immediately recognize 86400 as the number of
seconds in a day, writing that out as " = 2*24*60*60;" could be a useful
hint.

Also nice: a comment with any rationale (however minor) for the choice
of period.

> +
>  static int nfsd_proc_show(struct seq_file *seq, void *v)
>  {
>  	int i;
> @@ -98,6 +110,225 @@ static const struct file_operations nfsd
>  	.release = single_release,
>  };
>  
> +
> +/*
> + * Stats hash pruning works thus.  A scan is run every prune period.
> + * On every scan, hentries with the OLD flag are detached and
> + * a reference dropped (usually that will be the last reference
> + * and the hentry will be deleted).  Hentries without the OLD flag
> + * have the OLD flag set; the flag is reset in nfsd_stats_get().
> + * So hentries with active traffic in the last 2 prune periods
> + * are not candidates for pruning.

s/2 prune periods/prune period/ ?

(From the description above: on exit from nfsd_stats_prune() all
remaining entries have OLD set.  Therefore if an entry is not touched in
the single period between two nfsd_stats_prune()'s, the second
nfsd_stats_prune() run will drop it.)

> + */
> +static void nfsd_stats_prune(unsigned long closure)
> +{
> +	nfsd_stats_hash_t *sh = (nfsd_stats_hash_t *)closure;
> +	unsigned int i;
> +	nfsd_stats_hentry_t *se;
> +	struct hlist_node *hn, *next;
> +	struct hlist_head to_be_dropped = HLIST_HEAD_INIT;
> +
> +	dprintk("nfsd_stats_prune\n");
> +
> +	if (!down_write_trylock(&sh->sh_sem)) {
> +		/* hash is busy...try again in a second */
> +		dprintk("nfsd_stats_prune: busy\n");
> +		mod_timer(&sh->sh_prune_timer, jiffies + HZ);

Could we make sh_sem a spinlock?  It doesn't look the the critical
sections ever need to sleep.

(Or even consider rcu, if we need the read lock on every rpc?  OK, I'm
mostly ignorant of rcu.)

> +		return;
> +	}
> +
> +	for (i = 0 ; i < sh->sh_size ; i++) {
> +		hlist_for_each_entry_safe(se, hn, next, &sh->sh_hash[i], se_node) {
> +			if (!test_and_set_bit(NFSD_STATS_HENTRY_OLD, &se->se_flags))

It looks like this is only ever used under the lock, so the
test_and_set_bit() is overkill.

> +				continue;
> +			hlist_del_init(&se->se_node);
> +			hlist_add_head(&se->se_node, &to_be_dropped);

Replace those two by hlist_move_list?

> +		}
> +	}
> +
> +	up_write(&sh->sh_sem);
> +
> +	dprintk("nfsd_stats_prune: deleting\n");
> +	hlist_for_each_entry_safe(se, hn, next, &to_be_dropped, se_node)
> +		nfsd_stats_put(se);

nfsd_stats_put() can down a semaphore, which we probably don't want in a
timer.  (So: make sh_sem a spinlock?)

> +
> +	mod_timer(&sh->sh_prune_timer, jiffies + nfsd_stats_prune_period * HZ);
> +}
> +
> +/*
> + * Initialise a stats hash.  Array size scales with
> + * server memory, as a loose heuristic for how many
> + * clients or exports a server is likely to have.
> + */
> +static void nfsd_stats_hash_init(nfsd_stats_hash_t *sh, const char *which)
> +{
> +	unsigned int nbits;
> +	unsigned int i;
> +
> +	init_rwsem(&sh->sh_sem);
> +
> +	nbits = 5 + ilog2(totalram_pages >> (30-PAGE_SHIFT));
> +	sh->sh_size = (1<<nbits);
> +	sh->sh_mask = (sh->sh_size-1);

Some comment on the choice of scale factor?  Also, see:

	http://marc.info/?l=linux-kernel&m=118299825922287&w=2

and followups.

Might consider a little helper function to do this kind of
fraction-of-total-memory calculation since I think the server does it in
3 or 4 places.

> +
> +	sh->sh_hash = kmalloc(sizeof(struct hlist_head) * sh->sh_size, GFP_KERNEL);

Can this be a more than a page?  (If so, could we just cap it at that
size to avoid >order-0 allocations and keep the kmalloc failure
unlikely?)

> +	if (sh->sh_hash == NULL) {
> +		printk(KERN_ERR "failed to allocate knfsd %s stats hashtable\n", which);
> +		/* struggle on... */
> +		return;
> +	}
> +	printk(KERN_INFO "knfsd %s stats hashtable, %u entries\n", which, sh->sh_size);

Eh.  Make it a dprintk?  Or maybe expose this in the nfsd filesystem if
it's not already?

> +
> +	for (i = 0 ; i < sh->sh_size ; i++)
> +		INIT_HLIST_HEAD(&sh->sh_hash[i]);
> +
> +	/* start the prune timer */
> +	init_timer(&sh->sh_prune_timer);
> +	sh->sh_prune_timer.function = nfsd_stats_prune;
> +	sh->sh_prune_timer.expires = jiffies + nfsd_stats_prune_period * HZ;
> +	sh->sh_prune_timer.data = (unsigned long)sh;
> +}
> +
> +/*
> + * Destroy a stats hash.  Drop what should be the last
> + * reference on all hentries, clean up the timer, and
> + * free the hash array.
> + */
> +static void nfsd_stats_hash_destroy(nfsd_stats_hash_t *sh)
> +{
> +	unsigned int i;
> +	nfsd_stats_hentry_t *se;
> +
> +	del_timer_sync(&sh->sh_prune_timer);
> +
> +	/* drop the last reference for all remaining hentries */
> +	for (i = 0 ; i < sh->sh_size ; i++) {
> +		struct hlist_head *hh = &sh->sh_hash[i];
> +
> +		while (hh->first != NULL) {
> +			se = hentry_from_hnode(hh->first);
> +			BUG_ON(atomic_read(&se->se_refcount) != 1);
> +			nfsd_stats_put(se);
> +		}
> +	}
> +
> +	if (sh->sh_hash != NULL) {

Drop the NULL check.

> +		kfree(sh->sh_hash);
> +	}
> +}
> +
> +/*
> + * Find and return a hentry for the given name, with a new refcount,
> + * creating it if necessary.  Will only return NULL on OOM or if
> + * stats are disabled.  Does it's own locking using the hash rwsem;
> + * may sleep.
> + */
> +nfsd_stats_hentry_t *nfsd_stats_find(nfsd_stats_hash_t *sh,
> +				     const char *name, int len)
> +{
> +	u32 hash;
> +	nfsd_stats_hentry_t *se, *new = NULL;
> +	struct hlist_node *hn;
> +
> +	dprintk("nfsd_stats_find: name %s len %d\n", name, len);
> +
> +	if (!nfsd_stats_enabled || sh->sh_hash == NULL)
> +		return NULL;
> +
> +
> +	/* search the hash table */
> +	hash = jhash(name, len, 0xfeedbeef) & sh->sh_mask;
> +	down_read(&sh->sh_sem);
> +	hlist_for_each_entry(se, hn, &sh->sh_hash[hash], se_node) {
> +		if (!strcmp(se->se_name, name)) {
> +			/* found matching */
> +			dprintk("nfsd_stats_find: found %s\n", se->se_name);
> +			nfsd_stats_get(se);
> +			up_read(&sh->sh_sem);
> +			return se;
> +		}
> +	}
> +	up_read(&sh->sh_sem);
> +
> +	/* not found, create a new one */
> +	dprintk("nfsd_stats_find: allocating new for %s\n", name);
> +	new = (nfsd_stats_hentry_t *)kmalloc(sizeof(*new), GFP_KERNEL);
> +	if (new == NULL)
> +		return NULL;
> +	/* initialise */
> +
> +	new->se_name = kmalloc(len+1, GFP_KERNEL);
> +	if (new->se_name == NULL) {
> +		kfree(new);
> +		return NULL;
> +	}
> +
> +	memcpy(new->se_name, name, len+1);
> +	atomic_set(&new->se_refcount, 2);/* 1 for the caller, 1 for the hash */
> +	new->se_hash = sh;
> +	new->se_flags = 0;
> +	INIT_HLIST_NODE(&new->se_node);
> +	memset(&new->se_data, 0, sizeof(new->se_data));
> +
> +	/* attach to the hash datastructure */
> +
> +	/*
> +	 * first check to see if we lost a race and some
> +	 * other thread already added a matching hentry.
> +	 */
> +	down_write(&sh->sh_sem);
> +	hlist_for_each_entry(se, hn, &sh->sh_hash[hash], se_node) {
> +		if (!strcmp(se->se_name, name)) {
> +			/* found matching, use that instead */
> +			dprintk("nfsd_stats_find: found(2) %s\n", name);
> +			kfree(new->se_name);
> +			kfree(new);
> +			nfsd_stats_get(se);
> +			up_write(&sh->sh_sem);
> +			return se;
> +		}
> +	}
> +	/* still not there, insert new one into the hash */
> +	hlist_add_head(&new->se_node, &sh->sh_hash[hash]);
> +
> +	up_write(&sh->sh_sem);
> +	return new;
> +}
> +
> +/*
> + * Drop a reference to a hentry, deleting the hentry if this
> + * was the last reference.  Does it's own locking using the

s/it's/its/

(Contending for the nitpick-of-the-day award.)

> + * hash rwsem; may sleep.
> + */
> +void
> +nfsd_stats_put(nfsd_stats_hentry_t *se)
> +{
> +	nfsd_stats_hash_t *sh = se->se_hash;
> +
> +	if (!atomic_dec_and_test(&se->se_refcount))
> +		return;
> +
> +	/* just dropped the last reference */
> +	down_write(&sh->sh_sem);
> +
> +	if (atomic_read(&se->se_refcount)) {
> +		/*
> +		 * We lost a race getting the write lock, and
> +		 * now there's a reference again.  Whatever.
> +		 */

Some kind of atomic_dec_and_lock() might close the race.

> +		goto out_unlock;
> +	}
> +
> +	dprintk("nfsd_stats_put: freeing %s\n", se->se_name);
> +	hlist_del(&se->se_node);
> +	kfree(se->se_name);
> +	kfree(se);
> +
> +out_unlock:
> +	up_write(&sh->sh_sem);
> +}
> +
> +
>  void
>  nfsd_stat_init(void)
>  {
> Index: bfields/include/linux/nfsd/stats.h
> ===================================================================
> --- bfields.orig/include/linux/nfsd/stats.h
> +++ bfields/include/linux/nfsd/stats.h
> @@ -40,6 +40,37 @@ struct nfsd_stats {
>  
>  };
>  
> +struct nfsd_op_stats {
> +	/* nothing to see here, yet */
> +};
> +
> +
> +typedef struct nfsd_stats_hash		nfsd_stats_hash_t;
> +typedef struct nfsd_stats_hentry	nfsd_stats_hentry_t;

Absent unusual circumstances, standard kernel style is to drop the
typedefs and use "struct nfsd_stats_{hash,hentry}" everywhere.

--b.

> +
> +/* Entry in the export and client stats hashtables */
> +struct nfsd_stats_hentry {
> +	struct hlist_node	se_node;	/* links hash chains */
> +	char			*se_name;
> +	atomic_t		se_refcount;	/* 1 for each user + 1 for hash */
> +#define NFSD_STATS_HENTRY_OLD	0
> +	unsigned long		se_flags;
> +	nfsd_stats_hash_t	*se_hash;
> +	struct nfsd_op_stats	se_data;
> +};
> +
> +/*
> + * Hashtable structure for export and client stats.
> + * Table width is chosen at boot time to scale with
> + * the size of the server.
> + */
> +struct nfsd_stats_hash {
> +	struct rw_semaphore	sh_sem;
> +	unsigned int		sh_size;
> +	unsigned int		sh_mask;
> +	struct hlist_head	*sh_hash;
> +	struct timer_list	sh_prune_timer;
> +};
>  
>  extern struct nfsd_stats	nfsdstats;
>  extern struct svc_stat		nfsd_svcstats;
> @@ -47,5 +78,17 @@ extern struct svc_stat		nfsd_svcstats;
>  void	nfsd_stat_init(void);
>  void	nfsd_stat_shutdown(void);
>  
> +extern nfsd_stats_hentry_t *nfsd_stats_find(nfsd_stats_hash_t *,
> +					    const char *name, int len);
> +static inline void
> +nfsd_stats_get(nfsd_stats_hentry_t *se)
> +{
> +	atomic_inc(&se->se_refcount);
> +	clear_bit(NFSD_STATS_HENTRY_OLD, &se->se_flags);
> +}
> +extern void nfsd_stats_put(nfsd_stats_hentry_t *se);
> +
> +
> +
>  #endif /* __KERNEL__ */
>  #endif /* LINUX_NFSD_STATS_H */
> Index: bfields/include/linux/nfsd/debug.h
> ===================================================================
> --- bfields.orig/include/linux/nfsd/debug.h
> +++ bfields/include/linux/nfsd/debug.h
> @@ -32,6 +32,7 @@
>  #define NFSDDBG_REPCACHE	0x0080
>  #define NFSDDBG_XDR		0x0100
>  #define NFSDDBG_LOCKD		0x0200
> +#define NFSDDBG_STATS		0x0400
>  #define NFSDDBG_ALL		0x7FFF
>  #define NFSDDBG_NOCHANGE	0xFFFF
>  
> 
> --
> Greg

  reply	other threads:[~2009-04-25  3:56 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-31 20:28 [patch 00/29] SGI enhancedNFS patches Greg Banks
2009-03-31 20:28 ` [patch 01/29] knfsd: Add infrastructure for measuring RPC service times Greg Banks
2009-04-25  2:13   ` J. Bruce Fields
2009-04-25  2:14     ` J. Bruce Fields
2009-04-25  2:52     ` Greg Banks
2009-03-31 20:28 ` [patch 02/29] knfsd: Add stats table infrastructure Greg Banks
2009-04-25  3:56   ` J. Bruce Fields [this message]
2009-04-26  4:12     ` Greg Banks
2009-03-31 20:28 ` [patch 03/29] knfsd: add userspace controls for stats tables Greg Banks
2009-04-25 21:57   ` J. Bruce Fields
2009-04-25 22:03     ` J. Bruce Fields
2009-04-27 16:06       ` Chuck Lever
2009-04-27 23:22         ` J. Bruce Fields
2009-04-28 15:37           ` Chuck Lever
2009-04-28 15:57             ` J. Bruce Fields
2009-04-28 16:03               ` Chuck Lever
2009-04-28 16:26                 ` J. Bruce Fields
2009-04-29  1:45               ` Greg Banks
     [not found]         ` <ac442c870904271827w6041a67ew82fe36a843beeac3@mail.gmail.com>
     [not found]           ` <ac442c870904271827w6041a67ew82fe36a843beeac3-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-04-28  1:31             ` Greg Banks
2009-04-26  4:14     ` Greg Banks
2009-03-31 20:28 ` [patch 04/29] knfsd: Add stats updating API Greg Banks
2009-03-31 20:28 ` [patch 05/29] knfsd: Infrastructure for providing stats to userspace Greg Banks
2009-04-01  0:28   ` J. Bruce Fields
2009-04-01  3:43     ` Greg Banks
2009-03-31 20:28 ` [patch 06/29] knfsd: Gather per-export stats Greg Banks
2009-03-31 20:28 ` [patch 07/29] knfsd: Prefetch the per-export stats entry Greg Banks
2009-03-31 20:28 ` [patch 08/29] knfsd: Gather per-client stats Greg Banks
2009-03-31 20:28 ` [patch 09/29] knfsd: Cache per-client stats entry on TCP transports Greg Banks
2009-03-31 20:28 ` [patch 10/29] knfsd: Update per-client & per-export stats from NFSv3 Greg Banks
2009-03-31 20:28 ` [patch 11/29] knfsd: Update per-client & per-export stats from NFSv2 Greg Banks
2009-03-31 20:28 ` [patch 12/29] knfsd: Update per-client & per-export stats from NFSv4 Greg Banks
2009-03-31 20:28 ` [patch 13/29] knfsd: reply cache cleanups Greg Banks
2009-05-12 19:54   ` J. Bruce Fields
2009-03-31 20:28 ` [patch 14/29] knfsd: better hashing in the reply cache Greg Banks
2009-05-08 22:01   ` J. Bruce Fields
2009-03-31 20:28 ` [patch 15/29] knfsd: fix reply cache memory corruption Greg Banks
2009-05-12 19:55   ` J. Bruce Fields
2009-03-31 20:28 ` [patch 16/29] knfsd: use client IPv4 address in reply cache hash Greg Banks
2009-05-11 21:48   ` J. Bruce Fields
2009-03-31 20:28 ` [patch 17/29] knfsd: make the reply cache SMP-friendly Greg Banks
2009-03-31 20:28 ` [patch 18/29] knfsd: dynamically expand the reply cache Greg Banks
2009-05-26 18:57   ` J. Bruce Fields
2009-05-26 19:04     ` J. Bruce Fields
2009-05-26 21:24     ` Rob Gardner
2009-05-26 21:52       ` J. Bruce Fields
2009-05-27  0:28       ` Greg Banks
2009-03-31 20:28 ` [patch 19/29] knfsd: faster probing in " Greg Banks
2009-03-31 20:28 ` [patch 20/29] knfsd: add extended reply cache stats Greg Banks
2009-03-31 20:28 ` [patch 21/29] knfsd: remove unreported filehandle stats counters Greg Banks
2009-05-12 20:00   ` J. Bruce Fields
2009-03-31 20:28 ` [patch 22/29] knfsd: make svc_authenticate() scale Greg Banks
2009-05-12 21:24   ` J. Bruce Fields
2009-03-31 20:28 ` [patch 23/29] knfsd: introduce SVC_INC_STAT Greg Banks
2009-03-31 20:28 ` [patch 24/29] knfsd: remove the program field from struct svc_stat Greg Banks
2009-03-31 20:28 ` [patch 25/29] knfsd: allocate svc_serv.sv_stats dynamically Greg Banks
2009-03-31 20:28 ` [patch 26/29] knfsd: make svc_serv.sv_stats per-CPU Greg Banks
2009-03-31 20:28 ` [patch 27/29] knfsd: move hot procedure count field out of svc_procedure Greg Banks
2009-03-31 20:28 ` [patch 28/29] knfsd: introduce NFSD_INC_STAT() Greg Banks
2009-03-31 20:28 ` [patch 29/29] knfsd: make nfsdstats per-CPU Greg Banks
2009-04-01  0:23 ` [patch 00/29] SGI enhancedNFS patches J. Bruce Fields
2009-04-01  3:32   ` Greg Banks
     [not found]     ` <ac442c870903312032t34630c6dvdbb644cb510f8079-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-04-01  6:34       ` Jeff Garzik
2009-04-01  6:41         ` Greg Banks

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090425035624.GC24770@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=gnb-xTcybq6BZ68@public.gmane.org \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.