Re: [RFC] [PATCH] To improve kretprobe scalability

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@linux-foundation.org>
To: Srinivasa D S <srinivasa@in.ibm.com>
Cc: linux-kernel@vger.kernel.org, ananth@in.ibm.com,
	mhiramat@redhat.com, jkenisto@us.ibm.com,
	srikar@linux.vnet.ibm.com
Subject: Re: [RFC] [PATCH] To improve kretprobe scalability
Date: Wed, 21 May 2008 16:32:35 -0700	[thread overview]
Message-ID: <20080521163235.0fcb39ed.akpm@linux-foundation.org> (raw)
In-Reply-To: <200805210632.17655.srinivasa@in.ibm.com>

On Wed, 21 May 2008 06:32:17 +0530
Srinivasa D S <srinivasa@in.ibm.com> wrote:

> 
> Currently list of kretprobe instances are stored in kretprobe object (as
> used_instances,free_instances) and in kretprobe hash table. We have one 
> global kretprobe lock to serialise the access to these lists. This causes 
> only one kretprobe handler to execute at a time. Hence affects system 
> performance, particularly on SMP systems and when return probe is set on
> lot of functions (like on all systemcalls).
> 
> Solution proposed here gives fine-grain locks that performs better on SMP
> system compared to present kretprobe implementation.
> 
> Solution:
>    1) Instead of having one global lock to protect kretprobe instances  
> present in kretprobe object and kretprobe hash table. We will have two locks, 
> one lock for protecting kretprobe hash table and another lock for kretporbe 
> object.
> 	
>   2) We hold lock present in kretprobe object while we modify kretprobe 
> instance in kretprobe object and we hold per-hash-list lock while modifying 
> kretprobe instances present in that hash list. To prevent deadlock, we never 
> grab a per-hash-list lock while holding a kretprobe lock.
> 
>   3) We can remove used_instances from struct kretprobe, as we can track used
> instances of kretprobe instances using kretprobe hash table.

Are you sure that all the code which was previously globally serialised
is safe to run concurrently?

> Time duration for  kernel compilation ("make -j 8") on a 8-way ppc64 system
> with return probes set on all systemcalls looks like this.
> 
> Patched-kernel                            Un-Patched kernel
> ============================================================
> real    10m49.694s                       real    13m21.276s
> user    35m6.117s                        user    40m30.454s
> sys     2m55.480s                        sys     3m23.221s
> ===========================================================

That's a large speedup.

Do you have available the figures for the same workload when no probes
are set at all?

> --- linux-2.6.26-rc2-mm1.orig/kernel/kprobes.c
> +++ linux-2.6.26-rc2-mm1/kernel/kprobes.c
> @@ -62,6 +62,7 @@
>  	addr = ((kprobe_opcode_t *)(kallsyms_lookup_name(name)))
>  #endif
>  
> +static int kprobes_initialized;
>  static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
>  static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
>  
> @@ -69,8 +70,8 @@ static struct hlist_head kretprobe_inst_
>  static bool kprobe_enabled;
>  
>  DEFINE_MUTEX(kprobe_mutex);		/* Protects kprobe_table */
> -DEFINE_SPINLOCK(kretprobe_lock);	/* Protects kretprobe_inst_table */
>  static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL;
> +static spinlock_t kretprobe_table_locks[KPROBE_TABLE_SIZE];

An array of 64 spinlocks.  These will all be crammed into two or four
cachelines.  If there is really any concurrency in here, the contention
will be large.

I suggest that you experiment with one-lock-per-cacheline here and see
whether it is worth doing that permanently.

Something along the lines of

static struct {
	spinlock_t lock __cacheline_aligned;
} kretprobe_locks[KPROBE_TABLE_SIZE];

static spinlock_t *kretprobe_lock_ptr(...)
{
	return kretprobe_locks + hash(...);
}


Also, none of this new code is needed on uniprocessor builds.  Did we
just add some bloat?

> -struct hlist_head __kprobes *kretprobe_inst_table_head(struct task_struct 
> *tsk)
> +void get_kretprobe_hash_lock(struct task_struct *tsk,
> +			 struct hlist_head **head, unsigned long *flags)
> +{
> +	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
> +	spinlock_t *hlist_lock;
> +
> +	*head = &kretprobe_inst_table[hash];
> +	hlist_lock = &kretprobe_table_locks[hash];
> +	spin_lock_irqsave(hlist_lock, *flags);
> +}
> +
> +void put_kretprobe_hash_lock(struct task_struct *tsk, unsigned long *flags)
>  {
> -	return &kretprobe_inst_table[hash_ptr(tsk, KPROBE_HASH_BITS)];
> +	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
> +	spinlock_t *hlist_lock;
> +
> +	hlist_lock = &kretprobe_table_locks[hash];
> +	spin_unlock_irqrestore(hlist_lock, *flags);
>  }

I think these functions are misnamed.  Generally functions whose names
are of the form get_foo() and put_foo() are used for acquiring and
releasing reference counts.  These functions don't do that.

kretprobe_hash_lock() and kretprobe_hash_unlock() would be better?

>  /*
> @@ -401,17 +417,21 @@ void __kprobes kprobe_flush_task(struct 
>  	struct kretprobe_instance *ri;
>  	struct hlist_head *head, empty_rp;
>  	struct hlist_node *node, *tmp;
> -	unsigned long flags = 0;
> +	unsigned long hash, flags = 0;
>  
> -	INIT_HLIST_HEAD(&empty_rp);
> -	spin_lock_irqsave(&kretprobe_lock, flags);
> -	head = kretprobe_inst_table_head(tk);
> +	if (unlikely(!kprobes_initialized))
> +		/* Early boot.  kretprobe_table_locks not yet initialized. */
> +		return;
> +
> +	hash = hash_ptr(tk, KPROBE_HASH_BITS);
> +	head = &kretprobe_inst_table[hash];
> +	spin_lock_irqsave(&kretprobe_table_locks[hash], flags);

that's an open-coded copy of get_kretprobe_hash_lock()?

>  	hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
>  		if (ri->task == tk)
>  			recycle_rp_inst(ri, &empty_rp);
>  	}
> -	spin_unlock_irqrestore(&kretprobe_lock, flags);
> -
> +	spin_unlock_irqrestore(&kretprobe_table_locks[hash], flags);

and of put_kretprobe_hash_lock().  Perhaps a separate function which
just obtains the spinlock_t* is needed.

> +	INIT_HLIST_HEAD(&empty_rp);
>  	hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
>  		hlist_del(&ri->hlist);
>  		kfree(ri);
> @@ -423,24 +443,29 @@ static inline void free_rp_inst(struct k
>  	struct kretprobe_instance *ri;
>  	struct hlist_node *pos, *next;
>  
> -	hlist_for_each_entry_safe(ri, pos, next, &rp->free_instances, uflist) {
> -		hlist_del(&ri->uflist);
> +	hlist_for_each_entry_safe(ri, pos, next, &rp->free_instances, hlist) {
> +		hlist_del(&ri->hlist);
>  		kfree(ri);
>  	}
>  }
>

next prev parent reply	other threads:[~2008-05-21 23:33 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-21  1:02 [RFC] [PATCH] To improve kretprobe scalability Srinivasa D S
2008-05-21 23:32 ` Andrew Morton [this message]
2008-05-22  8:26   ` Srinivasa D S
2008-05-27  8:22     ` Ananth N Mavinakayanahalli
2008-05-22  7:07 ` Abhishek Sagar
2008-05-22  8:42   ` Srinivasa DS
2008-05-22 12:16     ` Abhishek Sagar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080521163235.0fcb39ed.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=ananth@in.ibm.com \
    --cc=jkenisto@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@redhat.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=srinivasa@in.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.