Date: Wed, 21 May 2008 16:32:35 -0700
From: Andrew Morton
To: Srinivasa D S
Cc: linux-kernel@vger.kernel.org, ananth@in.ibm.com, mhiramat@redhat.com,
	jkenisto@us.ibm.com, srikar@linux.vnet.ibm.com
Subject: Re: [RFC] [PATCH] To improve kretprobe scalability
Message-Id: <20080521163235.0fcb39ed.akpm@linux-foundation.org>
In-Reply-To: <200805210632.17655.srinivasa@in.ibm.com>
References: <200805210632.17655.srinivasa@in.ibm.com>

On Wed, 21 May 2008 06:32:17 +0530 Srinivasa D S wrote:

> 
> Currently the list of kretprobe instances is stored in the kretprobe
> object (as used_instances, free_instances) and in the kretprobe hash
> table.  We have one global kretprobe lock to serialise access to these
> lists.  This allows only one kretprobe handler to execute at a time,
> which hurts system performance, particularly on SMP systems and when
> return probes are set on a lot of functions (such as all system calls).
> 
> The solution proposed here uses fine-grained locks and performs better
> on SMP systems than the present kretprobe implementation.
> 
> Solution:
> 
> 1) Instead of having one global lock to protect kretprobe instances
> present in the kretprobe object and the kretprobe hash table.
> We will have two locks: one for protecting the kretprobe hash table
> and another for the kretprobe object.
> 
> 2) We hold the lock in the kretprobe object while we modify a kretprobe
> instance in the kretprobe object, and we hold the per-hash-list lock
> while modifying kretprobe instances present in that hash list.  To
> prevent deadlock, we never grab a per-hash-list lock while holding a
> kretprobe lock.
> 
> 3) We can remove used_instances from struct kretprobe, as we can track
> used kretprobe instances using the kretprobe hash table.

Are you sure that all the code which was previously globally serialised
is safe to run concurrently?

> Time duration for kernel compilation ("make -j 8") on an 8-way ppc64
> system with return probes set on all system calls looks like this.
> 
> 	Patched kernel			Un-patched kernel
> 	============================================================
> 	real	10m49.694s		real	13m21.276s
> 	user	35m6.117s		user	40m30.454s
> 	sys	2m55.480s		sys	3m23.221s
> 	============================================================

That's a large speedup.  Do you have available the figures for the same
workload when no probes are set at all?

> --- linux-2.6.26-rc2-mm1.orig/kernel/kprobes.c
> +++ linux-2.6.26-rc2-mm1/kernel/kprobes.c
> @@ -62,6 +62,7 @@
>  	addr = ((kprobe_opcode_t *)(kallsyms_lookup_name(name)))
>  #endif
>  
> +static int kprobes_initialized;
>  static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
>  static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
>  
> @@ -69,8 +70,8 @@ static struct hlist_head kretprobe_inst_
>  static bool kprobe_enabled;
>  
>  DEFINE_MUTEX(kprobe_mutex);	/* Protects kprobe_table */
> -DEFINE_SPINLOCK(kretprobe_lock);	/* Protects kretprobe_inst_table */
>  static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL;
> +static spinlock_t kretprobe_table_locks[KPROBE_TABLE_SIZE];

An array of 64 spinlocks.  These will all be crammed into two or four
cachelines.
If there is really any concurrency in here, the contention will be
large.  I suggest that you experiment with one-lock-per-cacheline here
and see whether it is worth doing that permanently.  Something along the
lines of

	static struct {
		spinlock_t lock __cacheline_aligned;
	} kretprobe_locks[KPROBE_TABLE_SIZE];

	static spinlock_t *kretprobe_lock_ptr(...)
	{
		return &kretprobe_locks[hash(...)].lock;
	}

Also, none of this new code is needed on uniprocessor builds.  Did we
just add some bloat?

> -struct hlist_head __kprobes *kretprobe_inst_table_head(struct task_struct
> *tsk)
> +void get_kretprobe_hash_lock(struct task_struct *tsk,
> +			struct hlist_head **head, unsigned long *flags)
> +{
> +	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
> +	spinlock_t *hlist_lock;
> +
> +	*head = &kretprobe_inst_table[hash];
> +	hlist_lock = &kretprobe_table_locks[hash];
> +	spin_lock_irqsave(hlist_lock, *flags);
> +}
> +
> +void put_kretprobe_hash_lock(struct task_struct *tsk, unsigned long *flags)
>  {
> -	return &kretprobe_inst_table[hash_ptr(tsk, KPROBE_HASH_BITS)];
> +	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
> +	spinlock_t *hlist_lock;
> +
> +	hlist_lock = &kretprobe_table_locks[hash];
> +	spin_unlock_irqrestore(hlist_lock, *flags);
>  }

I think these functions are misnamed.  Generally functions whose names
are of the form get_foo() and put_foo() are used for acquiring and
releasing reference counts.  These functions don't do that.

kretprobe_hash_lock() and kretprobe_hash_unlock() would be better?

> /*
> @@ -401,17 +417,21 @@ void __kprobes kprobe_flush_task(struct
>  	struct kretprobe_instance *ri;
>  	struct hlist_head *head, empty_rp;
>  	struct hlist_node *node, *tmp;
> -	unsigned long flags = 0;
> +	unsigned long hash, flags = 0;
>  
> -	INIT_HLIST_HEAD(&empty_rp);
> -	spin_lock_irqsave(&kretprobe_lock, flags);
> -	head = kretprobe_inst_table_head(tk);
> +	if (unlikely(!kprobes_initialized))
> +		/* Early boot.  kretprobe_table_locks not yet initialized.
> +		 */
> +		return;
> +
> +	hash = hash_ptr(tk, KPROBE_HASH_BITS);
> +	head = &kretprobe_inst_table[hash];
> +	spin_lock_irqsave(&kretprobe_table_locks[hash], flags);

that's an open-coded copy of get_kretprobe_hash_lock()?

>  	hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
>  		if (ri->task == tk)
>  			recycle_rp_inst(ri, &empty_rp);
>  	}
> -	spin_unlock_irqrestore(&kretprobe_lock, flags);
> -
> +	spin_unlock_irqrestore(&kretprobe_table_locks[hash], flags);

and of put_kretprobe_hash_lock().  Perhaps a separate function which
just obtains the spinlock_t* is needed.

> +	INIT_HLIST_HEAD(&empty_rp);
>  	hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
>  		hlist_del(&ri->hlist);
>  		kfree(ri);
> @@ -423,24 +443,29 @@ static inline void free_rp_inst(struct k
>  	struct kretprobe_instance *ri;
>  	struct hlist_node *pos, *next;
>  
> -	hlist_for_each_entry_safe(ri, pos, next, &rp->free_instances, uflist) {
> -		hlist_del(&ri->uflist);
> +	hlist_for_each_entry_safe(ri, pos, next, &rp->free_instances, hlist) {
> +		hlist_del(&ri->hlist);
>  		kfree(ri);
>  	}
>  }