From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35])
	by kanga.kvack.org (Postfix) with ESMTP id A4FE0900087
	for ; Tue, 19 Apr 2011 06:58:54 -0400 (EDT)
Subject: Re: [PATCH v3 2.6.39-rc1-tip 12/26] 12: uprobes: slot allocation for uprobes
From: Peter Zijlstra
In-Reply-To: <20110419062654.GB10698@linux.vnet.ibm.com>
References: <20110401143223.15455.19844.sendpatchset@localhost6.localdomain6>
	<20110401143457.15455.64839.sendpatchset@localhost6.localdomain6>
	<1303145171.32491.886.camel@twins>
	<20110419062654.GB10698@linux.vnet.ibm.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Date: Tue, 19 Apr 2011 11:11:48 +0200
Message-ID: <1303204308.32491.923.camel@twins>
Mime-Version: 1.0
Sender: owner-linux-mm@kvack.org
List-ID:
To: Srikar Dronamraju
Cc: James Morris, Ingo Molnar, Steven Rostedt, Linux-mm,
	Arnaldo Carvalho de Melo, Linus Torvalds, Jonathan Corbet,
	Christoph Hellwig, Masami Hiramatsu, Thomas Gleixner,
	Ananth N Mavinakayanahalli, Oleg Nesterov, Andrew Morton,
	Jim Keniston, Roland McGrath, Andi Kleen, LKML

(Dropped the systemtap list since it's misbehaving; please leave it out
of future postings.)

On Tue, 2011-04-19 at 11:56 +0530, Srikar Dronamraju wrote:
> > > TODO: On massively threaded processes (or if a huge number of processes
> > > share the same mm), there is a possiblilty of running out of slots.
> > > One alternative could be to extend the slots as when slots are required.
> >
> > As long as you're single stepping things and not using boosted probes
> > you can fully serialize the slot usage. Claim a slot on trap and release
> > the slot on finish. Claiming can wait on a free slot since you already
> > have the whole SLEEPY thing.
> >
>
> Yes, thats certainly one approach but that approach makes every
> breakpoint hit contend for spinlock.
> (Infact we will have to change it
> to mutex lock (as you rightly pointed out) so that we allow threads to
> wait when slots are not free). Assuming a 4K page, we would be taxing
> applications that have less than 32 threads (which is probably the
> default case). If we continue with the current approach, then we
> could only add additional page(s) for apps which has more than 32
> threads and only when more than 32 __live__ threads have actually hit a
> breakpoint.

That very much depends on what you do; some folks think it's entirely
reasonable for processes to have thousands of threads. Now I completely
agree with you that that is not 'normal', but then I think using Java
isn't normal either ;-)

Anyway, avoiding that spinlock/mutex for each trap isn't hard; avoiding
a process-wide cacheline bounce is slightly harder but still not
impossible.

With 32 slots in 4k you have 128 bytes per slot to play with; all we
need is a single bit per slot to mark it as in use. If a task remembers
which slot it used last and tries to claim that one using an atomic
test-and-set on that bit, it will, in the 'normal' case, never contend
on a process-wide cacheline.

If it does find the slot taken, it'll have to take the slow route: scan
for a free slot and possibly wait for one to become free.
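
The claim scheme described above could be sketched roughly as follows.
This is userspace C11, not kernel code, and not taken from the uprobes
patch set. All names (struct slot, struct slot_area, struct task_slot,
slot_area_init, claim_slot, release_slot) are hypothetical. It assumes
the in-use flag lives inside each 128-byte slot, so the fast path only
touches the cacheline of the task's own last slot. Waiting for a free
slot is not modelled; the slow path just returns -1 where a real
implementation would sleep and retry.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>

#define NR_SLOTS   32   /* 4K page / 128-byte slots, as in the thread */
#define SLOT_BYTES 128

/* Hypothetical slot layout: the in-use flag sits in the slot itself,
 * and the alignment pads each slot to 128 bytes. */
struct slot {
    alignas(SLOT_BYTES) atomic_flag in_use;
};

/* Hypothetical per-mm slot page. */
struct slot_area {
    struct slot slots[NR_SLOTS];
};

/* Hypothetical per-task state: the slot this task used last. */
struct task_slot {
    unsigned int last;
};

void slot_area_init(struct slot_area *area)
{
    for (unsigned int i = 0; i < NR_SLOTS; i++)
        atomic_flag_clear(&area->slots[i].in_use);
}

/* Returns the claimed slot index, or -1 if every slot is busy. */
int claim_slot(struct slot_area *area, struct task_slot *t)
{
    assert(t->last < NR_SLOTS);

    /* Fast path: atomic test-and-set on the slot used last time. */
    if (!atomic_flag_test_and_set(&area->slots[t->last].in_use))
        return (int)t->last;

    /* Slow path: scan for any free slot and remember it. */
    for (unsigned int i = 0; i < NR_SLOTS; i++) {
        if (!atomic_flag_test_and_set(&area->slots[i].in_use)) {
            t->last = i;
            return (int)i;
        }
    }
    return -1; /* assumption: the caller would wait here and retry */
}

void release_slot(struct slot_area *area, unsigned int slot)
{
    assert(slot < NR_SLOTS);
    atomic_flag_clear(&area->slots[slot].in_use);
}
```

With fewer live threads than slots, each task settles on its own slot
and keeps hitting the fast path; only a collision on the remembered
slot forces the scan.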