From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755180AbWKMQdp (ORCPT ); Mon, 13 Nov 2006 11:33:45 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755192AbWKMQdp (ORCPT ); Mon, 13 Nov 2006 11:33:45 -0500 Received: from mx2.mail.elte.hu ([157.181.151.9]:20621 "EHLO mx2.mail.elte.hu") by vger.kernel.org with ESMTP id S1755180AbWKMQdn (ORCPT ); Mon, 13 Nov 2006 11:33:43 -0500 Date: Mon, 13 Nov 2006 17:32:16 +0100 From: Ingo Molnar To: Andi Kleen Cc: "Siddha, Suresh B" , Andrew Morton , linux-kernel@vger.kernel.org, ashok.raj@intel.com Subject: Re: [patch] genapic: optimize & fix APIC mode setup Message-ID: <20061113163216.GA3480@elte.hu> References: <20061111151414.GA32507@elte.hu> <200611131529.46464.ak@suse.de> <20061113150415.GA20321@elte.hu> <200611131710.13285.ak@suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200611131710.13285.ak@suse.de> User-Agent: Mutt/1.4.2.2i X-ELTE-SpamScore: -2.8 X-ELTE-SpamLevel: X-ELTE-SpamCheck: no X-ELTE-SpamVersion: ELTE 2.0 X-ELTE-SpamCheck-Details: score=-2.8 required=5.9 tests=ALL_TRUSTED,AWL,BAYES_50 autolearn=no SpamAssassin version=3.0.3 -3.3 ALL_TRUSTED Did not pass through any untrusted hosts 0.5 BAYES_50 BODY: Bayesian spam probability is 40 to 60% [score: 0.5000] -0.0 AWL AWL: From: address is in the auto white-list X-ELTE-VirusStatus: clean Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org * Andi Kleen wrote: > > > (IPIs tend to be thousands of cycles). [...] > > > > Programming the IPI itself is cheap. > > yep, that is why physical mode doesn't hurt much. and that is why your argument of thousands of cycles per IPI 'cost', in comparison to which any other cost is miniscule, was misleading IMO. > > > for_each_cpu_mask is essentially just BSF with some glue. > > > > yes - it's a minimum of 2 BSF scans, and that isnt exactly the cheapest > > of instructions. It should be tens of cycles or more, combined with the > > nonzero cost of passing a cpumask (which is 32 bytes) into the function > > by value ... > > Sorry, but everytime IPIs and all the other overhead etc. are involved > even 10 BSFs would be completely in the noise. The relative cost is > so much smaller. I think you're barking up the wrong tree here > regarding that loop. i'm puzzled, you just confirmed above what i said before, that programming the IPI is cheap... If something is cheap then other costs added to (such as cpu-mask scanning, and passing around the cpumask) definitely do matter. > There are surely inefficiencies in the code, but I don't think they're > related to cpu loops. it's the death by thousands of small cuts ... > [BTW the code for the cpu loop is much worse than it could be [1], but > that's a different story. But even with the worse code it doesn't > matter much] so by your argument nothing matters, as long as it's just a small cost?? My guesstimation is that the cost is a few dozen cycles per IPI. > > and your argument again, is to hide a hotplug bug > > I don't think it's a bug -- it's inherent. please explain. > > indeed, you are right. Btw., this is another change to io_apic.c that i > > would never have ACKed ;-) This whole thing is hidden via something that > > looks like a macro value: > > Yes agreed the magic macros are ugly. It has historical reasons from > i386 (and I'm partly to blame). Essentially it comes out of the mess > that i386 subarchs are and when moving genapic to x86-64 that > particular ugliness wasn't cleaned up. yes. It comes out of tacking a feature ontop of a messy codebase (which got messy partly because other, similar features like subarch ioapic code was not done in conjunction with the necessary cleanup) - this mirrors the mess that is in the time code currently. Mess like that is just like the overhead of small featurities - it only shows up in a noticeable way if piled together. > > physical delivery is bad on the IO-APIC anyway - today LowestPrio > > delivery mode works again and the hardware can help us. > > Help us with what exactly? logical flat APIC mode defaults to dest_LowestPrio - this delivery mode allows the hardware (the chipset) to optimize IRQ routing. This is only available in logical delivery mode, because the 'target CPU range' there is an APIC ID mask, inherently. the hardware might know details like 'is a target core in lowpower mode' and redirect IRQs accordingly. We could do the same from the kernel only via expensive irq-redirection of IO-APIC entries. > > you are risking the introduction of the kind of regressions that i'm > > seeing on my box: that physical delivery mystically doesnt work > > occasionally. AFAIK Windows defaults to logical APIC delivery too on > > small systems. (it in fact uses the TPR IIRC which can only work > > with logical delivery) > > Yes that's a real risk. it's an unknown risk against the supposedly-known risk of the hotplug-CPU breakage. We should pick known risks, obviously. > > So your suggestion would put us up to use an uncommon (and more > > expensive ...) mode of IPI delivery - instead of limiting that rare > > delivery mode to large SMP systems only ... > > > > and again, please remind me of what you are trying to argue: that we > > should keep the slower and less common delivery mode just to not > > have to fix the hotplug bug? > > Ok assuming temporarily it's a bug, how would you want to fix it? for that i'd have to know the bug, and this is the third time i'm asking about specifics :-) (The URL that was given in the thread was about a chipset bug regarding incompatibility with clustered-APIC mode - my patch in fact - because it switches small systems to use logical flat mode always - solves that kind of regression too.) Ingo