Date: Thu, 7 Sep 2017 16:34:06 +0800
From: Yu Chen
To: Thomas Gleixner
Cc: x86@kernel.org, Ingo Molnar, "H. Peter Anvin", Rui Zhang, LKML,
	"Rafael J. Wysocki", Len Brown, Dan Williams, Christoph Hellwig,
	Peter Zijlstra, Jeff Kirsher
Subject: Re: [PATCH 4/4][RFC v2] x86/apic: Spread the vectors by choosing the idlest CPU
Message-ID: <20170907083405.GA24450@localhost.localdomain>
References: <20170906043454.GD23250@localhost.localdomain> <20170907025212.GA18130@localhost.localdomain>

On Thu, Sep 07, 2017 at 07:54:09AM +0200, Thomas Gleixner wrote:
> On Thu, 7 Sep 2017, Yu Chen wrote:
> > On Wed, Sep 06, 2017 at 10:03:58AM +0200, Thomas Gleixner wrote:
> > > Can you please apply the debug patch below, boot the machine and right
> > > after login provide the output of
> > >
> > > # cat /sys/kernel/debug/tracing/trace
> > >
> > kworker/0:2-303 [000] .... 9.135467: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 34
> > kworker/0:2-303 [000] .... 9.135476: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 35
> > kworker/0:2-303 [000] .... 9.135484: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 36
> > >
> > kworker/0:2-303 [000] .... 9.762268: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 331
> > kworker/0:2-303 [000] .... 9.762278: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 332
> > kworker/0:2-303 [000] .... 9.762288: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 333
>
> That's 300 vectors.
>
> > bb:00.[0-3] Ethernet controller: Intel Corporation Device 37d0 (rev 03)
> >
> > -+-[0000:b2]-+-00.0-[b3-bc]----00.0-[b4-bc]--+-00.0-[b5-b6]----00.0
> >  |           |                               +-01.0-[b7-b8]----00.0
> >  |           |                               +-02.0-[b9-ba]----00.0
> >  |           |                               \-03.0-[bb-bc]--+-00.0
> >  |           |                                               +-00.1
> >  |           |                                               +-00.2
> >  |           |                                               \-00.3
> >
> > and they are using i40e driver, the vectors should be reserved by:
> > i40e_probe() ->
> > i40e_init_interrupt_scheme() ->
> > i40e_init_msix() ->
> > i40e_reserve_msix_vectors() ->
> > pci_enable_msix_range()
> >
> > # ls /sys/kernel/debug/irq/irqs
> > 0 10 11 13 142 184 217 259 292 31 33
> > 337 339 340 342 344 346 348 350 352 354 356
> > 358 360 362 364 366 368 370 372 374 376 378
> > 380 382 384 386 388 390 392 394 4 6 7 9
> > 1 109 12 14 15 2 24 26 3 32 335
> > 338 34 341 343 345 347 349 351 353 355 357
> > 359 361 363 365 367 369 371 373 375 377 379
> > 381 383 385 387 389 391 393 395 5 67 8
>
> Out of these 300 interrupts exactly 8 randomly selected ones are actively
> used. And the other 292 interrupts are just there because it might need
> them in the future when the 32 CPU machine gets magically upgraded to 4096
> cores at runtime?
>
Hmm, do the 292 vectors remain disabled because the network devices have
not been brought up (say, ifconfig up has not been invoked), so that
request_irq() has not been called for these vectors? I have the impression
that once, when I borrowed some fiber cables to connect the platform, the
number of active IRQs from i40e rose a lot, although I don't have these
expensive cables now...

> Can the i40e people @intel please fix this waste of resources and sanitize
> their interrupt allocation scheme?
>
> Please switch it over to managed interrupts so the affinity spreading
> happens in a sane way and the interrupts are properly managed on CPU
> hotplug.
OK, I think the i40e driver currently reserves its vectors through
pci_enable_msix_range() and does not provide an affinity hint to the
low-level IRQ system, so managed interrupts are not enabled there
(although later in the i40e driver we use irq_set_affinity_hint() to
spread the IRQs).

Thanks,
	Yu
>
> Thanks,
>
> 	tglx
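
For anyone who wants to re-check the "300 vectors" figure against their own trace, here is a minimal sketch in Python. It only assumes that the debug patch emits lines of the shape quoted above; the regular expression, the helper name count_vectors() and the six sample lines are illustrative, and the span calculation further assumes that the virq numbers between the first and the last quoted entry are contiguous, which the elided part of the trace would have to confirm.

```python
import re
from collections import Counter

# The six trace lines quoted in this mail; the rest of the trace is elided
# there and is assumed to have the same shape.
TRACE = """\
kworker/0:2-303 [000] .... 9.135467: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 34
kworker/0:2-303 [000] .... 9.135476: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 35
kworker/0:2-303 [000] .... 9.135484: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 36
kworker/0:2-303 [000] .... 9.762268: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 331
kworker/0:2-303 [000] .... 9.762278: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 332
kworker/0:2-303 [000] .... 9.762288: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 333
"""

# Illustrative pattern for one trace entry (an assumption, not part of the patch).
LINE_RE = re.compile(r"msi_domain_alloc_irqs: dev: (\S+) nvec (\d+) virq (\d+)")


def count_vectors(trace):
    """Return (vectors allocated per device, list of virq numbers seen)."""
    per_dev = Counter()
    virqs = []
    for match in LINE_RE.finditer(trace):
        dev, nvec, virq = match.group(1), int(match.group(2)), int(match.group(3))
        per_dev[dev] += nvec
        virqs.append(virq)
    return per_dev, virqs


per_dev, virqs = count_vectors(TRACE)

# Size of the virq range, assuming every number in between was allocated too.
span = max(virqs) - min(virqs) + 1

print(dict(per_dev))  # {'0000:bb:00.0': 3, '0000:bb:00.3': 3}
print(span)           # 300
```

Feeding the complete /sys/kernel/debug/tracing/trace output into count_vectors() instead of the sample gives the real per-function totals, which can then be compared with the handful of interrupts that are actually in use.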