public inbox for linux-kernel@vger.kernel.org
From: Yu Chen <yu.c.chen@intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>, Peter Anvin <hpa@zytor.com>,
	Marc Zyngier <marc.zyngier@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>, Rui Zhang <rui.zhang@intel.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Len Brown <lenb@kernel.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Christoph Hellwig <hch@lst.de>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Joerg Roedel <joro@8bytes.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Juergen Gross <jgross@suse.com>, Tony Luck <tony.luck@intel.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	Alok Kataria <akataria@vmware.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Arjan van de Ven <arjan@linux.intel.com>
Subject: Re: [patch 00/52] x86: Rework the vector management
Date: Tue, 19 Sep 2017 17:12:29 +0800
Message-ID: <20170919091228.GA8704@yu-chen.sh.intel.com>
In-Reply-To: <20170913212902.530704676@linutronix.de>

On Wed, Sep 13, 2017 at 11:29:02PM +0200, Thomas Gleixner wrote:
> Sorry for the large CC list, but this is a major surgery.
> 
> The vector management in x86 including the surrounding code is a
> conglomerate of ancient bits and pieces which have been subject to
> 'modernization' and featuritis over the years. The most obscure parts are
> the vector allocation mechanics, the cleanup vector handling and the cpu
> hotplug machinery. Replacing these pieces of art was on my todo list for a
> long time.
> 
> Recent attempts to 'solve' CPU offline / hibernation issues which are
> partially caused by the current vector management implementation made me
> look for real. Further information in this thread:
> 
>     http://lkml.kernel.org/r/cover.1504235838.git.yu.c.chen@intel.com
> 
> Aside from drivers allocating a gazillion interrupts, there are quite a few
> things which can be addressed in the x86 vector management and in the core
> code.
> 
>   - Multi CPU affinities:
> 
>     A dubious property which is not available on all machines and causes
>     major complexity both in the allocator and the cleanup/hotplug
>     management. See:
> 
>        http://lkml.kernel.org/r/alpine.DEB.2.20.1709071045440.1827@nanos
> 
>   - Priority level spreading:
> 
>     An obscure and undocumented property which I think is sufficiently
>     argued to be not required in:
> 
>        http://lkml.kernel.org/r/alpine.DEB.2.20.1709071045440.1827@nanos
> 
>   - Allocation of vectors when interrupt descriptors are allocated.
> 
>     This is a historical implementation detail, which is not really
>     required when the vector allocation is delayed up to the point when
>     request_irq() is invoked. This might make request_irq() fail when the
>     vector space is exhausted, but drivers should handle request_irq()
>     failures anyway.
> 
>     The upside of changing this is that the active vector space becomes
>     smaller especially on hibernation/cpu offline when drivers shut down
>     queue interrupts of outgoing CPUs.
> 
>     Some of this is already addressed with the managed interrupt facility,
>     but that was bolted on top of the existing vector management because
>     proper integration was not possible at that point. I take the blame
>     for this, but the tradeoff of not doing it would have been more
>     broken driver boiler plate code all over the place. So I went for the
>     lesser of two evils.
> 
>   - Allocation of vectors in the wrong place
> 
>     Even for managed interrupts the vector allocation at descriptor
>     allocation happens in the wrong place and gets fixed after the fact
>     with a call to set_affinity(). For interrupts which are not remapped
>     this results in at least one interrupt on the wrong CPU before it is
>     migrated to the desired target.
> 
>   - Lack of instrumentation
>  
>     All of this is a black box which allows no insight into the actual
>     vector usage.
> 
> The series addresses these points and converts the x86 vector management to
> a bitmap based allocator which provides proper reservation management for
> 'managed interrupts' and best effort reservation for regular interrupts.
> The latter allows overcommitment, which 'fixes' some of the hotplug/hibernation
> problems in a clean way. It can't fix all of them depending on the driver
> involved.
> 
> This rework is no excuse for driver writers to do exhaustive vector
> allocations instead of utilizing the managed interrupt infrastructure, but
> it addresses long standing issues in this code with the side effect of
> mitigating some of the driver oddities. The proper solution for multi queue
> management is 'managed interrupts', which have been proven in the block-mq
> work, as they solve issues which are worked around in other drivers in
> creative ways with lots of copied code and often enough broken attempts to
> handle interrupt affinity and CPU hotplug problems.
> 
> The new bitmap allocator and the x86 vector management code are
> instrumented with tracepoints and the irq domain debugfs files allow deep
> insight into the vector allocation and reservations.
> 
> The patches work on machines with and without interrupt remapping and
> inside of KVM guests of various flavours, though I have no idea what I
> broke on the way with other hypervisors, posted interrupts etc. So I kindly
> ask for your support in testing and review.
> 
> The series applies on top of Linus tree and is available as git branch:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/apic
> 
> Note, that this branch is Linus tree plus scheduler and x86 fixes which I
> required to do proper testing. They have outstanding pull requests and
> might be merged already when you read this.
> 
> Thanks,
> 
> 	tglx
> ---
Tested on top of:
commit e1b476ae32fcfa59fc6752b4b01988e759269dc3
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Thu Sep 14 09:53:10 2017 +0200

    x86/vector: Exclude IRQ0 from reservation mode

from branch WIP.x86/apic, on a platform with 16 cores (32 logical CPUs).
Boot-up is okay, and offlining/onlining cpu[1-31] is okay.
Before offline:

name:   VECTOR
 size:   0
 mapped: 484
 flags:  0x00000041
Online bitmaps:       32
Global available:   6419
Global reserved:     407
Total allocated:      77
System: 41: 0-19,32,50,128,238-255
 | CPU | avl | man | act | vectors
     0   126     0    77  33-49,51-110
     1   203     0     0  
     2   203     0     0  
     3   203     0     0  
     4   203     0     0  
     5   203     0     0  
     6   203     0     0  
     7   203     0     0  
     8   203     0     0  
     9   203     0     0  
    10   203     0     0  
    11   203     0     0  
    12   203     0     0  
    13   203     0     0  
    14   203     0     0  
    15   203     0     0  
    16   203     0     0  
    17   203     0     0  
    18   203     0     0  
    19   203     0     0  
    20   203     0     0  
    21   203     0     0  
    22   203     0     0  
    23   203     0     0  
    24   203     0     0  
    25   203     0     0  
    26   203     0     0  
    27   203     0     0  
    28   203     0     0  
    29   203     0     0  
    30   203     0     0  
    31   203     0     0 
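
For anyone reading the dump: the "System:" count and the "act" column appear
to be simply the number of vectors in the range list printed next to them.
A minimal Python sketch to cross-check that against the numbers above (the
helper count_vectors() is hypothetical and only for illustration; it is not
part of the kernel or of any debugfs tooling):

```python
def count_vectors(ranges: str) -> int:
    """Count the vectors in a debugfs-style range list such as '0-19,32'."""
    total = 0
    for part in ranges.split(","):
        part = part.strip()
        if not part:
            continue
        # "a-b" is an inclusive range, a bare "a" is a single vector.
        lo, _, hi = part.partition("-")
        total += int(hi or lo) - int(lo) + 1
    return total


# Range lists copied verbatim from the dump above.
system = count_vectors("0-19,32,50,128,238-255")
cpu0_active = count_vectors("33-49,51-110")
print(system, cpu0_active)  # 41 77
```

Both results match the dump: 41 system vectors, and 77 active vectors on
CPU 0, which also equals "Total allocated".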

After offline:

name:   VECTOR
 size:   0
 mapped: 484
 flags:  0x00000041
Online bitmaps:        1
Global available:    126
Global reserved:     407
Total allocated:      77
System: 41: 0-19,32,50,128,238-255
 | CPU | avl | man | act | vectors
     0   126     0    77  33-49,51-110
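
The two dumps are consistent with "Global available" being the sum of the
per-CPU "avl" column over the online CPUs. That reading is inferred from the
numbers only, not taken from the allocator code, so treat it as an
assumption. A small Python sketch of the check (global_available() is a
made-up helper name):

```python
# Per-CPU "avl" values copied from the two dumps above.
before = {0: 126, **{cpu: 203 for cpu in range(1, 32)}}
after = {0: 126}


def global_available(avl_by_cpu):
    """Sum the per-CPU 'avl' column over the CPUs listed as online."""
    return sum(avl_by_cpu.values())


print(len(before), global_available(before))  # 32 6419
print(len(after), global_available(after))    # 1 126
```

"Global reserved" (407) and "Total allocated" (77) are the same in both
dumps, so taking cpu[1-31] offline only shrank the available pool.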

 Thanks,
 	Yu
