virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Dongli Zhang <dongli.zhang@oracle.com>, x86@kernel.org
Cc: mingo@redhat.com, Borislavbp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, joe.jin@oracle.com,
	linux-kernel@vger.kernel.org, virtualization@lists.linux.dev
Subject: Re: [PATCH 1/1] x86/vector: Fix vector leak during CPU offline
Date: Mon, 13 May 2024 14:44:15 +0200	[thread overview]
Message-ID: <87msotnaow.ffs@tglx> (raw)
In-Reply-To: <20240510190623.117031-1-dongli.zhang@oracle.com>

On Fri, May 10 2024 at 12:06, Dongli Zhang wrote:
> The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
> interrupt affinity reconfiguration via procfs. Instead, the change is
> deferred until the next instance of the interrupt being triggered on the
> original CPU.
>
> When the interrupt next triggers on the original CPU, the new affinity is
> enforced within __irq_move_irq(). A vector is allocated from the new CPU,
> but if the old vector on the original CPU remains online, it is not
> immediately reclaimed. Instead, apicd->move_in_progress is flagged, and the
> reclaiming process is delayed until the next trigger of the interrupt on
> the new CPU.
>
> Upon the subsequent triggering of the interrupt on the new CPU,
> irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
> remains online. Subsequently, the timer on the old CPU iterates over its
> vector_cleanup list, reclaiming vectors.
>
> However, if the old CPU is offline before the interrupt triggers again on
> the new CPU, irq_complete_move() simply resets both apicd->move_in_progress
> and apicd->prev_vector to 0. Consequently, the vector remains unreclaimed
> in vector_matrix, resulting in a CPU vector leak.

I doubt that.

Any interrupt which is affine to an outgoing CPU is migrated and
eventually pending moves are enforced:

cpu_down()
  ...
  cpu_disable_common()
    fixup_irqs()
      irq_migrate_all_off_this_cpu()
        migrate_one_irq()
          irq_force_complete_move()
            free_moved_vector();

No?

In fact irq_complete_move() should never see apicd->move_in_progress
with apicd->prev_cpu pointing to an offline CPU.

The CPU offline case in __vector_schedule_cleanup() should not even
exist or at least just emit a warning.

If you can trigger that case, then there is something fundamentally
wrong with the CPU hotplug interrupt migration code and that needs to be
investigated and fixed.

Thanks,

        tglx


  parent reply	other threads:[~2024-05-13 12:44 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-10 19:06 [PATCH 1/1] x86/vector: Fix vector leak during CPU offline Dongli Zhang
2024-05-10 19:48 ` Dave Hansen
2024-05-13 12:44 ` Thomas Gleixner [this message]
2024-05-13 17:43   ` Dongli Zhang
2024-05-13 22:46     ` Thomas Gleixner
2024-05-15 19:51       ` Dongli Zhang
2024-05-21 12:00         ` Thomas Gleixner
2024-05-22 21:44           ` Dongli Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87msotnaow.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=Borislavbp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dongli.zhang@oracle.com \
    --cc=hpa@zytor.com \
    --cc=joe.jin@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=virtualization@lists.linux.dev \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).