Date: Thu, 11 Jan 2024 08:00:05 -0800
From: Sean Christopherson
To: Friedrich Weber
Cc: kvm@vger.kernel.org, Paolo Bonzini, Andrew Morton,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Temporary KVM guest hangs connected to KSM and NUMA balancer
In-Reply-To: <832697b9-3652-422d-a019-8c0574a188ac@proxmox.com>
References: <832697b9-3652-422d-a019-8c0574a188ac@proxmox.com>

On Thu, Jan 04, 2024, Friedrich Weber wrote:
> Hi,
>
> some of our (Proxmox VE) users have been reporting [1]
> that guests occasionally become unresponsive with high CPU usage for
> some time (varying between ~1 and more than 60 seconds). After that
> time, the guests come back and continue running fine. Windows guests
> seem most affected (not responding to pings during the hang, RDP
> sessions time out). But we also got reports about Linux guests. This
> issue was not present while we provided (host) kernel 5.15 and was
> first reported when we rolled out a kernel based on 6.2. The reports
> seem to concern NUMA hosts only. Users reported that the issue becomes
> easier to trigger the more memory is assigned to the guests. Setting
> mitigations=off was reported to alleviate (but not eliminate) the
> issue. The issue seems to disappear after disabling KSM.
>
> We can reproduce the issue with a Windows guest on a NUMA host, though
> only occasionally and not very reliably. Using a bpftrace script like
> [7] we found the hangs to correlate with long-running invocations of
> `task_numa_work` (more than 500ms), suggesting a connection to the NUMA
> balancer. Indeed, we can't reproduce the issue after disabling the NUMA
> balancer with `echo 0 > /proc/sys/kernel/numa_balancing` [2] and got a
> user confirming this fixes the issue for them [3].
>
> Since the Windows reproducer is not very stable, we tried to find a
> Linux guest reproducer and have found one (described below [0]) that
> triggers a very similar (hopefully the same) issue. The reproducer
> triggers the hangs also if the host is on current Linux 6.7-rc8
> (610a9b8f). A kernel bisect points to the following as the commit
> introducing the issue:
>
>   f47e5bbb ("KVM: x86/mmu: Zap only TDP MMU leafs in zap range and
>   mmu_notifier unmap")
>
> which is why I cc'ed Sean and Paolo. Because of the possible KSM
> connection I cc'ed Andrew and linux-mm.
>
> Indeed, on f47e5bbb~1 = a80ced6e ("KVM: SVM: fix panic on out-of-bounds
> guest IRQ") the reproducer does not trigger the hang, and on f47e5bbb it
> triggers the hang.
>
> Currently I don't know enough about the KVM/KSM/NUMA balancer code to
> tell how the patch may trigger these issues. Any idea who we could ask
> about this, or how we could further debug this would be greatly appreciated!

This is a known issue.  It's mostly a KVM bug[1][2] (fix posted[3]), but I
suspect that a bug in the dynamic preemption model logic[4] is also
contributing to the behavior by causing KVM to yield on preempt models where
it really shouldn't.

[1] https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
[2] https://lore.kernel.org/all/bug-218259-28872@https.bugzilla.kernel.org%2F
[3] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com
[4] https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com
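
For anyone trying to tell whether a host is in the configuration described in
the report (KSM running and the NUMA balancer enabled), here is a minimal
Python sketch. The /proc/sys/kernel/numa_balancing path is the one quoted in
the report; /sys/kernel/mm/ksm/run is the usual KSM sysfs knob and is an
assumption about the host. The function names are hypothetical, and the
"enabled" check is a simplification, not a diagnosis of the bug.

```python
from pathlib import Path

# numa_balancing: path quoted in the report above.
# ksm_run: standard KSM sysfs knob (assumed to exist on the host).
KNOBS = {
    "numa_balancing": "proc/sys/kernel/numa_balancing",
    "ksm_run": "sys/kernel/mm/ksm/run",
}


def read_knobs(root="/"):
    """Return {name: int or None}; None means the knob is missing or unreadable."""
    state = {}
    for name, rel in KNOBS.items():
        try:
            state[name] = int((Path(root) / rel).read_text().strip())
        except (OSError, ValueError):
            state[name] = None
    return state


def both_enabled(state):
    """Simplified check: NUMA balancing is non-zero and KSM is in 'run' mode (1)."""
    numa = state.get("numa_balancing")
    ksm = state.get("ksm_run")
    return bool(numa) and ksm == 1


if __name__ == "__main__":
    state = read_knobs()
    print(state)
    print("KSM and NUMA balancing both enabled:", both_enabled(state))
```

The `root` parameter exists only so the sketch can be pointed at a copy of the
files (for example, from a support bundle) instead of the live host.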