From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 2 Jul 2026 18:33:12 -0400
From: "Michael S. Tsirkin"
To: "David Hildenbrand (Arm)"
Cc: "Denis V. Lunev" , "Denis V.
 Lunev" , virtualization@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 0/4] virtio_balloon: quiesce balloon work on device shutdown
Message-ID: <20260702183236-mutt-send-email-mst@kernel.org>
References: <20260624140846.2616797-1-den@openvz.org> <00ebb86d-f7b8-4ddc-a17c-c4700cab73a9@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <00ebb86d-f7b8-4ddc-a17c-c4700cab73a9@kernel.org>

On Thu, Jul 02, 2026 at 09:30:26PM +0200, David Hildenbrand (Arm) wrote:
> On 7/2/26 19:50, Denis V. Lunev wrote:
> > On 6/24/26 16:08, Denis V. Lunev wrote:
> >> Since commit 8bd2fa086a04 ("virtio: break and reset virtio devices on
> >> device_shutdown()") the virtio bus breaks and resets every virtio device
> >> during device_shutdown(), i.e. on reboot and kexec. virtio_balloon has no
> >> .shutdown of its own, so that generic path runs while the balloon's
> >> asynchronous work is still armed: the free page reporting worker, the
> >> inflate/deflate and stats workers, the OOM notifier and the free page
> >> shrinker.
> >>
> >> Once the device has been broken, virtqueue_add_inbuf() in
> >> virtballoon_free_page_report() returns -EIO and trips its WARN_ON_ONCE().
> >> On a kernel booted with panic_on_warn that turns an ordinary reboot into a
> >> fatal panic in the middle of device_shutdown(), so the machine never
> >> reaches the new kernel. The inflate/deflate and OOM paths do not warn but
> >> are no better off: they call wait_event(vb->acked, ...) and would block
> >> forever on a queue that can no longer complete.
> >>
> >> This was hit in the field as an intermittent failure of a virtualization
> >> cluster upgrade: guest storage nodes were rebooted via kexec into the new
> >> kernel, and the ones whose free page reporting happened to run during
> >> device_shutdown() panicked (the guests run with panic_on_warn) and never
> >> came back, stalling the rolling upgrade. The crash dump showed the WARN at
> >> virtio_balloon.c:216 in a page_reporting kworker, with all the balloon
> >> virtqueues already broken.
> >>
> >> Validated by churning balloon inflate/deflate from the host while
> >> kexec-rebooting the guest in a loop under panic_on_warn: the unpatched
> >> kernel reproduces the WARN within a couple of cycles, while the patched
> >> kernel survives many consecutive kexec cycles cleanly (12/12 in the final
> >> run, 0 WARNs). checkpatch is clean across the series.
> >>
> >> Changes in v2:
> >> - Add a virtio_device_shutdown() core helper and call it from the balloon
> >>   .shutdown handler instead of open-coding break + synchronize_cbs + reset
> >>   (David Hildenbrand).
> >> - New patch: make tell_host() warn and bail instead of hanging if a buffer
> >>   add ever fails (David Hildenbrand); kept as a separate patch
> >>   (Michael S. Tsirkin).
> >>
> >> v1: https://lore.kernel.org/all/20260622133715.3707707-1-den@openvz.org
> >>
> >> Denis V. Lunev (4):
> >>   virtio: add virtio_device_shutdown() helper
> >>   virtio_balloon: factor out virtballoon_quiesce()
> >>   virtio_balloon: quiesce balloon work before device shutdown
> >>   virtio_balloon: warn on failed buffer add in tell_host()
> >>
> >>  drivers/virtio/virtio.c         | 41 ++++++++++++++++++++++-----------
> >>  drivers/virtio/virtio_balloon.c | 40 ++++++++++++++++++++++++--------
> >>  include/linux/virtio.h          |  1 +
> >>  3 files changed, 59 insertions(+), 23 deletions(-)
> >>
> > Hi, David!
>
> Hi! :)
> >
> > Is this good to go in? I have not seen the confirmation that the
> > series is taken into your tree.
>
> I don't have a tree (yet), and once I have one it will likely be more mm focused :)
>
> @MST, I think this is good to go!
>
> --
> Cheers,
>
> David

Indeed, it's just a bugfix and I'm trying to get features into qemu now
before their freeze. I'll work on linux end of next week.
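
For anyone reading the thread without the patches at hand, the ordering problem from the cover letter can be modelled in a few lines of userspace C. This is only a rough sketch and not code from the series: every toy_* name, the quiesced flag and the -ECANCELED return are assumptions made up for illustration, and the real virtballoon_quiesce() presumably also cancels and waits for the armed work rather than just setting a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a virtqueue; NOT the kernel's struct virtqueue. */
struct toy_vq {
	bool broken;	/* set once the device has been broken/reset */
	int queued;	/* buffers accepted so far */
};

/* Toy stand-in for the balloon driver state (hypothetical layout). */
struct toy_balloon {
	struct toy_vq vq;
	bool quiesced;	/* set by the shutdown path before breaking the device */
	int warnings;	/* counts what would be a WARN_ON_ONCE() in the kernel */
};

/* Mimics the behaviour described above: adds fail with -EIO once broken. */
static int toy_add_inbuf(struct toy_vq *vq)
{
	if (vq->broken)
		return -EIO;
	vq->queued++;
	return 0;
}

/* One run of a reporting worker; returns 0 or a negative errno. */
static int toy_report_once(struct toy_balloon *vb)
{
	int err;

	assert(vb);
	if (vb->quiesced)
		return -ECANCELED;	/* work was stopped, queue is never touched */

	err = toy_add_inbuf(&vb->vq);
	if (err)
		vb->warnings++;		/* fatal under panic_on_warn in the real kernel */
	return err;
}

/* Shutdown path: optionally quiesce first, then break the device. */
static void toy_shutdown(struct toy_balloon *vb, bool quiesce_first)
{
	if (quiesce_first)
		vb->quiesced = true;
	vb->vq.broken = true;
}
```

Calling toy_shutdown(&vb, false) and then toy_report_once(&vb) hits the -EIO path and bumps the warning counter, which is the unpatched behaviour. With quiesce_first set, the worker returns before it touches the broken queue.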