Date: Thu, 28 Aug 2025 09:00:46 -0400
From: "Michael S. Tsirkin"
To: Parav Pandit
Cc: Cornelia Huck, "virtualization@lists.linux.dev", "jasowang@redhat.com", "stefanha@redhat.com", "pbonzini@redhat.com", "xuanzhuo@linux.alibaba.com", "stable@vger.kernel.org", Max Gurtovoy, "NBU-Contact-Li Rongqing (EXTERNAL)", "linux-s390@vger.kernel.org"
Subject: Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
Message-ID: <20250828085526-mutt-send-email-mst@kernel.org>
References: <20250824102947-mutt-send-email-mst@kernel.org> <20250827061537-mutt-send-email-mst@kernel.org> <87frdddmni.fsf@redhat.com> <87cy8fej4z.fsf@redhat.com> <20250828081717-mutt-send-email-mst@kernel.org> <87a53jeiv6.fsf@redhat.com>
X-Mailing-List: virtualization@lists.linux.dev

On Thu, Aug 28, 2025 at 12:33:58PM +0000, Parav
Pandit wrote:
>
> > From: Cornelia Huck
> > Sent: 28 August 2025 05:52 PM
> >
> > On Thu, Aug 28 2025, "Michael S. Tsirkin" wrote:
> >
> > > On Thu, Aug 28, 2025 at 02:16:28PM +0200, Cornelia Huck wrote:
> > >> On Thu, Aug 28 2025, Parav Pandit wrote:
> > >>
> > >> >> From: Cornelia Huck
> > >> >> Sent: 27 August 2025 05:04 PM
> > >> >>
> > >> >> On Wed, Aug 27 2025, "Michael S. Tsirkin" wrote:
> > >> >>
> > >> >> > On Tue, Aug 26, 2025 at 06:52:03PM +0000, Parav Pandit wrote:
> > >> >> >> > What I do not understand, is what good does the revert do. Sorry.
> > >> >> >> >
> > >> >> >> Let me explain.
> > >> >> >> It prevents the issue of vblk requests being stuck due to broken VQ.
> > >> >> >> It prevents the vnet driver start_xmit() to be not stuck on skb completions.
> > >> >> >
> > >> >> > This is the part I don't get. In what scenario, before
> > >> >> > 43bb40c5b9265 start_xmit is not stuck, but after 43bb40c5b9265 it is stuck?
> > >> >> >
> > >> >> > Once the device is gone, it is not using any buffers at all.
> > >> >>
> > >> >> What I also don't understand: virtio-ccw does exactly the same
> > >> >> thing (virtio_break_device(), added in 2014), and it supports
> > >> >> surprise removal _only_, yet I don't remember seeing bug reports?
> > >> >
> > >> > I suspect that stress testing may not have happened for ccw with active vblk Ios and outstanding transmit pkt and cvq commands.
> > >> > Hard to say as we don't have ccw hw or systems.
> > >>
> > >> cc:ing linux-s390 list. I'd be surprised if nobody ever tested
> > >> surprise removal on a loaded system in the last 11 years.
> > >
> > >
> > > As it became very clear from follow up discussion, the issue is
> > > nothing to do with virtio, it is with a broken hypervisor that allows
> > > device to DMA into guest memory while also telling the guest that the
> > > device has been removed.
> > >
> > > I guess s390 is just not broken like this.
> >
> > Ah good, I missed that -- that indeed sounds broken, and needs to be fixed
> > there.
> Nop. This is not the issue. You missed this focused on fixing the device.
>
> The fact is: the driver is expecting the IOs and CVQ commands and DMA to succeed even after device is removed.
> The driver is expecting the device reset to also succeed.
> Stefan already pointed out this in the vblk driver patches.
> This is why you see call traces on del_gendisk(), CVQ commands.
>
> Again, it is the broken drivers not the device.
> Device can stop the DMA and stop responding to the requests and kernel 6.X will continue to hang as long as it has cited commit.

Parav, the issues you cite are real but unrelated, and the hangs happen
anyway, with or without the commit. All you have to do is pull out the
device while e.g. a command is in the process of being submitted.

All the commit you want to revert does is that, in some instances,
instead of just hanging it will mark the queue as broken and release
memory. Since your device is not really gone and keeps DMAing into
memory, guest memory gets corrupted.

But your argument that the issue is that the fix is "incomplete" is
bogus - when we make the fix complete it will become even worse for
these broken devices.

--
MST