Date: Fri, 12 Aug 2022 09:26:23 +0200
From: Claudio Imbrenda
To: Murilo Opsfelder Araújo
Cc: "Daniel P. Berrangé", pbonzini@redhat.com, qemu-devel@nongnu.org,
 david@redhat.com, cohuck@redhat.com, thuth@redhat.com,
 borntraeger@de.ibm.com, frankja@linux.ibm.com, fiuczy@linux.ibm.com,
 pasic@linux.ibm.com, alex.bennee@linaro.org, armbru@redhat.com
Subject: Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux
Message-ID: <20220812092623.19058f32@p-imbrenda>
In-Reply-To: <42b6bfa1-1983-b065-6b0d-9b5d89465f9b@linux.ibm.com>
References: <20220809064024.15259-1-imbrenda@linux.ibm.com>
 <20220811155623.25f0d4b4@p-imbrenda>
 <42b6bfa1-1983-b065-6b0d-9b5d89465f9b@linux.ibm.com>
Organization: IBM

On Thu, 11 Aug 2022 23:05:52 -0300
Murilo Opsfelder Araújo wrote:

> On 8/11/22 11:02, Daniel P. Berrangé wrote:
> [...]
> >>> Hmm, I was hoping you could just use SIGKILL to guarantee that this
> >>> gets killed off. Is SIGKILL delivered too soon to allow for the
> >>> main QEMU process to have exited quickly ?
> >>
> >> yes, I tried. qemu has not finished exiting when the signal is
> >> delivered, the cleanup process dies before qemu, which defeats the
> >> purpose
> >
> > Ok, too bad.
> >
> >>> If so I wonder what happens when systemd just delivers SIGKILL to
> >>> all processes in the cgroup - I'm not sure there's a guarantee it
> >>> will SIGKILL the main qemu before it SIGKILLs this helper
> >>
> >> I'm afraid in that case there is no guarantee.
> >>
> >> for what it's worth, both virsh shutdown and destroy seem to do things
> >> properly.
> >
> > Hmm, probably because libvirt tells QEMU to exit before systemd comes
> > along and tells everything in the cgroup to die with SIGKILL.
>
> It seems Libvirt sends SIGKILL if qemu process doesn't terminate within 10
> seconds after Libvirt sent SIGTERM:
>
> https://gitlab.com/libvirt/libvirt/-/blob/0615df084ec9996b5df88d6a1b59c557e22f3a12/src/util/virprocess.c#L375

but this is fine. with asynchronous teardown, qemu will exit almost
immediately when receiving SIGTERM, and the cleanup process will start
cleaning up.

>
> So I guess this patch happened to work with Libvirt because the main qemu
> process terminated before the timeout and before SIGKILL was delivered.

it seems so

>
> The cleanup process is trying to solve the problem where the main qemu process
> takes too long to terminate. However, if the cleanup process itself takes too
> long, SIGKILL will be sent by Libvirt anyway.

but that is not a problem; the sole purpose of the cleanup process is
to terminate _after_ qemu. it doesn't matter what happens after qemu
has terminated. if you look at the patch, after going to great lengths
to ensure that qemu has terminated, all the child process does is
_exit(0).

>
> Perhaps we can describe this situation in the parameter help, e.g.: If
> management layer decides to send SIGKILL (e.g.: due to timeout or deliberate
> decision), the cleanup process can exit before the main process, deceiving its
> purpose.

if the management layer (or the user) decides to send SIGKILL
immediately to the whole cgroup without sending SIGTERM first, then
this whole asynchronous teardown mechanism is defeated, yes.
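
For readers following the thread, here is a minimal sketch of the
SIGTERM-then-SIGKILL escalation discussed above, written in Python for
brevity. It is not libvirt's or QEMU's code: the function names, the
"ready" handshake, and the stand-in worker processes are all
hypothetical. Only the idea of a grace period (10 seconds in the
libvirt source linked above) comes from the thread.

```python
import signal
import subprocess
import sys


def stop_with_escalation(proc, grace_seconds=10.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the name of the last signal that was sent.
    (Hypothetical helper, for illustration only.)
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # The process outlived the grace period; SIGKILL cannot be caught.
        proc.kill()
        proc.wait()
        return "SIGKILL"


def spawn_worker(ignore_sigterm):
    """Start a stand-in process that optionally ignores SIGTERM.

    A worker that ignores SIGTERM plays the role of a process that takes
    too long to terminate (assumption made for this demo only).
    """
    setup = "signal.signal(signal.SIGTERM, signal.SIG_IGN);" if ignore_sigterm else ""
    code = (
        "import signal, sys, time;"
        + setup
        + "print('ready', flush=True);"
        + "time.sleep(60)"
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", code], stdout=subprocess.PIPE, text=True
    )
    # Block until the child has set up its signal disposition.
    proc.stdout.readline()
    return proc
```

A worker that exits promptly on SIGTERM never sees SIGKILL, which
corresponds to the case where qemu exits almost immediately and the
cleanup process is left to finish on its own. A worker that ignores
SIGTERM is killed once the grace period expires.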