Date: Wed, 8 Nov 2023 10:22:51 -0800
From: Dongli Zhang
To: David Woodhouse, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Paul Durrant, Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin"
Subject: Re: [PATCH v2] KVM: x86/xen: improve accuracy of Xen timers
In-Reply-To: <5e0598c86361570674401f43191c3f819a6b32d2.camel@infradead.org>

Hi David,

On 11/8/23 07:25, David Woodhouse wrote:
> On Tue, 2023-11-07 at 17:43 -0800, Dongli Zhang wrote:
>> Hi David,
>>
>> On 11/7/23 15:24, David Woodhouse wrote:
>>> On Tue, 2023-11-07 at 15:07 -0800, Dongli Zhang wrote:
>>>> Thank you very much for the detailed explanation.
>>>>
>>>> I agree it is important to resolve the "now" problem. I guess the KVM lapic
>>>> deadline timer has the "now" problem as well.
>>>
>>> I think so. And quite gratuitously so, since it just does:
>>>
>>>         now = ktime_get();
>>>         guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>>>
>>>
>>> Couldn't that trivially be changed to kvm_get_monotonic_and_clockread()?
>>
>> The core idea is to always capture the pair of (tsc, ns) at exactly the same
>> time point.
>>
>> I have no idea how much accuracy it can improve, considering the extra costs to
>> inject the timer interrupt into the vCPU.
>
> Right. It's probably in the noise most of the time, unless you're
> unlucky enough to get preempted between the two TSC reads which are
> supposed to be happening "at the same time".
>
>>> I conveniently ducked this question in my patch by only supporting the
>>> CONSTANT_TSC case, and not the case where we happen to know the
>>> (potentially different) TSC frequencies on all the different pCPUs and
>>> vCPUs.
>>
>> This is also my question: why support only the CONSTANT_TSC case?
>>
>> For the lapic timer case:
>>
>> The timer is always calculated based on the *current* vCPU's tsc virtualization,
>> regardless of CONSTANT_TSC or not.
>>
>> For the xen timer case:
>>
>> Why not always calculate the expire based on the *current* vCPU's time
>> virtualization? That is, why not always use the current vCPU's hv_clock,
>> regardless of CONSTANT_TSC/masterclock?
>
> The simple answer is because I wasn't sure it would work correctly in
> all cases, and didn't *care* enough about the non-CONSTANT_TSC case to
> prove it to myself.
>
> Let's think about it...
>
> In the non-CONSTANT_TSC case, each physical CPU can have a different
> TSC frequency, yes? And KVM has a cpufreq notifier which triggers when
> the TSC changes, and makes a KVM_REQ_CLOCK_UPDATE request to any vCPU
> running on the affected pCPU. With an associated IPI to ensure the vCPU
> exits guest mode and will process the update before executing any
> further guest code.
>
> If a vCPU had *previously* been running on the affected pCPU but wasn't
> running when the notifier happened, then kvm_arch_vcpu_load() will make
> a KVM_REQ_GLOBAL_CLOCK_UPDATE request, which will immediately update
> the vCPU in question, and then trigger a deferred KVM_REQ_CLOCK_UPDATE
> for the others.
>
> So the vCPU itself, in guest mode, is always going to see *correct*
> pvclock information corresponding to the pCPU it is running on at the
> time.
>
> (I *believe* the way this works is that when a vCPU runs on a pCPU
> which has a TSC frequency lower than the vCPU should have, it runs in
> 'always catchup' mode. Where the TSC offset is bumped *every* time the
> vCPU enters guest mode, so the TSC is about right on every entry, might
> seem to run a little slow if the vCPU does a tight loop of rdtsc, but
> will catch up again on next vmexit/entry?)
>
> But we aren't talking about the vCPU running in guest mode. The code in
> kvm_xen_start_timer() and in start_sw_tscdeadline() is running in the
> host kernel. How can we be sure that it's running on the *same*
> physical CPU that the vCPU was previously running on, and thus how can
> we be sure that the vcpu->arch.hv_clock is valid with respect to a
> rdtsc on the current pCPU? I don't know that we can know that.
>
> As far as I can tell, the code in start_sw_tscdeadline() makes no
> attempt to do the 'catchup' thing, and just converts the pCPU's TSC to
> guest TSC using kvm_read_l1_tsc(), which uses a multiplier that's set
> once and *never* recalculated when the host TSC frequency changes.
>
> On the whole, now I *have* thought about it, I'm even more convinced I
> was right in the first place that I didn't want to know :)
>
> I think I stand by my original decision that the Xen timer code in the
> non-CONSTANT_TSC case can just use get_kvmclock_ns(). The "now" problem
> is going to be in the noise if the TSC isn't constant anyway, and we
> need to fix the drift and jumps of get_kvmclock_ns() *anyway* rather
> than adding a temporary special case for the Xen timers.
>
>> That is: kvm lapic method with kvm_get_monotonic_and_clockread().
>>
>>>
>>>
>>>>
>>>> E.g., according to the KVM lapic deadline timer, all values are based on (1) the
>>>> tsc value, (2) on the current vCPU.
>>>>
>>>>
>>>> 1949 static void start_sw_tscdeadline(struct kvm_lapic *apic)
>>>> 1950 {
>>>> 1951         struct kvm_timer *ktimer = &apic->lapic_timer;
>>>> 1952         u64 guest_tsc, tscdeadline = ktimer->tscdeadline;
>>>> 1953         u64 ns = 0;
>>>> 1954         ktime_t expire;
>>>> 1955         struct kvm_vcpu *vcpu = apic->vcpu;
>>>> 1956         unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
>>>> 1957         unsigned long flags;
>>>> 1958         ktime_t now;
>>>> 1959
>>>> 1960         if (unlikely(!tscdeadline || !this_tsc_khz))
>>>> 1961                 return;
>>>> 1962
>>>> 1963         local_irq_save(flags);
>>>> 1964
>>>> 1965         now = ktime_get();
>>>> 1966         guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>>>> 1967
>>>> 1968         ns = (tscdeadline - guest_tsc) * 1000000ULL;
>>>> 1969         do_div(ns, this_tsc_khz);
>>>>
>>>>
>>>> Sorry if I make the question very confusing. The core question is: where, and
>>>> from which clocksource, does the abs nanosecond value come from? What will happen
>>>> if the Xen VM uses HPET as clocksource, while xen timer as clock event?
>>>
>>> If the guest uses HPET as clocksource and Xen timer as clockevents,
>>> then keeping itself in sync is the *guest's* problem. The Xen timer is
>>> defined in terms of nanoseconds since guest start, as provided in the
>>> pvclock information described above. Hope that helps!
>>>
>>
>> The "in terms of nanoseconds since guest start" refers to *one* global value.
>> Should we use wallclock when we are referring to a global value shared by all vCPUs?
>>
>>
>> Based on the following piece of code, I do not think we may assume all vCPUs
>> have the same pvclock at the same time point: lines 104-108, when
>> PVCLOCK_TSC_STABLE_BIT is not set.
>>
>
> The *result* of calculating the pvclock should be the same on all vCPUs
> at any given moment in time.
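
To check my own understanding of the statement above, here is a small
user-space sketch (not kernel code) of the scaling the guest applies to its
TSC. The struct and function names are made up for illustration, the struct
is a simplified stand-in for pvclock_vcpu_time_info, and the 128-bit multiply
assumes GCC or Clang:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, illustrative stand-in for the pvclock record a guest reads. */
struct pvclock_sketch {
	uint64_t tsc_timestamp;     /* guest TSC when the record was written */
	uint64_t system_time;       /* nanoseconds at tsc_timestamp */
	uint32_t tsc_to_system_mul; /* 32.32 fixed-point multiplier */
	int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

/* Scale a guest TSC reading to nanoseconds using one vCPU's record. */
static uint64_t pvclock_sketch_ns(const struct pvclock_sketch *src, uint64_t tsc)
{
	uint64_t delta = tsc - src->tsc_timestamp;

	/* Guard against an out-of-range shift (would be undefined in C). */
	assert(src->tsc_shift > -64 && src->tsc_shift < 64);

	if (src->tsc_shift < 0)
		delta >>= -src->tsc_shift;
	else
		delta <<= src->tsc_shift;

	/* (delta * mul) >> 32, done in 128 bits to avoid overflow. */
	return src->system_time +
	       (uint64_t)(((unsigned __int128)delta * src->tsc_to_system_mul) >> 32);
}
```

With one record describing a 2 GHz guest TSC (shift 0, mul 0x80000000) and
another describing a 1 GHz guest TSC (shift 1, mul 0x80000000), written at
different times, the inputs differ but both can scale to the same nanosecond
value for the same instant.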
>
> The precise *calculation* may differ, depending on the frequency of the
> TSC for that particular vCPU and the last time the pvclock information
> was created for that vCPU.
>
>
>>
>>  67 static __always_inline
>>  68 u64 __pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, bool dowd)
>>  69 {
>>  70         unsigned version;
>>  71         u64 ret;
>>  72         u64 last;
>>  73         u8 flags;
>>  74
>>  75         do {
>>  76                 version = pvclock_read_begin(src);
>>  77                 ret = __pvclock_read_cycles(src, rdtsc_ordered());
>>  78                 flags = src->flags;
>>  79         } while (pvclock_read_retry(src, version));
>> ... ...
>> 104         last = raw_atomic64_read(&last_value);
>> 105         do {
>> 106                 if (ret <= last)
>> 107                         return last;
>> 108         } while (!raw_atomic64_try_cmpxchg(&last_value, &last, ret));
>> 109
>> 110         return ret;
>> 111 }
>>
>>
>> That's why I appreciate a definition of the abs nanoseconds used by the xen
>> timer (e.g., derived from pvclock). If it is per-vCPU, we may not use it for a
>> global "in terms of nanoseconds since guest start", when PVCLOCK_TSC_STABLE_BIT
>> is not set.
>
> It is only per-vCPU if the vCPUs have *different* TSC frequencies.
> That's because of the scaling; the guest calculates the nanoseconds
> from the *guest* TSC of course, scaled according to the pvclock
> information given to the guest by KVM.
>
> As discussed and demonstrated by http://david.woodhou.se/tsdrift.c , if
> KVM scales directly to nanoseconds from the *host* TSC at its known
> frequency, that introduces a systemic drift between what the guest
> calculates, and what KVM calculates, even in the CONSTANT_TSC case.
>
> How do we reconcile the two? Well, it makes no sense for the definition
> of the pvclock to be something that the guest *cannot* calculate, so
> obviously KVM must do the same calculations the guest does; scale to
> the guest TSC (kvm_read_l1_tsc()) and then apply the same pvclock
> information from vcpu->arch.hv_clock to get the nanoseconds.
>
> In the sane world where the guest vCPUs all have the *same* TSC
> frequency, that's fine. The kvmclock isn't *really* per-vCPU because
> they're all the same.
>
> If the VMM sets up different vCPUs to have different TSC frequencies
> then yes, their kvmclock will drift slightly apart over time. That
> might be the *one* case where I will accept that the guest pvclock
> might ever change, even in the CONSTANT_TSC environment (without host
> suspend or any other nonsense).
>
> In that patch I started typing on Monday and *still* haven't got round
> to finishing because other things keep catching fire, I'm using the
> *KVM-wide* guest TSC frequency as the definition for the kvmclock.
>

Thank you very much for the explanation.

I understand you may use different methods to obtain the 'expire' in
different cases.

Maybe add some comments to the Xen timer emulation code in KVM? E.g.:

- When the TSC is reliable, follow the standard/protocol that the Xen timer
  is based on the per-vCPU pvclock: that is, always scale host_tsc with
  kvm_read_l1_tsc().

- However, sometimes the TSC is not reliable. In that case, use the legacy
  method get_kvmclock_ns().

This may help developers understand the standard/protocol used by the Xen
timer. The core idea would be: the implementation tries to follow the Xen
timer's definition of nanoseconds (per-vCPU pvclock), and it may fall back to
a legacy method in special cases, in order to improve the accuracy.

TBH, I had never thought about what the definition of a nanosecond is in the
Xen timer (even though I used to work, and am still working, on some Xen
issues).

Thank you very much!

Dongli Zhang
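
P.S. In case it helps with the comment wording, the two cases could be
summarised by a rough user-space sketch like the one below. It is not KVM
code: the function name and parameters are hypothetical, and the two "now"
values simply stand in for (a) the guest TSC scaled through the vCPU's
pvclock and (b) a get_kvmclock_ns()-style fallback.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * guest_abs_ns     absolute deadline requested by the guest, in nanoseconds
 * pvclock_now_ns   "now" obtained by scaling the guest TSC via the pvclock
 * fallback_now_ns  "now" obtained from a get_kvmclock_ns()-style fallback
 * tsc_reliable     true in the CONSTANT_TSC-like case
 *
 * Returns the nanoseconds remaining until the deadline; a value <= 0 means
 * the deadline has already passed.
 */
static int64_t xen_timer_delta_sketch(uint64_t guest_abs_ns,
				      uint64_t pvclock_now_ns,
				      uint64_t fallback_now_ns,
				      bool tsc_reliable)
{
	/* Prefer the definition the guest itself can compute. */
	uint64_t guest_now_ns = tsc_reliable ? pvclock_now_ns : fallback_now_ns;

	/* Subtract in the unsigned domain, then apply the sign explicitly. */
	if (guest_abs_ns >= guest_now_ns)
		return (int64_t)(guest_abs_ns - guest_now_ns);
	return -(int64_t)(guest_now_ns - guest_abs_ns);
}
```

The point of the sketch is only that both paths produce the same kind of
quantity (a delta in guest nanoseconds), so the comment can present the
second path as a fallback for the first rather than as a different timer
definition.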