Date: Fri, 26 Jul 2024 09:14:14 -0400
From: "Michael S. Tsirkin"
To: David Woodhouse
Cc: Richard Cochran, Peter Hilber, linux-kernel@vger.kernel.org,
 virtualization@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
 linux-rtc@vger.kernel.org, "Ridoux, Julien", virtio-dev@lists.linux.dev,
 "Luu, Ryan", "Chashper, David", "Mohamed Abuelfotoh, Hazem",
 "Christopher S . Hall", Jason Wang, John Stultz, netdev@vger.kernel.org,
 Stephen Boyd, Thomas Gleixner, Xuan Zhuo, Marc Zyngier, Mark Rutland,
 Daniel Lezcano, Alessandro Zummo, Alexandre Belloni, qemu-devel,
 Simon Horman
Subject: Re: [PATCH v2] ptp: Add vDSO-style vmclock support
Message-ID: <20240726090538-mutt-send-email-mst@kernel.org>
References: <7b3a2490d467560afd2fe08d4f28c4635919ec48.camel@infradead.org>
In-Reply-To: <7b3a2490d467560afd2fe08d4f28c4635919ec48.camel@infradead.org>

On Fri, Jul 26, 2024 at 01:28:17PM +0100, David Woodhouse wrote:
> diff --git a/include/uapi/linux/vmclock-abi.h b/include/uapi/linux/vmclock-abi.h
> new file mode 100644
> index 000000000000..7b1b4759363c
> --- /dev/null
> +++ b/include/uapi/linux/vmclock-abi.h
> @@ -0,0 +1,187 @@
> +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
> +
> +/*
> + * This structure provides a vDSO-style clock to VM guests, exposing the
> + * relationship (or lack thereof) between the CPU clock (TSC, timebase, arch
> + * counter, etc.) and real time. It is designed to address the problem of
> + * live migration, which other clock enlightenments do not.
> + *
> + * When a guest is live migrated, this affects the clock in two ways.
> + *
> + * First, even between identical hosts the actual frequency of the underlying
> + * counter will change within the tolerances of its specification (typically
> + * ±50PPM, or 4 seconds a day). This frequency also varies over time on the
> + * same host, but can be tracked by NTP as it generally varies slowly. With
> + * live migration there is a step change in the frequency, with no warning.
> + *
> + * Second, there may be a step change in the value of the counter itself, as
> + * its accuracy is limited by the precision of the NTP synchronization on the
> + * source and destination hosts.
> + *
> + * So any calibration (NTP, PTP, etc.) which the guest has done on the source
> + * host before migration is invalid, and needs to be redone on the new host.
> + *
> + * In its most basic mode, this structure provides only an indication to the
> + * guest that live migration has occurred. This allows the guest to know that
> + * its clock is invalid and take remedial action. For applications that need
> + * reliable accurate timestamps (e.g. distributed databases), the structure
> + * can be mapped all the way to userspace.
> + * This allows the application to see
> + * directly for itself that the clock is disrupted and take appropriate
> + * action, even when using a vDSO-style method to get the time instead of a
> + * system call.
> + *
> + * In its more advanced mode. this structure can also be used to expose the
> + * precise relationship of the CPU counter to real time, as calibrated by the
> + * host. This means that userspace applications can have accurate time
> + * immediately after live migration, rather than having to pause operations
> + * and wait for NTP to recover. This mode does, of course, rely on the
> + * counter being reliable and consistent across CPUs.
> + *
> + * Note that this must be true UTC, never with smeared leap seconds. If a
> + * guest wishes to construct a smeared clock, it can do so. Presenting a
> + * smeared clock through this interface would be problematic because it
> + * actually messes with the apparent counter *period*. A linear smearing
> + * of 1 ms per second would effectively tweak the counter period by 1000PPM
> + * at the start/end of the smearing period, while a sinusoidal smear would
> + * basically be impossible to represent.
> + *
> + * This structure is offered with the intent that it be adopted into the
> + * nascent virtio-rtc standard, as a virtio-rtc that does not address the live
> + * migration problem seems a little less than fit for purpose. For that
> + * reason, certain fields use precisely the same numeric definitions as in
> + * the virtio-rtc proposal. The structure can also be exposed through an ACPI
> + * device with the CID "VMCLOCK", modelled on the "VMGENID" device except for
> + * the fact that it uses a real _CRS to convey the address of the structure
> + * (which should be a full page, to allow for mapping directly to userspace).
> + */
> +
> +#ifndef __VMCLOCK_ABI_H__
> +#define __VMCLOCK_ABI_H__
> +
> +#ifdef __KERNEL__
> +#include
> +#else
> +#include
> +#endif
> +
> +struct vmclock_abi {
> +	/* CONSTANT FIELDS */
> +	uint32_t magic;
> +#define VMCLOCK_MAGIC 0x4b4c4356 /* "VCLK" */
> +	uint32_t size; /* Size of region containing this structure */
> +	uint16_t version; /* 1 */
> +	uint8_t counter_id; /* Matches VIRTIO_RTC_COUNTER_xxx except INVALID */
> +#define VMCLOCK_COUNTER_ARM_VCNT 0
> +#define VMCLOCK_COUNTER_X86_TSC 1
> +#define VMCLOCK_COUNTER_INVALID 0xff
> +	uint8_t time_type; /* Matches VIRTIO_RTC_TYPE_xxx */
> +#define VMCLOCK_TIME_UTC 0 /* Since 1970-01-01 00:00:00z */
> +#define VMCLOCK_TIME_TAI 1 /* Since 1970-01-01 00:00:00z */
> +#define VMCLOCK_TIME_MONOTONIC 2 /* Since undefined epoch */
> +#define VMCLOCK_TIME_INVALID_SMEARED 3 /* Not supported */
> +#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED 4 /* Not supported */
> +
> +	/* NON-CONSTANT FIELDS PROTECTED BY SEQCOUNT LOCK */
> +	uint32_t seq_count; /* Low bit means an update is in progress */
> +	/*
> +	 * This field changes to another non-repeating value when the CPU
> +	 * counter is disrupted, for example on live migration. This lets
> +	 * the guest know that it should discard any calibration it has
> +	 * performed of the counter against external sources (NTP/PTP/etc.).
> +	 */
> +	uint64_t disruption_marker;
> +	uint64_t flags;
> +	/* Indicates that the tai_offset_sec field is valid */
> +#define VMCLOCK_FLAG_TAI_OFFSET_VALID (1 << 0)
> +	/*
> +	 * Optionally used to notify guests of pending maintenance events.
> +	 * A guest which provides latency-sensitive services may wish to
> +	 * remove itself from service if an event is coming up. Two flags
> +	 * indicate the approximate imminence of the event.
> +	 */
> +#define VMCLOCK_FLAG_DISRUPTION_SOON (1 << 1) /* About a day */
> +#define VMCLOCK_FLAG_DISRUPTION_IMMINENT (1 << 2) /* About an hour */
> +#define VMCLOCK_FLAG_PERIOD_ESTERROR_VALID (1 << 3)
> +#define VMCLOCK_FLAG_PERIOD_MAXERROR_VALID (1 << 4)
> +#define VMCLOCK_FLAG_TIME_ESTERROR_VALID (1 << 5)
> +#define VMCLOCK_FLAG_TIME_MAXERROR_VALID (1 << 6)
> +	/*
> +	 * If the MONOTONIC flag is set then (other than leap seconds) it is
> +	 * guaranteed that the time calculated according this structure at
> +	 * any given moment shall never appear to be later than the time
> +	 * calculated via the structure at any *later* moment.
> +	 *
> +	 * In particular, a timestamp based on a counter reading taken
> +	 * immediately after setting the low bit of seq_count (and the
> +	 * associated memory barrier), using the previously-valid time and
> +	 * period fields, shall never be later than a timestamp based on
> +	 * a counter reading taken immediately before *clearing* the low
> +	 * bit again after the update, using the about-to-be-valid fields.
> +	 */
> +#define VMCLOCK_FLAG_TIME_MONOTONIC (1 << 7)
> +
> +	uint8_t pad[2];
> +	uint8_t clock_status;
> +#define VMCLOCK_STATUS_UNKNOWN 0
> +#define VMCLOCK_STATUS_INITIALIZING 1
> +#define VMCLOCK_STATUS_SYNCHRONIZED 2
> +#define VMCLOCK_STATUS_FREERUNNING 3
> +#define VMCLOCK_STATUS_UNRELIABLE 4
> +
> +	/*
> +	 * The time exposed through this device is never smeared. This field
> +	 * corresponds to the 'subtype' field in virtio-rtc, which indicates
> +	 * the smearing method. However in this case it provides a *hint* to
> +	 * the guest operating system, such that *if* the guest OS wants to
> +	 * provide its users with an alternative clock which does not follow
> +	 * UTC, it may do so in a fashion consistent with the other systems
> +	 * in the nearby environment.
> +	 */
> +	uint8_t leap_second_smearing_hint; /* Matches VIRTIO_RTC_SUBTYPE_xxx */
> +#define VMCLOCK_SMEARING_STRICT 0
> +#define VMCLOCK_SMEARING_NOON_LINEAR 1
> +#define VMCLOCK_SMEARING_UTC_SLS 2
> +	int16_t tai_offset_sec;
> +	uint8_t leap_indicator;
> +	/*
> +	 * This field is based on the the VIRTIO_RTC_LEAP_xxx values as
> +	 * defined in the current draft of virtio-rtc, but since smearing
> +	 * cannot be used with the shared memory device, some values are
> +	 * not used.
> +	 *
> +	 * The _POST_POS and _POST_NEG values allow the guest to perform
> +	 * its own smearing during the day or so after a leap second when
> +	 * such smearing may need to continue being applied for a leap
> +	 * second which is now theoretically "historical".
> +	 */
> +#define VMCLOCK_LEAP_NONE 0x00 /* No known nearby leap second */
> +#define VMCLOCK_LEAP_PRE_POS 0x01 /* Positive leap second at EOM */
> +#define VMCLOCK_LEAP_PRE_NEG 0x02 /* Negative leap second at EOM */
> +#define VMCLOCK_LEAP_POS 0x03 /* Set during 23:59:60 second */
> +#define VMCLOCK_LEAP_POST_POS 0x04
> +#define VMCLOCK_LEAP_POST_NEG 0x05
> +
> +	/* Bit shift for counter_period_frac_sec and its error rate */
> +	uint8_t counter_period_shift;
> +	/*
> +	 * Paired values of counter and UTC at a given point in time.
> +	 */
> +	uint64_t counter_value;
> +	/*
> +	 * Counter period, and error margin of same. The unit of these
> +	 * fields is 1/2^(64 + counter_period_shift) of a second.
> +	 */
> +	uint64_t counter_period_frac_sec;
> +	uint64_t counter_period_esterror_rate_frac_sec;
> +	uint64_t counter_period_maxerror_rate_frac_sec;
> +
> +	/*
> +	 * Time according to time_type field above.
> +	 */
> +	uint64_t time_sec; /* Seconds since time_type epoch */
> +	uint64_t time_frac_sec; /* Units of 1/2^64 of a second */
> +	uint64_t time_esterror_nanosec;
> +	uint64_t time_maxerror_nanosec;
> +};
> +
> +#endif /* __VMCLOCK_ABI_H__ */

For purposes of virtio, should we label all the fields here __le?

> --
> 2.44.0
>
>
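
To make the question concrete, here is a minimal userspace sketch (not part of the patch) of what little-endian fields would mean for a reader. It assumes the fields are stored little-endian, which is what __le32/__le64 would annotate in the kernel. The helpers load_le32()/load_le64(), the struct vmclock_sample and vmclock_sample_to_time() are hypothetical names invented for illustration. The time calculation follows the units documented in the quoted header, and unsigned __int128 is a GCC/Clang extension.

```c
#include <stdint.h>

/* Hypothetical helpers: decode little-endian fields byte by byte, so the
 * result is the same on big- and little-endian guests. */
static uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t load_le64(const unsigned char *p)
{
	return (uint64_t)load_le32(p) | (uint64_t)load_le32(p + 4) << 32;
}

/* Hypothetical snapshot of the fields needed for the calculation, assumed
 * to have been copied out under the seq_count protocol and already
 * converted to host byte order. */
struct vmclock_sample {
	uint8_t  counter_period_shift;
	uint64_t counter_value;
	uint64_t counter_period_frac_sec;
	uint64_t time_sec;
	uint64_t time_frac_sec;
};

/* Extrapolate the time for a counter reading taken after the snapshot.
 * The period is in units of 1/2^(64 + counter_period_shift) of a second,
 * so shifting the product right by counter_period_shift yields units of
 * 1/2^64 of a second. Extreme inputs could overflow 128 bits; this sketch
 * does not guard against that. */
static void vmclock_sample_to_time(const struct vmclock_sample *s,
				   uint64_t counter_now,
				   uint64_t *sec, uint64_t *frac_sec)
{
	unsigned __int128 delta = counter_now - s->counter_value;
	unsigned __int128 t = delta * s->counter_period_frac_sec;

	t >>= s->counter_period_shift;
	t += s->time_frac_sec;		/* carry into seconds is handled below */
	*sec = s->time_sec + (uint64_t)(t >> 64);
	*frac_sec = (uint64_t)t;
}
```

With little-endian storage, the magic 0x4b4c4356 appears in memory as the bytes "VCLK", so load_le32() on those four bytes returns VMCLOCK_MAGIC regardless of the guest's own byte order.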