Re: [RFC PATCH v2 0/8] timekeeping: Fix draft tracking precision and add feed-forward discipline via vmclock

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Thomas Gleixner <tglx@kernel.org>
To: David Woodhouse <dwmw2@infradead.org>,
	Miroslav Lichvar <mlichvar@redhat.com>
Cc: "Richard Cochran" <richardcochran@gmail.com>,
	"Wen Gu" <guwen@linux.alibaba.com>,
	"Andrew Lunn" <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"John Stultz" <jstultz@google.com>,
	"Stephen Boyd" <sboyd@kernel.org>,
	"Anna-Maria Behnsen" <anna-maria@linutronix.de>,
	"Frederic Weisbecker" <frederic@kernel.org>,
	"Shuah Khan" <shuah@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Thomas Weißschuh" <thomas.weissschuh@linutronix.de>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Julien Ridoux" <ridouxj@amazon.com>,
	"Ryan Luu" <rluu@amazon.com>,
	linux-kernel@vger.kernel.org,
	"Marcelo Tosatti" <mtosatti@redhat.com>
Subject: Re: [RFC PATCH v2 0/8] timekeeping: Fix draft tracking precision and add feed-forward discipline via vmclock
Date: Sun, 24 May 2026 14:36:35 +0200	[thread overview]
Message-ID: <87se7ht25o.ffs@tglx> (raw)
In-Reply-To: <b5faa559e4282a3698021068e9d5f502540e3624.camel@infradead.org>

On Fri, May 22 2026 at 17:23, David Woodhouse wrote:
> On Fri, 2026-05-22 at 17:28 +0200, Thomas Gleixner wrote:
>> This raises an interesting question. Must any of the existing PTM using
>> drivers mplement that new extended getcrosstimestampattr() callback, in
>> order to expose the cycles/csid in attr or can you fallback to the
>> existing callback and have the rest of the fields 0?
>>
>> Same question arises if you change the pre/post timestamp helpers to
>> utilize ktime_get_snapshot_id(). All existing drivers which use them
>> will then automatically retrieve cs_cycles/cs_id.
>
> Taking those in reverse order... yes, this means that with a new
> variant of PTP_SYS_OFFSET_EXTENDED, userspace can see actual counter
> values even for the system parts of those ABA timestamps, even for non-
> PTM clocks, and discipline the TSC/archcounter against the external
> clock.

Correct.

> Currently I have userspace which literally does rdtsc() either side of
> calling the ioctl :)

Why am I not surprised? :)

> And PTM devices can be used with PTP_SYS_OFFSET_PRECISE, which goes
> through get_device_system_crosststamp() as described, and all just
> works? It's just that we now allow userspace to *see* the counter value
> that the driver was already generating.

A new variant of PRECISE

> So to your questions: although there's new userspace ioctl support, the
> *drivers* don't need any modification for that (as long as they use the
> standard prets/postts helpers).

Yes.

> The remaining question is the device timestamp part (the 'B' in the ABA
> sandwich) for PTP_SYS_OFFSET_EXTENDED with PTM-capable drivers. Should
> that get a counterval?

PTM-capable driver support cross timestamps, which will with a new
version of PTP_SYS_OFFSET_PRECISE expose the system counterval. No ABA
for that as it's hardware latched AB.

> I don't have a strong opinion. On one hand we'd have to find a way to
> convert it from PTM for devices where it actually *is* PTM, and that's
> what PTP_SYS_OFFSET_PRECISE is *for*.

Correct.

> But on the other hand, can't the conversion be a whole lot simpler than
> get_device_system_crosststamp() because it's not actually dealing with
> any timekeepers; it's basically only invoking convert_base_to_cs()?

But what for? If you have PTM, use PRECISE. There is _zero_ value of
having pre/post timestamps when the hardware already does the correlated
precise sampling, no?

> And the ioctl should support it *all* but just have a clear way of
> indicating that any of the optional fields including the attrs are
> *not* populated (or use 0/max values maybe?).

Yes.

> So no, I don't think any driver *has* to add any attr support in order
> to expose counter values to userspace. The only reason I asked Arthur
> to mix those things up was for the *userspace* API, to avoid adding yet
> another ioctl over and over again. And now I feel bad for doing so :)

I think you can create _one_ data structure variant, which fits both
EXTENDED_ATTR and PRECISE_ATTR:

struct attrs {
       u32	valid;
       u32	error_bound;
       ....
       u32	reserved[N];
};

@valid tells user space, which of the attributes has been filled in by
the driver. That avoids bounds based validity checks, which are a pain
as you might end up with different bounds for every attribute. Having a
valid flags field avoids that completely.

struct devtime {
        ptp_clock_time	device_time;
        struct attrs	attrs;
};

struct systime {
	u64	sys_systime;
        u64     sys_rawtime;
        u64     sys_counter;
        u32	sys_counter_id;
        u32	reserved;
};

Exposing @sys_counter_id requires to expose CSID_* in the user space ABI
reliably, as otherwise a kernel internal CSID enum change would blow up
the user space guess work. Your ptp_counter_id approach is error prone.

struct timestamp {
	union {
		struct systime		systime;
		struct systime		pre_systime;
	};
	struct devtime			devtime;
	struct systime			post_systime;
};

struct request {
	u32		valid;
	clockid_t	clock_id;
	unsigned int	num_samples;
        u32		reserved[N];
};

I rather have @valid here too. The 'zero the reserved' members approach
is a pain as new kernels have to map 0 to default behavior instead of
being free to make 0 mean what they intend. @valid allows you to use
other sizes than u32 for future fields. All you have to take care of is
to keep the existing fields at the same place as before.

struct ioctl_data {
	struct request		request;
        struct timestamp	timestamps[];
};

So for both PTP_SYS_OFFSET_EXTENDED_ATTRS and
PTP_SYS_OFFSET_PRECISE_ATTRS user space allocates enough space to
accomodate data::request::num_samples.

For PTP_SYS_OFFSET_PRECISE_ATTRS num_samples has to be 1 and
data::timestamps[0].post_systime is zeroed by the kernel because it has
no meaning.

So now in the kernel you do:

ptp_sys_offset_extended_attrs(struct ptp_clock *ptp, void __user *argptr)
{
        struct ioctl_data __user *data = argptr;
        struct request;

        if (copy_from_user(&request, &data->request, sizeof(request)))
        	return -EFAULT;

        if (!extattr_request_valid(request))        	
        	return -EINVAL;

        for (unsigned int i; i < request.num_samples; i++) {
        	struct ptp_system_timestamp sts = { .clock_id = request.clock_id, };
	        struct timestamp uts = { };
                struct timespec64 devts;

        	if (ptp->info->gettimex64_attr)
                	ret = ptp->info->gettimex64_attr(ptp->info, &dev_ts, &sts, &uts.attr);
                else if (ptp->info->gettimex64)
                	ret = ptp->info->gettimex64(ptp->info, &dev_ts, &sts);
                else
                	return -ENOTSUPP;

                if (ret)
                	return ret;

               uts.pre_systime = mangle(sts.pre_systime);
               uts.devtime.device_time = mangle(dev_ts);
               uts.post_systime = mangle(sts.post_systime);
               if (!copy_to_user(&data->timestamps[i], uts, sizeof(uts)))
               		return -EFAULT;
	}
        return 0;
}

ptp_sys_offset_precise_attrs(struct ptp_clock *ptp, void __user *argptr)
{
        struct ioctl_data __user *data = argptr;
        struct request;

        if (copy_from_user(&request, &data->request, sizeof(request)))
        	return -EFAULT;

        if (!preciseattr_request_valid(request))        	
        	return -EINVAL;

	struct system_device_crosststamp xtstamp = { .clock_id = request.clock_id, };
        struct timestamp uts = { };
        
        if (ptp->info->getcrosststamp_attr)
                ret = ptp->info->getcrosststamp_attr(ptp->info, &xtstamp, &uts.attr);
        else if (ptp->info->getcrosststamp)
              	ret = ptp->info->getcrosststamp(ptp->info, &xtstamp);
        else
              	return -ENOTSUPP;

        if (ret)
              	return ret;

        uts.systime = mangle(xtstamp.systime);
        uts.devtime.device_time = mangle(xtstamp.device);
        if (!copy_to_user(&data->timestamps[0], uts, sizeof(uts)))
        	return -EFAULT;
        return 0;
}

Or something like this, which immediately enables the functionality for
all drivers which implement the getcrosststamp() or the gettimex64()
callbacks with a unified user space data structure.

The attributes.valid bits are all zero and and once drivers implement
the _attr callback variants, those attributes supported by the driver
will magically appear with the corresponding valid bits set.

Hmm?

Thanks,

        tglx

next prev parent reply	other threads:[~2026-05-24 12:36 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-17 21:25 [RFC PATCH v2 0/8] timekeeping: Fix draft tracking precision and add feed-forward discipline via vmclock David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 1/8] timekeeping: Remove xtime_remainder from ntp_error accumulation David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 2/8] timekeeping: Account for clawback adjustment in ntp_error David Woodhouse
2026-05-19  1:59   ` John Stultz
2026-05-19 10:04     ` David Woodhouse
2026-05-19 19:28       ` John Stultz
2026-05-20 10:47         ` Miroslav Lichvar
2026-05-20 12:37           ` David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 3/8] timekeeping: Clamp time_offset delta to prevent infinite tail David Woodhouse
2026-05-19 13:25   ` Miroslav Lichvar
2026-05-19 13:31     ` David Woodhouse
2026-05-19 14:17       ` Miroslav Lichvar
2026-05-19 15:06         ` David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 4/8] timekeeping: Add absolute reference for feed-forward clock discipline David Woodhouse
2026-05-19  2:09   ` John Stultz
2026-05-19 11:07     ` David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 5/8] ptp_vmclock: Feed reference to timekeeping for feed-forward discipline David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 6/8] timekeeping: Guard against divide-by-zero in timekeeping_adjust David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 7/8] timekeeping: Drive time_offset skew via per-tick ntp_error transfer David Woodhouse
2026-05-17 21:25 ` [RFC PATCH v2 8/8] WIP: kernel/time: Add /dev/vmclock_host miscdev David Woodhouse
2026-05-19 13:16 ` [RFC PATCH v2 0/8] timekeeping: Fix draft tracking precision and add feed-forward discipline via vmclock Miroslav Lichvar
2026-05-19 15:50   ` David Woodhouse
2026-05-20 10:39     ` Miroslav Lichvar
2026-05-20 12:21       ` David Woodhouse
2026-05-21  6:35         ` Miroslav Lichvar
2026-05-21  9:54           ` David Woodhouse
2026-05-25  8:08             ` Miroslav Lichvar
2026-05-25  9:14               ` David Woodhouse
2026-05-26  7:10                 ` Miroslav Lichvar
2026-05-26 10:00                   ` David Woodhouse
2026-05-27  7:46                     ` Miroslav Lichvar
2026-05-27 12:28                       ` David Woodhouse
2026-05-21 18:30         ` Thomas Gleixner
2026-05-21 21:06           ` David Woodhouse
2026-05-22  8:02             ` Thomas Gleixner
2026-05-22 10:01               ` David Woodhouse
2026-05-22 15:28                 ` Thomas Gleixner
2026-05-22 16:23                   ` David Woodhouse
2026-05-24 12:36                     ` Thomas Gleixner [this message]
2026-05-24 13:13                       ` David Woodhouse
2026-05-24 15:05                         ` Thomas Gleixner
2026-05-25  8:06                       ` Arthur Kiyanovski
2026-05-25  8:41                         ` David Woodhouse
2026-05-26 14:12                         ` Thomas Gleixner
2026-05-22 16:50                   ` David Woodhouse
2026-05-24 15:15                     ` Thomas Gleixner
2026-05-24 15:37                       ` Thomas Gleixner
2026-05-24 15:48                         ` Thomas Gleixner
2026-05-24 16:36                         ` Thomas Gleixner
2026-05-24 16:42                           ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87se7ht25o.ffs@tglx \
    --to=tglx@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=anna-maria@linutronix.de \
    --cc=arnd@arndb.de \
    --cc=davem@davemloft.net \
    --cc=dwmw2@infradead.org \
    --cc=edumazet@google.com \
    --cc=frederic@kernel.org \
    --cc=guwen@linux.alibaba.com \
    --cc=jstultz@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mlichvar@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=pabeni@redhat.com \
    --cc=peterz@infradead.org \
    --cc=richardcochran@gmail.com \
    --cc=ridouxj@amazon.com \
    --cc=rluu@amazon.com \
    --cc=sboyd@kernel.org \
    --cc=shuah@kernel.org \
    --cc=thomas.weissschuh@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.