netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/2] ptp_vclock fixes
@ 2025-06-13 17:47 Vladimir Oltean
  2025-06-13 17:47 ` [PATCH net 1/2] ptp: fix breakage after ptp_vclock_in_use() rework Vladimir Oltean
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Vladimir Oltean @ 2025-06-13 17:47 UTC (permalink / raw)
  To: netdev
  Cc: Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Jeongjun Park, Yangbo Lu

Hello,

While I was intending to test something else related to PTP in net-next
I noticed any time I would run ptp4l on an interface, the kernel would
print "ptp: physical clock is free running" and ptp4l would exit with an
error code.

I then found Jeongjun Park's patch and subsequent explanation provided
to Jakub's question, specifically related to the code which introduced
the breakage I am seeing.
https://lore.kernel.org/netdev/CAO9qdTEjQ5414un7Yw604paECF=6etVKSDSnYmZzZ6Pg3LurXw@mail.gmail.com/

I had to look at the original issue that prompted Jeongjun Park's patch,
and provide an alternative fix for it. Patch 1/2 in this set contains a
logical revert plus the alternative fix, squashed into one.

Patch 2/2 fixes another issue which was confusing during debugging/
characterization, namely: "ok, the kernel clearly thinks that any
physical clock is free-running after this change (despite there being no
vclocks), but why would ptp4l fail to create the clock altogether? Why
not just fail to adjust it?"

By reverting (locally) Jeongjun Park's commit, I could reproduce
the reported lockdep splat using the commands from patch 1/2's commit
message, and this goes away with the reworked implementation.

Vladimir Oltean (2):
  ptp: fix breakage after ptp_vclock_in_use() rework
  ptp: allow reading of currently dialed frequency to succeed on
    free-running clocks

 drivers/ptp/ptp_clock.c   |  3 ++-
 drivers/ptp/ptp_private.h | 22 +++++++++++++++++++++-
 2 files changed, 23 insertions(+), 2 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH net 1/2] ptp: fix breakage after ptp_vclock_in_use() rework
  2025-06-13 17:47 [PATCH net 0/2] ptp_vclock fixes Vladimir Oltean
@ 2025-06-13 17:47 ` Vladimir Oltean
  2025-06-15 15:34   ` Jeongjun Park
  2025-06-13 17:47 ` [PATCH net 2/2] ptp: allow reading of currently dialed frequency to succeed on free-running clocks Vladimir Oltean
  2025-06-17 23:30 ` [PATCH net 0/2] ptp_vclock fixes patchwork-bot+netdevbpf
  2 siblings, 1 reply; 7+ messages in thread
From: Vladimir Oltean @ 2025-06-13 17:47 UTC (permalink / raw)
  To: netdev
  Cc: Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Jeongjun Park, Yangbo Lu

What is broken
--------------

ptp4l, and any other application which calls clock_adjtime() on a
physical clock, is greeted with error -EBUSY after commit 87f7ce260a3c
("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()").

Explanation for the breakage
----------------------------

The blamed commit was based on the false assumption that
ptp_vclock_in_use() callers already test for n_vclocks prior to calling
this function.

This is notably incorrect for the code path below, in which there is, in
fact, no n_vclocks test:

ptp_clock_adjtime()
-> ptp_clock_freerun()
   -> ptp_vclock_in_use()

The result is that any clock adjustment on any physical clock is now
impossible. This is _despite_ there not being any vclock over this
physical clock.

$ ptp4l -i eno0 -2 -P -m
ptp4l[58.425]: selected /dev/ptp0 as PTP clock
[   58.429749] ptp: physical clock is free running
ptp4l[58.431]: Failed to open /dev/ptp0: Device or resource busy
failed to create a clock
$ cat /sys/class/ptp/ptp0/n_vclocks
0

The patch makes the ptp_vclock_in_use() function say "if it's not a
virtual clock, then this physical clock does have virtual clocks on
top".

Then ptp_clock_freerun() uses this information to say "this physical
clock has virtual clocks on top, so it must stay free-running".

Then ptp_clock_adjtime() uses this information to say "well, if this
physical clock has to be free-running, I can't do it, return -EBUSY".

Simply put, ptp_vclock_in_use() cannot be simplified so as to remove the
test whether vclocks are in use.

What did the blamed commit intend to fix
----------------------------------------

The blamed commit presents a lockdep warning stating "possible recursive
locking detected", with the n_vclocks_store() and ptp_clock_unregister()
functions involved.

The recursive locking seems this:
n_vclocks_store()
-> mutex_lock_interruptible(&ptp->n_vclocks_mux) // 1
-> device_for_each_child_reverse(..., unregister_vclock)
   -> unregister_vclock()
      -> ptp_vclock_unregister()
         -> ptp_clock_unregister()
            -> ptp_vclock_in_use()
               -> mutex_lock_interruptible(&ptp->n_vclocks_mux) // 2

The issue can be triggered by creating and then deleting vclocks:
$ echo 2 > /sys/class/ptp/ptp0/n_vclocks
$ echo 0 > /sys/class/ptp/ptp0/n_vclocks

But note that in the original stack trace, the address of the first lock
is different from the address of the second lock. This is because at
step 1 marked above, &ptp->n_vclocks_mux is the lock of the parent
(physical) PTP clock, and at step 2, the lock is of the child (virtual)
PTP clock. They are different locks of different devices.

In this situation there is no real deadlock, the lockdep warning is
caused by the fact that the mutexes have the same lock class on both the
parent and the child. Functionally it is fine.

Proposed alternative solution
-----------------------------

We must reintroduce the body of ptp_vclock_in_use() mostly as it was
structured prior to the blamed commit, but avoid the lockdep warning.

Based on the fact that vclocks cannot be nested on top of one another
(ptp_is_attribute_visible() hides n_vclocks for virtual clocks), we
already know that ptp->n_vclocks is zero for a virtual clock. And
ptp->is_virtual_clock is a runtime invariant, established at
ptp_clock_register() time and never changed. There is no need to
serialize on any mutex in order to read ptp->is_virtual_clock, and we
take advantage of that by moving it outside the lock.

Thus, virtual clocks do not need to acquire &ptp->n_vclocks_mux at
all, and step 2 in the code walkthrough above can simply go away.
We can simply return false to the question "ptp_vclock_in_use(a virtual
clock)".

Other notes
-----------

Releasing &ptp->n_vclocks_mux before ptp_vclock_in_use() returns
execution seems racy, because the returned value can become stale as
soon as the function returns and before the return value is used (i.e.
n_vclocks_store() can run any time). The locking requirement should
somehow be transferred to the caller, to ensure a longer life time for
the returned value, but this seems out of scope for this severe bug fix.

Because we are also fixing up the logic from the original commit, there
is another Fixes: tag for that.

Fixes: 87f7ce260a3c ("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()")
Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/ptp/ptp_private.h | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/ptp/ptp_private.h b/drivers/ptp/ptp_private.h
index 528d86a33f37..a6aad743c282 100644
--- a/drivers/ptp/ptp_private.h
+++ b/drivers/ptp/ptp_private.h
@@ -98,7 +98,27 @@ static inline int queue_cnt(const struct timestamp_event_queue *q)
 /* Check if ptp virtual clock is in use */
 static inline bool ptp_vclock_in_use(struct ptp_clock *ptp)
 {
-	return !ptp->is_virtual_clock;
+	bool in_use = false;
+
+	/* Virtual clocks can't be stacked on top of virtual clocks.
+	 * Avoid acquiring the n_vclocks_mux on virtual clocks, to allow this
+	 * function to be called from code paths where the n_vclocks_mux of the
+	 * parent physical clock is already held. Functionally that's not an
+	 * issue, but lockdep would complain, because they have the same lock
+	 * class.
+	 */
+	if (ptp->is_virtual_clock)
+		return false;
+
+	if (mutex_lock_interruptible(&ptp->n_vclocks_mux))
+		return true;
+
+	if (ptp->n_vclocks)
+		in_use = true;
+
+	mutex_unlock(&ptp->n_vclocks_mux);
+
+	return in_use;
 }
 
 /* Check if ptp clock shall be free running */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net 2/2] ptp: allow reading of currently dialed frequency to succeed on free-running clocks
  2025-06-13 17:47 [PATCH net 0/2] ptp_vclock fixes Vladimir Oltean
  2025-06-13 17:47 ` [PATCH net 1/2] ptp: fix breakage after ptp_vclock_in_use() rework Vladimir Oltean
@ 2025-06-13 17:47 ` Vladimir Oltean
  2025-06-17 23:30 ` [PATCH net 0/2] ptp_vclock fixes patchwork-bot+netdevbpf
  2 siblings, 0 replies; 7+ messages in thread
From: Vladimir Oltean @ 2025-06-13 17:47 UTC (permalink / raw)
  To: netdev
  Cc: Richard Cochran, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Jeongjun Park, Yangbo Lu

There is a bug in ptp_clock_adjtime() which makes it refuse the
operation even if we just want to read the current clock dialed
frequency, not modify anything (tx->modes == 0). That should be possible
even if the clock is free-running. For context, the kernel UAPI is the
same for getting and setting the frequency of a POSIX clock.

For example, ptp4l errors out at clock_create() -> clockadj_get_freq()
-> clock_adjtime() time, when it should logically only have failed on
actual adjustments to the clock, aka if the clock was configured as
slave. But in master mode it should work.

This was discovered when examining the issue described in the previous
commit, where ptp_clock_freerun() returned true despite n_vclocks being
zero.

Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/ptp/ptp_clock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 35a5994bf64f..36f57d7b4a66 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -121,7 +121,8 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct __kernel_timex *tx)
 	struct ptp_clock_info *ops;
 	int err = -EOPNOTSUPP;
 
-	if (ptp_clock_freerun(ptp)) {
+	if (tx->modes & (ADJ_SETOFFSET | ADJ_FREQUENCY | ADJ_OFFSET) &&
+	    ptp_clock_freerun(ptp)) {
 		pr_err("ptp: physical clock is free running\n");
 		return -EBUSY;
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net 1/2] ptp: fix breakage after ptp_vclock_in_use() rework
  2025-06-13 17:47 ` [PATCH net 1/2] ptp: fix breakage after ptp_vclock_in_use() rework Vladimir Oltean
@ 2025-06-15 15:34   ` Jeongjun Park
  2025-06-15 16:03     ` Vladimir Oltean
  0 siblings, 1 reply; 7+ messages in thread
From: Jeongjun Park @ 2025-06-15 15:34 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Richard Cochran, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Yangbo Lu

Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>
> What is broken
> --------------
>
> ptp4l, and any other application which calls clock_adjtime() on a
> physical clock, is greeted with error -EBUSY after commit 87f7ce260a3c
> ("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()").
>
> Explanation for the breakage
> ----------------------------
>
> The blamed commit was based on the false assumption that
> ptp_vclock_in_use() callers already test for n_vclocks prior to calling
> this function.
>
> This is notably incorrect for the code path below, in which there is, in
> fact, no n_vclocks test:
>
> ptp_clock_adjtime()
> -> ptp_clock_freerun()
>    -> ptp_vclock_in_use()
>
> The result is that any clock adjustment on any physical clock is now
> impossible. This is _despite_ there not being any vclock over this
> physical clock.
>
> $ ptp4l -i eno0 -2 -P -m
> ptp4l[58.425]: selected /dev/ptp0 as PTP clock
> [   58.429749] ptp: physical clock is free running
> ptp4l[58.431]: Failed to open /dev/ptp0: Device or resource busy
> failed to create a clock
> $ cat /sys/class/ptp/ptp0/n_vclocks
> 0
>
> The patch makes the ptp_vclock_in_use() function say "if it's not a
> virtual clock, then this physical clock does have virtual clocks on
> top".
>
> Then ptp_clock_freerun() uses this information to say "this physical
> clock has virtual clocks on top, so it must stay free-running".
>
> Then ptp_clock_adjtime() uses this information to say "well, if this
> physical clock has to be free-running, I can't do it, return -EBUSY".
>
> Simply put, ptp_vclock_in_use() cannot be simplified so as to remove the
> test whether vclocks are in use.
>
> What did the blamed commit intend to fix
> ----------------------------------------
>
> The blamed commit presents a lockdep warning stating "possible recursive
> locking detected", with the n_vclocks_store() and ptp_clock_unregister()
> functions involved.
>
> The recursive locking seems this:
> n_vclocks_store()
> -> mutex_lock_interruptible(&ptp->n_vclocks_mux) // 1
> -> device_for_each_child_reverse(..., unregister_vclock)
>    -> unregister_vclock()
>       -> ptp_vclock_unregister()
>          -> ptp_clock_unregister()
>             -> ptp_vclock_in_use()
>                -> mutex_lock_interruptible(&ptp->n_vclocks_mux) // 2
>
> The issue can be triggered by creating and then deleting vclocks:
> $ echo 2 > /sys/class/ptp/ptp0/n_vclocks
> $ echo 0 > /sys/class/ptp/ptp0/n_vclocks
>
> But note that in the original stack trace, the address of the first lock
> is different from the address of the second lock. This is because at
> step 1 marked above, &ptp->n_vclocks_mux is the lock of the parent
> (physical) PTP clock, and at step 2, the lock is of the child (virtual)
> PTP clock. They are different locks of different devices.
>
> In this situation there is no real deadlock, the lockdep warning is
> caused by the fact that the mutexes have the same lock class on both the
> parent and the child. Functionally it is fine.
>
> Proposed alternative solution
> -----------------------------
>
> We must reintroduce the body of ptp_vclock_in_use() mostly as it was
> structured prior to the blamed commit, but avoid the lockdep warning.
>
> Based on the fact that vclocks cannot be nested on top of one another
> (ptp_is_attribute_visible() hides n_vclocks for virtual clocks), we
> already know that ptp->n_vclocks is zero for a virtual clock. And
> ptp->is_virtual_clock is a runtime invariant, established at
> ptp_clock_register() time and never changed. There is no need to
> serialize on any mutex in order to read ptp->is_virtual_clock, and we
> take advantage of that by moving it outside the lock.
>
> Thus, virtual clocks do not need to acquire &ptp->n_vclocks_mux at
> all, and step 2 in the code walkthrough above can simply go away.
> We can simply return false to the question "ptp_vclock_in_use(a virtual
> clock)".
>
> Other notes
> -----------
>
> Releasing &ptp->n_vclocks_mux before ptp_vclock_in_use() returns
> execution seems racy, because the returned value can become stale as
> soon as the function returns and before the return value is used (i.e.
> n_vclocks_store() can run any time). The locking requirement should
> somehow be transferred to the caller, to ensure a longer life time for
> the returned value, but this seems out of scope for this severe bug fix.
>
> Because we are also fixing up the logic from the original commit, there
> is another Fixes: tag for that.
>

Thanks for quickly finding the part I missed!

As you said, I confirmed that adjtime and settime functions do not work
properly due to this commit.

However, I don't think it is appropriate to fix ptp_vclock_in_use().
I agree that ptp->n_vclocks should be checked in the path where
ptp_clock_freerun() is called, but there are many drivers that do not
have any contact with ptp->n_vclocks in the path where
ptp_clock_unregister() is called.

The reason I removed the ptp->n_vclocks check logic from the
ptp_vclock_in_use() function is to prevent false positives from lockdep,
but also to prevent the performance overhead caused by locking
ptp->n_vclocks_mux and checking ptp->n_vclocks when calling
ptp_vclock_in_use() from a driver that has nothing to do with
ptp->n_vclocks.

Therefore, I think it would be appropriate to modify ptp_clock_freerun()
like this instead of ptp_vclock_in_use():
---
 drivers/ptp/ptp_private.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/ptp/ptp_private.h b/drivers/ptp/ptp_private.h
index 528d86a33f37..abd99087f0ca 100644
--- a/drivers/ptp/ptp_private.h
+++ b/drivers/ptp/ptp_private.h
@@ -104,10 +104,20 @@ static inline bool ptp_vclock_in_use(struct
ptp_clock *ptp)
 /* Check if ptp clock shall be free running */
 static inline bool ptp_clock_freerun(struct ptp_clock *ptp)
 {
+   bool ret = false;
+
    if (ptp->has_cycles)
-       return false;
+       return ret;
+
+   if (mutex_lock_interruptible(&ptp->n_vclocks_mux))
+       return true;
+
+   if (ptp_vclock_in_use(ptp) && ptp->n_vclocks)
+       ret = true;
+
+   mutex_unlock(&ptp->n_vclocks_mux);

-   return ptp_vclock_in_use(ptp);
+   return ret;
 }

 extern const struct class ptp_class;
-- 

I tested this patch with test.c and confirmed that
ptp_clock_{adj,set}time() works correctly again after applying this patch.

test.c:
```c
// gcc -o test test.c -lrt

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <math.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/timex.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#include <linux/ptp_clock.h>

static clockid_t get_clockid(int fd)
{
#define CLOCKFD 3
#define FD_TO_CLOCKID(fd)   ((~(clockid_t) (fd) << 3) | CLOCKFD)
    return FD_TO_CLOCKID(fd);
}

int main() {
    int fd = open("/dev/ptp0", O_RDWR);
    clockid_t clockid = get_clockid(fd);
    struct timex tx;
    struct timespec ts;

    memset(&tx, 0, sizeof(tx));
    tx.modes = ADJ_OFFSET;
    tx.time.tv_sec = 0;
    tx.time.tv_usec = 0;

    clock_adjtime(clockid, &tx);

    ts.tv_sec = 0;
    ts.tv_nsec = 0;

    clock_settime(clockid, &ts);

    return 0;
}
```

> Fixes: 87f7ce260a3c ("ptp: remove ptp->n_vclocks check logic in ptp_vclock_in_use()")
> Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  drivers/ptp/ptp_private.h | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/ptp/ptp_private.h b/drivers/ptp/ptp_private.h
> index 528d86a33f37..a6aad743c282 100644
> --- a/drivers/ptp/ptp_private.h
> +++ b/drivers/ptp/ptp_private.h
> @@ -98,7 +98,27 @@ static inline int queue_cnt(const struct timestamp_event_queue *q)
>  /* Check if ptp virtual clock is in use */
>  static inline bool ptp_vclock_in_use(struct ptp_clock *ptp)
>  {
> -       return !ptp->is_virtual_clock;
> +       bool in_use = false;
> +
> +       /* Virtual clocks can't be stacked on top of virtual clocks.
> +        * Avoid acquiring the n_vclocks_mux on virtual clocks, to allow this
> +        * function to be called from code paths where the n_vclocks_mux of the
> +        * parent physical clock is already held. Functionally that's not an
> +        * issue, but lockdep would complain, because they have the same lock
> +        * class.
> +        */
> +       if (ptp->is_virtual_clock)
> +               return false;
> +
> +       if (mutex_lock_interruptible(&ptp->n_vclocks_mux))
> +               return true;
> +
> +       if (ptp->n_vclocks)
> +               in_use = true;
> +
> +       mutex_unlock(&ptp->n_vclocks_mux);
> +
> +       return in_use;
>  }
>
>  /* Check if ptp clock shall be free running */
> --
> 2.43.0
>

Regards,

Jeongjun Park

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net 1/2] ptp: fix breakage after ptp_vclock_in_use() rework
  2025-06-15 15:34   ` Jeongjun Park
@ 2025-06-15 16:03     ` Vladimir Oltean
  2025-06-17 16:04       ` Jeongjun Park
  0 siblings, 1 reply; 7+ messages in thread
From: Vladimir Oltean @ 2025-06-15 16:03 UTC (permalink / raw)
  To: Jeongjun Park
  Cc: netdev, Richard Cochran, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Yangbo Lu

On Mon, Jun 16, 2025 at 12:34:59AM +0900, Jeongjun Park wrote:
> However, I don't think it is appropriate to fix ptp_vclock_in_use().
> I agree that ptp->n_vclocks should be checked in the path where
> ptp_clock_freerun() is called, but there are many drivers that do not
> have any contact with ptp->n_vclocks in the path where
> ptp_clock_unregister() is called.

What do you mean there are many drivers that do not have any contact
with ptp->n_vclocks? It is a feature visible only to the core, and
transparent to the drivers. All drivers have contact with it, or none
do. It all depends solely upon user configuration, and not dependent at
all upon the specific driver.

> The reason I removed the ptp->n_vclocks check logic from the
> ptp_vclock_in_use() function is to prevent false positives from lockdep,
> but also to prevent the performance overhead caused by locking
> ptp->n_vclocks_mux and checking ptp->n_vclocks when calling
> ptp_vclock_in_use() from a driver that has nothing to do with
> ptp->n_vclocks.

Can you quantify the performance overhead caused by acquiring
ptp->n_vclocks_mux on unregistering physical clocks?

> 
> Therefore, I think it would be appropriate to modify ptp_clock_freerun()
> like this instead of ptp_vclock_in_use():
> ---
>  drivers/ptp/ptp_private.h | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ptp/ptp_private.h b/drivers/ptp/ptp_private.h
> index 528d86a33f37..abd99087f0ca 100644
> --- a/drivers/ptp/ptp_private.h
> +++ b/drivers/ptp/ptp_private.h
> @@ -104,10 +104,20 @@ static inline bool ptp_vclock_in_use(struct
> ptp_clock *ptp)
>  /* Check if ptp clock shall be free running */
>  static inline bool ptp_clock_freerun(struct ptp_clock *ptp)
>  {
> +   bool ret = false;
> +
>     if (ptp->has_cycles)
> -       return false;
> +       return ret;
> +
> +   if (mutex_lock_interruptible(&ptp->n_vclocks_mux))
> +       return true;
> +
> +   if (ptp_vclock_in_use(ptp) && ptp->n_vclocks)
> +       ret = true;
> +
> +   mutex_unlock(&ptp->n_vclocks_mux);
> 
> -   return ptp_vclock_in_use(ptp);
> +   return ret;
>  }
> 
>  extern const struct class ptp_class;
> -- 

If we leave the ptp_vclock_in_use() implementation as
"return !ptp->is_virtual_clock;", then a physical PTP clock with
n_vclocks=0 will have ptp_vclock_in_use() return true.
Do you consider that expected behavior? What does "vclocks in use"
even mean?

In any case, I do agree with the fact that we shouldn't need to acquire
a mutex in ptp_clock_unregister() to avoid racing with the sysfs device
attributes. This seems avoidable with better operation ordering
(unregister child virtual clocks when sysfs calls are no longer
possible), or the use of pre-existing "ptp->defunct", or some other
mechanism.

You can certainly expand on that idea in net-next. The lockdep splat is
a completely unrelated issue, and only that part is 'net' material.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net 1/2] ptp: fix breakage after ptp_vclock_in_use() rework
  2025-06-15 16:03     ` Vladimir Oltean
@ 2025-06-17 16:04       ` Jeongjun Park
  0 siblings, 0 replies; 7+ messages in thread
From: Jeongjun Park @ 2025-06-17 16:04 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Richard Cochran, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Yangbo Lu

Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>
> On Mon, Jun 16, 2025 at 12:34:59AM +0900, Jeongjun Park wrote:
> > However, I don't think it is appropriate to fix ptp_vclock_in_use().
> > I agree that ptp->n_vclocks should be checked in the path where
> > ptp_clock_freerun() is called, but there are many drivers that do not
> > have any contact with ptp->n_vclocks in the path where
> > ptp_clock_unregister() is called.
>
> What do you mean there are many drivers that do not have any contact
> with ptp->n_vclocks? It is a feature visible only to the core, and
> transparent to the drivers. All drivers have contact with it, or none
> do. It all depends solely upon user configuration, and not dependent at
> all upon the specific driver.
>

I think I wrote it strangely, so I'll explain it again.

As you know, ptp_clock_{register,unregister}() is called not only in the
ptp core but also in several networking drivers to use the ptp clock
function. However, although these kernel drivers does not use any
n_vclocks-related functionality, the initial implementation of
ptp_vclock_in_use() acquired n_vclocks_mux and checked n_vclocks when
ptp_clock_unregister() was called.

> > The reason I removed the ptp->n_vclocks check logic from the
> > ptp_vclock_in_use() function is to prevent false positives from lockdep,
> > but also to prevent the performance overhead caused by locking
> > ptp->n_vclocks_mux and checking ptp->n_vclocks when calling
> > ptp_vclock_in_use() from a driver that has nothing to do with
> > ptp->n_vclocks.
>
> Can you quantify the performance overhead caused by acquiring
> ptp->n_vclocks_mux on unregistering physical clocks?
>

I think this performance overhead is a bit hard to quantify, but I think
it's wrong to add code inside ptp_clock_unregister() that performs
unnecessary locking in most cases except for some special cases.

> >
> > Therefore, I think it would be appropriate to modify ptp_clock_freerun()
> > like this instead of ptp_vclock_in_use():
> > ---
> >  drivers/ptp/ptp_private.h | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/ptp/ptp_private.h b/drivers/ptp/ptp_private.h
> > index 528d86a33f37..abd99087f0ca 100644
> > --- a/drivers/ptp/ptp_private.h
> > +++ b/drivers/ptp/ptp_private.h
> > @@ -104,10 +104,20 @@ static inline bool ptp_vclock_in_use(struct
> > ptp_clock *ptp)
> >  /* Check if ptp clock shall be free running */
> >  static inline bool ptp_clock_freerun(struct ptp_clock *ptp)
> >  {
> > +   bool ret = false;
> > +
> >     if (ptp->has_cycles)
> > -       return false;
> > +       return ret;
> > +
> > +   if (mutex_lock_interruptible(&ptp->n_vclocks_mux))
> > +       return true;
> > +
> > +   if (ptp_vclock_in_use(ptp) && ptp->n_vclocks)
> > +       ret = true;
> > +
> > +   mutex_unlock(&ptp->n_vclocks_mux);
> >
> > -   return ptp_vclock_in_use(ptp);
> > +   return ret;
> >  }
> >
> >  extern const struct class ptp_class;
> > --
>
> If we leave the ptp_vclock_in_use() implementation as
> "return !ptp->is_virtual_clock;", then a physical PTP clock with
> n_vclocks=0 will have ptp_vclock_in_use() return true.
> Do you consider that expected behavior? What does "vclocks in use"
> even mean?
>
> In any case, I do agree with the fact that we shouldn't need to acquire
> a mutex in ptp_clock_unregister() to avoid racing with the sysfs device
> attributes. This seems avoidable with better operation ordering
> (unregister child virtual clocks when sysfs calls are no longer
> possible), or the use of pre-existing "ptp->defunct", or some other
> mechanism.
>
> You can certainly expand on that idea in net-next. The lockdep splat is
> a completely unrelated issue, and only that part is 'net' material.

I agree with this, so I think it would be a good idea to add the
ptp->n_vclocks check to the caller function that absolutely needs it,
like the patch I sent in the previous email, and to remove the
ptp_vclock_in_use() function altogether and replace it with the code that
checks ptp->is_virtual_clock.

Of course, I think this idea should be discussed through net-next after
this problems caused by the previous patch are resolved as you mentioned.

Regards,

Jeongjun Park

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net 0/2] ptp_vclock fixes
  2025-06-13 17:47 [PATCH net 0/2] ptp_vclock fixes Vladimir Oltean
  2025-06-13 17:47 ` [PATCH net 1/2] ptp: fix breakage after ptp_vclock_in_use() rework Vladimir Oltean
  2025-06-13 17:47 ` [PATCH net 2/2] ptp: allow reading of currently dialed frequency to succeed on free-running clocks Vladimir Oltean
@ 2025-06-17 23:30 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-06-17 23:30 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, richardcochran, andrew+netdev, davem, edumazet, kuba,
	pabeni, aha310510, yangbo.lu

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 13 Jun 2025 20:47:47 +0300 you wrote:
> Hello,
> 
> While I was intending to test something else related to PTP in net-next
> I noticed any time I would run ptp4l on an interface, the kernel would
> print "ptp: physical clock is free running" and ptp4l would exit with an
> error code.
> 
> [...]

Here is the summary with links:
  - [net,1/2] ptp: fix breakage after ptp_vclock_in_use() rework
    https://git.kernel.org/netdev/net/c/5ab73b010cad
  - [net,2/2] ptp: allow reading of currently dialed frequency to succeed on free-running clocks
    https://git.kernel.org/netdev/net/c/aa112cbc5f0a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-06-17 23:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-13 17:47 [PATCH net 0/2] ptp_vclock fixes Vladimir Oltean
2025-06-13 17:47 ` [PATCH net 1/2] ptp: fix breakage after ptp_vclock_in_use() rework Vladimir Oltean
2025-06-15 15:34   ` Jeongjun Park
2025-06-15 16:03     ` Vladimir Oltean
2025-06-17 16:04       ` Jeongjun Park
2025-06-13 17:47 ` [PATCH net 2/2] ptp: allow reading of currently dialed frequency to succeed on free-running clocks Vladimir Oltean
2025-06-17 23:30 ` [PATCH net 0/2] ptp_vclock fixes patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).