From: Martin Habets <habetsm.xilinx@gmail.com>
To: "Íñigo Huguet" <ihuguet@redhat.com>
Cc: netdev@vger.kernel.org, richardcochran@gmail.com,
	yangbo.lu@nxp.com, mlichvar@redhat.com,
	gerhard@engleder-embedded.com, ecree.xilinx@gmail.com,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, alex.maftei@amd.com
Subject: Re: PTP vclock: BUG: scheduling while atomic
Date: Fri, 3 Feb 2023 09:09:16 +0000
Message-ID: <Y9zPPON16NEbzw86@gmail.com>
In-Reply-To: <69d0ff33-bd32-6aa5-d36c-fbdc3c01337c@redhat.com>

On Thu, Feb 02, 2023 at 05:02:07PM +0100, Íñigo Huguet wrote:
> Hello,
> 
> Our QA team was testing PTP vclocks, and they found this error with the sfc NIC/driver:
>   BUG: scheduling while atomic: ptp5/25223/0x00000002
> 
> The reason seems to be that vclocks disable interrupts with `spin_lock_irqsave` in
> `ptp_vclock_gettime` and then read the timecounter, which in turn ends up calling
> the driver's `gettime64` callback.
> 
> Vclock framework was added in commit 5d43f951b1ac ("ptp: add ptp virtual clock driver
> framework").
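
For context, the locking path added by that commit looks roughly like the
following (paraphrased from drivers/ptp/ptp_vclock.c; exact details may
differ from the merged code):

    static int ptp_vclock_gettime(struct ptp_clock_info *ptp,
                                  struct timespec64 *ts)
    {
            struct ptp_vclock *vclock = info_to_vclock(ptp);
            unsigned long flags;
            u64 ns;

            /* Interrupts are disabled from here on. */
            spin_lock_irqsave(&vclock->lock, flags);
            ns = timecounter_read(&vclock->tc);  /* calls cc->read() */
            spin_unlock_irqrestore(&vclock->lock, flags);
            *ts = ns_to_timespec64(ns);

            return 0;
    }

    /* The cyclecounter read callback lands in the physical clock driver,
     * e.g. efx_phc_gettime() in the trace below. (Simplified: the real
     * code prefers gettimex64() when the driver provides it.)
     */
    static u64 ptp_vclock_read(const struct cyclecounter *cc)
    {
            struct ptp_vclock *vclock = cc_to_vclock(cc);
            struct ptp_clock *ptp = vclock->pclock;
            struct timespec64 ts = {};

            ptp->info->gettime64(ptp->info, &ts);

            return timespec64_to_ns(&ts);
    }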

Looking at that commit, we'll face the same spinlock issue in
ptp_vclock_adjfine and ptp_vclock_adjtime.

> At first glance, it seems that the vclock framework reuses the drivers' already
> existing ptp clock callbacks, but it imposes a new limitation that didn't exist
> before: now they can't sleep (due to the spin_lock_irqsave). The sfc driver might
> sleep waiting for the fw response.
> 
> The sfc driver can be fixed to avoid this issue, but I wonder whether something
> might not be correct in the vclock framework. I don't have enough knowledge about
> how clock synchronization should work in this regard, so I leave it to your
> consideration.
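
To make the conflict concrete: a gettime64 implementation that queries
firmware typically has to sleep, along these lines (a purely hypothetical
sketch; the example_* names are made up and this is not the actual sfc
code):

    static int example_phc_gettime(struct ptp_clock_info *ptp,
                                   struct timespec64 *ts)
    {
            struct example_nic *nic =
                    container_of(ptp, struct example_nic, ptp_info);
            struct example_fw_resp resp;
            int rc;

            /* Send a "get time" request to the firmware and sleep until
             * the response arrives. Sleeping is fine for a normal PHC
             * callback, but it is exactly what triggers the BUG when
             * called under spin_lock_irqsave() in the vclock layer.
             */
            rc = example_fw_rpc(nic, EXAMPLE_FW_GET_TIME, &resp);
            if (rc)
                    return rc;

            ts->tv_sec = resp.secs;
            ts->tv_nsec = resp.nsecs;
            return 0;
    }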

If the timer hardware is local to the CPU core, a spinlock could work.
But if it is global across CPUs or, as in our case, remote behind a PCI
bus, using a spinlock is too much of a restriction.
I also wonder why the spinlock was used, and whether that limitation can
be reduced.
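
One possible direction, assuming every vclock accessor (gettime, adjtime,
adjfine and the refresh kworker) runs in process context, would be to use
a mutex instead of the spinlock. Just a sketch of the idea, not a tested
patch:

    /* Assumes ptp_vclock's lock is converted from spinlock_t to a
     * struct mutex, initialised with mutex_init() when the vclock is
     * created.
     */
    static int ptp_vclock_gettime(struct ptp_clock_info *ptp,
                                  struct timespec64 *ts)
    {
            struct ptp_vclock *vclock = info_to_vclock(ptp);
            u64 ns;

            /* mutex_lock() may itself sleep, and the driver's
             * gettime64() is now free to sleep as well.
             */
            mutex_lock(&vclock->lock);
            ns = timecounter_read(&vclock->tc);
            mutex_unlock(&vclock->lock);
            *ts = ns_to_timespec64(ns);

            return 0;
    }

The trade-off is that a mutex cannot be taken from hard-irq context, so
this only works if no vclock path is ever entered with interrupts
disabled.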

Martin

> These are the logs with stack traces:
>  BUG: scheduling while atomic: ptp5/25223/0x00000002
>  [...skip...]
>  Call Trace:
>   dump_stack_lvl+0x34/0x48
>   __schedule_bug.cold+0x47/0x53
>   __schedule+0x40e/0x580
>   schedule+0x43/0xa0
>   schedule_timeout+0x88/0x160
>   ? __bpf_trace_tick_stop+0x10/0x10
>   _efx_mcdi_rpc_finish+0x2a9/0x480 [sfc]
>   ? efx_mcdi_send_request+0x1d5/0x260 [sfc]
>   ? dequeue_task_stop+0x70/0x70
>   _efx_mcdi_rpc.constprop.0+0xcd/0x3d0 [sfc]
>   ? update_load_avg+0x7e/0x730
>   _efx_mcdi_rpc_evb_retry+0x5d/0x1d0 [sfc]
>   efx_mcdi_rpc+0x10/0x20 [sfc]
>   efx_phc_gettime+0x5f/0xc0 [sfc]
>   ptp_vclock_read+0xa3/0xc0
>   timecounter_read+0x11/0x60
>   ptp_vclock_refresh+0x31/0x60
>   ? ptp_clock_release+0x50/0x50
>   ptp_aux_kworker+0x19/0x40
>   kthread_worker_fn+0xa9/0x250
>   ? kthread_should_park+0x30/0x30
>   kthread+0x146/0x170
>   ? set_kthread_struct+0x50/0x50
>   ret_from_fork+0x1f/0x30
>  BUG: scheduling while atomic: ptp5/25223/0x00000000
>  [...skip...]
>  Call Trace:
>   dump_stack_lvl+0x34/0x48
>   __schedule_bug.cold+0x47/0x53
>   __schedule+0x40e/0x580
>   ? ptp_clock_release+0x50/0x50
>   schedule+0x43/0xa0
>   kthread_worker_fn+0x128/0x250
>   ? kthread_should_park+0x30/0x30
>   kthread+0x146/0x170
>   ? set_kthread_struct+0x50/0x50
>   ret_from_fork+0x1f/0x30
