* [PATCH net-next] bnxt_en: optimize gettimex64
@ 2024-11-12 11:05 Vadim Fedorenko
2024-11-12 18:05 ` Michael Chan
0 siblings, 1 reply; 3+ messages in thread
From: Vadim Fedorenko @ 2024-11-12 11:05 UTC (permalink / raw)
To: Vadim Fedorenko, Pavan Chebbi, Andrew Lunn, Paolo Abeni,
Michael Chan, Jakub Kicinski
Cc: Richard Cochran, netdev, David S. Miller, Eric Dumazet,
Vadim Fedorenko, Simon Horman
Current implementation of gettimex64() makes at least 3 PCIe reads to
get current PHC time. It takes at least 2.2us to get this value back to
userspace. At the same time there is cached value of upper bits of PHC
available for packet timestamps already. This patch reuses cached value
to speed up reading of PHC time.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
---
I did some benchmarks on host with Broadcom Thor NIC trying to build
histogram of time spent to call clock_gettime() to query PTP device
over million iterations.
With current implementation the result is (long tail is cut):
2200ns: 902624
2300ns: 87404
2400ns: 4025
2500ns: 1307
2600ns: 581
2700ns: 261
2800ns: 104
2900ns: 36
3000ns: 32
3100ns: 24
3200ns: 16
3300ns: 29
3400ns: 29
3500ns: 23
Optimized version on the very same machine and NIC gives next values:
900ns: 865436
1000ns: 128630
1100ns: 2671
1200ns: 727
1300ns: 397
1400ns: 178
1500ns: 92
1600ns: 16
1700ns: 15
1800ns: 11
1900ns: 6
2000ns: 20
2100ns: 11
That means pct(99) improved from 2300ns to 1000ns.
---
drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c | 32 +++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
index 91e7e08fabb1..8764ce412f7b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
@@ -112,6 +112,28 @@ static int bnxt_refclk_read(struct bnxt *bp, struct ptp_system_timestamp *sts,
return rc;
}
+static int bnxt_refclk_read_low(struct bnxt *bp, struct ptp_system_timestamp *sts,
+ u32 *low)
+{
+ struct bnxt_ptp_cfg *ptp = bp->ptp_cfg;
+ unsigned long flags;
+
+ /* We have to serialize reg access and FW reset */
+ read_seqlock_excl_irqsave(&ptp->ptp_lock, flags);
+
+ if (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
+ read_sequnlock_excl_irqrestore(&ptp->ptp_lock, flags);
+ return -EIO;
+ }
+
+ ptp_read_system_prets(sts);
+ *low = readl(bp->bar0 + ptp->refclk_mapped_regs[0]);
+ ptp_read_system_postts(sts);
+
+ read_sequnlock_excl_irqrestore(&ptp->ptp_lock, flags);
+ return 0;
+}
+
static void bnxt_ptp_get_current_time(struct bnxt *bp)
{
struct bnxt_ptp_cfg *ptp = bp->ptp_cfg;
@@ -162,13 +184,19 @@ static int bnxt_ptp_gettimex(struct ptp_clock_info *ptp_info,
{
struct bnxt_ptp_cfg *ptp = container_of(ptp_info, struct bnxt_ptp_cfg,
ptp_info);
- u64 ns, cycles;
+ u64 ns, cycles, time;
+ u32 low;
int rc;
- rc = bnxt_refclk_read(ptp->bp, sts, &cycles);
+ rc = bnxt_refclk_read_low(ptp->bp, sts, &low);
if (rc)
return rc;
+ time = (u64)READ_ONCE(ptp->old_time) << BNXT_HI_TIMER_SHIFT;
+ cycles = (time & BNXT_HI_TIMER_MASK) | low;
+ if (low < (time & BNXT_LO_TIMER_MASK))
+ cycles += BNXT_LO_TIMER_MASK + 1;
+
ns = bnxt_timecounter_cyc2time(ptp, cycles);
*ts = ns_to_timespec64(ns);
--
2.43.5
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH net-next] bnxt_en: optimize gettimex64
2024-11-12 11:05 [PATCH net-next] bnxt_en: optimize gettimex64 Vadim Fedorenko
@ 2024-11-12 18:05 ` Michael Chan
2024-11-14 2:59 ` Jakub Kicinski
0 siblings, 1 reply; 3+ messages in thread
From: Michael Chan @ 2024-11-12 18:05 UTC (permalink / raw)
To: Vadim Fedorenko
Cc: Vadim Fedorenko, Pavan Chebbi, Andrew Lunn, Paolo Abeni,
Jakub Kicinski, Richard Cochran, netdev, David S. Miller,
Eric Dumazet, Simon Horman
[-- Attachment #1: Type: text/plain, Size: 1751 bytes --]
On Tue, Nov 12, 2024 at 3:06 AM Vadim Fedorenko <vadfed@meta.com> wrote:
>
> Current implementation of gettimex64() makes at least 3 PCIe reads to
> get current PHC time. It takes at least 2.2us to get this value back to
> userspace. At the same time there is cached value of upper bits of PHC
> available for packet timestamps already. This patch reuses cached value
> to speed up reading of PHC time.
>
> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
> index 91e7e08fabb1..8764ce412f7b 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
> @@ -162,13 +184,19 @@ static int bnxt_ptp_gettimex(struct ptp_clock_info *ptp_info,
> {
> struct bnxt_ptp_cfg *ptp = container_of(ptp_info, struct bnxt_ptp_cfg,
> ptp_info);
> - u64 ns, cycles;
> + u64 ns, cycles, time;
> + u32 low;
> int rc;
>
> - rc = bnxt_refclk_read(ptp->bp, sts, &cycles);
> + rc = bnxt_refclk_read_low(ptp->bp, sts, &low);
> if (rc)
> return rc;
>
> + time = (u64)READ_ONCE(ptp->old_time) << BNXT_HI_TIMER_SHIFT;
> + cycles = (time & BNXT_HI_TIMER_MASK) | low;
> + if (low < (time & BNXT_LO_TIMER_MASK))
> + cycles += BNXT_LO_TIMER_MASK + 1;
> +
Looks good to me. I think this logic here to get the upper bits and
check for overflow is identical to the RX timestamp logic. It may be
worthwhile to have a common helper function to do this. Other than
that,
Reviewed-by: Michael Chan <michael.chan@broadocm.com>
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH net-next] bnxt_en: optimize gettimex64
2024-11-12 18:05 ` Michael Chan
@ 2024-11-14 2:59 ` Jakub Kicinski
0 siblings, 0 replies; 3+ messages in thread
From: Jakub Kicinski @ 2024-11-14 2:59 UTC (permalink / raw)
To: Michael Chan
Cc: Vadim Fedorenko, Vadim Fedorenko, Pavan Chebbi, Andrew Lunn,
Paolo Abeni, Richard Cochran, netdev, David S. Miller,
Eric Dumazet, Simon Horman
On Tue, 12 Nov 2024 10:05:50 -0800 Michael Chan wrote:
> I think this logic here to get the upper bits and
> check for overflow is identical to the RX timestamp logic. It may be
> worthwhile to have a common helper function to do this.
+1
--
pw-bot: cr
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-11-14 2:59 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-12 11:05 [PATCH net-next] bnxt_en: optimize gettimex64 Vadim Fedorenko
2024-11-12 18:05 ` Michael Chan
2024-11-14 2:59 ` Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).