From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: [PATCH net-next] mlx4: do not use rwlock in fast path
Date: Wed, 27 Jun 2018 06:49:18 -0700
Message-ID: <5d0fbf2f-9feb-3f6b-49b0-39b74285b124@gmail.com>
References: <1486660204.7793.104.camel@edumazet-glaptop3.roam.corp.google.com> <05ae066b-873d-159b-4ac2-ab39120c949b@mellanox.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: netdev, Shawn Bohrer, Shay Agroskin, Eran Ben Elisha
To: Tariq Toukan, Eric Dumazet, David Miller
Return-path:
Received: from mail-pl0-f68.google.com ([209.85.160.68]:37980 "EHLO mail-pl0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S964914AbeF0NtV (ORCPT ); Wed, 27 Jun 2018 09:49:21 -0400
Received: by mail-pl0-f68.google.com with SMTP id d10-v6so1069177plo.5 for ; Wed, 27 Jun 2018 06:49:21 -0700 (PDT)
In-Reply-To: <05ae066b-873d-159b-4ac2-ab39120c949b@mellanox.com>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

On 06/27/2018 05:11 AM, Tariq Toukan wrote:
>
>
> On 09/02/2017 7:10 PM, Eric Dumazet wrote:
>> From: Eric Dumazet
>>
>> Using a reader-writer lock in fast path is silly, when we can
>> instead use RCU or a seqlock.
>>
>> For mlx4 hwstamp clock, a seqlock is the way to go, removing
>> two atomic operations and false sharing.
>>
>> Signed-off-by: Eric Dumazet
>> Cc: Tariq Toukan
>> ---
>>   drivers/net/ethernet/mellanox/mlx4/en_clock.c |   35 ++++++++--------
>>   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h  |    2
>>   2 files changed, 19 insertions(+), 18 deletions(-)
>>
>
> Hi Eric,
>
> When my peer, Shay, modified mlx5 to adopt this same locking scheme/type, he noticed a degradation in packet rate.
> He got back to testing mlx4 and also noticed a degradation introduced by this patch.
>
> Perf numbers (single ring):
>
> mlx4:
> with rw-lock: ~8.54M pps
> with seq-lock: ~8.51M pps
>
> mlx5:
> With rw-lock: ~14.94M pps
> With seq-lock: ~14.48M pps
>
> Actually, this can be explained by the analysis below.
> In short, the number of readers is significantly larger than the number of writers. Hence optimizing the readers flow would give better numbers. The issue is, the read/write lock might cause writers starvation. Maybe RCU fits best here?
>
> Degradation analysis:
> The patch changes the lock type which protects reads and updates of a variable (the (struct mlx4_en_dev).clock variable).
> This variable is used to convert the hw timestamp into skb->hwtstamps.
> This variable is read for each transmitted/received packet and updated only via the ptp module and some periodic overflow work we have (maximum of 10 times per second).
> Meaning that there are many more readers than writers, and it’s best to optimize the readers flow.
>

Hi Tariq

Are you sure you enabled time stamps in your tests?

Is mlx4_en_fill_hwtstamps() _really_ called 8,540,000 times per second,
meaning the same number of read_lock_irqsave()/read_unlock_irqrestore() pairs is performed?

You have a pretty damn good CPU it seems.

seqlock has no cost for a reader [1], other than reading one integer value and testing it.

[1] If this value never changes (and is on a clean cache line).

Really this looks like ring->hwtstamp_rx_filter != HWTSTAMP_FILTER_ALL in your tests.

The numbers you gave amount to just one cycle of difference per packet (half a nanosecond),
so I really doubt adding back the heavy read_lock_irqsave()/read_unlock_irqrestore() could be faster.

Conceptually, seqlock is a form of RCU; it really optimizes the readers flow.

Thanks
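
For reference, here is a minimal userspace sketch of the seqlock read pattern discussed above, written with C11 atomics rather than the kernel's seqlock_t API. It is not the mlx4 driver code: struct hw_clock, its fields, and the hw_clock_*() helpers are hypothetical names chosen for illustration. The sketch assumes writers are serialized by the caller (the kernel's seqlock_t does this with an embedded spinlock). It shows why the reader side is cheap: a reader only loads the sequence counter before and after copying the data and never writes to shared memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical clock snapshot; field names are illustrative only. */
struct hw_clock {
	atomic_uint seq;        /* even: stable, odd: write in progress */
	_Atomic uint64_t mult;  /* data fields are atomics so racing    */
	_Atomic uint64_t nsec;  /* reads are well defined in C11        */
};

static void hw_clock_init(struct hw_clock *c)
{
	atomic_init(&c->seq, 0);
	atomic_init(&c->mult, 0);
	atomic_init(&c->nsec, 0);
}

/* Writer side. Assumption: callers serialize writers themselves. */
static void hw_clock_write(struct hw_clock *c, uint64_t mult, uint64_t nsec)
{
	unsigned int s = atomic_load_explicit(&c->seq, memory_order_relaxed);

	assert((s & 1u) == 0);  /* no other writer may be mid-update */

	/* Make the counter odd, then publish the new data. */
	atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->mult, mult, memory_order_relaxed);
	atomic_store_explicit(&c->nsec, nsec, memory_order_relaxed);

	/* Make the counter even again; the data is now stable. */
	atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader side: loads only, retrying if a writer interfered. */
static void hw_clock_read(struct hw_clock *c, uint64_t *mult, uint64_t *nsec)
{
	unsigned int before, after;

	do {
		before = atomic_load_explicit(&c->seq, memory_order_acquire);
		*mult = atomic_load_explicit(&c->mult, memory_order_relaxed);
		*nsec = atomic_load_explicit(&c->nsec, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&c->seq, memory_order_relaxed);
		/* Retry if a write was in progress or completed meanwhile. */
	} while ((before & 1u) || before != after);
}
```

When the counter does not change between the two loads, the loop body runs exactly once, which is the cheap case footnote [1] describes. In kernel code the same shape is expressed with read_seqbegin()/read_seqretry() on the reader side and write_seqlock()/write_sequnlock() on the writer side.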