From mboxrd@z Thu Jan 1 00:00:00 1970 From: Vasu Dev Subject: Re: [net-next 13/13] ixgbe: use per NUMA node lock for FCoE DDP Date: Mon, 13 Jun 2011 16:59:12 -0700 Message-ID: <1308009552.31900.80.camel@vi2.jf.intel.com> References: <1307761341-5267-1-git-send-email-jeffrey.t.kirsher@intel.com> <1307761341-5267-14-git-send-email-jeffrey.t.kirsher@intel.com> <1307769482.2872.62.camel@edumazet-laptop> <1307770931.2872.70.camel@edumazet-laptop> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Jeff Kirsher , davem@davemloft.net, Vasu Dev , netdev@vger.kernel.org, gospo@redhat.com To: Eric Dumazet Return-path: Received: from mga11.intel.com ([192.55.52.93]:56928 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755319Ab1FMX7N (ORCPT ); Mon, 13 Jun 2011 19:59:13 -0400 In-Reply-To: <1307770931.2872.70.camel@edumazet-laptop> Sender: netdev-owner@vger.kernel.org List-ID: On Sat, 2011-06-11 at 07:42 +0200, Eric Dumazet wrote: > Le samedi 11 juin 2011 =C3=A0 07:18 +0200, Eric Dumazet a =C3=A9crit = : > > > /** > > > diff --git a/drivers/net/ixgbe/ixgbe_fcoe.h b/drivers/net/ixgbe/i= xgbe_fcoe.h > > > index d876e7a..8618892 100644 > > > --- a/drivers/net/ixgbe/ixgbe_fcoe.h > > > +++ b/drivers/net/ixgbe/ixgbe_fcoe.h > > > @@ -69,6 +69,7 @@ struct ixgbe_fcoe { > > > struct pci_pool **pool; > > > atomic_t refcnt; > > > spinlock_t lock; > > > + struct spinlock **node_lock; > >=20 > > Wont this read_mostly pointer sits in often modified cache line ? > >=20 > > > struct ixgbe_fcoe_ddp ddp[IXGBE_FCOE_DDP_MAX]; > > > unsigned char *extra_ddp_buffer; > > > dma_addr_t extra_ddp_buffer_dma; > >=20 >=20 > This patch seems overkill to me, have you tried the more simple way I > did in commit 79640a4ca6955e3ebdb7038508fa7a0cd7fa5527 > (net: add additional lock to qdisc to increase throughput ) >=20 Conceptually this patch does same as referred patch from you, in your case added Qdisc busylock worked fine with multiple tx/Qdisc but for fcoe a single ixgbe_fcoe instance is used and therefore this patch required to setup per node lock similar to Qdisc busylock and that was the only additional code here and rest is similar. > (remember you must place ->busylock in a separate cache line, to not > slow down the two cpus that have access to ->lock) >=20 > struct ixgbe_fcoe could probably be more carefuly reordered to lower > false sharing >=20 I did optimize ixgbe_fcoe due to added node_lock using pahole and not t= o stepped on any mostly modified line beside also reducing holes, but moving far away in the struct on new cacheline boundary is even better and I'll do that in updated patch. > I kindly ask you guys provide actual perf numbers between >=20 > 1) before any patch > 2) After your multilevel per numanode locks > 3) A more simple way (my suggestion of adding a single 'busylock') >=20 I'm retracting this patch for now and in case of updated patch I'll get number by just this change, earlier I had number w/ and w/o three DDP related improvements patches in this series and that had net 26% IOPS increase for 512Bytes reads but that is mainly from other patch adding per CPU pci pool. The two sockets systems had about same IOPS w/ or w/o just this patch and so may be this will work better w/ more sockets, so I'll will respin this patch only if I see any justifiable gain by just this patch. Thanks Vasu > Thanks >=20 >=20 > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html