From: Nathan Zimmer
Subject: Re: [PATCH] fs/proc: Move kfree outside pde_unload_lock
Date: Fri, 24 Aug 2012 11:45:45 -0500
Message-ID: <5037AFB9.5010001@sgi.com>
References: <1345653510-22000-1-git-send-email-nzimmer@sgi.com>
 <1345660110.5158.1969.camel@edumazet-glaptop>
 <1345671778.5158.2369.camel@edumazet-glaptop>
 <20120824144852.GA18850@gulag1.americas.sgi.com>
 <1345820311.4824.2.camel@edumazet-laptop>
Cc: Alexander Viro, David Woodhouse
To: Eric Dumazet
In-Reply-To: <1345820311.4824.2.camel@edumazet-laptop>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On 08/24/2012 09:58 AM, Eric Dumazet wrote:
> On Friday, 24 August 2012 at 09:48 -0500, Nathan Zimmer wrote:
>> On Wed, Aug 22, 2012 at 11:42:58PM +0200, Eric Dumazet wrote:
>>> On Wed, 2012-08-22 at 20:28 +0200, Eric Dumazet wrote:
>>>
>>>> Thats interesting, but if you really want this to fly, one RCU
>>>> conversion would be much better ;)
>>>>
>>>> pde_users would be an atomic_t and you would avoid the spinlock
>>>> contention.
>>> Here is what I had in mind, I would be interested to know how it helps
>>> a 512 core machine ;)
>>>
>> Here are the results and they look great.
>>
>> cpuinfo    baseline    moved kfree    Rcu
>> tasks      read-sec    read-sec       read-sec
>> 1          0.0141      0.0141         0.0141
>> 2          0.0140      0.0140         0.0142
>> 4          0.0140      0.0141         0.0141
>> 8          0.0145      0.0145         0.0140
>> 16         0.0553      0.0548         0.0168
>> 32         0.1688      0.1622         0.0549
>> 64         0.5017      0.3856         0.1690
>> 128        1.7005      0.9710         0.5038
>> 256        5.2513      2.6519         2.0804
>> 512        8.0529      6.2976         3.0162
>>
>>
>>
> Indeed...
>
> Could you explicit the test you are actually doing ?
>
> Thanks
>
>

It is a dead simple test.
The test starts by forking off X number of tasks, assigning each its own CPU.
Each task then allocates a bit of memory.
All tasks wait on a memory cell for the go order.
We measure the read time starting here.
Once the go order is given, they all read a chunk of the selected proc file.
I was using /proc/cpuinfo to test.
Once everyone has finished, we take the end read time.
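
For anyone who wants to try something similar, here is a rough Python sketch of the steps described above. It is not the actual test program. The function name run_read_test, the 4096-byte chunk size, the pipes used to collect "ready" and "done" signals, and the choice to open the file before the go order are all assumptions made for illustration. It is Linux-only, since it relies on os.sched_setaffinity.

```python
import mmap
import os
import time


def _read_exact(fd, count):
    """Block until `count` bytes have arrived on a pipe."""
    got = 0
    while got < count:
        data = os.read(fd, count - got)
        if not data:
            raise RuntimeError("a child exited before reporting in")
        got += len(data)


def run_read_test(num_tasks, path="/proc/cpuinfo", chunk_size=4096):
    """Return seconds from the go order until every task has read one chunk.

    Hypothetical helper: name, defaults and signalling are assumptions.
    """
    # Anonymous shared mapping; byte 0 is the "go" memory cell all tasks watch.
    go_cell = mmap.mmap(-1, mmap.PAGESIZE)
    cpus = sorted(os.sched_getaffinity(0))
    ready_r, ready_w = os.pipe()
    done_r, done_w = os.pipe()

    pids = []
    for i in range(num_tasks):
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                # Assign this task its own CPU (wraps if tasks > CPUs).
                os.sched_setaffinity(0, {cpus[i % len(cpus)]})
                scratch = bytearray(chunk_size)  # "a bit of memory"
                fd = os.open(path, os.O_RDONLY)  # assumption: open is untimed
                os.write(ready_w, b"r")
                while go_cell[0:1] == b"\x00":   # spin on the memory cell
                    pass
                os.readv(fd, [scratch])          # read one chunk of the file
                os.write(done_w, b"d")
                os.close(fd)
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)

    # Parent drops its write ends so a dead child shows up as EOF, not a hang.
    os.close(ready_w)
    os.close(done_w)

    _read_exact(ready_r, num_tasks)              # everyone is parked on the cell
    start = time.perf_counter()
    go_cell[0:1] = b"\x01"                       # give the go order
    _read_exact(done_r, num_tasks)               # everyone has finished reading
    elapsed = time.perf_counter() - start

    for pid in pids:
        os.waitpid(pid, 0)
    os.close(ready_r)
    os.close(done_r)
    go_cell.close()
    return elapsed
```

Calling run_read_test(n) for n in 1, 2, 4, ... up to the number of cores would give one timing per row, in the same layout as the table quoted above. Interpreter overhead means the absolute numbers from a sketch like this will not be comparable to the figures in the table.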