Date: Mon, 23 Jun 2008 02:13:49 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Lai Jiangshan
Cc: Andrew Morton, torvalds@linux-foundation.org, Linux Kernel Mailing List
Subject: Re: [RFC][PATCH] rcu classic: new algorithm for callbacks-processing
Message-ID: <20080623091349.GK22569@linux.vnet.ibm.com>
References: <4844BE83.5010401@cn.fujitsu.com> <20080608151844.GA18751@linux.vnet.ibm.com> <485F17B2.5010700@cn.fujitsu.com>
In-Reply-To: <485F17B2.5010700@cn.fujitsu.com>
List-ID: linux-kernel@vger.kernel.org

On Mon, Jun 23, 2008 at 11:25:38AM +0800, Lai Jiangshan wrote:
>
> I apologize for the late response; I have not stopped this work,
> but some problems occurred when I tested it. (Actually, I wanted
> to reply to you after everything was fixed -- my fault!)

No problem -- as long as things are progressing, I am happy! ;-)

> Paul E. McKenney wrote:
> > On Tue, Jun 03, 2008 at 11:46:11AM +0800, Lai Jiangshan wrote:
> >> The code/algorithm of the current callbacks-processing
> >> implementation is very efficient and technical.
> >> But when I studied it, I found a disadvantage:
> >>
> >> In multi-CPU systems, when a new RCU callback is queued
> >> (call_rcu[_bh]), in the current implementation this callback will
> >> very likely be invoked only after the grace period for the batch
> >> with batch number rcp->cur+2 has completed. In fact, this callback
> >> could be invoked once the grace period for the batch with batch
> >> number rcp->cur+1 has completed. The delayed invocation means that
> >> the latency of synchronize_rcu() is extended. More importantly,
> >> the callbacks usually free memory, and that work is delayed too!
> >> It is important for the reclaimer to free memory as soon as
> >> possible when little memory is left.
> >
> > Speeding up the RCU grace periods would indeed be a very good thing!
> >
> >> A very simple way to solve this problem: add a field
> >> (struct rcu_head::batch) recording the batch number for the RCU
> >> callback. When a new RCU callback is queued, we determine the
> >> batch number for this callback (head->batch = rcp->cur+1), and
> >> when we process callbacks we move the callback to rdp->donelist
> >> if we find that head->batch <= rcp->completed. This simple change
> >> greatly reduces the wait before invocation (from about 2.5 grace
> >> periods to about 1.5 grace periods, on average, on multi-CPU
> >> systems).
> >>
> >> This is my algorithm, but my implementation does not add any field
> >> to struct rcu_head. We only need to remember the last two batches
> >> and their batch numbers, because those two batches contain all the
> >> entries whose grace periods have not yet completed. So we use a
> >> special linked list rather than adding a field.
> >> Please see the comment on struct rcu_data.
> >
> > Maintaining the single list with multiple pointers into it certainly
> > does seem to simplify the list processing, as does extracting the
> > common code from call_rcu() and call_rcu_bh().
> > Just out of curiosity, why did you keep donelist as a separate list
> > instead of an additional pointer into nxtlist?
>
> donelist is only accessed in softirq context (so it does not need
> irqs disabled), but nxtlist is not. I didn't want to modify
> rcu_do_batch().

OK, we can always handle this separately if it makes sense.

> >> rcutorture ran successfully (x86_64/4cpu, i386/2cpu, i386/1cpu).
> >
> > Of course, RCU implementations need careful inspection, testing and
> > validation. Running rcutorture is a good first step, but
> > unfortunately only a first step. So I need to ask you the
> > following questions:
> >
> > 1. How long did you run rcutorture?
>
> 2 hours the first time I ran rcutorture, but with neither the
> hotplug nor the test_no_idle_hz argument. How long would be
> appropriate?

I would suggest 24 hours for a core change like this.

> > 2. Do you have access to weak-memory-order machines on which
> >    to do rcutorture testing? (If not, I expect that we can
> >    motivate testing elsewhere.)
>
> I can't access weak-memory-order machines. Could you please
> test it after all my tests pass?

I would be very happy to! Let me know when you are ready, and send me
a patch against some published version (e.g., one of the ones that is
run by test.kernel.org).

> > 3. Did you run CPU hotplug while running rcutorture? Doing so
> >    is extremely important, as RCU interacts with CPU hotplug.
>
> It failed while the following script was run at the same time;
> I haven't found the reason yet:
>
> #!/bin/sh
>
> # 4 CPUs
>
> cpu1=1
> cpu2=1
> cpu3=1
> while ((1))
> do
>         no=$(($RANDOM % 3 + 1))
>         if ((!cpu$no))
>         then
>                 echo 1 > /sys/devices/system/cpu/cpu$no/online
>                 ((cpu$no=1))
>         else
>                 echo 0 > /sys/devices/system/cpu/cpu$no/online
>                 ((cpu$no=0))
>         fi
>         echo 1 $cpu1 $cpu2 $cpu3
>         sleep 2
> done

You tried this without your changes and it passed, correct?
Never forget to test the base kernel. ;-)

> > 4. Did you use the rcutorture test_no_idle_hz and shuffle_interval
> >    arguments to test out RCU's interaction with CONFIG_NO_HZ?
> >    (This requires running a CONFIG_NO_HZ kernel.)
>
> The test is OK with test_no_idle_hz=1 shuffle_interval=5.

Very good!!! For how long?

> It seems my patch changes nothing about NO_HZ.

Agreed, but changes can have unanticipated side-effects, so it is
always good to check.

> > 5. One concern I have is the removal of a few memory barriers.
> >    Could you please tell me why it is safe to remove these?
>
> Yes, it is safe; at worst it may delay the processing a little
> when old/stale values are read for rcp->cur/rcp->next_pending.
> I have fixed that, but it may still delay the processing when an
> old value of rcp->completed is read in rcu_pending().

Would you please write down why you believe it is safe? It is not
that I doubt your ability, it is just that RCU implementations can be
a bit tricky at times. It will help me check your code if I fully
understand your thinking.

> > Could you please run any additional combinations of tests that you
> > are able to, given the hardware you have access to?
>
> Yes, I will test, and I would like more advice.

And, as noted earlier, I will be happy to fill in testing on
weak-memory machines once it is working well on the machines you have
access to.

> > And thank you very much for all your work in simplifying and
> > speeding up RCU grace-period detection! There may be some
> > additional work required, but this patch does look promising!
> >
> > 						Thanx, Paul
>
> How can I test whether an RCU patch improves the system's
> performance?

By running a number of benchmarks. You might want to check boot-up
and shutdown speed as well.

> I didn't change any code for a batch's grace period. I just
> insert callbacks into the right batch to speed up their grace
> periods on SMP.
>
> And I think broadcasting when a new batch is started would
> speed up a batch's grace period.
It might well, and speeding up the grace period would be a good
thing. However, it will be necessary to test overhead -- and we might
need some help from the SGI guys for their very large machines. (I
have some moderately large ones I can get access to from time to
time, but the SGI machines are the biggest I am aware of.)

						Thanx, Paul