All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gautham R Shenoy <ego@in.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org,
	mathieu.desnoyers@polymtl.ca, mingo@elte.hu, hch@infradead.org,
	mmlnx@us.ibm.com, dipankar@in.ibm.com, dsmith@redhat.com,
	rostedt@goodmis.org, adrian.bunk@movial.fi,
	a.p.zijlstra@chello.nl, niv@us.ibm.com, dvhltc@us.ibm.com,
	rusty@au1.ibm.com, jkenisto@linux.vnet.ibm.com, oleg@tv-sign.ru
Subject: Re: [PATCH,RFC] Add call_rcu_sched()
Date: Tue, 8 Apr 2008 13:40:48 +0530	[thread overview]
Message-ID: <20080408081048.GA15001@in.ibm.com> (raw)
In-Reply-To: <20080408003449.873574d7.akpm@linux-foundation.org>

On Tue, Apr 08, 2008 at 12:34:49AM -0700, Andrew Morton wrote:
> On Sun, 6 Apr 2008 14:37:19 -0700 "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Hello!
> > 
> > Third cut of patch to provide the call_rcu_sched().  This is again to
> > synchronize_sched() as call_rcu() is to synchronize_rcu().
> > 
> > Should be fine for experimental use, but not ready for inclusion.
> 
> Let me know when to come out of hiding ;)
> 
> > Passes multi-hour rcutorture sessions with concurrent CPU hotplugging.
> > 
> > Fixes since the first version include a bug that could result in
> > indefinite blocking (spotted by Gautham Shenoy), better resiliency
> > against CPU-hotplug operations, and other minor fixes.
> > 
> > Fixes since the second version include reworking grace-period detection
> > to avoid deadlocks that could happen when running concurrently with
> > CPU hotplug, adding Mathieu's fix to avoid the softlockup messages,
> > as well as Mathieu's fix to allow use earlier in boot.
> > 
> > Known/suspected shortcomings:
> > 
> > o	Only moderately tested on x86-64 and POWER -- a few hours of
> > 	rcutorture with concurrent CPU hotplugging.  In particular, I
> > 	still do not trust the sleep/wakeup logic between call_rcu_sched()
> > 	and rcu_sched_grace_period().
> > 
> > o	Need to add call_rcu_sched() testing to rcutorture.
> > 
> > o	Still needs rcu_barrier_sched() -- intending to incorporate
> > 	the version Mathieu provided.
> > 
> > This patch also fixes a long-standing bug in the earlier preemptable-RCU
> > implementation of synchronize_rcu() that could result in loss of
> > concurrent external changes to a task's CPU affinity mask.  I still cannot
> > remember who reported this...
> >
> > ...
> >
> > +#define call_rcu_sched(head, func) call_rcu(head, func)
> > +
> >  extern void __rcu_init(void);
> > +#define rcu_init_sched()	do { } while (0)
> 
> There are lots of creepy macros-which-probably-dont-need-to-be-macros in
> here.
> 
> > +
> > +static inline int
> > +rcu_qsctr_inc_needed_dyntick(int cpu)
> 
> Unneeded newline.
> 
> > +{
> > +	long curr;
> > +	long snap;
> > +	struct rcu_dyntick_sched *rdssp = &per_cpu(rcu_dyntick_sched, cpu);
> > +
> > +	curr = rdssp->dynticks;
> > +	snap = rdssp->sched_dynticks_snap;
> > +	smp_mb(); /* force ordering with cpu entering/leaving dynticks. */
> > +
> > +	/*
> > +	 * If the CPU remained in dynticks mode for the entire time
> > +	 * and didn't take any interrupts, NMIs, SMIs, or whatever,
> > +	 * then it cannot be in the middle of an rcu_read_lock(), so
> > +	 * the next rcu_read_lock() it executes must use the new value
> > +	 * of the counter.  Therefore, this CPU has been in a quiescent
> > +	 * state the entire time, and we don't need to wait for it.
> > +	 */
> > +
> > +	if ((curr == snap) && ((curr & 0x1) == 0))
> > +		return 0;
> > +
> > +	/*
> > +	 * If the CPU passed through or entered a dynticks idle phase with
> > +	 * no active irq handlers, then, as above, this CPU has already
> > +	 * passed through a quiescent state.
> > +	 */
> > +
> > +	if ((curr - snap) > 2 || (snap & 0x1) == 0)
> > +		return 0;
> > +
> > +	/* We need this CPU to go through a quiescent state. */
> > +
> > +	return 1;
> > +}
> 
> That's a pretty big inline.  It only has a single callsite so the compiler
> should inline it for us.  And if it grows a second callsite, the inlining
> is probably wrong.
> 
> > +static inline int
> > +rcu_qsctr_inc_needed(int cpu)
> 
> Unneeded newline.
> 
> >  /*
> >   * Get here when RCU is idle.  Decide whether we need to
> >   * move out of idle state, and return non-zero if so.
> > @@ -821,6 +924,13 @@ void rcu_check_callbacks(int cpu, int us
> >  	unsigned long flags;
> >  	struct rcu_data *rdp = RCU_DATA_CPU(cpu);
> >  
> > +	if (user ||
> > +	    (idle_cpu(cpu) && !in_softirq() &&
> > +	     hardirq_count() <= (1 << HARDIRQ_SHIFT))) {
> 
> I think this test could do with a bigfatcomment explaining what it is doing.
> 
> > +		smp_mb();	/* Guard against aggressive schedule(). */
> > +	     	rcu_qsctr_inc(cpu);
> > +	}
> > +
> >  	rcu_check_mb(cpu);
> >  	if (rcu_ctrlblk.completed == rdp->completed)
> >  		rcu_try_flip();
> >
> > ...
> >
> > +
> > +	/*
> > +	 * The rcu_sched grace-period processing might have bypassed
> > +	 * this CPU, given that it was not in the rcu_cpu_online_map
> > +	 * when the grace-period scan started.  This means that the
> > +	 * grace-period task might sleep.  So make sure that if this
> > +	 * should happen, the first callback posted to this CPU will
> > +	 * wake up the grace-period task if need be.
> > +	 */
> > +
> > +	local_irq_save(flags);
> > +	rdp = RCU_DATA_ME();
> > +	spin_lock(&rdp->lock);
> 
> I assume that splitting the irq-disable from the spin_lock is a little
> latency optimisation?

RCU_DATA_ME() eventually calls smp_processor_id(), which yells if we
call it from a irq-enabled or a preempt-enabled context. Hence the
splitting.

> 
> > +	rdp->rcu_sched_sleeping = 1;
> > +	spin_unlock_irqrestore(&rdp->lock, flags);
> >  }
> >  
> >  #else /* #ifdef CONFIG_HOTPLUG_CPU */
> > @@ -993,26 +1129,194 @@ void call_rcu(struct rcu_head *head, voi
> >  }
> >  EXPORT_SYMBOL_GPL(call_rcu);
> >  
> > +void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
> > +{
> > +	unsigned long flags;
> > +	struct rcu_data *rdp;
> > +	int wake_gp = 0;
> > +
> > +	head->func = func;
> > +	head->next = NULL;
> > +	local_irq_save(flags);
> > +	rdp = RCU_DATA_ME();
> > +	spin_lock(&rdp->lock);
> > +	*rdp->nextschedtail = head;
> > +	rdp->nextschedtail = &head->next;
> > +	if (rdp->rcu_sched_sleeping) {
> > +
> > +		/* Grace-period processing might be sleeping... */
> > +
> > +		rdp->rcu_sched_sleeping = 0;
> > +		wake_gp = 1;
> > +	}
> > +	spin_unlock(&rdp->lock);
> > +	local_irq_restore(flags);
> 
> spin_unlock_irqrestore() here would be consistent with the above.
> 
> > +	if (wake_gp) {
> > +
> > +		/* Wake up grace-period processing, unless someone beat us. */
> > +
> > +		spin_lock_irqsave(&rcu_ctrlblk.schedlock, flags);
> 
> If wake_gp!=0 is common then we could microoptimise straight-line
> performance here by retaining the irq-offness from above.
> 
> > +		if (rcu_ctrlblk.sched_sleep != rcu_sched_sleeping)
> > +			wake_gp = 0;
> > +		rcu_ctrlblk.sched_sleep = rcu_sched_not_sleeping;
> > +		spin_unlock_irqrestore(&rcu_ctrlblk.schedlock, flags);
> > +		if (wake_gp)
> > +			wake_up_interruptible(&rcu_ctrlblk.sched_wq);
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(call_rcu_sched);
> >
> > ...
> >
> > +static int
> > +rcu_sched_grace_period(void *arg)
> 
> Unneeded newline.
> 
> >  {
> > -	cpumask_t oldmask;
> > +	int couldsleep;		/* might sleep after current pass. */
> > +	int couldsleepnext = 0; /* might sleep after next pass. */
> >  	int cpu;
> > +	unsigned long flags;
> > +	struct rcu_data *rdp;
> > +	int ret;
> >  
> > -	if (sched_getaffinity(0, &oldmask) < 0)
> > -		oldmask = cpu_possible_map;
> > -	for_each_online_cpu(cpu) {
> > -		sched_setaffinity(0, cpumask_of_cpu(cpu));
> > -		schedule();
> > -	}
> > -	sched_setaffinity(0, oldmask);
> > +	/*
> > +	 * Each pass through the following loop handles one
> > +	 * rcu_sched grace period cycle.
> > +	 */
> > +
> > +	do {
> > +		
> > +		/* Save each CPU's current state. */
> > +
> > +		for_each_online_cpu(cpu) {
> 
> Numerous unneeded newline ;)
> 
> > +			dyntick_save_progress_counter_sched(cpu);
> > +			save_qsctr_sched(cpu);
> > +		}
> > +
> > +		/*
> > +		 * Sleep for about an RCU grace-period's worth to
> > +		 * allow better batching and to consume less CPU.
> > +		 */
> > +
> > +		schedule_timeout_interruptible(HZ / 20);
> 
> eek, a magic number.
> 
> > +		/*
> > +		 * If there was nothing to do last time, prepare to
> > +		 * sleep at the end of the current grace period cycle.
> > +		 */
> > +
> > +		couldsleep = couldsleepnext;
> > +		couldsleepnext = 1;
> > +		if (couldsleep) {
> > +			spin_lock_irqsave(&rcu_ctrlblk.schedlock, flags);
> > +			rcu_ctrlblk.sched_sleep = rcu_sched_sleep_prep;
> > +			spin_unlock_irqrestore(&rcu_ctrlblk.schedlock, flags);
> > +		}
> 
> If the above locking actually correct and needed?  The write to
> rcu_ctrlblk.sched_sleep is a single word...
> 
> > +		/*
> > +		 * Wait on each CPU in turn to have either visited
> > +		 * a quiescent state or been in dynticks-idle mode.
> > +		 */
> > +
> > +		for_each_online_cpu(cpu) {
> > +			while (rcu_qsctr_inc_needed(cpu) &&
> > +			       rcu_qsctr_inc_needed_dyntick(cpu)) {
> > +				/* resched_cpu(cpu); */
> > +				schedule_timeout_interruptible(1);
> > +			}
> > +		}
> > +
> > +		/*
> > +		 * Advance callbacks for each CPU.
> > +		 */
> > +
> > +		for_each_online_cpu(cpu) {
> 
> It's more conventional to omit the blank line after the above form of
> comment block.
> 
> > +			rdp = RCU_DATA_CPU(cpu);
> > +			spin_lock_irqsave(&rdp->lock, flags);
> > +
> > +			/*
> > +			 * We are running on this CPU irq-disabled, so no
> > +			 * CPU can go offline until we re-enable irqs.
> 
> but, but, but.  The cpu at `cpu' could have gone offline just before we
> disabled local interrupts.

In that case the CPU_DEAD callback should have migrated the rcu-lists to
a cpu which is online.

> 
> > +			 * Advance the callbacks!  We share normal RCU's
> > +			 * donelist, since callbacks are invoked the
> > +			 * same way in either case.
> > +			 */
> > +
> > +			if (rdp->waitschedlist != NULL) {
> > +				*rdp->donetail = rdp->waitschedlist;
> > +				rdp->donetail = rdp->waitschedtail;
> > +
> > +				/*
> > +				 * Next rcu_check_callbacks() will
> > +				 * do the required raise_softirq().
> > +				 */
> > +			}
> > +			if (rdp->nextschedlist != NULL) {
> > +				rdp->waitschedlist = rdp->nextschedlist;
> > +				rdp->waitschedtail = rdp->nextschedtail;
> > +				couldsleep = 0;
> > +				couldsleepnext = 0;
> > +			} else {
> > +				rdp->waitschedlist = NULL;
> > +				rdp->waitschedtail = &rdp->waitschedlist;
> > +			}
> > +			rdp->nextschedlist = NULL;
> > +			rdp->nextschedtail = &rdp->nextschedlist;
> > +
> > +			/* Mark sleep intention. */
> > +
> > +			rdp->rcu_sched_sleeping = couldsleep;
> > +
> > +			spin_unlock_irqrestore(&rdp->lock, flags);
> > +		}
> > +
> > +		/* If we saw callbacks on the last scan, go deal with them. */
> > +
> > +		if (!couldsleep)
> > +			continue;
> > +
> > +		/* Attempt to block... */
> > +
> > +		spin_lock_irqsave(&rcu_ctrlblk.schedlock, flags);
> > +		if (rcu_ctrlblk.sched_sleep != rcu_sched_sleep_prep) {
> > +
> > +			/*
> > +			 * Someone posted a callback after we scanned.
> > +			 * Go take care of it.
> > +			 */
> > +
> > +			spin_unlock_irqrestore(&rcu_ctrlblk.schedlock, flags);
> > +			couldsleepnext = 0;
> > +			continue;
> > +		}
> > +
> > +		/* Block until the next person posts a callback. */
> > +
> > +		rcu_ctrlblk.sched_sleep = rcu_sched_sleeping;
> > +		spin_unlock_irqrestore(&rcu_ctrlblk.schedlock, flags);
> > +		ret = 0;
> > +		__wait_event_interruptible(rcu_ctrlblk.sched_wq,
> > +			rcu_ctrlblk.sched_sleep != rcu_sched_sleeping,
> > +			ret);
> > +		if (ret)
> > +			flush_signals(current);
> 
> That flush_signals() was a surprise.  A desurprising comment would be nice.
> 
> > +		couldsleepnext = 0;
> > +
> > +	} while (!kthread_should_stop());
> > +
> > +	return (0);
> >  }
> > -EXPORT_SYMBOL_GPL(__synchronize_sched);
> >  
> >  /*
> >   * Check to see if any future RCU-related work will need to be done
> > @@ -1029,7 +1333,9 @@ int rcu_needs_cpu(int cpu)
> >  
> >  	return (rdp->donelist != NULL ||
> >  		!!rdp->waitlistcount ||
> > -		rdp->nextlist != NULL);
> > +		rdp->nextlist != NULL ||
> > +		rdp->nextschedlist != NULL ||
> > +		rdp->waitschedlist != NULL);
> >  }
> >  
> >  int rcu_pending(int cpu)
> > @@ -1040,7 +1346,9 @@ int rcu_pending(int cpu)
> >  
> >  	if (rdp->donelist != NULL ||
> >  	    !!rdp->waitlistcount ||
> > -	    rdp->nextlist != NULL)
> > +	    rdp->nextlist != NULL ||
> > +	    rdp->nextschedlist != NULL ||
> > +	    rdp->waitschedlist != NULL)
> >  		return 1;
> >  
> >  	/* The RCU core needs an acknowledgement from this CPU. */
> > @@ -1107,6 +1415,11 @@ void __init __rcu_init(void)
> >  		rdp->donetail = &rdp->donelist;
> >  		rdp->rcu_flipctr[0] = 0;
> >  		rdp->rcu_flipctr[1] = 0;
> > +		rdp->nextschedlist = NULL;
> > +		rdp->nextschedtail = &rdp->nextschedlist;
> > +		rdp->waitschedlist = NULL;
> > +		rdp->waitschedtail = &rdp->waitschedlist;
> > +		rdp->rcu_sched_sleeping = 0;
> >  	}
> >  	register_cpu_notifier(&rcu_nb);
> >  
> > @@ -1129,6 +1442,18 @@ void __init __rcu_init(void)
> >  }
> >  
> >  /*
> > + * Late-boot-time RCU initialization that must wait until after scheduler
> > + * has been initialized.
> > + */
> > +void __init rcu_init_sched(void)
> > +{
> > +	rcu_sched_grace_period_task = kthread_run(rcu_sched_grace_period,
> > +						  NULL,
> > +						  "rcu_sched_grace_period");
> > +	WARN_ON(IS_ERR(rcu_sched_grace_period_task));
> > +}
> > +
> > +/*
> >   * Deprecated, use synchronize_rcu() or synchronize_sched() instead.
> >   */
> >  void synchronize_kernel(void)
> 
> I suspect I don't understand any of the RCU code any more.


> 

-- 
Thanks and Regards
gautham

  reply	other threads:[~2008-04-08  8:11 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-21 14:36 [PATCH,RFC] Add call_rcu_sched() Paul E. McKenney
2008-03-21 22:40 ` Paul E. McKenney
2008-03-24  5:06   ` Mathieu Desnoyers
2008-03-24  5:46     ` Paul E. McKenney
2008-03-24 13:59       ` Mathieu Desnoyers
2008-03-25 12:53       ` [PATCH,RFC] Initialize call_rcu_sched sooner Mathieu Desnoyers
2008-03-25 18:49         ` Paul E. McKenney
2008-04-06 21:37   ` [PATCH,RFC] Add call_rcu_sched() Paul E. McKenney
2008-04-08  7:34     ` Andrew Morton
2008-04-08  8:10       ` Gautham R Shenoy [this message]
2008-04-08  8:39         ` Andrew Morton
2008-04-08  8:56           ` Gautham R Shenoy
2008-04-08  9:07             ` Andrew Morton
2008-04-08 17:17               ` Paul E. McKenney
2008-04-08 16:59       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080408081048.GA15001@in.ibm.com \
    --to=ego@in.ibm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=adrian.bunk@movial.fi \
    --cc=akpm@linux-foundation.org \
    --cc=dipankar@in.ibm.com \
    --cc=dsmith@redhat.com \
    --cc=dvhltc@us.ibm.com \
    --cc=hch@infradead.org \
    --cc=jkenisto@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=mingo@elte.hu \
    --cc=mmlnx@us.ibm.com \
    --cc=niv@us.ibm.com \
    --cc=oleg@tv-sign.ru \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=rostedt@goodmis.org \
    --cc=rusty@au1.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.