All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alan Stern <stern@rowland.harvard.edu>,
	Jens Axboe <jens.axboe@oracle.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync
Date: Wed, 29 Nov 2006 11:29:53 -0800	[thread overview]
Message-ID: <20061129192953.GA2335@us.ibm.com> (raw)
In-Reply-To: <20061127161106.GA279@oleg>

On Mon, Nov 27, 2006 at 07:11:06PM +0300, Oleg Nesterov wrote:
> On 11/26, Paul E. McKenney wrote:
> >
> > Looks pretty good, actually.  A few quibbles below.  I need to review
> > again after sleeping on it.
> 
> Thanks! Please also look at spinlock-based implementation,
> 
> 	http://marc.theaimsgroup.com/?l=linux-kernel&m=116457701231964
> 
> I must admit, Alan was right: it really looks simpler and the perfomance
> penalty should be very low. I personally hate this additional spinlock
> per se as a "unneeded complication", but probably you will like it more.

The two have different advantages and disadvantages, but nothing really
overwhelming either way.  Here is my take:

1.	The spinlock version will be easier for most people to understand.

2.	The atomic version has better read-side overhead -- probably
	roughly twice as fast on most machines.

3.	The atomic version will have better worst-case latency under
	heavy read-side load -- at least assuming that the underlying
	hardware is fair.

4.	The spinlock version would have better fairness in face of
	synchronize_xxx() overload.

5.	Neither version can be used from irq (but the same is true of
	SRCU as well).

If I was to choose, I would probably go with the easy-to-understand
case, which would push me towards the spinlocks.  If there is a
read-side performance problem, then the atomic version can be easily
resurrected from the LKML archives.  Maybe have a URL in a comment
pointing to the atomic implementation?  ;-)

All this assuming that the spinlock version passes rcutorture, of course!!!

> > > +int xxx_read_lock(struct xxx_struct *sp)
> > > +{
> > > +	for (;;) {
> > > +		int idx = sp->completed & 0x1;
> >
> > Might need a comment saying why no rcu_dereference() needed on the
> > preceding line.  The reason (as I understand it) is that we are
> > only doing atomic operations on the element being indexed.
> 
> My understanding is the same. Actually, smp_read_barrier_depends() can't
> help because 'atomic_inc' and '->completed++' in synchronize_xxx() could
> be re-ordered anyway, so we should rely on correctness of atomic_t.

Fair enough!

> > > +		if (likely(atomic_inc_not_zero(sp->ctr + idx)))
> > > +			return idx;
> > > +	}
> > > +}
> >
> > The loop seems absolutely necessary if one wishes to avoid a
> > synchronize_sched() in synchronize_xxx() below (and was one of the things
> > I was missing earlier).  However, isn't there a possibility that a pile
> > of synchronize_xxx() calls might indefinitely delay an unlucky reader?
> 
> Note that synchronize_xxx() does nothing when there are no readers under
> xxx_read_lock(), so
> 
> 	for (;;)
> 		synchronize_xxx();
> 
> can't suspend xxx_read_lock(). atomic_inc_not_zero() fails when something like
> the events below happen between 'idx = sp->completed' and 'atomic_inc_not_zero'
> 
> 	- another reader does xxx_read_lock(), increments ->ctr.
> 
> 	- synchronize_xxx() notices it, goes to __wait_event()
> 
> 	- both the reader and writer manage to do atomic_dec()
> 
> This is possible in theory, but indefinite delay... Look, we have the same
> "problem" with spinlocks: in theory synchronize_xxx() calls might indefinitely
> delay an unlucky reader because synchronize_xxx() always wins spin_lock(&sp->lock);

True enough!  Again, the only way I can see to avoid the possibility of
indefinite delay is to make the updater do synchronize_sched(), which
is what we were trying to avoid in the first place.  ;-)

> > > +
> > > +void xxx_read_unlock(struct xxx_struct *sp, int idx)
> > > +{
> > > +	if (unlikely(atomic_dec_and_test(sp->ctr + idx)))
> > > +		wake_up(&sp->wq);
> > > +}
> > > +
> > > +void synchronize_xxx(struct xxx_struct *sp)
> > > +{
> > > +	int idx;
> > > +
> > > +	mutex_lock(&sp->mutex);
> > > +
> > > +	idx = sp->completed & 0x1;
> > > +	if (!atomic_add_unless(sp->ctr + idx, -1, 1))
> > > +		goto out;
> > > +
> > > +	atomic_inc(sp->ctr + (idx ^ 0x1));
> > > +	sp->completed++;
> > > +
> > > +	__wait_event(sp->wq, !atomic_read(sp->ctr + idx));
> > > +out:
> > > +	mutex_unlock(&sp->mutex);
> > > +}
> >
> > Test code!!!  Very good!!!  (This is added to rcutorture, right?)
> 
> Yes, the whole patch goes to kernel/rcutorture.c, it is only for
> testing/review.
> 
> Note: I suspect that Documentation/ lies about atomic_add_unless(), see
> 
> 	http://marc.theaimsgroup.com/?l=linux-kernel&m=116448966030359

Hmmm...  Some do and some don't:

Alpha:	Does a memory barrier after (but not before) via cmpxchg().

arm:	No clue.  http://www.arm.com/pdfs/DUI0204G_rvct_assembler_guide.pdf
	does not seem to say much about memory-ordering issues.
	There are no obvious memory-barrier instructions, but I
	don't see what (if any) ordering effects that ldrex or
	strexeq might or might not have.

arm26:	No SMP support, so no problem.

cris:	Uses hashed spinlocks, so probably OK.  (Are cris spinlocks
	"leaky" in the ia64 sense?  If so, then -not- OK.)

frv:	No SMP support, so no problem.

h8300:	No SMP support, despite having code under CONFIG_SMP.
	In any case, local_irq_save() doesn't do much for SMP.
	(Right?  Or does h8300 do something special here?)

i386:	The x86 semantics, as I understand them, are in fact equivalent
	to having a memory barrier before and after the operation.
	However, the documentation I have is not as clear as it might be.

ia64:	"Acquire" semantics, so that earlier operations may be reordered
	after the atomic_add_unless(), but later operations may -not-
	be reordered before atomic_add_unless().

m32r:	Don't know much about m32r, but it does have an CONFIG_SMP
	implementation.

m68k:	I don't know what memory-barrier semantics the "cas" instructions
	provide.  A quick Google search was not helpful.

mips:	Seems to have a memory barrier only after, not before.
	http://www.mips.com/content/PressRoom/TechLibrary/WhitePapers/multi_cpu
	seem to indicate that the semantics of the "sync" instruction
	depend on hardware external to the CPU.

parisc:	Implements atomic_add_unless() with a spinlock, so probably	
	does memory barrier before and after (I believe parisc does
	not have "leaky" spinlock primitives like ia64 does).

powerpc: lwsync before and isync after.

s390:	The "cs" (compare-and-swap) instruction does serialization
	equivalent to memory barrier before and after.

sh:	No SMP support, despite having code under CONFIG_SMP.
	In any case, local_irq_save() doesn't do much for SMP.
	(Right?  Or does "sh" do something special here?)

sh64:	No SMP support, despite having code under CONFIG_SMP.
	In any case, local_irq_save() doesn't do much for SMP.
	(Right?  Or does "sh64" do something special here?)

sparc:	Uses spinlocks, so similar to parisc.

sparc64: Does have explicit memory barriers before and after.  ;-)

v850:	No SMP support, so no problem.

x86_64:	Same as for i386.

xtensa:	No SMP support, so no problem.

---

So either the docs or several of the architectures need fixing.

And it would be -really- nice if more architectures posted complete
instruction reference manuals on the web!!!  (Or maybe I need to
be better at searching for them?)

> so synchronize_xxx() should be
> 
> 	void synchronize_xxx(struct xxx_struct *sp)
> 	{
> 		int idx;
> 
> 		smp_mb();
> 		mutex_lock(&sp->mutex);
> 
> 		idx = sp->completed & 0x1;
> 		if (atomic_read(sp->ctr + idx) == 1)
> 			goto out;
> 
> 		atomic_inc(sp->ctr + (idx ^ 0x1));
> 		sp->completed++;
> 
> 		atomic_dec(sp->ctr + idx);
> 		wait_event(sp->wq, !atomic_read(sp->ctr + idx));
> 	out:
> 		mutex_unlock(&sp->mutex);
> 	}
> 
> Yes, Alan was right, spinlock_t makes the code simpler.

;-)

							Thanx, Paul

  parent reply	other threads:[~2006-11-29 19:28 UTC|newest]

Thread overview: 108+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-11-16 20:00 BUG: cpufreq notification broken Thomas Gleixner
2006-11-16 20:15 ` [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync Ingo Molnar
2006-11-16 20:45   ` Thomas Gleixner
2006-11-16 21:47     ` Linus Torvalds
2006-11-16 22:03       ` Alan Stern
2006-11-16 22:21         ` Linus Torvalds
2006-11-17  3:06           ` Alan Stern
2006-11-17  6:51             ` Paul E. McKenney
2006-11-17  9:29               ` Jens Axboe
2006-11-17 18:39                 ` Oleg Nesterov
2006-11-18  0:28                   ` Paul E. McKenney
2006-11-18 16:15                     ` Alan Stern
2006-11-18 17:14                       ` Paul E. McKenney
2006-11-18 19:34                         ` Oleg Nesterov
2006-11-19 21:26                           ` Paul E. McKenney
2006-11-18 21:00                         ` Alan Stern
2006-11-18 21:25                           ` Oleg Nesterov
2006-11-18 22:13                             ` Alan Stern
2006-11-18 22:46                               ` Oleg Nesterov
2006-11-19 20:12                                 ` Alan Stern
2006-11-19 21:43                           ` Paul E. McKenney
2006-11-20 17:19                             ` Alan Stern
2006-11-20 17:58                               ` Jens Axboe
2006-11-20 19:39                                 ` Alan Stern
2006-11-20 20:13                                   ` Jens Axboe
2006-11-20 21:39                                     ` Alan Stern
2006-11-21  7:39                                       ` Jens Axboe
2006-11-20 18:57                               ` Oleg Nesterov
2006-11-20 20:01                                 ` Alan Stern
2006-11-20 20:51                                   ` Paul E. McKenney
2006-11-21 20:04                                   ` Oleg Nesterov
2006-11-21 20:54                                     ` Alan Stern
2006-11-21 22:07                                     ` Paul E. McKenney
2006-11-20 20:38                                 ` Paul E. McKenney
2006-11-21 16:44                                   ` Oleg Nesterov
2006-11-21 19:56                                     ` Paul E. McKenney
2006-11-21 20:26                                       ` Alan Stern
2006-11-21 23:03                                         ` Paul E. McKenney
2006-11-22  2:17                                           ` Alan Stern
2006-11-22 17:01                                             ` Paul E. McKenney
2006-11-26 22:25                                 ` Oleg Nesterov
2006-11-27 21:10                                   ` Alan Stern
2006-11-28  1:47                                     ` Paul E. McKenney
2006-11-20 19:17                               ` Paul E. McKenney
2006-11-20 20:22                                 ` Alan Stern
2006-11-21 17:55                                   ` Paul E. McKenney
2006-11-21 17:56                             ` Alan Stern
2006-11-21 19:13                               ` Paul E. McKenney
2006-11-21 20:40                                 ` Alan Stern
2006-11-22 18:08                                   ` Paul E. McKenney
2006-11-21 21:01                                 ` Oleg Nesterov
2006-11-22  0:51                                   ` Paul E. McKenney
2006-11-18 18:46                     ` Oleg Nesterov
2006-11-19 21:07                       ` Paul E. McKenney
2006-11-20  7:15                         ` Jens Axboe
2006-11-20 16:59                           ` Paul E. McKenney
2006-11-20 17:55                             ` Jens Axboe
2006-11-20 20:09                               ` Paul E. McKenney
2006-11-20 20:21                                 ` Jens Axboe
2006-11-18 19:02                     ` Oleg Nesterov
2006-11-19 21:23                       ` Paul E. McKenney
2006-11-17 19:15                 ` Paul E. McKenney
2006-11-17 19:27                   ` Alan Stern
2006-11-18  0:38                     ` Paul E. McKenney
2006-11-18  4:33                       ` Alan Stern
2006-11-18  4:51                         ` Andrew Morton
2006-11-18  5:57                           ` Paul E. McKenney
2006-11-19 19:00                 ` Oleg Nesterov
2006-11-19 20:21                   ` Alan Stern
2006-11-19 20:55                     ` Oleg Nesterov
2006-11-19 21:09                       ` Alan Stern
2006-11-19 21:17                         ` Oleg Nesterov
2006-11-19 21:54                           ` Paul E. McKenney
2006-11-19 22:28                             ` Oleg Nesterov
2006-11-20  5:47                               ` Paul E. McKenney
2006-11-19 21:20                       ` Paul E. McKenney
2006-11-19 21:50                         ` Oleg Nesterov
2006-11-19 22:04                           ` Paul E. McKenney
2006-11-23 14:59                   ` Oleg Nesterov
2006-11-23 20:40                     ` Paul E. McKenney
2006-11-23 21:34                       ` Oleg Nesterov
2006-11-23 21:49                       ` Oleg Nesterov
2006-11-27  4:33                         ` Paul E. McKenney
2006-11-24 18:21                     ` Oleg Nesterov
2006-11-24 20:04                       ` Jens Axboe
2006-11-24 20:47                       ` Alan Stern
2006-11-24 21:13                         ` Oleg Nesterov
2006-11-25  3:24                           ` Alan Stern
2006-11-25 17:14                             ` Oleg Nesterov
2006-11-25 22:06                               ` Alan Stern
2006-11-26 21:34                                 ` Oleg Nesterov
2006-11-27  5:02                       ` Paul E. McKenney
2006-11-27 16:11                         ` Oleg Nesterov
2006-11-27 16:56                           ` Oleg Nesterov
2006-11-29 19:29                           ` Paul E. McKenney [this message]
2006-11-29 20:16                             ` Oleg Nesterov
2006-11-29 23:08                               ` Paul E. McKenney
2006-11-30  0:01                                 ` Oleg Nesterov
2006-11-17  2:33       ` Paul E. McKenney
2006-11-16 20:52   ` Peter Zijlstra
2006-11-16 21:20   ` Andrew Morton
2006-11-16 21:27     ` Thomas Gleixner
2006-11-20 19:57   ` Linus Torvalds
2006-11-16 20:27 ` BUG: cpufreq notification broken Alan Stern
2006-11-16 21:09   ` Thomas Gleixner
2006-11-16 21:26     ` Alan Stern
2006-11-16 21:36       ` Thomas Gleixner
2006-11-16 21:56         ` Alan Stern

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061129192953.GA2335@us.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@tv-sign.ru \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.