All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Oleg Nesterov <oleg@tv-sign.ru>,
	Alan Stern <stern@rowland.harvard.edu>,
	Linus Torvalds <torvalds@osdl.org>,
	Thomas Gleixner <tglx@timesys.com>, Ingo Molnar <mingo@elte.hu>,
	LKML <linux-kernel@vger.kernel.org>,
	john stultz <johnstul@us.ibm.com>,
	David Miller <davem@davemloft.net>,
	Arjan van de Ven <arjan@infradead.org>,
	Andrew Morton <akpm@osdl.org>, Andi Kleen <ak@suse.de>,
	manfred@colorfullife.com
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync
Date: Mon, 20 Nov 2006 12:09:33 -0800	[thread overview]
Message-ID: <20061120200933.GG8033@us.ibm.com> (raw)
In-Reply-To: <20061120175553.GC8055@kernel.dk>

On Mon, Nov 20, 2006 at 06:55:54PM +0100, Jens Axboe wrote:
> On Mon, Nov 20 2006, Paul E. McKenney wrote:
> > On Mon, Nov 20, 2006 at 08:15:14AM +0100, Jens Axboe wrote:
> > > On Sun, Nov 19 2006, Paul E. McKenney wrote:
> > > > On Sat, Nov 18, 2006 at 09:46:24PM +0300, Oleg Nesterov wrote:
> > > > > On 11/17, Paul E. McKenney wrote:
> > > > > >
> > > > > > Oleg, any thoughts about Jens's optimization?  He would code something
> > > > > > like:
> > > > > >
> > > > > > 	if (srcu_readers_active(&my_srcu))
> > > > > > 		synchronize_srcu();
> > > > > > 	else
> > > > > > 		smp_mb();
> > > > >
> > > > > Well, this is clearly racy, no? I am not sure, but may be we can do
> > > > >
> > > > > 	smp_mb();
> > > > > 	if (srcu_readers_active(&my_srcu))
> > > > > 		synchronize_srcu();
> > > > >
> > > > > in this case we also need to add 'smp_mb()' into srcu_read_lock() after
> > > > > 'atomic_inc(&sp->hardluckref)'.
> > > > >
> > > > > > However, he is doing ordered I/O requests rather than protecting data
> > > > > > structures.
> > > > >
> > > > > Probably this makes a difference, but I don't understand this.
> > > >
> > > > OK, one hypothesis here...
> > > >
> > > > 	The I/Os must be somehow explicitly ordered to qualify
> > > > 	for I/O-barrier separation.  If two independent processes
> > > > 	issue I/Os concurrently with a third process doing an
> > > > 	I/O barrier, the I/O barrier is free to separate the
> > > > 	two concurrent I/Os or not, on its whim.
> > > >
> > > > Jens, is the above correct?  If so, what would the two processes
> > >
> > > That's completely correct, hence my somewhat relaxed approach with SRCU.
> >
> > OK, less scary in that case.  ;-)
> 
> Yep, it's really not scary in any ordering sense!
> 
> > > > need to do in order to ensure that their I/O was considered to be
> > > > ordered with respect to the I/O barrier?  Here are some possibilities:
> > >
> > > If we consider the barrier a barrier in a certain stream of requests,
> > > it is the responsibility of the issuer of that barrier to ensure that
> > > the queueing is ordered. So if two "unrelated" streams of requests with
> > > barriers hit __make_request() at the same time, we don't go to great
> > > lengths to ensure who gets there firt.
> >
> > So the "preceding" requests have to have completed their I/O system
> > calls?  If this is the case, does this include normal (non-direct/raw)
> > writes and asynchronous reads?  My guess is that it would include
> > asynchronous I/O, but not buffered writes.
> 
> They need not have completed, but they must have been queued at the
> block layer level. IOW, the io scheduler must know about them. Since
> it's a block layer device property, we really don't care about system
> calls since any of them could amount to 1 or lots more individual io
> requests.
> 
> But now we have taken a detour from the original problem. As I wrote
> above, the io scheduler must know about the requests. When the plug list
> ends up in the private process context, the io scheduler doesn't know
> about it yet. When a barrier is queued, the block layer does not care
> about io that hasn't been issued yet (dirty data in the page cache
> perhaps), since if it hasn't been seen, it's by definition not
> interesting. But if some of the requests reside in a different process
> private request list, then that is a violation of this rule since it
> should technically belong to the block layer / io scheduler at that
> point. This is where I wanted to use SRCU.

OK.  Beyond a certain point, I would need to see the code using SRCU.

> > > > 1.	I/O barriers apply only to preceding and following I/Os from
> > > > 	the process issuing the I/O barrier.
> > > >
> > > > 2.	As for #1 above, but restricted to task rather than process.
> > > >
> > > > 3.	I/O system calls that have completed are ordered by the
> > > > 	barrier to precede I/O system calls that have not yet
> > > > 	started, but I/O system calls still in flight could legally
> > > > 	land on either side of the concurrently executing I/O
> > > > 	barrier.
> > > >
> > > > 4.	Something else entirely?
> > > >
> > > > Given some restriction like one of the above, it is entirely possible
> > > > that we don't even need the memory barrier...
> > >
> > > 3 is the closest. The request queue doesn't really know the scope of the
> > > barrier, it has to rely on the issuer getting it right. If you have two
> > > competing processes issuing io and process A relies on process B issuing
> > > a barrier, they have to synchronize that between them. Normally that is
> > > not a problem, since that's how the file systems always did io before
> > > barriers on items that need to be on disk (it was a serialization point
> > > anyway, it's just a stronger one now).
> >
> > So something like a user-level mutex or atomic instructions must be used
> > by the tasks doing the pre-barrier I/Os to announce that these I/Os have
> > been started in the kernel.
> 
> We don't do barriers from user space, it's purely a feature available to
> file systems to ensure ordering of writes even at the disk platter
> level.

Got it.  The question then becomes whether the barriers involved in
locking/whatever for the queues enter into the picture.

> > > That said, I think the
> > >
> > >         smp_mb();
> > >         if (srcu_readers_active(sp))
> > >                 synchronize_srcu();
> > >
> > > makes the most sense.
> >
> > If the user-level tasks/threads/processes must explicitly synchronize,
> > and if the pre-barrier I/O-initation syscalls have to have completed,
> > then I am not sure that the smp_mb() is needed.  Seems like the queuing
> > mechanisms in the syscall and the user-level synchronization would have
> > supplied the needed memory barriers.  Or are you using some extremely
> > lightweight user-level synchronization?
> 
> Once a process holds a queue plug, any write issued to that plug list
> will do an srcu_read_lock(). So as far as I can tell, the smp_mb() is
> needed to ensure that an immediately following synchronize_srcu() from a
> barrier write queued on a different CPU will see that srcu_read_lock().

Is the srcu_read_lock() invoked before actually queueing the I/O?
Is there any interaction with the queues befory calling synchronize_srcu()?
If yes to both, it might be possible to remove the memory barrier.

That said, the overhead of the smp_mb() is so small compared to that of
the I/O path that it might not be worth it.

> There are no syscall or user-level synchronization.

Got it, thank you for the explanation!

							Thanx, Paul

  reply	other threads:[~2006-11-20 20:08 UTC|newest]

Thread overview: 108+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-11-16 20:00 BUG: cpufreq notification broken Thomas Gleixner
2006-11-16 20:15 ` [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync Ingo Molnar
2006-11-16 20:45   ` Thomas Gleixner
2006-11-16 21:47     ` Linus Torvalds
2006-11-16 22:03       ` Alan Stern
2006-11-16 22:21         ` Linus Torvalds
2006-11-17  3:06           ` Alan Stern
2006-11-17  6:51             ` Paul E. McKenney
2006-11-17  9:29               ` Jens Axboe
2006-11-17 18:39                 ` Oleg Nesterov
2006-11-18  0:28                   ` Paul E. McKenney
2006-11-18 16:15                     ` Alan Stern
2006-11-18 17:14                       ` Paul E. McKenney
2006-11-18 19:34                         ` Oleg Nesterov
2006-11-19 21:26                           ` Paul E. McKenney
2006-11-18 21:00                         ` Alan Stern
2006-11-18 21:25                           ` Oleg Nesterov
2006-11-18 22:13                             ` Alan Stern
2006-11-18 22:46                               ` Oleg Nesterov
2006-11-19 20:12                                 ` Alan Stern
2006-11-19 21:43                           ` Paul E. McKenney
2006-11-20 17:19                             ` Alan Stern
2006-11-20 17:58                               ` Jens Axboe
2006-11-20 19:39                                 ` Alan Stern
2006-11-20 20:13                                   ` Jens Axboe
2006-11-20 21:39                                     ` Alan Stern
2006-11-21  7:39                                       ` Jens Axboe
2006-11-20 18:57                               ` Oleg Nesterov
2006-11-20 20:01                                 ` Alan Stern
2006-11-20 20:51                                   ` Paul E. McKenney
2006-11-21 20:04                                   ` Oleg Nesterov
2006-11-21 20:54                                     ` Alan Stern
2006-11-21 22:07                                     ` Paul E. McKenney
2006-11-20 20:38                                 ` Paul E. McKenney
2006-11-21 16:44                                   ` Oleg Nesterov
2006-11-21 19:56                                     ` Paul E. McKenney
2006-11-21 20:26                                       ` Alan Stern
2006-11-21 23:03                                         ` Paul E. McKenney
2006-11-22  2:17                                           ` Alan Stern
2006-11-22 17:01                                             ` Paul E. McKenney
2006-11-26 22:25                                 ` Oleg Nesterov
2006-11-27 21:10                                   ` Alan Stern
2006-11-28  1:47                                     ` Paul E. McKenney
2006-11-20 19:17                               ` Paul E. McKenney
2006-11-20 20:22                                 ` Alan Stern
2006-11-21 17:55                                   ` Paul E. McKenney
2006-11-21 17:56                             ` Alan Stern
2006-11-21 19:13                               ` Paul E. McKenney
2006-11-21 20:40                                 ` Alan Stern
2006-11-22 18:08                                   ` Paul E. McKenney
2006-11-21 21:01                                 ` Oleg Nesterov
2006-11-22  0:51                                   ` Paul E. McKenney
2006-11-18 18:46                     ` Oleg Nesterov
2006-11-19 21:07                       ` Paul E. McKenney
2006-11-20  7:15                         ` Jens Axboe
2006-11-20 16:59                           ` Paul E. McKenney
2006-11-20 17:55                             ` Jens Axboe
2006-11-20 20:09                               ` Paul E. McKenney [this message]
2006-11-20 20:21                                 ` Jens Axboe
2006-11-18 19:02                     ` Oleg Nesterov
2006-11-19 21:23                       ` Paul E. McKenney
2006-11-17 19:15                 ` Paul E. McKenney
2006-11-17 19:27                   ` Alan Stern
2006-11-18  0:38                     ` Paul E. McKenney
2006-11-18  4:33                       ` Alan Stern
2006-11-18  4:51                         ` Andrew Morton
2006-11-18  5:57                           ` Paul E. McKenney
2006-11-19 19:00                 ` Oleg Nesterov
2006-11-19 20:21                   ` Alan Stern
2006-11-19 20:55                     ` Oleg Nesterov
2006-11-19 21:09                       ` Alan Stern
2006-11-19 21:17                         ` Oleg Nesterov
2006-11-19 21:54                           ` Paul E. McKenney
2006-11-19 22:28                             ` Oleg Nesterov
2006-11-20  5:47                               ` Paul E. McKenney
2006-11-19 21:20                       ` Paul E. McKenney
2006-11-19 21:50                         ` Oleg Nesterov
2006-11-19 22:04                           ` Paul E. McKenney
2006-11-23 14:59                   ` Oleg Nesterov
2006-11-23 20:40                     ` Paul E. McKenney
2006-11-23 21:34                       ` Oleg Nesterov
2006-11-23 21:49                       ` Oleg Nesterov
2006-11-27  4:33                         ` Paul E. McKenney
2006-11-24 18:21                     ` Oleg Nesterov
2006-11-24 20:04                       ` Jens Axboe
2006-11-24 20:47                       ` Alan Stern
2006-11-24 21:13                         ` Oleg Nesterov
2006-11-25  3:24                           ` Alan Stern
2006-11-25 17:14                             ` Oleg Nesterov
2006-11-25 22:06                               ` Alan Stern
2006-11-26 21:34                                 ` Oleg Nesterov
2006-11-27  5:02                       ` Paul E. McKenney
2006-11-27 16:11                         ` Oleg Nesterov
2006-11-27 16:56                           ` Oleg Nesterov
2006-11-29 19:29                           ` Paul E. McKenney
2006-11-29 20:16                             ` Oleg Nesterov
2006-11-29 23:08                               ` Paul E. McKenney
2006-11-30  0:01                                 ` Oleg Nesterov
2006-11-17  2:33       ` Paul E. McKenney
2006-11-16 20:52   ` Peter Zijlstra
2006-11-16 21:20   ` Andrew Morton
2006-11-16 21:27     ` Thomas Gleixner
2006-11-20 19:57   ` Linus Torvalds
2006-11-16 20:27 ` BUG: cpufreq notification broken Alan Stern
2006-11-16 21:09   ` Thomas Gleixner
2006-11-16 21:26     ` Alan Stern
2006-11-16 21:36       ` Thomas Gleixner
2006-11-16 21:56         ` Alan Stern

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061120200933.GG8033@us.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=arjan@infradead.org \
    --cc=davem@davemloft.net \
    --cc=jens.axboe@oracle.com \
    --cc=johnstul@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=manfred@colorfullife.com \
    --cc=mingo@elte.hu \
    --cc=oleg@tv-sign.ru \
    --cc=stern@rowland.harvard.edu \
    --cc=tglx@timesys.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.