* Wait for mutex to become unlocked
@ 2022-05-04 21:44 Matthew Wilcox
  2022-05-05  0:22 ` Thomas Gleixner
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Matthew Wilcox @ 2022-05-04 21:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long
  Cc: Paul E. McKenney, Thomas Gleixner, Liam R. Howlett, linux-kernel

Paul, Liam and I were talking about some code we intend to write soon
and realised there's a missing function in the mutex & rwsem API.
We're intending to use it for an rwsem, but I think it applies equally
to mutexes.

The customer has a low priority task which wants to read /proc/pid/smaps
of a higher priority task.  Today, everything is awful; smaps acquires
mmap_sem read-only, is preempted, then the high-pri task calls mmap()
and the down_write(mmap_sem) blocks on the low-pri task.  Then all the
other threads in the high-pri task block on the mmap_sem as they take
page faults because we don't want writers to starve.

The approach we're looking at is to allow RCU lookup of VMAs, and then
take a per-VMA rwsem for read.  Because we're under RCU protection,
that looks a bit like this:

	rcu_read_lock();
	vma = vma_lookup();
	if (down_read_trylock(&vma->sem)) {
		rcu_read_unlock();
	} else {
		rcu_read_unlock();
		down_read(&mm->mmap_sem);
		vma = vma_lookup();
		down_read(&vma->sem);
		up_read(&mm->mmap_sem);
	}

(for clarity, I've skipped the !vma checks; don't take this too literally)

So this is Good.  For the vast majority of cases, we avoid taking the
mmap read lock and the problem will appear much less often.  But we can
do Better with a new API.  You see, for this case, we don't actually
want to acquire the mmap_sem; we're happy to spin a bit, but there's no
point in spinning waiting for the writer to finish when we can sleep.
I'd like to write this code:

again:
	rcu_read_lock();
	vma = vma_lookup();
	if (down_read_trylock(&vma->sem)) {
		rcu_read_unlock();
	} else {
		rcu_read_unlock();
		rwsem_wait_read(&mm->mmap_sem);
		goto again;
	}

That is, rwsem_wait_read() puts the thread on the rwsem's wait queue,
and wakes it up without giving it the lock.  Now this thread will never
be able to block any thread that tries to acquire mmap_sem for write.
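
[Editorial sketch, not part of the original message.] The semantics
described above, "sleep until the lock is free, then return without
owning it", can be illustrated in userspace. This is only an analogue
under loud assumptions: struct wlock, wlock_lock(), wlock_unlock() and
wlock_wait() are hypothetical names built on a pthread mutex and
condition variable. It is not the kernel rwsem, and it says nothing
about wait-queue ordering or fairness.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace lock used only for illustration. */
struct wlock {
	pthread_mutex_t guard;		/* protects 'held' */
	pthread_cond_t released;	/* broadcast on every unlock */
	bool held;
};

#define WLOCK_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Ordinary acquire: sleep until free, then take ownership. */
void wlock_lock(struct wlock *l)
{
	pthread_mutex_lock(&l->guard);
	while (l->held)
		pthread_cond_wait(&l->released, &l->guard);
	l->held = true;
	pthread_mutex_unlock(&l->guard);
}

void wlock_unlock(struct wlock *l)
{
	pthread_mutex_lock(&l->guard);
	l->held = false;
	pthread_cond_broadcast(&l->released);
	pthread_mutex_unlock(&l->guard);
}

/*
 * The proposed primitive: sleep until the lock is free, but return
 * WITHOUT owning it. The caller can therefore never be the holder
 * that a later locker blocks behind.
 */
void wlock_wait(struct wlock *l)
{
	pthread_mutex_lock(&l->guard);
	while (l->held)
		pthread_cond_wait(&l->released, &l->guard);
	pthread_mutex_unlock(&l->guard);
}
```

Note that the lock may be taken again immediately after wlock_wait()
returns, which is why the caller in the pseudocode above loops back to
"again" rather than assuming anything about the lock state.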

Similarly, it may make sense to add rwsem_wait_write() and mutex_wait().
Perhaps also mutex_wait_killable() and mutex_wait_interruptible()
(the combinatoric explosion is a bit messy; I don't know that it makes
sense to do the _nested, _io variants).

Does any of this make sense?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Wait for mutex to become unlocked
  2022-05-04 21:44 Wait for mutex to become unlocked Matthew Wilcox
@ 2022-05-05  0:22 ` Thomas Gleixner
  2022-05-05  0:38   ` Matthew Wilcox
  2022-05-05  1:11 ` Waiman Long
       [not found] ` <20220505015223.5132-1-hdanton@sina.com>
  2 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2022-05-05  0:22 UTC (permalink / raw)
  To: Matthew Wilcox, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long
  Cc: Paul E. McKenney, Liam R. Howlett, linux-kernel

On Wed, May 04 2022 at 22:44, Matthew Wilcox wrote:
> The customer has a low priority task which wants to read /proc/pid/smaps
> of a higher priority task.  Today, everything is awful; smaps acquires
> mmap_sem read-only, is preempted, then the high-pri task calls mmap()
> and the down_write(mmap_sem) blocks on the low-pri task.  Then all the
> other threads in the high-pri task block on the mmap_sem as they take
> page faults because we don't want writers to starve.

Welcome to the wonderful world of priority inversion.

> So this is Good.  For the vast majority of cases, we avoid taking the
> mmap read lock and the problem will appear much less often.  But we can
> do Better with a new API.  You see, for this case, we don't actually
> want to acquire the mmap_sem; we're happy to spin a bit, but there's no
> point in spinning waiting for the writer to finish when we can sleep.
> I'd like to write this code:
>
> again:
> 	rcu_read_lock();
> 	vma = vma_lookup();
> 	if (down_read_trylock(&vma->sem)) {
> 		rcu_read_unlock();
> 	} else {
> 		rcu_read_unlock();
> 		rwsem_wait_read(&mm->mmap_sem);
> 		goto again;
> 	}
>
> That is, rwsem_wait_read() puts the thread on the rwsem's wait queue,
> and wakes it up without giving it the lock.  Now this thread will never
> be able to block any thread that tries to acquire mmap_sem for write.

Never?

 	if (down_read_trylock(&vma->sem)) {

---> preemption by writer

The writer will still block and depending on the rest of the runnable
threads it can take quite a while until the low prio reader comes back
on a CPU.

I grant you that the probability will decrease, but 'never' is just
wishful thinking.

> Similarly, it may make sense to add rwsem_wait_write() and mutex_wait().
> Perhaps also mutex_wait_killable() and mutex_wait_interruptible()
> (the combinatoric explosion is a bit messy; I don't know that it makes
> sense to do the _nested, _io variants).
>
> Does any of this make sense?

TBH, no.

If we start opening this can of worms, then we'll see tons of "customer
wants a pony" problems being solved by half-baked "solutions" which
will cause exactly the combinatoric explosion you are worried about.

If there is a legitimate requirement to retrieve such information, then
we are better off thinking about a general approach of introspection,
which makes such information available as unreliable snapshots
retrievable with RCU protection.

The information gathered from /proc/pid/smaps is unreliable at the point
where the lock is dropped already today. So it does not make a
difference whether the VMAs have a 'read me if you really think it's
useful' sideband information which gets updated when the VMA changes and
allows to do:

	rcu_read_lock();
	vma = vma_lookup();
	if (vma)
		dump(vma->info);
	rcu_read_unlock();

You still need to decide, whether you want to provide that information
or not for a particular interface, but that's way more sane than the
'make locking more complex for questionable value' approach.

But looking at the stuff which gets recomputed and reevaluated in that
proc/smaps code this makes a lot of sense, because most if not all of
this information is already known at the point where the VMA is modified
while holding mmap_sem for useful reasons, no?

So no, we don't want to add more magic locking functions which pretend
to solve unsolvable problems. We rather go and make use of information
which is already available by providing a less archaic access mechanism.

What you are trying to do here is just adding to technical debt IMNSHO.

Thanks,

        tglx


* Re: Wait for mutex to become unlocked
  2022-05-05  0:22 ` Thomas Gleixner
@ 2022-05-05  0:38   ` Matthew Wilcox
  2022-05-05  1:14     ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2022-05-05  0:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Paul E. McKenney, Liam R. Howlett, linux-kernel

On Thu, May 05, 2022 at 02:22:30AM +0200, Thomas Gleixner wrote:
> > So this is Good.  For the vast majority of cases, we avoid taking the
> > mmap read lock and the problem will appear much less often.  But we can
> > do Better with a new API.  You see, for this case, we don't actually
> > want to acquire the mmap_sem; we're happy to spin a bit, but there's no
> > point in spinning waiting for the writer to finish when we can sleep.
> > I'd like to write this code:
> >
> > again:
> > 	rcu_read_lock();
> > 	vma = vma_lookup();
> > 	if (down_read_trylock(&vma->sem)) {
> > 		rcu_read_unlock();
> > 	} else {
> > 		rcu_read_unlock();
> > 		rwsem_wait_read(&mm->mmap_sem);
> > 		goto again;
> > 	}
> >
> > That is, rwsem_wait_read() puts the thread on the rwsem's wait queue,
> > and wakes it up without giving it the lock.  Now this thread will never
> > be able to block any thread that tries to acquire mmap_sem for write.
> 
> Never?
> 
>  	if (down_read_trylock(&vma->sem)) {
> 
> ---> preemption by writer

Ah!  This is a different semaphore.  Yes, it can be preempted while
holding the VMA rwsem and block a thread which is trying to modify the
VMA which will then block all threads from faulting _on that VMA_,
but it won't affect page faults on any other VMA.  It's only Better,
not Best (the Best approach was proposed on Monday afternoon, and
the other MM developers asked us to only go as far as Better and
see if that was good enough).

> The information gathered from /proc/pid/smaps is unreliable at the point
> where the lock is dropped already today. So it does not make a
> difference whether the VMAs have a 'read me if you really think it's
> useful' sideband information which gets updated when the VMA changes and
> allows to do:

Mmm.  I'm not sure that we want to maintain the smaps information on
the off chance that somebody wants to query it.

> But looking at the stuff which gets recomputed and reevaluated in that
> proc/smaps code this makes a lot of sense, because most if not all of
> this information is already known at the point where the VMA is modified
> while holding mmap_sem for useful reasons, no?

I suspect the only way to know is to try to implement it, and then
benchmark it.


* Re: Wait for mutex to become unlocked
  2022-05-04 21:44 Wait for mutex to become unlocked Matthew Wilcox
  2022-05-05  0:22 ` Thomas Gleixner
@ 2022-05-05  1:11 ` Waiman Long
       [not found] ` <20220505015223.5132-1-hdanton@sina.com>
  2 siblings, 0 replies; 8+ messages in thread
From: Waiman Long @ 2022-05-05  1:11 UTC (permalink / raw)
  To: Matthew Wilcox, Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Paul E. McKenney, Thomas Gleixner, Liam R. Howlett, linux-kernel

On 5/4/22 17:44, Matthew Wilcox wrote:
> Paul, Liam and I were talking about some code we intend to write soon
> and realised there's a missing function in the mutex & rwsem API.
> We're intending to use it for an rwsem, but I think it applies equally
> to mutexes.
>
> The customer has a low priority task which wants to read /proc/pid/smaps
> of a higher priority task.  Today, everything is awful; smaps acquires
> mmap_sem read-only, is preempted, then the high-pri task calls mmap()
> and the down_write(mmap_sem) blocks on the low-pri task.  Then all the
> other threads in the high-pri task block on the mmap_sem as they take
> page faults because we don't want writers to starve.
>
> The approach we're looking at is to allow RCU lookup of VMAs, and then
> take a per-VMA rwsem for read.  Because we're under RCU protection,
> that looks a bit like this:
>
> 	rcu_read_lock();
> 	vma = vma_lookup();
> 	if (down_read_trylock(&vma->sem)) {
> 		rcu_read_unlock();
> 	} else {
> 		rcu_read_unlock();
> 		down_read(&mm->mmap_sem);
> 		vma = vma_lookup();
> 		down_read(&vma->sem);
> 		up_read(&mm->mmap_sem);
> 	}
>
> (for clarity, I've skipped the !vma checks; don't take this too literally)
>
> So this is Good.  For the vast majority of cases, we avoid taking the
> mmap read lock and the problem will appear much less often.  But we can
> do Better with a new API.  You see, for this case, we don't actually
> want to acquire the mmap_sem; we're happy to spin a bit, but there's no
> point in spinning waiting for the writer to finish when we can sleep.
> I'd like to write this code:
>
> again:
> 	rcu_read_lock();
> 	vma = vma_lookup();
> 	if (down_read_trylock(&vma->sem)) {
> 		rcu_read_unlock();
> 	} else {
> 		rcu_read_unlock();
> 		rwsem_wait_read(&mm->mmap_sem);
> 		goto again;
> 	}
>
> That is, rwsem_wait_read() puts the thread on the rwsem's wait queue,
> and wakes it up without giving it the lock.  Now this thread will never
> be able to block any thread that tries to acquire mmap_sem for write.

I suppose that a writer that needs to take a write lock on vma->sem will
have to take a write lock on mmap_sem first. In that case it makes sense
to me that you want to wait for all the vma->sem writers to finish by
waiting on the wait queue of mmap_sem. By the time the waiting task is
woken up, there is no active write lock on the vma->sem, and hopefully
the down_read_trylock() it then does will succeed. However, another
writer may come in and take the vma->sem write lock during the time gap
in the wakeup process. Waiting improves the chance of a successful
trylock, but does not guarantee it. So you will need a retry count and
fall back to a direct down_read() when there are too many retries.
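
[Editorial sketch, not part of the original message.] The retry count
with a blocking fallback could look roughly like the following
userspace analogue. Everything here is an assumption made for
illustration: pthread rwlocks stand in for mmap_sem and vma->sem,
MAX_TRYLOCK_RETRIES is an arbitrary value, vma_read_lock() is a
hypothetical name, and sched_yield() is only a placeholder for the
proposed wait on the outer lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>

enum { MAX_TRYLOCK_RETRIES = 3 };	/* arbitrary, for illustration */

/*
 * Take vma_lock for read. Try the cheap path a bounded number of
 * times; after that, fall back to the blocking path that goes through
 * the outer lock. Returns the number of failed trylock attempts.
 * On return the caller holds vma_lock for read and nothing else.
 */
int vma_read_lock(pthread_rwlock_t *mmap_lock, pthread_rwlock_t *vma_lock)
{
	int failures;

	for (failures = 0; failures < MAX_TRYLOCK_RETRIES; failures++) {
		if (pthread_rwlock_tryrdlock(vma_lock) == 0)
			return failures;
		/* Placeholder for "wait until the writers are done". */
		sched_yield();
	}

	/* Too many retries: take the outer lock and block properly. */
	pthread_rwlock_rdlock(mmap_lock);
	pthread_rwlock_rdlock(vma_lock);
	pthread_rwlock_unlock(mmap_lock);
	return failures;
}
```

The bound keeps a stream of writers from starving the reader forever,
at the price of occasionally reintroducing the blocking behaviour the
trylock path was meant to avoid.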

Since the waiting process isn't taking any lock, the name
rwsem_wait_read() may be somewhat misleading. I think a better name may
be rwsem_flush_waiters(). So do you want to flush the waiters at the
point this API is called, or do you want to wait until the wait queue is
empty?

Cheers,
Longman



* Re: Wait for mutex to become unlocked
  2022-05-05  0:38   ` Matthew Wilcox
@ 2022-05-05  1:14     ` Thomas Gleixner
  2022-05-05  5:04       ` Paul E. McKenney
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2022-05-05  1:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Paul E. McKenney, Liam R. Howlett, linux-kernel

On Thu, May 05 2022 at 01:38, Matthew Wilcox wrote:
> On Thu, May 05, 2022 at 02:22:30AM +0200, Thomas Gleixner wrote:
>> > That is, rwsem_wait_read() puts the thread on the rwsem's wait queue,
>> > and wakes it up without giving it the lock.  Now this thread will never
>> > be able to block any thread that tries to acquire mmap_sem for write.
>> 
>> Never?
>> 
>>  	if (down_read_trylock(&vma->sem)) {
>> 
>> ---> preemption by writer
>
> Ah!  This is a different semaphore.  Yes, it can be preempted while
> holding the VMA rwsem and block a thread which is trying to modify the
> VMA which will then block all threads from faulting _on that VMA_,
> but it won't affect page faults on any other VMA.

Ooops. Missed that detail. Too many semaphores here.

> It's only Better, not Best (the Best approach was proposed on Monday
> afternoon, and the other MM developers asked us to only go as far as
> Better and see if that was good enough).

:)

>> The information gathered from /proc/pid/smaps is unreliable at the point
>> where the lock is dropped already today. So it does not make a
>> difference whether the VMAs have a 'read me if you really think it's
>> useful' sideband information which gets updated when the VMA changes and
>> allows to do:
>
> Mmm.  I'm not sure that we want to maintain the smaps information on
> the off chance that somebody wants to query it.

Fair enough, but then the question is whether it's more reasonable to
document that if you want to read that nonsense, then you have to live
with the consequences. The problem with many of those interfaces is that
they have been added for whatever reasons, became ABI and people are
suddenly making performance claims which might not be justified at all.

We really have to make our mind up and make decisions whether we want to
solve every "I want a pony" complaint just because.

>> But looking at the stuff which gets recomputed and reevaluated in that
>> proc/smaps code this makes a lot of sense, because most if not all of
>> this information is already known at the point where the VMA is modified
>> while holding mmap_sem for useful reasons, no?
>
> I suspect the only way to know is to try to implement it, and then
> benchmark it.

Sure. There are other ways than having a RCU protected info, e.g. a
sequence count which ensures that the to be read information is
consistent.
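
[Editorial sketch, not part of the original message.] A sequence count
of the kind mentioned above can be sketched in userspace with C11
atomics. This is not the kernel's seqcount_t API. struct seq_info,
seq_info_write() and seq_info_read() are hypothetical names, the two
fields are invented, a single writer is assumed, and the default
sequentially consistent atomics are used for simplicity where real
code would use weaker barriers.

```c
#include <stdatomic.h>

struct vma_info {		/* plain copy handed to the reader */
	unsigned long start;
	unsigned long end;
};

struct seq_info {
	atomic_uint seq;	/* odd while an update is in progress */
	atomic_ulong start;
	atomic_ulong end;
};

/* Single writer assumed; concurrent writers would need a lock. */
void seq_info_write(struct seq_info *s, unsigned long start,
		    unsigned long end)
{
	atomic_fetch_add(&s->seq, 1);	/* becomes odd: update begins */
	atomic_store(&s->start, start);
	atomic_store(&s->end, end);
	atomic_fetch_add(&s->seq, 1);	/* becomes even: update done */
}

/* Lockless reader: retry until both fields come from one update. */
struct vma_info seq_info_read(struct seq_info *s)
{
	struct vma_info copy;
	unsigned int before, after;

	do {
		before = atomic_load(&s->seq);
		copy.start = atomic_load(&s->start);
		copy.end = atomic_load(&s->end);
		after = atomic_load(&s->seq);
	} while ((before & 1) || before != after);

	return copy;
}
```

The reader never blocks the writer; it only repeats its own work when
it detects that an update overlapped the read.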

Thanks,

        tglx


* Re: Wait for mutex to become unlocked
       [not found] ` <20220505015223.5132-1-hdanton@sina.com>
@ 2022-05-05  4:12   ` Matthew Wilcox
  0 siblings, 0 replies; 8+ messages in thread
From: Matthew Wilcox @ 2022-05-05  4:12 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Waiman Long, Paul E. McKenney, Liam R. Howlett, linux-kernel

On Thu, May 05, 2022 at 09:52:23AM +0800, Hillf Danton wrote:
> +++ x/kernel/locking/rwsem.c
> @@ -1464,6 +1464,35 @@ void __sched down_read(struct rw_semapho
>  }
>  EXPORT_SYMBOL(down_read);
>  
> +static void __rwsem_wait(struct rw_semaphore *sem, int read, int state)
> +{
> +	DEFINE_WAIT(wait);
> +	int locked;
> +
> +	prepare_to_wait(&sem->willy_wq, &wait, state);
> +	if (read)
> +		locked = down_read_trylock(sem);

... but then we just acquired the lock.  And the point was to never
acquire the lock.

Also, what's the 'willy_wq' thing?  Do you mean wait_list?
Oh, no, I see, you're pretending that we should add an extra waitq
to the rwsem.  That's very silly.

The point was not to ask "how can we do this", the question was "should we
do this?"  And Thomas, at least for now, is saying "No".

If you want to figure out how to do it properly, see rwsem_add_waiter()
and how it's used.



* Re: Wait for mutex to become unlocked
  2022-05-05  1:14     ` Thomas Gleixner
@ 2022-05-05  5:04       ` Paul E. McKenney
  2022-05-05  5:21         ` Paul E. McKenney
  0 siblings, 1 reply; 8+ messages in thread
From: Paul E. McKenney @ 2022-05-05  5:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Matthew Wilcox, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Liam R. Howlett, linux-kernel

On Thu, May 05, 2022 at 03:14:53AM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 01:38, Matthew Wilcox wrote:
> > On Thu, May 05, 2022 at 02:22:30AM +0200, Thomas Gleixner wrote:
> >> > That is, rwsem_wait_read() puts the thread on the rwsem's wait queue,
> >> > and wakes it up without giving it the lock.  Now this thread will never
> >> > be able to block any thread that tries to acquire mmap_sem for write.
> >> 
> >> Never?
> >> 
> >>  	if (down_read_trylock(&vma->sem)) {
> >> 
> >> ---> preemption by writer
> >
> > Ah!  This is a different semaphore.  Yes, it can be preempted while
> > holding the VMA rwsem and block a thread which is trying to modify the
> > VMA which will then block all threads from faulting _on that VMA_,
> > but it won't affect page faults on any other VMA.
> 
> Ooops. Missed that detail. Too many semaphores here.
> 
> > It's only Better, not Best (the Best approach was proposed on Monday
> > afternoon, and the other MM developers asked us to only go as far as
> > Better and see if that was good enough).
> 
> :)
> 
> >> The information gathered from /proc/pid/smaps is unreliable at the point
> >> where the lock is dropped already today. So it does not make a
> >> difference whether the VMAs have a 'read me if you really think it's
> >> useful' sideband information which gets updated when the VMA changes and
> >> allows to do:
> >
> > Mmm.  I'm not sure that we want to maintain the smaps information on
> > the off chance that somebody wants to query it.
> 
> Fair enough, but then the question is whether it's more reasonable to
> document that if you want to read that nonsense, then you have to live
> with the consequences. The problem with many of those interfaces is that
> they have been added for whatever reasons, became ABI and people are
> suddenly making performance claims which might not be justified at all.
> 
> We really have to make our mind up and make decisions whether we want to
> solve every "I want a pony" complaint just because.
> 
> >> But looking at the stuff which gets recomputed and reevaluated in that
> >> proc/smaps code this makes a lot of sense, because most if not all of
> >> this information is already known at the point where the VMA is modified
> >> while holding mmap_sem for useful reasons, no?
> >
> > I suspect the only way to know is to try to implement it, and then
> > benchmark it.
> 
> Sure. There are other ways than having a RCU protected info, e.g. a
> sequence count which ensures that the to be read information is
> consistent.

So the thought is to maintain the /proc/smaps information separately,
so that it can just be read out, correct?  If so...

As you say, sequence counts can check consistency, but something else
is required to protect any dereferences of pointers to data that might
be freed.  One approach is to place the /proc/smaps information somewhere
that cannot be freed during /proc/smaps scan.  The place that comes
immediately to mind is the mm_struct, but I suspect that the /proc/smaps
information will need to be variable length, especially on 64-bit systems.

Another approach is to allocate space for the /proc/smaps information
dynamically, using RCU to protect only reads of only that information.
But you seem to be thinking of something else.  Or maybe your point is
that the use of RCU can be restricted to this /proc/smaps information?

Yet another approach is to use reference counts, but of course the counts
need to live outside of the structure being protected.  If the summary
information is not to block expansion of the address space (which is
the asked-for pony), this gets tricky due to the need to quickly and
repeatedly enlarge the memory holding the /proc/smaps information.

Or am I missing a trick here?

							Thanx, Paul


* Re: Wait for mutex to become unlocked
  2022-05-05  5:04       ` Paul E. McKenney
@ 2022-05-05  5:21         ` Paul E. McKenney
  0 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2022-05-05  5:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Matthew Wilcox, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Liam R. Howlett, linux-kernel

On Wed, May 04, 2022 at 10:04:44PM -0700, Paul E. McKenney wrote:
> On Thu, May 05, 2022 at 03:14:53AM +0200, Thomas Gleixner wrote:
> > On Thu, May 05 2022 at 01:38, Matthew Wilcox wrote:
> > > On Thu, May 05, 2022 at 02:22:30AM +0200, Thomas Gleixner wrote:
> > >> > That is, rwsem_wait_read() puts the thread on the rwsem's wait queue,
> > >> > and wakes it up without giving it the lock.  Now this thread will never
> > >> > be able to block any thread that tries to acquire mmap_sem for write.
> > >> 
> > >> Never?
> > >> 
> > >>  	if (down_read_trylock(&vma->sem)) {
> > >> 
> > >> ---> preemption by writer
> > >
> > > Ah!  This is a different semaphore.  Yes, it can be preempted while
> > > holding the VMA rwsem and block a thread which is trying to modify the
> > > VMA which will then block all threads from faulting _on that VMA_,
> > > but it won't affect page faults on any other VMA.
> > 
> > Ooops. Missed that detail. Too many semaphores here.
> > 
> > > It's only Better, not Best (the Best approach was proposed on Monday
> > > afternoon, and the other MM developers asked us to only go as far as
> > > Better and see if that was good enough).
> > 
> > :)
> > 
> > >> The information gathered from /proc/pid/smaps is unreliable at the point
> > >> where the lock is dropped already today. So it does not make a
> > >> difference whether the VMAs have a 'read me if you really think it's
> > >> useful' sideband information which gets updated when the VMA changes and
> > >> allows to do:
> > >
> > > Mmm.  I'm not sure that we want to maintain the smaps information on
> > > the off chance that somebody wants to query it.
> > 
> > Fair enough, but then the question is whether it's more reasonable to
> > document that if you want to read that nonsense, then you have to live
> > with the consequences. The problem with many of those interfaces is that
> > they have been added for whatever reasons, became ABI and people are
> > suddenly making performance claims which might not be justified at all.
> > 
> > We really have to make our mind up and make decisions whether we want to
> > solve every "I want a pony" complaint just because.
> > 
> > >> But looking at the stuff which gets recomputed and reevaluated in that
> > >> proc/smaps code this makes a lot of sense, because most if not all of
> > >> this information is already known at the point where the VMA is modified
> > >> while holding mmap_sem for useful reasons, no?
> > >
> > > I suspect the only way to know is to try to implement it, and then
> > > benchmark it.
> > 
> > Sure. There are other ways than having a RCU protected info, e.g. a
> > sequence count which ensures that the to be read information is
> > consistent.
> 
> So the thought is to maintain the /proc/smaps information separately,
> so that it can just be read out, correct?  If so...
> 
> As you say, sequence counts can check consistency, but something else
> is required to protect any dereferences of pointers to data that might
> be freed.  One approach is to place the /proc/smaps information somewhere
> that cannot be freed during /proc/smaps scan.  The place that comes
> immediately to mind is the mm_struct, but I suspect that the /proc/smaps
> information will need to be variable length, especially on 64-bit systems.
> 
> Another approach is to allocate space for the /proc/smaps information
> dynamically, using RCU to protect only reads of only that information.
> But you seem to be thinking of something else.  Or maybe your point is
> that the use of RCU can be restricted to this /proc/smaps information?
> 
> Yet another approach is to use reference counts, but of course the counts
> need to live outside of the structure being protected.  If the summary
> information is not to block expansion of the address space (which is
> the asked-for pony), this gets tricky due to the need to quickly and
> repeatedly enlarge the memory holding the /proc/smaps information.

Ah, maybe you were thinking of protecting the pointer dereference (and
maybe also a reference-count increment) with a lock (perhaps in the
mm_struct?), where the lock's critical section disabled preemption.
The lock would be released before scanning the information, and your
sequence count could then retry inconsistent reads.  (With some limit
beyond which inconsistency is accepted, just as inconsistent /proc/smaps
reads can happen today).
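
[Editorial sketch, not part of the original message.] One possible
reading of the scheme above, as a userspace analogue: a short lock
covers only the pointer dereference and the reference-count increment,
and the scan itself runs unlocked with a bounded number of
sequence-count retries. All names (struct mm, struct smaps_info,
smaps_info_get/put/read, the rss_kb and pss_kb fields,
MAX_READ_RETRIES) are hypothetical. Freeing on the last put and
disabling preemption inside the critical section are left out.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { MAX_READ_RETRIES = 4 };	/* arbitrary, for illustration */

struct smaps_info {		/* hypothetical summary block */
	atomic_uint seq;	/* odd while being updated */
	atomic_ulong rss_kb;
	atomic_ulong pss_kb;
	int refs;		/* protected by mm->info_lock */
};

struct mm {			/* stand-in for mm_struct */
	pthread_mutex_t info_lock;
	struct smaps_info *info;
};

/* The lock covers only the dereference and the refcount increment. */
struct smaps_info *smaps_info_get(struct mm *mm)
{
	struct smaps_info *info;

	pthread_mutex_lock(&mm->info_lock);
	info = mm->info;
	if (info)
		info->refs++;
	pthread_mutex_unlock(&mm->info_lock);
	return info;
}

void smaps_info_put(struct mm *mm, struct smaps_info *info)
{
	pthread_mutex_lock(&mm->info_lock);
	info->refs--;		/* freeing at zero is omitted here */
	pthread_mutex_unlock(&mm->info_lock);
}

/*
 * Unlocked scan. Returns true if the values form a consistent
 * snapshot, false if the retry limit was hit and the last (possibly
 * inconsistent) values were accepted.
 */
bool smaps_info_read(struct smaps_info *info, unsigned long *rss_kb,
		     unsigned long *pss_kb)
{
	for (int tries = 0; tries < MAX_READ_RETRIES; tries++) {
		unsigned int before = atomic_load(&info->seq);

		*rss_kb = atomic_load(&info->rss_kb);
		*pss_kb = atomic_load(&info->pss_kb);
		if (!(before & 1) && before == atomic_load(&info->seq))
			return true;
	}
	return false;
}
```

A caller would pair smaps_info_get() with smaps_info_put() around the
scan and decide for itself whether a false return is acceptable.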

Or again, am I missing a trick here?

							Thanx, Paul
