linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-bcachefs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Boqun Feng <boqun.feng@gmail.com>
Subject: Re: [PATCH 11/20] locking/osq: Export osq_(lock|unlock)
Date: Wed, 2 Aug 2023 17:09:13 -0400	[thread overview]
Message-ID: <11d39248-31fc-c625-7c06-341f0146bd67@redhat.com> (raw)
In-Reply-To: <20230802204407.lk5mnj7ua6idddbd@moria.home.lan>

On 8/2/23 16:44, Kent Overstreet wrote:
> On Wed, Aug 02, 2023 at 04:16:12PM -0400, Waiman Long wrote:
>> On 7/12/23 17:11, Kent Overstreet wrote:
>>> These are used by bcachefs's six locks.
>>>
>>> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: Waiman Long <longman@redhat.com>
>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>> ---
>>>    kernel/locking/osq_lock.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
>>> index d5610ad52b..b752ec5cc6 100644
>>> --- a/kernel/locking/osq_lock.c
>>> +++ b/kernel/locking/osq_lock.c
>>> @@ -203,6 +203,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>>>    	return false;
>>>    }
>>> +EXPORT_SYMBOL_GPL(osq_lock);
>>>    void osq_unlock(struct optimistic_spin_queue *lock)
>>>    {
>>> @@ -230,3 +231,4 @@ void osq_unlock(struct optimistic_spin_queue *lock)
>>>    	if (next)
>>>    		WRITE_ONCE(next->locked, 1);
>>>    }
>>> +EXPORT_SYMBOL_GPL(osq_unlock);
>> Have you considered extending the current rw_semaphore to support a SIX lock
>> semantics? There are a number of instances in the kernel that a up_read() is
>> followed by a down_write(). Basically, the code try to upgrade the lock from
>> read to write. I have been thinking about adding a upgrade_read() API to do
>> that. However, the concern that I had was that another writer may come in
>> and make modification before the reader can be upgraded to have exclusive
>> write access and will make the task to repeat what has been done in the read
>> lock part. By adding a read with intent to upgrade to write, we can have
>> that guarantee.
> It's been discussed, Linus had the same thought.
>
> But it'd be a massive change to the rw semaphore code; this "read with
> intent" really is a third lock state which needs all the same
> lock/trylock/unlock paths, and with the way rw semaphore has separate
> entry points for read and write it'd be a _ton_ of new code. It really
> touches everything - waitlist handling included.

Yes, it is a major change, but I had done that before and it is 
certainly doable. There are spare bits in the low byte of rwsem->count 
that can be used as an intent bit. We also need to add a new 
rwsem_wake_type for that for waitlist handling.


>
> And six locks have several other features that bcachefs needs, and other
> users may also end up wanting, that rw semaphores don't have; the two
> main features being a percpu read lock mode and support for an external
> cycle detector (which requires exposing lock waitlists, with some
> guarantees about how those waitlists are used).

Can you provide more information about those features?

>
>> With that said, I would prefer to keep osq_{lock/unlock} for internal use by
>> some higher level locking primitives - mutex, rwsem and rt_mutex.
> Yeah, I'm aware, but it seems like exposing osq_(lock|unlock) is the
> most palatable solution for now. Long term, I'd like to get six locks
> promoted to kernel/locking.

Your SIX overlaps with rwsem in term of features. So we will have to 
somehow merge them instead of having 2 APIs with somewhat similar 
functionality.

Cheers,
Longman

>


  reply	other threads:[~2023-08-02 21:10 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-12 21:10 [PATCH 00/20] bcachefs prereqs patch series Kent Overstreet
2023-07-12 21:10 ` [PATCH 01/20] sched: Add task_struct->faults_disabled_mapping Kent Overstreet
2023-07-12 21:10 ` [PATCH 02/20] fs: factor out d_mark_tmpfile() Kent Overstreet
2023-07-12 21:10 ` [PATCH 03/20] iov_iter: Handle compound highmem pages in copy_page_from_iter_atomic() Kent Overstreet
2023-07-12 21:10 ` [PATCH 04/20] block: Add some exports for bcachefs Kent Overstreet
2023-07-24 17:31   ` Christoph Hellwig
2023-07-25  3:00     ` Kent Overstreet
2023-07-26 13:20       ` Christoph Hellwig
2023-08-01 18:59         ` Kent Overstreet
2023-07-25  2:59   ` Matthew Wilcox
2023-07-12 21:11 ` [PATCH 05/20] block: Allow bio_iov_iter_get_pages() with bio->bi_bdev unset Kent Overstreet
2023-07-24 17:34   ` Christoph Hellwig
2023-07-25  2:43     ` Kent Overstreet
2023-07-26 13:23       ` Christoph Hellwig
2023-08-01 19:04         ` Kent Overstreet
2023-08-02 11:47           ` Christoph Hellwig
2023-08-02 16:44             ` Kent Overstreet
2023-07-12 21:11 ` [PATCH 06/20] block: Bring back zero_fill_bio_iter Kent Overstreet
2023-07-24 17:35   ` Christoph Hellwig
2023-07-25  2:45     ` Kent Overstreet
2023-07-26 13:21       ` Christoph Hellwig
2023-08-01 19:05         ` Kent Overstreet
2023-07-12 21:11 ` [PATCH 07/20] block: Don't block on s_umount from __invalidate_super() Kent Overstreet
2023-07-12 21:11 ` [PATCH 08/20] stacktrace: Export stack_trace_save_tsk Kent Overstreet
2023-07-12 21:11 ` [PATCH 09/20] lib/string_helpers: string_get_size() now returns characters wrote Kent Overstreet
2023-07-12 21:11 ` [PATCH 10/20] lib: Export errname Kent Overstreet
2023-07-13  7:10   ` Eric Biggers
2023-07-12 21:11 ` [PATCH 11/20] locking/osq: Export osq_(lock|unlock) Kent Overstreet
2023-08-02 20:16   ` Waiman Long
2023-08-02 20:44     ` Kent Overstreet
2023-08-02 21:09       ` Waiman Long [this message]
2023-08-02 21:42         ` Kent Overstreet
2023-10-10  8:09           ` [NAK] " Ingo Molnar
2023-10-18 21:04             ` Kent Overstreet
2023-07-12 21:11 ` [PATCH 12/20] bcache: move closures to lib/ Kent Overstreet
2023-07-13  3:21   ` Randy Dunlap
2023-07-13  3:52     ` Kent Overstreet
2023-07-12 21:11 ` [PATCH 13/20] MAINTAINERS: Add entry for closures Kent Overstreet
2023-07-12 21:11 ` [PATCH 14/20] closures: closure_wait_event() Kent Overstreet
2023-07-12 21:11 ` [PATCH 15/20] closures: closure_nr_remaining() Kent Overstreet
2023-07-12 21:11 ` [PATCH 16/20] closures: Add a missing include Kent Overstreet
2023-07-12 21:11 ` [PATCH 17/20] MAINTAINERS: Add entry for generic-radix-tree Kent Overstreet
2023-07-12 21:11 ` [PATCH 18/20] lib/generic-radix-tree.c: Don't overflow in peek() Kent Overstreet
2023-07-12 21:11 ` [PATCH 19/20] lib/generic-radix-tree.c: Add a missing include Kent Overstreet
2023-07-25  3:04   ` Matthew Wilcox
2023-07-25  3:36     ` Kent Overstreet
2023-07-12 21:11 ` [PATCH 20/20] lib/generic-radix-tree.c: Add peek_prev() Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=11d39248-31fc-c625-7c06-341f0146bd67@redhat.com \
    --to=longman@redhat.com \
    --cc=boqun.feng@gmail.com \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).