From: Jason Gunthorpe <jgg@ziepe.ca>
To: cluster-devel@redhat.com
Subject: [Cluster-devel] RFC: hold i_rwsem until aio completes
Date: Wed, 15 Jan 2020 09:24:28 -0400
Message-ID: <20200115132428.GA25201@ziepe.ca>
In-Reply-To: <20200115065614.GC21219@lst.de>

On Wed, Jan 15, 2020 at 07:56:14AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 14, 2020 at 03:27:00PM -0400, Jason Gunthorpe wrote:
> > I've seen similar locking patterns quite a lot, enough I've thought
> > about having a dedicated locking primitive to do it. It really wants
> > to be a rwsem, but as here the rwsem rules don't allow it.
> >
> > The common pattern I'm looking at looks something like this:
> >
> > 'try begin read'() // aka down_read_trylock()
> >
> > /* The lockdep release hackery you describe,
> > the rwsem remains read locked */
> > 'exit reader'()
> >
> > .. delegate unlock to work queue, timer, irq, etc ..
> >
> > in the new context:
> >
> > 're_enter reader'() // Get our lockdep tracking back
> >
> > 'end reader'() // aka up_read()
> >
> > vs a typical write side:
> >
> > 'begin write'() // aka down_write()
> >
> > /* There is no reason to unlock it before kfree of the rwsem memory.
> > Somehow the user prevents any new down_read_trylock()'s */
> > 'abandon writer'() // The object will be kfree'd with a locked writer
> > kfree()
> >
> > The typical goal is to provide an object destruction path that can
> > serialize and fence all readers wherever they may be before proceeding
> > to some synchronous destruction.
> >
> > Usually this gets open coded with some atomic/kref/refcount and a
> > completion or wait queue. Often implemented wrongly, lacking the write
> > favoring bias in the rwsem, and lacking any lockdep tracking on the
> > naked completion.
> >
> > Not to discourage your patch, but to ask if we can make the solution
> > more broadly applicable?
>
> Your requirement seems a little different, and in fact in many ways
> similar to the percpu_ref primitive.

I was interested because you are talking about allowing the read/write side
of a rwsem to be held across a return to user space/etc, which is the
same basic problem.

percpu refcount looks more like a typical refcount with a release that
is called by whatever context does the final put. The point above is
to basically move the release of a refcount into a synchronous path by
introducing some barrier to wait for the refcount to go to zero. In
the above, the barrier is the down_write(), as it is really closer to a
rwsem than a refcount.

Thanks,
Jason
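
As a rough illustration of the open-coded variant mentioned in the quoted text (a count plus something to wait on), here is a minimal userspace sketch. It is not kernel code and not something proposed in this thread: struct reader_gate and the gate_*() names are hypothetical, and a pthread mutex and condition variable stand in for the refcount and the completion. The read side may be ended from a different thread than the one that began it, and the destroy path waits synchronously for the count to reach zero. Note that it has no lockdep-style tracking at all, which is one of the shortcomings the quoted text points out.

```c
/* Userspace sketch only; all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct reader_gate {
	pthread_mutex_t lock;
	pthread_cond_t  drained;  /* signalled when readers drops to zero */
	unsigned int    readers;  /* read sides currently held */
	bool            dying;    /* set by the destroy path, fences new readers */
};

static void gate_init(struct reader_gate *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->drained, NULL);
	g->readers = 0;
	g->dying = false;
}

/* 'try begin read': fails once destruction has started. */
static bool gate_try_begin_read(struct reader_gate *g)
{
	bool ok;

	pthread_mutex_lock(&g->lock);
	ok = !g->dying;
	if (ok)
		g->readers++;
	pthread_mutex_unlock(&g->lock);
	return ok;
}

/* 'end reader': may run in any thread, not only the one that began. */
static void gate_end_read(struct reader_gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->readers > 0);
	if (--g->readers == 0)
		pthread_cond_broadcast(&g->drained);
	pthread_mutex_unlock(&g->lock);
}

/* Destroy path: fence new readers, then wait for existing ones to finish. */
static void gate_begin_destroy(struct reader_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->dying = true;
	while (g->readers != 0)
		pthread_cond_wait(&g->drained, &g->lock);
	pthread_mutex_unlock(&g->lock);
}
```

Once gate_begin_destroy() returns, no reader holds the gate and gate_try_begin_read() can no longer succeed, so the caller can go on to tear down and free the containing object in its own context.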

WARNING: multiple messages have this Message-ID
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Waiman Long <longman@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-ext4@vger.kernel.org, cluster-devel@redhat.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: RFC: hold i_rwsem until aio completes
Date: Wed, 15 Jan 2020 09:24:28 -0400
Message-ID: <20200115132428.GA25201@ziepe.ca>
In-Reply-To: <20200115065614.GC21219@lst.de>