From: Matthew Wilcox <willy@infradead.org>
To: Christoph Lameter <cl@linux.com>
Cc: linux-mm@kvack.org, Jesper Dangaard Brouer <brouer@redhat.com>,
riel@redhat.com, Mel Gorman <mel@csn.ul.ie>
Subject: Re: [LSF/MM TOPIC] Movable memory and reliable higher order allocations
Date: Fri, 3 Mar 2017 12:39:20 -0800
Message-ID: <20170303203920.GR16328@bombadil.infradead.org>
In-Reply-To: <alpine.DEB.2.20.1703030915170.16721@east.gentwo.org>
On Fri, Mar 03, 2017 at 09:24:23AM -0600, Christoph Lameter wrote:
> > We may need to negotiate the API a little ;-)
>
> Well, let's continue the fun then.

It is a fun little dance! It'd help if you posted your current code;
I'm trying to reason about what you're probably doing and why, and a
bit less guesswork would make it easier.

> > > Locks are held. Interrupts are disabled. No slab operations may be
> > > performed and any operation on the slab page will cause the
> > > concurrent access to block.
> > >
> > > The callback must establish a stable reference to the slab objects.
> > > Meaning generally an additional refcount is added so that any free
> > > operations will not remove the object. This is required in order to ensure
> > > that free operations will not interfere with reclaim processing.
> >
> > I don't currently have a way to do that. There is a refcount on the node,
> > but if somebody does an operation which causes the node to be removed
> > from the tree (something like splatting a huge page over the top of it),
> > we ignore the refcount and free the node. However, since it's been in
> > the tree, we pass it to RCU to free, so if you hold the RCU read lock in
> > addition to your other locks, the xarray can satisfy your requirements
> > that the object not be handed back to slab.
>
> We need a general solution here. Objects having a refcount is the common
> way to provide an existence guarantee. Holding rcu_locks in a
> function that performs slab operations or lengthy object inspection
> calling a variety of VM operations is not advisable.

Even if I had a refcount, it wouldn't solve your problem. Look at
the dcache:

	if (!(dentry->d_flags & DCACHE_RCUACCESS))
		__d_free(&dentry->d_u.d_rcu);
	else
		call_rcu(&dentry->d_u.d_rcu, __d_free);

and the inode freeing routine is much the same:

	if (inode->i_sb->s_op->destroy_inode)
		inode->i_sb->s_op->destroy_inode(inode);
	else
		call_rcu(&inode->i_rcu, i_callback);

So all three of the most important reclaimable caches free their data
using RCU. And once an object has gone onto the RCU lists, there's no
refcount that's going to avoid it being passed from RCU to the slab.
Your best bet for avoiding having somebody call kmem_cache_free() on
one of the objects in your list is to hold off RCU.
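
To make the point about refcounts concrete, here is a minimal user-space sketch. It is not kernel code: `toy_obj`, `toy_defer_free()` and `toy_grace_period_end()` are hypothetical names that only stand in for an object with a refcount, `call_rcu()`, and the end of a grace period. The assumption being illustrated is simply that the deferred-free path never consults the refcount, so a reference taken after the object was queued cannot stop the free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object: a refcount plus a link for the deferred-free list. */
struct toy_obj {
	int refcount;
	bool freed;            /* stands in for kmem_cache_free() having run */
	struct toy_obj *next;  /* link on the pending list */
};

/* Toy stand-in for the list of pending RCU callbacks. */
static struct toy_obj *pending;

/* Toy stand-in for call_rcu(): queue the object, free it later. */
static void toy_defer_free(struct toy_obj *obj)
{
	assert(!obj->freed);
	obj->next = pending;
	pending = obj;
}

/*
 * Toy stand-in for the end of a grace period: every queued object is
 * freed. Nothing here looks at refcount, which is the whole point.
 * Returns the number of objects freed.
 */
static unsigned int toy_grace_period_end(void)
{
	unsigned int n = 0;

	while (pending) {
		struct toy_obj *obj = pending;

		pending = obj->next;
		obj->next = NULL;
		obj->freed = true;
		n++;
	}
	return n;
}
```

In this model, bumping `refcount` after `toy_defer_free()` has no effect on what `toy_grace_period_end()` does; the only way to keep the object is to keep the grace period from ending.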
Of course, I now realise that taking the RCU read lock is not going
to help. Your critical section will not pre-date all callers of RCU,
so we can have a situation like this:

CPU A                   CPU B                   CPU C
read_lock
get node
                        spin_lock
                        call_rcu
                        spin_unlock
read stale data
                                                read_lock
                                                mark slab page as blocking
read_unlock
                        kmem_cache_free()
                                                read_unlock

and CPU B is going to block in softirq context. Nasty. I also don't see
how to avoid it. Unless by "block", you mean "will spin on slab_lock()",
which isn't too bad, I suppose.

> > > This is required to have a stable array of objects to work on. If the
> > > objects could be freed at any time then the objects could not be inspected
> > > for state nor could an array of pointers to the objects be passed on for
> > > future processing.
> >
> > If I can free some, but not all of the objects, is that worth doing,
> > or should I return NULL here?
>
> The objects are all objects from the same slab page. If you cannot free
> one then the whole slab page must remain. It is advantageous to not free
> objects. The slab can then be used for more allocations and filled up
> again.

OK. So how about we have the following functions:

	bool can_free(void **objects, unsigned int nr);
	void reclaim(void **objects, unsigned int nr);

I don't think the kmem_cache is actually useful to any of the callees.
And until we have a user, let's not complicate the interface with the
ability to pass a private data structure around -- again, I don't see
it being useful for dentries or inodes.

The callee can take references or whatever else is useful to mark
objects as being targeted for reclaim in 'can_free', but may not sleep,
and should not take a long time to execute (because we're potentially
delaying somebody in irq context).

In reclaim, anything goes: no locks are held by slab, and kmem_cache_alloc
can be called. When reclaim() returns, slab will evaluate the state
of the page and free it back to the page allocator if everything is
freed.
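
For illustration only, a user-space sketch of how the two calls might fit together. Everything named `toy_*` is hypothetical, and the all-or-nothing behaviour of the first phase is an assumption taken from the remark above that if one object cannot be freed the whole slab page must remain. The first phase only pins objects and does nothing that could sleep; the second phase does the real work; the caller then checks whether every object is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached object. */
struct toy_item {
	int refcount;  /* 1 = only the cache holds it; >1 = in use elsewhere */
	bool live;     /* false once the object has been released */
};

/* Phase 1 (can_free): quick, never sleeps. Pin every object, or none. */
static bool toy_can_free(void **objects, unsigned int nr)
{
	unsigned int i;

	for (i = 0; i < nr; i++) {
		struct toy_item *it = objects[i];

		if (it->refcount != 1)
			goto undo;      /* somebody else is using it */
		it->refcount++;         /* pin it for phase 2 */
	}
	return true;

undo:
	/* Drop the pins taken so far; the page stays as it was. */
	while (i--)
		((struct toy_item *)objects[i])->refcount--;
	return false;
}

/* Phase 2 (reclaim): no slab locks held; drop the pin and the cache's ref. */
static void toy_reclaim(void **objects, unsigned int nr)
{
	for (unsigned int i = 0; i < nr; i++) {
		struct toy_item *it = objects[i];

		it->refcount -= 2;
		if (it->refcount == 0)
			it->live = false;
	}
}

/* Caller side: true when every object on the page has been released. */
static bool toy_shrink_page(void **objects, unsigned int nr)
{
	assert(objects != NULL);

	if (!toy_can_free(objects, nr))
		return false;
	toy_reclaim(objects, nr);

	for (unsigned int i = 0; i < nr; i++)
		if (((struct toy_item *)objects[i])->live)
			return false;
	return true;
}
```

Neither callback takes a cache pointer or private data, matching the two prototypes proposed above.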