From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Paul Mackerras <paulus@samba.org>,
David Gibson <david@gibson.dropbear.id.au>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
Date: Fri, 6 Nov 2015 13:17:17 +1100
Message-ID: <563C0DAD.8070100@ozlabs.ru>
In-Reply-To: <20151103093913.346374e2@gandalf.local.home>

On 11/04/2015 01:39 AM, Steven Rostedt wrote:
> On Tue, 3 Nov 2015 17:57:05 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
>> This defines list_for_each_entry_lockless. This allows safe list
>> traversing in cases when lockdep() invocation is unwanted like
>> real mode (MMU is off).
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> This is for VFIO acceleration in POWERKVM for pSeries guests.
>> There is a KVM instance. There also can be some VFIO (PCI passthru)
>> devices attached to a KVM guest.
>>
>> To perform DMA, a pSeries guest registers DMA memory by calling some
>> hypercalls explicitly at a rate close to one or two hcalls per
>> network packet, i.e. very often. When a guest does a hypercall
>> (which is just an assembly instruction), the host kernel receives it
>> in the real mode (MMU is off). When real mode fails to handle it,
>> it enables MMU and tries handling a hcall in virtual mode.
>>
>> A logical bus ID (LIOBN) is a target ID for these hypercalls.
>>
>> Each VFIO device belongs to an IOMMU group. Each group has an address
>> translation table. It is allowed to have multiple IOMMU groups (i.e.
>> multiple tables) under the same LIOBN.
>>
>> So effectively every DMA hcall has to update one or more TCE tables
>> attached to the same LIOBN. RCU is used to update/traverse this list
>> safely.
>>
>> Using RCU as is in virtual mode is fine. Lockdep works, etc.
>> list_add_rcu() is used to populate the list;
>> list_del_rcu() + call_rcu() are used to remove groups from a list.
>> These operations can happen at runtime as a result of PCI hotplug/unplug
>> in guests.
>>
>> Using RCU as is in real mode is not fine as some RCU checks can lock up
>> the system and in real mode we won't even have a chance to see any
>> debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used.
>>
>> The previous version of this defined list_for_each_entry_rcu_notrace()
>> but it was proposed to use list_entry_lockless() instead. However,
>> the comment for lockless_dereference() suggests this is a good idea
>> if "lifetime is managed by something other than RCU", whereas in my
>> case the lifetime is managed by RCU.
>>
>> So what would be the correct approach here? Thanks.
>
> If the only use case for this so far is in POWERKVM, perhaps it should
> be defined specifically (and in arch/powerpc) and not confuse others
> about using this.
>
> Or, if you do imagine that this can be used in other scenarios, then a
> much deeper comment must be made in the code in the kerneldoc section.
> list_for_each_entry_rcu() should really be used in 99.99% of the time
> in the kernel. This looks to be an extreme exception. I hate to add a
> generic helper for something that will only be used in one location.

No, I cannot really imagine other users, so I can move it somewhere in
arch/powerpc.

Still, is my approach correct? What does the comment for
lockless_dereference() actually mean: that it will not work together with
RCU at all, or is it there to discourage people from using it because
"list_for_each_entry_rcu() should really be used in 99.99% of the time"? :)

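To make the question concrete, here is a rough userspace sketch of the
traversal I have in mind. It is only an illustration under assumptions:
struct tce_table_entry, list_add_tail_publish() and count_tables() are
made-up names, the list types are stand-ins for the kernel ones, and
lockless_dereference() is approximated with an acquire load through the
GCC/Clang __atomic builtins rather than the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Stand-in for lockless_dereference(): the kernel version is a
 * dependency-ordered load; an acquire load is used here as a
 * conservative approximation.
 */
#define lockless_dereference(p) __atomic_load_n(&(p), __ATOMIC_ACQUIRE)

#define list_entry_lockless(ptr, type, member) \
	container_of((__typeof__(ptr))lockless_dereference(ptr), type, member)

#define list_for_each_entry_lockless(pos, head, member) \
	for (pos = list_entry_lockless((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head); \
	     pos = list_entry_lockless(pos->member.next, __typeof__(*pos), member))

/* Hypothetical per-LIOBN table entry, for illustration only. */
struct tce_table_entry {
	unsigned long liobn;
	struct list_head node;
};

/*
 * Hypothetical writer-side helper: initialise the new node first, then
 * publish it with a release store so a concurrent reader never follows
 * a pointer to a half-initialised entry.
 */
static void list_add_tail_publish(struct list_head *entry,
				  struct list_head *head)
{
	struct list_head *last = head->prev;

	entry->next = head;
	entry->prev = last;
	__atomic_store_n(&last->next, entry, __ATOMIC_RELEASE);
	head->prev = entry;
}

/* Reader side: no rcu_read_lock(), no lockdep, only ordered loads. */
static int count_tables(struct list_head *head, unsigned long liobn)
{
	struct tce_table_entry *e;
	int n = 0;

	assert(head != NULL);
	list_for_each_entry_lockless(e, head, node)
		if (e->liobn == liobn)
			n++;
	return n;
}
```

The sketch only covers ordering on the reader side. It says nothing about
when an entry may be freed, which in my case would still be deferred via
call_rcu() after list_del_rcu().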
>
> -- Steve
>
>
>> ---
>> include/linux/rculist.h | 16 ++++++++++++++++
>> 1 file changed, 16 insertions(+)
>>
>> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
>> index 17c6b1f..a83a924 100644
>> --- a/include/linux/rculist.h
>> +++ b/include/linux/rculist.h
>> @@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list,
>> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>>
>> /**
>> + * list_for_each_entry_lockless - iterate over rcu list of given type
>> + * @pos: the type * to use as a loop cursor.
>> + * @head: the head for your list.
>> + * @member: the name of the list_struct within the struct.
>> + *
>> + * This list-traversal primitive may safely run concurrently
>> + */
>> +#define list_entry_lockless(ptr, type, member) \
>> + container_of((typeof(ptr))lockless_dereference(ptr), type, member)
>> +
>> +#define list_for_each_entry_lockless(pos, head, member) \
>> + for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \
>> + &pos->member != (head); \
>> + pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
>> +
>> +/**
>> * list_for_each_entry_continue_rcu - continue iteration over list of given type
>> * @pos: the type * to use as a loop cursor.
>> * @head: the head for your list.
>
--
Alexey