public inbox for linux-kernel@vger.kernel.org
From: Huang Ying <ying.huang@intel.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH -mm 1/2] irq_work, Use llist in irq_work
Date: Fri, 02 Sep 2011 09:08:52 +0800	[thread overview]
Message-ID: <4E602CA4.4000703@intel.com> (raw)
In-Reply-To: <20110901125140.GA22737@Krystal>

On 09/01/2011 08:51 PM, Mathieu Desnoyers wrote:
> * Huang Ying (ying.huang@intel.com) wrote:
>> On 09/01/2011 03:58 PM, Peter Zijlstra wrote:
>>> On Thu, 2011-09-01 at 11:20 +0800, Huang Ying wrote:
>>>> On 09/01/2011 09:46 AM, Huang Ying wrote:
>>>>>>> -static void __irq_work_queue(struct irq_work *entry)
>>>>>>> +static void __irq_work_queue(struct irq_work *work)
>>>>>>>  {
>>>>>>> -	struct irq_work *next;
>>>>>>> +	struct irq_work_list *irq_work_list;
>>>>>>>  
>>>>>>> -	preempt_disable();
>>>>>>> +	irq_work_list = &get_cpu_var(irq_work_lists);
>>>>>>>  
>>>>>>> -	do {
>>>>>>> -		next = __this_cpu_read(irq_work_list);
>>>>>>> -		/* Can assign non-atomic because we keep the flags set. */
>>>>>>> -		entry->next = next_flags(next, IRQ_WORK_FLAGS);
>>>>>>> -	} while (this_cpu_cmpxchg(irq_work_list, next, entry) != next);
>>>>>>> +	llist_add(&work->llnode, &irq_work_list->llist);
>>>>>>>  
>>>>>>>  	/* The list was empty, raise self-interrupt to start processing. */
>>>>>>> -	if (!irq_work_next(entry))
>>>>>>> +	if (!test_and_set_bit(LIST_NONEMPTY_BIT, &irq_work_list->flags))
>>>>>>>  		arch_irq_work_raise();
>>>>>>
>>>>>> So why can't you simply test work->llnode->next?
>>>>>
>>>>> Yes.  That is better.  Even if there is a small race window,
>>>>> occasionally raising one extra self-interrupt is not a big issue.
>>>>
>>>> Remember something about this.  I didn't test work->llnode->next here
>>>> because I didn't want to expose implementation details like that here.
>>>> How about making llist_add() return whether the list was empty before
>>>> adding?  Because it will be an inline function, that should be
>>>> optimized out if the caller does not need the information.
> 
> Yes, that would make sense.
> 
> something like
> 
> static inline
> struct llist_node *llist_add(struct llist_node *new, struct llist_head *head)
> {
> 	struct llist_node *entry, *old_entry;
> 
> #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> 	BUG_ON(in_nmi());
> #endif
> 
> 	entry = head->first;
> 	do {
> 		old_entry = entry;
> 		new->next = entry;
> 		cpu_relax();
> 	} while ((entry = cmpxchg(&head->first, old_entry, new)) != old_entry);
>         return old_entry;
> }
> 
> Where llist_add returns NULL if the list was initially empty, else
> returns non-null. This return value usage should be documented in the
> function header, of course.
> 
> BUT, big warning here: this whole thing _should_ be renamed as a
> "lockless stack" rather than a "lockless list". Really. I said it in the
> past, and here I see another example showing why we should. The only
> reason why we can get away this easily with knowing atomically if the
> structure was empty prior to the insertion is because this "list"
> behaves like a stack (LIFO). I foresee we'll need to add "lockless
> queues" at some point (FIFO), which has the ability to enqueue/dequeue
> concurrently without sharing the same cache-lines when the queue is
> non-empty. Within that kind of queue structure, knowing if the queue was
> empty prior to insertion would become a bottleneck, so I would not
> advise to make that part of _that_ API, which would require to add a new
> "llqueue" API. And at that point, the "llist" vs "llqueue" semantic
> would become really confusing. Maybe "llstack" would be more appropriate
> here instead of llist ? Or llfifo ? The API we can provide with a
> lock-less structure does not only depend on how the elements are
> organized, but also on the operations allowed on the structure. So the
> API should reflect that. So I'm saying this warning again: if we don't
> fix that now, I think we're heading into a lock-free structure
> namespacing trainwreck that will limit our ability to add other
> lock-free operations due to vague naming that does not take the operations
> allowed on the structure into consideration, combined with API
> constraints permitted by a specific given behavior (e.g. FIFO semantic)
> that tend to define these lock-free APIs.

I remember our previous consensus was to keep the llist API and possibly
change its implementation in the future.  But it seems that defining a
really general llist API is too hard.  Fortunately, because llist is only
used inside the kernel (unlike a user-space library, which is used by
various applications), we can rename it in the future if needed.

> I've been working on lock-free structures myself in Userspace RCU: I
> have lock-free stack and queue, wait-free stack, wait-free queue, and
> (work in progress), RCU red-black tree (not lock-free), and lock-free
> RCU expandable hash table. If we plan namespacing of lock-free data
> structure correctly, I think there will be room for very interesting
> performance and scalability improvements in the future.

That is interesting.  But we need to find users in the Linux kernel first.

>>>
>>> You could also use llist_empty() although that brings me to that
>>> ACCESS_ONCE thing in there, what's the point?
>>
>> Something like the following with llist_empty() does not seem to work:
>>
>>   empty = llist_empty(irq_work_list);
>>   llist_add(&work->llnode, irq_work_list);
>>   if (empty)
>>           arch_irq_work_raise();
>>
>> The irq_work IRQ handler or timer IRQ handler may be executed just
>> before "llist_add(&work->llnode, irq_work_list)", so even though
>> "empty == false", arch_irq_work_raise() should still be executed.
>>
>> Can you tell me how to do that with llist_empty()?
>>
>>
>> For ACCESS_ONCE, Mathieu suggested that I add it.
>>
>> Hi, Mathieu,
>>
>> Can you explain why ACCESS_ONCE should be used here?
> 
> It was mainly to force the compiler to re-fetch the head->first value
> from memory in busy-waiting loops. So even though the right practice is
> to have a cpu_relax() in the body of the loop (which would force the
> refetch due to the compiler barrier), having the ACCESS_ONCE in
> llist_empty() should not hurt much. It also seems to be what atomic*.h
> does (volatile read), so I guess the expected behavior wrt atomic
> accesses is to behave as volatile, hence my recommendation to make this
> llist_empty a volatile access. Quoting my previous email on that topic:
> 
> "Would it make sense to do:
> 
>  return ACCESS_ONCE(head->first) == NULL;
> 
> instead ? Otherwise the compiler can choose to keep the result around in
> registers without re-reading (e.g. busy waiting loop)."
> 
> So I was not implying it was strictly required, merely wondering whether
> it should be added.

Thanks for your explanation.  I should have referred to our previous
email first.  I think your point is reasonable.

Best Regards,
Huang Ying

Thread overview: 20+ messages
2011-08-30  5:16 [PATCH -mm 0/2] Use llist in irq_work and xlist Huang Ying
2011-08-30  5:16 ` [PATCH -mm 1/2] irq_work, Use llist in irq_work Huang Ying
2011-08-31 10:10   ` Peter Zijlstra
2011-09-01  1:46     ` Huang Ying
2011-09-01  3:20       ` Huang Ying
2011-09-01  7:58         ` Peter Zijlstra
2011-09-01  8:56           ` Huang Ying
2011-09-01  9:57             ` Peter Zijlstra
2011-09-02  1:14               ` Huang Ying
2011-09-03 17:35                 ` Mathieu Desnoyers
2011-09-01 12:51             ` Mathieu Desnoyers
2011-09-01 13:00               ` Mathieu Desnoyers
2011-09-02  1:08               ` Huang Ying [this message]
2011-09-03 16:33                 ` Mathieu Desnoyers
2011-09-01  7:57       ` Peter Zijlstra
2011-09-01  8:44         ` Huang Ying
2011-09-01 10:00           ` Peter Zijlstra
2011-09-02  1:18             ` Huang Ying
2011-09-02 13:26               ` Peter Zijlstra
2011-08-30  5:16 ` [PATCH -mm 2/2] net, rds, Replace xlist in net/rds/xlist.h with llist Huang Ying
