From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932380Ab1IAMvs (ORCPT ); Thu, 1 Sep 2011 08:51:48 -0400 Received: from mail.openrapids.net ([64.15.138.104]:58595 "EHLO blackscsi.openrapids.net" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S932078Ab1IAMvp convert rfc822-to-8bit (ORCPT ); Thu, 1 Sep 2011 08:51:45 -0400 Date: Thu, 1 Sep 2011 08:51:40 -0400 From: Mathieu Desnoyers To: Huang Ying Cc: Peter Zijlstra , Andrew Morton , "linux-kernel@vger.kernel.org" , "Paul E. McKenney" Subject: Re: [PATCH -mm 1/2] irq_work, Use llist in irq_work Message-ID: <20110901125140.GA22737@Krystal> References: <1314681384-20881-1-git-send-email-ying.huang@intel.com> <1314681384-20881-2-git-send-email-ying.huang@intel.com> <1314785405.23993.21.camel@twins> <4E5EE409.3060102@intel.com> <4E5EFA08.30205@intel.com> <1314863927.7945.11.camel@twins> <4E5F48D1.801@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: 8BIT In-Reply-To: <4E5F48D1.801@intel.com> X-Editor: vi X-Info: http://www.efficios.com X-Operating-System: Linux/2.6.26-2-686 (i686) X-Uptime: 08:14:32 up 281 days, 17:17, 3 users, load average: 0.18, 0.04, 0.01 User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Huang Ying (ying.huang@intel.com) wrote: > On 09/01/2011 03:58 PM, Peter Zijlstra wrote: > > On Thu, 2011-09-01 at 11:20 +0800, Huang Ying wrote: > >> On 09/01/2011 09:46 AM, Huang Ying wrote: > >>>>> -static void __irq_work_queue(struct irq_work *entry) > >>>>> +static void __irq_work_queue(struct irq_work *work) > >>>>> { > >>>>> - struct irq_work *next; > >>>>> + struct irq_work_list *irq_work_list; > >>>>> > >>>>> - preempt_disable(); > >>>>> + irq_work_list = &get_cpu_var(irq_work_lists); > >>>>> > >>>>> - do { > >>>>> - next = __this_cpu_read(irq_work_list); > >>>>> - /* Can assign non-atomic because we keep the flags set. */ > >>>>> - entry->next = next_flags(next, IRQ_WORK_FLAGS); > >>>>> - } while (this_cpu_cmpxchg(irq_work_list, next, entry) != next); > >>>>> + llist_add(&work->llnode, &irq_work_list->llist); > >>>>> > >>>>> /* The list was empty, raise self-interrupt to start processing. */ > >>>>> - if (!irq_work_next(entry)) > >>>>> + if (!test_and_set_bit(LIST_NONEMPTY_BIT, &irq_work_list->flags)) > >>>>> arch_irq_work_raise(); > >>>> > >>>> So why can't you simply test work->llnode->next? > >>> > >>> Yes. That is better. Even if there may be a small race window, it is > >>> not a big issue to raise one extra self interrupt seldom. > >> > >> Remember something about this. I didn't test work->llnode->next here > >> because I didn't want expose the implementation details like that here. > >> How about make llist_add() return whether list is empty before adding? > >> Because it will be an inline function, that should be optimized out if > >> the caller do not need the information. Yes, that would make sense. something like static inline struct llist_node *llist_add(struct llist_node *new, struct llist_head *head) { struct llist_node *entry, *old_entry; #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG BUG_ON(in_nmi()); #endif entry = head->first; do { old_entry = entry; new->next = entry; cpu_relax(); } while ((entry = cmpxchg(&head->first, old_entry, new)) != old_entry); return old_entry; } Where llist_add returns NULL if the list was initially empty, else returns non-null. This return value usage should be documented in the function header, of course. BUT, big warning here: this whole thing _should_ be renamed as a "lockless stack" rather than a "lockless list". Really. I said it in the past, and here I see another example showing why we should. The only reason why we can get away this easily with knowing atomically if the structure was empty prior to the insertion is because this "list" hehaves like a stack (LIFO). I foresee we'll need to add "lockless queues" at some point (FIFO), which has the ability to enqueue/dequeue concurrently without sharing the same cache-lines when the queue is non-empty. Within that kind of queue structure, knowing if the queue was empty prior to insertion would become a bottleneck, so I would not advise to make that part of _that_ API, which would require to add a new "llqueue" API. And at that point, the "llist" vs "llqueue" semantic would become really confusing. Maybe "llstack" would be more appropriate here instead of llist ? Or llfifo ? The API we can provide with a lock-less structure does not only depend on how the elements are organized, but also on the operations allowed on the structure. So the API should reflect that. So I'm saying this warning again: if we don't fix that now, I think we're heading into a lock-free structure namespacing trainwreck that will limit our ability to add other lock-free operations due vague naming that does not take the operations allowed on the structure into consideration, combined with API constraints permitted by a specific given behavior (e.g. FIFO semantic) that tend to define these lock-free APIs. I've been working on lock-free structures myself in Userspace RCU: I have lock-free stack and queue, wait-free stack, wait-free queue, and (work in progress), RCU red-black tree (not lock-free), and lock-free RCU expandable hash table. If we plan namespacing of lock-free data structure correctly, I think there will be room for very interesting performance and scalability improvements in the future. > > > > You could also use llist_empty() although that brings me to that > > ACCESS_ONCE thing in there, what's the point? > > Something as follow with llist_empty() seems not work. > > empty = llist_empty(irq_work_list); > llist_add(&work->llnode, irq_work_list); > if (empty) > arch_irq_work_raise(); > > Because irq_work IRQ handler or timer IRQ handler may be executed just > before "llist_add(&work->llnode, irq_work_list)", so that, although > "empty == false", arch_irq_work_raise() still should be executed. > > Can you tell me how to that with llist_empty()? > > > For ACCESS_ONCE, Mathiew suggest me to add it, > > Hi, Mathiew, > > Can you explain why ACCESS_ONCE should be used here? It was mainly to force the compiler to re-fetch the head->first value from memory in busy-waiting loops. So even though the right practice is to have a cpu_relax() in the body of the loop (which would force the refetch due to the compiler barrier), having the ACCESS_ONCE in llist_empty() should not hurt much. It also seems to be what atomic*.h does (volatile read), so I guess the expected behavior wrt atomic accesses are to behave as volatile, hence my recommendation to make this llist_empty a volatile access. Quoting my previous email on that topic: "Would it make sense to do: return ACCESS_ONCE(head->first) == NULL; instead ? Otherwise the compiler can choose to keep the result around in registers without re-reading (e.g. busy waiting loop)." So I was not implying it was strictly required, merely wondering whether it should be added. Best regards, Mathieu > > Best Regards, > Huang Ying -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com