Date: Tue, 19 Jul 2016 14:42:31 -0400
From: Waiman Long
To: Tejun Heo
Cc: Alexander Viro, Jan Kara, Jeff Layton, "J. Bruce Fields",
 Christoph Lameter, Ingo Molnar, Peter Zijlstra, Andi Kleen,
 Dave Chinner, Boqun Feng, Scott J Norton, Douglas Hatch
Subject: Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
Message-ID: <578E7497.30602@hpe.com>
References: <1468604383-40362-1-git-send-email-Waiman.Long@hpe.com>
 <1468604383-40362-2-git-send-email-Waiman.Long@hpe.com>
 <20160718233803.GN3078@mtj.duckdns.org>
In-Reply-To: <20160718233803.GN3078@mtj.duckdns.org>

On 07/18/2016 07:38 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Fri, Jul 15, 2016 at 01:39:40PM -0400, Waiman Long wrote:
>> Suggested-by: Tejun Heo
> Not sure I should be on suggested-by given that this wasn't my idea at
> all.

I put the tag there because of your suggestion to restructure the data
structure, which did make the patch better. I can remove the tag if you
think it is not appropriate.

>> +/*
>> + * include/linux/dlock-list.h
>> + *
>> + * A distributed (per-cpu) set of lists each of which is protected by its
>> + * own spinlock, but acts like a single consolidated list to the callers.
>> + *
>> + * The dlock_list_head_percpu structure contains the spinlock; the
>> + * dlock_list_node structures only contain a pointer to the spinlock in
>> + * dlock_list_head_percpu.
>> + */
> The more I think about it, the more bothered I am by the dlock_list
> name.  For the most part, this isn't different from other percpu data
> structures in the kernel.  Sure, it might benefit from doing Nth cpu,
> but so do other percpu data structures, and it's not just "distributed
> lock" list either.  The list itself is percpu, not just the locking.
> Can we please go back to percpu_list?  Christoph, what do you think?

As I said before, I don't mind reverting the name back to percpu_list.
I am just waiting for a final agreement.

>> +struct dlock_list_node {
>> +        struct list_head list;
>> +        spinlock_t *lockptr;
>> +};
> Wouldn't it be better to point to dlock_list_percpu?

I could. However, the only thing that matters is the spinlock that
protects the list entry.

>> +#define DLOCK_LIST_HEAD_PERCPU_INIT(name)                \
>> +        {                                                \
>> +                .list.prev = &name.list,                 \
>> +                .list.next = &name.list,                 \
> Use LIST_HEAD_INIT()?  Also, why do we even need the initializers if
> the data structure can only be dynamically allocated?  In fact, do
> the definitions even need to be exposed in the header?

I put it there for completeness' sake. You are right that the macro
isn't currently used. I will remove it in the next iteration of the
patch.

>> +                .list.lock = __SPIN_LOCK_UNLOCKED(name), \
>> +        }
>> +
>> +/*
>> + * dlock list iteration state
>> + */
>> +struct dlock_list_iter {
>> +        int                     cpu;
>                                  ^
> I'm not sure lining up with spaces here is common in the kernel.

OK, I will remove the extra spaces to conform to the usual kernel
style.
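BTW, going back to the "acts like a single consolidated list" comment
above, the usage model for a caller looks roughly like the sketch below.
The "foo" structure and functions are made up purely for illustration;
they are not part of the patch series:

#include <linux/dlock-list.h>
#include <linux/kernel.h>
#include <linux/slab.h>

struct foo {
        struct dlock_list_node node;
        int data;
};

/* alloc_dlock_list_head(&foo_list) must be called once at init time */
static struct dlock_list_head foo_list;

static int foo_add(int data)
{
        struct foo *f = kmalloc(sizeof(*f), GFP_KERNEL);

        if (!f)
                return -ENOMEM;
        f->data = data;
        init_dlock_list_node(&f->node);
        dlock_list_add(&f->node, &foo_list); /* lands on this CPU's sublist */
        return 0;
}

static void foo_walk_all(void)
{
        DEFINE_DLOCK_LIST_ITER(iter);

        /* Walks all the per-cpu sublists as if they were one list. */
        while (dlock_list_next(&foo_list, &iter)) {
                struct foo *f = container_of(iter.curr, struct foo, node);

                /* The current sublist's spinlock (iter.lock) is held here. */
                pr_info("foo data = %d\n", f->data);
        }
}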
>> +        spinlock_t              *lock;
>> +        struct list_head        *head; /* List head of current per-cpu list */
>> +        struct dlock_list_node  *curr;
>> +        struct dlock_list_node  *next;
>> +};
>> +
>> +#define DLOCK_LIST_ITER_INIT()                           \
>                            ^
> Don't we usually omit () in these cases?

Good point. Will remove the unneeded ().

>> +        {                                                \
>> +                .cpu = -1,                               \
>> +        }
>> +
>> +#define DEFINE_DLOCK_LIST_ITER(s)                        \
>> +        struct dlock_list_iter s = DLOCK_LIST_ITER_INIT()
>> +
>> +static inline void init_dlock_list_iter(struct dlock_list_iter *iter)
>> +{
>> +        *iter = (struct dlock_list_iter)DLOCK_LIST_ITER_INIT();
>> +}
>> +
>> +#define DLOCK_LIST_NODE_INIT(name)                       \
>> +        {                                                \
>> +                .list.prev = &name.list,                 \
>> +                .list.next = &name.list,                 \
>                   ^
> LIST_HEAD_INIT()?

Will make the change.

>> +        }
>> +
>> +static inline void init_dlock_list_node(struct dlock_list_node *node)
>> +{
>> +        INIT_LIST_HEAD(&node->list);
>> +        node->lockptr = NULL;
> Why not use DLOCK_LIST_NODE_INIT()?

Yes, I can make the change.

>> +}
>> +
>> +/*
>> + * Check if all the dlock lists are empty
>> + *
>> + * This can be a pretty expensive function call. If this function is required
>> + * in a performance critical path, we may have to maintain a global count
>> + * of the list entries in the global dlock_list_head structure instead.
>> + */
> /** function comment please.

Sure.

>> +static inline bool dlock_list_empty(struct dlock_list_head *dlist)
>> +{
>> +        int cpu;
>> +
>> +        for_each_possible_cpu(cpu)
>> +                if (!list_empty(&per_cpu_ptr(dlist->head, cpu)->list))
>> +                        return false;
>> +        return true;
>> +}
>> +
>> +/*
>> + * Allocation and freeing of dlock list
>> + */
>> +extern int alloc_dlock_list_head(struct dlock_list_head *dlist);
>           ^
> ditto with alignment

Will remove the extra space.

>> +extern void free_dlock_list_head(struct dlock_list_head *dlist);
>> +
>> +/*
>> + * The dlock list iteration functions which return true if iteration has
>> + * to be continued.
>> + */
>> +extern bool dlock_list_next(struct dlock_list_head *dlist,
>> +                            struct dlock_list_iter *iter);
>> +extern bool dlock_list_next_safe(struct dlock_list_head *dlist,
>> +                                 struct dlock_list_iter *iter);
> Why not return dlock_list_node * for the current node?  That'd be more
> conventional and would allow dlock_list_iter to be opaque.

Yes, I can make it return dlock_list_node *. However, to make
dlock_list_iter opaque, I would have to dynamically allocate the
structure. That would add extra memory allocation and free calls, as
well as handling of the error case of running out of memory. I don't
think that is worth doing at this point.

>> diff --git a/lib/dlock-list.c b/lib/dlock-list.c
>> new file mode 100644
>> index 0000000..af4a9f3
>> --- /dev/null
>> +++ b/lib/dlock-list.c
> ...
>> +int alloc_dlock_list_head(struct dlock_list_head *dlist)
>> +{
>> +        struct dlock_list_head dlist_tmp;
>> +        int cpu;
>> +
>> +        dlist_tmp.head = alloc_percpu(struct dlock_list_head_percpu);
>> +        if (!dlist_tmp.head)
>> +                return -ENOMEM;
>> +
>> +        for_each_possible_cpu(cpu) {
>> +                struct dlock_list_head_percpu *head;
>> +
>> +                head = per_cpu_ptr(dlist_tmp.head, cpu);
>> +                INIT_LIST_HEAD(&head->list);
>> +                head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
>> +                lockdep_set_class(&head->lock, &dlock_list_key);
>> +        }
>> +
>> +        dlist->head = dlist_tmp.head;
> Just use dlist->head directly or use a local __percpu head pointer?

I just don't want to expose the structure to the world until it is
fully initialized. If you think I am being over-cautious, I can use
dlist->head as suggested.
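Actually, the local __percpu pointer variant would keep the structure
private until it is fully initialized while also dropping the on-stack
temporary, so it serves both purposes. Roughly this (a sketch derived
from the code quoted above, assuming the existing dlock_list_key lock
class):

int alloc_dlock_list_head(struct dlock_list_head *dlist)
{
        struct dlock_list_head_percpu __percpu *pcpu_head;
        int cpu;

        pcpu_head = alloc_percpu(struct dlock_list_head_percpu);
        if (!pcpu_head)
                return -ENOMEM;

        for_each_possible_cpu(cpu) {
                struct dlock_list_head_percpu *head;

                head = per_cpu_ptr(pcpu_head, cpu);
                INIT_LIST_HEAD(&head->list);
                head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
                lockdep_set_class(&head->lock, &dlock_list_key);
        }

        /* Publish only after every per-cpu head is initialized. */
        dlist->head = pcpu_head;
        return 0;
}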
>> +        return 0;
>> +}
>> +EXPORT_SYMBOL(alloc_dlock_list_head);
> Does this actually need to be exported?  If so, it might be a better
> idea to start with EXPORT_SYMBOL_GPL().

For the current use case, we probably don't need to export the symbols.
Other use cases may require that. I will change it to use the
EXPORT_SYMBOL_GPL() version instead.

>> +void dlock_list_add(struct dlock_list_node *node,
>> +                    struct dlock_list_head *dlist)
>> +{
>> +        struct dlock_list_head_percpu *head;
>                                        ^
>
> This probably requires __percpu annotation.  Have you run it through
> sparse and checked for address space warnings?

You are right. I probably missed the __percpu annotation. I haven't
run it through sparse. I will do that and fix any warnings found.

>> +
>> +        /*
>> +         * Disable preemption to make sure that the CPU won't get changed.
>> +         */
>> +        head = get_cpu_ptr(dlist->head);
>> +        spin_lock(&head->lock);
>> +        node->lockptr = &head->lock;
>> +        list_add(&node->list, &head->list);
>> +        spin_unlock(&head->lock);
>> +        put_cpu_ptr(dlist->head);
>> +}
>> +EXPORT_SYMBOL(dlock_list_add);
>> +
>> +/**
>> + * dlock_list_del - Delete a node from a dlock list
>> + * @node : The node to be deleted
>> + *
>> + * We need to check the lock pointer again after taking the lock to guard
>> + * against concurrent deletion of the same node. If the lock pointer changes
>> + * (becomes NULL or to a different one), we assume that the deletion was done
>> + * elsewhere. A warning will be printed if this happens as it is likely to be
>> + * a bug.
>> + */
>> +void dlock_list_del(struct dlock_list_node *node)
>> +{
>> +        spinlock_t *lock = READ_ONCE(node->lockptr);
>> +
>> +        if (unlikely(!lock)) {
>> +                WARN_ONCE(1,
>> +                        "dlock_list_del: node 0x%lx has no associated lock\n",
>> +                        (unsigned long)node);
> Maybe "if (WARN_ONCE(!lock...)"?  WARN_ONCE implies unlikely.

OK, will do that.

>> +                return;
>> +        }
>> +
>> +        spin_lock(lock);
>> +        if (likely(lock == node->lockptr)) {
>> +                list_del_init(&node->list);
>> +                node->lockptr = NULL;
>> +        } else {
>> +                /*
>> +                 * This path should never be executed.
>> +                 */
>> +                WARN_ON_ONCE(1);
>> +        }
> This still kinda bothers me because this pretty much requires the
> users to have strong synchronization around the operations and makes
> it unusable in situations where opportunistic behaviors are
> acceptable.  It negates the usefulness quite a bit.

I understand your concern. I will make it retry with the new lock.

>> +        spin_unlock(lock);
>> +}
>> +EXPORT_SYMBOL(dlock_list_del);
>> +
>> +/*
>> + * Helper function to find the first entry of the next per-cpu list.
>> + * It works somewhat like for_each_possible_cpu(cpu).
>> + *
>> + * Return: true if an entry is found, false if all the lists are exhausted
>> + */
>> +static inline bool dlock_list_next_cpu(struct dlock_list_head *dlist,
>                ^
> Just let the compiler figure it out.

Sure. Will remove the inline tag.
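Concretely, the retry version of dlock_list_del() I have in mind looks
something like the sketch below. Whether a lockptr that turns NULL
should still warn, or be treated as a benign lost race, is still to be
decided:

void dlock_list_del(struct dlock_list_node *node)
{
        spinlock_t *lock;

        for (;;) {
                lock = READ_ONCE(node->lockptr);
                if (WARN_ONCE(!lock,
                    "dlock_list_del: node 0x%lx has no associated lock\n",
                    (unsigned long)node))
                        return;

                spin_lock(lock);
                if (likely(lock == node->lockptr)) {
                        list_del_init(&node->list);
                        node->lockptr = NULL;
                        spin_unlock(lock);
                        return;
                }
                /*
                 * The node was moved to another per-cpu list while we
                 * were spinning; unlock and retry with the current
                 * value of lockptr.
                 */
                spin_unlock(lock);
        }
}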
>> +                                       struct dlock_list_iter *iter)
>> +{
>> +        if (iter->lock)
>> +                spin_unlock(iter->lock);
>> +next_cpu:
>> +        /*
>> +         * for_each_possible_cpu(cpu)
>> +         */
>> +        iter->cpu = cpumask_next(iter->cpu, cpu_possible_mask);
>> +        if (iter->cpu >= nr_cpu_ids)
>> +                return false;   /* All the per-cpu lists iterated */
>> +
>> +        iter->head = &per_cpu_ptr(dlist->head, iter->cpu)->list;
>> +        if (list_empty(iter->head))
>> +                goto next_cpu;
>> +
>> +        iter->lock = &per_cpu_ptr(dlist->head, iter->cpu)->lock;
>> +        spin_lock(iter->lock);
>> +        /*
>> +         * There is a slight chance that the list may become empty just
>> +         * before the lock is acquired. So an additional check is needed
>> +         * to make sure that iter->curr points to a valid entry.
>> +         */
>> +        if (list_empty(iter->head)) {
>> +                spin_unlock(iter->lock);
>> +                goto next_cpu;
>> +        }
>> +        iter->curr = list_entry(iter->head->next,
>> +                                struct dlock_list_node, list);
>> +        return true;
>> +}
> ...
>> +/**
>> + * dlock_list_next_safe - Removal-safe iterator of dlock list
>> + * @dlist: Pointer to the dlock_list_head structure
>> + * @iter : Pointer to the dlock list iterator structure
>> + * Return: true if the next entry is found, false if all the entries
>> + *         have been iterated
>> + *
>> + * The iterator has to be properly initialized before calling this function.
>> + * This iteration function is safe with respect to list entry removal.
>> + * However, it cannot correctly iterate newly added entries right after the
>> + * current one.
>> + */
> This still looks wrong to me.  If you want to provide the two variants
> of iterations, can't you just implement one next function and build
> the two types of iterations on top of it?

I have been thinking about making dlock_list_next_cpu() the real
external function and having two inline functions that implement
dlock_list_next() and dlock_list_next_safe() on top of it. That may
strike a better balance between performance and code abstraction. I
will do so if you have no objection.

Cheers,
Longman
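P.S. For concreteness, the layering would look roughly like the sketch
below. __dlock_list_next() and its "safe" flag are made-up names for
illustration, and the final form may change if we also switch to
returning dlock_list_node *:

/* lib/dlock-list.c: the one real (exported) iteration engine. */
extern bool dlock_list_next_cpu(struct dlock_list_head *dlist,
                                struct dlock_list_iter *iter);

/* include/linux/dlock-list.h */
static __always_inline bool __dlock_list_next(struct dlock_list_head *dlist,
                                              struct dlock_list_iter *iter,
                                              bool safe)
{
        if (iter->lock) {
                /*
                 * Step within the current per-cpu list: to the entry
                 * saved on the previous iteration (safe variant) or to
                 * the successor of the current entry.
                 */
                struct list_head *n = safe ? &iter->next->list
                                           : iter->curr->list.next;

                if (n != iter->head) {
                        iter->curr = list_entry(n, struct dlock_list_node,
                                                list);
                        goto found;
                }
        }
        /* First call, or the current per-cpu list is exhausted. */
        if (!dlock_list_next_cpu(dlist, iter))
                return false;
found:
        if (safe)
                iter->next = list_entry(iter->curr->list.next,
                                        struct dlock_list_node, list);
        return true;
}

static inline bool dlock_list_next(struct dlock_list_head *dlist,
                                   struct dlock_list_iter *iter)
{
        return __dlock_list_next(dlist, iter, false);
}

static inline bool dlock_list_next_safe(struct dlock_list_head *dlist,
                                        struct dlock_list_iter *iter)
{
        return __dlock_list_next(dlist, iter, true);
}

Since the "safe" flag is compile-time constant at every call site, the
branch should fold away and the non-safe variant would not pay for the
iter->next bookkeeping.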