Re: [PATCH v2 1/7] lib/dlock-list: Distributed and lock-protected lists

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Tejun Heo <tj@kernel.org>
To: Waiman Long <Waiman.Long@hpe.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Jan Kara <jack@suse.com>, Jeff Layton <jlayton@poochiereds.net>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andi Kleen <andi@firstfloor.org>,
	Dave Chinner <dchinner@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Scott J Norton <scott.norton@hpe.com>,
	Douglas Hatch <doug.hatch@hpe.com>
Subject: Re: [PATCH v2 1/7] lib/dlock-list: Distributed and lock-protected lists
Date: Wed, 13 Jul 2016 12:08:23 -0400	[thread overview]
Message-ID: <20160713160823.GD4065@mtj.duckdns.org> (raw)
In-Reply-To: <1468258332-61537-2-git-send-email-Waiman.Long@hpe.com>

Hello,

On Mon, Jul 11, 2016 at 01:32:06PM -0400, Waiman Long wrote:
...
> A new header file include/linux/dlock-list.h will be added with the

Heh, I think perpcu_list was the better name but suppose I'm too late.

> associated dlock_list_head and dlock_list_node structures. The following
> functions are provided to manage the per-cpu list:
> 
>  1. int init_dlock_list_head(struct dlock_list_head **pdlock_head)
>  2. void dlock_list_add(struct dlock_list_node *node,
> 		        struct dlock_list_head *head)
>  3. void dlock_list_del(struct dlock_list *node)
> 
> Iteration of all the list entries within a group of per-cpu
> lists is done by calling either the dlock_list_iterate() or
> dlock_list_iterate_safe() functions in a while loop. They correspond
> to the list_for_each_entry() and list_for_each_entry_safe() macros
> respectively. The iteration states are keep in a dlock_list_state
> structure that is passed to the iteration functions.

Why do we need two variants of this?  We need a state variable to walk
the list anyway.  Why not make dlock_list_iterate() safe against
removal and get rid of the second variant?  Also, dlock_list_next()
probably is a better name.

> +/*
> + * include/linux/dlock-list.h
> + *
> + * A distributed (per-cpu) set of lists each of which is protected by its
> + * own spinlock, but acts like a single consolidated list to the callers.
> + *
> + * The dlock_list_head structure contains the spinlock, the other
> + * dlock_list_node structures only contains a pointer to the spinlock in
> + * dlock_list_head.
> + */
> +struct dlock_list_head {
> +	struct list_head list;
> +	spinlock_t lock;
> +};
> +
> +#define DLOCK_LIST_HEAD_INIT(name)				\
> +	{							\
> +		.list.prev = &name.list,			\
> +		.list.next = &name.list,			\
> +		.list.lock = __SPIN_LOCK_UNLOCKED(name),	\
> +	}

This is confusing.  It looks like dlock_list_head and
DLOCK_LIST_HEAD_INIT() can be used to define and initialize static
dlock_lists but that isn't true.  It's weird to require the user to
deal with percpu declaration of the data type.  Shouldn't it be more
like the following?

struct dlock_list_head_cpu {
	struct list_head list;
	spinlock_t lock;
};

struct dlock_list_head {
	struct dlock_list_head_percpu *head_cpu;
};

> +/*
> + * Per-cpu list iteration state
> + */
> +struct dlock_list_state {
> +	int			 cpu;
> +	spinlock_t		*lock;
> +	struct list_head	*head;	/* List head of current per-cpu list */
> +	struct dlock_list_node	*curr;
> +	struct dlock_list_node	*next;
> +};

Maybe dlock_list_iter[ator] is a better name?

> +#define DLOCK_LIST_STATE_INIT()			\
> +	{					\
> +		.cpu  = -1,			\
> +		.lock = NULL,			\
> +		.head = NULL,			\
> +		.curr = NULL,			\
> +		.next = NULL,			\
> +	}

The NULL inits are unnecessary and prone to being left behind.

> +#define DEFINE_DLOCK_LIST_STATE(s)		\
> +	struct dlock_list_state s = DLOCK_LIST_STATE_INIT()
> +
> +static inline void init_dlock_list_state(struct dlock_list_state *state)
> +{
> +	state->cpu  = -1;
> +	state->lock = NULL;
> +	state->head = NULL;
> +	state->curr = NULL;
> +	state->next = NULL;
> +}

Why not "state = (struct dlock_list_state)DLOCK_LIST_STATE_INIT;"?

> +#ifdef CONFIG_DEBUG_SPINLOCK
> +#define DLOCK_LIST_WARN_ON(x)	WARN_ON(x)
> +#else
> +#define DLOCK_LIST_WARN_ON(x)
> +#endif

I'd just use WARN_ON_ONCE() without the CONFIG guard.

> +/*
> + * Next per-cpu list entry
> + */
> +#define dlock_list_next_entry(pos, member) list_next_entry(pos, member.list)

Why does this need to be exposed?

> +/*
> + * Per-cpu node data structure
> + */
> +struct dlock_list_node {
> +	struct list_head list;
> +	spinlock_t *lockptr;
> +};
> +
> +#define DLOCK_LIST_NODE_INIT(name)		\
> +	{					\
> +		.list.prev = &name.list,	\
> +		.list.next = &name.list,	\
> +		.list.lockptr = NULL		\
> +	}

Ditto with NULL init.

> +static inline void init_dlock_list_node(struct dlock_list_node *node)
> +{
> +	INIT_LIST_HEAD(&node->list);
> +	node->lockptr = NULL;
> +}

Ditto with init.

> +static inline void free_dlock_list_head(struct dlock_list_head **pdlock_head)
> +{
> +	free_percpu(*pdlock_head);
> +	*pdlock_head = NULL;
> +}

Why does this need to be inlined?

> +/*
> + * Check if all the per-cpu lists are empty
> + */

Please use proper function comments.

> +static inline bool dlock_list_empty(struct dlock_list_head *dlock_head)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu)
> +		if (!list_empty(&per_cpu_ptr(dlock_head, cpu)->list))
> +			return false;
> +	return true;
> +}
> +
> +/*
> + * Helper function to find the first entry of the next per-cpu list
> + * It works somewhat like for_each_possible_cpu(cpu).
> + *
> + * Return: true if the entry is found, false if all the lists exhausted

Ditto about the comment.

> + */
> +static __always_inline bool
> +__dlock_list_next_cpu(struct dlock_list_head *head,
> +		      struct dlock_list_state *state)
> +{
...
> +static inline bool dlock_list_iterate_safe(struct dlock_list_head *head,
> +					   struct dlock_list_state *state)
> +{

Inlining these doesn't make senes to me.

> diff --git a/lib/Makefile b/lib/Makefile
> index 499fb35..92e8c38 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> +/*
> + * The dlock list lock needs its own class to avoid warning and stack
> + * trace when lockdep is enabled.
> + */

Can you please elaborate on this?

> +static struct lock_class_key dlock_list_key;
> +
> +/*
> + * Initialize the per-cpu list head
> + */
> +int init_dlock_list_head(struct dlock_list_head **pdlock_head)
> +{
> +	struct dlock_list_head *dlock_head;
> +	int cpu;
> +
> +	dlock_head = alloc_percpu(struct dlock_list_head);
> +	if (!dlock_head)
> +		return -ENOMEM;
> +
> +	for_each_possible_cpu(cpu) {
> +		struct dlock_list_head *head = per_cpu_ptr(dlock_head, cpu);
> +
> +		INIT_LIST_HEAD(&head->list);
> +		head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
> +		lockdep_set_class(&head->lock, &dlock_list_key);
> +	}
> +
> +	*pdlock_head = dlock_head;
> +	return 0;
> +}

Why is this called init?  Why not do the following?

struct dlock_list_head *alloc_dlock_list_head(void);

Also, the pointer type needs to include __percpu annotation.

> +/*
> + * List selection is based on the CPU being used when the dlock_list_add()
> + * function is called. However, deletion may be done by a different CPU.
> + * So we still need to use a lock to protect the content of the list.
> + */
> +void dlock_list_add(struct dlock_list_node *node, struct dlock_list_head *head)
> +{
> +	struct dlock_list_head *myhead;
> +
> +	/*
> +	 * Disable preemption to make sure that CPU won't gets changed.
> +	 */
> +	myhead = get_cpu_ptr(head);
> +	spin_lock(&myhead->lock);
> +	node->lockptr = &myhead->lock;
> +	list_add(&node->list, &myhead->list);
> +	spin_unlock(&myhead->lock);
> +	put_cpu_ptr(head);
> +}

I wonder whether it'd be better to use irqsafe operations.  lists tend
to be often used from irq contexts.

> +/*
> + * Delete a node from a dlock list
> + *
> + * We need to check the lock pointer again after taking the lock to guard
> + * against concurrent delete of the same node. If the lock pointer changes
> + * (becomes NULL or to a different one), we assume that the deletion was done
> + * elsewhere.
> + */
> +void dlock_list_del(struct dlock_list_node *node)
> +{
> +	spinlock_t *lock = READ_ONCE(node->lockptr);
> +
> +	if (unlikely(!lock)) {
> +		WARN(1, "dlock_list_del: node 0x%lx has no associated lock\n",
> +			(unsigned long)node);
> +		return;
> +	}
> +
> +	spin_lock(lock);
> +	if (likely(lock == node->lockptr)) {
> +		list_del_init(&node->list);
> +		node->lockptr = NULL;
> +	} else {
> +		/*
> +		 * This path should never be executed.
> +		 */

What if it races against someone else removing and adding back?
Shouldn't it retry on those cases?

Thanks.

-- 
tejun

next prev parent reply	other threads:[~2016-07-13 16:08 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-11 17:32 [PATCH v2 0/7] vfs: Use dlock list for SB's s_inodes list Waiman Long
2016-07-11 17:32 ` [PATCH v2 1/7] lib/dlock-list: Distributed and lock-protected lists Waiman Long
2016-07-13 16:08   ` Tejun Heo [this message]
2016-07-14  2:54     ` Waiman Long
2016-07-14 11:50       ` Tejun Heo
2016-07-14 14:35         ` Jan Kara
2016-07-14 14:51           ` Tejun Heo
2016-07-14 16:16             ` Waiman Long
2016-07-14 17:13         ` Waiman Long
2016-07-14 17:41           ` Tejun Heo
2016-07-11 17:32 ` [PATCH v2 2/7] lib/dlock-list: Add __percpu modifier for parameters Waiman Long
2016-07-13 16:08   ` Tejun Heo
2016-07-14  2:54     ` Waiman Long
2016-07-11 17:32 ` [PATCH v2 3/7] fsnotify: Simplify inode iteration on umount Waiman Long
2016-07-11 17:32 ` [PATCH v2 4/7] vfs: Remove unnecessary list_for_each_entry_safe() variants Waiman Long
2016-07-11 17:32 ` [PATCH v2 5/7] vfs: Use dlock list for superblock's inode list Waiman Long
2016-07-11 17:32 ` [RFC PATCH v2 6/7] lib/persubnode: Introducing a simple per-subnode APIs Waiman Long
2016-07-12  3:14   ` Boqun Feng
2016-07-12 18:37     ` Waiman Long
2016-07-12 14:27   ` Tejun Heo
2016-07-12 18:51     ` Waiman Long
2016-07-12 18:57       ` Tejun Heo
2016-07-12 19:42         ` Waiman Long
2016-07-11 17:32 ` [RFC PATCH v2 7/7] lib/dlock-list: Use the per-subnode APIs for managing lists Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160713160823.GD4065@mtj.duckdns.org \
    --to=tj@kernel.org \
    --cc=Waiman.Long@hpe.com \
    --cc=andi@firstfloor.org \
    --cc=bfields@fieldses.org \
    --cc=boqun.feng@gmail.com \
    --cc=cl@linux-foundation.org \
    --cc=dchinner@redhat.com \
    --cc=doug.hatch@hpe.com \
    --cc=jack@suse.com \
    --cc=jlayton@poochiereds.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=scott.norton@hpe.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.