Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Waiman Long <waiman.long@hpe.com>
To: Tejun Heo <tj@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Jan Kara <jack@suse.com>, Jeff Layton <jlayton@poochiereds.net>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andi Kleen <andi@firstfloor.org>,
	Dave Chinner <dchinner@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Scott J Norton <scott.norton@hpe.com>,
	Douglas Hatch <doug.hatch@hpe.com>
Subject: Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
Date: Wed, 20 Jul 2016 15:53:56 -0400	[thread overview]
Message-ID: <578FD6D4.4010005@hpe.com> (raw)
In-Reply-To: <20160719192333.GP3078@mtj.duckdns.org>

On 07/19/2016 03:23 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Jul 19, 2016 at 02:42:31PM -0400, Waiman Long wrote:
>> On 07/18/2016 07:38 PM, Tejun Heo wrote:
>>>> +struct dlock_list_node {
>>>> +	struct list_head list;
>>>> +	spinlock_t *lockptr;
>>>> +};
>>> Wouldn't it be better to point to dlock_list_percpu?
>> I could. However, the only thing that matter is the spinlock that protects
>> the list entry.
> Yeah, we can get back to this when it's actually necessary.  It just
> looked a bit weird to me.
>
>>>> +/*
>>>> + * The dlock list iteration functions which return true if iteration has
>>>> + * to be continued.
>>>> + */
>>>> +extern bool dlock_list_next(struct dlock_list_head *dlist,
>>>> +			    struct dlock_list_iter *iter);
>>>> +extern bool dlock_list_next_safe(struct dlock_list_head *dlist,
>>>> +				 struct dlock_list_iter *iter);
>>> Why not return dlock_list_node * for the current node?  That'd more
>>> conventional and allows dlock_list_iter to be opaque.
>> Yes, I can make it return dlock_list_node *.
>>
>> However, to make dlock_list_iter opaque, I will have to dynamically allocate
>> the structure. That will add an extra memory allocation and free calls as
>> well as handling the error case of running out of memory. I don't think that
>> is worth doing at this point.
> Sure, keep it defined in the header file.  Just don't require users to
> reach into it and add a comment saying that the struct is opaque to
> its users.

Will do so.

>>>> +int alloc_dlock_list_head(struct dlock_list_head *dlist)
>>>> +{
>>>> +	struct dlock_list_head dlist_tmp;
>>>> +	int cpu;
>>>> +
>>>> +	dlist_tmp.head = alloc_percpu(struct dlock_list_head_percpu);
>>>> +	if (!dlist_tmp.head)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	for_each_possible_cpu(cpu) {
>>>> +		struct dlock_list_head_percpu *head;
>>>> +
>>>> +		head = per_cpu_ptr(dlist_tmp.head, cpu);
>>>> +		INIT_LIST_HEAD(&head->list);
>>>> +		head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
>>>> +		lockdep_set_class(&head->lock,&dlock_list_key);
>>>> +	}
>>>> +
>>>> +	dlist->head = dlist_tmp.head;
>>> Just use dlist->head directly or use local __perpcu head pointer?
>> I just don't want to expose the structure to world until it is fully
>> initialized. If you think I am over-cautious, I can use dlist->head as
>> suggested.
> I don't think it makes any actual difference.  No strong opinion
> either way.  Just use local __percpu head pointer then?
>
>>>> +	return 0;
>>>> +}
>>>> +EXPORT_SYMBOL(alloc_dlock_list_head);
>>> Does this actually need to be exported?  If so, it might be a better
>>> idea to start with EXPORT_SYMBOL_GPL().
>> For the current use case, we probably don't need to export the symbols.
>> Other use cases may require that. I will change it to use the version
>> instead.
> If it's not immediately necessary, it's best to not export at all.

I will remove the EXPORT statements.

>>>> +void dlock_list_del(struct dlock_list_node *node)
>>>> +{
>>>> +	spinlock_t *lock = READ_ONCE(node->lockptr);
>>>> +
>>>> +	if (unlikely(!lock)) {
>>>> +		WARN_ONCE(1,
>>>> +			"dlock_list_del: node 0x%lx has no associated lock\n",
>>>> +			(unsigned long)node);
>>> Maybe "if (WARN_ONCE(!lock...)"?  WARN_ONCE implies unlikely.
>> OK, will do that.
>>
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	spin_lock(lock);
>>>> +	if (likely(lock == node->lockptr)) {
>>>> +		list_del_init(&node->list);
>>>> +		node->lockptr = NULL;
>>>> +	} else {
>>>> +		/*
>>>> +		 * This path should never be executed.
>>>> +		 */
>>>> +		WARN_ON_ONCE(1);
>>>> +	}
>>> This still kinda bothers me because this pretty much requires the
>>> users to have strong synchronization around the operations and makes
>>> it unusable in situations where opportunistic behaviors are
>>> acceptable.  It negates the usefulness quite a bit.
>> I understand your concern. I will make it retry again with the new lock.
> It doesn't necessarily have to retry but shouldn't break down when
> used in an opportunistic racy way - e.g. if adds and removes race, the
> order of operations isn't clearly defined as such any outcome is fine
> as long as the list maintains its integrity.

I am going to retry it if the lock changes to different one and ignore 
it if it becomes NULL.

>>>> +/**
>>>> + * dlock_list_next_safe - Removal-safe iterator of dlock list
>>>> + * @dlist: Pointer to the dlock_list_head structure
>>>> + * @iter : Pointer to the dlock list iterator structure
>>>> + * Return: true if the next entry is found, false if all the entries iterated
>>>> + *
>>>> + * The iterator has to be properly initialized before calling this function.
>>>> + * This iteration function is safe with respect to list entry removal.
>>>> + * However, it cannot correctly iterate newly added entries right after the
>>>> + * current one.
>>>> + */
>>> This still looks wrong to me.  If you want to provide the two variants
>>> of iterations, can't you just implement one next function and build
>>> the two types of iterations on top of it?
>> I have been thinking about making dlock_list_next_cpu()  the real external
>> function and have 2 inline functions that implement dlock_list_next() and
>> dlock_list_next_safe(). That may strike a better balance between performance
>> and code abstraction. I will do so if you have no objection to that.
> Yeah, please give it a try.  As mentioned in another reply, it'd
> probably be best to provide an iteration macro which encapsulates the
> whole thing.
>
> Thanks.
>

Yes, I am go to make the dlist_for_each_entry macro as suggested by Al.

Cheers,
Longman

next prev parent reply	other threads:[~2016-07-20 19:54 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-15 17:39 [PATCH v3 0/4] vfs: Use dlock list for SB's s_inodes list Waiman Long
2016-07-15 17:39 ` [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists Waiman Long
2016-07-18 23:38   ` Tejun Heo
2016-07-19 18:42     ` Waiman Long
2016-07-19 19:23       ` Tejun Heo
2016-07-20 19:53         ` Waiman Long [this message]
2016-07-20 22:02         ` Waiman Long
2016-07-20 22:15       ` Waiman Long
2016-07-21  0:48         ` Christoph Lameter
2016-07-21  1:36           ` Waiman Long
2016-07-21  1:49           ` Dave Chinner
2016-07-22 20:43       ` Waiman Long
2016-07-19  5:00   ` Al Viro
2016-07-19 19:01     ` Waiman Long
2016-07-15 17:39 ` [PATCH v3 2/4] fsnotify: Simplify inode iteration on umount Waiman Long
2016-07-15 17:39 ` [PATCH v3 3/4] vfs: Remove unnecessary list_for_each_entry_safe() variants Waiman Long
2016-07-15 17:39 ` [PATCH v3 4/4] vfs: Use dlock list for superblock's inode list Waiman Long
2016-07-19  5:23   ` Al Viro
2016-07-19 18:35     ` Tejun Heo
2016-07-19 19:07     ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=578FD6D4.4010005@hpe.com \
    --to=waiman.long@hpe.com \
    --cc=andi@firstfloor.org \
    --cc=bfields@fieldses.org \
    --cc=boqun.feng@gmail.com \
    --cc=cl@linux-foundation.org \
    --cc=dchinner@redhat.com \
    --cc=doug.hatch@hpe.com \
    --cc=jack@suse.com \
    --cc=jlayton@poochiereds.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=scott.norton@hpe.com \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox