* [PATCH 0/3] RFC improvements to radix-tree related to DAX @ 2016-02-28 5:09 NeilBrown 2016-02-28 5:09 ` [PATCH 3/3] radix-tree: support locking of individual exception entries NeilBrown ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: NeilBrown @ 2016-02-28 5:09 UTC (permalink / raw) To: Ross Zwisler, Matthew Wilcox, Andrew Morton, Jan Kara Cc: linux-kernel, linux-fsdevel, linux-mm Hi, While pondering some issues with DAX and how it uses the radix tree I conceived the following patches. I don't know if they'll be useful but I thought I would post them in case they are helpful. The first is quite independent of the others - it removes some DAX specific #defines from radix-tree.h which is a generic ADT. The second makes an extra bit available when exception data is stored in the radix tree. The third uses this bit to provide a sleeping lock. With this it should be possible to delete exceptional entries from the radix tree in a race-free way without external locking. Like the page lock it requires an external set of wait_queue_heads. The same ones used for page_lock would be suitable. Note that this code is only compile tested. NeilBrown --- NeilBrown (3): DAX: move RADIX_DAX_ definitions to dax.c radix-tree: make 'indirect' bit available to exception entries. radix-tree: support locking of individual exception entries. fs/dax.c | 9 ++ include/linux/radix-tree.h | 28 +++++--- lib/radix-tree.c | 160 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 185 insertions(+), 12 deletions(-) -- Signature -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/3] radix-tree: support locking of individual exception entries. 2016-02-28 5:09 [PATCH 0/3] RFC improvements to radix-tree related to DAX NeilBrown @ 2016-02-28 5:09 ` NeilBrown 2016-02-28 5:30 ` kbuild test robot ` (2 more replies) 2016-02-28 5:09 ` [PATCH 1/3] DAX: move RADIX_DAX_ definitions to dax.c NeilBrown 2016-02-28 5:09 ` [PATCH 2/3] radix-tree: make 'indirect' bit available to exception entries NeilBrown 2 siblings, 3 replies; 15+ messages in thread From: NeilBrown @ 2016-02-28 5:09 UTC (permalink / raw) To: Ross Zwisler, Matthew Wilcox, Andrew Morton, Jan Kara Cc: linux-kernel, linux-fsdevel, linux-mm The least significant bit of an exception entry is used as a lock flag. A caller can: - create a locked entry by simply adding an entry with this flag set - lock an existing entry with radix_tree_lookup_lock(). This may return NULL if the entry doesn't exists, or was deleted while waiting for the lock. It may return a non-exception entry if that is what is found. If it returns a locked entry then it has exclusive rights to delete the entry. - unlock an entry that is already locked. This will wake any waiters. - delete an entry that is locked. This will wake waiters so that they return NULL without looking at the slot in the radix tree. These must all be called with the radix tree locked (i.e. a spinlock held). That spinlock is passed to radix_tree_lookup_lock() so that it can drop the lock while waiting. This is a "demonstration of concept". I haven't actually tested, only compiled. A possible use case is for the exception entries used by DAX. It is possible that some of the lookups can be optimised away in some cases by storing a slot pointer. I wanted to keep it reasonable simple until it was determined if it might be useful. Signed-off-by: NeilBrown <neilb@suse.com> --- include/linux/radix-tree.h | 8 ++ lib/radix-tree.c | 158 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 166 insertions(+) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index 450c12b546b7..8f579f66574b 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -308,6 +308,14 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root, int radix_tree_tagged(struct radix_tree_root *root, unsigned int tag); unsigned long radix_tree_locate_item(struct radix_tree_root *root, void *item); +void *radix_tree_lookup_lock(struct radix_tree_root *root, wait_queue_head_t *wq, + unsigned long index, spinlock_t *lock); +void radix_tree_unlock(struct radix_tree_root *root, wait_queue_head_t *wq, + unsigned long index); +void radix_tree_delete_unlock(struct radix_tree_root *root, wait_queue_head_t *wq, + unsigned long index); + + static inline void radix_tree_preload_end(void) { preempt_enable(); diff --git a/lib/radix-tree.c b/lib/radix-tree.c index 37d4643ab5c0..a24ea002f3eb 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -1500,3 +1500,161 @@ void __init radix_tree_init(void) radix_tree_init_maxindex(); hotcpu_notifier(radix_tree_callback, 0); } + +/* Exception entry locking. + * The least significant bit of an exception entry can be used as a + * "locked" flag. Supported locking operations are: + * radix_tree_lookup_lock() - if the indexed entry exists, lock it and + * return the value, else return NULL. If the indexed entry is not + * exceptional it is returned without locking. + * radix_tree_unlock() - release the lock on the indexed entry + * radix_tree_delete_unlock() - the entry must be locked. It will be atomically + * unlocked and removed. Any threads sleeping in lookup_lock() will return. + * Each of these take a radix_tree_root, a wait_queue_head_t, and an index. + * The '*lock' function also takes a spinlock_t which must be held when any + * of the functions is called. *lock will drop the spinlock while waiting for + * the entry lock. + * + * As delete_unlock could free the radix_tree_node, waiters much not touch it + * when woken. We provide a wake function for the waitq which records when the + * item has been deleted. + * + * The wait_queue_head passed should be one that is used for bit_wait, such + * as zone->wait_table. We re-use the 'flags' and 'timeout' fields of the + * wait_bit_key to store the root and index that we are waiting for. + * __wake_up may only be called on one of these keys while the radix tree + * is locked. The wakeup function will take the lock itself if appropriate, or + * may record that the radix tree entry has been deleted. In either case + * the waiting function just looks at the status reported by the wakeup function + * and doesn't look at the radix tree itself. + * + * There is no function for locking an entry while inserting it. Simply + * insert an entry that is already marked as 'locked' - lsb set. + * + */ + +struct wait_slot_queue { + struct radix_tree_root *root; + unsigned long index; + wait_queue_t wait; + enum {SLOT_WAITING, SLOT_LOCKED, SLOT_GONE} state; + void *ret; +}; + +static inline int slot_locked(void *v) +{ + unsigned long l = (unsigned long)v; + return l & 1; +} + +static inline void *lock_slot(void **v) +{ + unsigned long *l = (unsigned long *)v; + return (void*)(*l |= 1); +} + +static inline void * unlock_slot(void **v) +{ + unsigned long *l = (unsigned long *)v; + return (void*)(*l &= ~1UL); +} + +static int wake_slot_function(wait_queue_t *wait, unsigned mode, int sync, + void *arg) +{ + struct wait_bit_key *key = arg; + struct wait_slot_queue *wait_slot = + container_of(wait, struct wait_slot_queue, wait); + void **slot; + + if (wait_slot->root != key->flags || + wait_slot->index != key->timeout) + /* Not waking this waiter */ + return 0; + if (wait_slot->state != SLOT_WAITING) + /* Should be impossible.... */ + return 1; + if (key->bit_nr == -3) + /* Was just deleted, no point in doing a lookup */ + wait_slot = NULL; + else + wait_slot->ret = __radix_tree_lookup( + wait_slot->root, wait_slot->index, NULL, &slot); + if (!wait_slot->ret || !radix_tree_exceptional_entry(wait_slot->ret)) { + wait_slot->state = SLOT_GONE; + return 1; + } + if (slot_locked(slot)) + /* still locked */ + return 0; + wait_slot->ret = lock_slot(slot); + wait_slot->state = SLOT_LOCKED; + return 1; +} + +void *radix_tree_lookup_lock(struct radix_tree_root *root, wait_queue_head_t *wq, + unsigned long index, spinlock_t *lock) +{ + void *ret, **slot; + struct wait_slot_queue wait; + + ret = __radix_tree_lookup(root, index, NULL, &slot); + if (!ret || !radix_tree_exceptional_entry(ret)) + return ret; + if (!slot_locked(slot)) + return lock_slot(slot); + + wait.wait.private = current; + wait.wait.func = wake_slot_function; + INIT_LIST_HEAD(&wait.wait.task_list); + wait.state = SLOT_WAITING; + wait.root = root; + wait.index = index; + wait.ret = NULL; + for (;;) { + prepare_to_wait(wq, &wait.wait, + TASK_UNINTERRUPTIBLE); + if (wait.state != SLOT_WAITING) + break; + + spin_unlock(lock); + schedule(); + spin_lock(lock); + } + finish_wait(wq, &wait.wait); + return wait.ret; +} +EXPORT_SYMBOL(radix_tree_lookup_lock); + +void radix_tree_unlock(struct radix_tree_root *root, wait_queue_head_t *wq, + unsigned long index) +{ + void *ret, **slot; + + ret = __radix_tree_lookup(root, index, NULL, &slot); + if (WARN_ON_ONCE(!ret || !radix_tree_exceptional_entry(ret))) + return; + if (WARN_ON_ONCE(!slot_locked(slot))) + return; + unlock_slot(slot); + + if (waitqueue_active(wq)) { + struct wait_bit_key key = {.flags = root, .bit_nr = -2, + .timeout = index}; + __wake_up(wq, TASK_NORMAL, 1, &key); + } +} +EXPORT_SYMBOL(radix_tree_unlock); + +void radix_tree_delete_unlock(struct radix_tree_root *root, wait_queue_head_t *wq, + unsigned long index) +{ + radix_tree_delete(root, index); + if (waitqueue_active(wq)) { + /* -3 here indicates deletion */ + struct wait_bit_key key = {.flags = root, .bit_nr = -3, + .timeout = index}; + __wake_up(wq, TASK_NORMAL, 1, &key); + } +} +EXPORT_SYMBOL(radix_tree_delete_unlock); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] radix-tree: support locking of individual exception entries. 2016-02-28 5:09 ` [PATCH 3/3] radix-tree: support locking of individual exception entries NeilBrown @ 2016-02-28 5:30 ` kbuild test robot 2016-02-28 6:27 ` NeilBrown 2016-03-03 13:10 ` Jan Kara 2 siblings, 0 replies; 15+ messages in thread From: kbuild test robot @ 2016-02-28 5:30 UTC (permalink / raw) To: NeilBrown Cc: kbuild-all, Ross Zwisler, Matthew Wilcox, Andrew Morton, Jan Kara, linux-kernel, linux-fsdevel, linux-mm [-- Attachment #1: Type: text/plain, Size: 2763 bytes --] Hi NeilBrown, [auto build test ERROR on v4.5-rc5] [also build test ERROR on next-20160226] [if your patch is applied to the wrong git tree, please drop us a note to help improving the system] url: https://github.com/0day-ci/linux/commits/NeilBrown/RFC-improvements-to-radix-tree-related-to-DAX/20160228-132214 config: i386-tinyconfig (attached as .config) reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): lib/radix-tree.c: In function 'radix_tree_lookup_lock': >> lib/radix-tree.c:1616:5: error: 'TASK_UNINTERRUPTIBLE' undeclared (first use in this function) TASK_UNINTERRUPTIBLE); ^ lib/radix-tree.c:1616:5: note: each undeclared identifier is reported only once for each function it appears in >> lib/radix-tree.c:1621:3: error: implicit declaration of function 'schedule' [-Werror=implicit-function-declaration] schedule(); ^ lib/radix-tree.c: In function 'radix_tree_unlock': >> lib/radix-tree.c:1644:17: error: 'TASK_NORMAL' undeclared (first use in this function) __wake_up(wq, TASK_NORMAL, 1, &key); ^ lib/radix-tree.c: In function 'radix_tree_delete_unlock': lib/radix-tree.c:1657:17: error: 'TASK_NORMAL' undeclared (first use in this function) __wake_up(wq, TASK_NORMAL, 1, &key); ^ cc1: some warnings being treated as errors vim +/TASK_UNINTERRUPTIBLE +1616 lib/radix-tree.c 1610 wait.state = SLOT_WAITING; 1611 wait.root = root; 1612 wait.index = index; 1613 wait.ret = NULL; 1614 for (;;) { 1615 prepare_to_wait(wq, &wait.wait, > 1616 TASK_UNINTERRUPTIBLE); 1617 if (wait.state != SLOT_WAITING) 1618 break; 1619 1620 spin_unlock(lock); > 1621 schedule(); 1622 spin_lock(lock); 1623 } 1624 finish_wait(wq, &wait.wait); 1625 return wait.ret; 1626 } 1627 EXPORT_SYMBOL(radix_tree_lookup_lock); 1628 1629 void radix_tree_unlock(struct radix_tree_root *root, wait_queue_head_t *wq, 1630 unsigned long index) 1631 { 1632 void *ret, **slot; 1633 1634 ret = __radix_tree_lookup(root, index, NULL, &slot); 1635 if (WARN_ON_ONCE(!ret || !radix_tree_exceptional_entry(ret))) 1636 return; 1637 if (WARN_ON_ONCE(!slot_locked(slot))) 1638 return; 1639 unlock_slot(slot); 1640 1641 if (waitqueue_active(wq)) { 1642 struct wait_bit_key key = {.flags = root, .bit_nr = -2, 1643 .timeout = index}; > 1644 __wake_up(wq, TASK_NORMAL, 1, &key); 1645 } 1646 } 1647 EXPORT_SYMBOL(radix_tree_unlock); --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/octet-stream, Size: 6204 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] radix-tree: support locking of individual exception entries. 2016-02-28 5:09 ` [PATCH 3/3] radix-tree: support locking of individual exception entries NeilBrown 2016-02-28 5:30 ` kbuild test robot @ 2016-02-28 6:27 ` NeilBrown 2016-03-03 13:10 ` Jan Kara 2 siblings, 0 replies; 15+ messages in thread From: NeilBrown @ 2016-02-28 6:27 UTC (permalink / raw) To: Ross Zwisler, Matthew Wilcox, Andrew Morton, Jan Kara Cc: linux-kernel, linux-fsdevel, linux-mm [-- Attachment #1: Type: text/plain, Size: 1176 bytes --] On Sun, Feb 28 2016, NeilBrown <neilb@suse.com> wrote: > +static int wake_slot_function(wait_queue_t *wait, unsigned mode, int sync, > + void *arg) > +{ > + struct wait_bit_key *key = arg; > + struct wait_slot_queue *wait_slot = > + container_of(wait, struct wait_slot_queue, wait); > + void **slot; > + > + if (wait_slot->root != key->flags || > + wait_slot->index != key->timeout) > + /* Not waking this waiter */ > + return 0; > + if (wait_slot->state != SLOT_WAITING) > + /* Should be impossible.... */ > + return 1; > + if (key->bit_nr == -3) > + /* Was just deleted, no point in doing a lookup */ > + wait_slot = NULL; > + else > + wait_slot->ret = __radix_tree_lookup( > + wait_slot->root, wait_slot->index, NULL, &slot); > + if (!wait_slot->ret || !radix_tree_exceptional_entry(wait_slot->ret)) { > + wait_slot->state = SLOT_GONE; > + return 1; > + } > + if (slot_locked(slot)) > + /* still locked */ > + return 0; > + wait_slot->ret = lock_slot(slot); > + wait_slot->state = SLOT_LOCKED; > + return 1; > +} Sorry, just realized that this should: return autoremove_wake_function(wait, mode, sync, arg); instead of "return 1;" NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 818 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] radix-tree: support locking of individual exception entries. 2016-02-28 5:09 ` [PATCH 3/3] radix-tree: support locking of individual exception entries NeilBrown 2016-02-28 5:30 ` kbuild test robot 2016-02-28 6:27 ` NeilBrown @ 2016-03-03 13:10 ` Jan Kara 2016-03-03 23:51 ` NeilBrown 2 siblings, 1 reply; 15+ messages in thread From: Jan Kara @ 2016-03-03 13:10 UTC (permalink / raw) To: NeilBrown Cc: Ross Zwisler, Matthew Wilcox, Andrew Morton, Jan Kara, linux-kernel, linux-fsdevel, linux-mm [-- Attachment #1: Type: text/plain, Size: 8876 bytes --] Hi Neil, On Sun 28-02-16 16:09:29, NeilBrown wrote: > The least significant bit of an exception entry is used as a lock flag. > A caller can: > - create a locked entry by simply adding an entry with this flag set > - lock an existing entry with radix_tree_lookup_lock(). This may return > NULL if the entry doesn't exists, or was deleted while waiting for > the lock. It may return a non-exception entry if that is what is > found. If it returns a locked entry then it has exclusive rights > to delete the entry. > - unlock an entry that is already locked. This will wake any waiters. > - delete an entry that is locked. This will wake waiters so that they > return NULL without looking at the slot in the radix tree. > > These must all be called with the radix tree locked (i.e. a spinlock held). > That spinlock is passed to radix_tree_lookup_lock() so that it can drop > the lock while waiting. > > This is a "demonstration of concept". I haven't actually tested, only compiled. > A possible use case is for the exception entries used by DAX. > > It is possible that some of the lookups can be optimised away in some > cases by storing a slot pointer. I wanted to keep it reasonable > simple until it was determined if it might be useful. Thanks for having a look! So the patch looks like it would do the work but frankly the amount of hackiness in it has exceeded my personal threshold... several times ;) In particular I don't quite understand why have you decided to re-lookup the exceptional entry in the wake function? That seems to be the source of a lot of a hackiness? I was hoping for something simpler like what I've attached (compile tested only). What do you think? To avoid false wakeups and thundering herd issues which my simple version does have, we could do something like what I outline in the second patch. Now that I look at the result that is closer to your patch, just cleaner IMHO :). But I wanted to have it separated to see how much complexity does this additional functionality brings... Now I'm going to have a look how to use this in DAX... Honza > Signed-off-by: NeilBrown <neilb@suse.com> > --- > include/linux/radix-tree.h | 8 ++ > lib/radix-tree.c | 158 ++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 166 insertions(+) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index 450c12b546b7..8f579f66574b 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -308,6 +308,14 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root, > int radix_tree_tagged(struct radix_tree_root *root, unsigned int tag); > unsigned long radix_tree_locate_item(struct radix_tree_root *root, void *item); > > +void *radix_tree_lookup_lock(struct radix_tree_root *root, wait_queue_head_t *wq, > + unsigned long index, spinlock_t *lock); > +void radix_tree_unlock(struct radix_tree_root *root, wait_queue_head_t *wq, > + unsigned long index); > +void radix_tree_delete_unlock(struct radix_tree_root *root, wait_queue_head_t *wq, > + unsigned long index); > + > + > static inline void radix_tree_preload_end(void) > { > preempt_enable(); > diff --git a/lib/radix-tree.c b/lib/radix-tree.c > index 37d4643ab5c0..a24ea002f3eb 100644 > --- a/lib/radix-tree.c > +++ b/lib/radix-tree.c > @@ -1500,3 +1500,161 @@ void __init radix_tree_init(void) > radix_tree_init_maxindex(); > hotcpu_notifier(radix_tree_callback, 0); > } > + > +/* Exception entry locking. > + * The least significant bit of an exception entry can be used as a > + * "locked" flag. Supported locking operations are: > + * radix_tree_lookup_lock() - if the indexed entry exists, lock it and > + * return the value, else return NULL. If the indexed entry is not > + * exceptional it is returned without locking. > + * radix_tree_unlock() - release the lock on the indexed entry > + * radix_tree_delete_unlock() - the entry must be locked. It will be atomically > + * unlocked and removed. Any threads sleeping in lookup_lock() will return. > + * Each of these take a radix_tree_root, a wait_queue_head_t, and an index. > + * The '*lock' function also takes a spinlock_t which must be held when any > + * of the functions is called. *lock will drop the spinlock while waiting for > + * the entry lock. > + * > + * As delete_unlock could free the radix_tree_node, waiters much not touch it > + * when woken. We provide a wake function for the waitq which records when the > + * item has been deleted. > + * > + * The wait_queue_head passed should be one that is used for bit_wait, such > + * as zone->wait_table. We re-use the 'flags' and 'timeout' fields of the > + * wait_bit_key to store the root and index that we are waiting for. > + * __wake_up may only be called on one of these keys while the radix tree > + * is locked. The wakeup function will take the lock itself if appropriate, or > + * may record that the radix tree entry has been deleted. In either case > + * the waiting function just looks at the status reported by the wakeup function > + * and doesn't look at the radix tree itself. > + * > + * There is no function for locking an entry while inserting it. Simply > + * insert an entry that is already marked as 'locked' - lsb set. > + * > + */ > + > +struct wait_slot_queue { > + struct radix_tree_root *root; > + unsigned long index; > + wait_queue_t wait; > + enum {SLOT_WAITING, SLOT_LOCKED, SLOT_GONE} state; > + void *ret; > +}; > + > +static inline int slot_locked(void *v) > +{ > + unsigned long l = (unsigned long)v; > + return l & 1; > +} > + > +static inline void *lock_slot(void **v) > +{ > + unsigned long *l = (unsigned long *)v; > + return (void*)(*l |= 1); > +} > + > +static inline void * unlock_slot(void **v) > +{ > + unsigned long *l = (unsigned long *)v; > + return (void*)(*l &= ~1UL); > +} > + > +static int wake_slot_function(wait_queue_t *wait, unsigned mode, int sync, > + void *arg) > +{ > + struct wait_bit_key *key = arg; > + struct wait_slot_queue *wait_slot = > + container_of(wait, struct wait_slot_queue, wait); > + void **slot; > + > + if (wait_slot->root != key->flags || > + wait_slot->index != key->timeout) > + /* Not waking this waiter */ > + return 0; > + if (wait_slot->state != SLOT_WAITING) > + /* Should be impossible.... */ > + return 1; > + if (key->bit_nr == -3) > + /* Was just deleted, no point in doing a lookup */ > + wait_slot = NULL; > + else > + wait_slot->ret = __radix_tree_lookup( > + wait_slot->root, wait_slot->index, NULL, &slot); > + if (!wait_slot->ret || !radix_tree_exceptional_entry(wait_slot->ret)) { > + wait_slot->state = SLOT_GONE; > + return 1; > + } > + if (slot_locked(slot)) > + /* still locked */ > + return 0; > + wait_slot->ret = lock_slot(slot); > + wait_slot->state = SLOT_LOCKED; > + return 1; > +} > + > +void *radix_tree_lookup_lock(struct radix_tree_root *root, wait_queue_head_t *wq, > + unsigned long index, spinlock_t *lock) > +{ > + void *ret, **slot; > + struct wait_slot_queue wait; > + > + ret = __radix_tree_lookup(root, index, NULL, &slot); > + if (!ret || !radix_tree_exceptional_entry(ret)) > + return ret; > + if (!slot_locked(slot)) > + return lock_slot(slot); > + > + wait.wait.private = current; > + wait.wait.func = wake_slot_function; > + INIT_LIST_HEAD(&wait.wait.task_list); > + wait.state = SLOT_WAITING; > + wait.root = root; > + wait.index = index; > + wait.ret = NULL; > + for (;;) { > + prepare_to_wait(wq, &wait.wait, > + TASK_UNINTERRUPTIBLE); > + if (wait.state != SLOT_WAITING) > + break; > + > + spin_unlock(lock); > + schedule(); > + spin_lock(lock); > + } > + finish_wait(wq, &wait.wait); > + return wait.ret; > +} > +EXPORT_SYMBOL(radix_tree_lookup_lock); > + > +void radix_tree_unlock(struct radix_tree_root *root, wait_queue_head_t *wq, > + unsigned long index) > +{ > + void *ret, **slot; > + > + ret = __radix_tree_lookup(root, index, NULL, &slot); > + if (WARN_ON_ONCE(!ret || !radix_tree_exceptional_entry(ret))) > + return; > + if (WARN_ON_ONCE(!slot_locked(slot))) > + return; > + unlock_slot(slot); > + > + if (waitqueue_active(wq)) { > + struct wait_bit_key key = {.flags = root, .bit_nr = -2, > + .timeout = index}; > + __wake_up(wq, TASK_NORMAL, 1, &key); > + } > +} > +EXPORT_SYMBOL(radix_tree_unlock); > + > +void radix_tree_delete_unlock(struct radix_tree_root *root, wait_queue_head_t *wq, > + unsigned long index) > +{ > + radix_tree_delete(root, index); > + if (waitqueue_active(wq)) { > + /* -3 here indicates deletion */ > + struct wait_bit_key key = {.flags = root, .bit_nr = -3, > + .timeout = index}; > + __wake_up(wq, TASK_NORMAL, 1, &key); > + } > +} > +EXPORT_SYMBOL(radix_tree_delete_unlock); > > -- Jan Kara <jack@suse.com> SUSE Labs, CR [-- Attachment #2: 0001-radix-tree-support-locking-of-individual-exception-e.patch --] [-- Type: text/x-patch, Size: 4926 bytes --] >From c30e48c69b3390250f60b780c64ebeb2da2fcf75 Mon Sep 17 00:00:00 2001 From: NeilBrown <neilb@suse.com> Date: Sun, 28 Feb 2016 16:09:29 +1100 Subject: [PATCH] radix-tree: support locking of individual exception entries. The least significant bit of an exception entry is used as a lock flag. A caller can: - create a locked entry by simply adding an entry with this flag set - lock an existing entry with radix_tree_lookup_lock(). This may return NULL if the entry doesn't exists, or was deleted while waiting for the lock. It may return a non-exception entry if that is what is found. If it returns a locked entry then it has exclusive rights to delete the entry. - unlock an entry that is already locked. This will wake any waiters. These must all be called with the radix tree locked (i.e. a spinlock held). That spinlock is passed to radix_tree_lookup_lock() so that it can drop the lock while waiting. This is a "demonstration of concept". I haven't actually tested, only compiled. A possible use case is for the exception entries used by DAX. It is possible that some of the lookups can be optimised away in some cases by storing a slot pointer. I wanted to keep it reasonable simple until it was determined if it might be useful. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> --- include/linux/radix-tree.h | 5 +++ lib/radix-tree.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 88 insertions(+) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index 450c12b546b7..92138d30b95d 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -308,6 +308,11 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root, int radix_tree_tagged(struct radix_tree_root *root, unsigned int tag); unsigned long radix_tree_locate_item(struct radix_tree_root *root, void *item); +void *radix_tree_lookup_lock(struct radix_tree_root *root, unsigned long index, + wait_queue_head_t *wq, spinlock_t *lock); +void radix_tree_unlock(struct radix_tree_root *root, unsigned long index, + wait_queue_head_t *wq); + static inline void radix_tree_preload_end(void) { preempt_enable(); diff --git a/lib/radix-tree.c b/lib/radix-tree.c index 37d4643ab5c0..a97231ab66f0 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -1500,3 +1500,86 @@ void __init radix_tree_init(void) radix_tree_init_maxindex(); hotcpu_notifier(radix_tree_callback, 0); } + +/* + * Exception entry locking + */ +static inline int slot_locked(void **v) +{ + unsigned long l = *(unsigned long *)v; + return l & 1; +} + +static inline void *lock_slot(void **v) +{ + unsigned long *l = (unsigned long *)v; + return (void*)(*l |= 1); +} + +static inline void * unlock_slot(void **v) +{ + unsigned long *l = (unsigned long *)v; + return (void*)(*l &= ~1UL); +} + +/** + * radix_tree_lookup_lock - lookup and lock exceptional entry if found + * @root: radix tree root + * @index: index key + * @wq: waitqueue to wait for exceptional entry lock + * @lock: spinlock protecting the radix tree + * + * Lookup @index in the radix tree @root and if there is an exceptional + * entry at that location, lock it. If entry at @index is not exceptional, + * we just return it. We use @wq as a wait queue to wait for exceptional + * entry lock to be released. @lock is a spinlock protecting the radix + * tree which we assume is locked when entering this function and released + * while waiting for the exceptional entry lock. + */ +void *radix_tree_lookup_lock(struct radix_tree_root *root, unsigned long index, + wait_queue_head_t *wq, spinlock_t *lock) +{ + void *ret, **slot; + DEFINE_WAIT(wait); + + for (;;) { + ret = __radix_tree_lookup(root, index, NULL, &slot); + if (!ret || !radix_tree_exceptional_entry(ret)) + return ret; + if (!slot_locked(slot)) + return lock_slot(slot); + prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE); + spin_unlock(lock); + schedule(); + finish_wait(wq, &wait); + spin_lock(lock); + } +} +EXPORT_SYMBOL(radix_tree_lookup_lock); + +/** + * radix_tree_unlock - unlock exceptional radix tree entry + * @root: radix tree root + * @index: index key + * @wq: waitqueue to wake waiters for exceptional entry lock + * + * Unlock exceptional entry at @index in a radix tree @root and wake up + * waiters for it waiting in wait queue @wq. We expect the radix tree is + * locked against concurrent modifications. + */ +void radix_tree_unlock(struct radix_tree_root *root, unsigned long index, + wait_queue_head_t *wq) +{ + void *ret, **slot; + + ret = __radix_tree_lookup(root, index, NULL, &slot); + if (WARN_ON_ONCE(!ret || !radix_tree_exceptional_entry(ret))) + return; + if (WARN_ON_ONCE(!slot_locked(slot))) + return; + unlock_slot(slot); + + if (waitqueue_active(wq)) + wake_up(wq); +} +EXPORT_SYMBOL(radix_tree_unlock); -- 2.6.2 [-- Attachment #3: 0001-radix-tree-Avoid-false-wakeups-when-waiting-for-exce.patch --] [-- Type: text/x-patch, Size: 2580 bytes --] >From 982c02870b262da742f593e695b4050aa99fabd3 Mon Sep 17 00:00:00 2001 From: Jan Kara <jack@suse.cz> Date: Thu, 3 Mar 2016 14:06:42 +0100 Subject: [PATCH] radix-tree: Avoid false wakeups when waiting for exceptional entry lock Signed-off-by: Jan Kara <jack@suse.cz> --- lib/radix-tree.c | 42 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 37 insertions(+), 5 deletions(-) diff --git a/lib/radix-tree.c b/lib/radix-tree.c index a97231ab66f0..be9763dd9de5 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -1522,6 +1522,28 @@ static inline void * unlock_slot(void **v) return (void*)(*l &= ~1UL); } +struct exceptional_entry_key { + struct radix_tree_root *root; + unsigned long index; +}; + +struct wait_exceptional_entry_queue { + wait_queue_t wait; + struct exceptional_entry_key key; +}; + +static int wake_exceptional_entry_func(wait_queue_t *wait, unsigned mode, + int sync, void *keyp) +{ + struct exceptional_entry_key *key = keyp; + struct wait_exceptional_entry_queue *ewait = + container_of(wait, struct wait_exceptional_entry_queue, wait); + + if (key->root != ewait->key.root || key->index != ewait->key.index) + return 0; + return autoremove_wake_function(wait, mode, sync, NULL); +} + /** * radix_tree_lookup_lock - lookup and lock exceptional entry if found * @root: radix tree root @@ -1540,7 +1562,12 @@ void *radix_tree_lookup_lock(struct radix_tree_root *root, unsigned long index, wait_queue_head_t *wq, spinlock_t *lock) { void *ret, **slot; - DEFINE_WAIT(wait); + struct wait_exceptional_entry_queue wait; + + init_wait(&wait.wait); + wait.wait.func = wake_exceptional_entry_func; + wait.key.root = root; + wait.key.index = index; for (;;) { ret = __radix_tree_lookup(root, index, NULL, &slot); @@ -1548,10 +1575,10 @@ void *radix_tree_lookup_lock(struct radix_tree_root *root, unsigned long index, return ret; if (!slot_locked(slot)) return lock_slot(slot); - prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE); + prepare_to_wait_exclusive(wq, &wait.wait, TASK_UNINTERRUPTIBLE); spin_unlock(lock); schedule(); - finish_wait(wq, &wait); + finish_wait(wq, &wait.wait); spin_lock(lock); } } @@ -1579,7 +1606,12 @@ void radix_tree_unlock(struct radix_tree_root *root, unsigned long index, return; unlock_slot(slot); - if (waitqueue_active(wq)) - wake_up(wq); + if (waitqueue_active(wq)) { + struct exceptional_entry_key key; + + key.root = root; + key.index = index; + __wake_up(wq, TASK_NORMAL, 1, &key); + } } EXPORT_SYMBOL(radix_tree_unlock); -- 2.6.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] radix-tree: support locking of individual exception entries. 2016-03-03 13:10 ` Jan Kara @ 2016-03-03 23:51 ` NeilBrown 2016-03-04 10:14 ` NeilBrown 0 siblings, 1 reply; 15+ messages in thread From: NeilBrown @ 2016-03-03 23:51 UTC (permalink / raw) To: Jan Kara Cc: Ross Zwisler, Matthew Wilcox, Andrew Morton, Jan Kara, linux-kernel, linux-fsdevel, linux-mm [-- Attachment #1: Type: text/plain, Size: 3730 bytes --] On Fri, Mar 04 2016, Jan Kara wrote: > Hi Neil, > > On Sun 28-02-16 16:09:29, NeilBrown wrote: >> The least significant bit of an exception entry is used as a lock flag. >> A caller can: >> - create a locked entry by simply adding an entry with this flag set >> - lock an existing entry with radix_tree_lookup_lock(). This may return >> NULL if the entry doesn't exists, or was deleted while waiting for >> the lock. It may return a non-exception entry if that is what is >> found. If it returns a locked entry then it has exclusive rights >> to delete the entry. >> - unlock an entry that is already locked. This will wake any waiters. >> - delete an entry that is locked. This will wake waiters so that they >> return NULL without looking at the slot in the radix tree. >> >> These must all be called with the radix tree locked (i.e. a spinlock held). >> That spinlock is passed to radix_tree_lookup_lock() so that it can drop >> the lock while waiting. >> >> This is a "demonstration of concept". I haven't actually tested, only compiled. >> A possible use case is for the exception entries used by DAX. >> >> It is possible that some of the lookups can be optimised away in some >> cases by storing a slot pointer. I wanted to keep it reasonable >> simple until it was determined if it might be useful. > > Thanks for having a look! So the patch looks like it would do the work but > frankly the amount of hackiness in it has exceeded my personal threshold... > several times ;) Achievement unlocked ? :-) > > In particular I don't quite understand why have you decided to re-lookup > the exceptional entry in the wake function? That seems to be the source of > a lot of a hackiness? Yes..... My original idea was that there would only be a single lookup. If the slot was found to be locked, the address of the slot would be stored in the key so the wakeup function could trivially detect if it was being deleted, or could claim the lock, and would signal the result to the caller. But then I realized that the address of the slot is not necessarily stable. In particular the slot for address 0 can be in the root, or it can be in a node. I could special-case address zero but it was easier just to do the search. Of course the slot disappears if the entry is deleted. That is why the wakeup function (which is called under the tree lock so can still safely inspect the slot) would signal to the caller that the delete had happened. So the patch was a little confused.... > I was hoping for something simpler like what I've > attached (compile tested only). What do you think? Certainly simpler. By not layering on top of wait_bit_key, you've precluded the use of the current page wait_queues for these locks - you need to allocate new wait queue heads. If in > +struct wait_exceptional_entry_queue { > + wait_queue_t wait; > + struct exceptional_entry_key key; > +}; you had the exceptional_entry_key first (like wait_bit_queue does) you would be closer to being able to re-use the queues. Also I don't think it is safe to use an exclusive wait. When a slot is deleted, you need to wake up *all* the waiters. Thanks, NeilBrown > > To avoid false wakeups and thundering herd issues which my simple version does > have, we could do something like what I outline in the second patch. Now > that I look at the result that is closer to your patch, just cleaner IMHO :). > But I wanted to have it separated to see how much complexity does this > additional functionality brings... > > Now I'm going to have a look how to use this in DAX... > > Honza [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 818 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] radix-tree: support locking of individual exception entries. 2016-03-03 23:51 ` NeilBrown @ 2016-03-04 10:14 ` NeilBrown 2016-03-04 12:31 ` Jan Kara 0 siblings, 1 reply; 15+ messages in thread From: NeilBrown @ 2016-03-04 10:14 UTC (permalink / raw) To: Jan Kara Cc: Ross Zwisler, Matthew Wilcox, Andrew Morton, Jan Kara, linux-kernel, linux-fsdevel, linux-mm [-- Attachment #1: Type: text/plain, Size: 807 bytes --] On Fri, Mar 04 2016, NeilBrown wrote: > > By not layering on top of wait_bit_key, you've precluded the use of the > current page wait_queues for these locks - you need to allocate new wait > queue heads. > > If in > >> +struct wait_exceptional_entry_queue { >> + wait_queue_t wait; >> + struct exceptional_entry_key key; >> +}; > > you had the exceptional_entry_key first (like wait_bit_queue does) you > would be closer to being able to re-use the queues. Scratch that bit, I was confusing myself again. Sorry. Each wait_queue_t has it's own function so one function will never be called on other items in the queue - of course. > > Also I don't think it is safe to use an exclusive wait. When a slot is > deleted, you need to wake up *all* the waiters. I think this issue is still valid. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 818 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] radix-tree: support locking of individual exception entries. 2016-03-04 10:14 ` NeilBrown @ 2016-03-04 12:31 ` Jan Kara 2016-03-09 2:13 ` NeilBrown 0 siblings, 1 reply; 15+ messages in thread From: Jan Kara @ 2016-03-04 12:31 UTC (permalink / raw) To: NeilBrown Cc: Jan Kara, Ross Zwisler, Matthew Wilcox, Andrew Morton, linux-kernel, linux-fsdevel, linux-mm On Fri 04-03-16 21:14:24, NeilBrown wrote: > On Fri, Mar 04 2016, NeilBrown wrote: > > > > > By not layering on top of wait_bit_key, you've precluded the use of the > > current page wait_queues for these locks - you need to allocate new wait > > queue heads. > > > > If in > > > >> +struct wait_exceptional_entry_queue { > >> + wait_queue_t wait; > >> + struct exceptional_entry_key key; > >> +}; > > > > you had the exceptional_entry_key first (like wait_bit_queue does) you > > would be closer to being able to re-use the queues. > > Scratch that bit, I was confusing myself again. Sorry. > Each wait_queue_t has it's own function so one function will never be > called on other items in the queue - of course. Yes. > > Also I don't think it is safe to use an exclusive wait. When a slot is > > deleted, you need to wake up *all* the waiters. > > I think this issue is still valid. Yes, you are right. I have deleted your function radix_tree_delete_unlock() because I thought it won't be needed - but if we use exclusive waits (which I think we want to) we need to wakeup all waiters when deleting entry as you properly spotted. Currently I'm undecided how we want to deal with that. The thing is - when exceptional entries use locking, we need deleting of a radix tree entry to avoid deleting locked entry so the only proper way to delete entry would be via something like radix_tree_delete_unlock(). OTOH when entry locking is not used (like for tmpfs exceptional entries), we don't want to bother with passing waitqueues around and locking entry just to delete it. The best I came up with was that radix_tree_delete_item() would complain about deleting locked entry so that we catch when someone doesn't properly obey the locking protocol... But I'm still somewhat hesitating whether it would not be better to move the locking out of generic radix tree code since it is not quite as generic as I'd like and e.g. clear_exceptional_entry() would use locked delete only for DAX mappings anyway. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] radix-tree: support locking of individual exception entries. 2016-03-04 12:31 ` Jan Kara @ 2016-03-09 2:13 ` NeilBrown 0 siblings, 0 replies; 15+ messages in thread From: NeilBrown @ 2016-03-09 2:13 UTC (permalink / raw) To: Jan Kara Cc: Ross Zwisler, Matthew Wilcox, Andrew Morton, linux-kernel, linux-fsdevel, linux-mm [-- Attachment #1: Type: text/plain, Size: 1845 bytes --] On Fri, Mar 04 2016, Jan Kara wrote: > On Fri 04-03-16 21:14:24, NeilBrown wrote: >> On Fri, Mar 04 2016, NeilBrown wrote: >> >> > >> > By not layering on top of wait_bit_key, you've precluded the use of the >> > current page wait_queues for these locks - you need to allocate new wait >> > queue heads. >> > >> > If in >> > >> >> +struct wait_exceptional_entry_queue { >> >> + wait_queue_t wait; >> >> + struct exceptional_entry_key key; >> >> +}; >> > >> > you had the exceptional_entry_key first (like wait_bit_queue does) you >> > would be closer to being able to re-use the queues. >> >> Scratch that bit, I was confusing myself again. Sorry. >> Each wait_queue_t has it's own function so one function will never be >> called on other items in the queue - of course. > > Yes. I was thinking about this some more, wondering why I thought what I did, and realised there is a small issue that it is worth being aware of. If different users of the same work queue use different "keys", a wake function can get a key of a different type to the one it is expecting. e.g. __wake_up_bit passes the address of a "struct wait_bit_key" to __wake_up which is then passed as a "void* arg" to the wait_queue_func_t. With your code, a "struct exceptional_entry_key" could be passed to wake_bit_function, or a "struct wait_bit_key" could be passed to wake_exceptional_entry_func. Both structures start with a pointer which will never appear in the wrong type, and both function test that pointer first and access nothing else if it doesn't match, so the code is safe. But it could be seen as a bit fragile, and doing something to make it obvious that the different key types need to align on that primary key field would not be a bad thing. .... If we end up using this code. Thanks, NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 818 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] DAX: move RADIX_DAX_ definitions to dax.c 2016-02-28 5:09 [PATCH 0/3] RFC improvements to radix-tree related to DAX NeilBrown 2016-02-28 5:09 ` [PATCH 3/3] radix-tree: support locking of individual exception entries NeilBrown @ 2016-02-28 5:09 ` NeilBrown 2016-02-29 14:28 ` Wilcox, Matthew R 2016-02-28 5:09 ` [PATCH 2/3] radix-tree: make 'indirect' bit available to exception entries NeilBrown 2 siblings, 1 reply; 15+ messages in thread From: NeilBrown @ 2016-02-28 5:09 UTC (permalink / raw) To: Ross Zwisler, Matthew Wilcox, Andrew Morton, Jan Kara Cc: linux-kernel, linux-fsdevel, linux-mm These don't belong in radix-tree.c any more than PAGECACHE_TAG_* do. Let's try to maintain the idea that radix-tree simply implements an abstract data type. Signed-off-by: NeilBrown <neilb@suse.com> --- fs/dax.c | 9 +++++++++ include/linux/radix-tree.h | 9 --------- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 711172450da6..9c4d697fb6fc 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -32,6 +32,15 @@ #include <linux/pfn_t.h> #include <linux/sizes.h> +#define RADIX_DAX_MASK 0xf +#define RADIX_DAX_SHIFT 4 +#define RADIX_DAX_PTE (0x4 | RADIX_TREE_EXCEPTIONAL_ENTRY) +#define RADIX_DAX_PMD (0x8 | RADIX_TREE_EXCEPTIONAL_ENTRY) +#define RADIX_DAX_TYPE(entry) ((unsigned long)entry & RADIX_DAX_MASK) +#define RADIX_DAX_SECTOR(entry) (((unsigned long)entry >> RADIX_DAX_SHIFT)) +#define RADIX_DAX_ENTRY(sector, pmd) ((void *)((unsigned long)sector << \ + RADIX_DAX_SHIFT | (pmd ? RADIX_DAX_PMD : RADIX_DAX_PTE))) + static long dax_map_atomic(struct block_device *bdev, struct blk_dax_ctl *dax) { struct request_queue *q = bdev->bd_queue; diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index f54be7082207..968150ab8a1c 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -51,15 +51,6 @@ #define RADIX_TREE_EXCEPTIONAL_ENTRY 2 #define RADIX_TREE_EXCEPTIONAL_SHIFT 2 -#define RADIX_DAX_MASK 0xf -#define RADIX_DAX_SHIFT 4 -#define RADIX_DAX_PTE (0x4 | RADIX_TREE_EXCEPTIONAL_ENTRY) -#define RADIX_DAX_PMD (0x8 | RADIX_TREE_EXCEPTIONAL_ENTRY) -#define RADIX_DAX_TYPE(entry) ((unsigned long)entry & RADIX_DAX_MASK) -#define RADIX_DAX_SECTOR(entry) (((unsigned long)entry >> RADIX_DAX_SHIFT)) -#define RADIX_DAX_ENTRY(sector, pmd) ((void *)((unsigned long)sector << \ - RADIX_DAX_SHIFT | (pmd ? RADIX_DAX_PMD : RADIX_DAX_PTE))) - static inline int radix_tree_is_indirect_ptr(void *ptr) { return (int)((unsigned long)ptr & RADIX_TREE_INDIRECT_PTR); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
* RE: [PATCH 1/3] DAX: move RADIX_DAX_ definitions to dax.c 2016-02-28 5:09 ` [PATCH 1/3] DAX: move RADIX_DAX_ definitions to dax.c NeilBrown @ 2016-02-29 14:28 ` Wilcox, Matthew R 2016-02-29 17:46 ` Ross Zwisler 0 siblings, 1 reply; 15+ messages in thread From: Wilcox, Matthew R @ 2016-02-29 14:28 UTC (permalink / raw) To: NeilBrown, Ross Zwisler, Andrew Morton, Jan Kara Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org I agree with this patch, but it's already part of the patchset that I'm working on, so merging this patch now would just introduce churn for me. -----Original Message----- From: NeilBrown [mailto:neilb@suse.com] Sent: Saturday, February 27, 2016 9:09 PM To: Ross Zwisler; Wilcox, Matthew R; Andrew Morton; Jan Kara Cc: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-mm@kvack.org Subject: [PATCH 1/3] DAX: move RADIX_DAX_ definitions to dax.c These don't belong in radix-tree.c any more than PAGECACHE_TAG_* do. Let's try to maintain the idea that radix-tree simply implements an abstract data type. Signed-off-by: NeilBrown <neilb@suse.com> --- fs/dax.c | 9 +++++++++ include/linux/radix-tree.h | 9 --------- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 711172450da6..9c4d697fb6fc 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -32,6 +32,15 @@ #include <linux/pfn_t.h> #include <linux/sizes.h> +#define RADIX_DAX_MASK 0xf +#define RADIX_DAX_SHIFT 4 +#define RADIX_DAX_PTE (0x4 | RADIX_TREE_EXCEPTIONAL_ENTRY) +#define RADIX_DAX_PMD (0x8 | RADIX_TREE_EXCEPTIONAL_ENTRY) +#define RADIX_DAX_TYPE(entry) ((unsigned long)entry & RADIX_DAX_MASK) +#define RADIX_DAX_SECTOR(entry) (((unsigned long)entry >> RADIX_DAX_SHIFT)) +#define RADIX_DAX_ENTRY(sector, pmd) ((void *)((unsigned long)sector << \ + RADIX_DAX_SHIFT | (pmd ? RADIX_DAX_PMD : RADIX_DAX_PTE))) + static long dax_map_atomic(struct block_device *bdev, struct blk_dax_ctl *dax) { struct request_queue *q = bdev->bd_queue; diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index f54be7082207..968150ab8a1c 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -51,15 +51,6 @@ #define RADIX_TREE_EXCEPTIONAL_ENTRY 2 #define RADIX_TREE_EXCEPTIONAL_SHIFT 2 -#define RADIX_DAX_MASK 0xf -#define RADIX_DAX_SHIFT 4 -#define RADIX_DAX_PTE (0x4 | RADIX_TREE_EXCEPTIONAL_ENTRY) -#define RADIX_DAX_PMD (0x8 | RADIX_TREE_EXCEPTIONAL_ENTRY) -#define RADIX_DAX_TYPE(entry) ((unsigned long)entry & RADIX_DAX_MASK) -#define RADIX_DAX_SECTOR(entry) (((unsigned long)entry >> RADIX_DAX_SHIFT)) -#define RADIX_DAX_ENTRY(sector, pmd) ((void *)((unsigned long)sector << \ - RADIX_DAX_SHIFT | (pmd ? RADIX_DAX_PMD : RADIX_DAX_PTE))) - static inline int radix_tree_is_indirect_ptr(void *ptr) { return (int)((unsigned long)ptr & RADIX_TREE_INDIRECT_PTR); ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] DAX: move RADIX_DAX_ definitions to dax.c 2016-02-29 14:28 ` Wilcox, Matthew R @ 2016-02-29 17:46 ` Ross Zwisler 0 siblings, 0 replies; 15+ messages in thread From: Ross Zwisler @ 2016-02-29 17:46 UTC (permalink / raw) To: Wilcox, Matthew R Cc: NeilBrown, Ross Zwisler, Andrew Morton, Jan Kara, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org On Mon, Feb 29, 2016 at 02:28:46PM +0000, Wilcox, Matthew R wrote: > I agree with this patch, but it's already part of the patchset that I'm > working on, so merging this patch now would just introduce churn for me. > > -----Original Message----- > From: NeilBrown [mailto:neilb@suse.com] > Sent: Saturday, February 27, 2016 9:09 PM > To: Ross Zwisler; Wilcox, Matthew R; Andrew Morton; Jan Kara > Cc: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-mm@kvack.org > Subject: [PATCH 1/3] DAX: move RADIX_DAX_ definitions to dax.c > > These don't belong in radix-tree.c any more than PAGECACHE_TAG_* do. > Let's try to maintain the idea that radix-tree simply implements an > abstract data type. Looks good. I'm fine with this change, whether it happens via this standalone patch or via Matthew's larger change set. > > Signed-off-by: NeilBrown <neilb@suse.com> > --- > fs/dax.c | 9 +++++++++ > include/linux/radix-tree.h | 9 --------- > 2 files changed, 9 insertions(+), 9 deletions(-) > > diff --git a/fs/dax.c b/fs/dax.c > index 711172450da6..9c4d697fb6fc 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -32,6 +32,15 @@ > #include <linux/pfn_t.h> > #include <linux/sizes.h> > > +#define RADIX_DAX_MASK 0xf > +#define RADIX_DAX_SHIFT 4 > +#define RADIX_DAX_PTE (0x4 | RADIX_TREE_EXCEPTIONAL_ENTRY) > +#define RADIX_DAX_PMD (0x8 | RADIX_TREE_EXCEPTIONAL_ENTRY) > +#define RADIX_DAX_TYPE(entry) ((unsigned long)entry & RADIX_DAX_MASK) > +#define RADIX_DAX_SECTOR(entry) (((unsigned long)entry >> RADIX_DAX_SHIFT)) > +#define RADIX_DAX_ENTRY(sector, pmd) ((void *)((unsigned long)sector << \ > + RADIX_DAX_SHIFT | (pmd ? RADIX_DAX_PMD : RADIX_DAX_PTE))) > + > static long dax_map_atomic(struct block_device *bdev, struct blk_dax_ctl *dax) > { > struct request_queue *q = bdev->bd_queue; > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index f54be7082207..968150ab8a1c 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -51,15 +51,6 @@ > #define RADIX_TREE_EXCEPTIONAL_ENTRY 2 > #define RADIX_TREE_EXCEPTIONAL_SHIFT 2 > > -#define RADIX_DAX_MASK 0xf > -#define RADIX_DAX_SHIFT 4 > -#define RADIX_DAX_PTE (0x4 | RADIX_TREE_EXCEPTIONAL_ENTRY) > -#define RADIX_DAX_PMD (0x8 | RADIX_TREE_EXCEPTIONAL_ENTRY) > -#define RADIX_DAX_TYPE(entry) ((unsigned long)entry & RADIX_DAX_MASK) > -#define RADIX_DAX_SECTOR(entry) (((unsigned long)entry >> RADIX_DAX_SHIFT)) > -#define RADIX_DAX_ENTRY(sector, pmd) ((void *)((unsigned long)sector << \ > - RADIX_DAX_SHIFT | (pmd ? RADIX_DAX_PMD : RADIX_DAX_PTE))) > - > static inline int radix_tree_is_indirect_ptr(void *ptr) > { > return (int)((unsigned long)ptr & RADIX_TREE_INDIRECT_PTR); > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/3] radix-tree: make 'indirect' bit available to exception entries. 2016-02-28 5:09 [PATCH 0/3] RFC improvements to radix-tree related to DAX NeilBrown 2016-02-28 5:09 ` [PATCH 3/3] radix-tree: support locking of individual exception entries NeilBrown 2016-02-28 5:09 ` [PATCH 1/3] DAX: move RADIX_DAX_ definitions to dax.c NeilBrown @ 2016-02-28 5:09 ` NeilBrown 2016-02-29 14:41 ` Wilcox, Matthew R 2 siblings, 1 reply; 15+ messages in thread From: NeilBrown @ 2016-02-28 5:09 UTC (permalink / raw) To: Ross Zwisler, Matthew Wilcox, Andrew Morton, Jan Kara Cc: linux-kernel, linux-fsdevel, linux-mm A pointer to a radix_tree_node will always have the 'exception' bit cleared, so if the exception bit is set the value cannot be an indirect pointer. Thus it is safe to make the 'indirect bit' available to store extra information in exception entries. This patch adds a 'PTR_MASK' and a value is only treated as an indirect (pointer) entry the 2 ls-bits are '01'. The change in radix-tree.c ensures the stored value still looks like an indirect pointer, and saves a load as well. We could swap the two bits and so keep all the exectional bits contigious. But I have other plans for that bit.... Signed-off-by: NeilBrown <neilb@suse.com> --- include/linux/radix-tree.h | 11 +++++++++-- lib/radix-tree.c | 2 +- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index 968150ab8a1c..450c12b546b7 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -40,8 +40,13 @@ * Indirect pointer in fact is also used to tag the last pointer of a node * when it is shrunk, before we rcu free the node. See shrink code for * details. + * + * To allow an exception entry to only lose one bit, we ignore + * the INDIRECT bit when the exception bit is set. So an entry is + * indirect if the least significant 2 bits are 01. */ #define RADIX_TREE_INDIRECT_PTR 1 +#define RADIX_TREE_INDIRECT_MASK 3 /* * A common use of the radix tree is to store pointers to struct pages; * but shmem/tmpfs needs also to store swap entries in the same tree: @@ -53,7 +58,8 @@ static inline int radix_tree_is_indirect_ptr(void *ptr) { - return (int)((unsigned long)ptr & RADIX_TREE_INDIRECT_PTR); + return ((unsigned long)ptr & RADIX_TREE_INDIRECT_MASK) + == RADIX_TREE_INDIRECT_PTR; } /*** radix-tree API starts here ***/ @@ -221,7 +227,8 @@ static inline void *radix_tree_deref_slot_protected(void **pslot, */ static inline int radix_tree_deref_retry(void *arg) { - return unlikely((unsigned long)arg & RADIX_TREE_INDIRECT_PTR); + return unlikely(((unsigned long)arg & RADIX_TREE_INDIRECT_MASK) + == RADIX_TREE_INDIRECT_PTR); } /** diff --git a/lib/radix-tree.c b/lib/radix-tree.c index 6b79e9026e24..37d4643ab5c0 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -1305,7 +1305,7 @@ static inline void radix_tree_shrink(struct radix_tree_root *root) * to force callers to retry. */ if (root->height == 0) - *((unsigned long *)&to_free->slots[0]) |= + *((unsigned long *)&to_free->slots[0]) = RADIX_TREE_INDIRECT_PTR; radix_tree_node_free(to_free); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
* RE: [PATCH 2/3] radix-tree: make 'indirect' bit available to exception entries. 2016-02-28 5:09 ` [PATCH 2/3] radix-tree: make 'indirect' bit available to exception entries NeilBrown @ 2016-02-29 14:41 ` Wilcox, Matthew R 2016-03-01 21:59 ` Ross Zwisler 0 siblings, 1 reply; 15+ messages in thread From: Wilcox, Matthew R @ 2016-02-29 14:41 UTC (permalink / raw) To: NeilBrown, Ross Zwisler, Andrew Morton, Jan Kara Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3723 bytes --] So based on the bottom two bits, we can tell what this entry is: 00 - data pointer 01 - indirect entry (pointer to another level of the radix tree) 10 - exceptional entry 11 - locked exceptional entry I was concerned that this patch would clash with the support for multi-order entries in the radix tree, but after some thought, I now believe that it doesn't. The multi-order entries changes permit finding data pointers or exceptional entries in the tree where before only indirect entries could be found, but with the changes to radix_tree_is_indirect_ptr below, everything should work fine. -----Original Message----- From: NeilBrown [mailto:neilb@suse.com] Sent: Saturday, February 27, 2016 9:09 PM To: Ross Zwisler; Wilcox, Matthew R; Andrew Morton; Jan Kara Cc: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-mm@kvack.org Subject: [PATCH 2/3] radix-tree: make 'indirect' bit available to exception entries. A pointer to a radix_tree_node will always have the 'exception' bit cleared, so if the exception bit is set the value cannot be an indirect pointer. Thus it is safe to make the 'indirect bit' available to store extra information in exception entries. This patch adds a 'PTR_MASK' and a value is only treated as an indirect (pointer) entry the 2 ls-bits are '01'. The change in radix-tree.c ensures the stored value still looks like an indirect pointer, and saves a load as well. We could swap the two bits and so keep all the exectional bits contigious. But I have other plans for that bit.... Signed-off-by: NeilBrown <neilb@suse.com> --- include/linux/radix-tree.h | 11 +++++++++-- lib/radix-tree.c | 2 +- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index 968150ab8a1c..450c12b546b7 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -40,8 +40,13 @@ * Indirect pointer in fact is also used to tag the last pointer of a node * when it is shrunk, before we rcu free the node. See shrink code for * details. + * + * To allow an exception entry to only lose one bit, we ignore + * the INDIRECT bit when the exception bit is set. So an entry is + * indirect if the least significant 2 bits are 01. */ #define RADIX_TREE_INDIRECT_PTR 1 +#define RADIX_TREE_INDIRECT_MASK 3 /* * A common use of the radix tree is to store pointers to struct pages; * but shmem/tmpfs needs also to store swap entries in the same tree: @@ -53,7 +58,8 @@ static inline int radix_tree_is_indirect_ptr(void *ptr) { - return (int)((unsigned long)ptr & RADIX_TREE_INDIRECT_PTR); + return ((unsigned long)ptr & RADIX_TREE_INDIRECT_MASK) + == RADIX_TREE_INDIRECT_PTR; } /*** radix-tree API starts here ***/ @@ -221,7 +227,8 @@ static inline void *radix_tree_deref_slot_protected(void **pslot, */ static inline int radix_tree_deref_retry(void *arg) { - return unlikely((unsigned long)arg & RADIX_TREE_INDIRECT_PTR); + return unlikely(((unsigned long)arg & RADIX_TREE_INDIRECT_MASK) + == RADIX_TREE_INDIRECT_PTR); } /** diff --git a/lib/radix-tree.c b/lib/radix-tree.c index 6b79e9026e24..37d4643ab5c0 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -1305,7 +1305,7 @@ static inline void radix_tree_shrink(struct radix_tree_root *root) * to force callers to retry. */ if (root->height == 0) - *((unsigned long *)&to_free->slots[0]) |= + *((unsigned long *)&to_free->slots[0]) = RADIX_TREE_INDIRECT_PTR; radix_tree_node_free(to_free); N§²æìr¸zǧu©²Æ {\béì¹»\x1c®&Þ)îÆi¢Ø^nr¶Ý¢j$½§$¢¸\x05¢¹¨è§~'.)îÄÃ,yèm¶ÿÃ\f%{±j+ðèצj)Z· ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] radix-tree: make 'indirect' bit available to exception entries. 2016-02-29 14:41 ` Wilcox, Matthew R @ 2016-03-01 21:59 ` Ross Zwisler 0 siblings, 0 replies; 15+ messages in thread From: Ross Zwisler @ 2016-03-01 21:59 UTC (permalink / raw) To: Wilcox, Matthew R Cc: NeilBrown, Ross Zwisler, Andrew Morton, Jan Kara, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org On Mon, Feb 29, 2016 at 02:41:55PM +0000, Wilcox, Matthew R wrote: > So based on the bottom two bits, we can tell what this entry is: > > 00 - data pointer > 01 - indirect entry (pointer to another level of the radix tree) > 10 - exceptional entry > 11 - locked exceptional entry > > I was concerned that this patch would clash with the support for multi-order > entries in the radix tree, but after some thought, I now believe that it > doesn't. The multi-order entries changes permit finding data pointers or > exceptional entries in the tree where before only indirect entries could be > found, but with the changes to radix_tree_is_indirect_ptr below, everything > should work fine. Yep, this seems workable to me. > -----Original Message----- > From: NeilBrown [mailto:neilb@suse.com] > Sent: Saturday, February 27, 2016 9:09 PM > To: Ross Zwisler; Wilcox, Matthew R; Andrew Morton; Jan Kara > Cc: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-mm@kvack.org > Subject: [PATCH 2/3] radix-tree: make 'indirect' bit available to exception entries. > > A pointer to a radix_tree_node will always have the 'exception' > bit cleared, so if the exception bit is set the value cannot > be an indirect pointer. Thus it is safe to make the 'indirect bit' > available to store extra information in exception entries. > > This patch adds a 'PTR_MASK' and a value is only treated as > an indirect (pointer) entry the 2 ls-bits are '01'. > > The change in radix-tree.c ensures the stored value still looks like an > indirect pointer, and saves a load as well. > > We could swap the two bits and so keep all the exectional bits contigious. > But I have other plans for that bit.... > > Signed-off-by: NeilBrown <neilb@suse.com> > --- > include/linux/radix-tree.h | 11 +++++++++-- > lib/radix-tree.c | 2 +- > 2 files changed, 10 insertions(+), 3 deletions(-) > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > index 968150ab8a1c..450c12b546b7 100644 > --- a/include/linux/radix-tree.h > +++ b/include/linux/radix-tree.h > @@ -40,8 +40,13 @@ > * Indirect pointer in fact is also used to tag the last pointer of a node > * when it is shrunk, before we rcu free the node. See shrink code for > * details. > + * > + * To allow an exception entry to only lose one bit, we ignore > + * the INDIRECT bit when the exception bit is set. So an entry is > + * indirect if the least significant 2 bits are 01. > */ > #define RADIX_TREE_INDIRECT_PTR 1 > +#define RADIX_TREE_INDIRECT_MASK 3 > /* > * A common use of the radix tree is to store pointers to struct pages; > * but shmem/tmpfs needs also to store swap entries in the same tree: > @@ -53,7 +58,8 @@ > > static inline int radix_tree_is_indirect_ptr(void *ptr) > { > - return (int)((unsigned long)ptr & RADIX_TREE_INDIRECT_PTR); > + return ((unsigned long)ptr & RADIX_TREE_INDIRECT_MASK) > + == RADIX_TREE_INDIRECT_PTR; > } > > /*** radix-tree API starts here ***/ > @@ -221,7 +227,8 @@ static inline void *radix_tree_deref_slot_protected(void **pslot, > */ > static inline int radix_tree_deref_retry(void *arg) > { > - return unlikely((unsigned long)arg & RADIX_TREE_INDIRECT_PTR); > + return unlikely(((unsigned long)arg & RADIX_TREE_INDIRECT_MASK) > + == RADIX_TREE_INDIRECT_PTR); > } > > /** > diff --git a/lib/radix-tree.c b/lib/radix-tree.c > index 6b79e9026e24..37d4643ab5c0 100644 > --- a/lib/radix-tree.c > +++ b/lib/radix-tree.c > @@ -1305,7 +1305,7 @@ static inline void radix_tree_shrink(struct radix_tree_root *root) > * to force callers to retry. > */ > if (root->height == 0) > - *((unsigned long *)&to_free->slots[0]) |= > + *((unsigned long *)&to_free->slots[0]) = > RADIX_TREE_INDIRECT_PTR; > > radix_tree_node_free(to_free); > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2016-03-09 2:13 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-02-28 5:09 [PATCH 0/3] RFC improvements to radix-tree related to DAX NeilBrown 2016-02-28 5:09 ` [PATCH 3/3] radix-tree: support locking of individual exception entries NeilBrown 2016-02-28 5:30 ` kbuild test robot 2016-02-28 6:27 ` NeilBrown 2016-03-03 13:10 ` Jan Kara 2016-03-03 23:51 ` NeilBrown 2016-03-04 10:14 ` NeilBrown 2016-03-04 12:31 ` Jan Kara 2016-03-09 2:13 ` NeilBrown 2016-02-28 5:09 ` [PATCH 1/3] DAX: move RADIX_DAX_ definitions to dax.c NeilBrown 2016-02-29 14:28 ` Wilcox, Matthew R 2016-02-29 17:46 ` Ross Zwisler 2016-02-28 5:09 ` [PATCH 2/3] radix-tree: make 'indirect' bit available to exception entries NeilBrown 2016-02-29 14:41 ` Wilcox, Matthew R 2016-03-01 21:59 ` Ross Zwisler
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).