* [dpdk-dev] [Bug 260] bugDPDK lock-free hash deletion
@ 2019-04-30 6:51 bugzilla
0 siblings, 0 replies; only message in thread
From: bugzilla @ 2019-04-30 6:51 UTC (permalink / raw)
To: dev
https://bugs.dpdk.org/show_bug.cgi?id=260
Bug ID: 260
Summary: bugDPDK lock-free hash deletion
Product: DPDK
Version: 18.11
Hardware: All
OS: All
Status: CONFIRMED
Severity: normal
Priority: Normal
Component: other
Assignee: dev@dpdk.org
Reporter: zhongdahulinfan@163.com
Target Milestone: ---
lock-free版本的哈希表,在创建的时候指定了flag:
RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF | RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD
以使哈希表支持multi-writer,RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD标记使得每个lcore都可以使用local
cache分配key slot:
struct rte_hash *
rte_hash_create(const struct rte_hash_parameters *params)
{
...
if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD) {
use_local_cache = 1;
writer_takes_lock = 1;
}
...
/* Store all keys and leave the first entry as a dummy entry for
lookup_bulk */
if (use_local_cache)
/*
* Increase number of slots by total number of indices
* that can be stored in the lcore caches
* except for the first cache
*/
num_key_slots = params->entries + (RTE_MAX_LCORE - 1) *
(LCORE_CACHE_SIZE - 1) + 1;
else
num_key_slots = params->entries + 1;
...
}
这么做的好处是,每个写者可以从本地cache分配key slot,可减少cache miss,提升哈希插入的性能。
这里要先说一下rte_hash的key和data的存储的结构:
| dummy | pdata + key | pdata + key | pdata + key | pdata + key | pdata + key
| pdata + key | pdata + key | pdata + key |...
0 | 1 2 3
4 5 6
7 8 |
| <--------------
bucket
----------> |
struct rte_hash里的成员key_store就是上述数组,存放key的内容和data指针,其中 index 0没被使用,有效数据从index 1
开始。该数组被划分成若干个bucket,每个bucket的大小为8。这么做是有原因的,rte_hash使用cuckoo
哈希实现,引入bucket解决哈希冲突,而非开链。以写为例,在写哈希表的时候,先哈希到primary
bucket,再循环遍历该bucket找到存储位置,如若有空位则插入,否则继续找secondary bucket。
结构体定义如下:
/* Structure that stores key-value pair */
struct rte_hash_key {
union {
uintptr_t idata;
void *pdata;
};
/* Variable key size */
char key[0];
};
/** Bucket structure */
struct rte_hash_bucket {
uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
void *next;
} __rte_cache_aligned;
然后这些key index,用一个rte_ring来存储:
struct rte_hash *
rte_hash_create(const struct rte_hash_parameters *params)
{
...
/* Populate free slots ring. Entry zero is reserved for key misses. */
for (i = 1; i < num_key_slots; i++)
rte_ring_sp_enqueue(r, (void *)((uintptr_t) i));
...
}
当使用lock-free哈希表时,删除一个表项的时候key和value采用延后回收内存的策略,使得multi-readers访问哈希表不需要加锁,大大减少多核应用程序的性能损耗。我们调用dpdk
API rte_hash_del_key_xxx删除表项时,只将表项从哈希表中摘除,返回一个key的存储位置position,就是上述所说的key
index,然后在所有引用该表项的readers/writers都退出引用后,再根据position回收key和value的存储空间。key的回收调用一下接口:
int __rte_experimental
rte_hash_free_key_with_position(const struct rte_hash *h,
const int32_t position)
{
RETURN_IF_TRUE(((h == NULL) || (position == EMPTY_SLOT)), -EINVAL);
unsigned int lcore_id, n_slots;
struct lcore_cache *cached_free_slots;
const int32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
/* Out of bounds */
if (position >= total_entries)
return -EINVAL;
if (h->use_local_cache) {
lcore_id = rte_lcore_id();
cached_free_slots = &h->local_free_slots[lcore_id];
/* Cache full, need to free it. */
if (cached_free_slots->len == LCORE_CACHE_SIZE) {
/* Need to enqueue the free slots in global ring. */
n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
cached_free_slots->objs,
LCORE_CACHE_SIZE, NULL);
cached_free_slots->len -= n_slots;
}
/* Put index of new free slot in cache. */
cached_free_slots->objs[cached_free_slots->len] =
(void *)((uintptr_t)position);
cached_free_slots->len++;
} else {
rte_ring_sp_enqueue(h->free_slots,
(void *)((uintptr_t)position));
}
return 0;
}
仔细看rte_hash的实现,不难发现上述函数有两个地方存在问题:
1、"Out of bounds" 的判断逻辑
这个 "Out of bounds" 的判断逻辑,在哈希表的use_local_cache标识没被置为的时候是成立的,key
index的数量恰好是哈希表entries的数量。但当use_local_cache为真时,它就不正确了。回看一下创建哈希表的函数rte_hash_create,其中key
slots的计算:
/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
if (use_local_cache)
/*
* Increase number of slots by total number of indices
* that can be stored in the lcore caches
* except for the first cache
*/
num_key_slots = params->entries + (RTE_MAX_LCORE - 1) *
(LCORE_CACHE_SIZE - 1) + 1;
else
num_key_slots = params->entries + 1;
这时候除了分配entries个key内存空间,还要给每个lcore分配LCORE_CACHE_SIZE数量的key空间,那么此时key的数量是会大于哈希表的total
entry的,所以 rte_hash_free_key_with_position的"Out of bounds"判断逻辑有误。
2、position参数问题
position是rte_hash_del_key_xxx()的返回值,为 (key_idx - 1)。前面说过key的index
0未被使用,从1开始有效。那么直接将position enqueue到free_slot队列是不正确的:
rte_ring_sp_enqueue(h->free_slots,
(void *)((uintptr_t)position));
这会导致ring队列里可能存在多个值为0的position,从而损坏ring队列。以ring的size是4为例:
ring初始状态:
|1 | 2 | 3 | 4 |
dequeue完所有key_idx后,返回position为 0,1,2,3,再enqueue回ring队列
一趟dequeue enqueue过后:
| 0 | 1 | 2 | 3 |
dequeue完所有key_idx后,返回position为 0,1,2(其中key_idx 0是无效的,不被使用),再enqueue回ring队列
两趟dequeue enqueue过后:
| 0 | 0 | 1 | 2 |
对比非lock-free的key回收接口,可以发现,remove_entry()是直接将key_indx入队的,所以不存在问题:
static inline void
remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
{
unsigned lcore_id, n_slots;
struct lcore_cache *cached_free_slots;
if (h->use_local_cache) {
lcore_id = rte_lcore_id();
cached_free_slots = &h->local_free_slots[lcore_id];
/* Cache full, need to free it. */
if (cached_free_slots->len == LCORE_CACHE_SIZE) {
/* Need to enqueue the free slots in global ring. */
n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
cached_free_slots->objs,
LCORE_CACHE_SIZE, NULL);
cached_free_slots->len -= n_slots;
}
/* Put index of new free slot in cache. */
cached_free_slots->objs[cached_free_slots->len] =
(void *)((uintptr_t)bkt->key_idx[i]);
cached_free_slots->len++;
} else {
rte_ring_sp_enqueue(h->free_slots,
(void *)((uintptr_t)bkt->key_idx[i]));
}
}
修复方法
1、修改"Out of bounds" 的判断逻辑
2、position在enqueue回free_slots队列之前加1
--
You are receiving this mail because:
You are the assignee for the bug.
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2019-04-30 6:51 UTC | newest]
Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-04-30 6:51 [dpdk-dev] [Bug 260] bugDPDK lock-free hash deletion bugzilla
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.