* [PATCH] Problems with RAID 4/5/6 and kmem_cache
@ 2013-09-20 3:15 Jonathan Brassow
2013-09-20 3:16 ` [PATCH] RAID5: Change kmem_cache name string of RAID 4/5/6 stripe cache Jonathan Brassow
2013-09-21 21:56 ` [PATCH] Problems with RAID 4/5/6 and kmem_cache Christoph Lameter
0 siblings, 2 replies; 7+ messages in thread
From: Jonathan Brassow @ 2013-09-20 3:15 UTC (permalink / raw)
To: linux-raid; +Cc: linux-mm, cl, Jonathan Brassow
I'm sending a patch that changes the name string used with
kmem_cache_create. While I believe this is a bug in the kmem_cache
implementation, it doesn't hurt to work-around it in this simple way.
The problem with kmem_cache* is this:
*) Assume CONFIG_SLUB is set
1) kmem_cache_create(name="foo-a")
- creates new kmem_cache structure
2) kmem_cache_create(name="foo-b")
- If identical cache characteristics, it will be merged with the previously
created cache associated with "foo-a". The cache's refcount will be
incremented and an alias will be created via sysfs_slab_alias().
3) kmem_cache_destroy(<ptr>)
- Attempting to destroy cache associated with "foo-a", but instead the
refcount is simply decremented. I don't even think the sysfs aliases are
ever removed...
4) kmem_cache_create(name="foo-a")
- This FAILS because kmem_cache_sanity_check colides with the existing
name ("foo-a") associated with the non-removed cache.
This is a problem for RAID (specifically dm-raid) because the name used
for the kmem_cache_create is ("raid%d-%p", level, mddev). If the cache
persists for long enough, the memory address of an old mddev will be
reused for a new mddev - causing an identical formulation of the cache
name. Even though kmem_cache_destory had long ago been used to delete
the old cache, the merging of caches has cause the name and cache of that
old instance to be preserved and causes a colision (and thus failure) in
kmem_cache_create(). I see this regularly in my testing.
I haven't tried to reproduce this using MD-specific tools, but I would
think it would be even easier to reproduce there because of the cache
name being used. (Perhaps create two similar RAID4/5/6 arrays. Remove
the first one and then try to recreate the first one. The cache should
stay and the re-use of the name should collide.)
There are a few ways I can think of to correct this bug in kmem_cache,
but none of them seem that clean.
1) force kmem_cache_destroy to be called with a name so that the
proper alias can be removed (and the name of the cache possibly
updated).
2) Change structures around so that we return something small from
kmem_cache_create that contains a name and pointer to the mergable
cache. If new caches are mergable with existing ones, then we
only have to create the small structure. Having that pointer allows
us to properly remove the reference and corresponding name when
calling kmem_cache_destroy().
Perhaps there are cleaner options. In the meantime, please accept my
MD RAID4/5/6 workaround patch.
thanks,
brassow
Jonathan Brassow (1):
RAID5: Change kmem_cache name string of RAID 4/5/6 stripe cache
drivers/md/raid5.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] RAID5: Change kmem_cache name string of RAID 4/5/6 stripe cache
2013-09-20 3:15 [PATCH] Problems with RAID 4/5/6 and kmem_cache Jonathan Brassow
@ 2013-09-20 3:16 ` Jonathan Brassow
2013-10-10 22:50 ` H. Peter Anvin
2013-09-21 21:56 ` [PATCH] Problems with RAID 4/5/6 and kmem_cache Christoph Lameter
1 sibling, 1 reply; 7+ messages in thread
From: Jonathan Brassow @ 2013-09-20 3:16 UTC (permalink / raw)
To: linux-raid; +Cc: linux-mm, cl, Jonathan Brassow
The unique portion of the kmem_cache name used when dm-raid is creating
a RAID 4/5/6 array is the memory address of it's associated 'mddev'
structure. This is not always unique. The memory associated
with the 'mddev' structure can be freed and a future 'mddev' structure
can be allocated from the exact same spot. This causes an identical
name to the old cache to be created when kmem_cache_create is called.
If an old name is still present amoung slab_caches due to cache merging,
the call will fail. This is not theoretical, I see this regularly when
performing device-mapper RAID 4/5/6 tests (although, strangely only on
Fedora-19).
Making the unique portion of the kmem_cache name based on jiffies fixes
this problem.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
---
drivers/md/raid5.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7ff4f25..f731ce9 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1618,7 +1618,7 @@ static int grow_stripes(struct r5conf *conf, int num)
"raid%d-%s", conf->level, mdname(conf->mddev));
else
sprintf(conf->cache_name[0],
- "raid%d-%p", conf->level, conf->mddev);
+ "raid%d-%llu", conf->level, get_jiffies_64());
sprintf(conf->cache_name[1], "%s-alt", conf->cache_name[0]);
conf->active_name = 0;
--
1.7.7.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] RAID5: Change kmem_cache name string of RAID 4/5/6 stripe cache
2013-09-20 3:16 ` [PATCH] RAID5: Change kmem_cache name string of RAID 4/5/6 stripe cache Jonathan Brassow
@ 2013-10-10 22:50 ` H. Peter Anvin
2013-10-11 2:37 ` Brassow Jonathan
0 siblings, 1 reply; 7+ messages in thread
From: H. Peter Anvin @ 2013-10-10 22:50 UTC (permalink / raw)
To: Jonathan Brassow; +Cc: linux-raid, linux-mm, cl
On 09/19/2013 08:16 PM, Jonathan Brassow wrote:
> The unique portion of the kmem_cache name used when dm-raid is creating
> a RAID 4/5/6 array is the memory address of it's associated 'mddev'
> structure. This is not always unique. The memory associated
> with the 'mddev' structure can be freed and a future 'mddev' structure
> can be allocated from the exact same spot. This causes an identical
> name to the old cache to be created when kmem_cache_create is called.
> If an old name is still present amoung slab_caches due to cache merging,
> the call will fail. This is not theoretical, I see this regularly when
> performing device-mapper RAID 4/5/6 tests (although, strangely only on
> Fedora-19).
>
> Making the unique portion of the kmem_cache name based on jiffies fixes
> this problem.
>
> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
> ---
> drivers/md/raid5.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 7ff4f25..f731ce9 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -1618,7 +1618,7 @@ static int grow_stripes(struct r5conf *conf, int num)
> "raid%d-%s", conf->level, mdname(conf->mddev));
> else
> sprintf(conf->cache_name[0],
> - "raid%d-%p", conf->level, conf->mddev);
> + "raid%d-%llu", conf->level, get_jiffies_64());
> sprintf(conf->cache_name[1], "%s-alt", conf->cache_name[0]);
>
> conf->active_name = 0;
>
And it is not possible to create two inside the same jiffy? Seems
unlikely at best.
Why not just use a simple counter?
-hpa
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] RAID5: Change kmem_cache name string of RAID 4/5/6 stripe cache
2013-10-10 22:50 ` H. Peter Anvin
@ 2013-10-11 2:37 ` Brassow Jonathan
0 siblings, 0 replies; 7+ messages in thread
From: Brassow Jonathan @ 2013-10-11 2:37 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: linux-raid, linux-mm, cl
I think I can NACK this patch. It was fixed properly in the kmem code.
brassow
On Oct 10, 2013, at 5:50 PM, H. Peter Anvin wrote:
> On 09/19/2013 08:16 PM, Jonathan Brassow wrote:
>> The unique portion of the kmem_cache name used when dm-raid is creating
>> a RAID 4/5/6 array is the memory address of it's associated 'mddev'
>> structure. This is not always unique. The memory associated
>> with the 'mddev' structure can be freed and a future 'mddev' structure
>> can be allocated from the exact same spot. This causes an identical
>> name to the old cache to be created when kmem_cache_create is called.
>> If an old name is still present amoung slab_caches due to cache merging,
>> the call will fail. This is not theoretical, I see this regularly when
>> performing device-mapper RAID 4/5/6 tests (although, strangely only on
>> Fedora-19).
>>
>> Making the unique portion of the kmem_cache name based on jiffies fixes
>> this problem.
>>
>> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
>> ---
>> drivers/md/raid5.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 7ff4f25..f731ce9 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -1618,7 +1618,7 @@ static int grow_stripes(struct r5conf *conf, int num)
>> "raid%d-%s", conf->level, mdname(conf->mddev));
>> else
>> sprintf(conf->cache_name[0],
>> - "raid%d-%p", conf->level, conf->mddev);
>> + "raid%d-%llu", conf->level, get_jiffies_64());
>> sprintf(conf->cache_name[1], "%s-alt", conf->cache_name[0]);
>>
>> conf->active_name = 0;
>>
>
> And it is not possible to create two inside the same jiffy? Seems
> unlikely at best.
>
> Why not just use a simple counter?
>
> -hpa
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Problems with RAID 4/5/6 and kmem_cache
2013-09-20 3:15 [PATCH] Problems with RAID 4/5/6 and kmem_cache Jonathan Brassow
2013-09-20 3:16 ` [PATCH] RAID5: Change kmem_cache name string of RAID 4/5/6 stripe cache Jonathan Brassow
@ 2013-09-21 21:56 ` Christoph Lameter
2013-09-28 6:49 ` Pekka Enberg
1 sibling, 1 reply; 7+ messages in thread
From: Christoph Lameter @ 2013-09-21 21:56 UTC (permalink / raw)
To: Jonathan Brassow; +Cc: linux-raid, linux-mm, Pekka Enberg
On Thu, 19 Sep 2013, Jonathan Brassow wrote:
> 4) kmem_cache_create(name="foo-a")
> - This FAILS because kmem_cache_sanity_check colides with the existing
> name ("foo-a") associated with the non-removed cache.
That should not happen. breakage you see will result. Oh. I see the move
to common code resulted in the SLAB checks being used for SLUB.
The following patch should fix this.
Subject: slab_common: Do not check for duplicate slab names
SLUB can alias multiple slab kmem_create_requests to one slab cache
to save memory and increase the cache hotness. As a result the name
of the slab can be stale. Only check the name for duplicates if we are
in debug mode where we do not merge multiple caches.
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/mm/slab_common.c
===================================================================
--- linux.orig/mm/slab_common.c 2013-09-20 11:49:13.052208294 -0500
+++ linux/mm/slab_common.c 2013-09-21 16:55:23.097131481 -0500
@@ -56,6 +56,7 @@
continue;
}
+#if !defined(CONFIG_SLUB) || !defined(CONFIG_SLUB_DEBUG_ON)
/*
* For simplicity, we won't check this in the list of memcg
* caches. We have control over memcg naming, and if there
@@ -69,6 +70,7 @@
s = NULL;
return -EINVAL;
}
+#endif
}
WARN_ON(strchr(name, ' ')); /* It confuses parsers */
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Problems with RAID 4/5/6 and kmem_cache
2013-09-21 21:56 ` [PATCH] Problems with RAID 4/5/6 and kmem_cache Christoph Lameter
@ 2013-09-28 6:49 ` Pekka Enberg
2013-10-02 14:43 ` Christoph Lameter
0 siblings, 1 reply; 7+ messages in thread
From: Pekka Enberg @ 2013-09-28 6:49 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Jonathan Brassow, linux-raid, linux-mm@kvack.org
On Sun, Sep 22, 2013 at 12:56 AM, Christoph Lameter <cl@linux.com> wrote:
> On Thu, 19 Sep 2013, Jonathan Brassow wrote:
>
>> 4) kmem_cache_create(name="foo-a")
>> - This FAILS because kmem_cache_sanity_check colides with the existing
>> name ("foo-a") associated with the non-removed cache.
>
> That should not happen. breakage you see will result. Oh. I see the move
> to common code resulted in the SLAB checks being used for SLUB.
>
> The following patch should fix this.
>
> Subject: slab_common: Do not check for duplicate slab names
>
> SLUB can alias multiple slab kmem_create_requests to one slab cache
> to save memory and increase the cache hotness. As a result the name
> of the slab can be stale. Only check the name for duplicates if we are
> in debug mode where we do not merge multiple caches.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/mm/slab_common.c
> ===================================================================
> --- linux.orig/mm/slab_common.c 2013-09-20 11:49:13.052208294 -0500
> +++ linux/mm/slab_common.c 2013-09-21 16:55:23.097131481 -0500
> @@ -56,6 +56,7 @@
> continue;
> }
>
> +#if !defined(CONFIG_SLUB) || !defined(CONFIG_SLUB_DEBUG_ON)
> /*
> * For simplicity, we won't check this in the list of memcg
> * caches. We have control over memcg naming, and if there
> @@ -69,6 +70,7 @@
> s = NULL;
> return -EINVAL;
> }
> +#endif
> }
>
> WARN_ON(strchr(name, ' ')); /* It confuses parsers */
Applied to slab/urgent, thanks!
Do we need to come up with something less #ifdeffy for v3.13?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Problems with RAID 4/5/6 and kmem_cache
2013-09-28 6:49 ` Pekka Enberg
@ 2013-10-02 14:43 ` Christoph Lameter
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2013-10-02 14:43 UTC (permalink / raw)
To: Pekka Enberg; +Cc: Jonathan Brassow, linux-raid, linux-mm@kvack.org
On Sat, 28 Sep 2013, Pekka Enberg wrote:
> Do we need to come up with something less #ifdeffy for v3.13?
It would be nice to have something that also checks the runtime debug
configuration. But so far debugging is only switchable at runtime for SLUB
and not for SLAB. We could get that when we unify the object debugging in
both allocators.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-10-11 2:37 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-20 3:15 [PATCH] Problems with RAID 4/5/6 and kmem_cache Jonathan Brassow
2013-09-20 3:16 ` [PATCH] RAID5: Change kmem_cache name string of RAID 4/5/6 stripe cache Jonathan Brassow
2013-10-10 22:50 ` H. Peter Anvin
2013-10-11 2:37 ` Brassow Jonathan
2013-09-21 21:56 ` [PATCH] Problems with RAID 4/5/6 and kmem_cache Christoph Lameter
2013-09-28 6:49 ` Pekka Enberg
2013-10-02 14:43 ` Christoph Lameter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).