All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
@ 2006-08-07 22:36 Daniel Kobras
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Kobras @ 2006-08-07 22:36 UTC (permalink / raw)
  To: dm-devel

Implement private fallback if immediate allocation from mempool fails.
Standard mempool_alloc() fallback can yield a deadlock when only the
calling process is able to refill the pool. In out-of-memory situations,
instead of waiting for itself, kmirrord now waits for someone else to
free some space, using a standard blocking allocation.

Signed-off-by: Daniel Kobras <kobras@linux.de>
---
[long story]

Hi!

On an nForce4-equipped machine with two SATA disk in raid1 setup using
dmraid, we experienced frequent deadlock of the system under high i/o
load. 'cat /dev/zero > ~/zero' was the most reliable way to reproduce
them: Randomly after a few GB, 'cp' would be left in 'D' state along
with kjournald and kmirrord. The functions cp and kjournald were blocked
in did vary, but kmirrord's wchan always pointed to 'mempool_alloc()'.
We've seen this pattern on 2.6.15 and 2.6.17 kernels.
http://lkml.org/lkml/2005/4/20/142 indicates that this problem has been
around even before.

So much for the facts, here's my interpretation: mempool_alloc() first
tries to atomically allocate the requested memory, or falls back to hand
out preallocated chunks from the mempool. If both fail, it puts the
calling process (kmirrord in this case) on a private waitqueue until
somebody refills the pool. Where the only 'somebody' is kmirrord itself,
so we have a deadlock.

I worked around this problem by falling back to a (blocking) kmalloc
when before kmirrord would have ended up on the waitqueue. This defeats
part of the benefits of using the mempool, but at least keeps the system
running. And it could be done with a two-line change. Note that
mempool_alloc() clears the GFP_NOIO flag internally, and only uses it to
decide whether to wait or return an error if immediate allocation fails,
so the attached patch doesn't change behaviour in the non-deadlocking case.
Path is against current git (2.6.18-rc4), but should apply to earlier
versions as well. I've tested on 2.6.15, where this patch makes the
difference between random lockup and a stable system.

Regards,

Daniel.

diff -r dcc321d1340a -r d52bb3a14d60 drivers/md/dm-raid1.c
--- a/drivers/md/dm-raid1.c	Sun Aug 06 19:00:05 2006 +0000
+++ b/drivers/md/dm-raid1.c	Mon Aug 07 23:16:44 2006 +0200
@@ -255,7 +255,9 @@ static struct region *__rh_alloc(struct 
 	struct region *reg, *nreg;
 
 	read_unlock(&rh->hash_lock);
-	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
+	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
+	if (unlikely(!nreg))
+		nreg = kmalloc(sizeof(struct region), GFP_NOIO);
 	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
 		RH_CLEAN : RH_NOSYNC;
 	nreg->rh = rh;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
@ 2006-08-09 16:44 Daniel Kobras
  2006-08-12 20:02 ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Kobras @ 2006-08-09 16:44 UTC (permalink / raw)
  To: dm-devel; +Cc: linux-kernel

Implement private fallback if immediate allocation from mempool fails.
Standard mempool_alloc() fallback can yield a deadlock when only the
calling process is able to refill the pool. In out-of-memory situations,
instead of waiting for itself, kmirrord now waits for someone else to
free some space, using a standard blocking allocation.

Signed-off-by: Daniel Kobras <kobras@linux.de>
---
[Resending with Cc to l-k. First attempt apparently hasn't made it through to 
dm-devel.]

Hi!

On an nForce4-equipped machine with two SATA disk in raid1 setup using
dmraid, we experienced frequent deadlock of the system under high i/o
load. 'cat /dev/zero > ~/zero' was the most reliable way to reproduce
them: Randomly after a few GB, 'cp' would be left in 'D' state along
with kjournald and kmirrord. The functions cp and kjournald were blocked
in did vary, but kmirrord's wchan always pointed to 'mempool_alloc()'.
We've seen this pattern on 2.6.15 and 2.6.17 kernels.
http://lkml.org/lkml/2005/4/20/142 indicates that this problem has been
around even before.

So much for the facts, here's my interpretation: mempool_alloc() first
tries to atomically allocate the requested memory, or falls back to hand
out preallocated chunks from the mempool. If both fail, it puts the
calling process (kmirrord in this case) on a private waitqueue until
somebody refills the pool. Where the only 'somebody' is kmirrord itself,
so we have a deadlock.

I worked around this problem by falling back to a (blocking) kmalloc
when before kmirrord would have ended up on the waitqueue. This defeats
part of the benefits of using the mempool, but at least keeps the system
running. And it could be done with a two-line change. Note that
mempool_alloc() clears the GFP_NOIO flag internally, and only uses it to
decide whether to wait or return an error if immediate allocation fails,
so the attached patch doesn't change behaviour in the non-deadlocking case.
Path is against current git (2.6.18-rc4), but should apply to earlier
versions as well. I've tested on 2.6.15, where this patch makes the
difference between random lockup and a stable system.

Regards,

Daniel.

diff -r dcc321d1340a -r d52bb3a14d60 drivers/md/dm-raid1.c
--- a/drivers/md/dm-raid1.c	Sun Aug 06 19:00:05 2006 +0000
+++ b/drivers/md/dm-raid1.c	Mon Aug 07 23:16:44 2006 +0200
@@ -255,7 +255,9 @@ static struct region *__rh_alloc(struct 
 	struct region *reg, *nreg;
 
 	read_unlock(&rh->hash_lock);
-	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
+	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
+	if (unlikely(!nreg))
+		nreg = kmalloc(sizeof(struct region), GFP_NOIO);
 	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
 		RH_CLEAN : RH_NOSYNC;
 	nreg->rh = rh;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
  2006-08-09 16:44 Daniel Kobras
@ 2006-08-12 20:02 ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2006-08-12 20:02 UTC (permalink / raw)
  To: Daniel Kobras; +Cc: dm-devel, linux-kernel

On Wed, 9 Aug 2006 18:44:21 +0200
Daniel Kobras <kobras@linux.de> wrote:

> Implement private fallback if immediate allocation from mempool fails.
> Standard mempool_alloc() fallback can yield a deadlock when only the
> calling process is able to refill the pool. In out-of-memory situations,
> instead of waiting for itself, kmirrord now waits for someone else to
> free some space, using a standard blocking allocation.
> 
> Signed-off-by: Daniel Kobras <kobras@linux.de>
> ---
> [Resending with Cc to l-k. First attempt apparently hasn't made it through to 
> dm-devel.]
> 
> Hi!
> 
> On an nForce4-equipped machine with two SATA disk in raid1 setup using
> dmraid, we experienced frequent deadlock of the system under high i/o
> load. 'cat /dev/zero > ~/zero' was the most reliable way to reproduce
> them: Randomly after a few GB, 'cp' would be left in 'D' state along
> with kjournald and kmirrord. The functions cp and kjournald were blocked
> in did vary, but kmirrord's wchan always pointed to 'mempool_alloc()'.
> We've seen this pattern on 2.6.15 and 2.6.17 kernels.
> http://lkml.org/lkml/2005/4/20/142 indicates that this problem has been
> around even before.
> 
> So much for the facts, here's my interpretation: mempool_alloc() first
> tries to atomically allocate the requested memory, or falls back to hand
> out preallocated chunks from the mempool. If both fail, it puts the
> calling process (kmirrord in this case) on a private waitqueue until
> somebody refills the pool. Where the only 'somebody' is kmirrord itself,
> so we have a deadlock.

Right.  That's a design error in DM.  mempools are only supposed to be used
in situations where we *know* that if we wait for a bit, some elements will
be returned to the pool.  Obviously, if the only thread in the machine
which can release elements is the one which is waiting for thse elements,
we die.

> I worked around this problem by falling back to a (blocking) kmalloc
> when before kmirrord would have ended up on the waitqueue. This defeats
> part of the benefits of using the mempool, but at least keeps the system
> running. And it could be done with a two-line change. Note that
> mempool_alloc() clears the GFP_NOIO flag internally, and only uses it to
> decide whether to wait or return an error if immediate allocation fails,
> so the attached patch doesn't change behaviour in the non-deadlocking case.
> Path is against current git (2.6.18-rc4), but should apply to earlier
> versions as well. I've tested on 2.6.15, where this patch makes the
> difference between random lockup and a stable system.
> 
> Regards,
> 
> Daniel.
> 
> diff -r dcc321d1340a -r d52bb3a14d60 drivers/md/dm-raid1.c
> --- a/drivers/md/dm-raid1.c	Sun Aug 06 19:00:05 2006 +0000
> +++ b/drivers/md/dm-raid1.c	Mon Aug 07 23:16:44 2006 +0200
> @@ -255,7 +255,9 @@ static struct region *__rh_alloc(struct 
>  	struct region *reg, *nreg;
>  
>  	read_unlock(&rh->hash_lock);
> -	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
> +	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
> +	if (unlikely(!nreg))
> +		nreg = kmalloc(sizeof(struct region), GFP_NOIO);
>  	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
>  		RH_CLEAN : RH_NOSYNC;
>  	nreg->rh = rh;

Yes, that'll fix it.  It's rather nasty to be allocating elements with
kmalloc and then sending them back with mempool_free().  But it'll work.

Alasdair, I'd say that this is a 2.6.18 fix and a 2.6.17.x backport.

A longer-term fix might be to stop using mempools in there, just use
kmalloc.  Or move the mempool_free()ing out of kmorrord context and into
IO-completion context, perhaps.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
@ 2007-08-13 11:33 ` Heiko Carstens
  0 siblings, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2007-08-13 11:33 UTC (permalink / raw)
  To: linux-mm, dm-devel
  Cc: Stefan Weinhuber, Stefan Bader, Daniel Kobras, Andrew Morton,
	Linus Torvalds, Alasdair G Kergon

Hi,

the patch below went into 2.6.18. Now my question is: why doesn't it check
if kmalloc(..., GFP_NOIO) returns with a NULL pointer?
Did I miss anything that guarentees that this will always succeed or is it
just a bug?

commit c06aad854fdb9da38fcc22dccfe9d72919453e43
Author: Daniel Kobras <kobras@linux.de>
Date:   Sun Aug 27 01:23:24 2006 -0700

    [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
    
    On an nForce4-equipped machine with two SATA disk in raid1 setup using dmraid,
    we experienced frequent deadlock of the system under high i/o load.  'cat
    /dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly
    after a few GB, 'cp' would be left in 'D' state along with kjournald and
    kmirrord.  The functions cp and kjournald were blocked in did vary, but
    kmirrord's wchan always pointed to 'mempool_alloc()'.  We've seen this pattern
    on 2.6.15 and 2.6.17 kernels.  http://lkml.org/lkml/2005/4/20/142 indicates
    that this problem has been around even before.
    
    So much for the facts, here's my interpretation: mempool_alloc() first tries
    to atomically allocate the requested memory, or falls back to hand out
    preallocated chunks from the mempool.  If both fail, it puts the calling
    process (kmirrord in this case) on a private waitqueue until somebody refills
    the pool.  Where the only 'somebody' is kmirrord itself, so we have a
    deadlock.
    
    I worked around this problem by falling back to a (blocking) kmalloc when
    before kmirrord would have ended up on the waitqueue.  This defeats part of
    the benefits of using the mempool, but at least keeps the system running.  And
    it could be done with a two-line change.  Note that mempool_alloc() clears the
    GFP_NOIO flag internally, and only uses it to decide whether to wait or return
    an error if immediate allocation fails, so the attached patch doesn't change
    behaviour in the non-deadlocking case.  Path is against current git
    (2.6.18-rc4), but should apply to earlier versions as well.  I've tested on
    2.6.15, where this patch makes the difference between random lockup and a
    stable system.
    
    Signed-off-by: Daniel Kobras <kobras@linux.de>
    Acked-by: Alasdair G Kergon <agk@redhat.com>
    Cc: <stable@kernel.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index be48ced..c54de98 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -255,7 +255,9 @@ static struct region *__rh_alloc(struct region_hash *rh, region_t region)
 	struct region *reg, *nreg;
 
 	read_unlock(&rh->hash_lock);
-	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
+	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
+	if (unlikely(!nreg))
+		nreg = kmalloc(sizeof(struct region), GFP_NOIO);
 	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
 		RH_CLEAN : RH_NOSYNC;
 	nreg->rh = rh;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
@ 2007-08-13 11:33 ` Heiko Carstens
  0 siblings, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2007-08-13 11:33 UTC (permalink / raw)
  To: linux-mm, dm-devel
  Cc: Daniel Kobras, Alasdair G Kergon, Stefan Weinhuber, Stefan Bader,
	Andrew Morton, Linus Torvalds

Hi,

the patch below went into 2.6.18. Now my question is: why doesn't it check
if kmalloc(..., GFP_NOIO) returns with a NULL pointer?
Did I miss anything that guarentees that this will always succeed or is it
just a bug?

commit c06aad854fdb9da38fcc22dccfe9d72919453e43
Author: Daniel Kobras <kobras@linux.de>
Date:   Sun Aug 27 01:23:24 2006 -0700

    [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
    
    On an nForce4-equipped machine with two SATA disk in raid1 setup using dmraid,
    we experienced frequent deadlock of the system under high i/o load.  'cat
    /dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly
    after a few GB, 'cp' would be left in 'D' state along with kjournald and
    kmirrord.  The functions cp and kjournald were blocked in did vary, but
    kmirrord's wchan always pointed to 'mempool_alloc()'.  We've seen this pattern
    on 2.6.15 and 2.6.17 kernels.  http://lkml.org/lkml/2005/4/20/142 indicates
    that this problem has been around even before.
    
    So much for the facts, here's my interpretation: mempool_alloc() first tries
    to atomically allocate the requested memory, or falls back to hand out
    preallocated chunks from the mempool.  If both fail, it puts the calling
    process (kmirrord in this case) on a private waitqueue until somebody refills
    the pool.  Where the only 'somebody' is kmirrord itself, so we have a
    deadlock.
    
    I worked around this problem by falling back to a (blocking) kmalloc when
    before kmirrord would have ended up on the waitqueue.  This defeats part of
    the benefits of using the mempool, but at least keeps the system running.  And
    it could be done with a two-line change.  Note that mempool_alloc() clears the
    GFP_NOIO flag internally, and only uses it to decide whether to wait or return
    an error if immediate allocation fails, so the attached patch doesn't change
    behaviour in the non-deadlocking case.  Path is against current git
    (2.6.18-rc4), but should apply to earlier versions as well.  I've tested on
    2.6.15, where this patch makes the difference between random lockup and a
    stable system.
    
    Signed-off-by: Daniel Kobras <kobras@linux.de>
    Acked-by: Alasdair G Kergon <agk@redhat.com>
    Cc: <stable@kernel.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index be48ced..c54de98 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -255,7 +255,9 @@ static struct region *__rh_alloc(struct region_hash *rh, region_t region)
 	struct region *reg, *nreg;
 
 	read_unlock(&rh->hash_lock);
-	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
+	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
+	if (unlikely(!nreg))
+		nreg = kmalloc(sizeof(struct region), GFP_NOIO);
 	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
 		RH_CLEAN : RH_NOSYNC;
 	nreg->rh = rh;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
  2007-08-13 11:33 ` Heiko Carstens
@ 2007-08-15 22:56   ` Andrew Morton
  -1 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2007-08-15 22:56 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Stefan Weinhuber, Stefan Bader, linux-mm, dm-devel, Daniel Kobras,
	Linus Torvalds, Alasdair G Kergon

On Mon, 13 Aug 2007 13:33:40 +0200
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> Hi,
> 
> the patch below went into 2.6.18. Now my question is: why doesn't it check
> if kmalloc(..., GFP_NOIO) returns with a NULL pointer?
> Did I miss anything that guarentees that this will always succeed or is it
> just a bug?

How come my computer is the only one with a reply button?

Sigh.

> commit c06aad854fdb9da38fcc22dccfe9d72919453e43
> Author: Daniel Kobras <kobras@linux.de>
> Date:   Sun Aug 27 01:23:24 2006 -0700
> 
>     [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
>     
>     On an nForce4-equipped machine with two SATA disk in raid1 setup using dmraid,
>     we experienced frequent deadlock of the system under high i/o load.  'cat
>     /dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly
>     after a few GB, 'cp' would be left in 'D' state along with kjournald and
>     kmirrord.  The functions cp and kjournald were blocked in did vary, but
>     kmirrord's wchan always pointed to 'mempool_alloc()'.  We've seen this pattern
>     on 2.6.15 and 2.6.17 kernels.  http://lkml.org/lkml/2005/4/20/142 indicates
>     that this problem has been around even before.
>     
>     So much for the facts, here's my interpretation: mempool_alloc() first tries
>     to atomically allocate the requested memory, or falls back to hand out
>     preallocated chunks from the mempool.  If both fail, it puts the calling
>     process (kmirrord in this case) on a private waitqueue until somebody refills
>     the pool.  Where the only 'somebody' is kmirrord itself, so we have a
>     deadlock.
>     
>     I worked around this problem by falling back to a (blocking) kmalloc when
>     before kmirrord would have ended up on the waitqueue.  This defeats part of
>     the benefits of using the mempool, but at least keeps the system running.  And
>     it could be done with a two-line change.  Note that mempool_alloc() clears the
>     GFP_NOIO flag internally, and only uses it to decide whether to wait or return
>     an error if immediate allocation fails, so the attached patch doesn't change
>     behaviour in the non-deadlocking case.  Path is against current git
>     (2.6.18-rc4), but should apply to earlier versions as well.  I've tested on
>     2.6.15, where this patch makes the difference between random lockup and a
>     stable system.
>     
>     Signed-off-by: Daniel Kobras <kobras@linux.de>
>     Acked-by: Alasdair G Kergon <agk@redhat.com>
>     Cc: <stable@kernel.org>
>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> 
> diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
> index be48ced..c54de98 100644
> --- a/drivers/md/dm-raid1.c
> +++ b/drivers/md/dm-raid1.c
> @@ -255,7 +255,9 @@ static struct region *__rh_alloc(struct region_hash *rh, region_t region)
>  	struct region *reg, *nreg;
>  
>  	read_unlock(&rh->hash_lock);
> -	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
> +	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
> +	if (unlikely(!nreg))
> +		nreg = kmalloc(sizeof(struct region), GFP_NOIO);
>  	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
>  		RH_CLEAN : RH_NOSYNC;
>  	nreg->rh = rh;
> 

Yeah, that's a bug.

kmalloc(small_amount, GFP_NOIO) can fail if the calling process gets
oom-killed, and it can fail if the system is using fault-injection.

One could say "don't use fault injection" and, perhaps, "this is only
ever called by a kernel thread and kernel threads don't get oom-killed". 
But the former is lame and the latter assumes current implementation
details which could change (and indeed have in the past).


So yes, I'd say this is a bug in DM.

Also, __rh_alloc() is called under read_lock(), via __rh_find().  If
__rh_alloc()'s mempool_alloc() fails, it will perform a sleeping allocation
under read_lock(), which is deadlockable and will generate might_sleep()
warnings

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
@ 2007-08-15 22:56   ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2007-08-15 22:56 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: linux-mm, dm-devel, Daniel Kobras, Alasdair G Kergon,
	Stefan Weinhuber, Stefan Bader, Linus Torvalds

On Mon, 13 Aug 2007 13:33:40 +0200
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> Hi,
> 
> the patch below went into 2.6.18. Now my question is: why doesn't it check
> if kmalloc(..., GFP_NOIO) returns with a NULL pointer?
> Did I miss anything that guarentees that this will always succeed or is it
> just a bug?

How come my computer is the only one with a reply button?

Sigh.

> commit c06aad854fdb9da38fcc22dccfe9d72919453e43
> Author: Daniel Kobras <kobras@linux.de>
> Date:   Sun Aug 27 01:23:24 2006 -0700
> 
>     [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
>     
>     On an nForce4-equipped machine with two SATA disk in raid1 setup using dmraid,
>     we experienced frequent deadlock of the system under high i/o load.  'cat
>     /dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly
>     after a few GB, 'cp' would be left in 'D' state along with kjournald and
>     kmirrord.  The functions cp and kjournald were blocked in did vary, but
>     kmirrord's wchan always pointed to 'mempool_alloc()'.  We've seen this pattern
>     on 2.6.15 and 2.6.17 kernels.  http://lkml.org/lkml/2005/4/20/142 indicates
>     that this problem has been around even before.
>     
>     So much for the facts, here's my interpretation: mempool_alloc() first tries
>     to atomically allocate the requested memory, or falls back to hand out
>     preallocated chunks from the mempool.  If both fail, it puts the calling
>     process (kmirrord in this case) on a private waitqueue until somebody refills
>     the pool.  Where the only 'somebody' is kmirrord itself, so we have a
>     deadlock.
>     
>     I worked around this problem by falling back to a (blocking) kmalloc when
>     before kmirrord would have ended up on the waitqueue.  This defeats part of
>     the benefits of using the mempool, but at least keeps the system running.  And
>     it could be done with a two-line change.  Note that mempool_alloc() clears the
>     GFP_NOIO flag internally, and only uses it to decide whether to wait or return
>     an error if immediate allocation fails, so the attached patch doesn't change
>     behaviour in the non-deadlocking case.  Path is against current git
>     (2.6.18-rc4), but should apply to earlier versions as well.  I've tested on
>     2.6.15, where this patch makes the difference between random lockup and a
>     stable system.
>     
>     Signed-off-by: Daniel Kobras <kobras@linux.de>
>     Acked-by: Alasdair G Kergon <agk@redhat.com>
>     Cc: <stable@kernel.org>
>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> 
> diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
> index be48ced..c54de98 100644
> --- a/drivers/md/dm-raid1.c
> +++ b/drivers/md/dm-raid1.c
> @@ -255,7 +255,9 @@ static struct region *__rh_alloc(struct region_hash *rh, region_t region)
>  	struct region *reg, *nreg;
>  
>  	read_unlock(&rh->hash_lock);
> -	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
> +	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
> +	if (unlikely(!nreg))
> +		nreg = kmalloc(sizeof(struct region), GFP_NOIO);
>  	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
>  		RH_CLEAN : RH_NOSYNC;
>  	nreg->rh = rh;
> 

Yeah, that's a bug.

kmalloc(small_amount, GFP_NOIO) can fail if the calling process gets
oom-killed, and it can fail if the system is using fault-injection.

One could say "don't use fault injection" and, perhaps, "this is only
ever called by a kernel thread and kernel threads don't get oom-killed". 
But the former is lame and the latter assumes current implementation
details which could change (and indeed have in the past).


So yes, I'd say this is a bug in DM.

Also, __rh_alloc() is called under read_lock(), via __rh_find().  If
__rh_alloc()'s mempool_alloc() fails, it will perform a sleeping allocation
under read_lock(), which is deadlockable and will generate might_sleep()
warnings


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
  2007-08-15 22:56   ` Andrew Morton
@ 2007-08-15 23:59     ` Heiko Carstens
  -1 siblings, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2007-08-15 23:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Stefan Weinhuber, Stefan Bader, linux-mm, dm-devel, Daniel Kobras,
	Linus Torvalds, Alasdair G Kergon

On Wed, Aug 15, 2007 at 03:56:04PM -0700, Andrew Morton wrote:
> On Mon, 13 Aug 2007 13:33:40 +0200
> Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > the patch below went into 2.6.18. Now my question is: why doesn't it check
> > if kmalloc(..., GFP_NOIO) returns with a NULL pointer?
> > Did I miss anything that guarentees that this will always succeed or is it
> > just a bug?
> > --- a/drivers/md/dm-raid1.c
> > +++ b/drivers/md/dm-raid1.c
> > @@ -255,7 +255,9 @@ static struct region *__rh_alloc(struct region_hash *rh, region_t region)
> >  	struct region *reg, *nreg;
> >  
> >  	read_unlock(&rh->hash_lock);
> > -	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
> > +	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
> > +	if (unlikely(!nreg))
> > +		nreg = kmalloc(sizeof(struct region), GFP_NOIO);
> >  	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
> >  		RH_CLEAN : RH_NOSYNC;
> >  	nreg->rh = rh;
> > 
> 
> Yeah, that's a bug.
> 
> kmalloc(small_amount, GFP_NOIO) can fail if the calling process gets
> oom-killed, and it can fail if the system is using fault-injection.
> 
> One could say "don't use fault injection" and, perhaps, "this is only
> ever called by a kernel thread and kernel threads don't get oom-killed". 
> But the former is lame and the latter assumes current implementation
> details which could change (and indeed have in the past).

Thanks for clarifying!

> So yes, I'd say this is a bug in DM.
> 
> Also, __rh_alloc() is called under read_lock(), via __rh_find().  If
> __rh_alloc()'s mempool_alloc() fails, it will perform a sleeping allocation
> under read_lock(), which is deadlockable and will generate might_sleep()
> warnings

The read_lock() is unlocked at the beginning of the function. Unless
you're talking of a different lock, but I couldn't find any.

So at least _currently_ this should work unless somebody uses fault
injection. Would it make sense then to add the __GFP_NOFAIL flag to
the kmalloc call?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
@ 2007-08-15 23:59     ` Heiko Carstens
  0 siblings, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2007-08-15 23:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, dm-devel, Daniel Kobras, Alasdair G Kergon,
	Stefan Weinhuber, Stefan Bader, Linus Torvalds

On Wed, Aug 15, 2007 at 03:56:04PM -0700, Andrew Morton wrote:
> On Mon, 13 Aug 2007 13:33:40 +0200
> Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > the patch below went into 2.6.18. Now my question is: why doesn't it check
> > if kmalloc(..., GFP_NOIO) returns with a NULL pointer?
> > Did I miss anything that guarentees that this will always succeed or is it
> > just a bug?
> > --- a/drivers/md/dm-raid1.c
> > +++ b/drivers/md/dm-raid1.c
> > @@ -255,7 +255,9 @@ static struct region *__rh_alloc(struct region_hash *rh, region_t region)
> >  	struct region *reg, *nreg;
> >  
> >  	read_unlock(&rh->hash_lock);
> > -	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
> > +	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
> > +	if (unlikely(!nreg))
> > +		nreg = kmalloc(sizeof(struct region), GFP_NOIO);
> >  	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
> >  		RH_CLEAN : RH_NOSYNC;
> >  	nreg->rh = rh;
> > 
> 
> Yeah, that's a bug.
> 
> kmalloc(small_amount, GFP_NOIO) can fail if the calling process gets
> oom-killed, and it can fail if the system is using fault-injection.
> 
> One could say "don't use fault injection" and, perhaps, "this is only
> ever called by a kernel thread and kernel threads don't get oom-killed". 
> But the former is lame and the latter assumes current implementation
> details which could change (and indeed have in the past).

Thanks for clarifying!

> So yes, I'd say this is a bug in DM.
> 
> Also, __rh_alloc() is called under read_lock(), via __rh_find().  If
> __rh_alloc()'s mempool_alloc() fails, it will perform a sleeping allocation
> under read_lock(), which is deadlockable and will generate might_sleep()
> warnings

The read_lock() is unlocked at the beginning of the function. Unless
you're talking of a different lock, but I couldn't find any.

So at least _currently_ this should work unless somebody uses fault
injection. Would it make sense then to add the __GFP_NOFAIL flag to
the kmalloc call?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
  2007-08-15 23:59     ` Heiko Carstens
@ 2007-08-16  3:10       ` Andrew Morton
  -1 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2007-08-16  3:10 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Stefan Weinhuber, Stefan Bader, linux-mm, dm-devel, Daniel Kobras,
	Linus Torvalds, Alasdair G Kergon

On Thu, 16 Aug 2007 01:59:56 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> > So yes, I'd say this is a bug in DM.
> > 
> > Also, __rh_alloc() is called under read_lock(), via __rh_find().  If
> > __rh_alloc()'s mempool_alloc() fails, it will perform a sleeping allocation
> > under read_lock(), which is deadlockable and will generate might_sleep()
> > warnings
> 
> The read_lock() is unlocked at the beginning of the function.

Oh, OK.  Looks odd, but whatever.

> Unless
> you're talking of a different lock, but I couldn't find any.
> 
> So at least _currently_ this should work unless somebody uses fault
> injection. Would it make sense then to add the __GFP_NOFAIL flag to
> the kmalloc call?

It would best to avoid that.  __GFP_NOFAIL was added as a way of
consolidating a number of callsites which were performing open-coded
infinite retries and it is also used as a "this is lame and needs to be
fixed" indicator.

It'd be better to fix the kmirrord design so that it can use mempools
properly.  One possible way of doing that might be to notice when mempool
exhaustion happens, submit whatever IO is thus-far buffered up and then do
a sleeping mempool allocation, to wait for that memory to come free (via IO
completion).

That would be a bit abusive of the mempool intent though.  A more idiomatic
fix would be to change kmirrord so that it no longer can consume all of the
mempool's reserves without having submitted any I/O (which is what I assume
it is doing).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
@ 2007-08-16  3:10       ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2007-08-16  3:10 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: linux-mm, dm-devel, Daniel Kobras, Alasdair G Kergon,
	Stefan Weinhuber, Stefan Bader, Linus Torvalds

On Thu, 16 Aug 2007 01:59:56 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> > So yes, I'd say this is a bug in DM.
> > 
> > Also, __rh_alloc() is called under read_lock(), via __rh_find().  If
> > __rh_alloc()'s mempool_alloc() fails, it will perform a sleeping allocation
> > under read_lock(), which is deadlockable and will generate might_sleep()
> > warnings
> 
> The read_lock() is unlocked at the beginning of the function.

Oh, OK.  Looks odd, but whatever.

> Unless
> you're talking of a different lock, but I couldn't find any.
> 
> So at least _currently_ this should work unless somebody uses fault
> injection. Would it make sense then to add the __GFP_NOFAIL flag to
> the kmalloc call?

It would best to avoid that.  __GFP_NOFAIL was added as a way of
consolidating a number of callsites which were performing open-coded
infinite retries and it is also used as a "this is lame and needs to be
fixed" indicator.

It'd be better to fix the kmirrord design so that it can use mempools
properly.  One possible way of doing that might be to notice when mempool
exhaustion happens, submit whatever IO is thus-far buffered up and then do
a sleeping mempool allocation, to wait for that memory to come free (via IO
completion).

That would be a bit abusive of the mempool intent though.  A more idiomatic
fix would be to change kmirrord so that it no longer can consume all of the
mempool's reserves without having submitted any I/O (which is what I assume
it is doing).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
  2007-08-16  3:10       ` Andrew Morton
@ 2007-08-16  7:09         ` Stefan Bader
  -1 siblings, 0 replies; 13+ messages in thread
From: Stefan Bader @ 2007-08-16  7:09 UTC (permalink / raw)
  To: device-mapper development
  Cc: Stefan Weinhuber, Heiko Carstens, linux-mm, Daniel Kobras,
	Linus Torvalds, Alasdair G Kergon

>>> How come my computer is the only one with a reply button?

Hey, I've got one. ;-)

2007/8/16, Andrew Morton <akpm@linux-foundation.org>:
> On Thu, 16 Aug 2007 01:59:56 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
>
> > > So yes, I'd say this is a bug in DM.
> > >
> > > Also, __rh_alloc() is called under read_lock(), via __rh_find().  If
> > > __rh_alloc()'s mempool_alloc() fails, it will perform a sleeping allocation
> > > under read_lock(), which is deadlockable and will generate might_sleep()
> > > warnings
> >
> > The read_lock() is unlocked at the beginning of the function.
>
> Oh, OK.  Looks odd, but whatever.
>

The major trick, if I am not wrong, is to use GFP_ATOMIC on that
mempool_alloc(). This prevents the sleeping allocation but fails, if
memory as well as the pool is exhausted.

>
> It'd be better to fix the kmirrord design so that it can use mempools
> properly.  One possible way of doing that might be to notice when mempool
> exhaustion happens, submit whatever IO is thus-far buffered up and then do
> a sleeping mempool allocation, to wait for that memory to come free (via IO
> completion).
>
> That would be a bit abusive of the mempool intent though.  A more idiomatic
> fix would be to change kmirrord so that it no longer can consume all of the
> mempool's reserves without having submitted any I/O (which is what I assume
> it is doing).
>

The problem is, that only the same thread, that allocates from the
pool would return memory back. This would be done before the new
allocations. But, if there is very high memory pressure, the pool
might get drained in the allocation cycle. Then mempool_alloc() waits
to be woken from mempool_free(). And this never happens, since the
thread will be stuck. So I guess the fix would be to somehow separate
the allocation and freeing functionality. If I remember correctly
back, the patch was always seen as "not quite correctly, but seems to
work". However, due to lack of time, nobody ever came up with a better
solution.

Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dm-devel] Re: [PATCH] dm: Fix deadlock under high i/o load in raid1 setup.
@ 2007-08-16  7:09         ` Stefan Bader
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Bader @ 2007-08-16  7:09 UTC (permalink / raw)
  To: device-mapper development
  Cc: Heiko Carstens, Stefan Weinhuber, linux-mm, Daniel Kobras,
	Linus Torvalds, Alasdair G Kergon

>>> How come my computer is the only one with a reply button?

Hey, I've got one. ;-)

2007/8/16, Andrew Morton <akpm@linux-foundation.org>:
> On Thu, 16 Aug 2007 01:59:56 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
>
> > > So yes, I'd say this is a bug in DM.
> > >
> > > Also, __rh_alloc() is called under read_lock(), via __rh_find().  If
> > > __rh_alloc()'s mempool_alloc() fails, it will perform a sleeping allocation
> > > under read_lock(), which is deadlockable and will generate might_sleep()
> > > warnings
> >
> > The read_lock() is unlocked at the beginning of the function.
>
> Oh, OK.  Looks odd, but whatever.
>

The major trick, if I am not wrong, is to use GFP_ATOMIC on that
mempool_alloc(). This prevents the sleeping allocation but fails, if
memory as well as the pool is exhausted.

>
> It'd be better to fix the kmirrord design so that it can use mempools
> properly.  One possible way of doing that might be to notice when mempool
> exhaustion happens, submit whatever IO is thus-far buffered up and then do
> a sleeping mempool allocation, to wait for that memory to come free (via IO
> completion).
>
> That would be a bit abusive of the mempool intent though.  A more idiomatic
> fix would be to change kmirrord so that it no longer can consume all of the
> mempool's reserves without having submitted any I/O (which is what I assume
> it is doing).
>

The problem is, that only the same thread, that allocates from the
pool would return memory back. This would be done before the new
allocations. But, if there is very high memory pressure, the pool
might get drained in the allocation cycle. Then mempool_alloc() waits
to be woken from mempool_free(). And this never happens, since the
thread will be stuck. So I guess the fix would be to somehow separate
the allocation and freeing functionality. If I remember correctly
back, the patch was always seen as "not quite correctly, but seems to
work". However, due to lack of time, nobody ever came up with a better
solution.

Stefan

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2007-08-16  7:09 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-08-13 11:33 [PATCH] dm: Fix deadlock under high i/o load in raid1 setup Heiko Carstens
2007-08-13 11:33 ` Heiko Carstens
2007-08-15 22:56 ` Andrew Morton
2007-08-15 22:56   ` Andrew Morton
2007-08-15 23:59   ` Heiko Carstens
2007-08-15 23:59     ` Heiko Carstens
2007-08-16  3:10     ` Andrew Morton
2007-08-16  3:10       ` Andrew Morton
2007-08-16  7:09       ` Stefan Bader
2007-08-16  7:09         ` [dm-devel] " Stefan Bader
  -- strict thread matches above, loose matches on Subject: below --
2006-08-09 16:44 Daniel Kobras
2006-08-12 20:02 ` Andrew Morton
2006-08-07 22:36 Daniel Kobras

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.