public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] scsi: fix sense_slab/bio swapping livelock
@ 2008-04-06 22:56 Hugh Dickins
  2008-04-06 23:35 ` James Bottomley
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Hugh Dickins @ 2008-04-06 22:56 UTC (permalink / raw)
  To: James Bottomley
  Cc: Linus Torvalds, Andrew Morton, FUJITA Tomonori, Jens Axboe,
	Christoph Lameter, Pekka Enberg, Peter Zijlstra,
	Rafael J. Wysocki, linux-kernel

Since 2.6.25-rc7, I've been seeing an occasional livelock on one
x86_64 machine, copying kernel trees to tmpfs, paging out to swap.

Signature: 6000 pages under writeback but never getting written;
most tasks of interest trying to reclaim, but each get_swap_bio
waiting for a bio in mempool_alloc's io_schedule_timeout(5*HZ);
every five seconds an atomic page allocation failure report from
kblockd failing to allocate a sense_buffer in __scsi_get_command.

__scsi_get_command has a (one item) free_list to protect against
this, but rc1's [SCSI] use dynamically allocated sense buffer
de25deb18016f66dcdede165d07654559bb332bc upset that slightly.
When it fails to allocate from the separate sense_slab, instead
of giving up, it must fall back to the command free_list, which
is sure to have a sense_buffer attached.

Either my earlier -rc testing missed this, or there's some recent
contributory factor.  One very significant factor is SLUB, which
merges slab caches when it can, and on 64-bit happens to merge
both bio cache and sense_slab cache into kmalloc's 128-byte cache:
so that under this swapping load, bios above are liable to gobble
up all the slots needed for scsi_cmnd sense_buffers below.

That's disturbing behaviour, and I tried a few things to fix it.
Adding a no-op constructor to the sense_slab inhibits SLUB from
merging it, and stops all the allocation failures I was seeing;
but it's rather a hack, and perhaps in different configurations
we have other caches on the swapout path which are ill-merged.

Another alternative is to revert the separate sense_slab, using
cache-line-aligned sense_buffer allocated beyond scsi_cmnd from
the one kmem_cache; but that might waste more memory, and is
only a way of diverting around the known problem.

While I don't like seeing the allocation failures, and hate the
idea of all those bios piled up above a scsi host working one by
one, it does seem to emerge fairly soon with the livelock fix.
So lacking better ideas, stick with that one clear fix for now.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 drivers/scsi/scsi.c |   22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

--- 2.6.25-rc8/drivers/scsi/scsi.c	2008-03-05 10:47:40.000000000 +0000
+++ linux/drivers/scsi/scsi.c	2008-04-05 22:23:40.000000000 +0100
@@ -181,6 +181,18 @@ struct scsi_cmnd *__scsi_get_command(str
 	cmd = kmem_cache_alloc(shost->cmd_pool->cmd_slab,
 			       gfp_mask | shost->cmd_pool->gfp_mask);
 
+	if (likely(cmd)) {
+		buf = kmem_cache_alloc(shost->cmd_pool->sense_slab,
+				       gfp_mask | shost->cmd_pool->gfp_mask);
+		if (likely(buf)) {
+			memset(cmd, 0, sizeof(*cmd));
+			cmd->sense_buffer = buf;
+		} else {
+			kmem_cache_free(shost->cmd_pool->cmd_slab, cmd);
+			cmd = NULL;
+		}
+	}
+
 	if (unlikely(!cmd)) {
 		unsigned long flags;
 
@@ -197,16 +209,6 @@ struct scsi_cmnd *__scsi_get_command(str
 			memset(cmd, 0, sizeof(*cmd));
 			cmd->sense_buffer = buf;
 		}
-	} else {
-		buf = kmem_cache_alloc(shost->cmd_pool->sense_slab,
-				       gfp_mask | shost->cmd_pool->gfp_mask);
-		if (likely(buf)) {
-			memset(cmd, 0, sizeof(*cmd));
-			cmd->sense_buffer = buf;
-		} else {
-			kmem_cache_free(shost->cmd_pool->cmd_slab, cmd);
-			cmd = NULL;
-		}
 	}
 
 	return cmd;

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2008-04-08 21:41 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-06 22:56 [PATCH] scsi: fix sense_slab/bio swapping livelock Hugh Dickins
2008-04-06 23:35 ` James Bottomley
2008-04-07  1:01   ` Hugh Dickins
2008-04-07 17:51     ` Hugh Dickins
2008-04-07 18:04       ` James Bottomley
2008-04-07 18:26         ` Hugh Dickins
2008-04-07  2:48 ` FUJITA Tomonori
2008-04-07 18:07   ` Hugh Dickins
2008-04-08 14:04     ` FUJITA Tomonori
2008-04-07  5:26 ` Christoph Lameter
2008-04-07 19:40   ` Hugh Dickins
2008-04-07 19:55     ` Peter Zijlstra
2008-04-07 20:31       ` Hugh Dickins
2008-04-07 20:47         ` Peter Zijlstra
2008-04-07 21:00         ` Pekka Enberg
2008-04-07 21:05           ` Pekka Enberg
2008-04-07 21:15             ` Linus Torvalds
2008-04-07 21:34               ` Pekka Enberg
2008-04-07 21:39                 ` Pekka Enberg
2008-04-07 22:05                   ` Pekka J Enberg
2008-04-07 22:17                     ` Linus Torvalds
2008-04-07 22:42                       ` Pekka Enberg
2008-04-08 20:42                       ` Pekka J Enberg
2008-04-08 20:44                         ` Pekka Enberg
2008-04-08 20:45                         ` Christoph Lameter
2008-04-08 21:11                           ` Pekka Enberg
2008-04-08 21:40                             ` Peter Zijlstra
2008-04-07 21:30             ` Hugh Dickins
2008-04-07 21:36               ` Pekka Enberg
2008-04-08 20:43     ` Christoph Lameter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox