All of lore.kernel.org
 help / color / mirror / Atom feed
From: FUJITA Tomonori <tomof@acm.org>
To: bharrosh@panasas.com
Cc: fujita.tomonori@lab.ntt.co.jp,
	James.Bottomley@HansenPartnership.com,
	linux-scsi@vger.kernel.org,
	tomof@acm.orgfujita.tomonori@lab.ntt.co.jp
Subject: Re: [PATCH v2] use dynamically allocated sense buffer
Date: Wed, 16 Jan 2008 00:08:37 +0900	[thread overview]
Message-ID: <20080116000836Y.tomof@acm.org> (raw)
In-Reply-To: <478CBBA8.4050202@panasas.com>

On Tue, 15 Jan 2008 15:56:56 +0200
Boaz Harrosh <bharrosh@panasas.com> wrote:

> On Tue, Jan 15 2008 at 11:23 +0200, FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
> > This is the second version of
> > 
> > http://marc.info/?l=linux-scsi&m=119933628210006&w=2
> > 
> > I gave up once, but I found that the performance loss is negligible
> > (within 1%) by using kmem_cache_alloc instead of mempool.
> > 
> > I use scsi_debug with fake_rw=1 and disktest (DIO reads with 8
> > threads) again:
> > 
> > scsi-misc (slub)         | 486.9 MB/s  IOPS 124652.9/s
> > dynamic sense buf (slub) | 483.2 MB/s  IOPS 123704.1/s
> > 
> > scsi-misc (slab)         | 467.0 MB/s  IOPS 119544.3/s
> > dynamic sense buf (slab) | 468.7 MB/s  IOPS 119986.0/s
> > 
> > The results are the averages of three runs with a server using two
> > dual-core 1.60 GHz Xeon processors with DDR2 memory.
> > 
> > 
> > I doubt think that someone will complain about the performance
> > regression due to this patch. In addition, unlike scsi_debug, the real
> > LLDs allocate the own data structure per scsi_cmnd so the performance
> > differences would be smaller (and with the real hard disk overheads).
> > 
> > Here's the full results:
> > 
> > http://www.kernel.org/pub/linux/kernel/people/tomo/sense/results.txt
> > 
> TOMO Hi.
> This is grate news. Thanks I like what you did here. and it's good
> to know. Why should a mempool be so slow ;)
> 
> I have a small concern of a leak, please see below, but otherwise
> this is grate.
> > 
> > =
> > From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> > Subject: [PATCH] use dynamically allocated sense buffer
> > 
> > This removes static array sense_buffer in scsi_cmnd and uses
> > dynamically allocated sense_buffer (with GFP_DMA).
> > 
> > The reason for doing this is that some architectures need cacheline
> > aligned buffer for DMA:
> > 
> > http://lkml.org/lkml/2007/11/19/2
> > 
> > The problems are that scsi_eh_prep_cmnd puts scsi_cmnd::sense_buffer
> > to sglist and some LLDs directly DMA to scsi_cmnd::sense_buffer. It's
> > necessary to DMA to scsi_cmnd::sense_buffer safely. This patch solves
> > these issues.
> > 
> > __scsi_get_command allocates sense_buffer via kmem_cache_alloc and
> > attaches it to a scsi_cmnd so everything just work as before.
> > 
> > A scsi_host reserves one sense buffer for the backup command
> > (shost->backup_sense_buffer).
> > 
> > 
> > Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> > ---
> >  drivers/scsi/hosts.c     |   10 ++++++-
> >  drivers/scsi/scsi.c      |   67 ++++++++++++++++++++++++++++++++++++++++++++-
> >  drivers/scsi/scsi_priv.h |    2 +
> >  include/scsi/scsi_cmnd.h |    2 +-
> >  include/scsi/scsi_host.h |    3 ++
> >  5 files changed, 80 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> > index 9a10b43..35c5f0e 100644
> > --- a/drivers/scsi/hosts.c
> > +++ b/drivers/scsi/hosts.c
> > @@ -205,10 +205,14 @@ int scsi_add_host(struct Scsi_Host *shost, struct device *dev)
> >  	if (!shost->shost_gendev.parent)
> >  		shost->shost_gendev.parent = dev ? dev : &platform_bus;
> >  
> > -	error = device_add(&shost->shost_gendev);
> > +	error = scsi_setup_command_sense_buffer(shost);
> >  	if (error)
> >  		goto out;
> >  
> > +	error = device_add(&shost->shost_gendev);
> > +	if (error)
> > +		goto destroy_pool;
> > +
> >  	scsi_host_set_state(shost, SHOST_RUNNING);
> >  	get_device(shost->shost_gendev.parent);
> >  
> > @@ -248,6 +252,8 @@ int scsi_add_host(struct Scsi_Host *shost, struct device *dev)
> >  	class_device_del(&shost->shost_classdev);
> >   out_del_gendev:
> >  	device_del(&shost->shost_gendev);
> > + destroy_pool:
> > +	scsi_destroy_command_sense_buffer(shost);
> >   out:
> >  	return error;
> >  }
> > @@ -267,6 +273,8 @@ static void scsi_host_dev_release(struct device *dev)
> >  		scsi_free_queue(shost->uspace_req_q);
> >  	}
> >  
> > +	scsi_destroy_command_sense_buffer(shost);
> > +
> >  	scsi_destroy_command_freelist(shost);
> >  	if (shost->bqt)
> >  		blk_free_tags(shost->bqt);
> > diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> > index 54ff611..d153da3 100644
> > --- a/drivers/scsi/scsi.c
> > +++ b/drivers/scsi/scsi.c
> > @@ -161,6 +161,9 @@ static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {
> >  
> >  static DEFINE_MUTEX(host_cmd_pool_mutex);
> >  
> > +static struct kmem_cache *sense_buffer_slab;
> > +static int sense_buffer_slab_users;
> > +
> >  /**
> >   * __scsi_get_command - Allocate a struct scsi_cmnd
> >   * @shost: host to transmit command
> > @@ -186,6 +189,22 @@ struct scsi_cmnd *__scsi_get_command(struct Scsi_Host *shost, gfp_t gfp_mask)
> >  			list_del_init(&cmd->list);
> >  		}
> >  		spin_unlock_irqrestore(&shost->free_list_lock, flags);
> > +
> > +		if (cmd) {
> > +			memset(cmd, 0, sizeof(*cmd));
> > +			cmd->sense_buffer = shost->backup_sense_buffer;
> [1]
> If command was put on free_list in __put_command(), then this here will leak the 
> sense_buffer that was allocated for that command. See explanations below.
> 
> > +		}
> > +	} else {
> > +		unsigned char *buf;
> > +
> > +		buf = kmem_cache_alloc(sense_buffer_slab, __GFP_DMA|gfp_mask);
> > +		if (likely(buf)) {
> > +			memset(cmd, 0, sizeof(*cmd));
> > +			cmd->sense_buffer = buf;
> > +		} else {
> > +			kmem_cache_free(shost->cmd_pool->slab, cmd);
> > +			cmd = NULL;
> > +		}
> >  	}
> >  
> >  	return cmd;
> > @@ -212,7 +231,6 @@ struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, gfp_t gfp_mask)
> >  	if (likely(cmd != NULL)) {
> >  		unsigned long flags;
> >  
> > -		memset(cmd, 0, sizeof(*cmd));
> >  		cmd->device = dev;
> >  		init_timer(&cmd->eh_timeout);
> >  		INIT_LIST_HEAD(&cmd->list);
> > @@ -246,8 +264,10 @@ void __scsi_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd,
> >  	}
> >  	spin_unlock_irqrestore(&shost->free_list_lock, flags);
> >  
> > -	if (likely(cmd != NULL))
> > +	if (likely(cmd != NULL)) {
> > +		kmem_cache_free(sense_buffer_slab, cmd->sense_buffer);
> >  		kmem_cache_free(shost->cmd_pool->slab, cmd);
> > +	}
> >  
> >  	put_device(dev);
> >  }
> > @@ -351,6 +371,49 @@ void scsi_destroy_command_freelist(struct Scsi_Host *shost)
> >  	mutex_unlock(&host_cmd_pool_mutex);
> >  }
> >  
> > +int scsi_setup_command_sense_buffer(struct Scsi_Host *shost)
> > +{
> > +	unsigned char *sense_buffer;
> > +
> > +	mutex_lock(&host_cmd_pool_mutex);
> > +	if (!sense_buffer_slab_users) {
> > +		sense_buffer_slab = kmem_cache_create("scsi_sense_buffer",
> > +						      SCSI_SENSE_BUFFERSIZE,
> > +						      0, SLAB_CACHE_DMA, NULL);
> > +		if (!sense_buffer_slab) {
> > +			mutex_unlock(&host_cmd_pool_mutex);
> > +			return -ENOMEM;
> > +		}
> > +	}
> > +	sense_buffer_slab_users++;
> > +	mutex_unlock(&host_cmd_pool_mutex);
> > +
> > +	sense_buffer = kmem_cache_alloc(sense_buffer_slab,
> > +					GFP_KERNEL | __GFP_DMA);
> > +	if (!sense_buffer)
> > +		goto fail;
> > +
> > +	shost->backup_sense_buffer = sense_buffer;
> > +
> > +	return 0;
> > +fail:
> > +	mutex_lock(&host_cmd_pool_mutex);
> > +	if (!--sense_buffer_slab_users)
> > +			kmem_cache_destroy(sense_buffer_slab);
> > +	mutex_unlock(&host_cmd_pool_mutex);
> > +	return -ENOMEM;
> > +}
> > +
> > +void scsi_destroy_command_sense_buffer(struct Scsi_Host *shost)
> > +{
> > +	kmem_cache_free(sense_buffer_slab, shost->backup_sense_buffer);
> > +
> > +	mutex_lock(&host_cmd_pool_mutex);
> > +	if (!--sense_buffer_slab_users)
> > +		kmem_cache_destroy(sense_buffer_slab);
> > +	mutex_unlock(&host_cmd_pool_mutex);
> > +}
> > +
> >  #ifdef CONFIG_SCSI_LOGGING
> >  void scsi_log_send(struct scsi_cmnd *cmd)
> >  {
> > diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> > index 3f34e93..55c6f71 100644
> > --- a/drivers/scsi/scsi_priv.h
> > +++ b/drivers/scsi/scsi_priv.h
> > @@ -27,6 +27,8 @@ extern void scsi_exit_hosts(void);
> >  extern int scsi_dispatch_cmd(struct scsi_cmnd *cmd);
> >  extern int scsi_setup_command_freelist(struct Scsi_Host *shost);
> >  extern void scsi_destroy_command_freelist(struct Scsi_Host *shost);
> > +extern int scsi_setup_command_sense_buffer(struct Scsi_Host *shost);
> > +extern void scsi_destroy_command_sense_buffer(struct Scsi_Host *shost);
> >  extern void __scsi_done(struct scsi_cmnd *cmd);
> >  #ifdef CONFIG_SCSI_LOGGING
> >  void scsi_log_send(struct scsi_cmnd *cmd);
> > diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> > index 3f47e52..abd7479 100644
> > --- a/include/scsi/scsi_cmnd.h
> > +++ b/include/scsi/scsi_cmnd.h
> > @@ -88,7 +88,7 @@ struct scsi_cmnd {
> >  				   	   working on */
> >  
> >  #define SCSI_SENSE_BUFFERSIZE 	96
> > -	unsigned char sense_buffer[SCSI_SENSE_BUFFERSIZE];
> > +	unsigned char *sense_buffer;
> >  				/* obtained by REQUEST SENSE when
> >  				 * CHECK CONDITION is received on original
> >  				 * command (auto-sense) */
> > diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> > index 0fd4746..65d2bcf 100644
> > --- a/include/scsi/scsi_host.h
> > +++ b/include/scsi/scsi_host.h
> > @@ -520,6 +520,9 @@ struct Scsi_Host {
> >  	struct list_head	free_list; /* backup store of cmd structs */
> >  	struct list_head	starved_list;
> >  
> > +	/* sense buffer for the backup command */
> > +	unsigned char *backup_sense_buffer;
> > +
> >  	spinlock_t		default_lock;
> >  	spinlock_t		*host_lock;
> >  
> 
> commands can be put on the free list in 2 places:
> [1]
> void __scsi_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd,
> 			struct device *dev)
> {
> 	unsigned long flags;
> 
> 	/* changing locks here, don't need to restore the irq state */
> 	spin_lock_irqsave(&shost->free_list_lock, flags);
> 	if (unlikely(list_empty(&shost->free_list))) {
> 		list_add(&cmd->list, &shost->free_list);
> 		cmd = NULL;
> 	}
>  ...
> and 
> [2]
> int scsi_setup_command_freelist(struct Scsi_Host *shost)
> {
>  ...
> 	if (!cmd)
> 		goto fail2;
> 	list_add(&cmd->list, &shost->free_list);
> 	return 0;
>  ...
> 
> case [1] cmnd had a sense_buffer with it, case [2] did not. The easiest fix
> would be to remove just the sense buffer from [1] and have an empty cmnd
> on the free_list in all cases.

I'm not sure about what you mean.

scsi_setup_command_freelist is called only by
scsi_host_alloc. It puts only one backup command to
shost->free_list. The patch allocates one sense_buffer to the backup
command (it's hooked on shost->backup_sense_buffer).

__scsi_get_command always uses shost->backup_sense_buffer for the
backup command. It allocates sense_buffer from sense_buffer_slab for
commands allocated from shost->cmd_pool->slab.


If __scsi_put_command puts a command to shost->free_list, it doesn't
free scmd->sense_buffer since it's the sense_buffer for the backup
sense_buffer. If __scsi_put_command puts a command to
shost->cmd_pool->slab (if shost->free_list isn't empty), it alos puts
its sense_buffer to sense_buffer_slab.


> But I would suggest to just put the extra allocated sense_buffer on the
> cmnd in case [2] and always have cmnd+sense_buffer, this way you can get
> rid of the pointer for the backup_sense_buffer in host struct. (and have
> the code localized to scsi.c only)
> 
> Also, is there a kmem_cache_zalloc()? I would use it for the command allocation
> just to make sure when we do scsi_destroy_command_freelist() in the case that
> a sense_buffer allocation failed and the host is unloaded.
> 
> Boaz
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2008-01-15 15:09 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-15  9:23 [PATCH v2] use dynamically allocated sense buffer FUJITA Tomonori
2008-01-15 13:56 ` Boaz Harrosh
2008-01-15 15:08   ` FUJITA Tomonori [this message]
2008-01-15 15:44     ` Boaz Harrosh
2008-01-16  1:18       ` FUJITA Tomonori
2008-01-15 15:20 ` James Bottomley
2008-01-15 15:48   ` Boaz Harrosh
2008-01-16 12:35   ` Benny Halevy
2008-01-17  9:13     ` FUJITA Tomonori
2008-01-17 15:58       ` James Bottomley
2008-01-17 20:56         ` FUJITA Tomonori

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080116000836Y.tomof@acm.org \
    --to=tomof@acm.org \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=bharrosh@panasas.com \
    --cc=fujita.tomonori@lab.ntt.co.jp \
    --cc=linux-scsi@vger.kernel.org \
    --cc=tomof@acm.orgfujita.tomonori \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.