All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boaz Harrosh <bharrosh@panasas.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <jens.axboe@oracle.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	chris.mason@oracle.com, david@fromorbit.com,
	akpm@linux-foundation.org, jack@suse.cz,
	yanmin_zhang@linux.intel.com, linux-scsi@vger.kernel.org,
	Matthew Wilcox <matthew@wil.cx>, Andi Kleen <andi@firstfloor.org>,
	James Bottomley <James.Bottomley@hansenpartnership.com>
Subject: Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
Date: Mon, 25 May 2009 13:33:42 +0300	[thread overview]
Message-ID: <4A1A7406.5060106@panasas.com> (raw)
In-Reply-To: <20090525075008.GA23413@infradead.org>

On 05/25/2009 10:50 AM, Christoph Hellwig wrote:
> On Mon, May 25, 2009 at 09:46:47AM +0200, Jens Axboe wrote:
>>> But that patch looks good to me, avoiding one allocation for each
>>> command and simplifying the code.  I try to remember why these were
>>> two slabs to start with but can't find any reason.
>>>

I posted an answer to that here:
http://www.spinics.net/lists/kernel/msg889604.html

It was done for none-cache-coherent systems that need to dma into sense_buffer.

>>> Btw, we might just want to declare the sense buffer directly as a sized
>>> array in the scsi command as there really doesn't seem to be a reason
>>> not to allocate it.
>> That is also a workable solution. I've been trying to cut down on the
>> number of allocations required per-IO, and there's definitely still some
>> low hanging fruit there. Some of it is already included, like the inline
>> io_vecs in the bio.
> 
> Btw, one thing I wanted to do for years is to add ->alloc_cmnd and
> ->destroy_cmnd method to the host template which optionally move the
> command allocation to the LLDD.  That way we can embedd the scsi_cmnd
> into the drivers per-commad structure and eliminate another memory
> allocation.  

It is nice in theory, but when trying to implement I encountered some
problems.

1. If we have a machine with few type of hosts active each with it's own
cmnd_slab we end up with many more slabs then today. Even though at the
end they all happen to be of the same size. (With the pool reserves it
can get big also).

2. Some considerations are system-wide and system-dependent (like above
   problem) and should be centralized into one place so if/when things
   change they can be changed in one place.
2.1. Don't trust driver writers to do the right thing.

3. There are common needs that are cross drivers, and no code should be duplicated.
   For example Bidi-Commands, use of scsi_ptr, ISA_DMA, ... and such not.

I totally agree with the need and robustness this will give...

So I think we might approach this from a slightly different way.

Hosts specify an size_of_private_command at host template, which might include
the common-scsi_cmnd + sense_buffer + private_cmnd + optional scsi_ptr +
bidi_data_buffer + ...

scsi_ml has a base-two-sized set of slabs that get allocated on first use
(at host registration) and hosts get to share the pools with same size.
[Alternatively hosts just keep reserved-commands list and regular use gets
 kmalloced]

All handling is centralized, with special needs specified at host template
like dma_mask ISA_flags and such.

> Also this would naturally extend the keep one cmnd pool
> to drivers without requiring additional code.  As a second step it
> would also allow killing the scsi_host_cmd_pool byt just having
> a set of library routines that drivers which need SLAB_CACHE_DMA can
> use.
> 

I'm afraid this will need to be done first. Layout the new facilities
and implement today's lowest-denominator on top of that. Then convert
driver by driver. Finally remove the old croft.

Lets all agree on a rough sketch and we can all get behind it. There are
a few people I know that will help, Matthew Wilcox, Me , perhaps Jens
and Christoph.

This will also finally help Andi Kleen's needs with the masked allocators

Boaz

  parent reply	other threads:[~2009-05-25 10:33 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-25  7:30 [PATCH 0/12] Per-bdi writeback flusher threads #5 Jens Axboe
2009-05-25  7:30 ` [PATCH 01/13] libata: get rid of ATA_MAX_QUEUE loop in ata_qc_complete_multiple() Jens Axboe
2009-05-25  7:30 ` [PATCH 01/12] ntfs: remove old debug check for dirty data in ntfs_put_super() Jens Axboe
2009-05-25  7:30 ` [PATCH 02/13] block: add static rq allocation cache Jens Axboe
2009-05-25  7:30 ` [PATCH 02/12] btrfs: properly register fs backing device Jens Axboe
2009-05-25  7:30 ` [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer Jens Axboe
2009-05-25  7:41   ` Christoph Hellwig
2009-05-25  7:46     ` Jens Axboe
2009-05-25  7:50       ` Christoph Hellwig
2009-05-25  7:54         ` Jens Axboe
2009-05-25 10:33         ` Boaz Harrosh [this message]
2009-05-25 10:42           ` Christoph Hellwig
2009-05-25 10:49             ` Jens Axboe
2009-05-26  4:36         ` FUJITA Tomonori
2009-05-26  5:08           ` FUJITA Tomonori
2009-05-25  8:15   ` Pekka Enberg
2009-05-25  8:15     ` Pekka Enberg
2009-05-25 11:32     ` Nick Piggin
2009-05-25  9:28   ` Boaz Harrosh
2009-05-26  1:45     ` Roland Dreier
2009-05-26  4:36       ` FUJITA Tomonori
2009-05-26  6:29         ` Jens Axboe
2009-05-26  7:25           ` FUJITA Tomonori
2009-05-26  7:32             ` Jens Axboe
2009-05-26  7:38               ` FUJITA Tomonori
2009-05-26 14:47                 ` James Bottomley
2009-05-26 15:13                   ` Matthew Wilcox
2009-05-26 15:31                   ` FUJITA Tomonori
2009-05-26 16:05                     ` Boaz Harrosh
2009-05-27  1:36                       ` FUJITA Tomonori
2009-05-27  7:54                         ` Boaz Harrosh
2009-05-27  8:26                           ` FUJITA Tomonori
2009-05-27  9:11                             ` Boaz Harrosh
2009-05-26 16:12                   ` Boaz Harrosh
2009-05-26 16:28                     ` Boaz Harrosh
2009-05-26  7:56               ` FUJITA Tomonori
2009-05-26  5:23     ` FUJITA Tomonori
2009-05-25  7:30 ` [PATCH 03/12] writeback: move dirty inodes from super_block to backing_dev_info Jens Axboe
2009-05-25  7:30 ` [PATCH 04/13] scsi: get rid of lock in __scsi_put_command() Jens Axboe
2009-05-25  7:30 ` [PATCH 04/12] writeback: switch to per-bdi threads for flushing data Jens Axboe
2009-05-25  7:30 ` [PATCH 05/13] aio: mostly crap Jens Axboe
2009-05-25  9:09   ` Jan Kara
2009-05-25  7:30 ` [PATCH 05/12] writeback: get rid of pdflush completely Jens Axboe
2009-05-25  7:30 ` [PATCH 06/13] block: move elevator ops into the queue Jens Axboe
2009-05-25  7:30 ` [PATCH 06/12] writeback: separate the flushing state/task from the bdi Jens Axboe
2009-05-25  7:30 ` [PATCH 07/13] block: avoid indirect calls to enter cfq io scheduler Jens Axboe
2009-05-26  9:02   ` Nikanth K
2009-05-26  9:02     ` Nikanth K
2009-05-25  7:30 ` [PATCH 07/12] writeback: support > 1 flusher thread per bdi Jens Axboe
2009-05-25  7:30 ` [PATCH 08/13] block: change the tag sync vs async restriction logic Jens Axboe
2009-05-25  7:30 ` [PATCH 08/12] writeback: include default_backing_dev_info in writeback Jens Axboe
2009-05-25  7:31 ` [PATCH 09/13] libata: switch to using block layer tagging support Jens Axboe
2009-05-25  7:31 ` [PATCH 09/12] writeback: allow sleepy exit of default writeback task Jens Axboe
2009-05-25  7:31 ` [PATCH 10/13] block: add function for waiting for a specific free tag Jens Axboe
2009-05-25  7:31 ` [PATCH 10/12] writeback: add some debug inode list counters to bdi stats Jens Axboe
2009-05-25  7:31 ` [PATCH 11/13] block: disallow merging of read-ahead bits into normal request Jens Axboe
2009-05-25  7:31 ` [PATCH 11/12] writeback: add name to backing_dev_info Jens Axboe
2009-05-25  7:31 ` [PATCH 12/13] block: first cut at implementing a NAPI approach for block devices Jens Axboe
2009-05-25  7:31 ` [PATCH 12/12] writeback: check for registered bdi in flusher add and inode dirty Jens Axboe
2009-05-25  7:31 ` [PATCH 13/13] block: unlocked completion test patch Jens Axboe
2009-05-25  7:33 ` [PATCH 0/12] Per-bdi writeback flusher threads #5 Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A1A7406.5060106@panasas.com \
    --to=bharrosh@panasas.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=chris.mason@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=jens.axboe@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=matthew@wil.cx \
    --cc=yanmin_zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.