Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
       [not found] ` <1243236668-3398-6-git-send-email-jens.axboe@oracle.com>
@ 2009-05-25  7:41   ` Christoph Hellwig
  2009-05-25  7:46     ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2009-05-25  7:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-kernel, linux-fsdevel, chris.mason, david, hch, akpm, jack,
	yanmin_zhang, linux-scsi

On Mon, May 25, 2009 at 09:30:48AM +0200, Jens Axboe wrote:
> Fold the sense buffer into the command, thereby eliminating a slab
> allocation and free per command.

Might help to send it to linux-scsi to get people to review and apply it
:)

But that patch looks good to me, avoiding one allocation for each
command and simplifying the code.  I try to remember why these were
two slabs to start with but can't find any reason.

Btw, we might just want to declare the sense buffer directly as a sized
array in the scsi command as there really doesn't seem to be a reason
not to allocate it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-25  7:41   ` [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer Christoph Hellwig
@ 2009-05-25  7:46     ` Jens Axboe
  2009-05-25  7:50       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2009-05-25  7:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, linux-fsdevel, chris.mason, david, akpm, jack,
	yanmin_zhang, linux-scsi

On Mon, May 25 2009, Christoph Hellwig wrote:
> On Mon, May 25, 2009 at 09:30:48AM +0200, Jens Axboe wrote:
> > Fold the sense buffer into the command, thereby eliminating a slab
> > allocation and free per command.
> 
> Might help to send it to linux-scsi to get people to review and apply it
> :)

yeah, as I later posted, this wasn't meant to be sent out as part of
the writeback series :-)

> But that patch looks good to me, avoiding one allocation for each
> command and simplifying the code.  I try to remember why these were
> two slabs to start with but can't find any reason.
> 
> Btw, we might just want to declare the sense buffer directly as a sized
> array in the scsi command as there really doesn't seem to be a reason
> not to allocate it.

That is also a workable solution. I've been trying to cut down on the
number of allocations required per-IO, and there's definitely still some
low hanging fruit there. Some of it is already included, like the inline
io_vecs in the bio.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-25  7:46     ` Jens Axboe
@ 2009-05-25  7:50       ` Christoph Hellwig
  2009-05-25  7:54         ` Jens Axboe
                           ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Christoph Hellwig @ 2009-05-25  7:50 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, linux-kernel, linux-fsdevel, chris.mason,
	david, akpm, jack, yanmin_zhang, linux-scsi

On Mon, May 25, 2009 at 09:46:47AM +0200, Jens Axboe wrote:
> > But that patch looks good to me, avoiding one allocation for each
> > command and simplifying the code.  I try to remember why these were
> > two slabs to start with but can't find any reason.
> > 
> > Btw, we might just want to declare the sense buffer directly as a sized
> > array in the scsi command as there really doesn't seem to be a reason
> > not to allocate it.
> 
> That is also a workable solution. I've been trying to cut down on the
> number of allocations required per-IO, and there's definitely still some
> low hanging fruit there. Some of it is already included, like the inline
> io_vecs in the bio.

Btw, one thing I wanted to do for years is to add ->alloc_cmnd and
->destroy_cmnd method to the host template which optionally move the
command allocation to the LLDD.  That way we can embedd the scsi_cmnd
into the drivers per-commad structure and eliminate another memory
allocation.  Also this would naturally extend the keep one cmnd pool
to drivers without requiring additional code.  As a second step it
would also allow killing the scsi_host_cmd_pool byt just having
a set of library routines that drivers which need SLAB_CACHE_DMA can
use.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-25  7:50       ` Christoph Hellwig
@ 2009-05-25  7:54         ` Jens Axboe
  2009-05-25 10:33         ` Boaz Harrosh
  2009-05-26  4:36         ` FUJITA Tomonori
  2 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2009-05-25  7:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, linux-fsdevel, chris.mason, david, akpm, jack,
	yanmin_zhang, linux-scsi

On Mon, May 25 2009, Christoph Hellwig wrote:
> On Mon, May 25, 2009 at 09:46:47AM +0200, Jens Axboe wrote:
> > > But that patch looks good to me, avoiding one allocation for each
> > > command and simplifying the code.  I try to remember why these were
> > > two slabs to start with but can't find any reason.
> > > 
> > > Btw, we might just want to declare the sense buffer directly as a sized
> > > array in the scsi command as there really doesn't seem to be a reason
> > > not to allocate it.
> > 
> > That is also a workable solution. I've been trying to cut down on the
> > number of allocations required per-IO, and there's definitely still some
> > low hanging fruit there. Some of it is already included, like the inline
> > io_vecs in the bio.
> 
> Btw, one thing I wanted to do for years is to add ->alloc_cmnd and
> ->destroy_cmnd method to the host template which optionally move the
> command allocation to the LLDD.  That way we can embedd the scsi_cmnd
> into the drivers per-commad structure and eliminate another memory
> allocation.  Also this would naturally extend the keep one cmnd pool
> to drivers without requiring additional code.  As a second step it
> would also allow killing the scsi_host_cmd_pool byt just having
> a set of library routines that drivers which need SLAB_CACHE_DMA can
> use.

That's a good idea and could kill one more alloc/free per IO. I'll add
that to the mix!

And in case anyone is interested, the patches that got mixed up with the
writeback patches are from the 'ssd' branch. It's basically a mix of
experimental patches for improving performance. Some are crap, some are
worth continuing with. There's been a steady influx of patches from
there to mainline, so it's a continually changing branch. Well not so
much lately since I spent most of the time in the writeback branch, but
otherwise.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-25  7:50       ` Christoph Hellwig
  2009-05-25  7:54         ` Jens Axboe
@ 2009-05-25 10:33         ` Boaz Harrosh
  2009-05-25 10:42           ` Christoph Hellwig
  2009-05-26  4:36         ` FUJITA Tomonori
  2 siblings, 1 reply; 23+ messages in thread
From: Boaz Harrosh @ 2009-05-25 10:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-kernel, linux-fsdevel, chris.mason, david, akpm,
	jack, yanmin_zhang, linux-scsi, Matthew Wilcox, Andi Kleen,
	James Bottomley

On 05/25/2009 10:50 AM, Christoph Hellwig wrote:
> On Mon, May 25, 2009 at 09:46:47AM +0200, Jens Axboe wrote:
>>> But that patch looks good to me, avoiding one allocation for each
>>> command and simplifying the code.  I try to remember why these were
>>> two slabs to start with but can't find any reason.
>>>

I posted an answer to that here:
http://www.spinics.net/lists/kernel/msg889604.html

It was done for none-cache-coherent systems that need to dma into sense_buffer.

>>> Btw, we might just want to declare the sense buffer directly as a sized
>>> array in the scsi command as there really doesn't seem to be a reason
>>> not to allocate it.
>> That is also a workable solution. I've been trying to cut down on the
>> number of allocations required per-IO, and there's definitely still some
>> low hanging fruit there. Some of it is already included, like the inline
>> io_vecs in the bio.
> 
> Btw, one thing I wanted to do for years is to add ->alloc_cmnd and
> ->destroy_cmnd method to the host template which optionally move the
> command allocation to the LLDD.  That way we can embedd the scsi_cmnd
> into the drivers per-commad structure and eliminate another memory
> allocation.  

It is nice in theory, but when trying to implement I encountered some
problems.

1. If we have a machine with few type of hosts active each with it's own
cmnd_slab we end up with many more slabs then today. Even though at the
end they all happen to be of the same size. (With the pool reserves it
can get big also).

2. Some considerations are system-wide and system-dependent (like above
   problem) and should be centralized into one place so if/when things
   change they can be changed in one place.
2.1. Don't trust driver writers to do the right thing.

3. There are common needs that are cross drivers, and no code should be duplicated.
   For example Bidi-Commands, use of scsi_ptr, ISA_DMA, ... and such not.

I totally agree with the need and robustness this will give...

So I think we might approach this from a slightly different way.

Hosts specify an size_of_private_command at host template, which might include
the common-scsi_cmnd + sense_buffer + private_cmnd + optional scsi_ptr +
bidi_data_buffer + ...

scsi_ml has a base-two-sized set of slabs that get allocated on first use
(at host registration) and hosts get to share the pools with same size.
[Alternatively hosts just keep reserved-commands list and regular use gets
 kmalloced]

All handling is centralized, with special needs specified at host template
like dma_mask ISA_flags and such.

> Also this would naturally extend the keep one cmnd pool
> to drivers without requiring additional code.  As a second step it
> would also allow killing the scsi_host_cmd_pool byt just having
> a set of library routines that drivers which need SLAB_CACHE_DMA can
> use.
> 

I'm afraid this will need to be done first. Layout the new facilities
and implement today's lowest-denominator on top of that. Then convert
driver by driver. Finally remove the old croft.

Lets all agree on a rough sketch and we can all get behind it. There are
a few people I know that will help, Matthew Wilcox, Me , perhaps Jens
and Christoph.

This will also finally help Andi Kleen's needs with the masked allocators

Boaz

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-25 10:33         ` Boaz Harrosh
@ 2009-05-25 10:42           ` Christoph Hellwig
  2009-05-25 10:49             ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2009-05-25 10:42 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Christoph Hellwig, Jens Axboe, linux-kernel, linux-fsdevel,
	chris.mason, david, akpm, jack, yanmin_zhang, linux-scsi,
	Matthew Wilcox, Andi Kleen, James Bottomley

On Mon, May 25, 2009 at 01:33:42PM +0300, Boaz Harrosh wrote:
> 1. If we have a machine with few type of hosts active each with it's own
> cmnd_slab we end up with many more slabs then today. Even though at the
> end they all happen to be of the same size. (With the pool reserves it
> can get big also).

Note that this should be optional.  Device not having their own
per-command structure would continue using the global pools.  Those
that have their own per-command structures already have their own pools
anyway.

> Hosts specify an size_of_private_command at host template, which might include
> the common-scsi_cmnd + sense_buffer + private_cmnd + optional scsi_ptr +
> bidi_data_buffer + ...

That sounds fine, too. 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-25 10:42           ` Christoph Hellwig
@ 2009-05-25 10:49             ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2009-05-25 10:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Boaz Harrosh, linux-kernel, linux-fsdevel, chris.mason, david,
	akpm, jack, yanmin_zhang, linux-scsi, Matthew Wilcox, Andi Kleen,
	James Bottomley

On Mon, May 25 2009, Christoph Hellwig wrote:
> On Mon, May 25, 2009 at 01:33:42PM +0300, Boaz Harrosh wrote:
> > 1. If we have a machine with few type of hosts active each with it's own
> > cmnd_slab we end up with many more slabs then today. Even though at the
> > end they all happen to be of the same size. (With the pool reserves it
> > can get big also).
> 
> Note that this should be optional.  Device not having their own
> per-command structure would continue using the global pools.  Those
> that have their own per-command structures already have their own pools
> anyway.

The multiple pools of the same size "issue" can also easily be resolved
by having SCSI provide a way to setup/destroy these pools. Then it can
just reuse an existing pool, if it has the same size.

However, I doubt that this is really a real life issue that's worth
worrying about.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-25  7:50       ` Christoph Hellwig
  2009-05-25  7:54         ` Jens Axboe
  2009-05-25 10:33         ` Boaz Harrosh
@ 2009-05-26  4:36         ` FUJITA Tomonori
  2009-05-26  5:08           ` FUJITA Tomonori
  2 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2009-05-26  4:36 UTC (permalink / raw)
  To: hch
  Cc: jens.axboe, linux-kernel, linux-fsdevel, chris.mason, david, akpm,
	jack, yanmin_zhang, linux-scsi

On Mon, 25 May 2009 03:50:08 -0400
Christoph Hellwig <hch@infradead.org> wrote:

> On Mon, May 25, 2009 at 09:46:47AM +0200, Jens Axboe wrote:
> > > But that patch looks good to me, avoiding one allocation for each
> > > command and simplifying the code.  I try to remember why these were
> > > two slabs to start with but can't find any reason.
> > > 
> > > Btw, we might just want to declare the sense buffer directly as a sized
> > > array in the scsi command as there really doesn't seem to be a reason
> > > not to allocate it.
> > 
> > That is also a workable solution. I've been trying to cut down on the
> > number of allocations required per-IO, and there's definitely still some
> > low hanging fruit there. Some of it is already included, like the inline
> > io_vecs in the bio.
> 
> Btw, one thing I wanted to do for years is to add ->alloc_cmnd and
> ->destroy_cmnd method to the host template which optionally move the
> command allocation to the LLDD.  That way we can embedd the scsi_cmnd
> into the drivers per-commad structure and eliminate another memory
> allocation.  Also this would naturally extend the keep one cmnd pool
> to drivers without requiring additional code.  As a second step it
> would also allow killing the scsi_host_cmd_pool byt just having
> a set of library routines that drivers which need SLAB_CACHE_DMA can
> use.

We discussed this idea when I rewrote the sense allocation code, I
think.

I like that idea that unifying scsi_cmnd and llds' per-commad
structure however there is one tricky thing about it.

Currently, a lld frees (or reuses) its per-commad structure when it
calls scsi_done(). SCSI-ml uses scsi_cmd after that so we need to
change the lifetime management (so we need to inspect all the llds,
e.g. this change will break iscsi ldd).

With that change, we can't tell llds how many per-commad structure are
possibly necessary. In general, LLDs want to know the maximum number
of per-commad structure; drivers allocates the number of per-commad
structure equal to host_template->can_queue.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-26  4:36         ` FUJITA Tomonori
@ 2009-05-26  5:08           ` FUJITA Tomonori
  0 siblings, 0 replies; 23+ messages in thread
From: FUJITA Tomonori @ 2009-05-26  5:08 UTC (permalink / raw)
  To: fujita.tomonori
  Cc: hch, jens.axboe, linux-kernel, linux-fsdevel, chris.mason, david,
	akpm, jack, yanmin_zhang, linux-scsi

On Tue, 26 May 2009 13:36:43 +0900
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:

> On Mon, 25 May 2009 03:50:08 -0400
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > On Mon, May 25, 2009 at 09:46:47AM +0200, Jens Axboe wrote:
> > > > But that patch looks good to me, avoiding one allocation for each
> > > > command and simplifying the code.  I try to remember why these were
> > > > two slabs to start with but can't find any reason.
> > > > 
> > > > Btw, we might just want to declare the sense buffer directly as a sized
> > > > array in the scsi command as there really doesn't seem to be a reason
> > > > not to allocate it.
> > > 
> > > That is also a workable solution. I've been trying to cut down on the
> > > number of allocations required per-IO, and there's definitely still some
> > > low hanging fruit there. Some of it is already included, like the inline
> > > io_vecs in the bio.
> > 
> > Btw, one thing I wanted to do for years is to add ->alloc_cmnd and
> > ->destroy_cmnd method to the host template which optionally move the
> > command allocation to the LLDD.  That way we can embedd the scsi_cmnd
> > into the drivers per-commad structure and eliminate another memory
> > allocation.  Also this would naturally extend the keep one cmnd pool
> > to drivers without requiring additional code.  As a second step it
> > would also allow killing the scsi_host_cmd_pool byt just having
> > a set of library routines that drivers which need SLAB_CACHE_DMA can
> > use.
> 
> We discussed this idea when I rewrote the sense allocation code, I
> think.
> 
> I like that idea that unifying scsi_cmnd and llds' per-commad
> structure however there is one tricky thing about it.
> 
> Currently, a lld frees (or reuses) its per-commad structure when it
> calls scsi_done(). SCSI-ml uses scsi_cmd after that so we need to
> change the lifetime management (so we need to inspect all the llds,
> e.g. this change will break iscsi ldd).

Oops, as you said, this can be optional (so we don't need to convert
all llds). But as I said, this changes the definition of when
scsi_cmnd is free and ldds don't like that change, I think.


> With that change, we can't tell llds how many per-commad structure are
> possibly necessary. In general, LLDs want to know the maximum number
> of per-commad structure; drivers allocates the number of per-commad
> structure equal to host_template->can_queue.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
       [not found]   ` <20090526062952.GB11363@kernel.dk>
@ 2009-05-26  7:25     ` FUJITA Tomonori
  2009-05-26  7:32       ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2009-05-26  7:25 UTC (permalink / raw)
  To: jens.axboe, James.Bottomley
  Cc: fujita.tomonori, rdreier, bharrosh, linux-kernel, linux-fsdevel,
	chris.mason, david, hch, akpm, jack, yanmin_zhang, linux-scsi

On Tue, 26 May 2009 08:29:53 +0200
Jens Axboe <jens.axboe@oracle.com> wrote:

> On Tue, May 26 2009, FUJITA Tomonori wrote:
> > On Mon, 25 May 2009 18:45:25 -0700
> > Roland Dreier <rdreier@cisco.com> wrote:
> > 
> > >  > Ideally there should be a MACRO that is defined to WORD_SIZE on cache-coherent
> > >  > ARCHs and to SMP_CACHE_BYTES on none-cache-coherent systems and use that size
> > >  > at the __align() attribute. (So only stupid ARCHES get hurt)
> > > 
> > > this seems to come up repeatedly -- I had a proposal a _long_ time ago
> > > that never quite got merged, cf http://lwn.net/Articles/2265/ and
> > > http://lwn.net/Articles/2269/ -- from 2002 (!?).  The idea is to go a
> > 
> > Yeah, I think that Benjamin did last time:
> > 
> > http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg12632.html
> > 
> > IIRC, James didn't like it so I wrote the current code. I didn't see
> > any big performance difference with scsi_debug:
> > 
> > http://marc.info/?l=linux-scsi&m=120038907123706&w=2
> > 
> > Jens, you see the performance difference due to this unification?
> 
> Yes, it's definitely a worth while optimization. The problem isn't as
> such this specific allocation, it's the total number of allocations we
> do for a piece of IO. This sense buffer one is just one of many, I'm
> continually working to reduce them. If we get rid of this one and add
> the ->alloc_cmd() stuff, we can kill one more. The bio path already lost
> one. So in the IO stack, we went from 6 allocations to 3 for a piece of
> IO. And then it starts to add up. Even at just 30-50k iops, that's more
> than 1% of time in the testing I did.

I see, thanks. Hmm, possibly slab becomes slower. ;)

Then I think that we need something like the ->alloc_cmd()
method. Let's ask James. 

I don't think that it's just about simply adding the hook; there are
some issues that we need to think about. Though Boaz worries too much
a bit, I think.

I'm not sure about this patch if we add ->alloc_cmd(). I doubt that
there are any llds don't use ->alloc_cmd() worry about the overhead of
the separated sense buffer allocation. If a lld doesn't define the own
alloc_cmd, then I think it's fine to use the generic command
allocator that does the separate sense buffer allocation.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-26  7:25     ` FUJITA Tomonori
@ 2009-05-26  7:32       ` Jens Axboe
  2009-05-26  7:38         ` FUJITA Tomonori
  2009-05-26  7:56         ` FUJITA Tomonori
  0 siblings, 2 replies; 23+ messages in thread
From: Jens Axboe @ 2009-05-26  7:32 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: James.Bottomley, rdreier, bharrosh, linux-kernel, linux-fsdevel,
	chris.mason, david, hch, akpm, jack, yanmin_zhang, linux-scsi

On Tue, May 26 2009, FUJITA Tomonori wrote:
> On Tue, 26 May 2009 08:29:53 +0200
> Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Tue, May 26 2009, FUJITA Tomonori wrote:
> > > On Mon, 25 May 2009 18:45:25 -0700
> > > Roland Dreier <rdreier@cisco.com> wrote:
> > > 
> > > >  > Ideally there should be a MACRO that is defined to WORD_SIZE on cache-coherent
> > > >  > ARCHs and to SMP_CACHE_BYTES on none-cache-coherent systems and use that size
> > > >  > at the __align() attribute. (So only stupid ARCHES get hurt)
> > > > 
> > > > this seems to come up repeatedly -- I had a proposal a _long_ time ago
> > > > that never quite got merged, cf http://lwn.net/Articles/2265/ and
> > > > http://lwn.net/Articles/2269/ -- from 2002 (!?).  The idea is to go a
> > > 
> > > Yeah, I think that Benjamin did last time:
> > > 
> > > http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg12632.html
> > > 
> > > IIRC, James didn't like it so I wrote the current code. I didn't see
> > > any big performance difference with scsi_debug:
> > > 
> > > http://marc.info/?l=linux-scsi&m=120038907123706&w=2
> > > 
> > > Jens, you see the performance difference due to this unification?
> > 
> > Yes, it's definitely a worth while optimization. The problem isn't as
> > such this specific allocation, it's the total number of allocations we
> > do for a piece of IO. This sense buffer one is just one of many, I'm
> > continually working to reduce them. If we get rid of this one and add
> > the ->alloc_cmd() stuff, we can kill one more. The bio path already lost
> > one. So in the IO stack, we went from 6 allocations to 3 for a piece of
> > IO. And then it starts to add up. Even at just 30-50k iops, that's more
> > than 1% of time in the testing I did.
> 
> I see, thanks. Hmm, possibly slab becomes slower. ;)
> 
> Then I think that we need something like the ->alloc_cmd()
> method. Let's ask James. 
> 
> I don't think that it's just about simply adding the hook; there are
> some issues that we need to think about. Though Boaz worries too much
> a bit, I think.
> 
> I'm not sure about this patch if we add ->alloc_cmd(). I doubt that
> there are any llds don't use ->alloc_cmd() worry about the overhead of
> the separated sense buffer allocation. If a lld doesn't define the own
> alloc_cmd, then I think it's fine to use the generic command
> allocator that does the separate sense buffer allocation.

I think we should do the two things seperately. If we can safely inline
the sense buffer in the command by doing the right alignment, then lets
do that. The ->alloc_cmd() approach will be easier to do with an inline
sense buffer.

But there's really no reason to tie the two things together.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-26  7:32       ` Jens Axboe
@ 2009-05-26  7:38         ` FUJITA Tomonori
  2009-05-26 14:47           ` James Bottomley
  2009-05-26  7:56         ` FUJITA Tomonori
  1 sibling, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2009-05-26  7:38 UTC (permalink / raw)
  To: jens.axboe
  Cc: fujita.tomonori, James.Bottomley, rdreier, bharrosh, linux-kernel,
	linux-fsdevel, chris.mason, david, hch, akpm, jack, yanmin_zhang,
	linux-scsi

On Tue, 26 May 2009 09:32:29 +0200
Jens Axboe <jens.axboe@oracle.com> wrote:

> On Tue, May 26 2009, FUJITA Tomonori wrote:
> > On Tue, 26 May 2009 08:29:53 +0200
> > Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > > On Tue, May 26 2009, FUJITA Tomonori wrote:
> > > > On Mon, 25 May 2009 18:45:25 -0700
> > > > Roland Dreier <rdreier@cisco.com> wrote:
> > > > 
> > > > >  > Ideally there should be a MACRO that is defined to WORD_SIZE on cache-coherent
> > > > >  > ARCHs and to SMP_CACHE_BYTES on none-cache-coherent systems and use that size
> > > > >  > at the __align() attribute. (So only stupid ARCHES get hurt)
> > > > > 
> > > > > this seems to come up repeatedly -- I had a proposal a _long_ time ago
> > > > > that never quite got merged, cf http://lwn.net/Articles/2265/ and
> > > > > http://lwn.net/Articles/2269/ -- from 2002 (!?).  The idea is to go a
> > > > 
> > > > Yeah, I think that Benjamin did last time:
> > > > 
> > > > http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg12632.html
> > > > 
> > > > IIRC, James didn't like it so I wrote the current code. I didn't see
> > > > any big performance difference with scsi_debug:
> > > > 
> > > > http://marc.info/?l=linux-scsi&m=120038907123706&w=2
> > > > 
> > > > Jens, you see the performance difference due to this unification?
> > > 
> > > Yes, it's definitely a worth while optimization. The problem isn't as
> > > such this specific allocation, it's the total number of allocations we
> > > do for a piece of IO. This sense buffer one is just one of many, I'm
> > > continually working to reduce them. If we get rid of this one and add
> > > the ->alloc_cmd() stuff, we can kill one more. The bio path already lost
> > > one. So in the IO stack, we went from 6 allocations to 3 for a piece of
> > > IO. And then it starts to add up. Even at just 30-50k iops, that's more
> > > than 1% of time in the testing I did.
> > 
> > I see, thanks. Hmm, possibly slab becomes slower. ;)
> > 
> > Then I think that we need something like the ->alloc_cmd()
> > method. Let's ask James. 
> > 
> > I don't think that it's just about simply adding the hook; there are
> > some issues that we need to think about. Though Boaz worries too much
> > a bit, I think.
> > 
> > I'm not sure about this patch if we add ->alloc_cmd(). I doubt that
> > there are any llds don't use ->alloc_cmd() worry about the overhead of
> > the separated sense buffer allocation. If a lld doesn't define the own
> > alloc_cmd, then I think it's fine to use the generic command
> > allocator that does the separate sense buffer allocation.
> 
> I think we should do the two things seperately. If we can safely inline
> the sense buffer in the command by doing the right alignment, then lets
> do that. The ->alloc_cmd() approach will be easier to do with an inline
> sense buffer.

James rejected this in the past. Let's wait for his verdict.

Yeah, we can inline the sense buffer but as we discussed in the past
several times, there are some good reasons that we should not do so, I
think.


> But there's really no reason to tie the two things together.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-26  7:32       ` Jens Axboe
  2009-05-26  7:38         ` FUJITA Tomonori
@ 2009-05-26  7:56         ` FUJITA Tomonori
  1 sibling, 0 replies; 23+ messages in thread
From: FUJITA Tomonori @ 2009-05-26  7:56 UTC (permalink / raw)
  To: jens.axboe
  Cc: fujita.tomonori, James.Bottomley, rdreier, bharrosh, linux-kernel,
	linux-fsdevel, chris.mason, david, hch, akpm, jack, yanmin_zhang,
	linux-scsi

On Tue, 26 May 2009 09:32:29 +0200
Jens Axboe <jens.axboe@oracle.com> wrote:

> On Tue, May 26 2009, FUJITA Tomonori wrote:
> > On Tue, 26 May 2009 08:29:53 +0200
> > Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > > On Tue, May 26 2009, FUJITA Tomonori wrote:
> > > > On Mon, 25 May 2009 18:45:25 -0700
> > > > Roland Dreier <rdreier@cisco.com> wrote:
> > > > 
> > > > >  > Ideally there should be a MACRO that is defined to WORD_SIZE on cache-coherent
> > > > >  > ARCHs and to SMP_CACHE_BYTES on none-cache-coherent systems and use that size
> > > > >  > at the __align() attribute. (So only stupid ARCHES get hurt)
> > > > > 
> > > > > this seems to come up repeatedly -- I had a proposal a _long_ time ago
> > > > > that never quite got merged, cf http://lwn.net/Articles/2265/ and
> > > > > http://lwn.net/Articles/2269/ -- from 2002 (!?).  The idea is to go a
> > > > 
> > > > Yeah, I think that Benjamin did last time:
> > > > 
> > > > http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg12632.html
> > > > 
> > > > IIRC, James didn't like it so I wrote the current code. I didn't see
> > > > any big performance difference with scsi_debug:
> > > > 
> > > > http://marc.info/?l=linux-scsi&m=120038907123706&w=2
> > > > 
> > > > Jens, you see the performance difference due to this unification?
> > > 
> > > Yes, it's definitely a worth while optimization. The problem isn't as
> > > such this specific allocation, it's the total number of allocations we
> > > do for a piece of IO. This sense buffer one is just one of many, I'm
> > > continually working to reduce them. If we get rid of this one and add
> > > the ->alloc_cmd() stuff, we can kill one more. The bio path already lost
> > > one. So in the IO stack, we went from 6 allocations to 3 for a piece of
> > > IO. And then it starts to add up. Even at just 30-50k iops, that's more
> > > than 1% of time in the testing I did.
> > 
> > I see, thanks. Hmm, possibly slab becomes slower. ;)
> > 
> > Then I think that we need something like the ->alloc_cmd()
> > method. Let's ask James. 
> > 
> > I don't think that it's just about simply adding the hook; there are
> > some issues that we need to think about. Though Boaz worries too much
> > a bit, I think.
> > 
> > I'm not sure about this patch if we add ->alloc_cmd(). I doubt that
> > there are any llds don't use ->alloc_cmd() worry about the overhead of
> > the separated sense buffer allocation. If a lld doesn't define the own
> > alloc_cmd, then I think it's fine to use the generic command
> > allocator that does the separate sense buffer allocation.
> 
> I think we should do the two things seperately. If we can safely inline
> the sense buffer in the command by doing the right alignment, then lets
> do that. The ->alloc_cmd() approach will be easier to do with an inline
> sense buffer.

BTW, only alignment is not enough (Boaz didn't point out it, I
think). You need alignment and a hole after the buffer:

http://lkml.org/lkml/2007/12/20/661


I think that it is one of these good reasons that we should not inline
the sense buffer. We will enlarge scsi_cmnd lots.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-26  7:38         ` FUJITA Tomonori
@ 2009-05-26 14:47           ` James Bottomley
  2009-05-26 15:13             ` Matthew Wilcox
                               ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: James Bottomley @ 2009-05-26 14:47 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: jens.axboe, rdreier, bharrosh, linux-kernel, linux-fsdevel,
	chris.mason, david, hch, akpm, jack, yanmin_zhang, linux-scsi

On Tue, 2009-05-26 at 16:38 +0900, FUJITA Tomonori wrote:
> On Tue, 26 May 2009 09:32:29 +0200
> Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Tue, May 26 2009, FUJITA Tomonori wrote:
> > > On Tue, 26 May 2009 08:29:53 +0200
> > > Jens Axboe <jens.axboe@oracle.com> wrote:
> > > 
> > > > On Tue, May 26 2009, FUJITA Tomonori wrote:
> > > > > On Mon, 25 May 2009 18:45:25 -0700
> > > > > Roland Dreier <rdreier@cisco.com> wrote:
> > > > > 
> > > > > >  > Ideally there should be a MACRO that is defined to WORD_SIZE on cache-coherent
> > > > > >  > ARCHs and to SMP_CACHE_BYTES on none-cache-coherent systems and use that size
> > > > > >  > at the __align() attribute. (So only stupid ARCHES get hurt)
> > > > > > 
> > > > > > this seems to come up repeatedly -- I had a proposal a _long_ time ago
> > > > > > that never quite got merged, cf http://lwn.net/Articles/2265/ and
> > > > > > http://lwn.net/Articles/2269/ -- from 2002 (!?).  The idea is to go a
> > > > > 
> > > > > Yeah, I think that Benjamin did last time:
> > > > > 
> > > > > http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg12632.html
> > > > > 
> > > > > IIRC, James didn't like it so I wrote the current code. I didn't see
> > > > > any big performance difference with scsi_debug:
> > > > > 
> > > > > http://marc.info/?l=linux-scsi&m=120038907123706&w=2
> > > > > 
> > > > > Jens, you see the performance difference due to this unification?
> > > > 
> > > > Yes, it's definitely a worth while optimization. The problem isn't as
> > > > such this specific allocation, it's the total number of allocations we
> > > > do for a piece of IO. This sense buffer one is just one of many, I'm
> > > > continually working to reduce them. If we get rid of this one and add
> > > > the ->alloc_cmd() stuff, we can kill one more. The bio path already lost
> > > > one. So in the IO stack, we went from 6 allocations to 3 for a piece of
> > > > IO. And then it starts to add up. Even at just 30-50k iops, that's more
> > > > than 1% of time in the testing I did.
> > > 
> > > I see, thanks. Hmm, possibly slab becomes slower. ;)
> > > 
> > > Then I think that we need something like the ->alloc_cmd()
> > > method. Let's ask James. 
> > > 
> > > I don't think that it's just about simply adding the hook; there are
> > > some issues that we need to think about. Though Boaz worries too much
> > > a bit, I think.
> > > 
> > > I'm not sure about this patch if we add ->alloc_cmd(). I doubt that
> > > there are any llds don't use ->alloc_cmd() worry about the overhead of
> > > the separated sense buffer allocation. If a lld doesn't define the own
> > > alloc_cmd, then I think it's fine to use the generic command
> > > allocator that does the separate sense buffer allocation.
> > 
> > I think we should do the two things seperately. If we can safely inline
> > the sense buffer in the command by doing the right alignment, then lets
> > do that. The ->alloc_cmd() approach will be easier to do with an inline
> > sense buffer.
> 
> James rejected this in the past. Let's wait for his verdict.

OK, so the reason for the original problems where the sense buffer was
inlined with the scsi_command was that we need to DMA to the sense
buffer but not to the command.  Plus the command is in fairly constant
use so we get cacheline interference unless they're always in separate
caches.  This necessitates opening up a hole in the command to achieve
this (you can separate to the next cache line if you can guarantee that
the command always begins on a cacheline.  If not, it has to be
2*cacheline).  The L1 cacheline can be up to 128 bytes on some
architectures, so we'd need to know the waste of space is worth it in
terms of speed.  The other problem is that the entire command now has to
be allocated in DMAable memory, which restricts the allocation on some
systems.

> Yeah, we can inline the sense buffer but as we discussed in the past
> several times, there are some good reasons that we should not do so, I
> think.

There are several other approaches:

     1. Keep the sense buffer packed in the command but disallow DMA to
        it, which fixes all the alignment problems.  Then we supply a
        set of rotating DMA buffers to drivers which need to do the DMA
        (which isn't the majority).
     2. Sense is a comparative rarity, so us a more compact pooling
        scheme and discard sense for reuse as soon as we know it's not
        used (as in at softirq time when there's no sense collected).

I'd need a little more clarity on the actual size of the problem before
making any choices.

The other thing to bear in mind is that two allocations of M and N might
be more costly than a single allocation of N+M; however, an allocation
of M+N+extra can end up more costly if the extra causes more page
reclaim before we get an actual command.

James




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-26 14:47           ` James Bottomley
@ 2009-05-26 15:13             ` Matthew Wilcox
  2009-05-26 15:31             ` FUJITA Tomonori
  2009-05-26 16:12             ` Boaz Harrosh
  2 siblings, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2009-05-26 15:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: FUJITA Tomonori, jens.axboe, rdreier, bharrosh, linux-kernel,
	linux-fsdevel, chris.mason, david, hch, akpm, jack, yanmin_zhang,
	linux-scsi

On Tue, May 26, 2009 at 09:47:02AM -0500, James Bottomley wrote:
> > Yeah, we can inline the sense buffer but as we discussed in the past
> > several times, there are some good reasons that we should not do so, I
> > think.
> 
> There are several other approaches:
> 
>      1. Keep the sense buffer packed in the command but disallow DMA to
>         it, which fixes all the alignment problems.  Then we supply a
>         set of rotating DMA buffers to drivers which need to do the DMA
>         (which isn't the majority).
>      2. Sense is a comparative rarity, so us a more compact pooling
>         scheme and discard sense for reuse as soon as we know it's not
>         used (as in at softirq time when there's no sense collected).
> 
> I'd need a little more clarity on the actual size of the problem before
> making any choices.

I'm not sure if this is what you meant by option 2 or not, but one
proposal was to keep a number of sense buffers around per-host, and only
allocate extras when we run close to empty.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-26 14:47           ` James Bottomley
  2009-05-26 15:13             ` Matthew Wilcox
@ 2009-05-26 15:31             ` FUJITA Tomonori
  2009-05-26 16:05               ` Boaz Harrosh
  2009-05-26 16:12             ` Boaz Harrosh
  2 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2009-05-26 15:31 UTC (permalink / raw)
  To: James.Bottomley
  Cc: fujita.tomonori, jens.axboe, rdreier, bharrosh, linux-kernel,
	linux-fsdevel, chris.mason, david, hch, akpm, jack, yanmin_zhang,
	linux-scsi

On Tue, 26 May 2009 09:47:02 -0500
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> On Tue, 2009-05-26 at 16:38 +0900, FUJITA Tomonori wrote:
> > On Tue, 26 May 2009 09:32:29 +0200
> > Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > > On Tue, May 26 2009, FUJITA Tomonori wrote:
> > > > On Tue, 26 May 2009 08:29:53 +0200
> > > > Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > 
> > > > > On Tue, May 26 2009, FUJITA Tomonori wrote:
> > > > > > On Mon, 25 May 2009 18:45:25 -0700
> > > > > > Roland Dreier <rdreier@cisco.com> wrote:
> > > > > > 
> > > > > > >  > Ideally there should be a MACRO that is defined to WORD_SIZE on cache-coherent
> > > > > > >  > ARCHs and to SMP_CACHE_BYTES on none-cache-coherent systems and use that size
> > > > > > >  > at the __align() attribute. (So only stupid ARCHES get hurt)
> > > > > > > 
> > > > > > > this seems to come up repeatedly -- I had a proposal a _long_ time ago
> > > > > > > that never quite got merged, cf http://lwn.net/Articles/2265/ and
> > > > > > > http://lwn.net/Articles/2269/ -- from 2002 (!?).  The idea is to go a
> > > > > > 
> > > > > > Yeah, I think that Benjamin did last time:
> > > > > > 
> > > > > > http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg12632.html
> > > > > > 
> > > > > > IIRC, James didn't like it so I wrote the current code. I didn't see
> > > > > > any big performance difference with scsi_debug:
> > > > > > 
> > > > > > http://marc.info/?l=linux-scsi&m=120038907123706&w=2
> > > > > > 
> > > > > > Jens, you see the performance difference due to this unification?
> > > > > 
> > > > > Yes, it's definitely a worth while optimization. The problem isn't as
> > > > > such this specific allocation, it's the total number of allocations we
> > > > > do for a piece of IO. This sense buffer one is just one of many, I'm
> > > > > continually working to reduce them. If we get rid of this one and add
> > > > > the ->alloc_cmd() stuff, we can kill one more. The bio path already lost
> > > > > one. So in the IO stack, we went from 6 allocations to 3 for a piece of
> > > > > IO. And then it starts to add up. Even at just 30-50k iops, that's more
> > > > > than 1% of time in the testing I did.
> > > > 
> > > > I see, thanks. Hmm, possibly slab becomes slower. ;)
> > > > 
> > > > Then I think that we need something like the ->alloc_cmd()
> > > > method. Let's ask James. 
> > > > 
> > > > I don't think that it's just about simply adding the hook; there are
> > > > some issues that we need to think about. Though Boaz worries too much
> > > > a bit, I think.
> > > > 
> > > > I'm not sure about this patch if we add ->alloc_cmd(). I doubt that
> > > > there are any llds don't use ->alloc_cmd() worry about the overhead of
> > > > the separated sense buffer allocation. If a lld doesn't define the own
> > > > alloc_cmd, then I think it's fine to use the generic command
> > > > allocator that does the separate sense buffer allocation.
> > > 
> > > I think we should do the two things seperately. If we can safely inline
> > > the sense buffer in the command by doing the right alignment, then lets
> > > do that. The ->alloc_cmd() approach will be easier to do with an inline
> > > sense buffer.
> > 
> > James rejected this in the past. Let's wait for his verdict.
> 
> OK, so the reason for the original problems where the sense buffer was
> inlined with the scsi_command was that we need to DMA to the sense
> buffer but not to the command.  Plus the command is in fairly constant
> use so we get cacheline interference unless they're always in separate
> caches.  This necessitates opening up a hole in the command to achieve
> this (you can separate to the next cache line if you can guarantee that
> the command always begins on a cacheline.  If not, it has to be
> 2*cacheline).  The L1 cacheline can be up to 128 bytes on some
> architectures, so we'd need to know the waste of space is worth it in
> terms of speed.  The other problem is that the entire command now has to
> be allocated in DMAable memory, which restricts the allocation on some
> systems.

Yeah, I think that there are good reasons why we shouldn't inline the
sense buffer. As I already wrote, seems that the DMA requirement
wasn't properly understood; it's not about the alignment.


> > Yeah, we can inline the sense buffer but as we discussed in the past
> > several times, there are some good reasons that we should not do so, I
> > think.
> 
> There are several other approaches:
> 
>      1. Keep the sense buffer packed in the command but disallow DMA to
>         it, which fixes all the alignment problems.  Then we supply a
>         set of rotating DMA buffers to drivers which need to do the DMA
>         (which isn't the majority).

Can we just fix some drivers not to do the DMA with the sense buffer in
scsi_cmnd? IIRC, there are only five or six drivers that do such.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-26 15:31             ` FUJITA Tomonori
@ 2009-05-26 16:05               ` Boaz Harrosh
  2009-05-27  1:36                 ` FUJITA Tomonori
  0 siblings, 1 reply; 23+ messages in thread
From: Boaz Harrosh @ 2009-05-26 16:05 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: James.Bottomley, jens.axboe, rdreier, linux-kernel, linux-fsdevel,
	chris.mason, david, hch, akpm, jack, yanmin_zhang, linux-scsi

On 05/26/2009 06:31 PM, FUJITA Tomonori wrote:
> 
> Can we just fix some drivers not to do the DMA with the sense buffer in
> scsi_cmnd? IIRC, there are only five or six drivers that do such.

This is not so.
All drivers that go through scsi_eh_prep_cmnd() will eventually DMA through
the regular read path. Including all the drivers that do nothing and let
scsi-ml do the REQUEST_SENSE

Actually I have exact numbers, from the last time I did all that

Boaz

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-26 14:47           ` James Bottomley
  2009-05-26 15:13             ` Matthew Wilcox
  2009-05-26 15:31             ` FUJITA Tomonori
@ 2009-05-26 16:12             ` Boaz Harrosh
  2009-05-26 16:28               ` Boaz Harrosh
  2 siblings, 1 reply; 23+ messages in thread
From: Boaz Harrosh @ 2009-05-26 16:12 UTC (permalink / raw)
  To: James Bottomley
  Cc: FUJITA Tomonori, jens.axboe, rdreier, linux-kernel, linux-fsdevel,
	chris.mason, david, hch, akpm, jack, yanmin_zhang, linux-scsi

On 05/26/2009 05:47 PM, James Bottomley wrote:
> 
> There are several other approaches:
> 
>      1. Keep the sense buffer packed in the command but disallow DMA to
>         it, which fixes all the alignment problems.  Then we supply a
>         set of rotating DMA buffers to drivers which need to do the DMA
>         (which isn't the majority).

This one is not possible because it is scsi-ml in majority of cases that
does the DMA request through scsi_eh_prep_cmnd() and a regular read.
The drivers don't even know anything about it.

>      2. Sense is a comparative rarity, so us a more compact pooling
>         scheme and discard sense for reuse as soon as we know it's not
>         used (as in at softirq time when there's no sense collected).
> 

This is the way to go for sure. And only on ARCHs with none-coherent-cache
all the good ARCHs can just use embedded sense just fine.

> I'd need a little more clarity on the actual size of the problem before
> making any choices.
> 
> The other thing to bear in mind is that two allocations of M and N might
> be more costly than a single allocation of N+M; however, an allocation
> of M+N+extra can end up more costly if the extra causes more page
> reclaim before we get an actual command.
> 
> James
> 
Boaz

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-26 16:12             ` Boaz Harrosh
@ 2009-05-26 16:28               ` Boaz Harrosh
  0 siblings, 0 replies; 23+ messages in thread
From: Boaz Harrosh @ 2009-05-26 16:28 UTC (permalink / raw)
  To: James Bottomley
  Cc: FUJITA Tomonori, jens.axboe, rdreier, linux-kernel, linux-fsdevel,
	chris.mason, david, hch, akpm, jack, yanmin_zhang, linux-scsi

On 05/26/2009 07:12 PM, Boaz Harrosh wrote:
> On 05/26/2009 05:47 PM, James Bottomley wrote:
>> There are several other approaches:
>>
>>      1. Keep the sense buffer packed in the command but disallow DMA to
>>         it, which fixes all the alignment problems.  Then we supply a
>>         set of rotating DMA buffers to drivers which need to do the DMA
>>         (which isn't the majority).
> 
> This one is not possible because it is scsi-ml in majority of cases that
> does the DMA request through scsi_eh_prep_cmnd() and a regular read.
> The drivers don't even know anything about it.
> 

I retract that no, yes and scsi-ml is one more possible client of
the "rotating DMA buffers"

>>      2. Sense is a comparative rarity, so us a more compact pooling
>>         scheme and discard sense for reuse as soon as we know it's not
>>         used (as in at softirq time when there's no sense collected).
>>
> 
> This is the way to go for sure. And only on ARCHs with none-coherent-cache
> all the good ARCHs can just use embedded sense just fine.
> 
>> I'd need a little more clarity on the actual size of the problem before
>> making any choices.
>>
>> The other thing to bear in mind is that two allocations of M and N might
>> be more costly than a single allocation of N+M; however, an allocation
>> of M+N+extra can end up more costly if the extra causes more page
>> reclaim before we get an actual command.
>>
>> James
>>
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-26 16:05               ` Boaz Harrosh
@ 2009-05-27  1:36                 ` FUJITA Tomonori
  2009-05-27  7:54                   ` Boaz Harrosh
  0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2009-05-27  1:36 UTC (permalink / raw)
  To: bharrosh
  Cc: fujita.tomonori, James.Bottomley, jens.axboe, rdreier,
	linux-kernel, linux-fsdevel, chris.mason, david, hch, akpm, jack,
	yanmin_zhang, linux-scsi

On Tue, 26 May 2009 19:05:05 +0300
Boaz Harrosh <bharrosh@panasas.com> wrote:

> On 05/26/2009 06:31 PM, FUJITA Tomonori wrote:
> > 
> > Can we just fix some drivers not to do the DMA with the sense buffer in
> > scsi_cmnd? IIRC, there are only five or six drivers that do such.
> 
> This is not so.
> All drivers that go through scsi_eh_prep_cmnd() will eventually DMA through
> the regular read path. Including all the drivers that do nothing and let
> scsi-ml do the REQUEST_SENSE
> 
> Actually I have exact numbers, from the last time I did all that

Hmm, we discussed this before, I think.

scsi-ml uses scsi_eh_prep_cmnd only via scsi_send_eh_cmnd(). There are
some users of scsi_send_eh_cmnd in scsi-ml but only scsi_request_sense
does the DMA in the sense_buffer of scsi_cmnd.

Only scsi_error_handler() uses scsi_request_sense() and
scsi_send_eh_cmnd() works synchronously. So scsi-ml can easily avoid
the the DMA in the sense_buffer of scsi_cmnd if we have one sense
buffer per scsi_host.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-27  1:36                 ` FUJITA Tomonori
@ 2009-05-27  7:54                   ` Boaz Harrosh
  2009-05-27  8:26                     ` FUJITA Tomonori
  0 siblings, 1 reply; 23+ messages in thread
From: Boaz Harrosh @ 2009-05-27  7:54 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: James.Bottomley, jens.axboe, rdreier, linux-kernel, linux-fsdevel,
	chris.mason, david, hch, akpm, jack, yanmin_zhang, linux-scsi

On 05/27/2009 04:36 AM, FUJITA Tomonori wrote:
> On Tue, 26 May 2009 19:05:05 +0300
> Boaz Harrosh <bharrosh@panasas.com> wrote:
> 
>> On 05/26/2009 06:31 PM, FUJITA Tomonori wrote:
>>> Can we just fix some drivers not to do the DMA with the sense buffer in
>>> scsi_cmnd? IIRC, there are only five or six drivers that do such.
>> This is not so.
>> All drivers that go through scsi_eh_prep_cmnd() will eventually DMA through
>> the regular read path. Including all the drivers that do nothing and let
>> scsi-ml do the REQUEST_SENSE
>>
>> Actually I have exact numbers, from the last time I did all that
> 
> Hmm, we discussed this before, I think.
> 

Sure we did I sent these patches. to summarize, 3 types of drivers:
1. Only memcpy into sense_buffer				- 60%
2. Use scsi_eh_prep_cmnd and DMA read into sense.
2.1 Do nothing and scsi-ml does scsi_eh_prep_cmnd		- 30%
3. Prepare DMA descriptors for sense_buffer before execution	- 10%

> scsi-ml uses scsi_eh_prep_cmnd only via scsi_send_eh_cmnd(). There are
> some users of scsi_send_eh_cmnd in scsi-ml but only scsi_request_sense
> does the DMA in the sense_buffer of scsi_cmnd.
> 

Also drivers use scsi_eh_prep_cmnd at interrupt time and proceed to
DMA into the sense_buffer.

> Only scsi_error_handler() uses scsi_request_sense() and
> scsi_send_eh_cmnd() works synchronously. So scsi-ml can easily avoid
> the the DMA in the sense_buffer of scsi_cmnd if we have one sense
> buffer per scsi_host.

Not so. As James explained then, once you have a CHECK_CONDITION return, the
Q-per-host is frozen, yes. But as soon as you send the REQUEST_SENSE the 
target Q is unfrozen again and all in-flight commands can error, much before
the REQUEST_SENSE returns.

Boaz

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-27  7:54                   ` Boaz Harrosh
@ 2009-05-27  8:26                     ` FUJITA Tomonori
  2009-05-27  9:11                       ` Boaz Harrosh
  0 siblings, 1 reply; 23+ messages in thread
From: FUJITA Tomonori @ 2009-05-27  8:26 UTC (permalink / raw)
  To: bharrosh
  Cc: fujita.tomonori, James.Bottomley, jens.axboe, rdreier,
	linux-kernel, linux-fsdevel, chris.mason, david, hch, akpm, jack,
	yanmin_zhang, linux-scsi

On Wed, 27 May 2009 10:54:41 +0300
Boaz Harrosh <bharrosh@panasas.com> wrote:

> On 05/27/2009 04:36 AM, FUJITA Tomonori wrote:
> > On Tue, 26 May 2009 19:05:05 +0300
> > Boaz Harrosh <bharrosh@panasas.com> wrote:
> > 
> >> On 05/26/2009 06:31 PM, FUJITA Tomonori wrote:
> >>> Can we just fix some drivers not to do the DMA with the sense buffer in
> >>> scsi_cmnd? IIRC, there are only five or six drivers that do such.
> >> This is not so.
> >> All drivers that go through scsi_eh_prep_cmnd() will eventually DMA through
> >> the regular read path. Including all the drivers that do nothing and let
> >> scsi-ml do the REQUEST_SENSE
> >>
> >> Actually I have exact numbers, from the last time I did all that
> > 
> > Hmm, we discussed this before, I think.
> > 
> 
> Sure we did I sent these patches. to summarize, 3 types of drivers:
> 1. Only memcpy into sense_buffer				- 60%
> 2. Use scsi_eh_prep_cmnd and DMA read into sense.
> 2.1 Do nothing and scsi-ml does scsi_eh_prep_cmnd		- 30%
> 3. Prepare DMA descriptors for sense_buffer before execution	- 10%
> 
> > scsi-ml uses scsi_eh_prep_cmnd only via scsi_send_eh_cmnd(). There are
> > some users of scsi_send_eh_cmnd in scsi-ml but only scsi_request_sense
> > does the DMA in the sense_buffer of scsi_cmnd.
> > 
> 
> Also drivers use scsi_eh_prep_cmnd at interrupt time and proceed to
> DMA into the sense_buffer.
> 
> > Only scsi_error_handler() uses scsi_request_sense() and
> > scsi_send_eh_cmnd() works synchronously. So scsi-ml can easily avoid
> > the the DMA in the sense_buffer of scsi_cmnd if we have one sense
> > buffer per scsi_host.
> 
> Not so. As James explained then, once you have a CHECK_CONDITION return, the
> Q-per-host is frozen, yes. But as soon as you send the REQUEST_SENSE the 
> target Q is unfrozen again and all in-flight commands can error, much before
> the REQUEST_SENSE returns.

Hmm, I'm not sure what you mean.

Why is 'all in-flight commands can error' a problem? The sense_buffer
per host is used by only scsi_eh kernel thread.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer
  2009-05-27  8:26                     ` FUJITA Tomonori
@ 2009-05-27  9:11                       ` Boaz Harrosh
  0 siblings, 0 replies; 23+ messages in thread
From: Boaz Harrosh @ 2009-05-27  9:11 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: James.Bottomley, jens.axboe, rdreier, linux-kernel, linux-fsdevel,
	chris.mason, david, hch, akpm, jack, yanmin_zhang, linux-scsi

On 05/27/2009 11:26 AM, FUJITA Tomonori wrote:
> On Wed, 27 May 2009 10:54:41 +0300
> Boaz Harrosh <bharrosh@panasas.com> wrote:
> 
>> On 05/27/2009 04:36 AM, FUJITA Tomonori wrote:
>>> On Tue, 26 May 2009 19:05:05 +0300
>>> Boaz Harrosh <bharrosh@panasas.com> wrote:
>>>
>>>> On 05/26/2009 06:31 PM, FUJITA Tomonori wrote:
>>>>> Can we just fix some drivers not to do the DMA with the sense buffer in
>>>>> scsi_cmnd? IIRC, there are only five or six drivers that do such.
>>>> This is not so.
>>>> All drivers that go through scsi_eh_prep_cmnd() will eventually DMA through
>>>> the regular read path. Including all the drivers that do nothing and let
>>>> scsi-ml do the REQUEST_SENSE
>>>>
>>>> Actually I have exact numbers, from the last time I did all that
>>> Hmm, we discussed this before, I think.
>>>
>> Sure we did I sent these patches. to summarize, 3 types of drivers:
>> 1. Only memcpy into sense_buffer				- 60%
>> 2. Use scsi_eh_prep_cmnd and DMA read into sense.
>> 2.1 Do nothing and scsi-ml does scsi_eh_prep_cmnd		- 30%
>> 3. Prepare DMA descriptors for sense_buffer before execution	- 10%
>>
>>> scsi-ml uses scsi_eh_prep_cmnd only via scsi_send_eh_cmnd(). There are
>>> some users of scsi_send_eh_cmnd in scsi-ml but only scsi_request_sense
>>> does the DMA in the sense_buffer of scsi_cmnd.
>>>
>> Also drivers use scsi_eh_prep_cmnd at interrupt time and proceed to
>> DMA into the sense_buffer.
>>
>>> Only scsi_error_handler() uses scsi_request_sense() and
>>> scsi_send_eh_cmnd() works synchronously. So scsi-ml can easily avoid
>>> the the DMA in the sense_buffer of scsi_cmnd if we have one sense
>>> buffer per scsi_host.
>> Not so. As James explained then, once you have a CHECK_CONDITION return, the
>> Q-per-host is frozen, yes. But as soon as you send the REQUEST_SENSE the 
>> target Q is unfrozen again and all in-flight commands can error, much before
>> the REQUEST_SENSE returns.
> 
> Hmm, I'm not sure what you mean.
> 
> Why is 'all in-flight commands can error' a problem? The sense_buffer
> per host is used by only scsi_eh kernel thread.

I agree, then the current situation has a problem.

Target has command A && B in Q.
- A returns CHECK_CONDITION, scsi_eh thread kicks in, sends a REQUEST_SENSE.
- Immediately command B returns with CHECK_CONDITION, Target Q is frozen again.
- message is queued for scsi_eh thread but that one is stuck waiting for the first
  REQUEST_SENSE to return, and the second-REQUEST_SENSE is never sent, target Q is
  frozen forever.

I guess all the drivers that support target queueing do not depend on scsi_eh
thread to issue the REQUEST_SENSE command. As I said, there are very few drivers
that do nothing and let scsi_eh take care of REQUEST_SENSE.

This will not however solve these drivers that might need many
concurrent sense buffers.

Boaz

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2009-05-27  9:11 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1243236668-3398-1-git-send-email-jens.axboe@oracle.com>
     [not found] ` <1243236668-3398-6-git-send-email-jens.axboe@oracle.com>
2009-05-25  7:41   ` [PATCH 03/13] scsi: unify allocation of scsi command and sense buffer Christoph Hellwig
2009-05-25  7:46     ` Jens Axboe
2009-05-25  7:50       ` Christoph Hellwig
2009-05-25  7:54         ` Jens Axboe
2009-05-25 10:33         ` Boaz Harrosh
2009-05-25 10:42           ` Christoph Hellwig
2009-05-25 10:49             ` Jens Axboe
2009-05-26  4:36         ` FUJITA Tomonori
2009-05-26  5:08           ` FUJITA Tomonori
     [not found] <aday6sk8wne.fsf@cisco.com>
     [not found] ` <20090526132914W.fujita.tomonori@lab.ntt.co.jp>
     [not found]   ` <20090526062952.GB11363@kernel.dk>
2009-05-26  7:25     ` FUJITA Tomonori
2009-05-26  7:32       ` Jens Axboe
2009-05-26  7:38         ` FUJITA Tomonori
2009-05-26 14:47           ` James Bottomley
2009-05-26 15:13             ` Matthew Wilcox
2009-05-26 15:31             ` FUJITA Tomonori
2009-05-26 16:05               ` Boaz Harrosh
2009-05-27  1:36                 ` FUJITA Tomonori
2009-05-27  7:54                   ` Boaz Harrosh
2009-05-27  8:26                     ` FUJITA Tomonori
2009-05-27  9:11                       ` Boaz Harrosh
2009-05-26 16:12             ` Boaz Harrosh
2009-05-26 16:28               ` Boaz Harrosh
2009-05-26  7:56         ` FUJITA Tomonori

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).