All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boaz Harrosh <bharrosh@panasas.com>
To: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: benh@kernel.crashing.org, james.bottomley@steeleye.com,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	rdreier@cisco.com, linux-scsi@vger.kernel.org,
	rmk@arm.linux.org.uk, davem@davemloft.net, ralf@linux-mips.org,
	Christoph Hellwig <hch@infradead.org>,
	Benny Halevy <bhalevy@panasas.com>
Subject: Re: [PATCH 2/2] scsi: Use new __dma_buffer to align sense buffer in scsi_cmnd
Date: Mon, 07 Jan 2008 15:25:36 +0200	[thread overview]
Message-ID: <47822850.1030007@panasas.com> (raw)
In-Reply-To: <20080107155332J.fujita.tomonori@lab.ntt.co.jp>

On Mon, Jan 07 2008 at 8:53 +0200, FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
> On Sun, 23 Dec 2007 13:09:05 +0200
> Boaz Harrosh <bharrosh@panasas.com> wrote:
> 
>> On Fri, Dec 21 2007 at 4:30 +0200, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>> The sense buffer ins scsi_cmnd can nowadays be DMA'ed into directly
>>> by some low level drivers (that typically happens with USB mass
>>> storage).
>>>
>>> This is a problem on non cache coherent architectures such as
>>> embedded PowerPCs where the sense buffer can share cache lines with
>>> other structure members, which leads to various forms of corruption.
>>>
>>> This uses the newly defined __dma_buffer annotation to enforce that
>>> on such platforms, the sense_buffer is contained within its own
>>> cache line. This has no effect on cache coherent architectures.
>>>
>>> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>> ---
>>>
>>>  include/scsi/scsi_cmnd.h |    2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> --- linux-merge.orig/include/scsi/scsi_cmnd.h	2007-12-21 13:07:14.000000000 +1100
>>> +++ linux-merge/include/scsi/scsi_cmnd.h	2007-12-21 13:07:29.000000000 +1100
>>> @@ -88,7 +88,7 @@ struct scsi_cmnd {
>>>  				   	   working on */
>>>  
>>>  #define SCSI_SENSE_BUFFERSIZE 	96
>>> -	unsigned char sense_buffer[SCSI_SENSE_BUFFERSIZE];
>>> +	unsigned char sense_buffer[SCSI_SENSE_BUFFERSIZE] __dma_buffer;
>>>  				/* obtained by REQUEST SENSE when
>>>  				 * CHECK CONDITION is received on original
>>>  				 * command (auto-sense) */
>> This has the potential of leaving a big fat ugly hole in the middle of 
>> scsi_cmnd. I would suggest of *just* moving the sense_buffer array to be
>> the *first member* of struct scsi_cmnd. The command itself is already cache
>> aligned, allocated by the proper flags to it's slab. And put a fat comment
>> near it's definition.
>>
>> This is until a proper fix is sent. I have in my Q a proposition for a 
>> more prominent solution, which I will send next month. Do to short comings
>> in the sense handling and optimizations, but should definitely cover this
>> problem.
>>
>> The code should have time to be discussed and tested, so it is only 2.6.26
>> material. For the duration of the 2.6.25 kernel we can live with a reorder
>> of scsi_cmnd members, if it solves such a grave bug for some ARCHs.
>>
>> Boaz
>> ----
>> [RFD below]
>> My proposed solution will be has follows:
>>
>>  1. Since the scsi protocol mandates an immediate REQUEST_SENSE after an error
>>     in effect the Q is frozen until the REQUEST_SENSE command returns.
>>
>>  2. The scsi-ml needs the sense buffer for its normal operation, independent 
>>     from the ULD's request of the sence_buffer or not at request->sense. But
>>     in effect, 90% of all scsi-requests come with ULD's allocated buffer for
>>     sense, that is copied to, on command completion.
>>
>>  3. 99% of all commands complete successfully, so if an optimization is 
>>     proposed for the successful case, sacrificing a few cycles for the error
>>     case than thats a good thing.
>>
>>  My suggestion is to have a per Q, driver-overridable, sense buffer that is 
>>  DMAed/written to by drivers. At the end of the REQUEST_SENSE command one
>>  of 2 things will be done. Either copy the sense to the ULD's supplied buffer,
>>  or if not available, allocate one from a dedicated mem_cache pool.
>>  
>>  So we are completely saving 92 bytes from scsi_cmnd by replacing the buffer
>>  with a pointer. We can always read the sense into a per Q buffer. And 10% of
>>  %1 of the times we will need to allocate a sense buffer from a dedicated mem_cache
>>  I would say thats a nice optimization.
>>
>>  The changes to scsi_error/scsi_cmnd and friends, is pretty strait forward. But
>>  it depends on a conversion of 4/5 drivers to the new scsi_eh API for 
>>  REQUEST_SENSE. I have only converted these drivers that interfered with the accessors
>>  effort + 1 other places. But there are a few more places that are not converted.
>>  Once done. The implementation can easily change with no affect on drivers.
> 
> I think that removing the sense_buffer array from scsi_cmnd effects
> lots of LLDs. As I wrote in other mail, many LLDs assume that
> scsi_cmnd:sense_buffer is always available. Another big task is to
> take care about auto sense.
> 
> Have you already had some patches? I've just started to work on this
> and I'd like to push that fix into 2.6.25.

Tomo Hi.
Since you ask to push this into 2.6.25, I have ask permission to
prioritize this effort, as until now it was on a back burner.

I have only done 3 drivers up to now. (out of something like 15)

I have seen 4 patterns of "sense" use.

1. Driver allocated sense that is memcpy'ed in interrupt time to
   cmnd->sense.
   The sense is automatically fetched by controller and is
   usually pre-mapped.

2. dma-map of cmnd->sense prior to each command. similar to case-1
   but DMAed directly into cmnd->sense buffer. (AutoSense)

3. Synchronous request for sense, like the places I sent patches
   for. Where cmnd is re-used to map the REQUEST_SENSE read.

4. Do nothing and let scsi_error issue a request sense asynchronously.

What I did until now is, added a new API for case 1 - scsi_eh_cpy_sense(...) ,
and 3,4 are covered by the existing (new) APIs. Case-2 is hardest and I'll
fix it per-driver-case, like in iscsi it was very easy as it already have
a per command structure allocated.

I have not yet converted the scsi midlayer to any new system but my plan
is:

Stage 1
1. convert all drivers to some new/old API where:
   Those drivers with case-2 above, that can DMA_FROM_DEVICE 
   concurrently into more than one sense buffer, will change 
   specifically, to pre-allocate a can_queue number of buffers.

2. Change mid-layer to:
  a. Pre-allocate one dma-able sense buffer per Q.
  b. copy sense out of above, directly into ULD's allocated sense
     buffer which was put on cmnd->request->sense
  c. Those very *few* requests that come without ULD allocated
     sense, allocate one for them out of a mempool.
     (All regular file-system ULDs allocate a sense buffer)

Stage 2
 I think I can see a way how to shuffle the mid-layer post-sense processing
 directly in copy-sense time. So in case ULD did not "request sense" the
 buffer can be discarded. But I'm not 100% sure about that.

Let me write up a sketch of what I mean which will include 3 drivers
that demonstrate above cases, and the mid-layer changes.
Then we can discus it and see if it makes sense (;)) to every body.
But let me have time until tomorrow night for that.

Boaz
 

  reply	other threads:[~2008-01-07 13:27 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-21  2:30 [PATCH 2/2] scsi: Use new __dma_buffer to align sense buffer in scsi_cmnd Benjamin Herrenschmidt
2007-12-21  2:30 ` Benjamin Herrenschmidt
2007-12-21 10:33 ` Alan Cox
2007-12-21 10:33   ` Alan Cox
2007-12-21 13:16   ` Matthew Wilcox
2007-12-21 13:30     ` Thomas Bogendoerfer
2007-12-21 14:00       ` Matthew Wilcox
2007-12-21 16:35         ` Thomas Bogendoerfer
2007-12-21 21:29     ` Benjamin Herrenschmidt
2007-12-21 16:57   ` Roland Dreier
2007-12-21 16:57     ` Roland Dreier
2007-12-21 21:27   ` Benjamin Herrenschmidt
2007-12-23 11:09 ` Boaz Harrosh
2007-12-23 23:04   ` Benjamin Herrenschmidt
2008-01-07  6:53   ` FUJITA Tomonori
2008-01-07 13:25     ` Boaz Harrosh [this message]
2008-01-07 23:32       ` FUJITA Tomonori

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47822850.1030007@panasas.com \
    --to=bharrosh@panasas.com \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=bhalevy@panasas.com \
    --cc=davem@davemloft.net \
    --cc=fujita.tomonori@lab.ntt.co.jp \
    --cc=hch@infradead.org \
    --cc=james.bottomley@steeleye.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=ralf@linux-mips.org \
    --cc=rdreier@cisco.com \
    --cc=rmk@arm.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.