From: Jens Axboe <axboe@fb.com>
To: "Elliott, Robert (Server Storage)" <Elliott@hp.com>,
	Christoph Hellwig <hch@lst.de>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: blk-mq timeout handling fixes
Date: Wed, 17 Sep 2014 15:56:49 -0600
Message-ID: <541A03A1.9070908@fb.com>
In-Reply-To: <94D0CD8314A33A4D9D801C0FE68B402958C86C8E@G9W0745.americas.hpqcorp.net>

On 09/17/2014 03:53 PM, Elliott, Robert (Server Storage) wrote:
> 
> 
>> -----Original Message-----
>> From: Christoph Hellwig [mailto:hch@lst.de]
>> Sent: Saturday, 13 September, 2014 6:40 PM
>> To: Jens Axboe
>> Cc: Elliott, Robert (Server Storage); linux-scsi@vger.kernel.org; linux-
>> kernel@vger.kernel.org
>> Subject: blk-mq timeout handling fixes
>>
>> This series fixes various issues with timeout handling that Robert
>> ran into when testing scsi-mq heavily.  He tested an earlier version,
>> and couldn't reproduce the issues anymore, although the series changed
>> quite significantly since and should probably be retested.
>>
>> In summary, we not only start the blk-mq timer inside the driver's
>> ->queue_rq method after the request has been fully set up, but we
>> also tell the drivers whether we're timing out a reserved (internal)
>> request or a real one.  Many drivers will need to handle
>> those internal ones differently, e.g. for scsi-mq we don't even
>> have a scsi command structure allocated for the reserved commands.
> 
> I have rerun a variety of tests on:
> * Jens' for-next tree that went into 3.17-rc5
> * plus this series
> * plus two patches for infinite recursion on flushes from 
>   Ming and then Christoph

This is pretty much what is queued up for 3.17 as well. It's bigger than
I'd like at this point, but these are real fixes.

> and have not been able to trigger the scsi_times_out req->special
> NULL pointer dereference that prompted this series.

Great!!

> Testing includes:
> * concurrent heavy workload generators:
>   * fio high iodepth direct 512 byte random reads (> 1M IOPS)
>   * programs generating large bursts of paged writes
>     * mkfs.ext4 (followed by e2fsck)
>     * mkfs.xfs (followed by xfs_check)
>     * ddpt
>   * watch -n 0 sync to generate flushes
> * scsi_logging_level MLCOMPLETE set to 0 or 1
>   * scsi_lib.c patched to put all the ACTION_FAIL messages
>     under level 1 so they can be squelched (massive error 
>     prints cause more timeouts themselves)
> * 4 hpsa and 16 mpt3sas devices (all made from SAS SSDs)
>   * lockless hpsa driver
> * injecting errors
>   * device removal
>   * device generating infinite errors
>   * device generating a brief number of errors
> 
> The filesystems don't always recover properly, but nothing in 
> the block or scsi midlayers crashed.
> 
> So, you may add this to the series:
> Tested-by: Robert Elliott <elliott@hp.com>

Thanks a lot for your (continued) testing, Robert. It's a great help.


-- 
Jens Axboe



Thread overview: 11+ messages
2014-09-13 23:40 blk-mq timeout handling fixes Christoph Hellwig
2014-09-13 23:40 ` [PATCH 1/6] blk-mq: remove REQ_END Christoph Hellwig
2014-09-13 23:40 ` [PATCH 2/6] blk-mq: call blk_mq_start_request from ->queue_rq Christoph Hellwig
2014-09-15  6:34   ` Ming Lei
2014-09-15  7:27   ` Ming Lei
2014-09-13 23:40 ` [PATCH 3/6] blk-mq: rename blk_mq_end_io to blk_mq_end_request Christoph Hellwig
2014-09-13 23:40 ` [PATCH 4/6] blk-mq: fix and simplify tag iteration for the timeout handler Christoph Hellwig
2014-09-13 23:40 ` [PATCH 5/6] blk-mq: unshared " Christoph Hellwig
2014-09-13 23:40 ` [PATCH 6/6] blk-mq: pass a reserved argument to the " Christoph Hellwig
2014-09-17 21:53 ` blk-mq timeout handling fixes Elliott, Robert (Server Storage)
2014-09-17 21:56   ` Jens Axboe [this message]
