* [RFC] block: fix barrier error transmission
@ 2008-04-02 18:02 James Bottomley
2008-04-02 19:08 ` Jens Axboe
0 siblings, 1 reply; 6+ messages in thread
From: James Bottomley @ 2008-04-02 18:02 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-scsi, mtosatti
There's a problem with barriers in SCSI. They're prepared as a
SYNCHRONIZE_CACHE REQ_TYPE_BLOCK_PC command. However, our processing
model dictates that we always return no block error but set req->errors
to the SCSI result. The upshot is that if we get an error on a flush
barrier, it's never seen by the block layer.
This patch attempts to get the block layer to see such errors. Note,
it's still not quite right. There are possible sense errors that could
indicate success of the command (like deferred errors) that this will
treat as failures. I suppose we could process the sense in scsi and try
to return a first pass error even for REQ_TYPE_BLOCK_PC ... I'm just not
sure what that will do to the other users.
James
---
diff --git a/block/blk-barrier.c b/block/blk-barrier.c
index 55c5f1f..3a3947c 100644
--- a/block/blk-barrier.c
+++ b/block/blk-barrier.c
@@ -114,18 +114,24 @@ void blk_ordered_complete_seq(struct request_queue *q, unsigned seq, int error)
static void pre_flush_end_io(struct request *rq, int error)
{
+ error = rq->errors ? -EIO : error;
+
elv_completed_request(rq->q, rq);
blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_PREFLUSH, error);
}
static void bar_end_io(struct request *rq, int error)
{
+ error = rq->errors ? -EIO : error;
+
elv_completed_request(rq->q, rq);
blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_BAR, error);
}
static void post_flush_end_io(struct request *rq, int error)
{
+ error = rq->errors ? -EIO : error;
+
elv_completed_request(rq->q, rq);
blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_POSTFLUSH, error);
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC] block: fix barrier error transmission
2008-04-02 18:02 [RFC] block: fix barrier error transmission James Bottomley
@ 2008-04-02 19:08 ` Jens Axboe
2008-04-02 23:11 ` James Bottomley
0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2008-04-02 19:08 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi, mtosatti
On Wed, Apr 02 2008, James Bottomley wrote:
> There's a problem with barriers in SCSI. They're prepared as a
> SYNCHRONIZE_CACHE REQ_TYPE_BLOCK_PC command. However, our processing
> model dictates that we always return no block error but set req->errors
> to the SCSI result. The upshot is that if we get an error on a flush
> barrier, it's never seen by the block layer.
>
> This patch attempts to get the block layer to see such errors. Note,
> it's still not quite right. There are possible sense errors that could
> indicate success of the command (like deferred errors) that this will
> treat as failures. I suppose we could process the sense in scsi and try
> to return a first pass error even for REQ_TYPE_BLOCK_PC ... I'm just not
> sure what that will do to the other users.
>
> James
>
> ---
>
> diff --git a/block/blk-barrier.c b/block/blk-barrier.c
> index 55c5f1f..3a3947c 100644
> --- a/block/blk-barrier.c
> +++ b/block/blk-barrier.c
> @@ -114,18 +114,24 @@ void blk_ordered_complete_seq(struct request_queue *q, unsigned seq, int error)
>
> static void pre_flush_end_io(struct request *rq, int error)
> {
> + error = rq->errors ? -EIO : error;
> +
> elv_completed_request(rq->q, rq);
> blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_PREFLUSH, error);
> }
It's a bit of a hack, SCSI really should pass the error value back
instead of fiddling around with possibly perhaps finding it in ->errors.
And please don't use these ?: constructs, in this case it doesn't even
make a lot of sense and a
if (rq->errors)
error = -EIO;
would have been much cleaner ;-)
So my question is why does the model not allow you to return the error
properly?
--
Jens Axboe
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] block: fix barrier error transmission
2008-04-02 19:08 ` Jens Axboe
@ 2008-04-02 23:11 ` James Bottomley
2008-04-03 8:06 ` Jens Axboe
0 siblings, 1 reply; 6+ messages in thread
From: James Bottomley @ 2008-04-02 23:11 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-scsi, mtosatti
On Wed, 2008-04-02 at 21:08 +0200, Jens Axboe wrote:
> > diff --git a/block/blk-barrier.c b/block/blk-barrier.c
> > index 55c5f1f..3a3947c 100644
> > --- a/block/blk-barrier.c
> > +++ b/block/blk-barrier.c
> > @@ -114,18 +114,24 @@ void blk_ordered_complete_seq(struct request_queue *q, unsigned seq, int error)
> >
> > static void pre_flush_end_io(struct request *rq, int error)
> > {
> > + error = rq->errors ? -EIO : error;
> > +
> > elv_completed_request(rq->q, rq);
> > blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_PREFLUSH, error);
> > }
>
> It's a bit of a hack, SCSI really should pass the error value back
> instead of fiddling around with possibly perhaps finding it in ->errors.
> And please don't use these ?: constructs, in this case it doesn't even
> make a lot of sense and a
>
> if (rq->errors)
> error = -EIO;
>
> would have been much cleaner ;-)
>
> So my question is why does the model not allow you to return the error
> properly?
I thought it was the sg_io that would be the problem, but apparently on
further research, it simply discards the error as does scsi_execute_req.
I suppose that's a strong enough reason to try returning an error ...
I'm just a bit leery this close to a release.
I think this will work ... it just really needs quite a bit of
testing ...
James
---
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1b199e1..67f412b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -839,7 +839,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
int this_count = scsi_bufflen(cmd);
struct request_queue *q = cmd->device->request_queue;
struct request *req = cmd->request;
- int clear_errors = 1;
+ int error = 0;
struct scsi_sense_hdr sshdr;
int sense_valid = 0;
int sense_deferred = 0;
@@ -853,7 +853,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
if (blk_pc_request(req)) { /* SG_IO ioctl from block level */
req->errors = result;
if (result) {
- clear_errors = 0;
if (sense_valid && req->sense) {
/*
* SG_IO wants current and deferred errors
@@ -865,6 +864,8 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
memcpy(req->sense, cmd->sense_buffer, len);
req->sense_len = len;
}
+ if (!sense_deferred)
+ error = -EIO;
}
if (scsi_bidi_cmnd(cmd)) {
/* will also release_buffers */
@@ -885,14 +886,11 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
"%d bytes done.\n",
req->nr_sectors, good_bytes));
- if (clear_errors)
- req->errors = 0;
-
/* A number of bytes were successfully read. If there
* are leftovers and there is some kind of error
* (result != 0), retry the rest.
*/
- if (scsi_end_request(cmd, 0, good_bytes, result == 0) == NULL)
+ if (scsi_end_request(cmd, error, good_bytes, result == 0) == NULL)
return;
/* good_bytes = 0, or (inclusive) there were leftovers and
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC] block: fix barrier error transmission
2008-04-02 23:11 ` James Bottomley
@ 2008-04-03 8:06 ` Jens Axboe
2008-04-03 14:02 ` James Bottomley
0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2008-04-03 8:06 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi, mtosatti
On Wed, Apr 02 2008, James Bottomley wrote:
> On Wed, 2008-04-02 at 21:08 +0200, Jens Axboe wrote:
> > > diff --git a/block/blk-barrier.c b/block/blk-barrier.c
> > > index 55c5f1f..3a3947c 100644
> > > --- a/block/blk-barrier.c
> > > +++ b/block/blk-barrier.c
> > > @@ -114,18 +114,24 @@ void blk_ordered_complete_seq(struct request_queue *q, unsigned seq, int error)
> > >
> > > static void pre_flush_end_io(struct request *rq, int error)
> > > {
> > > + error = rq->errors ? -EIO : error;
> > > +
> > > elv_completed_request(rq->q, rq);
> > > blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_PREFLUSH, error);
> > > }
> >
> > It's a bit of a hack, SCSI really should pass the error value back
> > instead of fiddling around with possibly perhaps finding it in ->errors.
> > And please don't use these ?: constructs, in this case it doesn't even
> > make a lot of sense and a
> >
> > if (rq->errors)
> > error = -EIO;
> >
> > would have been much cleaner ;-)
> >
> > So my question is why does the model not allow you to return the error
> > properly?
>
> I thought it was the sg_io that would be the problem, but apparently on
> further research, it simply discards the error as does scsi_execute_req.
>
> I suppose that's a strong enough reason to try returning an error ...
> I'm just a bit leery this close to a release.
>
> I think this will work ... it just really needs quite a bit of
> testing ...
This looks much better, but I'm with you on the danger of applying
something like this so close to a release...
Now, this isn't a regression, but it also impacts barrier reliability
and as such it's a big nasty to leave this open for another release.
>
> James
>
> ---
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 1b199e1..67f412b 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -839,7 +839,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
> int this_count = scsi_bufflen(cmd);
> struct request_queue *q = cmd->device->request_queue;
> struct request *req = cmd->request;
> - int clear_errors = 1;
> + int error = 0;
> struct scsi_sense_hdr sshdr;
> int sense_valid = 0;
> int sense_deferred = 0;
> @@ -853,7 +853,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
> if (blk_pc_request(req)) { /* SG_IO ioctl from block level */
> req->errors = result;
> if (result) {
> - clear_errors = 0;
> if (sense_valid && req->sense) {
> /*
> * SG_IO wants current and deferred errors
> @@ -865,6 +864,8 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
> memcpy(req->sense, cmd->sense_buffer, len);
> req->sense_len = len;
> }
> + if (!sense_deferred)
> + error = -EIO;
> }
> if (scsi_bidi_cmnd(cmd)) {
> /* will also release_buffers */
> @@ -885,14 +886,11 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
> "%d bytes done.\n",
> req->nr_sectors, good_bytes));
>
> - if (clear_errors)
> - req->errors = 0;
> -
> /* A number of bytes were successfully read. If there
> * are leftovers and there is some kind of error
> * (result != 0), retry the rest.
> */
> - if (scsi_end_request(cmd, 0, good_bytes, result == 0) == NULL)
> + if (scsi_end_request(cmd, error, good_bytes, result == 0) == NULL)
> return;
>
> /* good_bytes = 0, or (inclusive) there were leftovers and
>
>
--
Jens Axboe
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] block: fix barrier error transmission
2008-04-03 8:06 ` Jens Axboe
@ 2008-04-03 14:02 ` James Bottomley
2008-04-04 11:46 ` Jens Axboe
0 siblings, 1 reply; 6+ messages in thread
From: James Bottomley @ 2008-04-03 14:02 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-scsi, mtosatti
On Thu, 2008-04-03 at 10:06 +0200, Jens Axboe wrote:
> On Wed, Apr 02 2008, James Bottomley wrote:
> > On Wed, 2008-04-02 at 21:08 +0200, Jens Axboe wrote:
> > > > diff --git a/block/blk-barrier.c b/block/blk-barrier.c
> > > > index 55c5f1f..3a3947c 100644
> > > > --- a/block/blk-barrier.c
> > > > +++ b/block/blk-barrier.c
> > > > @@ -114,18 +114,24 @@ void blk_ordered_complete_seq(struct request_queue *q, unsigned seq, int error)
> > > >
> > > > static void pre_flush_end_io(struct request *rq, int error)
> > > > {
> > > > + error = rq->errors ? -EIO : error;
> > > > +
> > > > elv_completed_request(rq->q, rq);
> > > > blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_PREFLUSH, error);
> > > > }
> > >
> > > It's a bit of a hack, SCSI really should pass the error value back
> > > instead of fiddling around with possibly perhaps finding it in ->errors.
> > > And please don't use these ?: constructs, in this case it doesn't even
> > > make a lot of sense and a
> > >
> > > if (rq->errors)
> > > error = -EIO;
> > >
> > > would have been much cleaner ;-)
> > >
> > > So my question is why does the model not allow you to return the error
> > > properly?
> >
> > I thought it was the sg_io that would be the problem, but apparently on
> > further research, it simply discards the error as does scsi_execute_req.
> >
> > I suppose that's a strong enough reason to try returning an error ...
> > I'm just a bit leery this close to a release.
> >
> > I think this will work ... it just really needs quite a bit of
> > testing ...
>
> This looks much better, but I'm with you on the danger of applying
> something like this so close to a release...
Yes.
> Now, this isn't a regression, but it also impacts barrier reliability
> and as such it's a big nasty to leave this open for another release.
Yes, I agree ... let's put it in after 2.6.25 (so into scsi-misc) but if
no problems turn up by -rc2 say, I'll send it as a backport to stable
2.6.25.X. That way we don't have to wait out the entire release cycle
for users to see the benefit.
James
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC] block: fix barrier error transmission
2008-04-03 14:02 ` James Bottomley
@ 2008-04-04 11:46 ` Jens Axboe
0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2008-04-04 11:46 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi, mtosatti
On Thu, Apr 03 2008, James Bottomley wrote:
> On Thu, 2008-04-03 at 10:06 +0200, Jens Axboe wrote:
> > On Wed, Apr 02 2008, James Bottomley wrote:
> > > On Wed, 2008-04-02 at 21:08 +0200, Jens Axboe wrote:
> > > > > diff --git a/block/blk-barrier.c b/block/blk-barrier.c
> > > > > index 55c5f1f..3a3947c 100644
> > > > > --- a/block/blk-barrier.c
> > > > > +++ b/block/blk-barrier.c
> > > > > @@ -114,18 +114,24 @@ void blk_ordered_complete_seq(struct request_queue *q, unsigned seq, int error)
> > > > >
> > > > > static void pre_flush_end_io(struct request *rq, int error)
> > > > > {
> > > > > + error = rq->errors ? -EIO : error;
> > > > > +
> > > > > elv_completed_request(rq->q, rq);
> > > > > blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_PREFLUSH, error);
> > > > > }
> > > >
> > > > It's a bit of a hack, SCSI really should pass the error value back
> > > > instead of fiddling around with possibly perhaps finding it in ->errors.
> > > > And please don't use these ?: constructs, in this case it doesn't even
> > > > make a lot of sense and a
> > > >
> > > > if (rq->errors)
> > > > error = -EIO;
> > > >
> > > > would have been much cleaner ;-)
> > > >
> > > > So my question is why does the model not allow you to return the error
> > > > properly?
> > >
> > > I thought it was the sg_io that would be the problem, but apparently on
> > > further research, it simply discards the error as does scsi_execute_req.
> > >
> > > I suppose that's a strong enough reason to try returning an error ...
> > > I'm just a bit leery this close to a release.
> > >
> > > I think this will work ... it just really needs quite a bit of
> > > testing ...
> >
> > This looks much better, but I'm with you on the danger of applying
> > something like this so close to a release...
>
> Yes.
>
> > Now, this isn't a regression, but it also impacts barrier reliability
> > and as such it's a big nasty to leave this open for another release.
>
> Yes, I agree ... let's put it in after 2.6.25 (so into scsi-misc) but if
> no problems turn up by -rc2 say, I'll send it as a backport to stable
> 2.6.25.X. That way we don't have to wait out the entire release cycle
> for users to see the benefit.
Sounds like a good plan.
--
Jens Axboe
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-04-04 11:49 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-02 18:02 [RFC] block: fix barrier error transmission James Bottomley
2008-04-02 19:08 ` Jens Axboe
2008-04-02 23:11 ` James Bottomley
2008-04-03 8:06 ` Jens Axboe
2008-04-03 14:02 ` James Bottomley
2008-04-04 11:46 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).