* mptsas hangs caused by ATA pass-through explained
From: Ryan Kuester @ 2010-04-26 23:11 UTC (permalink / raw)
To: Eric Moore, Kashyap Desai, DL-MPTFusionLinux
Cc: support, linux-scsi, linux-kernel
I may have an explanation for the LSI 1068 HBA hangs provoked by ATA
pass-through commands, in particular by smartctl.
First, my version of the symptoms. On an LSI SAS1068E B3 HBA running
01.29.00.00 firmware, with SATA disks, and with smartd running, I'm seeing
occasional task, bus, and host resets, some of which lead to hard faults of
the HBA requiring a reboot. Abusively looping the smartctl command,
# while true; do smartctl -a /dev/sdb > /dev/null; done
dramatically increases the frequency of these failures to nearly one per
minute. A high IO load through the HBA while looping smartctl seems to
increase the chance of a full scsi host reset or a non-recoverable hang.
I reduced what smartctl was doing down to a simple test case which
causes the hang with a single IO when pointed at the sd interface. See
the code at the bottom of this e-mail. It uses an SG_IO ioctl to issue
a single pass-through ATA identify device command. If the buffer
userspace gives for the read data has certain alignments, the task is
issued to the HBA but the HBA fails to respond. If run against the sg
interface, neither the test code nor smartctl causes a hang.
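For readers unfamiliar with the pass-through CDB that the test case sends, here is a minimal sketch of how its 16 bytes could be assembled. The field positions follow the SAT definition of ATA PASS-THROUGH (16) as best understood here; the helper name build_ata_identify_cdb and the macro names are made up for illustration and are not taken from smartctl or the kernel.

```c
#include <assert.h>
#include <string.h>

/* Illustrative names; values per the ATA PASS-THROUGH (16) layout. */
#define ATA_16               0x85      /* SCSI opcode */
#define PROTO_PIO_DATA_IN    4         /* protocol field, byte 1 bits 4:1 */
#define T_DIR_FROM_DEV       (1 << 3)  /* byte 2: data flows device -> host */
#define BYTE_BLOCK           (1 << 2)  /* byte 2: length counted in blocks */
#define T_LENGTH_IN_COUNT    0x2       /* byte 2 bits 1:0: length is in sector count */
#define ATA_IDENTIFY_DEVICE  0xec      /* ATA command register value */

/* Fill cdb[] with a pass-through IDENTIFY DEVICE request for one sector. */
static void build_ata_identify_cdb(unsigned char cdb[16])
{
    assert(cdb != NULL);
    memset(cdb, 0, 16);
    cdb[0]  = ATA_16;
    cdb[1]  = PROTO_PIO_DATA_IN << 1;
    cdb[2]  = T_DIR_FROM_DEV | BYTE_BLOCK | T_LENGTH_IN_COUNT;
    cdb[6]  = 1;                       /* sector count (7:0) */
    cdb[14] = ATA_IDENTIFY_DEVICE;
}
```

The result is byte-for-byte the array used in the test code at the bottom of this e-mail, which is also the CDB that shows up in the kernel's abort messages.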
sd and sg handle the SG_IO ioctl slightly differently. Unless you
specifically set a flag to do direct IO, sg passes a buffer of its own,
which is page-aligned, to the block layer and later copies the result
into the userspace buffer regardless of its alignment. sd, on the other
hand, always does direct IO unless the userspace buffer fails an
alignment test at block/blk-map.c line 57, in which case a page-aligned
buffer is created and used for the transfer.
The alignment test currently checks only for word alignment, the default
set up by scsi_lib.c; therefore, userspace buffers of almost any
alignment are given directly to the HBA as DMA targets. The LSI 1068
hardware doesn't seem to like at least a couple of the alignments which
cross a page boundary (see the test code below). Curiously, many
page-boundary-crossing alignments do work just fine.
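To make the effect of the alignment mask concrete, here is a minimal sketch of such a test in plain userspace C. It is modeled on the description above, not copied from block/blk-map.c; the function name can_map_directly and the macro names are made up for illustration, and the page-aligned base address 0x10000 is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_ALIGN_MASK    0x3u   /* 4-byte alignment, the current default */
#define SECTOR_ALIGN_MASK  511u   /* 512-byte alignment */

/*
 * Hypothetical stand-in for the block layer's test: a user buffer is handed
 * to the HBA directly only if neither its address nor its length has any
 * bits set under the mask; otherwise it would be bounced.
 */
static bool can_map_directly(uintptr_t addr, size_t len, unsigned int mask)
{
    assert(((mask + 1) & mask) == 0);   /* mask must be 2^n - 1 */
    return ((addr | len) & mask) == 0;
}

/* The offsets used in the test code, checked against both masks. */
static void check_test_case_offsets(void)
{
    const uintptr_t base = 0x10000;     /* arbitrary page-aligned address */

    /* With the word-alignment mask, all of these go straight to the HBA. */
    assert(can_map_directly(base + 0x000, 512, WORD_ALIGN_MASK));
    assert(can_map_directly(base + 0xe40, 512, WORD_ALIGN_MASK));
    assert(can_map_directly(base + 0xf00, 512, WORD_ALIGN_MASK));
    assert(can_map_directly(base + 0xf04, 512, WORD_ALIGN_MASK));

    /* With a 512-byte mask, only the page-aligned buffer is mapped directly. */
    assert(can_map_directly(base + 0x000, 512, SECTOR_ALIGN_MASK));
    assert(!can_map_directly(base + 0xe40, 512, SECTOR_ALIGN_MASK));
    assert(!can_map_directly(base + 0xf00, 512, SECTOR_ALIGN_MASK));
    assert(!can_map_directly(base + 0xf04, 512, SECTOR_ALIGN_MASK));
}
```

Under these assumptions, every misaligned offset from the test code passes the word-alignment test, which matches the observation that such buffers reach the HBA as DMA targets.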
So, either the hardware has a bug handling certain alignments or the
hardware has a stricter alignment requirement than the driver is
advertising. If stricter alignment is required, then in no case should
misaligned buffers from userspace be allowed through without being
bounced or at least causing an error to be returned.
It seems the mptsas driver could use blk_queue_dma_alignment() to advertise
a stricter alignment requirement. If it does, sd does the right thing and
bounces misaligned buffers (see block/blk-map.c line 57). The following
patch to 2.6.34-rc5 makes my symptoms go away. I'm sure this is the wrong
place for this code, but it gets my idea across.
diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
index 6796597..1e034ad 100644
--- a/drivers/message/fusion/mptscsih.c
+++ b/drivers/message/fusion/mptscsih.c
@@ -2450,6 +2450,8 @@ mptscsih_slave_configure(struct scsi_device *sdev)
 		ioc->name,sdev->tagged_supported, sdev->simple_tags,
 		sdev->ordered_tags));
 
+	blk_queue_dma_alignment (sdev->request_queue, 512 - 1);
+
 	return 0;
 }
I look forward to hearing from you guys who know this hardware and code
better than I do. Is the hardware at fault, or should the driver be
shielding the hardware better? Where's the right place to add this code, if
it's the right fix?
Does this `fix' the problem for anyone besides me?
Regards,
-- Ryan Kuester
Here is a minimal bit of test code which causes the error. BEWARE: this
will hose the HBA at which you point it. If that's controlling your
root disk, you may hang your machine.
/*
 * sg_bomb -- send SG_IO ioctl which causes LSI 1068 HBA to hang
 *
 * usage: sg_bomb <device>
 *  e.g.: sg_bomb /dev/sdb
 *  e.g.: sg_bomb /dev/sg1
 *
 * Modify offset_into_page to adjust the degree of buffer misalignment.
 */

#include <stdio.h>
#include <unistd.h>
#include <scsi/sg.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    unsigned int offset_into_page = 0xe40;
    // works: unsigned int offset_into_page = 0x0;
    // hangs: unsigned int offset_into_page = 0xf00;
    // works: unsigned int offset_into_page = 0xf04;

    /* ATA PASS-THROUGH (16) carrying IDENTIFY DEVICE (0xec), PIO data-in */
    unsigned char ata_identify_cmd[] = {0x85, 0x08, 0x0e, 0, 0, 0, 0x01,
                                        0, 0, 0, 0, 0, 0, 0, 0xec, 0};
    unsigned char sense[32];
    unsigned char* page;
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <device>\n", argv[0]);
        return 1;
    }

    /* page-aligned allocation large enough for the buffer to straddle a page */
    page = valloc(0x2000);
    if (page == NULL) {
        perror("valloc");
        return 1;
    }

    struct sg_io_hdr hdr = {
        .interface_id = 'S',
        .dxfer_direction = SG_DXFER_FROM_DEV,
        .cmdp = ata_identify_cmd,
        .cmd_len = 16,
        .dxferp = page + offset_into_page,  /* deliberately misaligned */
        .dxfer_len = 512,
        .sbp = sense,
        .mx_sb_len = sizeof(sense),
        .timeout = 5000,                    /* milliseconds */
    };

    if ((fd = open(argv[1], O_RDWR|O_NONBLOCK)) < 0) {
        perror("open");
        return 1;
    }

    /* on an affected HBA this ioctl is where the hang shows up */
    if (ioctl(fd, SG_IO, &hdr) < 0) {
        perror("SG_IO");
        return 1;
    }
    return 0;
}
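For anyone adapting the test, valloc() is marked obsolete in current man pages. A minimal sketch of an alternative follows, assuming posix_memalign() and sysconf() are available; the helper name alloc_at_page_offset is made up for illustration. It returns a pointer at a chosen offset past a page boundary and hands back the base pointer separately so it can be freed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Illustrative helper: returns a pointer `offset` bytes past a page boundary
 * with at least `len` usable bytes behind it. *base_out receives the pointer
 * that must later be passed to free(). Returns NULL on failure.
 */
static unsigned char *alloc_at_page_offset(size_t offset, size_t len,
                                           void **base_out)
{
    long ps = sysconf(_SC_PAGESIZE);
    void *base = NULL;

    if (ps <= 0)
        return NULL;

    size_t page = (size_t)ps;
    assert(offset < page);

    /* round offset + len up to a whole number of pages */
    size_t total = ((offset + len + page - 1) / page) * page;

    if (posix_memalign(&base, page, total) != 0)
        return NULL;

    *base_out = base;
    return (unsigned char *)base + offset;
}
```

Calling alloc_at_page_offset(0xe40, 512, &base) gives a buffer equivalent to the one built with valloc(0x2000) + offset_into_page in the test code.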
* Re: mptsas hangs caused by ATA pass-through explained
From: Cláudio Martins @ 2010-06-01 19:43 UTC (permalink / raw)
To: Ryan Kuester
Cc: Eric Moore, Kashyap Desai, DL-MPTFusionLinux, support, linux-scsi,
linux-kernel
On Mon, 26 Apr 2010 18:11:54 -0500 Ryan Kuester <rkuester@kspace.net> wrote:
> I may have an explanation for the LSI 1068 HBA hangs provoked by ATA
> pass-through commands, in particular by smartctl.
>
> First, my version of the symptoms. On an LSI SAS1068E B3 HBA running
> 01.29.00.00 firmware, with SATA disks, and with smartd running, I'm seeing
> occasional task, bus, and host resets, some of which lead to hard faults of
> the HBA requiring a reboot. Abusively looping the smartctl command,
>
> # while true; do smartctl -a /dev/sdb > /dev/null; done
>
> dramatically increases the frequency of these failures to nearly one per
> minute. A high IO load through the HBA while looping smartctl seems to
> improve the chance of a full scsi host reset or a non-recoverable hang.
>
> I reduced what smartctl was doing down to a simple test case which
> causes the hang with a single IO when pointed at the sd interface. See
> the code at the bottom of this e-mail. It uses an SG_IO ioctl to issue
> a single pass-through ATA identify device command. If the buffer
> userspace gives for the read data has certain alignments, the task is
> issued to the HBA but the HBA fails to respond. If run against the sg
> interface, neither the test code nor smartctl causes a hang.
>
> sd and sg handle the SG_IO ioctl slightly differently. Unless you
> specifically set a flag to do direct IO, sg passes a buffer of its own,
> which is page-aligned, to the block layer and later copies the result
> into the userspace buffer regardless of its alignment. sd, on the other
> hand, always does direct IO unless the userspace buffer fails an
> alignment test at block/blk-map.c line 57, in which case a page-aligned
> buffer is created and used for the transfer.
>
> The alignment test currently checks for word-alignment, the default
> setup by scsi_lib.c; therefore, userspace buffers of almost any
> alignment are given directly to the HBA as DMA targets. The LSI 1068
> hardware doesn't seem to like at least a couple of the alignments which
> cross a page boundary (see the test code below). Curiously, many
> page-boundary-crossing alignments do work just fine.
>
> So, either the hardware has an bug handling certain alignments or the
> hardware has a stricter alignment requirement than the driver is
> advertising. If stricter alignment is required, then in no case should
> misaligned buffers from userspace be allowed through without being
> bounced or at least causing an error to be returned.
>
> It seems the mptsas driver could use blk_queue_dma_alignment() to advertise
> a stricter alignment requirement. If it does, sd does the right thing and
> bounces misaligned buffers (see block/blk-map.c line 57). The following
> patch to 2.6.34-rc5 makes my symptoms go away. I'm sure this is the wrong
> place for this code, but it gets my idea across.
>
> diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
> index 6796597..1e034ad 100644
> --- a/drivers/message/fusion/mptscsih.c
> +++ b/drivers/message/fusion/mptscsih.c
> @@ -2450,6 +2450,8 @@ mptscsih_slave_configure(struct scsi_device *sdev)
> ioc->name,sdev->tagged_supported, sdev->simple_tags,
> sdev->ordered_tags));
>
> + blk_queue_dma_alignment (sdev->request_queue, 512 - 1);
> +
> return 0;
> }
>
Hello,
I have tested v2.6.34 on a box with 16 SATA disks attached to an
LSISAS1068E (through a port expander), with and without this patch:
With vanilla 2.6.34 I can reliably reproduce controller timeouts both
with the example code provided by Ryan and with a simple loop like:
while : ; do for d in `ls /sys/block/ | grep sd` ; do smartctl -a /dev/$d ; done ; done
The result is controller timeouts with kernel messages like the following:
mptscsih: ioc0: attempting task abort! (sc=ffff8802be18dc00)
sd 4:0:2:0: [sdc] CDB: ATA command pass through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00
mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
mptscsih: ioc0: task abort: SUCCESS (sc=ffff8802be18dc00)
mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset}, SubCode(0x2000)
mptbase: ioc0: LogInfo(0x31120101): Originator={PL}, Code={Abort}, SubCode(0x0101)
mptscsih: ioc0: attempting task abort! (sc=ffff8802be18dc00)
sd 4:0:2:0: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00
mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset}, SubCode(0x2000)
mptscsih: ioc0: task abort: SUCCESS (sc=ffff8802be18dc00)
mptscsih: ioc0: attempting target reset! (sc=ffff8802be18dc00)
sd 4:0:2:0: [sdc] CDB: ATA command pass through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00
mptscsih: ioc0: target reset: SUCCESS (sc=ffff8802be18dc00)
mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset}, SubCode(0x2000)
mptbase: ioc0: LogInfo(0x31120101): Originator={PL}, Code={Abort}, SubCode(0x0101)
mptscsih: ioc0: attempting task abort! (sc=ffff8802be18dc00)
sd 4:0:2:0: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00
mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset}, SubCode(0x2000)
mptscsih: ioc0: task abort: SUCCESS (sc=ffff8802be18dc00)
mptscsih: ioc0: attempting bus reset! (sc=ffff8802be18dc00)
sd 4:0:2:0: [sdc] CDB: ATA command pass through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00
mptscsih: ioc0: bus reset: SUCCESS (sc=ffff8802be18dc00)
mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset}, SubCode(0x2000)
mptbase: ioc0: LogInfo(0x31120101): Originator={PL}, Code={Abort}, SubCode(0x0101)
mptbase: ioc0: LogInfo(0x31120101): Originator={PL}, Code={Abort}, SubCode(0x0101)
mptscsih: ioc0: attempting host reset! (sc=ffff8802be18dc00)
mptscsih: ioc0: host reset: SUCCESS (sc=ffff8802be18dc00)
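The LogInfo words in these messages can be split into fields by hand. A minimal sketch follows, assuming the layout is bus type in bits 31:28, originator in bits 27:24, code in bits 23:16 and subcode in bits 15:0. The subcode position agrees with the SubCode values printed above; the rest of the layout and all names (struct sas_loginfo, decode_sas_loginfo) are illustrative assumptions rather than the driver's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative container for the assumed LogInfo fields. */
struct sas_loginfo {
    unsigned int bus_type;    /* bits 31:28 */
    unsigned int originator;  /* bits 27:24 */
    unsigned int code;        /* bits 23:16 */
    unsigned int subcode;     /* bits 15:0  */
};

static struct sas_loginfo decode_sas_loginfo(uint32_t v)
{
    struct sas_loginfo d = {
        .bus_type   = (v >> 28) & 0xF,
        .originator = (v >> 24) & 0xF,
        .code       = (v >> 16) & 0xFF,
        .subcode    = v & 0xFFFF,
    };
    return d;
}

/* Cross-check against the SubCode values printed in the log above. */
static void check_against_log(void)
{
    assert(decode_sas_loginfo(0x31140000).subcode == 0x0000);
    assert(decode_sas_loginfo(0x31112000).subcode == 0x2000);
    assert(decode_sas_loginfo(0x31120101).subcode == 0x0101);
}
```

With this layout, all three values share the same originator field, which the driver prints as PL, and differ only in code (0x14, 0x11, 0x12) and subcode.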
As described in
https://bugzilla.kernel.org/show_bug.cgi?id=13594
this can result in nasty side effects, like multiple drives getting
kicked out of an MD array.
With Ryan's patch applied on top of v2.6.34 I cannot reproduce the
above problem with either my simple script or Ryan's example.
So, IMHO, this patch should be strongly considered for inclusion, or
else the root cause investigated further.
So, as far as I can tell:
Tested-by: Cláudio Martins <ctpm@ist.utl.pt>
I'm also glad to test any further patches, if it turns out that the
above is not the most correct fix for the issue.
Thanks in advance.
Best regards
Cláudio
* RE: mptsas hangs caused by ATA pass-through explained
From: Desai, Kashyap @ 2010-06-03 16:35 UTC (permalink / raw)
To: Cláudio Martins, Ryan Kuester
Cc: Moore, Eric, DL-MPT Fusion Linux, Support, Software,
linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org
> -----Original Message-----
> From: Cláudio Martins [mailto:ctpm@ist.utl.pt]
> Sent: Wednesday, June 02, 2010 1:14 AM
> To: Ryan Kuester
> Cc: Moore, Eric; Desai, Kashyap; DL-MPT Fusion Linux; Support,
> Software; linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: mptsas hangs caused by ATA pass-through explained
>
>
> On Mon, 26 Apr 2010 18:11:54 -0500 Ryan Kuester <rkuester@kspace.net>
> wrote:
> > I may have an explanation for the LSI 1068 HBA hangs provoked by ATA
> > pass-through commands, in particular by smartctl.
> >
> > First, my version of the symptoms. On an LSI SAS1068E B3 HBA running
> > 01.29.00.00 firmware, with SATA disks, and with smartd running, I'm
> seeing
> > occasional task, bus, and host resets, some of which lead to hard
> faults of
> > the HBA requiring a reboot. Abusively looping the smartctl command,
> >
> > # while true; do smartctl -a /dev/sdb > /dev/null; done
> >
> > dramatically increases the frequency of these failures to nearly one
> per
> > minute. A high IO load through the HBA while looping smartctl seems
> to
> > improve the chance of a full scsi host reset or a non-recoverable
> hang.
> >
> > I reduced what smartctl was doing down to a simple test case which
> > causes the hang with a single IO when pointed at the sd interface.
> See
> > the code at the bottom of this e-mail. It uses an SG_IO ioctl to
> issue
> > a single pass-through ATA identify device command. If the buffer
> > userspace gives for the read data has certain alignments, the task is
> > issued to the HBA but the HBA fails to respond. If run against the
> sg
> > interface, neither the test code nor smartctl causes a hang.
> >
> > sd and sg handle the SG_IO ioctl slightly differently. Unless you
> > specifically set a flag to do direct IO, sg passes a buffer of its
> own,
> > which is page-aligned, to the block layer and later copies the result
> > into the userspace buffer regardless of its alignment. sd, on the
> other
> > hand, always does direct IO unless the userspace buffer fails an
> > alignment test at block/blk-map.c line 57, in which case a page-
> aligned
> > buffer is created and used for the transfer.
> >
> > The alignment test currently checks for word-alignment, the default
> > setup by scsi_lib.c; therefore, userspace buffers of almost any
> > alignment are given directly to the HBA as DMA targets. The LSI 1068
> > hardware doesn't seem to like at least a couple of the alignments
> which
> > cross a page boundary (see the test code below). Curiously, many
> > page-boundary-crossing alignments do work just fine.
> >
> > So, either the hardware has an bug handling certain alignments or the
> > hardware has a stricter alignment requirement than the driver is
> > advertising. If stricter alignment is required, then in no case
> should
> > misaligned buffers from userspace be allowed through without being
> > bounced or at least causing an error to be returned.
> >
> > It seems the mptsas driver could use blk_queue_dma_alignment() to
> advertise
> > a stricter alignment requirement. If it does, sd does the right
> thing and
> > bounces misaligned buffers (see block/blk-map.c line 57). The
> following
> > patch to 2.6.34-rc5 makes my symptoms go away. I'm sure this is the
> wrong
> > place for this code, but it gets my idea across.
> >
> > diff --git a/drivers/message/fusion/mptscsih.c
> b/drivers/message/fusion/mptscsih.c
> > index 6796597..1e034ad 100644
> > --- a/drivers/message/fusion/mptscsih.c
> > +++ b/drivers/message/fusion/mptscsih.c
> > @@ -2450,6 +2450,8 @@ mptscsih_slave_configure(struct scsi_device
> *sdev)
> > ioc->name,sdev->tagged_supported, sdev->simple_tags,
> > sdev->ordered_tags));
> >
> > + blk_queue_dma_alignment (sdev->request_queue, 512 - 1);
> > +
> > return 0;
> > }
> >
I have also verified this patch and reviewed the code with our developers, including Eric Moore.
Please consider this patch ACKed and schedule it for the next upstream release.
Thanks,
Kashyap
* RE: mptsas hangs caused by ATA pass-through explained
From: Richard Scobie @ 2010-06-03 20:07 UTC (permalink / raw)
To: linux-scsi; +Cc: Desai, Kashyap
Kashyap wrote:
I have also verified this patch + also done code review with our
developers including Eric Moore. Please consider this patch as an ACKed
patch and schedule it for next upstream release.
----------------------
Hi,
In comment #20 of this bug entry:
https://bugzilla.kernel.org/show_bug.cgi?id=14831
you state:
"Patch for setting dma boundary is mere avoiding condition which is
causing this issue. LSI Gen-1 controller does not have 512byte dma
boundary limitation. I have started internal chat with our Firmware
engineer. I will update you findings as and when some imp stuffs are
found."
Is this still the case, or is the patch the final fix?
Also, comment #25 in the same thread notes a significant negative write
performance impact.
Are you seeing this?
I have several machines in production I would like to update, but not if
there is a better solution to this on its way.
Regards,
Richard
* RE: mptsas hangs caused by ATA pass-through explained
From: Desai, Kashyap @ 2010-06-04 18:31 UTC (permalink / raw)
To: Richard Scobie, linux-scsi@vger.kernel.org
Richard,
As I recall, the default DMA boundary was 512 bytes in earlier kernels, sometime around version 2.6.30.
Later kernels introduced a 4-byte DMA boundary in scsi_lib.c.
As I have mentioned, we have a limitation of a single SGL for ATA PIO commands, so setting a 512-byte DMA boundary will solve all the functional issues.
However, as mentioned in bugzilla comment #25, people are seeing performance degradation (possibly due to bounce buffer usage). I need to debug that part.
In my opinion, setting the DMA boundary to 512 bytes is a solution only if there is no performance degradation.
Can I have data on how the user at bugzilla is testing performance?
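One way to see how a 512-byte boundary relates to the single-SGL limitation for a one-sector transfer is to count the pages a buffer touches. A minimal sketch follows, assuming 4096-byte pages and that each page touched may need its own scatter-gather entry (physically contiguous pages could be merged, so this is an upper bound); pages_touched is an illustrative helper, not driver code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of pages covered by the byte range [offset, offset + len), where
 * offset is measured from a page boundary.
 */
static size_t pages_touched(size_t offset, size_t len, size_t page_size)
{
    assert(len > 0 && page_size > 0);
    size_t first = offset / page_size;
    size_t last  = (offset + len - 1) / page_size;
    return last - first + 1;
}

/* The offsets from the original test code, plus every 512-aligned offset. */
static void check_one_sector_buffers(void)
{
    /* misaligned 512-byte buffers near the end of a page span two pages */
    assert(pages_touched(0xe40, 512, 4096) == 2);
    assert(pages_touched(0xf00, 512, 4096) == 2);
    assert(pages_touched(0xf04, 512, 4096) == 2);

    /* a 512-byte buffer on a 512-byte boundary always fits in one page */
    for (size_t off = 0; off < 4096; off += 512)
        assert(pages_touched(off, 512, 4096) == 1);
}
```

Under these assumptions a 512-byte-aligned, 512-byte buffer never straddles a page. Transfers longer than one sector can still span pages, so this only illustrates the one-sector case used in the test code.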
Thanks,
Kashyap