* Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)
From: Douglas Gilbert @ 2006-11-09 15:50 UTC (permalink / raw)
To: Tejun Heo
Cc: Brice Goglin, Jens Axboe, Gregor Jasny, Linux Kernel, Jeff Garzik,
linux-ide, monty, linux-scsi
Tejun Heo wrote:
> [CC'ing Monty and Douglas.]
>
> Hello, the original thread can be read from the following URL.
>
> http://thread.gmane.org/gmane.linux.ide/13708/focus=13708
>
> Brice Goglin wrote:
>> Jens Axboe wrote:
>>> On Mon, Oct 30 2006, Gregor Jasny wrote:
>>>
>>>> 2006/10/30, Jens Axboe <jens.axboe@oracle.com>:
>>>>
>>>>> Can you confirm that 2.6.18 works?
>>>>>
>>>> The reporter of [1] states that his SATA Thinkpad freezes with 2.6.17
>>>> and 2.6.18, too.
>>>>
>>>> Gregor
>>>>
>>>> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391901
>>>>
>>> Ok, mainly just checking if this was a potential dupe of another bug.
>>>
>>>
>>
>> Jens (or anybody else who has any idea of how to debug this),
>>
>> Did you have a chance to reproduce the problem? I guess we "only" need a
>> machine with SATA/ata_piix and cdparanoia 3.10. If you want me to debug
>> some stuff, feel free to tell me what. But, since it freezes the machine
>> and sysrq doesn't even work, I don't really know what to try...
>>
>> I just tried rc5 and rc5-mm1; both have the problem (as 2.6.16, .17
>> and .18 do, don't know about earlier kernels). I didn't have an audio CD
>> here, so I tried abcde on a DVD on purpose. With cdparanoia 3.10-pre0
>> (from Debian testing), it reports nothing for about 5 seconds and
>> then the machine freezes. With cdparanoia 3a9.8-11 (from Debian stable),
>> it reports an error very quickly, and dmesg gets a couple of lines like
>> these:
>> sg_write: data in/out 12/12 bytes for SCSI command 0x43--guessing
>> data in;
>> program cdparanoia not setting count and/or reply_len properly
>
> Okay, here's the story.
>
> In interface/scan_devices.c::cdda_identify_scsi(), cdparanoia calls
> scsi_inquiry() to identify the device and determine the interface type.
> This seems to be the first time it actually issues commands to the
> device. As the interface type isn't yet fully determined, for sg devices
> it first issues the command with d->interface set to SGIO_SCSI. If that
> fails, it falls back to SGIO_SCSI_BUGGY1.
>
> For to-device requests, both SGIO_SCSI and SGIO_SCSI_BUGGY1 set
> sg_io_hdr.dxfer_direction to SG_DXFER_TO_DEV. But for from-device
> requests, SGIO_SCSI uses SG_DXFER_TO_FROM_DEV while SGIO_SCSI_BUGGY1 uses
> SG_DXFER_FROM_DEV. So, cdparanoia first issues INQUIRY with
> SG_DXFER_TO_FROM_DEV and, if that fails, falls back to SG_DXFER_FROM_DEV.
>
> drivers/scsi/sg.c interprets SG_DXFER_TO_FROM_DEV as read while
> block/scsi_ioctl.c interprets it as write. I guess this is a historical
> artifact (scsi/sg.c was updated but block/scsi_ioctl.c was forgotten).
> As written above, cdparanoia can handle both cases as long as the kernel
> promptly fails a command issued with the wrong direction.
>
> This works for most PATA ATAPI devices. Most devices detect a reversed
> transfer and terminate the command promptly. But this doesn't seem to
> be true for SATA devices. Many just hang and time out commands with the
> wrong transfer direction. If you consider that most early SATA ATAPI
> devices are actually PATA + bridge, this is sort of inevitable. The
> PATA-SATA bridge cannot issue a D2H FIS to abort the command by itself.
> It's just mirroring the status of the PATA side, and the PATA side
> doesn't know a SATA protocol mismatch has occurred.
>
> So, INQUIRY with the write-DMA protocol times out after quite a few
> seconds. This is where things go from bad to worse. SATA controllers
> which have shadow TF registers don't handle timeout conditions very well,
> especially when they're waiting for data transfer. They basically hold
> the PCI bus and hang until the transfer completes (which never happens).
> That's where the hard lockup comes from.
>
> Jens, I think we need to match block sg's behavior to SCSI's. Monty,
> the timeout and hard lockup are due to hardware restrictions. The kernel
> and libata can't do much about it. So, please find another way to detect
> the interface type.
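Editorial sketch (not part of the original thread): the probing sequence described above can be illustrated with the sg v3 interface from the userspace header <scsi/sg.h>. The helper names (fill_inquiry_hdr, issue_inquiry, probe_inquiry), the allocation length chosen by the caller and the 5000 ms timeout are illustrative assumptions; this is not cdparanoia's actual code. On the SATA hardware described above, the first attempt may hang instead of failing, and the timeout may not rescue the machine.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define INQUIRY_OPCODE  0x12
#define INQUIRY_CDB_LEN 6

/* Build a 6-byte INQUIRY request; only 'direction' differs between the
 * two probing attempts described in the thread. */
static void fill_inquiry_hdr(sg_io_hdr_t *hdr, unsigned char *cdb,
                             unsigned char *buf, unsigned char len,
                             unsigned char *sense, unsigned char sense_len,
                             int direction)
{
    assert(hdr != NULL && cdb != NULL && buf != NULL && sense != NULL);

    memset(cdb, 0, INQUIRY_CDB_LEN);
    cdb[0] = INQUIRY_OPCODE;
    cdb[4] = len;                      /* allocation length */

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id    = 'S';        /* sg v3 interface */
    hdr->cmdp            = cdb;
    hdr->cmd_len         = INQUIRY_CDB_LEN;
    hdr->dxferp          = buf;
    hdr->dxfer_len       = len;
    hdr->dxfer_direction = direction;
    hdr->sbp             = sense;
    hdr->mx_sb_len       = sense_len;
    hdr->timeout         = 5000;       /* milliseconds; assumed value */
}

/* Issue INQUIRY on an already-open sg or block device fd.
 * Returns 0 on success, -1 if the ioctl or the command failed. */
static int issue_inquiry(int fd, int direction,
                         unsigned char *buf, unsigned char len)
{
    unsigned char cdb[INQUIRY_CDB_LEN];
    unsigned char sense[32];
    sg_io_hdr_t hdr;

    fill_inquiry_hdr(&hdr, cdb, buf, len, sense, sizeof(sense), direction);
    if (ioctl(fd, SG_IO, &hdr) < 0)
        return -1;
    return (hdr.info & SG_INFO_OK_MASK) == SG_INFO_OK ? 0 : -1;
}

/* Try TO_FROM_DEV first, then fall back to FROM_DEV, mirroring the order
 * described above. Returns the direction that worked, or 0 if neither did
 * (0 is not a valid SG_DXFER_* value). */
static int probe_inquiry(int fd, unsigned char *buf, unsigned char len)
{
    if (issue_inquiry(fd, SG_DXFER_TO_FROM_DEV, buf, len) == 0)
        return SG_DXFER_TO_FROM_DEV;
    if (issue_inquiry(fd, SG_DXFER_FROM_DEV, buf, len) == 0)
        return SG_DXFER_FROM_DEV;
    return 0;
}
```

The fallback only helps if the first, wrongly directed attempt fails promptly; the thread's point is that on some SATA setups it does not.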

Tejun,
Your SG_DXFER_TO_FROM_DEV analysis is correct.

The stupid ~!@# who wrote the code, and the documentation
for it, defined SG_DXFER_TO_FROM_DEV to mean a "transfer
from device" operation where the kernel buffer receiving
the DMA transfer was prefilled with data that the application
provided. That certainly isn't a bidirectional transfer to/from
the device, but it is a bidirectional transfer to kernel
buffers when indirect IO is used.

Why do this? Because the 'resid' field indicating how much
less data was transferred in a "from_device" transfer than
was requested, was not added to SCSI infrastructure till much
later. There are still LLDs out there that don't implement it.
It also reflected a similar technique used with the sg_header
structure (circa 1992) for precisely the same reason. And
application writers wanted that functionality. Joerg was the
first name of one such application writer.

Coincidentally I am sitting on a patch from Luben Tuikov
to cause the same breakage in the sg driver itself.

Nobody has proposed a patch to the documentation for
the explanation of SG_DXFER_TO_FROM_DEV :-)
http://www.torque.net/sg/p/sg_v3_ho.html

As I am currently proposing a SCSI pass through version 4
interface with twin scatter gather lists for independent
bidirectional transfers for SCSI commands, I'm not sure
what setting DMA_BIDIRECTIONAL in the existing interface
buys us.

When you maintain and document a pass through interface you
sit between two groups of people that have conflicting goals
and don't have a particularly high opinion of each other.

Doug Gilbert
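Editorial sketch (not from the thread) of the prefill technique described above: the application fills its buffer with a known byte pattern before an SG_DXFER_TO_FROM_DEV read, and with indirect IO any bytes the device never overwrote keep that pattern. The helper names and the 0xA5 fill byte are assumptions. The sketch trusts resid only when it is non-zero, since a resid of 0 is ambiguous when the LLD may not implement it.

```c
#include <assert.h>
#include <string.h>
#include <scsi/sg.h>

#define FILL_BYTE 0xA5   /* arbitrary pattern; an assumption */

/* Before the SG_DXFER_TO_FROM_DEV read: fill the user buffer with a known
 * pattern. With indirect IO it is copied into the kernel buffer first. */
static void prefill(unsigned char *buf, size_t len)
{
    memset(buf, FILL_BYTE, len);
}

/* After the read: bytes the device never wrote still hold FILL_BYTE, so
 * trimming the trailing run of FILL_BYTE estimates the transfer length. */
static size_t guess_transferred(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0 && buf[len - 1] == FILL_BYTE)
        len--;
    return len;
}

/* Prefer resid when the LLD reported a short transfer. A resid of 0 is
 * ambiguous (full transfer, or an LLD that never sets resid), so fall
 * back to the pattern scan in that case. */
static size_t bytes_received(const sg_io_hdr_t *hdr)
{
    if (hdr->resid > 0 && (unsigned int)hdr->resid <= hdr->dxfer_len)
        return hdr->dxfer_len - (unsigned int)hdr->resid;
    return guess_transferred(hdr->dxferp, hdr->dxfer_len);
}
```

The scan underestimates the length if genuine data happens to end in the fill byte, which is one reason a reported resid is preferable when available.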
* Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)
From: Luben Tuikov @ 2006-11-10 10:36 UTC (permalink / raw)
To: dougg, Tejun Heo
Cc: Brice Goglin, Jens Axboe, Gregor Jasny, Linux Kernel, Jeff Garzik,
linux-ide, monty, linux-scsi
--- Douglas Gilbert <dougg@torque.net> wrote:
> Coincidentally I am sitting on a patch from Luben Tuikov
> to cause the same breakage in the sg driver itself.

Here is a link to the recently posted 8-month-old patch:
http://marc.theaimsgroup.com/?l=linux-scsi&m=116267031029025&w=2

The patch would appear to fix the problem Tejun is describing.

I cannot quite remember exactly what I was doing that day 8 months
ago, but I was testing either disk or tape devices and arrived
at that patch.

This patch has been in my dev (gateway) tree for the last 8
months without any problems.

Luben
* Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)
From: Douglas Gilbert @ 2006-11-10 12:58 UTC (permalink / raw)
To: ltuikov
Cc: Tejun Heo, Brice Goglin, Jens Axboe, Gregor Jasny, Linux Kernel,
Jeff Garzik, linux-ide, monty, linux-scsi
Luben Tuikov wrote:
> --- Douglas Gilbert <dougg@torque.net> wrote:
>> Coincidentally I am sitting on a patch from Luben Tuikov
>> to cause the same breakage in the sg driver itself.
>
> Here is a link to the recently posted 8 month patch:
> http://marc.theaimsgroup.com/?l=linux-scsi&m=116267031029025&w=2
>
> The patch would appear to fix the problem Tejun is describing.
>
> I cannot quite remember exactly what I was doing that day 8 months
> ago, but was either disk or tape devices testing and arrived
> at that patch.
>
> This patch had been in my dev (gateway) tree for the last 8
> months, without any problems.
>
> Luben
>
>
>> Nobody has proposed a patch to the documentation for
>> the explanation of SG_DXFER_TO_FROM_DEV :-)
>> http://www.torque.net/sg/p/sg_v3_ho.html
^^^^^^^^^^^^^

Luben,
The failure being reported is that the block layer
SG_IO ioctl already does what you are proposing to
do for the sg driver.

Hence an application (cdparanoia in this case), having
been coded against the documented behaviour, assumes that
SG_DXFER_TO_FROM_DEV will read from the device.
See the definition of SG_DXFER_TO_FROM_DEV in sg.h and
the document above.

So your proposed patch would compound the problem. The
solution is _not_ to change the sg driver, but to put the
equivalent of the reverse of your patch in the block
layer SG_IO ioctl.

There is nothing to stop a new direction flag being
added called SG_DXFER_BIDIRECTIONAL that maps to
DMA_BIDIRECTIONAL.

Doug Gilbert
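A small model (editorial, not kernel code) of the direction mismatch and of the proposal above. The enum copies the values of the kernel's enum dma_data_direction; block_sg_io_dir() is a simplified reading of Tejun's description of block/scsi_ioctl.c in the affected kernels; and the numeric value given to SG_DXFER_BIDIRECTIONAL is hypothetical, since the flag is only proposed here.

```c
#include <assert.h>
#include <scsi/sg.h>

/* Hypothetical value: the flag is only proposed in this thread. */
#ifndef SG_DXFER_BIDIRECTIONAL
#define SG_DXFER_BIDIRECTIONAL (-6)
#endif

/* Userspace copy of the kernel's enum dma_data_direction values. */
enum dma_dir {
    DMA_BIDIRECTIONAL = 0,
    DMA_TO_DEVICE     = 1,
    DMA_FROM_DEVICE   = 2,
    DMA_NONE          = 3
};

/* sg driver semantics as documented in sg.h: TO_FROM_DEV is a read from
 * the device whose kernel buffer was first prefilled from user space. */
static enum dma_dir sg_driver_dir(int dxfer_direction)
{
    switch (dxfer_direction) {
    case SG_DXFER_TO_DEV:
        return DMA_TO_DEVICE;
    case SG_DXFER_FROM_DEV:
    case SG_DXFER_TO_FROM_DEV:
        return DMA_FROM_DEVICE;
    case SG_DXFER_BIDIRECTIONAL:     /* proposed: a true two-way transfer */
        return DMA_BIDIRECTIONAL;
    default:                         /* SG_DXFER_NONE; others not modeled */
        return DMA_NONE;
    }
}

/* Simplified model of the block-layer SG_IO behaviour Tejun describes:
 * TO_FROM_DEV is grouped with TO_DEV and treated as a write. */
static enum dma_dir block_sg_io_dir(int dxfer_direction)
{
    switch (dxfer_direction) {
    case SG_DXFER_TO_DEV:
    case SG_DXFER_TO_FROM_DEV:
        return DMA_TO_DEVICE;
    case SG_DXFER_FROM_DEV:
        return DMA_FROM_DEVICE;
    default:                         /* other values not modeled here */
        return DMA_NONE;
    }
}
```

In this model, sg_driver_dir(SG_DXFER_TO_FROM_DEV) gives DMA_FROM_DEVICE while block_sg_io_dir(SG_DXFER_TO_FROM_DEV) gives DMA_TO_DEVICE, which is the mismatch behind the hang; a genuine bidirectional request would use the separate proposed flag.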
* Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)
From: Luben Tuikov @ 2006-11-10 20:08 UTC (permalink / raw)
To: dougg
Cc: Tejun Heo, Brice Goglin, Jens Axboe, Gregor Jasny, Linux Kernel,
Jeff Garzik, linux-ide, monty, linux-scsi
--- Douglas Gilbert <dougg@torque.net> wrote:
> >> Nobody has proposed a patch to the documentation for
> >> the explanation of SG_DXFER_TO_FROM_DEV :-)
> >> http://www.torque.net/sg/p/sg_v3_ho.html
> ^^^^^^^^^^^^^
>
> Luben,
> The failure being reported is that the block layer
> SG_IO ioctl already does what you are proposing to
> do for the sg driver.
>
> Hence an application, cdparanoia in this case, since
> it coded against documented behaviour, assumes that
> SG_DXFER_TO_FROM_DEV will read from the device.
> See the definition of SG_DXFER_TO_FROM_DEV in sg.h and
> the document above.
>
> So your proposed patch would compound the problem. The
> solution is _not_ to change the sg driver and put the
> equivalent of the reverse of your patch in the block
> layer SG_IO ioctl.
>
> There is nothing to stop a new direction flag being
> added called SG_DXFER_BIDIRECTIONAL that maps to
> DMA_BIDIRECTIONAL.

Sounds good!

Luben

P.S. I'd love to see SG_DXFER_TO_FROM_DEV completely ripped out
of sg.c, for obvious reasons. Can you not duplicate the resid "fix"
it provides into "FROM_DEV" -- do apps really rely on it?
* Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)
From: Christoph Hellwig @ 2006-11-11 10:46 UTC (permalink / raw)
To: Luben Tuikov
Cc: dougg, Tejun Heo, Brice Goglin, Jens Axboe, Gregor Jasny,
Linux Kernel, Jeff Garzik, linux-ide, monty, linux-scsi
On Fri, Nov 10, 2006 at 12:08:15PM -0800, Luben Tuikov wrote:
> P.S. I'd love to see SG_DXFER_TO_FROM_DEV completely ripped out
> of sg.c, for obvious reasons. Can you not duplicate the resid "fix"
> it provides into "FROM_DEV" -- do apps really rely on it?

At the beginning of this thread it was mentioned cdparanoia uses it.
But in general we can't just rip out userland interfaces; we pretend
to have a stable userspace ABI (and, except for the big sysfs mess, that
actually comes very close to the truth).

What we should do is to document very well what SG_DXFER_TO_FROM_DEV
is doing and that odd name that's been chosen for it. I'll prepare
a patch for that.
* Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)
From: Douglas Gilbert @ 2006-11-11 16:39 UTC (permalink / raw)
To: Christoph Hellwig, Luben Tuikov, Tejun Heo, Brice Goglin,
Jens Axboe, Gregor Jasny, Linux Kernel, Jeff Garzik, linux-ide,
monty, linux-scsi
Christoph Hellwig wrote:
> On Fri, Nov 10, 2006 at 12:08:15PM -0800, Luben Tuikov wrote:
>> P.S. I'd love to see SG_DXFER_TO_FROM_DEV completely ripped out
>> of sg.c, for obvious reasons. Can you not duplicate the resid "fix"
>> it provides into "FROM_DEV" -- do apps really rely on it?
>
> At the beginning of this thread it was mentioned cdparanoia uses it.
> But in general we can't just rip out userland interfaces, we pretend
> to have a stable userspace abi (and except for the big sysfs mess that
> actually comes very close to the truth).
>
> What we should do is to document very well what SG_DXFER_TO_FROM_DEV
> is doing and that odd name that's been chosen for it. I'll prepare
> a patch for that.

Christoph,
It is documented and has been from day one. See scsi/sg.h
and http://sg.torque.net/sg/p/sg_v3_ho.html

Naming it is a challenge, and at the time there
were no bidirectional transfers to/from a device
to worry about.

A more appropriate but impractical name might be:
SG_DXFER_TO_KERNEL_BUFFER_THEN_READ_FROM_DEV_VIA_KERNEL_BUFFER

Doug Gilbert
* Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)
From: Luben Tuikov @ 2006-11-11 19:09 UTC (permalink / raw)
To: Christoph Hellwig
Cc: dougg, Tejun Heo, Brice Goglin, Jens Axboe, Gregor Jasny,
Linux Kernel, Jeff Garzik, linux-ide, monty, linux-scsi
--- Christoph Hellwig <hch@infradead.org> wrote:
> On Fri, Nov 10, 2006 at 12:08:15PM -0800, Luben Tuikov wrote:
> > P.S. I'd love to see SG_DXFER_TO_FROM_DEV completely ripped out
> > of sg.c, for obvious reasons. Can you not duplicate the resid "fix"
> > it provides into "FROM_DEV" -- do apps really rely on it?
>
> At the beginning of this thread it was mentioned cdparanoia uses it.
> But in general we can't just rip out userland interfaces, we pretend
> to have a stable userspace abi (and except for the big sysfs mess that
> actually comes very close to the truth).

All the more reason to think things through thoroughly when introducing
new code and architecture into a kernel.

Luben
* Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)
From: Monty Montgomery @ 2006-11-14 22:52 UTC (permalink / raw)
To: ltuikov
Cc: Christoph Hellwig, dougg, Tejun Heo, Brice Goglin, Jens Axboe,
Gregor Jasny, Linux Kernel, Jeff Garzik, linux-ide, linux-scsi
On 11/11/06, Luben Tuikov <ltuikov@yahoo.com> wrote:
> --- Christoph Hellwig <hch@infradead.org> wrote:
> > On Fri, Nov 10, 2006 at 12:08:15PM -0800, Luben Tuikov wrote:
> > > P.S. I'd love to see SG_DXFER_TO_FROM_DEV completely ripped out
> > > of sg.c, for obvious reasons. Can you not duplicate the resid "fix"
> > > it provides into "FROM_DEV" -- do apps really rely on it?
> >
> > At the beginning of this thread it was mentioned cdparanoia uses it.
> > But in general we can't just rip out userland interfaces, we pretend
> > to have a stable userspace abi (and except for the big sysfs mess that
> > actually comes very close to the truth).
>
> All the more reason to think things through thoroughly when introducing
> new code and architecture into a kernel.

It was introduced for a good reason, and that reason is still relevant
today. Cdparanoia is not using it gratuitously. The only problem is
that the implementation had a bug (well, at least two bugs) and only
sg ever implemented it correctly. Had block and SATA implemented it
correctly, we'd not be having this discussion.

Or you can blame a lower level layer for having no way to inform
mid-level drivers that DMA only completed a partial transfer.

"but anyway"...

Was this lockup happening with SATA through the block layer, or does
SATA implement its own version of the ioctl? Back when I was testing
my probing code, the buggy kernel would reject the request, not lock
up. Did a change make it into 2.6.18 or later that causes a lockup
instead?

(I never tested with SATA cdroms, as I don't have any. I tested with
IDE and SCSI and saw correct or detectable behavior.)

Monty