* [Bug 64171] Block SCSI Generic Driver does not keep data
2013-11-01 19:49 [Bug 64171] New: Block SCSI Generic Driver does not keep data bugzilla-daemon
@ 2013-11-01 19:52 ` bugzilla-daemon
2013-11-02 16:59 ` [Bug 64171] New: " Douglas Gilbert
` (4 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2013-11-01 19:52 UTC
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=64171
--- Comment #1 from Andrew Falanga <af300wsm@gmail.com> ---
Created attachment 113051
--> https://bugzilla.kernel.org/attachment.cgi?id=113051&action=edit
Demonstration program for BSG defect
--
You are receiving this mail because:
You are watching the assignee of the bug.
* Re: [Bug 64171] New: Block SCSI Generic Driver does not keep data
2013-11-01 19:49 [Bug 64171] New: Block SCSI Generic Driver does not keep data bugzilla-daemon
2013-11-01 19:52 ` [Bug 64171] " bugzilla-daemon
@ 2013-11-02 16:59 ` Douglas Gilbert
2013-11-02 17:09 ` [Bug 64171] " bugzilla-daemon
` (3 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Douglas Gilbert @ 2013-11-02 16:59 UTC
To: bugzilla-daemon, linux-scsi
On 13-11-01 03:49 PM, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=64171
>
> Bug ID: 64171
> Summary: Block SCSI Generic Driver does not keep data
> Product: SCSI Drivers
> Version: 2.5
> Kernel Version: 2.6.32.61
> Hardware: All
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: high
> Priority: P1
> Component: Other
> Assignee: scsi_drivers-other@kernel-bugs.osdl.org
> Reporter: af300wsm@gmail.com
> Regression: No
>
> Data written to any given file descriptor should be unique to that descriptor
> and processor space. Currently, the BSG Driver does not keep this uniqueness.
> As the attached simple program demonstrates, a SCSI Command queued to the
> device in one process is dequeued by another process which has opened a handle
> to the same device.
>
> The attached file sends the simple SCSI "Test Unit Ready" command from the SCSI
> Primary Command Spec. to the device using the BSG driver. As the program
> demonstrates, the sg_io_v4.usr_ptr field, which is set in the "push" branch of
> the program, is dequeued from the "pop" branch of the code.
>
> I also tested this behavior on Fedora 19 and the bug exists there as well. F19
> uses kernel 3.9.5.
>
> Compile the attachment:
> g++ -o <out> combined.cpp
>
>
> Execute as follows:
> sudo combined pop /dev/bsg/0:0:0:0 &
> sudo combined push /dev/bsg/0:0:0:0
I ran this test on lk 3.11.6 and it also exhibits this
problem.
When the bsg driver was originally designed, if my memory is
correct, it did not have an asynchronous interface, so it
skipped the complexity of keeping a separate context for
each file handle within each device.
With the addition of the asynchronous interface, the lack of
file handle context is exposed by your simple test. I'm
pretty sure that parallel test programs could show that
synchronous SG_IO ioctls can also be tricked. For example:
send INQUIRYs continuously from one process, TURs from
another process to the same device. Then, once in a while,
I guess that they would pick up the other one's response.
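To make the missing file-handle context concrete, here is a toy
user-space model. It is not bsg code: every name in it (toy_device,
toy_submit, toy_read_shared, toy_read_per_handle) is made up for
illustration, and it only assumes that completions sit on one
device-wide list. With the shared list, any handle that reads gets the
oldest completion; with a per-handle check, a reader only gets back
what it submitted itself.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
#define MAX_DONE 16

struct toy_completion {
    int owner_handle;       /* handle that submitted the command */
    unsigned long usr_ptr;  /* cookie the submitter attached */
};

struct toy_device {
    struct toy_completion done[MAX_DONE];
    size_t head, tail;      /* completed commands, shared by all handles */
};

/* "write()" side: record a finished command on the device-wide list. */
static void toy_submit(struct toy_device *dev, int handle,
                       unsigned long usr_ptr)
{
    assert(dev->tail < MAX_DONE);
    dev->done[dev->tail].owner_handle = handle;
    dev->done[dev->tail].usr_ptr = usr_ptr;
    dev->tail++;
}

/* Shared-list "read()": hands out the oldest completion to any caller. */
static int toy_read_shared(struct toy_device *dev, int handle,
                           unsigned long *usr_ptr)
{
    (void)handle;           /* caller identity is ignored: the defect */
    if (dev->head == dev->tail)
        return -1;
    *usr_ptr = dev->done[dev->head++].usr_ptr;
    return 0;
}

/* Per-handle "read()": only returns completions the caller submitted. */
static int toy_read_per_handle(struct toy_device *dev, int handle,
                               unsigned long *usr_ptr)
{
    for (size_t i = dev->head; i < dev->tail; i++) {
        if (dev->done[i].owner_handle != handle)
            continue;
        *usr_ptr = dev->done[i].usr_ptr;
        /* Drop entry i by shifting the later entries down. */
        for (size_t j = i; j + 1 < dev->tail; j++)
            dev->done[j] = dev->done[j + 1];
        dev->tail--;
        return 0;
    }
    return -1;
}
```

In this model, a command submitted on handle 1 is returned to handle 2
by toy_read_shared(), which mirrors the "push"/"pop" behaviour in the
report, while toy_read_per_handle() refuses it.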
As for fixing it, that seems like a lot of work. I'm busy with
the sg driver at the moment. The short term solution is to use
the sg driver instead of the bsg driver in cases like these
unless:
a) you want to do SCSI bidirectional commands
b) you want to send SCSI commands whose cdb is greater
than 16 bytes
Observation:
  wc bsg.c bsg-lib.c
    1117   2761  24144 bsg.c
     232    797   6117 bsg-lib.c
    1349   3558  30261 total
  wc sg.c
    2669   8327  72340 sg.c
Some of those sg.c lines are bloat and for backward compatibility
(to 1992); but not all of them!
The sg driver has problems which several people are looking at,
but not as fundamental as the one reported here.
Another random thought: if the bsg driver implemented O_EXCL
on its open()s [it just ignores it] then that would be one
mechanism that could be used to guard against what has been
observed.
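Since bsg ignores O_EXCL today, a user-space stand-in is an advisory
lock taken right after open(). The sketch below uses flock(), which is
a different mechanism from O_EXCL and only protects against other
processes that take the same lock. It assumes flock() is honoured on
the bsg device node, which has not been checked here; the helper name
open_exclusive_advisory is made up.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Open `path` and take an exclusive, non-blocking advisory lock on it.
 * Returns the fd on success, or -1 with errno set on failure
 * (EWOULDBLOCK when another open file description holds the lock).
 */
static int open_exclusive_advisory(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        int saved = errno;  /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;              /* the lock is dropped when fd is closed */
}
```

A cooperating program would call open_exclusive_advisory() with the
/dev/bsg/... path and back off or retry on EWOULDBLOCK.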
Doug Gilbert
* [Bug 64171] Block SCSI Generic Driver does not keep data
2013-11-01 19:49 [Bug 64171] New: Block SCSI Generic Driver does not keep data bugzilla-daemon
2013-11-01 19:52 ` [Bug 64171] " bugzilla-daemon
2013-11-02 16:59 ` [Bug 64171] New: " Douglas Gilbert
@ 2013-11-02 17:09 ` bugzilla-daemon
2013-11-15 16:42 ` James Bottomley
2013-11-15 15:52 ` bugzilla-daemon
` (2 subsequent siblings)
5 siblings, 1 reply; 9+ messages in thread
From: bugzilla-daemon @ 2013-11-02 17:09 UTC
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=64171
--- Comment #2 from d gilbert <dgilbert@interlog.com> ---
* Re: [Bug 64171] Block SCSI Generic Driver does not keep data
2013-11-02 17:09 ` [Bug 64171] " bugzilla-daemon
@ 2013-11-15 16:42 ` James Bottomley
2013-11-19 6:15 ` Douglas Gilbert
0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2013-11-15 16:42 UTC
To: bugzilla-daemon; +Cc: linux-scsi
On Sat, 2013-11-02 at 17:09 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=64171
>
> --- Comment #2 from d gilbert <dgilbert@interlog.com> ---
> On 13-11-01 03:49 PM, bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=64171
> >
> > Bug ID: 64171
> > Summary: Block SCSI Generic Driver does not keep data
> > Product: SCSI Drivers
> > Version: 2.5
> > Kernel Version: 2.6.32.61
> > Hardware: All
> > OS: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: high
> > Priority: P1
> > Component: Other
> > Assignee: scsi_drivers-other@kernel-bugs.osdl.org
> > Reporter: af300wsm@gmail.com
> > Regression: No
> >
> > Data written to any given file descriptor should be unique to that descriptor
> > and processor space. Currently, the BSG Driver does not keep this uniqueness.
> > As the attached simple program demonstrates, a SCSI Command queued to the
> > device in one process is dequeued by another process which has opened a handle
> > to the same device.
> >
> > The attached file sends the simple SCSI "Test Unit Ready" command from the SCSI
> > Primary Command Spec. to the device using the BSG driver. As the program
> > demonstrates, the sg_io_v4.usr_ptr field, which is set in the "push" branch of
> > the program, is dequeued from the "pop" branch of the code.
> >
> > I also tested this behavior on Fedora 19 and the bug exists there as well. F19
> > uses kernel 3.9.5.
> >
> > Compile the attachment:
> > g++ -o <out> combined.cpp
> >
> >
> > Execute as follows:
> > sudo combined pop /dev/bsg/0:0:0:0 &
> > sudo combined push /dev/bsg/0:0:0:0
>
> I ran this test on lk 3.11.6 and it also exhibits this
> problem.
>
> When the bsg driver was originally designed, if my memory is
> correct, it did not have an asynchronous interface, so it
> skipped the complexity of keeping a separate context for
> each file handle within each device.
>
> With the addition of the asynchronous interface, the lack of
> file handle context is exposed by your simple test. I'm
> pretty sure that parallel test programs could show that
> synchronous SG_IO ioctls can also be tricked. For example:
> send INQUIRYs continuously from one process, TURs from
> another process to the same device. Then, once in a while,
> I guess that they would pick up the other one's response.
OK, so why do you both think this is a bug not a feature? The
read/write interface isn't completion order safe, it was introduced
for /dev/sg compatibility and somehow got carried forwards when it
should have been deprecated. The ioctl interface is, so just use the
latter. If you can find a test case where the ioctl interface has the
same problem, then I'll treat it as a bug.
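For reference, a minimal sketch of the synchronous ioctl path on a bsg
node, using struct sg_io_v4 from <linux/bsg.h>. The helper names
(fill_tur_v4, tur_sync), the 32-byte sense buffer and the 20 second
timeout are assumptions for illustration, not taken from the attached
test program.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/bsg.h>  /* struct sg_io_v4, BSG_PROTOCOL_SCSI */
#include <scsi/sg.h>    /* SG_IO */

/* Fill `hdr` for a 6-byte TEST UNIT READY with no data transfer. */
static void fill_tur_v4(struct sg_io_v4 *hdr, const unsigned char cdb[6],
                        unsigned char *sense, unsigned int sense_len,
                        uint64_t cookie)
{
    assert(hdr != NULL && cdb != NULL && sense != NULL);

    memset(hdr, 0, sizeof(*hdr));
    hdr->guard = 'Q';                 /* marks a v4 header */
    hdr->protocol = BSG_PROTOCOL_SCSI;
    hdr->subprotocol = BSG_SUB_PROTOCOL_SCSI_CMD;
    hdr->request_len = 6;
    hdr->request = (uintptr_t)cdb;
    hdr->max_response_len = sense_len;
    hdr->response = (uintptr_t)sense;
    hdr->timeout = 20000;             /* milliseconds, arbitrary choice */
    hdr->usr_ptr = cookie;            /* the field named in the report */
}

/*
 * Issue one TEST UNIT READY through ioctl(SG_IO).
 * Returns -1 if the ioctl itself fails, else the SCSI device status.
 */
static int tur_sync(int bsg_fd, uint64_t cookie)
{
    static const unsigned char tur_cdb[6] = { 0 };  /* opcode 0x00 */
    unsigned char sense[32];
    struct sg_io_v4 hdr;

    fill_tur_v4(&hdr, tur_cdb, sense, sizeof(sense), cookie);
    if (ioctl(bsg_fd, SG_IO, &hdr) < 0)
        return -1;
    return (int)hdr.device_status;
}
```

Here the request and its result travel in the same caller-owned
header, so there is no separate read() step in which another handle
could pick up the completion.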
James
* Re: [Bug 64171] Block SCSI Generic Driver does not keep data
2013-11-15 16:42 ` James Bottomley
@ 2013-11-19 6:15 ` Douglas Gilbert
0 siblings, 0 replies; 9+ messages in thread
From: Douglas Gilbert @ 2013-11-19 6:15 UTC
To: James Bottomley, bugzilla-daemon; +Cc: linux-scsi
On 13-11-15 11:42 AM, James Bottomley wrote:
> On Sat, 2013-11-02 at 17:09 +0000, bugzilla-daemon@bugzilla.kernel.org
> wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=64171
>>
>> --- Comment #2 from d gilbert <dgilbert@interlog.com> ---
>> On 13-11-01 03:49 PM, bugzilla-daemon@bugzilla.kernel.org wrote:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=64171
>>>
>>> Bug ID: 64171
>>> Summary: Block SCSI Generic Driver does not keep data
>>> Product: SCSI Drivers
>>> Version: 2.5
>>> Kernel Version: 2.6.32.61
>>> Hardware: All
>>> OS: Linux
>>> Tree: Mainline
>>> Status: NEW
>>> Severity: high
>>> Priority: P1
>>> Component: Other
>>> Assignee: scsi_drivers-other@kernel-bugs.osdl.org
>>> Reporter: af300wsm@gmail.com
>>> Regression: No
>>>
>>> Data written to any given file descriptor should be unique to that descriptor
>>> and processor space. Currently, the BSG Driver does not keep this uniqueness.
>>> As the attached simple program demonstrates, a SCSI Command queued to the
>>> device in one process is dequeued by another process which has opened a handle
>>> to the same device.
>>>
>>> The attached file sends the simple SCSI "Test Unit Ready" command from the SCSI
>>> Primary Command Spec. to the device using the BSG driver. As the program
>>> demonstrates, the sg_io_v4.usr_ptr field, which is set in the "push" branch of
>>> the program, is dequeued from the "pop" branch of the code.
>>>
>>> I also tested this behavior on Fedora 19 and the bug exists there as well. F19
>>> uses kernel 3.9.5.
>>>
>>> Compile the attachment:
>>> g++ -o <out> combined.cpp
>>>
>>>
>>> Execute as follows:
>>> sudo combined pop /dev/bsg/0:0:0:0 &
>>> sudo combined push /dev/bsg/0:0:0:0
>>
>> I ran this test on lk 3.11.6 and it also exhibits this
>> problem.
>>
>> When the bsg driver was originally designed, if my memory is
>> correct, it did not have an asynchronous interface, so it
>> skipped the complexity of keeping a separate context for
>> each file handle within each device.
>>
>> With the addition of the asynchronous interface, the lack of
>> file handle context is exposed by your simple test. I'm
>> pretty sure that parallel test programs could show that
>> synchronous SG_IO ioctls can also be tricked. For example:
>> send INQUIRYs continuously from one process, TURs from
>> another process to the same device. Then, once in a while,
>> I guess that they would pick up the other one's response.
>
> OK, so why do you both think this is a bug not a feature? The
> read/write interface isn't completion order safe, it was introduced
> for /dev/sg compatibility and somehow got carried forwards when it
> should have been deprecated. The ioctl interface is, so just use the
> latter. If you can find a test case where the ioctl interface has the
> same problem, then I'll treat it as a bug.
sg_tst_context is a test program that sends TEST UNIT
READY commands on one thread (or on the even-numbered threads)
and START STOP UNIT commands on another thread (or on the
odd-numbered threads). The SSU commands alternate between start
and stop. The good news is that the bsg ioctl(SG_IO) doesn't
break the way I predicted.
Instead it broke like this:
# sg_tst_context -n 100 -t 2 /dev/bsg/8:0:0:2
Enter work_thread id=0 num=100 share=0
Enter work_thread id=1 num=100 share=0
START STOP UNIT do_scsi_pt() submission error, id=1
pass through OS error: Invalid argument
thread id=1 FAILed at iteration: 0 [negated errno: -22]
thread id=0 normal exit
Expected not_readys on TEST UNIT READY: 0
UNEXPECTED not_readys on START STOP UNIT: 0
Try 10 threads and the majority die in the same fashion.
That is with a separate file handle per thread. So if
there are multiple file handles to a bsg device then
subsequent calls to ioctl(SG_IO) may fail with
errno=EINVAL (for no good reason).
If a single file handle is shared between the threads then
the problem goes away:
# sg_tst_context -n 100 -t 2 -s /dev/bsg/8:0:0:2
Enter work_thread id=0 num=100 share=1
Enter work_thread id=1 num=100 share=1
thread id=0 normal exit
thread id=1 normal exit
Expected not_readys on TEST UNIT READY: 55
UNEXPECTED not_readys on START STOP UNIT: 0
Both the sg driver and the block layer (e.g.
ioctl(/dev/sdc, SG_IO) ) perform properly in these tests.
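The command mix described above can be sketched as follows. This is
not the sg_tst_context source: the helper names are made up, and the
choice that odd threads issue "start" on even iterations and "stop" on
odd ones is an assumption. The opcodes and the START bit position are
the standard 6-byte CDB layouts.

```c
#include <assert.h>
#include <string.h>

#define TEST_UNIT_READY_OP 0x00
#define START_STOP_UNIT_OP 0x1b

/* 6-byte TEST UNIT READY: opcode 0x00, all other bytes zero. */
static void build_tur(unsigned char cdb[6])
{
    memset(cdb, 0, 6);
    cdb[0] = TEST_UNIT_READY_OP;
}

/* 6-byte START STOP UNIT: the START bit is bit 0 of byte 4. */
static void build_ssu(unsigned char cdb[6], int start)
{
    memset(cdb, 0, 6);
    cdb[0] = START_STOP_UNIT_OP;
    cdb[4] = start ? 0x01 : 0x00;
}

/*
 * Pick the command for one iteration of one worker thread:
 * even thread ids send TUR, odd thread ids send SSU and
 * alternate start/stop (start on even iterations: assumption).
 */
static void build_for_thread(unsigned char cdb[6], int thread_id,
                             int iteration)
{
    assert(thread_id >= 0 && iteration >= 0);
    if (thread_id % 2 == 0)
        build_tur(cdb);
    else
        build_ssu(cdb, iteration % 2 == 0);
}
```

With two threads this gives thread 0 a stream of TURs while thread 1
keeps spinning the unit up and down, which is why not_ready responses
on TUR are expected in the shared-handle run.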
Doug Gilbert
* [Bug 64171] Block SCSI Generic Driver does not keep data
2013-11-01 19:49 [Bug 64171] New: Block SCSI Generic Driver does not keep data bugzilla-daemon
` (2 preceding siblings ...)
2013-11-02 17:09 ` [Bug 64171] " bugzilla-daemon
@ 2013-11-15 15:52 ` bugzilla-daemon
2013-11-15 15:55 ` [Bug 64171] Block SCSI Generic Driver does not maintain file handle context bugzilla-daemon
2013-11-19 6:16 ` bugzilla-daemon
5 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2013-11-15 15:52 UTC
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=64171
--- Comment #3 from Andrew Falanga <af300wsm@gmail.com> ---
Doug,
Thanks for the comments. My team decided that this constituted too much
instability to rely upon BSG for our needs. The major drawback is that we were
looking to this driver specifically for large CDBs (> 16 bytes). Until this
driver is fixed, which my organization is very interested in, we're informing
our customers that they'll have to use LSI HBAs in order to support pass
through capability of CDBs in excess of 16 bytes.
Incidentally, I couldn't find an e-mail for FUJITA Tomonori, the currently
listed maintainer of this driver, within bugzilla. Is his e-mail in bugzilla
under a different address than the one listed in MAINTAINERS? From
MAINTAINERS: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>.
* [Bug 64171] Block SCSI Generic Driver does not maintain file handle context
2013-11-01 19:49 [Bug 64171] New: Block SCSI Generic Driver does not keep data bugzilla-daemon
` (3 preceding siblings ...)
2013-11-15 15:52 ` bugzilla-daemon
@ 2013-11-15 15:55 ` bugzilla-daemon
2013-11-19 6:16 ` bugzilla-daemon
5 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2013-11-15 15:55 UTC
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=64171
Andrew Falanga <af300wsm@gmail.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Block SCSI Generic Driver   |Block SCSI Generic Driver
                   |does not keep data          |does not maintain file
                   |                            |handle context
--- Comment #4 from Andrew Falanga <af300wsm@gmail.com> ---
Changed the summary to actually complete a thought.
* [Bug 64171] Block SCSI Generic Driver does not maintain file handle context
2013-11-01 19:49 [Bug 64171] New: Block SCSI Generic Driver does not keep data bugzilla-daemon
` (4 preceding siblings ...)
2013-11-15 15:55 ` [Bug 64171] Block SCSI Generic Driver does not maintain file handle context bugzilla-daemon
@ 2013-11-19 6:16 ` bugzilla-daemon
5 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2013-11-19 6:16 UTC
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=64171
--- Comment #5 from d gilbert <dgilbert@interlog.com> ---