Bugs on Linux 2.6.18-rc2 sg code?

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Bugs on Linux 2.6.18-rc2 sg code?
@ 2006-08-11  1:43 Fajun Chen
  2006-08-18 22:00 ` Douglas Gilbert
  0 siblings, 1 reply; 19+ messages in thread
From: Fajun Chen @ 2006-08-11  1:43 UTC (permalink / raw)
  To: linux-scsi; +Cc: dgilbert

Hi Folks,

I use ATA pass through via sg ioctl interface for data read/write.
Linux 2.6.18-rc2 patched with Jeff Garzik's libata git patch was
running on ARM IOP80321 board. The HBA Sil3124 was used.

Two problems were observed:
1. sg mmap bug?
    My test program could not write data correctly to the mmapped
buffer in the user space.      The program did a read immediately
after a write and the data mismatches.  Swapped the sg_vma_nopage()
function with the one in 2.6.15.4 release fixed the problem. So this
appears to be a wrong change to the sg mmap code in 2.6.18-rc2
release.

2. sg hangs or have really slow response
   Under certain unknown conditions, sg will be busy with one
read/write ioctl call for over half an hour.  From scsi proc
interface, sg and scsi mid layer were processing requests as  states
"act" or "rcv" was shown in /proc/scsi/sg/debug. My test program set
the command timeout to be 30 seconds, but this timeout did not trigger
the command abort.

Are these problems genuine bugs in sg 2.6.18-rc2 release? Since the
problem is reproducible in my test hardware, please let me know if any
log/traces can be collected.

BTW, scsi logging through proc FS seems to be broken as well even
though SCSI logging and Proc FS are enabled in my 2.6.18-rc2 kernel
config.

Thanks,
Fajun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-11  1:43 Bugs on Linux 2.6.18-rc2 sg code? Fajun Chen
@ 2006-08-18 22:00 ` Douglas Gilbert
  2006-08-19  4:11   ` Douglas Gilbert
  2006-08-21 21:07   ` Fajun Chen
  0 siblings, 2 replies; 19+ messages in thread
From: Douglas Gilbert @ 2006-08-18 22:00 UTC (permalink / raw)
  To: Fajun Chen; +Cc: linux-scsi

Fajun Chen wrote:
> Hi Folks,
> 
> I use ATA pass through via sg ioctl interface for data read/write.
> Linux 2.6.18-rc2 patched with Jeff Garzik's libata git patch was
> running on ARM IOP80321 board. The HBA Sil3124 was used.
> 
> Two problems were observed:
> 1. sg mmap bug?
>    My test program could not write data correctly to the mmapped
> buffer in the user space.      The program did a read immediately
> after a write and the data mismatches.  Swapped the sg_vma_nopage()
> function with the one in 2.6.15.4 release fixed the problem. So this
> appears to be a wrong change to the sg mmap code in 2.6.18-rc2
> release.

Thanks for the report. I can confirm that mmap-ed IO
in the sg driver is broken. Simply reading 16 blocks
from some arbitrary offset with sg_dd and sgm_dd
and comparing the fetched data shows mismatches starting
above the first page (i.e. above byte offset 4096 on
i386).

Your point about the change to sg_vma_nopage() between
lk 2.6.15 and lk 2.6.16 also seems to be correct.
The most indented part of that function has been
changed from incrementing the change count on the
reported page (as indicated by 'offset') in a
compound page allocation to ignoring the 'offset'
and incrementing the page count on the first page
in a compound page allocation.

> 2. sg hangs or have really slow response
>   Under certain unknown conditions, sg will be busy with one
> read/write ioctl call for over half an hour.  From scsi proc
> interface, sg and scsi mid layer were processing requests as  states
> "act" or "rcv" was shown in /proc/scsi/sg/debug. My test program set
> the command timeout to be 30 seconds, but this timeout did not trigger
> the command abort.

Well yes, I have noticed on transport errors, the attempt
to abort the command (or use a larger hammer) fails and
the command just keeps on chugging. The "elapsed" time
in the /proc/scsi/sg/debug just keeps on growing, typically
to a value much larger than the given timeout.

> Are these problems genuine bugs in sg 2.6.18-rc2 release? Since the
> problem is reproducible in my test hardware, please let me know if any
> log/traces can be collected.

I was able to confirm the breakage in point 1)
with a recent SATA disk with an old SIL 3112 controller
and with a SAS disk with some pretty recent MPT Fusion
hardware. That was enough for me.

> BTW, scsi logging through proc FS seems to be broken as well even
> though SCSI logging and Proc FS are enabled in my 2.6.18-rc2 kernel
> config.

Looks like we should be doing a lot more testing,
especially when our friends from other kernel areas
offer to clean up our code and remove cruft.

Doug Gilbert

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-18 22:00 ` Douglas Gilbert
@ 2006-08-19  4:11   ` Douglas Gilbert
  2006-08-19 17:20     ` Matthew Wilcox
                       ` (3 more replies)
  2006-08-21 21:07   ` Fajun Chen
  1 sibling, 4 replies; 19+ messages in thread
From: Douglas Gilbert @ 2006-08-19  4:11 UTC (permalink / raw)
  To: Fajun Chen; +Cc: linux-scsi, akpm

[-- Attachment #1: Type: text/plain, Size: 1618 bytes --]

Douglas Gilbert wrote:
> Fajun Chen wrote:
>> Hi Folks,
>>
>> I use ATA pass through via sg ioctl interface for data read/write.
>> Linux 2.6.18-rc2 patched with Jeff Garzik's libata git patch was
>> running on ARM IOP80321 board. The HBA Sil3124 was used.
>>
>> Two problems were observed:
>> 1. sg mmap bug?
>>    My test program could not write data correctly to the mmapped
>> buffer in the user space.      The program did a read immediately
>> after a write and the data mismatches.  Swapped the sg_vma_nopage()
>> function with the one in 2.6.15.4 release fixed the problem. So this
>> appears to be a wrong change to the sg mmap code in 2.6.18-rc2
>> release.
> 
> Thanks for the report. I can confirm that mmap-ed IO
> in the sg driver is broken. Simply reading 16 blocks
> from some arbitrary offset with sg_dd and sgm_dd
> and comparing the fetched data shows mismatches starting
> above the first page (i.e. above byte offset 4096 on
> i386).
> 
> Your point about the change to sg_vma_nopage() between
> lk 2.6.15 and lk 2.6.16 also seems to be correct.
> The most indented part of that function has been
> changed from incrementing the change count on the
> reported page (as indicated by 'offset') in a
> compound page allocation to ignoring the 'offset'
> and incrementing the page count on the first page
> in a compound page allocation.

Fajun,
Could you please try the patch below and report if it fixes
your sg mmap problem. The patch is against lk 2.6.18-rc4
and I assume it will apply against "rc2" as well.
The patch re-instates the former logic and fixes the
problem in my tests.

Doug Gilbert

[-- Attachment #2: sg2618rc4nopage.diff --]
[-- Type: text/x-patch, Size: 950 bytes --]

--- linux/drivers/scsi/sg.c	2006-07-16 08:19:19.000000000 -0400
+++ linux/drivers/scsi/sg.c2618rc4nopage	2006-08-18 23:40:15.000000000 -0400
@@ -18,8 +18,8 @@
  *
  */
 
-static int sg_version_num = 30533;	/* 2 digits for each component */
-#define SG_VERSION_STR "3.5.33"
+static int sg_version_num = 30534;	/* 2 digits for each component */
+#define SG_VERSION_STR "3.5.34"
 
 /*
  *  D. P. Gilbert (dgilbert@interlog.com, dougg@triode.net.au), notes:
@@ -60,7 +60,7 @@
 
 #ifdef CONFIG_SCSI_PROC_FS
 #include <linux/proc_fs.h>
-static char *sg_version_date = "20050908";
+static char *sg_version_date = "20060818";
 
 static int sg_proc_init(void);
 static void sg_proc_cleanup(void);
@@ -1164,7 +1164,7 @@
 		len = vma->vm_end - sa;
 		len = (len < sg->length) ? len : sg->length;
 		if (offset < len) {
-			page = sg->page;
+			page = virt_to_page(page_address(sg->page) + offset);
 			get_page(page);	/* increment page count */
 			break;
 		}

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-19  4:11   ` Douglas Gilbert
@ 2006-08-19 17:20     ` Matthew Wilcox
  2006-08-19 17:36       ` Douglas Gilbert
  2006-08-19 18:55     ` James Bottomley
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2006-08-19 17:20 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: Fajun Chen, linux-scsi, akpm

On Sat, Aug 19, 2006 at 12:11:34AM -0400, Douglas Gilbert wrote:
> > Your point about the change to sg_vma_nopage() between
> > lk 2.6.15 and lk 2.6.16 also seems to be correct.
> > The most indented part of that function has been
> > changed from incrementing the change count on the
> > reported page (as indicated by 'offset') in a
> > compound page allocation to ignoring the 'offset'
> > and incrementing the page count on the first page
> > in a compound page allocation.
>  		if (offset < len) {
> -			page = sg->page;
> +			page = virt_to_page(page_address(sg->page) + offset);
>  			get_page(page);	/* increment page count */

But page_address can return NULL on highmem machines.  Or are you
guaranteed that these are lowmem pages?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-19 17:20     ` Matthew Wilcox
@ 2006-08-19 17:36       ` Douglas Gilbert
  2006-08-19 17:41         ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Douglas Gilbert @ 2006-08-19 17:36 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Fajun Chen, linux-scsi, akpm

Matthew Wilcox wrote:
> On Sat, Aug 19, 2006 at 12:11:34AM -0400, Douglas Gilbert wrote:
>>> Your point about the change to sg_vma_nopage() between
>>> lk 2.6.15 and lk 2.6.16 also seems to be correct.
>>> The most indented part of that function has been
>>> changed from incrementing the change count on the
>>> reported page (as indicated by 'offset') in a
>>> compound page allocation to ignoring the 'offset'
>>> and incrementing the page count on the first page
>>> in a compound page allocation.
>>  		if (offset < len) {
>> -			page = sg->page;
>> +			page = virt_to_page(page_address(sg->page) + offset);
>>  			get_page(page);	/* increment page count */
> 
> But page_address can return NULL on highmem machines.  Or are you
> guaranteed that these are lowmem pages?

Matthew,
The allocation is typically compound done with:
  alloc_pages(GFP_ATOMIC|__GFP_COMP|__GFP_NOWARN, order)
where 'order' is 3 (hence 32 KB).
The lack of a '__GFP_HIGHMEM' flag should mean these are
lowmem pages (?).

The (compound) pages in question would have already
been mmap()-ed by the user.

Doug Gilbert



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-19 17:36       ` Douglas Gilbert
@ 2006-08-19 17:41         ` Andrew Morton
  2006-08-19 20:23           ` Matthew Wilcox
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2006-08-19 17:41 UTC (permalink / raw)
  To: dougg; +Cc: Matthew Wilcox, Fajun Chen, linux-scsi

On Sat, 19 Aug 2006 13:36:33 -0400
Douglas Gilbert <dougg@torque.net> wrote:

> Matthew Wilcox wrote:
> > On Sat, Aug 19, 2006 at 12:11:34AM -0400, Douglas Gilbert wrote:
> >>> Your point about the change to sg_vma_nopage() between
> >>> lk 2.6.15 and lk 2.6.16 also seems to be correct.
> >>> The most indented part of that function has been
> >>> changed from incrementing the change count on the
> >>> reported page (as indicated by 'offset') in a
> >>> compound page allocation to ignoring the 'offset'
> >>> and incrementing the page count on the first page
> >>> in a compound page allocation.
> >>  		if (offset < len) {
> >> -			page = sg->page;
> >> +			page = virt_to_page(page_address(sg->page) + offset);
> >>  			get_page(page);	/* increment page count */
> > 
> > But page_address can return NULL on highmem machines.  Or are you
> > guaranteed that these are lowmem pages?
> 
> Matthew,
> The allocation is typically compound done with:
>   alloc_pages(GFP_ATOMIC|__GFP_COMP|__GFP_NOWARN, order)
> where 'order' is 3 (hence 32 KB).

ow, GFP_ATOMIC is bad.  Can we avoid that?

> The lack of a '__GFP_HIGHMEM' flag should mean these are
> lowmem pages (?).

Yup.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-19  4:11   ` Douglas Gilbert
  2006-08-19 17:20     ` Matthew Wilcox
@ 2006-08-19 18:55     ` James Bottomley
  2006-08-20  1:36       ` Douglas Gilbert
       [not found]     ` <8202f4270608200051p688f4654ub6aecb604e0152f1@mail.gmail.com>
  2006-08-28 21:12     ` Fajun Chen
  3 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2006-08-19 18:55 UTC (permalink / raw)
  To: dougg; +Cc: Fajun Chen, linux-scsi, akpm

On Sat, 2006-08-19 at 00:11 -0400, Douglas Gilbert wrote:
> @@ -1164,7 +1164,7 @@
>                 len = vma->vm_end - sa;
>                 len = (len < sg->length) ? len : sg->length;
>                 if (offset < len) {
> -                       page = sg->page;
> +                       page = virt_to_page(page_address(sg->page) + offset);
>                         get_page(page); /* increment page count */
>                         break;
>                 }

Doing something like this always frightens people in linux because
page_address() on highmem returns NULL.  I know, having looked, that in
this case it can't happen, but since sg->page is really an array of
pages, how about something simpler like

page = &sg->page[offset >> PAGE_SHIFT]

or (if you want to be more correct) something like the nth_page macro in
linux/mm.h?

It might also be worthwhile considering GFP_HIGHUSER for this allocation
(in spite of all the kmap_atomic et al that would have to be added to
the code), since that will increase the chance of a contiguous
allocation on large memory machines.

James

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-19 17:41         ` Andrew Morton
@ 2006-08-19 20:23           ` Matthew Wilcox
  2006-08-20  1:00             ` Douglas Gilbert
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2006-08-19 20:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: dougg, Fajun Chen, linux-scsi

On Sat, Aug 19, 2006 at 10:41:09AM -0700, Andrew Morton wrote:
> ow, GFP_ATOMIC is bad.  Can we avoid that?

I believe all the GFP_ATOMIC calls in sg.c can be downgraded to
GFP_KERNEL.

sg_add_sfp is only called from sg_open(), so the GFP_ATOMIC there can
be downgraded to GFP_KERNEL.

There's one in sg_common_write that can be downgraded.  It's called
from sg_write and sg_ioctl (via sg_new_write).  No locks are held at
those points.

st_map_user_pages uses GFP_ATOMIC.  It's only called from
sg_build_direct, which is only called from sg_start_req, which is only
called from sg_new_write, which is safe per above.

sg_page_malloc uses GFP_ATOMIC.  It's only called from sg_build_indirect.
sg_build_indirect is called from sg_start_req (previously shown safe),
and sg_build_reserve.  sg_build_reserve is safe as it's only called from
sg_ioctl and sg_add_sfp (previously shown safe).

sg_build_sgat is called from sg_build_direct and sg_build_indirect,
both of which have previously been shown safe.

Please check my reasoning, but I'm pretty confident that none of these
paths are called with any locks held.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-19 20:23           ` Matthew Wilcox
@ 2006-08-20  1:00             ` Douglas Gilbert
  2006-08-20 10:07               ` Arjan van de Ven
  0 siblings, 1 reply; 19+ messages in thread
From: Douglas Gilbert @ 2006-08-20  1:00 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Andrew Morton, Fajun Chen, linux-scsi, James.Bottomley

Matthew Wilcox wrote:
> On Sat, Aug 19, 2006 at 10:41:09AM -0700, Andrew Morton wrote:
>> ow, GFP_ATOMIC is bad.  Can we avoid that?
> 
> I believe all the GFP_ATOMIC calls in sg.c can be downgraded to
> GFP_KERNEL.

Well, if memory serves, we have been around this loop
before. The GFP_ATOMIC flags were removed, users
complained about unexplained, unbounded waits unrelated
to the SCSI transport or device. So the GFP_ATOMIC
flags were put back again. The semantics of the
function that calls alloc_pages() is to cope with a
(fast) failure, and if it is an ambit claim (e.g. the
reserve buffer allocation on open() ) then lower the
order by 1 (halve the requested allocation) then try again.

The semantics of SCSI command injection in the sg driver
have been to fetch the resources needed and inject the
command to the mid level as quickly as possible.
If there is a resource problem, exit and tell the user
space promptly.

So the action required seems to be __GFP_NORETRY if that
works as its name suggests.

> sg_add_sfp is only called from sg_open(), so the GFP_ATOMIC there can
> be downgraded to GFP_KERNEL.

That is called on open() so GFP_KERNEL | __GFP_NORETRY
should be ok.

> There's one in sg_common_write that can be downgraded. It's called
> from sg_write and sg_ioctl (via sg_new_write).  No locks are held at
> those points.

That one is the last argument to scsi_execute_async().
I didn't add that and don't know what it is for.
Perhaps James could comment.

> st_map_user_pages uses GFP_ATOMIC.  It's only called from
> sg_build_direct, which is only called from sg_start_req, which is only
> called from sg_new_write, which is safe per above.

GFP_KERNEL | __GFP_NORETRY

> sg_page_malloc uses GFP_ATOMIC.  It's only called from sg_build_indirect.
> sg_build_indirect is called from sg_start_req (previously shown safe),
> and sg_build_reserve.  sg_build_reserve is safe as it's only called from
> sg_ioctl and sg_add_sfp (previously shown safe).
> 
> sg_build_sgat is called from sg_build_direct and sg_build_indirect,
> both of which have previously been shown safe.
> 
> Please check my reasoning, but I'm pretty confident that none of these
> paths are called with any locks held.

None of the kmalloc()s or alloc_pages() are called
from softirq callback ( sg_cmd_done() in sg's case) that
I can see. I don't think locks taken by the sg driver
should be a problem.

BTW I'm initially trying to fix a bug added by someone
else into the sg driver in lk 2.6.16 . The code I'm
proposing isn't optimal but, by reverting the code (more
or less) to what it was, at least I know it was well
tested :-) Thus I was hoping to get the fix into
lk 2.6.18 . The other changes being discussed could go
into the lk 2.6.19 cycle.

Doug Gilbert

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-19 18:55     ` James Bottomley
@ 2006-08-20  1:36       ` Douglas Gilbert
  0 siblings, 0 replies; 19+ messages in thread
From: Douglas Gilbert @ 2006-08-20  1:36 UTC (permalink / raw)
  To: James Bottomley; +Cc: Fajun Chen, linux-scsi, akpm

James Bottomley wrote:
> On Sat, 2006-08-19 at 00:11 -0400, Douglas Gilbert wrote:
>> @@ -1164,7 +1164,7 @@
>>                 len = vma->vm_end - sa;
>>                 len = (len < sg->length) ? len : sg->length;
>>                 if (offset < len) {
>> -                       page = sg->page;
>> +                       page = virt_to_page(page_address(sg->page) + offset);
>>                         get_page(page); /* increment page count */
>>                         break;
>>                 }
> 
> Doing something like this always frightens people in linux because
> page_address() on highmem returns NULL.  I know, having looked, that in
> this case it can't happen, but since sg->page is really an array of
> pages, how about something simpler like
> 
> page = &sg->page[offset >> PAGE_SHIFT]

James,
Yes I saw code like that when I reviewed other vma
"nopage" callbacks. And that code frightens me :-)

> or (if you want to be more correct) something like the nth_page macro in
> linux/mm.h?

That nth_page() macro looks better. Is it guaranteed
not to return NULL? No vma "nopage" callbacks are
using nth_page() so I wasn't aware of it.

> It might also be worthwhile considering GFP_HIGHUSER for this allocation
> (in spite of all the kmap_atomic et al that would have to be added to
> the code), since that will increase the chance of a contiguous
> allocation on large memory machines.

Did you mean GFP_KERNEL | __GFP_HIGHMEM since the
allocation in question is for a scatter gather
list element that will be DMA-ed to or from?

Is the HIGHMEM stuff needed in 64 bits? I looked
at the kmap_atomic stuff and it just didn't look
worth the effort.


All good stuff for the lk 2.6.19 cycle. First I
would like to fix the reported bug with code
that was well tested ...


Doug Gilbert


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
       [not found]     ` <8202f4270608200051p688f4654ub6aecb604e0152f1@mail.gmail.com>
@ 2006-08-20  9:01       ` Russell King
  0 siblings, 0 replies; 19+ messages in thread
From: Russell King @ 2006-08-20  9:01 UTC (permalink / raw)
  To: Fajun Chen; +Cc: dougg, linux-scsi, akpm

On Sun, Aug 20, 2006 at 01:51:33AM -0600, Fajun Chen wrote:
> I've tested mmap code in both 2.6.15.4 and 2.6.18-rc2 with
> sg_vma_nopage() from 2.6.15.4, this particular sg mmap bug seems to be
> fixed?$B!%!!#S#i#n#c#e!!#y#o#u#r!!#p#a#t#c#h!!#f#o#r!!#2!%#6!%#1#8!]#r#c#4!!#i#s!!essentially not different

This is unreadable...

> from 2.6.15.4 (the only difference is sg->offset, which is always 0
> for sg mmap or indirect IO),  this should fix the problem.  I'll test
> your patch next week to confirm it anyway.
> 
> I would like to also report a potential cache coherency issue on sg
> when running on some processors other than i386.  I ran into cache
> coherency issue on ARM XScale Iop80321.

I no longer touch SCSI, partly because the only SCSI based systems I
have are extremely (10 years) old now.  All my SCSI knowledge is based
on ancient 2.4 code and is no longer relevant.

> Existing sg code for direct IO:

I don't think direct-IO has ever been tested and debugged on ARM either.
I wouldn't know where to start with direct-IO - never used and never
looked at it.

And more importantly (for me), I'd rather not get into another squabble
with James over cache coherency ideas, so I want to stay well away from
cache coherency and block devices, thanks.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-20  1:00             ` Douglas Gilbert
@ 2006-08-20 10:07               ` Arjan van de Ven
  2006-08-20 17:47                 ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Arjan van de Ven @ 2006-08-20 10:07 UTC (permalink / raw)
  To: dougg
  Cc: Matthew Wilcox, Andrew Morton, Fajun Chen, linux-scsi,
	James.Bottomley

On Sat, 2006-08-19 at 21:00 -0400, Douglas Gilbert wrote:
> Matthew Wilcox wrote:
> > On Sat, Aug 19, 2006 at 10:41:09AM -0700, Andrew Morton wrote:
> >> ow, GFP_ATOMIC is bad.  Can we avoid that?
> > 
> > I believe all the GFP_ATOMIC calls in sg.c can be downgraded to
> > GFP_KERNEL.
> 
> Well, if memory serves, we have been around this loop
> before. The GFP_ATOMIC flags were removed, users
> complained about unexplained, unbounded waits unrelated
> to the SCSI transport or device.

this is not a fair answer... in a way GFP_ATOMIC is an antisocial
behavior that should only be done when there is an absolute need. You'll
dip into the VM reserves by using it after all!

And for normal allocations, people who get the kernel to ask for a lot
of memory need to do their fair share of waiting for memory, just to
have write throttling work correctly...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-20 10:07               ` Arjan van de Ven
@ 2006-08-20 17:47                 ` Andrew Morton
  2006-08-30 16:29                   ` Fajun Chen
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2006-08-20 17:47 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: dougg, Matthew Wilcox, Fajun Chen, linux-scsi, James.Bottomley

On Sun, 20 Aug 2006 12:07:21 +0200
Arjan van de Ven <arjan@fenrus.demon.nl> wrote:

> On Sat, 2006-08-19 at 21:00 -0400, Douglas Gilbert wrote:
> > Matthew Wilcox wrote:
> > > On Sat, Aug 19, 2006 at 10:41:09AM -0700, Andrew Morton wrote:
> > >> ow, GFP_ATOMIC is bad.  Can we avoid that?
> > > 
> > > I believe all the GFP_ATOMIC calls in sg.c can be downgraded to
> > > GFP_KERNEL.
> > 
> > Well, if memory serves, we have been around this loop
> > before. The GFP_ATOMIC flags were removed, users
> > complained about unexplained, unbounded waits unrelated
> > to the SCSI transport or device.
> 
> this is not a fair answer... in a way GFP_ATOMIC is an antisocial
> behavior that should only be done when there is an absolute need. You'll
> dip into the VM reserves by using it after all!

Thing is, SG is (for some reason) using order-3 allocations.  Those can
indeed send the VM into a large amount of reclaim when combined with
GFP_KERNEL.

SG will fall back to lower-order pages if the higher-order attempt didn't
work out.

If we were to fine-tune this then perhaps:

	alloc_pages(GFP_ATOMIC & ~__GFP_HIGH, 3)
	alloc_pages(GFP_ATOMIC & ~__GFP_HIGH, 2)
	alloc_pages(GFP_ATOMIC & ~__GFP_HIGH, 1)
	alloc_pages(GFP_KERNEL, 0)

would suit.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-18 22:00 ` Douglas Gilbert
  2006-08-19  4:11   ` Douglas Gilbert
@ 2006-08-21 21:07   ` Fajun Chen
  2006-08-22  2:47     ` Douglas Gilbert
  1 sibling, 1 reply; 19+ messages in thread
From: Fajun Chen @ 2006-08-21 21:07 UTC (permalink / raw)
  To: dougg; +Cc: linux-scsi

Hi Doug,

I ran into sg hang problem again while running direct IO read/write.
Here are some logs collected. I set scsi_logging_level to 63 and dmesg
was collected right after I noticed invalid elap number.  I hope this
will provide some information for your debugging.

Thanks,
Fajun

Logs:
~ $ cat /proc/scsi/sg/debug
dev_max(currently)=32 max_active_device=2 (origin 1)
 def_reserved_size=32768
 >>> device=sg0 scsi0 chan=0 id=0 lun=0   em=1 sg_tablesize=128 excl=0
   FD(1): timeout=60000ms bufflen=131072 (res)sgat=4 low_dma=0
   cmd_q=1 f_packid=0 k_orphan=0 closed=0
     act: id=0 blen=0 t_o/elap=60000/458503350ms sgat=0 op=0x85

~ $ dmesg
ne: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002
sg_finish_rem_req: res_used=0
sg_remove_scat: k_use_sg=16
sg_ioctl: sg0, cmd=0x2285
scsi_block_when_processing_errors: rtn: 1
sg_common_write:  scsi opcode=0x85, cmd_size=16
sg_start_req: dxfer_len=65536
scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
scsi_delete_timer: scmd: c080f560, rtn: 1
sg_cmd_done: sg0, pack_id=0, res=0x8000002



On 8/18/06, Douglas Gilbert <dougg@torque.net> wrote:
> Fajun Chen wrote:
> > Hi Folks,
> >
> > I use ATA pass through via sg ioctl interface for data read/write.
> > Linux 2.6.18-rc2 patched with Jeff Garzik's libata git patch was
> > running on ARM IOP80321 board. The HBA Sil3124 was used.
> >
> > Two problems were observed:
> > 1. sg mmap bug?
> >    My test program could not write data correctly to the mmapped
> > buffer in the user space.      The program did a read immediately
> > after a write and the data mismatches.  Swapped the sg_vma_nopage()
> > function with the one in 2.6.15.4 release fixed the problem. So this
> > appears to be a wrong change to the sg mmap code in 2.6.18-rc2
> > release.
>
> Thanks for the report. I can confirm that mmap-ed IO
> in the sg driver is broken. Simply reading 16 blocks
> from some arbitrary offset with sg_dd and sgm_dd
> and comparing the fetched data shows mismatches starting
> above the first page (i.e. above byte offset 4096 on
> i386).
>
> Your point about the change to sg_vma_nopage() between
> lk 2.6.15 and lk 2.6.16 also seems to be correct.
> The most indented part of that function has been
> changed from incrementing the change count on the
> reported page (as indicated by 'offset') in a
> compound page allocation to ignoring the 'offset'
> and incrementing the page count on the first page
> in a compound page allocation.
>
> > 2. sg hangs or have really slow response
> >   Under certain unknown conditions, sg will be busy with one
> > read/write ioctl call for over half an hour.  From scsi proc
> > interface, sg and scsi mid layer were processing requests as  states
> > "act" or "rcv" was shown in /proc/scsi/sg/debug. My test program set
> > the command timeout to be 30 seconds, but this timeout did not trigger
> > the command abort.
>
> Well yes, I have noticed on transport errors, the attempt
> to abort the command (or use a larger hammer) fails and
> the command just keeps on chugging. The "elapsed" time
> in the /proc/scsi/sg/debug just keeps on growing, typically
> to a value much larger than the given timeout.
>
> > Are these problems genuine bugs in sg 2.6.18-rc2 release? Since the
> > problem is reproducible in my test hardware, please let me know if any
> > log/traces can be collected.
>
> I was able to confirm the breakage in point 1)
> with a recent SATA disk with an old SIL 3112 controller
> and with a SAS disk with some pretty recent MPT Fusion
> hardware. That was enough for me.
>
> > BTW, scsi logging through proc FS seems to be broken as well even
> > though SCSI logging and Proc FS are enabled in my 2.6.18-rc2 kernel
> > config.
>
> Looks like we should be doing a lot more testing,
> especially when our friends from other kernel areas
> offer to clean up our code and remove cruft.
>
> Doug Gilbert
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-21 21:07   ` Fajun Chen
@ 2006-08-22  2:47     ` Douglas Gilbert
  2006-08-22  4:27       ` Fajun Chen
  0 siblings, 1 reply; 19+ messages in thread
From: Douglas Gilbert @ 2006-08-22  2:47 UTC (permalink / raw)
  To: Fajun Chen; +Cc: linux-scsi

Fajun Chen wrote:
> Hi Doug,
> 
> I ran into sg hang problem again while running direct IO read/write.
> Here are some logs collected. I set scsi_logging_level to 63 and dmesg
> was collected right after I noticed invalid elap number.  I hope this
> will provide some information for your debugging.

Fajun,
As I said earlier I don't think that this is a sg driver
problem (well not directly). The sg driver sets up a
a SCSI command, complete with a timeout (60 seconds
in your case) and a callback. The sg driver then waits
the whole weekend (judging from the elapsed time)
and hears nothing. The timeout is administered by
the midlevel and when it goes off the mid level asks
the LLD to abort the command (and if that fails it
tries a few other things). Alternatively the LLD
can define its own exception handler. Meanwhile the sg
driver waits passively for that command to finish.

So it looks like you are sending ATA read and write
commands through the ATA PASS THROUGH (16) SCSI command
(opcode 0x85). All the commands shown in the sg debug
output terminate with CHECK CONDITION (res=0x8000002).
That doesn't look correct for read and write commands.
Have you set the CK_COND bit? I can see if an ATA read
fails you may need to fetch the lba registers. Surely
the SATL should give you an ATA Return (sense) descriptor
if an ATA command fails in command 0x85, but I couldn't
see that written in the SAT draft.

I'm just curious about that point. It probably has no
bearing on the hang, but returning sense data when there
is no need may impact performance.

Doug Gilbert

> Logs:
> ~ $ cat /proc/scsi/sg/debug
> dev_max(currently)=32 max_active_device=2 (origin 1)
> def_reserved_size=32768
>>>> device=sg0 scsi0 chan=0 id=0 lun=0   em=1 sg_tablesize=128 excl=0
>   FD(1): timeout=60000ms bufflen=131072 (res)sgat=4 low_dma=0
>   cmd_q=1 f_packid=0 k_orphan=0 closed=0
>     act: id=0 blen=0 t_o/elap=60000/458503350ms sgat=0 op=0x85

BTW The above is normal (i.e. indirect) IO trying to move up to
128 KB of data (with a four element scatter gather list, each
element 32 KB). An ATA command is being sent via the SAT
defined pass through. The elapsed time is 127 hours (over 5 days).

> ~ $ dmesg
> ne: sg0, pack_id=0, res=0x8000002
> sg_finish_rem_req: res_used=0
> sg_remove_scat: k_use_sg=16
> sg_ioctl: sg0, cmd=0x2285
> scsi_block_when_processing_errors: rtn: 1
> sg_common_write:  scsi opcode=0x85, cmd_size=16
> sg_start_req: dxfer_len=65536
> scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
> scsi_delete_timer: scmd: c080f560, rtn: 1
> sg_cmd_done: sg0, pack_id=0, res=0x8000002
                               ^^^^^^^^^^^^^
<snip>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-22  2:47     ` Douglas Gilbert
@ 2006-08-22  4:27       ` Fajun Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Fajun Chen @ 2006-08-22  4:27 UTC (permalink / raw)
  To: dougg; +Cc: linux-scsi

Hi Doug,

Thank you for the detailed explanation. I still have some doubts/questions:
1. This sg hang problem happened and logs were collected shortly after
the test was started (less than an hour), so elap time for 127 hours
seems to be miscalculated somewhere.
2.  I didn't set CK_COND bit in ATA Pass Through command for
read/write, could the CHECK CONDITION be the result of command
failure?
3.  From application level trace,  my test application was blocked in
sg IOCTL call all along. From sg traces "sg_cmd_done: sg0, pack_id=0,
res=0x8000002", it seems sg did get sg_cmd_done() callback. In this
case, why my application was NOT unblocked? Note that my application
only opens one sg file and it will not send additional commands before
the current one is finished.
4. From your reply, middle layer is the one administrating timeout.
But for the sake of defense, should upper layers have it's own timeout
as well? I am thinking about add timeout in my application to defense
low level breakdown.

Thanks,
Fajun

On 8/21/06, Douglas Gilbert <dougg@torque.net> wrote:
> Fajun Chen wrote:
> > Hi Doug,
> >
> > I ran into sg hang problem again while running direct IO read/write.
> > Here are some logs collected. I set scsi_logging_level to 63 and dmesg
> > was collected right after I noticed invalid elap number.  I hope this
> > will provide some information for your debugging.
>
> Fajun,
> As I said earlier I don't think that this is a sg driver
> problem (well not directly). The sg driver sets up a
> a SCSI command, complete with a timeout (60 seconds
> in your case) and a callback. The sg driver then waits
> the whole weekend (judging from the elapsed time)
> and hears nothing. The timeout is administered by
> the midlevel and when it goes off the mid level asks
> the LLD to abort the command (and if that fails it
> tries a few other things). Alternatively the LLD
> can define its own exception handler. Meanwhile the sg
> driver waits passively for that command to finish.
>
> So it looks like you are sending ATA read and write
> commands through the ATA PASS THROUGH (16) SCSI command
> (opcode 0x85). All the commands shown in the sg debug
> output terminate with CHECK CONDITION (res=0x8000002).
> That doesn't look correct for read and write commands.
> Have you set the CK_COND bit? I can see if an ATA read
> fails you may need to fetch the lba registers. Surely
> the SATL should give you an ATA Return (sense) descriptor
> if an ATA command fails in command 0x85, but I couldn't
> see that written in the SAT draft.
>
> I'm just curious about that point. It probably has no
> bearing on the hang, but returning sense data when there
> is no need may impact performance.
>
> Doug Gilbert
>
> > Logs:
> > ~ $ cat /proc/scsi/sg/debug
> > dev_max(currently)=32 max_active_device=2 (origin 1)
> > def_reserved_size=32768
> >>>> device=sg0 scsi0 chan=0 id=0 lun=0   em=1 sg_tablesize=128 excl=0
> >   FD(1): timeout=60000ms bufflen=131072 (res)sgat=4 low_dma=0
> >   cmd_q=1 f_packid=0 k_orphan=0 closed=0
> >     act: id=0 blen=0 t_o/elap=60000/458503350ms sgat=0 op=0x85
>
> BTW The above is normal (i.e. indirect) IO trying to move up to
> 128 KB of data (with a four element scatter gather list, each
> element 32 KB). An ATA command is being sent via the SAT
> defined pass through. The elapsed time is 127 hours (over 5 days).
>
> > ~ $ dmesg
> > ne: sg0, pack_id=0, res=0x8000002
> > sg_finish_rem_req: res_used=0
> > sg_remove_scat: k_use_sg=16
> > sg_ioctl: sg0, cmd=0x2285
> > scsi_block_when_processing_errors: rtn: 1
> > sg_common_write:  scsi opcode=0x85, cmd_size=16
> > sg_start_req: dxfer_len=65536
> > scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98)
> > scsi_delete_timer: scmd: c080f560, rtn: 1
> > sg_cmd_done: sg0, pack_id=0, res=0x8000002
>                                ^^^^^^^^^^^^^
> <snip>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-19  4:11   ` Douglas Gilbert
                       ` (2 preceding siblings ...)
       [not found]     ` <8202f4270608200051p688f4654ub6aecb604e0152f1@mail.gmail.com>
@ 2006-08-28 21:12     ` Fajun Chen
  2006-08-28 22:11       ` Douglas Gilbert
  3 siblings, 1 reply; 19+ messages in thread
From: Fajun Chen @ 2006-08-28 21:12 UTC (permalink / raw)
  To: dougg; +Cc: linux-scsi, akpm

Hi Doug,

Forgot to report this. I've tested your patch and it works.

Thanks,
Fajun

On 8/18/06, Douglas Gilbert <dougg@torque.net> wrote:
> Douglas Gilbert wrote:
> > Fajun Chen wrote:
> >> Hi Folks,
> >>
> >> I use ATA pass through via sg ioctl interface for data read/write.
> >> Linux 2.6.18-rc2 patched with Jeff Garzik's libata git patch was
> >> running on ARM IOP80321 board. The HBA Sil3124 was used.
> >>
> >> Two problems were observed:
> >> 1. sg mmap bug?
> >>    My test program could not write data correctly to the mmapped
> >> buffer in the user space.      The program did a read immediately
> >> after a write and the data mismatches.  Swapped the sg_vma_nopage()
> >> function with the one in 2.6.15.4 release fixed the problem. So this
> >> appears to be a wrong change to the sg mmap code in 2.6.18-rc2
> >> release.
> >
> > Thanks for the report. I can confirm that mmap-ed IO
> > in the sg driver is broken. Simply reading 16 blocks
> > from some arbitrary offset with sg_dd and sgm_dd
> > and comparing the fetched data shows mismatches starting
> > above the first page (i.e. above byte offset 4096 on
> > i386).
> >
> > Your point about the change to sg_vma_nopage() between
> > lk 2.6.15 and lk 2.6.16 also seems to be correct.
> > The most indented part of that function has been
> > changed from incrementing the change count on the
> > reported page (as indicated by 'offset') in a
> > compound page allocation to ignoring the 'offset'
> > and incrementing the page count on the first page
> > in a compound page allocation.
>
> Fajun,
> Could you please try the patch below and report if it fixes
> your sg mmap problem. The patch is against lk 2.6.18-rc4
> and I assume it will apply against "rc2" as well.
> The patch re-instates the former logic and fixes the
> problem in my tests.
>
> Doug Gilbert
>
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-28 21:12     ` Fajun Chen
@ 2006-08-28 22:11       ` Douglas Gilbert
  0 siblings, 0 replies; 19+ messages in thread
From: Douglas Gilbert @ 2006-08-28 22:11 UTC (permalink / raw)
  To: Fajun Chen; +Cc: linux-scsi

Fajun Chen wrote:
> Hi Doug,
> 
> Forgot to report this. I've tested your patch and it works.

Fajun,
That is good to hear as that patch appeared in lk 2.6.18-rc5
today.

Doug Gilbert


> On 8/18/06, Douglas Gilbert <dougg@torque.net> wrote:
>> Douglas Gilbert wrote:
>> > Fajun Chen wrote:
>> >> Hi Folks,
>> >>
>> >> I use ATA pass through via sg ioctl interface for data read/write.
>> >> Linux 2.6.18-rc2 patched with Jeff Garzik's libata git patch was
>> >> running on ARM IOP80321 board. The HBA Sil3124 was used.
>> >>
>> >> Two problems were observed:
>> >> 1. sg mmap bug?
>> >>    My test program could not write data correctly to the mmapped
>> >> buffer in the user space.      The program did a read immediately
>> >> after a write and the data mismatches.  Swapped the sg_vma_nopage()
>> >> function with the one in 2.6.15.4 release fixed the problem. So this
>> >> appears to be a wrong change to the sg mmap code in 2.6.18-rc2
>> >> release.
>> >
>> > Thanks for the report. I can confirm that mmap-ed IO
>> > in the sg driver is broken. Simply reading 16 blocks
>> > from some arbitrary offset with sg_dd and sgm_dd
>> > and comparing the fetched data shows mismatches starting
>> > above the first page (i.e. above byte offset 4096 on
>> > i386).
>> >
>> > Your point about the change to sg_vma_nopage() between
>> > lk 2.6.15 and lk 2.6.16 also seems to be correct.
>> > The most indented part of that function has been
>> > changed from incrementing the change count on the
>> > reported page (as indicated by 'offset') in a
>> > compound page allocation to ignoring the 'offset'
>> > and incrementing the page count on the first page
>> > in a compound page allocation.
>>
>> Fajun,
>> Could you please try the patch below and report if it fixes
>> your sg mmap problem. The patch is against lk 2.6.18-rc4
>> and I assume it will apply against "rc2" as well.
>> The patch re-instates the former logic and fixes the
>> problem in my tests.
>>
>> Doug Gilbert
>>
>>
>>
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Bugs on Linux 2.6.18-rc2 sg code?
  2006-08-20 17:47                 ` Andrew Morton
@ 2006-08-30 16:29                   ` Fajun Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Fajun Chen @ 2006-08-30 16:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Arjan van de Ven, dougg, Matthew Wilcox, linux-scsi,
	James.Bottomley

Since we are scrutinize sg page alloc code, I have a question here. Is
there any way to make the sg pages allocated noncacheable? If we use
sg in mmap mode, the reserved buffer could be changed by user space,
hardware as well as kernel.  Uncached kernel access to these pages can
make PIO read/writes cache coherent.

Thanks,
Fajun

On 8/20/06, Andrew Morton <akpm@osdl.org> wrote:
> On Sun, 20 Aug 2006 12:07:21 +0200
> Arjan van de Ven <arjan@fenrus.demon.nl> wrote:
>
> > On Sat, 2006-08-19 at 21:00 -0400, Douglas Gilbert wrote:
> > > Matthew Wilcox wrote:
> > > > On Sat, Aug 19, 2006 at 10:41:09AM -0700, Andrew Morton wrote:
> > > >> ow, GFP_ATOMIC is bad.  Can we avoid that?
> > > >
> > > > I believe all the GFP_ATOMIC calls in sg.c can be downgraded to
> > > > GFP_KERNEL.
> > >
> > > Well, if memory serves, we have been around this loop
> > > before. The GFP_ATOMIC flags were removed, users
> > > complained about unexplained, unbounded waits unrelated
> > > to the SCSI transport or device.
> >
> > this is not a fair answer... in a way GFP_ATOMIC is an antisocial
> > behavior that should only be done when there is an absolute need. You'll
> > dip into the VM reserves by using it after all!
>
> Thing is, SG is (for some reason) using order-3 allocations.  Those can
> indeed send the VM into a large amount of reclaim when combined with
> GFP_KERNEL.
>
> SG will fall back to lower-order pages if the higher-order attempt didn't
> work out.
>
> If we were to fine-tune this then perhaps:
>
>         alloc_pages(GFP_ATOMIC & ~__GFP_HIGH, 3)
>         alloc_pages(GFP_ATOMIC & ~__GFP_HIGH, 2)
>         alloc_pages(GFP_ATOMIC & ~__GFP_HIGH, 1)
>         alloc_pages(GFP_KERNEL, 0)
>
> would suit.
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2006-08-30 16:29 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-11  1:43 Bugs on Linux 2.6.18-rc2 sg code? Fajun Chen
2006-08-18 22:00 ` Douglas Gilbert
2006-08-19  4:11   ` Douglas Gilbert
2006-08-19 17:20     ` Matthew Wilcox
2006-08-19 17:36       ` Douglas Gilbert
2006-08-19 17:41         ` Andrew Morton
2006-08-19 20:23           ` Matthew Wilcox
2006-08-20  1:00             ` Douglas Gilbert
2006-08-20 10:07               ` Arjan van de Ven
2006-08-20 17:47                 ` Andrew Morton
2006-08-30 16:29                   ` Fajun Chen
2006-08-19 18:55     ` James Bottomley
2006-08-20  1:36       ` Douglas Gilbert
     [not found]     ` <8202f4270608200051p688f4654ub6aecb604e0152f1@mail.gmail.com>
2006-08-20  9:01       ` Russell King
2006-08-28 21:12     ` Fajun Chen
2006-08-28 22:11       ` Douglas Gilbert
2006-08-21 21:07   ` Fajun Chen
2006-08-22  2:47     ` Douglas Gilbert
2006-08-22  4:27       ` Fajun Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).