* Bugs on Linux 2.6.18-rc2 sg code?
@ 2006-08-11 1:43 Fajun Chen
2006-08-18 22:00 ` Douglas Gilbert
0 siblings, 1 reply; 19+ messages in thread
From: Fajun Chen @ 2006-08-11 1:43 UTC (permalink / raw)
To: linux-scsi; +Cc: dgilbert
Hi Folks,
I use ATA pass through via sg ioctl interface for data read/write.
Linux 2.6.18-rc2 patched with Jeff Garzik's libata git patch was
running on ARM IOP80321 board. The HBA Sil3124 was used.
Two problems were observed:
1. sg mmap bug?
My test program could not write data correctly to the mmapped
buffer in the user space. The program did a read immediately
after a write and the data mismatches. Swapped the sg_vma_nopage()
function with the one in 2.6.15.4 release fixed the problem. So this
appears to be a wrong change to the sg mmap code in 2.6.18-rc2
release.
2. sg hangs or have really slow response
Under certain unknown conditions, sg will be busy with one
read/write ioctl call for over half an hour. From scsi proc
interface, sg and scsi mid layer were processing requests as states
"act" or "rcv" was shown in /proc/scsi/sg/debug. My test program set
the command timeout to be 30 seconds, but this timeout did not trigger
the command abort.
Are these problems genuine bugs in sg 2.6.18-rc2 release? Since the
problem is reproducible in my test hardware, please let me know if any
log/traces can be collected.
BTW, scsi logging through proc FS seems to be broken as well even
though SCSI logging and Proc FS are enabled in my 2.6.18-rc2 kernel
config.
Thanks,
Fajun
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-11 1:43 Bugs on Linux 2.6.18-rc2 sg code? Fajun Chen @ 2006-08-18 22:00 ` Douglas Gilbert 2006-08-19 4:11 ` Douglas Gilbert 2006-08-21 21:07 ` Fajun Chen 0 siblings, 2 replies; 19+ messages in thread From: Douglas Gilbert @ 2006-08-18 22:00 UTC (permalink / raw) To: Fajun Chen; +Cc: linux-scsi Fajun Chen wrote: > Hi Folks, > > I use ATA pass through via sg ioctl interface for data read/write. > Linux 2.6.18-rc2 patched with Jeff Garzik's libata git patch was > running on ARM IOP80321 board. The HBA Sil3124 was used. > > Two problems were observed: > 1. sg mmap bug? > My test program could not write data correctly to the mmapped > buffer in the user space. The program did a read immediately > after a write and the data mismatches. Swapped the sg_vma_nopage() > function with the one in 2.6.15.4 release fixed the problem. So this > appears to be a wrong change to the sg mmap code in 2.6.18-rc2 > release. Thanks for the report. I can confirm that mmap-ed IO in the sg driver is broken. Simply reading 16 blocks from some arbitrary offset with sg_dd and sgm_dd and comparing the fetched data shows mismatches starting above the first page (i.e. above byte offset 4096 on i386). Your point about the change to sg_vma_nopage() between lk 2.6.15 and lk 2.6.16 also seems to be correct. The most indented part of that function has been changed from incrementing the change count on the reported page (as indicated by 'offset') in a compound page allocation to ignoring the 'offset' and incrementing the page count on the first page in a compound page allocation. > 2. sg hangs or have really slow response > Under certain unknown conditions, sg will be busy with one > read/write ioctl call for over half an hour. From scsi proc > interface, sg and scsi mid layer were processing requests as states > "act" or "rcv" was shown in /proc/scsi/sg/debug. My test program set > the command timeout to be 30 seconds, but this timeout did not trigger > the command abort. Well yes, I have noticed on transport errors, the attempt to abort the command (or use a larger hammer) fails and the command just keeps on chugging. The "elapsed" time in the /proc/scsi/sg/debug just keeps on growing, typically to a value much larger than the given timeout. > Are these problems genuine bugs in sg 2.6.18-rc2 release? Since the > problem is reproducible in my test hardware, please let me know if any > log/traces can be collected. I was able to confirm the breakage in point 1) with a recent SATA disk with an old SIL 3112 controller and with a SAS disk with some pretty recent MPT Fusion hardware. That was enough for me. > BTW, scsi logging through proc FS seems to be broken as well even > though SCSI logging and Proc FS are enabled in my 2.6.18-rc2 kernel > config. Looks like we should be doing a lot more testing, especially when our friends from other kernel areas offer to clean up our code and remove cruft. Doug Gilbert ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-18 22:00 ` Douglas Gilbert @ 2006-08-19 4:11 ` Douglas Gilbert 2006-08-19 17:20 ` Matthew Wilcox ` (3 more replies) 2006-08-21 21:07 ` Fajun Chen 1 sibling, 4 replies; 19+ messages in thread From: Douglas Gilbert @ 2006-08-19 4:11 UTC (permalink / raw) To: Fajun Chen; +Cc: linux-scsi, akpm [-- Attachment #1: Type: text/plain, Size: 1618 bytes --] Douglas Gilbert wrote: > Fajun Chen wrote: >> Hi Folks, >> >> I use ATA pass through via sg ioctl interface for data read/write. >> Linux 2.6.18-rc2 patched with Jeff Garzik's libata git patch was >> running on ARM IOP80321 board. The HBA Sil3124 was used. >> >> Two problems were observed: >> 1. sg mmap bug? >> My test program could not write data correctly to the mmapped >> buffer in the user space. The program did a read immediately >> after a write and the data mismatches. Swapped the sg_vma_nopage() >> function with the one in 2.6.15.4 release fixed the problem. So this >> appears to be a wrong change to the sg mmap code in 2.6.18-rc2 >> release. > > Thanks for the report. I can confirm that mmap-ed IO > in the sg driver is broken. Simply reading 16 blocks > from some arbitrary offset with sg_dd and sgm_dd > and comparing the fetched data shows mismatches starting > above the first page (i.e. above byte offset 4096 on > i386). > > Your point about the change to sg_vma_nopage() between > lk 2.6.15 and lk 2.6.16 also seems to be correct. > The most indented part of that function has been > changed from incrementing the change count on the > reported page (as indicated by 'offset') in a > compound page allocation to ignoring the 'offset' > and incrementing the page count on the first page > in a compound page allocation. Fajun, Could you please try the patch below and report if it fixes your sg mmap problem. The patch is against lk 2.6.18-rc4 and I assume it will apply against "rc2" as well. The patch re-instates the former logic and fixes the problem in my tests. Doug Gilbert [-- Attachment #2: sg2618rc4nopage.diff --] [-- Type: text/x-patch, Size: 950 bytes --] --- linux/drivers/scsi/sg.c 2006-07-16 08:19:19.000000000 -0400 +++ linux/drivers/scsi/sg.c2618rc4nopage 2006-08-18 23:40:15.000000000 -0400 @@ -18,8 +18,8 @@ * */ -static int sg_version_num = 30533; /* 2 digits for each component */ -#define SG_VERSION_STR "3.5.33" +static int sg_version_num = 30534; /* 2 digits for each component */ +#define SG_VERSION_STR "3.5.34" /* * D. P. Gilbert (dgilbert@interlog.com, dougg@triode.net.au), notes: @@ -60,7 +60,7 @@ #ifdef CONFIG_SCSI_PROC_FS #include <linux/proc_fs.h> -static char *sg_version_date = "20050908"; +static char *sg_version_date = "20060818"; static int sg_proc_init(void); static void sg_proc_cleanup(void); @@ -1164,7 +1164,7 @@ len = vma->vm_end - sa; len = (len < sg->length) ? len : sg->length; if (offset < len) { - page = sg->page; + page = virt_to_page(page_address(sg->page) + offset); get_page(page); /* increment page count */ break; } ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-19 4:11 ` Douglas Gilbert @ 2006-08-19 17:20 ` Matthew Wilcox 2006-08-19 17:36 ` Douglas Gilbert 2006-08-19 18:55 ` James Bottomley ` (2 subsequent siblings) 3 siblings, 1 reply; 19+ messages in thread From: Matthew Wilcox @ 2006-08-19 17:20 UTC (permalink / raw) To: Douglas Gilbert; +Cc: Fajun Chen, linux-scsi, akpm On Sat, Aug 19, 2006 at 12:11:34AM -0400, Douglas Gilbert wrote: > > Your point about the change to sg_vma_nopage() between > > lk 2.6.15 and lk 2.6.16 also seems to be correct. > > The most indented part of that function has been > > changed from incrementing the change count on the > > reported page (as indicated by 'offset') in a > > compound page allocation to ignoring the 'offset' > > and incrementing the page count on the first page > > in a compound page allocation. > if (offset < len) { > - page = sg->page; > + page = virt_to_page(page_address(sg->page) + offset); > get_page(page); /* increment page count */ But page_address can return NULL on highmem machines. Or are you guaranteed that these are lowmem pages? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-19 17:20 ` Matthew Wilcox @ 2006-08-19 17:36 ` Douglas Gilbert 2006-08-19 17:41 ` Andrew Morton 0 siblings, 1 reply; 19+ messages in thread From: Douglas Gilbert @ 2006-08-19 17:36 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Fajun Chen, linux-scsi, akpm Matthew Wilcox wrote: > On Sat, Aug 19, 2006 at 12:11:34AM -0400, Douglas Gilbert wrote: >>> Your point about the change to sg_vma_nopage() between >>> lk 2.6.15 and lk 2.6.16 also seems to be correct. >>> The most indented part of that function has been >>> changed from incrementing the change count on the >>> reported page (as indicated by 'offset') in a >>> compound page allocation to ignoring the 'offset' >>> and incrementing the page count on the first page >>> in a compound page allocation. >> if (offset < len) { >> - page = sg->page; >> + page = virt_to_page(page_address(sg->page) + offset); >> get_page(page); /* increment page count */ > > But page_address can return NULL on highmem machines. Or are you > guaranteed that these are lowmem pages? Matthew, The allocation is typically compound done with: alloc_pages(GFP_ATOMIC|__GFP_COMP|__GFP_NOWARN, order) where 'order' is 3 (hence 32 KB). The lack of a '__GFP_HIGHMEM' flag should mean these are lowmem pages (?). The (compound) pages in question would have already been mmap()-ed by the user. Doug Gilbert ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-19 17:36 ` Douglas Gilbert @ 2006-08-19 17:41 ` Andrew Morton 2006-08-19 20:23 ` Matthew Wilcox 0 siblings, 1 reply; 19+ messages in thread From: Andrew Morton @ 2006-08-19 17:41 UTC (permalink / raw) To: dougg; +Cc: Matthew Wilcox, Fajun Chen, linux-scsi On Sat, 19 Aug 2006 13:36:33 -0400 Douglas Gilbert <dougg@torque.net> wrote: > Matthew Wilcox wrote: > > On Sat, Aug 19, 2006 at 12:11:34AM -0400, Douglas Gilbert wrote: > >>> Your point about the change to sg_vma_nopage() between > >>> lk 2.6.15 and lk 2.6.16 also seems to be correct. > >>> The most indented part of that function has been > >>> changed from incrementing the change count on the > >>> reported page (as indicated by 'offset') in a > >>> compound page allocation to ignoring the 'offset' > >>> and incrementing the page count on the first page > >>> in a compound page allocation. > >> if (offset < len) { > >> - page = sg->page; > >> + page = virt_to_page(page_address(sg->page) + offset); > >> get_page(page); /* increment page count */ > > > > But page_address can return NULL on highmem machines. Or are you > > guaranteed that these are lowmem pages? > > Matthew, > The allocation is typically compound done with: > alloc_pages(GFP_ATOMIC|__GFP_COMP|__GFP_NOWARN, order) > where 'order' is 3 (hence 32 KB). ow, GFP_ATOMIC is bad. Can we avoid that? > The lack of a '__GFP_HIGHMEM' flag should mean these are > lowmem pages (?). Yup. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-19 17:41 ` Andrew Morton @ 2006-08-19 20:23 ` Matthew Wilcox 2006-08-20 1:00 ` Douglas Gilbert 0 siblings, 1 reply; 19+ messages in thread From: Matthew Wilcox @ 2006-08-19 20:23 UTC (permalink / raw) To: Andrew Morton; +Cc: dougg, Fajun Chen, linux-scsi On Sat, Aug 19, 2006 at 10:41:09AM -0700, Andrew Morton wrote: > ow, GFP_ATOMIC is bad. Can we avoid that? I believe all the GFP_ATOMIC calls in sg.c can be downgraded to GFP_KERNEL. sg_add_sfp is only called from sg_open(), so the GFP_ATOMIC there can be downgraded to GFP_KERNEL. There's one in sg_common_write that can be downgraded. It's called from sg_write and sg_ioctl (via sg_new_write). No locks are held at those points. st_map_user_pages uses GFP_ATOMIC. It's only called from sg_build_direct, which is only called from sg_start_req, which is only called from sg_new_write, which is safe per above. sg_page_malloc uses GFP_ATOMIC. It's only called from sg_build_indirect. sg_build_indirect is called from sg_start_req (previously shown safe), and sg_build_reserve. sg_build_reserve is safe as it's only called from sg_ioctl and sg_add_sfp (previously shown safe). sg_build_sgat is called from sg_build_direct and sg_build_indirect, both of which have previously been shown safe. Please check my reasoning, but I'm pretty confident that none of these paths are called with any locks held. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-19 20:23 ` Matthew Wilcox @ 2006-08-20 1:00 ` Douglas Gilbert 2006-08-20 10:07 ` Arjan van de Ven 0 siblings, 1 reply; 19+ messages in thread From: Douglas Gilbert @ 2006-08-20 1:00 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Andrew Morton, Fajun Chen, linux-scsi, James.Bottomley Matthew Wilcox wrote: > On Sat, Aug 19, 2006 at 10:41:09AM -0700, Andrew Morton wrote: >> ow, GFP_ATOMIC is bad. Can we avoid that? > > I believe all the GFP_ATOMIC calls in sg.c can be downgraded to > GFP_KERNEL. Well, if memory serves, we have been around this loop before. The GFP_ATOMIC flags were removed, users complained about unexplained, unbounded waits unrelated to the SCSI transport or device. So the GFP_ATOMIC flags were put back again. The semantics of the function that calls alloc_pages() is to cope with a (fast) failure, and if it is an ambit claim (e.g. the reserve buffer allocation on open() ) then lower the order by 1 (halve the requested allocation) then try again. The semantics of SCSI command injection in the sg driver have been to fetch the resources needed and inject the command to the mid level as quickly as possible. If there is a resource problem, exit and tell the user space promptly. So the action required seems to be __GFP_NORETRY if that works as its name suggests. > sg_add_sfp is only called from sg_open(), so the GFP_ATOMIC there can > be downgraded to GFP_KERNEL. That is called on open() so GFP_KERNEL | __GFP_NORETRY should be ok. > There's one in sg_common_write that can be downgraded. It's called > from sg_write and sg_ioctl (via sg_new_write). No locks are held at > those points. That one is the last argument to scsi_execute_async(). I didn't add that and don't know what it is for. Perhaps James could comment. > st_map_user_pages uses GFP_ATOMIC. It's only called from > sg_build_direct, which is only called from sg_start_req, which is only > called from sg_new_write, which is safe per above. GFP_KERNEL | __GFP_NORETRY > sg_page_malloc uses GFP_ATOMIC. It's only called from sg_build_indirect. > sg_build_indirect is called from sg_start_req (previously shown safe), > and sg_build_reserve. sg_build_reserve is safe as it's only called from > sg_ioctl and sg_add_sfp (previously shown safe). > > sg_build_sgat is called from sg_build_direct and sg_build_indirect, > both of which have previously been shown safe. > > Please check my reasoning, but I'm pretty confident that none of these > paths are called with any locks held. None of the kmalloc()s or alloc_pages() are called from softirq callback ( sg_cmd_done() in sg's case) that I can see. I don't think locks taken by the sg driver should be a problem. BTW I'm initially trying to fix a bug added by someone else into the sg driver in lk 2.6.16 . The code I'm proposing isn't optimal but, by reverting the code (more or less) to what it was, at least I know it was well tested :-) Thus I was hoping to get the fix into lk 2.6.18 . The other changes being discussed could go into the lk 2.6.19 cycle. Doug Gilbert ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-20 1:00 ` Douglas Gilbert @ 2006-08-20 10:07 ` Arjan van de Ven 2006-08-20 17:47 ` Andrew Morton 0 siblings, 1 reply; 19+ messages in thread From: Arjan van de Ven @ 2006-08-20 10:07 UTC (permalink / raw) To: dougg Cc: Matthew Wilcox, Andrew Morton, Fajun Chen, linux-scsi, James.Bottomley On Sat, 2006-08-19 at 21:00 -0400, Douglas Gilbert wrote: > Matthew Wilcox wrote: > > On Sat, Aug 19, 2006 at 10:41:09AM -0700, Andrew Morton wrote: > >> ow, GFP_ATOMIC is bad. Can we avoid that? > > > > I believe all the GFP_ATOMIC calls in sg.c can be downgraded to > > GFP_KERNEL. > > Well, if memory serves, we have been around this loop > before. The GFP_ATOMIC flags were removed, users > complained about unexplained, unbounded waits unrelated > to the SCSI transport or device. this is not a fair answer... in a way GFP_ATOMIC is an antisocial behavior that should only be done when there is an absolute need. You'll dip into the VM reserves by using it after all! And for normal allocations, people who get the kernel to ask for a lot of memory need to do their fair share of waiting for memory, just to have write throttling work correctly... ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-20 10:07 ` Arjan van de Ven @ 2006-08-20 17:47 ` Andrew Morton 2006-08-30 16:29 ` Fajun Chen 0 siblings, 1 reply; 19+ messages in thread From: Andrew Morton @ 2006-08-20 17:47 UTC (permalink / raw) To: Arjan van de Ven Cc: dougg, Matthew Wilcox, Fajun Chen, linux-scsi, James.Bottomley On Sun, 20 Aug 2006 12:07:21 +0200 Arjan van de Ven <arjan@fenrus.demon.nl> wrote: > On Sat, 2006-08-19 at 21:00 -0400, Douglas Gilbert wrote: > > Matthew Wilcox wrote: > > > On Sat, Aug 19, 2006 at 10:41:09AM -0700, Andrew Morton wrote: > > >> ow, GFP_ATOMIC is bad. Can we avoid that? > > > > > > I believe all the GFP_ATOMIC calls in sg.c can be downgraded to > > > GFP_KERNEL. > > > > Well, if memory serves, we have been around this loop > > before. The GFP_ATOMIC flags were removed, users > > complained about unexplained, unbounded waits unrelated > > to the SCSI transport or device. > > this is not a fair answer... in a way GFP_ATOMIC is an antisocial > behavior that should only be done when there is an absolute need. You'll > dip into the VM reserves by using it after all! Thing is, SG is (for some reason) using order-3 allocations. Those can indeed send the VM into a large amount of reclaim when combined with GFP_KERNEL. SG will fall back to lower-order pages if the higher-order attempt didn't work out. If we were to fine-tune this then perhaps: alloc_pages(GFP_ATOMIC & ~__GFP_HIGH, 3) alloc_pages(GFP_ATOMIC & ~__GFP_HIGH, 2) alloc_pages(GFP_ATOMIC & ~__GFP_HIGH, 1) alloc_pages(GFP_KERNEL, 0) would suit. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-20 17:47 ` Andrew Morton @ 2006-08-30 16:29 ` Fajun Chen 0 siblings, 0 replies; 19+ messages in thread From: Fajun Chen @ 2006-08-30 16:29 UTC (permalink / raw) To: Andrew Morton Cc: Arjan van de Ven, dougg, Matthew Wilcox, linux-scsi, James.Bottomley Since we are scrutinize sg page alloc code, I have a question here. Is there any way to make the sg pages allocated noncacheable? If we use sg in mmap mode, the reserved buffer could be changed by user space, hardware as well as kernel. Uncached kernel access to these pages can make PIO read/writes cache coherent. Thanks, Fajun On 8/20/06, Andrew Morton <akpm@osdl.org> wrote: > On Sun, 20 Aug 2006 12:07:21 +0200 > Arjan van de Ven <arjan@fenrus.demon.nl> wrote: > > > On Sat, 2006-08-19 at 21:00 -0400, Douglas Gilbert wrote: > > > Matthew Wilcox wrote: > > > > On Sat, Aug 19, 2006 at 10:41:09AM -0700, Andrew Morton wrote: > > > >> ow, GFP_ATOMIC is bad. Can we avoid that? > > > > > > > > I believe all the GFP_ATOMIC calls in sg.c can be downgraded to > > > > GFP_KERNEL. > > > > > > Well, if memory serves, we have been around this loop > > > before. The GFP_ATOMIC flags were removed, users > > > complained about unexplained, unbounded waits unrelated > > > to the SCSI transport or device. > > > > this is not a fair answer... in a way GFP_ATOMIC is an antisocial > > behavior that should only be done when there is an absolute need. You'll > > dip into the VM reserves by using it after all! > > Thing is, SG is (for some reason) using order-3 allocations. Those can > indeed send the VM into a large amount of reclaim when combined with > GFP_KERNEL. > > SG will fall back to lower-order pages if the higher-order attempt didn't > work out. > > If we were to fine-tune this then perhaps: > > alloc_pages(GFP_ATOMIC & ~__GFP_HIGH, 3) > alloc_pages(GFP_ATOMIC & ~__GFP_HIGH, 2) > alloc_pages(GFP_ATOMIC & ~__GFP_HIGH, 1) > alloc_pages(GFP_KERNEL, 0) > > would suit. > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-19 4:11 ` Douglas Gilbert 2006-08-19 17:20 ` Matthew Wilcox @ 2006-08-19 18:55 ` James Bottomley 2006-08-20 1:36 ` Douglas Gilbert [not found] ` <8202f4270608200051p688f4654ub6aecb604e0152f1@mail.gmail.com> 2006-08-28 21:12 ` Fajun Chen 3 siblings, 1 reply; 19+ messages in thread From: James Bottomley @ 2006-08-19 18:55 UTC (permalink / raw) To: dougg; +Cc: Fajun Chen, linux-scsi, akpm On Sat, 2006-08-19 at 00:11 -0400, Douglas Gilbert wrote: > @@ -1164,7 +1164,7 @@ > len = vma->vm_end - sa; > len = (len < sg->length) ? len : sg->length; > if (offset < len) { > - page = sg->page; > + page = virt_to_page(page_address(sg->page) + offset); > get_page(page); /* increment page count */ > break; > } Doing something like this always frightens people in linux because page_address() on highmem returns NULL. I know, having looked, that in this case it can't happen, but since sg->page is really an array of pages, how about something simpler like page = &sg->page[offset >> PAGE_SHIFT] or (if you want to be more correct) something like the nth_page macro in linux/mm.h? It might also be worthwhile considering GFP_HIGHUSER for this allocation (in spite of all the kmap_atomic et al that would have to be added to the code), since that will increase the chance of a contiguous allocation on large memory machines. James ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-19 18:55 ` James Bottomley @ 2006-08-20 1:36 ` Douglas Gilbert 0 siblings, 0 replies; 19+ messages in thread From: Douglas Gilbert @ 2006-08-20 1:36 UTC (permalink / raw) To: James Bottomley; +Cc: Fajun Chen, linux-scsi, akpm James Bottomley wrote: > On Sat, 2006-08-19 at 00:11 -0400, Douglas Gilbert wrote: >> @@ -1164,7 +1164,7 @@ >> len = vma->vm_end - sa; >> len = (len < sg->length) ? len : sg->length; >> if (offset < len) { >> - page = sg->page; >> + page = virt_to_page(page_address(sg->page) + offset); >> get_page(page); /* increment page count */ >> break; >> } > > Doing something like this always frightens people in linux because > page_address() on highmem returns NULL. I know, having looked, that in > this case it can't happen, but since sg->page is really an array of > pages, how about something simpler like > > page = &sg->page[offset >> PAGE_SHIFT] James, Yes I saw code like that when I reviewed other vma "nopage" callbacks. And that code frightens me :-) > or (if you want to be more correct) something like the nth_page macro in > linux/mm.h? That nth_page() macro looks better. Is it guaranteed not to return NULL? No vma "nopage" callbacks are using nth_page() so I wasn't aware of it. > It might also be worthwhile considering GFP_HIGHUSER for this allocation > (in spite of all the kmap_atomic et al that would have to be added to > the code), since that will increase the chance of a contiguous > allocation on large memory machines. Did you mean GFP_KERNEL | __GFP_HIGHMEM since the allocation in question is for a scatter gather list element that will be DMA-ed to or from? Is the HIGHMEM stuff needed in 64 bits? I looked at the kmap_atomic stuff and it just didn't look worth the effort. All good stuff for the lk 2.6.19 cycle. First I would like to fix the reported bug with code that was well tested ... Doug Gilbert ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <8202f4270608200051p688f4654ub6aecb604e0152f1@mail.gmail.com>]
* Re: Bugs on Linux 2.6.18-rc2 sg code? [not found] ` <8202f4270608200051p688f4654ub6aecb604e0152f1@mail.gmail.com> @ 2006-08-20 9:01 ` Russell King 0 siblings, 0 replies; 19+ messages in thread From: Russell King @ 2006-08-20 9:01 UTC (permalink / raw) To: Fajun Chen; +Cc: dougg, linux-scsi, akpm On Sun, Aug 20, 2006 at 01:51:33AM -0600, Fajun Chen wrote: > I've tested mmap code in both 2.6.15.4 and 2.6.18-rc2 with > sg_vma_nopage() from 2.6.15.4, this particular sg mmap bug seems to be > fixed?$B!%!!#S#i#n#c#e!!#y#o#u#r!!#p#a#t#c#h!!#f#o#r!!#2!%#6!%#1#8!]#r#c#4!!#i#s!!essentially not different This is unreadable... > from 2.6.15.4 (the only difference is sg->offset, which is always 0 > for sg mmap or indirect IO), this should fix the problem. I'll test > your patch next week to confirm it anyway. > > I would like to also report a potential cache coherency issue on sg > when running on some processors other than i386. I ran into cache > coherency issue on ARM XScale Iop80321. I no longer touch SCSI, partly because the only SCSI based systems I have are extremely (10 years) old now. All my SCSI knowledge is based on ancient 2.4 code and is no longer relevant. > Existing sg code for direct IO: I don't think direct-IO has ever been tested and debugged on ARM either. I wouldn't know where to start with direct-IO - never used and never looked at it. And more importantly (for me), I'd rather not get into another squabble with James over cache coherency ideas, so I want to stay well away from cache coherency and block devices, thanks. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-19 4:11 ` Douglas Gilbert ` (2 preceding siblings ...) [not found] ` <8202f4270608200051p688f4654ub6aecb604e0152f1@mail.gmail.com> @ 2006-08-28 21:12 ` Fajun Chen 2006-08-28 22:11 ` Douglas Gilbert 3 siblings, 1 reply; 19+ messages in thread From: Fajun Chen @ 2006-08-28 21:12 UTC (permalink / raw) To: dougg; +Cc: linux-scsi, akpm Hi Doug, Forgot to report this. I've tested your patch and it works. Thanks, Fajun On 8/18/06, Douglas Gilbert <dougg@torque.net> wrote: > Douglas Gilbert wrote: > > Fajun Chen wrote: > >> Hi Folks, > >> > >> I use ATA pass through via sg ioctl interface for data read/write. > >> Linux 2.6.18-rc2 patched with Jeff Garzik's libata git patch was > >> running on ARM IOP80321 board. The HBA Sil3124 was used. > >> > >> Two problems were observed: > >> 1. sg mmap bug? > >> My test program could not write data correctly to the mmapped > >> buffer in the user space. The program did a read immediately > >> after a write and the data mismatches. Swapped the sg_vma_nopage() > >> function with the one in 2.6.15.4 release fixed the problem. So this > >> appears to be a wrong change to the sg mmap code in 2.6.18-rc2 > >> release. > > > > Thanks for the report. I can confirm that mmap-ed IO > > in the sg driver is broken. Simply reading 16 blocks > > from some arbitrary offset with sg_dd and sgm_dd > > and comparing the fetched data shows mismatches starting > > above the first page (i.e. above byte offset 4096 on > > i386). > > > > Your point about the change to sg_vma_nopage() between > > lk 2.6.15 and lk 2.6.16 also seems to be correct. > > The most indented part of that function has been > > changed from incrementing the change count on the > > reported page (as indicated by 'offset') in a > > compound page allocation to ignoring the 'offset' > > and incrementing the page count on the first page > > in a compound page allocation. > > Fajun, > Could you please try the patch below and report if it fixes > your sg mmap problem. The patch is against lk 2.6.18-rc4 > and I assume it will apply against "rc2" as well. > The patch re-instates the former logic and fixes the > problem in my tests. > > Doug Gilbert > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-28 21:12 ` Fajun Chen @ 2006-08-28 22:11 ` Douglas Gilbert 0 siblings, 0 replies; 19+ messages in thread From: Douglas Gilbert @ 2006-08-28 22:11 UTC (permalink / raw) To: Fajun Chen; +Cc: linux-scsi Fajun Chen wrote: > Hi Doug, > > Forgot to report this. I've tested your patch and it works. Fajun, That is good to hear as that patch appeared in lk 2.6.18-rc5 today. Doug Gilbert > On 8/18/06, Douglas Gilbert <dougg@torque.net> wrote: >> Douglas Gilbert wrote: >> > Fajun Chen wrote: >> >> Hi Folks, >> >> >> >> I use ATA pass through via sg ioctl interface for data read/write. >> >> Linux 2.6.18-rc2 patched with Jeff Garzik's libata git patch was >> >> running on ARM IOP80321 board. The HBA Sil3124 was used. >> >> >> >> Two problems were observed: >> >> 1. sg mmap bug? >> >> My test program could not write data correctly to the mmapped >> >> buffer in the user space. The program did a read immediately >> >> after a write and the data mismatches. Swapped the sg_vma_nopage() >> >> function with the one in 2.6.15.4 release fixed the problem. So this >> >> appears to be a wrong change to the sg mmap code in 2.6.18-rc2 >> >> release. >> > >> > Thanks for the report. I can confirm that mmap-ed IO >> > in the sg driver is broken. Simply reading 16 blocks >> > from some arbitrary offset with sg_dd and sgm_dd >> > and comparing the fetched data shows mismatches starting >> > above the first page (i.e. above byte offset 4096 on >> > i386). >> > >> > Your point about the change to sg_vma_nopage() between >> > lk 2.6.15 and lk 2.6.16 also seems to be correct. >> > The most indented part of that function has been >> > changed from incrementing the change count on the >> > reported page (as indicated by 'offset') in a >> > compound page allocation to ignoring the 'offset' >> > and incrementing the page count on the first page >> > in a compound page allocation. >> >> Fajun, >> Could you please try the patch below and report if it fixes >> your sg mmap problem. The patch is against lk 2.6.18-rc4 >> and I assume it will apply against "rc2" as well. >> The patch re-instates the former logic and fixes the >> problem in my tests. >> >> Doug Gilbert >> >> >> > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-18 22:00 ` Douglas Gilbert 2006-08-19 4:11 ` Douglas Gilbert @ 2006-08-21 21:07 ` Fajun Chen 2006-08-22 2:47 ` Douglas Gilbert 1 sibling, 1 reply; 19+ messages in thread From: Fajun Chen @ 2006-08-21 21:07 UTC (permalink / raw) To: dougg; +Cc: linux-scsi Hi Doug, I ran into sg hang problem again while running direct IO read/write. Here are some logs collected. I set scsi_logging_level to 63 and dmesg was collected right after I noticed invalid elap number. I hope this will provide some information for your debugging. Thanks, Fajun Logs: ~ $ cat /proc/scsi/sg/debug dev_max(currently)=32 max_active_device=2 (origin 1) def_reserved_size=32768 >>> device=sg0 scsi0 chan=0 id=0 lun=0 em=1 sg_tablesize=128 excl=0 FD(1): timeout=60000ms bufflen=131072 (res)sgat=4 low_dma=0 cmd_q=1 f_packid=0 k_orphan=0 closed=0 act: id=0 blen=0 t_o/elap=60000/458503350ms sgat=0 op=0x85 ~ $ dmesg ne: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 sg_finish_rem_req: res_used=0 sg_remove_scat: k_use_sg=16 sg_ioctl: sg0, cmd=0x2285 scsi_block_when_processing_errors: rtn: 1 sg_common_write: scsi opcode=0x85, cmd_size=16 sg_start_req: dxfer_len=65536 scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) scsi_delete_timer: scmd: c080f560, rtn: 1 sg_cmd_done: sg0, pack_id=0, res=0x8000002 On 8/18/06, Douglas Gilbert <dougg@torque.net> wrote: > Fajun Chen wrote: > > Hi Folks, > > > > I use ATA pass through via sg ioctl interface for data read/write. > > Linux 2.6.18-rc2 patched with Jeff Garzik's libata git patch was > > running on ARM IOP80321 board. The HBA Sil3124 was used. > > > > Two problems were observed: > > 1. sg mmap bug? > > My test program could not write data correctly to the mmapped > > buffer in the user space. The program did a read immediately > > after a write and the data mismatches. Swapped the sg_vma_nopage() > > function with the one in 2.6.15.4 release fixed the problem. So this > > appears to be a wrong change to the sg mmap code in 2.6.18-rc2 > > release. > > Thanks for the report. I can confirm that mmap-ed IO > in the sg driver is broken. Simply reading 16 blocks > from some arbitrary offset with sg_dd and sgm_dd > and comparing the fetched data shows mismatches starting > above the first page (i.e. above byte offset 4096 on > i386). > > Your point about the change to sg_vma_nopage() between > lk 2.6.15 and lk 2.6.16 also seems to be correct. > The most indented part of that function has been > changed from incrementing the change count on the > reported page (as indicated by 'offset') in a > compound page allocation to ignoring the 'offset' > and incrementing the page count on the first page > in a compound page allocation. > > > 2. sg hangs or have really slow response > > Under certain unknown conditions, sg will be busy with one > > read/write ioctl call for over half an hour. From scsi proc > > interface, sg and scsi mid layer were processing requests as states > > "act" or "rcv" was shown in /proc/scsi/sg/debug. My test program set > > the command timeout to be 30 seconds, but this timeout did not trigger > > the command abort. > > Well yes, I have noticed on transport errors, the attempt > to abort the command (or use a larger hammer) fails and > the command just keeps on chugging. The "elapsed" time > in the /proc/scsi/sg/debug just keeps on growing, typically > to a value much larger than the given timeout. > > > Are these problems genuine bugs in sg 2.6.18-rc2 release? Since the > > problem is reproducible in my test hardware, please let me know if any > > log/traces can be collected. > > I was able to confirm the breakage in point 1) > with a recent SATA disk with an old SIL 3112 controller > and with a SAS disk with some pretty recent MPT Fusion > hardware. That was enough for me. > > > BTW, scsi logging through proc FS seems to be broken as well even > > though SCSI logging and Proc FS are enabled in my 2.6.18-rc2 kernel > > config. > > Looks like we should be doing a lot more testing, > especially when our friends from other kernel areas > offer to clean up our code and remove cruft. > > Doug Gilbert > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-21 21:07 ` Fajun Chen @ 2006-08-22 2:47 ` Douglas Gilbert 2006-08-22 4:27 ` Fajun Chen 0 siblings, 1 reply; 19+ messages in thread From: Douglas Gilbert @ 2006-08-22 2:47 UTC (permalink / raw) To: Fajun Chen; +Cc: linux-scsi Fajun Chen wrote: > Hi Doug, > > I ran into sg hang problem again while running direct IO read/write. > Here are some logs collected. I set scsi_logging_level to 63 and dmesg > was collected right after I noticed invalid elap number. I hope this > will provide some information for your debugging. Fajun, As I said earlier I don't think that this is a sg driver problem (well not directly). The sg driver sets up a a SCSI command, complete with a timeout (60 seconds in your case) and a callback. The sg driver then waits the whole weekend (judging from the elapsed time) and hears nothing. The timeout is administered by the midlevel and when it goes off the mid level asks the LLD to abort the command (and if that fails it tries a few other things). Alternatively the LLD can define its own exception handler. Meanwhile the sg driver waits passively for that command to finish. So it looks like you are sending ATA read and write commands through the ATA PASS THROUGH (16) SCSI command (opcode 0x85). All the commands shown in the sg debug output terminate with CHECK CONDITION (res=0x8000002). That doesn't look correct for read and write commands. Have you set the CK_COND bit? I can see if an ATA read fails you may need to fetch the lba registers. Surely the SATL should give you an ATA Return (sense) descriptor if an ATA command fails in command 0x85, but I couldn't see that written in the SAT draft. I'm just curious about that point. It probably has no bearing on the hang, but returning sense data when there is no need may impact performance. Doug Gilbert > Logs: > ~ $ cat /proc/scsi/sg/debug > dev_max(currently)=32 max_active_device=2 (origin 1) > def_reserved_size=32768 >>>> device=sg0 scsi0 chan=0 id=0 lun=0 em=1 sg_tablesize=128 excl=0 > FD(1): timeout=60000ms bufflen=131072 (res)sgat=4 low_dma=0 > cmd_q=1 f_packid=0 k_orphan=0 closed=0 > act: id=0 blen=0 t_o/elap=60000/458503350ms sgat=0 op=0x85 BTW The above is normal (i.e. indirect) IO trying to move up to 128 KB of data (with a four element scatter gather list, each element 32 KB). An ATA command is being sent via the SAT defined pass through. The elapsed time is 127 hours (over 5 days). > ~ $ dmesg > ne: sg0, pack_id=0, res=0x8000002 > sg_finish_rem_req: res_used=0 > sg_remove_scat: k_use_sg=16 > sg_ioctl: sg0, cmd=0x2285 > scsi_block_when_processing_errors: rtn: 1 > sg_common_write: scsi opcode=0x85, cmd_size=16 > sg_start_req: dxfer_len=65536 > scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) > scsi_delete_timer: scmd: c080f560, rtn: 1 > sg_cmd_done: sg0, pack_id=0, res=0x8000002 ^^^^^^^^^^^^^ <snip> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bugs on Linux 2.6.18-rc2 sg code? 2006-08-22 2:47 ` Douglas Gilbert @ 2006-08-22 4:27 ` Fajun Chen 0 siblings, 0 replies; 19+ messages in thread From: Fajun Chen @ 2006-08-22 4:27 UTC (permalink / raw) To: dougg; +Cc: linux-scsi Hi Doug, Thank you for the detailed explanation. I still have some doubts/questions: 1. This sg hang problem happened and logs were collected shortly after the test was started (less than an hour), so elap time for 127 hours seems to be miscalculated somewhere. 2. I didn't set CK_COND bit in ATA Pass Through command for read/write, could the CHECK CONDITION be the result of command failure? 3. From application level trace, my test application was blocked in sg IOCTL call all along. From sg traces "sg_cmd_done: sg0, pack_id=0, res=0x8000002", it seems sg did get sg_cmd_done() callback. In this case, why my application was NOT unblocked? Note that my application only opens one sg file and it will not send additional commands before the current one is finished. 4. From your reply, middle layer is the one administrating timeout. But for the sake of defense, should upper layers have it's own timeout as well? I am thinking about add timeout in my application to defense low level breakdown. Thanks, Fajun On 8/21/06, Douglas Gilbert <dougg@torque.net> wrote: > Fajun Chen wrote: > > Hi Doug, > > > > I ran into sg hang problem again while running direct IO read/write. > > Here are some logs collected. I set scsi_logging_level to 63 and dmesg > > was collected right after I noticed invalid elap number. I hope this > > will provide some information for your debugging. > > Fajun, > As I said earlier I don't think that this is a sg driver > problem (well not directly). The sg driver sets up a > a SCSI command, complete with a timeout (60 seconds > in your case) and a callback. The sg driver then waits > the whole weekend (judging from the elapsed time) > and hears nothing. The timeout is administered by > the midlevel and when it goes off the mid level asks > the LLD to abort the command (and if that fails it > tries a few other things). Alternatively the LLD > can define its own exception handler. Meanwhile the sg > driver waits passively for that command to finish. > > So it looks like you are sending ATA read and write > commands through the ATA PASS THROUGH (16) SCSI command > (opcode 0x85). All the commands shown in the sg debug > output terminate with CHECK CONDITION (res=0x8000002). > That doesn't look correct for read and write commands. > Have you set the CK_COND bit? I can see if an ATA read > fails you may need to fetch the lba registers. Surely > the SATL should give you an ATA Return (sense) descriptor > if an ATA command fails in command 0x85, but I couldn't > see that written in the SAT draft. > > I'm just curious about that point. It probably has no > bearing on the hang, but returning sense data when there > is no need may impact performance. > > Doug Gilbert > > > Logs: > > ~ $ cat /proc/scsi/sg/debug > > dev_max(currently)=32 max_active_device=2 (origin 1) > > def_reserved_size=32768 > >>>> device=sg0 scsi0 chan=0 id=0 lun=0 em=1 sg_tablesize=128 excl=0 > > FD(1): timeout=60000ms bufflen=131072 (res)sgat=4 low_dma=0 > > cmd_q=1 f_packid=0 k_orphan=0 closed=0 > > act: id=0 blen=0 t_o/elap=60000/458503350ms sgat=0 op=0x85 > > BTW The above is normal (i.e. indirect) IO trying to move up to > 128 KB of data (with a four element scatter gather list, each > element 32 KB). An ATA command is being sent via the SAT > defined pass through. The elapsed time is 127 hours (over 5 days). > > > ~ $ dmesg > > ne: sg0, pack_id=0, res=0x8000002 > > sg_finish_rem_req: res_used=0 > > sg_remove_scat: k_use_sg=16 > > sg_ioctl: sg0, cmd=0x2285 > > scsi_block_when_processing_errors: rtn: 1 > > sg_common_write: scsi opcode=0x85, cmd_size=16 > > sg_start_req: dxfer_len=65536 > > scsi_add_timer: scmd: c080f560, time: 6000, (c00f8a98) > > scsi_delete_timer: scmd: c080f560, rtn: 1 > > sg_cmd_done: sg0, pack_id=0, res=0x8000002 > ^^^^^^^^^^^^^ > <snip> > ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2006-08-30 16:29 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-11 1:43 Bugs on Linux 2.6.18-rc2 sg code? Fajun Chen
2006-08-18 22:00 ` Douglas Gilbert
2006-08-19 4:11 ` Douglas Gilbert
2006-08-19 17:20 ` Matthew Wilcox
2006-08-19 17:36 ` Douglas Gilbert
2006-08-19 17:41 ` Andrew Morton
2006-08-19 20:23 ` Matthew Wilcox
2006-08-20 1:00 ` Douglas Gilbert
2006-08-20 10:07 ` Arjan van de Ven
2006-08-20 17:47 ` Andrew Morton
2006-08-30 16:29 ` Fajun Chen
2006-08-19 18:55 ` James Bottomley
2006-08-20 1:36 ` Douglas Gilbert
[not found] ` <8202f4270608200051p688f4654ub6aecb604e0152f1@mail.gmail.com>
2006-08-20 9:01 ` Russell King
2006-08-28 21:12 ` Fajun Chen
2006-08-28 22:11 ` Douglas Gilbert
2006-08-21 21:07 ` Fajun Chen
2006-08-22 2:47 ` Douglas Gilbert
2006-08-22 4:27 ` Fajun Chen
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.