SATA_SIL works with 2.6.7-bk8 seagate drive, but oops

* SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
  2004-06-24 18:46 ` Ricky Beam
@ 2004-06-25 21:34   ` George Georgalis
  2004-06-25 23:16     ` Linus Torvalds
  2004-06-29  8:46     ` Sebastian Slota
  0 siblings, 2 replies; 14+ messages in thread
From: George Georgalis @ 2004-06-25 21:34 UTC (permalink / raw)
  To: Linux Kernel Mail List

[-- Attachment #1: Type: text/plain, Size: 4165 bytes --]

On Thu, Jun 24, 2004 at 02:46:39PM -0400, Ricky Beam wrote:
>On Thu, 24 Jun 2004, George Georgalis wrote:
>...
>>has caused pdflush to block IO, any access to /mnt and the process
>>does not return. other than the pdflush load of ~99% the box seems to
>>function normally. 2.6.7-bk6, seagate drive
>
>-bk6 is not new enough.  bk7 has the necessary max_sectors fix.  You
>may need to add your drive model to the sil_blacklist in
>drivers/scsi/sata_sil.c.

Okay, 2.6.7-bk8 has written 8 GB to sda4 with SATA_SIL and is still
going strong! "dd if=/dev/zero of=/mnt/zero-`date +%s`"

However, at about 3 GB (if that is relevant) top segfaulted with a
non-critical oops. top will not restart, but the box is otherwise
functioning well considering the write load.

Is there any way to determine the drive model without first connecting
with the other sata driver (as hdc) and using hdparm?


Unable to handle kernel NULL pointer dereference at virtual address 000000b4
 printing eip:
c017c78a
*pde = 00000000
Oops: 0000 [#1]
PREEMPT 
CPU:    0
EIP:    0060:[<c017c78a>]    Not tainted
EFLAGS: 00010286   (2.6.7-sta-bk8) 
EIP is at pid_alive+0xa/0x30
eax: 000000b8   ebx: d32b0310   ecx: 00000000   edx: 00000000
esi: 00000000   edi: ef7bb7a0   ebp: d22b1b40   esp: db473e4c
ds: 007b   es: 007b   ss: 0068
Process top (pid: 489, threadinfo=db472000 task=e60ac7c0)
Stack: c017cca4 00000000 d22b1b40 db473f18 ef7bb7a0 db473ec4 c0159754 d22b1b40 
       db473f18 eaa1f006 eaa1f009 db473ec4 db473f18 c0159cc5 db473f18 db473ecc 
       db473ec4 ef7b86e0 d22b1dfc ee655240 bffff000 c0141ec8 c15cd660 c013e95c 
Call Trace:
 [<c017cca4>] pid_revalidate+0x14/0xc0
 [<c0159754>] do_lookup+0x44/0x80
 [<c0159cc5>] link_path_walk+0x535/0xa20
 [<c0141ec8>] find_extend_vma+0x18/0x70
 [<c013e95c>] follow_page+0x8c/0xb0
 [<c013ea3c>] get_user_pages+0xbc/0x3d0
 [<c015a406>] path_lookup+0x86/0x1a0
 [<c015a6a9>] __user_walk+0x39/0x70
 [<c0155a95>] vfs_stat+0x15/0x60
 [<c02445dd>] copy_to_user+0x2d/0x40
 [<c0156151>] sys_stat64+0x11/0x30
 [<c014dcbd>] __fput+0x8d/0xf0
 [<c014c6c3>] filp_close+0x43/0x70
 [<c014c744>] sys_close+0x54/0x80
 [<c0105dc7>] syscall_call+0x7/0xb




Could this be related to "Unknown HZ value! (91) Assume 100.", which
started showing up with VIA motherboards on 2.5.x (I think) when running
top or ps?  When I researched it before, it never caused harm and had
been identified as a benign "kernel bug". I know nothing more.

At the moment ps also segfaults; here is the corresponding oops:

 <1>Unable to handle kernel NULL pointer dereference at virtual address 000000b4
 printing eip:
c017c78a
*pde = 00000000
Oops: 0000 [#5]
PREEMPT 
CPU:    0
EIP:    0060:[<c017c78a>]    Not tainted
EFLAGS: 00010286   (2.6.7-sta-bk8) 
EIP is at pid_alive+0xa/0x30
eax: 000000b8   ebx: d32b0310   ecx: 00000000   edx: 00000000
esi: 00000000   edi: ef7bb7a0   ebp: d22b1b40   esp: ecc59e4c
ds: 007b   es: 007b   ss: 0068
Process ps (pid: 3456, threadinfo=ecc58000 task=e60ac7c0)
Stack: c017cca4 00000000 d22b1b40 ecc59f18 ef7bb7a0 ecc59ec4 c0159754 d22b1b40 
       ecc59f18 cf499006 cf499009 ecc59ec4 ecc59f18 c0159cc5 ecc59f18 ecc59ecc 
       ecc59ec4 ef7b86e0 d22b1dfc ee655240 bffff000 c0141ec8 c15cd660 c013e95c 
Call Trace:
 [<c017cca4>] pid_revalidate+0x14/0xc0
 [<c0159754>] do_lookup+0x44/0x80
 [<c0159cc5>] link_path_walk+0x535/0xa20
 [<c0141ec8>] find_extend_vma+0x18/0x70
 [<c013e95c>] follow_page+0x8c/0xb0
 [<c013ea3c>] get_user_pages+0xbc/0x3d0
 [<c015a406>] path_lookup+0x86/0x1a0
 [<c015a6a9>] __user_walk+0x39/0x70
 [<c0155a95>] vfs_stat+0x15/0x60
 [<c02445dd>] copy_to_user+0x2d/0x40
 [<c0156151>] sys_stat64+0x11/0x30
 [<c014dcbd>] __fput+0x8d/0xf0
 [<c014c6c3>] filp_close+0x43/0x70
 [<c014c744>] sys_close+0x54/0x80
 [<c0105dc7>] syscall_call+0x7/0xb
Code: 39 82 b4 00 00 00 75 07 8b 82 bc 00 00 00 c3 0f 0b 04 03 72 


Config attached. I wrote 25 GB of zeros and killed the dd process; top
and ps still segfault. Thanks all for your help!

// George



-- 
George Georgalis, Architect and administrator, Linux services. IXOYE
http://galis.org/george/  cell:646-331-2027  mailto:george@galis.org
Key fingerprint = 5415 2738 61CF 6AE1 E9A7  9EF0 0186 503B 9831 1631

[-- Attachment #2: 2.6.7-sta-bk8.config.gz --]
[-- Type: application/octet-stream, Size: 8062 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread
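
A note on the blacklist Ricky mentions: a model blacklist of this kind is essentially a table of drive model strings and quirk flags that a driver consults when it probes a drive. The following is a minimal standalone sketch of that idea, not the code from drivers/scsi/sata_sil.c. The struct, the flag name QUIRK_LIMIT_SECTORS, the placeholder model strings and the exact-match comparison are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk flag; a real driver defines its own. */
#define QUIRK_LIMIT_SECTORS 0x1u

/* One blacklist entry: a drive model string and the quirks it needs. */
struct drive_quirk {
	const char *model;
	unsigned int quirks;
};

/* Placeholder model names, not real drives. */
static const struct drive_quirk example_blacklist[] = {
	{ "EXAMPLE-MODEL-A", QUIRK_LIMIT_SECTORS },
	{ "EXAMPLE-MODEL-B", QUIRK_LIMIT_SECTORS },
};

/* Return the quirk flags for a model, or 0 if it is not listed. */
unsigned int lookup_quirks(const char *model)
{
	size_t i;
	size_t n = sizeof(example_blacklist) / sizeof(example_blacklist[0]);

	for (i = 0; i < n; i++) {
		if (strcmp(example_blacklist[i].model, model) == 0)
			return example_blacklist[i].quirks;
	}
	return 0;
}
```

In a table like this, adding a drive means adding one line with the model string exactly as the drive reports it.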

* Re: SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
  2004-06-25 21:34   ` SATA_SIL works with 2.6.7-bk8 seagate drive, but oops George Georgalis
@ 2004-06-25 23:16     ` Linus Torvalds
  2004-06-28  2:12       ` George Georgalis
  2004-06-29  8:46     ` Sebastian Slota
  1 sibling, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2004-06-25 23:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel Mail List, George Georgalis

On Fri, 25 Jun 2004, George Georgalis wrote:
> 
> However at about 3Gb (if that is relevant) top segfaulted with a
> non critical oops. top will not restart, but the box is otherwise
> functioning well considering the write load.

Ok, this is unlikely to be SATA-related, unless SATA just happened to 
corrupt something really strange.

> Unable to handle kernel NULL pointer dereference at virtual address 000000b4
>  printing eip:
> c017c78a
> *pde = 00000000
> Oops: 0000 [#1]
> PREEMPT 
> CPU:    0
> EIP:    0060:[<c017c78a>]    Not tainted
> EFLAGS: 00010286   (2.6.7-sta-bk8) 
> EIP is at pid_alive+0xa/0x30

That's "p->pids[PIDTYPE_PID].pidptr", and it looks like "p" is NULL.

That in turn _shouldn't_ happen, since that comes from 

	struct task_struct *task = proc_task(inode);

and proc_task() should always be non-NULL for any /proc file that has one 
of the pid-based dentry ops. 

> Could this be related to "Unknown HZ value! (91) Assume 100." which
> started showing up with VIA motherboards on 2.5.x (I think) on top or ps
> commands.  When I researched it before, It never caused ill, had been
> identified as a "kernel bug" but benign. I know nothing more.

No, that's just a pstools bug. It shouldn't try to guess HZ at all.

> ATM, ps also seg faults, here is a corresponding oops,

Same problem. One of your existing /proc/<xxx>/ directories has a NULL 
"task" pointer, and that really shouldn't happen.

Hmm. I do worry that maybe it's the SATA thing that has written NULL 
somewhere, since the /proc code never clears that field once it is set 
(and it would always be set by the code that creates the inode in the 
first place). 

		Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread
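
To illustrate the diagnosis above: when a structure member is read through a NULL pointer, the faulting address is simply the member's offset within the structure, which is why the oops reports a small address like 000000b4 rather than 00000000. The sketch below uses a made-up structure, not the kernel's task_struct; the padding is chosen only so that the member lands at offset 0xb4.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for a task structure. The layout is hypothetical: the padding
 * is chosen only so that "pidptr" lands at offset 0xb4.
 */
struct fake_task {
	unsigned char other_fields[0xb4];
	uint32_t pidptr;
};

/* Address the CPU would touch when reading the pidptr member at `base`. */
uintptr_t pidptr_access_address(uintptr_t base)
{
	return base + offsetof(struct fake_task, pidptr);
}
```

With a base of 0 the function returns 0xb4, matching the "virtual address 000000b4" line in both oopses.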

* Re: SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
@ 2004-06-26 12:37 Albert Cahalan
  2004-06-26 15:12 ` Arjan van de Ven
  2004-06-26 15:54 ` Linus Torvalds
  0 siblings, 2 replies; 14+ messages in thread
From: Albert Cahalan @ 2004-06-26 12:37 UTC (permalink / raw)
  To: linux-kernel mailing list; +Cc: george, Linus Torvalds

Linus Torvalds writes:
> On Fri, 25 Jun 2004, George Georgalis wrote:

>> Could this be related to "Unknown HZ value! (91) Assume 100." which
>> started showing up with VIA motherboards on 2.5.x (I think) on top or ps
>> commands.  When I researched it before, It never caused ill, had been
>> identified as a "kernel bug" but benign. I know nothing more.
>
> No, that's just a pstools bug. It shouldn't try to guess HZ at all.

With an older kernel I'd say he's losing 9% of his clock ticks.
In this case though, incompatible /proc/stat changes are
at fault. No longer does idle CPU time include IO-wait CPU time.
This shouldn't have changed; user tools can subtract as needed.

I'm sorry to say that the HZ-guessing code is now only
used for the 2.2.xx kernels. Over the years it has found
many clock problems. Had the 2.4.xx kernels used a 64-bit
jiffies counter, the HZ-guessing code would still be used.

You never did come up with an alternative to HZ-guessing
that would work on those old 1200-HZ Alpha boxes, the ARM
boxes that ran at 64 HZ and so on. I suppose you can blame
the arch maintainers, but user-space has to deal with it.
So HZ-guessing is a workaround for a kernel bug, especially
because you claim that HZ (USER_HZ now) is part of the ABI.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
  2004-06-26 12:37 SATA_SIL works with 2.6.7-bk8 seagate drive, but oops Albert Cahalan
@ 2004-06-26 15:12 ` Arjan van de Ven
  2004-06-26 16:00   ` Linus Torvalds
  2004-06-26 17:13   ` Albert Cahalan
  2004-06-26 15:54 ` Linus Torvalds
  1 sibling, 2 replies; 14+ messages in thread
From: Arjan van de Ven @ 2004-06-26 15:12 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list, george, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 550 bytes --]


> You never did come up with an alternative to HZ-guessing
> that would work on those old 1200-HZ Alpha boxes, the ARM
> boxes that ran at 64 HZ and so on. I suppose you can blame
> the arch maintainers, but user-space has to deal with it.
> So HZ-guessing is a workaround for a kernel bug, especially
> because you claim that HZ (USER_HZ now) is part of the ABI.

well.... this value is *passed to userspace* via the AT_ attributes ....
glibc probably nicely exports this info via sysconf(). I guess/hope your
tools are now using that ?

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread
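
As a rough sketch of what Arjan describes, a user-space tool on Linux can read the tick rate from the ELF auxiliary vector entry AT_CLKTCK. The example assumes a glibc that provides getauxval() (2.16 or later, so not available when this thread was written); the function name user_clock_ticks and the sysconf() fallback are choices made here for illustration.

```c
#include <elf.h>       /* AT_CLKTCK */
#include <sys/auxv.h>  /* getauxval(), glibc 2.16 or later */
#include <unistd.h>

/*
 * Return the clock tick rate the kernel advertises to user space.
 * Prefer the AT_CLKTCK aux vector entry and fall back to sysconf()
 * if the entry is missing (getauxval() returns 0 in that case).
 */
long user_clock_ticks(void)
{
	unsigned long hz = getauxval(AT_CLKTCK);

	if (hz != 0)
		return (long)hz;
	return sysconf(_SC_CLK_TCK);
}
```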

* Re: SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
  2004-06-26 12:37 SATA_SIL works with 2.6.7-bk8 seagate drive, but oops Albert Cahalan
  2004-06-26 15:12 ` Arjan van de Ven
@ 2004-06-26 15:54 ` Linus Torvalds
  1 sibling, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2004-06-26 15:54 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list, george

On Sat, 26 Jun 2004, Albert Cahalan wrote:
> 
> You never did come up with an alternative to HZ-guessing
> that would work on those old 1200-HZ Alpha boxes, the ARM
> boxes that ran at 64 HZ and so on.

The fix for those should be that they should all export the same HZ to 
user space, regardless of any internal tick. So that's a kernel bug, in 
that those architectures expose the _internal_ HZ rather than some 
user-visible well-defined one.

> I suppose you can blame the arch maintainers, but user-space has to deal
> with it.

If the user space tools didn't try to deal with it, the architectures 
would probably get fixed in a jiffy. All the support for kernel-to-user HZ 
conversion is there.

So I still maintain that procps should _not_ try to guess HZ. As it is, 
it's a bug, and it helps make excuses for _other_ bugs.

		Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
  2004-06-26 15:12 ` Arjan van de Ven
@ 2004-06-26 16:00   ` Linus Torvalds
  2004-06-26 16:14     ` Arjan van de Ven
  2004-06-26 17:17     ` Albert Cahalan
  2004-06-26 17:13   ` Albert Cahalan
  1 sibling, 2 replies; 14+ messages in thread
From: Linus Torvalds @ 2004-06-26 16:00 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Albert Cahalan, linux-kernel mailing list, george



On Sat, 26 Jun 2004, Arjan van de Ven wrote:
> 
> well.... this value is *passed to userspace* via the AT_ attributes ....
> glibc probably nicely exports this info via sysconf(). I guess/hope your
> tools are now using that ?

Even then, it's a bug in my opinion. Yes, procps should be able to just 
use sysconf(_SC_CLK_TCK), but the fact is, using CLK_TCK and HZ is 
traditional unix behaviour, and we should just support it.

		Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread
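
For reference, the sysconf(_SC_CLK_TCK) approach mentioned above can be sketched as follows: ask the C library for the tick rate once, then use it to convert tick counts (such as the times in /proc) into seconds. The helper names are hypothetical, and the conversion truncates to whole seconds.

```c
#include <unistd.h>

/* Ticks per second as reported by the C library; -1 on error. */
long clk_tck(void)
{
	return sysconf(_SC_CLK_TCK);
}

/* Convert a tick count to whole seconds; returns 0 if hz is not positive. */
unsigned long long ticks_to_seconds(unsigned long long ticks, long hz)
{
	if (hz <= 0)
		return 0;
	return ticks / (unsigned long long)hz;
}
```

A caller would typically do ticks_to_seconds(start_ticks, clk_tck()) and treat a non-positive clk_tck() as an error instead of guessing a value.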

* Re: SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
  2004-06-26 16:00   ` Linus Torvalds
@ 2004-06-26 16:14     ` Arjan van de Ven
  2004-06-26 17:17     ` Albert Cahalan
  1 sibling, 0 replies; 14+ messages in thread
From: Arjan van de Ven @ 2004-06-26 16:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Albert Cahalan, linux-kernel mailing list, george

[-- Attachment #1: Type: text/plain, Size: 514 bytes --]

On Sat, 2004-06-26 at 18:00, Linus Torvalds wrote:
> On Sat, 26 Jun 2004, Arjan van de Ven wrote:
> > 
> > well.... this value is *passed to userspace* via the AT_ attributes ....
> > glibc probably nicely exports this info via sysconf(). I guess/hope your
> > tools are now using that ?
> 
> Even then, it's a bug in my opinion.

Agreed 100%. It gets kind of fun though if you have say 32 bit emulation
in a 64 bit kernel and the 64 bit environment has a different user HZ
than the 32 bit env....



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
  2004-06-26 15:12 ` Arjan van de Ven
  2004-06-26 16:00   ` Linus Torvalds
@ 2004-06-26 17:13   ` Albert Cahalan
  1 sibling, 0 replies; 14+ messages in thread
From: Albert Cahalan @ 2004-06-26 17:13 UTC (permalink / raw)
  To: arjanv; +Cc: Albert Cahalan, linux-kernel mailing list, george, Linus Torvalds

On Sat, 2004-06-26 at 11:12, Arjan van de Ven wrote:
> > You never did come up with an alternative to HZ-guessing
> > that would work on those old 1200-HZ Alpha boxes, the ARM
> > boxes that ran at 64 HZ and so on. I suppose you can blame
> > the arch maintainers, but user-space has to deal with it.
> > So HZ-guessing is a workaround for a kernel bug, especially
> > because you claim that HZ (USER_HZ now) is part of the ABI.
> 
> well.... this value is *passed to userspace* via the AT_ attributes ....
> glibc probably nicely exports this info via sysconf(). I guess/hope your
> tools are now using that ?

They use the AT_ attribute when it is provided.
(Linux 2.4 and above only)

The glibc wrapper is defective; it takes a bad
guess at HZ when the AT_ attribute is missing.
Had sysconf() worked as documented, returning an
error code on failure, it would have been used.

The sysconf() call for getting the number of CPUs
is also broken, at least on SPARC. It would return
a 0 instead of the proper -1 error code. I think
it tried to scan /proc/cpuinfo for CPUs.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
  2004-06-26 16:00   ` Linus Torvalds
  2004-06-26 16:14     ` Arjan van de Ven
@ 2004-06-26 17:17     ` Albert Cahalan
  1 sibling, 0 replies; 14+ messages in thread
From: Albert Cahalan @ 2004-06-26 17:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arjan van de Ven, Albert Cahalan, linux-kernel mailing list,
	george

On Sat, 2004-06-26 at 12:00, Linus Torvalds wrote:
> On Sat, 26 Jun 2004, Arjan van de Ven wrote:
> > 
> > well.... this value is *passed to userspace* via the AT_ attributes ....
> > glibc probably nicely exports this info via sysconf(). I guess/hope your
> > tools are now using that ?
> 
> Even then, it's a bug in my opinion. Yes, procps should be able to just 
> use sysconf(_SC_CLK_TCK), but the fact is, using CLK_TCK and HZ is 
> traditional unix behaviour, and we should just support it.

It's not working right now, likely due to non-integer HZ.
Here's a bug report.......


Even with the 2.6.7 kernel, I'm still getting reports of process
start times wandering. Here is an example:

   "About 12 hours since reboot to 2.6.7 there was already a
   difference of about 7 seconds between the real start time
   and the start time reported by ps. Now, 24 hours since reboot
   the difference is 10 seconds."

The calculation used is:

   now - uptime + process_run_time

The code shown below works great on a 2.4.xx or earlier kernel.
It generally relies on USER_HZ, which is supposedly in our ABI.

I have a feeling we'll forever be chasing bugs related to not
using a PLL to drive the clock tick at exactly HZ ticks per second.
Perhaps the DragonflyBSD code could be stolen. Anyway, the code:

///////////////////////////////////////////////////////////////////////////
unsigned long seconds_since_1970 = time(NULL);
unsigned long seconds_since_boot = uptime(0,0);
unsigned long time_of_boot       = seconds_since_1970 - seconds_since_boot;
int pr_stime(char *restrict const outbuf, const proc_t *restrict const pp){
  struct tm *proc_time;
  struct tm *our_time;
  time_t t;
  const char *fmt;
  int tm_year;
  int tm_yday;
  our_time = localtime(&seconds_since_1970);   /* not reentrant */
  tm_year = our_time->tm_year;
  tm_yday = our_time->tm_yday;
  t = time_of_boot + pp->start_time / Hertz;
  proc_time = localtime(&t); /* not reentrant, this corrupts our_time */
  fmt = "%H:%M";                                   /* 03:02 23:59 */
  if(tm_yday != proc_time->tm_yday) fmt = "%b%d";  /* Jun06 Aug27 */
  if(tm_year != proc_time->tm_year) fmt = "%Y";    /* 1991 2001 */
  return strftime(outbuf, 42, fmt, proc_time);
}
///////////////////////////////////////////////////////////////////////////



^ permalink raw reply	[flat|nested] 14+ messages in thread
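
To put the reported drift in perspective: 7 seconds over 12 hours (43200 seconds) is an error of roughly 0.016%. The sketch below shows how such an error would arise if ticks were generated at a rate slightly different from the value used as the divisor. It is only an illustration under that assumption, not a diagnosis of the actual cause; the function and parameter names are hypothetical.

```c
/*
 * Error, in seconds, of a time computed as ticks / nominal_hz
 * when the ticks were really generated at actual_hz.
 * Positive means the computed time runs ahead of real time.
 */
double start_time_error(double elapsed_seconds, double nominal_hz,
			double actual_hz)
{
	double ticks = elapsed_seconds * actual_hz;

	return ticks / nominal_hz - elapsed_seconds;
}
```

When the two rates are equal the error is zero; any fixed mismatch makes the error grow linearly with uptime.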

* Re: SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
  2004-06-25 23:16     ` Linus Torvalds
@ 2004-06-28  2:12       ` George Georgalis
  0 siblings, 0 replies; 14+ messages in thread
From: George Georgalis @ 2004-06-28  2:12 UTC (permalink / raw)
  To: Linux Kernel Mail List; +Cc: Linus Torvalds, Andrew Morton, linux-ide

On Fri, Jun 25, 2004 at 04:16:03PM -0700, Linus Torvalds wrote:
>That's "p->pids[PIDTYPE_PID].pidptr", and it looks like "p" is NULL.
>
>That in turn _shouldn't_ happen, since that comes from 
>
>       struct task_struct *task = proc_task(inode);
>
>and proc_task() should always be non-NULL for any /proc file that has one 
>of the pid-based dentry ops. 

>On Fri, 25 Jun 2004, George Georgalis wrote:
>> ATM, ps also seg faults, here is a corresponding oops,
>
>Same problem. One of your existing /proc/<xxx>/ directories has a NULL 
>"task" pointer, and that really shouldn't happen.

The first thing I noticed when reading sata_sil.c was readl() and
writel() calls.  Thinking that meant "read/write line" I guessed it
could invoke a sectors = 15 hardware issue with some data, and went to
see exactly what it means.

I haven't determined which include/asm-*/io.h is used for
MCYRIXIII/MVIAC3, but my best guess is include/asm-i386/io.h

#define readl(addr) (*(volatile unsigned int *) (addr))
#define writel(b,addr) (*(volatile unsigned int *) (addr) = (b))

Then I found volatile in a driver example at
http://publications.gbdirect.co.uk/c_book/chapter8/const_and_volatile.html
which describes how volatile is used to peek at hardware status.

but, in sata_sil.c, sil_scr_write does

                writel(val, mmio);

volatile is used for data, not status! I can't glean this construct
(when the data runs out it's null and the loop ends?). Was going to say
if hardware caused status to turn up null that could be checked and
assigned before being used...

 On Sun, Jun 27, 2004 at 10:39:08AM -0700, Linus Torvalds wrote:
 >So I stand by the rule: we should make _code_ have the access rules, and
 >the data itself should never be volatile. And yes, jiffies breaks that
 >rule, but hey, that's not something I'm proud of.

So sata_sil.c is using the wrong construct or am I not reading it right?

>Hmm. I do worry that maybe it's the SATA thing that has written NULL 
>somewhere, since the /proc code never clears that field once it is set 
>(and it would always be set by the code that creates the inode in the 
>first place). 

Might it come from reiserfs? I didn't mkfs again after the last sata
device block(s). I'll be doing some more experimentation. How would I
'find' the null in proc? Can I detect that in user space?

// George

-- 
George Georgalis, Architect and administrator, Linux services. IXOYE
http://galis.org/george/  cell:646-331-2027  mailto:george@galis.org
Key fingerprint = 5415 2738 61CF 6AE1 E9A7  9EF0 0186 503B 9831 1631

^ permalink raw reply	[flat|nested] 14+ messages in thread
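
For readers with the same question about readl() and writel(): the trailing "l" stands for "long" (a 32-bit access), not "line". They are accessors for memory-mapped device registers, and the volatile cast inside them tells the compiler to perform each load or store exactly as written, whether the register holds status or data. The sketch below is a user-space stand-in with hypothetical names, using an ordinary variable in place of a device register; it is not the kernel's implementation.

```c
#include <stdint.h>

/* Read a 32-bit value; the volatile cast forces a real load every time. */
uint32_t mmio_read32(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/* Write a 32-bit value; the volatile cast forces a real store. */
void mmio_write32(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}
```

Note that the volatile qualifier lives in the accessor, not on the data itself, which is in line with the rule quoted from Linus above that the code should carry the access rules.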

* Re: SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
  2004-06-25 21:34   ` SATA_SIL works with 2.6.7-bk8 seagate drive, but oops George Georgalis
  2004-06-25 23:16     ` Linus Torvalds
@ 2004-06-29  8:46     ` Sebastian Slota
  2004-06-30  4:43       ` George Georgalis
  1 sibling, 1 reply; 14+ messages in thread
From: Sebastian Slota @ 2004-06-29  8:46 UTC (permalink / raw)
  To: George Georgalis; +Cc: linux-kernel

Hi,

I'm taking a break from testing for about 3 months; I'm writing a paper
and I'm short of time...

so far:
Tried Kernel with bk8:
 
root@t-rex root # cat /proc/mdstat
Personalities : [raid0] [raid5] [multipath]
md1 : active raid5 sdc3[2] sdb3[1] sda3[0]
261730816 blocks level 5, 128k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid0 sdc2[2] sdb2[1] sda2[0]
73256064 blocks 128k chunks

unused devices: <none>

root@t-rex root # hdparm -tT /dev/md0

/dev/md0:
Timing buffer-cache reads: 3896 MB in 2.01 seconds = 1935.71 MB/sec
Timing buffered disk reads: 274 MB in 3.02 seconds = 90.68 MB/sec
root@t-rex root # hdparm -tT /dev/md1

/dev/md1:
Timing buffer-cache reads: 3760 MB in 2.00 seconds = 1879.35 MB/sec
Timing buffered disk reads: 206 MB in 3.01 seconds = 68.49 MB/sec

Copying a DVD to HD: both went OK!
Copying data from an ATA HD (hda) broke.

I read that people are running Linux on somewhat older hardware; maybe
that's why it doesn't work... but ~25 MB/s is nothing to me.
I also hear about some patches that limit the speed to ~30 MB/s.

I hope that's a joke!

Back to M$. It works there.

S.


> On Thu, Jun 24, 2004 at 02:46:39PM -0400, Ricky Beam wrote:
> >On Thu, 24 Jun 2004, George Georgalis wrote:
> >...
> >>has caused pdflush to block IO, any access to /mnt and the process
> >>does not return. other than the pdflush load of ~99% the box seems to
> >>function normally. 2.6.7-bk6, seagate drive
> >
> >-bk6 is not new enough.  bk7 has the necessary max_sectors fix.  You
> >may need to add your drive model to the sil_blacklist in
> >drivers/scsi/sata_sil.c.
> 
> Okay, 2.6.7-bk8 has written 8Gb to the sda4 with SATA_SIL and still
> going strong! "dd if=/dev/zero of=/mnt/zero-`date +%s`"
> 
> However at about 3Gb (if that is relevant) top segfaulted with a
> non critical oops. top will not restart, but the box is otherwise
> functioning well considering the write load.
> 
> Is there any way to determine the drive model without first connecting
> with the other sata driver (as hdc) and using hdparm?
> 
> 
> Unable to handle kernel NULL pointer dereference at virtual address
> 000000b4
>  printing eip:
> c017c78a
> *pde = 00000000
> Oops: 0000 [#1]
> PREEMPT 
> CPU:    0
> EIP:    0060:[<c017c78a>]    Not tainted
> EFLAGS: 00010286   (2.6.7-sta-bk8) 
> EIP is at pid_alive+0xa/0x30
> eax: 000000b8   ebx: d32b0310   ecx: 00000000   edx: 00000000
> esi: 00000000   edi: ef7bb7a0   ebp: d22b1b40   esp: db473e4c
> ds: 007b   es: 007b   ss: 0068
> Process top (pid: 489, threadinfo=db472000 task=e60ac7c0)
> Stack: c017cca4 00000000 d22b1b40 db473f18 ef7bb7a0 db473ec4 c0159754
> d22b1b40 
>        db473f18 eaa1f006 eaa1f009 db473ec4 db473f18 c0159cc5 db473f18
> db473ecc 
>        db473ec4 ef7b86e0 d22b1dfc ee655240 bffff000 c0141ec8 c15cd660
> c013e95c 
> Call Trace:
>  [<c017cca4>] pid_revalidate+0x14/0xc0
>  [<c0159754>] do_lookup+0x44/0x80
>  [<c0159cc5>] link_path_walk+0x535/0xa20
>  [<c0141ec8>] find_extend_vma+0x18/0x70
>  [<c013e95c>] follow_page+0x8c/0xb0
>  [<c013ea3c>] get_user_pages+0xbc/0x3d0
>  [<c015a406>] path_lookup+0x86/0x1a0
>  [<c015a6a9>] __user_walk+0x39/0x70
>  [<c0155a95>] vfs_stat+0x15/0x60
>  [<c02445dd>] copy_to_user+0x2d/0x40
>  [<c0156151>] sys_stat64+0x11/0x30
>  [<c014dcbd>] __fput+0x8d/0xf0
>  [<c014c6c3>] filp_close+0x43/0x70
>  [<c014c744>] sys_close+0x54/0x80
>  [<c0105dc7>] syscall_call+0x7/0xb
> 
> 
> 
> 
> Could this be related to "Unknown HZ value! (91) Assume 100." which
> started showing up with VIA motherboards on 2.5.x (I think) on top or ps
> commands.  When I researched it before, It never caused ill, had been
> identified as a "kernel bug" but benign. I know nothing more.
> 
> ATM, ps also seg faults, here is a corresponding oops,
> 
>  <1>Unable to handle kernel NULL pointer dereference at virtual address
> 000000b4
>  printing eip:
> c017c78a
> *pde = 00000000
> Oops: 0000 [#5]
> PREEMPT 
> CPU:    0
> EIP:    0060:[<c017c78a>]    Not tainted
> EFLAGS: 00010286   (2.6.7-sta-bk8) 
> EIP is at pid_alive+0xa/0x30
> eax: 000000b8   ebx: d32b0310   ecx: 00000000   edx: 00000000
> esi: 00000000   edi: ef7bb7a0   ebp: d22b1b40   esp: ecc59e4c
> ds: 007b   es: 007b   ss: 0068
> Process ps (pid: 3456, threadinfo=ecc58000 task=e60ac7c0)
> Stack: c017cca4 00000000 d22b1b40 ecc59f18 ef7bb7a0 ecc59ec4 c0159754
> d22b1b40 
>        ecc59f18 cf499006 cf499009 ecc59ec4 ecc59f18 c0159cc5 ecc59f18
> ecc59ecc 
>        ecc59ec4 ef7b86e0 d22b1dfc ee655240 bffff000 c0141ec8 c15cd660
> c013e95c 
> Call Trace:
>  [<c017cca4>] pid_revalidate+0x14/0xc0
>  [<c0159754>] do_lookup+0x44/0x80
>  [<c0159cc5>] link_path_walk+0x535/0xa20
>  [<c0141ec8>] find_extend_vma+0x18/0x70
>  [<c013e95c>] follow_page+0x8c/0xb0
>  [<c013ea3c>] get_user_pages+0xbc/0x3d0
>  [<c015a406>] path_lookup+0x86/0x1a0
>  [<c015a6a9>] __user_walk+0x39/0x70
>  [<c0155a95>] vfs_stat+0x15/0x60
>  [<c02445dd>] copy_to_user+0x2d/0x40
>  [<c0156151>] sys_stat64+0x11/0x30
>  [<c014dcbd>] __fput+0x8d/0xf0
>  [<c014c6c3>] filp_close+0x43/0x70
>  [<c014c744>] sys_close+0x54/0x80
>  [<c0105dc7>] syscall_call+0x7/0xb
> Code: 39 82 b4 00 00 00 75 07 8b 82 bc 00 00 00 c3 0f 0b 04 03 72 
> 
> 
> config attached. I wrote 25G of zero and killed the dd process, top and
> ps still segfault. Thanks all for your help!
> 
> // George
> 
> 
> 
> -- 
> George Georgalis, Architect and administrator, Linux services. IXOYE
> http://galis.org/george/  cell:646-331-2027  mailto:george@galis.org
> Key fingerprint = 5415 2738 61CF 6AE1 E9A7  9EF0 0186 503B 9831 1631
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
  2004-06-29  8:46     ` Sebastian Slota
@ 2004-06-30  4:43       ` George Georgalis
  2004-06-30  6:16         ` Jeff Garzik
  0 siblings, 1 reply; 14+ messages in thread
From: George Georgalis @ 2004-06-30  4:43 UTC (permalink / raw)
  To: Sebastian Slota, Linux Kernel Mail List

On Tue, Jun 29, 2004 at 10:46:45AM +0200, Sebastian Slota wrote:
>Tried Kernel with bk8:
> 
>root@t-rex root # cat /proc/mdstat
>Personalities : [raid0] [raid5] [multipath]
>md1 : active raid5 sdc3[2] sdb3[1] sda3[0]
>261730816 blocks level 5, 128k chunk, algorithm 2 [3/3] [UUU]
>
>md0 : active raid0 sdc2[2] sdb2[1] sda2[0]
>73256064 blocks 128k chunks
>
>unused devices: <none>
>
>root@t-rex root # hdparm -tT /dev/md0
>
>/dev/md0:
>Timing buffer-cache reads: 3896 MB in 2.01 seconds = 1935.71 MB/sec
>Timing buffered disk reads: 274 MB in 3.02 seconds = 90.68 MB/sec
>root@t-rex root # hdparm -tT /dev/md1
>
>/dev/md1:
>Timing buffer-cache reads: 3760 MB in 2.00 seconds = 1879.35 MB/sec
>Timing buffered disk reads: 206 MB in 3.01 seconds = 68.49 MB/sec
>
>Copy a DVD to HD, both went OK!
>copy data from an ATA HD ( hda ) broke.
>
>I read from ppl they're running linux on some older hardware, maybe thats
>why it doesnt work... but ~25mb/s is nothing for me...
>Also I hear about some patches to limit the speed to ~30MB/s.


I was able to dd ~140 GB with SATA_SIL today, on a stock bk kernel, until
I ran out of disk, with no errors, which was a pleasant surprise.

But when I checked "Timing buffered disk reads" it was around 25 MB/sec,
not the ~52 MB/sec I saw before with the oops. The odd thing is that this
disk is not in the blacklist, so I don't know why it was running slower.

// George


-- 
George Georgalis, Architect and administrator, Linux services. IXOYE
http://galis.org/george/  cell:646-331-2027  mailto:george@galis.org
Key fingerprint = 5415 2738 61CF 6AE1 E9A7  9EF0 0186 503B 9831 1631


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
  2004-06-30  4:43       ` George Georgalis
@ 2004-06-30  6:16         ` Jeff Garzik
  2004-07-02 23:01           ` George Georgalis
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2004-06-30  6:16 UTC (permalink / raw)
  To: George Georgalis; +Cc: Sebastian Slota, Linux Kernel Mail List

George Georgalis wrote:
> I was able to dd ~140 GB with SATA_SIL today, on a stock bk kernel, till
> I ran out of disk, no errors. which was a pleasant unexpected surprise.
> 
> but when I checked "Timing buffered disk reads" it was around 25 MB/sec
> not the ~52 MB/sec I saw before with the oops. The odd thing was this
> disk was not in the blacklist so I don't know why it was running slower.


Try mounting a filesystem, unmounting it, and then doing the timing.

	Jeff



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SATA_SIL works with 2.6.7-bk8 seagate drive, but oops
  2004-06-30  6:16         ` Jeff Garzik
@ 2004-07-02 23:01           ` George Georgalis
  0 siblings, 0 replies; 14+ messages in thread
From: George Georgalis @ 2004-07-02 23:01 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Sebastian Slota, Linux Kernel Mail List, linux-ide

On Wed, Jun 30, 2004 at 02:16:33AM -0400, Jeff Garzik wrote:
>George Georgalis wrote:
>>I was able to dd ~140 GB with SATA_SIL today, on a stock bk kernel, till
>>I ran out of disk, no errors. which was a pleasant unexpected surprise.
>>
>>but when I checked "Timing buffered disk reads" it was around 25 MB/sec
>>not the ~52 MB/sec I saw before with the oops. The odd thing was this
>>disk was not in the blacklist so I don't know why it was running slower.
>
>
>Try mounting a filesystem, unmounting it, and then doing the timing.

With some new (working) RAM in the box and root on hda, sata_sil gave a
pretty consistent 29 MB/sec as an auxiliary sda disk: mounted, remounted
or unmounted, I measured 29.50 +/- .40 MB/sec every time.

However, when I boot with root on sda, I get better performance (?), up
to 41 MB/sec and consistently in the 30s, with X and many daemons running
(but otherwise unloaded)...

 Timing buffered disk reads:  100 MB in  3.05 seconds =  32.74 MB/sec
 Timing buffered disk reads:  112 MB in  3.04 seconds =  36.84 MB/sec
 Timing buffered disk reads:  114 MB in  3.01 seconds =  37.82 MB/sec
 Timing buffered disk reads:  104 MB in  3.02 seconds =  34.48 MB/sec
 Timing buffered disk reads:  110 MB in  3.00 seconds =  36.64 MB/sec
 Timing buffered disk reads:   94 MB in  3.01 seconds =  31.19 MB/sec
 Timing buffered disk reads:   88 MB in  3.01 seconds =  29.25 MB/sec
 Timing buffered disk reads:   90 MB in  3.06 seconds =  29.37 MB/sec
 Timing buffered disk reads:   88 MB in  3.01 seconds =  29.23 MB/sec
 Timing buffered disk reads:  108 MB in  3.03 seconds =  35.70 MB/sec
 Timing buffered disk reads:  120 MB in  3.04 seconds =  39.47 MB/sec
 Timing buffered disk reads:   88 MB in  3.00 seconds =  29.30 MB/sec

(Taken at more or less random intervals, at least 5 seconds apart.) This
is running a bk kernel checked out June 28. I've written up to 200 GB
to this disk and built a workstation on it, with no errors.

Thanks!
// George


-- 
George Georgalis, Architect and administrator, Linux services. IXOYE
http://galis.org/george/  cell:646-331-2027  mailto:george@galis.org
Key fingerprint = 5415 2738 61CF 6AE1 E9A7  9EF0 0186 503B 9831 1631


^ permalink raw reply	[flat|nested] 14+ messages in thread
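
Since the readings above vary from run to run, it helps to summarise them before comparing configurations. The sketch below computes throughput from a size and a duration, and the mean of a set of samples; the function names are hypothetical. Recomputing from the rounded figures hdparm prints will differ slightly from its own MB/sec column, presumably because hdparm times the transfer more precisely than it displays.

```c
#include <stddef.h>

/* Arithmetic mean of n samples; returns 0.0 for an empty set. */
double mean_mb_per_sec(const double *samples, size_t n)
{
	double sum = 0.0;
	size_t i;

	if (n == 0)
		return 0.0;
	for (i = 0; i < n; i++)
		sum += samples[i];
	return sum / (double)n;
}

/* Throughput in MB/sec for a transfer of `mb` megabytes in `seconds`. */
double mb_per_sec(double mb, double seconds)
{
	return seconds > 0.0 ? mb / seconds : 0.0;
}
```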

Thread overview: 14+ messages
2004-06-26 12:37 SATA_SIL works with 2.6.7-bk8 seagate drive, but oops Albert Cahalan
2004-06-26 15:12 ` Arjan van de Ven
2004-06-26 16:00   ` Linus Torvalds
2004-06-26 16:14     ` Arjan van de Ven
2004-06-26 17:17     ` Albert Cahalan
2004-06-26 17:13   ` Albert Cahalan
2004-06-26 15:54 ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2004-06-24 15:59 SATA_SIL fails with 2.6.7-bk6 seagate drive George Georgalis
2004-06-24 18:46 ` Ricky Beam
2004-06-25 21:34   ` SATA_SIL works with 2.6.7-bk8 seagate drive, but oops George Georgalis
2004-06-25 23:16     ` Linus Torvalds
2004-06-28  2:12       ` George Georgalis
2004-06-29  8:46     ` Sebastian Slota
2004-06-30  4:43       ` George Georgalis
2004-06-30  6:16         ` Jeff Garzik
2004-07-02 23:01           ` George Georgalis
