From: joe briggs <jbriggs@briggsmedia.com>
To: vincent.touquet@pandora.be, linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@osdl.org>, Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: [Bug report] System lockups on Tyan S2469 and lots of io [smp boot time problems too :(]
Date: Tue, 8 Jul 2003 09:59:54 -0400
Message-ID: <200307080959.54247.jbriggs@briggsmedia.com>
In-Reply-To: <20030708101950.GB14044@ns.mine.dnsalias.org>

Vincent - 
I wonder if what is really happening is a problem in the arbitration 
between the PCI bus and the local bus that the onboard IDE devices are on.  In 
your case the problem (onboard IDE device data corruption) manifests when you 
are performing sustained transfers (large files) between the onboard IDE 
device and a PCI block device (the 3ware RAID). In my case, the same problem 
manifests when I have sustained data activity from multiple frame grabbers to 
memory, then from memory to RAID.  When the system drive is used (code load, 
swap, etc.) it gets corrupted.  My point is, the onboard data device only 
seems to get corrupted when there is lots of i/o activity with PCI 
bus-masters that are DMA'ing data to/from memory.  What do you think?


On Tuesday 08 July 2003 06:19 am, Vincent Touquet wrote:
> After my search for what caused my hangs, I decided to wonder if DMA could
> be the culprit. I put ide=nodma in the commandline and the system is still
> not hanging under the copy task (the system hangs when copying from an ide
> disk to a raid array).
>
> Looking at the output of vmstat with dma on:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=105759652518028&w=2
>
> You can see that sometimes there are stalls on the blocks in (bi) side
> [reading data from the IDE disk]. Performance is rather 'stellar' with on
> average 20.000 blocks per second input from the ide disk, but sometimes
> this drops to zero. This could be problems with reading from the disk (or
> is the write-out not happening fast enough ?)
>
> Vmstat without dma on the ide disk is much more moderate (reading less
> blocks per second from the ide disk):
>
> Extract from the now ongoing copy process.
>  2  0   3968   9568  20624 915104    0    0  3972     0  605   748  1 51 48  0
>  1  0   3968   9472  20652 915132    0    0  3588 14980  700   795  1 52 47  0
>  0  1   3968   9488  20644 915212    0    0  3972     0  613   751  0 48 51  0
>  1  0   3968   9540  20652 915092    0    0  3972     0  603   756  3 45 51  0
>  0  1   3968   9564  20668 915036    0    0  4108     0  616   752  0 56 44  0
>  1  0   3968   9456  20688 915072    0    0  3976     0  605   749  2 43 55  0
>  1  1   3968   9532  20700 914960    0    0  3532 19344  716   873  1 43 56  0
>  3  0   3968   9460  20712 915100    0    0  3832     4  609   727  3 50 48  0
>  1  1   3968   9508  20724 915032    0    0  4108     0  601   761  0 48 51  0
>  0  1   3968   9480  20752 915040    0    0  4112     0  613   814  1 52 47  0
>  0  4   3968   9532  20780 914888    0    0  3720 19316  610   704  3 48 50  0
>  1  0   3968   9500  20836 914764    0    0  2100    88  535   782  3 33 64  0
>
> There seem to be no stalls on the reader side.
>
> Now the big question is: is dma really at fault here, or are there problems
> on the write-out side ? [if dma is the problem, maybe we should reopen the
> discussion of enabling dma by default ;)]
>
> I think the answer is in the traces near the point where the machine hangs:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=105759465915936&w=2
>
> This snippet then again, makes me think there is something wrong on the
> scsi side... Or is the problem with the IDE somehow also disturbing the
> scsi system (PCI bus hang ?).
>
> Jul  7 17:52:52 kalimero kernel: kupdated      D 00000001  5204     7      1         8     6 (L-TLB)
> Jul  7 17:52:52 kalimero kernel: Call Trace:    [__down+192/352]  [log_start_commit+216/256] [__down_failed+11/20] [.text.lock.super+279/550]  [sync_old_buffers+94/336]
> Jul  7 17:52:52 kalimero kernel:   [kupdate+418/480] [kupdate+0/480] [rest_init+0/144] [rest_init+0/144] [kernel_thread+46/64] [kupdate+0/480]
> Jul  7 17:52:52 kalimero kernel: scsi_eh_0     S 00000000  6080     8      1         9     7 (L-TLB)
> Jul  7 17:52:52 kalimero kernel: Call Trace:    [vsnprintf+500/1056] [__down_interruptible+221/416] [__down_failed_interruptible+10/16] [.text.lock.scsi_error+229/290] [kernel_thread+46/64]
> Jul  7 17:52:52 kalimero kernel:   [scsi_error_handler+0/608]
>
> I wonder how I could decide the case of dma vs. scsi (as the root cause of
> the problem).
>
> best regards,
>
> Vincent
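
The stall check described in the quoted message (the blocks-in column of vmstat dropping to zero during the copy) could be automated along these lines. This is only a rough sketch, not something either poster used: the function names are hypothetical, and it assumes a 16-column vmstat layout with bi and bo at zero-based positions 8 and 9, which matches the rows quoted above but may differ on other vmstat versions.

```python
def parse_vmstat_rows(text, bi_index=8, bo_index=9):
    """Return (bi, bo) pairs from vmstat output.

    Assumption: all-numeric rows with bi/bo at the given zero-based
    column positions. Header lines and other text are skipped.
    """
    rows = []
    for line in text.splitlines():
        # Drop any leading mail-quote markers ("> ") before splitting.
        fields = line.lstrip("> ").split()
        if len(fields) <= max(bi_index, bo_index):
            continue
        if not all(f.isdigit() for f in fields):
            continue
        rows.append((int(fields[bi_index]), int(fields[bo_index])))
    return rows


def find_read_stalls(rows, threshold=0):
    """Return indices of samples whose blocks-in value is at or below threshold."""
    return [i for i, (bi, _bo) in enumerate(rows) if bi <= threshold]


# Three rows copied from the ide=nodma run quoted above.
sample = """\
 2  0   3968   9568  20624 915104    0    0  3972     0  605   748  1 51 48  0
 1  0   3968   9472  20652 915132    0    0  3588 14980  700   795  1 52 47  0
 1  0   3968   9500  20836 914764    0    0  2100    88  535   782  3 33 64  0
"""

stalls = find_read_stalls(parse_vmstat_rows(sample))
print("read stalls at samples:", stalls)
```

Run against these three rows it reports no stalled samples, in line with the observation that the reader side does not stall without DMA. Feeding it the DMA-enabled log would show whether the zero-bi samples cluster around large write-outs.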

-- 
Joe Briggs
Briggs Media Systems
105 Burnsen Ave.
Manchester NH 01304 USA
TEL 603-232-3115 FAX 603-625-5809 MOBILE 603-493-2386
www.briggsmedia.com
