From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joel Soete
Subject: Re: [parisc-linux] [patch 2/2] backport of sba sg list management to ccio-dma
Date: Sat, 24 Nov 2007 20:36:03 +0000
Message-ID: <47488B33.5040904@scarlet.be>
References: <20071028064158.GB29233@colo.lackof.org> <4724A084.5090709@scarlet.be> <20071029053015.GA14763@colo.lackof.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Cc: kyle, parisc-linux
To: Grant Grundler
In-Reply-To: <20071029053015.GA14763@colo.lackof.org>
List-Id: parisc-linux developers list
Errors-To: parisc-linux-bounces@lists.parisc-linux.org

Hello Grant, Kyle,

I finally found an interesting paper on this Runway bc:

I have read it but do not yet understand it in depth. One detail, though:

"The lower 12 bits of the address must be left alone because of the 4K-byte page size defined by the architecture."

makes me think that the IOVP_SHIFT of this ccio-dma driver should always be 12, whatever PAGE_SHIFT is (a larger PAGE_SHIFT is not possible yet, but for pa8000 and later it could be)?

That said, for the moment IOVP_SHIFT == PAGE_SHIFT, so this cannot be the reason for the following issues.

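To make the IOVP_SHIFT question concrete, here is a minimal sketch, assuming a fixed 4 KB I/O page as in the quoted sentence. The EX_* constants and ex_* helpers are hypothetical names made up for illustration; they are not taken from ccio-dma.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration only (not copied from ccio-dma.c). */
#define EX_IOVP_SHIFT 12u                     /* 4 KB I/O page, per the quoted paper */
#define EX_IOVP_SIZE  (1u << EX_IOVP_SHIFT)

/* Byte offset inside the I/O page: the low 12 bits that must be left alone. */
static inline uint32_t ex_iovp_offset(uint32_t addr)
{
    return addr & (EX_IOVP_SIZE - 1u);
}

/* I/O page number: everything above the low 12 bits. */
static inline uint32_t ex_iovp_number(uint32_t addr)
{
    return addr >> EX_IOVP_SHIFT;
}

/* Number of 4 KB I/O pages covering the byte range [addr, addr + len). */
static inline uint32_t ex_iovp_count(uint32_t addr, uint32_t len)
{
    assert(len > 0);
    uint32_t first = ex_iovp_number(addr);
    uint32_t last  = ex_iovp_number(addr + len - 1u);
    return last - first + 1u;
}
```

Under these assumptions, a page-aligned 16 KB buffer (one CPU page if PAGE_SHIFT were 14) covers four I/O pages, so the two shifts would stop being interchangeable as soon as PAGE_SHIFT grew beyond 12.
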
On my d380, I managed to repeat some stress tests on the scsi ncr53c720 and LASI 53c700 with a simple read/write loop like:

# while true ; do nice -n -3 tar -xspf linux-2.6.11-rc3-pa3.tar ; nice -n -3 rm -rf linux-2.6.11-rc3-pa3 ; date ; done

With a fs built on a disk connected to the 53c710 hba, with or without my bp patch, I unfortunately always get the same errors after a few iterations of the loop:

scsi3: (3:0) phase mismatch at 01e8, phase IO CD MSG BSY REQ MSG IN
scsi3: Bus Reset detected, executing command 10a304e0, slot 10a0864c, dsp 001681e8[01e8]
 failing command because of reset, slot 10a08520, cmnd 10a30720
 failing command because of reset, slot 10a0864c, cmnd 10a304e0
 failing command because of reset, slot 10a08778, cmnd 10a303c0
 failing command because of reset, slot 10a088a4, cmnd 16eacd40
scsi3: (3:0) phase mismatch at 01e8, phase IO CD MSG BSY REQ MSG IN
scsi3: Bus Reset detected, executing command 10a30600, slot 10a088a4, dsp 001681e8[01e8]
 failing command because of reset, slot 10a08520, cmnd 16eacd40
 failing command because of reset, slot 10a0864c, cmnd 10a304e0
 failing command because of reset, slot 10a08778, cmnd 10a30720
 failing command because of reset, slot 10a088a4, cmnd 10a30600
scsi3: (3:0) phase mismatch at 01e8, phase IO CD MSG BSY REQ MSG IN
scsi3: Bus Reset detected, executing command 16eac9e0, slot 10a088a4, dsp 001681e8[01e8]
 failing command because of reset, slot 10a08520, cmnd 16eace60
 failing command because of reset, slot 10a0864c, cmnd 10a30600
 failing command because of reset, slot 10a08778, cmnd 16eacd40
 failing command because of reset, slot 10a088a4, cmnd 16eac9e0
[snip]

(this same disk connected to the same LASI 53c710 of a b180, i.e.
without ccio-dma, could loop for several days without showing any issue)

On a disk attached to the ncr53c720 hba I also get errors:

EXT3-fs error (device dm-0): ext3_free_blocks: Freeing blocks not in datazone - block = 1818455657, count = 1
EXT3-fs error (device dm-0): ext3_free_blocks: Freeing blocks not in datazone - block = 157639797, count = 1
EXT3-fs error (device dm-0): ext3_free_blocks: Freeing blocks not in datazone - block = 1852402748, count = 1
EXT3-fs error (device dm-0): ext3_free_blocks: Freeing blocks not in datazone - block = 1714387061, count = 1
[snip]

With the original ccio-dma driver this occurs after a few iterations of the loop (about 5), but my patch only delays the problem by several hours (not useless work, but not yet enough).

Anyway, the fs is corrupted, and this brings me to the next major issue, with my c110 (using the same ncr53c720, LASI 53c710 and ccio-dma drivers as the d380).

This box had been sleeping for about a year, so I had removed the additional 512MB ram kit for another use and restored the original 64MB of ram, but the internal boot disk stayed unchanged, connected to the ncr53c720 hba. When I tried to reboot it some weeks ago with existing, known-working kernels (from the time the system still had 512MB; e.g. 2.6.8.1-pa7, 2.6.14-pa0, 2.6.19), it obviously started with a fsck, but this always sadly ended in fs corruption too (fsck generating the fs corruption, well, not directly but as a side effect).

Only with the very old debian install kernel 2.4.17-32 did I manage to reboot this system and install the latest 2.6.23-pa.orig and 2.6.23-pa+patch kernels. I could also reboot this box with those latest kernels, but as soon as I launched an 'apt-get dist-upgrade' (after an update, obviously) fs corruption occurred again. I was innocently expecting that after some reboots, fscks and renewed dist-upgrades, I would finally recover a system as operational as my d380.
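
A side observation on the four ext3_free_blocks block numbers quoted above, offered as a sketch and not as a diagnosis. ext3 stores block numbers little-endian on disk, and when those four values are split back into their little-endian bytes they read as ASCII text: "incl", "ude\t", "<lin", "ux/f". That looks like a piece of an #include line from the kernel source being untarred, which would fit file data having landed where ext3 expected block pointers. The ex_* helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store the 4 bytes of v in little-endian order,
 * followed by a NUL terminator, into out. */
static inline void ex_le32_bytes(uint32_t v, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((v >> (8 * i)) & 0xffu);
    out[4] = '\0';
}

/* Hypothetical helper: nonzero if the little-endian bytes of v equal the
 * first 4 bytes of text (text must be at least 4 bytes long). */
static inline int ex_block_is_text(uint32_t v, const char *text)
{
    char bytes[5];
    ex_le32_bytes(v, bytes);
    return memcmp(bytes, text, 4) == 0;
}
```

For example, ex_block_is_text(1818455657u, "incl") is true, because 1818455657 is 0x6C636E69 and its bytes in little-endian order are 0x69 0x6E 0x63 0x6C.
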
But I was wrong, and after 2 or 3 reboots this box was no longer bootable (having lost too many critical files on the root fs :_().

[Sad, sad, sad for me: in 10 years of linux, it's the very first time I have lost a system because of a sw issue :__(]

All this story to say, in summary: a d380 with 256MB of ram works more or less fine (if I don't stress the scsi disk), but a c110 with only 64MB is not usable at all (with either the original or the patched 2.6.23-pa kernel).

I have the feeling that some cache coherency (I/O, mem??) is lacking somewhere, but I don't understand where/what the code is that handles it now, so if you have some time to point it out to me, I would greatly appreciate it.

TIA,
J.
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux