* 2.4.20-pre4-ac1 trashed my system
From: Mike Isely @ 2002-08-29 14:06 UTC
To: Linux Kernel Mailing List; +Cc: Alan Cox
Hi,
I've been a Linux user since 1994. I have always built my own
kernels. I have never trashed a system, until 2 nights ago, when I
ran 2.4.20-pre4-ac1...
Unfortunately what I have is very short on detail, because most of the
"evidence burned in the fire" when the disk's file systems were
destroyed. I do have .deb's of the suspect kernel (I'm a Debian user)
and I think I can recover the kernel .config file from that. Here's
what else I can supply:
o System: Athlon XP 1700+ CPU, built on an Asus A7V266-E mainboard.
o Ram: 512MB
o Disk: 160GB Maxtor IDE (note: >137GB)
o Controller: Promise 20265 rev 02 (as reported by lspci)
o Note: The Promise controller is not the primary controller on the
board. This is a second controller equipped on the board. I
point this out because the 160GB disk was connected to this
Promise controller, not the motherboard's default controller.
o I was using ext3 everywhere at the time things exploded.
o The previous kernel before 2.4.20-pre4-ac1 that I ran was
2.4.19-ac4, which ran OK on this hardware combination.
The first symptom I observed was a directory that listed incorrectly
as a file. It wasn't on my root file system so I unmounted it and
attempted an fsck. At this point I wasn't suspecting the new kernel
(I should have), otherwise I would have backed off to 2.4.19-ac4
first. But I didn't. This file system was about 120GB, most of the
disk; I hadn't noticed any trouble with other file systems yet.
The fsck went through about 60% of the file system cleanly and then
just went nuts reporting / fixing errors. Then fsck gave up,
complaining about something wrong with the journal file. I
reattempted it (second mistake); this time it died right away with the
same error. Then I noticed other processes hanging on the system. I
was unable to shut down so I power-cycled. The reboot panicked after
failing to find the init executable (though it did manage to mount
root).
Going further down this trail of damage, I then tried to boot a rescue
partition (about 200MB) previously set up on the same disk. This was
a partition that was _not_ _mounted_ when 2.4.20-pre4-ac1 was running.
This boot attempt got as far as trying to start things in /etc/init.d
before croaking with a pile of SEGVs. I managed to fsck the rescue
partition but the damage had been done and that partition never worked
right again (i.e. corrupted files).
As far as I can tell now, the entire disk has been scrambled (except
for the partition table, which seems to have survived unscathed).
Also FWIW I did check kernel log message output during this melee and
saw nothing unusual, specifically no errors from the IDE subsystem.
The only things I see about this that might be noteworthy are:
1. I was using the Promise controller for my system disk, not the
board's primary controller.
2. I was using a 160GB drive, which exceeds the 137GB limit of ATA-5
(?). Notably, the initial fsck got ugly about 60% of the way
through, which I _think_ would have put that right near the 137GB
boundary of the disk, given where that particular partition was
set up.
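
As a quick sanity check on the 137GB figure in item 2: a 28-bit LBA can address 2^28 sectors, and with 512-byte sectors that comes to about 137.4GB (decimal). The sketch below only does that arithmetic. The 160GB value is the drive's nominal capacity and is an assumption here, not a measured size.

```python
SECTOR_BYTES = 512             # traditional ATA sector size
LBA28_SECTORS = 2 ** 28        # sectors reachable with a 28-bit LBA

lba28_limit_bytes = LBA28_SECTORS * SECTOR_BYTES

# Assumption: nominal (decimal) capacity of a "160GB" drive.
nominal_drive_bytes = 160 * 10 ** 9

# Portion of the drive that only 48-bit addressing can reach.
beyond_limit_bytes = nominal_drive_bytes - lba28_limit_bytes

print(f"28-bit LBA limit : {lba28_limit_bytes / 1e9:.1f} GB")
print(f"beyond the limit : {beyond_limit_bytes / 1e9:.1f} GB "
      f"({100 * beyond_limit_bytes / nominal_drive_bytes:.1f}% of the drive)")
```

Any request for a sector past that limit needs 48-bit addressing end to end (drive, controller, and driver), which is why a problem that only shows up high on the disk points toward LBA48 handling.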
Is there a possible problem here with huge disk support using the
Promise 20265 controller in 2.4.20-pre4-ac1?
Unfortunately I need that system back so I'm rebuilding it now (and
moving the disk off of the Promise controller out of paranoia).
Sorry...
Is this a known problem with 2.4.20-pre4-ac1? I did note Alan's
statement about using the -ac series for further IDE development and
wonder if perhaps I got caught in the crosshairs.
-Mike
| Mike Isely | PGP fingerprint
POSITIVELY NO | | 03 54 43 4D 75 E5 CC 92
UNSOLICITED JUNK MAIL! | isely @ pobox (dot) com | 71 16 01 E2 B5 F5 C1 E8
| (spam-foiling address) |

* Re: 2.4.20-pre4-ac1 trashed my system
From: Alan Cox @ 2002-08-29 15:32 UTC
To: Mike Isely, andre; +Cc: Linux Kernel Mailing List

The promise 20265 does need special handling for LBA48 I believe. The
code should also be handling it correctly. Cc'd to Andre to investigate
further

* Re: 2.4.20-pre4-ac1 trashed my system
From: Andre Hedrick @ 2002-08-29 17:15 UTC
To: Alan Cox; +Cc: Mike Isely, Linux Kernel Mailing List

That host does have a flag check on the primary channel.
The secondary has been observed and many people have verified the second
channel works okay in 48-bit.

If you have a system which has a 28-bit limited host, and it has been
openly discussed on lkml for many months, why would one not use the
jumpon.exe from Maxtor to prevent such problems?

What I want is details from the last kernel you booted and worked, because
I am positive AC's code does the correct thing. I was one of the first
people to find the 48-bit bomb in that asic during prototype of the large
drive technology.

So please add more details, and regardless this is a semi-development
thread and nobody else has reported this error.

On 29 Aug 2002, Alan Cox wrote:

> The promise 20265 does need special handling for LBA48 I believe. The
> code should also be handling it correctly. Cc'd to Andre to investigate
> further

Cheers,

Andre Hedrick
LAD Storage Consulting Group

* Re: 2.4.20-pre4-ac1 trashed my system
From: Mike Isely @ 2002-08-29 18:02 UTC
To: Andre Hedrick; +Cc: Alan Cox, Linux Kernel Mailing List

On Thu, 29 Aug 2002, Andre Hedrick wrote:

> That host does have a flag check on the primary channel.
> The secondary has been observed and many people have verified the second
> channel works okay in 48-bit.
>
> If you have a system which has a 28-bit limited host, and it has been
> openly discussed on lkml for many months, why would one not use the
> jumpon.exe from maxtor to prevent such problems.

First, I'm new to lkml. I did search the recent archives for
information on this topic, hoping that if it was a new problem it would
have shown up here. Since the trouble for me began with
2.4.20-pre4-ac1, I did not search that far back.

I have never used "jumpon.exe" from Maxtor. I don't even know what it
is (yet, I'm sure I'm going to find out awful quick now...). When I set
up the system, it "just worked" from day 1 with the existing IDE driver
in the 2.4.19-preX series so I had no reason to go looking for issues
like this.

> What I want is details from the last kernel you booted and worked, because
> I am positive AC's code does the correct thing. I was one of the first
> people to find the 48-bit bomb in that asic during prototype of the large
> drive technology.

I don't doubt that it worked at some point. It had to have worked,
otherwise my hardware would never have worked at all at any time. The
fact is that the system was stable for several months; I had installed
a full Debian setup on that hard drive, while attached to that
controller, and dumped tens of GB to it over that time without
incident. The trouble happened when I updated the kernel to
2.4.20-pre4-ac1. The previous kernel I had booted was 2.4.19-ac4,
configured similarly (copied its .config forward to build
2.4.20-pre4-ac1). And before that, I had run 2.4.19-pre10-ac2 without
any IDE problems.

> So please add more details, and regardless this is a semi-development
> thread and nobody else has reported this error.

I'll add more details as I learn them, but right now I must point out:

The same hardware configuration ran 2.4.19-ac4 just fine. The only
change to the system was booting the newer kernel. No hardware changes,
no BIOS updates, nothing else. Whatever went wrong got introduced
somewhere between that version and 2.4.20-pre4-ac1.

Unfortunately as I said originally, all the gory details "burned in the
fire" so I have precious little else to offer.

I will go back further in lkml and get up to speed on what happened back
then with the "48 bit bomb", and I will look into your references about
"28-bit limited host" and jumpon.exe.

-Mike

* Re: 2.4.20-pre4-ac1 trashed my system
From: Mike Isely @ 2002-08-29 19:15 UTC
To: Andre Hedrick; +Cc: Alan Cox, Linux Kernel Mailing List

On Thu Aug 29 13:46:10 2002, Mike Isely wrote:
>
> On Thu, 29 Aug 2002, Andre Hedrick wrote:
>
> > If you have a system which has a 28-bit limited host, and it has been
> > openly discussed on lkml for many months, why would one not use the
> > jumpon.exe from maxtor to prevent such problems.
>
> I have never used "jumpon.exe" from Maxtor. I don't even know what it
> is (yet, I'm sure I'm going to find out awful quick now...). When I set
> up the system, it "just worked" from day 1 with the existing IDE driver
> in the 2.4.19-preX series so I had no reason to go looking for issues
> like this.

I did some digging and I think I can answer these points a little
better now.

If "28-bit limited host" refers to a system BIOS which can't do LBA48,
then I don't think that's a problem here. I've been successfully
booting this system without any special tweaks / fixes (hardware or
software) for quite some time now. The Asus A7V266-E motherboard does
indeed use an Award BIOS, but it's Award version 6.0 dated 2000, not
1999 as in some previous posts about there being trouble booting >32GB
hard drives. Note: Since I was booting from the onboard Promise
controller, the Promise BIOS was in play here too. It's version 2.01.0
build 43, copyright 2001.

I understand now that jumpon.exe is a Maxtor utility to help boot hard
drives >32GB in systems which otherwise can't do this. I never learned
about it before because I've never had this problem. Indeed, the first
OS I put on the hardware was Linux (last December using a 2.4.18 kernel
with additional IDE patches to support LBA48); it didn't even see a
DOS/Windows type boot disk until months later.

So I don't think any of this is an issue.

>
> I'll add more details as I learn them, but right now I must point out:
>
> The same hardware configuration ran 2.4.19-ac4 just fine. The only
> change to the system was booting the newer kernel. No hardware changes,
> no BIOS updates, nothing else. Whatever went wrong got introduced
> somewhere between that version and 2.4.20-pre4-ac1.

I think the above point is extremely important.

>
> I will go back further in lkml and get up to speed on what happened back
> then with the "48 bit bomb", and I will look into your references about
> "28-bit limited host" and jumpon.exe.
>

I've done some more looking through the lkml archives and I found
discussions from March / April about LBA48 problems and the Promise
controller. Clearly from that, exactly how well LBA48 works seems to
depend a lot on whether or not PIO vs DMA vs UltraDMA is being used.
Also it looks like if CONFIG_IDE_TASKFILE_IO is on then things may yet
be different. To those points, I can add these details for my
situation: I believe the driver was in UltraDMA mode at the time and I
had CONFIG_IDE_TASKFILE_IO turned on.

I do understand your response here to my post. I'm making an
extraordinary claim here for something that should just not happen at
all. I understand the doubt. The simple fact however is that I still
have a trashed system, and it happened only after updating the kernel.
I know that's not a lot to go on, and again I apologize for lack of
detail. I originally wasn't going to post to lkml about this; I have
been a quiet Linux user for 8+ years and really felt that a problem of
this severity would probably already have been noticed. I really
didn't want to jump into the fray with this sort of "information".
However several others that I work with (who are closer to the lkml
community than I) really insisted that I post this information, however
incomplete it is. So I did.

If I'm the only one that has hit this - another reason for doubt -
then I guess I have no choice but to dig deeper. I can't really leave
the broken system like this to play with. However I do have a smaller
spare hard drive and I'll make that the new system disk, leaving the
160GB Maxtor attached to the Promise controller (with nothing valuable
on it). I should be able to replicate the corruption and provide more
information here, hopefully while still having a usable system.

-Mike

* Re: 2.4.20-pre4-ac1 trashed my system
From: Alan Cox @ 2002-08-29 19:28 UTC
To: Mike Isely; +Cc: Andre Hedrick, Linux Kernel Mailing List

On Thu, 2002-08-29 at 20:15, Mike Isely wrote:
> I've done some more looking through the lkml archives and I found
> discussions from March / April about LBA48 problems and the Promise
> controller. Clearly from that, exactly how well LBA48 works seems to

That was when the original work got done if I remember rightly

> depend a lot on whether or not PIO vs DMA vs UltraDMA is being used.
> Also it looks like if CONFIG_IDE_TASKFILE_IO is on then things may yet
> be different. To those points, I can add these details for my
> situation: I believe the driver was in UltraDMA mode at the time and I
> had CONFIG_IDE_TASKFILE_IO turned on.

PIO LBA48 seems to work on all promise
Early promise needs a helping hand with DMA LBA48, one promise doesn't
seem to do DMA LBA48 on secondary at all, and newer stuff gets it right.

> all. I understand the doubt. The simple fact however is that I still
> have a trashed system, and it happened only after updating the kernel.
> I know that's not a lot to go on, and again I apologize for lack of
> detail. I originally wasn't going to post to lkml about this; I have
> been a quiet Linux user for 8+ years and really felt that a problem of
> this severity would probably already have been noticed. I really

You've actually provided pretty much all the key information. The things
that matter are:

The file system was known good, passed fsck before you ran the recent
kernel
The file system wasn't good after this
The problem is replicatable

And what controller/drives which you've provided.

> If I'm the only one that has hit this - another reason for doubt -
> then I guess have no choice but to dig deeper. I can't really leave
> the broken system like this to play with. However I do have a smaller
> spare hard drive and I'll make that the new system disk, leaving the
> 160GB Maxtor attached to the Promise controller (with nothing valuable
> on it). I should be able to replicate the corruption and provide more
> information here, hopefully while still having a usable system.

If you can replicate it and find out where the problem begins that would
be wonderful in itself.

* Re: 2.4.20-pre4-ac1 trashed my system
From: Mike Isely @ 2002-08-29 19:32 UTC
To: Alan Cox; +Cc: Andre Hedrick, Linux Kernel Mailing List

On 29 Aug 2002, Alan Cox wrote:

>
> PIO LBA48 seems to work on all promise
> Early promise needs a helping hand with DMA LBA48, one promise doesnt
> seem to do DMA LBA48 on secondary at all, and newer stuff gets it right.
>
>
> And what controller/drives which you've provided.

Another detail: The drive was on the primary cable, configured as
master. It came up as /dev/hde (because hd[a-d] was for the
motherboard's "native" controller).

>
> If you can replicate it and find out where the problem begins that would
> be wonderful in itself.
>

I'll do what I can. I never have enough time. However I've benefitted
from this excellent OS for too long; I should be doing more in return.

-Mike

* Re: 2.4.20-pre4-ac1 trashed my system
From: Mike Isely @ 2002-08-30 7:07 UTC
To: Alan Cox; +Cc: Andre Hedrick, Linux Kernel Mailing List

OK, I have some good news and some bad news.

The bad news is that I replicated the corruption.

The good news is that I replicated the corruption. Oh, and I can
cause it on demand, and not lose my system in the process. I can
provide LOTS and LOTS of details now. What do you want to know?

Some additional background: The 160GB Maxtor has a number of file
systems on it. Here's the fdisk -l output:

Disk /dev/hde: 255 heads, 63 sectors, 19929 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End     Blocks   Id  System
/dev/hde1   *         1       912    7325608+   c  Win95 FAT32 (LBA)
/dev/hde2           913     19929  152754052+   5  Extended
/dev/hde5           913       936     192748+  83  Linux
/dev/hde6           937       985     393561   83  Linux
/dev/hde7           986      1058     586341   82  Linux swap
/dev/hde8          1059      1423    2931831   83  Linux
/dev/hde9          1424     19929  148649413+  83  Linux

The file system that started all the fireworks was the big one at the
end, hde9. The rescue partition that booted up corrupted afterwards
was hde6. The toasted root partition was hde8.

Here's what I did:

1. I pulled a spare hard drive (80GB Maxtor) and installed it in the
   system as hda (primary controller, primary channel, master).

2. I put a Debian installation there. Updated the kernel to
   2.4.19-ac4.

3. With a stable system on the spare drive, I moved the 160GB Maxtor
   to be hdc (primary controller, secondary channel, master).

4. Using an alternate superblock I managed to fsck the fsck'ed up file
   systems on the 160GB drive while running as hdc, while booted under
   2.4.19-ac4.

5. I then ran additional fsck passes on the 160GB drive, checking all
   partitions. Just for paranoia's sake. All now passed clean.

6. I shut down the system, moved the 160GB drive to be hde (Promise
   controller, primary channel, master), and rebooted.

7. I ran the fsck passes again on the drive. Note: This is still
   under 2.4.19-ac4, but using the Promise controller. All passed,
   squeaky clean. So under 2.4.19-ac4 there's no problem.

8. I rebooted the system to 2.4.20-pre4-ac1 and fsck'ed the big
   partition again. Splat. Some time after 50% done it reported an
   error. Unlike the initial carnage, I wasn't an idiot and didn't
   use the -y fsck option this time, so it stopped after the first
   error and since I'm not writing to the drive, the contents
   hopefully should still be OK.

I've already rebooted again and repeated the last step. I should be
able to repeat this experiment as often as needed. Clearly there's
something wrong in 2.4.20-pre4-ac1 that wasn't wrong in 2.4.19-ac4
that is impacting my setup.

Some additional datapoints:

1. During bootup of 2.4.20-pre4-ac1, I found the following message in
   the kernel log, not previously seen:

   > hde: Maxtor 4G160J8, ATA DISK drive
   > ULTRA 66/100/133: Primary channel of Ultra 66/100/133 requires an 80-pin cable for Ultra66 operation.
   > Switching to Ultra33 mode.
   > Warning: Primary channel requires an 80-pin cable for operation.
   > hde reduced to Ultra33 mode.

   What makes this notable is that there is indeed an 80 pin cable
   connecting the 160GB drive to that controller. I hadn't noticed
   this message in 2.4.19-ac4, but honestly I didn't directly look for
   it yet. I'll check that.

2. I did something else that night that may have been less than
   smart. I remembered it tonight and repeated the experiment. I
   tried to read-only mount hde9 while the fsck was running. When
   this happens, the fsck process gets a short read and complains.
   Obviously that's going to mess up fsck. However that little
   shenanigan is not needed to screw things up. Tonight I ran step 8
   (above) twice. The first time was after restarting fsck, after
   fsck had failed on account of my trying to ro-mount the file
   system. The second time - after rebooting - I still got the fsck
   failure some time after 50% completion, without having to try to
   mount anything.

I've got a system here that I can foul-up on demand now. What would
you like me to do?

-Mike
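
The fdisk listing in the message above makes it possible to estimate where the 28-bit LBA boundary (2^28 sectors) falls on this disk. The following is a rough sketch using only the cylinder numbers from that listing. It works in whole cylinders and ignores the small offset at the start of each logical partition, so the result is approximate; the helper names are made up for illustration.

```python
SECTORS_PER_CYL = 255 * 63     # 16065, from the fdisk header
LBA28_SECTORS = 2 ** 28        # first sector that needs 48-bit addressing

# (start cylinder, end cylinder), 1-based, copied from the fdisk output.
partitions = {
    "hde1": (1, 912),
    "hde5": (913, 936),
    "hde6": (937, 985),
    "hde7": (986, 1058),
    "hde8": (1059, 1423),
    "hde9": (1424, 19929),
}

def cyl_start_sector(cyl):
    """Absolute sector number at which a 1-based cylinder begins."""
    return (cyl - 1) * SECTORS_PER_CYL

def crossing_fraction(start_cyl, end_cyl):
    """Fraction of the partition lying below the LBA28 boundary,
    or None if the whole partition is below it."""
    first = cyl_start_sector(start_cyl)
    end = cyl_start_sector(end_cyl + 1)   # one past the last sector
    if end <= LBA28_SECTORS:
        return None
    return max(0.0, (LBA28_SECTORS - first) / (end - first))

for name, (start, end) in partitions.items():
    frac = crossing_fraction(start, end)
    if frac is None:
        print(f"{name}: entirely below the 28-bit limit")
    else:
        print(f"{name}: crosses the limit about {100 * frac:.0f}% of the way in")
```

By this estimate only hde9 extends past the boundary, at roughly 83% of the partition by disk offset. fsck's progress percentage is not a linear measure of disk offset, so this neither confirms nor rules out the "some time after 50%" observation.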

* Re: 2.4.20-pre4-ac1 trashed my system
From: Mike Isely @ 2002-08-31 5:04 UTC
To: Alan Cox; +Cc: Andre Hedrick, Linux Kernel Mailing List

> OK, I have some good news and some bad news.
>
> The bad news is that I replicated the corruption.
>
> The good news is that I replicated the corruption. Oh, and I can
> cause it on demand, and not lose my system in the process. I can
> provide LOTS and LOTS of details now. What do you want to know?
>
> [...]

I've done some more tests and have more information now. No smoking
gun yet, but a few more clues.

1. I moved the 160GB drive away from the Promise controller and
   reattached it to the motherboard chipset's controller ("VIA
   Technologies, Inc. Bus Master IDE (rev 06)", by the way according
   to lspci). Then I booted 2.4.20-pre4-ac1 (the "bad" kernel) and
   fsck'ed the big partition again. It passed. Then I moved the
   drive back to the Promise controller, booted the same OS and
   fsck'ed again. Failure.

2. I booted 2.4.19-ac4 with the 160GB drive attached to the Promise
   controller and watched the kernel log output. There's no message
   about any missing 80 pin cable. This is different than
   2.4.20-pre4-ac1 which complains that I allegedly don't have an 80
   pin cable plugged. However the cable is there but the driver
   downshifts the interface to 33MHz anyway. I described this
   observation before and now today I noticed another poster on the
   lkml bringing up the same issue with his Promise 20269 controller
   (but in -pre5-ac1 instead - look for subject "2.4.20-pre5-ac1
   PDC20269 80-pin acble misdetection" [sic]).

3. Still looking for the low-hanging fruit, I extracted lots of other
   info from the system. I grabbed fdisk -l output, dmesg output, the
   kernel source .config file and a bunch of stuff out of /proc/ide,
   once apiece for each kernel version (while the 160GB drive remained
   on the Promise controller). I then diff'ed it all. I have all
   this saved, but in the spirit of not wasting more bandwidth, I am
   not including the raw data here. However here's a summary of the
   differences I found:

   o Lots of dmesg differences, but nothing I saw really relevant
     beyond the thing about the 80 pin cable.

   o fdisk -l output was unchanged between the kernel versions, so I
     guess at least disk geometry hasn't been messed up.

   o hdparm output is different between the kernel versions. This
     should not be a big surprise since the 2.4.20-pre4-ac1 driver is
     downshifting the bus speed. hdparm -i (and -I) reports udma2 for
     the suspect kernel while I get udma5 for the stable kernel. I
     did see one other alarming(?) change however; hdparm -I is
     reporting different configurations:

     2.4.19-ac4:
       Configuration:
         Logical         max     current
         cylinders       16383   65535
         heads           16      1
         sectors/track   63      63
         bytes/track:    0       (obsolete)
         bytes/sector:   0       (obsolete)
         current sector capacity: 4128705
         LBA user addressable sectors = 268435455

     2.4.20-pre4-ac1:
       Configuration:
         Logical         max     current
         cylinders       16383   16383
         heads           16      16
         sectors/track   63      63
         bytes/track:    0       (obsolete)
         bytes/sector:   0       (obsolete)
         current sector capacity: 16514064
         LBA user addressable sectors = 268435455

     Note the different sector capacity, cylinder counts, and head
     counts. And yes, the entry reporting the _larger_ capacity is
     the suspect kernel (double-checked). Is this significant?

   o Timings (hdparm -t -T output) are also different. The "bad"
     kernel (2.4.20-pre4-ac1) is only getting 30MB/sec off the device
     while 2.4.19-ac4 is reading 35MB/sec. Not exactly a fantastic
     difference, but 35MB/sec exceeds UDMA33 rate so that would
     suggest that 2.4.19-ac4 really is running the Promise controller
     at something better than udma2.

   o Output from /proc/ide/pdc202xx is identical between the kernels.

   o There are differences in the files in /proc/ide/ide2/hde/*
     between the kernels but the differences are too cryptic for me to
     decipher in any meaningful way (but if you want the data, ask).

   o The two kernel source .config files have more differences than I
     expected. Notably, I see new CONFIG_PDC202XX_* options that
     weren't there before. CONFIG_BLK_DEV_PDC202XX has _OLD and
     _NEW variants now (both are set). Also CONFIG_PDC202XX_FORCE is
     new (and not set). And CONFIG_PDC202XX_BURST was previously set
     but for some unexplained reason I have it not set in the "bad"
     kernel. For the record, here are the currently enabled
     CONFIG_IDE* settings (same for both kernels):

     CONFIG_IDE=y
     CONFIG_IDEDISK_MULTI_MODE=y
     CONFIG_IDEDISK_STROKE=y
     CONFIG_IDEDMA_AUTO=y
     CONFIG_IDEDMA_ONLYDISK=y
     CONFIG_IDEDMA_PCI_AUTO=y
     CONFIG_IDEPCI_SHARE_IRQ=y
     CONFIG_IDE_CHIPSETS=y
     CONFIG_IDE_TASKFILE_IO=y
     CONFIG_IDE_TASK_IOCTL=y

I'll build another 2.4.20-pre4-ac1 instance with CONFIG_PDC202XX_BURST
turned on and see if that makes a difference. Any advice on the
...PDC202XX_OLD vs ...PDC202XX_NEW settings? Turn one of them off?
What's the difference? (Don't answer that last one; I haven't checked
the Configure help yet for it.)

Another thing I can try is to force the driver to downshift to udma2
in 2.4.19-ac4 and see if then the problem appears there.

I can also build a new kernel from the newest sources and see if
the problem still exists.

Is there anything else I should try? Advice on a better direction?
Should I sit down and shut up already? Are you all still reading this
far down the message?

-Mike
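
The two hdparm -I listings in the message above can be cross-checked with simple arithmetic. In both, the "current sector capacity" equals the product of the current cylinders, heads and sectors/track, and the LBA sector count equals the 28-bit maximum (2^28 - 1). The sketch below only verifies those relationships from the quoted numbers; it does not explain why the two kernels report different current geometries, and the function name is just illustrative.

```python
def chs_capacity(cylinders, heads, sectors_per_track):
    """Sector count implied by a cylinder/head/sector geometry."""
    return cylinders * heads * sectors_per_track

# "current" columns and capacities as quoted from hdparm -I.
reported = {
    "2.4.19-ac4":      {"chs": (65535, 1, 63),  "capacity": 4128705},
    "2.4.20-pre4-ac1": {"chs": (16383, 16, 63), "capacity": 16514064},
}
LBA_SECTORS_REPORTED = 268435455   # same under both kernels

for kernel, info in reported.items():
    computed = chs_capacity(*info["chs"])
    match = "matches" if computed == info["capacity"] else "differs from"
    print(f"{kernel}: C*H*S = {computed}, {match} reported capacity")

# The reported LBA count is exactly the largest 28-bit sector count.
print("LBA count is 2**28 - 1:", LBA_SECTORS_REPORTED == 2 ** 28 - 1)
```

Both CHS capacities are far smaller than the LBA count, so neither describes the real size of the drive; the identical LBA figure under both kernels is the 28-bit ceiling rather than the full capacity of a 160GB disk.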

* Re: 2.4.20-pre4-ac1 trashed my system
From: Andre Hedrick @ 2002-08-31 5:57 UTC
To: Mike Isely; +Cc: Alan Cox, Linux Kernel Mailing List

Your data is not trashed.
Linux failed to understand cut off partitions.

When you said you put it on primary channel, I realized that you have a
system that breaks the rules of Promise and I am not sure.

This will make it more painful to parse systems which can 48-bit and those
which can not.

This is not going to be fun.

grep "hwif->addressing" pdc202xx.c

Stub out the three lines.

Recompile and reboot, it will be fixed

Andre Hedrick
LAD Storage Consulting Group

On Sat, 31 Aug 2002, Mike Isely wrote:

> [...]

* Re: 2.4.20-pre4-ac1 trashed my system
2002-08-31 5:57 ` Andre Hedrick
@ 2002-08-31 6:07 ` Mike Isely
2002-08-31 6:24 ` Andre Hedrick
2002-09-01 2:59 ` Mike Isely
1 sibling, 1 reply; 35+ messages in thread
From: Mike Isely @ 2002-08-31 6:07 UTC (permalink / raw)
To: Andre Hedrick; +Cc: Alan Cox, Linux Kernel Mailing List

On Fri, 30 Aug 2002, Andre Hedrick wrote:

>
> Your data is not trashed.

Well actually it was. After the driver read bad data from the disk
(presumably mis-addressed) my knee-jerk reaction was to run e2fsck -y to
"fix" it. And _that_ trashed the data.

> Linux failed to understand cut off partitions.

???

> When you said you put it on primary channel, I realized that you have a
> system that breaks the rules of Promise and I am not sure.

What are the "rules of Promise" or where may I find such information?

> This will make it more painful to parse systems which can 48-bit and those
> which can not.
>
> This is not going to be fun.

But this wasn't a problem in 2.4.19-ac4; what confounding factor now is
making it difficult?

>
> grep "hwif->addressing" pdc202xx.c
>
> Stub out the three lines.
>
> Recompile and reboot, it will be fixed

Will do. Thanks. If you have a more permanent fix you'd like me to
test, let me know.

-Mike

                        |         Mike Isely          |     PGP fingerprint
 POSITIVELY NO          |                             | 03 54 43 4D 75 E5 CC 92
 UNSOLICITED JUNK MAIL! |   isely @ pobox (dot) com   | 71 16 01 E2 B5 F5 C1 E8
                        |   (spam-foiling address)    |

* Re: 2.4.20-pre4-ac1 trashed my system
2002-08-31 6:07 ` Mike Isely
@ 2002-08-31 6:24 ` Andre Hedrick
2002-08-31 6:57 ` Mike Isely
` (3 more replies)
0 siblings, 4 replies; 35+ messages in thread
From: Andre Hedrick @ 2002-08-31 6:24 UTC (permalink / raw)
To: Mike Isely; +Cc: Alan Cox, Linux Kernel Mailing List

On Sat, 31 Aug 2002, Mike Isely wrote:

> On Fri, 30 Aug 2002, Andre Hedrick wrote:
>
> >
> > Your data is not trashed.
>
> Well actually it was. After the driver read bad data from the disk
> (presumably mis-addressed) my knee-jerk reaction was to run e2fsck -y to
> "fix" it. And _that_ trashed the data.

Okay that sounds more like it. The driver did not damage the data, only
user space forced down the driver trashed it. Regardless of the
definition of "is" your system was wrecked.

>
> > Linux failed to understand cut off partitions.
>
> ???

This was a great concern of mine when 48-bit was introduced.

>
> > When you said you put it on primary channel, I realized that you have a
> > system that breaks the rules of Promise and I am not sure.
>
> What are the "rules of Promise" or where may I find such information?

You do not want to sign the NDA's to get the data sheets, acquire all the
hardware to test, generate tables of irregularities, query Promise, and
then scratch your head why.

I have a FastTrak 100 TX4 the BIOS fails to see beyond 128GB, but in
practice it does.

The PDC20267 will puke in 48-bit DMA, but run clean in 48-bit PIO :-/
Oh but that is the primary channel, Secondary Channel is clean both ways :-\

PDC20262 works in 48-bit DMA every where.

PDC20265 similar to PDC20267 except yours.

Rules are empirical tests and rants back at the OEM, and ....

>
> > This will make it more painful to parse systems which can 48-bit and those
> > which can not.
> >
> > This is not going to be fun.
>
> But this wasn't a problem in 2.4.19-ac4; what confounding factor now is
> making it difficult?

Cause there were reports of PDC20265/PDC20267 coming in as deadlocking.
Thanks for the wrinkle in the fabric of ruleless world. :-)

> >
> > grep "hwif->addressing" pdc202xx.c
> >
> > Stub out the three lines.
> >
> > Recompile and reboot, it will be fixed
>
> Will do. Thanks. If you have a more permanent fix you'd like me to
> test, let me know.

Oh another dang piece of the puzzle found and it does not fit anywhere!

Cheers,

Andre Hedrick
LAD Storage Consulting Group

* Re: 2.4.20-pre4-ac1 trashed my system
2002-08-31 6:24 ` Andre Hedrick
@ 2002-08-31 6:57 ` Mike Isely
2002-09-01 5:15 ` Mike Isely
` (2 subsequent siblings)
3 siblings, 0 replies; 35+ messages in thread
From: Mike Isely @ 2002-08-31 6:57 UTC (permalink / raw)
To: Andre Hedrick; +Cc: Alan Cox, Linux Kernel Mailing List

On Fri, 30 Aug 2002, Andre Hedrick wrote:

> On Sat, 31 Aug 2002, Mike Isely wrote:
>
> > On Fri, 30 Aug 2002, Andre Hedrick wrote:
> >
>
> Okay that sounds more like it. The driver did not damage the data, only
> user space forced down the driver trashed it. Regardless of the
> definition of "is" your system was wrecked.

No permanent harm. It was a workstation, and most of the 160GB drive
was being used primarily as a backup device for a separate file server
machine. Obviously I'd like to get that "backup device" up and running
again.

>
> >
> > > Linux failed to understand cut off partitions.
> >
> > ???
>
> This was a great concern of mine when 48-bit was introduced.

Ah, a riddle answered with another riddle. I know what 48 bit
addressing is; I'm just curious to understand why my system seems to
have run afoul of it, especially since things were ok before. (but read
on...)

> > What are the "rules of Promise" or where may I find such information?
>
> You do not want to sign the NDA's to get the data sheets, acquire all the
> hardware to test, generate tables of irregularities, query Promise, and
> then scratch your head why.

OK, Uncle! I detect a lot of pain here and perhaps I'm exacerbating it
by asking. The technical side of me just wants to understand. I write
code for a living and have had my share of pain with crappy hardware
(though nothing even close to the scale at which you are working). I
hate I2C, by the way, and don't ever ask me about the P.O.S. Philips
pcf8584.

>
> I have a FastTrak 100 TX4 the BIOS fails to see beyond 128GB, but in
> practice it does.
>
> The PDC20267 will puke in 48-bit DMA, but run clean in 48-bit PIO :-/
> Oh but that is the primary channel, Secondary Channel is clean both ways :-\

Oh goodie. This can't be by design, but rather by stupid
implementation. But I'll stop now before aggravating your ulcer :-)

>
> PDC20262 works in 48-bit DMA every where.
>
> PDC20265 similar to PDC20267 except yours.

But I'd still like to understand why my PDC20265 seems unique. Earlier
hardware rev? Later hardware rev? Promise BIOS issue? The Asus
A7V266-E motherboard was purchased December 2001. If it's any help,
I'm staring at the chip on the board now. The label shows:

    PROMISE (R)
    TECHNOLOGY INC.
    PDC20265R
    (C) 2000-0113

Maybe there is another cleaner way to go at this problem.

>
> Rules are empirical tests and rants back at the OEM, and ....
>

Sounds to me like you need a vacation ;-)

> >
> > But this wasn't a problem in 2.4.19-ac4; what confounding factor now is
> > making it difficult?
>
> Cause there were reports of PDC20265/PDC20267 coming in as deadlocking.
> Thanks for the wrinkle in the fabric of ruleless world. :-)
>

You're welcome :-)

-Mike

                        |         Mike Isely          |     PGP fingerprint
 POSITIVELY NO          |                             | 03 54 43 4D 75 E5 CC 92
 UNSOLICITED JUNK MAIL! |   isely @ pobox (dot) com   | 71 16 01 E2 B5 F5 C1 E8
                        |   (spam-foiling address)    |

* Re: 2.4.20-pre4-ac1 trashed my system
2002-08-31 6:24 ` Andre Hedrick
2002-08-31 6:57 ` Mike Isely
@ 2002-09-01 5:15 ` Mike Isely
2002-09-02 8:19 ` Joachim Breuer
2002-09-05 6:05 ` Mike Isely
2002-09-02 8:16 ` Joachim Breuer
2002-09-03 12:41 ` mbs
3 siblings, 2 replies; 35+ messages in thread
From: Mike Isely @ 2002-09-01 5:15 UTC (permalink / raw)
To: Andre Hedrick; +Cc: Alan Cox, Linux Kernel Mailing List

Another update and more information on the "Linux 2.4.20-pre4-ac1 ate
my system" problem...

Question: I am new to this mailing list; should I keep copying these
messages to lkml or should I just pester Andre and/or Alan privately
now?

I've been studying pdc202xx.c and anything else in the other IDE
driver source files which reference any identifier named "addressing".
I think I understand the picture better now.

The addressing field of ide_drive_t describes how the drive is to be
addressed. Indeed, ide.h defines these values: 0 = 28 bit, 1 = 48 bit,
2 = 48 bit doing 28 bit (a hack?) and 3 = 64 bit. Also it appears that
idedisk_add_settings() defines the "address" attribute viewable from
/proc/ide/<host>/<device>/settings which shows the current value of
this field.

Notably, also in pdc202xx.c we find two places where drive->addressing
is used, once each in pdc202xx_old_ide_dma_begin() and
pdc202xx_old_ide_dma_end(). If addressing is 1 (48 bit), then extra
logic is inserted here to manipulate the hardware. I presume this is
Alan's Promise controller LBA48 fix...

In ide-disk.c we find the function probe_lba_addressing() which
appears to be the only real place where this addressing field gets set
(I think it can also be set somehow through /proc/ide/whatever but I
doubt that's a path I should be concerned about). There's also
set_lba_addressing() but it just does nothing but pass through to
probe_lba_addressing().
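
To make the addressing values above concrete, here is a minimal sketch
in plain C (not kernel code). Only the numeric values 0 to 3 come from
the description above; the enum names, the helper functions, and the
512-byte sector size are assumptions made for illustration. It shows
why a 160GB drive cannot be fully reached with 28 bit addressing.

```c
#include <stdint.h>

/* Addressing modes as listed above. The numeric values are from the
 * message; the identifier names are hypothetical, not the kernel's. */
enum addr_mode_sketch {
    ADDR_28BIT       = 0,
    ADDR_48BIT       = 1,
    ADDR_48BIT_AS_28 = 2,
    ADDR_64BIT       = 3
};

/* Assumption: conventional 512-byte ATA sectors. */
#define SKETCH_SECTOR_BYTES      512ULL
/* A 28 bit sector address can name at most 2^28 sectors. */
#define SKETCH_LBA28_MAX_SECTORS (1ULL << 28)

/* Number of bytes reachable with 28 bit sector addressing. */
uint64_t lba28_limit_bytes(void)
{
    return SKETCH_LBA28_MAX_SECTORS * SKETCH_SECTOR_BYTES;
}

/* Nonzero if a drive with this many sectors has sectors that a
 * 28 bit address cannot name, i.e. it needs 48 bit addressing. */
int needs_48bit(uint64_t total_sectors)
{
    return total_sectors > SKETCH_LBA28_MAX_SECTORS;
}
```

Under these assumptions the 28 bit limit works out to 137,438,953,472
bytes, which is the ">137GB" boundary mentioned for the 160GB drive; a
160GB drive (roughly 312,500,000 sectors) is over it, a 40GB drive is
not.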

Further down in ide-disk.c we find that idedisk_setup() calls
probe_lba_addressing() with hardcoded arguments such that it attempts
to set the addressing mode to 1 (48 bit addressing). So far so good.

However, back in probe_lba_addressing() there is an interesting thing
going on. It first initializes addressing to 0 (28 bit), and then
after a few checks sets it to the requested value (second argument of
the function). One of those checks appears to be a check of another
"addressing" field which is a member of the ide_hwif_t structure. If
_this_ addressing field is non-zero, then the rest of
probe_lba_addressing() is aborted - thus addressing gets forced to
zero (28 bit). It seems that if ide_hwif_t::addressing is non-zero,
we force 28 bit addressing no matter what.

Here I should point out the contents of /proc/ide/ide2/hde/settings.
When my system is running under 2.4.19-ac4 (the "good" kernel), the
"address" field is 1 - which makes sense. I have a 160GB drive as hde
so naturally it should be addressed LBA48 style. However, that same
field in that same file is 0 while running under 2.4.20-pre4-ac1 (the
"bad" kernel, and 2.4.20-pre5-ac1 - I checked). So this would suggest
that my drive is being treated with 28 bit addressing, which would go
a long ways towards explaining why I'm getting the corruption.

So why is addressing being set to 0? Yesterday Andre described a
work-around I should try. Edit pdc202xx.c and stub out the line
matching "hwif->addressing". That occurs in init_hwif_pdc202xx()
inside a switch statement based on chip id. What I found was:

    hwif->addressing = (hwif->channel) ? 0 : 1

in the case for PCI_DEVICE_ID_PROMISE_20265. Well that makes sense.
I'm on the primary channel, thus the condition is false and
hwif->addressing gets a value of 1. That kills probe_lba_addressing()
and I'm stuck at 28 bits. What's more, this line is not in the driver
that's part of 2.4.19-ac4 (but it's still there for the 20267 as Andre
points out).
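
The interaction described above can be modelled roughly as follows.
This is a simplified sketch, not the kernel source: the struct and
function names ending in _sketch are hypothetical stand-ins for
ide_hwif_t, ide_drive_t, init_hwif_pdc202xx() and
probe_lba_addressing(), and the drive capability flag is an assumed
placeholder for the "few checks" mentioned in the text. Only the
quoted assignment and the "non-zero host flag forces 28 bit" behaviour
are taken from the description.

```c
/* Hypothetical, reduced stand-in for ide_hwif_t. */
struct hwif_sketch {
    int channel;     /* 0 = primary, 1 = secondary */
    int addressing;  /* non-zero: host refuses 48 bit addressing */
};

/* Hypothetical, reduced stand-in for ide_drive_t. */
struct drive_sketch {
    int addressing;      /* 0 = 28 bit, 1 = 48 bit */
    int supports_48bit;  /* assumed placeholder for the other checks */
};

/* Models the line quoted above for the 20265 case: the primary
 * channel (channel 0) ends up with hwif->addressing == 1. */
void init_hwif_sketch(struct hwif_sketch *hwif)
{
    hwif->addressing = (hwif->channel) ? 0 : 1;
}

/* Models the described behaviour of probe_lba_addressing():
 * start at 28 bit, bail out early if the host flag is set, and
 * only otherwise adopt the requested mode. Returns the final mode. */
int probe_lba_addressing_sketch(const struct hwif_sketch *hwif,
                                struct drive_sketch *drive,
                                int requested)
{
    drive->addressing = 0;          /* default: 28 bit */
    if (hwif->addressing)           /* host vetoes 48 bit */
        return 0;
    if (!drive->supports_48bit)     /* assumed capability check */
        return 0;
    drive->addressing = requested;  /* normally 1 (48 bit) */
    return drive->addressing;
}
```

With this model, a 48 bit capable drive on the primary channel stays
at addressing 0 while the same drive on the secondary channel gets 1,
which matches the "address" value of 0 seen in
/proc/ide/ide2/hde/settings under the newer kernels.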

So the advice to comment out that line makes sense. I killed the
line, built a new kernel and tried again. Result? DMA is completely
fubared. As soon as the new kernel tries to read hde's partition
table, a flurry of DMA timeouts take place and the system either hangs
or falls back to PIO mode (had both cases happen). I have no idea why
this is happening. Ideas?

In addition to commenting out the line, I also tried forcing
hwif->addressing to 1 (no effect) and 0 (fubared DMA again). That
result of course makes sense.

Another thing I did was to force on Alan's LBA48 fix code in
pdc202xx.c (by removing the LBA48 check). This had no effect (still
got disk corruption). I don't think Alan's code has anything to do
with the root cause here.

Additional things I am trying right now:

1. Try the same experiment with 2.4.20-pre5-ac1, i.e. kill the
   hwif->addressing setting in pdc202xx.c and see if it works OK
   there.

2. Turn off taskfile mode (CONFIG_IDE_TASKFILE_IO), comment out the
   line and see if DMA still works now.

In summary, it seems that we want ide_drive_t::addressing to be 1, but
it's currently 0. It was 1 in 2.4.19-ac4 (where things worked). It's
zero because of a piece of duct tape in pdc202xx.c where it recognizes
the 20265 chip and prevents the driver from enabling 48 bit mode on
the primary controller. However if I remove that duct tape and
presumably let the addressing field go to 1, then all dma to that
device times out.

I want to solve this problem. I know I'm probably being an annoying
pest by now, but I'm willing to try things, not just sit here and
scream "please fix this or I'll hold my breath until I turn purple".
I know very little unfortunately about IDE standards, but I can learn
quickly. Use me to help find the cause. I'm here.

What would you like me to try? Is there anything I can do? Am I
missing something blindingly obvious? It's happened before :-) Am I
asking too many questions? :-)

-Mike

                        |         Mike Isely          |     PGP fingerprint
 POSITIVELY NO          |                             | 03 54 43 4D 75 E5 CC 92
 UNSOLICITED JUNK MAIL! |   isely @ pobox (dot) com   | 71 16 01 E2 B5 F5 C1 E8
                        |   (spam-foiling address)    |

* Re: 2.4.20-pre4-ac1 trashed my system
2002-09-01 5:15 ` Mike Isely
@ 2002-09-02 8:19 ` Joachim Breuer
2002-09-05 6:05 ` Mike Isely
1 sibling, 0 replies; 35+ messages in thread
From: Joachim Breuer @ 2002-09-02 8:19 UTC (permalink / raw)
To: Mike Isely; +Cc: Linux Kernel Mailing List

Mike Isely <isely@pobox.com> writes:

> Another update and more information on the "Linux 2.4.20-pre4-ac1 ate
> my system" problem...
>
> Question: I am new to this mailing list; should I keep copying these
> messages to lkml or should I just pester Andre and/or Alan privately
> now?

PLEASE do continue to copy to the list; you got some discussion on
that issue started and there are more people out here who want to
figure out what's going on with Linux IDE support... PLEASE!

The thread so far was invaluable to me, I couldn't find similar
appropriate infos in the archives... that should have changed now ;-)

Thanks!

So long,
   Joe

--
"I use emacs, which might be thought of as a thermonuclear
 word processor."
-- Neal Stephenson, "In the beginning... was the command line"

* Re: 2.4.20-pre4-ac1 trashed my system
2002-09-01 5:15 ` Mike Isely
2002-09-02 8:19 ` Joachim Breuer
@ 2002-09-05 6:05 ` Mike Isely
1 sibling, 0 replies; 35+ messages in thread
From: Mike Isely @ 2002-09-05 6:05 UTC (permalink / raw)
To: Andre Hedrick; +Cc: Alan Cox, Linux Kernel Mailing List

Final update on this thread.

1. I fixed the busted DMA. Full explanation of the bug and the
   associated 2-line patch can be found in a separate message with
   subject "[PATCH] 2.4.20-pre5-ac2: Promise Controller LBA48 DMA
   fixed".

2. The problem I saw with CDROM detection in 2.4.20-pre4-ac1 was
   PEBCAK. I had the cable backwards (host connector in the drive,
   master connector in the controller). 2.4.19-ac4 doesn't seem to be
   sensitive to this.

3. Still no idea about the broken 80 pin cable detection - yes, I
   double-checked the cable orientation :-)

-Mike

                        |         Mike Isely          |     PGP fingerprint
 POSITIVELY NO          |                             | 03 54 43 4D 75 E5 CC 92
 UNSOLICITED JUNK MAIL! |   isely @ pobox (dot) com   | 71 16 01 E2 B5 F5 C1 E8
                        |   (spam-foiling address)    |

* Re: 2.4.20-pre4-ac1 trashed my system
2002-08-31 6:24 ` Andre Hedrick
2002-08-31 6:57 ` Mike Isely
2002-09-01 5:15 ` Mike Isely
@ 2002-09-02 8:16 ` Joachim Breuer
2002-09-03 12:41 ` mbs
3 siblings, 0 replies; 35+ messages in thread
From: Joachim Breuer @ 2002-09-02 8:16 UTC (permalink / raw)
To: Andre Hedrick; +Cc: Mike Isely, Alan Cox, Linux Kernel Mailing List

Andre Hedrick <andre@linux-ide.org> writes:

> On Sat, 31 Aug 2002, Mike Isely wrote:
>
>> On Fri, 30 Aug 2002, Andre Hedrick wrote:
>>
>> > When you said you put it on primary channel, I realized that you have a
>> > system that breaks the rules of Promise and I am not sure.
>>
>> What are the "rules of Promise" or where may I find such information?
>
> You do not want to sign the NDA's to get the data sheets, acquire all the
> hardware to test, generate tables of irregularities, query Promise, and
> then scratch your head why.
>
> I have a FastTrak 100 TX4 the BIOS fails to see beyond 128GB, but in
> practice it does.
>
> The PDC20267 will puke in 48-bit DMA, but run clean in 48-bit PIO :-/
> Oh but that is the primary channel, Secondary Channel is clean both ways :-\
>
> PDC20262 works in 48-bit DMA every where.
>
> PDC20265 similar to PDC20267 except yours.
>
> Rules are empirical tests and rants back at the OEM, and ....

Another data point: My experiences with 2.4.20-pre4-ac2 are remarkably
similar to Mike Isely's, but for a few interesting differences:

- 2.4.18 runs O.K.
- 2.4.19 hangs when checking for partitions
- 2.4.19-ac4 hangs, too
- 2.4.20-pre4 hangs, too
- 2.4.20-pre4-ac2 does not hang, but shows problems exactly as Mike is
  describing:
  - Claims 80pin cable is missing
  - wrong data read from disk, write based on wrong read trashes fs

My hardware:

o Promise PDC20262 On-Board on a GigaByte GA-6BX7+ (Intel 440BX)
o Maxtor 120G (4G120J6)

>> > grep "hwif->addressing" pdc202xx.c
>> >
>> > Stub out the three lines.
>> >
>> > Recompile and reboot, it will be fixed
>>
>> Will do. Thanks. If you have a more permanent fix you'd like me to
>> test, let me know.
>
> Oh another dang piece of the puzzle found and it does not fit anywhere!

Does this fix the bogus 80-pin message or does it just have to do with
block addressing and thus the "corruption" issue?

I'm asking because the 20262 seems to break ATAPI devices completely
once it was in a "wrong" mode. I.e. if my PX-W1610 on the second
channel is correctly detected as MDMA2 it works, if it is detected as
something else and I try to tweak it the channel and/or controller
hangs.

Can I somewhere get a complete picture of what is *supposed* to work
with the '62 and what not?

Thanks a lot!

So long,
   Joe

--
"I use emacs, which might be thought of as a thermonuclear
 word processor."
-- Neal Stephenson, "In the beginning... was the command line"

* 2.4.20-pre4-ac1 trashed my system
2002-08-31 6:24 ` Andre Hedrick
` (2 preceding siblings ...)
2002-09-02 8:16 ` Joachim Breuer
@ 2002-09-03 12:41 ` mbs
2002-09-03 14:34 ` Mike Isely
2002-09-03 15:59 ` Alan Cox
3 siblings, 2 replies; 35+ messages in thread
From: mbs @ 2002-09-03 12:41 UTC (permalink / raw)
To: Andre Hedrick, Mike Isely; +Cc: Alan Cox, Linux Kernel Mailing List

it trashed mine also.

supermicro p4dp8-g2 mobo
2x 2.2 Xeon
e7500 chipset
wd400 40gb hd

2.4.20-pre4-ac2 + RML preempt patch (applied cleanly)

boot it and everything runs fine for a short while, then I start getting
"bad CRC" errors and "seek failure" errors.

I have had this problem with both ext2 and ext3

initially I thought it was a bad HD, so I installed a new one on a new
cable and did a complete rh7.3 install, ran for a while with no problems,
then built the same kernel over again, rebooted into the new kernel and
within seconds was having problems again.

2.4.19-rc3-ac4 +rml preempt has been dead stable, as has (so far)
2.4.29-ac4 +rml and RH 2.4.18-3 and -5

I am not doing anything funky with hd setup, not even specifying idebus=

this has happened with 40 and 80 wire cables.

if there is any additional info I can provide please let me know.

--
/**************************************************
**   Mark Salisbury       ||      mbs@mc.com     **
** If you would like to sponsor me for the       **
** Mass Getaway, a 150 mile bicycle ride to for  **
** MS, contact me to donate by cash or check or  **
** click the link below to donate by credit card **
**************************************************/
https://www.nationalmssociety.org/pledge/pledge.asp?participantid=86736

* Re: 2.4.20-pre4-ac1 trashed my system
2002-09-03 12:41 ` mbs
@ 2002-09-03 14:34 ` Mike Isely
2002-09-03 16:00 ` Alan Cox
2002-09-03 15:59 ` Alan Cox
1 sibling, 1 reply; 35+ messages in thread
From: Mike Isely @ 2002-09-03 14:34 UTC (permalink / raw)
To: mbs; +Cc: Andre Hedrick, Alan Cox, Linux Kernel Mailing List

On Tue, 3 Sep 2002, mbs wrote:

> it trashed mine also.
>
> supermicro p4dp8-g2 mobo
> 2x 2.2 Xeon
> e7500 chipset
> wd400 40gb hd
>
> 2.4.20-pre4-ac2 + RML preempt patch (applied cleanly)
>
> boot it and everything runs fine for a short while, then I start getting "bad
> CRC" errors and "seek failure" errors.
>
> I have had this problem with both ext2 and ext3
>
> initially I thought it was a bad HD, so I installed a new one on a new cable
> and did a complete rh7.3 install ran for a while with no problems then built
> the same kernel over again, rebooted into the new kernel and within seconds
> was having problems again.
>
> 2.4.19-rc3-ac4 +rml preempt has been dead stable, as has (so far) 2.4.29-ac4
> +rml and RH 2.4.18-3 and -5
>
> I am not doing anything funky with hd setup, not even specifying idebus=
>

This is likely different than the problem I've been seeing. My
situation appears to be due to the fact that on my Promise controller,
LBA48 addressing mode had been turned off on the primary channel, which
then causes access problems with my 160GB Maxtor drive. Turning LBA48
mode back on (by removing the hack which turned it off, which wasn't in
2.4.19-ac4) breaks DMA. Either that or I screwed something up when
removing the hack. I'm wondering if a bug appeared after 2.4.19-ac4
which breaks DMA on Promise 20265 primary channel access, and that a
work-around was put in place that disables LBA48 addressing. There are
in fact well over 100 diffs in pdc202xx.c between 2.4.19-ac4 and
2.4.20-pre4-ac1. This wrong addressing is what (indirectly) wrecked my
system. I've posted my findings on this so far along with some
questions for further investigation, but I haven't seen any answers yet
(or even a "go away you're bothering me" reply).

Unfortunately you've said you are using a 40GB drive and something other
than a Promise controller so your situation may be a different problem.

-Mike

                        |         Mike Isely          |     PGP fingerprint
 POSITIVELY NO          |                             | 03 54 43 4D 75 E5 CC 92
 UNSOLICITED JUNK MAIL! |   isely @ pobox (dot) com   | 71 16 01 E2 B5 F5 C1 E8
                        |   (spam-foiling address)    |

* Re: 2.4.20-pre4-ac1 trashed my system
2002-09-03 14:34 ` Mike Isely
@ 2002-09-03 16:00 ` Alan Cox
2002-09-04 10:21 ` Rogier Wolff
0 siblings, 1 reply; 35+ messages in thread
From: Alan Cox @ 2002-09-03 16:00 UTC (permalink / raw)
To: Mike Isely; +Cc: mbs, Andre Hedrick, Linux Kernel Mailing List

> Unfortunately you've said you are using a 40GB drive and something other
> than a Promise controller so your situation may be a different problem.

The 40Gb drives may well be trying to pick LBA48, but LBA48 works on the
Intel hardware

* Re: 2.4.20-pre4-ac1 trashed my system
2002-09-03 16:00 ` Alan Cox
@ 2002-09-04 10:21 ` Rogier Wolff
0 siblings, 0 replies; 35+ messages in thread
From: Rogier Wolff @ 2002-09-04 10:21 UTC (permalink / raw)
To: Alan Cox; +Cc: Mike Isely, mbs, Andre Hedrick, Linux Kernel Mailing List

On Tue, Sep 03, 2002 at 05:00:58PM +0100, Alan Cox wrote:
> > Unfortunately you've said you are using a 40GB drive and something other
> > than a Promise controller so your situation may be a different problem.
>
> The 40Gb drives may well be trying to pick LBA48, but LBA48 works on the
> Intel hardware

The Maxtor drives are named XxYYYyZ, where

    X   is a number
    x   is a letter
    YYY is the capacity (040 for a 40G)
    y   is a letter
    Z   is a number.

When Z = 2, the disk is single platter, and COULD have been 4 times as
large by adding three more platters. That adds up to 160G, so the
Xy040z2 drives will certainly do LBA48.

                        Roger.

--
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* The Worlds Ecosystem is a stable system. Stable systems may experience *
* excursions from the stable situation. We are currently in such an      *
* excursion: The stable situation does not include humans. ***************

* Re: 2.4.20-pre4-ac1 trashed my system
2002-09-03 12:41 ` mbs
2002-09-03 14:34 ` Mike Isely
@ 2002-09-03 15:59 ` Alan Cox
2002-09-03 18:33 ` mbs
1 sibling, 1 reply; 35+ messages in thread
From: Alan Cox @ 2002-09-03 15:59 UTC (permalink / raw)
To: mbs; +Cc: Andre Hedrick, Mike Isely, Linux Kernel Mailing List

On Tue, 2002-09-03 at 13:41, mbs wrote:
> 2.4.20-pre4-ac2 + RML preempt patch (applied cleanly)

I'm not interested in any bug reports with the pre-empt patch involved.
It just muddies the waters

* Re: 2.4.20-pre4-ac1 trashed my system
2002-09-03 15:59 ` Alan Cox
@ 2002-09-03 18:33 ` mbs
0 siblings, 0 replies; 35+ messages in thread
From: mbs @ 2002-09-03 18:33 UTC (permalink / raw)
To: Alan Cox; +Cc: Andre Hedrick, Mike Isely, Linux Kernel Mailing List

ok, tomorrow, I'll let you know how it goes without preempt.

On Tuesday 03 September 2002 11:59, Alan Cox wrote:
> On Tue, 2002-09-03 at 13:41, mbs wrote:
> > 2.4.20-pre4-ac2 + RML preempt patch (applied cleanly)
>
> I'm not interested in any bug reports with the pre-empt patch involved.
> It just muddies the waters

--
/**************************************************
**   Mark Salisbury       ||      mbs@mc.com     **
** If you would like to sponsor me for the       **
** Mass Getaway, a 150 mile bicycle ride to for  **
** MS, contact me to donate by cash or check or  **
** click the link below to donate by credit card **
**************************************************/
https://www.nationalmssociety.org/pledge/pledge.asp?participantid=86736

* Re: 2.4.20-pre4-ac1 trashed my system
2002-08-31 5:57 ` Andre Hedrick
2002-08-31 6:07 ` Mike Isely
@ 2002-09-01 2:59 ` Mike Isely
1 sibling, 0 replies; 35+ messages in thread
From: Mike Isely @ 2002-09-01 2:59 UTC (permalink / raw)
To: Andre Hedrick; +Cc: Alan Cox, Linux Kernel Mailing List

On Fri, 30 Aug 2002, Andre Hedrick wrote:

>
>
> This is not going to be fun.
>
> grep "hwif->addressing" pdc202xx.c
>
> Stub out the three lines.
>
> Recompile and reboot, it will be fixed
>

What version of the driver source are you using? In 2.4.20-pre4-ac1
and 2.4.20-pre5-ac1,

    grep "hwif->addressing" pdc202xx.c

finds only 1 line. There are however two other places in pdc202xx.c
where one can find "drive->addressing", each used as a condition in an
if-statement (which looks a lot like this might be the LBA48 fix you
and Alan have been telling me about).

What exactly do you want me to do? Knock out the if-conditions (and
matching close-braces)? Knock out the entire block (and assumedly the
LBA48 fix along with it) in each case?

I've been trying different combinations but so far either the result
has been no effect or fatally broken DMA (timeouts / failures at boot
and then the driver falls back to PIO). I can post more details
later, but I wonder if I'm missing something blindingly obvious
here...

Side note: In /proc/ide/ide2/hde/settings in 2.4.19-ac4, the "address"
field reports a value of 1. However in 2.4.20-pre4-ac1, I instead
find the value 0. Is this related to the addressing field in
ide_hwif_t?

Oh, and while futzing with things I tried another experiment with the
hardware. This may be a totally unrelated problem. I attached a
Plextor CD burner to the second cable of the Promise controller. It
should show up as hdg. Under 2.4.19-ac4 it shows up. Under
2.4.20-pre4-ac1 it isn't anywhere to be found - no errors, no hints
anywhere in the system that it might exist. Yes, the IDE CDROM driver
is compiled into the kernel.

-Mike

                        |         Mike Isely          |     PGP fingerprint
 POSITIVELY NO          |                             | 03 54 43 4D 75 E5 CC 92
 UNSOLICITED JUNK MAIL! |   isely @ pobox (dot) com   | 71 16 01 E2 B5 F5 C1 E8
                        |   (spam-foiling address)    |

* Re: 2.4.20-pre4-ac1 trashed my system
2002-08-31 5:04 ` Mike Isely
2002-08-31 5:57 ` Andre Hedrick
@ 2002-08-31 10:54 ` Vojtech Pavlik
1 sibling, 0 replies; 35+ messages in thread
From: Vojtech Pavlik @ 2002-08-31 10:54 UTC (permalink / raw)
To: Mike Isely; +Cc: Alan Cox, Andre Hedrick, Linux Kernel Mailing List

On Sat, Aug 31, 2002 at 12:04:20AM -0500, Mike Isely wrote:
> >
> > OK, I have some good news and some bad news.
> >
> > The bad news is that I replicated the corruption.
> >
> > The good news is that I replicated the corruption. Oh, and I can
> > cause it on demand, and not lose my system in the process. I can
> > provide LOTS and LOTS of details now. What do you want to know?
> >
> > [...]
>
> I've done some more tests and have more information now. No smoking
> gun yet, but a few more clues.
>
> 1. I moved the 160GB drive away from the Promise controller and
>    reattached it to the motherboard chipset's controller ("VIA
>    Technologies, Inc. Bus Master IDE (rev 06)", by the way according
>    to lspci). Then I booted 2.4.20-pre4-ac1 (the "bad" kernel) and
>    fsck'ed the big partition again. It passed. Then I moved the
>    drive back to the Promise controller, booted the same OS and
>    fsck'ed again. Failure.
>
> 2. I booted 2.4.19-ac4 with the 160GB drive attached to the Promise
>    controller and watched the kernel log output. There's no message
>    about any missing 80 pin cable. This is different than
>    2.4.20-pre4-ac1 which complains that I allegedly don't have an 80
>    pin cable plugged. However the cable is there but the driver
>    downshifts the interface to 33MHz anyway. I described this

Note that 33 MHz isn't 33 MB/sec (UDMA2). Question remains, what you
wanted to say.

--
Vojtech Pavlik
SuSE Labs

* [PATCH] 2.4.20-pre5-ac2: Promise Controller LBA48 DMA fixed
2002-08-29 15:32 ` Alan Cox
2002-08-29 17:15 ` Andre Hedrick
@ 2002-09-05 5:54 ` Mike Isely
2002-09-05 12:47 ` Henning P. Schmiedehausen
2002-09-05 14:41 ` Tomas Szepe
1 sibling, 2 replies; 35+ messages in thread
From: Mike Isely @ 2002-09-05 5:54 UTC (permalink / raw)
To: andre, Alan Cox; +Cc: Linux Kernel Mailing List

The trivial patch at the end of this text fixes DMA w/ LBA48 problems
on the Promise 20265 controller and probably also the 20267. This
patch is against 2.4.20-pre5-ac2 but I suspect it should apply cleanly
against anything after 2.4.19-ac4.

Problem: LBA48 DMA stopped working on Promise 20265 some time after
kernel version 2.4.19-ac4. This manifested itself on systems with
large (>137GB) hard drives; addressing above 137GB stopped working
correctly, leading to file system errors, and after a foolish "e2fsck
-y" operation, massive corruption.

Cause: The DMA was broken due to a bad if-statement in the function
init_hwif_pdc202xx() in pdc202xx.c. Because "!" binds more tightly
than "==", the check against PCI_DEVICE_ID_PROMISE_20246 was
incorrect, which prevented the Promise controller LBA48 fix logic from
basically ever being turned on. Obfuscating this further was logic in
that same function which disabled LBA48 addressing mode for devices on
the primary channel of the 20265 or 20267.

Solution: Apply parentheses to get evaluation ordering correct. Then
remove duct tape which disabled LBA48 addressing.

Verification: Before this fix, inspecting
/proc/ide/<host>/<device>/settings would show "0" for the "address"
attribute, owing to LBA48 being off. Just removing the duct tape
causing this however results in a broken system (driver DMA completely
fails). After also fixing the if-statement, the system comes up
successfully, the "address" attribute reads back as "1" (confirms
LBA48 addressing on), and most importantly, fsck'ing the big drive
comes back clean!
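
The precedence problem named under "Cause" can be reproduced in
isolation. The following is a minimal stand-alone sketch, not driver
code: the function names are hypothetical and the two device ID
constants are illustrative values chosen for the example (the real
ones live in the kernel's PCI ID header). The buggy variant is written
with explicit parentheses to show how the compiler groups
`!device == ID`.

```c
/* Illustrative device IDs (assumed values for this sketch only). */
#define SKETCH_ID_20246 0x4d33
#define SKETCH_ID_20265 0x0d30

/* How "!device == ID" is actually parsed: "!" is applied first, so
 * the left side is 0 or 1 and can never equal a real device ID.
 * The result is therefore always 0 for any non-trivial ID. */
int buggy_is_not_20246(unsigned short device)
{
    return (!device) == SKETCH_ID_20246;
}

/* The patched form: compare first, then negate. */
int fixed_is_not_20246(unsigned short device)
{
    return !(device == SKETCH_ID_20246);
}

/* Equivalent and arguably easier to read. */
int readable_is_not_20246(unsigned short device)
{
    return device != SKETCH_ID_20246;
}
```

For a 20265 the buggy test yields 0, so the guarded branch (where the
DMA begin/end hooks are installed) is skipped, while the fixed and
readable forms both yield 1.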
This problem did not exist in 2.4.19-ac4 because the code had since then been rearranged / rewritten. The new code harbored the bug and the LBA48 regression. Note: I have not tested this fix against the Promise 20267, but I suspect (since 2.4.19-ac4 didn't hack up the 20267 either) that the same fix applies there so I deleted the duct tape rather than just moving it. -Mike diff -u -r linux-2.4.20-pre5-ac2/drivers/ide/pci/pdc202xx.c linux-2.4.20-pre5-ac2.fixed/drivers/ide/pci/pdc202xx.c --- linux-2.4.20-pre5-ac2/drivers/ide/pci/pdc202xx.c 2002-09-05 00:09:43.000000000 -0500 +++ linux-2.4.20-pre5-ac2.fixed/drivers/ide/pci/pdc202xx.c 2002-09-05 00:16:43.000000000 -0500 @@ -952,7 +952,6 @@ break; case PCI_DEVICE_ID_PROMISE_20267: case PCI_DEVICE_ID_PROMISE_20265: - hwif->addressing = (hwif->channel) ? 0 : 1; case PCI_DEVICE_ID_PROMISE_20263: case PCI_DEVICE_ID_PROMISE_20262: hwif->busproc = &pdc202xx_tristate; @@ -979,7 +978,7 @@ if (!(hwif->udma_four)) hwif->udma_four = (!(hwif->INB(hwif->dma_vendor3) & 0x04)); } else { - if (!hwif->pci_dev->device == PCI_DEVICE_ID_PROMISE_20246) { + if (!(hwif->pci_dev->device == PCI_DEVICE_ID_PROMISE_20246)) { u16 mask = (hwif->channel) ? (1<<11) : (1<<10); u16 CIS = 0; hwif->ide_dma_begin = &pdc202xx_old_ide_dma_begin; | Mike Isely | PGP fingerprint POSITIVELY NO | | 03 54 43 4D 75 E5 CC 92 UNSOLICITED JUNK MAIL! | isely @ pobox (dot) com | 71 16 01 E2 B5 F5 C1 E8 | (spam-foiling address) | ^ permalink raw reply [flat|nested] 35+ messages in thread
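For readers who want to see the precedence problem from the patch in isolation, here is a minimal standalone sketch in C. It is not kernel code: the function names and the two DEMO_ID_* values are placeholders made up for illustration, and only the shape of the three expressions comes from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder device IDs, chosen for illustration only (assumptions,
 * not copied from the kernel's pci_ids.h). */
#define DEMO_ID_20246 0x4d33
#define DEMO_ID_20265 0x0d30

/* How the compiler parses "!device == id": "!" is applied first and
 * yields 0 or 1, and only then is that 0/1 compared with id. */
static bool check_as_written(uint16_t device, uint16_t id)
{
	return (!device) == id;
}

/* The patched form: compare first, then negate. */
static bool check_as_patched(uint16_t device, uint16_t id)
{
	return !(device == id);
}

/* The "!=" spelling suggested in the replies; same result as the
 * patched form. */
static bool check_not_equal(uint16_t device, uint16_t id)
{
	return device != id;
}

/* Walk through the reported case: a 20265 tested against the 20246 ID. */
static void demo(void)
{
	/* !0x0d30 is 0, and 0 == 0x4d33 is false, so the branch is skipped. */
	assert(!check_as_written(DEMO_ID_20265, DEMO_ID_20246));
	/* Both corrected spellings take the branch for a non-20246 device */
	assert(check_as_patched(DEMO_ID_20265, DEMO_ID_20246));
	assert(check_not_equal(DEMO_ID_20265, DEMO_ID_20246));
	/* and skip it for the 20246 itself. */
	assert(!check_as_patched(DEMO_ID_20246, DEMO_ID_20246));
	assert(!check_not_equal(DEMO_ID_20246, DEMO_ID_20246));
}
```

Under these assumptions the as-written test can only be true when the device value is 0 and the ID is 1, or the device is nonzero and the ID is 0, so with any real nonzero ID on both sides it is always false. That matches the description above of the LBA48 logic basically never being turned on.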
* Re: [PATCH] 2.4.20-pre5-ac2: Promise Controller LBA48 DMA fixed
2002-09-05 5:54 ` [PATCH] 2.4.20-pre5-ac2: Promise Controller LBA48 DMA fixed Mike Isely
@ 2002-09-05 12:47 ` Henning P. Schmiedehausen
2002-09-05 14:12 ` Mike Isely
2002-09-05 14:41 ` Tomas Szepe
1 sibling, 1 reply; 35+ messages in thread
From: Henning P. Schmiedehausen @ 2002-09-05 12:47 UTC (permalink / raw)
To: linux-kernel

Mike Isely <isely@pobox.com> writes:

>The trivial patch at the end of this text fixes DMA w/ LBA48 problems

More readable would be:

>-		if (!hwif->pci_dev->device == PCI_DEVICE_ID_PROMISE_20246) {
>+		if (!(hwif->pci_dev->device == PCI_DEVICE_ID_PROMISE_20246)) {

		if (hwif->pci_dev->device != PCI_DEVICE_ID_PROMISE_20246) {

	Regards
		Henning

--
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH     hps@intermeta.de
Am Schwabachgrund 22  Fon.: 09131 / 50654-0   info@intermeta.de
D-91054 Buckenhof     Fax.: 09131 / 50654-20
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] 2.4.20-pre5-ac2: Promise Controller LBA48 DMA fixed
2002-09-05 12:47 ` Henning P. Schmiedehausen
@ 2002-09-05 14:12 ` Mike Isely
2002-09-05 14:31 ` Alan Cox
2002-09-05 14:35 ` Horst von Brand
0 siblings, 2 replies; 35+ messages in thread
From: Mike Isely @ 2002-09-05 14:12 UTC (permalink / raw)
To: Henning P. Schmiedehausen; +Cc: linux-kernel

On Thu, 5 Sep 2002, Henning P. Schmiedehausen wrote:
> Mike Isely <isely@pobox.com> writes:
>
> >The trivial patch at the end of this text fixes DMA w/ LBA48 problems
>
> More readable would be:
>
> >-		if (!hwif->pci_dev->device == PCI_DEVICE_ID_PROMISE_20246) {
> >+		if (!(hwif->pci_dev->device == PCI_DEVICE_ID_PROMISE_20246)) {
>
> 		if (hwif->pci_dev->device != PCI_DEVICE_ID_PROMISE_20246) {
>

Yes that is true. But this is Andre's code and it seemed to me to be
more important to follow his style. But whatever...

-Mike

                        |         Mike Isely          |     PGP fingerprint
    POSITIVELY NO       |                             | 03 54 43 4D 75 E5 CC 92
 UNSOLICITED JUNK MAIL! |   isely @ pobox (dot) com   | 71 16 01 E2 B5 F5 C1 E8
                        |   (spam-foiling address)    |
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] 2.4.20-pre5-ac2: Promise Controller LBA48 DMA fixed
2002-09-05 14:12 ` Mike Isely
@ 2002-09-05 14:31 ` Alan Cox
2002-09-05 14:33 ` Mike Isely
2002-09-05 14:35 ` Horst von Brand
1 sibling, 1 reply; 35+ messages in thread
From: Alan Cox @ 2002-09-05 14:31 UTC (permalink / raw)
To: Mike Isely; +Cc: Henning P. Schmiedehausen, linux-kernel

On Thu, 2002-09-05 at 15:12, Mike Isely wrote:
> Yes that is true. But this is Andre's code and it seemed to me to be
> more important to follow his style. But whatever...

It's a good general rule but for the IDE, break it ;)
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] 2.4.20-pre5-ac2: Promise Controller LBA48 DMA fixed
2002-09-05 14:31 ` Alan Cox
@ 2002-09-05 14:33 ` Mike Isely
0 siblings, 0 replies; 35+ messages in thread
From: Mike Isely @ 2002-09-05 14:33 UTC (permalink / raw)
To: Alan Cox; +Cc: Henning P. Schmiedehausen, linux-kernel@vger.kernel.org

On 5 Sep 2002, Alan Cox wrote:
> On Thu, 2002-09-05 at 15:12, Mike Isely wrote:
> > Yes that is true. But this is Andre's code and it seemed to me to be
> > more important to follow his style. But whatever...
>
> It's a good general rule but for the IDE, break it ;)
>

Point taken. I should have expected to hear this :-)

-Mike

                        |         Mike Isely          |     PGP fingerprint
    POSITIVELY NO       |                             | 03 54 43 4D 75 E5 CC 92
 UNSOLICITED JUNK MAIL! |   isely @ pobox (dot) com   | 71 16 01 E2 B5 F5 C1 E8
                        |   (spam-foiling address)    |
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] 2.4.20-pre5-ac2: Promise Controller LBA48 DMA fixed
2002-09-05 14:12 ` Mike Isely
2002-09-05 14:31 ` Alan Cox
@ 2002-09-05 14:35 ` Horst von Brand
2002-09-05 14:47 ` Mike Isely
1 sibling, 1 reply; 35+ messages in thread
From: Horst von Brand @ 2002-09-05 14:35 UTC (permalink / raw)
To: Mike Isely; +Cc: Henning P. Schmiedehausen, linux-kernel

Mike Isely <isely@pobox.com> said:
> On Thu, 5 Sep 2002, Henning P. Schmiedehausen wrote:
>
> > Mike Isely <isely@pobox.com> writes:
> >
> > >The trivial patch at the end of this text fixes DMA w/ LBA48 problems
> >
> > More readable would be:
> >
> > >-		if (!hwif->pci_dev->device == PCI_DEVICE_ID_PROMISE_20246) {
> > >+		if (!(hwif->pci_dev->device == PCI_DEVICE_ID_PROMISE_20246)) {
> >
> > 		if (hwif->pci_dev->device != PCI_DEVICE_ID_PROMISE_20246) {
> >
>
> Yes that is true. But this is Andre's code and it seemed to me to be
> more important to follow his style. But whatever...

What is wrong with != here?

--
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] 2.4.20-pre5-ac2: Promise Controller LBA48 DMA fixed
2002-09-05 14:35 ` Horst von Brand
@ 2002-09-05 14:47 ` Mike Isely
2002-09-05 14:56 ` Tomas Szepe
0 siblings, 1 reply; 35+ messages in thread
From: Mike Isely @ 2002-09-05 14:47 UTC (permalink / raw)
To: Horst von Brand; +Cc: Henning P. Schmiedehausen, linux-kernel@vger.kernel.org

On Thu, 5 Sep 2002, Horst von Brand wrote:
> Mike Isely <isely@pobox.com> said:
> > On Thu, 5 Sep 2002, Henning P. Schmiedehausen wrote:
> >
> > > Mike Isely <isely@pobox.com> writes:
> > >
> > > >The trivial patch at the end of this text fixes DMA w/ LBA48 problems
> > >
> > > More readable would be:
> > >
> > > >-		if (!hwif->pci_dev->device == PCI_DEVICE_ID_PROMISE_20246) {
> > > >+		if (!(hwif->pci_dev->device == PCI_DEVICE_ID_PROMISE_20246)) {
> > >
> > > 		if (hwif->pci_dev->device != PCI_DEVICE_ID_PROMISE_20246) {
> > >
> >
> > Yes that is true. But this is Andre's code and it seemed to me to be
> > more important to follow his style. But whatever...
>
> What is wrong with != here?

Nothing whatsoever. If I wrote the code I would have used "!=". But
when editing code written by someone else I try to adopt that person's
style, for better or for worse. Using !(a == b) is more obtuse but it
is still unambiguous and readable. So I didn't feel it was that big
of a deal to leave it in that form. Besides, there are many MANY
other places in that driver far worse than this - just try to follow
the code that sets up DMA operations or look at the mostly dead code
which tries to identify if it is a cause for an asserted interrupt.
If we want to start nitpicking issues as small as this then I invite
you to inspect the rest of pdc202xx.c. Have the antacids ready...

But in the future, if I post more fixes to the IDE driver (probably
won't), I'll sanitize as I go along.

I find it amusing that a post from me which describes evidence of
completely broken Promise controller DMA goes unresponded to, yet there
are concerns about whether to spell code as "a != b" or "!(a == b)".

-Mike

                        |         Mike Isely          |     PGP fingerprint
    POSITIVELY NO       |                             | 03 54 43 4D 75 E5 CC 92
 UNSOLICITED JUNK MAIL! |   isely @ pobox (dot) com   | 71 16 01 E2 B5 F5 C1 E8
                        |   (spam-foiling address)    |
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] 2.4.20-pre5-ac2: Promise Controller LBA48 DMA fixed
2002-09-05 14:47 ` Mike Isely
@ 2002-09-05 14:56 ` Tomas Szepe
2002-09-05 15:12 ` Mike Isely
0 siblings, 1 reply; 35+ messages in thread
From: Tomas Szepe @ 2002-09-05 14:56 UTC (permalink / raw)
To: Mike Isely
Cc: Horst von Brand, Henning P. Schmiedehausen, linux-kernel@vger.kernel.org

> But in the future, if I post more fixes to the IDE driver (probably
> won't), I'll sanitize as I go along.

From what Andre said in the past I've gathered he's very much ok with
code sanitizing and cleanups... Knock yourself out if you please.

> I find it amusing that a post from me which describes evidence of
> completely broken Promise controller DMA goes unresponded to, yet there
> are concerns about whether to spell code as "a != b" or "!(a == b)".

Well, your patch is obviously correct -- there's not much to comment on.

T.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] 2.4.20-pre5-ac2: Promise Controller LBA48 DMA fixed
2002-09-05 14:56 ` Tomas Szepe
@ 2002-09-05 15:12 ` Mike Isely
0 siblings, 0 replies; 35+ messages in thread
From: Mike Isely @ 2002-09-05 15:12 UTC (permalink / raw)
To: Tomas Szepe
Cc: Horst von Brand, Henning P. Schmiedehausen, linux-kernel@vger.kernel.org

On Thu, 5 Sep 2002, Tomas Szepe wrote:
> > But in the future, if I post more fixes to the IDE driver (probably
> > won't), I'll sanitize as I go along.
>
> From what Andre said in the past I've gathered he's very much ok with
> code sanitizing and cleanups... Knock yourself out if you please.
>
> > I find it amusing that a post from me which describes evidence of
> > completely broken Promise controller DMA goes unresponded to, yet there
> > are concerns about whether to spell code as "a != b" or "!(a == b)".
>
> Well, your patch is obviously correct -- there's not much to comment on.
>

I was referring to the longer unanswered messages posted over the
weekend (search for subject "trashed") asking for guidance on how to
proceed. Having never debugged IDE before, I was hoping for some
help.

-Mike

                        |         Mike Isely          |     PGP fingerprint
    POSITIVELY NO       |                             | 03 54 43 4D 75 E5 CC 92
 UNSOLICITED JUNK MAIL! |   isely @ pobox (dot) com   | 71 16 01 E2 B5 F5 C1 E8
                        |   (spam-foiling address)    |
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] 2.4.20-pre5-ac2: Promise Controller LBA48 DMA fixed
2002-09-05 5:54 ` [PATCH] 2.4.20-pre5-ac2: Promise Controller LBA48 DMA fixed Mike Isely
2002-09-05 12:47 ` Henning P. Schmiedehausen
@ 2002-09-05 14:41 ` Tomas Szepe
1 sibling, 0 replies; 35+ messages in thread
From: Tomas Szepe @ 2002-09-05 14:41 UTC (permalink / raw)
To: Mike Isely; +Cc: lkml

> -		if (!hwif->pci_dev->device == PCI_DEVICE_ID_PROMISE_20246) {
> +		if (!(hwif->pci_dev->device == PCI_DEVICE_ID_PROMISE_20246)) {

Good eye, btw. I was looking at this line a couple times and always
assumed this kind of obfuscation had a purpose of some sort.

And it does after all! It's a bug :)
^ permalink raw reply [flat|nested] 35+ messages in thread