public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] IDE TCQ #4
@ 2002-04-15 19:28 Petr Vandrovec
  2002-04-16 10:25 ` Jens Axboe
  0 siblings, 1 reply; 47+ messages in thread
From: Petr Vandrovec @ 2002-04-15 19:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel

On 15 Apr 02 at 21:11, Petr Vandrovec wrote:
> On 15 Apr 02 at 21:00, Jens Axboe wrote:
> > > 
> > > NULL pointer ...
> > 
> > Could you decode that? It doesn't look like any of your drives support
> > TCQ, it should have enabled them right here:
> 
> They were already decoded... Also others reported that - after accessing
> /proc/ide/ide0/hda/identify system dies... I believe that passing
> hand-created request to ide_raw_taskfile corrupts drive->free_req,
> and so subsequent drive command after this cat finds that 
> drive->free_req.next is NULL and dies.

ide_raw_taskfile() sets rq.special to &ar - and &ar is on the stack,
in this function. Later it falls through to __ide_end_request(), which
does

  ar = rq->special;
  ...
  if (ar)
    ata_ar_put(drive, ar);
    
which adds this ar into drive's free_req chain unconditionally. Maybe 
ata_ar_put should check for ar_queue validity. And where ar_queue
member is initialized (or at least cleared) in this case at all?

Unfortunately here my knowledge ends.
                                                Petr Vandrovec
                                                vandrove@vc.cvut.cz
                                                

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] IDE TCQ #4
@ 2002-04-18 22:05 Andries.Brouwer
  0 siblings, 0 replies; 47+ messages in thread
From: Andries.Brouwer @ 2002-04-18 22:05 UTC (permalink / raw)
  To: alan, axboe; +Cc: dalecki, linux-kernel

    > It sure is... sr doesn't do it and lots of others don't as well, so I
    > suppose we could rip it out. We already require reblocking with loop in
    > those cases anyway.

As a stopgap measure, not because we are so proud of that solution.

    For the file system ones. It would be nice to be able to handle non power
    of two block sizes as well through the block interface (even if it means we
    hand back a 4K buffer that the caller is required to know is partly full).
    That would remove a lot of special case magic for audio/video

Yes, it is as if non power-of-two block sizes are getting more
common. In the good old days one occasionally saw IBM SCSI disks
with strange sector size, difficult to read, but recently there
have been many cases of (power-of-two plus metadata), with e.g.
528 or 572 bytes. There have also been cases of small power-of-two,
like 64 or 256 bytes.
All kinds of strange constructions can be avoided if the block layer
just handles arbitrary block size, up to some reasonable limit.

Andries

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] IDE TCQ #4
@ 2002-04-15 19:29 Petr Vandrovec
  2002-04-15 19:00 ` Jens Axboe
  0 siblings, 1 reply; 47+ messages in thread
From: Petr Vandrovec @ 2002-04-15 19:29 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel

On 15 Apr 02 at 19:41, Petr Vandrovec wrote:
> 
> If I parsed file correctly (it is 83 decimal word, yes?), WD's
> WDC WD1200JB-00CRA0 supports TCQ too. I'm still deciding which of
> TCQ #X and IDE #YY patches should be aplied to 2.5.8 to get optimal
> results (and I have to disconnect slaves...).

I applied TCQ#4 only...

I have to swap my host, probably, or there is some other problem.
I got two unexpected interrupts, then machine looked fine, but shortly 
thereafter oops appeared. /proc/ide/ide0/hda/tcq showed 'not active'
before it crashed. Machine is SMP.

NULL pointer ...

EAX=EDX=EBX = 0
ECX,ESP,EBP = some stack addresses
ESI=ide_hwifs + 248 / 15400
EDI=ide_hwifs + 0

EIP=idedisk_do_request + 210/632
    start_request + 690/807
    __elv_next_request + 10/14
    ide_queue_command + 422/647
    ide_do_request + 86/122
    do_ide_request + 18/22
    generic_unplug_device + 86/135
    __run_task_queue + 179/190
    do_page_cache_readahead + 351/379
    page_cache_readahead + 256/264
    do_generic_file_read + 95/782
    generic_file_read + 126/304
    file_read_actor + 0 (passed as argument to ^^^)
    sys_read + 150/285
    syscall_call + 7/...

# ATA/IDE/MFM/RLL support
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_BLK_DEV_IDECD=m
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_IDE_TCQ=y
CONFIG_BLK_DEV_IDE_TCQ_FULL=y
CONFIG_BLK_DEV_IDE_TCQ_DEFAULT=y
CONFIG_BLK_DEV_IDE_TCQ_DEPTH=8
CONFIG_IDEDMA_AUTO=y

Linux version 2.5.8-smp (root@vana) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 SMP Mon Apr 15 19:59:12 CEST 2002
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
 BIOS-e820: 000000000fff0000 - 000000000fff8000 (ACPI data)
 BIOS-e820: 000000000fff8000 - 0000000010000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
255MB LOWMEM available.
found SMP MP-table at 000fb120
hm, page 000fb000 reserved twice.
hm, page 000fc000 reserved twice.
hm, page 000f5000 reserved twice.
hm, page 000f6000 reserved twice.
On node 0 totalpages: 65520
zone(0): 4096 pages.
zone(1): 61424 pages.
zone(2): 0 pages.
Intel MultiProcessor Specification v1.1
    Virtual Wire compatibility mode.
OEM ID: VIA      Product ID: VT3075       APIC at: 0xFEE00000
Processor #0 Pentium(tm) Pro APIC version 17
Processor #1 Pentium(tm) Pro APIC version 17
I/O APIC #2 Version 17 at 0xFEC00000.
Processors: 2
Kernel command line: auto BOOT_IMAGE=Linux ro root=301 ramdisk=0 devfs=nomount video=matrox:vesa:0x105,fv:100,lower:1 pciorder=1:0 video=scrollback:0
Initializing CPU#0
Detected 797.071 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 1589.24 BogoMIPS
Memory: 256660k/262080k available (1395k kernel code, 5032k reserved, 549k data, 240k init, 0k highmem)
Dentry-cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode-cache hash table entries: 16384 (order: 5, 131072 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 16384 (order: 4, 65536 bytes)
CPU: Before vendor init, caps: 0387fbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU: After vendor init, caps: 0387fbff 00000000 00000000 00000000
CPU serial number disabled.
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 0383fbff 00000000 00000000 00000000
CPU:             Common caps: 0383fbff 00000000 00000000 00000000
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
CPU: Before vendor init, caps: 0383fbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU: After vendor init, caps: 0383fbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 0383fbff 00000000 00000000 00000000
CPU:             Common caps: 0383fbff 00000000 00000000 00000000
CPU0: Intel Pentium III (Coppermine) stepping 03
per-CPU timeslice cutoff: 731.12 usecs.
task migration cache decay timeout: 10 msecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000004
ESR value after enabling vector: 00000000
Booting processor 1/1 eip 2000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 1592.52 BogoMIPS
CPU: Before vendor init, caps: 0387fbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU: After vendor init, caps: 0387fbff 00000000 00000000 00000000
CPU serial number disabled.
Intel machine check reporting enabled on CPU#1.
CPU:     After generic, caps: 0383fbff 00000000 00000000 00000000
CPU:             Common caps: 0383fbff 00000000 00000000 00000000
CPU1: Intel Pentium III (Coppermine) stepping 03
Total of 2 processors activated (3181.77 BogoMIPS).
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
number of MP IRQ sources: 23.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
.......    : physical APIC id: 02
.... register #01: 00170011
.......     : max redirection entries: 0017
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 00000000
.......     : arbitration: 00
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 000 00  1    0    0   0   0    0    0    00
 01 003 03  0    0    0   0   0    1    1    39
 02 003 03  0    0    0   0   0    1    1    31
 03 003 03  0    0    0   0   0    1    1    41
 04 003 03  0    0    0   0   0    1    1    49
 05 003 03  1    1    0   1   0    1    1    51
 06 003 03  0    0    0   0   0    1    1    59
 07 003 03  0    0    0   0   0    1    1    61
 08 003 03  0    0    0   0   0    1    1    69
 09 003 03  1    1    0   1   0    1    1    71
 0a 003 03  1    1    0   1   0    1    1    79
 0b 003 03  1    1    0   1   0    1    1    81
 0c 003 03  0    0    0   0   0    1    1    89
 0d 003 03  0    0    0   0   0    1    1    91
 0e 003 03  0    0    0   0   0    1    1    99
 0f 003 03  0    0    0   0   0    1    1    A1
 10 003 03  1    1    0   1   0    1    1    A9
 11 003 03  1    1    0   1   0    1    1    B1
 12 003 03  1    1    0   1   0    1    1    B9
 13 003 03  1    1    0   1   0    1    1    C1
 14 000 00  1    0    0   0   0    0    0    00
 15 000 00  1    0    0   0   0    0    0    00
 16 000 00  1    0    0   0   0    0    0    00
 17 000 00  1    0    0   0   0    0    0    00
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ16 -> 0:16
IRQ17 -> 0:17
IRQ18 -> 0:18
IRQ19 -> 0:19
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 797.0946 MHz.
..... host bus clock speed is 99.6368 MHz.
cpu: 0, clocks: 996368, slice: 332122
CPU0<T0:996368,T1:664240,D:6,S:332122,C:996368>
cpu: 1, clocks: 996368, slice: 332122
CPU1<T0:996368,T1:332112,D:12,S:332122,C:996368>
checking TSC synchronization across CPUs: passed.
mtrr: your CPUs had inconsistent variable MTRR settings
mtrr: probably your BIOS does not setup all CPUs
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
PCI: PCI BIOS revision 2.10 entry at 0xfdb01, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI->APIC IRQ transform: (B0,I7,P3) -> 19
PCI->APIC IRQ transform: (B0,I7,P3) -> 19
PCI->APIC IRQ transform: (B0,I10,P0) -> 18
PCI->APIC IRQ transform: (B0,I11,P0) -> 19
PCI->APIC IRQ transform: (B0,I11,P0) -> 19
PCI->APIC IRQ transform: (B0,I14,P0) -> 18
PCI->APIC IRQ transform: (B1,I0,P0) -> 16
PCI: Enabling Via external APIC routing
PCI: Via IRQ fixup for 00:07.3, from 10 to 3
PCI: Via IRQ fixup for 00:07.2, from 10 to 3
IA-32 Microcode Update Driver: v1.11 <tigran@veritas.com>
Starting kswapd
BIO: pool of 256 setup, 14Kb (56 bytes/bio)
biovec: init pool 0, 1 entries, 12 bytes
biovec: init pool 1, 4 entries, 48 bytes
biovec: init pool 2, 16 entries, 192 bytes
biovec: init pool 3, 64 entries, 768 bytes
biovec: init pool 4, 128 entries, 1536 bytes
biovec: init pool 5, 256 entries, 3072 bytes
Journalled Block Device driver loaded
devfs: v1.13 (20020406) Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x0
parport0: PC-style at 0x378 (0x778) [PCSPP,TRISTATE]
parport0: cpp_daisy: aa5500ff(38)
parport0: assign_addrs: aa5500ff(38)
parport0: cpp_daisy: aa5500ff(38)
parport0: assign_addrs: aa5500ff(38)
parport_pc: Via 686A parallel port: io=0x378
i2c-core.o: i2c core module
i2c-dev.o: i2c /dev entries driver module
i2c-core.o: driver i2c-dev dummy driver registered.
i2c-algo-bit.o: i2c bit algorithm module
i2c-proc.o version 2.6.1 (20010825)
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.11
Non-volatile memory driver v1.1
block: 512 slots per queue, batch=32
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 203M
agpgart: Detected Via Apollo Pro chipset
agpgart: AGP aperture is 256M @ 0xe0000000
[drm] AGP 0.99 on VIA Apollo Pro @ 0xe0000000 256MB
[drm] Initialized mga 3.0.2 20010321 on minor 0
Uniform Multi-Platform E-IDE driver ver.:7.0.0
ide: system bus speed 33MHz
VIA Technologies, Inc. Bus Master IDE: IDE controller on PCI slot 00:07.1
VIA Technologies, Inc. Bus Master IDE: chipset revision 16
VIA Technologies, Inc. Bus Master IDE: not 100%% native mode: will probe irqs later
VP_IDE: VIA vt82c686a (rev 22) IDE UDMA66 controller on pci00:07.1
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
hda: WDC WD1200JB-00CRA0, ATA DISK drive
hdc: TOSHIBA MK6409MAV, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
ide: unexpected interrupt
hda: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=232581/16/63, UDMA(66)
ide: unexpected interrupt
hdc: 12685680 sectors (6495 MB), CHS=13424/15/63, UDMA(33)
Partition check:
 /dev/ide/host0/bus0/target0/lun0: p1 p2
 /dev/ide/host1/bus1/target0/lun0: [PTBL] [839/240/63] p1 p2 p4
SCSI subsystem driver Revision: 1.00
kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
i2c-core.o: driver maven registered.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 1024 buckets, 16Kbytes
TCP: Hash tables configured (established 8192 bind 10922)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
matroxfb: Matrox G450 detected
matroxfb: MTRR's turned on
matroxfb: 1024x768x8bpp (virtual: 1024x16380)
matroxfb: framebuffer at 0xDA000000, mapped to 0xd084d000, size 16777216
Console: switching to colour frame buffer device 128x48
fb0: MATROX VGA frame buffer device
matroxfb_crtc2: secondary head of fb0 was registered as fb1
i2c-dev.o: Registered 'DDC:fb0 #0 on i2c-matroxfb' as minor 0
i2c-core.o: adapter DDC:fb0 #0 on i2c-matroxfb registered as adapter 0.
PCI: Enabling device 00:09.0 (0080 -> 0082)
matroxfb: Matrox Millennium (PCI) detected
matroxfb: MTRR's turned on
matroxfb: 1024x768x8bpp (virtual: 1024x8192)
matroxfb: framebuffer at 0xDE000000, mapped to 0xd2853000, size 8388608
fb2: MATROX VGA frame buffer device
fb2: initializing hardware
i2c-dev.o: Registered 'DDC:fb2 #0 on i2c-matroxfb' as minor 1
i2c-core.o: adapter DDC:fb2 #0 on i2c-matroxfb registered as adapter 1.
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 240k freed
Adding Swap: 1024120k swap-space (priority -1)
es1371: version v0.30 time 19:58:57 Apr 15 2002
es1371: found chip, vendor id 0x1274 device id 0x1371 revision 0x07
es1371: found es1371 rev 7 at io 0xd000 irq 18
es1371: features: joystick 0x0
ac97_codec: AC97 Audio codec, id: 0x8384:0x7608 (SigmaTel STAC9708)
Linux Tulip driver version 1.1.12 (Mar 07, 2002)
tulip0: Transceiver selection forced to 100baseTx.
tulip0:  EEPROM default media type Autosense.
tulip0:  Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block.
tulip0:  Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block.
tulip0:  Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block.
tulip0:  Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block.
eth0: Digital DS21143 Tulip rev 48 at 0xd30f2f80, 00:80:AD:83:5C:4D, IRQ 11.
eth0: Using user-specified media 100baseTx.
NET4: Linux IPX 0.48 for NET4.0
IPX Portions Copyright (c) 1995 Caldera, Inc.
IPX Portions Copyright (c) 2000, 2001 Conectiva, Inc.

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] IDE TCQ #4
@ 2002-04-15 19:11 Petr Vandrovec
  0 siblings, 0 replies; 47+ messages in thread
From: Petr Vandrovec @ 2002-04-15 19:11 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On 15 Apr 02 at 21:00, Jens Axboe wrote:
> > 
> > NULL pointer ...
> 
> Could you decode that? It doesn't look like any of your drives support
> TCQ, it should have enabled them right here:

They were already decoded... Also others reported that - after accessing
/proc/ide/ide0/hda/identify system dies... I believe that passing
hand-created request to ide_raw_taskfile corrupts drive->free_req,
and so subsequent drive command after this cat finds that 
drive->free_req.next is NULL and dies.
 
> > Uniform Multi-Platform E-IDE driver ver.:7.0.0
> > ide: system bus speed 33MHz
> > VIA Technologies, Inc. Bus Master IDE: IDE controller on PCI slot 00:07.1
> > VIA Technologies, Inc. Bus Master IDE: chipset revision 16
> > VIA Technologies, Inc. Bus Master IDE: not 100%% native mode: will probe irqs later
> > VP_IDE: VIA vt82c686a (rev 22) IDE UDMA66 controller on pci00:07.1
> >     ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
> >     ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
> > hda: WDC WD1200JB-00CRA0, ATA DISK drive
> > hdc: TOSHIBA MK6409MAV, ATA DISK drive
> > ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> > ide1 at 0x170-0x177,0x376 on irq 15
> > ide: unexpected interrupt
> 
> hda: tagged command queueing enabled, command queue depth 8
> 
> is missing, TCQ is not supported on this drive.

I expected that too, but I thought that unexpected interrupt may
have something related with this. Strange. I'll try to find some 
better identify parser than my eyes.
                                                Petr Vandrovec
                                                vandrove@vc.cvut.cz
                                                

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] IDE TCQ #4
@ 2002-04-15 18:41 Petr Vandrovec
  2002-04-15 18:26 ` Jens Axboe
  2002-04-16  5:11 ` Martin Dalecki
  0 siblings, 2 replies; 47+ messages in thread
From: Petr Vandrovec @ 2002-04-15 18:41 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On 15 Apr 02 at 18:44, Jens Axboe wrote:

> > You can run a IDENTIFY_DEVICE from user space with the task ioctls and
> > look at word 83 -- bit 1 and 14 must be set for TCQ to be supported. If
> > you give me the model identifier from the IBM drive, I can tell you if
> > it has tcq or not...
> > 
> > I'll write a small util to detect this tomorrow and send it to you + the
> > list.
> 
> Duh, you can of course just look at /proc/ide/ideX/hdY/identify and
> parse that. The info above is still valid for that, of course :-)

If I parsed file correctly (it is 83 decimal word, yes?), WD's
WDC WD1200JB-00CRA0 supports TCQ too. I'm still deciding which of
TCQ #X and IDE #YY patches should be aplied to 2.5.8 to get optimal
results (and I have to disconnect slaves...).
                                                Best regards,
                                                    Petr Vandrovec
                                                    vandrove@vc.cvut.cz
                                                    

^ permalink raw reply	[flat|nested] 47+ messages in thread
* [PATCH] IDE TCQ #4
@ 2002-04-15 12:56 Jens Axboe
  2002-04-15 12:24 ` Martin Dalecki
  2002-04-15 16:13 ` Aaron Tiensivu
  0 siblings, 2 replies; 47+ messages in thread
From: Jens Axboe @ 2002-04-15 12:56 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Martin Dalecki

Hi,

Updated the patch to 2.5.8 (painful). Changes:

- Expand configure help options a bit
- Fix xconfig bug (thanks to
- Decrease queue depth if a command takes too long to complete
- Test master/slave stuff. It works, but one device can heavily starve
  another. This is the simple approach right now, means that one device
  will wait until the other is completely idle before starting any
  commands This is not necessary since we can have queued commands on
  both devices at the same time. TODO.
- Add proc output for oldest command, just for testing.
- pci_dev compile fixes.
- Make sure ide-disk doesn't BUG if TCQ is not used, basically this was
  fixed by off-loading the using_tcq setting to ide-tcq.
- Remove warning about 'queued feature set not supported'
- Abstract ide_tcq_wait_dataphase() into a function

Small enough to include here, since 2.5.8 already had some of the tcq
stuff.

diff -urN -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.8/drivers/block/ll_rw_blk.c linux/drivers/block/ll_rw_blk.c
--- /opt/kernel/linux-2.5.8/drivers/block/ll_rw_blk.c	2002-04-15 07:59:53.000000000 +0200
+++ linux/drivers/block/ll_rw_blk.c	2002-04-15 08:42:23.000000000 +0200
@@ -899,10 +899,10 @@
 	spin_lock_prefetch(q->queue_lock);
 
 	generic_unplug_device(q);
-	add_wait_queue(&rl->wait, &wait);
+	add_wait_queue_exclusive(&rl->wait, &wait);
 	do {
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		if (rl->count < batch_requests)
+		if (!rl->count)
 			schedule();
 		spin_lock_irq(q->queue_lock);
 		rq = get_request(q, rw);
@@ -1720,9 +1720,11 @@
 	 * Free request slots per queue.
 	 * (Half for reads, half for writes)
 	 */
-	queue_nr_requests = 64;
-	if (total_ram > MB(32))
-		queue_nr_requests = 256;
+	queue_nr_requests = (total_ram >> 8) & ~15;	/* One per quarter-megabyte */
+	if (queue_nr_requests < 32)
+		queue_nr_requests = 32;
+	if (queue_nr_requests > 512)
+		queue_nr_requests = 512;
 
 	/*
 	 * Batch frees according to queue length
diff -urN -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.8/drivers/ide/Config.help linux/drivers/ide/Config.help
--- /opt/kernel/linux-2.5.8/drivers/ide/Config.help	2002-04-15 07:59:53.000000000 +0200
+++ linux/drivers/ide/Config.help	2002-04-10 12:51:43.000000000 +0200
@@ -753,9 +753,17 @@
 
   If you have such a drive, say Y here.
 
-CONFIG_BLK_DEV_IDE_TCQ_DEFAULT
-  Enabled tagged command queueing unconditionally on drives that report
-  support for it.
+CONFIG_BLK_DEV_IDE_TCQ_FULL
+  When a command completes from the drive, the SERVICE bit is checked to
+  see if other queued commands are ready to be started. Doing this
+  immediately after a command completes has a tendency to 'starve' the
+  device hardware queue, since we risk emptying the queue completely
+  before starting any new commands. This shows up during stressing the
+  drive as a /\/\/\/\ queue size balance, where we could instead try and
+  maintain a minimum queue size and get a /---------\ graph instead.
+
+  Saying Y here will attempt to always keep the queue full when possible
+  at the cost of possibly increasing command turn-around latency.
 
   Generally say Y here.
 
@@ -766,6 +774,18 @@
   You probably just want the default of 32 here. If you enter an invalid
   number, the default value will be used.
 
+CONFIG_BLK_DEV_IDE_TCQ_DEFAULT
+  Enabled tagged command queueing unconditionally on drives that report
+  support for it. Regardless of the chosen value here, tagging can be
+  controlled at run time:
+
+  echo "using_tcq:32" > /proc/ide/hdX/settings
+
+  where any value between 1-32 selects chosen queue depth and enables
+  TCQ, and 0 disables it.
+
+  Generally say Y here.
+
 CONFIG_BLK_DEV_IT8172
   Say Y here to support the on-board IDE controller on the Integrated
   Technology Express, Inc. ITE8172 SBC.  Vendor page at
diff -urN -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.8/drivers/ide/Config.in linux/drivers/ide/Config.in
--- /opt/kernel/linux-2.5.8/drivers/ide/Config.in	2002-04-15 07:59:53.000000000 +0200
+++ linux/drivers/ide/Config.in	2002-04-15 09:14:03.000000000 +0200
@@ -48,10 +48,11 @@
          dep_bool '    Enable DMA only for disks ' CONFIG_IDEDMA_ONLYDISK $CONFIG_IDEDMA_PCI_AUTO
 	 define_bool CONFIG_BLK_DEV_IDEDMA $CONFIG_BLK_DEV_IDEDMA_PCI
 	 dep_bool '    ATA tagged command queueing' CONFIG_BLK_DEV_IDE_TCQ $CONFIG_BLK_DEV_IDEDMA_PCI
-	   dep_bool '      TCQ on by default' CONFIG_BLK_DEV_IDE_TCQ_DEFAULT $CONFIG_BLK_DEV_IDE_TCQ
-	   if [ $CONFIG_BLK_DEV_IDE_TCQ_DEFAULT != "n" ]; then
-		int '      Default queue depth' CONFIG_BLK_DEV_IDE_TCQ_DEPTH 32
-	   fi
+	 dep_bool '      Attempt to keep queue full' CONFIG_BLK_DEV_IDE_TCQ_FULL $CONFIG_BLK_DEV_IDE_TCQ
+	 dep_bool '      TCQ on by default' CONFIG_BLK_DEV_IDE_TCQ_DEFAULT $CONFIG_BLK_DEV_IDE_TCQ
+	 if [ "$CONFIG_BLK_DEV_IDE_TCQ" != "n" ]; then
+	    int '      Default queue depth' CONFIG_BLK_DEV_IDE_TCQ_DEPTH 8
+	 fi
 	 dep_bool '    ATA Work(s) In Progress (EXPERIMENTAL)' CONFIG_IDEDMA_PCI_WIP $CONFIG_BLK_DEV_IDEDMA_PCI $CONFIG_EXPERIMENTAL
 	 dep_bool '    Good-Bad DMA Model-Firmware (WIP)' CONFIG_IDEDMA_NEW_DRIVE_LISTINGS $CONFIG_IDEDMA_PCI_WIP
 	 dep_bool '    AEC62XX chipset support' CONFIG_BLK_DEV_AEC62XX $CONFIG_BLK_DEV_IDEDMA_PCI
diff -urN -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.8/drivers/ide/ide.c linux/drivers/ide/ide.c
--- /opt/kernel/linux-2.5.8/drivers/ide/ide.c	2002-04-15 10:05:03.000000000 +0200
+++ linux/drivers/ide/ide.c	2002-04-15 14:31:08.000000000 +0200
@@ -381,8 +381,23 @@
 		add_blkdev_randomness(major(rq->rq_dev));
 
 		spin_lock_irqsave(&ide_lock, flags);
+
+		if ((jiffies - ar->ar_time > ATA_AR_MAX_TURNAROUND) && drive->queue_depth > 1) {
+			printk("%s: exceeded max command turn-around time (%d seconds)\n", drive->name, ATA_AR_MAX_TURNAROUND / HZ);
+			drive->queue_depth >>= 1;
+		}
+
+		if (jiffies - ar->ar_time > drive->tcq->oldest_command)
+			drive->tcq->oldest_command = jiffies - ar->ar_time;
+
 		ata_ar_put(drive, ar);
 		end_that_request_last(rq);
+		/*
+		 * IDE_SET_CUR_TAG(drive, IDE_INACTIVE_TAG) will do this
+		 * too, but it really belongs here. assumes that the
+		 * ended request is the active one.
+		 */
+		HWGROUP(drive)->rq = NULL;
 		spin_unlock_irqrestore(&ide_lock, flags);
 	}
 }
@@ -770,8 +785,7 @@
 		struct ata_taskfile *args = &ar->ar_task;
 
 		rq->errors = !OK_STAT(stat, READY_STAT, BAD_STAT);
-		if (args && args->taskfile.command == WIN_NOP)
-			printk(KERN_INFO "%s: NOP completed\n", __FUNCTION__);
+
 		if (args) {
 			args->taskfile.feature = err;
 			args->taskfile.sector_count = IN_BYTE(IDE_NSECTOR_REG);
@@ -1101,11 +1115,7 @@
 	while ((read_timer() - hwif->last_time) < DISK_RECOVERY_TIME);
 #endif
 
-	if (test_bit(IDE_DMA, &HWGROUP(drive)->flags))
-		printk("start_request: auch, DMA in progress 1\n");
 	SELECT_DRIVE(hwif, drive);
-	if (test_bit(IDE_DMA, &HWGROUP(drive)->flags))
-		printk("start_request: auch, DMA in progress 2\n");
 	if (ide_wait_stat(&startstop, drive, drive->ready_stat,
 			  BUSY_STAT|DRQ_STAT, WAIT_READY)) {
 		printk(KERN_WARNING "%s: drive not ready for command\n", drive->name);
@@ -1277,6 +1287,170 @@
 	return best;
 }
 
+static ide_drive_t *ide_choose_drive(ide_hwgroup_t *hwgroup)
+{
+	ide_drive_t *drive = choose_drive(hwgroup);
+	unsigned long sleep = 0;
+
+	if (drive)
+		return drive;
+
+	hwgroup->rq = NULL;
+	drive = hwgroup->drive;
+	do {
+		if (drive->PADAM_sleep && (!sleep || time_after(sleep, drive->PADAM_sleep)))
+			sleep = drive->PADAM_sleep;
+	} while ((drive = drive->next) != hwgroup->drive);
+
+	if (sleep) {
+		/*
+		 * Take a short snooze, and then wake up this hwgroup
+		 * again.  This gives other hwgroups on the same a
+		 * chance to play fairly with us, just in case there
+		 * are big differences in relative throughputs.. don't
+		 * want to hog the cpu too much.
+		 */
+		if (0 < (signed long)(jiffies + WAIT_MIN_SLEEP - sleep))
+			sleep = jiffies + WAIT_MIN_SLEEP;
+
+		if (timer_pending(&hwgroup->timer))
+			printk("ide_set_handler: timer already active\n");
+
+		set_bit(IDE_SLEEP, &hwgroup->flags);
+		mod_timer(&hwgroup->timer, sleep);
+		/* we purposely leave hwgroup busy while
+		 * sleeping */
+	} else {
+		/* Ugly, but how can we sleep for the lock
+		 * otherwise? perhaps from tq_disk? */
+		ide_release_lock(&ide_intr_lock);/* for atari only */
+		clear_bit(IDE_BUSY, &hwgroup->flags);
+	}
+
+	return NULL;
+}
+
+#ifdef CONFIG_BLK_DEV_IDE_TCQ
+ide_startstop_t ide_check_service(ide_drive_t *drive);
+#else
+#define ide_check_service(drive)	(ide_stopped)
+#endif
+
+/*
+ * feed commands to a drive until it barfs. used to be part of ide_do_request.
+ * called with ide_lock/DRIVE_LOCK held and busy hwgroup
+ */
+static void ide_queue_commands(ide_drive_t *drive, int masked_irq)
+{
+	ide_hwgroup_t *hwgroup = HWGROUP(drive);
+	ide_startstop_t startstop = -1;
+	struct request *rq;
+	int do_service = 0;
+
+	do {
+		rq = NULL;
+
+		if (!test_bit(IDE_BUSY, &hwgroup->flags))
+			printk("%s: hwgroup not busy while queueing\n", drive->name);
+
+		/*
+		 * abort early if we can't queue another command. for non
+		 * tcq, ide_can_queue is always 1 since we never get here
+		 * unless the drive is idle.
+		 */
+		if (!ide_can_queue(drive)) {
+			if (!ide_pending_commands(drive))
+				clear_bit(IDE_BUSY, &hwgroup->flags);
+			break;
+		}
+
+		drive->PADAM_sleep = 0;
+		drive->PADAM_service_start = jiffies;
+
+		if (test_bit(IDE_DMA, &hwgroup->flags)) {
+			printk("ide_do_request: DMA in progress...\n");
+			break;
+		}
+
+		/*
+		 * there's a small window between where the queue could be
+		 * replugged while we are in here when using tcq (in which
+		 * case the queue is probably empty anyways...), so check
+		 * and leave if appropriate. When not using tcq, this is
+		 * still a severe BUG!
+		 */
+		if (blk_queue_plugged(&drive->queue)) {
+			BUG_ON(!drive->using_tcq);
+			break;
+		}
+
+		if (!(rq = elv_next_request(&drive->queue))) {
+			if (!ide_pending_commands(drive))
+				clear_bit(IDE_BUSY, &hwgroup->flags);
+			hwgroup->rq = NULL;
+			break;
+		}
+
+		/*
+		 * if there are queued commands, we can't start a non-fs
+		 * request (really, a non-queuable command) until the
+		 * queue is empty
+		 */
+		if (!(rq->flags & REQ_CMD) && ide_pending_commands(drive))
+			break;
+
+		hwgroup->rq = rq;
+
+service:
+		/*
+		 * Some systems have trouble with IDE IRQs arriving while
+		 * the driver is still setting things up.  So, here we disable
+		 * the IRQ used by this interface while the request is being
+		 * started.  This may look bad at first, but pretty much the
+		 * same thing happens anyway when any interrupt comes in, IDE
+		 * or otherwise -- the kernel masks the IRQ while it is being
+		 * handled.
+		 */
+		if (masked_irq && HWIF(drive)->irq != masked_irq)
+			disable_irq_nosync(HWIF(drive)->irq);
+
+		spin_unlock(&ide_lock);
+		ide__sti();	/* allow other IRQs while we start this request */
+		if (!do_service)
+			startstop = start_request(drive, rq);
+		else
+			startstop = ide_check_service(drive);
+
+		spin_lock_irq(&ide_lock);
+		if (masked_irq && HWIF(drive)->irq != masked_irq)
+			enable_irq(HWIF(drive)->irq);
+
+		/*
+		 * command started, we are busy
+		 */
+		if (startstop == ide_started)
+			break;
+
+		/*
+		 * start_request() can return either ide_stopped (no command
+		 * was started), ide_started (command started, don't queue
+		 * more), or ide_released (command started, try and queue
+		 * more).
+		 */
+#if 0
+		if (startstop == ide_stopped)
+			set_bit(IDE_BUSY, &hwgroup->flags);
+#endif
+
+	} while (1);
+
+	if (startstop == ide_started)
+		return;
+
+	if ((do_service = drive->service_pending))
+		goto service;
+}
+
 /*
  * Issue a new request to a drive from hwgroup
  * Caller must have already done spin_lock_irqsave(&ide_lock, ...)
@@ -1311,115 +1485,34 @@
 {
 	ide_drive_t *drive;
 	struct ata_channel *hwif;
-	ide_startstop_t	startstop;
-	struct request	*rq;
 
 	ide_get_lock(&ide_intr_lock, ide_intr, hwgroup);/* for atari only: POSSIBLY BROKEN HERE(?) */
 
 	__cli();	/* necessary paranoia: ensure IRQs are masked on local CPU */
 
 	while (!test_and_set_bit(IDE_BUSY, &hwgroup->flags)) {
-		drive = choose_drive(hwgroup);
-		if (drive == NULL) {
-			unsigned long sleep = 0;
-			hwgroup->rq = NULL;
-			drive = hwgroup->drive;
-			do {
-				if (drive->PADAM_sleep && (!sleep || time_after(sleep, drive->PADAM_sleep)))
-					sleep = drive->PADAM_sleep;
-			} while ((drive = drive->next) != hwgroup->drive);
-			if (sleep) {
-				/*
-				 * Take a short snooze, and then wake up this hwgroup again.
-				 * This gives other hwgroups on the same a chance to
-				 * play fairly with us, just in case there are big differences
-				 * in relative throughputs.. don't want to hog the cpu too much.
-				 */
-				if (0 < (signed long)(jiffies + WAIT_MIN_SLEEP - sleep))
-					sleep = jiffies + WAIT_MIN_SLEEP;
-#if 1
-				if (timer_pending(&hwgroup->timer))
-					printk("ide_set_handler: timer already active\n");
-#endif
-				set_bit(IDE_SLEEP, &hwgroup->flags);
-				mod_timer(&hwgroup->timer, sleep);
-				/* we purposely leave hwgroup busy while sleeping */
-			} else {
-				/* Ugly, but how can we sleep for the lock otherwise? perhaps from tq_disk? */
-				ide_release_lock(&ide_intr_lock);/* for atari only */
-				clear_bit(IDE_BUSY, &hwgroup->flags);
-			}
-			return;		/* no more work for this hwgroup (for now) */
-		}
+
+		/*
+		 * will clear IDE_BUSY, if appropriate
+		 */
+		if ((drive = ide_choose_drive(hwgroup)) == NULL)
+			break;
+
 		hwif = drive->channel;
-		if (hwgroup->hwif->sharing_irq && hwif != hwgroup->hwif && hwif->io_ports[IDE_CONTROL_OFFSET]) {
+		if (hwgroup->hwif->sharing_irq && hwif != hwgroup->hwif && IDE_CONTROL_REG) {
 			/* set nIEN for previous hwif */
-
 			if (hwif->intrproc)
 				hwif->intrproc(drive);
 			else
-				OUT_BYTE((drive)->ctl|2, hwif->io_ports[IDE_CONTROL_OFFSET]);
+				OUT_BYTE(drive->ctl|2, IDE_CONTROL_REG);
 		}
 		hwgroup->hwif = hwif;
 		hwgroup->drive = drive;
-		drive->PADAM_sleep = 0;
-queue_next:
-		drive->PADAM_service_start = jiffies;
-
-		if (test_bit(IDE_DMA, &hwgroup->flags)) {
-			printk("ide_do_request: DMA in progress...\n");
-			break;
-		}
 
 		/*
-		 * there's a small window between where the queue could be
-		 * replugged while we are in here when using tcq (in which
-		 * case the queue is probably empty anyways...), so check
-		 * and leave if appropriate. When not using tcq, this is
-		 * still a severe BUG!
+		 * main queueing loop
 		 */
-		if (blk_queue_plugged(&drive->queue)) {
-			BUG_ON(!drive->using_tcq);
-			break;
-		}
-
-		/*
-		 * just continuing an interrupted request maybe
-		 */
-		rq = hwgroup->rq = elv_next_request(&drive->queue);
-
-		if (!rq) {
-			if (!ide_pending_commands(drive))
-				clear_bit(IDE_BUSY, &HWGROUP(drive)->flags);
-			break;
-		}
-
-		/*
-		 * Some systems have trouble with IDE IRQs arriving while
-		 * the driver is still setting things up.  So, here we disable
-		 * the IRQ used by this interface while the request is being started.
-		 * This may look bad at first, but pretty much the same thing
-		 * happens anyway when any interrupt comes in, IDE or otherwise
-		 *  -- the kernel masks the IRQ while it is being handled.
-		 */
-		if (masked_irq && hwif->irq != masked_irq)
-			disable_irq_nosync(hwif->irq);
-
-		spin_unlock(&ide_lock);
-		ide__sti();	/* allow other IRQs while we start this request */
-		startstop = start_request(drive, rq);
-
-		spin_lock_irq(&ide_lock);
-		if (masked_irq && hwif->irq != masked_irq)
-			enable_irq(hwif->irq);
-
-		if (startstop == ide_released)
-			goto queue_next;
-		else if (startstop == ide_stopped) {
-			if (test_bit(IDE_DMA, &hwgroup->flags))
-				printk("2nd illegal clear\n");
-			clear_bit(IDE_BUSY, &hwgroup->flags);
-		}
+		ide_queue_commands(drive, masked_irq);
 	}
 }
 
@@ -1673,6 +1766,7 @@
 		goto out_lock;
 
 	if ((handler = hwgroup->handler) == NULL || hwgroup->poll_timeout != 0) {
+		printk("ide: unexpected interrupt\n");
 		/*
 		 * Not expecting an interrupt from this drive.
 		 * That means this could be:
@@ -1685,9 +1779,7 @@
 		 * For PCI, we cannot tell the difference,
 		 * so in that case we just ignore it and hope it goes away.
 		 */
-#ifdef CONFIG_BLK_DEV_IDEPCI
 		if (hwif->pci_dev && !hwif->pci_dev->vendor)
-#endif
 		{
 			/*
 			 * Probably not a shared PCI interrupt,
@@ -1751,7 +1843,8 @@
 		} else {
 			printk("%s: ide_intr: huh? expected NULL handler on exit\n", drive->name);
 		}
-	}
+	} else if (startstop == ide_released)
+		ide_queue_commands(drive, hwif->irq);
 
 out_lock:
 	spin_unlock_irqrestore(&ide_lock, flags);
@@ -2221,6 +2314,8 @@
 	channel->udma_four = old_hwif.udma_four;
 #ifdef CONFIG_BLK_DEV_IDEPCI
 	channel->pci_dev = old_hwif.pci_dev;
+#else
+	channel->pci_dev = NULL;
 #endif
 	channel->straight8 = old_hwif.straight8;
 abort:
@@ -2699,47 +2794,34 @@
 	}
 }
 
-void ide_teardown_commandlist(ide_drive_t *drive)
-{
-	struct pci_dev *pdev= drive->channel->pci_dev;
-	struct list_head *entry;
-
-	list_for_each(entry, &drive->free_req) {
-		struct ata_request *ar = list_ata_entry(entry);
-
-		list_del(&ar->ar_queue);
-		kfree(ar->ar_sg_table);
-		pci_free_consistent(pdev, PRD_SEGMENTS * PRD_BYTES, ar->ar_dmatable_cpu, ar->ar_dmatable);
-		kfree(ar);
-	}
-}
-
 int ide_build_commandlist(ide_drive_t *drive)
 {
 	struct pci_dev *pdev= drive->channel->pci_dev;
+	struct list_head *p;
+	unsigned long flags;
 	struct ata_request *ar;
-	ide_tag_info_t *tcq;
-	int i, err;
+	int i, cur;
 
-	tcq = kmalloc(sizeof(ide_tag_info_t), GFP_ATOMIC);
-	if (!tcq)
-		return -ENOMEM;
+	spin_lock_irqsave(&ide_lock, flags);
 
-	drive->tcq = tcq;
-	memset(drive->tcq, 0, sizeof(ide_tag_info_t));
+	cur = 0;
+	list_for_each(p, &drive->free_req)
+		cur++;
 
-	INIT_LIST_HEAD(&drive->free_req);
-	drive->using_tcq = 0;
+	/*
+	 * for now, just don't shrink it...
+	 */
+	if (drive->queue_depth <= cur) {
+		spin_unlock_irqrestore(&ide_lock, flags);
+		return 0;
+	}
 
-	err = -ENOMEM;
-	for (i = 0; i < drive->queue_depth; i++) {
-		/* Having kzmalloc would help reduce code size at quite
-		 * many places in kernel. */
+	for (i = cur; i < drive->queue_depth; i++) {
 		ar = kmalloc(sizeof(*ar), GFP_ATOMIC);
 		if (!ar)
 			break;
 
-		memset(ar, 0, sizeof(*ar));
+		memset(ar, 0, sizeof(ar));
 		INIT_LIST_HEAD(&ar->ar_queue);
 
 		ar->ar_sg_table = kmalloc(PRD_SEGMENTS * sizeof(struct scatterlist), GFP_ATOMIC);
@@ -2755,29 +2837,40 @@
 			break;
 		}
 
+		
+
 		/*
 		 * pheew, all done, add to list
 		 */
 		list_add_tail(&ar->ar_queue, &drive->free_req);
+		cur++;
 	}
 
-	if (i) {
-		drive->queue_depth = i;
-		if (i >= 1) {
-			drive->using_tcq = 1;
-			drive->tcq->queued = 0;
-			drive->tcq->active_tag = -1;
-			return 0;
-		}
+	drive->queue_depth = cur;
+	spin_unlock_irqrestore(&ide_lock, flags);
+	return 0;
+}
 
-		kfree(drive->tcq);
-		drive->tcq = NULL;
-		err = 0;
-	}
+int ide_init_commandlist(ide_drive_t *drive)
+{
+	INIT_LIST_HEAD(&drive->free_req);
 
-	kfree(drive->tcq);
-	drive->tcq = NULL;
-	return err;
+	return ide_build_commandlist(drive);
+}
+
+void ide_teardown_commandlist(ide_drive_t *drive)
+{
+	struct pci_dev *pdev= drive->channel->pci_dev;
+	struct list_head *entry;
+
+	list_for_each(entry, &drive->free_req) {
+		struct ata_request *ar = list_ata_entry(entry);
+
+		list_del(&ar->ar_queue);
+		kfree(ar->ar_sg_table);
+		pci_free_consistent(pdev, PRD_SEGMENTS * PRD_BYTES, ar->ar_dmatable_cpu, ar->ar_dmatable);
+		kfree(ar);
+	}
 }
 
 static int ide_check_media_change (kdev_t i_rdev)
diff -urN -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.8/drivers/ide/ide-disk.c linux/drivers/ide/ide-disk.c
--- /opt/kernel/linux-2.5.8/drivers/ide/ide-disk.c	2002-04-15 07:59:53.000000000 +0200
+++ linux/drivers/ide/ide-disk.c	2002-04-15 10:20:33.000000000 +0200
@@ -107,10 +107,7 @@
 	 * 48-bit commands are pretty sanely laid out
 	 */
 	if (lba48bit) {
-		if (cmd == READ)
-			command = WIN_READ_EXT;
-		else
-			command = WIN_WRITE_EXT;
+		command = cmd == READ ? WIN_READ_EXT : WIN_WRITE_EXT;
 
 		if (drive->using_dma) {
 			command++;		/* WIN_*DMA_EXT */
@@ -118,31 +115,33 @@
 				command++;	/* WIN_*DMA_QUEUED_EXT */
 		} else if (drive->mult_count)
 			command += 5;		/* WIN_MULT*_EXT */
-	} else {
-		/*
-		 * 28-bit commands seem not to be, though...
-		 */
-		if (cmd == READ) {
-			if (drive->using_dma) {
-				if (drive->using_tcq)
-					command = WIN_READDMA_QUEUED;
-				else
-					command = WIN_READDMA;
-			} else if (drive->mult_count)
-				command = WIN_MULTREAD;
+
+		return command;
+	}
+
+	/*
+	 * 28-bit commands seem not to be, though...
+	 */
+	if (cmd == READ) {
+		if (drive->using_dma) {
+			if (drive->using_tcq)
+				command = WIN_READDMA_QUEUED;
 			else
-				command = WIN_READ;
-		} else {
-			if (drive->using_dma) {
-				if (drive->using_tcq)
-					command = WIN_WRITEDMA_QUEUED;
-				else
-					command = WIN_WRITEDMA;
-			} else if (drive->mult_count)
-				command = WIN_MULTWRITE;
+				command = WIN_READDMA;
+		} else if (drive->mult_count)
+			command = WIN_MULTREAD;
+		else
+			command = WIN_READ;
+	} else {
+		if (drive->using_dma) {
+			if (drive->using_tcq)
+				command = WIN_WRITEDMA_QUEUED;
 			else
-				command = WIN_WRITE;
-		}
+				command = WIN_WRITEDMA;
+		} else if (drive->mult_count)
+			command = WIN_MULTWRITE;
+		else
+			command = WIN_WRITE;
 	}
 
 	return command;
@@ -895,24 +894,24 @@
 
 		__set_bit(i, &tag_mask);
 		len += sprintf(out+len, "%d, ", i);
-		if (ar->ar_time > max_jif)
-			max_jif = ar->ar_time;
+		if (cur_jif - ar->ar_time > max_jif)
+			max_jif = cur_jif - ar->ar_time;
 		cmds++;
 	}
 	len += sprintf(out+len, "]\n");
 
+	len += sprintf(out+len, "Queue:\t\t\treleased [ %d ] - started [ %d ]\n", drive->tcq->immed_rel, drive->tcq->immed_comp);
+
 	if (drive->tcq->queued != cmds)
-		len += sprintf(out+len, "pending request and queue count mismatch (%d)\n", cmds);
+		len += sprintf(out+len, "pending request and queue count mismatch (counted: %d)\n", cmds);
 
 	if (tag_mask != drive->tcq->tag_mask)
 		len += sprintf(out+len, "tag masks differ (counted %lx != %lx\n", tag_mask, drive->tcq->tag_mask);
 
 	len += sprintf(out+len, "DMA status:\t\t%srunning\n", test_bit(IDE_DMA, &HWGROUP(drive)->flags) ? "" : "not ");
 
-	if (max_jif)
-		len += sprintf(out+len, "Oldest command:\t\t%lu\n", cur_jif - max_jif);
-
-	len += sprintf(out+len, "immed rel %d, immed comp %d\n", drive->tcq->immed_rel, drive->tcq->immed_comp);
+	len += sprintf(out+len, "Oldest command:\t\t%lu jiffies\n", max_jif);
+	len += sprintf(out+len, "Oldest command ever:\t%lu\n", drive->tcq->oldest_command);
 
 	drive->tcq->max_last_depth = 0;
 
@@ -1017,8 +1016,10 @@
 		return -EPERM;
 	if (!drive->channel->dmaproc)
 		return -EPERM;
+	if (arg == drive->queue_depth && drive->using_tcq)
+		return 0;
 
-	drive->using_tcq = arg;
+	drive->queue_depth = arg ? arg : 1;
 	if (drive->channel->dmaproc(arg ? ide_dma_queued_on : ide_dma_queued_off, drive))
 		return -EIO;
 
diff -urN -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.8/drivers/ide/ide-dma.c linux/drivers/ide/ide-dma.c
--- /opt/kernel/linux-2.5.8/drivers/ide/ide-dma.c	2002-04-15 07:59:53.000000000 +0200
+++ linux/drivers/ide/ide-dma.c	2002-04-15 09:17:55.000000000 +0200
@@ -610,9 +610,6 @@
 		case ide_dma_check:
 			return config_drive_for_dma (drive);
 		case ide_dma_begin:
-#ifdef DEBUG
-			printk("ide_dma_begin: from %p\n", __builtin_return_address(0));
-#endif
 			if (test_and_set_bit(IDE_DMA, &HWGROUP(drive)->flags))
 				BUG();
 			/* Note that this is done *after* the cmd has
@@ -629,19 +626,18 @@
 		case ide_dma_write_queued:
 		case ide_dma_queued_start:
 			return ide_tcq_dmaproc(func, drive);
-#endif
+#endif /* CONFIG_BLK_DEV_IDE_TCQ */
 
 		case ide_dma_read:
 			reading = 1 << 3;
 		case ide_dma_write:
-			ar = HWGROUP(drive)->rq->special;
+			ar = IDE_CUR_AR(drive);
 
 			if (ide_start_dma(hwif, drive, func))
 				return 1;
 
 			if (drive->type != ATA_DISK)
 				return 0;
-
 			BUG_ON(HWGROUP(drive)->handler);
 			ide_set_handler(drive, &ide_dma_intr, WAIT_CMD, dma_timer_expiry);	/* issue cmd to drive */
 			if ((ar->ar_rq->flags & REQ_DRIVE_TASKFILE) &&
@@ -655,20 +651,13 @@
 			}
 			return hwif->dmaproc(ide_dma_begin, drive);
 		case ide_dma_end: /* returns 1 on error, 0 otherwise */
-#ifdef DEBUG
-			printk("ide_dma_end: from %p\n", __builtin_return_address(0));
-#endif
-			if (!test_and_clear_bit(IDE_DMA, &HWGROUP(drive)->flags)) {
-				printk("ide_dma_end: dma not going? %p\n", __builtin_return_address(0));
-				return 1;
-			}
+			if (!test_and_clear_bit(IDE_DMA, &HWGROUP(drive)->flags))
+				BUG();
 			drive->waiting_for_dma = 0;
 			outb(inb(dma_base)&~1, dma_base);	/* stop DMA */
 			dma_stat = inb(dma_base+2);	/* get DMA status */
 			outb(dma_stat|6, dma_base+2);	/* clear the INTR & ERROR bits */
 			ide_destroy_dmatable(drive);	/* purge DMA mappings */
-			if (drive->tcq)
-				IDE_SET_CUR_TAG(drive, -1);
 			return (dma_stat & 7) != 4 ? (0x10 | dma_stat) : 0;	/* verify good DMA status */
 		case ide_dma_test_irq: /* returns 1 if dma irq issued, 0 otherwise */
 			dma_stat = inb(dma_base+2);
diff -urN -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.8/drivers/ide/ide-probe.c linux/drivers/ide/ide-probe.c
--- /opt/kernel/linux-2.5.8/drivers/ide/ide-probe.c	2002-04-15 07:59:53.000000000 +0200
+++ linux/drivers/ide/ide-probe.c	2002-04-15 09:11:18.000000000 +0200
@@ -192,19 +192,16 @@
 	/*
 	 * it's an ata drive, build command list
 	 */
-#ifndef CONFIG_BLK_DEV_IDE_TCQ
 	drive->queue_depth = 1;
+#ifdef CONFIG_BLK_DEV_IDE_TCQ_DEPTH
+	drive->queue_depth = CONFIG_BLK_DEV_IDE_TCQ_DEPTH;
 #else
-# ifndef CONFIG_BLK_DEV_IDE_TCQ_DEPTH
-#  define CONFIG_BLK_DEV_IDE_TCQ_DEPTH 1
-# endif
 	drive->queue_depth = drive->id->queue_depth + 1;
-	if (drive->queue_depth > CONFIG_BLK_DEV_IDE_TCQ_DEPTH)
-		drive->queue_depth = CONFIG_BLK_DEV_IDE_TCQ_DEPTH;
+#endif
 	if (drive->queue_depth < 1 || drive->queue_depth > IDE_MAX_TAG)
 		drive->queue_depth = IDE_MAX_TAG;
-#endif
-	if (ide_build_commandlist(drive))
+
+	if (ide_init_commandlist(drive))
 		goto err_misc;
 
 	return;
@@ -482,11 +479,9 @@
 	sprintf(hwif->dev.bus_id, "%04x", hwif->io_ports[IDE_DATA_OFFSET]);
 	sprintf(hwif->dev.name, "ide");
 	hwif->dev.driver_data = hwif;
-#ifdef CONFIG_BLK_DEV_IDEPCI
 	if (hwif->pci_dev)
 		hwif->dev.parent = &hwif->pci_dev->dev;
 	else
-#endif
 		hwif->dev.parent = NULL; /* Would like to do = &device_legacy */
 	device_register(&hwif->dev);
 
diff -urN -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.8/drivers/ide/ide-tcq.c linux/drivers/ide/ide-tcq.c
--- /opt/kernel/linux-2.5.8/drivers/ide/ide-tcq.c	2002-04-15 07:59:53.000000000 +0200
+++ linux/drivers/ide/ide-tcq.c	2002-04-15 11:31:15.000000000 +0200
@@ -29,9 +29,11 @@
 #include <asm/delay.h>
 
 /*
- * warning: it will be _very_ verbose if set to 1
+ * warning: it will be _very_ verbose if defined
  */
-#if 0
+#undef IDE_TCQ_DEBUG
+
+#ifdef IDE_TCQ_DEBUG
 #define TCQ_PRINTK printk
 #else
 #define TCQ_PRINTK(x...)
@@ -49,13 +51,18 @@
  */
 #undef IDE_TCQ_FIDDLE_SI
 
-int ide_tcq_end(ide_drive_t *drive);
+/*
+ * wait for data phase before starting DMA or not
+ */
+#undef IDE_TCQ_WAIT_DATAPHASE
+
 ide_startstop_t ide_dmaq_intr(ide_drive_t *drive);
+ide_startstop_t ide_service(ide_drive_t *drive);
 
 static inline void drive_ctl_nien(ide_drive_t *drive, int clear)
 {
 #ifdef IDE_TCQ_NIEN
-	int mask = clear ? 0 : 2;
+	int mask = clear ? 0 : 1 << 1;
 
 	if (IDE_CONTROL_REG)
 		OUT_BYTE(drive->ctl | mask, IDE_CONTROL_REG);
@@ -142,7 +149,7 @@
 
 	spin_lock_irqsave(&ide_lock, flags);
 
-	if (test_bit(IDE_BUSY, &hwgroup->flags))
+	if (test_and_set_bit(IDE_BUSY, &hwgroup->flags))
 		printk("ide_tcq_intr_timeout: hwgroup not busy\n");
 	if (hwgroup->handler == NULL)
 		printk("ide_tcq_intr_timeout: missing isr!\n");
@@ -151,6 +158,13 @@
 
 	spin_unlock_irqrestore(&ide_lock, flags);
 
+	/*
+	 * if pending commands, try service before giving up
+	 */
+	if (ide_pending_commands(drive) && (GET_STAT() & SERVICE_STAT))
+		if (ide_service(drive) == ide_started)
+			return;
+
 	if (drive)
 		ide_tcq_invalidate_queue(drive);
 }
@@ -192,6 +206,7 @@
 
 		if (unlikely(i++ > IDE_TCQ_WAIT))
 			return 1;
+
 	} while ((*stat = GET_ALTSTAT()) & busy_mask);
 
 	return 0;
@@ -206,7 +221,12 @@
 ide_startstop_t ide_service(ide_drive_t *drive)
 {
 	struct ata_request *ar;
-	byte feat, tag, stat;
+	byte feat, stat;
+	int tag;
+
+	TCQ_PRINTK("%s: started service\n", drive->name);
+
+	drive->service_pending = 0;
 
 	if (test_bit(IDE_DMA, &HWGROUP(drive)->flags))
 		printk("ide_service: DMA in progress\n");
@@ -238,9 +258,9 @@
 	 * FIXME, invalidate queue
 	 */
 	if (stat & ERR_STAT) {
-		printk("%s: error SERVICING drive (%x)\n", drive->name, stat);
 		ide_dump_status(drive, "ide_service", stat);
-		return ide_tcq_end(drive);
+		ide_tcq_invalidate_queue(drive);
+		return ide_stopped;
 	}
 
 	/*
@@ -248,7 +268,7 @@
 	 */
 	if ((feat = GET_FEAT()) & NSEC_REL) {
 		printk("%s: release in service\n", drive->name);
-		IDE_SET_CUR_TAG(drive, -1);
+		IDE_SET_CUR_TAG(drive, IDE_INACTIVE_TAG);
 		return ide_stopped;
 	}
 
@@ -265,6 +285,8 @@
 		return ide_stopped;
 	}
 
+	HWGROUP(drive)->rq = ar->ar_rq;
+
 	/*
 	 * we'll start a dma read or write, device will trigger
 	 * interrupt to indicate end of transfer, release is not allowed
@@ -287,6 +309,8 @@
 {
 	byte stat;
 
+	TCQ_PRINTK("%s: ide_check_service\n", drive->name);
+
 	if (!ide_pending_commands(drive))
 		return ide_stopped;
 
@@ -300,40 +324,14 @@
 	return ide_started;
 }
 
-int ide_tcq_end(ide_drive_t *drive)
-{
-	byte stat = GET_STAT();
-
-	if (stat & ERR_STAT) {
-		ide_dump_status(drive, "ide_tcq_end", stat);
-		ide_tcq_invalidate_queue(drive);
-		return ide_stopped;
-	} else if (stat & SERVICE_STAT) {
-		TCQ_PRINTK("ide_tcq_end: serv stat=%x\n", stat);
-		return ide_service(drive);
-	}
-
-	TCQ_PRINTK("ide_tcq_end: stat=%x, feat=%x\n", stat, GET_FEAT());
-	return ide_stopped;
-}
-
 ide_startstop_t ide_dmaq_complete(ide_drive_t *drive, byte stat)
 {
-	struct ata_request *ar;
+	struct ata_request *ar = IDE_CUR_TAG(drive);
 	byte dma_stat;
 
-#if 0
-	byte feat = GET_FEAT();
-
-	if ((feat & (NSEC_CD | NSEC_IO)) != (NSEC_CD | NSEC_IO))
-		printk("%s: C/D | I/O not set\n", drive->name);
-#endif
-
 	/*
 	 * transfer was in progress, stop DMA engine
 	 */
-	ar = IDE_CUR_TAG(drive);
-
 	dma_stat = drive->channel->dmaproc(ide_dma_end, drive);
 
 	/*
@@ -352,7 +350,23 @@
 	TCQ_PRINTK("ide_dmaq_intr: ending %p, tag %d\n", ar, ar->ar_tag);
 	ide_end_queued_request(drive, !dma_stat, ar->ar_rq);
 
-	IDE_SET_CUR_TAG(drive, -1);
+	IDE_SET_CUR_TAG(drive, IDE_INACTIVE_TAG);
+
+	/*
+	 * keep the queue full, or honor SERVICE? note that this may race
+	 * and no new command will be started, in which case idedisk_do_request
+	 * will notice and do the service check
+	 */
+#if CONFIG_BLK_DEV_IDE_TCQ_FULL
+	if (!drive->service_pending && (ide_pending_commands(drive) > 1)) {
+		if (!blk_queue_empty(&drive->queue)) {
+			drive->service_pending = 1;
+			ide_tcq_set_intr(HWGROUP(drive), ide_dmaq_intr);
+			return ide_released;
+		}
+	}
+#endif
+
 	return ide_check_service(drive);
 }
 
@@ -376,7 +390,7 @@
 	 * if a command completion interrupt is pending, do that first and
 	 * check service afterwards
 	 */
-	if (drive->tcq->active_tag != -1)
+	if (drive->tcq->active_tag != IDE_INACTIVE_TAG)
 		return ide_dmaq_complete(drive, stat);
 
 	/*
@@ -403,10 +417,8 @@
 	 * bit 14 and 1 must be set in word 83 of the device id to indicate
 	 * support for dma queued protocol
 	 */
-	if ((drive->id->command_set_2 & tcq_supp) != tcq_supp) {
-		printk("%s: queued feature set not supported\n", drive->name);
-		return 1;
-	}
+	if ((drive->id->command_set_2 & tcq_supp) != tcq_supp)
+		return -EIO;
 
 	memset(&args, 0, sizeof(args));
 	args.taskfile.feature = SETFEATURES_EN_WCACHE;
@@ -446,6 +458,16 @@
 		return 1;
 	}
 #endif
+
+	if (!drive->tcq) {
+		drive->tcq = kmalloc(sizeof(ide_tag_info_t), GFP_ATOMIC);
+		if (!drive->tcq)
+			return -ENOMEM;
+
+		memset(drive->tcq, 0, sizeof(ide_tag_info_t));
+		drive->tcq->active_tag = IDE_INACTIVE_TAG;
+	}
+
 	return 0;
 }
 
@@ -461,26 +483,33 @@
 	if (!on) {
 		printk("%s: TCQ disabled\n", drive->name);
 		drive->using_tcq = 0;
-
 		return 0;
-	} else if (drive->using_tcq) {
-		drive->queue_depth = drive->using_tcq;
-
-		goto out;
 	}
 
 	if (ide_tcq_configure(drive)) {
 		drive->using_tcq = 0;
-
 		return 1;
 	}
 
-out:
-	drive->tcq->max_depth = 0;
+	if (ide_build_commandlist(drive))
+		return 1;
 
 	printk("%s: tagged command queueing enabled, command queue depth %d\n", drive->name, drive->queue_depth);
 	drive->using_tcq = 1;
+	drive->tcq->max_depth = 0;
+	return 0;
+}
 
+int ide_tcq_wait_dataphase(ide_drive_t *drive)
+{
+#ifdef IDE_TCQ_WAIT_DATAPHASE
+	ide_startstop_t foo;
+
+	if (ide_wait_stat(&startstop, drive, READY_STAT | DRQ_STAT, BUSY_STAT, WAIT_READY)) {
+		printk("%s: timeout waiting for data phase\n", drive->name);
+		return 1;
+	}
+#endif
 	return 0;
 }
 
@@ -488,7 +517,6 @@
 {
 	struct ata_channel *hwif = drive->channel;
 	unsigned int reading = 0, enable_tcq = 1;
-	ide_startstop_t startstop;
 	struct ata_request *ar;
 	byte stat, feat;
 
@@ -501,15 +529,13 @@
 			reading = 1 << 3;
 		case ide_dma_write_queued:
 			TCQ_PRINTK("ide_dma: setting up queued %d\n", drive->tcq->active_tag);
-			BUG_ON(drive->tcq->active_tag == -1);
+			BUG_ON(drive->tcq->active_tag == IDE_INACTIVE_TAG);
 
 			if (!test_bit(IDE_BUSY, &HWGROUP(drive)->flags))
 				printk("queued_rw: IDE_BUSY not set\n");
 
-			if (ide_wait_stat(&startstop, drive, READY_STAT | DRQ_STAT, BUSY_STAT, WAIT_READY)) {
-				printk("%s: timeout waiting for data phase\n", drive->name);
-				return startstop;
-			}
+			if (ide_tcq_wait_dataphase(drive))
+				return ide_stopped;
 
 			if (ide_start_dma(hwif, drive, func))
 				return 1;
@@ -521,7 +547,7 @@
 			 * start a queued command from scratch
 			 */
 		case ide_dma_queued_start:
-			BUG_ON(drive->tcq->active_tag == -1);
+			BUG_ON(drive->tcq->active_tag == IDE_INACTIVE_TAG);
 			ar = IDE_CUR_TAG(drive);
 
 			/*
@@ -540,14 +566,19 @@
 			drive_ctl_nien(drive, 0);
 
 			if (stat & ERR_STAT) {
-				printk("ide_dma_queued_start: abort (stat=%x)\n", stat);
+				ide_dump_status(drive, "tcq_start", stat);
 				return ide_stopped;
 			}
 
+			/*
+			 * drive released the bus, clear active tag and
+			 * check for service
+			 */
 			if ((feat = GET_FEAT()) & NSEC_REL) {
+				IDE_SET_CUR_TAG(drive, IDE_INACTIVE_TAG);
 				drive->tcq->immed_rel++;
+
 				TCQ_PRINTK("REL in queued_start\n");
-				IDE_SET_CUR_TAG(drive, -1);
 
 				if ((stat = GET_STAT()) & SERVICE_STAT)
 					return ide_service(drive);
@@ -558,21 +589,24 @@
 
 			drive->tcq->immed_comp++;
 
-			if (ide_wait_stat(&startstop, drive, READY_STAT | DRQ_STAT, BUSY_STAT, WAIT_READY)) {
-				printk("%s: timeout waiting for data phase\n", drive->name);
-				return startstop;
-			}
+			if (ide_tcq_wait_dataphase(drive))
+				return ide_stopped;
 
 			if (ide_start_dma(hwif, drive, func))
 				return ide_stopped;
 
+			/*
+			 * need to arm handler before starting dma engine,
+			 * transfer could complete right away
+			 */
+			ide_tcq_set_intr(HWGROUP(drive), ide_dmaq_intr);
+
 			if (hwif->dmaproc(ide_dma_begin, drive))
 				return ide_stopped;
 
 			/*
 			 * wait for SERVICE or completion interrupt
 			 */
-			ide_tcq_set_intr(HWGROUP(drive), ide_dmaq_intr);
 			return ide_started;
 
 		case ide_dma_queued_off:
@@ -590,6 +624,10 @@
 ide_startstop_t ide_start_tag(ide_dma_action_t func, ide_drive_t *drive,
 			      struct ata_request *ar)
 {
+	ide_startstop_t startstop;
+
+	TCQ_PRINTK("%s: ide_start_tag: begin tag %p/%d, rq %p\n", drive->name,ar,ar->ar_tag, ar->ar_rq);
+
 	/*
 	 * do this now, no need to run that with interrupts disabled
 	 */
@@ -597,5 +635,14 @@
 		return ide_stopped;
 
 	IDE_SET_CUR_TAG(drive, ar->ar_tag);
-	return ide_tcq_dmaproc(func, drive);
+	HWGROUP(drive)->rq = ar->ar_rq;
+
+	startstop = ide_tcq_dmaproc(func, drive);
+
+	if (unlikely(startstop == ide_stopped)) {
+		IDE_SET_CUR_TAG(drive, IDE_INACTIVE_TAG);
+		HWGROUP(drive)->rq = NULL;
+	}
+
+	return startstop;
 }
diff -urN -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.5.8/include/linux/ide.h linux/include/linux/ide.h
--- /opt/kernel/linux-2.5.8/include/linux/ide.h	2002-04-15 07:59:55.000000000 +0200
+++ linux/include/linux/ide.h	2002-04-15 14:46:44.000000000 +0200
@@ -273,36 +273,44 @@
 	} b;
 } special_t;
 
-#define IDE_MAX_TAG	32		/* spec says 32 max */
+#define IDE_MAX_TAG		(32)		/* spec says 32 max */
+#define IDE_INACTIVE_TAG	(-1)
 
 struct ata_request;
 typedef struct ide_tag_info_s {
 	unsigned long tag_mask;			/* next tag bit mask */
 	struct ata_request *ar[IDE_MAX_TAG];	/* in-progress requests */
 	int active_tag;				/* current active tag */
+
 	int queued;				/* current depth */
 
 	/*
 	 * stats ->
 	 */
 	int max_depth;				/* max depth ever */
+
 	int max_last_depth;			/* max since last check */
 
 	/*
-	 * Either the command completed immediately after being started
+	 * either the command complete immediately after being started
 	 * (immed_comp), or the device did a bus release before dma was
-	 * started (immed_rel).
+	 * started (immed_rel)
 	 */
 	int immed_rel;
 	int immed_comp;
+	unsigned long oldest_command;
 } ide_tag_info_t;
 
 #define IDE_GET_AR(drive, tag)	((drive)->tcq->ar[(tag)])
 #define IDE_CUR_TAG(drive)	(IDE_GET_AR((drive), (drive)->tcq->active_tag))
-#define IDE_SET_CUR_TAG(drive, tag)	((drive)->tcq->active_tag = (tag))
+#define IDE_SET_CUR_TAG(drive, tag)	\
+	do {						\
+		((drive)->tcq->active_tag = (tag));	\
+		if ((tag) == IDE_INACTIVE_TAG)		\
+			HWGROUP((drive))->rq = NULL;	\
+	} while (0);
 
-#define IDE_CUR_AR(drive)	\
-	((drive)->using_tcq ? IDE_CUR_TAG((drive)) : HWGROUP((drive))->rq->special)
+#define IDE_CUR_AR(drive)	(HWGROUP((drive))->rq->special)
 
 struct ide_settings_s;
 
@@ -359,6 +367,7 @@
 	unsigned autotune	: 2;	/* 1=autotune, 2=noautotune, 0=default */
 	unsigned remap_0_to_1	: 2;	/* 0=remap if ezdrive, 1=remap, 2=noremap */
 	unsigned ata_flash	: 1;	/* 1=present, 0=default */
+	unsigned service_pending: 1;
 	unsigned	addressing;	/* : 2; 0=28-bit, 1=48-bit, 2=64-bit */
 	byte		scsi;		/* 0=default, 1=skip current ide-subdriver for ide-scsi emulation */
 	select_t	select;		/* basic drive/head select reg value */
@@ -493,9 +502,7 @@
 
 	ide_ioreg_t	io_ports[IDE_NR_PORTS];	/* task file registers */
 	hw_regs_t	hw;		/* Hardware info */
-#ifdef CONFIG_BLK_DEV_IDEPCI
 	struct pci_dev	*pci_dev;	/* for pci chipsets */
-#endif
 	ide_drive_t	drives[MAX_DRIVES];	/* drive info */
 	struct gendisk	*gd;		/* gendisk structure */
 	ide_tuneproc_t	*tuneproc;	/* routine to tune PIO mode for drives */
@@ -546,7 +553,7 @@
 typedef enum {
 	ide_stopped,	/* no drive operation was started */
 	ide_started,	/* a drive operation was started, and a handler was set */
-	ide_released	/* started and released bus */
+	ide_released,	/* started, handler set, bus released */
 } ide_startstop_t;
 
 /*
@@ -980,6 +987,11 @@
 #define ATA_AR_SETUP	2
 #define ATA_AR_RETURN	4
 
+/*
+ * if turn-around time is longer than this, halve queue depth
+ */
+#define ATA_AR_MAX_TURNAROUND	(3 * HZ)
+
 #define list_ata_entry(entry) list_entry((entry), struct ata_request, ar_queue)
 
 static inline void ata_ar_init(ide_drive_t *drive, struct ata_request *ar)
@@ -1017,7 +1029,9 @@
 	list_add(&ar->ar_queue, &drive->free_req);
 
 	if (ar->ar_flags & ATA_AR_QUEUED) {
-		/* clear the tag */
+		/*
+		 * clear tag
+		 */
 		drive->tcq->ar[ar->ar_tag] = NULL;
 		__clear_bit(ar->ar_tag, &drive->tcq->tag_mask);
 		drive->tcq->queued--;
@@ -1026,7 +1040,7 @@
 	ar->ar_rq = NULL;
 }
 
-extern inline int ide_get_tag(ide_drive_t *drive)
+static inline int ide_get_tag(ide_drive_t *drive)
 {
 	int tag = ffz(drive->tcq->tag_mask);
 
@@ -1043,14 +1057,30 @@
 }
 
 #ifdef CONFIG_BLK_DEV_IDE_TCQ
-# define ide_pending_commands(drive)	((drive)->using_tcq && (drive)->tcq->queued)
+static inline int ide_pending_commands(ide_drive_t *drive)
+{
+	if (!drive->tcq)
+		return 0;
+
+	return drive->tcq->queued;
+}
+
+static inline int ide_can_queue(ide_drive_t *drive)
+{
+	if (!drive->tcq)
+		return 1;
+
+	return drive->tcq->queued < drive->queue_depth;
+}
 #else
-# define ide_pending_commands(drive)	0
+#define ide_pending_commands(drive)	(0)
+#define ide_can_queue(drive)		(1)
 #endif
 
 int ide_build_commandlist(ide_drive_t *);
+int ide_init_commandlist(ide_drive_t *);
 void ide_teardown_commandlist(ide_drive_t *);
 int ide_tcq_dmaproc(ide_dma_action_t, ide_drive_t *);
 ide_startstop_t ide_start_tag(ide_dma_action_t, ide_drive_t *, struct ata_request *);
 
-#endif
+#endif /* _IDE_H */

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2002-04-30 19:58 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-04-15 19:28 [PATCH] IDE TCQ #4 Petr Vandrovec
2002-04-16 10:25 ` Jens Axboe
2002-04-16 10:39   ` USB-Mouse-Bug in 2.4.16-8 ? mtopper
2002-04-19 16:28     ` Greg KH
2002-04-16 11:01   ` [PATCH] IDE TCQ #4 Martin Dalecki
2002-04-16 12:28     ` Jens Axboe
2002-04-16 11:44       ` Martin Dalecki
  -- strict thread matches above, loose matches on Subject: below --
2002-04-18 22:05 Andries.Brouwer
2002-04-15 19:29 Petr Vandrovec
2002-04-15 19:00 ` Jens Axboe
2002-04-15 19:11 Petr Vandrovec
2002-04-15 18:41 Petr Vandrovec
2002-04-15 18:26 ` Jens Axboe
2002-04-16  5:11 ` Martin Dalecki
2002-04-15 12:56 Jens Axboe
2002-04-15 12:24 ` Martin Dalecki
2002-04-15 13:33   ` Jens Axboe
2002-04-15 12:39     ` Martin Dalecki
2002-04-15 16:13 ` Aaron Tiensivu
2002-04-15 16:18   ` Jens Axboe
2002-04-15 16:44     ` Jens Axboe
2002-04-16 10:25   ` Jens Axboe
     [not found]     ` <20020416200051.7ae38411.sebastian.droege@gmx.de>
     [not found]       ` <20020416180914.GR1097@suse.de>
2002-04-16 18:43         ` Sebastian Droege
2002-04-17  7:48           ` Martin Dalecki
2002-04-17 11:28             ` Sebastian Droege
2002-04-17 10:32               ` Martin Dalecki
2002-04-17 11:40                 ` Sebastian Droege
2002-04-17 10:42               ` Martin Dalecki
2002-04-18 12:17                 ` Sebastian Droege
2002-04-18 11:20                   ` Martin Dalecki
2002-04-18 12:26                     ` Sebastian Droege
2002-04-18 12:57                       ` Jens Axboe
2002-04-18 12:57                     ` Jens Axboe
2002-04-18 11:59                       ` Martin Dalecki
2002-04-18 13:07                         ` Jens Axboe
2002-04-18 12:08                           ` Martin Dalecki
2002-04-18 13:12                             ` Jens Axboe
2002-04-18 12:16                               ` Martin Dalecki
2002-04-18 13:26                                 ` Jens Axboe
2002-04-18 12:40                                   ` Martin Dalecki
2002-04-18 12:45                                     ` Martin Dalecki
2002-04-18 14:17                               ` Alan Cox
2002-04-18 13:01                   ` Jens Axboe
2002-04-17  7:32     ` Helge Hafting
2002-04-17 13:01       ` Dave Jones
2002-04-17 13:05         ` Jens Axboe
2002-04-30 19:58     ` Martin Schewe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox