public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
@ 2006-06-19  8:21 Andreas Mohr
  2006-06-20  8:37 ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Andreas Mohr @ 2006-06-19  8:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: hal, Greg KH

Hi all,

I'm having severe issues with cdrecord aborting with "buffer underrun"
message, *always*. Problem nailed, see below.

AMD Athlon @1200 512MB
Debian testing
Linux 2.6.17-rc6-mm2 (newest, testing on 2.6.16ish and 2.6.15ish didn't
make a difference)
cdrecord 4:2.01+01a03-3 (newest)
hal 0.5.7-2 (newest)
libhal-storage1 0.5.7-1 (newest)
libhal1 0.5.7-1 (newest)

burner:
  Vendor: HL-DT-ST  Model: DVDRAM GSA-4160B  Rev: A306
  Type:   CD-ROM                             ANSI SCSI revision: 00
via:
Bus 001 Device 005: ID 0402:5621 ALi Corp. USB 2.0 Storage Device
(that's the "HL-DT-ST" external case ID string given above)
on USB 2.0 connector:
0000:00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)

# lsmod (USB-related only)
sd_mod                 18560  2
sg                     31772  0
sr_mod                 16164  0
usb_storage            71880  1
scsi_mod              132876  4 sd_mod,sg,sr_mod,usb_storage
uhci_hcd               23948  0
ehci_hcd               29964  0
usbcore               130844  4 usb_storage,uhci_hcd,ehci_hcd

# cat /proc/interrupts
           CPU0
  0:   14492818   IO-APIC-edge     timer
  1:      14298   IO-APIC-edge     i8042
  6:          3   IO-APIC-edge     floppy
  8:          4   IO-APIC-edge     rtc
  9:          0   IO-APIC-fasteoi  acpi
 12:     247193   IO-APIC-edge     i8042
 14:     172079   IO-APIC-edge     ide0
 16:     741075   IO-APIC-fasteoi  radeon@pci:0000:01:00.0
 19:     128123   IO-APIC-fasteoi  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3,
uhci_hcd:usb4
 20:       7235   IO-APIC-fasteoi  au8830, eth1
 21:          2   IO-APIC-fasteoi  ohci1394
 22:      13029   IO-APIC-fasteoi  Aztech AZF3328 (PCI168), au8820
 23:          0   IO-APIC-fasteoi  VIA8233
NMI:          0
LOC:   14494370
ERR:          0
MIS:          0



There's a /usr/lib/hal/hald-addon-storage running for this CD device:

nanosleep({2, 0}, {2, 0})               = 0 <2.000965>
open("/dev/scsi/host2/bus0/target0/lun0/cd", O_RDONLY|O_NONBLOCK|O_EXCL|O_LARGEFILE) = 4 <0.002021>
ioctl(4, CDROM_DRIVE_STATUS, 0x7fffffff) = 4 <0.001811>
ioctl(4, CDROM_MEDIA_CHANGED, 0x7fffffff) = 0 <0.006341>
ioctl(4, SG_IO, 0xbfdbc0c8)             = 0 <0.002186>
close(4)                                = 0 <0.001087>
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000016>
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0 <0.000015>
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000015>
nanosleep({2, 0},  <unfinished ...>




It has a 2 second polling loop, and this is *exactly* the delay until
cdrecord bombs with an "underrun" message:



Script started on Mo 26 Jun 2006 13:10:04 CEST
# cdrecord -speed=4 -dummy dev=2,0,0 -eject - \rv ubuntu-6.06-desktop-i386.iso
cdrecord: No write mode specified.
cdrecord: Asuming -tao mode.
cdrecord: Future versions of cdrecord may have different drive dependent defaults.
cdrecord: Continuing in 5 seconds...
Cdrecord-Clone 2.01.01a03 (i686-pc-linux-gnu) Copyright (C) 1995-2005 Joerg Schilling
NOTE: this version of cdrecord is an inofficial (modified) release of cdrecord
      and thus may have bugs that are not present in the original version.
      Please send bug reports and support requests to <cdrtools@packages.debian.org>.
      The original author should not be bothered with problems of this version.

cdrecord: Warning: Running on Linux-2.6.17-rc6-mm2
cdrecord: There are unsettled issues with Linux-2.5 and newer.
cdrecord: If you have unexpected problems, please try Linux-2.4 or Solaris.
TOC Type: 1 = CD-ROM
scsidev: '2,0,0'
scsibus: 2 target: 0 lun: 0
Linux sg driver version: 3.5.33
Using libscg version 'ubuntu-0.8ubuntu1'.
cdrecord: Warning: using inofficial version of libscg (ubuntu-0.8ubuntu1 '@(#)scsitransp.c	1.91 04/06/17 Copyright 1988,1995,2000-2004 J. Schilling').
SCSI buffer size: 64512
atapi: 1
Device type    : Removable CD-ROM
Version        : 0
Response Format: 2
Capabilities   : 
Vendor_info    : 'HL-DT-ST'
Identifikation : 'DVDRAM GSA-4160B'
Revision       : 'A306'
Device seems to be: Generic mmc2 DVD-R/DVD-RW.
Current: 0x000A
Profile: 0x0012 
Profile: 0x0011 
Profile: 0x0014 
Profile: 0x0013 
Profile: 0x001A 
Profile: 0x001B 
Profile: 0x002B 
Profile: 0x0010 
Profile: 0x0009 
Profile: 0x000A (current)
Profile: 0x0008 
Profile: 0x0002 
cdrecord: This version of cdrecord does not include DVD-R/DVD-RW support code.
cdrecord: See /usr/share/doc/cdrecord/README.DVD.Debian for details on DVD support.
Using generic SCSI-3/mmc   CD-R/CD-RW driver (mmc_cdr).
Driver flags   : MMC-3 SWABAUDIO BURNFREE 
Supported modes: TAO PACKET SAO SAO/R96P SAO/R96R RAW/R16 RAW/R96P RAW/R96R
Drive buf size : 1053696 = 1029 KB
Drive DMA Speed: 11837 kB/s 67x CD 8x DVD
FIFO size      : 4194304 = 4096 KB
Track 01: data   697 MB        
Total size:      801 MB (79:23.98) = 357299 sectors
Lout start:      801 MB (79:25/74) = 357299 sectors
Current Secsize: 2048
ATIP info from disk:
  Indicated writing power: 3
  Reference speed: 6
  Is not unrestricted
  Is erasable
  Disk sub type: High speed Rewritable (CAV) media (1)
  ATIP start of lead in:  -11700 (97:26/00)
  ATIP start of lead out: 359849 (79:59/74)
  1T speed low:  4 1T speed high: 10
  2T speed low:  4 2T speed high:  0 (reserved val  6)
  power mult factor: 1 5
  recommended erase/write power: 5
  A1 values: 24 1A D8
  A2 values: 26 B2 4A
Disk type:    Phase change
Manuf. index: 5
Manufacturer: FORNET INTERNATIONAL PTE LTD.
Blocks total: 359849 Blocks current: 359849 Blocks remaining: 2550
Starting to write CD/DVD at speed 8 in dummy TAO mode for single session.
Last chance to quit, starting dummy write in 9 seconds.\b\b\b\b\b\b\b\b\b\b\b\b\b   8 seconds.\b\b\b\b\b\b\b\b\b\b\b\b\b   7 seconds.\b\b\b\b\b\b\b\b\b\b\b\b\b   6 seconds.\b\b\b\b\b\b\b\b\b\b\b\b\b   5 seconds.\b\b\b\b\b\b\b\b\b\b\b\b\b   4 seconds.\b\b\b\b\b\b\b\b\b\b\b\b\b   3 seconds.\b\b\b\b\b\b\b\b\b\b\b\b\b   2 seconds.\b\b\b\b\b\b\b\b\b\b\b\b\b   1 seconds.\b\b\b\b\b\b\b\b\b\b\b\b\b   0 seconds. Operation starts.
Waiting for reader process to fill input buffer ... input buffer ready.
BURN-Free is OFF.
Starting new track at sector: 0
\rTrack 01:    0 of  697 MB written.\rTrack 01:    1 of  697 MB written (fifo 100%) [buf  98%]   2.5x.cdrecord: Input/output error. write_g1: scsi sendcmd: no error
CDB:  2A 00 00 00 02 2E 00 00 1F 00
status: 0x2 (CHECK CONDITION)
Sense Bytes: 70 00 05 00 00 00 00 10 2A 30 06 90 21 02 00 00
Sense Key: 0x5 Illegal Request, Segment 0
Sense Code: 0x21 Qual 0x02 (invalid address for write) Fru 0x0
Sense flags: Blk 0 (not valid) 
resid: 63488
cmd finished after 0.004s timeout 40s

write track data: error after 1142784 bytes
cdrecord: The current problem looks like a buffer underrun.
cdrecord: Try to use 'driveropts=burnfree'.
cdrecord: Make sure that you are root, enable DMA and check your HW/OS set up.
Writing  time:   10.479s
Average write speed 457.0x.
Min drive buffer fill was 98%
Fixating...
WARNING: Some drives don't like fixation in dummy mode.
Fixating time:   33.828s
cdrecord: fifo had 82 puts and 19 gets.
cdrecord: fifo was 0 times empty and 7 times full, min fill was 90%.
# 
Script done on Mo 26 Jun 2006 13:11:17 CEST


Killing the SCSI /usr/lib/hal/hald-addon-storage (after killing its hald-runner
supervisor parent) lets (non-simulation) burning of a whole CD proceed
properly...


However running a simple test program as a different non-root user:

#define __USE_LARGEFILE64

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

/* argh, still not defined... manually! */
# define O_LARGEFILE    0100000


int main () {
        int fd;

        fd = open("/dev/scsi/host2/bus0/target0/lun0/cd", O_RDONLY|O_NONBLOCK|O_EXCL|O_LARGEFILE);
        printf("fd is %d\n", fd);
        close(fd);
        return 0;
}

kills concurrent cdrecord burning (by root) on the spot.

Note that this still happens if I omit O_NONBLOCK, even though this yields
me an invalid fd (-1)!.
Any other flag combination (even 0) does the same.
Thus it's clearly caused by open() handling itself.


Perfect DoS, if you ask me (but OTOH on many distro setups there probably
won't be user access permissions for this device by default).

This might still be on the borders of tolerable under normal working
conditions, but the fact that hald does an open() polling loop on it
renders it catastrophic.


Note that I also had severe issues with a chinese "Cenda 201" USB OTG 40GB HDD
device, I got USB bus reset every couple seconds, resulting in an audible
drive head-park reset each time:

May 29 02:05:23 andi kernel: ehci_hcd 0000:00:10.3: qh deb16080 (#00) state 4
May 29 02:05:24 andi kernel: usb 4-2: reset high speed USB device using ehci_hcd and address 2
May 29 02:05:56 andi last message repeated 17 times
May 29 02:06:57 andi last message repeated 32 times
May 29 02:07:35 andi last message repeated 20 times
May 29 02:07:37 andi kernel: ehci_hcd 0000:00:10.3: qh deb16200 (#00) state 4
May 29 02:07:37 andi kernel: usb 4-2: reset high speed USB device using ehci_hcd and address 2
May 29 02:07:43 andi last message repeated 3 times


Now that I found this I'd *bet* that this was exactly the same
problem there (caused by HAL polling the device file).


I cannot quite believe that such a severe issue hasn't been noticed
and fixed immediately by whichever person out there.
Possibly this is a VIA-only (oh no, not again... ;) problem?
(either USB controller chip or southbridge or so).

0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
0000:00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
0000:00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
0000:00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
0000:00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
0000:00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)


I don't think hal can do much about hald-addon-storage to prevent a simple
open() from distorting a running device operation, so I'd tend to think
that this is a kernel issue (perhaps in the USB host controller drivers).


Thanks,

Andreas Mohr

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-19  8:21 USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets) Andreas Mohr
@ 2006-06-20  8:37 ` Andrew Morton
  2006-06-20  8:47   ` Andreas Mohr
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Andrew Morton @ 2006-06-20  8:37 UTC (permalink / raw)
  To: Andreas Mohr; +Cc: linux-kernel, hal, gregkh, linux-usb-devel

On Mon, 19 Jun 2006 10:21:54 +0200
Andreas Mohr <andi@rhlx01.fht-esslingen.de> wrote:

> I'm having severe issues with cdrecord aborting with "buffer underrun"
> message, *always*. Problem nailed, see below.

[hald polling causes cdrecord to go bad on a USB CD drive]

One possible reason is that we're shooting down the device's pagecache by
accident as a result of hald activity.  This shouldn't happen, but still.
Could you please do

`watch -n1 cat /proc/meminfo' and/or `vmstat 1'

then run cdrecord

then wait a bit

then run hald, see if the amount of pagecache (/proc/meminfo:Buffers or
/proc/meminfo:Cached) suddenly decreases.

If the answer is "no" then I guess there's some unpleasant interaction
happening at the USB level


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-20  8:37 ` Andrew Morton
@ 2006-06-20  8:47   ` Andreas Mohr
  2006-06-20  8:56     ` Andrew Morton
  2006-06-20  9:06   ` [linux-usb-devel] " Alan Cox
  2006-06-21  9:10   ` Andreas Mohr
  2 siblings, 1 reply; 15+ messages in thread
From: Andreas Mohr @ 2006-06-20  8:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, hal, gregkh, linux-usb-devel

Hi,

On Tue, Jun 20, 2006 at 01:37:41AM -0700, Andrew Morton wrote:
> On Mon, 19 Jun 2006 10:21:54 +0200
> Andreas Mohr <andi@rhlx01.fht-esslingen.de> wrote:
> 
> > I'm having severe issues with cdrecord aborting with "buffer underrun"
> > message, *always*. Problem nailed, see below.
> 
> [hald polling causes cdrecord to go bad on a USB CD drive]
> 
> One possible reason is that we're shooting down the device's pagecache by
> accident as a result of hald activity.  This shouldn't happen, but still.
> Could you please do

[...]

Thanks, will do.

Just filed this as
http://bugzilla.kernel.org/show_bug.cgi?id=6722
due to no response here before.

(#6194 looks like it might be the same issue)

I'd suggest continuing discussion at bug #6722 from now.

Thanks,

Andreas Mohr

-- 
No programming skills!? Why not help translate many Linux applications! 
https://launchpad.ubuntu.com/rosetta
(or alternatively buy nicely packaged Linux distros/OSS software to help
support Linux developers creating shiny new things for you?)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-20  8:47   ` Andreas Mohr
@ 2006-06-20  8:56     ` Andrew Morton
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2006-06-20  8:56 UTC (permalink / raw)
  To: andi; +Cc: andim2, linux-kernel, hal, gregkh, linux-usb-devel

On Tue, 20 Jun 2006 10:47:59 +0200
Andreas Mohr <andim2@users.sourceforge.net> wrote:

> Just filed this as
> http://bugzilla.kernel.org/show_bug.cgi?id=6722
> due to no response here before.

I saw it, which is why I hunted down this email.  I hadn't even got that
far into my lkml reading, actually.  You should wait 2-3 days to be really
sure that a bug is being ignored.  Although with one like this, it's a fair
bet that it will be.

> (#6194 looks like it might be the same issue)
> 
> I'd suggest continuing discussion at bug #6722 from now.

No, I tend to think it's better to resort to bugzilla only after a few
days, when the trail has gone cold.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-20  9:06   ` [linux-usb-devel] " Alan Cox
@ 2006-06-20  9:05     ` Andreas Mohr
  2006-06-20 10:18       ` Alan Cox
  2006-06-20 14:22       ` Alan Stern
  0 siblings, 2 replies; 15+ messages in thread
From: Andreas Mohr @ 2006-06-20  9:05 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andrew Morton, gregkh, linux-kernel, linux-usb-devel, hal

Hi,

On Tue, Jun 20, 2006 at 10:06:56AM +0100, Alan Cox wrote:
> Ar Maw, 2006-06-20 am 01:37 -0700, ysgrifennodd Andrew Morton:
> > [hald polling causes cdrecord to go bad on a USB CD drive]
> > 
> > One possible reason is that we're shooting down the device's pagecache by
> > accident as a result of hald activity. 
> 
> On IDE hal causes problems with some drives because the additional
> commands sent while the drive is busy end up timing out which triggers a
> bus reset and breaks everything. Really HAL should have better manners
> than to poll a drive that is busy.

But how would HAL safely determine whether a (IDE/USB) drive is busy?
As my test app demonstrates (without HAL running), the *very first* open()
happening during an ongoing burning operation will kill it instantly, in the
USB case.
Are there any options left for HAL at all? Still seems to strongly point
towards a kernel issue so far.

One (rather less desireable) way I can make up might be to have HAL
keep the device open permanently and do an ioctl query on whether it's "busy"
and then quickly close the device again before the newly started
burning process gets disrupted (if this even properly works at all).

Andreas Mohr

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-20  8:37 ` Andrew Morton
  2006-06-20  8:47   ` Andreas Mohr
@ 2006-06-20  9:06   ` Alan Cox
  2006-06-20  9:05     ` Andreas Mohr
  2006-06-21  9:10   ` Andreas Mohr
  2 siblings, 1 reply; 15+ messages in thread
From: Alan Cox @ 2006-06-20  9:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andreas Mohr, gregkh, linux-kernel, linux-usb-devel, hal

Ar Maw, 2006-06-20 am 01:37 -0700, ysgrifennodd Andrew Morton:
> [hald polling causes cdrecord to go bad on a USB CD drive]
> 
> One possible reason is that we're shooting down the device's pagecache by
> accident as a result of hald activity. 

On IDE hal causes problems with some drives because the additional
commands sent while the drive is busy end up timing out which triggers a
bus reset and breaks everything. Really HAL should have better manners
than to poll a drive that is busy.

Alan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-20  9:05     ` Andreas Mohr
@ 2006-06-20 10:18       ` Alan Cox
  2006-06-20 14:22       ` Alan Stern
  1 sibling, 0 replies; 15+ messages in thread
From: Alan Cox @ 2006-06-20 10:18 UTC (permalink / raw)
  To: andi; +Cc: Andrew Morton, gregkh, linux-kernel, linux-usb-devel, hal

Ar Maw, 2006-06-20 am 11:05 +0200, ysgrifennodd Andreas Mohr:
> But how would HAL safely determine whether a (IDE/USB) drive is busy?
> As my test app demonstrates (without HAL running), the *very first* open()
> happening during an ongoing burning operation will kill it instantly, in the
> USB case.
> Are there any options left for HAL at all? Still seems to strongly point
> towards a kernel issue so far.

In the IDE space O_EXCL has the needed semantics. At least it does on
Fedora and I don't think thats a Fedora patch, not sure if this is the
case for the USB side of things. 

> One (rather less desireable) way I can make up might be to have HAL
> keep the device open permanently and do an ioctl query on whether it's "busy"
> and then quickly close the device again before the newly started
> burning process gets disrupted (if this even properly works at all).

O_EXCL used by cdrecord is probably the right thing


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-20  9:05     ` Andreas Mohr
  2006-06-20 10:18       ` Alan Cox
@ 2006-06-20 14:22       ` Alan Stern
  2006-06-21  9:33         ` Andreas Mohr
  1 sibling, 1 reply; 15+ messages in thread
From: Alan Stern @ 2006-06-20 14:22 UTC (permalink / raw)
  To: andi; +Cc: Alan Cox, Andrew Morton, gregkh, linux-kernel, hal,
	linux-usb-devel

On Tue, 20 Jun 2006, Andreas Mohr wrote:

> But how would HAL safely determine whether a (IDE/USB) drive is busy?
> As my test app demonstrates (without HAL running), the *very first* open()
> happening during an ongoing burning operation will kill it instantly, in the
> USB case.
> Are there any options left for HAL at all? Still seems to strongly point
> towards a kernel issue so far.
> 
> One (rather less desireable) way I can make up might be to have HAL
> keep the device open permanently and do an ioctl query on whether it's "busy"
> and then quickly close the device again before the newly started
> burning process gets disrupted (if this even properly works at all).

The open() call is not in itself the problem.

I would guess that the problem is sparked by the TEST UNIT READY command
automatically sent when the device file is opened.  Although a drive
should have no difficulty handling this command while carrying out a burn,
apparently yours aborts.  In other words, this is likely to be a firmware 
problem in the CD drive.

I can't tell what's going on with the USB HDD since you haven't provided 
any information.

If you want to find out what's actually happening instead of just 
guessing, turn on CONFIG_USB_STORAGE_DEBUG and see what the kernel log has 
to say for the time when the underrun/reset occurs.

Alan Stern


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-20  8:37 ` Andrew Morton
  2006-06-20  8:47   ` Andreas Mohr
  2006-06-20  9:06   ` [linux-usb-devel] " Alan Cox
@ 2006-06-21  9:10   ` Andreas Mohr
  2 siblings, 0 replies; 15+ messages in thread
From: Andreas Mohr @ 2006-06-21  9:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, hal, gregkh, linux-usb-devel

Hi,

On Tue, Jun 20, 2006 at 01:37:41AM -0700, Andrew Morton wrote:
> On Mon, 19 Jun 2006 10:21:54 +0200
> Andreas Mohr <andi@rhlx01.fht-esslingen.de> wrote:
> 
> > I'm having severe issues with cdrecord aborting with "buffer underrun"
> > message, *always*. Problem nailed, see below.
> 
> [hald polling causes cdrecord to go bad on a USB CD drive]
> 
> One possible reason is that we're shooting down the device's pagecache by
> accident as a result of hald activity.  This shouldn't happen, but still.
> Could you please do
> 
> `watch -n1 cat /proc/meminfo' and/or `vmstat 1'
> 
> then run cdrecord
> 
> then wait a bit
> 
> then run hald, see if the amount of pagecache (/proc/meminfo:Buffers or
> /proc/meminfo:Cached) suddenly decreases.
> 
> If the answer is "no" then I guess there's some unpleasant interaction
> happening at the USB level

"no":
both values don't show any noticeable decrease whatsoever upon a second
parallel open() of the device which makes burning abort.

It seems to be a USB thingy, as I will show in my next mail.

Andreas Mohr

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-20 14:22       ` Alan Stern
@ 2006-06-21  9:33         ` Andreas Mohr
  2006-06-21 16:15           ` Alan Stern
  0 siblings, 1 reply; 15+ messages in thread
From: Andreas Mohr @ 2006-06-21  9:33 UTC (permalink / raw)
  To: Alan Stern
  Cc: andi, Alan Cox, Andrew Morton, gregkh, linux-kernel, hal,
	linux-usb-devel

Hi,

On Tue, Jun 20, 2006 at 10:22:57AM -0400, Alan Stern wrote:
> On Tue, 20 Jun 2006, Andreas Mohr wrote:
> 
> > But how would HAL safely determine whether a (IDE/USB) drive is busy?
> > As my test app demonstrates (without HAL running), the *very first* open()
> > happening during an ongoing burning operation will kill it instantly, in the
> > USB case.
> > Are there any options left for HAL at all? Still seems to strongly point
> > towards a kernel issue so far.
> > 
> > One (rather less desireable) way I can make up might be to have HAL
> > keep the device open permanently and do an ioctl query on whether it's "busy"
> > and then quickly close the device again before the newly started
> > burning process gets disrupted (if this even properly works at all).
> 
> The open() call is not in itself the problem.
> 
> I would guess that the problem is sparked by the TEST UNIT READY command
> automatically sent when the device file is opened.  Although a drive
> should have no difficulty handling this command while carrying out a burn,
> apparently yours aborts.  In other words, this is likely to be a firmware 
> problem in the CD drive.

OK, at http://lisas.de/~andi/temp_usb/ there are logs:
debug.log.gz: burning, then running my open() test app: test app started
at 22:43:49, disrupted burning process.
only_device_open.log.gz: device open() *only* (plus close()!),
no other USB activity such as burning happening (for comparison purposes, to see
what a usual open()/close() is like)

only_device_open.log.gz (and close()!) contains:
- TEST_UNIT_READY (failure?)
- TEST_UNIT_READY
- TEST_UNIT_READY
- READ_TOC (failure?)
- ALLOW_MEDIUM_REMOVAL
- (unknown command) !!!!!!!
- TEST_UNIT_READY
- READ_TOC (failure?)
- READ_TOC (failure?)
- READ_CAPACITY
- ALLOW_MEDIUM_REMOVAL

Hmm, multiple failures in there: might be cable issues??


debug.log.gz contains:
*** ongoing burning: ***
- lots of WRITE_10 (NO failure!)
- READ BUFFER CAPACITY
- lots of WRITE_10 (NO failure!)
- READ BUFFER CAPACITY
- a couple WRITE_10 (NO failure!)
*** [22:43:49] device open(): ***
- TEST_UNIT_READY (ok!)
- WRITE_10 (ok!!)
- TEST_UNIT_READY (ok!)
- WRITE_10 (ok!)
- READ_TOC (*** ERROR!! ***)
- WRITE_10 (ok!)
- ALLOW_MEDIUM_REMOVAL (ok!)
- WRITE_10 (*** FAILURE! ***)
- going downhill from here...


So what could be the problem here?
READ_TOC might be it, but then it might be fully ok to have it fail
(after all it's non-valid data content), so ALLOW_MEDIUM_REMOVAL would be the
problem then? (next WRITE_10 FAILS!).

I could be totally wrong, though, since I don't have much storage debugging
experience.


A good idea would be to further check whether it's the open() or the close()
which disrupts burning for me.


> I can't tell what's going on with the USB HDD since you haven't provided 
> any information.

I'd like to, but can't since I don't have device access any more.


Oh, and that burner_switchoff_oops.jpg in the same directory
is an OOPS that happened when I tried to blank a CDRW,
then cancelled the operation (2x Ctrl-C on cdrecord),
but then had HAL device polling daemon and my test app block on I/O wait on the
device that continued blanking the CDRW. Since I then didn't want to wait
for the blanking to finish I had to switch off the device: immediate OOPS,
possibly due to mis-handling the two processes still waiting on a busy device
which then got switched off completely. Kernel 2.6.17-rc6-mm2.

Thanks!

Andreas Mohr

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-21  9:33         ` Andreas Mohr
@ 2006-06-21 16:15           ` Alan Stern
  2006-06-21 16:44             ` Andreas Mohr
  0 siblings, 1 reply; 15+ messages in thread
From: Alan Stern @ 2006-06-21 16:15 UTC (permalink / raw)
  To: andi; +Cc: Alan Cox, Andrew Morton, gregkh, linux-kernel, hal,
	linux-usb-devel

On Wed, 21 Jun 2006, Andreas Mohr wrote:

> OK, at http://lisas.de/~andi/temp_usb/ there are logs:
> debug.log.gz: burning, then running my open() test app: test app started
> at 22:43:49, disrupted burning process.
> only_device_open.log.gz: device open() *only* (plus close()!),
> no other USB activity such as burning happening (for comparison purposes, to see
> what a usual open()/close() is like)
> 
> only_device_open.log.gz (and close()!) contains:
> - TEST_UNIT_READY (failure?)

Normal failure.  This indicates that the media has been changed since the
last I/O operation.

> - TEST_UNIT_READY
> - TEST_UNIT_READY
> - READ_TOC (failure?)

I don't know why this failed.  Maybe the disc didn't have a valid Table of 
Contents.

> - ALLOW_MEDIUM_REMOVAL
> - (unknown command) !!!!!!!
> - TEST_UNIT_READY
> - READ_TOC (failure?)
> - READ_TOC (failure?)

These failed for the same reason as the earlier READ_TOC.

> - READ_CAPACITY
> - ALLOW_MEDIUM_REMOVAL
> 
> Hmm, multiple failures in there: might be cable issues??

No.  If the cable was a problem then all the commands (not just READ_TOC)  
would have gotten errors (not failures).

> debug.log.gz contains:
> *** ongoing burning: ***
> - lots of WRITE_10 (NO failure!)
> - READ BUFFER CAPACITY
> - lots of WRITE_10 (NO failure!)
> - READ BUFFER CAPACITY
> - a couple WRITE_10 (NO failure!)
> *** [22:43:49] device open(): ***
> - TEST_UNIT_READY (ok!)
> - WRITE_10 (ok!!)
> - TEST_UNIT_READY (ok!)
> - WRITE_10 (ok!)
> - READ_TOC (*** ERROR!! ***)

This was a different sort of error.  The code was "Logical unit not ready, 
long write in progress", which makes sense.

> - WRITE_10 (ok!)
> - ALLOW_MEDIUM_REMOVAL (ok!)
> - WRITE_10 (*** FAILURE! ***)
> - going downhill from here...
> 
> 
> So what could be the problem here?
> READ_TOC might be it, but then it might be fully ok to have it fail
> (after all it's non-valid data content), so ALLOW_MEDIUM_REMOVAL would be the
> problem then? (next WRITE_10 FAILS!).

It sure does look like the ALLOW_MEDIUM_REMOVAL is the cause of the 
problem.

> I could be totally wrong, though, since I don't have much storage debugging
> experience.
> 
> 
> A good idea would be to further check whether it's the open() or the close()
> which disrupts burning for me.

Yep.  The ALLOW_MEDIUM_REMOVAL occurs as part of handling the close().  
And you can understand a CD drive not wanting to carry out a long write 
when the door is unlocked.

The real problem seems to be that the device is reachable in two different 
ways, and they don't implement proper mutual exclusion.  HAL (or your test 
program) is undoubtedly using /dev/sr0 or something similar, whereas 
cdrecord uses /dev/sg0.  Going through two different drivers, it's no 
surprise they wind up interfering with each other.


> Oh, and that burner_switchoff_oops.jpg in the same directory
> is an OOPS that happened when I tried to blank a CDRW,
> then cancelled the operation (2x Ctrl-C on cdrecord),
> but then had HAL device polling daemon and my test app block on I/O wait on the
> device that continued blanking the CDRW. Since I then didn't want to wait
> for the blanking to finish I had to switch off the device: immediate OOPS,
> possibly due to mis-handling the two processes still waiting on a busy device
> which then got switched off completely. Kernel 2.6.17-rc6-mm2.

Unfortunately I can't debug this without seeing the start of the oops 
message.

Alan Stern


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-21 16:15           ` Alan Stern
@ 2006-06-21 16:44             ` Andreas Mohr
  2006-06-21 19:06               ` Alan Stern
  2006-06-21 20:08               ` Pete Zaitcev
  0 siblings, 2 replies; 15+ messages in thread
From: Andreas Mohr @ 2006-06-21 16:44 UTC (permalink / raw)
  To: Alan Stern
  Cc: andi, Alan Cox, Andrew Morton, gregkh, linux-kernel, hal,
	linux-usb-devel

Hi,

On Wed, Jun 21, 2006 at 12:15:08PM -0400, Alan Stern wrote:
> On Wed, 21 Jun 2006, Andreas Mohr wrote:
> > - TEST_UNIT_READY
> > - TEST_UNIT_READY
> > - READ_TOC (failure?)
> 
> I don't know why this failed.  Maybe the disc didn't have a valid Table of 
> Contents.

Ah, silly me, I should have stated that this was a simulation burn on an
otherwise rather blank disc ;)


> > - WRITE_10 (ok!)
> > - ALLOW_MEDIUM_REMOVAL (ok!)
> > - WRITE_10 (*** FAILURE! ***)
> > - going downhill from here...
> > 
> > 
> > So what could be the problem here?
> > READ_TOC might be it, but then it might be fully ok to have it fail
> > (after all it's non-valid data content), so ALLOW_MEDIUM_REMOVAL would be the
> > problem then? (next WRITE_10 FAILS!).
> 
> It sure does look like the ALLOW_MEDIUM_REMOVAL is the cause of the 
> problem.

Yup, already was quite sure of that after having written the previous mail.

I'll try to verify this by simply removing all ALLOW_MEDIUM_REMOVAL calls ;)


> > I could be totally wrong, though, since I don't have much storage debugging
> > experience.
> > 
> > 
> > A good idea would be to further check whether it's the open() or the close()
> > which disrupts burning for me.
> 
> Yep.  The ALLOW_MEDIUM_REMOVAL occurs as part of handling the close().  
> And you can understand a CD drive not wanting to carry out a long write 
> when the door is unlocked.
> 
> The real problem seems to be that the device is reachable in two different 
> ways, and they don't implement proper mutual exclusion.  HAL (or your test 
> program) is undoubtedly using /dev/sr0 or something similar, whereas 
> cdrecord uses /dev/sg0.  Going through two different drivers, it's no 
> surprise they wind up interfering with each other.

HAL is /dev/host0/.../cd
cdrecord is -dev=0,0,0 (whatever Linux device file this translates into)
or a similar device ID as returned by -scanbus.


Probably (stating the obvious here, I'm afraid) we should only send
non-ALLOW_MEDIUM_REMOVAL for the *very first* device open,
and then send ALLOW_MEDIUM_REMOVAL after the *very last* device close only.

So you think that with sr and sg drivers both talking to the device,
proper inter-driver device tracking is not doable or quite difficult
to implement?


> Unfortunately I can't debug this without seeing the start of the oops 
> message.

[OOPS output of a *different* issue]

Right, it's a rather incomplete OOPS. Let me try to get one with a nice
long-line VGA mode soon...

Thanks!

Andreas Mohr

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-21 16:44             ` Andreas Mohr
@ 2006-06-21 19:06               ` Alan Stern
  2006-06-21 20:52                 ` Alan Cox
  2006-06-21 20:08               ` Pete Zaitcev
  1 sibling, 1 reply; 15+ messages in thread
From: Alan Stern @ 2006-06-21 19:06 UTC (permalink / raw)
  To: andi; +Cc: Alan Cox, Andrew Morton, gregkh, linux-kernel, hal,
	linux-usb-devel

On Wed, 21 Jun 2006, Andreas Mohr wrote:

> > The real problem seems to be that the device is reachable in two different 
> > ways, and they don't implement proper mutual exclusion.  HAL (or your test 
> > program) is undoubtedly using /dev/sr0 or something similar, whereas 
> > cdrecord uses /dev/sg0.  Going through two different drivers, it's no 
> > surprise they wind up interfering with each other.
> 
> HAL is /dev/host0/.../cd

That goes through the sr driver.

> cdrecord is -dev=0,0,0 (whatever Linux device file this translates into)
> or a similar device ID as returned by -scanbus.

That goes through the sg driver.

> Probably (stating the obvious here, I'm afraid) we should only send
> non-ALLOW_MEDIUM_REMOVAL for the *very first* device open,
> and then send ALLOW_MEDIUM_REMOVAL after the *very last* device close only.
> 
> So you think that with sr and sg drivers both talking to the device,
> proper inter-driver device tracking is not doable or quite difficult
> to implement?

Well, it's not being done now.  I suspect it wouldn't be too difficult 
technically.  The hardest part might be to obtain the agreement of the 
SCSI and CDROM developers.  :-)

Alan Stern


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-21 16:44             ` Andreas Mohr
  2006-06-21 19:06               ` Alan Stern
@ 2006-06-21 20:08               ` Pete Zaitcev
  1 sibling, 0 replies; 15+ messages in thread
From: Pete Zaitcev @ 2006-06-21 20:08 UTC (permalink / raw)
  To: andi; +Cc: linux-usb-devel, linux-kernel, zaitcev

On Wed, 21 Jun 2006 18:44:25 +0200, Andreas Mohr <andim2@users.sourceforge.net> wrote:

> I'll try to verify this by simply removing all ALLOW_MEDIUM_REMOVAL calls ;)

Try to burn with ub. It does not implement door locking.

-- Pete

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-usb-devel] USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets)
  2006-06-21 19:06               ` Alan Stern
@ 2006-06-21 20:52                 ` Alan Cox
  0 siblings, 0 replies; 15+ messages in thread
From: Alan Cox @ 2006-06-21 20:52 UTC (permalink / raw)
  To: Alan Stern
  Cc: andi, Andrew Morton, gregkh, linux-kernel, hal, linux-usb-devel

Ar Mer, 2006-06-21 am 15:06 -0400, ysgrifennodd Alan Stern:
> > cdrecord is -dev=0,0,0 (whatever Linux device file this translates into)
> > or a similar device ID as returned by -scanbus.
> 
> That goes through the sg driver.

Use a cdrecord that understands SG_IO and dev=/dev/sr0


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2006-06-21 20:36 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-19  8:21 USB/hal: USB open() broken? (USB CD burner underruns, USB HDD hard resets) Andreas Mohr
2006-06-20  8:37 ` Andrew Morton
2006-06-20  8:47   ` Andreas Mohr
2006-06-20  8:56     ` Andrew Morton
2006-06-20  9:06   ` [linux-usb-devel] " Alan Cox
2006-06-20  9:05     ` Andreas Mohr
2006-06-20 10:18       ` Alan Cox
2006-06-20 14:22       ` Alan Stern
2006-06-21  9:33         ` Andreas Mohr
2006-06-21 16:15           ` Alan Stern
2006-06-21 16:44             ` Andreas Mohr
2006-06-21 19:06               ` Alan Stern
2006-06-21 20:52                 ` Alan Cox
2006-06-21 20:08               ` Pete Zaitcev
2006-06-21  9:10   ` Andreas Mohr

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox