From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mikael Andersson
Subject: Re: Disk output lockup 2.6.12_rc2 2.6.11.7
Date: Mon, 30 May 2005 12:30:40 +0200
Message-ID: <429AEB50.3010506@karett.se>
References: <4266CEED.3010108@karett.se> <42676D02.3070201@karett.se> <20050525233838.744af094.akpm@osdl.org>
Reply-To: device-mapper development
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
In-Reply-To: <20050525233838.744af094.akpm@osdl.org>
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com
To: Andrew Morton
Cc: dm-devel@redhat.com
List-Id: dm-devel.ids

Andrew Morton wrote:

>Mikael, this smells like a devicemapper lockup. Could you please test
>2.6.12-rc5 and provide us with a status update?
>
I haven't got any unused disks to try this on atm, but I might be able
to use dmsetup to create a dmraid inside my 2G swap partition and craft
a test which works with the limited space available.
I'll send a report to the list as soon as I've got any results, but it
will probably take some time in any case.

>Thanks.
>
/Mikael

>Mikael Andersson wrote:
>
>>Mikael Andersson wrote:
>>
>>>During heavy io-load a lockup occurs that appears to prevent any disk
>>>output from taking place. fs is reiserfs on two device-mapper mirrored
>>>200G maxtor disks. After the lockup occurs you can do things like 'ls',
>>>but echo > test.txt will hang.
>>>
>>fs is now ext3
>>
>>>A typical workload producing the error is doing:
>>>rsync of large (1GB) files over 100Mbit ethernet
>>>simultaneous compilation / gunzip
>>>
>>Or almost anything that writes something to the disk.
>>
>>>I've disabled preemption, and tried with and without acpi enabled, with
>>>and without smp support (it was smp by default so I switched it off).
>>>Also tried with another nic (rtl8139) since I got an nv_stop_tx:
>>>TransmitterStatus remained busy<6> in the logs. I also tried 2.6.11.7
>>>with the same result.
>>>
>> Tried converting to ext3, same problem, albeit the lockups are less
>>severe. More of the locked processes can be killed and echo > test.txt
>>works. So _some_ io gets through.
>> The output from sysrq-T is somewhat less confusing though, it appears
>>the hung processes are somehow stuck in __generic_unplug_device. I
>>had a look at the assembler, but couldn't make heads or tails of it. The
>>code at __generic_unplug_device+19 was test %eax,%eax immediately
>>preceded by a callq to the test instruction. Obviously something magic
>>(to my eyes) is going on here.
>>
>> Also tried 2.6.12_rc3-mm3
>>
>> I'd really like to find a solution to this since it kinda borks the
>>nice and shiny machine if it can't handle large files without getting
>>into trouble.
>>
>> I've been working on this for two days, have been trying to find
>>similar bug reports, trying a lot of different kernels and kernel
>>options to no avail.
>> I'm a little out of options right now, any ideas for something to try,
>>patches to test, or some help in understanding what's happening?
>>
>>kmirrord/0 D ffff81003f1bccd8 0 978 9 1731 977 (L-TLB)
>>Call Trace:
>>{cache_alloc_refill+1222}
>>{io_schedule+15}
>>--
>>kjournald D ffff81003e94bcd8 0 1748 1 2060 953 (L-TLB)
>>Call Trace:
>>{__generic_unplug_device+19}
>>{generic_unplug_device+189}
>>--
>>rsync D 000000701553dccf 0 6903 6901 (NOTLB)
>>Call Trace:
>>{__generic_unplug_device+19}
>>{generic_unplug_device+189}
>>--
>>x86_64-pc-lin D 0000006dc7d23e49 0 13785 13742 (NOTLB)
>>Call Trace:
>>{generic_unplug_device+189}
>>{dm_unplug_all+29}
>>
>>/Mikael Andersson
>>-