Sorry about the long delay. Niel Lambrechts wrote: > Morning Tejun, > > Tejun Heo wrote: >> Hello, >> >> Can you please do the followings? >> >> 1. Apply the attached patch, build & boot >> > I chose 2.6.30-rc7... >> 2. Trigger the problem and record dmesg >> > It took 3 days and quite a few hibernate attempts ... :-) > >> 3. On failed IO, the kernel will print the address of bi_endio. Run >> "nm -n" on the vmlinux in the kernel build root and look up which >> function it is and post the dmesg and function name. > I did not have that specific vmlinux.o file any more, but > /boot/System.map-2.6.30-rc7-pae shows: > c01a49fd t end_bio_bh_io_sync So, it's coming from submit_bh() > Hope this is sufficient to help you. Sorry if this is silly - being so > inexperienced with the kernel - but I wondered if or why a dump_stack() > in that debug patch would not be helpful? The result is perfectly good and yeah dump_stack() on the issue path would help but the problem is that block IO requests are processed asynchronously so by the time we find out which request fail, the requester stack is long gone. We can either record the stack trace with each request or trace it back one step at a time by chasing down the completion callbacks. The first requires more coding, so... :-) Looks like the request gotta be coming from __breadahead(). The only place this is used in ext4 is in __ext4_get_inode_loc(). Ah.. it also contains the matching error message. I still don't see how the READA buffer reads can affect the synchronous path. They're doing proper exclusion via buffer lock. Maybe they're getting merged? Yeap, looks like block code is merging READAs and regular READs. Can you please try the attached patch and reproduce the problem and report the kernel log? Hopefully, this will be the last debug run. Thanks. -- tejun