From: Tejun Heo <htejun@gmail.com>
To: Niel Lambrechts <niel.lambrechts@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
"linux.kernel" <linux-kernel@vger.kernel.org>,
Theodore Tso <tytso@mit.edu>
Subject: Re: 2.6.29 regression: ATA bus errors on resume
Date: Thu, 25 Jun 2009 21:57:38 +0900
Message-ID: <4A437442.8000909@gmail.com>
In-Reply-To: <4A2A1521.5020407@gmail.com>
Sorry about the long delay.
Niel Lambrechts wrote:
> Morning Tejun,
>
> Tejun Heo wrote:
>> Hello,
>>
>> Can you please do the followings?
>>
>> 1. Apply the attached patch, build & boot
>>
> I chose 2.6.30-rc7...
>> 2. Trigger the problem and record dmesg
>>
> It took 3 days and quite a few hibernate attempts ... :-)
>
>> 3. On failed IO, the kernel will print the address of bi_endio. Run
>> "nm -n" on the vmlinux in the kernel build root and look up which
>> function it is and post the dmesg and function name.
> I did not have that specific vmlinux.o file any more, but
> /boot/System.map-2.6.30-rc7-pae shows:
> c01a49fd t end_bio_bh_io_sync
So, it's coming from submit_bh(), which installs end_bio_bh_io_sync as
the bio's bi_end_io.
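The lookup in step 3 amounts to finding the last symbol whose address is
not above the printed one. A minimal user-space sketch, assuming the
"nm -n" or System.map lines have already been parsed into a table sorted
by address; struct ksym, ksym_lookup() and the two neighbouring entries
are made up for illustration, only the end_bio_bh_io_sync line comes from
the report above:

```c
#include <stddef.h>

/* One parsed line of "nm -n" / System.map output. */
struct ksym {
	unsigned long addr;
	const char *name;
};

/*
 * Sample table, sorted by ascending address.  The first and last
 * entries are hypothetical neighbours; only the middle one is real.
 */
static const struct ksym sample_syms[] = {
	{ 0xc01a4900UL, "hypothetical_prev_symbol" },
	{ 0xc01a49fdUL, "end_bio_bh_io_sync" },
	{ 0xc01a4a40UL, "hypothetical_next_symbol" },
};

/*
 * Binary search for the symbol with the highest address <= addr.
 * Returns NULL if addr lies below the first symbol in the table.
 */
static const struct ksym *ksym_lookup(const struct ksym *tab, size_t n,
				      unsigned long addr)
{
	const struct ksym *best = NULL;
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= addr) {
			best = &tab[mid];
			lo = mid + 1;
		} else {
			hi = mid;
		}
	}
	return best;
}
```

A bi_end_io pointer is the function's entry address, so it should match a
table entry exactly; the range search only matters for addresses that
point into the middle of a function.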
> Hope this is sufficient to help you. Sorry if this is silly - being so
> inexperienced with the kernel - but I wondered if or why a dump_stack()
> in that debug patch would not be helpful?
The result is perfectly good. And yes, dump_stack() on the issue path
would help, but block IO requests are processed asynchronously, so by
the time we find out which request failed, the requester's stack is
long gone. We can either record a stack trace with each request or
trace it back one step at a time by chasing down the completion
callbacks. The first requires more coding, so... :-)
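To illustrate the first option, here is a rough user-space sketch, not
kernel code. It assumes a toy request type (struct io_req), a toy pending
queue and two hypothetical submitters, and it records only the submitting
function's name rather than a full stack trace:

```c
#include <stddef.h>
#include <string.h>

struct io_req;
typedef void (*end_io_fn)(struct io_req *req, int error);

/* Hypothetical stand-in for a block request; not a kernel structure. */
struct io_req {
	end_io_fn end_io;	/* completion callback, like bi_end_io */
	const char *issuer;	/* submitter's name, captured at submit time */
	int error;		/* filled in at completion */
	struct io_req *next;	/* pending-queue link */
};

struct io_queue {
	struct io_req *head, *tail;
};

/* Queue the request; the caller returns long before it completes. */
static void submit_req(struct io_queue *q, struct io_req *req,
		       const char *issuer)
{
	req->issuer = issuer;
	req->error = 0;
	req->next = NULL;
	if (q->tail)
		q->tail->next = req;
	else
		q->head = req;
	q->tail = req;
}

/* Record the name of the calling function along with the request. */
#define SUBMIT_REQ(q, req) submit_req((q), (req), __func__)

/* Two hypothetical submitters. */
static void read_inode_block(struct io_queue *q, struct io_req *req)
{
	SUBMIT_REQ(q, req);
}

static void readahead_block(struct io_queue *q, struct io_req *req)
{
	SUBMIT_REQ(q, req);
}

/*
 * Complete the oldest pending request.  No submitter frame is on the
 * stack here, so req->issuer is the only link back to the requester.
 */
static struct io_req *complete_one(struct io_queue *q, int error)
{
	struct io_req *req = q->head;

	if (!req)
		return NULL;
	q->head = req->next;
	if (!q->head)
		q->tail = NULL;
	req->error = error;
	if (req->end_io)
		req->end_io(req, error);
	return req;
}

/* True if the request was submitted from the named function. */
static int req_issued_by(const struct io_req *req, const char *fn)
{
	return req->issuer && strcmp(req->issuer, fn) == 0;
}
```

A dump_stack() inside complete_one() would only show the completion
path; the issuer field stored at submit time is what still identifies
the requester when the failure is finally noticed.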
Looks like the request has to be coming from __breadahead(). The only
place it is used in ext4 is __ext4_get_inode_loc(). Ah.. it also
contains the matching error message. I still don't see how the READA
buffer reads can affect the synchronous path; they're doing proper
exclusion via the buffer lock. Maybe they're getting merged? Yep, looks
like the block code is merging READAs and regular READs.
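If that is what happens, a regular read that shares a request with a
readahead would share its fate when the request fails. The following is
only a user-space sketch of that suspicion, not the block layer's merge
code: struct io_desc, the IO_* flag values and both merge tests are
hypothetical, and they contrast a check that looks at contiguity alone
with one that also keeps readahead and regular reads apart.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's bi_rw encoding. */
#define IO_WRITE	0x1u	/* clear means read */
#define IO_AHEAD	0x2u	/* readahead: allowed to fail early */

struct io_desc {
	unsigned long sector;		/* first sector */
	unsigned int nr_sectors;	/* length in sectors */
	unsigned int rw;		/* IO_* flags */
};

/* True if b starts exactly where a ends and both go the same direction. */
static bool io_contiguous(const struct io_desc *a, const struct io_desc *b)
{
	return a->sector + a->nr_sectors == b->sector &&
	       (a->rw & IO_WRITE) == (b->rw & IO_WRITE);
}

/* Merge test that ignores the readahead hint. */
static bool can_merge_naive(const struct io_desc *a, const struct io_desc *b)
{
	return io_contiguous(a, b);
}

/* Merge test that keeps readahead and regular IO in separate requests. */
static bool can_merge_strict(const struct io_desc *a, const struct io_desc *b)
{
	return io_contiguous(a, b) &&
	       (a->rw & IO_AHEAD) == (b->rw & IO_AHEAD);
}
```

With a regular read of sectors 100-107 followed by a readahead starting
at sector 108, the naive test accepts the merge and the strict one
rejects it.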
Can you please try the attached patch and reproduce the problem and
report the kernel log? Hopefully, this will be the last debug run.
Thanks.
--
tejun
[-- Attachment #2: bio_endio-debug2.patch --]
[-- Type: text/x-patch, Size: 1340 bytes --]
diff --git a/block/blk-core.c b/block/blk-core.c
index b06cf5c..c8b3a6f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -155,8 +155,13 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
 		if (bio_integrity(bio))
 			bio_integrity_advance(bio, nbytes);
 
-		if (bio->bi_size == 0)
+		if (bio->bi_size == 0) {
+			if (error)
+				printk("XXX %s: failing bio %p bi_rw=0x%lx with %d\n",
+				       rq->rq_disk ? rq->rq_disk->disk_name : "?",
+				       bio, bio->bi_rw, error);
 			bio_endio(bio, error);
+		}
 	} else {
 
 		/*
diff --git a/fs/bio.c b/fs/bio.c
index 24c9140..007edb9 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1390,13 +1390,24 @@ void bio_check_pages_dirty(struct bio *bio)
  **/
 void bio_endio(struct bio *bio, int error)
 {
+	char name[BDEVNAME_SIZE] = "?";
+
+	if (bio->bi_bdev)
+		bdevname(bio->bi_bdev, name);
+
 	if (error)
 		clear_bit(BIO_UPTODATE, &bio->bi_flags);
-	else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
+	else if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) {
+		printk("XXX %s: !uptodate on bio %p\n", name, bio);
 		error = -EIO;
+	}
 
-	if (bio->bi_end_io)
+	if (bio->bi_end_io) {
+		if (error)
+			printk("XXX %s: bio=%p error=%d bi_end_io=%p\n",
+			       name, bio, error, bio->bi_end_io);
 		bio->bi_end_io(bio, error);
+	}
 }
 
 void bio_pair_release(struct bio_pair *bp)