From: NeilBrown <neilb@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
Bernd Schubert <bernd-schubert@gmx.de>,
Dan Williams <dan.j.williams@intel.com>
Subject: [PATCH 004 of 10] md: md: raid5 rate limit error printk
Date: Mon, 19 May 2008 11:10:31 +1000 [thread overview]
Message-ID: <1080519011031.7672@suse.de> (raw)
In-Reply-To: 20080519110910.7473.patches@notabene
From: Bernd Schubert <bernd-schubert@gmx.de>
last night we had scsi problems and a hardware raid
unit was offlined during heavy i/o. While this happened we got for
about 3 minutes a huge number messages like these
Apr 12 03:36:07 pfs1n14 kernel: [197510.696595] raid5:md7: read error not correctable (sector 2993096568 on sdj2).
I guess the high error rate is responsible for not scheduling other
events - during this time the system was not pingable and in the end
also other devices run into scsi command timeouts causing problems on
these unrelated devices as well.
Signed-off-by: Bernd Schubert <bernd-schubert@gmx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid5.c | 34 ++++++++++++++++++++++------------
1 file changed, 22 insertions(+), 12 deletions(-)
diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c 2008-05-19 11:02:09.000000000 +1000
+++ ./drivers/md/raid5.c 2008-05-19 11:02:44.000000000 +1000
@@ -94,6 +94,8 @@
#define __inline__
#endif
+#define printk_rl(args...) ((void) (printk_ratelimit() && printk(args)))
+
#if !RAID6_USE_EMPTY_ZERO_PAGE
/* In .bss so it's zeroed */
const char raid6_empty_zero_page[PAGE_SIZE] __attribute__((aligned(256)));
@@ -1143,10 +1145,12 @@ static void raid5_end_read_request(struc
set_bit(R5_UPTODATE, &sh->dev[i].flags);
if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
rdev = conf->disks[i].rdev;
- printk(KERN_INFO "raid5:%s: read error corrected (%lu sectors at %llu on %s)\n",
- mdname(conf->mddev), STRIPE_SECTORS,
- (unsigned long long)(sh->sector + rdev->data_offset),
- bdevname(rdev->bdev, b));
+ printk_rl(KERN_INFO "raid5:%s: read error corrected"
+ " (%lu sectors at %llu on %s)\n",
+ mdname(conf->mddev), STRIPE_SECTORS,
+ (unsigned long long)(sh->sector
+ + rdev->data_offset),
+ bdevname(rdev->bdev, b));
clear_bit(R5_ReadError, &sh->dev[i].flags);
clear_bit(R5_ReWrite, &sh->dev[i].flags);
}
@@ -1160,16 +1164,22 @@ static void raid5_end_read_request(struc
clear_bit(R5_UPTODATE, &sh->dev[i].flags);
atomic_inc(&rdev->read_errors);
if (conf->mddev->degraded)
- printk(KERN_WARNING "raid5:%s: read error not correctable (sector %llu on %s).\n",
- mdname(conf->mddev),
- (unsigned long long)(sh->sector + rdev->data_offset),
- bdn);
+ printk_rl(KERN_WARNING
+ "raid5:%s: read error not correctable "
+ "(sector %llu on %s).\n",
+ mdname(conf->mddev),
+ (unsigned long long)(sh->sector
+ + rdev->data_offset),
+ bdn);
else if (test_bit(R5_ReWrite, &sh->dev[i].flags))
/* Oh, no!!! */
- printk(KERN_WARNING "raid5:%s: read error NOT corrected!! (sector %llu on %s).\n",
- mdname(conf->mddev),
- (unsigned long long)(sh->sector + rdev->data_offset),
- bdn);
+ printk_rl(KERN_WARNING
+ "raid5:%s: read error NOT corrected!! "
+ "(sector %llu on %s).\n",
+ mdname(conf->mddev),
+ (unsigned long long)(sh->sector
+ + rdev->data_offset),
+ bdn);
else if (atomic_read(&rdev->read_errors)
> conf->max_nr_stripes)
printk(KERN_WARNING
WARNING: multiple messages have this Message-ID (diff)
From: NeilBrown <neilb@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Bernd Schubert <bernd-schubert@gmx.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Subject: [PATCH 004 of 10] md: md: raid5 rate limit error printk
Date: Mon, 19 May 2008 11:10:31 +1000 [thread overview]
Message-ID: <1080519011031.7672@suse.de> (raw)
In-Reply-To: 20080519110910.7473.patches@notabene
From: Bernd Schubert <bernd-schubert@gmx.de>
last night we had scsi problems and a hardware raid
unit was offlined during heavy i/o. While this happened we got for
about 3 minutes a huge number messages like these
Apr 12 03:36:07 pfs1n14 kernel: [197510.696595] raid5:md7: read error not correctable (sector 2993096568 on sdj2).
I guess the high error rate is responsible for not scheduling other
events - during this time the system was not pingable and in the end
also other devices run into scsi command timeouts causing problems on
these unrelated devices as well.
Signed-off-by: Bernd Schubert <bernd-schubert@gmx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid5.c | 34 ++++++++++++++++++++++------------
1 file changed, 22 insertions(+), 12 deletions(-)
diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c 2008-05-19 11:02:09.000000000 +1000
+++ ./drivers/md/raid5.c 2008-05-19 11:02:44.000000000 +1000
@@ -94,6 +94,8 @@
#define __inline__
#endif
+#define printk_rl(args...) ((void) (printk_ratelimit() && printk(args)))
+
#if !RAID6_USE_EMPTY_ZERO_PAGE
/* In .bss so it's zeroed */
const char raid6_empty_zero_page[PAGE_SIZE] __attribute__((aligned(256)));
@@ -1143,10 +1145,12 @@ static void raid5_end_read_request(struc
set_bit(R5_UPTODATE, &sh->dev[i].flags);
if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
rdev = conf->disks[i].rdev;
- printk(KERN_INFO "raid5:%s: read error corrected (%lu sectors at %llu on %s)\n",
- mdname(conf->mddev), STRIPE_SECTORS,
- (unsigned long long)(sh->sector + rdev->data_offset),
- bdevname(rdev->bdev, b));
+ printk_rl(KERN_INFO "raid5:%s: read error corrected"
+ " (%lu sectors at %llu on %s)\n",
+ mdname(conf->mddev), STRIPE_SECTORS,
+ (unsigned long long)(sh->sector
+ + rdev->data_offset),
+ bdevname(rdev->bdev, b));
clear_bit(R5_ReadError, &sh->dev[i].flags);
clear_bit(R5_ReWrite, &sh->dev[i].flags);
}
@@ -1160,16 +1164,22 @@ static void raid5_end_read_request(struc
clear_bit(R5_UPTODATE, &sh->dev[i].flags);
atomic_inc(&rdev->read_errors);
if (conf->mddev->degraded)
- printk(KERN_WARNING "raid5:%s: read error not correctable (sector %llu on %s).\n",
- mdname(conf->mddev),
- (unsigned long long)(sh->sector + rdev->data_offset),
- bdn);
+ printk_rl(KERN_WARNING
+ "raid5:%s: read error not correctable "
+ "(sector %llu on %s).\n",
+ mdname(conf->mddev),
+ (unsigned long long)(sh->sector
+ + rdev->data_offset),
+ bdn);
else if (test_bit(R5_ReWrite, &sh->dev[i].flags))
/* Oh, no!!! */
- printk(KERN_WARNING "raid5:%s: read error NOT corrected!! (sector %llu on %s).\n",
- mdname(conf->mddev),
- (unsigned long long)(sh->sector + rdev->data_offset),
- bdn);
+ printk_rl(KERN_WARNING
+ "raid5:%s: read error NOT corrected!! "
+ "(sector %llu on %s).\n",
+ mdname(conf->mddev),
+ (unsigned long long)(sh->sector
+ + rdev->data_offset),
+ bdn);
else if (atomic_read(&rdev->read_errors)
> conf->max_nr_stripes)
printk(KERN_WARNING
next prev parent reply other threads:[~2008-05-19 1:10 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-05-19 1:10 [PATCH 000 of 10] md: Various bug fixes and small improvements for md in 2.6.26-rc NeilBrown
2008-05-19 1:10 ` NeilBrown
2008-05-19 1:10 ` [PATCH 001 of 10] md: Fix possible oops when removing a bitmap from an active array NeilBrown
2008-05-19 1:10 ` [PATCH 002 of 10] md: proper extern for mdp_major NeilBrown
2008-05-19 1:10 ` NeilBrown
2008-05-19 1:10 ` [PATCH 003 of 10] md: kill file_path wrapper NeilBrown
2008-05-19 1:10 ` NeilBrown
2008-05-19 1:10 ` NeilBrown [this message]
2008-05-19 1:10 ` [PATCH 004 of 10] md: md: raid5 rate limit error printk NeilBrown
2008-05-19 1:10 ` [PATCH 005 of 10] md: raid1: Fix restoration of bio between failed read and write NeilBrown
2008-05-19 1:10 ` NeilBrown
2008-05-19 1:10 ` [PATCH 006 of 10] md: Notify userspace on 'write-pending' changes to array_state NeilBrown
2008-05-19 1:10 ` [PATCH 007 of 10] md: notify userspace on 'stop' events NeilBrown
2008-05-19 1:10 ` NeilBrown
2008-05-19 1:10 ` [PATCH 008 of 10] md: Improve setting of "events_cleared" for write-intent bitmaps NeilBrown
2008-05-19 1:10 ` NeilBrown
2008-05-19 1:11 ` [PATCH 009 of 10] md: Allow parallel resync of md-devices NeilBrown
2008-05-19 1:11 ` NeilBrown
2008-05-19 1:11 ` [PATCH 010 of 10] md: Restart recovery cleanly after device failure NeilBrown
2008-05-19 1:11 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1080519011031.7672@suse.de \
--to=neilb@suse.de \
--cc=akpm@linux-foundation.org \
--cc=bernd-schubert@gmx.de \
--cc=dan.j.williams@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.