From: Marc MERLIN <marc@merlins.org>
To: Robert Hancock <hancockrwd@gmail.com>
Cc: Tejun Heo <tj@kernel.org>, linux-ide@vger.kernel.org
Subject: Re: PMP failure decoding help
Date: Thu, 25 Mar 2010 20:08:46 -0700
Message-ID: <20100326030846.GI4442@merlins.org>
In-Reply-To: <4BAC14F9.4070109@gmail.com>

On Thu, Mar 25, 2010 at 07:59:21PM -0600, Robert Hancock wrote:
> These are two different issues, see below.

Thanks for looking.

> >ata6.15: Port Multiplier 1.1, 0x1095:0x4726 r31, 7 ports, feat 0x1/0x9
> >scsi_eh_7: page allocation failure. order:4, mode:0x10
> 
> Well, that's abnormal. Does dmesg show a stack trace after that line?

Yep. I snipped it so as not to clutter the logs.
scsi_eh_7: page allocation failure. order:4, mode:0x10
Pid: 1798, comm: scsi_eh_7 Not tainted 2.6.31.6-core2smp-1khznohz-preempt-noticks-noide-4gb-20091118 #1
Call Trace:
 [<c03a2203>] ? printk+0xf/0x14
 [<c017735b>] __alloc_pages_nodemask+0x3da/0x41c
 [<c01907ff>] cache_alloc_refill+0x245/0x404
 [<c0190b23>] kmem_cache_alloc+0x4f/0xe4
 [<c02f035c>] sata_pmp_attach+0xde/0x355
 [<c02ebcfa>] ata_eh_recover+0x5d6/0xa8b
 [<c02e20e8>] ? ata_std_postreset+0x0/0x126
 [<f8aec732>] ? sil24_hardreset+0x0/0x222 [sata_sil24]
 [<f8aec9c4>] ? sil24_softreset+0x0/0x1e0 [sata_sil24]
 [<c02e235d>] ? ata_std_prereset+0x0/0x9e
 [<c02efa97>] sata_pmp_error_handler+0xad/0x894
 [<f8aec732>] ? sil24_hardreset+0x0/0x222 [sata_sil24]
 [<c02e20e8>] ? ata_std_postreset+0x0/0x126
 [<c013c1d2>] ? __cancel_work_timer+0x2b/0x144
 [<c03a4d3f>] ? _spin_unlock_irq+0x15/0x29
 [<c02e0db6>] ? ata_wait_register+0x27/0x5c
 [<f8aec1c1>] ? sil24_init_port+0x80/0xae [sata_sil24]
 [<f8aec6ad>] sil24_error_handler+0x24/0x2f [sata_sil24]
 [<c02eca44>] ata_scsi_error+0x2bc/0x5a6
 [<c02ab3a7>] scsi_error_handler+0xb2/0x4c4
 [<c012069b>] ? complete+0x34/0x3e
 [<c02ab2f5>] ? scsi_error_handler+0x0/0x4c4
 [<c013eb00>] kthread+0x6b/0x70
 [<c013ea95>] ? kthread+0x0/0x70
 [<c0103707>] kernel_thread_helper+0x7/0x10
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
CPU    2: hi:  186, btch:  31 usd:   0
CPU    3: hi:  186, btch:  31 usd:   0
HighMem per-cpu:
CPU    0: hi:   42, btch:   7 usd:   0
CPU    1: hi:   42, btch:   7 usd:   0
CPU    2: hi:   42, btch:   7 usd:   0
CPU    3: hi:   42, btch:   7 usd:   0
Active_anon:10305 active_file:47093 inactive_anon:28812
 inactive_file:46344 unevictable:675 dirty:110 writeback:37 unstable:0
 free:13169 slab:103294 mapped:5706 pagetables:1206 bounce:0
DMA free:3672kB min:64kB low:80kB high:96kB active_anon:20kB inactive_anon:56kB active_file:4048kB inactive_file:2588kB unevictable:0kB present:15800kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 865 1000 1000
Normal free:48088kB min:3728kB low:4660kB high:5592kB active_anon:22648kB inactive_anon:70080kB active_file:153232kB inactive_file:146932kB unevictable:360kB present:885944kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 1079 1079
HighMem free:916kB min:132kB low:276kB high:420kB active_anon:18552kB inactive_anon:45112kB active_file:31092kB inactive_file:35856kB unevictable:2340kB present:138120kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 764*4kB 35*8kB 24*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3720kB
Normal: 8864*4kB 1525*8kB 25*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48088kB
HighMem: 164*4kB 18*8kB 6*16kB 2*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 960kB
101437 total pagecache pages
7322 pages in swap cache
Swap cache stats: add 46622724, delete 46615402, find 110219915/113491673
Free swap  = 440648kB
Total swap = 1050608kB
262112 pages RAM
34802 pages HighMem
4882 pages reserved
111180 pages shared
166683 pages non-shared
ata6.15: failed to initialize PMP links
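
The free lists in the dump seem to explain the failure on their own: an
order:4 request needs 16 contiguous 4kB pages (64kB), and both the Normal
and DMA lines show 0*64kB and nothing larger. Here is a minimal Python
sketch that checks this from the buddy lines. The helper names are made up
for illustration, and it assumes 4kB pages as shown in the log.

```python
import re

PAGE_KB = 4  # assumption: 4kB pages, matching the "*4kB" column in the dump


def free_blocks_by_order(line):
    """Map buddy order -> free block count from a 'Zone: N*SkB ...' line."""
    counts = {}
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
        # 4kB -> order 0, 8kB -> order 1, ..., 64kB -> order 4
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        counts[order] = int(count)
    return counts


def can_satisfy(line, order):
    """True if the zone has a free block of the requested order or larger."""
    blocks = free_blocks_by_order(line)
    return any(n > 0 for o, n in blocks.items() if o >= order)


normal = ("Normal: 8864*4kB 1525*8kB 25*16kB 1*32kB 0*64kB 0*128kB 0*256kB "
          "0*512kB 0*1024kB 0*2048kB 0*4096kB = 48088kB")
dma = ("DMA: 764*4kB 35*8kB 24*16kB 0*32kB 0*64kB 0*128kB 0*256kB "
       "0*512kB 0*1024kB 0*2048kB 0*4096kB = 3720kB")

for name, line in (("Normal", normal), ("DMA", dma)):
    print(name, "can satisfy order 4:", can_satisfy(line, 4))
```

So there is about 48MB free in Normal, but it is too fragmented for a
64kB contiguous allocation.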

> >ata6.04: configured for UDMA/100
> >ata6.05: unsupported device, disabling
> 
> The device that's being disabled is the configuration pseudo-disk built 
> into the PMP, I believe. Nothing to really worry about there.
 
Ok, thanks.

> >sd 7:2:0:0: [sdo] Attached SCSI disk
> >sd 7:4:0:0: [sdq] Attached SCSI disk
> >ata6.00: failed to read SCR 1 (Emask=0x40)
> >ata6.01: failed to read SCR 1 (Emask=0x40)
> >ata6.02: failed to read SCR 1 (Emask=0x40)
> >ata6.03: failed to read SCR 1 (Emask=0x40)
> >ata6.04: failed to read SCR 1 (Emask=0x40)
> >ata6.05: failed to read SCR 1 (Emask=0x40)
> >ata6.06: failed to read SCR 1 (Emask=0x40)
> >ata6.15: exception Emask 0x10 SAct 0x0 SErr 0x80000 action 0xe frozen
> >ata6.15: irq_stat 0x01140010, PHY RDY changed
> >ata6.15: SError: { 10B8B }
> 
> This one looks like some kind of communication error between the 
> controller and the PMP (maybe the cable wasn't plugged in all the way 
> yet or something?)
 
Well, I plugged the cable in first and then turned the disk array on to
hopefully avoid a half connection, but who knows?

> >ata6.00: exception Emask 0x0 SAct 0xd SErr 0x0 action 0x6
> >ata6.00: irq_stat 0x00060002, device error via SDB FIS
> >ata6.00: cmd 60/d8:00:77:05:90/00:00:d0:00:00/40 tag 0 ncq 110592 in
> >          res 2e/36:00:00:00:00/00:00:00:00:2e/00 Emask 0x2 (HSM violation)
> >ata6.00: status: { DF DRQ }
> >ata6.00: error: { IDNF ABRT }
> >ata6.00: cmd 60/10:10:5f:05:90/00:00:d0:00:00/40 tag 2 ncq 8192 in
> >          res 41/40:00:69:05:90/2e:00:d0:00:00/40 Emask 0x409 (media 
> >          error)<F>
> >ata6.00: status: { DRDY ERR }
> >ata6.00: error: { UNC }
> 
> That drive doesn't seem to be happy, it's reporting an uncorrectable 
> read error. Have you checked its SMART status? Could be a bad drive, 
> insufficient power, too hot, etc.
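
As a cross-check, the cmd/res lines can be decoded by hand. The sketch
below is Python and assumes the usual libata dump layout of
command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device,
and that for an NCQ read (0x60) the sector count sits in the feature
fields and the tag in the upper bits of nsect. The function names are
made up for illustration.

```python
def parse_tf(tf):
    """Split a libata taskfile dump into named integer fields."""
    command, low, high, device = tf.split("/")
    fields = {"command": int(command, 16), "device": int(device, 16)}
    names = ("feature", "nsect", "lbal", "lbam", "lbah")
    for name, value in zip(names, low.split(":")):
        fields[name] = int(value, 16)
    for name, value in zip(names, high.split(":")):
        fields["hob_" + name] = int(value, 16)
    return fields


def lba48(f):
    """Assemble the 48-bit LBA, most significant byte first."""
    lba = 0
    for key in ("hob_lbah", "hob_lbam", "hob_lbal", "lbah", "lbam", "lbal"):
        lba = (lba << 8) | f[key]
    return lba


# The tag 2 command and its result, copied from the log above.
cmd = parse_tf("60/10:10:5f:05:90/00:00:d0:00:00/40")
res = parse_tf("41/40:00:69:05:90/2e:00:d0:00:00/40")

sectors = (cmd["hob_feature"] << 8) | cmd["feature"]  # NCQ: count in feature
tag = cmd["nsect"] >> 3                               # NCQ: tag in nsect[7:3]

print("tag", tag, "reads", sectors, "sectors at LBA", hex(lba48(cmd)))
print("drive reports the error at LBA", hex(lba48(res)))
print("offset into the request:", lba48(res) - lba48(cmd), "sectors")
```

Under those assumptions the 16-sector read (8192 bytes, matching the
log) starts at LBA 0xd090055f and the reported bad LBA 0xd0900569 falls
inside it, which is consistent with a genuine media error rather than a
link problem.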
 
It's in a hot swappable disk enclosure with 5 drives. I'm hoping it has enough power.
As for too hot, I don't think so. I have reasonable cooling in that cabinet
and 14 other drives running without problem.

But you're right, the drive looks like toast right out of the box: 267
reallocated sectors after only 20 power-on hours. I should have checked
this first, silly me.

gargamel:~# smartctl -A /dev/sdm
smartctl 5.39 2009-10-10 r2955 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       6
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   166   166   140    Pre-fail  Always       -       267
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  7 Seek_Error_Rate         0x002e   200   191   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       20
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       31
194 Temperature_Celsius     0x0022   126   122   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   188   188   000    Old_age   Always       -       12
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       4
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
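
To check other drives in the array without eyeballing the table, here is
a minimal Python sketch that scans `smartctl -A` style text and reports
non-zero raw values. The set of watched IDs (5, 196, 197, 198) is my own
assumption about which counters matter most for failing media, not
anything smartctl defines, and the function name is made up for
illustration.

```python
# Assumption: these are the counters worth flagging when non-zero.
WATCH = {5, 196, 197, 198}


def flag_attributes(text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    hits = {}
    for line in text.splitlines():
        cols = line.split()
        # Attribute rows have 10 columns and start with a numeric ID.
        if len(cols) < 10 or not cols[0].isdigit():
            continue
        attr_id, name, raw = int(cols[0]), cols[1], cols[9]
        if attr_id in WATCH and raw.isdigit() and int(raw) > 0:
            hits[name] = int(raw)
    return hits


# A few rows copied from the output above.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   166   166   140    Pre-fail  Always       -       267
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       20
196 Reallocated_Event_Count 0x0032   188   188   000    Old_age   Always       -       12
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

for name, raw in flag_attributes(sample).items():
    print(f"{name}: {raw}")
```

In practice the text could come from running smartctl -A on each device
and passing its output to the function.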

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems & security ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
