Possible Suspend to Ram bug?

public inbox for linux-kernel@vger.kernel.org

* Possible Suspend to Ram bug?
@ 2009-07-10 10:58 Thomas Fjellstrom
  2009-07-14 10:17 ` Thomas Fjellstrom
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Fjellstrom @ 2009-07-10 10:58 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1518 bytes --]

I've recently gotten a new OCZ Vertex 30G SSD and have noticed that it will 
flip out the second time linux wakes up from "suspend to ram".

The system will run fine for days or weeks, so long as it isn't waking up from 
a second StR.

Here is an example error that I get from the device (my root / device):

[42018.455204] sd 0:0:0:0: [sda] Unhandled error code
[42018.455208] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK,SUGGEST_OK
[42018.455215] end_request: I/O error, dev sda, sector 12583031
[42018.455221] EXT3-fs error (device sda2): ext3_get_inode_loc: unable to read 
inode block - inode=391005, block=1572871

At that point using my / fs is pretty much impossible. Every single app fails 
to launch with an I/O error; about the only command I can run in that state is 
"dmesg" in an existing konsole.

I'm currently using 2.6.29-2-amd64 from Debian, and am running on a Gigabyte 
MA790FXT-UD5P, with an AMD Phenom II X4 810 CPU and 4G of RAM.

One interesting thing to note: the file system on the Vertex SSD reports as 
clean to fsck on the next boot, while my /home, which is on a Seagate 7200.12 
drive, reports several orphaned inodes (every single time). That happens 
regardless of whether I use ALT+SYSRQ+S/U to try to sync everything. Also, 
ALT+SYSRQ+B doesn't work at that point; only ALT+SYSRQ+O or the system 
power/reset buttons will work.

I'm attaching the full log I was able to save from dmesg (over nfs, luckily 
that worked).

-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

[-- Attachment #2: error.log.gz --]
[-- Type: application/x-gzip, Size: 6895 bytes --]
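
For readers triaging a similar report, here is a minimal sketch of pulling the 
device and sector out of "end_request: I/O error" lines like the one quoted in 
the message above. The function name and the regular expression are 
illustrative assumptions, not part of any kernel tool, and the pattern only 
covers the log format shown in this thread.

```python
import re

# Matches lines such as:
# [42018.455215] end_request: I/O error, dev sda, sector 12583031
IO_ERROR_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+end_request: I/O error, "
    r"dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def find_io_errors(log_text):
    """Return a (timestamp, device, sector) tuple for each I/O error line."""
    return [
        (float(m.group("ts")), m.group("dev"), int(m.group("sector")))
        for m in IO_ERROR_RE.finditer(log_text)
    ]


# Two lines copied from the report; only the second one should match.
sample = """\
[42018.455204] sd 0:0:0:0: [sda] Unhandled error code
[42018.455215] end_request: I/O error, dev sda, sector 12583031
"""

for ts, dev, sector in find_io_errors(sample):
    print(f"{ts:.6f} {dev} sector {sector}")
```

Running the same function over a saved dmesg capture would show whether the 
failures are confined to one device.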


* Re: Possible Suspend to Ram bug?
  2009-07-10 10:58 Possible Suspend to Ram bug? Thomas Fjellstrom
@ 2009-07-14 10:17 ` Thomas Fjellstrom
  2009-07-14 15:53   ` Jiri Kosina
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Fjellstrom @ 2009-07-14 10:17 UTC (permalink / raw)
  To: linux-kernel

On Fri July 10 2009, Thomas Fjellstrom wrote:
> I've recently gotten a new OCZ Vertex 30G SSD and have noticed that it will
> flip out the second time linux wakes up from "suspend to ram".
>
> The system will run fine for days or weeks, so long as it isn't waking up a
> second StR.
>
> Here is an example error that I get from the device (my root / device):
>
> [42018.455204] sd 0:0:0:0: [sda] Unhandled error code
> [42018.455208] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK,SUGGEST_OK
> [42018.455215] end_request: I/O error, dev sda, sector 12583031
> [42018.455221] EXT3-fs error (device sda2): ext3_get_inode_loc: unable to
> read inode block - inode=391005, block=1572871
>
> At that point using my / fs is pretty much impossible. Every single app
> fails to launch with an I/O Error, about the only command I can run in that
> state is "dmesg" in an existing konsole.
>
> I'm currently using 2.6.29-2-amd64 from debian, and am running on a
> Gigabyte MA790FXT-UD5P, with a AMD Phenom II X4 810 cpu, and 4G ram.
>
> One interesting thing to note, the file system on the Vertex SSD reports as
> clean to fsck on the next boot, while my /home which is on a Seagate
> 7200.12 drive reports with several orphaned inodes (every single time). And
> that's regardless if I use ALT+SYSRQ+S/U to try and sync everything. Also,
> ALT+SYSRQ+B doesn't work at that point, only ALT+SYSRQ+O or using the
> system power/reset buttons will work.
>
> I'm attaching the full log I was able to save from dmesg (over nfs, luckily
> that worked).

Anyone have a clue what might be wrong?

-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca


* Re: Possible Suspend to Ram bug?
  2009-07-14 10:17 ` Thomas Fjellstrom
@ 2009-07-14 15:53   ` Jiri Kosina
  2009-07-15  9:59     ` Thomas Fjellstrom
  0 siblings, 1 reply; 8+ messages in thread
From: Jiri Kosina @ 2009-07-14 15:53 UTC (permalink / raw)
  To: Thomas Fjellstrom; +Cc: linux-kernel

On Tue, 14 Jul 2009, Thomas Fjellstrom wrote:

> On Fri July 10 2009, Thomas Fjellstrom wrote:
> > I've recently gotten a new OCZ Vertex 30G SSD and have noticed that it will
> > flip out the second time linux wakes up from "suspend to ram".
> >
> > The system will run fine for days or weeks, so long as it isn't waking up a
> > second StR.
> >
> > Here is an example error that I get from the device (my root / device):
> >
> > [42018.455204] sd 0:0:0:0: [sda] Unhandled error code
> > [42018.455208] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET
> > driverbyte=DRIVER_OK,SUGGEST_OK
> > [42018.455215] end_request: I/O error, dev sda, sector 12583031
> > [42018.455221] EXT3-fs error (device sda2): ext3_get_inode_loc: unable to
> > read inode block - inode=391005, block=1572871
> >
> > At that point using my / fs is pretty much impossible. Every single app
> > fails to launch with an I/O Error, about the only command I can run in that
> > state is "dmesg" in an existing konsole.
> >
> > I'm currently using 2.6.29-2-amd64 from debian, and am running on a
> > Gigabyte MA790FXT-UD5P, with a AMD Phenom II X4 810 cpu, and 4G ram.
> >
> > One interesting thing to note, the file system on the Vertex SSD reports as
> > clean to fsck on the next boot, while my /home which is on a Seagate
> > 7200.12 drive reports with several orphaned inodes (every single time). And
> > that's regardless if I use ALT+SYSRQ+S/U to try and sync everything. Also,
> > ALT+SYSRQ+B doesn't work at that point, only ALT+SYSRQ+O or using the
> > system power/reset buttons will work.
> >
> > I'm attaching the full log I was able to save from dmesg (over nfs, luckily
> > that worked).
> Anyone have a clue what might be wrong?

First please try to reproduce with a recent kernel (2.6.30 at least, 
2.6.31-rc3 preferably).

-- 
Jiri Kosina
SUSE Labs



* Re: Possible Suspend to Ram bug?
  2009-07-15 11:45         ` Thomas Fjellstrom
@ 2009-07-15  0:04           ` Pavel Machek
  2009-07-16 22:53           ` Thomas Fjellstrom
  1 sibling, 0 replies; 8+ messages in thread
From: Pavel Machek @ 2009-07-15  0:04 UTC (permalink / raw)
  To: Thomas Fjellstrom; +Cc: linux-kernel

On Wed 2009-07-15 05:45:14, Thomas Fjellstrom wrote:
> On Wed July 15 2009, Jiri Kosina wrote:
> > On Wed, 15 Jul 2009, Thomas Fjellstrom wrote:
> > > I'll try with debian's 2.6.30 first. But there's a small issue with
> > > that, .30 and .31 seem to have some performance regressions according to
> > > sites like phoronix.
> >
> > You know, there are lies, then horrible lies, then benchmarks, and
> > benchmarks done wrong.
> >
> > Just do your own measurements under your particular workload, and if you
> > see any performance regression, just report it.
> 
> The benchmarks they run are pretty much what I'd do to test, so I'd more than 
> likely get the same results, and waste a bunch of time.

So rather you expect us to waste a bunch of time?

> 2.6.30 did seem to fix the ssd error. but the first time I suspended, my r8169 
> decided to flip out. I had to rmmod and modprobe it to get the network back 
> up.
> 
> [  867.780034] ------------[ cut here ]------------
> [  867.780165] WARNING: at 
> /home/blank/debian/kernel/tmp/linux-2.6-2.6.30/debian/build/source_amd64_none/net/sched/sch_generic.c:226 
> dev_watchdog+0xc7/0x164()
> [  867.780373] Hardware name: GA-MA790FXT-UD5P
> [  867.780488] NETDEV WATCHDOG: eth1 (r8169): transmit timed out
> [  867.780610] Modules linked in: nvidia(P) powernow_k8 cpufreq_conservative 
> cpufreq_stats cpufreq_userspace cpufreq_powersave nfsd exportfs nfs lockd 
> fscache nfs_acl auth_rpcgss sunrpc it87 hwmon_vid adt7473 firewire_sbp2 loop 
> snd_hda$
> [  867.785472] Pid: 0, comm: swapper Tainted: P

....and some more.


> At this point I was getting repeated "link up" messages and even though 
> ifconfig said the network was up, there was no actual connectivity. as 
> mentioned only rmmod+modprobe of r8169 fixed the problem. It doesn't seem to 
> happen often though.

Reproduce it without taints, then you can report a regression in network...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
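
The "Tainted: P" in the trace above comes from the proprietary nvidia module. 
As a rough sketch of how one might confirm a kernel is untainted before 
re-testing, the snippet below decodes bit 0 of the value in 
/proc/sys/kernel/tainted, which corresponds to the "P" flag. The helper names 
are hypothetical, and only that one bit is examined.

```python
# Bit 0 of the taint mask is set when a proprietary module has been loaded;
# it is what the "P" in "Tainted: P" stands for.
TAINT_PROPRIETARY_MODULE = 1 << 0


def has_proprietary_taint(taint_value):
    """Return True if the proprietary-module bit is set in the taint mask."""
    return bool(taint_value & TAINT_PROPRIETARY_MODULE)


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint mask as an integer (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        value = read_taint()
    except OSError:
        print("taint file not available on this system")
    else:
        print("taint mask:", value)
        print("proprietary module taint:", has_proprietary_taint(value))
```

A mask of 0 means no taint flags are set at all, which is the state a 
re-test after booting without the nvidia module would aim for.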


* Re: Possible Suspend to Ram bug?
  2009-07-14 15:53   ` Jiri Kosina
@ 2009-07-15  9:59     ` Thomas Fjellstrom
  2009-07-15 11:31       ` Jiri Kosina
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Fjellstrom @ 2009-07-15  9:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jiri Kosina

On Tue July 14 2009, Jiri Kosina wrote:
> On Tue, 14 Jul 2009, Thomas Fjellstrom wrote:
> > On Fri July 10 2009, Thomas Fjellstrom wrote:
> > > I've recently gotten a new OCZ Vertex 30G SSD and have noticed that it
> > > will flip out the second time linux wakes up from "suspend to ram".
> > >
> > > The system will run fine for days or weeks, so long as it isn't waking
> > > up a second StR.
> > >
> > > Here is an example error that I get from the device (my root / device):
> > >
> > > [42018.455204] sd 0:0:0:0: [sda] Unhandled error code
> > > [42018.455208] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET
> > > driverbyte=DRIVER_OK,SUGGEST_OK
> > > [42018.455215] end_request: I/O error, dev sda, sector 12583031
> > > [42018.455221] EXT3-fs error (device sda2): ext3_get_inode_loc: unable
> > > to read inode block - inode=391005, block=1572871
> > >
> > > At that point using my / fs is pretty much impossible. Every single app
> > > fails to launch with an I/O Error, about the only command I can run in
> > > that state is "dmesg" in an existing konsole.
> > >
> > > I'm currently using 2.6.29-2-amd64 from debian, and am running on a
> > > Gigabyte MA790FXT-UD5P, with a AMD Phenom II X4 810 cpu, and 4G ram.
> > >
> > > One interesting thing to note, the file system on the Vertex SSD
> > > reports as clean to fsck on the next boot, while my /home which is on a
> > > Seagate 7200.12 drive reports with several orphaned inodes (every
> > > single time). And that's regardless if I use ALT+SYSRQ+S/U to try and
> > > sync everything. Also, ALT+SYSRQ+B doesn't work at that point, only
> > > ALT+SYSRQ+O or using the system power/reset buttons will work.
> > >
> > > I'm attaching the full log I was able to save from dmesg (over nfs,
> > > luckily that worked).
> >
> > Anyone have a clue what might be wrong?
>
> First please try to reproduce with recent kernel (2.6.30 at least,
> 2.6.31-rc3 preferrably).

I'll try with Debian's 2.6.30 first. But there's a small issue with that: .30 
and .31 seem to have some performance regressions according to sites like 
Phoronix.

-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca


* Re: Possible Suspend to Ram bug?
  2009-07-15  9:59     ` Thomas Fjellstrom
@ 2009-07-15 11:31       ` Jiri Kosina
  2009-07-15 11:45         ` Thomas Fjellstrom
  0 siblings, 1 reply; 8+ messages in thread
From: Jiri Kosina @ 2009-07-15 11:31 UTC (permalink / raw)
  To: Thomas Fjellstrom; +Cc: linux-kernel

On Wed, 15 Jul 2009, Thomas Fjellstrom wrote:

> I'll try with debian's 2.6.30 first. But there's a small issue with 
> that, .30 and .31 seem to have some performance regressions according to 
> sites like phoronix.

You know, there are lies, then horrible lies, then benchmarks, and 
benchmarks done wrong.

Just do your own measurements under your particular workload, and if you 
see any performance regression, just report it.

-- 
Jiri Kosina
SUSE Labs


* Re: Possible Suspend to Ram bug?
  2009-07-15 11:31       ` Jiri Kosina
@ 2009-07-15 11:45         ` Thomas Fjellstrom
  2009-07-15  0:04           ` Pavel Machek
  2009-07-16 22:53           ` Thomas Fjellstrom
  0 siblings, 2 replies; 8+ messages in thread
From: Thomas Fjellstrom @ 2009-07-15 11:45 UTC (permalink / raw)
  To: linux-kernel

On Wed July 15 2009, Jiri Kosina wrote:
> On Wed, 15 Jul 2009, Thomas Fjellstrom wrote:
> > I'll try with debian's 2.6.30 first. But there's a small issue with
> > that, .30 and .31 seem to have some performance regressions according to
> > sites like phoronix.
>
> You know, there are lies, then horrible lies, then benchmarks, and
> benchmarks done wrong.
>
> Just do your own measurements under your particular workload, and if you
> see any performance regression, just report it.

The benchmarks they run are pretty much what I'd do to test, so I'd more than 
likely get the same results, and waste a bunch of time.

2.6.30 did seem to fix the SSD error, but the first time I suspended, my r8169 
decided to flip out. I had to rmmod and modprobe it to get the network back 
up.

[  867.780034] ------------[ cut here ]------------
[  867.780165] WARNING: at 
/home/blank/debian/kernel/tmp/linux-2.6-2.6.30/debian/build/source_amd64_none/net/sched/sch_generic.c:226 
dev_watchdog+0xc7/0x164()
[  867.780373] Hardware name: GA-MA790FXT-UD5P
[  867.780488] NETDEV WATCHDOG: eth1 (r8169): transmit timed out
[  867.780610] Modules linked in: nvidia(P) powernow_k8 cpufreq_conservative 
cpufreq_stats cpufreq_userspace cpufreq_powersave nfsd exportfs nfs lockd 
fscache nfs_acl auth_rpcgss sunrpc it87 hwmon_vid adt7473 firewire_sbp2 loop 
snd_hda$
[  867.785472] Pid: 0, comm: swapper Tainted: P           2.6.30-1-amd64 #1
[  867.785598] Call Trace:
[  867.785699]  <IRQ>  [<ffffffff804229aa>] ? dev_watchdog+0xc7/0x164
[  867.785877]  [<ffffffff804229aa>] ? dev_watchdog+0xc7/0x164
[  867.786005]  [<ffffffff8024236b>] ? warn_slowpath_common+0x77/0xa3
[  867.786130]  [<ffffffff804228e3>] ? dev_watchdog+0x0/0x164
[  867.786252]  [<ffffffff802423f3>] ? warn_slowpath_fmt+0x51/0x59
[  867.786377]  [<ffffffff802342fe>] ? enqueue_task+0x5c/0x65
[  867.786499]  [<ffffffff802546f7>] ? autoremove_wake_function+0x9/0x2e
[  867.786626]  [<ffffffff804228b7>] ? netif_tx_lock+0x3d/0x69
[  867.786749]  [<ffffffff8040f3fc>] ? netdev_drivername+0x3b/0x40
[  867.786873]  [<ffffffff804229aa>] ? dev_watchdog+0xc7/0x164
[  867.786993]  [<ffffffff80235601>] ? __wake_up+0x30/0x44
[  867.787116]  [<ffffffff804228e3>] ? dev_watchdog+0x0/0x164
[  867.787239]  [<ffffffff8024aa2b>] ? run_timer_softirq+0x193/0x210
[  867.787364]  [<ffffffff8025b465>] ? getnstimeofday+0x55/0xaf
[  867.787487]  [<ffffffff80246f55>] ? __do_softirq+0xac/0x173
[  867.787609]  [<ffffffff80210bcc>] ? call_softirq+0x1c/0x30
[  867.787730]  [<ffffffff802125fa>] ? do_softirq+0x3a/0x7e
[  867.787849]  [<ffffffff80246cd2>] ? irq_exit+0x3f/0x80
[  867.787968]  [<ffffffff80220e63>] ? smp_apic_timer_interrupt+0x87/0x94
[  867.788105]  [<ffffffff802105d3>] ? apic_timer_interrupt+0x13/0x20
[  867.788231]  <EOI>  [<ffffffff80227518>] ? native_safe_halt+0x2/0x3
[  867.788410]  [<ffffffff80216995>] ? default_idle+0x40/0x68
[  867.788531]  [<ffffffff8025d714>] ? clockevents_notify+0x2b/0x75
[  867.788656]  [<ffffffff80216d48>] ? c1e_idle+0xe5/0x10d
[  867.788776]  [<ffffffff8020edda>] ? cpu_idle+0x50/0x91
[  867.788894] ---[ end trace 521854739609a619 ]---
[  867.804550] r8169: eth1: link up
[  915.796566] r8169: eth1: link up
[  963.796491] r8169: eth1: link up
[  989.420829] r8169: eth1: link up

At this point I was getting repeated "link up" messages, and even though 
ifconfig said the network was up, there was no actual connectivity. As 
mentioned, only rmmod+modprobe of r8169 fixed the problem. It doesn't seem to 
happen often though.

I'll update if I see any more issues.

-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca
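
The rmmod+modprobe workaround described in the message above could be 
scripted roughly as follows. This is a sketch under assumptions: the function 
names are made up for illustration, the commands need root, and nothing here 
addresses whatever makes the driver misbehave after resume.

```python
import subprocess


def reload_commands(module):
    """Build the two commands that unload and then reload a kernel module."""
    return [["rmmod", module], ["modprobe", module]]


def reload_module(module, run=subprocess.run):
    """Run the unload/reload commands in order, stopping on the first failure.

    `run` is injectable so the sequence can be exercised without root.
    """
    for cmd in reload_commands(module):
        run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print what would be executed for the driver named in the thread.
    for cmd in reload_commands("r8169"):
        print(" ".join(cmd))
```

Calling reload_module("r8169") as root would perform the actual reload; the 
dry run above only prints the commands.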


* Re: Possible Suspend to Ram bug?
  2009-07-15 11:45         ` Thomas Fjellstrom
  2009-07-15  0:04           ` Pavel Machek
@ 2009-07-16 22:53           ` Thomas Fjellstrom
  1 sibling, 0 replies; 8+ messages in thread
From: Thomas Fjellstrom @ 2009-07-16 22:53 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 334 bytes --]

On Wed July 15 2009, Thomas Fjellstrom wrote:
>
> I'll update if i see anymore issues.

Earlier today my dvdrw went missing; I probably have the dmesg for that 
someplace. Just now my sdb device (/home and swap) had issues after a 
resume. I've attached the log (it's a bit long).

-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

[-- Attachment #2: error.dmesg.gz --]
[-- Type: application/x-gzip, Size: 23365 bytes --]

