From: Jim Duchek <jim.duchek@gmail.com>
To: Mark Knecht <markknecht@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Array 'freezes' for some time after large writes?
Date: Tue, 30 Mar 2010 14:59:20 -0600
Message-ID: <r2vdead81ad1003301359g73b5a6ady3a9bcc174d556579@mail.gmail.com>
In-Reply-To: <5bdc1c8b1003301345p46efdaddv5d420c30e75b013f@mail.gmail.com>
I'm using ext4 on everything, but it's hard to judge which ext3 bugs
might affect ext4 as well. I really don't have the ability to
destructively test the array: I need all the data that's on it and I
don't have enough spare space elsewhere to back it all up. You might
see if you can trigger it with dd, writing to the drive directly with no
filesystem?
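
If dd alone doesn't make the stall easy to see, a small timing script
might help. What follows is only a minimal Python sketch, not something
anyone in this thread has run: the function name timed_writes and its
parameters are made up for illustration. It writes zeros in fixed-size
chunks, calls fsync after each chunk, and records how long each one took,
so a multi-second freeze should show up as an outlier. Pointing it at a
raw device overwrites whatever is on that device, so it is only suitable
for a scratch partition or a scratch file.

```python
import os
import time


def timed_writes(path, total_bytes, chunk_bytes=1 << 20, stall_secs=1.0):
    """Write total_bytes of zeros to path, timing each chunk.

    Returns (latencies, stalls): latencies holds the seconds taken per
    chunk (write plus fsync); stalls lists (chunk_index, seconds) for
    chunks that took at least stall_secs.

    WARNING: if path is a block device, its contents are overwritten.
    """
    buf = b"\0" * chunk_bytes
    latencies = []
    # No O_TRUNC, so the same call works on a block device; O_CREAT lets
    # it also run against an ordinary scratch file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        written = 0
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            view = memoryview(buf)[:n]
            start = time.monotonic()
            # os.write may write fewer bytes than requested; loop until done.
            while view:
                sent = os.write(fd, view)
                view = view[sent:]
            # Force the chunk out to the device so the timing reflects real
            # I/O rather than a copy into the page cache.
            os.fsync(fd)
            latencies.append(time.monotonic() - start)
            written += n
    finally:
        os.close(fd)
    stalls = [(i, t) for i, t in enumerate(latencies) if t >= stall_secs]
    return latencies, stalls
```

For example, timed_writes("/mnt/scratch/test.bin", 4 << 30) would write
4 GiB to a scratch file (the path is a placeholder); any entries in the
returned stalls list mark the chunks where the write path froze.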
Jim
On 30 March 2010 14:45, Mark Knecht <markknecht@gmail.com> wrote:
> Hi,
> I am running the nvidia binary drivers. I'm not doing anything with
> X at this point so I can just unload them, I think. I could even remove
> the card I suppose.
>
> I built a machine for my dad a couple of months ago that uses the
> same 1TB WD drive that I am using now. I don't remember seeing
> anything like this on his machine but I'm going to go check that.
>
> One other similarity I suspect we have is ext3? There were problems
> with ext3 priority inversion in earlier kernels. It's my understanding
> that they thought they had that worked out but possibly we're
> triggering this somehow? Since I've got a lot of disk space I can set
> up some other partitions, ext4, reiser4, etc., and try copying files
> to trigger it. However it's difficult for me if it requires read/write
> as I'm not set up to really use the machine yet. Is that something you
> have room to try?
>
> Also, we haven't discussed what drivers are loaded or kernel
> config. Here's my current driver set:
>
> keeper ~ # lsmod
> Module Size Used by
> ipv6 207757 30
> usbhid 21529 0
> nvidia 10611606 22
> snd_hda_codec_realtek 239530 1
> snd_hda_intel 17688 0
> ehci_hcd 30854 0
> snd_hda_codec 45755 2 snd_hda_codec_realtek,snd_hda_intel
> snd_pcm 58104 2 snd_hda_intel,snd_hda_codec
> snd_timer 15030 1 snd_pcm
> snd 37476 5 snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_pcm,snd_timer
> soundcore 800 1 snd
> snd_page_alloc 5809 2 snd_hda_intel,snd_pcm
> rtc_cmos 7678 0
> rtc_core 11093 1 rtc_cmos
> sg 23029 0
> uhci_hcd 18047 0
> usbcore 115023 4 usbhid,ehci_hcd,uhci_hcd
> agpgart 24341 1 nvidia
> processor 23121 0
> e1000e 111701 0
> firewire_ohci 20022 0
> rtc_lib 1617 1 rtc_core
> firewire_core 36109 1 firewire_ohci
> thermal 11650 0
> keeper ~ #
>
> - Mark
>
> On Tue, Mar 30, 2010 at 1:32 PM, Jim Duchek <jim.duchek@gmail.com> wrote:
>> Hrm, I've never seen that kernel message. I don't think any of my
>> freezes have lasted for up to 120 seconds though (my drives are half
>> as big -- might matter?) It looks like we've both got WD drives --
>> and we both have nvidia 9500gt's as well. Are you running the nvidia
>> binary drivers, or nouveau? (It seems like it wouldn't matter
>> especially as, at least on my system, they don't share an interrupt or
>> anything, but I hate to ignore any hardware that we both have the same
>> of). I did move to 2.6.33 for some time, but that didn't change the
>> behaviour.
>>
>> Jim
>>
>>
>> On 30 March 2010 13:05, Mark Knecht <markknecht@gmail.com> wrote:
>>> On Tue, Mar 30, 2010 at 10:47 AM, Jim Duchek <jim.duchek@gmail.com> wrote:
>>> <SNIP>
>>>> You're having this happen even if the disk in question is not in an
>>>> array? If so perhaps it's an SATA issue and not a RAID one, and we
>>>> should move this discussion accordingly.
>>>
>>> Yes, in my case the delays are so long - sometimes 2 or 3 minutes -
>>> that when I tried to build the system using RAID1 I got this kernel
>>> bug in dmesg. It's just info - not a real failure - but because it's
>>> talking about long delays I gave up on RAID and tried a standard
>>> single drive build. Turns out that it has (I think...) nothing to do
>>> with RAID at all. You'll note that there are instructions for turning
>>> the message off but I've not tried them. I intend to do a parallel
>>> RAID1 build on this machine and be able to test both RAID vs non-RAID.
>>>
>>> - Mark
>>>
>>> INFO: task kjournald:17466 blocked for more than 120 seconds.
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> kjournald D ffff8800280bbe00 0 17466 2 0x00000000
>>> ffff8801adf9d890 0000000000000046 0000000000000000 0000000000000000
>>> ffff8801adcbde44 0000000000004000 000000000000fe00 000000000000c878
>>> 0000000800000050 ffff88017a99aa40 ffff8801af90a150 ffff8801adf9db08
>>> Call Trace:
>>> [<ffffffff812dd063>] ? md_make_request+0xb6/0xf1
>>> [<ffffffff8109c248>] ? sync_buffer+0x0/0x40
>>> [<ffffffff8137a4fc>] ? io_schedule+0x2d/0x3a
>>> [<ffffffff8109c283>] ? sync_buffer+0x3b/0x40
>>> [<ffffffff8137a879>] ? __wait_on_bit+0x41/0x70
>>> [<ffffffff8109c248>] ? sync_buffer+0x0/0x40
>>> [<ffffffff8137a913>] ? out_of_line_wait_on_bit+0x6b/0x77
>>> [<ffffffff810438b2>] ? wake_bit_function+0x0/0x23
>>> [<ffffffff8109c637>] ? sync_dirty_buffer+0x72/0xaa
>>> [<ffffffff81131b8e>] ? journal_commit_transaction+0xa74/0xde2
>>> [<ffffffff8103abcc>] ? lock_timer_base+0x26/0x4b
>>> [<ffffffff81043884>] ? autoremove_wake_function+0x0/0x2e
>>> [<ffffffff81134804>] ? kjournald+0xe3/0x206
>>> [<ffffffff81043884>] ? autoremove_wake_function+0x0/0x2e
>>> [<ffffffff81134721>] ? kjournald+0x0/0x206
>>> [<ffffffff81043591>] ? kthread+0x8b/0x93
>>> [<ffffffff8100bd3a>] ? child_rip+0xa/0x20
>>> [<ffffffff81043506>] ? kthread+0x0/0x93
>>> [<ffffffff8100bd30>] ? child_rip+0x0/0x20
>>> livecd ~ #
>>>
>>
>