Re: Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies)Pid:902, comm: md1_raid5

From: "Janos Haar" <janos.haar@netcenter.hu>
To: "Neil Brown" <neilb@suse.de>
Cc: <paulmck@linux.vnet.ibm.com>, <linux-kernel@vger.kernel.org>
Subject: Re: Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies)Pid:902, comm: md1_raid5
Date: Thu, 21 May 2009 23:53:16 +0200
Message-ID: <050e01c9da5e$8d142b20$0400a8c0@dcccs>
In-Reply-To: 18964.63919.206864.619147@notabene.brown

Neil, Paul,

The problem is solved.
It was a BIOS bug.
(The Fedora install CD showed the same pauses; I then tested with the latest
BIOS version, and the delays are gone. 8-)

Thanks to both of you for all your help!

Janos Haar

----- Original Message ----- 
From: "Neil Brown" <neilb@suse.de>
To: <paulmck@linux.vnet.ibm.com>
Cc: "Janos Haar" <janos.haar@netcenter.hu>; <linux-kernel@vger.kernel.org>
Sent: Thursday, May 21, 2009 8:50 AM
Subject: Re: Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies)Pid:902, 
comm: md1_raid5


> On Wednesday May 20, paulmck@linux.vnet.ibm.com wrote:
>> On Thu, May 21, 2009 at 06:46:15AM +0200, Janos Haar wrote:
>> > Paul,
>> >
>> > Thank you for your attention.
>> > Yes, the PC makes 2-3 second "pauses" and prints this message again and
>> > again.
>> > If I remove the RCU debugging, the message disappears, but the pauses
>> > are still here, and they cause a load of 2-3 on the idle system.
>> > Can I do something?
>> > Do you suggest using PREEMPT? (This is a server.)
>>
>> One possibility is that the lock that bitmap_daemon_work() acquires is
>> being held for too long.  Another possibility is the list traversal in
>> md_check_recovery() that might loop for a long time if the list were
>> excessively long or could be temporarily tied in a knot.
>>
>> Neil, thoughts?
>>
>
> I would be surprised if any of these things take as long as 3 seconds
> (or even 1 second) but I cannot completely rule it out.
>
> I assume that you mean 3 seconds of continuous running with no
> sleeping, so it cannot be a slow kmalloc that is causing the delay?
>
> bitmap_daemon_work is the most likely candidate as bitmap->chunks
> can be very large (thousands, probably not millions though).
> Taking and dropping the lock every time around that loop doesn't
> really make much sense, does it....
> And it looks like it could actually be optimised quite a bit to skip a
> lot of the iterations in most cases - there are two places where we
> can accelerate 'j' quite a lot.
>
> Janos: Can you try this and see if it makes a difference?
> Thanks.
>
> NeilBrown
>
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index 47c68bc..56df1ce 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -1097,14 +1097,12 @@ void bitmap_daemon_work(struct bitmap *bitmap)
>  }
>  bitmap->allclean = 1;
>
> + spin_lock_irqsave(&bitmap->lock, flags);
>  for (j = 0; j < bitmap->chunks; j++) {
>  bitmap_counter_t *bmc;
> - spin_lock_irqsave(&bitmap->lock, flags);
> - if (!bitmap->filemap) {
> + if (!bitmap->filemap)
>  /* error or shutdown */
> - spin_unlock_irqrestore(&bitmap->lock, flags);
>  break;
> - }
>
>  page = filemap_get_page(bitmap, j);
>
> @@ -1121,6 +1119,8 @@ void bitmap_daemon_work(struct bitmap *bitmap)
>  write_page(bitmap, page, 0);
>  bitmap->allclean = 0;
>  }
> + spin_lock_irqsave(&bitmap->lock, flags);
> + j |= (PAGE_BITS - 1);
>  continue;
>  }
>
> @@ -1181,9 +1181,10 @@ void bitmap_daemon_work(struct bitmap *bitmap)
>  ext2_clear_bit(file_page_offset(j), paddr);
>  kunmap_atomic(paddr, KM_USER0);
>  }
> - }
> - spin_unlock_irqrestore(&bitmap->lock, flags);
> + } else
> + j |= PAGE_COUNTER_MASK;
>  }
> + spin_unlock_irqrestore(&bitmap->lock, flags);
>
>  /* now sync the final page */
>  if (lastpage != NULL) {
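The two `j |= ...` lines in the patch above rely on a bit trick: when a whole group of entries needs no work, OR-ing the index with (group size - 1) moves it to the last entry of that group, so the loop's own `j++` lands on the first entry of the next one. Below is a minimal standalone sketch of that idea, not the kernel code. GROUP_SIZE, visits_with_skip() and the group_dirty[] array are hypothetical names for illustration; in the patch the masks are PAGE_BITS - 1 and PAGE_COUNTER_MASK, and the trick assumes the group size is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical group size for illustration only; must be a power of two. */
#define GROUP_SIZE 8UL

/*
 * Walk the indices [0, chunks) and return how many of them the loop
 * actually visits when every group whose entry in group_dirty[] is zero
 * is skipped in a single step.
 */
unsigned long visits_with_skip(const int *group_dirty, unsigned long chunks)
{
    unsigned long visited = 0;
    unsigned long j;

    assert(group_dirty != NULL);

    for (j = 0; j < chunks; j++) {
        visited++;
        if (!group_dirty[j / GROUP_SIZE]) {
            /*
             * Set all low bits: j becomes the last index of this
             * group, and the loop's j++ then moves to the next group.
             */
            j |= (GROUP_SIZE - 1);
            continue;
        }
        /* Per-entry work for a dirty group would go here. */
    }
    return visited;
}
```

With four groups of eight entries and only the second group dirty, the loop touches one index in each clean group and all eight in the dirty one, instead of all 32.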


Thread overview: 7+ messages
2009-05-20  9:46 Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies) Pid: 902, comm: md1_raid5 Janos Haar
2009-05-21  2:50 ` Paul E. McKenney
2009-05-21  4:46   ` Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies) Pid:902, " Janos Haar
2009-05-21  5:16     ` Paul E. McKenney
2009-05-21  6:50       ` Neil Brown
2009-05-21  9:50         ` Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies)Pid:902, " Janos Haar
2009-05-21 21:53         ` Janos Haar [this message]