From: Neil Brown <neilb@suse.de>
To: Adam Kwolek <adam.kwolek@intel.com>
Cc: linux-raid@vger.kernel.org, dan.j.williams@intel.com,
	ed.ciechanowski@intel.com
Subject: Re: [PATCH 10/10] FIX: wait_backup() sometimes hungs
Date: Fri, 3 Dec 2010 15:16:55 +1100
Message-ID: <20101203151655.04e48dff@notabene.brown>
In-Reply-To: <20101202081958.4639.17010.stgit@gklab-170-024.igk.intel.com>

On Thu, 02 Dec 2010 09:19:58 +0100 Adam Kwolek <adam.kwolek@intel.com> wrote:

> Sometimes wait_backup() misses the transition from reshape to idle state and mdadm seems to be hung.
> Add a 1 sec. timeout for waiting on select. This allows wait_backup to exit when the reshape has ended.
> 
> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> ---
> 
>  Grow.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/Grow.c b/Grow.c
> index 24c5c39..e16b1ad 100644
> --- a/Grow.c
> +++ b/Grow.c
> @@ -2074,10 +2074,14 @@ static int wait_backup(struct mdinfo *sra,
>  		sysfs_set_str(sra, NULL, "sync_action", "reshape");
>  	do {
>  		char action[20];
> +		struct timeval t;
> +
> +		t.tv_sec = 1;
> +		t.tv_usec = 0;
>  		fd_set rfds;
>  		FD_ZERO(&rfds);
>  		FD_SET(fd, &rfds);
> -		select(fd+1, NULL, NULL, &rfds, NULL);
> +		select(fd+1, NULL, NULL, &rfds, &t);
>  		if (sysfs_fd_get_ll(fd, &completed) < 0) {
>  			close(fd);
>  			return -1;


Thanks.  However I don't think the 1 second timeout is necessary.  This is
really the same problem as the previous one.  We just need to read
'completed' before the first 'select'.  Like this.

Thanks,
NeilBrown

commit 97bef35459306dfd291f40bc5221ad20ab9c21ba
Author: Adam Kwolek <adam.kwolek@intel.com>
Date:   Fri Dec 3 15:15:51 2010 +1100

    FIX: wait_backup() sometimes hungs
    
    Sometimes wait_backup() misses the transition from reshape to idle
    state and mdadm seems to be hung.  So check the 'completed' count
    *before* waiting rather than only after.
    
    Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
    Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/Grow.c b/Grow.c
index 3322cf7..99807b4 100644
--- a/Grow.c
+++ b/Grow.c
@@ -2058,12 +2058,17 @@ static int wait_backup(struct mdinfo *sra,
 	sysfs_set_num(sra, NULL, "sync_max", offset + blocks + blocks2);
 	if (offset == 0)
 		sysfs_set_str(sra, NULL, "sync_action", "reshape");
-	do {
+
+	if (sysfs_fd_get_ll(fd, &completed) < 0) {
+		close(fd);
+		return -1;
+	}
+	while (completed < offset + blocks) {
 		char action[20];
 		fd_set rfds;
 		FD_ZERO(&rfds);
 		FD_SET(fd, &rfds);
 		select(fd+1, NULL, NULL, &rfds, NULL);
 		if (sysfs_fd_get_ll(fd, &completed) < 0) {
 			close(fd);
 			return -1;
@@ -2072,7 +2077,7 @@ static int wait_backup(struct mdinfo *sra,
 				  action, 20) > 0 &&
 		    strncmp(action, "reshape", 7) != 0)
 			break;
-	} while (completed < offset + blocks);
+	}
 	close(fd);
 
 	if (part) {
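
To illustrate why reading 'completed' before the first select() matters,
here is a minimal sketch. It is not the mdadm code: struct progress,
get_completed() and wait_for_change() are hypothetical stand-ins for the
sync_completed fd, sysfs_fd_get_ll() and select(), so that the two loop
shapes can be compared without a real array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel-side reshape progress. */
struct progress {
	long long completed;	/* what a read of sync_completed would return */
	long long step;		/* progress made per wake-up */
	int waits;		/* number of times the caller blocked */
};

/* Stand-in for sysfs_fd_get_ll(): fetch the current counter. */
static int get_completed(struct progress *p, long long *out)
{
	assert(p != NULL && out != NULL);
	*out = p->completed;
	return 0;
}

/*
 * Stand-in for select(): returns once the counter has changed.
 * Against real sysfs, a wait issued after the final change may
 * never return; here we only count how often we would block.
 */
static void wait_for_change(struct progress *p)
{
	assert(p != NULL);
	p->waits++;
	p->completed += p->step;
}

/* Old shape: always wait at least once, then check. */
static int wait_after_only(struct progress *p, long long target)
{
	long long completed;

	do {
		wait_for_change(p);
		if (get_completed(p, &completed) < 0)
			return -1;
	} while (completed < target);
	return 0;
}

/* New shape: check first, wait only while there is work left. */
static int wait_check_first(struct progress *p, long long target)
{
	long long completed;

	if (get_completed(p, &completed) < 0)
		return -1;
	while (completed < target) {
		wait_for_change(p);
		if (get_completed(p, &completed) < 0)
			return -1;
	}
	return 0;
}
```

When the counter has already reached the target on entry, the
check-first loop never blocks, while the do/while loop blocks once; with
a real select() and no timeout, that one wait is where the hang occurs.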
