[PATCH] fix: mdadm -Ss for external metadata don't stop container

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] fix: mdadm -Ss for external metadata don't stop container
@ 2010-12-07  6:44 Hawrylewicz Czarnowski, Przemyslaw
  2010-12-07 10:16 ` Neil Brown
  0 siblings, 1 reply; 8+ messages in thread
From: Hawrylewicz Czarnowski, Przemyslaw @ 2010-12-07  6:44 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid@vger.kernel.org, Williams, Dan J, Ciechanowski, Ed,
	Labun, Marcin, Czarnowska, Anna

Neil,

The one below is a fix for the problem we encounter quite often when we try to stop all arrays with mdadm -Ss. The main problem is that mdmon holds open container device and then exits. The time that system make clean up is quite long and mdadm invokes ARRAY_STOP ioctl when device is still opened. 
Second resolution is to retry ioctl in mdadm after mdmon exits, but closing handle is I what should be done before process exist.
Take a look at the patch below:

--
Sometimes (~50%) mdadm -Ss cannot stop container as mdmon opens its device
and do not close it before exit(). The period between open and release of
handle is too long and md is not able stop device. Releasing handle before
exit does not block md.

Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
---
 monitor.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/monitor.c b/monitor.c
index 59b4181..f166bc8 100644
--- a/monitor.c
+++ b/monitor.c
@@ -525,6 +525,7 @@ static int wait_and_act(struct supertype *container, int nowait)
 				remove_pidfile(container->devname);
 			exit_now = 1;
 			signal_manager();
+			close(fd);
 			exit(0);
 		}
 	}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] fix: mdadm -Ss for external metadata don't stop container
  2010-12-07  6:44 [PATCH] fix: mdadm -Ss for external metadata don't stop container Hawrylewicz Czarnowski, Przemyslaw
@ 2010-12-07 10:16 ` Neil Brown
  2010-12-07 11:07   ` Hawrylewicz Czarnowski, Przemyslaw
  2011-03-16 22:24   ` Hawrylewicz Czarnowski, Przemyslaw
  0 siblings, 2 replies; 8+ messages in thread
From: Neil Brown @ 2010-12-07 10:16 UTC (permalink / raw)
  To: Hawrylewicz Czarnowski, Przemyslaw
  Cc: linux-raid@vger.kernel.org, Williams, Dan J, Ciechanowski, Ed,
	Labun, Marcin, Czarnowska, Anna

On Tue, 7 Dec 2010 06:44:21 +0000 "Hawrylewicz Czarnowski, Przemyslaw"
<przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:

> Neil,
> 
> The one below is a fix for the problem we encounter quite often when we try to stop all arrays with mdadm -Ss. The main problem is that mdmon holds open container device and then exits. The time that system make clean up is quite long and mdadm invokes ARRAY_STOP ioctl when device is still opened. 
> Second resolution is to retry ioctl in mdadm after mdmon exits, but closing handle is I what should be done before process exist.
> Take a look at the patch below:
> 
> --
> Sometimes (~50%) mdadm -Ss cannot stop container as mdmon opens its device
> and do not close it before exit(). The period between open and release of
> handle is too long and md is not able stop device. Releasing handle before
> exit does not block md.
> 
> Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>

I've applied this, but I'm not 100% sure it is completely safe.
mdmon holds the O_EXCL open to be sure that mdadm isn't creating or
assembling another array in the container.
mdadm will get an O_EXCL and then try sending a signal to mdmon.  If it
succeeds, it knows mdmon is still running.  But this patch might open a
window where mdadm can get O_EXCL, and a signal still works.

However I'm not certain that window wasn't already there, and this might just
make it a bit bigger.
I've put a note in my to-do list to look into this more closely and figure
out if there is a problem, and if so, how to fix it.

Thanks,
NeilBrown



> ---
>  monitor.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/monitor.c b/monitor.c
> index 59b4181..f166bc8 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -525,6 +525,7 @@ static int wait_and_act(struct supertype *container, int nowait)
>  				remove_pidfile(container->devname);
>  			exit_now = 1;
>  			signal_manager();
> +			close(fd);
>  			exit(0);
>  		}
>  	}


^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] fix: mdadm -Ss for external metadata don't stop container
  2010-12-07 10:16 ` Neil Brown
@ 2010-12-07 11:07   ` Hawrylewicz Czarnowski, Przemyslaw
  2010-12-07 16:09     ` Dan Williams
  2011-03-16 22:24   ` Hawrylewicz Czarnowski, Przemyslaw
  1 sibling, 1 reply; 8+ messages in thread
From: Hawrylewicz Czarnowski, Przemyslaw @ 2010-12-07 11:07 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid@vger.kernel.org, Williams, Dan J, Ciechanowski, Ed,
	Labun, Marcin, Czarnowska, Anna

> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Tuesday, December 07, 2010 11:16 AM
> To: Hawrylewicz Czarnowski, Przemyslaw
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed; Labun,
> Marcin; Czarnowska, Anna
> Subject: Re: [PATCH] fix: mdadm -Ss for external metadata don't stop
> container
> 
> On Tue, 7 Dec 2010 06:44:21 +0000 "Hawrylewicz Czarnowski, Przemyslaw"
> <przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:
> 
> > Neil,
> >
> > The one below is a fix for the problem we encounter quite often when we
> try to stop all arrays with mdadm -Ss. The main problem is that mdmon holds
> open container device and then exits. The time that system make clean up is
> quite long and mdadm invokes ARRAY_STOP ioctl when device is still opened.
> > Second resolution is to retry ioctl in mdadm after mdmon exits, but
> closing handle is I what should be done before process exist.
> > Take a look at the patch below:
> >
> > --
> > Sometimes (~50%) mdadm -Ss cannot stop container as mdmon opens its
> device
> > and do not close it before exit(). The period between open and release of
> > handle is too long and md is not able stop device. Releasing handle
> before
> > exit does not block md.
> >
> > Signed-off-by: Przemyslaw Czarnowski
> <przemyslaw.hawrylewicz.czarnowski@intel.com>
> 
> I've applied this, but I'm not 100% sure it is completely safe.
> mdmon holds the O_EXCL open to be sure that mdadm isn't creating or
> assembling another array in the container.
> mdadm will get an O_EXCL and then try sending a signal to mdmon.  If it
> succeeds, it knows mdmon is still running.  But this patch might open a
> window where mdadm can get O_EXCL, and a signal still works.
On the manual pages, behavior of O_EXCL is only defined in connection with O_CREAT flag, which is not present in open_dev_excl (of course:). I have just make test for open(name, O_RDWR | O_EXCL) few times on the same file and it does not block other processes...

> 
> However I'm not certain that window wasn't already there, and this might
> just
> make it a bit bigger.
> I've put a note in my to-do list to look into this more closely and figure
> out if there is a problem, and if so, how to fix it.
Yes, this fix do not close this issue completely. First, the window exist and mdadm still have a chance to hit it. Second - monitor should wait until manager finishes his work (what is not fulfilled right now). I have used "return -1" instead of exit(0), but manager seems to miss that ping preformed right before...

> 
> Thanks,
> NeilBrown
> 
> 
> 
> > ---
> >  monitor.c |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/monitor.c b/monitor.c
> > index 59b4181..f166bc8 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -525,6 +525,7 @@ static int wait_and_act(struct supertype *container,
> int nowait)
> >  				remove_pidfile(container->devname);
> >  			exit_now = 1;
> >  			signal_manager();
> > +			close(fd);
> >  			exit(0);
> >  		}
> >  	}


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fix: mdadm -Ss for external metadata don't stop container
  2010-12-07 11:07   ` Hawrylewicz Czarnowski, Przemyslaw
@ 2010-12-07 16:09     ` Dan Williams
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2010-12-07 16:09 UTC (permalink / raw)
  To: Hawrylewicz Czarnowski, Przemyslaw
  Cc: Neil Brown, linux-raid@vger.kernel.org, Ciechanowski, Ed,
	Labun, Marcin, Czarnowska, Anna

On 12/7/2010 3:07 AM, Hawrylewicz Czarnowski, Przemyslaw wrote:
>> I've applied this, but I'm not 100% sure it is completely safe.
>> mdmon holds the O_EXCL open to be sure that mdadm isn't creating or
>> assembling another array in the container.
>> mdadm will get an O_EXCL and then try sending a signal to mdmon.  If it
>> succeeds, it knows mdmon is still running.  But this patch might open a
>> window where mdadm can get O_EXCL, and a signal still works.
> On the manual pages, behavior of O_EXCL is only defined in connection with O_CREAT flag, which is not present in open_dev_excl (of course:). I have just make test for open(name, O_RDWR | O_EXCL) few times on the same file and it does not block other processes...

Try the same test on a device special file.

--
Dan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] fix: mdadm -Ss for external metadata don't stop container
  2010-12-07 10:16 ` Neil Brown
  2010-12-07 11:07   ` Hawrylewicz Czarnowski, Przemyslaw
@ 2011-03-16 22:24   ` Hawrylewicz Czarnowski, Przemyslaw
  2011-03-17  2:38     ` NeilBrown
  1 sibling, 1 reply; 8+ messages in thread
From: Hawrylewicz Czarnowski, Przemyslaw @ 2011-03-16 22:24 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid@vger.kernel.org, Williams, Dan J, Ciechanowski, Ed

> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Tuesday, December 07, 2010 11:16 AM
> To: Hawrylewicz Czarnowski, Przemyslaw
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed; Labun,
> Marcin; Czarnowska, Anna
> Subject: Re: [PATCH] fix: mdadm -Ss for external metadata don't stop
> container
> 
> On Tue, 7 Dec 2010 06:44:21 +0000 "Hawrylewicz Czarnowski, Przemyslaw"
> <przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:
> 
> > Neil,
> >
> > The one below is a fix for the problem we encounter quite often when we
> try to stop all arrays with mdadm -Ss. The main problem is that mdmon holds
> open container device and then exits. The time that system make clean up is
> quite long and mdadm invokes ARRAY_STOP ioctl when device is still opened.
> > Second resolution is to retry ioctl in mdadm after mdmon exits, but
> closing handle is I what should be done before process exist.
> > Take a look at the patch below:
> >
> > --
> > Sometimes (~50%) mdadm -Ss cannot stop container as mdmon opens its
> device
> > and do not close it before exit(). The period between open and release of
> > handle is too long and md is not able stop device. Releasing handle
> before
> > exit does not block md.
> >
> > Signed-off-by: Przemyslaw Czarnowski
> <przemyslaw.hawrylewicz.czarnowski@intel.com>
> 
> I've applied this, but I'm not 100% sure it is completely safe.
> mdmon holds the O_EXCL open to be sure that mdadm isn't creating or
> assembling another array in the container.
> mdadm will get an O_EXCL and then try sending a signal to mdmon.  If it
> succeeds, it knows mdmon is still running.  But this patch might open a
> window where mdadm can get O_EXCL, and a signal still works.

Hi,

As we test new RHEL6.1 snapshots the problem described in this thread shows
up very often. 
As you noticed above, opening container with O_EXCL flag, blocks other
processes to access container which is good (let us name it "case 1"). But 
the problem is that there is always a gap between the time mdmon releases
container handle (either by close() or exit()) and STOP_ARRAY ioctl
invoked by mdadm, where any other process can access container device and
block stop action (case 2).
Right now, ioctl is started without any synchronization, what often leads to
"busy" as mdmon asynchronously cleans up after subarrays are stopped.
First what comes to mind is to store mdmon pid, and create loop waiting for
mdmon to close right before ARRAY_STOP action. It will help with "case 1" but
do not block "case 2". I have noticed that such approach helps in about 90% of
all "busy" errors.
To block against both cases I think that kind mdmon/mdadm/kernel
synchronization is needed. Do you have any ideas how it can be made?

> 
> However I'm not certain that window wasn't already there, and this might
> just
> make it a bit bigger.
> I've put a note in my to-do list to look into this more closely and figure
> out if there is a problem, and if so, how to fix it.
Could you please move that problem to the top of your TODOs? :)

> 
> Thanks,
> NeilBrown
> 
> 
> 
> > ---
> >  monitor.c |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/monitor.c b/monitor.c
> > index 59b4181..f166bc8 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -525,6 +525,7 @@ static int wait_and_act(struct supertype *container,
> int nowait)
> >  				remove_pidfile(container->devname);
> >  			exit_now = 1;
> >  			signal_manager();
> > +			close(fd);
> >  			exit(0);
> >  		}
> >  	}


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fix: mdadm -Ss for external metadata don't stop container
  2011-03-16 22:24   ` Hawrylewicz Czarnowski, Przemyslaw
@ 2011-03-17  2:38     ` NeilBrown
  2011-03-17 16:51       ` Wojcik, Krzysztof
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2011-03-17  2:38 UTC (permalink / raw)
  To: Hawrylewicz Czarnowski, Przemyslaw
  Cc: linux-raid@vger.kernel.org, Williams, Dan J, Ciechanowski, Ed

On Wed, 16 Mar 2011 22:24:20 +0000 "Hawrylewicz Czarnowski, Przemyslaw"
<przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:

> > -----Original Message-----
> > From: Neil Brown [mailto:neilb@suse.de]
> > Sent: Tuesday, December 07, 2010 11:16 AM
> > To: Hawrylewicz Czarnowski, Przemyslaw
> > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed; Labun,
> > Marcin; Czarnowska, Anna
> > Subject: Re: [PATCH] fix: mdadm -Ss for external metadata don't stop
> > container
> > 
> > On Tue, 7 Dec 2010 06:44:21 +0000 "Hawrylewicz Czarnowski, Przemyslaw"
> > <przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:
> > 
> > > Neil,
> > >
> > > The one below is a fix for the problem we encounter quite often when we
> > try to stop all arrays with mdadm -Ss. The main problem is that mdmon holds
> > open container device and then exits. The time that system make clean up is
> > quite long and mdadm invokes ARRAY_STOP ioctl when device is still opened.
> > > Second resolution is to retry ioctl in mdadm after mdmon exits, but
> > closing handle is I what should be done before process exist.
> > > Take a look at the patch below:
> > >
> > > --
> > > Sometimes (~50%) mdadm -Ss cannot stop container as mdmon opens its
> > device
> > > and do not close it before exit(). The period between open and release of
> > > handle is too long and md is not able stop device. Releasing handle
> > before
> > > exit does not block md.
> > >
> > > Signed-off-by: Przemyslaw Czarnowski
> > <przemyslaw.hawrylewicz.czarnowski@intel.com>
> > 
> > I've applied this, but I'm not 100% sure it is completely safe.
> > mdmon holds the O_EXCL open to be sure that mdadm isn't creating or
> > assembling another array in the container.
> > mdadm will get an O_EXCL and then try sending a signal to mdmon.  If it
> > succeeds, it knows mdmon is still running.  But this patch might open a
> > window where mdadm can get O_EXCL, and a signal still works.
> 
> Hi,
> 
> As we test new RHEL6.1 snapshots the problem described in this thread shows
> up very often. 
> As you noticed above, opening container with O_EXCL flag, blocks other
> processes to access container which is good (let us name it "case 1"). But 
> the problem is that there is always a gap between the time mdmon releases
> container handle (either by close() or exit()) and STOP_ARRAY ioctl
> invoked by mdadm, where any other process can access container device and
> block stop action (case 2).
> Right now, ioctl is started without any synchronization, what often leads to
> "busy" as mdmon asynchronously cleans up after subarrays are stopped.
> First what comes to mind is to store mdmon pid, and create loop waiting for
> mdmon to close right before ARRAY_STOP action. It will help with "case 1" but
> do not block "case 2". I have noticed that such approach helps in about 90% of
> all "busy" errors.
> To block against both cases I think that kind mdmon/mdadm/kernel
> synchronization is needed. Do you have any ideas how it can be made?

Yes.  I think we can make use of O_EXCL, and a little bit of retry-on-failure
to make this all more reliable.  See patch below which will appear in my
devel-3.2 tree later today.

If this doesn't fix your symptom, then it is possible I have misunderstood.
In that case I'll get some more details and try again.

> 
> > 
> > However I'm not certain that window wasn't already there, and this might
> > just
> > make it a bit bigger.
> > I've put a note in my to-do list to look into this more closely and figure
> > out if there is a problem, and if so, how to fix it.
> Could you please move that problem to the top of your TODOs? :)

The concern that I had was unfounded.  Because mdmon removes its pid file
before closing the fd (and losing the O_EXCL lock), there is no race.

Thanks,
NeilBrown

commit eb0af52689656f6526540ee3a72b0647e7a7b20d
Author: NeilBrown <neilb@suse.de>
Date:   Thu Mar 17 13:35:10 2011 +1100

    --stop: separate 'is busy' test for 'did it stop properly'.
    
    Stopping an md array requires that there is no other user of it.
    However with udev and udisks and such there can be transient other
    users of md devices which can interfere with stopping the array.
    
    If there is a transient users, we really want "mdadm --stop" to wait a
    little while and retry.
    However if the array is genuinely in-use (e.g. mounted), then we
    don't want to wait at all - we want to fail immediately.
    
    So before trying to stop, re-open device with O_EXCL.  If this fails
    then the device is probably in use, so give up.
    
    If it succeeds, but a subsequent STOP_ARRAY fails, then it is possibly
    a transient failure, so try again for a few seconds.
    
    Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/Manage.c b/Manage.c
index 5996646..3361269 100644
--- a/Manage.c
+++ b/Manage.c
@@ -208,10 +208,26 @@ int Manage_runstop(char *devname, int fd, int runstop, int quiet)
 		struct stat stb;
 		struct mdinfo *mdi;
 		int devnum;
+		int err;
+		int count;
 		/* If this is an mdmon managed array, just write 'inactive'
 		 * to the array state and let mdmon clear up.
 		 */
 		devnum = fd2devnum(fd);
+		/* Get EXCL access first.  If this fails, then attempting
+		 * to stop is probably a bad idea.
+		 */
+		close(fd);
+		fd = open(devname, O_RDONLY|O_EXCL);
+		if (fd < 0 || fd2devnum(fd) != devnum) {
+			if (fd >= 0)
+				close(fd);
+			fprintf(stderr,
+				Name ": Cannot get exclusive access to %s:"
+				" possibly it is still in use.\n",
+				devname);
+			return 1;
+		}
 		mdi = sysfs_read(fd, -1, GET_LEVEL|GET_VERSION);
 		if (mdi &&
 		    mdi->array.level > 0 &&
@@ -230,7 +246,14 @@ int Manage_runstop(char *devname, int fd, int runstop, int quiet)
 			/* Give monitor a chance to act */
 			ping_monitor(mdi->text_version);
 
-			fd = open(devname, O_RDONLY);
+			fd = open_dev_excl(devnum);
+			if (fd < 0) {
+				fprintf(stderr, Name
+					": failed to completely stop %s"
+					": Device is busy\n",
+					devname);
+				return 1;
+			}
 		} else if (mdi &&
 			   mdi->array.major_version == -1 &&
 			   mdi->array.minor_version == -2 &&
@@ -263,7 +286,18 @@ int Manage_runstop(char *devname, int fd, int runstop, int quiet)
 				}
 		}
 
-		if (fd >= 0 && ioctl(fd, STOP_ARRAY, NULL)) {
+		/* As we have an O_EXCL open, any use of the device
+		 * which blocks STOP_ARRAY is probably a transient use,
+		 * so it is reasonable to retry for a while - 5 seconds.
+		 */
+		count = 25;
+		while (count && fd >= 0
+		       && (err = ioctl(fd, STOP_ARRAY, NULL)) < 0
+		       && errno == EBUSY) {
+			usleep(200000);
+			count --;
+		}
+		if (fd >= 0 && err) {
 			if (quiet == 0) {
 				fprintf(stderr, Name
 					": failed to stop array %s: %s\n",


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* RE: [PATCH] fix: mdadm -Ss for external metadata don't stop container
  2011-03-17  2:38     ` NeilBrown
@ 2011-03-17 16:51       ` Wojcik, Krzysztof
  2011-03-17 21:08         ` Hawrylewicz Czarnowski, Przemyslaw
  0 siblings, 1 reply; 8+ messages in thread
From: Wojcik, Krzysztof @ 2011-03-17 16:51 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-raid@vger.kernel.org, Ciechanowski, Ed,
	Hawrylewicz Czarnowski, Przemyslaw

Neil,

We tested proposed patch. It resolve issue discussed bellow.
But it is not full resolution for problems with array stopping.
It turned out that another issue exists.
If we try to write 'inactive' do array state, sometimes operation fails.
I've send patch for this. Please review it and apply if ok.

Regards
Krzysztof

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of NeilBrown
> Sent: Thursday, March 17, 2011 3:38 AM
> To: Hawrylewicz Czarnowski, Przemyslaw
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed
> Subject: Re: [PATCH] fix: mdadm -Ss for external metadata don't stop
> container
> 
> On Wed, 16 Mar 2011 22:24:20 +0000 "Hawrylewicz Czarnowski, Przemyslaw"
> <przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:
> 
> > > -----Original Message-----
> > > From: Neil Brown [mailto:neilb@suse.de]
> > > Sent: Tuesday, December 07, 2010 11:16 AM
> > > To: Hawrylewicz Czarnowski, Przemyslaw
> > > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> Labun,
> > > Marcin; Czarnowska, Anna
> > > Subject: Re: [PATCH] fix: mdadm -Ss for external metadata don't
> stop
> > > container
> > >
> > > On Tue, 7 Dec 2010 06:44:21 +0000 "Hawrylewicz Czarnowski,
> Przemyslaw"
> > > <przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:
> > >
> > > > Neil,
> > > >
> > > > The one below is a fix for the problem we encounter quite often
> when we
> > > try to stop all arrays with mdadm -Ss. The main problem is that
> mdmon holds
> > > open container device and then exits. The time that system make
> clean up is
> > > quite long and mdadm invokes ARRAY_STOP ioctl when device is still
> opened.
> > > > Second resolution is to retry ioctl in mdadm after mdmon exits,
> but
> > > closing handle is I what should be done before process exist.
> > > > Take a look at the patch below:
> > > >
> > > > --
> > > > Sometimes (~50%) mdadm -Ss cannot stop container as mdmon opens
> its
> > > device
> > > > and do not close it before exit(). The period between open and
> release of
> > > > handle is too long and md is not able stop device. Releasing
> handle
> > > before
> > > > exit does not block md.
> > > >
> > > > Signed-off-by: Przemyslaw Czarnowski
> > > <przemyslaw.hawrylewicz.czarnowski@intel.com>
> > >
> > > I've applied this, but I'm not 100% sure it is completely safe.
> > > mdmon holds the O_EXCL open to be sure that mdadm isn't creating or
> > > assembling another array in the container.
> > > mdadm will get an O_EXCL and then try sending a signal to mdmon.
> If it
> > > succeeds, it knows mdmon is still running.  But this patch might
> open a
> > > window where mdadm can get O_EXCL, and a signal still works.
> >
> > Hi,
> >
> > As we test new RHEL6.1 snapshots the problem described in this thread
> shows
> > up very often.
> > As you noticed above, opening container with O_EXCL flag, blocks
> other
> > processes to access container which is good (let us name it "case
> 1"). But
> > the problem is that there is always a gap between the time mdmon
> releases
> > container handle (either by close() or exit()) and STOP_ARRAY ioctl
> > invoked by mdadm, where any other process can access container device
> and
> > block stop action (case 2).
> > Right now, ioctl is started without any synchronization, what often
> leads to
> > "busy" as mdmon asynchronously cleans up after subarrays are stopped.
> > First what comes to mind is to store mdmon pid, and create loop
> waiting for
> > mdmon to close right before ARRAY_STOP action. It will help with
> "case 1" but
> > do not block "case 2". I have noticed that such approach helps in
> about 90% of
> > all "busy" errors.
> > To block against both cases I think that kind mdmon/mdadm/kernel
> > synchronization is needed. Do you have any ideas how it can be made?
> 
> Yes.  I think we can make use of O_EXCL, and a little bit of retry-on-
> failure
> to make this all more reliable.  See patch below which will appear in
> my
> devel-3.2 tree later today.
> 
> If this doesn't fix your symptom, then it is possible I have
> misunderstood.
> In that case I'll get some more details and try again.
> 
> >
> > >
> > > However I'm not certain that window wasn't already there, and this
> might
> > > just
> > > make it a bit bigger.
> > > I've put a note in my to-do list to look into this more closely and
> figure
> > > out if there is a problem, and if so, how to fix it.
> > Could you please move that problem to the top of your TODOs? :)
> 
> The concern that I had was unfounded.  Because mdmon removes its pid
> file
> before closing the fd (and losing the O_EXCL lock), there is no race.
> 
> Thanks,
> NeilBrown
> 
> commit eb0af52689656f6526540ee3a72b0647e7a7b20d
> Author: NeilBrown <neilb@suse.de>
> Date:   Thu Mar 17 13:35:10 2011 +1100
> 
>     --stop: separate 'is busy' test for 'did it stop properly'.
> 
>     Stopping an md array requires that there is no other user of it.
>     However with udev and udisks and such there can be transient other
>     users of md devices which can interfere with stopping the array.
> 
>     If there is a transient users, we really want "mdadm --stop" to
> wait a
>     little while and retry.
>     However if the array is genuinely in-use (e.g. mounted), then we
>     don't want to wait at all - we want to fail immediately.
> 
>     So before trying to stop, re-open device with O_EXCL.  If this
> fails
>     then the device is probably in use, so give up.
> 
>     If it succeeds, but a subsequent STOP_ARRAY fails, then it is
> possibly
>     a transient failure, so try again for a few seconds.
> 
>     Signed-off-by: NeilBrown <neilb@suse.de>
> 
> diff --git a/Manage.c b/Manage.c
> index 5996646..3361269 100644
> --- a/Manage.c
> +++ b/Manage.c
> @@ -208,10 +208,26 @@ int Manage_runstop(char *devname, int fd, int
> runstop, int quiet)
>  		struct stat stb;
>  		struct mdinfo *mdi;
>  		int devnum;
> +		int err;
> +		int count;
>  		/* If this is an mdmon managed array, just write 'inactive'
>  		 * to the array state and let mdmon clear up.
>  		 */
>  		devnum = fd2devnum(fd);
> +		/* Get EXCL access first.  If this fails, then attempting
> +		 * to stop is probably a bad idea.
> +		 */
> +		close(fd);
> +		fd = open(devname, O_RDONLY|O_EXCL);
> +		if (fd < 0 || fd2devnum(fd) != devnum) {
> +			if (fd >= 0)
> +				close(fd);
> +			fprintf(stderr,
> +				Name ": Cannot get exclusive access to %s:"
> +				" possibly it is still in use.\n",
> +				devname);
> +			return 1;
> +		}
>  		mdi = sysfs_read(fd, -1, GET_LEVEL|GET_VERSION);
>  		if (mdi &&
>  		    mdi->array.level > 0 &&
> @@ -230,7 +246,14 @@ int Manage_runstop(char *devname, int fd, int
> runstop, int quiet)
>  			/* Give monitor a chance to act */
>  			ping_monitor(mdi->text_version);
> 
> -			fd = open(devname, O_RDONLY);
> +			fd = open_dev_excl(devnum);
> +			if (fd < 0) {
> +				fprintf(stderr, Name
> +					": failed to completely stop %s"
> +					": Device is busy\n",
> +					devname);
> +				return 1;
> +			}
>  		} else if (mdi &&
>  			   mdi->array.major_version == -1 &&
>  			   mdi->array.minor_version == -2 &&
> @@ -263,7 +286,18 @@ int Manage_runstop(char *devname, int fd, int
> runstop, int quiet)
>  				}
>  		}
> 
> -		if (fd >= 0 && ioctl(fd, STOP_ARRAY, NULL)) {
> +		/* As we have an O_EXCL open, any use of the device
> +		 * which blocks STOP_ARRAY is probably a transient use,
> +		 * so it is reasonable to retry for a while - 5 seconds.
> +		 */
> +		count = 25;
> +		while (count && fd >= 0
> +		       && (err = ioctl(fd, STOP_ARRAY, NULL)) < 0
> +		       && errno == EBUSY) {
> +			usleep(200000);
> +			count --;
> +		}
> +		if (fd >= 0 && err) {
>  			if (quiet == 0) {
>  				fprintf(stderr, Name
>  					": failed to stop array %s: %s\n",
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] fix: mdadm -Ss for external metadata don't stop container
  2011-03-17 16:51       ` Wojcik, Krzysztof
@ 2011-03-17 21:08         ` Hawrylewicz Czarnowski, Przemyslaw
  0 siblings, 0 replies; 8+ messages in thread
From: Hawrylewicz Czarnowski, Przemyslaw @ 2011-03-17 21:08 UTC (permalink / raw)
  To: Wojcik, Krzysztof, NeilBrown; +Cc: linux-raid@vger.kernel.org, Ciechanowski, Ed


> -----Original Message-----
> From: Wojcik, Krzysztof
> Sent: Thursday, March 17, 2011 5:51 PM
> To: NeilBrown
> Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Hawrylewicz Czarnowski,
> Przemyslaw
> Subject: RE: [PATCH] fix: mdadm -Ss for external metadata don't stop
> container
> 
> Neil,
> 
> We tested proposed patch. It resolve issue discussed bellow.
> But it is not full resolution for problems with array stopping.
> It turned out that another issue exists.
> If we try to write 'inactive' do array state, sometimes operation fails.
> I've send patch for this. Please review it and apply if ok.
> 

On more note: bigger chance that problem shows up can be observed if in mdstat 
container is listed before array (almost always in RHEL, rarely in SLES). Problem
with busy array (not container) is caused mostly by "blockid" and "udisks-part-id"
started by udev.

Przemyslaw Hawrylewicz-Czarnowski

> Regards
> Krzysztof
> 
> > -----Original Message-----
> > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> > owner@vger.kernel.org] On Behalf Of NeilBrown
> > Sent: Thursday, March 17, 2011 3:38 AM
> > To: Hawrylewicz Czarnowski, Przemyslaw
> > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed
> > Subject: Re: [PATCH] fix: mdadm -Ss for external metadata don't stop
> > container
> >
> > On Wed, 16 Mar 2011 22:24:20 +0000 "Hawrylewicz Czarnowski, Przemyslaw"
> > <przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:
> >
> > > > -----Original Message-----
> > > > From: Neil Brown [mailto:neilb@suse.de]
> > > > Sent: Tuesday, December 07, 2010 11:16 AM
> > > > To: Hawrylewicz Czarnowski, Przemyslaw
> > > > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > Labun,
> > > > Marcin; Czarnowska, Anna
> > > > Subject: Re: [PATCH] fix: mdadm -Ss for external metadata don't
> > stop
> > > > container
> > > >
> > > > On Tue, 7 Dec 2010 06:44:21 +0000 "Hawrylewicz Czarnowski,
> > Przemyslaw"
> > > > <przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:
> > > >
> > > > > Neil,
> > > > >
> > > > > The one below is a fix for the problem we encounter quite often
> > when we
> > > > try to stop all arrays with mdadm -Ss. The main problem is that
> > mdmon holds
> > > > open container device and then exits. The time that system make
> > clean up is
> > > > quite long and mdadm invokes ARRAY_STOP ioctl when device is still
> > opened.
> > > > > Second resolution is to retry ioctl in mdadm after mdmon exits,
> > but
> > > > closing handle is I what should be done before process exist.
> > > > > Take a look at the patch below:
> > > > >
> > > > > --
> > > > > Sometimes (~50%) mdadm -Ss cannot stop container as mdmon opens
> > its
> > > > device
> > > > > and do not close it before exit(). The period between open and
> > release of
> > > > > handle is too long and md is not able stop device. Releasing
> > handle
> > > > before
> > > > > exit does not block md.
> > > > >
> > > > > Signed-off-by: Przemyslaw Czarnowski
> > > > <przemyslaw.hawrylewicz.czarnowski@intel.com>
> > > >
> > > > I've applied this, but I'm not 100% sure it is completely safe.
> > > > mdmon holds the O_EXCL open to be sure that mdadm isn't creating or
> > > > assembling another array in the container.
> > > > mdadm will get an O_EXCL and then try sending a signal to mdmon.
> > If it
> > > > succeeds, it knows mdmon is still running.  But this patch might
> > open a
> > > > window where mdadm can get O_EXCL, and a signal still works.
> > >
> > > Hi,
> > >
> > > As we test new RHEL6.1 snapshots the problem described in this thread
> > shows
> > > up very often.
> > > As you noticed above, opening container with O_EXCL flag, blocks
> > other
> > > processes to access container which is good (let us name it "case
> > 1"). But
> > > the problem is that there is always a gap between the time mdmon
> > releases
> > > container handle (either by close() or exit()) and STOP_ARRAY ioctl
> > > invoked by mdadm, where any other process can access container device
> > and
> > > block stop action (case 2).
> > > Right now, ioctl is started without any synchronization, what often
> > leads to
> > > "busy" as mdmon asynchronously cleans up after subarrays are stopped.
> > > First what comes to mind is to store mdmon pid, and create loop
> > waiting for
> > > mdmon to close right before ARRAY_STOP action. It will help with
> > "case 1" but
> > > do not block "case 2". I have noticed that such approach helps in
> > about 90% of
> > > all "busy" errors.
> > > To block against both cases I think that kind mdmon/mdadm/kernel
> > > synchronization is needed. Do you have any ideas how it can be made?
> >
> > Yes.  I think we can make use of O_EXCL, and a little bit of retry-on-
> > failure
> > to make this all more reliable.  See patch below which will appear in
> > my
> > devel-3.2 tree later today.
> >
> > If this doesn't fix your symptom, then it is possible I have
> > misunderstood.
> > In that case I'll get some more details and try again.
> >
> > >
> > > >
> > > > However I'm not certain that window wasn't already there, and this
> > might
> > > > just
> > > > make it a bit bigger.
> > > > I've put a note in my to-do list to look into this more closely and
> > figure
> > > > out if there is a problem, and if so, how to fix it.
> > > Could you please move that problem to the top of your TODOs? :)
> >
> > The concern that I had was unfounded.  Because mdmon removes its pid
> > file
> > before closing the fd (and losing the O_EXCL lock), there is no race.
> >
> > Thanks,
> > NeilBrown
> >
> > commit eb0af52689656f6526540ee3a72b0647e7a7b20d
> > Author: NeilBrown <neilb@suse.de>
> > Date:   Thu Mar 17 13:35:10 2011 +1100
> >
> >     --stop: separate 'is busy' test for 'did it stop properly'.
> >
> >     Stopping an md array requires that there is no other user of it.
> >     However with udev and udisks and such there can be transient other
> >     users of md devices which can interfere with stopping the array.
> >
> >     If there is a transient users, we really want "mdadm --stop" to
> > wait a
> >     little while and retry.
> >     However if the array is genuinely in-use (e.g. mounted), then we
> >     don't want to wait at all - we want to fail immediately.
> >
> >     So before trying to stop, re-open device with O_EXCL.  If this
> > fails
> >     then the device is probably in use, so give up.
> >
> >     If it succeeds, but a subsequent STOP_ARRAY fails, then it is
> > possibly
> >     a transient failure, so try again for a few seconds.
> >
> >     Signed-off-by: NeilBrown <neilb@suse.de>
> >
> > diff --git a/Manage.c b/Manage.c
> > index 5996646..3361269 100644
> > --- a/Manage.c
> > +++ b/Manage.c
> > @@ -208,10 +208,26 @@ int Manage_runstop(char *devname, int fd, int
> > runstop, int quiet)
> >  		struct stat stb;
> >  		struct mdinfo *mdi;
> >  		int devnum;
> > +		int err;
> > +		int count;
> >  		/* If this is an mdmon managed array, just write 'inactive'
> >  		 * to the array state and let mdmon clear up.
> >  		 */
> >  		devnum = fd2devnum(fd);
> > +		/* Get EXCL access first.  If this fails, then attempting
> > +		 * to stop is probably a bad idea.
> > +		 */
> > +		close(fd);
> > +		fd = open(devname, O_RDONLY|O_EXCL);
> > +		if (fd < 0 || fd2devnum(fd) != devnum) {
> > +			if (fd >= 0)
> > +				close(fd);
> > +			fprintf(stderr,
> > +				Name ": Cannot get exclusive access to %s:"
> > +				" possibly it is still in use.\n",
> > +				devname);
> > +			return 1;
> > +		}
> >  		mdi = sysfs_read(fd, -1, GET_LEVEL|GET_VERSION);
> >  		if (mdi &&
> >  		    mdi->array.level > 0 &&
> > @@ -230,7 +246,14 @@ int Manage_runstop(char *devname, int fd, int
> > runstop, int quiet)
> >  			/* Give monitor a chance to act */
> >  			ping_monitor(mdi->text_version);
> >
> > -			fd = open(devname, O_RDONLY);
> > +			fd = open_dev_excl(devnum);
> > +			if (fd < 0) {
> > +				fprintf(stderr, Name
> > +					": failed to completely stop %s"
> > +					": Device is busy\n",
> > +					devname);
> > +				return 1;
> > +			}
> >  		} else if (mdi &&
> >  			   mdi->array.major_version == -1 &&
> >  			   mdi->array.minor_version == -2 &&
> > @@ -263,7 +286,18 @@ int Manage_runstop(char *devname, int fd, int
> > runstop, int quiet)
> >  				}
> >  		}
> >
> > -		if (fd >= 0 && ioctl(fd, STOP_ARRAY, NULL)) {
> > +		/* As we have an O_EXCL open, any use of the device
> > +		 * which blocks STOP_ARRAY is probably a transient use,
> > +		 * so it is reasonable to retry for a while - 5 seconds.
> > +		 */
> > +		count = 25;
> > +		while (count && fd >= 0
> > +		       && (err = ioctl(fd, STOP_ARRAY, NULL)) < 0
> > +		       && errno == EBUSY) {
> > +			usleep(200000);
> > +			count --;
> > +		}
> > +		if (fd >= 0 && err) {
> >  			if (quiet == 0) {
> >  				fprintf(stderr, Name
> >  					": failed to stop array %s: %s\n",
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid"
> > in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-03-17 21:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-12-07  6:44 [PATCH] fix: mdadm -Ss for external metadata don't stop container Hawrylewicz Czarnowski, Przemyslaw
2010-12-07 10:16 ` Neil Brown
2010-12-07 11:07   ` Hawrylewicz Czarnowski, Przemyslaw
2010-12-07 16:09     ` Dan Williams
2011-03-16 22:24   ` Hawrylewicz Czarnowski, Przemyslaw
2011-03-17  2:38     ` NeilBrown
2011-03-17 16:51       ` Wojcik, Krzysztof
2011-03-17 21:08         ` Hawrylewicz Czarnowski, Przemyslaw

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).