linux-xfs.vger.kernel.org archive mirror
* [PATCH for 4.14] xfs_copy: don't hang if /all/ the targets hit write errors
@ 2017-11-16  1:14 Darrick J. Wong
  2017-11-16 21:10 ` Eric Sandeen
  0 siblings, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2017-11-16  1:14 UTC
  To: Eric Sandeen; +Cc: xfs, djwong

From: Darrick J. Wong <darrick.wong@oracle.com>

If xfs_copy is told to copy a filesystem and /all/ the writer threads
hit a write error, there won't be any threads to unlock mainwait, which
means that write_wbuf will deadlock with itself trying to lock mainwait.
Therefore, if we discover that all the writer threads are dead, just
bail out.

Discovered by running xfs/073 with a tiny test device.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 copy/xfs_copy.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
index 33e05df..fb37375 100644
--- a/copy/xfs_copy.c
+++ b/copy/xfs_copy.c
@@ -476,6 +476,7 @@ void
 write_wbuf(void)
 {
 	int		i;
+	int		badness = 0;
 
 	/* verify target threads */
 	for (i = 0; i < num_targets; i++)
@@ -486,6 +487,17 @@ write_wbuf(void)
 	for (i = 0; i < num_targets; i++)
 		if (target[i].state != INACTIVE)
 			pthread_mutex_unlock(&targ[i].wait);	/* wake up */
+		else
+			badness++;
+
+	/*
+	 * If all the targets are inactive then there won't be any io
+	 * threads left to release mainwait.  We're screwed, so bail out.
+	 */
+	if (badness == num_targets) {
+		check_errors();
+		exit(1);
+	}
 
 	signal_maskfunc(SIGCHLD, SIG_UNBLOCK);
 	pthread_mutex_lock(&mainwait);
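
The check added in this hunk can be tried on its own, outside xfs_copy. The block below is only a minimal sketch under stated assumptions, not the xfs_copy code: `struct target_sketch`, `T_ACTIVE`, `T_INACTIVE`, and `all_targets_inactive()` are hypothetical stand-ins, and the patch itself calls check_errors() and exit(1) where this sketch returns a value.

```c
/* Hypothetical, simplified stand-in for xfs_copy's per-target state. */
enum target_state_sketch { T_ACTIVE, T_INACTIVE };

struct target_sketch {
	enum target_state_sketch	state;
};

/*
 * Count inactive targets the way the patch counts "badness" and
 * report whether none are left.  Returns 1 when every target is
 * inactive (the case in which no writer thread remains to unlock
 * mainwait), 0 otherwise.  With num_targets == 0 this also returns 1.
 */
static int
all_targets_inactive(const struct target_sketch *t, int num_targets)
{
	int		i;
	int		badness = 0;

	for (i = 0; i < num_targets; i++)
		if (t[i].state == T_INACTIVE)
			badness++;

	return badness == num_targets;
}
```

A caller would test the return value before trying to take the lock that only a live writer thread could release.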

* Re: [PATCH for 4.14] xfs_copy: don't hang if /all/ the targets hit write errors
  2017-11-16  1:14 [PATCH for 4.14] xfs_copy: don't hang if /all/ the targets hit write errors Darrick J. Wong
@ 2017-11-16 21:10 ` Eric Sandeen
  2017-11-17  3:45   ` Darrick J. Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Sandeen @ 2017-11-16 21:10 UTC
  To: darrick.wong, Eric Sandeen; +Cc: xfs, djwong



On 11/15/17 7:14 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> If xfs_copy is told to copy a filesystem and /all/ the writer threads
> hit a write error, there won't be any threads to unlock mainwait, which
> means that write_wbuf will deadlock with itself trying to lock mainwait.
> Therefore, if we discover that all the writer threads are dead, just
> bail out.
> 
> Discovered by running xfs/073 with a tiny test device.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  copy/xfs_copy.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
> index 33e05df..fb37375 100644
> --- a/copy/xfs_copy.c
> +++ b/copy/xfs_copy.c
> @@ -476,6 +476,7 @@ void
>  write_wbuf(void)
>  {
>  	int		i;
> +	int		badness = 0;
>  
>  	/* verify target threads */
>  	for (i = 0; i < num_targets; i++)
> @@ -486,6 +487,17 @@ write_wbuf(void)
>  	for (i = 0; i < num_targets; i++)
>  		if (target[i].state != INACTIVE)
>  			pthread_mutex_unlock(&targ[i].wait);	/* wake up */
> +		else
> +			badness++;
> +
> +	/*
> +	 * If all the targets are inactive then there won't be any io
> +	 * threads left to release mainwait.  We're screwed, so bail out.
> +	 */
> +	if (badness == num_targets) {
> +		check_errors();

libxfs_umount(mp); ?

-Eric

> +		exit(1);
> +	}
>  
>  	signal_maskfunc(SIGCHLD, SIG_UNBLOCK);
>  	pthread_mutex_lock(&mainwait);

* Re: [PATCH for 4.14] xfs_copy: don't hang if /all/ the targets hit write errors
  2017-11-16 21:10 ` Eric Sandeen
@ 2017-11-17  3:45   ` Darrick J. Wong
  2017-11-17  4:48     ` Darrick J. Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2017-11-17  3:45 UTC
  To: Eric Sandeen; +Cc: Eric Sandeen, xfs, djwong

On Thu, Nov 16, 2017 at 03:10:39PM -0600, Eric Sandeen wrote:
> 
> 
> On 11/15/17 7:14 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > If xfs_copy is told to copy a filesystem and /all/ the writer threads
> > hit a write error, there won't be any threads to unlock mainwait, which
> > means that write_wbuf will deadlock with itself trying to lock mainwait.
> > Therefore, if we discover that all the writer threads are dead, just
> > bail out.
> > 
> > Discovered by running xfs/073 with a tiny test device.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  copy/xfs_copy.c |   12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
> > index 33e05df..fb37375 100644
> > --- a/copy/xfs_copy.c
> > +++ b/copy/xfs_copy.c
> > @@ -476,6 +476,7 @@ void
> >  write_wbuf(void)
> >  {
> >  	int		i;
> > +	int		badness = 0;
> >  
> >  	/* verify target threads */
> >  	for (i = 0; i < num_targets; i++)
> > @@ -486,6 +487,17 @@ write_wbuf(void)
> >  	for (i = 0; i < num_targets; i++)
> >  		if (target[i].state != INACTIVE)
> >  			pthread_mutex_unlock(&targ[i].wait);	/* wake up */
> > +		else
> > +			badness++;
> > +
> > +	/*
> > +	 * If all the targets are inactive then there won't be any io
> > +	 * threads left to release mainwait.  We're screwed, so bail out.
> > +	 */
> > +	if (badness == num_targets) {
> > +		check_errors();
> 
> libxfs_umount(mp); ?

Doh. v2 on its way

--D

> -Eric
> 
> > +		exit(1);
> > +	}
> >  
> >  	signal_maskfunc(SIGCHLD, SIG_UNBLOCK);
> >  	pthread_mutex_lock(&mainwait);

* Re: [PATCH for 4.14] xfs_copy: don't hang if /all/ the targets hit write errors
  2017-11-17  3:45   ` Darrick J. Wong
@ 2017-11-17  4:48     ` Darrick J. Wong
  2017-11-17  5:05       ` Eric Sandeen
  0 siblings, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2017-11-17  4:48 UTC
  To: Eric Sandeen; +Cc: Eric Sandeen, xfs, djwong

On Thu, Nov 16, 2017 at 07:45:09PM -0800, Darrick J. Wong wrote:
> On Thu, Nov 16, 2017 at 03:10:39PM -0600, Eric Sandeen wrote:
> > 
> > 
> > On 11/15/17 7:14 PM, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > If xfs_copy is told to copy a filesystem and /all/ the writer threads
> > > hit a write error, there won't be any threads to unlock mainwait, which
> > > means that write_wbuf will deadlock with itself trying to lock mainwait.
> > > Therefore, if we discover that all the writer threads are dead, just
> > > bail out.
> > > 
> > > Discovered by running xfs/073 with a tiny test device.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  copy/xfs_copy.c |   12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
> > > index 33e05df..fb37375 100644
> > > --- a/copy/xfs_copy.c
> > > +++ b/copy/xfs_copy.c
> > > @@ -476,6 +476,7 @@ void
> > >  write_wbuf(void)
> > >  {
> > >  	int		i;
> > > +	int		badness = 0;
> > >  
> > >  	/* verify target threads */
> > >  	for (i = 0; i < num_targets; i++)
> > > @@ -486,6 +487,17 @@ write_wbuf(void)
> > >  	for (i = 0; i < num_targets; i++)
> > >  		if (target[i].state != INACTIVE)
> > >  			pthread_mutex_unlock(&targ[i].wait);	/* wake up */
> > > +		else
> > > +			badness++;
> > > +
> > > +	/*
> > > +	 * If all the targets are inactive then there won't be any io
> > > +	 * threads left to release mainwait.  We're screwed, so bail out.
> > > +	 */
> > > +	if (badness == num_targets) {
> > > +		check_errors();
> > 
> > libxfs_umount(mp); ?
> 
> Doh. v2 on its way

Hmmm.  The other error bailouts don't call libxfs_umount and it hardly
matters since we're exiting anyway.  The mp is a local variable to main
so we'd have to convey abort status out of write_wbuf back to main.
That's a bigger change; do you want me to pursue that instead?
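
Purely as a sketch of what that bigger change could look like, and not a proposed patch: the idea would be for the writer to return an error instead of calling exit(1), so that the caller holding mp can unwind. Everything named here (`state_sketch`, `write_wbuf_status()`, `copy_step()`, the `cleaned_up` flag) is hypothetical and only stands in for the real xfs_copy structures.

```c
#include <errno.h>

/* Hypothetical stand-in for the per-target state. */
enum state_sketch { S_ACTIVE, S_INACTIVE };

/*
 * Hypothetical variant of write_wbuf(): report failure to the caller
 * instead of exiting.  Returns -EIO if every target is inactive and
 * 0 otherwise.
 */
static int
write_wbuf_status(const enum state_sketch *state, int num_targets)
{
	int		i;
	int		badness = 0;

	for (i = 0; i < num_targets; i++)
		if (state[i] == S_INACTIVE)
			badness++;

	return badness == num_targets ? -EIO : 0;
}

/*
 * Hypothetical caller standing in for the copy loop in main().
 * Returns the exit status the program would use.
 */
static int
copy_step(const enum state_sketch *state, int num_targets, int *cleaned_up)
{
	int		error;

	error = write_wbuf_status(state, num_targets);
	if (error) {
		/* main() owns mp, so libxfs_umount(mp) would go here. */
		*cleaned_up = 1;
		return 1;
	}
	return 0;
}
```

The cost is that every caller between the writer and main() has to check and pass the status along, which is why this is a larger change than the v1 patch.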

--D

> --D
> 
> > -Eric
> > 
> > > +		exit(1);
> > > +	}
> > >  
> > >  	signal_maskfunc(SIGCHLD, SIG_UNBLOCK);
> > >  	pthread_mutex_lock(&mainwait);

* Re: [PATCH for 4.14] xfs_copy: don't hang if /all/ the targets hit write errors
  2017-11-17  4:48     ` Darrick J. Wong
@ 2017-11-17  5:05       ` Eric Sandeen
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Sandeen @ 2017-11-17  5:05 UTC
  To: Darrick J. Wong; +Cc: Eric Sandeen, xfs, djwong



On 11/16/17 10:48 PM, Darrick J. Wong wrote:
> On Thu, Nov 16, 2017 at 07:45:09PM -0800, Darrick J. Wong wrote:
>> On Thu, Nov 16, 2017 at 03:10:39PM -0600, Eric Sandeen wrote:
>>>
>>>
>>> On 11/15/17 7:14 PM, Darrick J. Wong wrote:
>>>> From: Darrick J. Wong <darrick.wong@oracle.com>
>>>>
>>>> If xfs_copy is told to copy a filesystem and /all/ the writer threads
>>>> hit a write error, there won't be any threads to unlock mainwait, which
>>>> means that write_wbuf will deadlock with itself trying to lock mainwait.
>>>> Therefore, if we discover that all the writer threads are dead, just
>>>> bail out.
>>>>
>>>> Discovered by running xfs/073 with a tiny test device.
>>>>
>>>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>>>> ---
>>>>  copy/xfs_copy.c |   12 ++++++++++++
>>>>  1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
>>>> index 33e05df..fb37375 100644
>>>> --- a/copy/xfs_copy.c
>>>> +++ b/copy/xfs_copy.c
>>>> @@ -476,6 +476,7 @@ void
>>>>  write_wbuf(void)
>>>>  {
>>>>  	int		i;
>>>> +	int		badness = 0;
>>>>  
>>>>  	/* verify target threads */
>>>>  	for (i = 0; i < num_targets; i++)
>>>> @@ -486,6 +487,17 @@ write_wbuf(void)
>>>>  	for (i = 0; i < num_targets; i++)
>>>>  		if (target[i].state != INACTIVE)
>>>>  			pthread_mutex_unlock(&targ[i].wait);	/* wake up */
>>>> +		else
>>>> +			badness++;
>>>> +
>>>> +	/*
>>>> +	 * If all the targets are inactive then there won't be any io
>>>> +	 * threads left to release mainwait.  We're screwed, so bail out.
>>>> +	 */
>>>> +	if (badness == num_targets) {
>>>> +		check_errors();
>>>
>>> libxfs_umount(mp); ?
>>
>> Doh. v2 on its way
> 
> Hmmm.  The other error bailouts don't call libxfs_umount and it hardly
> matters since we're exiting anyway.  The mp is a local variable to main
> so we'd have to convey abort status out of write_wbuf back to main.
> That's a bigger change; do you want me to pursue that instead?

Eh if there's precedent for such sloppiness, I guess we can stick with
V1.  ;)

Thanks for checking.

-Eric

Thread overview: 5+ messages
2017-11-16  1:14 [PATCH for 4.14] xfs_copy: don't hang if /all/ the targets hit write errors Darrick J. Wong
2017-11-16 21:10 ` Eric Sandeen
2017-11-17  3:45   ` Darrick J. Wong
2017-11-17  4:48     ` Darrick J. Wong
2017-11-17  5:05       ` Eric Sandeen