* [PATCH for 4.14] xfs_copy: don't hang if /all/ the targets hit write errors
@ 2017-11-16 1:14 Darrick J. Wong
2017-11-16 21:10 ` Eric Sandeen
0 siblings, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2017-11-16 1:14 UTC
To: Eric Sandeen; +Cc: xfs, djwong
From: Darrick J. Wong <darrick.wong@oracle.com>
If xfs_copy is told to copy a filesystem and /all/ the writer threads
hit a write error, there won't be any threads to unlock mainwait, which
means that write_wbuf will deadlock with itself trying to lock mainwait.
Therefore, if we discover that all the writer threads are dead, just
bail out.

Discovered by running xfs/073 with a tiny test device.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
copy/xfs_copy.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
index 33e05df..fb37375 100644
--- a/copy/xfs_copy.c
+++ b/copy/xfs_copy.c
@@ -476,6 +476,7 @@ void
 write_wbuf(void)
 {
 	int i;
+	int badness = 0;

 	/* verify target threads */
 	for (i = 0; i < num_targets; i++)
@@ -486,6 +487,17 @@ write_wbuf(void)
 	for (i = 0; i < num_targets; i++)
 		if (target[i].state != INACTIVE)
 			pthread_mutex_unlock(&targ[i].wait); /* wake up */
+		else
+			badness++;
+
+	/*
+	 * If all the targets are inactive then there won't be any io
+	 * threads left to release mainwait. We're screwed, so bail out.
+	 */
+	if (badness == num_targets) {
+		check_errors();
+		exit(1);
+	}

 	signal_maskfunc(SIGCHLD, SIG_UNBLOCK);
 	pthread_mutex_lock(&mainwait);
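
For readers following along, the check this patch adds can be sketched in isolation. This is only an illustration under assumptions, not the xfs_copy code: the enum and the helper name all_targets_inactive are hypothetical stand-ins, whereas the patch itself counts inactive targets inline in write_wbuf using target[i].state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-target state used by xfs_copy. */
enum target_state { ACTIVE, INACTIVE };

/*
 * Return true when every target is INACTIVE, i.e. when no writer
 * thread remains that could ever release the lock the main thread
 * is about to wait on.
 */
static bool all_targets_inactive(const enum target_state *state, size_t n)
{
	size_t dead = 0;

	for (size_t i = 0; i < n; i++)
		if (state[i] == INACTIVE)
			dead++;

	return dead == n;
}
```

The patch's condition, badness == num_targets, plays the same role: when it holds, locking mainwait would block forever, so the caller has to exit instead of waiting.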
* Re: [PATCH for 4.14] xfs_copy: don't hang if /all/ the targets hit write errors
From: Eric Sandeen @ 2017-11-16 21:10 UTC
To: darrick.wong, Eric Sandeen; +Cc: xfs, djwong
On 11/15/17 7:14 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> If xfs_copy is told to copy a filesystem and /all/ the writer threads
> hit a write error, there won't be any threads to unlock mainwait, which
> means that write_wbuf will deadlock with itself trying to lock mainwait.
> Therefore, if we discover that all the writer threads are dead, just
> bail out.
>
> Discovered by running xfs/073 with a tiny test device.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> copy/xfs_copy.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
> index 33e05df..fb37375 100644
> --- a/copy/xfs_copy.c
> +++ b/copy/xfs_copy.c
> @@ -476,6 +476,7 @@ void
> write_wbuf(void)
> {
> int i;
> + int badness = 0;
>
> /* verify target threads */
> for (i = 0; i < num_targets; i++)
> @@ -486,6 +487,17 @@ write_wbuf(void)
> for (i = 0; i < num_targets; i++)
> if (target[i].state != INACTIVE)
> pthread_mutex_unlock(&targ[i].wait); /* wake up */
> + else
> + badness++;
> +
> + /*
> + * If all the targets are inactive then there won't be any io
> + * threads left to release mainwait. We're screwed, so bail out.
> + */
> + if (badness == num_targets) {
> + check_errors();
libxfs_umount(mp); ?
-Eric
> + exit(1);
> + }
>
> signal_maskfunc(SIGCHLD, SIG_UNBLOCK);
> pthread_mutex_lock(&mainwait);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
* Re: [PATCH for 4.14] xfs_copy: don't hang if /all/ the targets hit write errors
From: Darrick J. Wong @ 2017-11-17 3:45 UTC
To: Eric Sandeen; +Cc: Eric Sandeen, xfs, djwong
On Thu, Nov 16, 2017 at 03:10:39PM -0600, Eric Sandeen wrote:
>
>
> On 11/15/17 7:14 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > If xfs_copy is told to copy a filesystem and /all/ the writer threads
> > hit a write error, there won't be any threads to unlock mainwait, which
> > means that write_wbuf will deadlock with itself trying to lock mainwait.
> > Therefore, if we discover that all the writer threads are dead, just
> > bail out.
> >
> > Discovered by running xfs/073 with a tiny test device.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > copy/xfs_copy.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
> > index 33e05df..fb37375 100644
> > --- a/copy/xfs_copy.c
> > +++ b/copy/xfs_copy.c
> > @@ -476,6 +476,7 @@ void
> > write_wbuf(void)
> > {
> > int i;
> > + int badness = 0;
> >
> > /* verify target threads */
> > for (i = 0; i < num_targets; i++)
> > @@ -486,6 +487,17 @@ write_wbuf(void)
> > for (i = 0; i < num_targets; i++)
> > if (target[i].state != INACTIVE)
> > pthread_mutex_unlock(&targ[i].wait); /* wake up */
> > + else
> > + badness++;
> > +
> > + /*
> > + * If all the targets are inactive then there won't be any io
> > + * threads left to release mainwait. We're screwed, so bail out.
> > + */
> > + if (badness == num_targets) {
> > + check_errors();
>
> libxfs_umount(mp); ?
Doh. v2 on its way
--D
> -Eric
>
> > + exit(1);
> > + }
> >
> > signal_maskfunc(SIGCHLD, SIG_UNBLOCK);
> > pthread_mutex_lock(&mainwait);
* Re: [PATCH for 4.14] xfs_copy: don't hang if /all/ the targets hit write errors
From: Darrick J. Wong @ 2017-11-17 4:48 UTC
To: Eric Sandeen; +Cc: Eric Sandeen, xfs, djwong
On Thu, Nov 16, 2017 at 07:45:09PM -0800, Darrick J. Wong wrote:
> On Thu, Nov 16, 2017 at 03:10:39PM -0600, Eric Sandeen wrote:
> >
> >
> > On 11/15/17 7:14 PM, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > >
> > > If xfs_copy is told to copy a filesystem and /all/ the writer threads
> > > > hit a write error, there won't be any threads to unlock mainwait, which
> > > means that write_wbuf will deadlock with itself trying to lock mainwait.
> > > Therefore, if we discover that all the writer threads are dead, just
> > > bail out.
> > >
> > > Discovered by running xfs/073 with a tiny test device.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > > copy/xfs_copy.c | 12 ++++++++++++
> > > 1 file changed, 12 insertions(+)
> > >
> > > diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
> > > index 33e05df..fb37375 100644
> > > --- a/copy/xfs_copy.c
> > > +++ b/copy/xfs_copy.c
> > > @@ -476,6 +476,7 @@ void
> > > write_wbuf(void)
> > > {
> > > int i;
> > > + int badness = 0;
> > >
> > > /* verify target threads */
> > > for (i = 0; i < num_targets; i++)
> > > @@ -486,6 +487,17 @@ write_wbuf(void)
> > > for (i = 0; i < num_targets; i++)
> > > if (target[i].state != INACTIVE)
> > > pthread_mutex_unlock(&targ[i].wait); /* wake up */
> > > + else
> > > + badness++;
> > > +
> > > + /*
> > > + * If all the targets are inactive then there won't be any io
> > > + * threads left to release mainwait. We're screwed, so bail out.
> > > + */
> > > + if (badness == num_targets) {
> > > + check_errors();
> >
> > libxfs_umount(mp); ?
>
> Doh. v2 on its way
Hmmm. The other error bailouts don't call libxfs_umount and it hardly
matters since we're exiting anyway. The mp is a local variable to main
so we'd have to convey abort status out of write_wbuf back to main.
That's a bigger change; do you want me to pursue that instead?
--D
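
The bigger change mentioned above, conveying abort status out of write_wbuf so that the caller can clean up, might look roughly like the sketch below. Everything here is hypothetical: struct copy_ctx, write_wbuf_status and copy_blocks are invented names for illustration, and the cleanups counter merely marks the spot where main would call libxfs_umount(mp). It does not reflect how xfs_copy is actually structured.

```c
/* Hypothetical context; not a structure that exists in xfs_copy. */
struct copy_ctx {
	int num_targets;	/* total number of copy targets */
	int num_inactive;	/* targets whose writer has given up */
	int cleanups;		/* stand-in for calling libxfs_umount(mp) */
};

/* Return 0 if at least one target is still alive, -1 if none are. */
static int write_wbuf_status(const struct copy_ctx *ctx)
{
	if (ctx->num_inactive == ctx->num_targets)
		return -1;
	return 0;
}

/*
 * Plays the role of main(): it owns the mount, so it is the one place
 * that can tear it down, whether the copy finished or was aborted.
 * Returns the process exit status (0 on success, 1 on abort).
 */
static int copy_blocks(struct copy_ctx *ctx, int nblocks)
{
	int ret = 0;

	for (int i = 0; i < nblocks; i++) {
		ret = write_wbuf_status(ctx);
		if (ret)
			break;	/* all writers are dead; stop copying */
	}

	ctx->cleanups++;	/* where libxfs_umount(mp) would go */
	return ret ? 1 : 0;
}
```

The trade-off is the one raised in this message: every caller of the write path has to check and propagate the status, which is a larger change than calling exit(1) in place.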
> --D
>
> > -Eric
> >
> > > + exit(1);
> > > + }
> > >
> > > signal_maskfunc(SIGCHLD, SIG_UNBLOCK);
> > > pthread_mutex_lock(&mainwait);
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH for 4.14] xfs_copy: don't hang if /all/ the targets hit write errors
From: Eric Sandeen @ 2017-11-17 5:05 UTC
To: Darrick J. Wong; +Cc: Eric Sandeen, xfs, djwong
On 11/16/17 10:48 PM, Darrick J. Wong wrote:
> On Thu, Nov 16, 2017 at 07:45:09PM -0800, Darrick J. Wong wrote:
>> On Thu, Nov 16, 2017 at 03:10:39PM -0600, Eric Sandeen wrote:
>>>
>>>
>>> On 11/15/17 7:14 PM, Darrick J. Wong wrote:
>>>> From: Darrick J. Wong <darrick.wong@oracle.com>
>>>>
>>>> If xfs_copy is told to copy a filesystem and /all/ the writer threads
>>>> hit a write error, there won't be any threads to unlock mainwait, which
>>>> means that write_wbuf will deadlock with itself trying to lock mainwait.
>>>> Therefore, if we discover that all the writer threads are dead, just
>>>> bail out.
>>>>
>>>> Discovered by running xfs/073 with a tiny test device.
>>>>
>>>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>>>> ---
>>>> copy/xfs_copy.c | 12 ++++++++++++
>>>> 1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
>>>> index 33e05df..fb37375 100644
>>>> --- a/copy/xfs_copy.c
>>>> +++ b/copy/xfs_copy.c
>>>> @@ -476,6 +476,7 @@ void
>>>> write_wbuf(void)
>>>> {
>>>> int i;
>>>> + int badness = 0;
>>>>
>>>> /* verify target threads */
>>>> for (i = 0; i < num_targets; i++)
>>>> @@ -486,6 +487,17 @@ write_wbuf(void)
>>>> for (i = 0; i < num_targets; i++)
>>>> if (target[i].state != INACTIVE)
>>>> pthread_mutex_unlock(&targ[i].wait); /* wake up */
>>>> + else
>>>> + badness++;
>>>> +
>>>> + /*
>>>> + * If all the targets are inactive then there won't be any io
>>>> + * threads left to release mainwait. We're screwed, so bail out.
>>>> + */
>>>> + if (badness == num_targets) {
>>>> + check_errors();
>>>
>>> libxfs_umount(mp); ?
>>
>> Doh. v2 on its way
>
> Hmmm. The other error bailouts don't call libxfs_umount and it hardly
> matters since we're exiting anyway. The mp is a local variable to main
> so we'd have to convey abort status out of write_wbuf back to main.
> That's a bigger change; do you want me to pursue that instead?
Eh if there's precedent for such sloppiness, I guess we can stick with
V1. ;)
Thanks for checking.
-Eric
Thread overview: 5+ messages
2017-11-16 1:14 [PATCH for 4.14] xfs_copy: don't hang if /all/ the targets hit write errors Darrick J. Wong
2017-11-16 21:10 ` Eric Sandeen
2017-11-17 3:45 ` Darrick J. Wong
2017-11-17 4:48 ` Darrick J. Wong
2017-11-17 5:05 ` Eric Sandeen