public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* mkfs.xfs pagefault when removed storage during operation
@ 2011-02-01 11:06 Ajeet Yadav
  2011-02-02  8:09 ` Ajeet Yadav
  2011-02-03  4:10 ` Eric Sandeen
  0 siblings, 2 replies; 7+ messages in thread
From: Ajeet Yadav @ 2011-02-01 11:06 UTC (permalink / raw)
  To: xfs

We are testing mkfs.xfs and xfs_repair stability to look for crashes
and other issues specially with removable devices.
And unfortunately crashes does occur.
Code inspection shows in most cases the caller does not handle
libxfs_readbuf() for error cases i.e when return value = NULL.

Now I need your suggestion.
We should fix all such cases or the simplest way is to exit... if
read() or write() fails with EIO errorno in libxfs_readbufr() and
libxfs_writebufr().
Fortunately these function already support exit, if we use flag
LIBXFS_EXIT_ON_FAILURE, LIBXFS_B_EXIT but they are used selectively.

The current problem is related to function libxfs_trans_read_buf()

       bp = libxfs_readbuf(dev, blkno, len, flags);
#ifdef XACT_DEBUG
        fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp);
#endif
        xfs_buf_item_init(bp, tp->t_mountp);
        bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
        bip->bli_recur = 0;
        xfs_trans_add_item(tp, (xfs_log_item_t *)bip);

        /* initialise b_fsprivate2 so we can find it incore */
        XFS_BUF_SET_FSPRIVATE2(bp, tp);
        *bpp = bp;
        return 0;

if  libxfs_readbuf() fails due to device removal or other error, bp = NULL.
In function xfs_buf_item_init(bp, tp->t_mountp) as soon as bp is
dereferenced occurs

mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: mkfs.xfs pagefault when removed storage during operation
  2011-02-01 11:06 mkfs.xfs pagefault when removed storage during operation Ajeet Yadav
@ 2011-02-02  8:09 ` Ajeet Yadav
  2011-02-08  1:35   ` Alex Elder
  2011-02-03  4:10 ` Eric Sandeen
  1 sibling, 1 reply; 7+ messages in thread
From: Ajeet Yadav @ 2011-02-02  8:09 UTC (permalink / raw)
  To: xfs

If I see the current sigfault, its easy to fix adding one more patch
to xfsprogs.

diff -Nurp xfsprogs-3.0.5/libxfs/rdwr.c xfsprogs-3.0.5-dirty/libxfs/rdwr.c
--- xfsprogs-3.0.5/libxfs/rdwr.c        2011-01-28 20:22:11.000000000 +0900
+++ xfsprogs-3.0.5-dirty/libxfs/rdwr.c  2011-02-02 16:59:16.000000000 +0900
@@ -207,10 +207,11 @@ libxfs_trace_readbuf(const char *func, c
 {
        xfs_buf_t       *bp = libxfs_readbuf(dev, blkno, len, flags);

-       bp->b_func = func;
-       bp->b_file = file;
-       bp->b_line = line;
-
+       if (bp){
+               bp->b_func = func;
+               bp->b_file = file;
+               bp->b_line = line;
+       }
        return bp;
 }

@@ -485,6 +486,7 @@ libxfs_readbuf(dev_t dev, xfs_daddr_t bl
                error = libxfs_readbufr(dev, blkno, bp, len, flags);
                if (error) {
                        libxfs_putbuf(bp);
+                       errno = error;
                        return NULL;
                }
        }
diff -Nurp xfsprogs-3.0.5/libxfs/trans.c xfsprogs-3.0.5-dirty/libxfs/trans.c
--- xfsprogs-3.0.5/libxfs/trans.c       2011-01-28 20:22:11.000000000 +0900
+++ xfsprogs-3.0.5-dirty/libxfs/trans.c 2011-02-02 17:00:42.000000000 +0900
@@ -508,6 +508,10 @@ libxfs_trans_read_buf(
        }

        bp = libxfs_readbuf(dev, blkno, len, flags);
+       if (!bp){
+               *bpp = NULL;
+               return errno;
+       }
 #ifdef XACT_DEBUG
        fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp);
 #endif


But when I start reviewing the complete project w.r.t read() /
read64() / write() / write64() more importantly libxfs_readbufr() /
libxfs_writebufr().
I find error handing is broken at may places and I get my self lost in
m^n complexity also errno is lost.. therefore caller cannot examine
the exact error,

Back again I think, What if I exit on error ? Does xfsprogs uses
read() / write() error as a part of its functionality, for example
does xfs_repair uses these errors as a part of repair funtionality.

diff -Nurp xfsprogs/libxfs/rdwr.c xfsprogs-dirty/libxfs/rdwr.c
--- xfsprogs/libxfs/rdwr.c        2011-01-28 20:22:11.000000000 +0900
+++ xfsprogs-dirty/libxfs/rdwr.c  2011-02-02 16:42:32.000000000 +0900
@@ -458,8 +458,7 @@ libxfs_readbufr(dev_t dev, xfs_daddr_t b
        if (pread64(fd, bp->b_addr, bytes, LIBXFS_BBTOOFF64(blkno)) < 0) {
                fprintf(stderr, _("%s: read failed: %s\n"),
                        progname, strerror(errno));
-               if (flags & LIBXFS_EXIT_ON_FAILURE)
-                       exit(1);
+               exit(1);
                return errno;
        }
 #ifdef IO_DEBUG
@@ -501,8 +500,7 @@ libxfs_writebufr(xfs_buf_t *bp)
        if (sts < 0) {
                fprintf(stderr, _("%s: pwrite64 failed: %s\n"),
                        progname, strerror(errno));
-               if (bp->b_flags & LIBXFS_B_EXIT)
-                       exit(1);
+               exit(1);
                return errno;
        }
        else if (sts != bp->b_bcount) {


On Tue, Feb 1, 2011 at 8:06 PM, Ajeet Yadav <ajeet.yadav.77@gmail.com> wrote:
> We are testing mkfs.xfs and xfs_repair stability to look for crashes
> and other issues specially with removable devices.
> And unfortunately crashes does occur.
> Code inspection shows in most cases the caller does not handle
> libxfs_readbuf() for error cases i.e when return value = NULL.
>
> Now I need your suggestion.
> We should fix all such cases or the simplest way is to exit... if
> read() or write() fails with EIO errorno in libxfs_readbufr() and
> libxfs_writebufr().
> Fortunately these function already support exit, if we use flag
> LIBXFS_EXIT_ON_FAILURE, LIBXFS_B_EXIT but they are used selectively.
>
> The current problem is related to function libxfs_trans_read_buf()
>
>       bp = libxfs_readbuf(dev, blkno, len, flags);
> #ifdef XACT_DEBUG
>        fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp);
> #endif
>        xfs_buf_item_init(bp, tp->t_mountp);
>        bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
>        bip->bli_recur = 0;
>        xfs_trans_add_item(tp, (xfs_log_item_t *)bip);
>
>        /* initialise b_fsprivate2 so we can find it incore */
>        XFS_BUF_SET_FSPRIVATE2(bp, tp);
>        *bpp = bp;
>        return 0;
>
> if  libxfs_readbuf() fails due to device removal or other error, bp = NULL.
> In function xfs_buf_item_init(bp, tp->t_mountp) as soon as bp is
> dereferenced occurs
>
> mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017
>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: mkfs.xfs pagefault when removed storage during operation
  2011-02-01 11:06 mkfs.xfs pagefault when removed storage during operation Ajeet Yadav
  2011-02-02  8:09 ` Ajeet Yadav
@ 2011-02-03  4:10 ` Eric Sandeen
  2011-02-03  6:03   ` Ajeet Yadav
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Sandeen @ 2011-02-03  4:10 UTC (permalink / raw)
  To: Ajeet Yadav; +Cc: xfs

On 2/1/11 5:06 AM, Ajeet Yadav wrote:
> We are testing mkfs.xfs and xfs_repair stability to look for crashes
> and other issues specially with removable devices.
> And unfortunately crashes does occur.
> Code inspection shows in most cases the caller does not handle
> libxfs_readbuf() for error cases i.e when return value = NULL.
> 
> Now I need your suggestion.
> We should fix all such cases or the simplest way is to exit... if
> read() or write() fails with EIO errorno in libxfs_readbufr() and
> libxfs_writebufr().

I see very little reason to gracefully handle all error cases
during mkfs. It would be prettier, yes, but if mkfs fails, with
or without an error, with or without a segfault, you have to 
just start it over anyway, right?

I think there are better places to focus effort.

-Eric

> Fortunately these function already support exit, if we use flag
> LIBXFS_EXIT_ON_FAILURE, LIBXFS_B_EXIT but they are used selectively.
> 
> The current problem is related to function libxfs_trans_read_buf()
> 
>        bp = libxfs_readbuf(dev, blkno, len, flags);
> #ifdef XACT_DEBUG
>         fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp);
> #endif
>         xfs_buf_item_init(bp, tp->t_mountp);
>         bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
>         bip->bli_recur = 0;
>         xfs_trans_add_item(tp, (xfs_log_item_t *)bip);
> 
>         /* initialise b_fsprivate2 so we can find it incore */
>         XFS_BUF_SET_FSPRIVATE2(bp, tp);
>         *bpp = bp;
>         return 0;
> 
> if  libxfs_readbuf() fails due to device removal or other error, bp = NULL.
> In function xfs_buf_item_init(bp, tp->t_mountp) as soon as bp is
> dereferenced occurs
> 
> mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: mkfs.xfs pagefault when removed storage during operation
  2011-02-03  4:10 ` Eric Sandeen
@ 2011-02-03  6:03   ` Ajeet Yadav
  2011-02-03  6:07     ` Eric Sandeen
  0 siblings, 1 reply; 7+ messages in thread
From: Ajeet Yadav @ 2011-02-03  6:03 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Sorry I do not agree, we have a bug so we cannot ignore it.
Solving at first place can save a lot of time if same problem create a
side effect that may sometime be very hard to catch.

Now lets consider the current problem
1. Its related to libxfs in xfsprogs, so its not mkfs issue anymore
2. If we come across any critical problem in libxfs we can cross
verify kernel xfs implementation to find if there also a logical
issue.
    One learning and be used in other part.
3. Yes I agree that if mkfs.xfs fails we have to re-run it anyways,
but then what is the difference between a novice code and professional
product.
     If you cscope libxfs_trans_read_buf() in xfsprogs, its caller
always checks the return value, and its used extensively in xfsprogs.
But this function always return 0. Infact there is no error handding
at all, lets not consider EIO error only.
4. We are here in open community out of need, at the same time to make
it better.

I was wondering why I am not getting any reply, I think mail subject
was wrong......mkfs ;)
I will release the patch, please take out time to review it.

On Thu, Feb 3, 2011 at 1:10 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 2/1/11 5:06 AM, Ajeet Yadav wrote:
>> We are testing mkfs.xfs and xfs_repair stability to look for crashes
>> and other issues specially with removable devices.
>> And unfortunately crashes does occur.
>> Code inspection shows in most cases the caller does not handle
>> libxfs_readbuf() for error cases i.e when return value = NULL.
>>
>> Now I need your suggestion.
>> We should fix all such cases or the simplest way is to exit... if
>> read() or write() fails with EIO errorno in libxfs_readbufr() and
>> libxfs_writebufr().
>
> I see very little reason to gracefully handle all error cases
> during mkfs. It would be prettier, yes, but if mkfs fails, with
> or without an error, with or without a segfault, you have to
> just start it over anyway, right?
>
> I think there are better places to focus effort.
>
> -Eric
>
>> Fortunately these function already support exit, if we use flag
>> LIBXFS_EXIT_ON_FAILURE, LIBXFS_B_EXIT but they are used selectively.
>>
>> The current problem is related to function libxfs_trans_read_buf()
>>
>>        bp = libxfs_readbuf(dev, blkno, len, flags);
>> #ifdef XACT_DEBUG
>>         fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp);
>> #endif
>>         xfs_buf_item_init(bp, tp->t_mountp);
>>         bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
>>         bip->bli_recur = 0;
>>         xfs_trans_add_item(tp, (xfs_log_item_t *)bip);
>>
>>         /* initialise b_fsprivate2 so we can find it incore */
>>         XFS_BUF_SET_FSPRIVATE2(bp, tp);
>>         *bpp = bp;
>>         return 0;
>>
>> if  libxfs_readbuf() fails due to device removal or other error, bp = NULL.
>> In function xfs_buf_item_init(bp, tp->t_mountp) as soon as bp is
>> dereferenced occurs
>>
>> mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017
>>
>> _______________________________________________
>> xfs mailing list
>> xfs@oss.sgi.com
>> http://oss.sgi.com/mailman/listinfo/xfs
>>
>
>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: mkfs.xfs pagefault when removed storage during operation
  2011-02-03  6:03   ` Ajeet Yadav
@ 2011-02-03  6:07     ` Eric Sandeen
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2011-02-03  6:07 UTC (permalink / raw)
  To: Ajeet Yadav; +Cc: xfs

On 2/3/11 12:03 AM, Ajeet Yadav wrote:
> Sorry I do not agree, we have a bug so we cannot ignore it.
> Solving at first place can save a lot of time if same problem create a
> side effect that may sometime be very hard to catch.
> 
> Now lets consider the current problem
> 1. Its related to libxfs in xfsprogs, so its not mkfs issue anymore
> 2. If we come across any critical problem in libxfs we can cross
> verify kernel xfs implementation to find if there also a logical
> issue.
>     One learning and be used in other part.
> 3. Yes I agree that if mkfs.xfs fails we have to re-run it anyways,
> but then what is the difference between a novice code and professional
> product.
>      If you cscope libxfs_trans_read_buf() in xfsprogs, its caller
> always checks the return value, and its used extensively in xfsprogs.
> But this function always return 0. Infact there is no error handding
> at all, lets not consider EIO error only.
> 4. We are here in open community out of need, at the same time to make
> it better.
> 
> I was wondering why I am not getting any reply, I think mail subject
> was wrong......mkfs ;)

I may have jumped at that too quickly, yes ;)

> I will release the patch, please take out time to review it.

Well, that's fair enough, that's how it works - if it's important to you,
and you want to fix it, then you can!  And if properly done it gets
merged.

-Eric

> On Thu, Feb 3, 2011 at 1:10 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>> On 2/1/11 5:06 AM, Ajeet Yadav wrote:
>>> We are testing mkfs.xfs and xfs_repair stability to look for crashes
>>> and other issues specially with removable devices.
>>> And unfortunately crashes does occur.
>>> Code inspection shows in most cases the caller does not handle
>>> libxfs_readbuf() for error cases i.e when return value = NULL.
>>>
>>> Now I need your suggestion.
>>> We should fix all such cases or the simplest way is to exit... if
>>> read() or write() fails with EIO errorno in libxfs_readbufr() and
>>> libxfs_writebufr().
>>
>> I see very little reason to gracefully handle all error cases
>> during mkfs. It would be prettier, yes, but if mkfs fails, with
>> or without an error, with or without a segfault, you have to
>> just start it over anyway, right?
>>
>> I think there are better places to focus effort.
>>
>> -Eric
>>
>>> Fortunately these function already support exit, if we use flag
>>> LIBXFS_EXIT_ON_FAILURE, LIBXFS_B_EXIT but they are used selectively.
>>>
>>> The current problem is related to function libxfs_trans_read_buf()
>>>
>>>        bp = libxfs_readbuf(dev, blkno, len, flags);
>>> #ifdef XACT_DEBUG
>>>         fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp);
>>> #endif
>>>         xfs_buf_item_init(bp, tp->t_mountp);
>>>         bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
>>>         bip->bli_recur = 0;
>>>         xfs_trans_add_item(tp, (xfs_log_item_t *)bip);
>>>
>>>         /* initialise b_fsprivate2 so we can find it incore */
>>>         XFS_BUF_SET_FSPRIVATE2(bp, tp);
>>>         *bpp = bp;
>>>         return 0;
>>>
>>> if  libxfs_readbuf() fails due to device removal or other error, bp = NULL.
>>> In function xfs_buf_item_init(bp, tp->t_mountp) as soon as bp is
>>> dereferenced occurs
>>>
>>> mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017
>>>
>>> _______________________________________________
>>> xfs mailing list
>>> xfs@oss.sgi.com
>>> http://oss.sgi.com/mailman/listinfo/xfs
>>>
>>
>>
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: mkfs.xfs pagefault when removed storage during operation
  2011-02-02  8:09 ` Ajeet Yadav
@ 2011-02-08  1:35   ` Alex Elder
  2011-02-08  2:27     ` Ajeet Yadav
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Elder @ 2011-02-08  1:35 UTC (permalink / raw)
  To: Ajeet Yadav; +Cc: xfs

On Wed, 2011-02-02 at 17:09 +0900, Ajeet Yadav wrote:
> If I see the current sigfault, its easy to fix adding one more patch
> to xfsprogs.
. . .
> 
> But when I start reviewing the complete project w.r.t read() /
> read64() / write() / write64() more importantly libxfs_readbufr() /
> libxfs_writebufr().
> I find error handing is broken at may places and I get my self lost in
> m^n complexity also errno is lost.. therefore caller cannot examine
> the exact error,
> 
> Back again I think, What if I exit on error ? Does xfsprogs uses
> read() / write() error as a part of its functionality, for example
> does xfs_repair uses these errors as a part of repair funtionality.
. . .

I think these are good observations and questions.

This doesn't answer your questions, but I want to mention to
you that libxfs is just about to have a major update that
affects lots of files.  I expect the update will *not*
address the issues you point out here, but I guess what I'd
like to do is get us moved to the updated code before trying
to address problems that might be widespread like this.

I hope to complete this update process soon, but I expect
it will be a week or more before that happens.

					-Alex

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: mkfs.xfs pagefault when removed storage during operation
  2011-02-08  1:35   ` Alex Elder
@ 2011-02-08  2:27     ` Ajeet Yadav
  0 siblings, 0 replies; 7+ messages in thread
From: Ajeet Yadav @ 2011-02-08  2:27 UTC (permalink / raw)
  To: aelder; +Cc: xfs

On Tue, Feb 8, 2011 at 10:35 AM, Alex Elder <aelder@sgi.com> wrote:
> On Wed, 2011-02-02 at 17:09 +0900, Ajeet Yadav wrote:
>> If I see the current sigfault, its easy to fix adding one more patch
>> to xfsprogs.
> . . .
>>
>> But when I start reviewing the complete project w.r.t read() /
>> read64() / write() / write64() more importantly libxfs_readbufr() /
>> libxfs_writebufr().
>> I find error handing is broken at may places and I get my self lost in
>> m^n complexity also errno is lost.. therefore caller cannot examine
>> the exact error,
>>
>> Back again I think, What if I exit on error ? Does xfsprogs uses
>> read() / write() error as a part of its functionality, for example
>> does xfs_repair uses these errors as a part of repair funtionality.
> . . .
>
> I think these are good observations and questions.
>
> This doesn't answer your questions, but I want to mention to
> you that libxfs is just about to have a major update that
> affects lots of files.  I expect the update will *not*
> address the issues you point out here, but I guess what I'd
> like to do is get us moved to the updated code before trying
> to address problems that might be widespread like this.
>
> I hope to complete this update process soon, but I expect
> it will be a week or more before that happens.
>
>                                        -Alex
>
>
Thanks, I am waiting for the next release of xfsprogs.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-02-08  2:25 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-01 11:06 mkfs.xfs pagefault when removed storage during operation Ajeet Yadav
2011-02-02  8:09 ` Ajeet Yadav
2011-02-08  1:35   ` Alex Elder
2011-02-08  2:27     ` Ajeet Yadav
2011-02-03  4:10 ` Eric Sandeen
2011-02-03  6:03   ` Ajeet Yadav
2011-02-03  6:07     ` Eric Sandeen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox