* mkfs.xfs pagefault when removed storage during operation
@ 2011-02-01 11:06 Ajeet Yadav
2011-02-02 8:09 ` Ajeet Yadav
2011-02-03 4:10 ` Eric Sandeen
0 siblings, 2 replies; 7+ messages in thread
From: Ajeet Yadav @ 2011-02-01 11:06 UTC (permalink / raw)
To: xfs
We are testing mkfs.xfs and xfs_repair stability to look for crashes
and other issues specially with removable devices.
And unfortunately crashes does occur.
Code inspection shows in most cases the caller does not handle
libxfs_readbuf() for error cases i.e when return value = NULL.
Now I need your suggestion.
We should fix all such cases or the simplest way is to exit... if
read() or write() fails with EIO errorno in libxfs_readbufr() and
libxfs_writebufr().
Fortunately these function already support exit, if we use flag
LIBXFS_EXIT_ON_FAILURE, LIBXFS_B_EXIT but they are used selectively.
The current problem is related to function libxfs_trans_read_buf()
bp = libxfs_readbuf(dev, blkno, len, flags);
#ifdef XACT_DEBUG
fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp);
#endif
xfs_buf_item_init(bp, tp->t_mountp);
bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
bip->bli_recur = 0;
xfs_trans_add_item(tp, (xfs_log_item_t *)bip);
/* initialise b_fsprivate2 so we can find it incore */
XFS_BUF_SET_FSPRIVATE2(bp, tp);
*bpp = bp;
return 0;
if libxfs_readbuf() fails due to device removal or other error, bp = NULL.
In function xfs_buf_item_init(bp, tp->t_mountp) as soon as bp is
dereferenced occurs
mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: mkfs.xfs pagefault when removed storage during operation 2011-02-01 11:06 mkfs.xfs pagefault when removed storage during operation Ajeet Yadav @ 2011-02-02 8:09 ` Ajeet Yadav 2011-02-08 1:35 ` Alex Elder 2011-02-03 4:10 ` Eric Sandeen 1 sibling, 1 reply; 7+ messages in thread From: Ajeet Yadav @ 2011-02-02 8:09 UTC (permalink / raw) To: xfs If I see the current sigfault, its easy to fix adding one more patch to xfsprogs. diff -Nurp xfsprogs-3.0.5/libxfs/rdwr.c xfsprogs-3.0.5-dirty/libxfs/rdwr.c --- xfsprogs-3.0.5/libxfs/rdwr.c 2011-01-28 20:22:11.000000000 +0900 +++ xfsprogs-3.0.5-dirty/libxfs/rdwr.c 2011-02-02 16:59:16.000000000 +0900 @@ -207,10 +207,11 @@ libxfs_trace_readbuf(const char *func, c { xfs_buf_t *bp = libxfs_readbuf(dev, blkno, len, flags); - bp->b_func = func; - bp->b_file = file; - bp->b_line = line; - + if (bp){ + bp->b_func = func; + bp->b_file = file; + bp->b_line = line; + } return bp; } @@ -485,6 +486,7 @@ libxfs_readbuf(dev_t dev, xfs_daddr_t bl error = libxfs_readbufr(dev, blkno, bp, len, flags); if (error) { libxfs_putbuf(bp); + errno = error; return NULL; } } diff -Nurp xfsprogs-3.0.5/libxfs/trans.c xfsprogs-3.0.5-dirty/libxfs/trans.c --- xfsprogs-3.0.5/libxfs/trans.c 2011-01-28 20:22:11.000000000 +0900 +++ xfsprogs-3.0.5-dirty/libxfs/trans.c 2011-02-02 17:00:42.000000000 +0900 @@ -508,6 +508,10 @@ libxfs_trans_read_buf( } bp = libxfs_readbuf(dev, blkno, len, flags); + if (!bp){ + *bpp = NULL; + return errno; + } #ifdef XACT_DEBUG fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp); #endif But when I start reviewing the complete project w.r.t read() / read64() / write() / write64() more importantly libxfs_readbufr() / libxfs_writebufr(). I find error handing is broken at may places and I get my self lost in m^n complexity also errno is lost.. therefore caller cannot examine the exact error, Back again I think, What if I exit on error ? Does xfsprogs uses read() / write() error as a part of its functionality, for example does xfs_repair uses these errors as a part of repair funtionality. diff -Nurp xfsprogs/libxfs/rdwr.c xfsprogs-dirty/libxfs/rdwr.c --- xfsprogs/libxfs/rdwr.c 2011-01-28 20:22:11.000000000 +0900 +++ xfsprogs-dirty/libxfs/rdwr.c 2011-02-02 16:42:32.000000000 +0900 @@ -458,8 +458,7 @@ libxfs_readbufr(dev_t dev, xfs_daddr_t b if (pread64(fd, bp->b_addr, bytes, LIBXFS_BBTOOFF64(blkno)) < 0) { fprintf(stderr, _("%s: read failed: %s\n"), progname, strerror(errno)); - if (flags & LIBXFS_EXIT_ON_FAILURE) - exit(1); + exit(1); return errno; } #ifdef IO_DEBUG @@ -501,8 +500,7 @@ libxfs_writebufr(xfs_buf_t *bp) if (sts < 0) { fprintf(stderr, _("%s: pwrite64 failed: %s\n"), progname, strerror(errno)); - if (bp->b_flags & LIBXFS_B_EXIT) - exit(1); + exit(1); return errno; } else if (sts != bp->b_bcount) { On Tue, Feb 1, 2011 at 8:06 PM, Ajeet Yadav <ajeet.yadav.77@gmail.com> wrote: > We are testing mkfs.xfs and xfs_repair stability to look for crashes > and other issues specially with removable devices. > And unfortunately crashes does occur. > Code inspection shows in most cases the caller does not handle > libxfs_readbuf() for error cases i.e when return value = NULL. > > Now I need your suggestion. > We should fix all such cases or the simplest way is to exit... if > read() or write() fails with EIO errorno in libxfs_readbufr() and > libxfs_writebufr(). > Fortunately these function already support exit, if we use flag > LIBXFS_EXIT_ON_FAILURE, LIBXFS_B_EXIT but they are used selectively. > > The current problem is related to function libxfs_trans_read_buf() > > bp = libxfs_readbuf(dev, blkno, len, flags); > #ifdef XACT_DEBUG > fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp); > #endif > xfs_buf_item_init(bp, tp->t_mountp); > bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *); > bip->bli_recur = 0; > xfs_trans_add_item(tp, (xfs_log_item_t *)bip); > > /* initialise b_fsprivate2 so we can find it incore */ > XFS_BUF_SET_FSPRIVATE2(bp, tp); > *bpp = bp; > return 0; > > if libxfs_readbuf() fails due to device removal or other error, bp = NULL. > In function xfs_buf_item_init(bp, tp->t_mountp) as soon as bp is > dereferenced occurs > > mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017 > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mkfs.xfs pagefault when removed storage during operation 2011-02-02 8:09 ` Ajeet Yadav @ 2011-02-08 1:35 ` Alex Elder 2011-02-08 2:27 ` Ajeet Yadav 0 siblings, 1 reply; 7+ messages in thread From: Alex Elder @ 2011-02-08 1:35 UTC (permalink / raw) To: Ajeet Yadav; +Cc: xfs On Wed, 2011-02-02 at 17:09 +0900, Ajeet Yadav wrote: > If I see the current sigfault, its easy to fix adding one more patch > to xfsprogs. . . . > > But when I start reviewing the complete project w.r.t read() / > read64() / write() / write64() more importantly libxfs_readbufr() / > libxfs_writebufr(). > I find error handing is broken at may places and I get my self lost in > m^n complexity also errno is lost.. therefore caller cannot examine > the exact error, > > Back again I think, What if I exit on error ? Does xfsprogs uses > read() / write() error as a part of its functionality, for example > does xfs_repair uses these errors as a part of repair funtionality. . . . I think these are good observations and questions. This doesn't answer your questions, but I want to mention to you that libxfs is just about to have a major update that affects lots of files. I expect the update will *not* address the issues you point out here, but I guess what I'd like to do is get us moved to the updated code before trying to address problems that might be widespread like this. I hope to complete this update process soon, but I expect it will be a week or more before that happens. -Alex _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mkfs.xfs pagefault when removed storage during operation 2011-02-08 1:35 ` Alex Elder @ 2011-02-08 2:27 ` Ajeet Yadav 0 siblings, 0 replies; 7+ messages in thread From: Ajeet Yadav @ 2011-02-08 2:27 UTC (permalink / raw) To: aelder; +Cc: xfs On Tue, Feb 8, 2011 at 10:35 AM, Alex Elder <aelder@sgi.com> wrote: > On Wed, 2011-02-02 at 17:09 +0900, Ajeet Yadav wrote: >> If I see the current sigfault, its easy to fix adding one more patch >> to xfsprogs. > . . . >> >> But when I start reviewing the complete project w.r.t read() / >> read64() / write() / write64() more importantly libxfs_readbufr() / >> libxfs_writebufr(). >> I find error handing is broken at may places and I get my self lost in >> m^n complexity also errno is lost.. therefore caller cannot examine >> the exact error, >> >> Back again I think, What if I exit on error ? Does xfsprogs uses >> read() / write() error as a part of its functionality, for example >> does xfs_repair uses these errors as a part of repair funtionality. > . . . > > I think these are good observations and questions. > > This doesn't answer your questions, but I want to mention to > you that libxfs is just about to have a major update that > affects lots of files. I expect the update will *not* > address the issues you point out here, but I guess what I'd > like to do is get us moved to the updated code before trying > to address problems that might be widespread like this. > > I hope to complete this update process soon, but I expect > it will be a week or more before that happens. > > -Alex > > Thanks, I am waiting for the next release of xfsprogs. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mkfs.xfs pagefault when removed storage during operation 2011-02-01 11:06 mkfs.xfs pagefault when removed storage during operation Ajeet Yadav 2011-02-02 8:09 ` Ajeet Yadav @ 2011-02-03 4:10 ` Eric Sandeen 2011-02-03 6:03 ` Ajeet Yadav 1 sibling, 1 reply; 7+ messages in thread From: Eric Sandeen @ 2011-02-03 4:10 UTC (permalink / raw) To: Ajeet Yadav; +Cc: xfs On 2/1/11 5:06 AM, Ajeet Yadav wrote: > We are testing mkfs.xfs and xfs_repair stability to look for crashes > and other issues specially with removable devices. > And unfortunately crashes does occur. > Code inspection shows in most cases the caller does not handle > libxfs_readbuf() for error cases i.e when return value = NULL. > > Now I need your suggestion. > We should fix all such cases or the simplest way is to exit... if > read() or write() fails with EIO errorno in libxfs_readbufr() and > libxfs_writebufr(). I see very little reason to gracefully handle all error cases during mkfs. It would be prettier, yes, but if mkfs fails, with or without an error, with or without a segfault, you have to just start it over anyway, right? I think there are better places to focus effort. -Eric > Fortunately these function already support exit, if we use flag > LIBXFS_EXIT_ON_FAILURE, LIBXFS_B_EXIT but they are used selectively. > > The current problem is related to function libxfs_trans_read_buf() > > bp = libxfs_readbuf(dev, blkno, len, flags); > #ifdef XACT_DEBUG > fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp); > #endif > xfs_buf_item_init(bp, tp->t_mountp); > bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *); > bip->bli_recur = 0; > xfs_trans_add_item(tp, (xfs_log_item_t *)bip); > > /* initialise b_fsprivate2 so we can find it incore */ > XFS_BUF_SET_FSPRIVATE2(bp, tp); > *bpp = bp; > return 0; > > if libxfs_readbuf() fails due to device removal or other error, bp = NULL. > In function xfs_buf_item_init(bp, tp->t_mountp) as soon as bp is > dereferenced occurs > > mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017 > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mkfs.xfs pagefault when removed storage during operation 2011-02-03 4:10 ` Eric Sandeen @ 2011-02-03 6:03 ` Ajeet Yadav 2011-02-03 6:07 ` Eric Sandeen 0 siblings, 1 reply; 7+ messages in thread From: Ajeet Yadav @ 2011-02-03 6:03 UTC (permalink / raw) To: Eric Sandeen; +Cc: xfs Sorry I do not agree, we have a bug so we cannot ignore it. Solving at first place can save a lot of time if same problem create a side effect that may sometime be very hard to catch. Now lets consider the current problem 1. Its related to libxfs in xfsprogs, so its not mkfs issue anymore 2. If we come across any critical problem in libxfs we can cross verify kernel xfs implementation to find if there also a logical issue. One learning and be used in other part. 3. Yes I agree that if mkfs.xfs fails we have to re-run it anyways, but then what is the difference between a novice code and professional product. If you cscope libxfs_trans_read_buf() in xfsprogs, its caller always checks the return value, and its used extensively in xfsprogs. But this function always return 0. Infact there is no error handding at all, lets not consider EIO error only. 4. We are here in open community out of need, at the same time to make it better. I was wondering why I am not getting any reply, I think mail subject was wrong......mkfs ;) I will release the patch, please take out time to review it. On Thu, Feb 3, 2011 at 1:10 PM, Eric Sandeen <sandeen@sandeen.net> wrote: > On 2/1/11 5:06 AM, Ajeet Yadav wrote: >> We are testing mkfs.xfs and xfs_repair stability to look for crashes >> and other issues specially with removable devices. >> And unfortunately crashes does occur. >> Code inspection shows in most cases the caller does not handle >> libxfs_readbuf() for error cases i.e when return value = NULL. >> >> Now I need your suggestion. >> We should fix all such cases or the simplest way is to exit... if >> read() or write() fails with EIO errorno in libxfs_readbufr() and >> libxfs_writebufr(). > > I see very little reason to gracefully handle all error cases > during mkfs. It would be prettier, yes, but if mkfs fails, with > or without an error, with or without a segfault, you have to > just start it over anyway, right? > > I think there are better places to focus effort. > > -Eric > >> Fortunately these function already support exit, if we use flag >> LIBXFS_EXIT_ON_FAILURE, LIBXFS_B_EXIT but they are used selectively. >> >> The current problem is related to function libxfs_trans_read_buf() >> >> bp = libxfs_readbuf(dev, blkno, len, flags); >> #ifdef XACT_DEBUG >> fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp); >> #endif >> xfs_buf_item_init(bp, tp->t_mountp); >> bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *); >> bip->bli_recur = 0; >> xfs_trans_add_item(tp, (xfs_log_item_t *)bip); >> >> /* initialise b_fsprivate2 so we can find it incore */ >> XFS_BUF_SET_FSPRIVATE2(bp, tp); >> *bpp = bp; >> return 0; >> >> if libxfs_readbuf() fails due to device removal or other error, bp = NULL. >> In function xfs_buf_item_init(bp, tp->t_mountp) as soon as bp is >> dereferenced occurs >> >> mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017 >> >> _______________________________________________ >> xfs mailing list >> xfs@oss.sgi.com >> http://oss.sgi.com/mailman/listinfo/xfs >> > > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mkfs.xfs pagefault when removed storage during operation 2011-02-03 6:03 ` Ajeet Yadav @ 2011-02-03 6:07 ` Eric Sandeen 0 siblings, 0 replies; 7+ messages in thread From: Eric Sandeen @ 2011-02-03 6:07 UTC (permalink / raw) To: Ajeet Yadav; +Cc: xfs On 2/3/11 12:03 AM, Ajeet Yadav wrote: > Sorry I do not agree, we have a bug so we cannot ignore it. > Solving at first place can save a lot of time if same problem create a > side effect that may sometime be very hard to catch. > > Now lets consider the current problem > 1. Its related to libxfs in xfsprogs, so its not mkfs issue anymore > 2. If we come across any critical problem in libxfs we can cross > verify kernel xfs implementation to find if there also a logical > issue. > One learning and be used in other part. > 3. Yes I agree that if mkfs.xfs fails we have to re-run it anyways, > but then what is the difference between a novice code and professional > product. > If you cscope libxfs_trans_read_buf() in xfsprogs, its caller > always checks the return value, and its used extensively in xfsprogs. > But this function always return 0. Infact there is no error handding > at all, lets not consider EIO error only. > 4. We are here in open community out of need, at the same time to make > it better. > > I was wondering why I am not getting any reply, I think mail subject > was wrong......mkfs ;) I may have jumped at that too quickly, yes ;) > I will release the patch, please take out time to review it. Well, that's fair enough, that's how it works - if it's important to you, and you want to fix it, then you can! And if properly done it gets merged. -Eric > On Thu, Feb 3, 2011 at 1:10 PM, Eric Sandeen <sandeen@sandeen.net> wrote: >> On 2/1/11 5:06 AM, Ajeet Yadav wrote: >>> We are testing mkfs.xfs and xfs_repair stability to look for crashes >>> and other issues specially with removable devices. >>> And unfortunately crashes does occur. >>> Code inspection shows in most cases the caller does not handle >>> libxfs_readbuf() for error cases i.e when return value = NULL. >>> >>> Now I need your suggestion. >>> We should fix all such cases or the simplest way is to exit... if >>> read() or write() fails with EIO errorno in libxfs_readbufr() and >>> libxfs_writebufr(). >> >> I see very little reason to gracefully handle all error cases >> during mkfs. It would be prettier, yes, but if mkfs fails, with >> or without an error, with or without a segfault, you have to >> just start it over anyway, right? >> >> I think there are better places to focus effort. >> >> -Eric >> >>> Fortunately these function already support exit, if we use flag >>> LIBXFS_EXIT_ON_FAILURE, LIBXFS_B_EXIT but they are used selectively. >>> >>> The current problem is related to function libxfs_trans_read_buf() >>> >>> bp = libxfs_readbuf(dev, blkno, len, flags); >>> #ifdef XACT_DEBUG >>> fprintf(stderr, "trans_read_buf buffer %p, transaction %p\n", bp, tp); >>> #endif >>> xfs_buf_item_init(bp, tp->t_mountp); >>> bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *); >>> bip->bli_recur = 0; >>> xfs_trans_add_item(tp, (xfs_log_item_t *)bip); >>> >>> /* initialise b_fsprivate2 so we can find it incore */ >>> XFS_BUF_SET_FSPRIVATE2(bp, tp); >>> *bpp = bp; >>> return 0; >>> >>> if libxfs_readbuf() fails due to device removal or other error, bp = NULL. >>> In function xfs_buf_item_init(bp, tp->t_mountp) as soon as bp is >>> dereferenced occurs >>> >>> mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017 >>> >>> _______________________________________________ >>> xfs mailing list >>> xfs@oss.sgi.com >>> http://oss.sgi.com/mailman/listinfo/xfs >>> >> >> > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2011-02-08 2:25 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-02-01 11:06 mkfs.xfs pagefault when removed storage during operation Ajeet Yadav 2011-02-02 8:09 ` Ajeet Yadav 2011-02-08 1:35 ` Alex Elder 2011-02-08 2:27 ` Ajeet Yadav 2011-02-03 4:10 ` Eric Sandeen 2011-02-03 6:03 ` Ajeet Yadav 2011-02-03 6:07 ` Eric Sandeen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox